Datasets
This page acts as the technical reference for the datasets subpackage.
The datasets subpackage provides dataset classes that can be used with the PyTorch DataLoader class to manage the data loading process.
FlatFieldArtefactMapDataset
Bases: MultiFileArtefactMapDataset
Source code in clair_torch/datasets/image_dataset.py
__init__(files, copy_preloaded_data=True, missing_std_mode=MissingStdMode.CONSTANT, missing_std_value=0.0, attributes_to_match=None, cache_size=0, missing_val_mode=MissingValMode.ERROR, default_get_item_key='raw')
Dataset class for handling calibration images. Currently, it is mainly used for flat-field correction.
Args:
attributes_to_match: (not documented)
copy_preloaded_data: whether preloaded data should be returned as a new copy or as a reference to the preloaded data.
files: list of FrameSettings objects composing the dataset of calibration images.
missing_std_mode: how missing uncertainty images should be dealt with. Read more in .enums.MissingStdMode.
missing_std_value: a constant used, in the manner defined by missing_std_mode, to deal with missing uncertainty images.
Source code in clair_torch/datasets/image_dataset.py
ImageMapDataset
Bases: MultiFileMapDataset
Source code in clair_torch/datasets/image_dataset.py
__init__(files, copy_preloaded_data=True, missing_std_mode=MissingStdMode.CONSTANT, missing_std_value=0.0, default_get_item_key='raw', missing_val_mode=MissingValMode.ERROR)
ImageMapDataset is the primary image dataset class. Its files attribute holds a list of FileSettings-based objects. Image tensors have shape (C, H, W), that is, number of channels, height and width. When batched through a DataLoader, the shape becomes (N, C, H, W), where N is the number of images in the batch.
Args:
files: list of the FileSettings-based objects composing the dataset.
copy_preloaded_data: whether preloaded data should be returned as a new copy or as a reference to the preloaded data held in self._preloaded_dataset.
missing_std_mode: how missing uncertainty images should be dealt with. Read more in .enums.MissingStdMode.
missing_std_value: a constant used, in the manner defined by missing_std_mode, to deal with missing uncertainty images.
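To make the (C, H, W) to (N, C, H, W) convention concrete, here is a minimal, PyTorch-free sketch of what batching does to image shapes. Nested Python lists stand in for tensors, and the helpers `shape`, `make_image` and `stack` are hypothetical names used only for illustration; with real tensors, PyTorch's default collation performs the equivalent of `torch.stack(images)`.

```python
def shape(x):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def make_image(c, h, w, fill=0.0):
    """Create a hypothetical (C, H, W) 'image' filled with a constant value."""
    return [[[fill] * w for _ in range(h)] for _ in range(c)]


def stack(images):
    """Stack same-shaped (C, H, W) images along a new leading batch axis N."""
    first = shape(images[0])
    if any(shape(img) != first for img in images):
        raise ValueError("all images in a batch must share the same shape")
    return list(images)


batch = stack([make_image(3, 4, 5) for _ in range(2)])
print(shape(batch))  # (2, 3, 4, 5)
```

The shape check mirrors the practical constraint of stacking: images in one batch must share the same (C, H, W) shape.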
Source code in clair_torch/datasets/image_dataset.py
MultiFileIterDataset
Bases: IterableDataset, ABC
A generic base class for iterable-style Dataset classes. Dataset classes must manage files via a concrete implementation of the generic base FileSettings class.
Source code in clair_torch/datasets/base.py
MultiFileMapDataset
Bases: Dataset, ABC
A generic base class for map-style Dataset classes. Dataset classes must manage files via a concrete implementation of the generic base FileSettings class.
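The pattern of an abstract base class that manages a list of settings objects can be sketched as below. This is an illustration under assumptions, not clair_torch's code: `FileSettingsSketch`, `MultiFileMapDatasetSketch` and `PathEchoDataset` are hypothetical names, and the PyTorch `Dataset` base is left out so the sketch runs with the standard library only. The `__len__` follows the documented definition (the number of managed files).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, List, TypeVar


@dataclass
class FileSettingsSketch:
    """Hypothetical stand-in for a FileSettings object; not clair_torch's API."""
    path: str


SettingsT = TypeVar("SettingsT", bound=FileSettingsSketch)


class MultiFileMapDatasetSketch(ABC, Generic[SettingsT]):
    """Abstract map-style base class that manages a list of settings objects."""

    def __init__(self, files: List[SettingsT]) -> None:
        self.files = list(files)

    def __len__(self) -> int:
        # The dataset length is the number of managed files.
        return len(self.files)

    @abstractmethod
    def __getitem__(self, key: int):
        """Concrete subclasses decide how one file is turned into an item."""


class PathEchoDataset(MultiFileMapDatasetSketch[FileSettingsSketch]):
    """Toy concrete subclass that simply returns the stored path."""

    def __getitem__(self, key: int) -> str:
        return self.files[key].path


ds = PathEchoDataset([FileSettingsSketch("a.png"), FileSettingsSketch("b.png")])
print(len(ds), ds[1])  # 2 b.png
```

Because `__getitem__` is abstract, the base class itself cannot be instantiated; only concrete subclasses can.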
Source code in clair_torch/datasets/base.py
__getitem__(key)
This method loads images from disk with OpenCV, converts them to PyTorch tensors, runs them through the given transformations and finally returns the value image tensor, an optional uncertainty image tensor and the numeric metadata (e.g. the exposure time). If preloaded tensors are available, they are used instead of reading from disk. This method should be used as the main way to access the tensors.
Args:
key: index of the item to get.
Returns: tuple[Tensor, Tensor | None, dict[str, float | int]]: a tuple representing the value image, an optional uncertainty image and a numeric metadata dictionary.
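The load-or-use-preloaded behaviour described above can be sketched as follows. This is a minimal sketch under assumptions: `SketchMapDataset` and `fake_loader` are hypothetical, the loader stands in for the OpenCV read and tensor conversion, plain lists replace tensors, and applying the same transform to the uncertainty image is an assumption rather than documented behaviour.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

# (value image, optional uncertainty image, numeric metadata)
Item = Tuple[Any, Optional[Any], Dict[str, Any]]


class SketchMapDataset:
    """Hypothetical map-style dataset illustrating the load-or-preloaded pattern."""

    def __init__(
        self,
        paths: Sequence[str],
        loader: Callable[[str], Item],
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.paths = list(paths)
        self.loader = loader  # hypothetical: reads one file and returns an Item
        self.transform = transform
        self._preloaded_dataset: Optional[Tuple[list, list, list]] = None

    def __len__(self) -> int:
        # Length is the number of managed files.
        return len(self.paths)

    def __getitem__(self, key: int) -> Item:
        # Prefer preloaded data when it exists; otherwise read from disk.
        if self._preloaded_dataset is not None:
            values, stds, metas = self._preloaded_dataset
            return values[key], stds[key], metas[key]
        value, std, meta = self.loader(self.paths[key])
        if self.transform is not None:
            value = self.transform(value)
            # Assumption: the same transform is applied to the std image.
            if std is not None:
                std = self.transform(std)
        return value, std, meta


def fake_loader(path: str) -> Item:
    """Stand-in for reading an image: the 'pixel' is the path length."""
    return [len(path)], None, {"exposure_time": 0.01}


ds = SketchMapDataset(["a.tif", "bb.tif"], fake_loader, transform=lambda v: [x * 2 for x in v])
print(ds[1])  # ([12], None, {'exposure_time': 0.01})
```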
Source code in clair_torch/datasets/base.py
__len__()
The length of the dataset is defined as the number of files it manages. Returns: int representing the number of files.
Source code in clair_torch/datasets/base.py
preload_dataset()
Loads all data from disk into memory and stores it in self._preloaded_dataset as a tuple of lists of tensors. This method uses __getitem__ to load each item.
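The regrouping into a tuple of lists, and the copy-versus-reference choice controlled by copy_preloaded_data, can be sketched with two small functions. The names `preload` and `get_preloaded` are hypothetical; `copy.deepcopy` stands in for copying tensors (with PyTorch, `tensor.clone()` would be the usual choice).

```python
import copy
from typing import Any, Dict, List, Optional, Sequence, Tuple

Item = Tuple[Any, Optional[Any], Dict[str, Any]]
Preloaded = Tuple[List[Any], List[Optional[Any]], List[Dict[str, Any]]]


def preload(dataset: Sequence[Item]) -> Preloaded:
    """Index every item once and regroup the results into a tuple of lists."""
    values, stds, metas = [], [], []
    for i in range(len(dataset)):
        value, std, meta = dataset[i]  # goes through __getitem__
        values.append(value)
        stds.append(std)
        metas.append(meta)
    return values, stds, metas


def get_preloaded(preloaded: Preloaded, key: int, copy_preloaded_data: bool = True) -> Item:
    """Return one preloaded item, either as an independent copy or by reference."""
    values, stds, metas = preloaded
    item = (values[key], stds[key], metas[key])
    return copy.deepcopy(item) if copy_preloaded_data else item
```

Returning a copy protects the cached data from in-place modification by callers at the cost of extra memory traffic; returning a reference is cheaper but shares state.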
Source code in clair_torch/datasets/base.py
VideoIterableDataset
Bases: IterableDataset
Source code in clair_torch/datasets/video_frame_dataset.py
__init__(frame_settings, missing_std_mode=MissingStdMode.CONSTANT, missing_std_value=0.0)
TODO: add handling for std files alongside the main value files.
Dataset class for video files. Treats all the files it contains as a single dataset, moving on to the next file once all frames of the current file have been consumed.
Args:
frame_settings: list of FrameSettings objects composing the dataset.
missing_std_mode: enum flag determining how missing uncertainty images should be handled.
missing_std_value: a constant used, in the manner defined by missing_std_mode, to deal with missing uncertainty images.
Source code in clair_torch/datasets/video_frame_dataset.py
__iter__()
Access method for the frames of this dataset. Iterates through the files and their frames, moving on to the next file once the current one is exhausted.
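The "one continuous stream over several files" behaviour can be sketched with a generator-based `__iter__`. This is a minimal sketch, not the library's implementation: `ChainedFrameIterable` and the `read_frames` callable are hypothetical, and a real reader might be built on OpenCV's `cv2.VideoCapture`, which is an assumption here.

```python
from typing import Callable, Iterable, Iterator, List


class ChainedFrameIterable:
    """Hypothetical iterable that treats several video files as one frame stream."""

    def __init__(self, files: List[str], read_frames: Callable[[str], Iterable]) -> None:
        self.files = list(files)
        # Hypothetical per-file reader, e.g. a generator built on cv2.VideoCapture.
        self.read_frames = read_frames

    def __iter__(self) -> Iterator:
        for path in self.files:
            # Yield every frame of the current file before opening the next one.
            yield from self.read_frames(path)


videos = {"a.mp4": ["a0", "a1"], "b.mp4": ["b0"]}
stream = ChainedFrameIterable(list(videos), lambda p: videos[p])
print(list(stream))  # ['a0', 'a1', 'b0']
```

When an iterable-style dataset is used with a DataLoader and num_workers > 0, each worker gets its own copy of the dataset, so unless the files are split between workers (for example via `torch.utils.data.get_worker_info()`), every frame is yielded once per worker.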
Source code in clair_torch/datasets/video_frame_dataset.py
custom_collate(batch)
Custom collate function for handling possibly missing (None) std images. If any std image in the batch is None, the std entry of the whole batch is set to None.
Args:
batch: the data batch from a Dataset as a tuple. Expects a tuple of four items, matching the structure of the return value.
Returns: tuple[Tensor, Tensor, Tensor | None, dict[str, Tensor]]: the batched data in a tuple.
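The None-propagating collation rule can be sketched without PyTorch as below. `collate_with_optional_std` is a hypothetical name, the four sample fields are treated generically since their exact meaning is not spelled out here, and lists replace tensors; with PyTorch, the per-field lists would typically be combined with `torch.stack` and the metadata lists turned into tensors. A function like this is passed to a DataLoader through its `collate_fn` parameter.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Sample = Tuple[Any, Any, Optional[Any], Dict[str, Any]]


def collate_with_optional_std(batch: Sequence[Sample]):
    """Group a list of 4-item samples into one 4-item batch.

    If any sample lacks a std image (None), the batched std entry is None.
    """
    if not batch:
        raise ValueError("cannot collate an empty batch")
    firsts, seconds, stds, metas = zip(*batch)
    std_batch: Optional[List[Any]] = None if any(s is None for s in stds) else list(stds)
    # Turn a list of per-sample dicts into a dict of per-key lists.
    meta_batch = {key: [m[key] for m in metas] for key in metas[0]}
    return list(firsts), list(seconds), std_batch, meta_batch


batch = [([1], [2], None, {"t": 1}), ([3], [4], [5], {"t": 2})]
print(collate_with_optional_std(batch))  # ([[1], [3]], [[2], [4]], None, {'t': [1, 2]})
```

Dropping the whole std entry, rather than padding individual samples, keeps the batched std either fully defined or absent, so downstream code needs only a single None check.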
Source code in clair_torch/datasets/collate.py