Dataset Format

Introduction to the dataset

The PyTorch Image Model framework comes with the following Dataset classes:

  • ImageDataset
  • IterableImageDataset
  • AugMixDataset

Our training data needs to be in the following structure:

<base_folder>
├── train
│   ├── class1
│   ├── class2
│   ├── class3
│   ├── ...
│   └── classN
└── val
    ├── class1
    ├── class2
    ├── class3
    ├── ...
    └── classN

Each subfolder represents the corresponding class and contains relevant images.
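To make the layout concrete, here is a minimal sketch that builds the same folder tree in a temporary directory and reads the class names back from it. The helper names make_layout and list_classes are hypothetical, introduced only for illustration; they are not part of the framework.

```python
import tempfile
from pathlib import Path


def make_layout(base, splits=("train", "val"), classes=("class1", "class2", "class3")):
    """Create <base>/<split>/<class> folders and return them as a sorted list."""
    created = []
    for split in splits:
        for name in classes:
            folder = Path(base) / split / name
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return sorted(created)


def list_classes(base, split):
    """Return the class names (subfolder names) found under one split."""
    return sorted(p.name for p in (Path(base) / split).iterdir() if p.is_dir())


# A throwaway directory stands in for <base_folder>.
base_folder = tempfile.mkdtemp()
make_layout(base_folder)

train_classes = list_classes(base_folder, "train")
val_classes = list_classes(base_folder, "val")
print(train_classes)
```

Both splits should expose the same set of class subfolders, which is a quick sanity check worth running before training.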

The ImageDataset class

We can use the ImageDataset class to create the training, validation, and test datasets for our image classification model.

It accepts the following arguments:

class ImageDataset(root, parser, class_map, load_bytes, transform) -> Tuple[Any, Any]:
  • root (str): This is the path to our dataset.
  • parser (Union[ParserImageInTar, ParserImageFolder, str]): This is the parser for our dataset. It can read images either from a folder or from a tar file.
  • class_map (Dict[str, str]): This is a dictionary containing the class mapping.
  • load_bytes (bool): This specifies whether to load the images as raw bytes.
  • transform (List): This is a list of image transformations applied when loading our dataset.

Parser

The ImageDataset contains a built-in parser object that’s created by the create_parser factory method. The parser object will find all images under the train and val folders. The folder structure should be as follows:

train/class1/12345.png
train/class1/12346.png
train/class1/12347.png

...

train/class2/44122.png
train/class2/44123.png
train/class2/44124.png

The subfolders represent the labels of the images they contain. For example, if we train a 5-class image classification model, we should have the following folder structure:

train/apple/...
train/banana/...
train/grape/...
train/orange/...
train/pear/...

The parser object provides the class_to_idx attribute, a dictionary that maps the class names to integers. For example:

{'apple': 0, 'banana': 1, 'grape': 2, 'orange': 3, 'pear': 4}
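A mapping like this can be reproduced with a few lines of plain Python. The sketch below assumes the indices are assigned in sorted order of the class names, as the example mapping suggests. The function build_class_to_idx is a hypothetical helper, not the parser's own code.

```python
def build_class_to_idx(class_names):
    """Map each unique class name to an integer, in sorted order (assumed)."""
    unique_names = sorted(set(class_names))
    return {name: idx for idx, name in enumerate(unique_names)}


# Folder names in arbitrary order, as a directory listing might return them.
class_to_idx = build_class_to_idx(["pear", "apple", "orange", "banana", "grape"])
print(class_to_idx)
# {'apple': 0, 'banana': 1, 'grape': 2, 'orange': 3, 'pear': 4}
```

Sorting before enumerating keeps the mapping stable across runs, so a given class always receives the same integer label.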

There’s also an attribute called samples which returns a list of tuples:

[('train/apple/12345.png', 0), ('train/banana/22241.png', 1), ..., ('train/pear/44321.png', 4), ('train/pear/4479.png', 4), ...]
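To show where such a list could come from, the following sketch walks a small throwaway folder tree and collects (path, label) tuples. It is a simplified stand-in for illustration, not the framework's actual parser. The find_samples helper, the IMG_EXTENSIONS tuple, and the sorted label order are all assumptions.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions for this sketch.
IMG_EXTENSIONS = (".png", ".jpg", ".jpeg")


def find_samples(split_dir):
    """Return ([(relative_path, label), ...], class_to_idx) for one split folder."""
    split_dir = Path(split_dir)
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        for path in sorted((split_dir / name).iterdir()):
            if path.suffix.lower() in IMG_EXTENSIONS:
                # Keep the path relative to the base folder, e.g. train/apple/12345.png
                rel_path = path.relative_to(split_dir.parent).as_posix()
                samples.append((rel_path, class_to_idx[name]))
    return samples, class_to_idx


# Build a tiny tree with empty placeholder files instead of real images.
base = Path(tempfile.mkdtemp())
for rel in ("train/apple/12345.png", "train/banana/22241.png", "train/pear/44321.png"):
    target = base / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch()

samples, class_to_idx = find_samples(base / "train")
print(samples)
```

With only three class folders in this toy tree, pear receives the label 2 rather than 4, because labels depend on which class folders are present.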

The parser object is subscriptable, allowing us to get items by index.

# syntax
parser[index]

# example
parser[0]
# returns ('train/apple/12345.png', 0)
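In Python, an object supports index access when it defines __getitem__. The minimal sketch below shows the idea with a hypothetical ListParser class that wraps a list of samples. It only illustrates the mechanism and is not the framework's parser implementation.

```python
class ListParser:
    """Hypothetical parser that stores (path, label) tuples and supports indexing."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __getitem__(self, index):
        # Delegating to the list makes parser[index] work.
        return self.samples[index]

    def __len__(self):
        return len(self.samples)


parser = ListParser([
    ("train/apple/12345.png", 0),
    ("train/banana/22241.png", 1),
    ("train/pear/44321.png", 4),
])

first = parser[0]
print(first)        # ('train/apple/12345.png', 0)
print(len(parser))  # 3
```

Defining __len__ alongside __getitem__ lets a dataset wrapper ask how many samples exist before indexing into them.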

Example

Let’s look at another example using the Multi-class Weather Dataset for Image Classification. The license for this dataset is Creative Commons Attribution 4.0 International. We can download the dataset by doing the following:
