Building a Dataset

Learn how to build datasets using PyTorch's built-in Dataset and TensorDataset classes.

The Dataset class

In PyTorch, a dataset is represented by a regular Python class that inherits from the Dataset class. You can think of it as a list of tuples, where each tuple corresponds to one data point (features, label).
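To make the "list of tuples" picture concrete, here is a minimal sketch that uses the built-in TensorDataset class. The toy tensors and variable names are made up for illustration only.

```python
import torch
from torch.utils.data import TensorDataset

# Made-up toy data (an assumption for this sketch):
# three points, two features each, one label each.
features = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
labels = torch.tensor([0.0, 1.0, 0.0])

# TensorDataset pairs the tensors up along their first dimension.
dataset = TensorDataset(features, labels)

# Indexing returns one (features, label) tuple, much like a list of tuples.
first_features, first_label = dataset[0]

# The length is the number of points.
n_points = len(dataset)
```

Indexing and len() behave here as they would on an ordinary Python list of (features, label) pairs, which is the mental model described above.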

The most fundamental methods a dataset class needs to implement are:

  • __init__(self): It takes whatever arguments are needed to build the list of tuples. This could be the name of a CSV file to load and process, two tensors (one for features and one for labels), or anything else the task at hand requires.
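As a rough sketch of the two-tensor case, a custom dataset might look like the class below. The class name CustomDataset, the argument names, and the toy data are hypothetical. The __getitem__ and __len__ methods are included here only so that the object can be indexed and measured like a list of tuples.

```python
import torch
from torch.utils.data import Dataset


class CustomDataset(Dataset):
    """Hypothetical example: a dataset built from two tensors."""

    def __init__(self, x_tensor, y_tensor):
        # Store one tensor of features and one tensor of labels.
        self.x = x_tensor
        self.y = y_tensor

    def __getitem__(self, index):
        # Return the (features, label) tuple for the requested point.
        return (self.x[index], self.y[index])

    def __len__(self):
        # The number of points in the dataset.
        return len(self.x)


# Made-up toy data (an assumption for this sketch).
x = torch.tensor([[0.5], [1.5], [2.5]])
y = torch.tensor([2.0, 4.0, 6.0])

train_data = CustomDataset(x, y)
sample_features, sample_label = train_data[1]
```

If the data lived in a CSV file instead, __init__ would be the place to load and process it; the rest of the class could stay the same.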
