Building a Dataset
Explore how to build custom datasets in PyTorch by implementing the Dataset class methods. Understand how to manage data indexing, loading on demand, and why using TensorDataset simplifies handling tensors for training. This lesson equips you to structure datasets effectively for model training.
The Dataset class
In PyTorch, a dataset is represented by a regular Python class that inherits from the Dataset class (torch.utils.data.Dataset). You can think of it as a list of tuples, where each tuple corresponds to one data point (features, label).
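To make the "list of tuples" picture concrete, here is a minimal sketch that uses PyTorch's built-in TensorDataset rather than a custom class. The toy tensors and the variable names (features, labels, first_features, first_label, n_points) are made up for illustration only.

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical toy data: 4 points with 2 features each, and one label per point
features = torch.tensor([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
labels = torch.tensor([0.0, 1.0, 0.0, 1.0])

# Wrap both tensors; they must have the same size in the first dimension
dataset = TensorDataset(features, labels)

# Indexing returns a single (features, label) tuple, much like a list of tuples
first_features, first_label = dataset[0]

# len() reports the number of data points
n_points = len(dataset)
```

Indexing with dataset[0] and calling len(dataset) work here because the dataset class defines how items are retrieved and how many there are, which is the same behavior a custom dataset has to provide.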
The most fundamental methods a dataset class needs to implement are:
__init__(self): Takes whatever arguments are needed to build the list of tuples. This may be the name of a CSV file to load and process, two tensors (one for features and one for labels), or anything else the task at hand requires.