PyTorch cheatsheet: Basics of PyTorch

Key takeaways:
PyTorch is an open-source library for deep learning, using tensors for efficient computation.
Installation is possible via pip, conda, or directly from the source.
Task-specific libraries include torchvision (computer vision), torchtext (NLP), and torchaudio (audio).
Data loaders manage datasets efficiently with features like batching, shuffling, and multiprocessing.
Tensors can be created directly or using functions like torch.zeros() and torch.ones().
The DataLoader class offers customizable parameters like batch size, shuffle, and number of workers.
Neural networks are built with torch.nn.Module for custom layers and forward passes.
torch.autograd enables automatic differentiation for backpropagation.
Optimization is streamlined with optimizers like SGD and Adam from torch.optim.
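The takeaways on torch.nn.Module, torch.autograd, and torch.optim come together in a training loop. The following is a minimal sketch that assumes a toy regression task on synthetic data. The class name TinyRegressor, the target function y = 2x + 1, the learning rate, and the number of steps are illustrative choices, not part of the original text.

```python
import torch
from torch import nn

# Hypothetical toy model: one linear layer wrapped in a custom nn.Module
class TinyRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        # The forward pass defines how inputs map to outputs
        return self.linear(x)

torch.manual_seed(0)  # make the initial weights reproducible

# Synthetic data for y = 2x + 1 (an assumption for illustration)
x = torch.linspace(-1, 1, 64).unsqueeze(1)  # shape (64, 1)
y = 2 * x + 1

model = TinyRegressor()
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

initial_loss = loss_fn(model(x), y).item()
for step in range(200):
    optimizer.zero_grad()        # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)  # forward pass and loss
    loss.backward()              # autograd computes d(loss)/d(parameters)
    optimizer.step()             # SGD updates the parameters in place
final_loss = loss.item()
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Swapping in torch.optim.Adam(model.parameters(), lr=...) changes only the optimizer line; the zero_grad/backward/step pattern stays the same.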

PyTorch is a widely used open-source machine learning library developed by Facebook's AI Research lab (FAIR). It provides a flexible, dynamic computational framework for building and training deep neural networks. This cheat sheet is a quick reference for getting started with PyTorch.

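# Import the core PyTorch library
import torch
# Import the neural network (nn) module
from torch import nn
# Import Dataset to store your dataset and DataLoader to load your dataset
from torch.utils.data import Dataset, DataLoader
# Each domain library has its own datasets, models, and transforms submodules;
# alias them so that later imports do not overwrite earlier ones
# Import torchvision for computer vision tasks
import torchvision
from torchvision import datasets as vision_datasets, models as vision_models, transforms as vision_transforms
# Import torchtext for NLP tasks (torchtext is no longer under active development)
import torchtext
from torchtext import datasets as text_datasets, models as text_models, transforms as text_transforms
# Import torchaudio for audio processing tasks
import torchaudio
from torchaudio import datasets as audio_datasets, models as audio_models, transforms as audio_transforms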

Importing necessary libraries

Tensors

Tensors are multi-dimensional arrays, similar to NumPy arrays, used in PyTorch for efficient computation and storage of numerical data, often employed in deep learning tasks.
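As a quick illustration of the NumPy-like behavior described above, here is a short sketch of tensor attributes and common operations. The tensor values are arbitrary examples.

```python
import torch

# A 2x3 tensor built from nested Python lists
a = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

print(a.shape)   # torch.Size([2, 3])
print(a.dtype)   # torch.float32 (the default floating-point dtype)
print(a.device)  # cpu

# Element-wise arithmetic, as with NumPy arrays
doubled = a * 2
# Broadcasting: the 1-D tensor is added to every row
shifted = a + torch.tensor([10.0, 20.0, 30.0])
# Matrix multiplication: (2x3) @ (3x2) -> (2x2)
product = a @ a.T
# Reduction along dimension 0 (sum of each column)
col_sums = a.sum(dim=0)
```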

Creation of tensors

PyTorch offers many ways to create tensors: directly from Python values, or with factory functions that fill a tensor of a given shape with zeros, ones, evenly spaced ranges, or random values. These tensors are the starting point for everything from simple data preprocessing to complex model training.

Direct initialization

Direct initialization means specifying a tensor's values yourself, for example by passing a Python number or a (nested) list of values to torch.tensor(). This is useful when you need specific values or patterns. The code below also shows factory functions such as torch.zeros() and torch.rand(), which create a tensor of a given shape without listing every value.

import torch
# Create a scalar (0-dimensional) tensor
scalar = torch.tensor(10)
# Create a 2x2 tensor directly from a nested list of values
matrix = torch.tensor([[1, 2], [3, 4]])
# Create a 3x2 tensor of zeros
zeros_tensor = torch.zeros((3, 2))
# Create a 2x2 tensor of ones
ones_tensor = torch.ones((2, 2))
# Create a 3x3 tensor with random values drawn uniformly from [0, 1)
rand_tensor = torch.rand((3, 3))
# Create a 1-D tensor with values from 2 to 10 with a step of 2 (the end value 11 is excluded)
range_tensor = torch.arange(2, 11, 2)
# Create a tensor with the same shape as rand_tensor, filled with random values from a standard normal distribution
rand_like_tensor = torch.randn_like(rand_tensor)
# Create a 3x3 identity matrix
eye_tensor = torch.eye(3)
# Create an uninitialized tensor (its values are whatever happened to be in memory)
empty_tensor = torch.empty((2, 2))
# Create a 2x2 tensor from a uniform distribution on [0, 1) (shape passed as separate arguments)
uniform_distribution_tensor = torch.rand(2, 2)
# Create a 2x2 tensor from a standard normal distribution (mean 0, variance 1)
normal_distribution_tensor = torch.randn(2, 2)
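Tensor factory functions also accept optional arguments such as dtype and device. A short sketch follows; the specific values are arbitrary, and the device line assumes you want a GPU tensor only when CUDA is available.

```python
import torch

# Factory functions accept an explicit dtype
int_range = torch.arange(0, 5, dtype=torch.int32)       # 0, 1, 2, 3, 4
double_zeros = torch.zeros(2, 3, dtype=torch.float64)

# Fill a tensor of a given shape with a constant value
sevens = torch.full((2, 2), 7.0)

# Evenly spaced values, including both endpoints
grid = torch.linspace(0.0, 1.0, steps=5)                 # 0.00, 0.25, 0.50, 0.75, 1.00

# *_like functions copy shape (and dtype/device) from an existing tensor
ones_like = torch.ones_like(double_zeros)

# Pass device= to allocate directly on a GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
on_device = torch.zeros(3, device=device)
```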

The DataLoader class (torch.utils.data.DataLoader) wraps a Dataset and accepts the following main parameters:

dataset: The dataset from which samples are loaded in batches.
batch_size: Number of samples per batch (default 1).
shuffle: If True, reshuffles the data at every epoch.
num_workers: Number of subprocesses used for data loading; 0 (the default) loads data in the main process.
pin_memory: If True, copies tensors into pinned (page-locked) memory, which speeds up transfer to CUDA devices.
drop_last: If True, drops the last incomplete batch when the dataset size is not divisible by the batch size.
timeout: Timeout, in seconds, for collecting a batch from worker processes; 0 (the default) means no timeout.
collate_fn: Function that merges a list of individual samples into a batch.
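To see several of these parameters together, here is a minimal sketch built on a hypothetical SquaresDataset class (not from the original text). A map-style Dataset only needs __len__ and __getitem__; the sizes chosen here are arbitrary.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # DataLoader uses this to know how many samples exist
        return self.n

    def __getitem__(self, idx):
        # Return one (input, target) sample as tensors of shape (1,)
        x = torch.tensor([float(idx)])
        y = torch.tensor([float(idx) ** 2])
        return x, y

dataset = SquaresDataset(10)
loader = DataLoader(
    dataset,
    batch_size=4,
    shuffle=True,    # new random order every epoch
    num_workers=0,   # load data in the main process
    drop_last=True,  # 10 samples / 4 per batch: the final batch of 2 is dropped
)

for xb, yb in loader:
    # The default collate_fn stacks the samples into tensors of shape (4, 1)
    print(xb.shape, yb.shape)
```

Because drop_last=True, each epoch yields two full batches; with drop_last=False it would yield a third batch of two samples.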

This answer aims to provide a quick guide to installing and importing PyTorch and getting started with it through tensor creation.

Frequently asked questions

How is PyTorch different from NumPy?

Both PyTorch and NumPy are used for numerical computation on multi-dimensional arrays, but PyTorch tensors can also run on GPUs, which makes them well suited to deep learning. PyTorch additionally provides an automatic differentiation engine (autograd), which is essential for backpropagation when training neural networks.
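A small sketch of both points, assuming NumPy is installed: CPU tensors can share memory with NumPy arrays, and tensors can record operations so that gradients can be computed, which NumPy arrays cannot do on their own.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

# torch.from_numpy shares memory with the NumPy array (CPU only)
t = torch.from_numpy(arr)
arr[0] = 100.0  # the change is visible through the tensor as well

# Converting back with .numpy() also shares memory for CPU tensors
back = t.numpy()

# Unlike NumPy arrays, tensors can record operations for autograd
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
print(w.grad)  # tensor([2., 4., 6.]), since d(sum(w^2))/dw = 2w
```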

How can I check if PyTorch is using my GPU?

You can check if a GPU is available using torch.cuda.is_available(). If it returns True, then PyTorch can utilize the GPU. To use the GPU, move tensors or models to the device:

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tensor = torch.ones(2, 2)
tensor = tensor.to(device)  # .to() returns a tensor on the target device
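The same .to(device) call works for models. A minimal sketch, assuming an arbitrary nn.Linear(4, 2) layer as a stand-in model: calling .to() on a module moves its parameters in place (and returns the module), whereas on a tensor it returns a tensor on the target device rather than modifying the original.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Moving a module moves all of its parameters and buffers
model = nn.Linear(4, 2).to(device)

# Inputs must be on the same device as the model's parameters
inputs = torch.randn(8, 4, device=device)
outputs = model(inputs)

print(next(model.parameters()).device, outputs.device)
```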

What is Autograd in PyTorch?

Autograd is PyTorch’s automatic differentiation engine: it records operations on tensors so that gradients can be computed for backpropagation. When you set requires_grad=True on a tensor and call .backward() on a result computed from it, PyTorch computes the gradient and stores it in the tensor's .grad attribute. This is essential for training neural networks.
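A minimal sketch of this workflow on a single scalar; the function y = x^2 + 2x and the value x = 3 are arbitrary examples.

```python
import torch

# Leaf tensor that autograd should track
x = torch.tensor(3.0, requires_grad=True)

# y = x^2 + 2x, so dy/dx = 2x + 2 = 8 at x = 3
y = x ** 2 + 2 * x
y.backward()               # populates x.grad
grad_value = x.grad.item()
print(x.grad)              # tensor(8.)

# Gradients accumulate across backward() calls, so reset them between steps
x.grad.zero_()

# Disable tracking when gradients are not needed (for example, during inference)
with torch.no_grad():
    z = x * 2
print(z.requires_grad)     # False
```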
