Checkpointing

Learn how to implement checkpointing in PyTorch to save and resume model training. This lesson covers managing model state dictionaries, loading checkpoints, and continuing training for additional epochs, so you can pause and resume model development without losing progress.

Saving checkpoints

To checkpoint the model to resume training later, we can use the save_checkpoint method, which handles the state dictionaries for us and saves them to a file:

# Saving checkpoint of model
sbs.save_checkpoint('model_checkpoint.pth')
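
The lesson does not show what save_checkpoint does internally, so the following is only a minimal sketch of the usual PyTorch pattern it likely wraps: collect the model and optimizer state dictionaries in a plain dictionary and write it with torch.save. The helper name save_checkpoint_sketch, the dictionary keys, the toy model, and the loss values are all assumptions made for illustration, not the course's actual implementation.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def save_checkpoint_sketch(model, optimizer, total_epochs, losses, val_losses, filename):
    """Hypothetical stand-in for sbs.save_checkpoint (keys are assumed, not from the course)."""
    checkpoint = {
        'epoch': total_epochs,                           # epochs trained so far
        'model_state_dict': model.state_dict(),          # learned parameters
        'optimizer_state_dict': optimizer.state_dict(),  # optimizer settings and buffers
        'loss': losses,                                  # training loss history
        'val_loss': val_losses,                          # validation loss history
    }
    torch.save(checkpoint, filename)


# Toy model and optimizer, used only to have something to save
torch.manual_seed(42)
model = nn.Sequential(nn.Linear(1, 1))
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Write the checkpoint to a temporary directory so the example leaves no files behind
path = os.path.join(tempfile.mkdtemp(), 'model_checkpoint.pth')
save_checkpoint_sketch(model, optimizer, 200, [0.5, 0.1], [0.6, 0.2], path)

# Read the file back to confirm what was stored
restored = torch.load(path)
print(sorted(restored.keys()))
```

Storing the optimizer state alongside the model state matters when training is resumed, because optimizers with internal buffers (momentum, for example) would otherwise restart from scratch.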

Resuming training

Remember that when we did this in the chapter Rethinking the Training Loop, we had to set the stage before actually loading the model: loading the data and configuring the model. We ...