Let's say we have found an accurate machine learning model, trained it on the complete dataset, and are ready to deliver it. That's great, but it's not the end of the project: we still need to save the trained model.
Overview
After all our efforts (the whole data science and machine learning pipeline, including cross-validation to assess the model's skill), we finally train the model on the complete data and make it available for practical use (deployment). Here, we'll treat the model trained on (X_train, y_train) as the final model for learning purposes.
Python's standard library includes a useful module called pickle, which provides a standard way of serializing Python objects. We can use pickle to serialize our trained model and save the serialized form to a file with any name. Later, we can load the saved file at any time and deserialize it to make predictions on unseen data. It's also worth knowing that the model is usually retrained, and the serialized file updated, on a schedule once a sufficient amount of new data is available.
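To make the idea of serializing and deserializing concrete, here is a minimal sketch of a pickle round trip in memory. The `model` dictionary is a hypothetical stand-in for a trained model (it is not the lesson's actual model); any picklable Python object would behave the same way.

```python
import pickle

# Hypothetical stand-in for a trained model: a plain dict of learned parameters.
model = {"slope": 2.0, "intercept": 1.0}

# Serialize the object to a bytes string.
serialized = pickle.dumps(model)

# Deserialize the bytes back into an equal, but separate, Python object.
restored = pickle.loads(serialized)

print(type(serialized))   # the serialized form is a bytes object
print(restored == model)  # the restored object compares equal to the original
```

`pickle.dumps` and `pickle.loads` work on bytes in memory, while `pickle.dump` and `pickle.load` do the same with an open binary file. Since unpickling can execute arbitrary code, it is best to load only pickle files that come from a trusted source.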
Steps
So, let's move on and do the following steps:
Save the model with a name.
Load the saved model.
Get predictions for X_test using the saved model after loading.
Cross-check if we get the same test data results as our previous results.