Baselines
Learn how to use baselines to help you better assess your models.
In ML, baselines are simple reference models that make minimal assumptions about the data and establish a minimum level of performance against which more complex models can be compared.
They provide a starting point for model development and evaluation, and they help us assess whether a more complex model delivers a meaningful improvement over a simple reference point.
Baseline models serve several purposes, including the following:
Performance evaluation: They provide a reference score against which the performance of more complex models can be compared.
Model complexity assessment: Comparing a complex model to a baseline helps determine if the additional complexity is justified by the performance gain.
Sanity check: Baselines allow us to verify if our more complex models are learning meaningful patterns in the data.
In real-life scenarios, baselines often help us decide whether using complex ML algorithms is a wise choice at all. If a model cannot beat a trivial predictor, such as always predicting the mean of the training targets, building a complex ML pipeline might not be worth the effort.
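The check against the simple mean can be sketched in a few lines. This is a minimal illustration, not a definitive recipe: the synthetic data, the choice of LinearRegression as the candidate model, and mean squared error as the metric are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y is linear in x plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Mean baseline: always predict the mean of the training targets.
baseline_pred = np.full_like(y_test, y_train.mean())
baseline_mse = mean_squared_error(y_test, baseline_pred)

# Candidate model: ordinary least squares.
model = LinearRegression().fit(X_train, y_train)
model_mse = mean_squared_error(y_test, model.predict(X_test))

print(f"baseline MSE: {baseline_mse:.2f}")
print(f"model MSE:    {model_mse:.2f}")
```

If the model's error is not clearly lower than the baseline's, that is a signal to question the features or the approach before adding more complexity.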
In this lesson, we introduce two useful classes provided by scikit-learn: DummyClassifier and DummyRegressor. These classes allow us to create simple baseline models for classification and regression tasks, respectively.
Dummy classifier
The DummyClassifier class in scikit-learn implements a simple baseline strategy for classification tasks. It allows us to create a classifier that makes predictions using simple rules or random guessing.
The DummyClassifier class supports different strategies for generating predictions:
stratified: It generates predictions by randomly guessing according to the class distribution in the training data.
most_frequent: It always predicts the most frequent class in the training data.
uniform: It generates predictions uniformly at random.
constant: It always predicts a constant class label specified by the user.
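To make the four strategies concrete, here is a small sketch on made-up labels. The toy data and variable names are assumptions for illustration; the strategy names and the constant and random_state parameters are those of scikit-learn's DummyClassifier.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy, imbalanced labels (made up for illustration): four 0s and two 1s.
# DummyClassifier ignores the feature values, so zeros are enough here.
X = np.zeros((6, 1))
y = np.array([0, 0, 0, 0, 1, 1])

most_frequent = DummyClassifier(strategy="most_frequent").fit(X, y)
constant = DummyClassifier(strategy="constant", constant=1).fit(X, y)
# The two random strategies take a seed so that runs are repeatable.
stratified = DummyClassifier(strategy="stratified", random_state=0).fit(X, y)
uniform = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)

mf_pred = most_frequent.predict(X)   # always the majority class (0)
const_pred = constant.predict(X)     # always the user-specified label (1)
strat_pred = stratified.predict(X)   # sampled according to class frequencies
unif_pred = uniform.predict(X)       # each class equally likely

for name, pred in [
    ("most_frequent", mf_pred),
    ("constant", const_pred),
    ("stratified", strat_pred),
    ("uniform", unif_pred),
]:
    print(f"{name:>13}: {pred}")
```

The first two strategies are deterministic, while stratified and uniform depend on the random seed, so their exact outputs vary unless random_state is fixed.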
Let’s go through a quick example of how to use the DummyClassifier class:
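A minimal sketch of such usage follows. The breast cancer dataset bundled with scikit-learn and the shallow decision tree standing in for the "more complex" model are illustrative choices, not taken from the lesson.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small binary classification dataset that ships with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Baseline: always predict the most frequent class seen during training.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Candidate model (an illustrative choice): a shallow decision tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
tree_acc = tree.score(X_test, y_test)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"tree accuracy:     {tree_acc:.3f}")
```

Because the classes are imbalanced, the baseline's accuracy equals the majority-class share of the test set rather than 50%, which is why comparing against it is more informative than comparing against a coin flip.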