
Overfitting and Underfitting

Explore the concepts of overfitting and underfitting in supervised learning. Understand how model complexity impacts training and testing errors, and learn to balance bias and variance for better generalization. This lesson shows a practical implementation using polynomial regression to visualize these effects and introduces strategies to prevent overfitting.

In machine learning, the ultimate goal isn't just to memorize past data; it's to make accurate predictions on new, unseen data. If a model performs perfectly on the data it was trained on but fails when deployed in the real world, it’s useless.

The concepts of overfitting and underfitting describe two major failure modes when attempting to achieve this crucial ability, which we refer to as generalization.

What is overfitting?

Overfitting is a modeling error where the model learns the training data (including its accidental irregularities or noise) too well, failing to capture the broad, underlying pattern. This results in a model that performs exceptionally well on the training data but poorly on any new data.

We can relate this to rote learning: a student who only memorizes the solution to a specific practice problem might ace that exact problem, but if the problem is slightly changed (new data), they fail because they missed the underlying mathematical principle (the pattern).

  • Cause: We choose a model that is too flexible (has too many parameters) relative to the size and complexity of the training data. For example, a degree-10 polynomial is far more flexible than a degree-2 polynomial.

  • Result: High performance on training data, but low performance (high error) on testing data.

The concept of model flexibility and its trade-off is often best seen through the lens of Polynomial Regression, where we use higher powers of the input feature ($x, x^2, x^3, \dots$) to fit a curve.
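To make this concrete, here is a minimal sketch (assuming scikit-learn and NumPy are available) of how a single input feature can be expanded into its higher powers before fitting a linear model. The class and parameter names follow scikit-learn's `PolynomialFeatures` API; the sample values are illustrative.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two samples of a single input feature x.
x = np.array([[2.0], [3.0]])

# Expand x into the columns x, x^2, x^3 (no constant column).
poly = PolynomialFeatures(degree=3, include_bias=False)
print(poly.fit_transform(x))
# [[ 2.  4.  8.]
#  [ 3.  9. 27.]]
```

A higher degree adds more columns, and therefore more parameters for the linear model to fit, which is exactly what makes the model more flexible.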

The table below summarizes the three stages as the complexity of the polynomial function increases:

| Stage | Complexity (Polynomial Degree) | Description | Loss on Data |
| --- | --- | --- | --- |
| Underfitting | Low (e.g., degree 1) | The line is too simple and misses the fundamental trend of the data points. | High training and testing loss |
| Good fit (Generalization) | Medium (e.g., degree 7) | The curve matches the main trend without following every minor point. | Low training and testing loss |
| Overfitting | High (e.g., degree 10) | The curve becomes erratic, bending sharply to pass exactly through every training point, including the noise. | Very low training loss, high testing loss |
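These stages map onto the bias-variance balance mentioned in the overview: underfitting corresponds to high bias, overfitting to high variance. As a standard reference point (not derived in this lesson), for squared-error loss the expected test error at a point decomposes as:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}}
$$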

Try it yourself

We will now implement a small model using Linear Regression and Polynomial Features to clearly visualize the effect of model complexity (polynomial degree) on training and testing error.

Our goal ...
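Since the lesson's code is truncated above, the following is a hedged sketch of one way to carry out the experiment with scikit-learn: generate noisy data from an underlying pattern, fit polynomial models of increasing degree, and compare training versus testing mean squared error. The sine-based data generator, the noise level, and the specific degrees (1, 7, 10) are illustrative assumptions chosen to mirror the table above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy samples of an underlying smooth pattern (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit one model per polynomial degree and compare train vs. test error.
for degree in (1, 7, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With data like this, you should see the degree-1 model score poorly on both splits (underfitting), a medium degree score well on both, and the highest degree drive training error down while testing error climbs (overfitting).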