Smoke Tests for the ML Pipeline

Learn the fastest way to check if the ML/DL/ETL pipeline works.

Overview

Smoke tests check if the code works at all and whether it’s testable in the first place. Smoke testing verifies the most basic assumptions, like a doctor verifying whether the patient is alive (do they have a pulse) before analyzing the state of their health.

Smoke tests only scratch the surface of a piece of software. They are very high-level integration tests, and some people prefer to call them sanity checks. For scripting languages like Python, the simplest smoke test is the counterpart of “it compiles” in a compiled language: the code can be imported and run without failing immediately.

“The phrase smoke test comes from electronic hardware testing. You plug in a new board and turn on the power. If you see smoke coming from the board, turn off the power. You don’t have to do any more testing.”—Cem Kaner, James Bach, Brett Pettichord, Lessons Learned in Software Testing.

When to use smoke tests?

The concept of a smoke test is useful when an engineer needs to start working with a complex codebase that is unknown to them and hardly covered by tests. Working with such code may initially sound complicated, but it’s doable.

To give you an intuitive sense of smoke tests, here are a couple of examples:

  • Check if all required imports work. This tests whether the paths are set correctly and whether the modules are visible to the Python interpreter. It also ensures that there are no syntax errors in our Python files, so the interpreter can parse them.

  • Check if the config is found and contains the required fields. This catches unintentional changes to its structure before they break the code that depends on it.

  • Run some complex functions and ensure they return something. These tests check that the functions don’t raise an error. Moreover, you can verify that the output format matches your expectations.
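
The three checks above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the module list, the required config fields, the JSON config format, and the helper names (check_imports, check_config, check_returns_something) are all assumptions made for this example. Replace them with your project's own modules, config, and functions.

```python
import importlib
import json
from pathlib import Path

# Hypothetical values for illustration: swap in your project's own modules
# and the fields your config is expected to contain.
REQUIRED_MODULES = ["json", "csv", "math"]
REQUIRED_CONFIG_FIELDS = {"data_path", "batch_size", "learning_rate"}


def check_imports(module_names):
    """Try to import every module and return the names that failed."""
    failed = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:  # ImportError, but also SyntaxError inside a module
            failed.append(name)
    return failed


def check_config(config_path, required_fields):
    """Check that a JSON config file exists and return its missing fields."""
    path = Path(config_path)
    assert path.is_file(), f"config not found: {path}"
    config = json.loads(path.read_text())
    return set(required_fields) - set(config)


def check_returns_something(func, *args, expected_type=None, **kwargs):
    """Call func, fail if it raises or returns None, optionally check the type."""
    result = func(*args, **kwargs)
    assert result is not None, f"{func.__name__} returned None"
    if expected_type is not None:
        assert isinstance(result, expected_type), (
            f"{func.__name__} returned {type(result).__name__}"
        )
    return result
```

In a test runner such as pytest, each helper would typically be called from its own test function, for example by asserting that check_imports(REQUIRED_MODULES) returns an empty list.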

Testing the end-to-end pipeline

When working with different pipelines, it’s a common practice to ensure the whole pipeline works by running it on some sample of data. Let’s look at some sample pipelines below.
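
As a rough sketch of this idea, the example below runs a toy ETL pipeline on a four-row sample. The stage functions (extract, transform, load), the CSV layout, and the in-memory list used as a sink are all hypothetical. In a real project, run_pipeline would be your own entry point and the sample would be a small slice of real data.

```python
import csv
import io

# A tiny hand-written sample; one row is deliberately malformed.
SAMPLE_CSV = "id,value\n1,10\n2,20\n3,not_a_number\n4,40\n"


def extract(csv_text):
    """Read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Cast fields to numbers and drop rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            clean.append({"id": int(row["id"]), "value": float(row["value"])})
        except ValueError:
            continue
    return clean


def load(rows, sink):
    """Append rows to the sink (a list standing in for a database table)."""
    sink.extend(rows)
    return len(rows)


def run_pipeline(csv_text, sink):
    """Run all stages in order and return the number of rows written."""
    return load(transform(extract(csv_text)), sink)


def smoke_test_pipeline():
    """Run the whole pipeline on the sample and check the basics only."""
    sink = []
    written = run_pipeline(SAMPLE_CSV, sink)
    assert written > 0, "pipeline produced no output"
    assert all(set(row) == {"id", "value"} for row in sink), "unexpected schema"
    return written
```

The test deliberately asserts very little: the pipeline finishes, it writes something, and the output has the expected fields. Checking the exact values belongs in more detailed tests.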

Deep learning

In the deep learning world, we check that all variables are set correctly, that the loss decreases over time, and that the model can overfit a tiny toy dataset (for example, a single batch).
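
The sketch below shows the “overfit a single batch” check in miniature. To stay dependency-free, it uses a one-variable linear model trained with hand-written gradient descent as a stand-in for a real network and framework. The function names, learning rate, step count, and loss threshold are assumptions chosen for this toy example. With a real model, the same three assertions would wrap your own training loop.

```python
import math


def mse(w, b, xs, ys):
    """Mean squared error of the model y = w * x + b on one batch."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_on_single_batch(xs, ys, lr=0.05, steps=500):
    """Repeat gradient descent on the same batch and record the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        losses.append(mse(w, b, xs, ys))
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    losses.append(mse(w, b, xs, ys))  # loss after the final update
    return (w, b), losses


def smoke_test_training():
    """Check that training runs, the loss drops, and the batch is memorized."""
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
    (w, b), losses = train_on_single_batch(xs, ys)
    assert all(math.isfinite(loss) for loss in losses), "loss became NaN or inf"
    assert (w, b) != (0.0, 0.0), "parameters were never updated"
    assert losses[-1] < losses[0], "loss did not decrease"
    assert losses[-1] < 1e-3, "model could not overfit a single batch"
    return losses
```

If a model cannot drive the loss close to zero on a handful of examples it has seen hundreds of times, the problem is more likely in the training code (for example, parameters that are not being updated) than in the data.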
