Data Tests

Explore how to implement data tests in dbt to validate data model accuracy and integrity. Learn to configure generic and singular tests, interpret failures, and apply advanced settings like severity levels, thresholds, and failure storage. This lesson helps you build reliable data models and maintain data quality within your dbt projects.

What are data tests?

When transforming data, it’s crucial to make sure that the results are accurate and reliable. As the number of models grows, checking each one manually becomes impractical. Fortunately, dbt provides a feature called data tests.

A data test is a validation mechanism that verifies the expected behavior of a data model. Tests are configured in the project and run with the dbt test command.

dbt provides two types of data tests:

  • Generic tests

  • Singular tests

Generic tests

A generic test is a reusable check that can be applied to different models. For example, a check that a column contains no null values is a generic test: the same test can be applied to several models, each time on a different column.
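As a rough sketch of this idea, the built-in not_null test could be attached to different columns in two models. The model and column names are borrowed from the examples in this lesson, and which columns actually need the check is an assumption made for illustration. The YAML layout itself is covered in the sections that follow.

```yaml
version: 2
models:
  - name: good_orders
    columns:
      - name: order_id
        tests:
          - not_null   # same generic test...
  - name: bad_orders
    columns:
      - name: product_id
        tests:
          - not_null   # ...reused on a different model and column (illustrative choice)
```

The test logic is written once and reused, so only the model and column it points at change.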

Configuring generic tests

Generic tests need to be set up in property files (such as schema.yml). This setup allows the user to apply predefined tests to their models and columns to ensure data quality and integrity.

Property files

A property file is a YAML file that is stored in the models directory and contains information about model properties.

YAML
version: 2
models:
  - name: good_orders
    description: A model with valid orders.
    columns:
      - name: order_id
        description: A unique id for the order
      - name: customer_id
        data_type: integer
  - name: bad_orders
    columns:
      - name: product_id
      - name: order_status

It’s possible to store all model properties in a single file, but it’s usually more convenient to split the configuration into different files.
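One possible way to split the configuration is to give each model its own property file. The sketch below shows two files in a single listing, separated by `---` only for readability; in a real project they would be two separate files. The file names are hypothetical, chosen here for illustration.

```yaml
# File 1 (hypothetical name): models/good_orders.yml
version: 2
models:
  - name: good_orders
    description: A model with valid orders.
    columns:
      - name: order_id
        description: A unique id for the order
      - name: customer_id
        data_type: integer
---
# File 2 (hypothetical name): models/bad_orders.yml
version: 2
models:
  - name: bad_orders
    columns:
      - name: product_id
      - name: order_status
```

Keeping one file per model, or per folder of related models, tends to make it easier to find the properties and tests that belong to a given model.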

A generic test is applied to a column of a particular model by listing it under that column’s tests key:

YAML
version: 2
models:
  - name: good_orders
    columns:
      - name: order_id
        tests:
          - unique
...