Introduction to the insert_overwrite strategy

Apart from the MERGE strategy, incremental models using the BigQuery adapter can use another strategy: the insert_overwrite strategy.

The need for another incremental strategy

Models with a MERGE strategy have to rescan the whole table, which can be time- and cost-consuming for big tables. insert_overwrite provides a solution to that issue because it does not rescan the whole destination table. Instead, it lets us partition our tables. This way, dbt only scans the relevant partitions, which is faster and more efficient.

The insert_overwrite strategy is more complex. It’s recommended to use it only if models underperform with the MERGE strategy.

Partitioned tables

In BigQuery, partitioned tables are a way to manage heavy tables by dividing them into smaller, more manageable segments based on a specific column called the partitioning column. This column’s values determine how the data is physically stored and queried.

Partitioning can significantly improve query performance and reduce costs by allowing BigQuery to scan and process only the relevant partitions instead of the entire table. A common use case of partitioning is time-based data, where we might partition by a timestamp column, enabling more efficient querying for specific time ranges.

To create a partitioned table in dbt, all we have to do is specify a partition_by configuration in our model file like this:

Get hands-on with 1400+ tech skills courses.