Local Stack with DuckDB and dbt

Get familiar with dbt and data transformation pipelines.

A typical analytics environment in the corporate world is built around (distributed) central storage and a query technology optimized for analytics. All other components, such as ETL, business intelligence, and entity resolution, must integrate with this core to keep the data stack efficient.

The distributed nature of technologies like Snowflake, Databricks, and BigQuery is abstracted away, so it feels as if there is a single place to store and query data. Let’s make this idea concrete by replicating this kind of stack with open-source tools on a single machine.

Configuring our data stack

The following image illustrates a technology stack that could consist of several proprietary (and costly) components. Here, we will replicate its basic functionality with open-source tools and refer to the different components by their colors in the image:
