Batch Ingestion
Explore batch ingestion, a core data engineering concept that involves extracting and loading data in bulk at scheduled intervals. Understand time-based and size-based ingestion methods, the differences between full snapshots and incremental loads, and how to implement a batch ingestion pipeline in BigQuery using Python. This lesson equips you to design reliable ingestion processes that support downstream analytics while using resources efficiently and maintaining data quality.
Data ingestion is the first stage in most data architecture designs. The process has two steps: first, data is consumed from assorted sources; second, it is loaded into centralized storage where the organization can access and use it. Ingestion is a critical component of the data engineering lifecycle because downstream systems rely entirely on the ingestion layer's output.
The ingestion layer works with various data sources, over which data engineers typically don't have full control. A good practice is to build a layer of data quality checks and a self-healing system that reacts to unexpected situations such as data loss, corruption, and system failures. Let's explore a traditional but widely used design pattern, batch ingestion, with a real-life example using BigQuery.
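As a minimal sketch of such a quality gate, the check below rejects a batch file that is empty or missing expected columns before anything is loaded; the file path, column names, and thresholds are illustrative placeholders, not part of any particular pipeline.

```python
# Minimal sketch of a pre-load data quality gate (all names are illustrative).
import csv

def validate_batch(path, required_columns, min_rows=1):
    """Reject a batch file that is empty or missing expected columns."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = set(required_columns) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing columns: {sorted(missing)}")
        row_count = sum(1 for _ in reader)
    if row_count < min_rows:
        raise ValueError(f"Expected at least {min_rows} rows, got {row_count}")
    return row_count

# Fail fast before loading, so a bad batch never reaches the warehouse:
# validate_batch("daily_transactions.csv", ["order_id", "amount", "created_at"])
```

Checks like this sit between extraction and loading; when one fails, the pipeline can alert, retry, or quarantine the batch instead of silently propagating bad data downstream.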
Batch ingestion is a commonly used way to ingest data. It processes data in bulk: a subset of data from the source system is extracted and loaded into internal data storage based on either a time interval or the size of the accumulated data.
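Ahead of the detailed walkthrough, here is a minimal sketch of the "load" half of one batch using BigQuery's Python client. The project, dataset, table, and bucket names are placeholders, and it assumes the google-cloud-bigquery package is installed and default credentials are configured.

```python
# Minimal sketch: load one accumulated batch file into a BigQuery table.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the CSV header row
    autodetect=True,      # infer the schema from the file
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Placeholder URI and table ID; in practice these come from the batch schedule.
load_job = client.load_table_from_uri(
    "gs://example-bucket/batches/transactions_2024-01-01.csv",
    "my_project.my_dataset.transactions",
    job_config=job_config,
)
load_job.result()  # block until the job finishes; raises on failure
print(f"Loaded {load_job.output_rows} rows")
```

Note the write disposition: WRITE_APPEND suits incremental loads that add new records to an existing table, while WRITE_TRUNCATE replaces the table contents and suits full snapshots, a distinction this lesson returns to below.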
Time-based vs. size-based batch ingestion
Time-based batch ingestion typically processes data at a fixed interval (e.g., once a day) to provide periodic reporting. It is often used in traditional business ETL or ELT for data warehousing, such as getting daily transactions ...