Data Schema: Avro and Protobuf
Explore the use of Avro and Protobuf schemas to maintain data quality by defining data structure and validation rules. Understand their roles in data serialization and deserialization within data engineering pipelines, including use cases in Apache Kafka and Google Cloud services. Learn to choose between these schemas based on schema evolution needs and performance requirements.
One effective technique for ensuring data quality is to implement a data schema. By defining the structure of data in a specific format, a data schema ensures consistency and accuracy when data is exchanged, stored, and used.
For example, in the context of data exchange between two applications, a schema defines the structure and constraints of the data being passed between systems, including the data format (XML, JSON, or CSV), field types (int, float, or string), and any rules such as the allowed range of a numeric value or the expected date format. We will learn about two common data schema technologies, Avro and Protobuf, and how to incorporate them into data engineering pipelines.
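Before turning to those formats, here is a minimal sketch, in plain Python, of the kind of checks a schema encodes. The field names, types, and rules below are hypothetical and purely for illustration.

```python
# A hypothetical schema for an order record: expected field types plus simple value rules.
order_schema = {
    "order_id": {"type": int},
    "amount": {"type": float, "min": 0.0, "max": 10_000.0},
    "currency": {"type": str, "allowed": {"USD", "EUR", "GBP"}},
}

def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record conforms."""
    errors = []
    for field, rules in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        value = record[field]
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: expected {rules['type'].__name__}, got {type(value).__name__}")
            continue
        if "min" in rules and value < rules["min"]:
            errors.append(f"{field}: {value} is below the minimum {rules['min']}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{field}: {value} is above the maximum {rules['max']}")
        if "allowed" in rules and value not in rules["allowed"]:
            errors.append(f"{field}: {value!r} is not one of {sorted(rules['allowed'])}")
    return errors

print(validate({"order_id": 42, "amount": 99.5, "currency": "USD"}, order_schema))  # []
print(validate({"order_id": 42, "amount": -5.0, "currency": "JPY"}, order_schema))  # two violations
```

Schema systems such as Avro and Protobuf formalize exactly this kind of contract, so the checks do not have to be hand-written in every application.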
Apache Avro
Apache Avro is an open-source data serialization system that allows data to be exchanged and stored efficiently between different applications, independent of the programming languages they use.
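An Avro schema is itself written in JSON and describes a record's fields and their types. The sketch below shows what such a schema might look like, expressed as the Python dictionary that Avro libraries accept; the record and field names are hypothetical.

```python
# A hypothetical Avro schema for a user record; in practice this is usually
# stored as a separate .avsc file and shared by producers and consumers.
user_schema = {
    "type": "record",
    "name": "User",
    "namespace": "example.pipeline",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        # A union with "null" marks the field as optional.
        {"name": "email", "type": ["null", "string"], "default": None},
        {"name": "signup_ts", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}
```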
Serialization is the process of converting an object into a format that can be easily stored, transmitted, and reconstructed later. It encodes the object's state and structure into a binary or textual format that other systems or programming languages can read. Deserialization is the process of reconstructing the object from the binary format. In this process, the Avro schema plays a critical role in defining the data ...
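As a sketch of that round trip, the example below uses the third-party fastavro library; the schema and record are hypothetical, and a real pipeline would typically load the schema from a shared .avsc file or a schema registry.

```python
import io
from datetime import datetime, timezone

from fastavro import parse_schema, reader, writer  # pip install fastavro

# Hypothetical schema; the same definition must be agreed on by writers and readers.
schema = parse_schema({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "signup_ts", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
})

records = [{"id": 1, "name": "Ada", "signup_ts": datetime(2024, 1, 15, tzinfo=timezone.utc)}]

# Serialization: the schema drives encoding into Avro's compact binary container format.
buffer = io.BytesIO()
writer(buffer, schema, records)

# Deserialization: the reader reconstructs the original records from the binary data.
buffer.seek(0)
for record in reader(buffer):
    print(record)
```

Because the Avro container format embeds the writer's schema, a reader can reconstruct records even when the schema is not distributed separately.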