Introduction to Distributed Systems for Dummies/

...

Architecture of Apache Druid

Get to know the architecture of Apache Druid.

We'll cover the following...

High-level architecture of Druid
- Data ingestion
- Query flow
Key takeaways

As a case study of a distributed system, we will discuss the architecture of Apache Druid, a popular OLAP database.

High-level architecture of Druid

Druid is an OLAP database. For systems that need support for a large volume of data, Druid has to be deployed as a distributed system. In a distributed environment, Druid has multiple nodes that work together to achieve the common goal of efficient analytical data querying.

In a distributed Druid system we have five types of nodes, as depicted in the following table:

Type	Role
MiddleManagers	Ingest data into the Druid system
Historical nodes	Store the queryable data
Coordinators	Manage data availability in a Druid cluster by rebalancing and replication of data in Historical nodes
Overlords	Manage data ingestion by assigning ingestion tasks to MiddleManagers
Brokers	Handle queries from external systems or clients

Introduction

What Distributed Systems Achieve for Us

Data in Distributed Systems

Communication Between Nodes

Data Processing in Large Scale

Distributed System Architectural Patterns

Case Study 1: Apache Spark

Case Study 2: Apache Druid

Conclusion

Architecture of Apache Druid

High-level architecture of Druid