Architecture of Apache Druid

Get to know the architecture of Apache Druid.

As a case study of a distributed system, we will discuss the architecture of Apache Druid, a popular OLAP database.

High-level architecture of Druid

Druid is an OLAP database. For systems that need support for a large volume of data, Druid has to be deployed as a distributed system. In a distributed environment, Druid has multiple nodes that work together to achieve the common goal of efficient analytical data querying.

In a distributed Druid system we have five types of nodes, as depicted in the following table:

Type

Role

MiddleManagers

Ingest data into the Druid system

Historical nodes

Store the queryable data

Coordinators

Manage data availability in a Druid cluster by rebalancing and replication of data in Historical nodes

Overlords

Manage data ingestion by assigning ingestion tasks to MiddleManagers

Brokers

Handle queries from external systems or clients

The above five node types are internal to the Druid architecture. Apart from them, Druid also takes support from some external systems.

  • Deep storage: To ensure fault tolerance, Druid uses deep storage. Any data in the system is stored in deep storage like AWS S3. This ensures data durability, as well ...

Get hands-on with 1400+ tech skills courses.