Architecture of Apache Druid
Get to know the architecture of Apache Druid.
As a case study of a distributed system, we will discuss the architecture of Apache Druid, a popular OLAP database.
High-level architecture of Druid
Druid is an OLAP database. For systems that need support for a large volume of data, Druid has to be deployed as a distributed system. In a distributed environment, Druid has multiple nodes that work together to achieve the common goal of efficient analytical data querying.
In a distributed Druid system we have five types of nodes, as depicted in the following table:
Type | Role |
MiddleManagers | Ingest data into the Druid system |
Historical nodes | Store the queryable data |
Coordinators | Manage data availability in a Druid cluster by rebalancing and replication of data in Historical nodes |
Overlords | Manage data ingestion by assigning ingestion tasks to MiddleManagers |
Brokers | Handle queries from external systems or clients |
The above five node types are internal to the Druid architecture. Apart from them, Druid also takes support from some external systems.
-
Deep storage: To ensure fault tolerance, Druid uses deep storage. Any data in the system is stored in deep storage like AWS S3. This ensures data durability, as well ...
Get hands-on with 1400+ tech skills courses.