What is lambda architecture?
Lambda architecture is a big data processing model that combines a batch pipeline with a real-time data pipeline, so that queries can draw on both the complete historical data and the most recently arrived data.
Incoming data component
The incoming data can be from various sources like application logs, clickstreams, etc. The data is simultaneously sent to the batch layer and the speed layer.
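The fan-out described above can be sketched in a few lines of Python. This is a minimal illustration, not a production ingestion path: the event shape, the make_dispatcher helper, and the two in-memory lists standing in for the batch and speed layers are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]  # hypothetical event shape, e.g. {"page": "/home", "ts": 1}


def make_dispatcher(*sinks: Callable[[Event], None]) -> Callable[[Event], None]:
    """Return a function that forwards every incoming event to all sinks."""
    def dispatch(event: Event) -> None:
        for sink in sinks:
            sink(event)
    return dispatch


# In-memory stand-ins for the two layers (assumption for illustration only).
batch_log: List[Event] = []
speed_buffer: List[Event] = []

dispatch = make_dispatcher(batch_log.append, speed_buffer.append)
dispatch({"page": "/home", "ts": 1})
dispatch({"page": "/cart", "ts": 2})
```

In a real deployment the sinks would typically be a durable log or message queue rather than Python lists, but the idea is the same: every event reaches both layers.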
Batch layer
The batch layer is responsible for managing the master dataset, which has two properties:
- The data is raw, i.e., unprocessed.
- The data is immutable, i.e., new data is appended to the dataset rather than existing data being updated.
The master dataset is the source of truth and lives forever. Even if there’s loss of data in other layers, the results can be recomputed by running through the master dataset. The batch layer also precomputes the data into batch views.
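As a rough sketch of these ideas, the code below models the master dataset as an append-only collection and derives a batch view from it. The MasterDataset class, the compute_batch_view function, and the page-view counting use case are hypothetical names chosen for the example; a real system would use durable distributed storage rather than a Python list.

```python
from collections import Counter


class MasterDataset:
    """Append-only store of raw events (in-memory stand-in for durable storage)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # Store a copy so later changes by the caller cannot alter stored data.
        self._events.append(dict(event))

    def scan(self):
        # Expose the events as a tuple: readers cannot append or remove.
        return tuple(self._events)


def compute_batch_view(master):
    """Recompute the view from scratch: page -> number of views."""
    return Counter(event["page"] for event in master.scan())


master = MasterDataset()
master.append({"page": "/home", "ts": 1})
master.append({"page": "/cart", "ts": 2})
master.append({"page": "/home", "ts": 3})

batch_view = compute_batch_view(master)
```

Because the view is a pure function of the master dataset, a lost or corrupted view can be rebuilt by calling compute_batch_view again.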
Speed layer
The speed layer takes care of the data that is yet to be indexed by the batch layer, i.e., recently arrived data. It complements the batch layer by indexing the new data; thus, the speed layer reduces the latency of user queries on the latest data.
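One way to picture the speed layer is as a small, incrementally updated view over only the recent events, which is trimmed once the batch layer has caught up. The sketch below assumes events carry an integer ts timestamp and a page field; the SpeedLayer class and its method names are illustrative, not part of any particular framework.

```python
from collections import Counter


class SpeedLayer:
    """Keeps a view over events that the batch layer has not yet processed."""

    def __init__(self):
        self._recent = []  # list of (ts, page) pairs

    def ingest(self, event):
        # Updated per event, so new data is queryable immediately.
        self._recent.append((event["ts"], event["page"]))

    def realtime_view(self):
        return Counter(page for _, page in self._recent)

    def expire_through(self, batch_cutoff_ts):
        # Drop events already covered by the latest batch view.
        self._recent = [(ts, page) for ts, page in self._recent
                        if ts > batch_cutoff_ts]


speed = SpeedLayer()
speed.ingest({"page": "/home", "ts": 10})
speed.ingest({"page": "/cart", "ts": 11})
speed.ingest({"page": "/home", "ts": 12})

view_before = speed.realtime_view()
speed.expire_through(10)  # assume the batch layer has processed up to ts=10
view_after = speed.realtime_view()
```

Expiring covered events keeps the real-time view small and avoids counting the same event in both layers.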
Serving layer
The serving layer combines the results generated by the batch and speed layers in order to answer user queries.
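For an additive metric such as a page-view count, the merge step can be as simple as summing the two views. The following is a minimal sketch under that assumption; the query_page_views function and the sample numbers are made up for illustration, and metrics that are not additive (for example, distinct counts) would need a different merge strategy.

```python
from collections import Counter


def query_page_views(page, batch_view, realtime_view):
    """Answer a query by combining the batch view with the real-time view."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Hypothetical views: the batch view covers older data,
# the real-time view covers events since the last batch run.
batch_view = Counter({"/home": 120, "/cart": 45})
realtime_view = Counter({"/home": 3, "/checkout": 1})

home_total = query_page_views("/home", batch_view, realtime_view)
checkout_total = query_page_views("/checkout", batch_view, realtime_view)
```

This merge is only correct if the two views cover disjoint sets of events, which is why the speed layer discards data once the batch layer has processed it.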
Advantages of lambda architecture
- Scalability: each component in the architecture can be scaled independently.
- High availability: the batch and speed layers together cover both historical and recent data, which helps ensure that queries can be answered.
- Real-time in nature: the speed layer makes recently arrived data available to queries with low latency.
- Fault tolerance: if there are problems in other layers, the results can be recomputed by running through the master dataset.
Disadvantages of lambda architecture
It can be difficult to maintain and debug two different technology stacks and code bases, one for the batch layer and one for the speed layer.