Real-time Analysis of a Simulated E-commerce Data Stream


In this project, we’ll learn how to design and implement a real-time data pipeline that ingests, processes, and analyzes simulated streaming data from an e-commerce website to support informed decisions.

You will learn to:

Create a data stream using a Python script.

Create Kafka topics on Confluent Cloud.

Deploy an Apache Druid cluster on an EC2 instance.

Build a dashboard and alerts with Grafana.

Skills

Data Engineering

Data Visualization

Streaming Data Handling

Prerequisites

Basic understanding of Kafka

Confluent Cloud account (free tier)

Basic understanding of Druid

Basic understanding of Grafana

AWS account

Technologies

Kafka

Druid

Python

Grafana

Confluent

Project Description

In this project, we will design and implement a real-time data pipeline that ingests, processes, and analyzes a simulated e-commerce data stream. A realistic case study gives us the chance to tackle some of the challenges data engineers face in practice.

We will grapple with integrating a suite of cutting-edge tools, including Confluent Cloud, Kafka, Apache Druid, and Grafana, all hosted on AWS. The challenges include managing streaming data ingestion, ensuring low-latency querying, visualizing data dynamically, and optimizing cloud resources. Brace yourself for a comprehensive hands-on experience in modern data engineering!

The data pipeline

Project Tasks

1

Generate Purchase Records

Task 0: Getting Started

Task 1: Build a Catalog of Products

Task 2: Set Up Variables and Functions

Task 3: Generate Records
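
The record-generation tasks above can be sketched roughly as follows. The catalog entries, field names, and one-record-per-second pacing here are illustrative assumptions, not the project's exact schema:

```python
import json
import random
import time
from datetime import datetime, timezone

# Hypothetical catalog; the real project builds its own in Task 1.
PRODUCTS = [
    {"id": "P001", "name": "Laptop", "price": 999.99},
    {"id": "P002", "name": "Headphones", "price": 149.50},
    {"id": "P003", "name": "Coffee Mug", "price": 12.00},
]

def generate_record() -> dict:
    """Build one simulated purchase event."""
    product = random.choice(PRODUCTS)
    quantity = random.randint(1, 5)
    return {
        "order_id": random.randint(100_000, 999_999),
        "product_id": product["id"],
        "product_name": product["name"],
        "quantity": quantity,
        "total": round(product["price"] * quantity, 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # Emit a few records, one per second, like a live purchase stream.
    for _ in range(3):
        print(json.dumps(generate_record()))
        time.sleep(1)
```

An event timestamp is included because Druid later uses it as the primary time column for ingestion and time-bounded queries.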

2

Kafka Confluent Cloud

Task 4: Create Resources on Confluent Cloud

Task 5: Create a Kafka Producer Instance

Task 6: Send Records to Kafka Topic
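
A minimal sketch of Tasks 5–6 using the `confluent-kafka` Python client. The topic name "purchases" is an assumption; the bootstrap server, API key, and secret are placeholders for the values you obtain from your Confluent Cloud cluster in Task 4:

```python
import json

def serialize(record: dict) -> bytes:
    """Encode a purchase record as UTF-8 JSON."""
    return json.dumps(record).encode("utf-8")

def delivery_report(err, msg):
    """Per-message delivery callback invoked by the producer."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

def stream_to_kafka(records, topic, bootstrap, api_key, api_secret):
    """Send an iterable of purchase records to a Confluent Cloud topic."""
    # Requires: pip install confluent-kafka
    from confluent_kafka import Producer

    producer = Producer({
        "bootstrap.servers": bootstrap,   # from your cluster settings
        "security.protocol": "SASL_SSL",  # Confluent Cloud uses SASL/PLAIN over TLS
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,         # Confluent Cloud API key
        "sasl.password": api_secret,      # Confluent Cloud API secret
    })
    for record in records:
        producer.produce(topic, value=serialize(record), callback=delivery_report)
        producer.poll(0)  # serve delivery callbacks as we go
    producer.flush()      # block until all messages are delivered
```

Usage would look like `stream_to_kafka(records, "purchases", "<BOOTSTRAP_SERVER>", "<API_KEY>", "<API_SECRET>")`, with the record generator from the previous part supplying the stream.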

3

Deploy Apache Druid

Task 7: Configure AWS CLI with Programmatic Credentials

Task 8: Create a Security Group

Task 9: Create an EC2 Instance

Task 10: Set Up and Deploy Apache Druid on EC2

Task 11: Create a Kafka Data Source

Task 12: Query Your Data
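
Once the Kafka data source is ingesting (Task 11), Druid exposes a SQL endpoint on its router. A sketch of querying it from Python, assuming the default router port 8888 and a datasource named "purchases" (use whatever name you chose in Task 11):

```python
import json
import urllib.request

def query_druid(sql: str, host: str = "localhost", port: int = 8888):
    """POST a SQL query to Druid's router endpoint and return parsed rows."""
    req = urllib.request.Request(
        f"http://{host}:{port}/druid/v2/sql",
        data=json.dumps({"query": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example: revenue per product over the last hour. The column names
# match the simulated purchase records; __time is Druid's time column.
REVENUE_SQL = """
SELECT product_name, SUM(total) AS revenue
FROM "purchases"
WHERE __time > CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY product_name
ORDER BY revenue DESC
"""
```

Running `query_druid(REVENUE_SQL, host="<EC2_PUBLIC_IP>")` against your EC2-hosted cluster returns the rows as a list of dicts, the same result you would see in Druid's web console query view.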

4

Build a Dashboard with Grafana

Task 13: Create a Data Source

Task 14: Build a Dashboard

Task 15: Clean Up
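
The clean-up step (Task 15) can be sketched with `boto3`, tearing down the AWS resources created in Tasks 8–9. The instance and security-group IDs are whatever your own setup produced; remember that Confluent Cloud and Grafana resources are removed separately through their own consoles:

```python
def clean_up(instance_id: str, security_group_id: str, region: str = "us-east-1"):
    """Terminate the Druid EC2 instance and delete its security group."""
    # Requires: pip install boto3; credentials configured in Task 7.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)

    # Terminate the instance and wait until it is fully gone --
    # a security group cannot be deleted while still attached.
    ec2.terminate_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_terminated").wait(InstanceIds=[instance_id])

    ec2.delete_security_group(GroupId=security_group_id)
    print(f"Terminated {instance_id} and deleted {security_group_id}")
```

Waiting for full termination before deleting the security group avoids the `DependencyViolation` error EC2 raises when a group is still in use.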

Congratulations!