Coding Exercise: Analyze Clickstream with RDDs
Explore how to use PySpark RDDs to analyze clickstream data from an e-commerce platform. Learn to apply key transformations such as map, filter, and reduceByKey to count page visits and create foundational ETL pipelines for big data processing.
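As a rough illustration of what those three transformations compute, here is a minimal plain-Python sketch that runs without a Spark cluster. The sample records, their field layout (user_id, page, action), and the `reduce_by_key` helper are assumptions made for illustration only; they are not the lesson's dataset or part of the PySpark API.

```python
from operator import add

# Hypothetical sample records: (user_id, page, action).
# This is NOT the lesson's dataset, just illustrative data.
clicks = [
    ("u1", "/home", "view"),
    ("u2", "/product/42", "view"),
    ("u1", "/product/42", "view"),
    ("u3", "/home", "view"),
    ("u2", "/cart", "add_to_cart"),
]

# filter: keep only the records whose action is a page view.
views = filter(lambda rec: rec[2] == "view", clicks)

# map: turn each remaining record into a (page, 1) pair.
pairs = map(lambda rec: (rec[1], 1), views)


def reduce_by_key(func, kv_pairs):
    """Hypothetical stand-in for RDD.reduceByKey: merge values per key with func."""
    merged = {}
    for key, value in kv_pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Sum the 1s for each page to get the visit count per page.
page_counts = reduce_by_key(add, pairs)
print(sorted(page_counts.items()))
```

With an RDD, the same chain would typically be written as `rdd.filter(...).map(...).reduceByKey(add)`, with Spark distributing the per-key merge across partitions instead of building a single dictionary.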
Scenario
You’re working as a junior data engineer at a growing e-commerce company. Every day, the platform collects millions of clickstream logs, which record how users interact with the website. For now, you’ve been given a small sample to practice with.
Dataset
Here’s a small sample ...