PROJECT
Create Your First Data Pipeline with a Dashboard
We’ll teach you how to create a data pipeline and interactive data visualizations in Python. We’ll begin by building a custom data pipeline with Kedro and then use hvPlot to display the findings as interactive plots.
You will learn to:
Create the data preprocessing and data transformation pipelines
Apply multiple levels of transformations on data
Visualize data to draw conclusions
Add interactivity to visualizations
Skills
Data Science
Data Visualisation
Data Manipulation
Data Pipeline Engineering
Prerequisites
Basic programming in Python
Basic knowledge of Pandas
Basic knowledge of data pipelines
Basic knowledge of plotting in Python
Technologies
Kedro
Python
hvPlot
Project Description
According to a 2022 Statista analysis, the quantity of data generated, recorded, replicated, and consumed globally is predicted to grow rapidly between 2021 and 2025, reaching 181 zettabytes.
Data is meaningless without professionals who convert it into valuable insights. Here are a few examples of how a data scientist offers value to a company:
- They allow a business to make better decisions based on the insights.
- They suggest actionable goals based on trends that will help the company flourish.
- They identify new opportunities, make judgments based on measurable, data-driven evidence, and evaluate these conclusions using consumer insights.
- They identify and refine target audiences based on buyer trends.
As a result, modern organizations are awash in data that must be processed and analyzed before it becomes useful. In this project, we’ll learn to create a data pipeline and interactive data visualizations in Python, using Kedro and hvPlot to get valuable insights from our data.
- Kedro is an open-source Python framework for creating systematic, reusable, and modular data pipelines.
- hvPlot is a high-level plotting API based on HoloViews. It offers an alternative to the static plotting API offered by Matplotlib and other libraries. It provides numerous interactive features such as panning, zooming, hovering, clickable or selectable axes and legends, and so on.
We will begin by building a data pipeline from scratch with Kedro, adding a data preprocessing step along the way and then running the pipeline. Next, we’ll get an overview of static plots with Pandas and interactive plots with hvPlot. Finally, we’ll transform the processed data further and display our findings as interactive plots.
Project Tasks
1
Start the Data Pipeline
Task 1: Load the Raw Data
Task 2: Create the First Node
Task 3: Create a Data Preprocessing Node
Task 4: Use the Data Catalog
Task 5: Design the Data Pipeline
Task 6: Run the Data Pipeline
2
Set Up Interactive Plotting
Task 7: Create the Static Plots with Pandas
Task 8: Create Dynamic Plots with hvPlot
Task 9: Create the Dynamically Filtered KDEs Using hvPlot
3
Perform Advanced Data Manipulations in the Pipeline
Task 10: Create a Node for Data Transformation
Task 11: Modify the Data Catalog
Task 12: Run the Data Pipeline with the Newly Created Node
4
Enhance Interactive Plots with hvPlot
Task 13: Load the Transformed Data
Task 14: Plot a KDE: Hourly Temperatures for Individual Classes
Task 15: Plot a KDE: Hourly Wind Speeds for Individual Classes
Congratulations!