Course Introduction

Get a brief overview of the course, including the topics covered, prerequisites, intended audience, and tools used.

What this course is about

Data is the fuel that powers modern businesses and organizations. It allows us to make informed decisions, identify trends, and create products and services that meet the needs of our customers. However, data is rarely clean and structured when we first receive it. Instead, it often comes in various formats and can contain errors, inconsistencies, and missing values.

Data wrangling is the process of cleaning, transforming, and preparing data for analysis. It's an essential step in the data science workflow because real-world data is rarely ready for analysis as received. Data wrangling covers a range of techniques, from handling missing data and reshaping datasets to identifying and handling outliers.
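To make these techniques concrete, here is a minimal sketch using pandas. The dataset, the column names, and the plausible temperature range are all made up for illustration, and the choices shown (masking implausible values, filling gaps with a per-city mean) are only one of several reasonable approaches.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one missing value and one implausible reading.
raw = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "temp_c": [-3.0, np.nan, 23.0, 250.0],
})

# Identify outliers: flag readings outside an assumed plausible range.
plausible = raw["temp_c"].between(-90, 60)
n_outliers = int((raw["temp_c"].notna() & ~plausible).sum())

# Handle outliers: replace implausible readings with NaN.
clean = raw.copy()
clean["temp_c"] = clean["temp_c"].where(plausible)

# Handle missing data: fill each gap with the mean of the same city.
city_mean = clean.groupby("city")["temp_c"].transform("mean")
clean["temp_c"] = clean["temp_c"].fillna(city_mean)

# Reshape: pivot from long format to one row per city, one column per month.
wide = clean.pivot(index="city", columns="month", values="temp_c")
print(wide)
```

Filling with a group mean is convenient for a toy example, but whether it's appropriate depends on the data and on the analysis that follows.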

In this course, we'll explore the concept of data wrangling and its application in developing data solutions.

What we'll learn

By the end of the course, we will be able to:

  • Describe the concept of data wrangling and its importance in data analysis.

  • Discuss common data wrangling challenges and their solutions, including handling missing data, dealing with outliers, and addressing data quality issues.

  • Compare data wrangling with related concepts, such as data mining, visualization, analysis, and machine learning.

  • Differentiate between data wrangling techniques to understand their suitability in different scenarios.

  • Apply data wrangling techniques using Python to prepare data for further analysis. This includes techniques such as data cleaning, transformation, and aggregation.

The audience

This course is ideal for data professionals who want to learn how to effectively transform and prepare data for analysis. It's also suitable for anyone interested in learning Python for data wrangling, especially those who want to become a data analyst, data scientist, data engineer, or machine learning engineer. No prior knowledge of data wrangling is necessary, but a basic understanding of Python programming is recommended.

Tools covered in this course

Throughout this course, we'll use Python and several libraries commonly used in data wrangling, including NumPy and pandas. NumPy is a library for working with numerical data in Python, while pandas is a library for data manipulation and analysis. We'll learn how to use pandas to clean, transform, and aggregate data. We'll also use scikit-learn (sklearn), a library for machine learning, to identify outliers in our data.
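As a rough preview of how these three libraries fit together, the sketch below generates synthetic readings with NumPy, holds them in a pandas DataFrame, and flags an outlier with scikit-learn. The data is invented, and IsolationForest is only one of several outlier detectors in scikit-learn; it is used here as an assumption and is not necessarily the method the course relies on.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# NumPy: generate 20 synthetic readings near 50, plus one extreme value.
rng = np.random.default_rng(0)
values = np.append(rng.normal(loc=50.0, scale=2.0, size=20), 500.0)

# pandas: hold the readings in a DataFrame (column name is hypothetical).
df = pd.DataFrame({"reading": values})

# scikit-learn: fit_predict returns -1 for points the model treats as outliers.
# The contamination value is an assumed share of outliers in the data.
model = IsolationForest(contamination=0.03, random_state=0)
df["is_outlier"] = model.fit_predict(df[["reading"]]) == -1

# pandas: aggregate to compare flagged and unflagged readings.
summary = df.groupby("is_outlier")["reading"].agg(["count", "mean"])
print(summary)
```

The contamination parameter strongly affects how many points get flagged, so in practice it should be chosen with some knowledge of the data.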

We are excited to take this journey with you and help you develop a strong foundation in data wrangling!