Data Visualization

Now, we’ll dive into the world of data visualization. Data visualization is a fundamental skill in data science that enables us to gain insights from data and communicate our findings effectively. We’ll use seaborn, a popular Python data visualization library, to explore different visualization techniques.

Note: The first four lessons of this chapter form a hands-on exercise that works through different aspects of data science for one specific problem. We are going to cover the following:

  • In this lesson, we cover the problem statement, dataset properties, and its visualization.

  • In the next lesson, we cover the basic data processing techniques.

  • The third lesson covers the modeling and analysis of the processed dataset. It provides a basis for ML-based and DL-based modeling and shows how to evaluate the performance of the models.

  • The last lesson of this practical exercise focuses on the analysis of the results and their presentation for nontechnical stakeholders.

Problem statement

In this series of exercises, we’ll solve the problem of tip estimation for a restaurant. We have a publicly available labeled dataset with different variables, such as the total amount of the bill, the sex of the payer, and the time and day of the bill payment. Because the data is labeled, we can treat this as a supervised learning problem. Supervised learning is an ML technique where we use labeled data to train a model to make predictions. More specifically, it’s a regression problem because the target, the tip amount, is a continuous value. The dataset contains examples with known tip values, and the model learns the relationship between the input features, such as total bill, number of diners, and time of day, and the tip. It then uses that relationship to predict tips for new bills. We’ll assess the model’s performance with automatic metrics, such as mean absolute error (MAE) or mean squared error (MSE), which measure how far the predicted tips are from the actual tips.
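To make the two metrics concrete, here is a minimal sketch that computes them by hand. The tip values are made up for illustration and are not taken from the dataset, and the function names are our own rather than part of any library.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared difference; large errors are penalized more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, chosen only to illustrate the arithmetic
actual_tips = [2.0, 3.0, 5.0, 4.0]
predicted_tips = [2.5, 3.0, 4.0, 6.0]

mae = mean_absolute_error(actual_tips, predicted_tips)
mse = mean_squared_error(actual_tips, predicted_tips)

print(f"MAE: {mae:.4f}")  # errors 0.5, 0, 1, 2 -> 3.5 / 4 = 0.8750
print(f"MSE: {mse:.4f}")  # squared errors 0.25, 0, 1, 4 -> 5.25 / 4 = 1.3125
```

MAE is expressed in the same units as the tip, so it is easier to explain. MSE squares each error, so a single prediction that is off by a lot has a much larger effect on the score.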

Importing the dataset

As the first step, we load the dataset and check its properties. Here, the dataset is already structured and ready to use. With raw or unstructured data, we might need to clean the data before visualizing it. We start by importing the necessary libraries and loading the dataset.
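The following is a minimal sketch of this step, not the lesson's original code. It assumes that the dataset described above is seaborn's `tips` example dataset, which `sns.load_dataset` downloads on first use, so it needs an internet connection. The helper names `load_tips` and `summarize` are our own.

```python
import pandas as pd


def load_tips():
    """Load the restaurant data (assumption: seaborn's "tips" example dataset)."""
    # Imported here so that summarize() can be used without seaborn installed
    import seaborn as sns

    return sns.load_dataset("tips")


def summarize(df: pd.DataFrame) -> dict:
    """Collect basic properties of a DataFrame before visualizing it."""
    return {
        "shape": df.shape,                          # (rows, columns)
        "columns": list(df.columns),                # variable names
        "dtypes": df.dtypes.astype(str).to_dict(),  # numeric vs. categorical
        "missing": df.isna().sum().to_dict(),       # missing values per column
    }
```

Calling `summarize(load_tips())` returns the shape, column names, data types, and missing-value counts in one place, and `load_tips().head()` shows the first few rows. Checking these properties first tells us which variables are numeric and which are categorical, which in turn guides the choice of plots.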
