Inspect Data: Data Analysis

Learn why exploratory data analysis is so important for any data science project.

We'll cover the following...

Exploratory data analysis (EDA)
- Descriptive statistics
- The pandas_profiling package

Descriptive statistics

Let's start with some simple data analysis. Before applying basic statistics to the data, it’s a good idea to check the data quality. With the lightweight isnull function, we can check whether any records contain null values.
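As a minimal sketch of such a null check, the snippet below builds a small stand-in DataFrame. The store names and coordinates are made up for illustration and are not the lesson's actual data; only the column names store, lat, and lon follow the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's store data (values are made up).
df = pd.DataFrame({
    "store": ["A", "B", "C"],
    "lat": [52.52, 48.14, 50.94],
    "lon": [13.40, 11.58, 6.96],
})

# Count the null values in each column; all zeros means nothing is missing.
null_counts = df.isnull().sum()
print(null_counts)

# Collapse the check into a single yes/no answer for the whole frame.
has_nulls = bool(df.isnull().values.any())
print(has_nulls)
```

The per-column counts are usually the more useful view, because they show where values are missing rather than only whether any are.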

Fortunately, the data is complete. That’s a good start, but it doesn’t tell us whether any entries are incorrect. For example, a longitude or latitude could have a wrong value. As a first check, we can look at the minimum and maximum values of all longitudes and latitudes with the min and max functions.
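One possible way to get both extremes at once is to aggregate the two coordinate columns. This is a sketch on made-up coordinates, not the lesson's data, and the variable name coord_range is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's store data (values are made up).
df = pd.DataFrame({
    "store": ["A", "B", "C"],
    "lat": [52.52, 48.14, 50.94],
    "lon": [13.40, 11.58, 6.96],
})

# Compute the minimum and maximum of each coordinate column in one call.
# The result has one row per statistic and one column per coordinate.
coord_range = df[["lat", "lon"]].agg(["min", "max"])
print(coord_range)
```

Calling df["lat"].min() and df["lat"].max() separately gives the same numbers; the aggregated table is just easier to scan.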

But how are we supposed to judge whether the longitude and latitude information is correct? We could check that all values at least fall within the valid ranges: -90 to 90 degrees for latitude and -180 to 180 degrees for longitude. But even if the numbers are valid for the coordinate system, this still says nothing about whether they are correct.
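A manual range check of this kind might look like the sketch below. The data is made up, and store C deliberately carries an impossible latitude so that the check has something to flag.

```python
import pandas as pd

# Hypothetical data; store "C" has an out-of-range latitude on purpose.
df = pd.DataFrame({
    "store": ["A", "B", "C"],
    "lat": [52.52, 48.14, 150.94],
    "lon": [13.40, 11.58, 6.96],
})

# Latitude must lie in [-90, 90] and longitude in [-180, 180].
# Series.between is inclusive on both ends by default.
in_range = df["lat"].between(-90, 90) & df["lon"].between(-180, 180)

# Keep only the rows that violate at least one of the two ranges.
invalid_rows = df[~in_range]
print(invalid_rows)
```

A row that passes this check is only plausible, not verified: a store placed in the wrong city would pass just the same.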

But surely there must be a better way than programming all these checks manually one by one, right?

The `pandas_profiling` package

A handy tool for taking a first look at our data is the pandas_profiling package. This library automatically generates a standardized univariate and multivariate report for data understanding. The report makes it easy to get a quick overview of the dataset and identify potential issues. It can help to detect outliers, correlations, missing values, and other patterns in the data, and it provides a convenient way to visualize the data for further exploration.

With the profile_report function, we can also add a title to this report. The pandas_profiling package includes a minimal configuration in which the most expensive calculations, such as correlations and interactions between variables, are turned off. Since correlations between the coordinates have no relevance here, we set minimal to True. The report can be output as an HTML file.
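The steps just described could be sketched roughly as follows. The helper name build_profile, the report title, and the sample data are assumptions for illustration. The import follows the lesson's package name; newer releases of the library are published as ydata-profiling, so the import may need adjusting depending on the installed version.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's store data (values are made up).
df = pd.DataFrame({
    "store": ["A", "B", "C"],
    "lat": [52.52, 48.14, 50.94],
    "lon": [13.40, 11.58, 6.96],
})


def build_profile(frame, title):
    """Return a minimal profiling report for the given DataFrame.

    build_profile is a hypothetical helper, not part of the library.
    """
    # Importing the package registers DataFrame.profile_report.
    # The import is kept local so the module loads without the package.
    import pandas_profiling  # noqa: F401

    # minimal=True turns off expensive steps such as correlations.
    return frame.profile_report(title=title, minimal=True)
```

With the package installed, build_profile(df, "Store Locations").to_file("store_report.html") would then write the report as an HTML file.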

The pandas_profiling package is especially useful when the data to be analyzed includes more than just the three columns and 13 rows we have here. A quick look at the overview shows that there are no missing fields. Moreover, as expected, the lat and lon columns are of numerical data type, while the store column is categorical. It’s also worth mentioning that pandas_profiling might not be the optimal tool for large datasets, where generating the full report can become expensive.

In this lesson, we learned how to conduct exploratory data analysis and why checking data quality is so important.
