Introduction to the Traveling Salesperson Problem

Discover the history and mathematical context behind the traveling salesperson problem for an increased understanding of combinatorial optimization.

What we’re going to solve

TSP is a very well-known challenge. The task is to choose a sequence for visiting several places in such a way that no location except the first is visited more than once, no location is missed, the total travel distance of the traveling salesperson is as short as possible, and the first location is the same as the last location.

Let’s imagine we work for a burger chain that operates worldwide. In our role as a data scientist, our boss has asked us to calculate the shortest distance between 13 stores located in Moscow. This is because our colleague from the sales department is supposed to visit all these 13 stores within one day (in other words, as quickly as possible) and return to their starting point at the end. The starting point is the store that was visited first.

Pictures say a thousand words, right? Therefore, take a look at the map below. The 13 blue dots represent the stores and their two-dimensional spatial distribution.

Thirteen stores on a map

The salesperson needs to start at one store and visit all the stores in turn once. In the end, the salesperson is to return to the starting point again. Let’s say the salesperson starts at StoreK at the bottom right of the map. From there, the salesperson should visit all stores. Since time equals money, it’s only natural that we want to find the shortest traveling route. So, instead of connecting the points crosswise, we look for an order that keeps the length of the red lines as small as possible in total. Upon visual inspection, for example, the red connections (route 1) might look relatively reasonable. Or does the green route (route 2) represent the shorter total distance? Or maybe the purple route (route 3) is even shorter?

Double-click the desired route in the legend on the right

As you might have already noted, there is an almost unmanageable number of different sequences (we’ll learn later how many options there are in total). But how can we find the shortest total distance without losing track? This course will bring us closer to the solution step by step. We’ll first break the problem down into smaller subproblems to make it easier. For instance, instead of 13 stores, it’s much easier to start with five stores.

Network graph shows interconnections
Network graph shows interconnections

Learning how to solve this quest using a programmatic approach is the goal of this course. Great importance is also placed on the graphical representation. For example, we’ll use network diagrams to intuitively identify patterns.

Even if we set the starting point to one specific store (StoreA, for example), five stores already result in 24 different routes.

Different routes for five stores starting from StoreA

In this course, we’ll learn how to solve TSP in a structured way.

Why we’re going to solve it

It’s important to solve TSP as optimally as possible for the following reasons:

  • Time: The shorter the travel distance, the more time we can spend in our shops instead of on the road.

  • Money: Less driving distance usually means less travel expenses.

  • CO2 footprint: Lesser distance means lesser fuel and gas emissions.

TSP can also be used for solving other problems, like:

  • Optimizing the order of tasks in a manufacturing process.

  • Optimizing the order of stops for a public transportation system, delivery truck, taxi service, courier service, etc.

Therefore, it’s worthwhile in several respects to deal with this algorithm in detail. And that’s exactly what we’re going to do in this course.

How we’re going to solve it

TSP is a type of optimization problem. Data science techniques, such as machine learning and optimization algorithms, can be used to solve it. In addition, in this course, we’ll use data clustering techniques. Clustering is the process of organizing data into groups based on similarities. Enriching the traveling salesperson data will help us make better decisions. This course provides detailed explanations and suggestions so we can adapt what we learn to our requirements as easily as possible.

Possible solutions for TSP
Possible solutions for TSP

The emphasis of this course is on practical examples with a strong focus on graphical evaluations and real app functionality. Instead of spending excessive time on theoretical formulas like the Euclidean distance, we’ll concentrate on how to apply that formula to our real business case. This course focuses on applied programming rather than theory.

def euclidean_distance(origin, destination):
distance = np.sqrt((origin[0]-destination[0])**2 + (origin[1]-destination[1])**2)

We’ll also critically question the practicality of the formulas used (e.g., the practicality of Euclidean distance vs. road distance) because, in this course, the output is what matters most.

Roads (blue line) usually don’t connect places straight as a die (black line)
Roads (blue line) usually don’t connect places straight as a die (black line)