# Case Study

Learn how to optimize machine learning model training using concurrent.futures for parallel hyperparameter tuning and grid search evaluation strategy.

One of the problems that often plagues data scientists working on machine learning applications is the amount of time it takes to train a model. In our specific example of the $k$-nearest neighbors implementation, training means performing the hyperparameter tuning to find an optimal value of $k$ and the right distance algorithm. In the previous chapters of our case study, we’ve tacitly assumed there will be an optimal set of hyperparameters. We’ll look at one way to locate the optimal parameters.

In more complex and less well-defined problems, the time spent training the model can be quite long. If the volume of data is immense, then very expensive compute and storage resources are required to build and train the model.

## Hyperparameter tuning and compute-intensive tasks

In our case study, hyperparameter tuning is an example of a compute-intensive application. There's very little I/O; if we use shared memory, there's no I/O at all. This means that a process pool to allow parallel computation is essential. We could wrap the process pool in AsyncIO coroutines, but the extra `async` and `await` syntax seems unhelpful for this kind of compute-intensive example. Instead, we'll use the `concurrent.futures` module to build our hyperparameter tuning function. The design pattern for `concurrent.futures` is to make use of a processing pool to farm out the various testing computations to a number of workers, and gather the results to determine which combination is optimal. A process pool means each worker can occupy a separate core, maximizing compute time. We'll want to run as many tests of `Hyperparameter` instances at the same time as possible.

## Components and summary of model

We'll be using the `TrainingKnownSample` and `TestingKnownSample` class definitions. We'll need to keep these in a `TrainingData` instance. And, most importantly, we'll need `Hyperparameter` instances.

We can summarize the model like this: a `TrainingData` instance holds the lists of training and testing samples, and each `Hyperparameter` instance pairs a value of $k$ with a distance algorithm and measures how well that combination classifies the testing samples.
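These relationships can be sketched with simplified, hypothetical class definitions. The attribute names here are illustrative assumptions; the case study's real classes carry the full sample attributes and the k-NN classification logic.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class TrainingKnownSample:
    """A sample with a known species, used to train the classifier."""
    features: list[float]
    species: str


@dataclass
class TestingKnownSample:
    """A sample with a known species, used to test a Hyperparameter."""
    features: list[float]
    species: str
    classification: str | None = None  # filled in during testing


@dataclass
class TrainingData:
    """Container for the training and testing sample partitions."""
    training: list[TrainingKnownSample] = field(default_factory=list)
    testing: list[TestingKnownSample] = field(default_factory=list)


@dataclass
class Hyperparameter:
    """One (k, distance) combination to be evaluated against the data."""
    k: int
    distance_name: str
    data: TrainingData
    quality: float | None = None  # fraction of tests classified correctly
```

Because each `Hyperparameter` bundles everything one test needs, the tuning function can hand whole instances to separate worker processes and compare the resulting `quality` values.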
