Choosing an ML task depends on several factors, including the problem domain, the available data, and the available resources. Here are some general guidelines for making that choice:

  • Identifying the problem: First, we need to identify the problem we want to solve. This could be a business problem, a research question, or a challenge in our field.

  • Determining the data availability: We must determine whether we have the data needed to train an ML model. The amount and quality of that data will determine which types of ML task we can realistically perform.

  • Considering the task type: We need to consider the different types of ML tasks and which one is best suited for our problem. For example, if we want to predict a continuous numerical value, we can use a regression algorithm. If we want to categorize data into different classes, we can use a classification algorithm.

  • Evaluating the complexity of the task: We need to consider the complexity of the task and the resources we have available. Some ML tasks require more computing power and time than others, so we might need to choose a task that is feasible given our available resources.

  • Setting the evaluation metric: We need to set the evaluation metric that we'll use to measure the performance of the ML model. The evaluation metric should be aligned with the problem we want to solve and the goals we want to achieve.

  • Iterating and improving: ML is an iterative process, so we need to plan to iterate and improve our model over time as we collect more data, refine our approach, and adjust our evaluation metric.
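As a rough illustration of the task-type and evaluation-metric guidelines, the sketch below inspects the values we want to predict and suggests a task type together with a commonly used metric. The function name `suggest_task`, the 20-class cut-off, and the rules themselves (fractional numbers suggest regression, two distinct values suggest binary classification) are hypothetical simplifications for illustration, not a standard procedure.

```python
from typing import Sequence, Tuple


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def suggest_task(targets: Sequence[object], max_classes: int = 20) -> Tuple[str, str]:
    """Hypothetical heuristic: suggest a task type and a common metric for it."""
    if not targets:
        raise ValueError("need at least one target value")
    distinct = set(targets)
    numeric = all(_is_number(t) for t in targets)
    # Fractional values strongly suggest a continuous quantity.
    if numeric and any(isinstance(t, float) and not t.is_integer() for t in targets):
        return "regression", "mean absolute error"
    if len(distinct) == 2:
        return "binary classification", "accuracy or F1 score"
    if len(distinct) <= max_classes:
        return "multiclass classification", "macro-averaged F1 score"
    if numeric:
        # Many distinct whole numbers (e.g. counts) are also treated as continuous.
        return "regression", "mean absolute error"
    return "unclear", "revisit the problem definition"


print(suggest_task([189999.9, 250000.0, 310500.5]))  # ('regression', 'mean absolute error')
print(suggest_task(["positive", "negative", "positive"]))  # ('binary classification', 'accuracy or F1 score')
```

The thresholds are arbitrary; in practice the problem definition and the goals we want to achieve, rather than the data type alone, should drive both the task type and the metric.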

Ultimately, the task we choose for ML should be aligned with our problem domain and the goals we want to achieve. Choosing the right task is critical to building an effective ML solution that can provide real value and insights.

Example of an ML task selection process

The following scenario walks through the steps of this process.

Step 1: Identifying the problem

Let’s imagine a scenario where we want our ML model to recognize whether a particular player has won a tic-tac-toe game by looking at the board configuration at the end of the game.
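To make the target concrete, the sketch below spells out the rule that the model is expected to learn from examples: a player wins if they hold all three squares of any row, column, or diagonal. The board representation (a 9-character string read left to right, top to bottom, using x, o, and b) and the function name `has_won` are assumptions for illustration. The ML model itself never sees this rule, only labeled board configurations.

```python
# All eight winning lines, as indices into a board read row by row (0..8).
WINNING_LINES = (
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
)


def has_won(board: str, player: str = "x") -> bool:
    """Return True if `player` holds every square of at least one winning line.

    `board` is a hypothetical 9-character string, e.g. "xxxoobbob".
    """
    if len(board) != 9:
        raise ValueError("board must describe exactly 9 squares")
    return any(all(board[i] == player for i in line) for line in WINNING_LINES)


print(has_won("xxxoobbob"))  # True: x holds the whole top row
```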

Step 2: Determining the data availability

We need to decide what data we can feed into the model. In this specific example, we can use a delimited file with the following columns:

  • Top-left-square

  • Top-middle-square

  • Top-right-square

  • Middle-left-square

  • Middle-middle-square

  • Middle-right-square

  • Bottom-left-square

  • Bottom-middle-square

  • Bottom-right-square

Each of these columns holds a value indicating whether the square was taken by our player, taken by the rival player, or left empty. The values themselves are arbitrary; for example, we can use x for squares taken by our player (who plays “X”), o for squares taken by the rival player (who plays “O”), and b for squares left blank.
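A single board configuration can then be turned into one record with the nine columns listed above. The sketch below assumes the board arrives as a 9-character string, read row by row, that uses the x/o/b values just described; the helper name `board_to_record` is hypothetical.

```python
COLUMNS = (
    "Top-left-square", "Top-middle-square", "Top-right-square",
    "Middle-left-square", "Middle-middle-square", "Middle-right-square",
    "Bottom-left-square", "Bottom-middle-square", "Bottom-right-square",
)
ALLOWED_VALUES = {"x", "o", "b"}  # our player, rival player, blank


def board_to_record(board: str) -> dict:
    """Map a 9-character board (read row by row) to a column -> value dict."""
    if len(board) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} squares, got {len(board)}")
    invalid = set(board) - ALLOWED_VALUES
    if invalid:
        raise ValueError(f"unexpected square values: {sorted(invalid)}")
    return dict(zip(COLUMNS, board))


record = board_to_record("xxxoobbob")
print(record["Top-left-square"], record["Middle-middle-square"])  # x o
```

Validating the values up front catches typos in the data early, before they silently become an extra "category" during training.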

Step 3: Considering the task type

In this scenario, we want to determine whether or not our player won the game, which makes this a binary classification problem. We therefore need to add a label column to our data whose classes we can define arbitrarily. The labels can be numeric, such as 0 and 1, or textual, such as positive and negative. With both the data and the labels in place, the complete dataset is provided in the tic-tac-toe.txt file.
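Once the label column is in place, each row holds nine square values followed by a class. The sketch below parses such rows and maps the textual labels to 0 and 1. It assumes a comma delimiter, the label as the last field, and positive/negative as the class names; these are illustrative assumptions, so the actual layout of the dataset file should be checked first. The two sample rows are made up for the example.

```python
import csv
import io

LABELS = {"negative": 0, "positive": 1}  # assumed class names -> numeric labels


def load_dataset(text: str, delimiter: str = ","):
    """Split delimited rows into feature lists (9 squares) and numeric labels."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        *squares, label = (field.strip() for field in row)
        if len(squares) != 9:
            raise ValueError(f"expected 9 squares, got {len(squares)}: {row}")
        features.append(squares)
        labels.append(LABELS[label])
    return features, labels


sample = """x,x,x,o,o,b,b,o,b,positive
x,o,x,x,o,x,o,x,o,negative
"""
X, y = load_dataset(sample)
print(len(X), y)  # 2 [1, 0]
```

Keeping the textual labels in the file and converting them at load time lets the same data work with tools that expect either representation.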
