Data

Understand the importance of data in the machine learning pipeline.

We'll cover the following...

The ML process
Data in the classification problem
The 2D movie dataset
Visualizing movies data
What needs to be learned
Code from scratch

This chapter demystifies each step of the machine learning process, one by one. This lesson is about the first step: data.

Data in the classification problem

The first step in any learning pipeline, human or machine, is looking at data. In previous lessons, we learned that data comes in different types. To classify galaxy images and build the image tagger application, we needed images in our dataset. Similarly, the dataset for the music identifier app was a collection of sound files, and for the language translation app, it was a list of sentences in a given language. Solving any machine learning problem therefore starts with acquiring data.

The data is where the patterns need to be identified.

All the data we have seen in previous examples consists of input-output pairs. In classification problems such as galaxy-type identification, each image is labeled with its corresponding class. The same structure appears in the photo tagger, music identifier, and language translation applications.
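
The input-output structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-number feature vectors stand in for real galaxy images, and the feature values and class names (`spiral`, `elliptical`) are made up for the example, not taken from an actual dataset.

```python
from collections import Counter

# Hypothetical toy dataset: each example is an (input, label) pair.
# The feature values and class names are invented for illustration.
dataset = [
    ((0.9, 0.1), "spiral"),
    ((0.2, 0.8), "elliptical"),
    ((0.8, 0.3), "spiral"),
    ((0.1, 0.7), "elliptical"),
]

# Split the pairs into inputs (X) and outputs (y),
# a common layout for classification data.
X = [features for features, label in dataset]
y = [label for features, label in dataset]

# Count how many examples belong to each class.
class_counts = Counter(y)
print(len(X), "examples;", dict(class_counts))
```

Keeping inputs and labels in two parallel lists means the i-th label always describes the i-th input, which is the pairing a classifier learns from.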
