Data
Explore the foundational role of data in machine learning by understanding how input-output pairs form the basis of supervised learning. This lesson helps you visualize a two-dimensional dataset, recognize the importance of generalization, and prepare data for model training to classify examples effectively.
The ML process
Any complex problem that requires a computer to identify patterns can be approached with the same ML process.
This chapter demystifies each step of this process one by one. This lesson covers the first step: data.
Data in the classification problem
The first step in both human learning and the machine learning pipeline is looking at data. In previous lessons, we learned that data comes in different types. To classify galaxy images and develop the image tagger application, we needed images in our dataset. Similarly, for the music identifier app, the dataset consisted of sound files, and for the language translation app, we used sentences written in a particular language. In every machine learning problem, then, the first step is to acquire data.
It is in the data that patterns need to be identified.
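To make the idea of a dataset concrete, here is a minimal sketch of how input-output pairs might be stored for a classification problem. The values, the labels "A" and "B", and the variable names are all invented for illustration and do not come from any dataset used in this course.

```python
# Hypothetical toy dataset: each example pairs a two-dimensional input
# with a class label. All numbers and labels are made up for illustration.
dataset = [
    ((1.0, 2.0), "A"),
    ((1.5, 1.8), "A"),
    ((5.0, 8.0), "B"),
    ((6.0, 9.0), "B"),
]

# Separate the pairs into inputs (features) and outputs (labels).
inputs = [x for x, _ in dataset]
labels = [y for _, y in dataset]

# Show each example as "input -> label".
for x, y in zip(inputs, labels):
    print(x, "->", y)
```

Keeping inputs and labels in two parallel lists is only one possible layout. It is convenient here because many training routines expect the features and the labels as separate arguments.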
All data we have seen in ...