Improving ML Model Performance
Explore practical strategies to improve your machine learning model's performance. Understand how to optimize training data, choose appropriate evaluation metrics like F1 score, balance precision with inference time, analyze errors, handle real-world data variations, and apply transfer learning. This lesson prepares you to develop more robust and effective ML models for real applications.
In the previous lesson, you saw ways to measure a machine learning model's performance and set appropriate expectations. Now we move on to improving that performance.
Ideas for improving ML model performance
Here are a few general ideas for improving the performance of any ML model:
- Collect more data
- Increase the diversity of the training set
- Try a bigger or smaller network architecture
- Try a different gradient-descent technique
- Try an alternate network architecture
Assumptions in ML
When building models, we must keep a few assumptions in mind, including:
- Good performance on the training set
- Good performance on the validation set
- Good performance on the test set
- Good performance on real data
To satisfy all these assumptions, we start with the training data, apply some of these ideas, and then move on to the validation set, followed by the test set. If performance on real-world data is still poor, we need to enlarge the data in each of these sets and adapt the cost function to the real problem.
Single number evaluation metric
While performing any machine learning task, we should try to make our target metric a single number. This will help when comparing models.
Consider the example of three classifiers: C1, C2, and C3. We have evaluated their performance on the test set, with these results:
| Model | Precision | Recall |
|---|---|---|
| C1 | 0.80 | 0.90 |
| C2 | 0.85 | 0.86 |
| C3 | 0.88 | 0.82 |
Now, one model does better on precision, and another does better on recall. If we have a single goal (e.g., optimize only for precision or only for recall), we can simply choose one. However, if we have to consider both precision and recall, we need to combine these two measures. In statistics, we have the F1 score, which is the harmonic mean of precision and recall:

F1 = 2 × (precision × recall) / (precision + recall)
With these changes, the above table will look like this:
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| C1 | 0.80 | 0.90 | 0.847 |
| C2 | 0.85 | 0.86 | 0.854 |
| C3 | 0.88 | 0.82 | 0.848 |
We can then choose the model with the highest F1 score.
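As a quick check, the F1 scores in the table can be reproduced in a few lines of Python (the last digit may differ slightly depending on rounding):

```python
# F1 score: the harmonic mean of precision and recall.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs from the table above.
classifiers = {"C1": (0.80, 0.90), "C2": (0.85, 0.86), "C3": (0.88, 0.82)}

for name, (p, r) in classifiers.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

Running this confirms that C2 edges out the others once precision and recall are combined into one number.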
We also have to check other factors, such as inference time. Consider accuracy and inference time (the time for the model to take inputs and generate predictions) in the example below.
| Model | F1 Score | Inference Time |
|---|---|---|
| C4 | 0.88 | 52ms |
| C5 | 0.78 | 15ms |
| C6 | 0.90 | 350ms |
Suppose we need a model that can generate a prediction within 100 ms. Clearly, C6 is not an option: it requires 350 ms per prediction, which is unsuitable for this application. Hence, the F1 score and inference time are both important. The best model here is the one with the highest F1 score among those with an inference time under 100 ms.
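This selection rule can be sketched in a few lines of Python, using the names and numbers from the table above:

```python
# Pick the model with the highest F1 score among those that
# satisfy the 100 ms inference-time budget.
models = [
    ("C4", 0.88, 52),    # (name, F1 score, inference time in ms)
    ("C5", 0.78, 15),
    ("C6", 0.90, 350),   # best F1, but too slow for this application
]

BUDGET_MS = 100
eligible = [m for m in models if m[2] <= BUDGET_MS]
best = max(eligible, key=lambda m: m[1])
print(best[0])  # C4: highest F1 among the models fast enough
```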
Training and test set distribution
We are working on the sales prediction problem. We have data from five countries: the US, Canada, India, France, and Germany. Can we build a model on the US, Canada, and India datasets and test the model on the remaining countries?
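A common way to avoid such a distribution mismatch is to pool the data from all countries and shuffle it before splitting, so the training and test sets share the same country mix. A minimal sketch, with hypothetical record counts standing in for real sales data:

```python
import random

# Pool records from all five countries and shuffle before splitting,
# so the training and test sets come from the same distribution.
records = [(country, i)
           for country in ["US", "Canada", "India", "France", "Germany"]
           for i in range(100)]  # hypothetical sales records per country

random.seed(0)
random.shuffle(records)
split = int(0.8 * len(records))
train, test = records[:split], records[split:]

train_countries = {c for c, _ in train}
test_countries = {c for c, _ in test}
print(train_countries == test_countries)  # every country appears in both splits
```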
Comparing algorithms and human
The goal of any machine learning model is to solve a problem efficiently, often replacing the need for human labor on that task. For some tasks, like audio recognition and facial recognition, machine learning actually surpasses human performance and provides superior results.
Comparing the performance of an algorithm with that of a human shows how beneficial the algorithm is. If the algorithm surpasses human-level performance, it is more efficient, and we can still tune its parameters to improve it further. If it does not reach human-level performance, we can add human-labeled data to the training set to improve the computer's performance.
Human performance can thus help create better models. That's why large companies focus on data generated from human labor, running surveys or paid labeling tasks to obtain this data and improve their algorithms. The table below compares the accuracy of different approaches on two tasks.
| | Task-1 | Task-2 |
|---|---|---|
| Human Accuracy | 99.5% | 92% |
| Training Accuracy | 90% | 90% |
| Test Accuracy | 85% | 85% |
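One common heuristic (not spelled out above, but consistent with the table) is to compare the human-vs-training gap, often called avoidable bias, with the training-vs-test gap, which indicates variance. Whichever gap is larger suggests where to focus. A small sketch using the table's numbers:

```python
# Compare the human-vs-training gap (avoidable bias) with the
# training-vs-test gap (variance) to decide what to fix first.
def focus(human, train, test):
    return "bias" if (human - train) > (train - test) else "variance"

# (human, training, test) accuracies in percent, from the table above.
tasks = {"Task-1": (99.5, 90.0, 85.0), "Task-2": (92.0, 90.0, 85.0)}

for name, (h, tr, te) in tasks.items():
    print(f"{name}: bias gap {h - tr:.1f}, variance gap {tr - te:.1f}"
          f" -> reduce {focus(h, tr, te)}")
```

Here Task-1 would call for a better model or architecture, while Task-2 would call for more data or less overfitting.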
There are a few areas where algorithms perform better than humans:
- Calculating transit time
- Product recommendations
- Advertising
- Sales prediction
In general, we consider a machine learning model superior if:
- It gives accurate results on the training set.
- The difference between the training error and the testing error is not high.
Training Accuracy can be improved with:
- A better model
- A better architecture
Testing Accuracy can be improved by:
- More data
- Avoiding overfitting
Is it acceptable to get a good result only on testing data?
Error analysis of the model
When our model is not performing well, it is advisable to focus on where it is going wrong to improve it. To do so, we can collect the examples where the model is making wrong predictions and analyze the issues accordingly.
Does adding more data help create a better model?
| Model | Accuracy | Wrong Labels |
|---|---|---|
| C1 | 80% | 2% |
| C2 | 96% | 2% |
C1 and C2 are trained on different problems, and both datasets contain approximately 2% wrong labels. In the case of C1, the error is 20%, so the wrong labels account for only a small share of it; reducing the model's bias is the better way to increase accuracy. In the case of C2, however, the error is only 4%, so the 2% of wrong labels accounts for half of it. Here, spending time correcting labels and reducing bad data will improve the model's performance significantly.
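This reasoning can be made concrete by computing what share of each model's total error the ~2% of wrong labels could explain, using the table's numbers:

```python
# Fraction of a model's total error that mislabeled data could explain.
def mislabel_share(accuracy, wrong_label_rate):
    error = 1.0 - accuracy
    return wrong_label_rate / error

print(f"C1: {mislabel_share(0.80, 0.02):.0%} of the error")  # small share: improve the model
print(f"C2: {mislabel_share(0.96, 0.02):.0%} of the error")  # large share: fix the labels
```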
What is the ideal timeline for delivering and testing a model?
Data from different distribution
To solve any problem using machine learning, we start with the data. If the data is not available, we must generate it. Consider a voice-command smart home application. You want to design a system that takes a user's voice and performs the requested operations. A few examples are "Turn on the TV", "Switch off the light", "Increase fan speed by 2", "Open the door", etc. You start building this system by creating the data: you record these instructions in a studio and transcribe each voice command. Your system works well in your testing, and you deploy it to your smart devices.
But wait! Let’s say it goes terribly wrong. When the user instructs the system to switch on the light, the system turns off the fan. When asked to turn on the TV, it opens the door instead. The user becomes frustrated and uninstalls your system. Where did things go wrong?
One possibility is that the data during training differed from the real world. For example, in testing you only recorded a clean, isolated voice, but in practice a music system may also be playing, so that sound also reaches your system. Another user's home is near a loud road, so traffic noise interferes with your system. Many such problems can arise that you never accounted for in training.
A good system addresses every scenario that could occur once the application is deployed. It is necessary to train on these noisy examples and evaluate the system against them: collect noisy examples, prepare a noisy test set, and measure performance. If the system falls short, improve your model with new data. To handle different data distributions, you will need to:
- Check data manually to understand the problem's origin.
- Put more real data into the training and validation sets.
- Generate augmented data (e.g., noisy voice data or multiple voices talking at once).
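As a toy illustration of the last point, noise augmentation can be as simple as mixing random perturbations into a clean waveform. The sample values below are hypothetical; a real system would mix in recorded background audio such as music or traffic:

```python
import random

# Sketch of audio augmentation: mix background noise into a clean
# waveform (samples are floats in [-1, 1]) at a chosen level.
def add_noise(clean, noise_level=0.1, seed=0):
    rng = random.Random(seed)
    return [s + noise_level * rng.uniform(-1, 1) for s in clean]

clean = [0.0, 0.5, -0.5, 0.25]            # toy waveform
noisy = add_noise(clean, noise_level=0.1)
print(len(noisy) == len(clean))            # same length, perturbed samples
```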
Suppose you have created an image classification model. You added varied examples: blurry images, multiclass examples, etc. You separated the data into training and validation sets, built a model on the training data, and checked its performance on the validation set. Now you use a set of new images captured with your mobile phone as the test set and apply the model. These are the performance numbers:
- Training error: 10%
- Validation error: 9%
- Testing error: 1%
Is this case possible? How can you justify a low testing error?
Transfer learning
Transfer learning refers to taking knowledge gained from one problem and using it to solve another. It is used very often with deep learning techniques. We build a model on one problem's data, remove the last layer, and reuse the same network to solve other problems. If our new problem has too few data points to build a model from scratch, we can use transfer learning to create a new model.
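A toy sketch of the idea, with hypothetical layer names and weights standing in for a real pretrained network:

```python
# Toy sketch of transfer learning: reuse the pretrained layers'
# weights, and replace only the last (task-specific) layer.
pretrained = {
    "layer1": [0.2, -0.1, 0.4],   # hypothetical learned feature weights
    "layer2": [0.7, 0.3],
    "output": [0.9],              # old task's head, to be discarded
}

# Copy every layer except the old output head...
new_model = {name: w[:] for name, w in pretrained.items() if name != "output"}
# ...and attach a freshly initialized head for the new task.
new_model["output"] = [0.0, 0.0]  # e.g., two classes in the new problem

print(sorted(new_model))  # layer1 and layer2 reused, output replaced
```

In a real deep learning framework, the same pattern applies: load the pretrained network, swap its final layer, and train only the new head (or fine-tune the whole network) on the new, smaller dataset.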
Quiz: ML Model performance
Mark all the techniques that can, in general, improve machine learning model performance. (Multi-select)
Adding more data
Using simple architecture
Removing the validation and test sets and using the training data for testing.
Adding data points from different distributions.