Validation
Get introduced to the importance of data splits and the process of cross-validation.
Since regularization fine-tunes a model by introducing an additional penalty in the error function, we need to validate its impact. Several hyperparameters must be set before optimizing the objective function: the model, the loss function, the regularization function, and the scale of regularization. Validation is the process of testing the accuracy of the trained model, which also measures how well these hyperparameters were chosen.
Note: An accurate indicator of generalization is the performance of the trained model on unseen data. This is the data that isn’t used in the training process.
Data splits
Where do we get the unseen data for validation? One way is to hold out a percentage of the available data and use the rest for training. Once the training is complete, validation can be carried out on the held-out subset, known as the hold-out set.
Note: A more popular term for the hold-out set is test set.
How large should the test set be? To assess generalization, we need the test set to be large. But we also need the training set to be large to avoid overfitting. There’s no exact workaround to this trade-off. A rule of thumb, however, is to use an 80/20 split: roughly 80% of the data for training and 20% for testing.
To improve the performance after validation, the hyperparameters can be tuned. The validation and tuning cycle continues until the desired performance is achieved.
Validation set
If, after validating on the test set, the hyperparameters are tuned and the training is carried out again, the test set is no longer unseen: it has influenced the training process, even though it wasn’t used directly as the training set.
If the test set must remain unseen, then how can the hyperparameters be tuned?
A compromise in this situation is to make another split of the data called the validation set and use it for tuning the hyperparameters. This way, the test set can be kept unseen, and the performance of the final model can be reported on it.
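For example, the scale of regularization can be tuned on the validation set while the test set stays untouched until the final report. Below is a minimal sketch of this cycle, assuming closed-form ridge regression on toy data (the fit_ridge and mse helpers, the candidate lambda values, and the data shapes are illustrative assumptions):

```python
import numpy as np

np.random.seed(0)

# Toy train / validation / test splits (the shapes are assumptions).
X_train, y_train = np.random.randn(60, 5), np.random.randn(60, 1)
X_val, y_val = np.random.randn(20, 5), np.random.randn(20, 1)
X_test, y_test = np.random.randn(20, 5), np.random.randn(20, 1)

def fit_ridge(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    # Mean squared error of the linear model w on (X, y).
    return float(np.mean((X @ w - y) ** 2))

# Tune the scale of regularization on the validation set only.
best_lam, best_w = None, None
for lam in (0.01, 0.1, 1.0, 10.0):
    w = fit_ridge(X_train, y_train, lam)
    if best_w is None or mse(X_val, y_val, w) < mse(X_val, y_val, best_w):
        best_lam, best_w = lam, w

# Only the final, tuned model is scored on the unseen test set.
print("best lambda:", best_lam, "test MSE:", mse(X_test, y_test, best_w))
```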
Implementing data splits
It’s handy to understand how a dataset can be randomly split into three parts. In the code below, the variable X contains the feature vectors as rows, and the variable y contains the targets as rows (assuming multi-target modeling).
Using numpy
There are several ways of doing this data split; one way is to randomly permute the indices of the data points and then pick the desired percentages. This is easy to understand and implement. A handy way to get a validation split is to first split the dataset into two parts, train and test, and then split the test set again to obtain a validation set. The code below implements this idea:
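Here is a minimal numpy sketch of this approach (the split_data name, its default test_size of 0.2, the dummy data, and the 60/20/20 percentages are illustrative assumptions):

```python
import numpy as np

# Permute the row indices, then cut them into train and test portions.
def split_data(X, y, test_size=0.2):
    indices = np.random.permutation(len(X))
    cut = int(len(X) * (1 - test_size))
    return X[indices[:cut]], X[indices[cut:]], y[indices[:cut]], y[indices[cut:]]


# Dummy data: 100 examples with 5 features and 2 targets each.
X = np.random.randn(100, 5)
y = np.random.randn(100, 2)

# Split off a combined test chunk first, then split it in half to
# carve out the validation set: 60% train, 20% validation, 20% test.
X_train, X_test, y_train, y_test = split_data(X, y, test_size=0.4)
X_val, X_test, y_val, y_test = split_data(X_test, y_test, test_size=0.5)
```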
Here is the explanation for the code above:
- Lines 4–7: We define the split_data function that splits the input arrays X and y into training and testing sets, where the testing set size is determined by the test_size argument (default ...)
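For comparison, when scikit-learn is available, the same three-way split can be made by applying its train_test_split helper twice (a sketch; the percentages and random_state are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same dummy data layout as above: 100 rows, 5 features, 2 targets.
X = np.random.randn(100, 5)
y = np.random.randn(100, 2)

# 60/40 split first, then halve the 40% chunk into validation and test.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=0)
```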