Search⌘ K

Data Preprocessing: Training and Testing

Explore the process of separating input features and labels, then dividing the data into training and testing sets using Python’s train_test_split. Understand how to validate machine learning models effectively through proper data preprocessing and saving prepared datasets for future use.

Training and testing

We’ve talked about the goal of building an algorithm that performs well on data it already knows and predicts the labels of yet unknown data. This is what makes it essential to separate the data into a training and a testing set. We use the training set to build our algorithm, and we use the testing set to validate its performance.

Even though Kaggle provides a testing set, we ...