Load built-in dataset

In this lesson, we'll see how to load the built-in datasets and create data based on certain distributions.


A dataset is the starting point of every machine learning project, so knowing how to obtain one is an essential first step.

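The scikit-learn library ships with several small built-in datasets, some of them well known and widely used. For example, it includes the iris and digits datasets for classification (digits is a small set of 8x8 handwritten-digit images, not the full MNIST dataset) and the diabetes dataset for regression. The Boston house prices dataset that appears in older tutorials was removed in scikit-learn 1.2. In addition to these predefined datasets, scikit-learn provides functions that generate synthetic data following specified distributions.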

scikit-learn also provides functions that download larger real-world datasets from the internet, such as the 20 newsgroups, LFW (Labeled Faces in the Wild), and KDD Cup 99 datasets.
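As a minimal sketch of how such a download might look, the example below wraps fetch_20newsgroups in a helper. The helper name load_newsgroups_sample and the two chosen categories are illustrative assumptions, not part of the lesson. The download only happens when the helper is called, and it requires an internet connection the first time.

```python
import sklearn.datasets as datasets


def load_newsgroups_sample(categories=("sci.space", "rec.autos")):
    """Hypothetical helper: fetch a small slice of the 20 newsgroups data."""
    # fetch_* functions download the data on first use and cache it locally,
    # so later calls read from disk instead of the network.
    bunch = datasets.fetch_20newsgroups(
        subset="train",
        categories=list(categories),
        remove=("headers", "footers", "quotes"),
    )
    # The returned Bunch holds the raw texts, integer labels, and label names.
    return bunch.data, bunch.target, bunch.target_names
```

Calling texts, labels, names = load_newsgroups_sample() would return the raw posts together with their integer labels. The other fetchers, such as fetch_lfw_people and fetch_kddcup99, follow a similar pattern.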

All of these loaders and generators live in the sklearn.datasets module. Import it at the beginning of your Python file as shown below.

import sklearn.datasets as datasets
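With the module imported, a built-in dataset can be loaded and a synthetic one generated in a few lines. The following is a small sketch rather than a complete recipe. The choice of load_iris and make_blobs, and the parameter values passed to make_blobs, are assumptions made for illustration.

```python
import sklearn.datasets as datasets

# Built-in loader: returns a Bunch holding the data, targets, and metadata.
iris = datasets.load_iris()
X_iris, y_iris = iris.data, iris.target
print(X_iris.shape)        # (150, 4)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']

# Generator: draw 200 two-dimensional points from 3 Gaussian blobs.
# random_state fixes the seed so the result is reproducible.
X_blobs, y_blobs = datasets.make_blobs(
    n_samples=200, n_features=2, centers=3, random_state=0
)
print(X_blobs.shape)       # (200, 2)
```

The loader returns real measurements with fixed shapes, while the generator lets you control the sample count, dimensionality, and number of clusters.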
