
Generate synthetic dataset

Explore how to generate synthetic datasets in Scikit-Learn for both classification and regression tasks. Understand key parameters for creating controlled data distributions, visualize sample data, and prepare artificial datasets to enhance your machine learning experiments.

In the last lesson, we showed how to load the built-in datasets.

In addition to those built-in datasets, scikit-learn also provides functions that generate synthetic data following a specified distribution.

Generate classification dataset

As mentioned above, scikit-learn provides functions for building artificial datasets. One of them, make_classification, generates a random n-class classification dataset.

Note: The default number of classes is 2; you can change it with the n_classes parameter.
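To make the n_classes parameter concrete, here is a minimal sketch rather than the lesson's own listing. It calls make_classification once with its defaults and once with three classes. The specific values chosen here (n_samples=300, n_features=4, n_informative=3, random_state=42) are assumptions picked for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification

# Default call: 100 samples, 20 features, and 2 classes.
X, y = make_classification(random_state=42)
print(X.shape, y.shape)  # (100, 20) (100,)
print(np.unique(y))      # the distinct class labels

# Request three classes instead of the default two.
# The values below are illustrative assumptions, not taken from the lesson.
X3, y3 = make_classification(
    n_samples=300,
    n_features=4,
    n_informative=3,         # features that actually carry class signal
    n_redundant=0,           # no linear combinations of informative features
    n_classes=3,
    n_clusters_per_class=1,
    random_state=42,         # fixed seed so the data is reproducible
)
print(X3.shape)
print(np.unique(y3))
```

Note that scikit-learn requires n_classes * n_clusters_per_class to be at most 2 ** n_informative, so raising n_classes may also mean raising n_informative or lowering n_clusters_per_class.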

In this example below, the data has two ...