How to create data for n-class problems using Scikit-learn
Overview
In Scikit-learn, the sklearn.datasets.make_classification() function generates a random, synthetic dataset for an n-class classification problem. Let's take a closer look at the syntax, parameters, and return values of the function.
Syntax
Here is the syntax of the function:
sklearn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None)
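As a minimal sketch of this signature in use, the call below relies on every default and only sets random_state, so it should produce 100 samples with 20 features. The variable names (X_a, y_a, X_b, y_b) are illustrative and not part of the library.

```python
import numpy as np
from sklearn.datasets import make_classification

# Rely on the defaults; only fix the seed so the output is repeatable.
X_a, y_a = make_classification(random_state=0)
X_b, y_b = make_classification(random_state=0)

# Shapes follow the default n_samples and n_features.
print(X_a.shape, y_a.shape)

# Two calls with the same integer seed return identical data.
print(np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b))
```

Leaving random_state as None would instead give different data on each call.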
Parameters
- n_samples: This is the number of samples. Its value type is int, and its default value is 100.
- n_features: This is the total number of features. Its value type is int, and its default value is 20.
- n_informative: This is the number of informative features. Its value type is int, and its default value is 2.
- n_redundant: This is the number of redundant features, which are generated as random linear combinations of the informative features. Its value type is int, and its default value is 2.
- n_repeated: This is the number of duplicated features, drawn randomly from the informative and redundant features. Its value type is int, and its default value is 0.
- n_classes: This is the number of classes (or labels) of the classification problem. Its value type is int, and its default value is 2.
- n_clusters_per_class: This is the number of clusters per class. Its value type is int, and its default value is 2.
- weights: This is the proportion of samples assigned to each class. Its value type is an array-like of shape (n_classes,) or (n_classes - 1,), and its default value is None, which gives balanced classes.
- flip_y: This is the fraction of samples whose class is assigned randomly. Its value type is float, and its default value is 0.01.
- class_sep: This is the factor that multiplies the size of the hypercube. Larger values spread the clusters out and make the classification task easier. Its value type is float, and its default value is 1.0.
- hypercube: This is a boolean value. If it's set to True, the clusters are placed on the vertices of a hypercube. If it's set to False, the clusters are placed on the vertices of a random polytope. Its default value is True.
- shift: This shifts the features by the specified value. Its value type is float, and its default value is 0.0.
- scale: This multiplies the features by the specified value. Its value type is float, and its default value is 1.0.
- shuffle: This shuffles the samples and the features. Its value type is bool, and its default value is True.
- random_state: This controls the random number generation used to create the dataset. It accepts an int, a RandomState instance, or None, and its default value is None. Pass an int to get reproducible output.
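The feature-count parameters interact: the informative, redundant, and repeated features together cannot exceed n_features, and any remaining columns are filled with random noise. The sketch below illustrates this with one valid and one invalid configuration. The names X_ok, y_ok, and budget_error are illustrative, and the exact wording of the error message may differ between library versions.

```python
from sklearn.datasets import make_classification

# Valid: 3 informative + 2 redundant + 1 repeated = 6 <= n_features (8).
# The remaining 2 features are filled with random noise.
X_ok, y_ok = make_classification(
    n_samples=50, n_features=8,
    n_informative=3, n_redundant=2, n_repeated=1,
    random_state=0,
)
print(X_ok.shape)

# Invalid: 4 informative + 2 redundant = 6 exceeds n_features (5),
# so the call is expected to raise a ValueError.
try:
    make_classification(n_features=5, n_informative=4, n_redundant=2)
    budget_error = None
except ValueError as exc:
    budget_error = str(exc)
print(budget_error)
```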
Return values
The function returns the following two values:
- X: This holds the generated samples as a NumPy array (ndarray) of shape (n_samples, n_features).
- y: This holds the integer class label of each sample as a NumPy array (ndarray) of shape (n_samples,).
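To make the two return values concrete, here is a small sketch that generates a three-class dataset and inspects both arrays. The parameter values are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# Three classes with one cluster each; all values here are illustrative.
X, y = make_classification(
    n_samples=60, n_features=6,
    n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1,
    random_state=0,
)

print(X.shape)       # one row per sample, one column per feature
print(y.shape)       # one label per sample
print(np.unique(y))  # integer labels in the range 0 .. n_classes - 1
```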
Example
In the code snippet below, we use the make_classification() function to generate 100 samples with 10 informative features and two classes weighted 0.3 and 0.7.
# import library
from sklearn.datasets import make_classification

# create features and target
features, target = make_classification(n_samples=100,
                                       n_features=10,
                                       n_informative=10,
                                       n_redundant=0,
                                       n_classes=2,
                                       weights=[0.3, 0.7],
                                       random_state=42)

# print features and target
print("Features:")
print(features[:5])
print("Targets:")
print(target[:5])
Explanation
- Line 2: We import the make_classification() function from the sklearn.datasets module.
- Line 5: We call the make_classification() function with parameters.
- Lines 14–17: We print the first five rows of the features and the first five targets.
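As a follow-up check, the sketch below repeats the example's configuration and counts the samples per class to see how closely the result follows weights=[0.3, 0.7]. The class_counts name is illustrative. Because flip_y defaults to 0.01, a few labels may be reassigned at random, so the observed proportions are approximate rather than exact.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same configuration as the example above.
features, target = make_classification(
    n_samples=100, n_features=10,
    n_informative=10, n_redundant=0,
    n_classes=2, weights=[0.3, 0.7],
    random_state=42,
)

# Count how many samples carry each class label (0 and 1).
class_counts = np.bincount(target, minlength=2)
print(class_counts)

# Convert the counts to observed class proportions.
print(class_counts / class_counts.sum())
```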
Copyright ©2026 Educative, Inc. All rights reserved