Trusted Answers to Developer Questions

Related Tags

sklearn
python

How to create data for n-class problems using Scikit-learn

Arslan Tariq

Overview

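In Scikit-learn, the sklearn.datasets.make_classification() function generates a random synthetic dataset for an n-class classification problem. It first creates clusters of normally distributed points around the vertices of a hypercube, assigns an equal number of clusters to each class, and then adds interdependent and noisy features. Let's take a closer look at the syntax, parameters, and return values of the function.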
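As a quick illustration of the n-class case, here is a minimal sketch that asks for three classes instead of the default two. The values n_samples=300, n_features=6, n_informative=4, n_redundant=1, and random_state=0 are arbitrary choices for illustration; other values work as long as the feature counts fit inside n_features and n_classes * n_clusters_per_class does not exceed 2**n_informative.

```python
import numpy as np
from sklearn.datasets import make_classification

# Generate a 3-class dataset; the parameter values are illustrative only
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_redundant=1,
    n_classes=3,
    random_state=0,
)

print(X.shape)       # (300, 6)
print(np.unique(y))  # [0 1 2]
```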

Syntax

Here is the syntax of the function:

sklearn.datasets.make_classification(n_samples=100,
                                     n_features=20,
                                     n_informative=2,
                                     n_redundant=2,
                                     n_repeated=0,
                                     n_classes=2,
                                     n_clusters_per_class=2,
                                     weights=None,
                                     flip_y=0.01,
                                     class_sep=1.0,
                                     hypercube=True,
                                     shift=0.0,
                                     scale=1.0,
                                     shuffle=True,
                                     random_state=None)
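Because every parameter in the signature has a default, the function can be called with no arguments at all. The sketch below does exactly that, fixing only random_state (the value 0 is an arbitrary choice) so that two calls can be compared, and checks the shapes implied by the defaults n_samples=100 and n_features=20.

```python
from sklearn.datasets import make_classification

# Rely on the defaults, fixing only the seed so repeated runs match
X, y = make_classification(random_state=0)

print(X.shape)  # (100, 20): n_samples x n_features
print(y.shape)  # (100,)

# The same integer seed reproduces the same data
X2, y2 = make_classification(random_state=0)
print((X == X2).all() and (y == y2).all())  # True
```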

Parameters

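  • n_samples: This is the number of samples. Its value type is int, and its default value is 100.
  • n_features: This is the total number of features. It comprises n_informative informative features, n_redundant redundant features, n_repeated duplicated features, and n_features - n_informative - n_redundant - n_repeated useless features drawn at random. Its value type is int, and its default value is 20.
  • n_informative: This is the number of informative features. Each class is composed of a number of Gaussian clusters, each located around the vertices of a hypercube in a subspace of dimension n_informative. Its value type is int, and its default value is 2.
  • n_redundant: This is the number of redundant features. These features are generated as random linear combinations of the informative features. Its value type is int, and its default value is 2.
  • n_repeated: This is the number of duplicated features, drawn randomly from the informative and redundant features. Its value type is int, and its default value is 0.
  • n_classes: This is the number of classes (or labels) for the classification problem. Its value type is int, and its default value is 2.
  • n_clusters_per_class: This is the number of clusters per class. Its value type is int, and its default value is 2. The product n_classes * n_clusters_per_class must not exceed 2**n_informative.
  • weights: This is the proportion of samples assigned to each class. If it's None, the classes are balanced. If its length is n_classes - 1, the weight of the last class is inferred automatically. The actual class proportions will not exactly match weights when flip_y isn't 0. Its value type is array-like of shape (n_classes,) or (n_classes - 1,), and its default value is None.
  • flip_y: This is the fraction of samples whose class is assigned randomly. Larger values introduce more label noise and make the classification task harder. Its value type is float, and its default value is 0.01.
  • class_sep: This is the factor that multiplies the size of the hypercube. Larger values spread out the clusters and classes, making the classification task easier. Its value type is float, and its default value is 1.0.
  • hypercube: This is a boolean value. If it's set to True, the clusters are placed on the vertices of a hypercube. If it's set to False, the clusters are placed on the vertices of a random polytope. Its default value is True.
  • shift: This shifts the features by the specified value. If it's None, the features are shifted by a random value drawn in [-class_sep, class_sep]. Its value type is float, an array of shape (n_features,), or None, and its default value is 0.0.
  • scale: This multiplies the features by the specified value; scaling happens after shifting. If it's None, the features are scaled by a random value drawn in [1, 100]. Its value type is float, an array of shape (n_features,), or None, and its default value is 1.0.
  • shuffle: This shuffles the samples and the features. Its value type is bool, and its default value is True.
  • random_state: This controls the random number generation used to create the dataset. Pass an int for reproducible output across multiple function calls. Its value type is int, a RandomState instance, or None, and its default value is None.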
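Two properties of these parameters can be checked directly. First, scikit-learn raises a ValueError when the settings are inconsistent, for example when n_classes * n_clusters_per_class exceeds 2**n_informative. Second, the scikit-learn docstring states that without shuffling, the columns of X are stacked in the order informative, redundant, repeated, then noise. The sketch below relies on that ordering (with the default shift=0.0 and scale=1.0) to confirm that the redundant columns are linear combinations of the informative ones and that the repeated column duplicates an earlier one. The specific counts (3 informative, 2 redundant, 1 repeated) are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification

# 1) Invalid combination: 3 classes * 2 clusters = 6 > 2**2 = 4
try:
    make_classification(n_informative=2, n_classes=3, n_clusters_per_class=2)
except ValueError as err:
    print("ValueError:", err)

# 2) With shuffle=False, columns are ordered:
#    informative | redundant | repeated | noise
n_inf, n_red, n_rep = 3, 2, 1
X, y = make_classification(
    n_samples=200,
    n_features=8,
    n_informative=n_inf,
    n_redundant=n_red,
    n_repeated=n_rep,
    shuffle=False,
    random_state=0,
)

informative = X[:, :n_inf]
redundant = X[:, n_inf:n_inf + n_red]

# Least-squares fit of the redundant columns from the informative ones;
# an exact fit shows they are linear combinations.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
print(np.allclose(informative @ coef, redundant))  # True

# The repeated column is a copy of one of the first n_inf + n_red columns
repeated = X[:, n_inf + n_red]
print(any(np.array_equal(repeated, X[:, j]) for j in range(n_inf + n_red)))  # True
```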

Return values

The function returns the following two values:

  • X: These are the generated samples, returned as an ndarray of shape (n_samples, n_features).
  • y: These are the integer labels for the class membership of each sample, returned as an ndarray of shape (n_samples,).
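The following sketch inspects the two return values: X holds floating-point features and y holds integer class labels. The argument values are arbitrary. It avoids printing the exact integer dtype of y because that can differ between platforms.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50, n_features=4, random_state=1)

# X holds float features; y holds integer class labels
print(type(X).__name__, X.dtype, X.shape)  # ndarray float64 (50, 4)
print(type(y).__name__, y.shape)           # ndarray (50,)
print(np.issubdtype(y.dtype, np.integer))  # True
```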

Example

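In the code snippet below, we use the make_classification() function to generate a binary classification dataset of 100 samples with 10 features, all of them informative, where roughly 30% of the samples belong to class 0 and 70% to class 1.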

# import the dataset generator
from sklearn.datasets import make_classification
# create 100 samples with 10 informative features and a 30/70 class split
features, target = make_classification(n_samples=100,
                                       n_features=10,
                                       n_informative=10,
                                       n_redundant=0,
                                       n_classes=2,
                                       weights=[0.3, 0.7],
                                       random_state=42)
# print the first five feature rows and their labels
print("Features:")
print(features[:5])
print("Targets:")
print(target[:5])

Explanation

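  • Line 2: We import the make_classification() function from the sklearn.datasets module.
  • Lines 4–10: We call make_classification() to generate 100 samples with 10 informative features for two classes, weighted 0.3 and 0.7. Setting random_state=42 makes the output reproducible.
  • Lines 12–15: We print the first five rows of the features and the first five targets.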
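To see the effect of weights=[0.3, 0.7] in the example, you can count the labels. This sketch regenerates the same dataset and counts each class with np.bincount. The counts should come out close to 30 and 70, but because flip_y=0.01 randomly reassigns a small fraction of labels, the exact numbers are not guaranteed to match the weights.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same call as in the example above
features, target = make_classification(n_samples=100,
                                       n_features=10,
                                       n_informative=10,
                                       n_redundant=0,
                                       n_classes=2,
                                       weights=[0.3, 0.7],
                                       random_state=42)

counts = np.bincount(target)  # number of samples per class label
print(counts)                 # roughly [30 70]
print(counts / counts.sum())  # roughly [0.3 0.7]
```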

CONTRIBUTOR

Arslan Tariq
Copyright ©2022 Educative, Inc. All rights reserved