
Discretizing

Explore the process of discretizing continuous features with scikit-learn to create categorical bins. Learn to apply KBinsDiscretizer for uniform-width intervals and QuantileTransformer for equal-sized quantile bins. Understand how to choose the appropriate method based on data distribution to improve interpretation and computational efficiency.

Discretizing features refers to the process of converting continuous numerical features into categorical features by dividing the range of each feature into intervals, called bins. Discretization can transform continuous features into a form that is easier to visualize and interpret.
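As a minimal illustration of the idea (using NumPy's digitize rather than scikit-learn, with hypothetical age data), a continuous feature can be mapped to bin indices by choosing interval edges:

```python
import numpy as np

# A continuous feature: ages in years (example data)
ages = np.array([3, 17, 25, 42, 68, 91])

# Bin edges partition the range into intervals (bins):
# [<18], [18, 40), [40, 65), [>=65]
edges = [18, 40, 65]

# np.digitize returns the index of the bin each value falls into
bins = np.digitize(ages, edges)
print(bins)  # [0 0 1 2 3 3]
```

Each original value is replaced by the index of the interval it falls into, turning a continuous feature into a categorical one.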

Illustration of a feature being discretized into bins

In addition to potentially helping with interpretation, this technique can be used to reduce the memory and computational requirements of models, especially in resource-constrained environments, such as mobile devices or embedded systems.

The scikit-learn methods for discretizing features include KBinsDiscretizer and QuantileTransformer.

The KBinsDiscretizer method

The KBinsDiscretizer method discretizes continuous features into a specified number of bins. The following code demonstrates how to use the KBinsDiscretizer class:

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Define the numerical variables
X = np.array(
    [[1.9, 2.8, 6],
     [4.7, 5.6, 8],
     [0.1, 2.8, 12],
     [0.4, 8.2, 99]]
)

# Create the KBinsDiscretizer object with uniform-width bins
# (the default strategy is 'quantile', which produces
# equal-frequency rather than equal-width bins)
discretizer = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')

# Transform the numerical variables
X_discretized = discretizer.fit_transform(X)

# Print the original variables and the resulting discretized variables
print("Original:\n", X)
print("Discretized:\n", X_discretized.round(2))
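For comparison, QuantileTransformer maps each feature onto its quantile ranks, producing an approximately uniform distribution on [0, 1]; cutting those ranks into intervals yields equal-frequency bins. The following is a sketch, not a definitive recipe; the n_bins value and the flooring step are illustrative choices:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# The same numerical variables as above
X = np.array(
    [[1.9, 2.8, 6],
     [4.7, 5.6, 8],
     [0.1, 2.8, 12],
     [0.4, 8.2, 99]]
)

# n_quantiles must not exceed the number of samples (4 here)
qt = QuantileTransformer(n_quantiles=4)
X_quantiled = qt.fit_transform(X)

# Each value is now its quantile rank in [0, 1]; flooring the
# scaled ranks assigns roughly equal numbers of samples per bin
n_bins = 2
X_binned = np.floor(X_quantiled * n_bins).clip(max=n_bins - 1)
print("Quantile ranks:\n", X_quantiled.round(2))
print("Equal-frequency bins:\n", X_binned)
```

Choose KBinsDiscretizer with uniform bins when the feature is roughly evenly spread; prefer quantile-based binning when the distribution is skewed (as in the third column above, where the value 99 would otherwise dominate a uniform-width split).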