What is KNNImputer in scikit-learn?
The KNNImputer class lives in the sklearn.impute module of scikit-learn, a Python library widely used for machine learning.
The KNNImputer is used to fill in missing values in a dataset using the k-Nearest Neighbors method.
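The k-Nearest Neighbors algorithm is used for classification and regression problems.
For each missing entry, KNNImputer considers only the samples that have a value for that feature and picks the n_neighbors of them that are closest to the incomplete sample. By default, distances are measured with the nan_euclidean metric, which ignores coordinates that are missing in either sample. The missing entry is then replaced with the mean of those neighbors' values, weighted equally or by distance depending on the weights parameter.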
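To make the procedure concrete, here is a simplified NumPy sketch of how a single missing cell could be filled. It is not scikit-learn's implementation: it assumes uniform weights, and the helper names `nan_euclidean` and `impute_cell` are hypothetical. The distance follows the nan_euclidean idea of comparing only the coordinates present in both rows and scaling the squared distance by (total coordinates / present coordinates).

```python
import numpy as np


# Hypothetical helper (not part of scikit-learn)
def nan_euclidean(a, b):
    """Euclidean distance over coordinates present in both rows,
    scaled by (total coordinates / present coordinates)."""
    present = ~np.isnan(a) & ~np.isnan(b)
    if not present.any():
        return np.nan  # no shared coordinates: distance is undefined
    sq = np.sum((a[present] - b[present]) ** 2)
    return np.sqrt(sq * a.size / present.sum())


# Hypothetical helper (not part of scikit-learn)
def impute_cell(X, row, col, k):
    """Fill X[row, col] with the uniform mean of column `col` over the
    k nearest rows that actually have a value in that column."""
    donors = [i for i in range(len(X)) if i != row and not np.isnan(X[i, col])]
    dists = np.array([nan_euclidean(X[row], X[i]) for i in donors])
    nearest = np.argsort(dists)[:k]  # indices into `donors`, closest first
    return X[[donors[j] for j in nearest], col].mean()


X = np.array([[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]])
print(impute_cell(X, row=0, col=2, k=2))  # 14.0
print(impute_cell(X, row=2, col=0, k=2))  # 2.5
```

On the data used in the example later in this article, the sketch gives 14.0 and 2.5 for the two missing cells, which should match what KNNImputer(n_neighbors=2) produces there.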
The illustration below shows how KNNImputer works in scikit-learn:
Definition
The KNNImputer class is defined as follows:
class sklearn.impute.KNNImputer(*, missing_values=nan, n_neighbors=5, weights='uniform', metric='nan_euclidean', copy=True, add_indicator=False)
Parameters
The KNNImputer class takes in the following parameters:
| Parameter | Purpose |
|---|---|
| missing_values | Placeholder for missing values; all of its occurrences are imputed. Accepts int, float, str, np.nan or None. Default: np.nan |
| n_neighbors | Number of neighboring samples used for imputation. Default: 5 |
| weights | Weight function used for prediction: "uniform" (all neighbors weighted equally), "distance" (neighbors weighted by the inverse of their distance), or a callable. Default: "uniform" |
| metric | Distance metric used to search for neighbors: "nan_euclidean" or a callable. Default: "nan_euclidean" |
| copy | Bool. If True, a copy of the data is created; if False, imputation is done in place whenever possible. Default: True |
| add_indicator | Bool. If True, a MissingIndicator transform is stacked onto the output of the imputer's transform. Default: False |
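The sketch below combines several of these parameters on the same small dataset as the example later in this article, except that -1 (an arbitrary choice for illustration) marks the missing entries instead of np.nan. With weights="distance", neighbors are weighted by the inverse of their nan_euclidean distance, and add_indicator=True appends one 0/1 column for each feature that had missing values during fit.

```python
import numpy as np
from sklearn.impute import KNNImputer

# -1 marks missing entries here instead of np.nan (illustrative choice)
X = [[1, 2, -1], [3, 6, 12], [-1, 12, 24], [2, 4, 16]]

imputer = KNNImputer(
    missing_values=-1,   # treat -1 as missing
    n_neighbors=2,       # use the 2 nearest donors
    weights="distance",  # closer donors count more (weight = 1 / distance)
    add_indicator=True,  # append a 0/1 column per feature that had missing values
)
out = imputer.fit_transform(X)
print(np.round(out, 3))
```

Here the row [2, 4, 16] is twice as close to [1, 2, -1] as [3, 6, 12] is, so the first row's gap is filled with (2*16 + 12)/3, about 14.667, instead of the plain average 14. The two extra columns flag the missing entries in features 0 and 2.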
Methods
The KNNImputer class has several methods:
| Method | Purpose |
|---|---|
| fit(X) | Fit the imputer on X. |
| fit_transform(X) | Fit to data, then transform it. |
| get_params() | Get parameters for this estimator. |
| set_params(**params) | Set the parameters of this estimator. |
| transform(X) | Impute all missing values in X. |
For simple imputation, calling the fit_transform method alone is sufficient.
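The fit and transform methods can also be called separately, which is useful when new data should be imputed using neighbors from the training data only. The sketch below reuses the example data as a training set; X_new is a made-up sample for illustration. It also shows get_params() and set_params(), refitting after the parameter change.

```python
import numpy as np
from sklearn.impute import KNNImputer

X_train = [[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]]
X_new = [[np.nan, 5, 13]]  # hypothetical new sample missing its first feature

# fit() stores the training rows; transform() imputes new data against them
imputer = KNNImputer(n_neighbors=2).fit(X_train)
before = imputer.transform(X_new)
print(before)

# get_params() reads hyperparameters; set_params() changes them and returns
# the estimator, so it can be refitted in the same expression
print(imputer.get_params()["n_neighbors"])
after = imputer.set_params(n_neighbors=3).fit(X_train).transform(X_new)
print(after)
```

With n_neighbors=2, the missing value in X_new is filled from its two nearest training rows that have a first feature, [3, 6, 12] and [2, 4, 16], giving 2.5; with n_neighbors=3, all three such rows are averaged, giving 2.0.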
Example
The following example shows how we can use the KNNImputer in scikit-learn:
```python
import numpy as np  # NumPy provides np.nan to mark missing entries
from sklearn.impute import KNNImputer

# Creating an array with missing values
X = [[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]]
print("Original array: ", X)

imputer = KNNImputer(n_neighbors=2)  # Creating a KNNImputer
array = imputer.fit_transform(X)  # Imputing data
print("Updated array: ", array)
```