Trusted answers to developer questions

Related Tags

python
knnimputer

What is KNNImputer in scikit-learn?

Hassaan Waqar

KNNImputer is a class in the sklearn.impute module of scikit-learn, a Python library for machine learning.

KNNImputer fills in missing values in a dataset using the k-Nearest Neighbors method.

The k-Nearest Neighbors algorithm is commonly used for classification and regression problems.

For each sample with a missing value, KNNImputer finds the n_neighbors closest samples that have a value for that feature, measuring closeness only on the features that both samples have present. By default, the missing value is then replaced with the mean of those neighbors' values.
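To make the neighbor search concrete, here is a minimal NumPy-only sketch of the idea. It is a simplified illustration, not scikit-learn's actual implementation: the helper names nan_euclidean and impute_cell are hypothetical, and the sketch assumes uniform weights and the distance rule scikit-learn documents for nan_euclidean (squared differences over the coordinates present in both rows, rescaled by total coordinates divided by present coordinates).

```python
import numpy as np

def nan_euclidean(a, b):
    """Distance over coordinates present in both rows, rescaled by
    (total coordinates / present coordinates). Hypothetical helper."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    present = ~np.isnan(a) & ~np.isnan(b)
    if not present.any():
        return np.nan  # no shared coordinates, distance is undefined
    scale = a.size / present.sum()
    return float(np.sqrt(scale * np.sum((a[present] - b[present]) ** 2)))

def impute_cell(X, row, col, k=2):
    """Hypothetical helper: fill X[row, col] with the mean of that column
    over the k nearest rows that have a value in that column."""
    X = np.asarray(X, dtype=float)
    candidates = []
    for i in range(len(X)):
        if i == row or np.isnan(X[i, col]):
            continue  # a donor must have the feature we want to fill
        dist = nan_euclidean(X[row], X[i])
        if not np.isnan(dist):
            candidates.append((dist, i))
    candidates.sort()  # nearest donors first
    nearest = [i for _, i in candidates[:k]]
    return float(np.mean(X[nearest, col]))

X = [[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]]
filled_row0 = impute_cell(X, row=0, col=2)  # mean of 16 and 12
filled_row2 = impute_cell(X, row=2, col=0)  # mean of 2 and 3
print(filled_row0, filled_row2)
```

For the first row, the two nearest donors are the rows ending in 16 and 12, so the missing value becomes their mean.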

The illustration below shows how KNNImputer works in scikit-learn:

How does a KNNImputer work?

Definition

The KNNImputer class is defined as follows:

class sklearn.impute.KNNImputer(*, missing_values=nan, n_neighbors=5, weights='uniform', metric='nan_euclidean', copy=True, add_indicator=False)

Parameters

The KNNImputer class takes in the following parameters:

| Parameter | Purpose |
| --- | --- |
| missing_values | Placeholder for missing values; all occurrences of missing_values are imputed. Accepts int, float, str, np.nan, or None. Default: np.nan. |
| n_neighbors | Number of neighboring samples used for imputation. Default: 5. |
| weights | Weight function applied to the neighbors. Accepts 'uniform', 'distance', or a callable. Default: 'uniform'. |
| metric | Distance metric used to search for neighbors. Accepts 'nan_euclidean' or a callable. Default: 'nan_euclidean'. |
| copy | Takes a bool. If True, a copy of the data is created. If False, imputation is done in place whenever possible. Default: True. |
| add_indicator | Takes a bool. If True, a MissingIndicator transform is stacked onto the output of the imputer's transform. Default: False. |
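As a rough illustration of two of these parameters, the sketch below runs the imputer once with weights="distance" and once with add_indicator=True. The array and the variable names weighted and flagged are made up for this example.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Small array with one missing value in column 0 and one in column 2
X = np.array([[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]])

# weights="distance": closer neighbors contribute more to the imputed value
weighted = KNNImputer(n_neighbors=2, weights="distance").fit_transform(X)

# add_indicator=True: appends one 0/1 column for each feature that had
# missing values during fit (columns 0 and 2 here), after the imputed data
flagged = KNNImputer(n_neighbors=2, add_indicator=True).fit_transform(X)

print(weighted)
print(flagged.shape)
```

With distance weighting, the value imputed for the first row is pulled toward its nearer neighbor instead of being a plain average of the two.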

Methods

The KNNImputer class has several methods:

| Method | Purpose |
| --- | --- |
| fit(X) | Fit the imputer on X. |
| fit_transform(X) | Fit to data, then transform it. |
| get_params() | Get parameters for this estimator. |
| set_params(**params) | Set parameters for this estimator. |
| transform(X) | Impute all missing values of X. |

For simple imputation, calling fit_transform alone is enough: it fits the imputer and returns the imputed data in one step.
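When new data arrives after fitting, fit and transform can also be called separately. The sketch below assumes a tiny training array and a hypothetical unseen sample X_new; it also shows get_params and set_params.

```python
import numpy as np
from sklearn.impute import KNNImputer

X_train = np.array([[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]])
X_new = np.array([[np.nan, 6, 12]])  # hypothetical unseen sample

imputer = KNNImputer(n_neighbors=2)
imputer.fit(X_train)                     # store the training samples as donors
X_new_filled = imputer.transform(X_new)  # neighbors come from X_train
print(X_new_filled)

params = imputer.get_params()            # dict of constructor parameters
print(params["n_neighbors"])

imputer.set_params(n_neighbors=3)        # change a parameter; refit before reuse
```

Keeping fit and transform separate means the neighbors used for new samples always come from the data the imputer was fitted on.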

Example

The following example shows how we can use the KNNImputer in scikit-learn:

import numpy as np # Importing numpy to create an array
from sklearn.impute import KNNImputer
# Creating array with missing values
X = [[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]]
print("Original array: ", X)
imputer = KNNImputer(n_neighbors=2) # Creating a KNNImputer
array = imputer.fit_transform(X) # Imputing data
print("Updated array: ", array)

CONTRIBUTOR

Hassaan Waqar
Copyright ©2022 Educative, Inc. All rights reserved
