Hassaan Waqar

The **KNNImputer** class belongs to the `sklearn.impute` module of `scikit-learn`, a Python library that is widely used for machine learning.

The `KNNImputer` is used to fill in missing values in a dataset using the k-Nearest Neighbors method.

The k-Nearest Neighbors algorithm is used for classification and regression problems.

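The `KNNImputer` estimates each missing value from the rows (samples) that are most similar to the row containing the gap. Similarity is measured only on the features that both rows have observed. The gap is then filled with the mean of that feature over the k nearest rows that have a value for it.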
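The procedure described above can be sketched by hand for a single missing cell. This is a minimal illustration, not the library's internal implementation: the helper `impute_one` and the sample data are hypothetical, and the sketch assumes uniform weights and that every pair of rows shares at least one observed feature. It uses `nan_euclidean_distances` from `sklearn.metrics.pairwise` and compares the result with `KNNImputer`.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.metrics.pairwise import nan_euclidean_distances

# Hypothetical sample data: row 0 is missing its last feature.
X = np.array([
    [1.0, 2.0, np.nan],
    [2.0, 3.0, 10.0],
    [8.0, 9.0, 50.0],
    [1.5, 2.5, 20.0],
])

def impute_one(X, row, col, k=2):
    """Fill X[row, col] with the mean of column `col` over the k nearest donor rows."""
    # Distances from `row` to every row, ignoring coordinates missing in either row.
    dists = nan_euclidean_distances(X)[row]
    # Donors are the other rows that actually have a value in `col`.
    donors = [i for i in range(len(X)) if i != row and not np.isnan(X[i, col])]
    nearest = sorted(donors, key=lambda i: dists[i])[:k]
    return X[nearest, col].mean()

manual = impute_one(X, row=0, col=2)
reference = KNNImputer(n_neighbors=2).fit_transform(X)[0, 2]
print(manual, reference)
```

Rows 3 and 1 are the closest donors to row 0 here, so both values should equal the mean of 20 and 10.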

The illustration below shows how `KNNImputer` works in `scikit-learn`:

The `KNNImputer` class is defined as follows:

```
class sklearn.impute.KNNImputer(*, missing_values=nan, n_neighbors=5, weights='uniform', metric='nan_euclidean', copy=True, add_indicator=False)
```

The `KNNImputer` class takes in the following parameters:

| Parameter | Purpose |
|---|---|
| `missing_values` | All instances of `missing_values` will be imputed. Values include `int`, `float`, `str`, `np.nan`, or `None`. By default: `np.nan`. |
| `n_neighbors` | Number of neighbors used for imputation. By default: 5. |
| `weights` | Weight function used when averaging the neighbors' values. Values include `uniform`, `distance`, or a callable. By default: `uniform`. |
| `metric` | Distance metric used to search for neighbors in the k-nearest neighbors algorithm. Values include `nan_euclidean` or a callable. By default: `nan_euclidean`. |
| `copy` | Takes in a `bool` value. If True, a copy of the data will be created. If False, imputation will be done in-place. By default: True. |
| `add_indicator` | Takes in a `bool` value. If True, a `MissingIndicator` transform will be stacked onto the output of the imputer's transform. By default: False. |
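As a rough illustration of how some of these parameters change the result, the sketch below imputes the same small array three ways: with the defaults, with `weights="distance"`, and with `add_indicator=True`. The array and variable names are chosen for illustration only.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Illustrative data: columns 0 and 2 each contain one missing value.
X = np.array([[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]])

# Default: every selected neighbor contributes equally to the mean.
uniform = KNNImputer(n_neighbors=2).fit_transform(X)

# Closer neighbors get a larger weight (inverse of their distance).
weighted = KNNImputer(n_neighbors=2, weights="distance").fit_transform(X)

# Appends one 0/1 indicator column per feature that had missing values during fit.
flagged = KNNImputer(n_neighbors=2, add_indicator=True).fit_transform(X)

print("uniform :", uniform[0, 2])
print("weighted:", weighted[0, 2])
print("shape with indicator columns:", flagged.shape)
```

With `weights="distance"`, the imputed value is pulled toward the nearer neighbor, so it can differ from the plain mean. With `add_indicator=True`, the output gains extra columns marking where values were originally missing, which a downstream model can use as features.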

The `KNNImputer` class has several methods:

| Method | Purpose |
|---|---|
| `fit(X)` | Fit the imputer on X. |
| `fit_transform(X)` | Fit to data, then transform it. |
| `get_params()` | Get parameters for this estimator. |
| `set_params(**params)` | Set parameters for this estimator. |
| `transform(X)` | Impute all missing values of X. |
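The methods in the table can be combined when the data used to find neighbors differs from the data being imputed. The following is a minimal sketch with made-up arrays (`X_train` and `X_new` are illustrative names): the imputer is fitted once, applied to a new row, then reconfigured with `set_params` and refitted.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Illustrative, fully observed training data and one new row with a gap.
X_train = np.array([[1, 2, 4], [3, 7, 12], [2, 4, 8], [5, 10, 20]], dtype=float)
X_new = np.array([[2, 4, np.nan]])

imputer = KNNImputer(n_neighbors=2)
imputer.fit(X_train)                  # store the training rows as potential neighbors
filled = imputer.transform(X_new)     # impute the new row from the stored rows
print(filled)

print(imputer.get_params()["n_neighbors"])  # inspect the current setting

# set_params returns the estimator itself, so it can be chained with fit.
imputer.set_params(n_neighbors=1).fit(X_train)
filled_k1 = imputer.transform(X_new)
print(filled_k1)
```

With two neighbors, the gap is filled from the two closest training rows; with one neighbor, it simply copies the value from the single closest row.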

For simple imputation, calling the `fit_transform` method alone is enough.

The following example shows how we can use the `KNNImputer` in scikit-learn:

```python
import numpy as np  # Importing numpy to create an array
from sklearn.impute import KNNImputer

# Creating array with missing values
X = [[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]]
print("Original array: ", X)

imputer = KNNImputer(n_neighbors=2)  # Creating a KNNImputer
array = imputer.fit_transform(X)  # Imputing data
print("Updated array: ", array)
```
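To make the outcome of the example above concrete, the sketch below reruns it and compares the result with values worked out by hand, assuming the default `nan_euclidean` metric and uniform weights. The `expected` array is that hand calculation, not output copied from the original article.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = [[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]]
result = KNNImputer(n_neighbors=2).fit_transform(X)

# Row 0: its two nearest donors are rows 3 and 1, so the gap becomes (16 + 12) / 2 = 14.
# Row 2: its two nearest donors are rows 3 and 1, so the gap becomes (2 + 3) / 2 = 2.5.
expected = np.array([[1, 2, 14], [3, 6, 12], [2.5, 12, 24], [2, 4, 16]])

matches = np.allclose(result, expected)
print(result)
print("Matches hand calculation:", matches)
```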

Copyright ©2022 Educative, Inc. All rights reserved
