What is the k-nearest neighbor algorithm?

The k-nearest neighbors (KNN) algorithm is a supervised machine learning algorithm.

KNN assumes that similar things exist in close proximity. In data science terms, similar data points lie close to each other. KNN therefore measures similarity by the distance between points in feature space: the smaller the distance, the more similar the points.
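
To make "distance" concrete, here is a minimal sketch of two common distance measures, Euclidean and Manhattan. The function names and the sample points are illustrative assumptions, not part of any library.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences along each coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Two hypothetical points in a 2-D feature space.
p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

The two measures can rank neighbors differently, so the choice of metric is part of the model.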

How it works

  • The algorithm calculates the distance from the new data point to every training data point. Any distance metric can be used, e.g., Euclidean or Manhattan.
  • The algorithm then sorts the calculated distances in ascending order.
  • Next, the algorithm selects the k nearest data points, i.e., the first k points in the sorted list, where k is a positive integer chosen by the user. The selection is based purely on proximity, regardless of what the numerical feature values represent.
  • Finally, the algorithm assigns the new data point to the class that is most common among those k neighbors.
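
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming Euclidean distance (via the standard library's `math.dist`, Python 3.8+) and a simple majority vote; the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    # Step 1: distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: sort the distances in ascending order.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: keep the labels of the k nearest points.
    nearest_labels = [label for _, label in distances[:k]]
    # Step 4: majority vote among those k labels.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: two well-separated clusters labelled "A" and "B".
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2), k=3))  # A
print(knn_predict(train_points, train_labels, (6, 5), k=3))  # B
```

With an even k, votes can tie; this sketch then returns whichever tied label appears first among the nearest neighbors. An odd k avoids ties in two-class problems.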

Advantages

  1. Very easy to implement.

  2. This algorithm can be used for both classification and regression.

  3. Since KNN makes no assumptions about the underlying data distribution, it is very useful for nonlinear data.

  4. The algorithm can achieve relatively high accuracy on many problems, although this is not guaranteed.
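
As an illustration of the regression use mentioned in the list above, one common approach is to predict the average target value of the k nearest neighbors instead of taking a vote. The sketch below assumes Euclidean distance and an unweighted mean; `knn_regress` and the size/price figures are made up for the example.

```python
import math

def knn_regress(train_points, train_targets, query, k=3):
    # Rank the training points by their distance to the query.
    ranked = sorted(
        zip(train_points, train_targets),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Average the target values of the k nearest points.
    nearest = [target for _, target in ranked[:k]]
    return sum(nearest) / len(nearest)

# Hypothetical data: one feature (size) and a numeric target (price).
sizes = [(50,), (60,), (70,), (120,), (130,)]
prices = [100.0, 120.0, 140.0, 240.0, 260.0]

print(knn_regress(sizes, prices, (65,), k=3))  # 120.0
```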

Disadvantages

  1. Prediction is computationally expensive, because the distance to every stored training point must be calculated for each new data point.

  2. Memory requirements are high, because the entire training set must be stored.

  3. A large number of features (high dimensionality) may lead to inaccurate predictions, because distances become less informative.

  4. Highly sensitive to the scale of the data.
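
The sensitivity to scale can be seen in a small experiment. In the sketch below, the feature with the larger numeric range (a hypothetical income in dollars) dominates the Euclidean distance until both features are rescaled to the range 0 to 1. Min-max scaling is one common remedy, used here as an assumption; the helper names and data are illustrative.

```python
import math

def nearest_index(points, query):
    """Index of the training point closest to the query."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

def min_max_scale(points, query):
    """Rescale every feature to [0, 1] using the training points' ranges."""
    columns = list(zip(*points))
    lows = [min(column) for column in columns]
    # Fall back to 1.0 so a constant feature does not divide by zero.
    spans = [(max(column) - min(column)) or 1.0 for column in columns]

    def scale(point):
        return tuple((v - lo) / s for v, lo, s in zip(point, lows, spans))

    return [scale(point) for point in points], scale(query)

# Hypothetical features: (age in years, income in dollars).
points = [(25, 50_000), (60, 50_500), (30, 90_000)]
query = (27, 50_600)

# Unscaled: income differences swamp age differences.
print(nearest_index(points, query))  # 1

# Scaled: both features contribute comparably.
scaled_points, scaled_query = min_max_scale(points, query)
print(nearest_index(scaled_points, scaled_query))  # 0
```

Before scaling, the 60-year-old is judged nearest to the 27-year-old query because their incomes differ by only 100; after scaling, the 25-year-old is nearest.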

Where is KNN used?

The following are some of the areas in which KNN can be applied successfully:

  1. KNN is often used in banking systems to identify if an individual or organization is fit for a grant or a loan based on key characteristics.

  2. KNN can be used in speech recognition, handwriting detection, image recognition, and video recognition.

  3. A potential voter can be classified into categories (like “voter” or “non-voter”) based on their characteristics.

CONTRIBUTOR

Sarvech Qadir
Copyright ©2024 Educative, Inc. All rights reserved