
The Essential k-NN Algorithm

Explore the essential k-Nearest Neighbors algorithm, including steps to classify data points and how to optimize performance by minimizing sorting overhead with Python's bisect and heapq modules. Understand computational costs and benchmark different approaches to enhance algorithm efficiency.

Overview

We can summarize the k-NN algorithm as having the following steps:

  1. Create a list of all (distance, training sample) pairs.
  2. Sort these in ascending order.
  3. Take the first k pairs; these are the k nearest neighbors.
  4. Choose the mode (the highest-frequency) label among the k nearest neighbors.

The implementation would look like this:

Python 3.10.4
import collections
from collections import Counter
from typing import NamedTuple

# TrainingKnownSample, DistanceFunc, TrainingList, and AnySample are
# defined earlier in the chapter.

class Measured(NamedTuple):
    """A distance paired with the training sample it was measured to."""
    distance: float
    sample: TrainingKnownSample

def k_nn_1(
        k: int, dist: DistanceFunc, training_data: TrainingList,
        unknown: AnySample) -> str:
    # Compute every distance, then sort all of them in ascending order.
    distances = sorted(
        map(lambda t: Measured(dist(t, unknown), t), training_data)
    )
    k_nearest = distances[:k]
    # Count species labels among the k nearest and return the mode.
    k_frequencies: Counter[str] = collections.Counter(
        s.sample.sample.species for s in k_nearest
    )
    mode, fq = k_frequencies.most_common(1)[0]
    return mode

While clear, this does accumulate a large number of distance values in the distances list object, when only k ...
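Since only the k smallest distances matter, one alternative is heapq.nsmallest, which keeps a bounded heap instead of sorting every distance. The following is a minimal, self-contained sketch of that idea; the Sample tuple and euclidean() helper here are illustrative stand-ins, not the chapter's TrainingKnownSample, DistanceFunc, or related types.

```python
import collections
import heapq
import math
from typing import NamedTuple


class Sample(NamedTuple):
    # Hypothetical simplified sample: feature vector plus known label.
    features: tuple[float, ...]
    species: str


def euclidean(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    # Straight-line distance between two feature vectors.
    return math.hypot(*(x - y for x, y in zip(a, b)))


def k_nn_heapq(
        k: int, training: list[Sample],
        unknown: tuple[float, ...]) -> str:
    # heapq.nsmallest retains only the k best candidates as it scans,
    # roughly O(n log k) instead of the O(n log n) full sort.
    k_nearest = heapq.nsmallest(
        k, training, key=lambda s: euclidean(s.features, unknown)
    )
    frequencies: collections.Counter[str] = collections.Counter(
        s.species for s in k_nearest
    )
    mode, _ = frequencies.most_common(1)[0]
    return mode


training = [
    Sample((1.0, 1.0), "a"),
    Sample((1.1, 0.9), "a"),
    Sample((9.0, 9.0), "b"),
]
print(k_nn_heapq(2, training, (1.0, 1.0)))  # prints "a"
```

The key attraction is that nsmallest never materializes a fully sorted list of all n distances, which is exactly the overhead the text goes on to discuss.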