KNN vs. K-Means: Supervised vs. Unsupervised Explained

Q: Confusing KNN and K-Means because both use 'K' and distance

This is one of the most common mix-ups in intro ML. The key distinction: KNN is supervised (needs labels, predicts classes), K-Means is unsupervised (no labels, finds clusters). Their $K$ values mean completely different things.

Q: Saying K-Means is guaranteed to find the globally optimal clustering

K-Means is only guaranteed to converge to a local minimum of the within-cluster variance objective. Different random initializations can produce entirely different clusterings. This is why running K-Means multiple times and picking the best result is standard practice.

Q: Applying K-Means to non-globular clusters and expecting good results

K-Means minimizes Euclidean distance to centroids, which implicitly assumes clusters are convex and spherical. It will completely fail on ring-shaped or crescent-shaped clusters. DBSCAN or spectral clustering handle these cases.

Q: Thinking KNN can be used for clustering

KNN cannot cluster unlabeled data — it requires that the $K$ neighbors already have labels to vote with. Using KNN without labels produces nothing meaningful. If you see an unlabeled dataset and need groups, that's K-Means territory.

Q: Assuming K-Means training is free because there's 'no model'

Students sometimes think unsupervised means no computation. K-Means absolutely has a training phase — it iteratively assigns points and recomputes centroids. Its training cost $O(n \times K \times d \times i)$ is non-trivial.

Try K-Nearest Solver →Try K-Means Solver →

TL;DR — KNN is a supervised classification algorithm — it needs labeled training data and predicts the class of a new point by majority vote of its $K$ nearest neighbors. K-Means is an unsupervised clustering algorithm — it has no labels and discovers $K$ natural groupings in data by iteratively moving cluster centroids. The shared 'K' is deeply misleading: in KNN, $K$ is the number of neighbors to consult; in K-Means, $K$ is the number of clusters to create.

Feature Comparison

Feature	K-Nearest Neighbors (KNN)	K-Means Clustering
Problem Type	Supervised classification — requires labeled training data	Unsupervised clustering — requires no labels; discovers structure on its own
What 'K' Means	$K$ = number of nearest neighbors to vote on a prediction	$K$ = number of clusters (centroids) to partition the data into
Output	A class label for each new query point	A cluster assignment (integer ID) for every point in the dataset
Core Math	Euclidean distance to find neighbors; majority vote among their labels	Minimize within-cluster variance: $\underset{S}{\arg\min} \sum_{i=1}^{K} \sum_{x \in S_i} \\|x - \mu_i\\|^2$
Training Phase	None — stores all training points, nothing is computed	Iterative — alternates between assigning points to the nearest centroid and recomputing centroids until convergence
Requires Labels	Yes — every training point must have a known class label	No — purely data-driven; labels are the output, not the input
Sensitivity to Initial Conditions	None — KNN is deterministic given the same dataset and $K$	High — random centroid initialization can produce very different final clusters; K-Means++ initialization helps
Convergence Guarantee	N/A — no iterative process; prediction is a direct lookup	Guaranteed to converge, but only to a local minimum — the global optimum is not guaranteed
How to Choose K	Cross-validation — try multiple values of $K$ and pick the one with best validation accuracy	Elbow method — plot inertia vs. $K$ and look for the 'elbow' where adding more clusters gives diminishing returns
Handles Non-Globular Shapes	Yes — flexible decision boundaries adapt to any shape	No — K-Means assumes clusters are convex and roughly spherical; fails on ring or crescent shapes

Complexity Showdown

Training Time

K-Nearest:

O(1)

K-Means:

O(n \times K \times d \times i)

KNN does zero computation at training time. K-Means runs $i$ full iterations, each of which assigns all $n$ points to the nearest of $K$ centroids across $d$ features. In practice $i$ is small (often $< 100$ ), but it's never zero.

Prediction Time

K-Nearest:

O(n \times d)

per query

K-Means:

O(K \times d)

per query

Assigning a new point to its cluster in K-Means only requires computing distance to $K$ centroids — typically a tiny number. KNN must compute distance to all $n$ training points. For large $n$ , this gap is enormous.

Space Complexity

K-Nearest:

O(n \times d)

K-Means:

O(K \times d)

KNN stores every training point permanently. K-Means, once trained, only needs to store $K$ centroid vectors — each of dimension $d$ . For large datasets with small $K$ , K-Means uses a fraction of the memory.

When To Use Which?

Use KNN when:

✓You have labeled training data and want to predict the class of new, unseen points.
✓The decision boundary between classes is complex and non-linear — KNN adapts naturally without feature engineering.
✓Your dataset is small enough that $O(n \times d)$ prediction cost is acceptable.
✓You want a simple, assumption-free baseline classifier to benchmark against more complex models.
✓You need a quick sanity check on whether your data is separable before investing in heavier models.

Use K-Means when:

✓You have no labels and want to discover hidden groupings or segments in your data.
✓Your clusters are expected to be roughly spherical and similar in size — K-Means assumes convex geometry.
✓You need to compress or summarize data by replacing each point with its cluster centroid (e.g., vector quantization).
✓You want to do customer segmentation, document grouping, or image color quantization where the number of groups is known in advance.
✓Speed matters — K-Means scales well to large datasets with $O(n \times K \times d \times i)$ per iteration, where $i$ is a small number of iterations.

Common Exam Traps

⚠️

Confusing KNN and K-Means because both use 'K' and distance

This is one of the most common mix-ups in intro ML. The key distinction: KNN is supervised (needs labels, predicts classes), K-Means is unsupervised (no labels, finds clusters). Their $K$ values mean completely different things.

⚠️

Saying K-Means is guaranteed to find the globally optimal clustering

K-Means is only guaranteed to converge to a local minimum of the within-cluster variance objective. Different random initializations can produce entirely different clusterings. This is why running K-Means multiple times and picking the best result is standard practice.

⚠️

Applying K-Means to non-globular clusters and expecting good results

K-Means minimizes Euclidean distance to centroids, which implicitly assumes clusters are convex and spherical. It will completely fail on ring-shaped or crescent-shaped clusters. DBSCAN or spectral clustering handle these cases.

⚠️

Thinking KNN can be used for clustering

KNN cannot cluster unlabeled data — it requires that the $K$ neighbors already have labels to vote with. Using KNN without labels produces nothing meaningful. If you see an unlabeled dataset and need groups, that's K-Means territory.

⚠️

Assuming K-Means training is free because there's 'no model'

Students sometimes think unsupervised means no computation. K-Means absolutely has a training phase — it iteratively assigns points and recomputes centroids. Its training cost $O(n \times K \times d \times i)$ is non-trivial.

Final Verdict

If you have labels and want to predict, use KNN. If you have no labels and want to discover groups, use K-Means. These algorithms are not interchangeable — they solve fundamentally different problems. The shared use of distance and the letter $K$ is a surface-level coincidence that exams love to exploit.

K-Nearest Neighbors (KNN)

Try the K-Nearest Solver →Read K-Nearest Theory Guide

K-Means Clustering

Try the K-Means Solver →Read K-Means Theory Guide