K-Means Clustering: A Complete Solved Numerical Example
Scenario: Mobile App User Segmentation
The Objective: Segment mobile app users into distinct behavioral groups based on daily session count and average session duration.
Step 1: The Dataset & Initial Centroids
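K-Means is an unsupervised learning algorithm: there is no "Target" class to predict. Instead, it partitions the data points into K distinct clusters. Here we use K = 2 on the eight users below, taking P1 [1, 5] as the initial centroid C1 and P4 [8, 20] as the initial centroid C2.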
| Data Point | Daily_Sessions | Avg_Session_Duration_mins |
|---|---|---|
| P1 | 1 | 5 |
| P2 | 2 | 6 |
| P3 | 1 | 4 |
| P4 | 8 | 20 |
| P5 | 9 | 22 |
| P6 | 8 | 19 |
| P7 | 5 | 12 |
| P8 | 9 | 25 |
Step 2: The Iterative Process
The algorithm alternates between two steps until the cluster assignments stop changing: (A) assign each point to its nearest centroid by Euclidean distance, and (B) recompute each centroid as the mean (geometric center) of the points now assigned to it.
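The two steps can be sketched as a short loop. This is a minimal, dependency-free sketch in Python 3.8+ (it relies on `math.dist`), not a production implementation; `kmeans` and `POINTS` are illustrative names. It is seeded with P1 and P4, the points whose distances to C1 and C2 are 0.00 in the Iteration 1 table, and it counts one iteration per assignment pass, stopping on the first pass whose assignments repeat the previous one. Keeping an empty cluster's old centroid is an assumption; it never triggers on this dataset.

```python
import math

# Dataset from Step 1: (Daily_Sessions, Avg_Session_Duration_mins)
POINTS = {
    "P1": (1, 5), "P2": (2, 6), "P3": (1, 4), "P4": (8, 20),
    "P5": (9, 22), "P6": (8, 19), "P7": (5, 12), "P8": (9, 25),
}


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the assignments stop changing.

    Returns (labels, centroids, iterations); labels maps point name -> cluster index.
    """
    labels = None
    for iteration in range(1, max_iter + 1):
        # (A) Assignment: index of the nearest centroid by Euclidean distance.
        new_labels = {
            name: min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for name, p in points.items()
        }
        # (B) Update: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [points[n] for n, lab in new_labels.items() if lab == k]
            if not members:  # assumption: an empty cluster keeps its old centroid
                new_centroids.append(centroids[k])
                continue
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        if new_labels == labels:  # same assignments as last pass -> converged
            return new_labels, new_centroids, iteration
        labels, centroids = new_labels, new_centroids
    return labels, centroids, max_iter


labels, centroids, n_iter = kmeans(POINTS, [POINTS["P1"], POINTS["P4"]])
print(n_iter, centroids)  # → 2 [(2.25, 6.75), (8.5, 21.5)]
```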
Iteration 1
A. Calculate Euclidean Distances & Assign Clusters
| Point | Coordinates | Dist to C1 | Dist to C2 | Assigned To |
|---|---|---|---|---|
| P1 | [1, 5] | 0.00 | 16.55 | C1 |
| P2 | [2, 6] | 1.41 | 15.23 | C1 |
| P3 | [1, 4] | 1.00 | 17.46 | C1 |
| P4 | [8, 20] | 16.55 | 0.00 | C2 |
| P5 | [9, 22] | 18.79 | 2.24 | C2 |
| P6 | [8, 19] | 15.65 | 1.00 | C2 |
| P7 | [5, 12] | 8.06 | 8.54 | C1 |
| P8 | [9, 25] | 21.54 | 5.10 | C2 |
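Every entry in the table is a plain 2-D Euclidean distance, with x = Daily_Sessions and y = Avg_Session_Duration_mins. Worked out for P7, the closest call in this iteration, with C1 = [1, 5] and C2 = [8, 20]:

```latex
d(\mathbf{p}, \mathbf{c}) = \sqrt{(x_p - x_c)^2 + (y_p - y_c)^2}

\begin{aligned}
d(P_7, C_1) &= \sqrt{(5 - 1)^2 + (12 - 5)^2} = \sqrt{65} \approx 8.06 \\
d(P_7, C_2) &= \sqrt{(5 - 8)^2 + (12 - 20)^2} = \sqrt{73} \approx 8.54
\end{aligned}
```

Since 8.06 < 8.54, P7 joins C1.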
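B. Calculate New Centroids (Means)
Each new centroid is the coordinate-wise mean of the points assigned to it in step A.
| Cluster | Members | New Centroid |
|---|---|---|
| C1 | P1, P2, P3, P7 | [(1 + 2 + 1 + 5) / 4, (5 + 6 + 4 + 12) / 4] = [2.25, 6.75] |
| C2 | P4, P5, P6, P8 | [(8 + 9 + 8 + 9) / 4, (20 + 22 + 19 + 25) / 4] = [8.50, 21.50] |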
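The update step is just a coordinate-wise average of each cluster's members. A minimal sketch in plain Python; `clusters` and `mean_point` are hypothetical names, and the member lists are copied from the Iteration 1 assignments:

```python
# Iteration 1 assignments: (Daily_Sessions, Avg_Session_Duration_mins)
clusters = {
    "C1": [(1, 5), (2, 6), (1, 4), (5, 12)],     # P1, P2, P3, P7
    "C2": [(8, 20), (9, 22), (8, 19), (9, 25)],  # P4, P5, P6, P8
}


def mean_point(members):
    """Coordinate-wise mean of a list of 2-D points."""
    xs, ys = zip(*members)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


new_centroids = {name: mean_point(members) for name, members in clusters.items()}
print(new_centroids)  # → {'C1': (2.25, 6.75), 'C2': (8.5, 21.5)}
```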
Iteration 2
A. Calculate Euclidean Distances & Assign Clusters
Distances are now measured to the updated centroids C1 = [2.25, 6.75] and C2 = [8.50, 21.50].
| Point | Coordinates | Dist to C1 | Dist to C2 | Assigned To |
|---|---|---|---|---|
| P1 | [1, 5] | 2.15 | 18.12 | C1 |
| P2 | [2, 6] | 0.79 | 16.81 | C1 |
| P3 | [1, 4] | 3.02 | 19.04 | C1 |
| P4 | [8, 20] | 14.44 | 1.58 | C2 |
| P5 | [9, 22] | 16.68 | 0.71 | C2 |
| P6 | [8, 19] | 13.53 | 2.55 | C2 |
| P7 | [5, 12] | 5.93 | 10.12 | C1 |
| P8 | [9, 25] | 19.46 | 3.54 | C2 |
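The Iteration 2 table and the convergence test can be reproduced with the standard library alone (Python 3.8+ for `math.dist`). This sketch assumes the updated centroids [2.25, 6.75] and [8.50, 21.50]; breaking ties toward C1 is an arbitrary choice that does not matter here because no point is equidistant. Each printed line should match a row of the table above.

```python
import math

points = {
    "P1": (1, 5), "P2": (2, 6), "P3": (1, 4), "P4": (8, 20),
    "P5": (9, 22), "P6": (8, 19), "P7": (5, 12), "P8": (9, 25),
}
# Centroids produced by the Iteration 1 update step
centroids = {"C1": (2.25, 6.75), "C2": (8.5, 21.5)}

assignments = {}
for name, p in points.items():
    d1 = math.dist(p, centroids["C1"])
    d2 = math.dist(p, centroids["C2"])
    assignments[name] = "C1" if d1 <= d2 else "C2"  # ties go to C1 (arbitrary)
    print(f"{name} {p}: {d1:.2f} {d2:.2f} -> {assignments[name]}")

# Iteration 1 assignments, used for the convergence check
previous = {"P1": "C1", "P2": "C1", "P3": "C1", "P4": "C2",
            "P5": "C2", "P6": "C2", "P7": "C1", "P8": "C2"}
print("converged:", assignments == previous)  # → converged: True
```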
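B. Calculate New Centroids (Means)
Every point kept its Iteration 1 cluster (C1: P1, P2, P3, P7; C2: P4, P5, P6, P8), so the means are unchanged: C1 = [2.25, 6.75] and C2 = [8.50, 21.50].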
Step 3: Convergence & Final Result
The algorithm stops when the cluster assignments no longer change between iterations.
Iteration 2 reproduced exactly the assignments of Iteration 1, so the centroids did not move and the algorithm converged after 2 iterations.
Final Cluster 1
Centroid: [2.25, 6.75]
Points: P1, P2, P3, P7
Final Cluster 2
Centroid: [8.50, 21.50]
Points: P4, P5, P6, P8
Final Takeaway
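The two final clusters map directly onto behavioral cohorts. Cluster 2 (about 8.5 sessions per day, averaging 21.5 minutes each) can be labeled "Power Users", and Cluster 1 (about 2.25 sessions per day, averaging 6.75 minutes each) "Casual Users". P7 (5 sessions, 12 minutes) is the borderline member: in Iteration 1 it was only slightly closer to C1 (8.06) than to C2 (8.54), so its cohort label is the least certain. These cohorts can then drive targeted business decisions for each group.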
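The same nearest-centroid rule can place a new user into one of these cohorts. A sketch under stated assumptions: the naming rule (the centroid with more daily sessions is "Power Users") and the example user (7, 18) are hypothetical, and `cohort_for` is an illustrative helper, not part of any library.

```python
import math

# Final centroids from Step 3: (Daily_Sessions, Avg_Session_Duration_mins)
FINAL_CENTROIDS = {"Cluster 1": (2.25, 6.75), "Cluster 2": (8.5, 21.5)}

# Hypothetical naming rule: the centroid with more daily sessions is "Power Users".
heavier = max(FINAL_CENTROIDS, key=lambda name: FINAL_CENTROIDS[name][0])
COHORT = {name: ("Power Users" if name == heavier else "Casual Users")
          for name in FINAL_CENTROIDS}


def cohort_for(user):
    """Return the cohort of the final centroid nearest to a (sessions, minutes) pair."""
    nearest = min(FINAL_CENTROIDS, key=lambda name: math.dist(user, FINAL_CENTROIDS[name]))
    return COHORT[nearest]


print(cohort_for((7, 18)))  # hypothetical new user → Power Users
```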