K-Means Clustering: A Complete Solved Numerical Example

Scenario: Mobile App User Segmentation

The Objective: Segment mobile app users into distinct behavioral groups based on daily session count and average session duration.

Step 1: The Dataset & Initial Centroids

K-Means is an unsupervised learning algorithm: there is no "Target" class to predict. Instead, it partitions the data points into K distinct clusters. In this example K = 2, and each user is a point [Daily_Sessions, Avg_Session_Duration_mins].

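| Data Point | Daily_Sessions | Avg_Session_Duration_mins |
|---|---|---|
| P1 | 1 | 5 |
| P2 | 2 | 6 |
| P3 | 1 | 4 |
| P4 | 8 | 20 |
| P5 | 9 | 22 |
| P6 | 8 | 19 |
| P7 | 5 | 12 |
| P8 | 9 | 25 |

Starting Centroids (Iteration 0)
The two initial centroids are placed on existing data points:
C1 (P1): [1, 5]
C2 (P4): [8, 20]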

Step 2: The Iterative Process

The algorithm alternates between two steps until the cluster assignments stop changing: assigning each point to its nearest centroid, and recomputing each centroid as the mean (geometric center) of the points assigned to it.
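The two alternating steps can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the `max_iter` cap, and the choice to leave an empty cluster's centroid where it is are all assumptions made for this sketch.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Return (labels, centroids, iterations) for a basic K-Means run."""
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Convergence: assignments are identical to the previous iteration.
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # assumption: keep an empty cluster's centroid
        centroids = updated
    return labels, centroids, max_iter

# The eight users from the dataset, in order P1..P8.
points = [(1, 5), (2, 6), (1, 4), (8, 20), (9, 22), (8, 19), (5, 12), (9, 25)]
labels, centroids, n_iter = kmeans(points, [(1, 5), (8, 20)])
print(labels)     # 0 means C1, 1 means C2
print(centroids)
print(n_iter)
```

Here the loop counts the pass that confirms "nothing changed" as an iteration, which matches how the iterations are numbered in this example.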

Iteration 1

Starting: C1 [1.0, 5.0], C2 [8.0, 20.0]
A. Calculate Euclidean Distances & Assign Clusters
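Formula: $d = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}$

| Point | Coordinates | Dist to C1 | Dist to C2 | Assigned To |
|---|---|---|---|---|
| P1 | [1, 5] | 0.00 | 16.55 | C1 |
| P2 | [2, 6] | 1.41 | 15.23 | C1 |
| P3 | [1, 4] | 1.00 | 17.46 | C1 |
| P4 | [8, 20] | 16.55 | 0.00 | C2 |
| P5 | [9, 22] | 18.79 | 2.24 | C2 |
| P6 | [8, 19] | 15.65 | 1.00 | C2 |
| P7 | [5, 12] | 8.06 | 8.54 | C1 |
| P8 | [9, 25] | 21.54 | 5.10 | C2 |
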
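The Iteration 1 distances and assignments can be reproduced with a few lines of Python. This is only a sketch: the dictionary names `points`, `centroids`, and `assignments` are chosen here for illustration, and `math.dist` (Python 3.8+) is used for the Euclidean distance.

```python
import math

points = {
    "P1": (1, 5), "P2": (2, 6), "P3": (1, 4), "P4": (8, 20),
    "P5": (9, 22), "P6": (8, 19), "P7": (5, 12), "P8": (9, 25),
}
centroids = {"C1": (1, 5), "C2": (8, 20)}  # starting centroids (P1 and P4)

assignments = {}
for name, xy in points.items():
    # Euclidean distance from this point to every centroid.
    dists = {c: math.dist(xy, cxy) for c, cxy in centroids.items()}
    # The point joins the cluster whose centroid is closest.
    assignments[name] = min(dists, key=dists.get)
    print(name, {c: round(d, 2) for c, d in dists.items()}, assignments[name])
```

P7 is the closest call in this iteration: it sits 8.06 from C1 and 8.54 from C2, so it lands in C1 by a small margin.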
B. Calculate New Centroids (Means)
Formula: $C_{new} = \left( \dfrac{x_1 + x_2 + \dots + x_N}{N}, \dfrac{y_1 + y_2 + \dots + y_N}{N} \right)$, where N is the number of points in the cluster.

Cluster C1 (4 points: P1, P2, P3, P7)
Dim 1 (Daily_Sessions): (1 + 2 + 1 + 5) / 4 = 2.25
Dim 2 (Avg_Session_Duration_mins): (5 + 6 + 4 + 12) / 4 = 6.75

Cluster C2 (4 points: P4, P5, P6, P8)
Dim 1 (Daily_Sessions): (8 + 9 + 8 + 9) / 4 = 8.50
Dim 2 (Avg_Session_Duration_mins): (20 + 22 + 19 + 25) / 4 = 21.50
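The centroid update is a per-dimension mean, which can be sketched as below. The helper name `mean_point` and the `clusters` dictionary are hypothetical names introduced for this illustration.

```python
# Cluster membership after the Iteration 1 assignment step.
clusters = {
    "C1": [(1, 5), (2, 6), (1, 4), (5, 12)],     # P1, P2, P3, P7
    "C2": [(8, 20), (9, 22), (8, 19), (9, 25)],  # P4, P5, P6, P8
}

def mean_point(members):
    """Average each coordinate across all member points."""
    n = len(members)
    return tuple(sum(coord) / n for coord in zip(*members))

new_centroids = {name: mean_point(members) for name, members in clusters.items()}
print(new_centroids)
```

`zip(*members)` regroups the points by dimension, so the same helper would work unchanged if more features were added per user.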

Iteration 2

Starting: C1 [2.25, 6.75], C2 [8.50, 21.50]
A. Calculate Euclidean Distances & Assign Clusters
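Formula: $d = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}$

| Point | Coordinates | Dist to C1 | Dist to C2 | Assigned To |
|---|---|---|---|---|
| P1 | [1, 5] | 2.15 | 18.12 | C1 |
| P2 | [2, 6] | 0.79 | 16.81 | C1 |
| P3 | [1, 4] | 3.02 | 19.04 | C1 |
| P4 | [8, 20] | 14.44 | 1.58 | C2 |
| P5 | [9, 22] | 16.68 | 0.71 | C2 |
| P6 | [8, 19] | 13.53 | 2.55 | C2 |
| P7 | [5, 12] | 5.93 | 10.12 | C1 |
| P8 | [9, 25] | 19.46 | 3.54 | C2 |
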
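As a worked check of one row, here is the P7 calculation written out in full, using the unrounded centroids [2.25, 6.75] and [8.50, 21.50] computed at the end of Iteration 1. The notation d(P7, C1) is introduced here only for readability.

```latex
\begin{aligned}
d(P7, C1) &= \sqrt{(5 - 2.25)^2 + (12 - 6.75)^2}
           = \sqrt{7.5625 + 27.5625}
           = \sqrt{35.125} \approx 5.93 \\
d(P7, C2) &= \sqrt{(5 - 8.5)^2 + (12 - 21.5)^2}
           = \sqrt{12.25 + 90.25}
           = \sqrt{102.5} \approx 10.12
\end{aligned}
```

P7 was nearly equidistant from the two starting centroids (8.06 vs 8.54), but after the update it is clearly closer to C1.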
B. Calculate New Centroids (Means)
Formula: $C_{new} = \left( \dfrac{x_1 + x_2 + \dots + x_N}{N}, \dfrac{y_1 + y_2 + \dots + y_N}{N} \right)$, where N is the number of points in the cluster.

Cluster C1 (4 points: P1, P2, P3, P7)
Dim 1 (Daily_Sessions): (1 + 2 + 1 + 5) / 4 = 2.25
Dim 2 (Avg_Session_Duration_mins): (5 + 6 + 4 + 12) / 4 = 6.75

Cluster C2 (4 points: P4, P5, P6, P8)
Dim 1 (Daily_Sessions): (8 + 9 + 8 + 9) / 4 = 8.50
Dim 2 (Avg_Session_Duration_mins): (20 + 22 + 19 + 25) / 4 = 21.50

Step 3: Convergence & Final Result

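The algorithm stops when the cluster assignments no longer change between iterations.

Iteration 2 produced the same assignments as Iteration 1 (C1: P1, P2, P3, P7; C2: P4, P5, P6, P8), so the centroids did not move and the algorithm converged after 2 iterations.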
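The stopping rule can be checked directly by comparing the assignments produced by the two sets of centroids. This is a small sketch; the helper name `assign` and the 0/1 label encoding (0 for C1, 1 for C2) are assumptions of this example.

```python
import math

# Users P1..P8 as (Daily_Sessions, Avg_Session_Duration_mins).
points = [(1, 5), (2, 6), (1, 4), (8, 20), (9, 22), (8, 19), (5, 12), (9, 25)]

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

labels_iter1 = assign(points, [(1, 5), (8, 20)])           # starting centroids
labels_iter2 = assign(points, [(2.25, 6.75), (8.5, 21.5)])  # updated centroids

# Identical assignments mean the next centroid update would change nothing.
converged = labels_iter1 == labels_iter2
print(converged)
```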

Final Cluster 1

Centroid: [2.25, 6.75]

Points: P1, P2, P3, P7

Final Cluster 2

Centroid: [8.50, 21.50]

Points: P4, P5, P6, P8

Final Takeaway

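The final centroids describe two distinct cohorts: Cluster 1 averages 2.25 sessions per day at 6.75 minutes each (e.g., "Casual Users"), while Cluster 2 averages 8.50 sessions per day at 21.50 minutes each (e.g., "Power Users"). Segments like these can be used to drive targeted business decisions.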