K-Means is one of the most widely used algorithms in unsupervised learning. It’s fast, intuitive, and works well in many practical scenarios. But its simplicity can sometimes hide important nuances.
In this post, we’ll explore:
- What K-Means actually does
- The mathematical foundations
- How to build a solid mental model
- Its strengths, weaknesses, and ideal use cases
🧠 The Mental Model
Think of K-Means as a game of gravity:
- You drop \(k\) invisible magnets (centroids) into the data space
- Every point is attracted to the closest magnet
- The magnets move to the center of mass of their assigned points
- Repeat until the system stabilizes (i.e., minimal movement)
K-Means is essentially clustering by proximity to centroids — iteratively adjusting those centroids until the point assignments stop changing.
Core Parameters
1. k: Number of clusters
You must specify the number of clusters before the algorithm runs.
There’s no “perfect” \(k\), so we typically use methods like:
- Elbow Method
- Silhouette Score
- Gap Statistic
2. Distance Metric
Most implementations use Euclidean distance, though others are possible.
$$
\text{dist}(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2}
$$
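In NumPy this is a one-liner; the helper below is a minimal sketch (the function name is just for illustration):

```python
import numpy as np

def euclidean_distance(x, c):
    """Euclidean distance between a point x and a centroid c."""
    return np.sqrt(np.sum((x - c) ** 2))

# Example: distance from a 2-D point to a centroid
point = np.array([1.0, 2.0])
centroid = np.array([4.0, 6.0])
print(euclidean_distance(point, centroid))  # 5.0
```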
The K-Means Algorithm (Step by Step)
1. Initialize \(k\) centroids: randomly select \(k\) points as starting centroids (or use K-Means++)
2. Assign each point to the nearest centroid, which creates temporary clusters
3. Update centroids: move each centroid to the mean of all points assigned to it
4. Repeat steps 2–3 until the centroids stop moving significantly, or a maximum number of iterations is reached
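Putting the steps together, a minimal NumPy sketch of the loop might look like this (plain random initialization rather than K-Means++; the names and defaults are illustrative):

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-4, seed=0):
    """A minimal K-Means (Lloyd's algorithm) sketch."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids barely move
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids
```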
Mathematically Speaking
K-Means attempts to minimize the within-cluster sum of squares (WCSS):
$$
\text{WCSS} = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2
$$
Where:
- \( C_i \): set of points in cluster \(i\)
- \( \mu_i \): centroid of cluster \(i\)
K-Means is not guaranteed to find the global optimum of this objective: depending on where the centroids start, it can get stuck in local minima. Hence it's common to run it multiple times with different initializations and keep the best result, as in the sketch below.
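In scikit-learn this restart strategy is built in via the n_init parameter, which reruns the algorithm and keeps the run with the lowest WCSS (exposed as inertia_). A small sketch on synthetic data (the blob parameters are arbitrary):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: three well-separated blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# 10 restarts with different initializations; the best run wins
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(km.inertia_)  # WCSS of the best of the 10 runs
```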
Properties to Know
| Property | Value |
|---|---|
| Type | Centroid-based |
| Input | Numeric, continuous features only |
| Scalability | Very high (efficient even for large datasets) |
| Complexity | \( O(n \cdot k \cdot i \cdot d) \) for \(n\) points, \(k\) clusters, \(i\) iterations, \(d\) dimensions |
| Output | Cluster assignments + centroid positions |
| Deterministic? | ❌ No (initialization is random; fix the random seed for reproducible results) |
Choosing \(k\): The Elbow Method
Plot WCSS as a function of \(k\):
- As \(k\) increases, WCSS decreases
- Look for the “elbow” — the point where WCSS flattens
- That’s often a good balance between under- and overfitting
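Here's a quick sketch of that plot using scikit-learn and matplotlib (the synthetic data and range of \(k\) values are arbitrary):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# WCSS (inertia_) for a range of candidate k values
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 11)]

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("k")
plt.ylabel("WCSS")
plt.title("Elbow Method")
plt.show()
```

With four true clusters, the curve typically bends around \(k = 4\).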
Use Cases
- Customer segmentation
- Image compression (e.g., color quantization; see the sketch after this list)
- Document clustering (on TF-IDF or embedding vectors)
- Market basket analysis
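To make the image-compression case concrete, here's a sketch of color quantization with scikit-learn and Pillow; the filenames and the 16-color palette are hypothetical choices:

```python
import numpy as np
from sklearn.cluster import KMeans
from PIL import Image

# Hypothetical input file; any RGB image works
img = np.asarray(Image.open("photo.jpg").convert("RGB"))
pixels = img.reshape(-1, 3)

# Cluster pixel colors; each centroid becomes one palette color
km = KMeans(n_clusters=16, n_init=4, random_state=0).fit(pixels)
quantized = km.cluster_centers_[km.labels_].astype(np.uint8)
Image.fromarray(quantized.reshape(img.shape)).save("photo_16colors.png")
```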
Limitations
- Requires \(k\) to be known in advance
- Assumes clusters are spherical and equally sized
- Not great with non-convex shapes or varying densities
- Sensitive to outliers
Final Thoughts
K-Means is fast, elegant, and surprisingly powerful for a wide range of tasks.
But its simplicity means it assumes a lot — especially about the shape and size of clusters.
A good mental model for K-Means is this:
“It clusters by distance to a moving average — and it gets confused by anything nonlinear, non-uniform, or noisy.”