K-Means is one of the most widely used algorithms in unsupervised learning. It’s fast, intuitive, and works well in many practical scenarios. But its simplicity can sometimes hide important nuances.
In this post, we’ll explore:
- What K-Means actually does
- The mathematical foundations
- How to build a solid mental model
- Its strengths, weaknesses, and ideal use cases
🧠 The Mental Model
Think of K-Means as a game of gravity:
- You drop \(k\) invisible magnets (centroids) into the data space
- Every point is attracted to the closest magnet
- The magnets move to the center of mass of their assigned points
- Repeat until the system stabilizes (i.e., minimal movement)
K-Means is essentially clustering by proximity to centroids — iteratively adjusting those centroids until the point assignments stop changing.
Core Parameters
1. k: Number of clusters
You must specify the number of clusters before the algorithm runs.
There’s no “perfect” \(k\), so we typically use methods like:
- Elbow Method
- Silhouette Score
- Gap Statistic
2. Distance Metric
Most implementations use Euclidean distance, though others are possible.
$$
\text{dist}(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2}
$$
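In NumPy this is a one-liner; the helper below is a minimal sketch (the function name is just for illustration):

```python
import numpy as np

def euclidean_distance(x, c):
    """Euclidean distance between a point x and a centroid c."""
    return np.sqrt(np.sum((x - c) ** 2))

# Example: distance from a 2-D point to a centroid
point = np.array([1.0, 2.0])
centroid = np.array([4.0, 6.0])
print(euclidean_distance(point, centroid))  # 5.0
```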
The K-Means Algorithm (Step by Step)
1. Initialize \(k\) centroids: randomly select \(k\) points as starting centroids (or use K-Means++)
2. Assign each point to the nearest centroid, which creates temporary clusters
3. Update centroids: move each centroid to the mean of all points assigned to it
4. Repeat steps 2–3 until the centroids stop moving significantly, or a maximum number of iterations is reached
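Putting the steps together, a minimal NumPy sketch of the loop might look like this (plain random initialization rather than K-Means++; the names and defaults are illustrative):

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-4, seed=0):
    """A minimal K-Means (Lloyd's algorithm) sketch."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids barely move
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids
```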
Mathematically Speaking
K-Means attempts to minimize the within-cluster sum of squares (WCSS):
$$
\text{WCSS} = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2
$$
Where:
- \( C_i \): set of points in cluster \(i\)
- \( \mu_i \): centroid of cluster \(i\)
K-Means is not guaranteed to find the global optimum of this objective: depending on where the centroids start, it can get stuck in local minima. Hence it's common to run it multiple times with different initializations and keep the best result, as in the sketch below.
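In scikit-learn this restart strategy is built in via the n_init parameter, which reruns the algorithm and keeps the run with the lowest WCSS (exposed as inertia_). A small sketch on synthetic data (the blob parameters are arbitrary):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: three well-separated blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# 10 restarts with different initializations; the best run wins
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(km.inertia_)  # WCSS of the best of the 10 runs
```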
Properties to Know
| Property | Value |
|---|---|
| Type | Centroid-based |
| Input | Numeric, continuous features only |
| Scalability | Very high (efficient even for large datasets) |
| Complexity | \( O(n \cdot k \cdot i \cdot d) \) for \(n\) points, \(k\) clusters, \(i\) iterations, \(d\) dimensions |
| Output | Cluster assignments + centroid positions |
| Deterministic? | ❌ No (initialization is random; fix the random seed for reproducible results) |
Choosing \(k\): The Elbow Method
Plot WCSS as a function of \(k\):
- As \(k\) increases, WCSS decreases
- Look for the “elbow” — the point where WCSS flattens
- That’s often a good balance between under- and overfitting
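Here's a quick sketch of that plot using scikit-learn and matplotlib (the synthetic data and range of \(k\) values are arbitrary):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# WCSS (inertia_) for a range of candidate k values
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 11)]

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("k")
plt.ylabel("WCSS")
plt.title("Elbow Method")
plt.show()
```

With four true clusters, the curve typically bends around \(k = 4\).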
Use Cases
- Customer segmentation
- Image compression (e.g., color quantization; see the sketch after this list)
- Document clustering (on TF-IDF or embedding vectors)
- Market basket analysis
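To make the image-compression case concrete, here's a sketch of color quantization with scikit-learn and Pillow; the filenames and the 16-color palette are hypothetical choices:

```python
import numpy as np
from sklearn.cluster import KMeans
from PIL import Image

# Hypothetical input file; any RGB image works
img = np.asarray(Image.open("photo.jpg").convert("RGB"))
pixels = img.reshape(-1, 3)

# Cluster pixel colors; each centroid becomes one palette color
km = KMeans(n_clusters=16, n_init=4, random_state=0).fit(pixels)
quantized = km.cluster_centers_[km.labels_].astype(np.uint8)
Image.fromarray(quantized.reshape(img.shape)).save("photo_16colors.png")
```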
Limitations
- Requires \(k\) to be known in advance
- Assumes clusters are spherical and equally sized
- Not great with non-convex shapes or varying densities
- Sensitive to outliers
Final Thoughts
K-Means is fast, elegant, and surprisingly powerful for a wide range of tasks.
But its simplicity means it assumes a lot — especially about the shape and size of clusters.
A good mental model for K-Means is this:
“It clusters by distance to a moving average — and it gets confused by anything nonlinear, non-uniform, or noisy.”