Limitations of k-means clustering

Cluster Analysis in Python

Shaumik Daityari

Business Analyst

Limitations of k-means clustering

  • How to find the right _K_ (number of clusters)?
  • Impact of seeds
  • Biased towards equal sized clusters
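
A minimal sketch of one common heuristic for the first question: run k-means for several values of K and look for the point where distortion stops dropping sharply. The synthetic data, the range of K and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

# Hypothetical data: three well-separated groups of 50 points each.
rng = np.random.default_rng(0)
group_centers = ([0, 0], [5, 5], [0, 5])
blobs = [rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in group_centers]
points = whiten(np.vstack(blobs))  # scale each feature to unit variance

np.random.seed(12)

# Record the distortion (mean distance to the nearest centroid) for each K.
k_values = list(range(1, 7))
distortions = []
for k in k_values:
    _, distortion = kmeans(points, k)
    distortions.append(float(distortion))

for k, d in zip(k_values, distortions):
    print(f"K={k}: distortion={d:.3f}")
```

On data with a clear grouping, the printed distortions usually flatten after the true number of groups. On real data the bend is often ambiguous, which is why choosing K remains a limitation.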

Impact of seeds

Initialize a random seed

from numpy import random
random.seed(12)

Seed: np.array([1000, 2000])

Cluster sizes: 29, 29, 43, 47, 52

Seed: np.array([1, 2, 3])

Cluster sizes: 26, 31, 40, 50, 53
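
A minimal sketch of how such a comparison might be run. The data set behind the sizes above is not shown, so the 200 points here are hypothetical, as is the helper `cluster_sizes`. It also assumes a SciPy version in which `kmeans` draws its starting centroids from NumPy's global random state when no explicit seed is passed.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical data: 200 two-dimensional points with no strong grouping,
# the situation in which the starting centroids matter most.
data_rng = np.random.default_rng(0)
points = whiten(data_rng.normal(size=(200, 2)))

def cluster_sizes(seed, k=5):
    """Seed NumPy's global generator, run k-means, return sorted cluster sizes."""
    np.random.seed(seed)              # accepts an int or a sequence of ints
    centers, _ = kmeans(points, k)    # may return fewer than k centroids
    labels, _ = vq(points, centers)   # assign each point to its nearest centroid
    return sorted(np.bincount(labels, minlength=len(centers)).tolist())

sizes_a = cluster_sizes([1000, 2000])
sizes_b = cluster_sizes([1, 2, 3])
print("Seed [1000, 2000]:", sizes_a)
print("Seed [1, 2, 3]:   ", sizes_b)
```

The sizes printed for this made-up data will not match the figures above. The point is only that the two seeds can produce different partitions of the same points.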

Impact of seeds: plots

Seed: np.array([1000, 2000])

Seed: np.array([1, 2, 3])

Uniform clusters in k-means

Uniform clusters in k-means: a comparison

K-means clustering with 3 clusters

Hierarchical clustering with 3 clusters
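
A minimal sketch of how the two methods could be compared on groups of unequal size. The data is hypothetical (one large, spread-out group and two small, tight ones), and the `complete` linkage method is an assumption, since the method behind the plot is not stated.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans, vq

# Hypothetical data: 200 spread-out points plus two tight groups of 20.
rng = np.random.default_rng(1)
big = rng.normal(loc=[0, 0], scale=2.0, size=(200, 2))
small_1 = rng.normal(loc=[6, 6], scale=0.4, size=(20, 2))
small_2 = rng.normal(loc=[6, -6], scale=0.4, size=(20, 2))
points = np.vstack([big, small_1, small_2])

# K-means with 3 clusters
np.random.seed(12)
centers, _ = kmeans(points, 3)
kmeans_labels, _ = vq(points, centers)
kmeans_sizes = sorted(np.bincount(kmeans_labels).tolist())

# Hierarchical clustering cut into at most 3 clusters
# (the linkage method is an assumption)
distance_matrix = linkage(points, method="complete", metric="euclidean")
hier_labels = fcluster(distance_matrix, 3, criterion="maxclust")
# fcluster labels start at 1, so drop the empty count for label 0
hier_sizes = sorted(np.bincount(hier_labels)[1:].tolist())

print("K-means cluster sizes:     ", kmeans_sizes)
print("Hierarchical cluster sizes:", hier_sizes)
```

Comparing the printed sizes with the true group sizes (200, 20, 20) gives a rough sense of how far each method splits the large group. A scatter plot colored by label shows the same thing more directly.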

Final thoughts

  • Each technique has its pros and cons
  • Consider your data size and patterns before deciding on an algorithm
  • Clustering is an exploratory phase of analysis

Next up: exercises
