Limitations of hierarchical clustering

Cluster Analysis in Python

Shaumik Daityari

Business Analyst

Measuring speed in hierarchical clustering

  • timeit module
  • Measure the speed of the linkage() function
  • Use randomly generated points
  • Repeat the measurement for increasing numbers of points to extrapolate the trend

Use of timeit module

from scipy.cluster.hierarchy import linkage
import pandas as pd
import random, timeit

points = 100
df = pd.DataFrame({'x': random.sample(range(0, points), points),
                   'y': random.sample(range(0, points), points)})
%timeit linkage(df[['x', 'y']], method = 'ward', metric = 'euclidean')
1.02 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
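
The %timeit line above is an IPython/Jupyter magic and will not run in a plain Python script. A minimal sketch of a comparable measurement using the timeit module directly is given here. The repeat and number values and the names runs and per_call are illustrative assumptions, not taken from the slides, and the timings will differ from machine to machine.

```python
import random
import timeit

import pandas as pd
from scipy.cluster.hierarchy import linkage

points = 100
df = pd.DataFrame({'x': random.sample(range(0, points), points),
                   'y': random.sample(range(0, points), points)})

# Assumed settings: 7 repeats of 100 calls each (illustrative values).
number = 100
runs = timeit.repeat(
    lambda: linkage(df[['x', 'y']], method='ward', metric='euclidean'),
    repeat=7,
    number=number,
)

# timeit.repeat returns the total time per repeat; divide to get time per call.
per_call = [total / number for total in runs]
print(f"best of {len(per_call)} runs: {min(per_call) * 1e3:.3f} ms per call")
```

Reporting the minimum rather than the mean is a common choice for timing, since slower runs usually reflect interference from other processes.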

Comparison of runtime of linkage method

  • Increasing runtime with data points
  • Quadratic increase of runtime
  • Not feasible for large datasets
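
The growth in runtime described above can be checked with a short experiment. This is a sketch under stated assumptions: the dataset sizes, the repeat counts, and the use of uniformly random NumPy points (instead of the random.sample data used earlier) are all illustrative choices, and the measured times depend on the machine.

```python
import timeit

import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)   # fixed seed so the data is reproducible
sizes = [100, 200, 400, 800]     # assumed sizes, doubling each step
number = 5                       # calls per timing run (assumed)

timings = {}
for n in sizes:
    data = rng.random((n, 2))    # n random 2-D points
    runs = timeit.repeat(
        lambda: linkage(data, method='ward', metric='euclidean'),
        repeat=3,
        number=number,
    )
    timings[n] = min(runs) / number   # best time per call, in seconds

for n in sizes:
    print(f"{n:>5} points: {timings[n] * 1e3:8.3f} ms per call")
```

If runtime grows roughly quadratically, doubling the number of points should multiply the time per call by about four, although small sizes are often dominated by fixed overhead.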

Next up - exercises
