Visualizing hierarchies

Unsupervised Learning in Python

Benjamin Wilson

Director of Research at lateral.io

Visualizations communicate insight

  • "t-SNE": creates a 2D map of a dataset (later)
  • "Hierarchical clustering" (this video)

A hierarchy of groups

  • Groups of living things can form a hierarchy
  • Clusters are contained in one another

 

Hierarchical tree of animals

Eurovision scoring dataset

  • Countries gave scores to songs performed at Eurovision 2016
  • 2D array of scores
  • Rows are countries, columns are songs

 

Eurovision data

1 https://www.eurovision.tv/page/results

Hierarchical clustering of voting countries

Eurovision hierarchical clustering

Hierarchical clustering

  • Every country begins in a separate cluster
  • At each step, the two closest clusters are merged
  • Continue until all countries are in a single cluster
  • This is "agglomerative" hierarchical clustering
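
The merge loop described above can be sketched in a few lines of NumPy. This is a naive illustration, not how SciPy implements it: the function name agglomerative_merges and the four toy points are made up for the example, and the "closest clusters" rule is assumed to be complete linkage (distance between the farthest pair of members), matching the method used later in these slides.

```python
import numpy as np

def agglomerative_merges(samples):
    """Return the merges as a list of (members_a, members_b, distance)."""
    samples = np.asarray(samples, dtype=float)
    # Pairwise Euclidean distances between all samples.
    diffs = samples[:, None, :] - samples[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # Every sample begins in its own cluster.
    clusters = [[i] for i in range(len(samples))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage (assumed): farthest pair of members.
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        merged = clusters[a] + clusters[b]
        # Delete b first (b > a) so that index a stays valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges

# Hypothetical toy data: two pairs of nearby points.
toy = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.5]]
for left, right, d in agglomerative_merges(toy):
    print(left, "+", right, "at distance", round(d, 3))
```

Each pass through the while loop performs one merge, so n samples always produce n - 1 merges.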

The dendrogram of a hierarchical clustering

  • Read from the bottom up
  • Vertical lines represent clusters

Eurovision hierarchical clustering
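
Reading "from the bottom up" corresponds to reading the rows of SciPy's linkage matrix in order: each row records one merge (the two cluster indices joined, the distance at which they join, and the size of the new cluster). The sketch below uses a made-up toy array, toy_samples, rather than the Eurovision data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: four points, two obvious pairs.
toy_samples = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.5]])

# Each row of the result is one merge: [index_a, index_b, distance, size].
toy_mergings = linkage(toy_samples, method='complete')

for step, (left, right, height, size) in enumerate(toy_mergings):
    print(f"step {step}: join {int(left)} and {int(right)} "
          f"at height {height:.2f} (new cluster has {int(size)} samples)")
```

Indices below the number of samples refer to original samples; larger indices refer to clusters created by earlier rows.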

One cluster of Eurovision hierarchical clustering

Dendrograms, step-by-step

One cluster of Eurovision hierarchical clustering

One cluster of Eurovision hierarchical clustering with Greece/Cyprus cluster highlighted

One cluster of Eurovision hierarchical clustering with Bulgaria/Greece/Cyprus cluster highlighted

One cluster of Eurovision hierarchical clustering with Moldova/Russia cluster highlighted

One cluster of Eurovision hierarchical clustering with Moldova/Russia/Armenia cluster highlighted

Merging of Greece/Cyprus/Bulgaria cluster with Moldova/Russia/Armenia cluster

Eurovision hierarchical clustering

Hierarchical clustering with SciPy

  • Given samples (the array of scores) and country_names (the country labels)
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Compute the merge hierarchy using complete linkage
mergings = linkage(samples, method='complete')
# Draw the dendrogram with one labelled leaf per country
dendrogram(mergings, labels=country_names, leaf_rotation=90, leaf_font_size=6)
plt.show()
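
The snippet above assumes samples and country_names already exist. To try the same calls without the Eurovision data, a self-contained sketch might look like the following. The score array is synthetic and the names country_0, country_1, ... are placeholders, not real results; no_plot=True is used so the dendrogram layout is computed without opening a figure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Synthetic stand-in for the scores: 6 "countries" x 4 "songs".
rng = np.random.default_rng(0)
samples = rng.integers(0, 13, size=(6, 4)).astype(float)
# Placeholder labels, one per row of samples.
country_names = [f"country_{i}" for i in range(len(samples))]

mergings = linkage(samples, method='complete')

# no_plot=True returns the layout as a dict instead of drawing it.
layout = dendrogram(mergings, labels=country_names, no_plot=True)

# 'ivl' holds the leaf labels in left-to-right order.
print(layout['ivl'])
```

Dropping no_plot=True and calling plt.show() afterwards, as in the slide, would draw the same tree with matplotlib.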

Let's practice!
