Introduction to hierarchical clustering

Unsupervised Learning in R

Hank Roark

Senior Data Scientist at Boeing

Hierarchical clustering

  • Number of clusters is not known ahead of time
  • Two kinds: bottom-up and top-down; this course covers bottom-up

Simple example

five points

Five clusters

each point is a cluster

Four clusters

four clusters, one cluster with two points

Three clusters

three clusters, two clusters of two points each and one cluster of one point

Two clusters

two clusters, one cluster of 3 points and one cluster of 2 points

One cluster

one cluster grouping all points
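
The five-to-one progression above can be reproduced in R. This is a minimal sketch: the coordinates are made up for illustration (the slides do not give the actual points), chosen so that the merges happen in the same pattern. It relies on the base R functions dist(), hclust(), and cutree().

```r
# Five illustrative 2-D points (hypothetical coordinates, not from the slides)
x <- rbind(
  A = c(0.0, 0.0),
  B = c(0.5, 0.0),
  C = c(5.0, 5.0),
  D = c(5.0, 6.0),
  E = c(1.0, 2.0)
)

# Bottom-up clustering; hclust() uses complete linkage by default
hc <- hclust(dist(x))

# Cluster membership of each point when the tree is cut into 1 to 5 clusters
assignments <- cutree(hc, k = 1:5)
print(assignments)

# Cluster sizes at each level, from five clusters down to one
for (k in 5:1) {
  sizes <- as.vector(table(cutree(hc, k = k)))
  cat(k, "cluster(s), sizes:", sort(sizes, decreasing = TRUE), "\n")
}
```

With these points, A and B merge first, then C and D, then E joins A and B, and the last merge combines everything into one cluster.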

Hierarchical clustering in R

# Calculates similarity as Euclidean distance 
# between observations
dist_matrix <- dist(x)

# Returns hierarchical clustering model
hclust(d = dist_matrix)

Call:
hclust(d = dist_matrix)

Cluster method   : complete 
Distance         : euclidean 
Number of objects: 50
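
To run the two calls above end to end, some data is needed. The sketch below assumes x is a numeric matrix with 50 rows, matching the "Number of objects: 50" line; the random data and the name hclust.out are illustrative and not taken from the slides.

```r
# Synthetic stand-in for x: 50 observations, 2 features (an assumption)
set.seed(1)
x <- matrix(rnorm(50 * 2), ncol = 2)

# Pairwise Euclidean distances between observations
dist_matrix <- dist(x)

# Fit the hierarchical clustering model (complete linkage is the default)
hclust.out <- hclust(d = dist_matrix)

# Printing the model shows the call, method, distance, and object count
print(hclust.out)

# The same information is available as components of the fitted object
hclust.out$method        # linkage method
hclust.out$dist.method   # distance measure recorded by dist()
nrow(hclust.out$merge)   # number of merges: one fewer than the observations
```

The merge component has one row per merge step, so 50 observations yield 49 merges.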

Let's practice!
