Selecting number of clusters

Unsupervised Learning in R

Hank Roark

Senior Data Scientist at Boeing

Interpreting results

# Create hierarchical cluster model: hclust.out
hclust.out <- hclust(dist(x))

# Inspect the result
summary(hclust.out)
            Length Class  Mode     
merge       98     -none- numeric  
height      49     -none- numeric  
order       50     -none- numeric  
labels       0     -none- NULL     
method       1     -none- character
call         2     -none- call     
dist.method  1     -none- character
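
The lengths in the summary follow from the number of observations. The sketch below assumes x is a 50-by-2 numeric matrix; the slide does not show how x was created, so the simulated data is hypothetical. With n observations there are n - 1 merge steps, which gives a merge matrix with 2 * (n - 1) entries and n - 1 heights.

```r
# Hypothetical data: 50 observations, 2 features (x is not defined on the slide)
set.seed(1)
x <- matrix(rnorm(100), ncol = 2)

hclust.out <- hclust(dist(x))
n <- nrow(x)

dim(hclust.out$merge)      # (n - 1) rows, 2 columns: one row per merge step
length(hclust.out$height)  # n - 1: the height of each merge
length(hclust.out$order)   # n: left-to-right order of the leaves when plotted
hclust.out$method          # linkage method; "complete" is the hclust() default
hclust.out$dist.method     # distance metric; "euclidean" is the dist() default
```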

Dendrogram

  • Tree shaped structure used to interpret hierarchical clustering models

five observations plotted and a dendrogram

two cluster points are joined in the dendrogram

two other cluster points are joined in the dendrogram

two clusters and a point are joined in the dendrogram

all clusters and points are joined together in the dendrogram
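
The same build-up can be traced in the fitted object. This is a minimal sketch with five made-up 2-D observations (the coordinates are illustrative and are not the points drawn on the slide). Each row of merge records one join, and height records the distance at which it happened.

```r
# Five hypothetical observations, one per row
pts <- matrix(c(0,   0,
                0,   1,
                5,   5,
                5,   6.5,
                2.5, 9),
              ncol = 2, byrow = TRUE)

hc <- hclust(dist(pts))

# One row per join. Negative entries are single observations;
# positive entries refer to the cluster created in that earlier row.
hc$merge

# Linkage distance of each join, in the order the joins were made
hc$height

# The dendrogram draws each join at its height
plot(hc)
```

Reading merge from top to bottom reproduces the sequence in the pictures: pairs of nearby points join first, then clusters absorb further points or clusters until one tree remains.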

Dendrogram plotting in R

# Draws a dendrogram
plot(hclust.out)

abline(h = 6, col = "red")

horizontal bar showing how many clusters we want on the dendrogram
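
The number of clusters implied by a horizontal line can also be computed rather than read off the plot. The sketch below assumes simulated data and a cut height h = 2 picked for that data, not the slide's x or its h = 6. A cut at height h leaves one cluster plus one more for every merge that happens above h.

```r
# Hypothetical data (x is not defined on the slide)
set.seed(1)
x <- matrix(rnorm(100), ncol = 2)
hclust.out <- hclust(dist(x))

h <- 2  # illustrative cut height for this simulated data

# Dendrogram with the cut line drawn on top
plot(hclust.out)
abline(h = h, col = "red")

# Every merge above the line is "undone" by the cut,
# so the cluster count is the number of such merges plus one
n.clusters <- sum(hclust.out$height > h) + 1
n.clusters
```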

Tree "cutting" in R

# Cut by height h
cutree(hclust.out, h = 6)
1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3
3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 2 4 2 4 4
# Cut by number of clusters k
cutree(hclust.out, k = 2)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1
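
Cutting by h and cutting by k are two ways of asking for the same thing. As a sketch on simulated data (hypothetical, since x is not defined on the slides), any height that falls between the two merges that separate 3 clusters from 2 should give the same assignment as k = 3.

```r
# Hypothetical data (x is not defined on the slide)
set.seed(1)
x <- matrix(rnorm(100), ncol = 2)
hclust.out <- hclust(dist(x))
n <- nrow(x)

# Cut by number of clusters and look at the cluster sizes
clusters.k <- cutree(hclust.out, k = 3)
table(clusters.k)

# Heights are stored in merge order. A cut between merge n - 3 and
# merge n - 2 leaves exactly two merges above the line, so 3 clusters.
h.mid <- mean(hclust.out$height[c(n - 3, n - 2)])
clusters.h <- cutree(hclust.out, h = h.mid)

# Compare the two assignments element by element
all(clusters.k == clusters.h)
```

The returned vector has one entry per observation, in the original row order of x, so it can be used directly for coloring a scatter plot or tabulating against known labels.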

Let's practice!
