PCA review and next steps

Unsupervised Learning in R

Hank Roark

Senior Data Scientist at Boeing

Review thus far

  • Downloaded data and prepared it for modeling
  • Exploratory data analysis
  • Performed principal component analysis
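The PCA step reviewed above can be sketched as follows. This is a minimal illustration, not the course's own code: the built-in USArrests data set is an assumption standing in for the course data, which is not shown on these slides.

```r
# Stand-in data (assumption): USArrests ships with base R
x <- as.matrix(USArrests)

# PCA on centered and scaled features
pr <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
pve <- pr$sdev^2 / sum(pr$sdev^2)

# Cumulative proportion of variance explained
cumsum(pve)
```

The cumulative proportion is one common way to decide how many components to carry forward into later steps.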

Next steps

  • Complete hierarchical clustering
  • Complete k-means clustering
  • Combine PCA and clustering
  • Contrast results of hierarchical clustering with diagnosis
  • Compare hierarchical and k-means clustering results
  • PCA as a pre-processing step for clustering
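One possible shape for these steps is sketched below, under stated assumptions: USArrests stands in for the course data, and keeping two principal components and three clusters are arbitrary choices made only for illustration.

```r
set.seed(1)  # k-means starts from random centers

# Stand-in data (assumption), scaled so features are comparable
x <- scale(as.matrix(USArrests))

# PCA as a pre-processing step: keep the first two components
pr <- prcomp(x)
scores <- pr$x[, 1:2]

# Hierarchical clustering on the PCA scores, cut into 3 clusters
hc_clusters <- cutree(hclust(dist(scores)), k = 3)

# k-means on the same scores with the same number of clusters
km <- kmeans(scores, centers = 3, nstart = 20)

# Cross-tabulate the two sets of cluster assignments
table(hc_clusters, km$cluster)
```

The same table() call could be used to contrast cluster assignments with a known label vector such as the diagnosis, if one is available.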

Review: hierarchical clustering in R

# Calculates dissimilarity as Euclidean distance between observations
s <- dist(x)

# Returns hierarchical clustering model
hclust(s)
Call:
hclust(d = s)

Cluster method   : complete 
Distance         : euclidean 
Number of objects: 50 
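To turn a hierarchical clustering model into cluster assignments, the tree is typically cut at a chosen number of clusters. A minimal sketch, assuming USArrests as stand-in data and k = 4 as an arbitrary illustrative choice:

```r
# Stand-in data (assumption), scaled before computing distances
x <- scale(as.matrix(USArrests))

# Complete linkage is the default method for hclust()
hc <- hclust(dist(x), method = "complete")

# Cut the dendrogram into 4 clusters (k chosen for illustration only)
clusters <- cutree(hc, k = 4)

# Number of observations in each cluster
table(clusters)
```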

Review: k-means in R

# k-means algorithm with 5 centers, run 20 times
kmeans(x, centers = 5, nstart = 20)

  • One observation per row, one feature per column
  • k-means has a random component
  • Run the algorithm multiple times (nstart) to improve the odds of finding the best model
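The effect of the random component can be inspected by comparing a single random start with the best of several. This is a sketch under assumptions: USArrests stands in for the course data, and the seed value is arbitrary.

```r
set.seed(42)  # arbitrary seed so the random starts are reproducible

# Stand-in data (assumption): one observation per row, one feature per column
x <- scale(as.matrix(USArrests))

# A single random start versus the best of 20 random starts
km_one  <- kmeans(x, centers = 5, nstart = 1)
km_many <- kmeans(x, centers = 5, nstart = 20)

# Total within-cluster sum of squares; lower indicates tighter clusters
c(one = km_one$tot.withinss, many = km_many$tot.withinss)
```

With nstart greater than 1, kmeans() keeps the run with the lowest total within-cluster sum of squares, which is why multiple starts usually give a result at least as good as a typical single start.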

Let's practice!
