Introduction to the case study

Unsupervised Learning in R

Hank Roark

Senior Data Scientist at Boeing

Objectives

  • Complete an analysis using unsupervised learning
  • Reinforce what you've already learned
  • Add steps not covered before (e.g., preparing data, selecting good features for supervised learning)
  • Emphasize creativity

Example use case

  • Human breast mass data:
    • Ten features measured for each cell nucleus
    • Summary information is provided for each group of cells
    • Includes diagnosis: benign (not cancerous) and malignant (cancerous)
Source: K. P. Bennett and O. L. Mangasarian, "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets"

Analysis

  • Download data and prepare data for modeling
  • Exploratory data analysis (# observations, # features, etc.)
  • Perform PCA and interpret results
  • Complete two types of clustering
  • Understand and compare the two types
  • Combine PCA and clustering
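The analysis steps above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not the course's solution: the breast mass data is not loaded here, so the built-in `iris` measurements stand in for the feature matrix and `iris$Species` stands in for the diagnosis label. Scaling the features, complete linkage, k = 3 clusters, and keeping two principal components are illustrative choices, not values taken from the case study.

```r
# Stand-in data (assumption): iris measurements play the role of the breast
# mass features, and Species plays the role of the diagnosis label
data.mat <- as.matrix(iris[, 1:4])
labels <- iris$Species

# Exploratory data analysis: number of observations/features, per-feature scale
dim(data.mat)
colMeans(data.mat)
apply(data.mat, 2, sd)

# PCA on centered and scaled features (scaling is an assumption here)
pr.out <- prcomp(data.mat, center = TRUE, scale. = TRUE)
summary(pr.out)

# Clustering type 1: hierarchical clustering (complete linkage) on scaled data
data.scaled <- scale(data.mat)
hclust.out <- hclust(dist(data.scaled), method = "complete")
hclust.clusters <- cutree(hclust.out, k = 3)

# Clustering type 2: k-means on the same scaled data; nstart > 1 reruns the
# algorithm from several random starts and keeps the best fit
set.seed(1)
km.out <- kmeans(data.scaled, centers = 3, nstart = 20)

# Compare each clustering with the held-out labels and with each other
table(hclust.clusters, labels)
table(km.out$cluster, labels)
table(hclust.clusters, km.out$cluster)

# Combine PCA and clustering: cluster on the first two principal component scores
pc.scores <- pr.out$x[, 1:2]
hclust.pr <- hclust(dist(pc.scores), method = "complete")
hclust.pr.clusters <- cutree(hclust.pr, k = 3)
table(hclust.pr.clusters, labels)
```

The labels are never passed to `prcomp`, `hclust`, or `kmeans`; they are used only afterward in `table()` to judge how well the unsupervised groupings line up with the known classes.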

Review: PCA in R

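pr.iris <- prcomp(x = iris[-5],    # drop column 5 (Species), keep the four numeric measurements
                  scale. = FALSE,  # do not rescale features to unit variance
                  center = TRUE)   # subtract each feature's mean before rotating
summary(pr.iris)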
Importance of components:
                          PC1     PC2    PC3     PC4
Standard deviation     2.0563 0.49262 0.2797 0.15439
Proportion of Variance 0.9246 0.05307 0.0171 0.00521
Cumulative Proportion  0.9246 0.97769 0.9948 1.00000
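Each entry in the "Proportion of Variance" row is that component's variance (the square of its standard deviation, stored in `pr.iris$sdev`) divided by the total variance across all components, and "Cumulative Proportion" is the running sum. A minimal sketch that recomputes both rows; it refits `pr.iris` so that it runs on its own, and the names `pr.var`, `pve`, and `cum.pve` are illustrative.

```r
# Refit the PCA from the slide so this block is self-contained
pr.iris <- prcomp(x = iris[-5], scale. = FALSE, center = TRUE)

# Variance explained by each principal component
pr.var <- pr.iris$sdev^2

# Proportion of variance explained (PVE) and its running total
pve <- pr.var / sum(pr.var)
cum.pve <- cumsum(pve)

# summary() reports these two rows rounded to 5 decimal places
round(pve, 5)
round(cum.pve, 5)
```

Here the first component alone accounts for about 92% of the variance, which is why a single component summarizes the unscaled iris measurements well.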

Unsupervised learning is open-ended

  • Steps in this use case are only one example of what can be done
  • There are other approaches to analyzing this dataset

Let's practice!
