Introduction to k-means clustering

Unsupervised Learning in R

Hank Roark

Senior Data Scientist at Boeing

k-means clustering algorithm

  • First of two clustering algorithms covered in this course
  • Breaks observations into a pre-defined number of clusters

[Figure: observations split into two groups]

k-means in R

# k-means algorithm with 5 centers, run 20 times
kmeans(x, centers = 5, nstart = 20)

  • x: one observation per row, one feature per column
  • k-means has a random component (the initial cluster assignment)
  • nstart: run the algorithm multiple times to improve the odds of finding the best model
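
The effect of the random component can be seen in a minimal sketch like the one below. The data matrix `x`, the seed, and the names `km_single` and `km_multi` are made up for illustration and are not part of the course material. The sketch compares a single random start with 20 random starts, using the total within-cluster sum of squares (`tot.withinss`) that `kmeans()` reports as its measure of fit.

```r
# Hypothetical data: 300 observations (rows), 2 features (columns)
set.seed(1)
x <- matrix(rnorm(600), ncol = 2)

# One random start
set.seed(42)
km_single <- kmeans(x, centers = 5, nstart = 1)

# Twenty random starts; kmeans() keeps the run with the
# lowest total within-cluster sum of squares
set.seed(42)
km_multi <- kmeans(x, centers = 5, nstart = 20)

# Lower is better; the multi-start fit is typically no worse
km_single$tot.withinss
km_multi$tot.withinss

# Cluster assignment (1 to 5) for the first few rows of x
head(km_multi$cluster)
```

Other useful components of the returned object include `centers` (the cluster means) and `size` (the number of observations in each cluster).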

First exercises

  • First exercise uses synthetic data
  • Synthetic data generated from 3 subgroups
  • Selecting the best number of subgroups for k-means
  • Example with more fun data later in the chapter
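
One common way to select the number of subgroups is a scree ("elbow") plot of the total within-cluster sum of squares against the number of clusters. The sketch below assumes made-up data: the three subgroup centers, the sample sizes, the seed, and the range of 1 to 8 clusters are illustrative choices, not the data used in the exercises.

```r
set.seed(2)

# Hypothetical synthetic data: 3 subgroups of 50 points each, 2 features
group_centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
x <- do.call(rbind, lapply(1:3, function(g) {
  cbind(rnorm(50, mean = group_centers[g, 1]),
        rnorm(50, mean = group_centers[g, 2]))
}))

# Total within-cluster sum of squares for k = 1, ..., 8
wss <- sapply(1:8, function(k) {
  kmeans(x, centers = k, nstart = 20)$tot.withinss
})

# Scree plot: look for the "elbow" where the curve flattens out
plot(1:8, wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Total within-cluster sum of squares")
```

With well-separated subgroups like these, the curve would be expected to drop steeply up to 3 clusters and flatten afterwards; on real data the elbow is often less clear.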

Let's practice!