Clustering

Understanding Data Science

Lis Sulmont

Curriculum Manager, DataCamp

What is clustering?

clustering.jpg

  • Divide data into categories
  • Use cases
    • Customer segmentation
    • Image segmentation
    • Anomaly detection

Supervised Machine Learning

supervised-learning.jpg

Unsupervised Machine Learning

unsupervised-learning.jpg


Case study: discovering new species

Two flowers


Defining features

  • Flower colors
  • Petal length and width
  • Sepal length and width
  • Number of petals

Flowers with features outlined
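
Before any clustering can happen, each flower has to be written down as a row of numbers. The sketch below shows one possible way to do that for the features listed above. The `Flower` class, the color vocabulary, and every measurement are made up for illustration; none of them come from the case study.

```python
from dataclasses import dataclass

# Hypothetical color vocabulary, used only for this illustration.
COLORS = ("white", "yellow", "purple")


@dataclass
class Flower:
    color: str
    petal_length_cm: float
    petal_width_cm: float
    sepal_length_cm: float
    sepal_width_cm: float
    petal_count: int


def to_feature_vector(flower):
    """Turn one observation into a flat list of numbers."""
    # One-hot encode the color so that no color counts as "larger" than another.
    color_flags = [1.0 if flower.color == c else 0.0 for c in COLORS]
    return color_flags + [
        flower.petal_length_cm,
        flower.petal_width_cm,
        flower.sepal_length_cm,
        flower.sepal_width_cm,
        float(flower.petal_count),
    ]


# Made-up measurements, not real observations.
observations = [
    Flower("purple", 1.4, 0.2, 5.1, 3.5, 5),
    Flower("yellow", 4.7, 1.4, 7.0, 3.2, 6),
]
feature_matrix = [to_feature_vector(f) for f in observations]
```

Features measured on very different scales can dominate a distance-based comparison, so in practice they are often rescaled before clustering.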


Defining number of clusters

Flower observation data


Comparing number of clusters

Two clusters:

Two clusters

Three clusters:

Three clusters


Comparing number of clusters

Four clusters:

Four clusters

Eight clusters:

Eight clusters
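
The slides do not say which algorithm produced these groupings. As a rough sketch, assuming k-means (a common choice, swapped in here for illustration), the same observations can be clustered with several values of k and the results compared. The data points are made-up petal measurements, and the farthest-point starting rule is a simplification chosen to keep the example deterministic.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) with a deterministic start."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from every center chosen so far (an illustrative choice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))

    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


def inertia(points, labels, centers):
    """Sum of squared distances from each point to its cluster center."""
    return sum(dist(p, centers[label]) ** 2 for p, label in zip(points, labels))


# Made-up (petal length, petal width) pairs in cm, forming three loose groups.
points = [
    (1.4, 0.2), (1.5, 0.2), (1.3, 0.3),
    (4.5, 1.5), (4.7, 1.4), (4.4, 1.3),
    (6.0, 2.5), (5.9, 2.1), (6.3, 2.4),
]

results = {k: kmeans(points, k) for k in (1, 2, 3, 4)}
inertias = {k: inertia(points, *results[k]) for k in results}

for k in sorted(inertias):
    print(f"k={k}: within-cluster sum of squares = {inertias[k]:.3f}")
```

On this toy data the within-cluster sum of squares keeps shrinking as k grows, so the number alone cannot pick k. That is one reason the final choice still depends on judgment.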


Comparing number of clusters

  • It is up to you to decide on the final number of clusters
  • Use domain knowledge to help decide

Clustering review

Definition

  • Divide an unlabeled dataset into different categories

Steps

  • Select features
  • Select number of clusters
  • Use clusters to solve problems
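
As a small illustration of the last step, the sketch below assumes the clusters are already summarized by their centers and uses them to place a new observation, or to flag it as unusual when it sits far from every center. The center coordinates, the cluster names, and the `threshold` value are hypothetical.

```python
from math import dist

# Hypothetical cluster centers as (petal length, petal width) in cm.
centers = {
    "cluster A": (1.4, 0.2),
    "cluster B": (4.5, 1.4),
    "cluster C": (6.1, 2.3),
}


def nearest_cluster(point, centers):
    """Return the name of the closest center and the distance to it."""
    name = min(centers, key=lambda n: dist(point, centers[n]))
    return name, dist(point, centers[name])


def is_unusual(point, centers, threshold=1.0):
    """Flag a point that is farther than `threshold` from every center."""
    _, distance = nearest_cluster(point, centers)
    return distance > threshold


new_flower = (4.6, 1.5)
print(nearest_cluster(new_flower, centers)[0], is_unusual(new_flower, centers))
```

A flower that matches no existing cluster could be a measurement error or, in the spirit of the case study, a candidate for a new species. Deciding which is again a job for domain knowledge.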

Let's practice!
