Introduction to model-based clustering

Mixture Models in R

Victor Medina

Researcher at The University of Edinburgh

What is clustering?

The procedure of partitioning a set of observations into a set of meaningful subclasses

 $\to$ Helps to explore and understand the natural structure in a dataset

Applications of clustering

  • Medicine
    • Ex. In medical imaging to distinguish between different types of tissue
  • Business
    • Ex. To discover distinctive groups of customers to develop targeted marketing programs
  • Social Sciences
    • Ex. To identify zones in a city by the types of crimes committed, so that law enforcement resources can be managed more effectively

Clustering methods

  • Partitioning techniques
    • Find the cluster centers among the observations and assign each observation to the cluster with the closest center. Ex. k-means
  • Hierarchical techniques
    • Connect the observations based on their similarity to form clusters. Ex. Hierarchical clustering
  • Model-based methods
    • Use probability distributions to create the clusters. Ex. Mixture models
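
The first two families above can be tried with base R alone. The sketch below is only an illustration: the data are simulated stand-ins (not the course dataset), and the choices of two clusters and complete linkage are assumptions made for the example.

```r
# Simulated stand-in data: two groups of 2-D points (not the course data).
set.seed(42)
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.5), ncol = 2)
)

# Partitioning: k-means with two centers and several random restarts.
km <- kmeans(x, centers = 2, nstart = 10)

# Hierarchical: build a complete-linkage tree and cut it into two clusters.
hc <- cutree(hclust(dist(x), method = "complete"), k = 2)

# Cross-tabulate the two partitions to see how far they agree.
table(kmeans = km$cluster, hierarchical = hc)
```

Both methods return one hard label per observation; neither says how confident the assignment is.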

Gender dataset

gender <- read.csv("gender.csv")
head(gender)
    Height   Weight      BMI
1 73.84702 241.8936 31.18576
2 68.78190 162.3105 24.12104
3 74.11011 212.7409 27.23291
4 71.73098 220.0425 30.06706
5 69.88180 206.3498 29.70803
6 67.25302 152.2122 23.66049

Gender dataset: Can you guess the gender?

library(ggplot2)
ggplot(gender, aes(x = Weight, y = BMI)) + geom_point()

Under traditional clustering approaches

Model-based clustering
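
To make the idea of clustering with probability distributions concrete, here is a minimal sketch of a two-component Gaussian mixture fitted with a hand-written EM loop in base R. It is an illustration under stated assumptions, not necessarily the approach used later in the course: the data are simulated stand-ins for a weight variable, and the starting values and the fixed iteration count are arbitrary choices.

```r
# Simulated stand-in data: values drawn from two Gaussian groups (hypothetical parameters).
set.seed(1)
w <- c(rnorm(200, mean = 135, sd = 15), rnorm(200, mean = 185, sd = 15))

# Initial guesses for the mixing proportion, means, and standard deviations.
p <- 0.5
mu <- c(min(w), max(w))
sigma <- rep(sd(w), 2)

for (iter in 1:200) {
  # E-step: posterior probability that each observation belongs to component 1.
  d1 <- p * dnorm(w, mu[1], sigma[1])
  d2 <- (1 - p) * dnorm(w, mu[2], sigma[2])
  post1 <- d1 / (d1 + d2)

  # M-step: update the parameters using probability-weighted estimates.
  p <- mean(post1)
  mu <- c(weighted.mean(w, post1), weighted.mean(w, 1 - post1))
  sigma <- c(
    sqrt(sum(post1 * (w - mu[1])^2) / sum(post1)),
    sqrt(sum((1 - post1) * (w - mu[2])^2) / sum(1 - post1))
  )
}

# Turn the soft assignments into hard labels: pick the more probable component.
cluster <- ifelse(post1 > 0.5, 1, 2)
round(c(proportion = p, mean1 = mu[1], mean2 = mu[2]), 2)
```

Unlike the hard labels of k-means, `post1` holds a probability for every observation, so points near the boundary between the two groups can be identified as uncertain.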

Let's practice!
