Univariate Gaussian Mixture Models

Mixture Models in R

Victor Medina

Researcher at The University of Edinburgh

gender %>% head()
  Gender   Height   Weight        BMI
1   Male 73.84702 241.8936 0.04435662
2   Male 68.78190 162.3105 0.03430822
3   Male 74.11011 212.7409 0.03873433
4   Male 71.73098 220.0425 0.04276545
5   Male 69.88180 206.3498 0.04225479
6   Male 67.25302 152.2122 0.03365316
gender %>% select(-Gender) %>% head()
    Height   Weight        BMI
1 73.84702 241.8936 0.04435662
2 68.78190 162.3105 0.03430822
3 74.11011 212.7409 0.03873433
4 71.73098 220.0425 0.04276545
5 69.88180 206.3498 0.04225479
6 67.25302 152.2122 0.03365316

Modeling with Mixture Models

  1. Which probability distribution is suitable?
  2. How many sub-populations should we consider?
  3. What are the parameters, and how do we estimate them?
head(gender %>% select(-Gender))
    Height   Weight        BMI
1 73.84702 241.8936 0.04435662
2 68.78190 162.3105 0.03430822
3 74.11011 212.7409 0.03873433
4 71.73098 220.0425 0.04276545
5 69.88180 206.3498 0.04225479
6 67.25302 152.2122 0.03365316
head(gender %>% select(Weight))
    Weight
1 241.8936
2 162.3105
3 212.7409
4 220.0425
5 206.3498
6 152.2122
gender %>% 
  ggplot(aes(x = Weight)) + geom_histogram(bins = 100)


Which distribution?

Figure panels: Histogram; Gaussian distributions


How many clusters?


Which parameters and how to estimate them?

Which parameters?

  • Two means
  • Two standard deviations
  • Two proportions
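
The six parameters listed above fully determine a two-component Gaussian mixture density. The following is a minimal sketch in base R; the parameter values and the helper name `mixture_density` are hypothetical and chosen only for illustration, not estimated from the `gender` data.

```r
# Hypothetical parameter values (illustration only, not fitted estimates)
means <- c(135, 187)   # two means
sds   <- c(19, 20)     # two standard deviations
props <- c(0.5, 0.5)   # two proportions; they must sum to 1

# Density of a two-component univariate Gaussian mixture:
# a proportion-weighted sum of two normal densities
mixture_density <- function(x, means, sds, props) {
  props[1] * dnorm(x, mean = means[1], sd = sds[1]) +
    props[2] * dnorm(x, mean = means[2], sd = sds[2])
}

# Sanity check: the mixture density integrates to approximately 1
# over a range that covers essentially all of its mass
total <- integrate(mixture_density, lower = 0, upper = 400,
                   means = means, sds = sds, props = props)$value
```

Because the proportions sum to 1 and each component is itself a density, the weighted sum is again a valid density.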

How to estimate them?

  • The EM algorithm, as implemented in the flexmix package
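
To give a feel for what the EM algorithm does, here is a hand-written sketch in base R for two Gaussian components. It is not flexmix's implementation. The data are simulated, and the starting values and the fixed iteration count are assumptions made for illustration.

```r
set.seed(1)
# Simulated weights standing in for the real data (hypothetical values)
x <- c(rnorm(500, mean = 135, sd = 19), rnorm(500, mean = 187, sd = 20))

# Initial guesses for the two components (assumed starting values)
mu    <- c(120, 200)
sigma <- c(25, 25)
p     <- c(0.5, 0.5)

for (iter in 1:500) {
  # E-step: posterior probability that each observation
  # belongs to component 1 (r1) or component 2 (r2)
  d1 <- p[1] * dnorm(x, mean = mu[1], sd = sigma[1])
  d2 <- p[2] * dnorm(x, mean = mu[2], sd = sigma[2])
  r1 <- d1 / (d1 + d2)
  r2 <- 1 - r1

  # M-step: update proportions, means, and standard deviations
  # using the posterior probabilities as weights
  p     <- c(mean(r1), mean(r2))
  mu    <- c(sum(r1 * x) / sum(r1), sum(r2 * x) / sum(r2))
  sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
             sqrt(sum(r2 * (x - mu[2])^2) / sum(r2)))
}

estimates <- data.frame(component = 1:2, mean = mu, sd = sigma, proportion = p)
```

With flexmix, a comparable fit would probably look like `flexmix(Weight ~ 1, data = gender, k = 2, model = FLXMCnorm1())`, with `parameters()` and `prior()` used to read off the estimates. Treat that call as an assumption about how the course proceeds.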


Let's practice!
