Data preprocessing

Customer Segmentation in Python

Karolis Urbonas

Head of Data Science, Amazon

Advantages of k-means clustering

  • One of the most popular unsupervised learning methods
  • Simple and fast
  • Works well*

* provided the data meets certain assumptions
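To make "simple and fast" concrete, here is a minimal sketch of Lloyd's algorithm, the standard k-means procedure, in pure Python. The function name `kmeans`, its parameters and the toy points are hypothetical illustrations. In practice you would use a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct input points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each iteration only computes distances and means, which is why k-means scales well. Note that the result depends on the initial centroids, which is why library implementations typically run several initialisations.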

Key k-means assumptions

  • Symmetric distribution of variables (not skewed)
  • Variables with the same mean
  • Variables with the same variance
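The equal-mean and equal-variance assumptions matter because k-means minimises squared Euclidean distance, so a variable with a much wider spread dominates the distance. The sketch below, with a hypothetical helper `squared_distance_contributions` and made-up customers, shows how much each variable contributes to the squared distance between two customers.

```python
def squared_distance_contributions(p, q):
    """Share of the squared Euclidean distance contributed by each variable."""
    sq = [(x - y) ** 2 for x, y in zip(p, q)]
    total = sum(sq)
    return [s / total for s in sq]


# Hypothetical customers: (recency in days, monetary value in dollars)
alice = (5, 120.0)
bob = (60, 4500.0)
shares = squared_distance_contributions(alice, bob)
print(shares)
```

Monetary value accounts for almost all of the distance here, so on raw data the clusters would be driven almost entirely by that one variable.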

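Skewed variables

  • Left-skewed: a long tail on the left, with most values bunched at the high end

  • Right-skewed: a long tail on the right, with most values bunched at the low end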
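The direction of skew can be measured with the Fisher-Pearson skewness coefficient, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. It is positive for right-skewed data and negative for left-skewed data. This is a minimal pure-Python sketch (the same biased form that `scipy.stats.skew` computes by default). The sample values are hypothetical.

```python
def skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


right_skewed = [1, 1, 2, 2, 3, 4, 8, 20, 55]  # long tail to the right
left_skewed = [-x for x in right_skewed]      # mirror image: long tail to the left
print(skewness(right_skewed) > 0)  # True
print(skewness(left_skewed) < 0)   # True
```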

Skewed variables

  • Right skew can be reduced with a logarithmic transformation (defined only for positive values)
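A quick way to check the effect is to compare skewness before and after taking logs. This sketch reuses the same skewness formula as above on hypothetical purchase counts. `math.log` fails on zero or negative values, so real RFM columns containing them would need extra handling first, for example shifting the values to be strictly positive.

```python
import math


def skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data, e.g. purchase counts per customer
frequency = [1, 1, 2, 2, 3, 4, 8, 20, 55]
log_frequency = [math.log(x) for x in frequency]  # requires every x > 0

raw_skew = skewness(frequency)
log_skew = skewness(log_frequency)
print(round(raw_skew, 2), round(log_skew, 2))
```

The log compresses large values much more than small ones, which pulls in the long right tail and brings the coefficient much closer to zero.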

Variables on the same scale

  • K-means assumes the variables have equal means
  • and equal variances
  • This is not the case with RFM data:
datamart_rfm.describe()
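The mismatch that `describe()` reveals can be reproduced, and corrected, without pandas. This is a sketch under assumptions: the `rfm` dictionary below holds made-up values (not the course's `datamart_rfm`), and `summarize` and `standardize` are hypothetical helpers. Standardising each variable to mean 0 and standard deviation 1 is one common way to satisfy the equal-mean and equal-variance assumptions. scikit-learn's `StandardScaler` does the same job on a DataFrame.

```python
import statistics

# Hypothetical RFM values for five customers
rfm = {
    "Recency": [3, 15, 40, 90, 200],
    "Frequency": [25, 12, 6, 3, 1],
    "MonetaryValue": [2500.0, 900.0, 310.0, 120.0, 40.0],
}


def summarize(columns):
    """Mean and sample standard deviation per variable (part of what describe() reports)."""
    return {name: (statistics.mean(v), statistics.stdev(v)) for name, v in columns.items()}


def standardize(columns):
    """Centre each variable at mean 0 and scale it to standard deviation 1."""
    out = {}
    for name, v in columns.items():
        mu, sd = statistics.mean(v), statistics.stdev(v)
        out[name] = [(x - mu) / sd for x in v]
    return out


for name, (mu, sd) in summarize(rfm).items():
    print(f"{name}: mean={mu:.1f}, std={sd:.1f}")

rfm_scaled = standardize(rfm)
```

Because the log transform changes both the mean and the variance, a common order is to reduce skew first and standardise afterwards.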

Let's review the concepts
