Choosing the number of clusters

Customer Segmentation in Python

Karolis Urbonas

Head of Data Science, Amazon

Methods

  • Visual methods - elbow criterion
  • Mathematical methods - silhouette coefficient
  • Experimentation and interpretation
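
The silhouette coefficient listed above can be computed with scikit-learn's `silhouette_score`. The sketch below is illustrative only: the three synthetic blobs, the range of k, and the variable names are assumptions rather than the course dataset. Scores lie between -1 and 1, and higher values indicate better-separated clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (an assumption): three well-separated 2-D blobs
rng = np.random.RandomState(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
data = np.vstack([c + 0.5 * rng.randn(50, 2) for c in centers])

# The silhouette coefficient is undefined for k=1, so start at k=2
silhouette = {}
for k in range(2, 7):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=1)
    labels = kmeans.fit_predict(data)
    silhouette[k] = silhouette_score(data, labels)

# Candidate k with the highest average silhouette score
best_k = max(silhouette, key=silhouette.get)
print(silhouette, best_k)
```

As with the elbow plot, the highest-scoring k is a suggestion to test against interpretation, not a final answer.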

Elbow criterion method

  • Plot the number of clusters against the within-cluster sum of squared errors (SSE): the sum of squared distances from each data point to its cluster center
  • Identify an "elbow" in the plot
  • Elbow - a point representing an "optimal" number of clusters
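
To make the SSE definition concrete, here is a small worked sketch in plain Python. The helper `within_cluster_sse` and the four points are hypothetical, chosen so the arithmetic is easy to check by hand. It assumes each cluster center is the mean of the cluster's points, which is what k-means uses.

```python
def within_cluster_sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    # Group the points by their cluster label
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    sse = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Cluster center = coordinate-wise mean of the members
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        sse += sum((p[d] - center[d]) ** 2 for p in members for d in range(dim))
    return sse


points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
one_cluster = within_cluster_sse(points, [0, 0, 0, 0])   # all points together
two_clusters = within_cluster_sse(points, [0, 0, 1, 1])  # two natural groups
print(one_cluster, two_clusters)
```

SSE always decreases as clusters are added, which is why the elbow method looks for the point where the decrease slows down rather than for the minimum.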

Elbow criterion method

# Import key libraries
from sklearn.cluster import KMeans
import seaborn as sns
from matplotlib import pyplot as plt

# Fit KMeans and calculate SSE for each k
sse = {}
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=1)
    kmeans.fit(data_normalized)
    sse[k] = kmeans.inertia_  # sum of squared distances to closest cluster center

# Plot SSE for each k
plt.title('The Elbow Method')
plt.xlabel('k'); plt.ylabel('SSE')
sns.pointplot(x=list(sse.keys()), y=list(sse.values()))
plt.show()

Elbow criterion method

The elbow criterion chart:


Using elbow criterion method

  • Best to choose the point at the elbow, or the next point
  • Use as a guide but test multiple solutions
  • Elbow plot built on datamart_rfm
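
Reading the elbow off a chart is a judgment call. One rough way to get a starting suggestion is to pick the k where the SSE curve bends most sharply, measured by the discrete second difference. This is a simple heuristic and not the method used in the slides. The function name and the SSE values below are made up for illustration, and the sketch assumes SSE was computed for consecutive values of k.

```python
def elbow_by_second_difference(sse):
    """Return the k with the largest second difference of SSE (sharpest bend)."""
    ks = sorted(sse)
    best_k, best_curvature = None, float("-inf")
    # Only interior points have both a previous and a next neighbor
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        curvature = sse[prev_k] - 2 * sse[k] + sse[next_k]
        if curvature > best_curvature:
            best_k, best_curvature = k, curvature
    return best_k


# Illustrative SSE values (not taken from datamart_rfm)
example_sse = {1: 100.0, 2: 60.0, 3: 20.0, 4: 15.0, 5: 12.0}
suggested_k = elbow_by_second_difference(example_sse)
print(suggested_k)
```

Treat the result as a guide only, and still test the neighboring values of k.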


Experimental approach - analyze segments

  • Build clustering solutions at and around the elbow
  • Analyze their properties, such as average RFM values
  • Compare the solutions against each other and choose the one that makes the most business sense
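
The steps above can be sketched as follows. Everything here is an assumption for illustration: the randomly generated `rfm` table stands in for the real RFM data, the column names `Recency`, `Frequency` and `MonetaryValue` are hypothetical, the log-then-scale normalization is one common choice, and the elbow is taken to sit around k = 3.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an RFM table (assumed column names, strictly positive values)
rng = np.random.RandomState(1)
rfm = pd.DataFrame({
    "Recency": rng.randint(1, 365, size=200),
    "Frequency": rng.randint(1, 50, size=200),
    "MonetaryValue": rng.gamma(2.0, 100.0, size=200),
})

# Assumed normalization: log transform to reduce skew, then standardize
normalized = StandardScaler().fit_transform(np.log(rfm))

# Build solutions at and around the assumed elbow and summarize each one
summaries = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(normalized)
    summaries[k] = (
        rfm.assign(Cluster=labels)
        .groupby("Cluster")
        .agg(
            Recency=("Recency", "mean"),
            Frequency=("Frequency", "mean"),
            MonetaryValue=("MonetaryValue", "mean"),
            Count=("Recency", "size"),
        )
        .round(1)
    )
    print(f"\n{k}-cluster solution")
    print(summaries[k])
```

The averages are computed on the original, unscaled values so that each segment can be read in business terms, and the `Count` column shows how many customers fall into each one.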

Experimental approach - analyze segments

  • Previous 2-cluster solution
  • 3-cluster solution on the same normalized RFM dataset

Let's practice finding the optimal number of clusters!
