Profile and interpret segments

Customer Segmentation in Python

Karolis Urbonas

Head of Data Science, Amazon

Approaches to build customer personas

Summary statistics for each cluster, e.g. average RFM values
Snake plots (from market research)
Relative importance of cluster attributes compared to population

Summary statistics of each cluster

Run k-means segmentation for several k values around the recommended value.
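A minimal sketch of this step, assuming scikit-learn's KMeans is used for the segmentation. The synthetic matrix `X`, the candidate values `(2, 3, 4)`, and the dictionary names are illustrative assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the normalized RFM matrix (rows = customers)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-2.0, scale=0.3, size=(20, 3)),
    rng.normal(loc=0.0, scale=0.3, size=(20, 3)),
    rng.normal(loc=2.0, scale=0.3, size=(20, 3)),
])

labels_by_k = {}  # cluster labels for each candidate k
sse_by_k = {}     # sum of squared distances to the closest centroid

for k in (2, 3, 4):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=1)
    kmeans.fit(X)
    labels_by_k[k] = kmeans.labels_
    sse_by_k[k] = kmeans.inertia_
```

Keeping the labels for every k makes it easy to build one labeled DataFrame per solution and compare them side by side.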

Create a cluster label column in the original DataFrame:

datamart_rfm_k2 = datamart_rfm.assign(Cluster = cluster_labels)

Calculate average RFM values and sizes for each cluster:

datamart_rfm_k2.groupby(['Cluster']).agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'MonetaryValue': ['mean', 'count'],
}).round(0)

Repeat the same for k=3
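One way to avoid repeating these steps by hand for each k is a small helper. This is only a sketch: `summarize_clusters`, the toy data, and the label lists are hypothetical and chosen for illustration.

```python
import pandas as pd

def summarize_clusters(rfm, cluster_labels):
    """Return average RFM values and the size of each cluster."""
    labeled = rfm.assign(Cluster=cluster_labels)
    summary = labeled.groupby('Cluster').agg(
        Recency=('Recency', 'mean'),
        Frequency=('Frequency', 'mean'),
        MonetaryValue=('MonetaryValue', 'mean'),
        Size=('MonetaryValue', 'count'),
    )
    return summary.round(0)

# Toy RFM table with four customers (made-up values)
toy_rfm = pd.DataFrame(
    {'Recency': [10, 20, 200, 300],
     'Frequency': [30, 50, 2, 4],
     'MonetaryValue': [1000.0, 3000.0, 40.0, 60.0]},
    index=pd.Index([1, 2, 3, 4], name='CustomerID'),
)

# Hypothetical labels from a 2-cluster solution
summary_k2 = summarize_clusters(toy_rfm, [0, 0, 1, 1])
print(summary_k2)
```

Calling the helper once per set of labels gives tables with the same layout, which makes the clustering solutions easier to compare.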

Summary statistics of each cluster

Compare average RFM values of each clustering solution

Snake plots to understand and compare segments

Market research technique to compare different segments
Visual representation of each segment's attributes
Need to first normalize data (center & scale)
Plot each cluster's average normalized values of each attribute
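The points a snake plot connects can be computed directly. The sketch below assumes plain z-score normalization with pandas (center on the mean, scale by the population standard deviation); the toy data and the `toy_*` names are illustrative, and a real pipeline may apply other transformations before scaling.

```python
import pandas as pd

# Toy RFM table with four customers (made-up values)
toy_rfm = pd.DataFrame(
    {'Recency': [10, 20, 200, 300],
     'Frequency': [30, 50, 2, 4],
     'MonetaryValue': [1000.0, 3000.0, 40.0, 60.0]},
    index=pd.Index([1, 2, 3, 4], name='CustomerID'),
)
# Hypothetical cluster labels, aligned on CustomerID
toy_labels = pd.Series([0, 0, 1, 1], index=toy_rfm.index, name='Cluster')

# Center and scale each column (population std, ddof=0)
toy_normalized = (toy_rfm - toy_rfm.mean()) / toy_rfm.std(ddof=0)

# Average normalized value of each attribute per cluster:
# one row per cluster, one column per attribute
snake_points = toy_normalized.groupby(toy_labels).mean()
print(snake_points.round(2))
```

Each row of `snake_points` corresponds to one line in the snake plot, and each column to one position on the x-axis.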

Prepare data for a snake plot

Convert datamart_normalized to a DataFrame and add a Cluster column

datamart_normalized = pd.DataFrame(datamart_normalized, 
                                   index=datamart_rfm.index, 
                                   columns=datamart_rfm.columns)
datamart_normalized['Cluster'] = datamart_rfm_k3['Cluster']

Melt the data into a long format so that RFM values and metric names are stored in one column each

datamart_melt = pd.melt(datamart_normalized.reset_index(),
                        id_vars=['CustomerID', 'Cluster'],
                        value_vars=['Recency', 'Frequency', 'MonetaryValue'],
                        var_name='Attribute',
                        value_name='Value')
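To see what the long format looks like, here is the same melt applied to a tiny, made-up table. The `toy_*` names and values are assumptions for illustration only.

```python
import pandas as pd

# Three customers with already-normalized values and a cluster label
toy_normalized = pd.DataFrame(
    {'Recency': [-1.0, 0.0, 1.0],
     'Frequency': [1.2, -0.2, -1.0],
     'MonetaryValue': [1.1, 0.1, -1.2],
     'Cluster': [0, 1, 1]},
    index=pd.Index([101, 102, 103], name='CustomerID'),
)

toy_melt = pd.melt(
    toy_normalized.reset_index(),
    id_vars=['CustomerID', 'Cluster'],
    value_vars=['Recency', 'Frequency', 'MonetaryValue'],
    var_name='Attribute',
    value_name='Value',
)
print(toy_melt)
```

Three customers times three attributes yield nine rows, each holding one attribute name and one value alongside the customer ID and cluster.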

Visualize a snake plot

plt.title('Snake plot of standardized variables')
sns.lineplot(x='Attribute', y='Value', hue='Cluster', data=datamart_melt)
plt.show()

Relative importance of segment attributes

Useful technique to identify the relative importance of each segment's attributes
Calculate average values of each cluster
Calculate average values of population
Calculate importance score by dividing them and subtracting 1 (ensures 0 is returned when cluster average equals population average)

cluster_avg = datamart_rfm_k3.groupby(['Cluster']).mean()

population_avg = datamart_rfm.mean()

relative_imp = cluster_avg / population_avg - 1
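A worked example on made-up numbers may help check the arithmetic. The `toy_*` names, values, and cluster labels are assumptions for illustration.

```python
import pandas as pd

# Toy RFM table with four customers (made-up values)
toy_rfm = pd.DataFrame(
    {'Recency': [10, 30, 60, 100],
     'Frequency': [9, 15, 5, 3],
     'MonetaryValue': [300.0, 500.0, 150.0, 50.0]},
    index=pd.Index([1, 2, 3, 4], name='CustomerID'),
)
# Hypothetical labels from a 2-cluster solution
toy_rfm_k2 = toy_rfm.assign(Cluster=[0, 0, 1, 1])

toy_cluster_avg = toy_rfm_k2.groupby('Cluster').mean()  # e.g. Recency: 20 and 80
toy_population_avg = toy_rfm.mean()                     # e.g. Recency: 50

# Division aligns the Series index with the DataFrame columns
toy_relative_imp = toy_cluster_avg / toy_population_avg - 1
print(toy_relative_imp.round(2))
```

For cluster 0, Recency gives 20 / 50 - 1 = -0.6: customers in that cluster have an average Recency 60% below the population average. A cluster whose average equals the population average would score exactly 0.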

Analyze and plot relative importance

The further a score is from 0, the more important that attribute is for the segment relative to the total population.

relative_imp.round(2)

         Recency  Frequency  MonetaryValue
Cluster                                   
0          -0.82       1.68           1.83
1           0.84      -0.84          -0.86
2          -0.15      -0.34          -0.42

# Plot heatmap
plt.figure(figsize=(8, 2))
plt.title('Relative importance of attributes')
sns.heatmap(data=relative_imp, annot=True, fmt='.2f', cmap='RdYlGn')
plt.show()

Relative importance heatmap

[Figure: heatmap of the relative importance scores, annotated with the values from the table above]

Your time to experiment with different customer profiling techniques!
