Profile and interpret segments

Customer Segmentation in Python

Karolis Urbonas

Head of Data Science, Amazon

Approaches to build customer personas

  • Summary statistics for each cluster, e.g. average RFM values
  • Snake plots (from market research)
  • Relative importance of cluster attributes compared to population

Summary statistics of each cluster

  • Run k-means segmentation for several k values around the recommended value.
  • Create a cluster label column in the original DataFrame:

    datamart_rfm_k2 = datamart_rfm.assign(Cluster=cluster_labels)

  • Calculate average RFM values and sizes for each cluster:

    datamart_rfm_k2.groupby(['Cluster']).agg({
        'Recency': 'mean',
        'Frequency': 'mean',
        'MonetaryValue': ['mean', 'count'],
    }).round(0)

  • Repeat the same for k=3
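
The steps above can be sketched end to end as a loop over several k values. This is a minimal sketch, not the course's exact pipeline: the data is synthetic, the `normalized` and `summaries` names are illustrative, and plain standardization stands in for whatever preprocessing was applied before clustering.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for datamart_rfm (made-up values, not the course data).
rng = np.random.default_rng(1)
datamart_rfm = pd.DataFrame({
    'Recency': rng.integers(1, 365, size=200),
    'Frequency': rng.integers(1, 50, size=200),
    'MonetaryValue': rng.gamma(2.0, 150.0, size=200),
}, index=pd.RangeIndex(1, 201, name='CustomerID'))

# Assumption: simple centering and scaling so no attribute dominates distances.
normalized = (datamart_rfm - datamart_rfm.mean()) / datamart_rfm.std(ddof=0)

summaries = {}
for k in (2, 3, 4):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=1).fit(normalized)
    # Attach the labels to the original (unscaled) values for readable averages.
    labeled = datamart_rfm.assign(Cluster=kmeans.labels_)
    # Average RFM values plus the number of customers in each cluster.
    summaries[k] = labeled.groupby('Cluster').agg({
        'Recency': 'mean',
        'Frequency': 'mean',
        'MonetaryValue': ['mean', 'count'],
    }).round(0)

print(summaries[3])
```

Keeping one summary table per k makes it easy to compare the solutions side by side.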

Summary statistics of each cluster

  • Compare average RFM values of each clustering solution

Snake plots to understand and compare segments

  • Market research technique to compare different segments
  • Visual representation of each segment's attributes
  • Need to first normalize data (center & scale)
  • Plot each cluster's average normalized values of each attribute
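
As a small illustration of the last two bullets, the sketch below centers and scales a tiny made-up RFM table with plain pandas and then averages the normalized values per cluster. The data, the `clusters` labels, and the `snake_values` name are assumptions for illustration only.

```python
import pandas as pd

# Tiny made-up RFM table with hypothetical cluster labels.
rfm = pd.DataFrame({
    'Recency': [10, 20, 30, 60],
    'Frequency': [8, 6, 4, 2],
    'MonetaryValue': [400.0, 300.0, 200.0, 100.0],
})
clusters = pd.Series([0, 0, 1, 1], name='Cluster')

# Center (subtract the mean) and scale (divide by the standard deviation).
# ddof=0 is the population standard deviation, as used by scikit-learn's StandardScaler.
rfm_normalized = (rfm - rfm.mean()) / rfm.std(ddof=0)

# Each row of this table corresponds to one line on a snake plot.
snake_values = rfm_normalized.groupby(clusters).mean()
print(snake_values.round(2))
```

After normalization every attribute has mean 0 and standard deviation 1, so the cluster averages are directly comparable across attributes.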

Prepare data for a snake plot

Convert datamart_normalized to a DataFrame and add a Cluster column

datamart_normalized = pd.DataFrame(datamart_normalized, 
                                   index=datamart_rfm.index, 
                                   columns=datamart_rfm.columns)
datamart_normalized['Cluster'] = datamart_rfm_k3['Cluster']

Melt the data into a long format so that the RFM values and the metric names are each stored in one column

datamart_melt = pd.melt(datamart_normalized.reset_index(), 
                    id_vars=['CustomerID', 'Cluster'],
                    value_vars=['Recency', 'Frequency', 'MonetaryValue'], 
                    var_name='Attribute', 
                    value_name='Value')
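
To make the long format concrete, here is a minimal sketch on three made-up customers. The values and the `wide` and `long_df` names are illustrative assumptions. Note that `reset_index()` only produces a `CustomerID` column if the index is named `CustomerID`.

```python
import pandas as pd

# Made-up normalized values for three customers (illustrative only).
wide = pd.DataFrame({
    'Recency': [-1.2, 0.3, 0.9],
    'Frequency': [1.1, -0.2, -0.9],
    'MonetaryValue': [1.4, -0.1, -1.3],
    'Cluster': [0, 2, 1],
}, index=pd.Index([12346, 12347, 12348], name='CustomerID'))

# One row per (customer, attribute) pair instead of one row per customer.
long_df = pd.melt(wide.reset_index(),
                  id_vars=['CustomerID', 'Cluster'],
                  value_vars=['Recency', 'Frequency', 'MonetaryValue'],
                  var_name='Attribute',
                  value_name='Value')
print(long_df)
```

Three customers with three attributes each become nine rows, with the attribute name in `Attribute` and its value in `Value`.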

Visualize a snake plot

plt.title('Snake plot of standardized variables')
sns.lineplot(x='Attribute', y='Value', hue='Cluster', data=datamart_melt)
plt.show()
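
With several customers per cluster, seaborn's lineplot by default draws the mean of the y values at each x level, so each line traces the cluster's average normalized value per attribute. The sketch below reproduces those numbers with pandas on made-up data. The values and the `line_heights` name are assumptions for illustration.

```python
import pandas as pd

# Made-up long-format data in the same layout as datamart_melt.
datamart_melt = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4] * 3,
    'Cluster': [0, 0, 1, 1] * 3,
    'Attribute': ['Recency'] * 4 + ['Frequency'] * 4 + ['MonetaryValue'] * 4,
    'Value': [-1.0, -0.6, 0.4, 1.2,
              1.1, 0.7, -0.5, -1.3,
              1.0, 0.8, -0.6, -1.2],
})

# Mean normalized value per cluster and attribute: one row per plotted line.
line_heights = (datamart_melt
                .groupby(['Cluster', 'Attribute'])['Value']
                .mean()
                .unstack('Attribute'))
print(line_heights.round(2))
```

Inspecting this table alongside the plot is a quick way to check that the lines show what you expect.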


Relative importance of segment attributes

  • Useful technique to identify relative importance of each segment's attribute
  • Calculate average values of each cluster
  • Calculate average values of population
  • Calculate the importance score by dividing the cluster average by the population average and subtracting 1 (so the score is 0 when the cluster average equals the population average)

cluster_avg = datamart_rfm_k3.groupby(['Cluster']).mean()
population_avg = datamart_rfm.mean()
relative_imp = cluster_avg / population_avg - 1
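
A worked example on four made-up customers shows the arithmetic. The numbers and the `datamart_rfm_k` name are illustrative assumptions, chosen so the scores can be checked by hand.

```python
import pandas as pd

# Four made-up customers; population averages are 30, 5 and 250.
datamart_rfm = pd.DataFrame({
    'Recency': [10, 20, 30, 60],
    'Frequency': [8, 6, 4, 2],
    'MonetaryValue': [400.0, 300.0, 200.0, 100.0],
})
# Hypothetical cluster labels: first two customers in cluster 0, last two in cluster 1.
datamart_rfm_k = datamart_rfm.assign(Cluster=[0, 0, 1, 1])

cluster_avg = datamart_rfm_k.groupby('Cluster').mean()
population_avg = datamart_rfm.mean()

# Dividing a DataFrame by a Series aligns the Series index with the column names.
relative_imp = cluster_avg / population_avg - 1
print(relative_imp.round(2))
```

Cluster 0 has an average Recency of 15 against a population average of 30, so its score is 15 / 30 - 1 = -0.5: customers in that cluster purchased more recently than the population as a whole.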

Analyze and plot relative importance

  • The farther a score is from 0, the more important that attribute is for the segment relative to the total population.

relative_imp.round(2)

         Recency  Frequency  MonetaryValue
Cluster                                   
0          -0.82       1.68           1.83
1           0.84      -0.84          -0.86
2          -0.15      -0.34          -0.42

# Plot heatmap
plt.figure(figsize=(8, 2))
plt.title('Relative importance of attributes')
sns.heatmap(data=relative_imp, annot=True, fmt='.2f', cmap='RdYlGn')
plt.show()
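
One possible way to read the scores programmatically is to pick, for each cluster, the attribute whose score is farthest from 0. This is a sketch rather than part of the original analysis: the scores are copied from the rounded output shown in this section, and the `top_attribute` and `summary` names are illustrative.

```python
import pandas as pd

# Scores copied from the relative_imp.round(2) output in this section.
relative_imp = pd.DataFrame(
    {'Recency': [-0.82, 0.84, -0.15],
     'Frequency': [1.68, -0.84, -0.34],
     'MonetaryValue': [1.83, -0.86, -0.42]},
    index=pd.Index([0, 1, 2], name='Cluster'))

# For each cluster, the attribute whose score is farthest from 0.
top_attribute = relative_imp.abs().idxmax(axis=1)

# Signed score of that attribute (positive means above the population average).
top_score = pd.Series(
    [relative_imp.loc[cluster, attr] for cluster, attr in top_attribute.items()],
    index=top_attribute.index)

summary = pd.DataFrame({'Attribute': top_attribute, 'Score': top_score})
print(summary)
```

With these scores MonetaryValue is the most distinctive attribute in all three clusters: well above the population average in cluster 0 and below it in clusters 1 and 2.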

Relative importance heatmap

[Figure: heatmap of the relative importance scores in the table above (clusters 0 to 2 by Recency, Frequency, MonetaryValue)]

Your turn to experiment with different customer profiling techniques!
