Customer Segmentation in Python
Karolis Urbonas
Head of Data Science, Amazon
Create a cluster label column in the original DataFrame:
datamart_rfm_k2 = datamart_rfm.assign(Cluster = cluster_labels)
Calculate average RFM values and sizes for each cluster:
datamart_rfm_k2.groupby(['Cluster']).agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'MonetaryValue': ['mean', 'count'],
}).round(0)
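A minimal, self-contained sketch of the two steps above on toy data. The table `toy_rfm`, the label array `toy_labels`, and all values are hypothetical stand-ins for `datamart_rfm` and the `cluster_labels` returned by a fitted clustering model; they are not taken from the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy RFM table; the values are made up for illustration only.
toy_rfm = pd.DataFrame(
    {
        "Recency": [10, 200, 20, 180],
        "Frequency": [20, 2, 30, 4],
        "MonetaryValue": [500.0, 40.0, 700.0, 60.0],
    },
    index=pd.Index([1, 2, 3, 4], name="CustomerID"),
)

# Stand-in for the labels a fitted clustering model would return (assumption).
toy_labels = np.array([0, 1, 0, 1])

# assign() returns a new DataFrame and leaves toy_rfm unchanged.
toy_rfm_k2 = toy_rfm.assign(Cluster=toy_labels)

# Mean of each RFM metric per cluster, plus the cluster size via 'count'.
toy_summary = toy_rfm_k2.groupby("Cluster").agg(
    {
        "Recency": "mean",
        "Frequency": "mean",
        "MonetaryValue": ["mean", "count"],
    }
).round(0)

print(toy_summary)
```

Because one metric is aggregated with a list of functions, the result has two-level column labels such as `('MonetaryValue', 'count')`, which is where the cluster size appears.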
Transform datamart_normalized into a DataFrame and add a Cluster column:
datamart_normalized = pd.DataFrame(datamart_normalized,
                                   index=datamart_rfm.index,
                                   columns=datamart_rfm.columns)
datamart_normalized['Cluster'] = datamart_rfm_k3['Cluster']
Melt the data into long format, so that metric names are stored in one column and their values in another:
datamart_melt = pd.melt(datamart_normalized.reset_index(),
                        id_vars=['CustomerID', 'Cluster'],
                        value_vars=['Recency', 'Frequency', 'MonetaryValue'],
                        var_name='Attribute',
                        value_name='Value')
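To make the wide-to-long reshaping concrete, here is a small sketch on a hypothetical two-customer table. The name `toy_norm` and its values are assumptions for illustration; only the `pd.melt` call mirrors the one above.

```python
import pandas as pd

# Hypothetical normalized RFM values for two customers (made-up numbers).
toy_norm = pd.DataFrame(
    {
        "Recency": [-1.0, 1.0],
        "Frequency": [0.5, -0.5],
        "MonetaryValue": [1.2, -1.2],
        "Cluster": [0, 1],
    },
    index=pd.Index([101, 102], name="CustomerID"),
)

# reset_index() turns CustomerID into a regular column so it can be an id_var.
toy_melt = pd.melt(
    toy_norm.reset_index(),
    id_vars=["CustomerID", "Cluster"],
    value_vars=["Recency", "Frequency", "MonetaryValue"],
    var_name="Attribute",
    value_name="Value",
)

print(toy_melt)
```

Each customer now contributes one row per metric (2 customers x 3 metrics = 6 rows), which is the shape a line plot with `x="Attribute"` and `y="Value"` expects.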
plt.title('Snake plot of standardized variables')
sns.lineplot(x="Attribute", y="Value", hue='Cluster', data=datamart_melt)
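# Average RFM values for each cluster
cluster_avg = datamart_rfm_k3.groupby(['Cluster']).mean()
# Average RFM values for the total customer population
population_avg = datamart_rfm.mean()
# Relative importance: 0 means the cluster average equals the population average
relative_imp = cluster_avg / population_avg - 1
relative_imp.round(2)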
         Recency  Frequency  MonetaryValue
Cluster
0          -0.82       1.68           1.83
1           0.84      -0.84          -0.86
2          -0.15      -0.34          -0.42
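The relative importance score is the ratio of a cluster average to the population average, minus 1. The sketch below works through the arithmetic on a hypothetical four-customer table (`toy` and its values are made up), so each score can be checked by hand.

```python
import pandas as pd

# Hypothetical RFM table with cluster labels already attached (made-up values).
toy = pd.DataFrame(
    {
        "Recency": [10.0, 30.0, 50.0, 70.0],
        "Frequency": [8.0, 6.0, 4.0, 2.0],
        "MonetaryValue": [400.0, 300.0, 200.0, 100.0],
        "Cluster": [0, 0, 1, 1],
    }
)

# Per-cluster averages: one row per cluster, one column per metric.
toy_cluster_avg = toy.groupby("Cluster").mean()

# Population averages; Cluster is dropped here because this toy table,
# unlike datamart_rfm, already contains the label column.
toy_population_avg = toy.drop(columns="Cluster").mean()

# The Series index aligns with the DataFrame columns, so each metric is
# divided by its own population average.
toy_relative_imp = toy_cluster_avg / toy_population_avg - 1

print(toy_relative_imp.round(2))
```

For cluster 0, Recency averages 20 against a population average of 40, so the score is 20 / 40 - 1 = -0.5. The further a score is from 0, the more that attribute distinguishes the cluster from the population as a whole.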
# Plot heatmap
plt.figure(figsize=(8, 2))
plt.title('Relative importance of attributes')
sns.heatmap(data=relative_imp, annot=True, fmt='.2f', cmap='RdYlGn')
plt.show()