Customer Segmentation in Python
Karolis Urbonas
Head of Data Science, Amazon
Create a cluster label column in the original DataFrame:
datamart_rfm_k2 = datamart_rfm.assign(Cluster = cluster_labels)
Calculate average RFM values and sizes for each cluster:
datamart_rfm_k2.groupby(['Cluster']).agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'MonetaryValue': ['mean', 'count'],  # 'count' gives the cluster size
}).round(0)
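The two steps above can be tried on a small table. This is a minimal sketch: the `toy_rfm` values and the `toy_labels` list are invented for illustration and stand in for the real `datamart_rfm` and the labels produced by a fitted clustering model.

```python
import pandas as pd

# Hypothetical toy RFM table; the numbers are invented for illustration.
toy_rfm = pd.DataFrame(
    {
        "Recency": [10, 200, 16, 180],
        "Frequency": [20, 2, 18, 4],
        "MonetaryValue": [500.0, 40.0, 450.0, 60.0],
    },
    index=pd.Index([1, 2, 3, 4], name="CustomerID"),
)

# Hypothetical labels standing in for the output of a clustering model.
toy_labels = [0, 1, 0, 1]

# Attach the labels as a new column without modifying toy_rfm itself.
toy_rfm_k2 = toy_rfm.assign(Cluster=toy_labels)

# Mean of each RFM metric per cluster, plus the cluster size via 'count'.
summary = toy_rfm_k2.groupby("Cluster").agg(
    {
        "Recency": "mean",
        "Frequency": "mean",
        "MonetaryValue": ["mean", "count"],
    }
).round(0)

print(summary)
```

Because one column receives two aggregations, the result has two-level column labels such as `('MonetaryValue', 'count')`.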
Convert datamart_normalized to a DataFrame and add a Cluster column:
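import pandas as pd

# Reuse the index and column names of the original RFM table
datamart_normalized = pd.DataFrame(datamart_normalized,
                                   index=datamart_rfm.index,
                                   columns=datamart_rfm.columns)
# Labels are matched to rows by index (CustomerID)
datamart_normalized['Cluster'] = datamart_rfm_k3['Cluster']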
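A small sketch of why the index is passed along when rebuilding the DataFrame: pandas matches an assigned Series to rows by index label, not by position. All names and values here are hypothetical, and the manual standardization only stands in for whatever scaler produced the normalized array.

```python
import pandas as pd

# Hypothetical toy RFM table indexed by CustomerID.
toy_rfm = pd.DataFrame(
    {
        "Recency": [10.0, 20.0, 30.0],
        "Frequency": [3.0, 2.0, 1.0],
        "MonetaryValue": [100.0, 200.0, 300.0],
    },
    index=pd.Index([101, 102, 103], name="CustomerID"),
)

# Assumption: a plain array with no labels, as a scaler would typically return.
toy_array = ((toy_rfm - toy_rfm.mean()) / toy_rfm.std(ddof=0)).to_numpy()

# Restore the row and column labels from the original table.
toy_normalized = pd.DataFrame(
    toy_array, index=toy_rfm.index, columns=toy_rfm.columns
)

# Cluster labels stored in a different row order on purpose.
toy_clusters = pd.Series(
    [2, 0, 1],
    index=pd.Index([103, 101, 102], name="CustomerID"),
    name="Cluster",
)

# Assignment aligns on CustomerID, so each customer gets its own label.
toy_normalized["Cluster"] = toy_clusters

print(toy_normalized)
```

Without the shared index, the labels would have to be in exactly the same row order as the normalized values.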
Melt the data into long format so that the RFM values and the metric names are each stored in one column:
import matplotlib.pyplot as plt
import seaborn as sns

datamart_melt = pd.melt(datamart_normalized.reset_index(),
                        id_vars=['CustomerID', 'Cluster'],
                        value_vars=['Recency', 'Frequency', 'MonetaryValue'],
                        var_name='Attribute',
                        value_name='Value')

# One line per cluster across the three standardized attributes
plt.title('Snake plot of standardized variables')
sns.lineplot(x="Attribute", y="Value", hue='Cluster', data=datamart_melt)
plt.show()
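To see what the melt step produces before plotting, here is a minimal sketch on two invented customers. The `toy_normalized` table and its values are hypothetical; only the `pd.melt` call mirrors the one above.

```python
import pandas as pd

# Hypothetical standardized values for two customers.
toy_normalized = pd.DataFrame(
    {
        "Recency": [-1.0, 1.0],
        "Frequency": [1.0, -1.0],
        "MonetaryValue": [0.5, -0.5],
        "Cluster": [0, 1],
    },
    index=pd.Index([101, 102], name="CustomerID"),
)

# Wide to long: one row per (customer, attribute) pair.
toy_melt = pd.melt(
    toy_normalized.reset_index(),
    id_vars=["CustomerID", "Cluster"],
    value_vars=["Recency", "Frequency", "MonetaryValue"],
    var_name="Attribute",
    value_name="Value",
)

print(toy_melt)
```

Each customer now appears once per attribute, which is the shape a line plot with `x="Attribute"` and `y="Value"` expects.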
cluster_avg = datamart_rfm_k3.groupby(['Cluster']).mean()
population_avg = datamart_rfm.mean()
relative_imp = cluster_avg / population_avg - 1
relative_imp.round(2)
         Recency  Frequency  MonetaryValue
Cluster
0          -0.82       1.68           1.83
1           0.84      -0.84          -0.86
2          -0.15      -0.34          -0.42
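The relative importance calculation can be checked by hand on a tiny table. This is a sketch under stated assumptions: the `toy` data and its cluster labels are invented, and the arithmetic follows the same cluster average divided by population average, minus one.

```python
import pandas as pd

# Hypothetical toy data with invented values and labels.
toy = pd.DataFrame(
    {
        "Recency": [10, 30, 50, 70],
        "Frequency": [8, 6, 4, 2],
        "MonetaryValue": [400.0, 300.0, 200.0, 100.0],
        "Cluster": [0, 0, 1, 1],
    }
)

# Average of each attribute within each cluster.
toy_cluster_avg = toy.groupby("Cluster").mean()

# Average of each attribute over all customers (label column excluded).
toy_population_avg = toy.drop(columns="Cluster").mean()

# Division aligns on column names; subtracting 1 centers the ratio at 0.
toy_relative_imp = toy_cluster_avg / toy_population_avg - 1

print(toy_relative_imp.round(2))
```

For cluster 0, the average Recency is 20 against a population average of 40, so the score is 20 / 40 - 1 = -0.5. Values far from 0 in either direction mark the attributes that set a cluster apart from the overall population.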
# Plot heatmap
plt.figure(figsize=(8, 2))
plt.title('Relative importance of attributes')
sns.heatmap(data=relative_imp, annot=True, fmt='.2f', cmap='RdYlGn')
plt.show()