Customer Segmentation in Python
Karolis Urbonas
Head of Data Science, Amazon
Behavioral customer segmentation based on three metrics: Recency (R), Frequency (F), and Monetary Value (M).
The RFM values can be grouped in several ways:
We are going to implement percentile-based grouping.
Process of calculating percentiles:
Data with eight CustomerID values and a randomly generated Spend value for each.
spend_quartiles = pd.qcut(data['Spend'], q=4, labels=range(1,5))
data['Spend_Quartile'] = spend_quartiles
data.sort_values('Spend')
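The snippet above assumes a `data` DataFrame already exists. A minimal self-contained sketch of the same step follows; the Spend numbers are made up for illustration and are not the values from the original slide.

```python
import pandas as pd

# Hypothetical data: eight customers with made-up Spend values
data = pd.DataFrame({
    'CustomerID': range(8),
    'Spend': [137, 335, 172, 355, 303, 233, 244, 229],
})

# Split Spend into four equal-sized groups, labeled 1 (lowest) to 4 (highest)
spend_quartiles = pd.qcut(data['Spend'], q=4, labels=list(range(1, 5)))
data['Spend_Quartile'] = spend_quartiles

# Sort by Spend so the quartile labels read from lowest to highest
print(data.sort_values('Spend'))
```

With eight distinct values and `q=4`, each quartile should hold two customers.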
# Create numbered labels
r_labels = list(range(4, 0, -1))
# Divide into groups based on quartiles
recency_quartiles = pd.qcut(data['Recency_Days'], q=4, labels=r_labels)
# Create new column
data['Recency_Quartile'] = recency_quartiles
# Sort recency values from lowest to highest
data.sort_values('Recency_Days')
As you can see, the quartile labels are reversed: customers with the fewest Recency_Days are the most recent and therefore the most valuable, so they receive the highest label.
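To check the reversed labeling, here is a small runnable sketch. The Recency_Days values are hypothetical and chosen only so that every quartile gets two customers.

```python
import pandas as pd

# Hypothetical data: days since each customer's last purchase
data = pd.DataFrame({
    'CustomerID': range(8),
    'Recency_Days': [37, 235, 396, 72, 255, 393, 203, 133],
})

# Labels run 4, 3, 2, 1 so the lowest Recency_Days bin gets the highest label
r_labels = list(range(4, 0, -1))
data['Recency_Quartile'] = pd.qcut(data['Recency_Days'], q=4, labels=r_labels)

# The most recent customer (smallest Recency_Days) appears first with label 4
print(data.sort_values('Recency_Days'))
```

`pd.qcut` assigns labels to bins in ascending order of the values, so passing the labels in descending order is what reverses the scoring.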
We can define a list of strings or any other values as labels, depending on the use case.
# Create string labels
r_labels = ['Active', 'Lapsed', 'Inactive', 'Churned']
# Divide into groups based on quartiles
recency_quartiles = pd.qcut(data['Recency_Days'], q=4, labels=r_labels)
# Create new column
data['Recency_Quartile'] = recency_quartiles
# Sort values from lowest to highest
data.sort_values('Recency_Days')
Custom labels assigned to each quartile
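A short sketch of the string-label variant, again on hypothetical Recency_Days values, with a count per label as a quick sanity check on the result.

```python
import pandas as pd

# Hypothetical data: days since each customer's last purchase
data = pd.DataFrame({
    'CustomerID': range(8),
    'Recency_Days': [37, 235, 396, 72, 255, 393, 203, 133],
})

# String labels are assigned in ascending order of Recency_Days,
# so 'Active' covers the most recent customers and 'Churned' the least recent
r_labels = ['Active', 'Lapsed', 'Inactive', 'Churned']
data['Recency_Quartile'] = pd.qcut(data['Recency_Days'], q=4, labels=r_labels)

# Count customers per label; sort=False keeps the label order defined above
label_counts = data['Recency_Quartile'].value_counts(sort=False)
print(label_counts)
```

Because the labels here are already listed from most to least recent, no reversal is needed, unlike the numeric scores.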