Customer Segmentation in Python
Karolis Urbonas
Head of Data Science, Amazon
We use the same online dataset as in the previous lessons, with a TotalSum column = Quantity x UnitPrice.
We're starting with a pre-processed online DataFrame with only the latest 12 months of data:
print('Min:{}; Max:{}'.format(min(online.InvoiceDate),
max(online.InvoiceDate)))
Min:2010-12-10; Max:2011-12-09
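The pre-processing itself is not shown on this slide. As a rough sketch only, it might look like the following. The `raw` DataFrame and its values are made up for illustration, and the 12-month window is assumed to run back one year from the latest invoice date.

```python
import pandas as pd

# Hypothetical toy transactions; the real data is not reproduced here.
raw = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2],
    'InvoiceDate': pd.to_datetime(
        ['2010-11-01', '2011-06-15', '2011-01-20', '2011-12-09']),
    'Quantity': [2, 1, 5, 3],
    'UnitPrice': [3.5, 10.0, 1.5, 4.0],
})

# TotalSum column = Quantity x UnitPrice
raw['TotalSum'] = raw['Quantity'] * raw['UnitPrice']

# Assumed window: keep rows dated within one year of the latest invoice.
cutoff = raw['InvoiceDate'].max() - pd.DateOffset(years=1)
online = raw[raw['InvoiceDate'] > cutoff]

print('Min:{}; Max:{}'.format(min(online.InvoiceDate).date(),
                              max(online.InvoiceDate).date()))
```

With this toy data, the first row (dated 2010-11-01) falls outside the window and is dropped.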
Let's create a hypothetical snapshot_date, as if we were running the analysis one day after the most recent invoice.
import datetime
snapshot_date = max(online.InvoiceDate) + datetime.timedelta(days=1)
# Aggregate data on a customer level
datamart = online.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})

# Rename columns for easier interpretation
datamart.rename(columns = {'InvoiceDate': 'Recency',
                           'InvoiceNo': 'Frequency',
                           'TotalSum': 'MonetaryValue'}, inplace=True)

# Check the first rows
datamart.head()
Our table for RFM segmentation is completed!
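To check the arithmetic, the same RFM table can be built on a tiny, made-up DataFrame. This is a sketch, not the course's exact code: it uses pandas named aggregation (pandas 0.25 or later) instead of the dict-plus-rename approach, and computes Recency from each customer's last purchase date after the groupby. The toy values are hypothetical.

```python
import datetime
import pandas as pd

# Hypothetical toy data with the columns the slide relies on.
online = pd.DataFrame({
    'CustomerID': [1, 1, 2],
    'InvoiceNo': ['A1', 'A2', 'B1'],
    'InvoiceDate': pd.to_datetime(['2011-11-09', '2011-12-09', '2011-10-10']),
    'TotalSum': [10.0, 5.0, 20.0],
})

# Snapshot is one day after the latest invoice.
snapshot_date = max(online.InvoiceDate) + datetime.timedelta(days=1)

# Named aggregation: one output column per (input column, function) pair.
datamart = online.groupby('CustomerID').agg(
    LastPurchase=('InvoiceDate', 'max'),
    Frequency=('InvoiceNo', 'count'),
    MonetaryValue=('TotalSum', 'sum'),
)

# Recency = days between the snapshot and the customer's last purchase.
datamart['Recency'] = (snapshot_date - datamart['LastPurchase']).dt.days
datamart = datamart[['Recency', 'Frequency', 'MonetaryValue']]

print(datamart)
```

Note that 'count' tallies rows, so if one invoice spans several rows each row adds to Frequency. Using 'nunique' on InvoiceNo would count distinct invoices instead.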