Calculating RFM metrics

Customer Segmentation in Python

Karolis Urbonas

Head of Data Science, Amazon

Definitions

  • Recency - days since last customer transaction
  • Frequency - number of transactions in the last 12 months
  • Monetary Value - total spend in the last 12 months

Dataset and preparations

  • Same online dataset as in the previous lessons
  • Need to do some data preparation
  • New TotalSum column = Quantity x UnitPrice.

top_5_rows
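
The TotalSum step can be sketched as follows. This is a minimal illustration, not the course's actual preparation code: the rows below are made-up stand-ins for the real online dataset, and only the column names Quantity, UnitPrice and TotalSum come from the slide.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real `online` dataset
online = pd.DataFrame({
    'InvoiceNo': [536365, 536365, 536366],
    'CustomerID': [17850, 17850, 13047],
    'Quantity': [6, 8, 2],
    'UnitPrice': [2.55, 3.39, 7.65],
})

# Line-item revenue: quantity purchased times price per unit
online['TotalSum'] = online['Quantity'] * online['UnitPrice']

# Inspect the first rows, including the new column
print(online.head())
```

Because the multiplication is vectorized, it applies to every row at once and needs no explicit loop.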


Data preparation steps

We're starting with a pre-processed online DataFrame with only the latest 12 months of data:

print('Min:{}; Max:{}'.format(min(online.InvoiceDate),
                              max(online.InvoiceDate)))
Min:2010-12-10; Max:2011-12-09

Let's create a hypothetical snapshot_date variable, as if we were running the analysis the day after the most recent transaction.

import datetime

snapshot_date = max(online.InvoiceDate) + datetime.timedelta(days=1)
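
To see what the one-day offset does, here is a small sketch with two made-up invoice dates (the invoice_dates series and recency_days name are assumptions for illustration). With the snapshot set one day after the latest purchase, the most recent customer gets a recency of 1 day rather than 0.

```python
import datetime
import pandas as pd

# Hypothetical invoice dates; the real values come from online.InvoiceDate
invoice_dates = pd.Series(pd.to_datetime(['2011-12-01', '2011-12-09']))

# Snapshot is one day after the latest transaction in the data
snapshot_date = invoice_dates.max() + datetime.timedelta(days=1)

# Days elapsed between each transaction and the snapshot
recency_days = (snapshot_date - invoice_dates).dt.days

print(snapshot_date.date())
print(recency_days.tolist())
```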

Calculate RFM metrics

# Aggregate data on a customer level
datamart = online.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})

# Rename columns for easier interpretation
datamart.rename(columns = {'InvoiceDate': 'Recency',
                           'InvoiceNo': 'Frequency',
                           'TotalSum': 'MonetaryValue'}, inplace=True)

# Check the first rows
datamart.head()
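
The aggregation above can be tried end to end on a tiny example. The five rows below are invented for illustration and assume one row per invoice line item; the last line is an optional variation, not part of the slides, showing that 'count' tallies rows while nunique() tallies distinct invoice numbers.

```python
import datetime
import pandas as pd

# Hypothetical toy data: two customers, five invoice lines
online = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2, 2],
    'InvoiceNo': ['A1', 'A2', 'B1', 'B1', 'B2'],
    'InvoiceDate': pd.to_datetime(['2011-11-01', '2011-12-05',
                                   '2011-10-10', '2011-10-10',
                                   '2011-12-09']),
    'TotalSum': [10.0, 20.0, 5.0, 7.5, 12.5],
})

snapshot_date = online['InvoiceDate'].max() + datetime.timedelta(days=1)

# Aggregate data on a customer level
datamart = online.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})

# Rename columns for easier interpretation
datamart = datamart.rename(columns={'InvoiceDate': 'Recency',
                                    'InvoiceNo': 'Frequency',
                                    'TotalSum': 'MonetaryValue'})
print(datamart)

# Variation (an assumption, not from the slides): count distinct invoices
distinct_invoices = online.groupby('CustomerID')['InvoiceNo'].nunique()
print(distinct_invoices)
```

In this toy data customer 2 has three rows but only two distinct invoices, so the two frequency definitions differ; which one suits an analysis depends on whether a "transaction" means an invoice or a line item.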

Final RFM values

Our table for RFM segmentation is completed!

rfm_top_5


Let's practice calculating RFM values!
