Cohort analysis

Customer Segmentation in Python

Karolis Urbonas

Head of Data Science, Amazon

Cohort analysis heatmap

Rows:

  • First activity
  • Here - month of acquisition

Columns:

  • Time since first activity
  • Here - months since acquisition

Online retail data

Over 0.5 million transactions from a UK-based online retail store.

We will use a randomly sampled 20% subset of this dataset throughout the course.
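The slides do not say how the 20% subset was drawn. One common way to take a reproducible random sample with pandas is sketched below; the `full` DataFrame and the seed value are hypothetical stand-ins, not the course's actual data or settings.

```python
import pandas as pd

# Hypothetical stand-in for the full transactions table
full = pd.DataFrame({'InvoiceNo': range(1000)})

# Draw a 20% random sample of rows without replacement.
# random_state is an arbitrary seed chosen here so the sample is repeatable.
sample = full.sample(frac=0.2, random_state=1)

print(len(sample))
```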

Top 5 rows of data

online.head()

Assign acquisition month cohort

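import datetime as dt

# Truncate a timestamp to the first day of its month
def get_month(x):
    return dt.datetime(x.year, x.month, 1)

online['InvoiceMonth'] = online['InvoiceDate'].apply(get_month)

# Each customer's cohort is the month of their first invoice
grouping = online.groupby('CustomerID')['InvoiceMonth']
online['CohortMonth'] = grouping.transform('min')
online.head()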
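To see what this step produces, here is a minimal sketch on a made-up four-row table (the `toy` DataFrame and its values are invented for illustration, not taken from the retail dataset). Every row for a customer should receive that customer's earliest invoice month.

```python
import datetime as dt
import pandas as pd

# Made-up data: customer 1 first buys in January, customer 2 in February
toy = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2],
    'InvoiceDate': pd.to_datetime(
        ['2011-01-15', '2011-03-02', '2011-02-20', '2011-02-27']),
})

def get_month(x):
    # Truncate a timestamp to the first day of its month
    return dt.datetime(x.year, x.month, 1)

toy['InvoiceMonth'] = toy['InvoiceDate'].apply(get_month)

# transform('min') returns the group minimum aligned to the original rows
toy['CohortMonth'] = toy.groupby('CustomerID')['InvoiceMonth'].transform('min')

print(toy)
```

Using `transform` rather than an aggregation keeps one value per original row, so the result can be assigned straight back as a new column.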

Extract integer values from data

Define a function to extract year, month, and day integer values.

We will use it throughout the course.

def get_date_int(df, column):
    year = df[column].dt.year
    month = df[column].dt.month
    day = df[column].dt.day
    return year, month, day
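
As a quick check of what the function returns, the sketch below calls it on a made-up two-row table (the `toy` DataFrame is hypothetical). It assumes the column already has a datetime dtype, since the `.dt` accessor is only available on datetime-like Series.

```python
import pandas as pd

def get_date_int(df, column):
    year = df[column].dt.year
    month = df[column].dt.month
    day = df[column].dt.day
    return year, month, day

# Made-up data with a datetime column
toy = pd.DataFrame({
    'InvoiceMonth': pd.to_datetime(['2010-12-01', '2011-03-01']),
})

# Each returned value is a Series of integers aligned with the rows of toy
year, month, day = get_date_int(toy, 'InvoiceMonth')

print(year.tolist(), month.tolist(), day.tolist())
```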

Assign time offset value

invoice_year, invoice_month, _ = get_date_int(online, 'InvoiceMonth')
cohort_year, cohort_month, _ = get_date_int(online, 'CohortMonth')

years_diff = invoice_year - cohort_year
months_diff = invoice_month - cohort_month

# Adding 1 makes the acquisition month index 1 rather than 0
online['CohortIndex'] = years_diff * 12 + months_diff + 1
online.head()
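The offset arithmetic can be checked by hand on a few made-up rows (the `toy` DataFrame below is hypothetical). For a customer acquired in December 2010 with an invoice in March 2011, years_diff is 1 and months_diff is 3 - 12 = -9, so the index is 1 * 12 - 9 + 1 = 4, meaning the fourth month counting the acquisition month as the first.

```python
import pandas as pd

# Made-up data: one cohort (December 2010) with invoices in three months
toy = pd.DataFrame({
    'InvoiceMonth': pd.to_datetime(['2010-12-01', '2011-03-01', '2011-12-01']),
    'CohortMonth': pd.to_datetime(['2010-12-01'] * 3),
})

years_diff = toy['InvoiceMonth'].dt.year - toy['CohortMonth'].dt.year
months_diff = toy['InvoiceMonth'].dt.month - toy['CohortMonth'].dt.month

# months_diff can be negative across a year boundary; years_diff * 12 compensates
toy['CohortIndex'] = years_diff * 12 + months_diff + 1

print(toy['CohortIndex'].tolist())
```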

Count monthly active customers from each cohort

grouping = online.groupby(['CohortMonth', 'CohortIndex'])

cohort_data = grouping['CustomerID'].apply(pd.Series.nunique)
cohort_data = cohort_data.reset_index()
cohort_counts = cohort_data.pivot(index='CohortMonth', columns='CohortIndex', values='CustomerID')
print(cohort_counts)
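
The shape of the resulting table is easier to see on a small made-up example (the `toy` DataFrame is invented for illustration). This sketch uses the built-in `nunique()` aggregation, which should give the same counts as `apply(pd.Series.nunique)`. Cells with no active customers come out as NaN.

```python
import pandas as pd

# Made-up data: customers 1 and 2 belong to the January cohort, customer 3 to February
toy = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2, 2, 3],
    'CohortMonth': pd.to_datetime(['2011-01-01'] * 5 + ['2011-02-01']),
    'CohortIndex': [1, 1, 2, 1, 3, 1],
})

grouping = toy.groupby(['CohortMonth', 'CohortIndex'])

# Count distinct customers per (cohort, month offset) pair;
# repeated invoices by the same customer in one month count once
cohort_data = grouping['CustomerID'].nunique().reset_index()

# Rows: cohort month; columns: months since acquisition
cohort_counts = cohort_data.pivot(index='CohortMonth',
                                  columns='CohortIndex',
                                  values='CustomerID')

print(cohort_counts)
```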

Your turn to build some cohorts!
