Customer Segmentation in Python
Karolis Urbonas
Head of Data Science, Amazon
Over 0.5 million transactions from a UK-based online retail store.
We will use a randomly sampled 20% subset of this dataset throughout the course.
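The slides do not say how the 20% subset was drawn. As a rough sketch, one common way to take such a sample in pandas is `DataFrame.sample`. The frame `full` and the seed value here are made up for illustration and are not the course data.

```python
import pandas as pd

# Hypothetical stand-in for the full transaction table (10 rows)
full = pd.DataFrame({'InvoiceNo': range(10), 'CustomerID': [1, 2] * 5})

# Draw a 20% random sample; a fixed random_state makes it reproducible
subset = full.sample(frac=0.2, random_state=1)
print(len(subset))
```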
online.head()
import datetime as dt
def get_month(x):
    return dt.datetime(x.year, x.month, 1)
online['InvoiceMonth'] = online['InvoiceDate'].apply(get_month)
# Cohort month = month of each customer's first invoice
grouping = online.groupby('CustomerID')['InvoiceMonth']
online['CohortMonth'] = grouping.transform('min')
online.head()
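To make the effect of these two steps concrete, here is a minimal sketch on a small made-up table. The three rows and the name `toy` are assumptions for illustration, not the course data.

```python
import datetime as dt
import pandas as pd

# Hypothetical toy data: two customers, three invoices
toy = pd.DataFrame({
    'CustomerID': [1, 1, 2],
    'InvoiceDate': pd.to_datetime(['2011-01-15', '2011-03-02', '2011-02-20']),
})

def get_month(x):
    # Truncate a timestamp to the first day of its month
    return dt.datetime(x.year, x.month, 1)

toy['InvoiceMonth'] = toy['InvoiceDate'].apply(get_month)
# Earliest invoice month per customer, broadcast back to every row
toy['CohortMonth'] = toy.groupby('CustomerID')['InvoiceMonth'].transform('min')
print(toy)
```

Both of customer 1's invoices end up with the January 2011 cohort month, while customer 2 belongs to the February 2011 cohort.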
Define a function to extract the year, month, and day as integer values.
We will use it throughout the course.
def get_date_int(df, column):
    year = df[column].dt.year
    month = df[column].dt.month
    day = df[column].dt.day
    return year, month, day
invoice_year, invoice_month, _ = get_date_int(online, 'InvoiceMonth')
cohort_year, cohort_month, _ = get_date_int(online, 'CohortMonth')
years_diff = invoice_year - cohort_year
months_diff = invoice_month - cohort_month
online['CohortIndex'] = years_diff * 12 + months_diff + 1
online.head()
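The arithmetic behind `CohortIndex` can be checked on a pair of dates. The sketch below uses a hypothetical helper, `cohort_index`, which is not part of the course code. It applies the same formula to scalar timestamps to show that the index starts at 1 in the cohort month and stays correct across a year boundary, where `months_diff` is negative.

```python
import pandas as pd

def cohort_index(invoice_month, cohort_month):
    # Hypothetical helper: months elapsed since the cohort month, counting from 1
    years_diff = invoice_month.year - cohort_month.year
    months_diff = invoice_month.month - cohort_month.month
    return years_diff * 12 + months_diff + 1

first = pd.Timestamp('2010-12-01')
same_month = cohort_index(pd.Timestamp('2010-12-01'), first)   # 0 * 12 + 0 + 1
across_year = cohort_index(pd.Timestamp('2011-02-01'), first)  # 1 * 12 + (2 - 12) + 1
print(same_month, across_year)  # prints: 1 3
```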
grouping = online.groupby(['CohortMonth', 'CohortIndex'])
cohort_data = grouping['CustomerID'].nunique()
cohort_data = cohort_data.reset_index()
cohort_counts = cohort_data.pivot(index='CohortMonth', columns='CohortIndex', values='CustomerID')
print(cohort_counts)
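As a rough illustration of the resulting table's shape, the sketch below runs the same group, count, and pivot steps on a small made-up frame. The rows and the name `toy` are assumptions, and `CohortMonth` and `CohortIndex` are filled in by hand rather than computed.

```python
import pandas as pd

# Hypothetical toy data with CohortMonth and CohortIndex already assigned
toy = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2, 3],
    'CohortMonth': pd.to_datetime(['2011-01-01', '2011-01-01', '2011-01-01',
                                   '2011-01-01', '2011-02-01']),
    'CohortIndex': [1, 2, 1, 1, 1],
})

grouping = toy.groupby(['CohortMonth', 'CohortIndex'])
# Count distinct customers active in each (cohort month, month offset) pair
cohort_data = grouping['CustomerID'].nunique().reset_index()
cohort_counts = cohort_data.pivot(index='CohortMonth',
                                  columns='CohortIndex',
                                  values='CustomerID')
print(cohort_counts)
```

Each row is a cohort and each column is a month offset. Customer 2's two invoices in the first month are counted once, and the February cohort has no value in column 2 because no activity was observed for it there.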