Customer Analytics and A/B Testing in Python
Ryan Grossman
Data Scientist, EDO
Group: pandas.DataFrame.groupby()
DataFrame.groupby(by=None, axis=0, level=None,
                  as_index=True, sort=True,
                  group_keys=True, squeeze=False, **kwargs)
Aggregate: pandas.DataFrame.agg()
DataFrame.agg(func, axis=0, *args, **kwargs)
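by: fields to group by
axis: axis=0 (the default) splits the data along rows, so rows are grouped by their values in the by columns; axis=1 splits along columns
as_index: as_index=True will use the group labels as the index; as_index=False keeps them as regular columns
# sub_data_demo - combined demographics and purchase data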
sub_data_grp = sub_data_demo.groupby(by=['country', 'device'],
                                     axis=0,
                                     as_index=False)
sub_data_grp
<pandas.core.groupby.DataFrameGroupBy object at 0x10ec29080>
# Mean price paid for each country/device
sub_data_grp.price.mean()
  country device       price
0     BRA    and  312.163551
1     BRA    iOS  247.884615
2     CAN    and  431.448718
3     CAN    iOS  505.659574
4     DEU    and  398.848837
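To make the effect of as_index concrete, here is a minimal sketch. The toy DataFrame is a hypothetical stand-in for sub_data_demo (its values are invented for illustration), and the names by_index and by_columns are not from the slides.

```python
import pandas as pd

# Hypothetical stand-in for sub_data_demo; values are made up
toy = pd.DataFrame({
    "country": ["BRA", "BRA", "CAN", "CAN"],
    "device": ["and", "iOS", "and", "and"],
    "price": [0, 499, 199, 299],
})

# Default as_index=True: the group labels become a MultiIndex
by_index = toy.groupby(["country", "device"])["price"].mean()

# as_index=False: the group labels stay as ordinary columns
by_columns = toy.groupby(["country", "device"], as_index=False)["price"].mean()

print(by_index)
print(by_columns)
```

The second form is usually easier to merge or plot, since country and device remain regular columns.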
Pass the name of an aggregation function to agg():
# Find the mean price paid with agg
sub_data_grp.price.agg('mean')
  country device       price
0     BRA    and  312.163551
1     BRA    iOS  247.884615
2     CAN    and  431.448718
3     CAN    iOS  505.659574
4     DEU    and  398.848837
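As the two identical outputs suggest, passing a function name to agg() should give the same result as calling the matching method directly. A small sketch that checks this on hypothetical toy data (not the course dataset):

```python
import pandas as pd

# Hypothetical toy data; not the course dataset
toy = pd.DataFrame({
    "country": ["BRA", "BRA", "CAN", "CAN"],
    "device": ["and", "and", "iOS", "iOS"],
    "price": [0, 499, 199, 299],
})

grp = toy.groupby(["country", "device"])["price"]

via_method = grp.mean()      # direct method call
via_agg = grp.agg("mean")    # same aggregation, named as a string

# True when both results hold the same values and labels
same = via_method.equals(via_agg)
print(same)
```

The string form becomes useful once the aggregation is chosen at runtime or combined with others in a list or dictionary.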
Pass a list of names of aggregation functions:
# Mean and median price paid for each country/device
sub_data_grp.price.agg(['mean', 'median'])
                      mean  median
country device
BRA     and     312.163551       0
        iOS     247.884615       0
CAN     and     431.448718     699
        iOS     505.659574     699
DEU     and     398.848837     499
        iOS     313.128000       0
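With a list of function names, each function becomes its own output column. The sketch below uses hypothetical toy data and the default as_index=True, then calls reset_index() to move the group labels back into columns; the names summary and flat are illustrative only.

```python
import pandas as pd

# Hypothetical toy data; not the course dataset
toy = pd.DataFrame({
    "country": ["BRA", "BRA", "CAN", "CAN"],
    "device": ["and", "and", "iOS", "iOS"],
    "price": [0, 499, 199, 299],
})

# One output column per aggregation function
summary = toy.groupby(["country", "device"])["price"].agg(["mean", "median"])

# Turn the (country, device) index back into regular columns
flat = summary.reset_index()

print(summary)
print(flat)
```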
Pass a dictionary of column names and aggregation functions:
# Calculate multiple metrics across different groups
sub_data_grp.agg({'price': ['mean', 'min', 'max'],
                  'age': ['mean', 'min', 'max']})
  country device       price                  age
                        mean  min  max       mean  min  max
0     BRA    and  312.163551    0  999  24.303738   15   67
1     BRA    iOS  247.884615    0  999  24.024476   15   79
2     CAN    and  431.448718    0  999  23.269231   15   58
3     CAN    iOS  505.659574    0  999  22.234043   15   38
4     DEU    and  398.848837    0  999  23.848837   15   67
5     DEU    iOS  313.128000    0  999  24.208000   15   54
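A dictionary of lists produces two-level column labels (column name, then function name), as in the output above. One common way to work with such a result is to join the two levels into single names. The sketch below assumes hypothetical toy data and the default as_index=True; the underscore naming is a choice made here, not something from the slides.

```python
import pandas as pd

# Hypothetical toy data; not the course dataset
toy = pd.DataFrame({
    "country": ["BRA", "BRA", "CAN", "CAN"],
    "device": ["and", "and", "iOS", "iOS"],
    "price": [0, 499, 199, 299],
    "age": [20, 30, 25, 35],
})

stats = toy.groupby(["country", "device"]).agg(
    {"price": ["mean", "min", "max"],
     "age": ["mean", "min", "max"]}
)

# Columns are tuples such as ("price", "mean"); join them into flat names
stats.columns = ["_".join(col) for col in stats.columns]

print(stats)
```

After flattening, a metric can be selected with a single label such as stats["price_mean"].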
def truncated_mean(data):
    """Compute the mean excluding outliers"""
    top_val = data.quantile(.9)
    bot_val = data.quantile(.1)
    trunc_data = data[(data <= top_val) & (data >= bot_val)]
    mean = trunc_data.mean()
    return mean

# Find the truncated mean age by group
sub_data_grp.agg({'age': [truncated_mean]})
  country device            age
                 truncated_mean
0     BRA    and      22.636364
...
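To see what the truncated mean does, here is a small worked sketch on hypothetical ages, with one low and one high outlier. It reuses the truncated_mean function from the slide; the segment column and the sample values are assumptions for illustration. Series.quantile uses linear interpolation by default, so the cutoffs here are 19.5 and 38.5.

```python
import pandas as pd

def truncated_mean(data):
    """Compute the mean excluding values outside the 10th-90th percentile."""
    top_val = data.quantile(.9)
    bot_val = data.quantile(.1)
    trunc_data = data[(data <= top_val) & (data >= bot_val)]
    return trunc_data.mean()

# Hypothetical ages for a single group; 15 and 79 act as outliers
df = pd.DataFrame({
    "segment": ["a"] * 10,
    "age": [15, 20, 22, 24, 26, 28, 30, 32, 34, 79],
})

plain = df["age"].mean()                                    # 31.0
trimmed = df.groupby("segment")["age"].agg(truncated_mean)  # 27.0 for "a"

print(plain)
print(trimmed)
```

Dropping the two extreme values moves the estimate from 31.0 to 27.0, which is closer to where most of the sample sits.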