A/B Testing in Python
Moe Lotfy, PhD
Principal Data Science Manager
Unit of analysis:
Randomization unit:
# Use a list (not a set) so the aggregate columns come out in a fixed order
print(checkout.groupby('checkout_page')[['order_value','purchased']].agg(['mean','sum','count']))
              order_value                      purchased
                     mean           sum count       mean     sum count
checkout_page
A               24.956437  61417.791564  2461   0.820333  2461.0  3000
B               29.876202  75915.430125  2541   0.847000  2541.0  3000
C               34.917589  90890.484142  2603   0.867667  2603.0  3000
# Ratio metric: total order value per page view for each variant
(checkout.groupby('checkout_page')['order_value'].sum() /
 checkout.groupby('checkout_page')['purchased'].count())
checkout_page
A 20.472597
B 25.305143
C 30.296828
dtype: float64
import numpy as np

# Delta method variance of ratio metric
def var_delta(x,y):
    x_bar = np.mean(x)
    y_bar = np.mean(y)
    x_var = np.var(x,ddof=1)
    y_var = np.var(y,ddof=1)
    cov_xy = np.cov(x,y,ddof=1)[0][1]
    # Note that we divide by len(x) here because the denominator of the test statistic is standard error (=sqrt(var/n))
    var_ratio = (x_var/y_bar**2 + y_var*(x_bar**2/y_bar**4) - 2*cov_xy*(x_bar/y_bar**3))/len(x)
    return var_ratio
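Written out, the expression that `var_delta` evaluates is the first-order delta method approximation for the variance of a ratio of means. The symbols are notation chosen here to mirror the code, not taken from the slides: $s_x^2$ and $s_y^2$ are the sample variances (`x_var`, `y_var`), $s_{xy}$ is the sample covariance (`cov_xy`), and $n$ is `len(x)`.

```latex
\operatorname{Var}\!\left(\frac{\bar{x}}{\bar{y}}\right)
\approx \frac{1}{n}\left[
    \frac{s_x^2}{\bar{y}^2}
  + \frac{\bar{x}^2\, s_y^2}{\bar{y}^4}
  - \frac{2\,\bar{x}\, s_{xy}}{\bar{y}^3}
\right]
```

The three terms correspond one-to-one to the three terms inside the parentheses of `var_ratio`.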
# Delta method ztest calculation
ztest_delta(x_control,y_control,x_treatment,y_treatment, alpha = 0.05)
Input arguments:
x_control: control variant user-level ratio numerator column
y_control: control variant user-level ratio denominator column
x_treatment: treatment variant user-level ratio numerator column
y_treatment: treatment variant user-level ratio denominator column
alpha: significance level
Output:
mean_control: control group ratio metric mean
mean_treatment: treatment group ratio metric mean
difference: difference between treatment and control means
diff_CI: confidence interval of the difference in means
p-value: the two-tailed z-test p-value
# Create DataFrames for per user metrics for variants A and B
A_per_user = pd.DataFrame({'order_value':checkout[checkout['checkout_page']=='A'].groupby('user_id')['order_value'].sum()
,'page_view':checkout[checkout['checkout_page']=='A'].groupby('user_id')['user_id'].count()})
B_per_user = pd.DataFrame({'order_value':checkout[checkout['checkout_page']=='B'].groupby('user_id')['order_value'].sum()
,'page_view':checkout[checkout['checkout_page']=='B'].groupby('user_id')['user_id'].count()})
# Assign the control and treatment ratio columns
x_control = A_per_user['order_value']
y_control = A_per_user['page_view']
x_treatment = B_per_user['order_value']
y_treatment = B_per_user['page_view']
# Run a z-test for ratio metrics
ztest_delta(x_control,y_control,x_treatment,y_treatment)
{'mean_control': 20.472597188012,
'mean_treatment': 25.30514337484097,
'difference': 4.833,
'diff_CI': '[4.257, 5.408]',
'p-value': 5.954978880467735e-61}
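The slides call `ztest_delta` without showing its body. The following is a minimal sketch of what such a function might look like, not the course's actual implementation. It assumes that each group's ratio mean is `sum(x)/sum(y)`, that the standard error of the difference combines the two delta method variances, and that the output is rounded and formatted like the dictionary shown above. To stay self-contained it uses only the standard library, so `var_delta` is re-implemented here without NumPy.

```python
from math import erfc, sqrt
from statistics import NormalDist, mean, variance


def var_delta(x, y):
    """Delta method variance of mean(x)/mean(y), already divided by n."""
    x = [float(v) for v in x]
    y = [float(v) for v in y]
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    x_var, y_var = variance(x), variance(y)  # sample variances (ddof=1)
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    return (x_var / y_bar**2
            + y_var * (x_bar**2 / y_bar**4)
            - 2 * cov_xy * (x_bar / y_bar**3)) / n


def ztest_delta(x_control, y_control, x_treatment, y_treatment, alpha=0.05):
    """Two-tailed z-test for the difference of two ratio metrics (sketch)."""
    # Ratio metric per group: total numerator over total denominator
    mean_control = sum(x_control) / sum(y_control)
    mean_treatment = sum(x_treatment) / sum(y_treatment)

    # Assumption: groups are independent, so their variances add
    std_err = sqrt(var_delta(x_control, y_control)
                   + var_delta(x_treatment, y_treatment))

    diff = mean_treatment - mean_control
    z_stat = diff / std_err

    # Two-tailed p-value; erfc keeps precision for very large |z|
    p_value = erfc(abs(z_stat) / sqrt(2))

    # Confidence interval at level 1 - alpha
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = diff - z_crit * std_err, diff + z_crit * std_err

    return {
        'mean_control': mean_control,
        'mean_treatment': mean_treatment,
        'difference': round(diff, 3),
        'diff_CI': f"[{lower:.3f}, {upper:.3f}]",
        'p-value': p_value,
    }
```

The function accepts any equal-length sequences of per-user numerators and denominators, so the `A_per_user` and `B_per_user` columns defined earlier can be passed in directly.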