A/B Testing in Python
Moe Lotfy, PhD
Principal Data Science Manager
# Calculate the mean and count of time on page by variant
# (a list, unlike a set, keeps the output columns in a fixed order)
print(checkout.groupby('checkout_page')['time_on_page'].agg(['mean', 'count']))
                    mean  count
checkout_page
A              44.668527   3000
B              42.723772   3000
C              42.223772   3000
import numpy as np
import pingouin

# Set random seed for repeatability
np.random.seed(40)
# Take a random sample of size 25 from each variant
ToP_samp_A = checkout[checkout['checkout_page'] == 'A'].sample(25)['time_on_page']
ToP_samp_B = checkout[checkout['checkout_page'] == 'B'].sample(25)['time_on_page']
# Run a Mann-Whitney U test
mwu_test = pingouin.mwu(x=ToP_samp_A,
                        y=ToP_samp_B,
                        alternative='two-sided')
# Print the test results
print(mwu_test)
     U-val alternative     p-val     RBC    CLES
MWU  441.0   two-sided  0.013007 -0.4112  0.7056
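The p-value reported above (about 0.013) is read against a significance level chosen in advance. The sketch below walks through the same workflow on synthetic data, using `scipy.stats.mannwhitneyu` as a stand-in for `pingouin.mwu`. The generated samples, the variable names, and the `alpha = 0.05` threshold are assumptions made for illustration; none of them come from the checkout dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical, synthetic time-on-page samples (not the checkout data)
rng = np.random.default_rng(40)
samp_a = rng.exponential(scale=45, size=25)
samp_b = rng.exponential(scale=42, size=25)

# Two-sided Mann-Whitney U test on the two independent samples
u_stat, p_value = stats.mannwhitneyu(samp_a, samp_b, alternative='two-sided')

# Compare the p-value with an assumed significance level
alpha = 0.05
decision = 'reject' if p_value < alpha else 'fail to reject'
print(f'U = {u_stat:.1f}, p-value = {p_value:.4f}: {decision} the null hypothesis')
```

Because the test is rank-based, it does not require time on page to be normally distributed, which is why it suits small samples like the 25 observations per variant used above.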
Homepage signup rates A/B test
Null: There is no difference in signup rates between landing page designs C and D
Alternative: There is a difference in signup rates between landing page designs C and D
from scipy import stats

# Calculate the number of users in groups C and D
n_C = homepage[homepage['landing_page'] == 'C']['user_id'].nunique()
n_D = homepage[homepage['landing_page'] == 'D']['user_id'].nunique()
# Compute unique signups in each group
signup_C = homepage[homepage['landing_page'] == 'C'].groupby('user_id')['signup'].max().sum()
no_signup_C = n_C - signup_C
signup_D = homepage[homepage['landing_page'] == 'D'].groupby('user_id')['signup'].max().sum()
no_signup_D = n_D - signup_D
# Create the signups table
table = [[signup_C, no_signup_C], [signup_D, no_signup_D]]
print('Group C signup rate:', round(signup_C / n_C, 3))
print('Group D signup rate:', round(signup_D / n_D, 3))
# Calculate p-value
print('p-value=', stats.chi2_contingency(table, correction=False)[1])
Group C signup rate: 0.064
Group D signup rate: 0.048
p-value= 0.009165
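On a 2x2 table, the chi-square test with `correction=False` is equivalent to a pooled two-proportion z-test: the chi-square statistic equals the squared z statistic, and the two-sided p-values match. The sketch below checks this by hand. The counts are hypothetical and chosen only for illustration; they are not the homepage data, and the lowercase variable names are likewise assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical counts for illustration only (not the homepage data)
signup_c, n_c = 64, 1000
signup_d, n_d = 48, 1000

# 2x2 table of [signups, non-signups] per group
table = [[signup_c, n_c - signup_c], [signup_d, n_d - signup_d]]

# Pearson chi-square test without Yates' continuity correction
chi2, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)

# Pooled two-proportion z-test computed by hand
p_c, p_d = signup_c / n_c, signup_d / n_d
p_pool = (signup_c + signup_d) / (n_c + n_d)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_d))
z = (p_c - p_d) / se
p_z = 2 * stats.norm.sf(abs(z))  # two-sided p-value

print(f'chi2 = {chi2:.4f}, z^2 = {z**2:.4f}')
print(f'p (chi-square) = {p_chi2:.6f}, p (z-test) = {p_z:.6f}')
```

The z statistic keeps its sign, so it also shows which group has the higher rate, while the chi-square statistic only reports that the rates differ.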