A/B Testing in Python
Moe Lotfy, PhD
Principal Data Science Manager

Unit of analysis:
Randomization unit:

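# Per-variant mean, sum and count of order value and purchase flag
# (a list, unlike a set, keeps the statistic columns in a fixed order)
print(checkout.groupby('checkout_page')[['order_value','purchased']].agg(['mean','sum','count']))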
              order_value                        purchased
                     mean           sum  count      mean     sum  count
checkout_page
A               24.956437  61417.791564   2461  0.820333  2461.0   3000
B               29.876202  75915.430125   2541  0.847000  2541.0   3000
C               34.917589  90890.484142   2603  0.867667  2603.0   3000
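The two `mean` columns measure different things. pandas skips NaN values, so the `order_value` mean and count cover only page views that ended in a purchase, while the `purchased` count covers every page view (2461 vs 3000 for variant A suggests non-purchases are stored as NaN). The tiny frame below is hypothetical and only reuses the column names, assuming that NaN convention:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per page view; order_value is NaN when nothing was bought
toy = pd.DataFrame({
    'checkout_page': ['A', 'A', 'A', 'B', 'B'],
    'order_value':   [20.0, np.nan, 40.0, 30.0, np.nan],
    'purchased':     [1.0, 0.0, 1.0, 1.0, 0.0],
})

# A list of statistics gives MultiIndex columns in a fixed order: mean, sum, count
summary = toy.groupby('checkout_page')[['order_value', 'purchased']].agg(['mean', 'sum', 'count'])
print(summary)
```

For variant A the `order_value` mean is taken over 2 purchases, while the `purchased` count is 3 page views.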
# Revenue per page view: total order value divided by the number of page views
(checkout.groupby('checkout_page')['order_value'].sum()
 / checkout.groupby('checkout_page')['purchased'].count())
checkout_page
A 20.472597
B 25.305143
C 30.296828
dtype: float64
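Revenue per page view is the average order value among purchasers times the conversion rate, because sum / page views = (sum / purchases) x (purchases / page views). A quick check using only the figures from the summary table:

```python
# Figures copied from the summary table for variants A, B and C
aov = {'A': 24.956437, 'B': 29.876202, 'C': 34.917589}               # mean order_value (purchasers only)
conversion = {'A': 0.820333, 'B': 0.847000, 'C': 0.867667}            # mean of purchased
revenue = {'A': 61417.791564, 'B': 75915.430125, 'C': 90890.484142}   # sum of order_value
page_views = 3000                                                      # count of purchased per variant

checks = {}
for page in aov:
    per_view = revenue[page] / page_views
    product = aov[page] * conversion[page]
    checks[page] = (per_view, product)
    print(f"{page}: revenue/page view = {per_view:.4f}, AOV x conversion = {product:.4f}")
```

The two columns agree up to the rounding of the printed table values.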

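For a ratio of user-level means $\hat R = \bar X / \bar Y$ over $n$ users, the first-order (delta-method) approximation that `var_delta` computes is

$$
\operatorname{Var}(\hat R) \approx \frac{1}{n}\left(\frac{\sigma_X^2}{\mu_Y^2} + \frac{\mu_X^2}{\mu_Y^4}\,\sigma_Y^2 - 2\,\frac{\mu_X}{\mu_Y^3}\,\sigma_{XY}\right)
$$

with $\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \sigma_{XY}$ replaced by the sample estimates `x_bar`, `y_bar`, `x_var`, `y_var` and `cov_xy`. For two independent variants, the variance of the difference $\hat R_T - \hat R_C$ is the sum of the two variances.
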
import numpy as np

# Variance of a ratio metric via the delta method
def var_delta(x, y):
    x_bar = np.mean(x)
    y_bar = np.mean(y)
    x_var = np.var(x, ddof=1)
    y_var = np.var(y, ddof=1)
    cov_xy = np.cov(x, y, ddof=1)[0][1]
    # Note that we divide by len(x) here because the denominator of the test statistic is the standard error (= sqrt(var/n))
    var_ratio = (x_var/y_bar**2 + y_var*(x_bar**2/y_bar**4) - 2*cov_xy*(x_bar/y_bar**3))/len(x)
    return var_ratio
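One way to sanity-check `var_delta` is to compare it with a bootstrap estimate of the variance of $\bar X / \bar Y$ on simulated per-user data. The data-generating process below (Poisson page views, gamma-distributed order values) is purely an assumption for illustration, and the function is repeated so the block runs on its own. With about a thousand users the two estimates should agree closely.

```python
import numpy as np

def var_delta(x, y):
    # Delta-method variance of mean(x)/mean(y), same formula as above
    x_bar, y_bar = np.mean(x), np.mean(y)
    x_var, y_var = np.var(x, ddof=1), np.var(y, ddof=1)
    cov_xy = np.cov(x, y, ddof=1)[0][1]
    return (x_var / y_bar**2 + y_var * x_bar**2 / y_bar**4
            - 2 * cov_xy * x_bar / y_bar**3) / len(x)

rng = np.random.default_rng(42)
n_users = 1000

# Hypothetical per-user data: page views (denominator) and total order value (numerator)
page_views = 1 + rng.poisson(2, size=n_users)
order_value = rng.gamma(shape=2.0, scale=10.0, size=n_users) * page_views

# Bootstrap: resample users with replacement and recompute the ratio of means
n_boot = 2000
idx = rng.integers(0, n_users, size=(n_boot, n_users))
boot_ratios = order_value[idx].mean(axis=1) / page_views[idx].mean(axis=1)

delta_var = var_delta(order_value, page_views)
boot_var = np.var(boot_ratios, ddof=1)
print(f"delta method: {delta_var:.5f}, bootstrap: {boot_var:.5f}")
```

Resampling whole users (not page views) respects the randomization unit, which is exactly what the delta method accounts for analytically.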
# Delta-method z-test calculation
ztest_delta(x_control, y_control, x_treatment, y_treatment, alpha=0.05)
Input arguments:
x_control: user-level numerator column for the control variant
y_control: user-level denominator column for the control variant
x_treatment: user-level numerator column for the treatment variant
y_treatment: user-level denominator column for the treatment variant
alpha: significance level

Output:
mean_control: mean ratio metric of the control variant
mean_treatment: mean ratio metric of the treatment variant
difference: difference between the treatment and control means
diff_CI: confidence interval of the difference in means
p-value: two-sided z-test p-value

# Create DataFrames with per-user statistics for variants A and B
A_per_user = pd.DataFrame({'order_value':checkout[checkout['checkout_page']=='A'].groupby('user_id')['order_value'].sum()
,'page_view':checkout[checkout['checkout_page']=='A'].groupby('user_id')['user_id'].count()})
B_per_user = pd.DataFrame({'order_value':checkout[checkout['checkout_page']=='B'].groupby('user_id')['order_value'].sum()
,'page_view':checkout[checkout['checkout_page']=='B'].groupby('user_id')['user_id'].count()})
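The same per-user table can also be built in one `groupby` call with pandas named aggregation. The toy frame below is hypothetical and only mimics the column names, again assuming non-purchases are stored as NaN in `order_value`:

```python
import numpy as np
import pandas as pd

# Hypothetical page-view-level rows: order_value is NaN when no purchase happened
views = pd.DataFrame({
    'user_id':       [1, 1, 2, 3],
    'checkout_page': ['A', 'A', 'A', 'A'],
    'order_value':   [10.0, np.nan, 5.0, np.nan],
})

# One row per user: numerator = total order value, denominator = number of page views
per_user = (views[views['checkout_page'] == 'A']
            .groupby('user_id')
            .agg(order_value=('order_value', 'sum'),        # NaN is skipped, so non-buyers get 0.0
                 page_view=('checkout_page', 'count')))     # checkout_page is never NaN, so this counts rows
print(per_user)
```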
# Set up the ratio columns for control and treatment
x_control = A_per_user['order_value']
y_control = A_per_user['page_view']
x_treatment = B_per_user['order_value']
y_treatment = B_per_user['page_view']
# Run a z-test for ratio metrics
ztest_delta(x_control,y_control,x_treatment,y_treatment)
{'mean_control': 20.472597188012,
'mean_treatment': 25.30514337484097,
'difference': 4.833,
'diff_CI': '[4.257, 5.408]',
'p-value': 5.954978880467735e-61}
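Only the signature and output of `ztest_delta` appear here, not its body. A minimal sketch that returns a dictionary of the same shape could look like the following, assuming it treats the two variants as independent (so the `var_delta` variances add) and uses a normal approximation. It is not guaranteed to match the original implementation, and the simulated data are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

def var_delta(x, y):
    # Delta-method variance of mean(x)/mean(y), same formula as above
    x_bar, y_bar = np.mean(x), np.mean(y)
    x_var, y_var = np.var(x, ddof=1), np.var(y, ddof=1)
    cov_xy = np.cov(x, y, ddof=1)[0][1]
    return (x_var / y_bar**2 + y_var * x_bar**2 / y_bar**4
            - 2 * cov_xy * x_bar / y_bar**3) / len(x)

def ztest_delta(x_control, y_control, x_treatment, y_treatment, alpha=0.05):
    # Ratio of means per variant, e.g. total order value / total page views
    mean_control = np.mean(x_control) / np.mean(y_control)
    mean_treatment = np.mean(x_treatment) / np.mean(y_treatment)
    difference = mean_treatment - mean_control

    # Independent variants: variances add, giving the standard error of the difference
    se = math.sqrt(var_delta(x_control, y_control) + var_delta(x_treatment, y_treatment))

    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value
    lower, upper = difference - z_crit * se, difference + z_crit * se
    z = difference / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value, accurate for large |z|

    return {'mean_control': mean_control,
            'mean_treatment': mean_treatment,
            'difference': round(difference, 3),
            'diff_CI': f'[{lower:.3f}, {upper:.3f}]',
            'p-value': p_value}

# Usage on hypothetical simulated users
rng = np.random.default_rng(0)

def simulate(n_users, value_per_view):
    # Each user has 1+ page views; order value is proportional to views on average
    views = 1 + rng.poisson(2, size=n_users)
    value = rng.gamma(2.0, value_per_view / 2.0, size=n_users) * views
    return value, views

x_c, y_c = simulate(2000, 20.0)
x_t, y_t = simulate(2000, 25.0)
result = ztest_delta(x_c, y_c, x_t, y_t)
print(result)
```

`math.erfc` is used instead of `1 - cdf(|z|)` because the subtraction underflows to 0 for very large z-scores such as the one behind the p-value shown above.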