Sanity checks: internal validity

A/B Testing in Python

Moe Lotfy, PhD

Principal Data Science Manager

Sample Ratio Mismatch (SRM)

  • Sample Ratio Mismatch (SRM)
    • The observed allocation across variants deviates from the designed allocation
  • Chi-square goodness-of-fit test

Chi-square formula
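
For reference, the standard chi-square goodness-of-fit statistic can be written as follows, where $O_i$ is the observed and $E_i$ the expected number of users in variant $i$, and $k$ is the number of variants (these symbol names are chosen here for illustration):

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

The statistic is compared against a chi-square distribution with $k - 1$ degrees of freedom; a small p-value suggests the observed allocation deviates from the design.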

Example of SRM allocation

SRM example in Python

# AdSmart is a pandas DataFrame holding the AdSmart dataset (see footnote)
# Calculate the number of unique IDs per variant
AdSmart.groupby('experiment')['auction_id'].nunique()
experiment
control    4071
exposed    4006
# Assign the unique counts to each variant
control_users = AdSmart[AdSmart['experiment'] == 'control']['auction_id'].nunique()
exposed_users = AdSmart[AdSmart['experiment'] == 'exposed']['auction_id'].nunique()
total_users = control_users + exposed_users
# Calculate allocation ratios per variant
control_perc = control_users / total_users
exposed_perc = exposed_users / total_users
# Convert to a percentage before rounding to avoid floating-point noise in the output
print("Percentage of users in the Control group:", round(100 * control_perc, 3), "%")
print("Percentage of users in the Exposed group:", round(100 * exposed_perc, 3), "%")
Percentage of users in the Control group: 50.402 %
Percentage of users in the Exposed group: 49.598 %
1 Adsmart Kaggle dataset: https://www.kaggle.com/datasets/osuolaleemmanuel/ad-ab-testing

SRM example in Python

# Create lists of observed and expected counts per variant (50/50 design)
observed = [control_users, exposed_users]
expected = [total_users / 2, total_users / 2]
# Import chisquare from the scipy library
from scipy.stats import chisquare
# Run the chi-square goodness-of-fit test on the observed and expected lists
chi = chisquare(observed, f_exp=expected)
# Print test results and interpretation
print(chi)
# A strict threshold of 0.01 is used to flag a possible SRM
if chi.pvalue < 0.01:
    print("SRM may be present")
else:
    print("SRM likely not present")
Power_divergenceResult(statistic=0.5230902562832735, pvalue=0.4695264353014863)
SRM likely not present
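
To make the test less of a black box, the statistic can also be computed by hand. The following is a minimal sketch that plugs the unique-user counts shown earlier (4071 and 4006) into the goodness-of-fit formula and looks up the p-value with `scipy.stats.chi2.sf`; the variable names `statistic` and `p_value` are chosen here for illustration.

```python
from scipy.stats import chi2

# Unique user counts per variant, taken from the groupby output above
observed = [4071, 4006]
total = sum(observed)

# The design is a 50/50 split, so each variant expects half of all users
expected = [total / 2, total / 2]

# Chi-square statistic: sum of (observed - expected)^2 / expected
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two variants give 1 degree of freedom; sf() returns the upper-tail probability
p_value = chi2.sf(statistic, df=len(observed) - 1)

print(round(statistic, 4), round(p_value, 4))
```

The hand-computed values should agree with the statistic and p-value that `chisquare` reported above.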

SRM: finding the root causes

Common causes of SRM:$^1$

  • Assignment: incorrect bucketing or faulty randomization functions
  • Execution: delayed start or uneven ramp-up of variants
  • Data logging: logging delays or bot filtering
  • Interference: an experimenter pauses a variant
1 Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb for Practitioners
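
One way to narrow down a cause is to repeat the SRM check per segment, for example per day, and see where the imbalance appears. The sketch below uses entirely hypothetical daily counts (the `daily` DataFrame, its values, and the helper `srm_p_value` are made up for illustration), with the first day imitating a delayed start of the exposed variant.

```python
import pandas as pd
from scipy.stats import chisquare

# Hypothetical unique-user counts per day; day_1 mimics a delayed start of "exposed"
daily = pd.DataFrame({
    "day": ["day_1", "day_2", "day_3"],
    "control": [1500, 1480, 1510],
    "exposed": [1100, 1495, 1490],
})

def srm_p_value(row):
    """Chi-square goodness-of-fit p-value against a 50/50 design."""
    observed = [row["control"], row["exposed"]]
    expected = [sum(observed) / 2] * 2
    return chisquare(observed, f_exp=expected).pvalue

# Run the SRM check separately for every day
daily["p_value"] = daily[["control", "exposed"]].apply(srm_p_value, axis=1)
daily["srm_flag"] = daily["p_value"] < 0.01
print(daily)
```

Only the first day is flagged in this made-up data, which would point toward an execution issue early in the experiment. Note that checking many segments increases the chance of a false alarm, so a flagged segment is a lead to investigate rather than proof.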

A/A tests

  • A/A test
    • Presents two groups with exactly the same experience
    • Uncovers bugs in the experimental setup
    • Metrics should show no statistically significant differences
    • False positives can still occur at the chosen significance level $\alpha$ (e.g. 5% of the time for $\alpha = 0.05$)
    • Reveals skewed distributions between groups (e.g. browsers, devices)
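
The false-positive point can be illustrated with a small simulation. The sketch below repeatedly draws two groups from the same distribution (a simulated A/A setup) and counts how often a t-test reports significance; the sample sizes, the normal metric, and the seed are arbitrary assumptions, not values from the course.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible
alpha = 0.05
n_simulations = 2000
n_users = 200

false_positives = 0
for _ in range(n_simulations):
    # Both groups come from the same distribution, so any "effect" is noise
    group_a = rng.normal(loc=10.0, scale=2.0, size=n_users)
    group_b = rng.normal(loc=10.0, scale=2.0, size=n_users)
    if ttest_ind(group_a, group_b).pvalue < alpha:
        false_positives += 1

false_positive_rate = false_positives / n_simulations
print(f"False positive rate: {false_positive_rate:.3f}")
```

The printed rate should land close to $\alpha$, which is why a single significant A/A result is not by itself evidence of a broken setup.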

Distribution balance example in Python

  • Balanced browser distribution
  • Valid test
checkout.groupby('checkout_page')['browser'].value_counts(normalize=True)
checkout_page  browser
A              chrome     0.341333
               safari     0.332000
               firefox    0.326667
B              safari     0.352000
               firefox    0.325000
               chrome     0.323000
C              safari     0.346000
               chrome     0.330000
               firefox    0.324000
  • Skewed browser distribution
  • Invalid test
AdSmart.groupby('experiment')['browser'].value_counts(normalize=True)
experiment  browser                   
control     Chrome Mobile                 0.591992
            Facebook                      0.137804
            Samsung Internet              0.120855
            Chrome Mobile WebView         0.071727
            Mobile Safari                 0.060427
            Chrome Mobile iOS             0.008352
            Mobile Safari UI/WKWebView    0.007369
exposed     Chrome Mobile                 0.535197
            Chrome Mobile WebView         0.298802
            Samsung Internet              0.082876
            Facebook                      0.050674
            Mobile Safari                 0.022716
            Chrome Mobile iOS             0.004244
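
Comparing proportions by eye can be complemented with a formal check. As a hedged sketch, a chi-square test of independence (`scipy.stats.chi2_contingency`) on a variant-by-browser count table tests whether the browser mix differs between variants; the `users` DataFrame and its counts are hypothetical and not taken from the datasets above.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical user-level data: one row per user with variant and browser
users = pd.DataFrame({
    "variant": ["control"] * 300 + ["exposed"] * 300,
    "browser": (["chrome"] * 180 + ["safari"] * 80 + ["firefox"] * 40
                + ["chrome"] * 160 + ["safari"] * 50 + ["firefox"] * 90),
})

# Contingency table of raw counts (not proportions) per variant and browser
counts = pd.crosstab(users["variant"], users["browser"])

# Test whether browser and variant are independent
chi2_stat, p_value, dof, expected = chi2_contingency(counts)

print(counts)
print(f"chi2={chi2_stat:.2f}, dof={dof}, p-value={p_value:.4g}")
if p_value < 0.01:
    print("Browser distribution differs between variants")
else:
    print("No evidence of browser imbalance")
```

The test needs raw counts, so `value_counts(normalize=True)` output should not be passed to it directly.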

Let's practice!
