Sanity checks: Internal validity

A/B Testing in Python

Moe Lotfy, PhD

Principal Data Science Manager

Sample Ratio Mismatch (SRM)

  • Sample Ratio Mismatch (SRM)
    • Observed allocation of users across variants deviates from the designed split
  • Detected with a chi-square goodness-of-fit test

Chi-square formula
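
The standard chi-square goodness-of-fit statistic compares observed and expected counts per variant. The symbols $O_i$, $E_i$, and $k$ are notation assumed here for illustration:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Here $O_i$ is the observed number of users in variant $i$, $E_i$ is the number expected under the designed allocation, and $k$ is the number of variants. Under the null hypothesis of no mismatch, the statistic approximately follows a chi-square distribution with $k - 1$ degrees of freedom.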

Sample ratio mismatch allocation example


SRM Python example

# Calculate the unique IDs per variant
AdSmart.groupby('experiment')['auction_id'].nunique()
experiment
control    4071
exposed    4006
# Assign the unique counts to each variant
control_users = AdSmart[AdSmart['experiment'] == 'control']['auction_id'].nunique()
exposed_users = AdSmart[AdSmart['experiment'] == 'exposed']['auction_id'].nunique()
total_users = control_users + exposed_users
# Calculate allocation ratios per variant
control_perc = control_users / total_users
exposed_perc = exposed_users / total_users
print("Percentage of users in the Control group:",100*round(control_perc,5),"%")
print("Percentage of users in the Exposed group:",100*round(exposed_perc,5),"%")
Percentage of users in the Control group: 50.402 %
Percentage of users in the Exposed group: 49.598 %
1 Adsmart Kaggle dataset: https://www.kaggle.com/datasets/osuolaleemmanuel/ad-ab-testing

SRM Python example

# Import chisquare from the scipy library
from scipy.stats import chisquare
# Create lists of observed and expected counts per variant
observed = [control_users, exposed_users]
expected = [total_users / 2, total_users / 2]
# Run the chi-square test on the observed and expected lists
chi = chisquare(observed, f_exp=expected)
# Print test results and interpretation
print(chi)
if chi.pvalue < 0.01:
    print("SRM may be present")
else:
    print("SRM likely not present")
Power_divergenceResult(statistic=0.5230902562832735, pvalue=0.4695264353014863)
SRM likely not present
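
As a cross-check on the scipy result, the statistic can be recomputed by hand from the two counts printed earlier (4071 control, 4006 exposed). This is a minimal sketch; the variable names `statistic` and `p_value` are illustrative, and the p-value comes from the chi-square survival function with one degree of freedom.

```python
from scipy.stats import chi2

# Unique user counts per variant, taken from the groupby output
observed = [4071, 4006]
total = sum(observed)

# A 50/50 design expects half of all users in each variant
expected = [total / 2, total / 2]

# Sum of (observed - expected)^2 / expected over the variants
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail probability with (number of variants - 1) degrees of freedom
p_value = chi2.sf(statistic, df=len(observed) - 1)

print(round(statistic, 4), round(p_value, 4))
```

The rounded values agree with the `statistic` and `pvalue` fields reported by `chisquare`.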

SRM root-causing

Common causes of SRM:$^1$

  • Assignment: incorrect bucketing or faulty randomization functions
  • Execution: variants starting at different times or ramping up at different rates
  • Data logging: logging delays or bot filtering
  • Interference: experimenter pausing a variant
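
One way to start narrowing down a cause is to repeat the SRM check within segments such as day or platform: a mismatch confined to one day hints at an execution or logging issue rather than faulty randomization. The sketch below is an illustration under assumptions, not a method from the cited paper. The helper `srm_by_segment`, the column names, and the toy data are all hypothetical, and a 50/50 design is assumed.

```python
import pandas as pd
from scipy.stats import chisquare


def srm_by_segment(df, segment_col, variant_col, id_col):
    """Run an equal-split chi-square SRM check inside each segment (hypothetical helper)."""
    variants = sorted(df[variant_col].unique())
    rows = []
    for segment, part in df.groupby(segment_col):
        # Unique users per variant; variants missing from a segment count as 0
        counts = part.groupby(variant_col)[id_col].nunique().reindex(variants, fill_value=0)
        expected = [counts.sum() / len(variants)] * len(variants)
        result = chisquare(counts.values, f_exp=expected)
        rows.append({segment_col: segment, **counts.to_dict(), "pvalue": result.pvalue})
    return pd.DataFrame(rows)


# Synthetic data: day d1 is balanced, day d2 is missing exposed users
toy = pd.DataFrame({
    "day": ["d1"] * 1000 + ["d2"] * 800,
    "variant": ["control"] * 500 + ["exposed"] * 500
             + ["control"] * 500 + ["exposed"] * 300,
    "user_id": range(1800),
})

report = srm_by_segment(toy, "day", "variant", "user_id")
print(report)
```

In this toy data only `d2` has a very small p-value, which would point the investigation at whatever changed on that day.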
1 Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb for Practitioners

A/A tests

  • A/A test
    • Presents an identical experience to two groups of users
    • Reveals bugs in experimental setup
    • No statistically significant differences between the metrics are expected
    • False positives can still happen at the specified $\alpha$ (e.g. 5% of the time when $\alpha = 0.05$)
    • Reveals imbalances in distributions across groups (e.g. browsers, devices, etc.)
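
The false-positive point can be illustrated with a small simulation: run many A/A tests in which both groups are drawn from the same distribution and count how often a test comes out significant. This is a sketch under assumptions; the normally distributed metric, the sample sizes, and the choice of a two-sample t-test are illustrative and not taken from the slides.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_tests, n_users, alpha = 1000, 200, 0.05

# Both groups come from the same distribution: an identical experience
group_a = rng.normal(loc=10.0, scale=2.0, size=(n_tests, n_users))
group_b = rng.normal(loc=10.0, scale=2.0, size=(n_tests, n_users))

# One two-sample t-test per simulated A/A test (each row is one test)
p_values = ttest_ind(group_a, group_b, axis=1).pvalue

# Share of A/A tests that look "significant" purely by chance
false_positive_rate = np.mean(p_values < alpha)
print(f"Share of A/A tests flagged as significant: {false_positive_rate:.3f}")
```

With a sound setup the printed share should sit close to `alpha`. A much higher share in real A/A tests would suggest a problem in the experimental setup.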

Distribution balance Python example

  • Balanced browser distribution
  • Valid test
checkout.groupby('checkout_page')['browser'].value_counts(normalize=True)
checkout_page  browser
A              chrome     0.341333
               safari     0.332000
               firefox    0.326667
B              safari     0.352000
               firefox    0.325000
               chrome     0.323000
C              safari     0.346000
               chrome     0.330000
               firefox    0.324000
  • Imbalanced browser distribution
  • Invalid test
AdSmart.groupby('experiment')['browser'].value_counts(normalize=True)
experiment  browser                   
control     Chrome Mobile                 0.591992
            Facebook                      0.137804
            Samsung Internet              0.120855
            Chrome Mobile WebView         0.071727
            Mobile Safari                 0.060427
            Chrome Mobile iOS             0.008352
            Mobile Safari UI/WKWebView    0.007369
exposed     Chrome Mobile                 0.535197
            Chrome Mobile WebView         0.298802
            Samsung Internet              0.082876
            Facebook                      0.050674
            Mobile Safari                 0.022716
            Chrome Mobile iOS             0.004244
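
The comparisons above are made by eye. One possible way to formalize them, which is an addition here and not part of the slides, is a chi-square test of independence on the variant-by-browser contingency table. The helper `balance_pvalue` and the toy DataFrames below are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def balance_pvalue(df, variant_col, attribute_col):
    """P-value of a chi-square independence test between variant and attribute."""
    table = pd.crosstab(df[variant_col], df[attribute_col])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value


# Synthetic data: identical browser mix in both variants
balanced = pd.DataFrame({
    "variant": ["A"] * 300 + ["B"] * 300,
    "browser": (["chrome"] * 100 + ["safari"] * 100 + ["firefox"] * 100) * 2,
})

# Synthetic data: variant A is heavily skewed toward chrome
imbalanced = pd.DataFrame({
    "variant": ["A"] * 300 + ["B"] * 300,
    "browser": ["chrome"] * 200 + ["safari"] * 50 + ["firefox"] * 50
             + ["chrome"] * 100 + ["safari"] * 100 + ["firefox"] * 100,
})

p_balanced = balance_pvalue(balanced, "variant", "browser")
p_imbalanced = balance_pvalue(imbalanced, "variant", "browser")
print(p_balanced, p_imbalanced)
```

A small p-value suggests the browser mix differs between variants, which is the kind of imbalance that calls the validity of the test into question.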

Let's practice!
