Sanity checks: external validity

A/B Testing in Python

Moe Lotfy, PhD

Principal Data Science Manager

Simpson's paradox

Simpson's Paradox: a statistical phenomenon where certain trends between variables emerge, disappear or reverse when the population is divided into segments.

print(simp_imbalanced.groupby('Variant').mean(numeric_only=True))
Variant     Conversion
A              0.80
B              0.64
print(simp_imbalanced.groupby(['Variant','Device']).mean())
Variant Device   Conversion
A       Phone        0.875
        Tablet       0.500
B       Phone        0.900
        Tablet       0.575

Simpson's paradox

simp_imbalanced.groupby(['Variant','Device'])\
                            ['Device'].count()
Variant  Device
A        Phone     40
         Tablet    10
B        Phone     10
         Tablet    40

Simpson's paradox example table
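
The reversal in the tables above follows from how weighted averages work: each variant's overall rate is its per-device rates weighted by the number of users on each device. A minimal sketch using the counts and rates shown on the slides (the helper name `overall_rate` is illustrative and not part of the course code):

```python
# Segment sizes and conversion rates copied from the tables above.
# Layout: {variant: {device: (n_users, conversion_rate)}}
segments = {
    "A": {"Phone": (40, 0.875), "Tablet": (10, 0.500)},
    "B": {"Phone": (10, 0.900), "Tablet": (40, 0.575)},
}


def overall_rate(by_device):
    """Overall conversion rate: segment rates weighted by segment size."""
    total_users = sum(n for n, _ in by_device.values())
    total_conversions = sum(n * rate for n, rate in by_device.values())
    return total_conversions / total_users


for variant, by_device in segments.items():
    print(variant, round(overall_rate(by_device), 2))
```

Variant B converts better on both devices, but 80% of its users are on Tablet, the lower-converting device, so its overall rate (0.64) falls below A's (0.80).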

Simpson's paradox

simp_balanced.groupby(['Variant','Device'])\
                        ['Device'].count()
Variant  Device
A        Phone     40
         Tablet    10
B        Phone     40
         Tablet    10
print(simp_balanced.groupby('Variant').mean(numeric_only=True))
Variant     Conversion
A              0.70
B              0.52
print(simp_balanced.groupby(['Variant','Device']).mean())
Variant Device     Conversion
A       Phone        0.750
        Tablet       0.500
B       Phone        0.575
        Tablet       0.300
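
One way to turn this comparison into an automated sanity check is to compare the direction of the overall difference with the direction inside every segment. The sketch below is an assumption about how such a check might look, not the course's method. The helper `has_simpson_reversal` is hypothetical, and the conversion counts are derived from the slide tables as rate times count.

```python
# Conversions and users per variant and device, derived from the slide
# tables as rate * count (for example 0.875 * 40 = 35).
# Layout: {variant: {device: (conversions, users)}}
imbalanced = {
    "A": {"Phone": (35, 40), "Tablet": (5, 10)},
    "B": {"Phone": (9, 10), "Tablet": (23, 40)},
}
balanced = {
    "A": {"Phone": (30, 40), "Tablet": (5, 10)},
    "B": {"Phone": (23, 40), "Tablet": (3, 10)},
}


def rate(conversions, users):
    return conversions / users


def has_simpson_reversal(data, control="A", treatment="B"):
    """True if every segment favors one variant but the pooled data favors the other."""

    def pooled(variant):
        conversions = sum(c for c, _ in data[variant].values())
        users = sum(n for _, n in data[variant].values())
        return rate(conversions, users)

    overall_diff = pooled(treatment) - pooled(control)
    segment_diffs = [
        rate(*data[treatment][seg]) - rate(*data[control][seg])
        for seg in data[control]
    ]
    all_up = all(d > 0 for d in segment_diffs)
    all_down = all(d < 0 for d in segment_diffs)
    return (all_up and overall_diff < 0) or (all_down and overall_diff > 0)


print(has_simpson_reversal(imbalanced))
print(has_simpson_reversal(balanced))
```

The check flags the imbalanced split and passes the balanced one, where A leads both overall and within each device.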

Novelty effect

  • Novelty effect
    • A short-lived improvement in metrics caused by users' curiosity about a new feature.
  • Change aversion
    • The opposite of the novelty effect.
    • Users avoid trying a new feature because they are familiar with the old one.

Novelty effect visual inspection

import matplotlib.pyplot as plt

# Plot lift in CTR vs. test days
novelty.plot('date', 'CTR_lift')
plt.ylim([0, 0.09])
plt.title('Lift in CTR vs Test Duration')
plt.show()

Novelty effect visual demonstration as a line plot of CTR lift vs experiment days
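
The slides do not show how the `CTR_lift` column is built. As a rough sketch, assuming lift means the treatment CTR minus the control CTR on each day, it could be derived from daily click and impression totals. The data and the helper `daily_ctr_lift` below are hypothetical and only illustrate the calculation.

```python
# Hypothetical daily totals per variant: (clicks, impressions).
# These numbers are made up for illustration; they are not the course data.
daily = {
    "2023-01-01": {"control": (100, 2000), "treatment": (260, 2000)},
    "2023-01-02": {"control": (102, 2000), "treatment": (200, 2000)},
    "2023-01-03": {"control": (98, 2000), "treatment": (140, 2000)},
}


def ctr(clicks, impressions):
    return clicks / impressions


def daily_ctr_lift(daily_totals):
    """Return rows of {'date', 'CTR_lift'}, assuming lift = treatment CTR - control CTR."""
    rows = []
    for date in sorted(daily_totals):
        counts = daily_totals[date]
        lift = ctr(*counts["treatment"]) - ctr(*counts["control"])
        rows.append({"date": date, "CTR_lift": lift})
    return rows


for row in daily_ctr_lift(daily):
    print(row["date"], round(row["CTR_lift"], 3))
```

A lift that starts high and shrinks day after day, as in this toy series, is the pattern the plot is meant to expose.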

Correcting for novelty effects

  • Increasing the test duration
    • Include data only from the point where the treatment effect has stabilized.
  • Examine new and returning user cohorts
    • New users are less likely to show novelty effects because they have no earlier version to compare against.
    • Returning users compare the change with their old experience.
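
The first correction above, keeping only data from after the effect stabilizes, needs some rule for what counts as stable. The sketch below uses one simple, assumed rule: the first run of `window` consecutive days whose lift values stay within `tolerance` of each other. The function name, the default thresholds, and the lift series are all hypothetical.

```python
def first_stable_day(lifts, window=3, tolerance=0.005):
    """Index of the first day that starts a run of `window` days whose lift
    values stay within `tolerance` of each other; None if there is no such run."""
    for start in range(len(lifts) - window + 1):
        run = lifts[start:start + window]
        if max(run) - min(run) <= tolerance:
            return start
    return None


# Hypothetical daily CTR lift that decays and then levels off.
lifts = [0.080, 0.060, 0.045, 0.032, 0.030, 0.031, 0.029, 0.030]

start = first_stable_day(lifts)
stable = lifts[start:] if start is not None else []
stable_mean = sum(stable) / len(stable) if stable else None
print(start, stable_mean)
```

With this toy series the stable period begins at index 3, and averaging only the later days gives a lift near 0.03 rather than the inflated early values. The window and tolerance are judgment calls that depend on the metric's day-to-day noise.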

Let's practice!
