Calculating sample size

Customer Analytics and A/B Testing in Python

Ryan Grossman

Data Scientist, EDO

Calculating the sample size of our test

Null hypothesis

  • Hypothesis that control & treatment have the same impact on the response
    • Updated paywall does not improve conversion rate
    • Any observed difference is due to randomness
  • Rejecting the Null Hypothesis
    • Determine there is a difference between the treatment and control
    • Statistically significant result
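
As a rough illustration of what rejecting the null hypothesis looks like in practice, here is a minimal sketch of a two-sided two-proportion z-test using only the standard library. The function name `two_proportion_p_value` and the conversion counts are made up for illustration; they do not come from the course data.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: both groups share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: the best estimate of the shared rate if H0 is true
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence against H0
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: 1000 users per group
p_value = two_proportion_p_value(50, 1000, 100, 1000)
print(f"p-value: {p_value:.5f}")
print("reject H0 at alpha = 0.05:", p_value < 0.05)
```

A p-value below the chosen significance level (1 minus the confidence level) is what the slides call a statistically significant result.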

Types of error & confidence level

  • Confidence Level: Probability of not making a Type I error (rejecting the Null Hypothesis when it is actually true)
  • The higher this value, the larger the test sample needed
  • Common values: 0.90 & 0.95
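
To make the link between confidence level and test strictness concrete, the sketch below converts a confidence level into the two-sided critical z value that a test statistic must exceed. The helper name `critical_z` is an assumption for illustration.

```python
from statistics import NormalDist

def critical_z(cl):
    """Two-sided critical value of the standard normal for confidence level cl."""
    alpha = 1 - cl  # probability of a Type I error
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"confidence level {cl:.2f} -> critical z {critical_z(cl):.3f}")
```

A confidence level of 0.90 corresponds to a threshold of about 1.645 and 0.95 to about 1.960, so a higher confidence level demands a larger observed effect relative to its noise, which in turn calls for more data.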

Statistical power

Statistical Power: Probability of finding a statistically significant result when the Null Hypothesis is false
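
One way to build intuition for this definition is to estimate power by simulation: run many simulated experiments where the treatment really does convert better, and count how often the test comes out significant. The sketch below assumes a two-sided two-proportion z-test; the function names, rates, group size, and trial count are all illustrative choices.

```python
import math
import random
from statistics import NormalDist

def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_power(n, p1, p2, cl=0.95, trials=400, seed=0):
    """Share of simulated experiments (n users per group) that reject H0."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    alpha = 1 - cl
    rejections = 0
    for _ in range(trials):
        conv_a = sum(rng.random() < p1 for _ in range(n))
        conv_b = sum(rng.random() < p2 for _ in range(n))
        if z_test_p_value(conv_a, n, conv_b, n) < alpha:
            rejections += 1
    return rejections / trials

# Hypothetical rates: control converts at 10%, treatment at 15%
print("estimated power:", simulate_power(500, 0.10, 0.15))
```

Calling `simulate_power` with `p1 == p2` estimates the Type I error rate instead, which should land near 1 minus the confidence level.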

Connecting the Different Components

  • Estimate our needed sample size from:
    • our needed level of sensitivity (the smallest lift we want to be able to detect)
    • our desired test power & confidence level

Power formula

  • As sample size increases, power increases
  • As confidence level increases, power decreases (for a fixed sample size)
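
Both relationships can be checked numerically. The sketch below uses a common textbook normal approximation for the power of a two-sided test comparing two proportions with `n` users per group. The name `approx_power` and the example rates are assumptions for illustration, and the formula is not guaranteed to match the one on the original slide.

```python
import math
from statistics import NormalDist

def approx_power(n, p1, p2, cl):
    """Approximate power of a two-sided two-proportion z-test (n per group)."""
    norm = NormalDist()
    alpha = 1 - cl
    qu = norm.inv_cdf(1 - alpha / 2)            # two-sided critical value
    diff = abs(p2 - p1)                         # effect size to detect
    bp = (p1 + p2) / 2                          # average rate under H0
    null_sd = math.sqrt(2 * bp * (1 - bp))      # spread if H0 is true
    alt_sd = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # spread if H0 is false
    # Probability of landing beyond either critical boundary
    upper = norm.cdf((math.sqrt(n) * diff - qu * null_sd) / alt_sd)
    lower = 1 - norm.cdf((math.sqrt(n) * diff + qu * null_sd) / alt_sd)
    return upper + lower

# Hypothetical rates: 10% control vs. 12% treatment
for n in (1000, 2000, 4000):
    print(f"n={n}, cl=0.95 -> power {approx_power(n, 0.10, 0.12, 0.95):.3f}")
for cl in (0.90, 0.95, 0.99):
    print(f"n=2000, cl={cl} -> power {approx_power(2000, 0.10, 0.12, cl):.3f}")
```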

Sample size function

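# Calculate the test power (some details omitted)
from scipy import stats

def get_power(n, p1, p2, cl):
    alpha = 1 - cl                        # significance level
    qu = stats.norm.ppf(1 - alpha / 2)    # two-sided critical value
    diff = abs(p2 - p1)                   # absolute difference to detect
    bp = (p1 + p2) / 2                    # average of the two rates
    ...
    power = power_part_one + power_part_two
    return power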

# Calculate the sample size needed for the specified
# power and confidence level
def get_sample_size(power, p1, p2, cl, max_n=1000000):
    n = 1
    while n <= max_n:
        tmp_power = get_power(n, p1, p2, cl)
        if tmp_power >= power:
            return n
        else:
            n = n + 1
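
The search above tries every `n` in turn. Because power grows with sample size, the same answer can be found with far fewer evaluations by bisection. The sketch below is one possible variant, not the course's implementation: `sample_size_bisect` is a hypothetical name, and `approx_power` is a stand-in normal approximation used only so the example runs on its own.

```python
import math
from statistics import NormalDist

def approx_power(n, p1, p2, cl):
    """Stand-in power function: normal approximation, n users per group."""
    norm = NormalDist()
    qu = norm.inv_cdf(1 - (1 - cl) / 2)
    diff = abs(p2 - p1)
    bp = (p1 + p2) / 2
    null_sd = math.sqrt(2 * bp * (1 - bp))
    alt_sd = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    upper = norm.cdf((math.sqrt(n) * diff - qu * null_sd) / alt_sd)
    lower = 1 - norm.cdf((math.sqrt(n) * diff + qu * null_sd) / alt_sd)
    return upper + lower

def sample_size_bisect(power, p1, p2, cl, max_n=1_000_000):
    """Smallest n in [1, max_n] reaching the target power, or None."""
    if approx_power(max_n, p1, p2, cl) < power:
        return None  # target power not reachable within max_n
    lo, hi = 1, max_n
    while lo < hi:
        mid = (lo + hi) // 2
        if approx_power(mid, p1, p2, cl) >= power:
            hi = mid        # mid is large enough, try smaller
        else:
            lo = mid + 1    # mid is too small
    return lo

# Hypothetical rates: 10% control vs. 12% treatment
print(sample_size_bisect(0.80, 0.10, 0.12, 0.95))
```

Bisection needs about 20 power evaluations for `max_n` of one million, and it returns `None` explicitly when the target cannot be reached, where the linear search simply falls off the end of the loop.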

Calculating our needed sample size

  • Baseline Conversion Rate: 0.03468 (calculated previously)
  • Confidence Level: 0.95 (chosen by us)
  • Desired Power: 0.80 (chosen by us)
  • Sensitivity: 0.1, i.e. a 10% relative lift over the baseline (chosen by us)
sample_size_per_group = get_sample_size(
    0.8,                    # Desired power
    conversion_rate,        # Baseline conversion rate
    conversion_rate * 1.1,  # Lifted conversion rate
    0.95                    # Confidence level
)
print(sample_size_per_group)
 45788

Generality of this function

  • The function shown is specific to conversion rate calculations
  • Different response variables have different but analogous formulas

Decreasing the needed sample size

  • Choose a unit of observation with lower variability
  • Exclude users irrelevant to the process/change
  • Think through how different factors relate to the sample size
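
To see how the different factors move the required sample size, a closed-form approximation is handy because it can be evaluated instantly for many scenarios. The sketch below assumes a two-sided two-proportion z-test and ignores the negligible far tail, so its results may differ slightly from an iterative search. `approx_sample_size` is a hypothetical helper, and the lifts other than 10% are illustrative.

```python
import math
from statistics import NormalDist

def approx_sample_size(power, p1, p2, cl):
    """Approximate users needed per group (normal approximation)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - (1 - cl) / 2)  # from the confidence level
    z_beta = norm.inv_cdf(power)              # from the desired power
    bp = (p1 + p2) / 2
    null_sd = math.sqrt(2 * bp * (1 - bp))
    alt_sd = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = ((z_alpha * null_sd + z_beta * alt_sd) / abs(p2 - p1)) ** 2
    return math.ceil(n)

baseline = 0.03468  # baseline conversion rate from the slides
for lift in (0.05, 0.10, 0.20):
    n = approx_sample_size(0.80, baseline, baseline * (1 + lift), 0.95)
    print(f"lift {lift:.0%}: about {n} users per group")
```

The difference to detect appears squared in the denominator, so doubling the lift cuts the required sample to roughly a quarter, while raising the desired power or confidence level pushes it up.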

Let's practice!
