Mann-Whitney U test

A/B Testing in R

Lauryn Burleigh

Data Scientist

Mann-Whitney U

  • Time to eat Cheese versus Pepperoni pizza
  • Data not normally distributed
  • Non-parametric test
    • No specific distribution shape (such as normal) assumed
    • Mann-Whitney U test

Two left-skewed histograms. Pepperoni in pink with a mean of 8 and Cheese in blue with a mean of 6.2.
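
Because the test works on ranks rather than raw values, the U statistic can be computed by hand. The sketch below uses made-up eating times (the vectors `cheese` and `pepperoni` are hypothetical, not the course data) and checks the hand calculation against the W value that base R's `wilcox.test()` reports.

```r
# Hypothetical eating times in minutes (made-up values for illustration)
cheese    <- c(5.1, 6.3, 4.8, 7.2, 6.0, 5.5)
pepperoni <- c(7.9, 8.4, 6.8, 9.1, 7.5, 8.8)

# Rank all observations together, ignoring group membership
ranks <- rank(c(cheese, pepperoni))
n1 <- length(cheese)

# U for the first group: its rank sum minus the smallest possible rank sum
u_cheese <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

# wilcox.test() reports the same quantity as W
w <- unname(wilcox.test(cheese, pepperoni)$statistic)

u_cheese
w
```

A small U for one group means its values mostly rank below the other group's values.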

Assumptions

  • Both groups have the same distribution shape
  • Assesses difference in medians
  • Normal distribution: mean = median
  • Non-normal distribution: median more appropriate
  • Extra assumptions: more powerful test
  • Null hypothesis: no difference in the median time to eat cheese and pepperoni pizza

library(ggplot2)
# One histogram of eating time per topping, stacked vertically
ggplot(pizza, aes(x = Time, 
                  fill = Topping)) +
       geom_histogram() + 
       facet_grid(Topping~.)

Two left-skewed histograms. Pepperoni in pink with a mean of 8 and Cheese in blue with a mean of 6.2.
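
To illustrate why the median is preferred for non-normal data, here is a minimal sketch with simulated values (the vectors `symmetric` and `skewed` are assumptions for illustration, not the pizza data). In the symmetric sample the mean and median nearly coincide; in the left-skewed sample the long lower tail pulls the mean below the median.

```r
set.seed(1)

# Symmetric (normal) sample: mean and median should be almost equal
symmetric <- rnorm(10000, mean = 8, sd = 1)

# Left-skewed sample: most values near 10, long tail toward small values
skewed <- 10 - rexp(10000, rate = 0.5)

# Gap between mean and median in each sample
gap_symmetric <- mean(symmetric) - median(symmetric)
gap_skewed    <- mean(skewed) - median(skewed)

gap_symmetric  # close to zero
gap_skewed     # negative: the mean sits below the median
```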

Sample Size

library(pwr)
pwr.2p2n.test(h = 0.40, 
              sig.level = 0.05, 
              power = 0.8, n1 = 100)
    difference of proportion power 
 calculation for binomial distribution
              h = 0.4
             n1 = 100
             n2 = 96.29156
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
NOTE: different sample sizes
pwr.2p2n.test(h = 0.40, 
              sig.level = 0.05, 
              power = 0.8, n1 = 110)
    difference of proportion power 
 calculation for binomial distribution
              h = 0.4
             n1 = 110
             n2 = 88.54092
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
NOTE: different sample sizes
  • Expected effect size h: rank-biserial correlation r
    • Measures how observations in one group rank relative to the other group
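
The n2 values in the output above can be approximated in base R. This is a sketch under stated assumptions, not the pwr package's own code: it assumes a two-sided test with the usual normal approximation for effect size h, and it ignores the very small opposite-tail probability. The helper name `n2_needed` is hypothetical.

```r
# Assumption: two-sided test, normal approximation for effect size h;
# the tiny opposite-tail probability is ignored.
n2_needed <- function(h, n1, sig.level = 0.05, power = 0.8) {
  # Required value of n1 * n2 / (n1 + n2) to reach the target power
  k <- ((qnorm(1 - sig.level / 2) + qnorm(power)) / h)^2
  # Solve n1 * n2 / (n1 + n2) = k for n2 (only valid when n1 > k)
  k * n1 / (n1 - k)
}

n2_a <- n2_needed(h = 0.40, n1 = 100)  # close to the 96.29 reported above
n2_b <- n2_needed(h = 0.40, n1 = 110)  # close to the 88.54 reported above
n2_a
n2_b
```

The formula shows the trade-off seen in the two outputs: a larger first group lets the second group be smaller for the same power.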

Test

wilcox.test(Time ~ Topping, 
            data = pizza)
  • y ~ x
    • y: outcome variable (Time)
    • x: grouping variable (Topping)
    Wilcoxon rank sum test with 
    continuity correction
data:  Time by Topping
W = 6051, p-value = 0.01026
alternative hypothesis: true location 
shift is not equal to 0
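
The printed result can also be stored and inspected programmatically. The sketch below runs the same formula interface on a simulated data frame (`pizza_demo` is a hypothetical stand-in, not the course data) and pulls the W statistic and p-value out of the returned object.

```r
set.seed(42)

# Simulated left-skewed eating times (illustrative values only)
pizza_demo <- data.frame(
  Topping = rep(c("Cheese", "Pepperoni"), each = 50),
  Time    = c(7.2 - rexp(50, rate = 1), 9 - rexp(50, rate = 1))
)

# Outcome on the left of ~, grouping variable on the right
result <- wilcox.test(Time ~ Topping, data = pizza_demo)

result$statistic  # the W value
result$p.value    # compare against the chosen significance level

# TRUE when the null hypothesis is rejected at the 0.05 level
reject_null <- result$p.value < 0.05
```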

Effect size and power

Effect size

library(effectsize)
rank_biserial(Time ~ Topping, 
              data = pizza)
r (rank biserial) |       95% CI
--------------------------------
0.21              | [0.05, 0.36]
  • Small: 0.1
  • Medium: 0.3
  • Large: 0.5
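
As a rough sketch of what the rank-biserial correlation measures, it can be computed by hand as the share of cross-group pairs where the first group is larger minus the share where it is smaller. The vectors `a` and `b` are made-up values, and the sign convention (positive when the first group tends to be larger) is an assumption of this sketch.

```r
# Made-up values for two groups (illustration only)
a <- c(3, 5, 8, 9)
b <- c(1, 2, 6, 7)

# Pairwise definition: favourable minus unfavourable pairs
r_from_pairs <- mean(outer(a, b, ">")) - mean(outer(a, b, "<"))

# Same value from the W statistic of wilcox.test()
w <- unname(wilcox.test(a, b)$statistic)
r_from_w <- 2 * w / (length(a) * length(b)) - 1

r_from_pairs
r_from_w
```

A value of 0 means neither group tends to rank higher; values near 1 or -1 mean almost every pair favours one group.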

Power Analysis

library(pwr)
pwr.2p2n.test(h = 0.21, sig.level = 0.01, 
              n1 = 100, n2 = 100)
     difference of proportion power calculation for binomial distribution 

              h = 0.21
             n1 = 100
             n2 = 100
      sig.level = 0.01
          power = 0.1376818
    alternative = two.sided

NOTE: different sample sizes

Power is about 0.14, so the probability of a Type II error is 1 - 0.14 = 0.86
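
The power value and the Type II error probability above can be reproduced in base R. This sketch assumes the standard two-sided normal-approximation formula for effect size h. It is not the pwr package's code, and the function name `power_two_sided` is hypothetical.

```r
# Assumption: two-sided test, normal approximation for effect size h
power_two_sided <- function(h, n1, n2, sig.level) {
  z_crit <- qnorm(1 - sig.level / 2)       # critical value
  shift  <- h * sqrt(n1 * n2 / (n1 + n2))  # effect scaled by sample sizes
  # Probability of landing in either rejection region
  pnorm(shift - z_crit) + pnorm(-shift - z_crit)
}

power <- power_two_sided(h = 0.21, n1 = 100, n2 = 100, sig.level = 0.01)
beta  <- 1 - power  # probability of a Type II error

power
beta
```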

Let's practice!
