Using the randomization distribution

Foundations of Inference in R

Jo Hardin

Instructor

Understanding the null distribution

ch1_3_infer.003.png

Data consistent with null?

table(soda)

        location
drink    East West
  cola     28   19
  orange    6    7

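library(dplyr)  # provides %>%, group_by(), summarize()
# Proportion of cola drinkers at each location
soda %>% group_by(location) %>% 
    summarize(mean(drink == "cola"))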

# A tibble: 2 × 2
  location `mean(drink == "cola")`
    <fctr>                   <dbl>
1     East               0.8235294
2     West               0.7307692
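
The two proportions can also be recovered directly from the counts in the table. The following is a minimal sketch in base R, assuming the counts printed above; the names `counts`, `props`, and `diff_obs` are hypothetical and do not appear in the original slides.

```r
# Counts copied from the table(soda) output (assumed to be accurate)
counts <- matrix(c(28, 6, 19, 7), nrow = 2,
                 dimnames = list(drink = c("cola", "orange"),
                                 location = c("East", "West")))

# Column proportions: share of each drink within a location
props <- prop.table(counts, margin = 2)
props["cola", ]

# Observed difference in cola proportions, West minus East
diff_obs <- props["cola", "West"] - props["cola", "East"]
diff_obs
```

The difference is negative because a smaller share of people chose cola in the West than in the East.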

Significance

ch1_3_infer.011.png

How extreme are the observed data?

# Observed difference in cola proportions (West minus East)
diff_orig <- soda %>%
  group_by(location) %>%
  summarize(prop_cola = mean(drink == "cola")) %>%
  summarize(diff(prop_cola)) %>%
  pull()

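library(infer)  # specify(), hypothesize(), generate(), calculate()
# Permute the data 100 times; compute West minus East each time
soda_perm <- soda %>%
  specify(drink ~ location, success = "cola") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  calculate(stat = "diff in props", 
            order = c("West", "East"))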

# Proportion of permuted differences at or below the observed one
soda_perm %>% 
    summarize(proportion = mean(diff_orig >= stat))

# A tibble: 1 x 1
  proportion
       <dbl>
1      0.380
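
To make the permutation step less of a black box, the same idea can be sketched in base R without infer. This is only an illustration under stated assumptions: `soda_demo` is a hypothetical data frame rebuilt from the table counts, `diff_cola()` is a helper invented here, and the seed and the 1000 repetitions are arbitrary choices, so the resulting proportion will not match the 0.380 shown above exactly.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Rebuild one row per person from the counts in the table(soda) output
soda_demo <- data.frame(
  drink    = rep(c("cola", "orange", "cola", "orange"),
                 times = c(28, 6, 19, 7)),
  location = rep(c("East", "West"), times = c(34, 26))
)

# Hypothetical helper: difference in cola proportions, West minus East
diff_cola <- function(drink, location) {
  mean(drink[location == "West"] == "cola") -
    mean(drink[location == "East"] == "cola")
}

obs <- diff_cola(soda_demo$drink, soda_demo$location)

# Shuffle the location labels to break any drink-location association
perm_stats <- replicate(
  1000,
  diff_cola(soda_demo$drink, sample(soda_demo$location))
)

# Share of shuffled differences at or below the observed one
p_lower <- mean(perm_stats <= obs)
p_lower
```

Shuffling `location` while leaving `drink` fixed is what the independence null asserts: if location does not matter, any assignment of labels is equally plausible.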

Let's practice!