Foundations of Inference in R
Jo Hardin
Instructor






Generating a distribution of the statistic from the null population shows whether the observed data are inconsistent with the null hypothesis.
Original data
| Location | Cola | Orange |
|---|---|---|
| East | 28 | 6 |
| West | 19 | 7 |

$\hat{p}_\text{east} = 28/(28 + 6) \approx 0.82$

$\hat{p}_\text{west} = 19/(19 + 7) \approx 0.73$

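
The same arithmetic can be checked in R. This is a minimal sketch, assuming the counts are typed in by hand from the original table; the `counts` matrix and the other object names are illustrative and not part of the course data.

```r
# Counts copied by hand from the "Original data" table
# (`counts` is a hypothetical name used only for this sketch).
counts <- matrix(c(28, 6,
                   19, 7),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(location = c("East", "West"),
                                 drink = c("Cola", "Orange")))

# Proportion of cola within each location: cola count / row total.
p_hat <- counts[, "Cola"] / rowSums(counts)
round(p_hat, 2)

# Difference in proportions, West minus East.
diff_obs <- unname(p_hat["West"] - p_hat["East"])
diff_obs
```

The West minus East ordering matches `order = c("west", "east")` in the `infer` code on the later slides.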
First shuffle, same as original
| Location | Cola | Orange |
|---|---|---|
| East | 28 | 6 |
| West | 19 | 7 |

Second shuffle
| Location | Cola | Orange |
|---|---|---|
| East | 27 | 7 |
| West | 20 | 6 |

Third shuffle
| Location | Cola | Orange |
|---|---|---|
| East | 26 | 8 |
| West | 21 | 5 |

Fourth shuffle
| Location | Cola | Orange |
|---|---|---|
| East | 25 | 9 |
| West | 22 | 4 |

Fifth shuffle
| Location | Cola | Orange |
|---|---|---|
| East | 29 | 5 |
| West | 18 | 8 |
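
A single shuffle can be sketched in base R. The course's `soda` data frame is not shown here, so the sketch rebuilds a stand-in from the original counts; `soda_sketch`, the lower-case level names, and the seed are assumptions made for illustration.

```r
# Stand-in for the course data: one row per drink, rebuilt from the
# original table (34 east, 26 west; `soda_sketch` is a hypothetical name).
soda_sketch <- data.frame(
  location = rep(c("east", "west"), times = c(34, 26)),
  drink = c(rep(c("cola", "orange"), times = c(28, 6)),
            rep(c("cola", "orange"), times = c(19, 7)))
)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# One shuffle: permute the drink labels while keeping locations fixed.
shuffled <- transform(soda_sketch, drink = sample(drink))

# The shuffled table keeps the original row and column totals.
table(shuffled$location, shuffled$drink)

# Difference in the proportion of cola (west minus east) for this shuffle.
prop_cola <- tapply(shuffled$drink == "cola", shuffled$location, mean)
unname(prop_cola["west"] - prop_cola["east"])
```

Permuting only the `drink` column is what breaks any link between drink and location, which is the situation the null hypothesis describes.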







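```r
library(dplyr)

# Observed difference in the proportion of cola between locations.
# diff() returns the second group's proportion minus the first's.
soda %>%
  group_by(location) %>%
  summarize(prop_cola = mean(drink == "cola")) %>%
  summarize(diff(prop_cola))
```

```
# A tibble: 1 x 1
  `diff(prop_cola)`
              <dbl>
1       -0.09276018
```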
```r
library(infer)

# One permutation of the data under the null hypothesis of independence
soda %>%
  specify(drink ~ location, success = "cola") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1, type = "permute") %>%
  calculate(stat = "diff in props", order = c("west", "east"))
```

```
# A tibble: 1 x 2
  replicate        stat
      <int>       <dbl>
1         1 -0.02488688
```
```r
# Five permutations, one difference in proportions per replicate
soda %>%
  specify(drink ~ location, success = "cola") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5, type = "permute") %>%
  calculate(stat = "diff in props", order = c("west", "east"))
```

```
# A tibble: 5 x 2
  replicate        stat
      <int>       <dbl>
1         1  0.04298643
2         2 -0.09276018
3         3  0.11085973
4         4  0.17873303
5         5 -0.16063348
```
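
Shuffling never changes the row or column totals, so each `stat` value corresponds to one possible count of East colas. This is a sketch in base R under that assumption; the helper `diff_from_east_cola` is a hypothetical name introduced only for this example.

```r
# Totals fixed by the original table: 34 East, 26 West, 47 colas overall.
n_east <- 34
n_west <- 26
n_cola <- 47

# Hypothetical helper: difference in proportions (west minus east)
# when `k` of the 34 East drinks are cola.
diff_from_east_cola <- function(k) (n_cola - k) / n_west - k / n_east

# East cola counts 24 through 29 reproduce the statistics printed above.
k <- 24:29
data.frame(east_cola = k, stat = diff_from_east_cola(k))
```

For example, 28 East colas gives the observed difference of about -0.0928, and 27 gives about -0.0249, the value from the single permutation.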
