Foundations of Inference in R
Jo Hardin
Instructor
Generating a distribution of the statistic under the null hypothesis shows whether the observed data are inconsistent with that hypothesis.
Original data
Location | Cola | Orange |
---|---|---|
East | 28 | 6 |
West | 19 | 7 |
$\hat{p}_\text{east} = 28/(28 + 6) = 0.82$
$\hat{p}_\text{west} = 19/(19 + 7) = 0.73$
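The two proportions above can be reproduced in a few lines of base R. This is a minimal sketch: the data frame is rebuilt from the table counts, and the column names (`location`, `drink`) and level labels (`"east"`, `"west"`, `"cola"`, `"orange"`) are assumptions inferred from the code later in these slides, not the course's actual dataset.

```r
# Rebuild a data frame from the counts in the table above.
# Column names and level labels are assumptions for illustration.
soda <- data.frame(
  location = rep(c("east", "west"), times = c(34, 26)),
  drink    = c(rep(c("cola", "orange"), times = c(28, 6)),
               rep(c("cola", "orange"), times = c(19, 7)))
)

# Proportion of cola drinkers in each location
p_east <- mean(soda$drink[soda$location == "east"] == "cola")  # 28 / 34
p_west <- mean(soda$drink[soda$location == "west"] == "cola")  # 19 / 26
round(c(east = p_east, west = p_west), 2)

# Observed difference in proportions, west minus east
obs_diff <- p_west - p_east
obs_diff
```

The difference is taken as west minus east so that it has the same sign as the statistics computed later with `order = c("west", "east")`.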
First shuffle, same as original
Location | Cola | Orange |
---|---|---|
East | 28 | 6 |
West | 19 | 7 |
Second shuffle
Location | Cola | Orange |
---|---|---|
East | 27 | 7 |
West | 20 | 6 |
Third shuffle
Location | Cola | Orange |
---|---|---|
East | 26 | 8 |
West | 21 | 5 |
Fourth shuffle
Location | Cola | Orange |
---|---|---|
East | 25 | 9 |
West | 22 | 4 |
Fifth shuffle
Location | Cola | Orange |
---|---|---|
East | 29 | 5 |
West | 18 | 8 |
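A single shuffle like the ones tabulated above can be sketched by hand in base R. The data frame below is rebuilt from the original counts with assumed column names and labels, and the seed is arbitrary. Here the location labels are permuted. Permuting the drink labels instead would give an equivalent null distribution.

```r
# Rebuild the data from the original table (names and labels are assumptions).
soda <- data.frame(
  location = rep(c("east", "west"), times = c(34, 26)),
  drink    = c(rep(c("cola", "orange"), times = c(28, 6)),
               rep(c("cola", "orange"), times = c(19, 7)))
)

set.seed(1)  # arbitrary seed, only so the shuffle is repeatable

# Permute the location labels without replacement; drinks stay where they are.
shuffled <- soda
shuffled$location <- sample(soda$location)

# Cross-tabulate the shuffled data, as in the tables above.
shuffle_table <- table(shuffled$location, shuffled$drink)
shuffle_table

# Row and column totals are unchanged by the shuffle:
rowSums(shuffle_table)  # 34 east, 26 west
colSums(shuffle_table)  # 47 cola, 13 orange
```

Because the margins are fixed, each shuffled table is determined by a single cell, for example the number of cola drinkers assigned to the west.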
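# Observed difference in proportions of cola drinkers.
# Groups sort alphabetically (east, west), so diff() returns west minus east.
soda %>%
  group_by(location) %>%
  summarize(prop_cola = mean(drink == "cola")) %>%
  summarize(diff(prop_cola))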
# A tibble: 1 x 1
`diff(prop_cola)`
<dbl>
1 -0.09276018
library(infer)
soda %>%
  specify(drink ~ location, success = "cola") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1, type = "permute") %>%
  calculate(stat = "diff in props", order = c("west", "east"))
# A tibble: 1 x 2
replicate stat
<int> <dbl>
1 1 -0.02488688
soda %>%
  specify(drink ~ location, success = "cola") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5, type = "permute") %>%
  calculate(stat = "diff in props", order = c("west", "east"))
# A tibble: 5 x 2
replicate stat
<int> <dbl>
1 1 0.04298643
2 2 -0.09276018
3 3 0.11085973
4 4 0.17873303
5 5 -0.16063348
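For comparison, the same five permuted statistics can be approximated without `infer`. This is a rough base-R sketch, not the package's internal method. The helper `perm_diff()` is a hypothetical name, the data frame is rebuilt from the original counts with assumed column names, and the seed is arbitrary, so the values will not match the output above.

```r
# Rebuild the data from the original table (names and labels are assumptions).
soda <- data.frame(
  location = rep(c("east", "west"), times = c(34, 26)),
  drink    = c(rep(c("cola", "orange"), times = c(28, 6)),
               rep(c("cola", "orange"), times = c(19, 7)))
)

# Hypothetical helper: shuffle the location labels once and return the
# difference in cola proportions, west minus east.
perm_diff <- function(data) {
  shuffled_location <- sample(data$location)
  mean(data$drink[shuffled_location == "west"] == "cola") -
    mean(data$drink[shuffled_location == "east"] == "cola")
}

set.seed(47)  # arbitrary seed for repeatability

# Five permuted differences, analogous to generate(reps = 5, type = "permute")
perm_stats <- replicate(5, perm_diff(soda))
perm_stats
```

Each value is one draw from the null distribution of the statistic. With many more repetitions, the collection of draws shows how unusual the observed difference of about -0.093 would be if drink choice were independent of location.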