Hypothesis Testing in R
Richie Cotton
Data Evangelist at DataCamp
You search for a coding solution online and the first result link is purple because you already visited it. How do you feel?
library(dplyr)
# Count the responses in each purple_link category
purple_link_counts <- stack_overflow %>%
  count(purple_link)
# A tibble: 4 x 2
  purple_link           n
  <fct>             <int>
1 Hello, old friend  1330
2 Amused              409
3 Indifferent         426
4 Annoyed             290
hypothesized <- tribble(
  ~purple_link,        ~prop,
  "Hello, old friend", 1 / 2,
  "Amused",            1 / 6,
  "Indifferent",       1 / 6,
  "Annoyed",           1 / 6
)
# A tibble: 4 x 2
  purple_link        prop
  <chr>             <dbl>
1 Hello, old friend 0.5
2 Amused            0.167
3 Indifferent       0.167
4 Annoyed           0.167
$H_{0}$: The sample matches the hypothesized distribution.
$H_{A}$: The sample does not match the hypothesized distribution.
The test statistic, $\chi^{2}$, measures how far observed results are from expectations in each group.
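Written out, the usual Pearson form of this statistic is shown below. The symbols $O_{i}$ (observed count in group $i$), $E_{i}$ (expected count in group $i$ under $H_{0}$), and $k$ (number of groups) are notation introduced here for illustration and do not appear in the slides.

$$\chi^{2} = \sum_{i=1}^{k} \frac{(O_{i} - E_{i})^{2}}{E_{i}}$$

With $k = 4$ response categories, the statistic is compared against a chi-square distribution with $k - 1 = 3$ degrees of freedom.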
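# Significance level for the test
alpha <- 0.01
# Total number of responses
n_total <- nrow(stack_overflow)
# Add the expected count for each category under the hypothesized proportions
hypothesized <- tribble(
  ~purple_link,        ~prop,
  "Hello, old friend", 1 / 2,
  "Amused",            1 / 6,
  "Indifferent",       1 / 6,
  "Annoyed",           1 / 6
) %>%
  mutate(n = prop * n_total)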
# A tibble: 4 x 3
  purple_link        prop     n
  <chr>             <dbl> <dbl>
1 Hello, old friend 0.5   1228.
2 Amused            0.167  409.
3 Indifferent       0.167  409.
4 Annoyed           0.167  409.
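As a rough sketch of how the observed and expected counts combine into a single number, the statistic can be computed by hand in base R. The `observed` vector is copied manually from the count table, and the total is taken as `sum(observed)` rather than `nrow(stack_overflow)`, which assumes no responses are missing. The names `expected`, `chi_sq`, `deg_free`, and `p_value` are illustrative, not from the slides.

```r
# Observed counts, copied by hand from the count() output
observed <- c(
  "Hello, old friend" = 1330,
  Amused = 409,
  Indifferent = 426,
  Annoyed = 290
)

# Hypothesized proportions, in the same order as the observed counts
hypothesized_props <- c(1 / 2, 1 / 6, 1 / 6, 1 / 6)

# Expected counts under the null hypothesis
# (assumption: the total equals the sum of the four counts)
expected <- hypothesized_props * sum(observed)

# Chi-square statistic: squared gaps scaled by the expected counts, then summed
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of groups minus one
deg_free <- length(observed) - 1

# Right-tailed p-value from the chi-square distribution
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

chi_sq
p_value
```

Almost all of the statistic comes from two categories: "Annoyed", which falls well short of its expected count, and "Hello, old friend", which exceeds it.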
library(ggplot2)
# Bars show the observed counts; purple points show the expected counts
ggplot(purple_link_counts, aes(purple_link, n)) +
  geom_col() +
  geom_point(data = hypothesized, color = "purple")
hypothesized_props <- c(
  "Hello, old friend" = 1 / 2,
  Amused = 1 / 6,
  Indifferent = 1 / 6,
  Annoyed = 1 / 6
)
library(infer)
stack_overflow %>%
  chisq_test(
    response = purple_link,
    p = hypothesized_props
  )
# A tibble: 1 x 3
  statistic chisq_df       p_value
      <dbl>    <dbl>         <dbl>
1      44.0        3 0.00000000154
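The reported p-value is far below the significance level of 0.01 set earlier, so the null hypothesis is rejected: the responses do not appear to follow the hypothesized distribution. A minimal cross-check of that decision is sketched below. It swaps in base R's `chisq.test()` for the infer call and uses counts copied by hand from the count table, so it is an illustration rather than the method used in the slides. The name `reject_null` is hypothetical.

```r
# Significance level, as set earlier
alpha <- 0.01

# Observed counts, copied by hand from the count() output
observed <- c(1330, 409, 426, 290)

# Hypothesized proportions, in the same order as the observed counts
hypothesized_props <- c(1 / 2, 1 / 6, 1 / 6, 1 / 6)

# Goodness-of-fit test using base R instead of infer
result <- chisq.test(x = observed, p = hypothesized_props)

result$statistic
result$p.value

# Reject the null hypothesis when the p-value is at or below alpha
reject_null <- result$p.value <= alpha
reject_null
```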