Chi-square goodness of fit tests

Hypothesis Testing in R

Richie Cotton

Data Evangelist at DataCamp

Purple links

You search for a coding solution online and the first result link is purple because you already visited it. How do you feel?

library(dplyr)
# Count how many respondents gave each answer
purple_link_counts <- stack_overflow %>% 
  count(purple_link)
# A tibble: 4 x 2
  purple_link           n
  <fct>             <int>
1 Hello, old friend  1330
2 Amused              409
3 Indifferent         426
4 Annoyed             290

Declaring the hypotheses

hypothesized <- tribble(
  ~ purple_link, ~ prop,
  "Hello, old friend", 1 / 2,
  "Amused"           , 1 / 6,
  "Indifferent"      , 1 / 6,
  "Annoyed"          , 1 / 6
)
# A tibble: 4 x 2
  purple_link        prop
  <chr>             <dbl>
1 Hello, old friend 0.5  
2 Amused            0.167
3 Indifferent       0.167
4 Annoyed           0.167

$H_{0}$: The sample matches the hypothesized distribution.

$H_{A}$: The sample does not match the hypothesized distribution.

The test statistic, $\chi^{2}$, measures how far observed results are from expectations in each group.
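
For reference, the usual form of the goodness of fit statistic is sketched below. The symbols $O_{i}$, $E_{i}$, and $k$ are introduced here for illustration and do not appear in the slides: $O_{i}$ is the observed count in group $i$, $E_{i}$ is the count expected under $H_{0}$, and $k$ is the number of groups.

$$
\chi^{2} = \sum_{i=1}^{k} \frac{(O_{i} - E_{i})^{2}}{E_{i}}
$$

Each term is a squared gap between observed and expected counts, scaled by the expected count, so a large value of $\chi^{2}$ points away from $H_{0}$. Under $H_{0}$ the statistic approximately follows a chi-square distribution with $k - 1$ degrees of freedom, which is 3 for the four answer categories here.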

alpha <- 0.01

Note: tribble() is short for "row-wise tibble"; not to be confused with the alien species from Star Trek.

Hypothesized counts by category

n_total <- nrow(stack_overflow)
hypothesized <- tribble(
  ~ purple_link, ~ prop,
  "Hello, old friend", 1 / 2,
  "Amused"           , 1 / 6,
  "Indifferent"      , 1 / 6,
  "Annoyed"          , 1 / 6
) %>%
  mutate(n = prop * n_total)
# A tibble: 4 x 3
  purple_link        prop     n
  <chr>             <dbl> <dbl>
1 Hello, old friend 0.5   1228.
2 Amused            0.167  409.
3 Indifferent       0.167  409.
4 Annoyed           0.167  409.
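
With observed and hypothesized counts in hand, the chi-square statistic can be worked out by hand. The following is a minimal sketch in base R, assuming the observed counts are copied from the count() output earlier in the slides; the names observed, props, expected, contributions, and chi_sq are illustrative and not part of the course code.

```r
# Observed counts, copied from the count() output in the slides
observed <- c(
  "Hello, old friend" = 1330,
  Amused              = 409,
  Indifferent         = 426,
  Annoyed             = 290
)

# Hypothesized proportions, in the same category order
props <- c(1 / 2, 1 / 6, 1 / 6, 1 / 6)

# Expected counts under the null hypothesis
expected <- sum(observed) * props

# Per-category contributions: squared gap scaled by the expected count
contributions <- (observed - expected)^2 / expected

# The chi-square statistic is the sum of the contributions
chi_sq <- sum(contributions)

round(contributions, 2)
chi_sq
```

The total should come out close to 44. Most of it comes from the "Annoyed" category, where the observed count (290) falls well below the hypothesized count of about 409.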

Visualizing counts

ggplot(purple_link_counts, aes(purple_link, n)) +
  geom_col() +
  geom_point(data = hypothesized, color = "purple")

[Figure: bar plot of observed answer counts for each purple_link category, with purple points marking the hypothesized counts.]

Chi-square goodness of fit test using chisq_test()

hypothesized_props <- c(
  "Hello, old friend" = 1 / 2,
  Amused              = 1 / 6,
  Indifferent         = 1 / 6,
  Annoyed             = 1 / 6
)
library(infer)
stack_overflow %>% 
  chisq_test(
    response = purple_link,
    p = hypothesized_props
  )
# A tibble: 1 x 3
  statistic chisq_df       p_value
      <dbl>    <dbl>         <dbl>
1      44.0        3 0.00000000154
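
As a cross-check, the same test can be sketched in base R with stats::chisq.test(), which infer's chisq_test() is documented as a tidier version of. This sketch assumes the observed counts are copied from the count() output in the slides and reuses the significance level of 0.01 declared earlier; the names observed, result, p_manual, and reject_null are illustrative.

```r
# Observed counts, copied from the count() output in the slides
observed <- c(1330, 409, 426, 290)

# Hypothesized proportions, in the same category order
hypothesized_props <- c(1 / 2, 1 / 6, 1 / 6, 1 / 6)

# Significance level declared in the slides
alpha <- 0.01

# Goodness of fit test: a vector of counts plus the null proportions
result <- chisq.test(x = observed, p = hypothesized_props)

result$statistic  # chi-square statistic
result$parameter  # degrees of freedom (number of categories minus 1)
result$p.value    # p-value

# The p-value is the upper-tail area of the chi-square distribution
p_manual <- pchisq(
  result$statistic,
  df = result$parameter,
  lower.tail = FALSE
)

# Decision rule: reject the null hypothesis when the p-value is at most alpha
reject_null <- result$p.value <= alpha
reject_null
```

Because the p-value is far below 0.01, the null hypothesis is rejected: the sample does not match the hypothesized distribution.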

Let's practice!
