Hypothesis test for a proportion

Inference for Categorical Data in R

Andrew Bray

Assistant Professor of Statistics at Reed College

Do half of Americans favor capital punishment?

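library(dplyr)
library(ggplot2)
# Bar chart of the responses to the capital punishment question (cappun)
gss2016 %>%
  ggplot(aes(x = cappun)) +
  geom_bar()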

# Observed sample proportion of respondents who favor capital punishment
p_hat <- gss2016 %>%
  summarize(mean(cappun == "FAVOR")) %>%
  pull()
p_hat

0.5666667


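library(infer)

null <- gss2016 %>%
  # Response variable, and which level counts as a "success"
  specify(
    response = cappun,
    success = "FAVOR"
  ) %>%
  # Point null hypothesis: the population proportion of FAVOR is 0.5
  hypothesize(
    null = "point",
    p = 0.5
  ) %>%
  # Simulate 500 samples of the same size under the null
  # (recent infer versions call this type "draw")
  generate(
    reps = 500,
    type = "simulate"
  ) %>%
  # Proportion of FAVOR in each simulated sample
  calculate(stat = "prop")
null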

# A tibble: 500 x 2
   replicate  stat
   <fct>     <dbl>
 1 1         0.48 
 2 2         0.447
 3 3         0.48 
 4 4         0.44 
 5 5         0.407
 6 6         0.52 
 7 7         0.413
 8 8         0.553
 9 9         0.52 
10 10        0.467
# … with 490 more rows
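To see what specify(), hypothesize(), generate() and calculate() produce, here is a minimal base-R sketch of the same kind of simulation; it is not infer's own implementation. It assumes a hypothetical sample size of n_obs = 150 (chosen because the printed proportions, such as 0.447 and 0.407, are multiples of 1/150), that each simulated answer is drawn independently with P(FAVOR) = 0.5, and that the other answer level is labeled "OPPOSE".

```r
# Base-R sketch of a simulated null distribution (not infer's own code).
set.seed(2016)   # arbitrary seed so the sketch is reproducible
n_obs <- 150     # hypothetical sample size
p0    <- 0.5     # proportion under the null hypothesis
reps  <- 500     # number of simulated samples, as in generate(reps = 500)

# One replicate: n_obs independent answers with P(FAVOR) = p0,
# summarized by the proportion of FAVOR answers.
# "OPPOSE" is an assumed label for the other level.
null_stats <- replicate(reps, {
  answers <- sample(c("FAVOR", "OPPOSE"), size = n_obs, replace = TRUE,
                    prob = c(p0, 1 - p0))
  mean(answers == "FAVOR")
})

head(round(null_stats, 3))    # first few simulated proportions
mean(null_stats)              # should sit close to p0
sd(null_stats)                # compare with the theoretical standard error
sqrt(p0 * (1 - p0) / n_obs)   # theoretical standard error under H0
```

The simulated proportions center on 0.5 with a spread close to sqrt(p0 * (1 - p0) / n_obs), which is why the density plot of the null distribution is centered at 0.5.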


# Null distribution of the simulated proportions; the red line marks the observed p_hat
ggplot(null, aes(x = stat)) +
  geom_density() +
  geom_vline(
    xintercept = p_hat, 
    color = "red"
  )

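# Two-sided p-value: share of simulated proportions above p_hat, doubled
# (%>% binds more tightly than *, so * 2 applies to the pulled number)
null %>%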
  summarize(mean(stat > p_hat)) %>%
  pull() * 2
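The same p-value logic can be sketched without infer. Assuming hypothetical counts of 85 FAVOR answers out of 150 (which reproduce p_hat = 0.5666667), the block below draws a 500-replicate null distribution with rbinom(), doubles the upper-tail share, and runs the exact binomial test, binom.test(), as a simulation-free check. Unlike the slide, it uses >= so that simulated proportions exactly equal to p_hat count as "at least as extreme".

```r
# Two-sided p-value from a simulated null, checked against an exact test.
set.seed(2016)              # arbitrary seed
n_obs   <- 150              # hypothetical sample size
x_favor <- 85               # hypothetical number answering FAVOR
p0      <- 0.5              # null value
p_hat   <- x_favor / n_obs  # 0.5666667, as on the slide

# 500 simulated sample proportions under H0
null_stats <- rbinom(500, size = n_obs, prob = p0) / n_obs

# Share of simulated proportions at least as large as p_hat, doubled for a
# two-sided test (the null distribution is symmetric about p0 = 0.5).
p_sim <- 2 * mean(null_stats >= p_hat)

# Exact binomial test: no simulation noise
p_exact <- binom.test(x_favor, n_obs, p = p0)$p.value

c(simulated = p_sim, exact = p_exact)
```

With only 500 replicates the simulated p-value varies from seed to seed; increasing the number of replicates narrows its gap to the exact value.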

Hypothesis test

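Null hypothesis: a claim about the state of the world (here, that the population proportion who favor capital punishment is 0.5).
Null distribution: the distribution of the test statistic when the null hypothesis is true.
p-value: the probability, assuming the null hypothesis is true, of a test statistic at least as extreme as the one observed; a measure of how consistent your observations are with the null. Compare it with a significance level alpha chosen in advance:
- high p-value (p-value > alpha): the data are consistent with the null
- low p-value (p-value < alpha): the data are inconsistent with the null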
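The decision rule in these bullets can be written as a small function. decide() is a hypothetical helper for illustration, not part of infer; it treats p-value = alpha as inconsistent with the null, a common convention that the slide leaves open, and alpha = 0.05 is only a conventional default. The last lines apply it to the exact binomial p-value for hypothetical counts of 85 FAVOR answers out of 150.

```r
# Hypothetical helper (not from infer): apply the p-value vs alpha rule
decide <- function(p_value, alpha = 0.05) {
  stopifnot(p_value >= 0, p_value <= 1, alpha > 0, alpha < 1)
  if (p_value <= alpha) {
    "inconsistent with H0: reject"
  } else {
    "consistent with H0: fail to reject"
  }
}

decide(0.01)   # a low p-value
decide(0.40)   # a high p-value

# Hypothetical counts: 85 of 150 respondents favor (p_hat = 0.5666667)
p_exact <- binom.test(85, 150, p = 0.5)$p.value
decide(p_exact)
```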

Let's practice!
