Intervals for differences

Inference for Categorical Data in R

Andrew Bray

Assistant Professor of Statistics at Reed College

A question in two variables

Do women and men believe in life after death at different rates?

Let $p$ be the proportion of people who believe in life after death.

  • $H_{0} : p_{female} - p_{male} = 0$
  • $H_{A} : p_{female} - p_{male} \ne 0$

Do women and men have different opinions on life after death?

ggplot(gss2016, aes(x = sex, fill = postlife)) +
  geom_bar()

ch2v2-postlife-barplot.png

Do women and men have different opinions on life after death?

ggplot(gss2016, aes(x = sex, fill = postlife)) +
  geom_bar(position = "fill")

ch2v2-postlife-barplot-filled.png

Do women and men have different opinions on life after death?

# Proportion answering "YES" within each sex
p_hats <- gss2016 %>%
  group_by(sex) %>%
  summarize(mean(postlife == "YES", na.rm = TRUE)) %>%
  pull()
# Difference between the two group proportions
d_hat <- diff(p_hats)
d_hat
0.1472851
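
The sign of d_hat depends on the order in which the groups appear, because diff() subtracts the first element from the second. A minimal sketch with made-up proportions (the names p_hats_toy and d_toy and the values are hypothetical, not the GSS estimates) shows the direction of the subtraction; it assumes MALE comes before FEMALE in the grouping order, which is what makes d_hat match $p_{female} - p_{male}$.

```r
# Hypothetical proportions, for illustration only (not the GSS estimates).
p_hats_toy <- c(MALE = 0.50, FEMALE = 0.75)

# diff() returns the second element minus the first: FEMALE - MALE here.
d_toy <- diff(p_hats_toy)
```

If the groups were listed in the opposite order, the same call would return the difference with the opposite sign.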

Generating data from H0

  • $H_{0} : p_{female} - p_{male} = 0$
  • There is no association between belief in the afterlife and the sex of a subject.
  • The variable postlife is independent of the variable sex.

Generate data by permutation
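
To make the idea of permutation concrete, here is a rough base R sketch of a single shuffle. The data frame toy and the helper diff_in_props() are hypothetical and exist only for this illustration; the values are not taken from the GSS. Shuffling the responses while holding sex fixed breaks any association between the two variables, which is what the null hypothesis of independence describes.

```r
# Toy data for illustration only; these are NOT the GSS values.
set.seed(1)
toy <- data.frame(
  sex      = rep(c("FEMALE", "MALE"), times = c(6, 6)),
  postlife = c("YES", "YES", "YES", "YES", "YES", "NO",
               "YES", "YES", "YES", "NO", "NO", "NO")
)

# Hypothetical helper: proportion of "YES" among females minus males.
diff_in_props <- function(response, group) {
  mean(response[group == "FEMALE"] == "YES") -
    mean(response[group == "MALE"] == "YES")
}

# Statistic on the data as observed.
d_obs <- diff_in_props(toy$postlife, toy$sex)

# One permutation: shuffle the responses, keep the sex labels in place.
shuffled <- sample(toy$postlife)
d_perm <- diff_in_props(shuffled, toy$sex)
```

The shuffle keeps the total number of "YES" answers unchanged; only their assignment to the two groups varies from one permutation to the next.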

Do women and men have different opinions on life after death?

gss2016 %>%
  specify(
    response = postlife, 
    explanatory = sex, 
    success = "YES"
  ) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1, type = "permute")

Do women and men have different opinions on life after death?

gss2016 %>%
  specify(
    postlife ~ sex,  # new: formula interface, response ~ explanatory
    success = "YES"
  ) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1, type = "permute")
Response: postlife (factor)
Explanatory: sex (factor)
Null Hypothesis:  independence 
# A tibble: 137 x 3
# Groups:   replicate [1]
   postlife sex    replicate
   <fct>    <fct>      <int>
 1 YES      FEMALE         1
 2 YES      MALE           1
 3 YES      FEMALE         1
 4 YES      MALE           1
 5 YES      MALE           1
 6 YES      FEMALE         1
 7 NO       FEMALE         1

Do women and men have different opinions on life after death?

gss2016 %>%
  specify(
    postlife ~ sex, 
    success = "YES"
  ) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1, type = "permute")
Response: postlife (factor)
Explanatory: sex (factor)
Null Hypothesis:  independence 
# A tibble: 137 x 3
# Groups:   replicate [1]
   postlife sex    replicate
   <fct>    <fct>      <int>
 1 YES      FEMALE         1
 2 NO       MALE           1
 3 NO       FEMALE         1
 4 YES      MALE           1
 5 YES      MALE           1
 6 YES      FEMALE         1
 7 YES      FEMALE         1

Do women and men have different opinions on life after death?

null <- gss2016 %>%
  specify(postlife ~ sex, success = "YES") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500, type = "permute") %>%
  calculate(stat = "diff in props", order = c("FEMALE", "MALE"))
Warning message:
Removed 13 rows containing missing values.

Do women and men have different opinions on life after death?

ggplot(null, aes(x = stat)) +
  geom_density() +
  geom_vline(xintercept = d_hat, color = "red")

These data suggest that women and men differ in their belief in life after death.

ch2v2-density-plot.png
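
The visual comparison of the observed statistic with the null distribution can be summarized as a p-value. The sketch below is one common way to do that, not necessarily the calculation used in the course. The vector null_stat holds stand-in values chosen for illustration; in practice it would be the stat column of the permuted results, and d_obs would be d_hat.

```r
# Stand-in null statistics, for illustration only (not real permutation output).
null_stat <- c(-0.20, -0.12, -0.05, -0.01, 0.00, 0.03, 0.06, 0.10, 0.15, 0.21)

# Observed difference in proportions, as reported for d_hat.
d_obs <- 0.1472851

# Two-sided p-value: share of null statistics at least as extreme,
# in absolute value, as the observed statistic.
p_value <- mean(abs(null_stat) >= abs(d_obs))
```

With a tibble of permuted statistics, the infer package's get_p_value() function, called with obs_stat and direction = "two-sided", should serve the same purpose, although its two-sided calculation may differ slightly from the absolute-value rule sketched here.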

Let's practice!
