Goodness of fit

Inference for Categorical Data in R

Andrew Bray

Assistant Professor of Statistics at Reed College

First Digit Distribution

[Slide images: 4-2-1.png to 4-2-5.png]

Chi-squared distance

[Slide images: 4-2-6.png to 4-2-15.png]
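The chi-squared distance slides are images, so the formula is not in the text. As a reference sketch, the standard goodness-of-fit statistic can be written as below. The symbols are assumptions chosen for illustration: $O_i$ is the observed count in category $i$, $E_i$ the expected count under the null, $n$ the sample size, $p_i$ the null proportion, and $k$ the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n \, p_i
```

Each term squares the gap between observed and expected counts and scales it by the expected count, so a given gap counts for more in a category where few observations are expected.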

First Digit Distribution

[Slide images: 4-2-16.png to 4-2-19.png]

Example: uniformity of party

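library(dplyr)
library(ggplot2)

# Bar chart of party counts; the line marks the count expected in each
# of the 3 categories if the 149 responses were spread uniformly
ggplot(gss2016, aes(x = party)) +
  geom_bar() +
  geom_hline(yintercept = 149/3, color = "goldenrod", size = 2)

# Tabulate the observed counts
tab <- gss2016 %>%
  select(party) %>%
  table()
tab
Dem Ind Rep 
 43  72  34

# Chi-squared distance between the observed counts and the uniform null
p_uniform <- c(Dem = 1/3, Ind = 1/3, Rep = 1/3)
chisq.test(tab, p = p_uniform)$statistic
X-squared 
 15.87919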

4-2-20.png
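To see where the statistic of 15.87919 comes from, here is a minimal base R sketch that recomputes it by hand from the printed counts. The names `observed`, `expected`, and `chi_sq` are illustrative and not from the slides, and the counts are typed in directly rather than read from `gss2016`.

```r
# Observed party counts, copied from the printed table
observed <- c(Dem = 43, Ind = 72, Rep = 34)

# Null hypothesis: each party is equally likely
p_uniform <- c(Dem = 1/3, Ind = 1/3, Rep = 1/3)

# Expected count per category under the null (149 / 3 each)
expected <- sum(observed) * p_uniform

# Chi-squared distance: squared deviations scaled by the expected counts
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq

# The built-in goodness-of-fit test should report the same statistic
unname(chisq.test(observed, p = p_uniform)$statistic)
```

Most of the distance comes from the Ind category, whose count of 72 sits furthest from the expected 149/3.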

Simulating the null

library(infer)

# Simulate one dataset of party responses under the uniform null
gss2016 %>%
  specify(response = party) %>%
  hypothesize(null = "point", p = p_uniform) %>%
  generate(reps = 1, type = "simulate")
# A tibble: 149 x 2
# Groups:   replicate [1]
   party replicate
   <fct> <fct>    
 1 I        1        
 2 D        1        
 3 I        1        
 4 I        1        
 5 D        1        
 6 R        1        
 7 I        1        
 8 R        1        
 9 D        1        
10 I        1        
# ... with 139 more rows

# Store one simulated dataset and plot its party counts
sim_1 <- gss2016 %>%
  specify(response = party) %>%
  hypothesize(null = "point", p = p_uniform) %>%
  generate(reps = 1, type = "simulate")
ggplot(sim_1, aes(x = party)) +
  geom_bar()

4-2-21.png
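One simulated dataset shows what a single draw under the null can look like. Repeating the draw many times gives a sense of how large the chi-squared distance tends to be by chance alone. The sketch below does this in base R with `rmultinom()` rather than the infer pipeline used on the slides. The number of repetitions (1000), the seed, and the names `sim_counts` and `null_stats` are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n <- 149     # sample size from the party table
p_uniform <- c(Dem = 1/3, Ind = 1/3, Rep = 1/3)
expected <- n * p_uniform

# Draw 1000 null datasets; each column is one simulated table of counts
sim_counts <- rmultinom(1000, size = n, prob = p_uniform)

# Chi-squared distance of each simulated table from the uniform expectation
null_stats <- apply(sim_counts, 2, function(counts) {
  sum((counts - expected)^2 / expected)
})

# Share of simulated statistics at least as large as the observed 15.87919
mean(null_stats >= 15.87919)
```

Comparing the observed statistic with `null_stats` indicates whether a distance that large would be unusual if party affiliation were truly uniform.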

Let's practice!