Exploring two categorical variables

Analyzing Survey Data in R

Kelly McConville

Assistant Professor of Statistics

NHANES: race and diabetes

library(survey)
# Estimated population counts by diabetes status
svytable(~Diabetes, design = NHANES_design)
Diabetes
       No       Yes 
275814034  24335536
# Two-way table of estimated population counts: race by diabetes status
tab_w <- svytable(~Race1 + Diabetes, design = NHANES_design)
tab_w
          Diabetes
Race1             No       Yes
  Black     32697528   4003497
  Hispanic  17258245   1370393
  Mexican   27886500   2081657
  White    177088354  14708094
  Other     20883407   2171895
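
The counts above are far larger than any survey sample because each respondent is weighted by the number of people they represent. As a rough sketch of that idea, the base R code below sums weights within each cell using xtabs(). The data frame `toy` and its columns `race`, `diabetes`, and `wt` are hypothetical and are not part of NHANES. The assumption here is that the point estimates from svytable() are, in essence, sums of sampling weights like these.

```r
# Hypothetical toy sample: each row is one respondent with a survey weight
# (the number of people in the population that the respondent represents).
toy <- data.frame(
  race     = c("A", "A", "B", "B", "B"),
  diabetes = c("No", "Yes", "No", "No", "Yes"),
  wt       = c(100, 50, 200, 150, 25)
)

# Unweighted counts: number of respondents in each cell.
unweighted <- table(toy$race, toy$diabetes)

# Weighted counts: sum of the weights in each cell.
weighted <- xtabs(wt ~ race + diabetes, data = toy)

print(unweighted)
print(weighted)
```

The weighted cells add up to the total of the weights, which plays the role of the estimated population size.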
# Convert the table to a long data frame (one row per cell) for plotting
tab_w <- as.data.frame(tab_w)
tab_w
      Race1 Diabetes      Freq
1     Black       No  32697528
2  Hispanic       No  17258245
3   Mexican       No  27886500
4     White       No 177088354
5     Other       No  20883407
6     Black      Yes   4003497
7  Hispanic      Yes   1370393
8   Mexican      Yes   2081657
9     White      Yes  14708094
10    Other      Yes   2171895
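
To see why as.data.frame() produces one row per cell, here is a small base R sketch that rebuilds two rows of the table by hand and converts them. The counts are copied from the output above. The object names `counts`, `tab`, and `long` are illustrative only.

```r
# Two rows of the race-by-diabetes table, entered column by column.
counts <- matrix(
  c(32697528, 17258245,   # Diabetes = No
     4003497,  1370393),  # Diabetes = Yes
  nrow = 2,
  dimnames = list(Race1 = c("Black", "Hispanic"),
                  Diabetes = c("No", "Yes"))
)

# Treat the matrix as a contingency table, then reshape it to long format.
tab  <- as.table(counts)
long <- as.data.frame(tab)

print(long)
```

Each combination of `Race1` and `Diabetes` becomes its own row, with the cell count in `Freq`. ggplot2 expects this layout when one column is mapped to `x`, another to `fill`, and the count to `y`.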
library(ggplot2)
# Stacked bars: estimated counts by race, filled by diabetes status
ggplot(data = tab_w, mapping = aes(x = Race1, fill = Diabetes, y = Freq)) +
  geom_col() +
  coord_flip()

[Figure: bar plot of estimated diabetes counts by race]

ggplot(data = tab_w, mapping = aes(x = Race1,
                                   y = Freq,
                                   fill = Diabetes)) +
  geom_col(position = "fill") +
  coord_flip()

[Figure: bar plot of the distribution of diabetes within each race]
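
With position = "fill", every bar is rescaled to the same length, so the plot shows the share of each race with and without diabetes rather than raw counts. The same conditional proportions can be sketched in base R with prop.table(), using the estimated counts from the two-way table shown earlier. The object names `counts` and `props` are illustrative only.

```r
# Estimated counts from the race-by-diabetes table, entered column by column.
counts <- matrix(
  c(32697528, 17258245, 27886500, 177088354, 20883407,   # Diabetes = No
     4003497,  1370393,  2081657,  14708094,  2171895),  # Diabetes = Yes
  ncol = 2,
  dimnames = list(Race1 = c("Black", "Hispanic", "Mexican", "White", "Other"),
                  Diabetes = c("No", "Yes"))
)

# margin = 1 divides each row by its row total, giving the proportion
# with and without diabetes within each race.
props <- prop.table(counts, margin = 1)

print(round(props, 3))
```

Each row of `props` sums to 1, which matches the equal-length bars in the filled bar plot.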

Let's practice!
