Analyzing Survey Data in R
Kelly McConville
Assistant Professor of Statistics
The NHANES data are collected using a four-stage design.
Stage 0: The U.S. is stratified by geography and proportion of minority populations.
library(NHANES)
dim(NHANESraw)
[1] 20293    78
library(dplyr)
summarize(NHANESraw, N_hat = sum(WTMEC2YR))
# A tibble: 1 x 1
      N_hat
      <dbl>
1 608534400
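The estimate of about 608 million is roughly twice the U.S. population, because NHANESraw pools two 2-year survey cycles and the 2-year weights in each cycle sum to the full population. The sketch below uses made-up numbers to show the effect; the data frame `toy` and its columns `cycle`, `wt_2yr`, and `wt_4yr` are hypothetical and not part of NHANES.

```r
# Hypothetical toy data: two survey cycles, each weighted to a population of 1,000.
toy <- data.frame(
  cycle  = c("A", "A", "B", "B"),
  wt_2yr = c(400, 600, 300, 700)
)

# Summing the 2-year weights over both cycles counts the population twice.
N_hat_pooled <- sum(toy$wt_2yr)

# Halving the weights gives 4-year weights that count the population once.
toy$wt_4yr <- toy$wt_2yr / 2
N_hat_4yr <- sum(toy$wt_4yr)

print(c(pooled = N_hat_pooled, four_year = N_hat_4yr))
```

The same arithmetic applied to the NHANES total gives 608534400 / 2 = 304267200.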
NHANESraw <- mutate(NHANESraw, WTMEC4YR = WTMEC2YR / 2)
library(survey)
NHANES_design <- svydesign(data = NHANESraw, strata = ~SDMVSTRA, id = ~SDMVPSU,
                           nest = TRUE, weights = ~WTMEC4YR)
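Once a design object exists, survey-weighted estimates can be computed from it. The following is a minimal sketch on made-up data, assuming the survey package is installed; the data frame `toy` and its columns `stratum`, `psu`, `wt`, and `y` are hypothetical stand-ins for the NHANES variables, and the code is guarded so that it is skipped if survey is unavailable.

```r
# Hypothetical toy sample: 2 strata, 2 PSUs per stratum, 2 people per PSU.
toy <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu     = c(1, 1, 2, 2, 1, 1, 2, 2),
  wt      = c(10, 10, 20, 20, 5, 5, 15, 15),
  y       = c(1, 0, 1, 1, 0, 1, 0, 1)
)

if (requireNamespace("survey", quietly = TRUE)) {
  # Same argument pattern as the NHANES design: strata, clusters, nesting, weights.
  toy_design <- survey::svydesign(data = toy, strata = ~stratum, id = ~psu,
                                  nest = TRUE, weights = ~wt)

  # The sampling weights stored in the design sum to the estimated population size.
  N_hat <- sum(weights(toy_design))

  # Weighted total of y, with a standard error that accounts for the design.
  est <- survey::svytotal(~y, toy_design)
  print(N_hat)
  print(est)
}
```

The point estimate from `svytotal()` is the weighted sum of `y`; the design information matters for the standard error reported alongside it.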
distinct(NHANESraw, SDMVPSU)
# A tibble: 3 x 1
  SDMVPSU
    <int>
1       1
2       2
3       3
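Only three distinct PSU labels appear because the labels 1, 2, and 3 are reused in every stratum. Setting nest = TRUE in svydesign() tells it to treat the clusters as nested within strata, so PSU 1 in one stratum is not confused with PSU 1 in another. A small base R sketch of the idea, where the data frame `toy` and its columns are hypothetical:

```r
# Hypothetical toy design: 2 strata, with PSU labels 1-3 reused in each stratum.
toy <- data.frame(
  stratum = c(1, 1, 1, 2, 2, 2),
  psu     = c(1, 2, 3, 1, 2, 3)
)

# Looking at the PSU labels alone suggests only 3 clusters.
n_psu_labels <- length(unique(toy$psu))

# Each (stratum, PSU) pair is a different cluster, so there are really 6.
n_clusters <- nrow(unique(toy[, c("stratum", "psu")]))

print(c(labels = n_psu_labels, clusters = n_clusters))
```

Running the analogous check on NHANESraw, with distinct(NHANESraw, SDMVSTRA, SDMVPSU), would list the actual stratum and PSU combinations.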