Introduction to NHANES and sampling

Experimental Design in R

Joanne Xiong

Data Scientist

Intro to NHANES dataset

NHANES = National Health and Nutrition Examination Survey

  • Conducted by the National Center for Health Statistics (NCHS), part of the Centers for Disease Control and Prevention (CDC)

  • Data are collected in a variety of ways, including interviews & a physical exam

  • Questions cover medical, dental, socioeconomic, dietary, and general health-related conditions


Intro to sampling

Probability Sampling: units are selected by a random mechanism, so each unit has a known, nonzero chance of being in the sample

Non-probability Sampling: probability is not used to select the sample

  • Voluntary response: whoever agrees to respond is the sample
  • Convenience sampling: subjects convenient to the researcher are chosen.

Sampling - Part 1

Simple Random Sampling (SRS)

  • Every unit in a population has an equal probability of being sampled
sample()
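
A minimal sketch of simple random sampling with base R's sample(). The population of 1,000 ID numbers and the sample size of 50 are made up for illustration; they are not NHANES values.

```r
set.seed(42)  # fix the seed so the draw is reproducible

# Hypothetical population: 1,000 units identified by ID
population_ids <- 1:1000

# Draw 50 units without replacement; every unit is equally likely
srs_ids <- sample(population_ids, size = 50, replace = FALSE)

head(srs_ids)
```

To sample rows of a data frame the same way, index it with sample(nrow(df), size), where df stands for your own data.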

Stratified Sampling

  • Splitting your population into strata using a stratifying variable
  • Taking a simple random sample within each stratum
dataset %>% 
   group_by(strata_variable) %>% 
   slice_sample(n = sample_size)   # rows to draw at random within each stratum
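
A small worked version of the stratified pattern, assuming dplyr 1.0.0 or later is installed. The toy population, the region column, and the choice of 5 rows per stratum are illustrative assumptions only.

```r
library(dplyr)

set.seed(123)

# Hypothetical population: 60 units in three strata of unequal size
population <- data.frame(
  id     = 1:60,
  region = rep(c("north", "south", "west"), times = c(30, 20, 10))
)

# Simple random sample of 5 rows inside each stratum
stratified_sample <- population %>%
  group_by(region) %>%
  slice_sample(n = 5) %>%
  ungroup()

count(stratified_sample, region)
```

Passing prop instead of n to slice_sample() would draw the same fraction from every stratum rather than the same count.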

Sampling - Part 2

Cluster Sampling

  • Divide the population into groups called clusters
  • Randomly select some clusters and sample every unit in them
cluster(dataset, 
        cluster_var_name,
        number_to_select,
        method = "option")
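
To show the idea without any extra package, here is a base R sketch of one-stage cluster sampling. It is not the cluster() call above; the schools, their sizes, and the number of clusters chosen are invented for the example.

```r
set.seed(2024)

# Hypothetical population: 8 schools (clusters) with 5 students each
population <- data.frame(
  id     = 1:40,
  school = rep(paste0("school_", 1:8), each = 5)
)

# Step 1: randomly pick 3 whole clusters
chosen_clusters <- sample(unique(population$school), size = 3)

# Step 2: keep every unit that belongs to a chosen cluster
cluster_sample <- population[population$school %in% chosen_clusters, ]

table(cluster_sample$school)
```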

Systematic Sampling

  • Choosing a sample in a systematic way, such as every kth unit on a list
  • Best implemented in R with a custom function
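
One possible custom function, as a sketch: pick a random starting row, then take every interval-th row after it. The name systematic_sample and its arguments are hypothetical, not taken from any package.

```r
# Hypothetical helper: systematic sample of a data frame
systematic_sample <- function(data, interval) {
  stopifnot(interval >= 1, interval <= nrow(data))
  # Random start between 1 and interval, so each row can be selected
  start <- sample(seq_len(interval), size = 1)
  # Keep rows start, start + interval, start + 2 * interval, ...
  data[seq(from = start, to = nrow(data), by = interval), , drop = FALSE]
}

set.seed(7)
population <- data.frame(id = 1:100)  # toy population of 100 units
sys_sample <- systematic_sample(population, interval = 10)

sys_sample$id
```

With 100 rows and an interval of 10, the function always returns 10 rows spaced 10 apart; only the starting row varies with the seed.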

Multi-stage Sampling

  • Combines two or more sampling methods
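
As an illustration, a two-stage design can be sketched in base R by choosing clusters first and then taking a simple random sample inside each chosen cluster. The counties and sample sizes below are made up.

```r
set.seed(99)

# Hypothetical population: 6 counties with 10 residents each
population <- data.frame(
  id     = 1:60,
  county = rep(paste0("county_", 1:6), each = 10)
)

# Stage 1: randomly select 2 counties (cluster sampling)
chosen <- sample(unique(population$county), size = 2)
stage1 <- population[population$county %in% chosen, ]

# Stage 2: simple random sample of 4 residents within each chosen county
rows_by_county <- split(seq_len(nrow(stage1)), stage1$county)
picked_rows <- unlist(lapply(rows_by_county, function(rows) {
  rows[sample.int(length(rows), size = 4)]
}))
multi_stage_sample <- stage1[picked_rows, ]

table(multi_stage_sample$county)
```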

Let's practice!
