Missing Data dependence

Dealing With Missing Data in R

Nicholas Tierney

Statistician

Outline

  • MCAR: Missing Completely at Random

  • MAR: Missing At Random

  • MNAR: Missing Not At Random

MCAR: What is it?

Missingness has no association with any data, whether observed or unobserved.

test        vacation
NA          TRUE
11.533340   FALSE
10.126115   TRUE
NA          FALSE
NA          TRUE
8.551881    FALSE
NA          FALSE
NA          TRUE
10.608264   TRUE
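
As a rough illustration of the MCAR idea, the base R sketch below simulates data shaped like the table above and removes values by an independent coin flip. The sample size, the 50% missingness rate, and the names `is_missing`, `test_mcar`, and `miss_by_vacation` are assumptions made for this example, not part of the course material.

```r
set.seed(2024)
n <- 1000

# Simulated data with the same two columns as the table above
test <- rnorm(n, mean = 10, sd = 1)
vacation <- sample(c(TRUE, FALSE), n, replace = TRUE)

# MCAR: every value has the same chance of being dropped,
# regardless of vacation or of the test score itself
is_missing <- runif(n) < 0.5
test_mcar <- ifelse(is_missing, NA, test)

# Under MCAR the proportion missing should be similar in both groups
miss_by_vacation <- tapply(is.na(test_mcar), vacation, mean)
miss_by_vacation
```

If the two proportions differed substantially, that would suggest missingness is related to `vacation`, which is not what MCAR describes.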

MCAR: What are the implications?

  • Imputation is advisable
  • Deleting observations reduces sample size, which limits inference, but does not bias estimates
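
The deletion point can be checked with a small simulation. This is a minimal sketch under assumed settings (normally distributed scores, 50% of values removed at random); the variable names are hypothetical.

```r
set.seed(1)
n <- 10000
test <- rnorm(n, mean = 10, sd = 1)

# Remove roughly half of the values completely at random
test_mcar <- test
test_mcar[runif(n) < 0.5] <- NA

full_mean  <- mean(test)                    # mean if nothing were missing
cc_mean    <- mean(test_mcar, na.rm = TRUE) # mean after deleting missing rows
n_complete <- sum(!is.na(test_mcar))        # rows left after deletion

c(full_mean = full_mean, cc_mean = cc_mean, n_complete = n_complete)
```

The two means should be close, while `n_complete` is roughly half of `n`: deletion costs precision here rather than introducing bias.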

MAR: What is it?

Missingness depends on data observed, but not on data unobserved.

Implications:

  • Impute missing values
  • Deleting observations is not ideal and may bias estimates

test        vacation   depression
NA          TRUE       87.93109
11.533340   FALSE      40.02708
10.126115   TRUE       48.62883
NA          FALSE      88.21743
NA          TRUE       90.29282
8.551881    FALSE      44.77343
NA          FALSE      89.48865
NA          TRUE       89.99209
10.608264   TRUE       45.56832
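
In the table above, test is missing in exactly the rows where depression is high. The sketch below mimics that pattern in base R and compares deletion with a simple regression imputation. The linear relationship between `test` and `depression`, the logistic missingness model, and all variable names are assumptions chosen for illustration. Single regression imputation is used only because it is short; it is not presented as the recommended method.

```r
set.seed(42)
n <- 10000
depression <- runif(n, min = 40, max = 90)

# Assumption for this sketch: test scores fall as depression rises
test <- 12 - 0.05 * depression + rnorm(n, sd = 0.5)

# MAR: the chance of a missing test score depends only on the
# observed depression value, not on the test score itself
p_missing <- plogis((depression - 65) / 5)
test_mar <- ifelse(runif(n) < p_missing, NA, test)

true_mean <- mean(test)
cc_mean   <- mean(test_mar, na.rm = TRUE)  # deletion keeps low-depression rows

# Regression imputation using the observed depression values
# (lm() drops rows with a missing response by default)
fit <- lm(test_mar ~ depression)
predicted <- predict(fit, newdata = data.frame(depression = depression))
test_imputed <- ifelse(is.na(test_mar), predicted, test_mar)
imputed_mean <- mean(test_imputed)

c(true = true_mean, deleted = cc_mean, imputed = imputed_mean)
```

Under these assumptions the deleted-rows mean overstates the true mean, while the imputed mean lands close to it, because the variable driving the missingness is observed and can be used.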

MNAR: What is it?

Missingness of the response is related to an unobserved value relevant to the assessment of interest.

Implications:

  • Estimates will be biased under both deletion and imputation
  • Inference may be limited; proceed with caution

test        vacation   depression
NA          TRUE       NA
11.533340   FALSE      11.533340
10.126115   TRUE       10.126115
NA          FALSE      NA
NA          TRUE       NA
8.551881    FALSE      8.551881
NA          FALSE      NA
NA          TRUE       NA
10.608264   TRUE       10.608264
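
To see why both deletion and imputation struggle under MNAR, the base R sketch below makes the chance of a missing score depend on the score itself. The direction of the effect (low scores go missing more often), the logistic form, and the variable names are assumptions for this example. Mean imputation stands in for imputation in general only because it is the shortest to write.

```r
set.seed(7)
n <- 10000
test <- rnorm(n, mean = 10, sd = 1)

# MNAR: low test scores are more likely to be missing, and no
# observed variable carries information about which ones are gone
p_missing <- plogis(-2 * (test - 10))
test_mnar <- ifelse(runif(n) < p_missing, NA, test)

true_mean <- mean(test)
cc_mean   <- mean(test_mnar, na.rm = TRUE)  # deletion

# Mean imputation fills the gaps with the already-biased observed mean
test_imputed <- ifelse(is.na(test_mnar), cc_mean, test_mnar)
imputed_mean <- mean(test_imputed)

c(true = true_mean, deleted = cc_mean, imputed = imputed_mean)
```

Both estimates sit above the true mean here, since the observed values are the only information available and they are systematically too high.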

Example: MCAR

library(naniar)  # provides vis_miss()
vis_miss(mt_cars, cluster = TRUE)

Example: MAR

library(dplyr)  # provides arrange(); oceanbuoys and vis_miss() come from naniar
oceanbuoys %>% arrange(year) %>% vis_miss()

Example: MNAR

vis_miss(ocean, cluster = TRUE)

Let's practice!
