Final remarks

Handling Missing Data with Imputations in R

Michal Oleszak

Machine Learning Engineer

What you know

Chapter 1: The problem of missing data:

  • Modeling incomplete data requires special treatment.
  • The 3 missing data mechanisms: MCAR, MAR and MNAR.
  • Visualizations and statistical tests provide insights into missing data patterns.

Chapter 2: Donor-based imputation:

  • Mean imputation is typically a poor choice.
  • Hot-deck imputation.
  • k-Nearest-Neighbors imputation.

What you know

Chapter 3: Model-based imputation

  • Looping over variables and imputing them until convergence.
  • Replicating data variability by drawing from conditional distributions.
  • Tree-based imputation using random forests.

Chapter 4: Uncertainty from imputation

  • Multiple imputation by bootstrapping.
  • Multiple imputation with MICE.

Which imputation method to choose?

Some loose guidelines:

  • Is the data huge, or does imputation have to run in real time in production?

    Use hot-deck imputation.

  • Suspect specific relations between the variables based on domain knowledge?

    Use model-based imputation.

  • Speed is not critical and the relations between variables are not obvious?

    Use kNN or tree-based imputation.
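
To illustrate why hot-deck imputation suits the huge-data or real-time case, here is a minimal base-R sketch of a simple random hot-deck: each missing value is replaced by a randomly drawn observed value (a "donor") from the same variable. The function name `hot_deck_impute` is hypothetical, and the built-in `airquality` data is used only as an example; dedicated packages offer more refined variants (for example, sorting or grouping by other variables before picking donors).

```r
# Simple random hot-deck: replace each NA with a randomly chosen
# observed value ("donor") from the same variable.
# `hot_deck_impute` is a hypothetical helper, not a package function.
hot_deck_impute <- function(x) {
  missing <- is.na(x)
  donors <- x[!missing]
  # Nothing to do if there are no missing values or no donors.
  if (!any(missing) || length(donors) == 0) {
    return(x)
  }
  # sample.int() avoids the surprising behaviour of sample() on length-1 input.
  picked <- sample.int(length(donors), size = sum(missing), replace = TRUE)
  x[missing] <- donors[picked]
  x
}

set.seed(42)
df <- airquality                 # built-in data with NAs in Ozone and Solar.R
imputed <- df
imputed[] <- lapply(df, hot_deck_impute)  # impute column by column

colSums(is.na(df))       # missing values per column before imputation
colSums(is.na(imputed))  # missing values per column after imputation
```

Each column needs only one pass and no model fitting, which is what makes this kind of method cheap. The price is that relations between variables are ignored in this plain version.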

How to estimate uncertainty from imputation?

Some loose guidelines:

  • Has to be relatively fast?
  • You have ideas about which models to use and how to specify them?

Use MICE.

  • Want to use a non-parametric method (kNN, hot-deck)?
  • Don't want to worry about assumptions of specific models?

Use bootstrapping.
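
The bootstrapping route can be sketched in base R as follows: resample the rows with replacement, impute the resampled data, fit the analysis model, and keep the estimate of interest; the spread of the estimates across replicates then reflects both sampling and imputation uncertainty. This is only a sketch under assumptions: the helper names `impute_hot_deck` and `boot_estimate` are hypothetical, a simple random hot-deck stands in for whichever non-parametric imputer you prefer, and the model `Ozone ~ Wind + Temp` on the built-in `airquality` data is chosen purely for illustration.

```r
# Hypothetical helper: random hot-deck imputation of a single vector.
impute_hot_deck <- function(x) {
  missing <- is.na(x)
  donors <- x[!missing]
  if (!any(missing) || length(donors) == 0) {
    return(x)
  }
  x[missing] <- donors[sample.int(length(donors), sum(missing), replace = TRUE)]
  x
}

# One bootstrap replicate: resample rows, impute, fit the model,
# and return the coefficient of interest (here: the Wind slope).
boot_estimate <- function(data) {
  idx <- sample.int(nrow(data), replace = TRUE)
  resampled <- data[idx, ]
  resampled[] <- lapply(resampled, impute_hot_deck)
  fit <- lm(Ozone ~ Wind + Temp, data = resampled)
  coef(fit)[["Wind"]]
}

set.seed(123)
dat <- airquality[, c("Ozone", "Wind", "Temp")]
estimates <- replicate(200, boot_estimate(dat))

# Percentile interval for the Wind coefficient across replicates.
ci <- quantile(estimates, probs = c(0.025, 0.975))
ci
```

Because imputation happens inside every replicate, the interval does not treat the imputed values as if they had been observed. The number of replicates (200 here) is an arbitrary choice for the sketch.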

Next steps

  • miceVignettes:

    • Passive imputation and post-processing
    • Imputing multi-level data
    • Sensitivity analysis
  • S. van Buuren (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC, Boca Raton, FL.

  • Other R packages: Amelia, mi

The cover of the book "Flexible Imputation of Missing Data" by Stef van Buuren.

Congratulations and good luck!
