Introduction to missing data

Dealing With Missing Data in R

Nicholas Tierney

Statistician

Introduction

The best thing to do with missing data is to not have any

--Gertrude Mary Cox

  • Working with real-world data = working with missing data

  • Missing data can have unexpected effects on your analysis

  • Bad imputation can lead to poor estimates and decisions.

What you will learn

  • What missing values are
  • How to find missing data
  • How to wrangle and tidy missing data
  • Explore why data are missing
  • Impute missing values

Assumed knowledge

What are missing values?

Missing values are values that should have been recorded but were not.

NA = Not Available.
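
A short base R sketch (no packages assumed) of why missing values need a dedicated test: comparing with `==` cannot detect NA, because the comparison with an unknown value is itself unknown, whereas `is.na()` returns TRUE exactly where a value is missing.

```r
# A numeric vector with one missing value
x <- c(1, NA, 3)

# == cannot find missing values: any comparison with NA is itself NA
x == NA
#> [1] NA NA NA

# is.na() is the base R test for missing values
is.na(x)
#> [1] FALSE  TRUE FALSE
```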

How do I check if I have missing values?

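library(naniar)  # provides any_na(), are_na(), n_miss(), prop_miss()

x <- c(1, NA, 3, NA, NA, 5)

any_na(x)      # are there any missing values?
#> [1] TRUE
are_na(x)      # which elements are missing?
#> [1] FALSE  TRUE FALSE  TRUE  TRUE FALSE
n_miss(x)      # how many values are missing?
#> [1] 3
prop_miss(x)   # what proportion of values is missing?
#> [1] 0.5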
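
For comparison, the same four questions can be answered in base R without naniar. This is a minimal sketch of equivalent calls, not how naniar implements its helpers; it relies on `TRUE` counting as 1 inside `sum()` and `mean()`.

```r
x <- c(1, NA, 3, NA, NA, 5)

anyNA(x)         # similar to any_na(x)
#> [1] TRUE
is.na(x)         # similar to are_na(x)
#> [1] FALSE  TRUE FALSE  TRUE  TRUE FALSE
sum(is.na(x))    # similar to n_miss(x): each TRUE counts as 1
#> [1] 3
mean(is.na(x))   # similar to prop_miss(x): share of TRUE values
#> [1] 0.5
```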

Working with missing data

Missing values propagate: most calculations involving NA return NA (for example, NA + 1 is NA).

heights <- c(Sophie = 165, Dan = 177, Fred = NA)
heights
#> Sophie    Dan   Fred 
#>    165    177     NA 
sum(heights)
#> [1] NA
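
Many base R summary functions accept an `na.rm` argument that drops missing values before computing. A short sketch using the same `heights` vector; note that the result then describes only the people whose heights were recorded, which is one way missing data can quietly change an analysis.

```r
heights <- c(Sophie = 165, Dan = 177, Fred = NA)

sum(heights)                  # the NA propagates
#> [1] NA
sum(heights, na.rm = TRUE)    # drop NAs first, then sum (165 + 177)
#> [1] 342
mean(heights, na.rm = TRUE)   # mean of the two recorded heights only
#> [1] 171
```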

Missing data gotchas

NaN: Not a Number.

any_na(NaN)
#> [1] TRUE
any_na(NULL)
#> [1] FALSE
any_na(Inf)
#> [1] FALSE
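
A base R sketch separating these cases: `is.na()` is TRUE for both NA and NaN, `is.nan()` only for NaN, and `Inf` is a valid (if infinite) number rather than a missing one. `NULL` is not a missing value at all but an empty object of length zero, which is why `any_na(NULL)` finds nothing. The vector `vals` is made up for illustration.

```r
vals <- c(NA, NaN, Inf, 1)

is.na(vals)       # NA and NaN both count as missing
#> [1]  TRUE  TRUE FALSE FALSE
is.nan(vals)      # only NaN is "not a number"
#> [1] FALSE  TRUE FALSE FALSE
is.finite(vals)   # Inf is a number, but not a finite one
#> [1] FALSE FALSE FALSE  TRUE

# NULL is an empty object, not a missing value
length(NULL)
#> [1] 0
```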

Missing data gotchas (2)

NA | TRUE
#> [1] TRUE
NA | FALSE
#> [1] NA
NA + NaN
#> [1] NA
NaN + NA
#> [1] NaN
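
The `|` results follow R's three-valued logic: if the known operand already decides the answer, the NA does not matter. A sketch with `&` shows the mirror case. R's documentation also notes that whether a computation mixing NA and NaN returns NA or NaN is not guaranteed and may depend on the platform, so testing with `is.na()`, which is TRUE for both, is the safer check.

```r
# With &, FALSE decides the answer whatever the unknown value is
NA & FALSE
#> [1] FALSE
# With TRUE, the answer depends on the unknown value, so it stays NA
NA & TRUE
#> [1] NA

# NA or NaN may come back depending on the platform,
# but is.na() is TRUE in either case
is.na(NA + NaN)
#> [1] TRUE
is.na(NaN + NA)
#> [1] TRUE
```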

Let's practice!
