Dealing With Missing Data in R
Nicholas Tierney
Statistician
The best thing to do with missing data is to not have any
--Gertrude Mary Cox
Working with real-world data = working with missing data
Missing data can have unexpected effects on your analysis
Bad imputation can lead to poor estimates and decisions.
Missing values are values that should have been recorded but were not.
NA = Not Available.
library(naniar)  # provides any_na(), are_na(), n_miss(), prop_miss()
x <- c(1, NA, 3, NA, NA, 5)
any_na(x)
TRUE
are_na(x)
FALSE TRUE FALSE TRUE TRUE FALSE
n_miss(x)
3
prop_miss(x)
0.5
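The helpers above are provided by the naniar package. For readers without it installed, a minimal base R sketch of the same four summaries might look like this (the variable names are illustrative, not from the slides):

```r
x <- c(1, NA, 3, NA, NA, 5)

# Is there at least one missing value?
has_missing <- anyNA(x)

# Which elements are missing? (one logical per element)
missing_flags <- is.na(x)

# How many values are missing?
missing_count <- sum(is.na(x))

# What proportion of values is missing?
missing_prop <- mean(is.na(x))
```

Summing or averaging the logical vector from is.na() works because TRUE counts as 1 and FALSE as 0.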
NA + anything = NA
heights
Sophie Dan Fred
165 177 NA
sum(heights)
NA
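Because a single NA propagates through sum(), one common workaround is to drop missing values explicitly with na.rm = TRUE. The sketch below rebuilds the heights vector from the output shown above; whether dropping NAs is appropriate depends on the analysis.

```r
# Named vector matching the printed output above
heights <- c(Sophie = 165, Dan = 177, Fred = NA)

# NA propagates: the total is unknown if any height is unknown
total_with_na <- sum(heights)

# Ignore missing values and sum only the observed heights
total_observed <- sum(heights, na.rm = TRUE)

# The same argument is available for mean()
mean_observed <- mean(heights, na.rm = TRUE)
```

Here total_observed is 342 and mean_observed is 171, both computed from the two recorded heights only.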
NaN: Not a Number.
any_na(NaN)
TRUE
any_na(NULL)
FALSE
any_na(Inf)
FALSE
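The three special values above behave differently under base R's own predicates. As a small sketch (the vector values is made up for illustration):

```r
values <- c(1, NA, NaN, Inf)

# is.na() treats both NA and NaN as missing, but not Inf
na_flags <- is.na(values)

# is.nan() is TRUE only for NaN, so it separates NaN from NA
nan_flags <- is.nan(values)

# is.finite() is FALSE for NA, NaN and Inf alike
finite_flags <- is.finite(values)

# NULL is an empty object, so there is nothing in it that could be missing
null_length <- length(NULL)
```

Comparing na_flags with nan_flags is one way to tell an NA apart from a NaN when both may be present.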
NA | TRUE
TRUE
NA | FALSE
NA
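The two results above follow one rule: a logical operation returns a definite answer only when that answer does not depend on the missing value. A short sketch showing the same rule for & as well as |:

```r
# OR: TRUE wins regardless of the unknown value
or_true <- NA | TRUE

# OR: the result depends on the unknown value, so it stays NA
or_false <- NA | FALSE

# AND: FALSE wins regardless of the unknown value
and_false <- NA & FALSE

# AND: the result depends on the unknown value, so it stays NA
and_true <- NA & TRUE
```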
NA + NaN
NA
NaN + NA
NaN
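The order-dependent results above should not be relied on: R's documentation notes that a computation mixing NA and NaN may return either one, depending on the platform. A more portable check, sketched here, is to test the result with is.na(), which is TRUE for both NA and NaN:

```r
a <- NA + NaN
b <- NaN + NA

# Whichever of NA or NaN each result is, is.na() reports it as missing
a_missing <- is.na(a)
b_missing <- is.na(b)
```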