Dealing With Missing Data in R
Nicholas Tierney
Statistician
The best thing to do with missing data is to not have any
--Gertrude Mary Cox
Working with real-world data = working with missing data
Missing data can have unexpected effects on your analysis
Bad imputation can lead to poor estimates and decisions.
Missing values are values that should have been recorded but were not.
NA = Not Available.
library(naniar)  # provides any_na(), are_na(), n_miss(), prop_miss()
x <- c(1, NA, 3, NA, NA, 5)
any_na(x)
TRUE
are_na(x)
FALSE  TRUE FALSE  TRUE  TRUE FALSE
n_miss(x)
3
prop_miss(x)
0.5
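The same four summaries can also be sketched in base R, which may help when naniar is not installed. The variable names below (has_missing, missing_mask, n_missing, prop_missing) are illustrative and not part of the slides.

```r
# Base R equivalents of the naniar helpers above (no packages needed).
x <- c(1, NA, 3, NA, NA, 5)

has_missing  <- anyNA(x)        # is any value missing?
missing_mask <- is.na(x)        # logical vector, TRUE where x is NA
n_missing    <- sum(is.na(x))   # TRUE counts as 1, so this counts the NAs
prop_missing <- mean(is.na(x))  # share of values that are NA

print(has_missing)
print(missing_mask)
print(n_missing)
print(prop_missing)
```

Summing or averaging the logical mask works because R coerces TRUE to 1 and FALSE to 0.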
NA + anything = NA
heights
Sophie    Dan   Fred 
   165    177     NA
sum(heights)
NA
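One common way around this, sketched here with base R's na.rm argument, is to drop the missing values before summarising. The vector is rebuilt from the output shown above; total_known and mean_known are illustrative names.

```r
# Rebuild the named vector shown on the slide.
heights <- c(Sophie = 165, Dan = 177, Fred = NA)

# na.rm = TRUE removes NA values before the calculation.
total_known <- sum(heights, na.rm = TRUE)   # 165 + 177 = 342
mean_known  <- mean(heights, na.rm = TRUE)  # 342 / 2 = 171

print(total_known)
print(mean_known)
```

Note that the result now describes only the two people with recorded heights, so it answers a slightly different question than the original sum.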
NaN: Not a Number.
any_na(NaN)
TRUE
any_na(NULL)
FALSE
any_na(Inf)
FALSE
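The base R predicates below give a rough picture of why these three special values behave differently. This is a minimal sketch; the variable names are illustrative.

```r
# NaN counts as missing, but NA is not NaN.
nan_is_na  <- is.na(NaN)    # TRUE
na_is_nan  <- is.nan(NA)    # FALSE
nan_is_nan <- is.nan(NaN)   # TRUE

# NULL is an empty object, so it has no values that could be missing.
null_length <- length(NULL) # 0
null_any_na <- anyNA(NULL)  # FALSE

# Inf is a real, non-missing value; it is just not finite.
inf_is_na     <- is.na(Inf)     # FALSE
inf_is_finite <- is.finite(Inf) # FALSE

print(c(nan_is_na, na_is_nan, nan_is_nan, null_any_na, inf_is_na, inf_is_finite))
print(null_length)
```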
NA | TRUE
TRUE
NA | FALSE
NA
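The rule behind these results: a logical operation returns a definite answer when the known operand already decides it, and NA otherwise. A short sketch extending the two cases above to `&` (variable names are illustrative):

```r
# OR: a TRUE operand decides the result; FALSE leaves it unknown.
or_true  <- NA | TRUE    # TRUE
or_false <- NA | FALSE   # NA

# AND: a FALSE operand decides the result; TRUE leaves it unknown.
and_false <- NA & FALSE  # FALSE
and_true  <- NA & TRUE   # NA

print(c(or_true, or_false, and_false, and_true))
```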
NA + NaN
NA
NaN + NA
NaN
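R's documentation notes that whether such a calculation returns NA or NaN is not guaranteed and may depend on the platform, so the printed results above may differ on other machines. A more portable check, sketched below with illustrative names, is to test with is.na(), which is TRUE for both NA and NaN.

```r
a <- NA + NaN
b <- NaN + NA

# is.na() is TRUE for NA and for NaN, so this holds either way.
a_missing <- is.na(a)
b_missing <- is.na(b)

print(a_missing)
print(b_missing)
```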