Dealing With Missing Data in R
Nicholas Tierney
Statistician
Census data containing:
income | education |
---|---|
48.69087 | NA |
40.93218 | NA |
52.69245 | high_school |
31.33808 | NA |
89.35671 | university |
103.87278 | university |
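Two main features of the shadow columns: coordinated names (each variable gets a matching _NA column) and clear values (NA for missing, !NA for present)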
income | education | income_NA | education_NA |
---|---|---|---|
48.69087 | NA | !NA | NA |
40.93218 | NA | !NA | NA |
52.69245 | high_school | !NA | !NA |
31.33808 | NA | !NA | NA |
89.35671 | university | !NA | !NA |
103.87278 | university | !NA | !NA |
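The table above can be reproduced with a few lines of base R. This is only a minimal sketch of what the shadow columns encode, not how naniar builds them internally; the helper name shadow_col and the data frame names census and census_shadow are hypothetical, chosen for illustration.

```r
# Census example from the slide, typed in by hand
census <- data.frame(
  income = c(48.69087, 40.93218, 52.69245, 31.33808, 89.35671, 103.87278),
  education = c(NA, NA, "high_school", NA, "university", "university")
)

# Hypothetical helper: label each value "NA" (missing) or "!NA" (present)
shadow_col <- function(x) {
  factor(ifelse(is.na(x), "NA", "!NA"), levels = c("!NA", "NA"))
}

# Build one shadow column per variable and give it the coordinated _NA name
shadow <- as.data.frame(lapply(census, shadow_col))
names(shadow) <- paste0(names(census), "_NA")

# Bind the shadow columns to the right of the original data
census_shadow <- cbind(census, shadow)
print(census_shadow)
```

Storing the labels as factors with fixed levels keeps both categories available for grouping even when a column has no missing values, as with income here.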
library(naniar)
bind_shadow(airquality)
# A tibble: 153 x 12
Ozone Solar.R Wind Temp Month Day Ozone_NA Solar.R_NA Wind_NA Temp_NA
<int> <int> <dbl> <int> <int> <int> <fct> <fct> <fct> <fct>
1 41 190 7.4 67 5 1 !NA !NA !NA !NA
2 36 118 8 72 5 2 !NA !NA !NA !NA
3 12 149 12.6 74 5 3 !NA !NA !NA !NA
4 18 313 11.5 62 5 4 !NA !NA !NA !NA
5 NA NA 14.3 56 5 5 NA NA !NA !NA
6 28 NA 14.9 66 5 6 !NA NA !NA !NA
7 23 299 8.6 65 5 7 !NA !NA !NA !NA
8 19 99 13.8 59 5 8 !NA !NA !NA !NA
9 8 19 20.1 61 5 9 !NA !NA !NA !NA
10 NA 194 8.6 69 5 10 NA !NA !NA !NA
# ... with 143 more rows, and 2 more variables: Month_NA <fct>, Day_NA <fct>
library(dplyr)
library(naniar)

# Mean wind speed, split by whether Ozone is missing
airquality %>%
  bind_shadow() %>%
  group_by(Ozone_NA) %>%
  summarize(mean = mean(Wind))
Ozone_NA | mean |
---|---|
!NA | 9.862069 |
NA | 10.256757 |
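As a cross-check, the same comparison can be sketched in base R without the shadow columns, grouping directly on is.na(Ozone). This is an illustrative alternative rather than the approach the slides recommend; the variable names ozone_missing and wind_by_missing are hypothetical.

```r
# airquality ships with base R (datasets package)
ozone_missing <- is.na(airquality$Ozone)

# Mean Wind for rows where Ozone is present (FALSE) and missing (TRUE)
wind_by_missing <- tapply(airquality$Wind, ozone_missing, mean)
print(wind_by_missing)
```

The FALSE and TRUE groups correspond to the !NA and NA rows in the summary above. The shadow-column version reads more clearly in a pipeline because the grouping variable carries the name of the variable it describes.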