Dealing With Missing Data in R
Nicholas Tierney
Statistician
Missing values are not always coded as NA, and assuming that they are is a mistake.

score | grade | place |
---|---|---|
3 | N/A | -99 |
-99 | E | 97 |
4 | missing | 95 |
-99 | na | 92 |
7 | n/a | -98 |
10 | " " | missing |
12 | . | 88 |
16 | "" | . |
9 | N/a | 86 |
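
To follow along, the table above can be rebuilt as a tibble. This is a minimal sketch, assuming the values and column types shown in the printed output later in this section (`score` as double, `grade` and `place` as character); the original `chaos` object may differ in details that are not visible here.

```r
library(tibble)

# Assumed reconstruction of `chaos`, inferred from the printed tibbles in this
# section; the original data set may contain details not visible here.
chaos <- tribble(
  ~score, ~grade,    ~place,
       3, "N/A",     "-99",
     -99, "E",       "97",
       4, "missing", "95",
     -99, "na",      "92",
       7, "n/a",     "-98",
      10, " ",       "missing",
      12, ".",       "88",
      16, "",        ".",
       9, "N/a",     "86"
)

chaos
```

Storing `place` as character is what allows labels such as "missing" and "." to sit next to numbers in the same column.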
miss_scan_count()
chaos %>% miss_scan_count(search = list("N/A"))
# A tibble: 3 x 2
Variable n
<chr> <int>
1 score 0
2 grade 1
3 place 0
chaos %>%
miss_scan_count(search = list("N/A",
"N/a"))
# A tibble: 3 x 2
Variable n
<chr> <int>
1 score 0
2 grade 2
3 place 0
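As a cross-check on counts like these, the same question can be asked in base R. The sketch below is not how `miss_scan_count()` is implemented; it compares values exactly, so its results may differ from a pattern-based search. The data frame is an assumed reconstruction of `chaos`, and the names `suspect_labels` and `exact_counts` are hypothetical.

```r
# Assumed reconstruction of `chaos` from the output printed in this section.
chaos <- data.frame(
  score = c(3, -99, 4, -99, 7, 10, 12, 16, 9),
  grade = c("N/A", "E", "missing", "na", "n/a", " ", ".", "", "N/a"),
  place = c("-99", "97", "95", "92", "-98", "missing", "88", ".", "86"),
  stringsAsFactors = FALSE
)

# Hypothetical list of labels suspected of standing in for NA.
suspect_labels <- c("N/A", "N/a", "n/a", "na", "missing", ".", "-99")

# For each column, count the values that exactly equal one of the labels.
# Columns are compared as character so that the number -99 matches "-99".
exact_counts <- vapply(
  chaos,
  function(column) sum(as.character(column) %in% suspect_labels),
  integer(1)
)

exact_counts
```

Exact matching keeps a label such as "." from matching every value, at the cost of missing variants that were not listed.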
chaos %>%
replace_with_na(replace = list(grade = c("N/A", "N/a")))
# A tibble: 9 x 3
score grade place
<dbl> <chr> <chr>
1 3 NA -99
2 -99 E 97
3 4 missing 95
4 -99 na 92
5 7 n/a -98
6 10 " " missing
7 12 . 88
8 16 "" .
9 9 NA 86
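The call above cleans a single column. Covering every column with `replace_with_na()` means listing each column and its labels by hand, as in this minimal sketch. The data are an assumed reconstruction of `chaos`, and the choice of which labels count as missing is an assumption made for illustration.

```r
library(tibble)
library(naniar)

# Assumed reconstruction of `chaos` from the output printed in this section.
chaos <- tibble(
  score = c(3, -99, 4, -99, 7, 10, 12, 16, 9),
  grade = c("N/A", "E", "missing", "na", "n/a", " ", ".", "", "N/a"),
  place = c("-99", "97", "95", "92", "-98", "missing", "88", ".", "86")
)

# One list entry per column; each entry names the values to turn into NA.
# The labels chosen here are an assumption, not a rule from the slides.
chaos_clean <- replace_with_na(
  chaos,
  replace = list(
    score = -99,
    grade = c("N/A", "N/a", "n/a", "na", "missing", "."),
    place = c("-99", "missing", ".")
  )
)

chaos_clean
```

Labels shared between columns ("missing", ".") have to be repeated in each entry.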
replace_with_na() can be repetitive:
replace_with_na_all()
All variables.
replace_with_na_at()
A subset of selected variables.
replace_with_na_if()
A subset of variables that fulfill some condition (numeric, character).
chaos %>%
replace_with_na_all(condition = ~.x == -99)
# A tibble: 9 x 3
score grade place
<dbl> <chr> <chr>
1 3 N/A NA
2 NA E 97
3 4 missing 95
4 NA na 92
5 7 n/a -98
6 10 " " missing
7 12 . 88
8 16 "" .
9 9 N/a 86
chaos %>%
replace_with_na_all(condition = ~.x %in% c("N/A", "missing", "na"))
# A tibble: 9 x 3
score grade place
<dbl> <chr> <chr>
1 3 NA -99
2 -99 E 97
3 4 NA 95
4 -99 NA 92
5 7 n/a -98
6 10 " " NA
7 12 . 88
8 16 "" .
9 9 N/a 86
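The other two variants, `replace_with_na_at()` and `replace_with_na_if()`, take the same kind of `condition` but limit where it is applied. The following is a minimal sketch on an assumed reconstruction of `chaos`; the result names `score_only` and `character_only` are hypothetical.

```r
library(tibble)
library(naniar)

# Assumed reconstruction of `chaos` from the output printed in this section.
chaos <- tibble(
  score = c(3, -99, 4, -99, 7, 10, 12, 16, 9),
  grade = c("N/A", "E", "missing", "na", "n/a", " ", ".", "", "N/a"),
  place = c("-99", "97", "95", "92", "-98", "missing", "88", ".", "86")
)

# _at: apply the condition only to the named variable(s).
# Here -99 becomes NA in `score`, while "-99" in `place` is left alone.
score_only <- replace_with_na_at(
  chaos,
  .vars = "score",
  condition = ~ .x == -99
)

# _if: apply the condition only to variables that satisfy a predicate.
# Here the text labels are replaced in character columns only.
character_only <- replace_with_na_if(
  chaos,
  .predicate = is.character,
  condition = ~ .x %in% c("N/A", "N/a", "n/a", "na", "missing")
)

score_only
character_only
```

Restricting the scope is useful when the same value is a missing-data label in one column and a legitimate value in another.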