Searching for and replacing missing values

Dealing With Missing Data in R

Nicholas Tierney

Statistician

What we are going to cover

  • How to look for hidden missing values
  • Replacing missing value labels with NA
  • Checking your assumptions on missingness

Searching for and replacing missing values

  • Ideally, every missing value is coded as NA
  • In practice, missing values can be coded incorrectly: e.g., "missing", "Not Available", "N/A"
  • Assuming that all missing values are coded as NA is a mistake
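
As a small base R illustration of why this assumption fails, the sketch below uses a made-up `grade` vector (hypothetical, not part of the course data) that holds the labels listed above plus one true NA.

```r
# Hypothetical example vector; the labels mirror those listed above.
grade <- c("A", "missing", "Not Available", "N/A", NA)

# is.na() only recognises a true NA, so the three text labels are missed.
sum(is.na(grade))

# Comparing against a set of known labels finds the hidden ones.
hidden <- grade %in% c("missing", "Not Available", "N/A")
sum(hidden)
```

Here only one value counts as missing according to `is.na()`, while three more are hiding behind text labels.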

Understanding Chaos

score  grade    place
    3  N/A      -99
  -99  E        97
    4  missing  95
  -99  na       92
    7  n/a      -98
   10  " "      missing
   12  .        88
   16  ""       .
    9  N/a      86
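
To follow along, the table can be rebuilt by hand. This is a sketch only: it assumes the tibble package is installed, and that the two blank `grade` cells are a single space and an empty string, as the printed output later in these slides suggests.

```r
# Rebuild the chaos data by hand (assumed reconstruction, not the course file).
chaos <- tibble::tibble(
  score = c(3, -99, 4, -99, 7, 10, 12, 16, 9),
  grade = c("N/A", "E", "missing", "na", "n/a", " ", ".", "", "N/a"),
  place = c("-99", "97", "95", "92", "-98", "missing", "88", ".", "86")
)

# place is stored as character because it mixes numbers with text labels.
chaos
```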


Searching for missing values

miss_scan_count()

chaos %>% miss_scan_count(search = list("N/A"))
# A tibble: 3 x 2
  Variable     n
  <chr>    <int>
1 score        0
2 grade        1
3 place        0

Searching for missing values

chaos %>%
  miss_scan_count(search = list("N/A",
                                "N/a"))
# A tibble: 3 x 2
  Variable     n
  <chr>    <int>
1 score        0
2 grade        2
3 place        0
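
To make clear what such a count amounts to, here is a base R sketch that tallies the same two labels per column. It uses exact matching with `%in%`, which may differ from how miss_scan_count() treats its search terms, and it rebuilds chaos as a plain data frame (an assumed reconstruction).

```r
# Assumed reconstruction of the chaos data as a base R data frame.
chaos <- data.frame(
  score = c(3, -99, 4, -99, 7, 10, 12, 16, 9),
  grade = c("N/A", "E", "missing", "na", "n/a", " ", ".", "", "N/a"),
  place = c("-99", "97", "95", "92", "-98", "missing", "88", ".", "86"),
  stringsAsFactors = FALSE
)

# Labels to look for (exact matches only).
labels <- c("N/A", "N/a")

# Count, for each column, how many cells equal one of the labels.
counts <- vapply(chaos, function(col) sum(col %in% labels), integer(1))
counts
```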

Replacing missing values

chaos %>%
  replace_with_na(replace = list(grade = c("N/A", "N/a")))
# A tibble: 9 x 3
  score grade   place  
  <dbl> <chr>   <chr>  
1     3 NA      -99    
2   -99 E       97     
3     4 missing 95     
4   -99 na      92     
5     7 n/a     -98    
6    10 " "     missing
7    12 .       88     
8    16 ""      .      
9     9 NA      86
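
The `replace` list can name more than one variable. The sketch below, which assumes naniar and tibble are installed and uses a hand-built copy of chaos, replaces a different set of labels in each column and then counts the resulting NA values.

```r
library(naniar)

# Assumed reconstruction of the chaos data.
chaos <- tibble::tibble(
  score = c(3, -99, 4, -99, 7, 10, 12, 16, 9),
  grade = c("N/A", "E", "missing", "na", "n/a", " ", ".", "", "N/a"),
  place = c("-99", "97", "95", "92", "-98", "missing", "88", ".", "86")
)

# One entry per variable, each listing the values to turn into NA.
cleaned <- replace_with_na(
  chaos,
  replace = list(
    score = -99,
    grade = c("N/A", "N/a"),
    place = "-99"
  )
)

# Number of NA values per column after the replacement.
colSums(is.na(cleaned))
```

Spelling out every variable like this is what becomes repetitive on wider data.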

"scoped variants" of replace_with_na

  • replace_with_na() can be repetitive when you:

    • Use it across many different variables and values
    • Handle complex cases, such as replacing values less than -1 or only affecting character columns
  • replace_with_na_all(): all variables.
  • replace_with_na_at(): a subset of selected variables.
  • replace_with_na_if(): a subset of variables that fulfill some condition (e.g., numeric or character).
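
The slides that follow demonstrate replace_with_na_all(). As a sketch of the other two variants, assuming naniar and tibble are installed and using a hand-built copy of chaos, the `_at` form takes the variables to touch and the `_if` form takes a predicate that selects them.

```r
library(naniar)

# Assumed reconstruction of the chaos data.
chaos <- tibble::tibble(
  score = c(3, -99, 4, -99, 7, 10, 12, 16, 9),
  grade = c("N/A", "E", "missing", "na", "n/a", " ", ".", "", "N/a"),
  place = c("-99", "97", "95", "92", "-98", "missing", "88", ".", "86")
)

# _at: apply the condition only to the named variables.
at_result <- replace_with_na_at(
  chaos,
  .vars = c("score", "place"),
  condition = ~ .x == -99
)

# _if: apply the condition only to columns passing the predicate
# (here, character columns).
if_result <- replace_with_na_if(
  chaos,
  .predicate = is.character,
  condition = ~ .x %in% c("N/A", "N/a", "n/a", "na", "missing")
)

colSums(is.na(at_result))
colSums(is.na(if_result))
```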

Using scoped variants of replace_with_na

chaos %>%
  replace_with_na_all(condition = ~.x == -99)
# A tibble: 9 x 3
  score grade   place  
  <dbl> <chr>   <chr>  
1     3 N/A     NA     
2    NA E       97     
3     4 missing 95     
4    NA na      92     
5     7 n/a     -98    
6    10 " "     missing
7    12 .       88     
8    16 ""      .      
9     9 N/a     86
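
Note that row 1 of place became NA even though place is a character column. A short base R sketch of why: when a string is compared with a number, R converts the number to a string first, so the text "-99" matches the number -99. The `place` vector here is a hypothetical three-value excerpt.

```r
# The number -99 is converted to the string "-99" before comparing.
"-99" == -99        # TRUE

# Hypothetical excerpt of the place column.
place <- c("-99", "97", "-98")
place == -99        # TRUE FALSE FALSE

# The match is textual, so a differently formatted number does not match.
"-99.0" == -99      # FALSE
```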

Using scoped variants of replace_with_na

chaos %>% 
  replace_with_na_all(condition = ~.x %in% c("N/A", "missing", "na"))
# A tibble: 9 x 3
  score grade place
  <dbl> <chr> <chr>
1     3 NA    -99  
2   -99 E     97   
3     4 NA    95   
4   -99 NA    92   
5     7 n/a   -98  
6    10 " "   NA   
7    12 .     88   
8    16 ""    .    
9     9 N/a   86

Let's practice!
