Completing data with all value combinations

Reshaping Data with tidyr

Jeroen Boeye

Head of Machine Learning, Faktion

Rolling Stones and Beatles

album_df
# A tibble: 3 x 3
   year artist         n_albums
  <int> <chr>             <int>
1  1977 Beatles               2
2  1977 Rolling Stones        1
3  1979 Beatles               1
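
To run the examples that follow, album_df can be rebuilt by hand. This is a minimal sketch that assumes the three rows printed above are the entire dataset; the slides do not show how the data was originally loaded.

```r
library(tibble)

# Rebuild the example data (values copied from the printed tibble above).
# The L suffix keeps year and n_albums as integers, matching the <int> columns.
album_df <- tibble(
  year = c(1977L, 1977L, 1979L),
  artist = c("Beatles", "Rolling Stones", "Beatles"),
  n_albums = c(2L, 1L, 1L)
)

album_df
```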

Initial and target situation

[Figure: the data before complete() and after complete()]

Initial and target situation

[Figure: the data before complete() and after complete() with fill]

The complete() function

album_df %>% 
  complete(year, artist)
# A tibble: 4 x 3
   year artist         n_albums
  <int> <chr>             <int>
1  1977 Beatles               2
2  1977 Rolling Stones        1
3  1979 Beatles               1
4  1979 Rolling Stones       NA

The complete() function: overwriting NA values

album_df %>% 
  complete(year, artist, fill = list(n_albums = 0L))
# A tibble: 4 x 3
   year artist         n_albums
  <int> <chr>             <int>
1  1977 Beatles               2
2  1977 Rolling Stones        1
3  1979 Beatles               1
4  1979 Rolling Stones        0
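
It can help to see the same result built step by step. The tidyr documentation describes complete() as a wrapper around expand(), a join, and replace_na(); the sketch below spells those steps out, assuming album_df holds the three rows shown at the start. The names all_combos and manual are illustrative only.

```r
library(dplyr)
library(tidyr)

# Example data, rebuilt from the tibble printed earlier
album_df <- tibble(
  year = c(1977L, 1977L, 1979L),
  artist = c("Beatles", "Rolling Stones", "Beatles"),
  n_albums = c(2L, 1L, 1L)
)

# Step 1: every combination of the year and artist values found in the data
all_combos <- album_df %>%
  expand(year, artist)

# Step 2: join the original counts back on; combinations that were
#         absent from album_df get NA in n_albums
# Step 3: overwrite those NA values with 0L
manual <- all_combos %>%
  left_join(album_df, by = c("year", "artist")) %>%
  replace_na(list(n_albums = 0L))

manual
```

In practice the single complete() call is shorter and easier to read; the long form is mainly useful for understanding where the NA values come from.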

The complete() function: adding unseen values

album_df %>% 
  complete(
    year,
    artist = c(
      "Beatles", 
      "Rolling Stones", 
      "ABBA"), 
    fill = list(n_albums = 0L)
  )
# A tibble: 6 x 3
   year artist         n_albums
  <int> <chr>             <int>
1  1977 ABBA                  0
2  1977 Beatles               2
3  1977 Rolling Stones        1
4  1979 ABBA                  0
5  1979 Beatles               1
6  1979 Rolling Stones        0

The complete() function: adding unseen values

album_df %>% 
  complete(
    year = 1977:1979,
    artist, 
    fill = list(n_albums = 0L)
  )
# A tibble: 6 x 3
   year artist         n_albums
  <int> <chr>             <int>
1  1977 Beatles               2
2  1977 Rolling Stones        1
3  1978 Beatles               0
4  1978 Rolling Stones        0
5  1979 Beatles               1
6  1979 Rolling Stones        0

Generating a sequence with full_seq()

full_seq(c(1977, 1979), period = 1)
1977 1978 1979
full_seq(c(1977, 1979, 1980, 1980, 1980), period = 1)
1977 1978 1979 1980
full_seq(album_df$year, period = 1)
1977 1978 1979
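
The period argument sets the step between consecutive values, so it does not have to be 1. A small sketch with made-up years (not taken from album_df) and a two-year step; the name biennial is illustrative only:

```r
library(tidyr)

# Hypothetical input: only the first and last year of a two-yearly series
biennial <- full_seq(c(1977, 1981), period = 2)

# The gap is filled in steps of 2
biennial
```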

Using full_seq() inside complete()

album_df %>% 
  complete(
    year = full_seq(year, period = 1),
    artist, 
    fill = list(n_albums = 0L)
  )
# A tibble: 6 x 3
   year artist         n_albums
  <dbl> <chr>             <int>
1  1977 Beatles               2
2  1977 Rolling Stones        1
3  1978 Beatles               0
4  1978 Rolling Stones        0
5  1979 Beatles               1
6  1979 Rolling Stones        0

Generating a date sequence with full_seq()

full_seq(c(as.Date("2000-01-01"), as.Date("2000-01-10")), period = 1)
 [1] "2000-01-01" "2000-01-02" "2000-01-03" "2000-01-04" "2000-01-05"
 [6] "2000-01-06" "2000-01-07" "2000-01-08" "2000-01-09" "2000-01-10"
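
The date version of full_seq() can be used inside complete() in the same way as the year example. The sketch below uses a hypothetical visits_df with one day missing; the data frame, its columns, and its values are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical daily counts: 2000-01-02 has no row
visits_df <- tibble(
  date = as.Date(c("2000-01-01", "2000-01-03", "2000-01-04")),
  n_visits = c(5L, 2L, 7L)
)

# period = 1 means one day for Date values;
# the missing day is added and its count is set to 0
visits_complete <- visits_df %>%
  complete(
    date = full_seq(date, period = 1),
    fill = list(n_visits = 0L)
  )

visits_complete
```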

Let's practice!
