Joining tables to themselves

Joining Data with dplyr

Chris Cardillo

Data Scientist

The themes table

themes
# A tibble: 665 x 3
      id name           parent_id
   <dbl> <chr>              <dbl>
 1     1 Technic               NA
 2     2 Arctic Technic         1
 3     3 Competition            1
 4     4 Expert Builder         1
 5     5 Model                  1
 6     6 Airport                5
 7     7 Construction           5
 8     8 Farm                   5
 9     9 Fire                   5
10    10 Harbor                 5
# … with 655 more rows
Joining Data with dplyr

The hierarchy of themes

hierarchy-of-themes

Joining Data with dplyr

Child-parent table

themes %>%
  inner_join(themes, by = c("parent_id" = "id"))
# A tibble: 544 x 5
      id name.x         parent_id name.y  parent_id.y
   <dbl> <chr>              <dbl> <chr>         <dbl>
 1     2 Arctic Technic         1 Technic          NA
 2     3 Competition            1 Technic          NA
 3     4 Expert Builder         1 Technic          NA
 4     5 Model                  1 Technic          NA
 5     6 Airport                5 Model             1
 6     7 Construction           5 Model             1
 7     8 Farm                   5 Model             1
 8     9 Fire                   5 Model             1
 9    10 Harbor                 5 Model             1
10    11 Off-Road               5 Model             1
# … with 534 more rows
Joining Data with dplyr

Adding a suffix

themes %>%
  inner_join(themes, by = c("parent_id" = "id"), suffix = c("_child", "_parent"))
# A tibble: 544 x 5
      id name_child     parent_id name_parent parent_id_parent
   <dbl> <chr>              <dbl> <chr>                  <dbl>
 1     2 Arctic Technic         1 Technic                   NA
 2     3 Competition            1 Technic                   NA
 3     4 Expert Builder         1 Technic                   NA
 4     5 Model                  1 Technic                   NA
 5     6 Airport                5 Model                      1
 6     7 Construction           5 Model                      1
 7     8 Farm                   5 Model                      1
 8     9 Fire                   5 Model                      1
 9    10 Harbor                 5 Model                      1
10    11 Off-Road               5 Model                      1
# … with 534 more rows
Joining Data with dplyr

Lord of the Rings themes: parent

themes %>%
  inner_join(themes, by = c("parent_id" = "id"), suffix = c("_child", "_parent")) %>%
  filter(name_child == "The Lord of the Rings")
# A tibble: 1 x 5
     id name_child            parent_id name_parent                      parent_id_parent
  <dbl> <chr>                     <dbl> <chr>                                       <dbl>
1   566 The Lord of the Rings       561 The Hobbit and Lord of the Rings               NA
Joining Data with dplyr

Lord of the Rings themes: children

themes %>%
  inner_join(themes, by = c("parent_id" = "id"), suffix = c("_child", "_parent")) %>%
  filter(name_parent == "The Lord of the Rings")
# A tibble: 3 x 5
     id name_child                 parent_id name_parent           parent_id_parent
  <dbl> <chr>                          <dbl> <chr>                            <dbl>
1   567 The Fellowship of the Ring       566 The Lord of the Rings              561
2   568 The Two Towers                   566 The Lord of the Rings              561
3   569 The Return of the King           566 The Lord of the Rings              561
Joining Data with dplyr

The Lord of the Rings trilogy

Lord-of-the-Rings-themes

Joining Data with dplyr

Let's practice!

Joining Data with dplyr

Preparing Video For Download...