Intro to non-rectangular data

Reshaping Data with tidyr

Jeroen Boeye

Head of Machine Learning, Faktion

Rectangular data

Spreadsheets

CSV

name,gender,date
Dezik,Male,1951-07-22
Dezik,Male,1951-07-29
Tsygan,Male,1951-07-22
Lisa,Female,1951-07-29
Chizhik,Male,1951-08-15
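
To see why this format counts as rectangular, here is a minimal sketch that reads the rows above into a data frame with base R. The object name `space_dogs` is a hypothetical label chosen for illustration; the slides do not name this data set.

```r
# Inline copy of the CSV rows shown above, so no file on disk is needed.
csv_text <- "name,gender,date
Dezik,Male,1951-07-22
Dezik,Male,1951-07-29
Tsygan,Male,1951-07-22
Lisa,Female,1951-07-29
Chizhik,Male,1951-08-15"

# read.csv() parses the text into a data frame:
# every record becomes a row, every field a column.
space_dogs <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert the ISO-formatted date strings into Date values.
space_dogs$date <- as.Date(space_dogs$date)

dim(space_dogs)  # number of rows and columns
str(space_dogs)  # one type per column
```

Every row has the same three fields, which is what lets the data fit a table without any reshaping.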

Non-rectangular formats

JSON

{
    "name": "Darth Vader",
    "species": "Human",
    "homeworld": "Tatooine",
    "films": [
        "Revenge of the Sith",
        "Return of the Jedi",
        "The Empire Strikes Back",
        "A New Hope"
    ]
}
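
A small sketch of why this record is not rectangular: parsing it from a string with `rjson::fromJSON()` (the package used later in these slides) gives a list whose elements have different lengths. The names `json_text` and `vader` are hypothetical.

```r
library(rjson)

# The JSON object from the slide, stored as a string.
json_text <- '{
    "name": "Darth Vader",
    "species": "Human",
    "homeworld": "Tatooine",
    "films": [
        "Revenge of the Sith",
        "Return of the Jedi",
        "The Empire Strikes Back",
        "A New Hope"
    ]
}'

# json_str parses a string; the slides use the file argument instead.
vader <- fromJSON(json_str = json_text)

# Number of values stored under each key.
lengths(vader)
```

Three keys hold a single value while `films` holds four, so the record does not map onto one table row without a decision about how to spread or stack the films.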

XML

<note>
  <from>Teacher</from>
  <to>Student</to>
  <heading>Almost there</heading>
  <body>It's the final chapter!</body>
</note>
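
For comparison, the XML note can be read in a similar way. This sketch assumes the xml2 package, which is not used in the slides; `note` and `note_fields` are hypothetical names.

```r
library(xml2)  # assumption: xml2 is one possible XML parser, not the course's choice

# The XML document from the slide, stored as a string.
note_xml <- "<note>
  <from>Teacher</from>
  <to>Student</to>
  <heading>Almost there</heading>
  <body>It's the final chapter!</body>
</note>"

# read_xml() accepts literal XML as well as a path or URL.
note <- read_xml(note_xml)

# Collect the child elements as a named character vector:
# tag names become the names, element text becomes the values.
fields <- xml_children(note)
note_fields <- setNames(xml_text(fields), xml_name(fields))

note_fields
```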

Note: Star Wars data from the repurrrsive package.

A list of lists of lists

rjson::fromJSON(file = "star_wars.json")
[[1]]
[[1]]$name
[1] "Darth Vader"
[[1]]$films
[1] "Revenge of the Sith" "Return of the Jedi" "The Empire Strikes Back" "A New Hope"             

[[2]]
[[2]]$name
[1] "Jar Jar Binks"
[[2]]$films
[1] "Attack of the Clones" "The Phantom Menace"
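
The printed structure above can be explored with plain list indexing. In this sketch the list is rebuilt by hand so that it runs without `star_wars.json`; it only mirrors the two characters and the two fields printed on the slide.

```r
# Hand-built stand-in for the parsed JSON file (assumption: same shape
# as the output printed above, limited to name and films).
star_wars_list <- list(
  list(
    name  = "Darth Vader",
    films = c("Revenge of the Sith", "Return of the Jedi",
              "The Empire Strikes Back", "A New Hope")
  ),
  list(
    name  = "Jar Jar Binks",
    films = c("Attack of the Clones", "The Phantom Menace")
  )
)

# [[ ]] picks a character, $ picks a field, [ ] picks a film.
star_wars_list[[2]]$name
star_wars_list[[1]]$films[2]

# Number of films per character: the lengths differ,
# which is why the data is not rectangular yet.
film_counts <- vapply(star_wars_list, function(x) length(x$films), integer(1))
film_counts
```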

A first step to rectangling

library(tidyr)  # provides unnest_wider(); also re-exports tibble() and %>%
star_wars_list <- rjson::fromJSON(file = "star_wars.json")
tibble(character = star_wars_list)
# A tibble: 2 x 1
  character       
  <list>          
1 <named list [2]>
2 <named list [2]>
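
Wrapping the list in a tibble does not change its contents: each cell of the list column still holds one character's named list. A minimal sketch, using a hand-built stand-in for the parsed file (an assumption, so the code runs without `star_wars.json`):

```r
library(tibble)

# Hand-built stand-in for the parsed JSON file.
star_wars_list <- list(
  list(name = "Darth Vader",
       films = c("Revenge of the Sith", "Return of the Jedi",
                 "The Empire Strikes Back", "A New Hope")),
  list(name = "Jar Jar Binks",
       films = c("Attack of the Clones", "The Phantom Menace"))
)

# One row per character, one list column holding the nested data.
characters <- tibble(character = star_wars_list)

# The list column can be indexed like the original list.
characters$character[[1]]$name
names(characters$character[[2]])
```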

Unnesting lists to columns

tibble(character = star_wars_list) %>% 
  unnest_wider(character)
# A tibble: 2 x 2
  name          films    
  <chr>         <list>   
1 Darth Vader   <chr [4]>
2 Jar Jar Binks <chr [2]>
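
A self-contained version of this step, as a sketch: the list is rebuilt by hand (an assumption, so no JSON file is needed), and `lengths()` confirms that `films` is still a list column with a different number of values per row.

```r
library(tidyr)  # unnest_wider(); tibble() and %>% are re-exported by tidyr

# Hand-built stand-in for the parsed JSON file.
star_wars_list <- list(
  list(name = "Darth Vader",
       films = c("Revenge of the Sith", "Return of the Jedi",
                 "The Empire Strikes Back", "A New Hope")),
  list(name = "Jar Jar Binks",
       films = c("Attack of the Clones", "The Phantom Menace"))
)

# Each named element of the inner lists becomes its own column.
characters_wide <- tibble(character = star_wars_list) %>%
  unnest_wider(character)

names(characters_wide)
lengths(characters_wide$films)  # films per character
```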

Unnesting lists to columns

tibble(character = star_wars_list) %>% 
  unnest_wider(character) %>% 
  unnest_wider(films)
# A tibble: 2 x 5
  name          ...1                 ...2               ...3                    ...4      
  <chr>         <chr>                <chr>              <chr>                   <chr>     
1 Darth Vader   Revenge of the Sith  Return of the Jedi The Empire Strikes Back A New Hope
2 Jar Jar Binks Attack of the Clones The Phantom Menace NA                      NA
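
The film vectors have no element names, which is why the columns above are labelled `...1` to `...4`. More recent tidyr releases may instead stop with an error asking for `names_sep` when the values being unnested are unnamed. A hedged sketch that supplies `names_sep`, again with a hand-built stand-in for the parsed file:

```r
library(tidyr)  # unnest_wider(); tibble() and %>% are re-exported by tidyr

# Hand-built stand-in for the parsed JSON file (assumption).
star_wars_list <- list(
  list(name = "Darth Vader",
       films = c("Revenge of the Sith", "Return of the Jedi",
                 "The Empire Strikes Back", "A New Hope")),
  list(name = "Jar Jar Binks",
       films = c("Attack of the Clones", "The Phantom Menace"))
)

# names_sep prefixes each new column with the outer column name,
# so the film columns get predictable names instead of repaired ones.
films_wide <- tibble(character = star_wars_list) %>%
  unnest_wider(character) %>%
  unnest_wider(films, names_sep = "_")

names(films_wide)
films_wide
```

Characters with fewer films than the maximum are padded with `NA`, as in the output above.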

Let's practice!
