Reshaping Data with tidyr
Jeroen Boeye
Head of Machine Learning, Faktion
Spreadsheets
CSV
name,gender,date
Dezik,Male,1951-07-22
Dezik,Male,1951-07-29
Tsygan,Male,1951-07-22
Lisa,Female,1951-07-29
Chizhik,Male,1951-08-15
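CSV data like this is already rectangular. Every line is a row and every comma-separated field is a column, so it maps straight onto a data frame. Here is a minimal sketch that parses the lines above from a string with base R's read.csv() and its text argument. The variable names csv_text and csv_data are illustrative only.

```r
# The CSV lines from the slide, inlined as text so no file is needed
csv_text <- "name,gender,date
Dezik,Male,1951-07-22
Dezik,Male,1951-07-29
Tsygan,Male,1951-07-22
Lisa,Female,1951-07-29
Chizhik,Male,1951-08-15"

# Each line becomes a row and each field a column;
# the header line supplies the column names
csv_data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The date column arrives as text, so convert it explicitly
csv_data$date <- as.Date(csv_data$date)

str(csv_data)
```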
JSON
{
  "name": "Darth Vader",
  "species": "Human",
  "homeworld": "Tatooine",
  "films": [
    "Revenge of the Sith",
    "Return of the Jedi",
    "The Empire Strikes Back",
    "A New Hope"
  ]
}
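JSON is different because it nests. The films field holds an array rather than a single value. The sketch below shows what a parser makes of the document above. It assumes the rjson package, which is used later in this chapter, is installed, and it passes the JSON in as a string through the json_str argument of fromJSON(). The names json_text and vader are illustrative.

```r
library(rjson)

# The Darth Vader document from the slide, inlined as a string
json_text <- '{
  "name": "Darth Vader",
  "species": "Human",
  "homeworld": "Tatooine",
  "films": ["Revenge of the Sith", "Return of the Jedi",
            "The Empire Strikes Back", "A New Hope"]
}'

# The JSON object becomes a named list; the array of strings
# is simplified to a character vector of length 4
vader <- fromJSON(json_str = json_text)
str(vader)
```

The result is a list, not a table. The films element alone holds four values, so there is no obvious single cell to put it in.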
XML
<note>
  <from>Teacher</from>
  <to>Student</to>
  <heading>Almost there</heading>
  <body>It's the final chapter!</body>
</note>
rjson::fromJSON(file = "star_wars.json")
[[1]]
[[1]]$name
[1] "Darth Vader"
[[1]]$films
[1] "Revenge of the Sith" "Return of the Jedi" "The Empire Strikes Back" "A New Hope"
[[2]]
[[2]]$name
[1] "Jar Jar Binks"
[[2]]$films
[1] "Attack of the Clones" "The Phantom Menace"
library(tidyr)
star_wars_list <- rjson::fromJSON(file = "star_wars.json")
tibble(character = star_wars_list)
# A tibble: 2 x 1
  character
  <list>
1 <named list [2]>
2 <named list [2]>
tibble(character = star_wars_list) %>%
  unnest_wider(character)
# A tibble: 2 x 2
  name          films
  <chr>         <list>
1 Darth Vader   <chr [4]>
2 Jar Jar Binks <chr [2]>
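The contents of star_wars.json are not shown, so the sketch below is self-contained. It rebuilds by hand the two records printed by rjson::fromJSON(). The hand-typed list is a hypothetical stand-in for the file. The sketch then repeats the first unnest_wider() step. Each named element of a record becomes its own column, and films remains a list column because each of its cells holds a whole character vector.

```r
library(tidyr)  # also re-exports tibble() and %>%

# Hypothetical stand-in for star_wars.json: the two records printed by fromJSON()
star_wars_list <- list(
  list(name = "Darth Vader",
       films = c("Revenge of the Sith", "Return of the Jedi",
                 "The Empire Strikes Back", "A New Hope")),
  list(name = "Jar Jar Binks",
       films = c("Attack of the Clones", "The Phantom Menace"))
)

# One row per record, then one column per named element of each record
characters <- tibble(character = star_wars_list) %>%
  unnest_wider(character)

characters
```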
tibble(character = star_wars_list) %>%
  unnest_wider(character) %>%
  unnest_wider(films)
# A tibble: 2 x 5
  name          ...1                 ...2               ...3                    ...4
  <chr>         <chr>                <chr>              <chr>                   <chr>
1 Darth Vader   Revenge of the Sith  Return of the Jedi The Empire Strikes Back A New Hope
2 Jar Jar Binks Attack of the Clones The Phantom Menace NA                      NA
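The ...1 to ...4 column names come from automatic name repair, because the film vectors have no names. In recent tidyr releases, unnest_wider() may instead stop with an error on unnamed elements and ask for a names_sep argument. The sketch below shows that variant. It again uses a hand-typed, hypothetical stand-in for star_wars.json that mirrors the printed fromJSON() output.

```r
library(tidyr)  # also re-exports tibble() and %>%

# Hypothetical stand-in for star_wars.json, mirroring the fromJSON() output
star_wars_list <- list(
  list(name = "Darth Vader",
       films = c("Revenge of the Sith", "Return of the Jedi",
                 "The Empire Strikes Back", "A New Hope")),
  list(name = "Jar Jar Binks",
       films = c("Attack of the Clones", "The Phantom Menace"))
)

# names_sep joins the outer name ("films") to generated positions 1, 2, ...
films_wide <- tibble(character = star_wars_list) %>%
  unnest_wider(character) %>%
  unnest_wider(films, names_sep = "_")

films_wide
```

As in the slide output, the shorter record is padded with NA. Jar Jar Binks has only two films, so his films_3 and films_4 cells are missing.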