Reshaping Data with tidyr
Jeroen Boeye
Head of Machine Learning, Faktion
Happy families are all alike, but every unhappy family is unhappy in its own way.
Leo Tolstoy
Tidy datasets are all alike, but every messy dataset is messy in its own way.
Hadley Wickham
Structure
character_df
# A tibble: 4 x 3
  name           homeworld species
  <chr>          <chr>     <chr>
1 Luke Skywalker Tatooine  Human
2 R2-D2          Naboo     Droid
3 Darth Vader    Tatooine  Human
4 Obi-Wan Kenobi Stewjon   Human
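The slides print character_df without showing how it was created. A minimal sketch for rebuilding it, assuming it is an ordinary tibble holding exactly the four rows printed above (the construction code is an assumption, not taken from the slides):

```r
library(tibble)
library(dplyr)  # supplies %>%, select(), filter(), mutate(), group_by(), summarize()

# Assumed reconstruction: values copied from the printed tibble above.
character_df <- tibble(
  name      = c("Luke Skywalker", "R2-D2", "Darth Vader", "Obi-Wan Kenobi"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Stewjon"),
  species   = c("Human", "Droid", "Human", "Human")
)

character_df
```

With this object in the session, the dplyr calls on the following slides can be run as written.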
character_df %>%
  select(name, homeworld)
# A tibble: 4 x 2
  name           homeworld
  <chr>          <chr>
1 Luke Skywalker Tatooine
2 R2-D2          Naboo
3 Darth Vader    Tatooine
4 Obi-Wan Kenobi Stewjon
character_df %>%
  filter(homeworld == "Tatooine")
# A tibble: 2 x 3
  name           homeworld species
  <chr>          <chr>     <chr>
1 Luke Skywalker Tatooine  Human
2 Darth Vader    Tatooine  Human
character_df %>%
  mutate(is_human = species == "Human")
# A tibble: 4 x 4
  name           homeworld species is_human
  <chr>          <chr>     <chr>   <lgl>
1 Luke Skywalker Tatooine  Human   TRUE
2 R2-D2          Naboo     Droid   FALSE
3 Darth Vader    Tatooine  Human   TRUE
4 Obi-Wan Kenobi Stewjon   Human   TRUE
character_df %>%
  group_by(homeworld) %>%
  summarize(n = n())
# A tibble: 3 x 2
  homeworld     n
  <chr>     <int>
1 Naboo         1
2 Stewjon       1
3 Tatooine      2
population_df
# A tibble: 4 x 2
  country               population
  <chr>                      <dbl>
1 Brazil, South America      210.
2 Nepal, Asia                 28.1
3 Senegal, Africa             15.8
4 Australia, Oceania          25.0
population_df %>%
  separate(country, into = c("country", "continent"), sep = ", ")
# A tibble: 4 x 3
  country   continent     population
  <chr>     <chr>              <dbl>
1 Brazil    South America      210.
2 Nepal     Asia                28.1
3 Senegal   Africa              15.8
4 Australia Oceania             25.0
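To try the split on your own machine, population_df has to exist first. The sketch below rebuilds it under the assumption that the populations are the rounded values as printed (the real data may carry more decimals, so the printed form can differ slightly). It then repeats the separate() call from the slide and, for comparison, the newer separate_wider_delim(), which is available from tidyr 1.3.0 onward. The names split_df and split_wide_df are illustrative only.

```r
library(tibble)
library(tidyr)

# Assumed reconstruction: values copied (rounded) from the printed tibble above.
population_df <- tibble(
  country    = c("Brazil, South America", "Nepal, Asia",
                 "Senegal, Africa", "Australia, Oceania"),
  population = c(210, 28.1, 15.8, 25.0)
)

# The call from the slide: split on ", " into two columns.
split_df <- separate(
  population_df,
  country,
  into = c("country", "continent"),
  sep  = ", "
)

# Newer equivalent (tidyr >= 1.3.0): delim is matched as a literal string.
split_wide_df <- separate_wider_delim(
  population_df,
  cols  = country,
  delim = ", ",
  names = c("country", "continent")
)

split_df
split_wide_df
```

Note that sep in separate() is interpreted as a regular expression, while delim in separate_wider_delim() is a fixed string; for a plain ", " both should split the column the same way.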