Machine Learning in the Tidyverse
Dmitriy (Dima) Gorenshteyn
Lead Data Scientist, Memorial Sloan Kettering Cancer Center
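library(tidyverse)
# Assumes a `gapminder` data frame is already loaded in the session.
# Collapse each country's rows into one tibble stored in a list column named `data`.
nested <- gapminder %>%
  group_by(country) %>%
  nest()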
> nested$data[[4]]
# A tibble: 52 x 6
   year infant_mortality life_expectancy fertility population gdpPercap
  <int>            <dbl>           <dbl>     <dbl>      <dbl>     <int>
1  1960             37.3            68.8      2.70    7065525      7415
2  1961             35.0            69.7      2.79    7105654      7781
3  1962             32.9            69.5      2.80    7151077      7937
4  1963             31.2            69.6      2.82    7199962      8209
5  1964             29.7            70.1      2.80    7249855      8652
6  1965             28.3            69.9      2.70    7298794      8893
nested %>%
  unnest(data)
# A tibble: 4,004 x 7
  country  year infant_mortality life_expectancy fertility population ...
  <fct>   <int>            <dbl>           <dbl>     <dbl>      <dbl> ...
1 Algeria  1960              148            47.5      7.65   11124892 ...
2 Algeria  1961              148            48.0      7.65   11404859 ...
3 Algeria  1962              148            48.6      7.65   11690152 ...
4 Algeria  1963              148            49.1      7.65   11985130 ...
5 Algeria  1964              149            49.6      7.65   12295973 ...
6 Algeria  1965              149            50.1      7.66   12626953 ...
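The nest() and unnest() steps above can be tried without the gapminder data. The sketch below uses a small made-up table (`toy`, with hypothetical countries "A" and "B") to check that unnesting a nested tibble gives back the original rows. It assumes tidyr 1.0 or later, where nest() on a grouped data frame names the list column `data`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the gapminder table
toy <- tibble(
  country = c("A", "A", "A", "B", "B"),
  year = c(1960L, 1961L, 1962L, 1960L, 1961L),
  life_expectancy = c(47.5, 48.0, 48.6, 68.8, 69.7)
)

# One row per country; the remaining columns move into the `data` list column
nested <- toy %>%
  group_by(country) %>%
  nest()

# Each element of `data` is a tibble holding that country's rows
first_country <- nested$data[[1]]

# unnest() expands the list column back into ordinary rows
roundtrip <- nested %>%
  unnest(data) %>%
  ungroup()
```

Here `nested` has one row per country, and `roundtrip` should contain the same rows and columns as `toy`.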