Programming with dplyr
Dr. Chester Ismay
Educator, Data Scientist, and R/Python Consultant
Joining Data with dplyr
Introduction to Writing Functions in R
dplyr pipelines
dplyr join knowledge
dplyr and ggplot2 code
rlang package to decipher tidy evaluation

country | region | year | infant_mortality_rate | fertility_rate | perc_rural_pop |
---|---|---|---|---|---|
Saudi Arabia | Western Asia | 2013 | 13.3 | 2.64 | 17.260 |
Greece | Southern Europe | 2014 | 3.7 | 1.54 | 22.298 |
Latvia | Northern Europe | 2014 | 7.2 | 1.62 | 32.048 |
Romania | Eastern Europe | 2014 | 10.1 | 1.43 | 46.100 |
Netherlands | Western Europe | 2015 | 3.2 | 1.78 | 9.827 |
names(world_bank_data)
[1] "iso" "country" "continent"
[4] "region" "year" "infant_mortality_rate"
[7] "fertility_rate" "perc_electric_access" "perc_college_complete"
[10] "perc_cvd_crd_70" "unemployment_rate" "perc_rural_pop"
world_bank_data %>%
  select(country, continent, region, year, perc_rural_pop, perc_college_complete)
# A tibble: 300 x 6
country continent region year perc_rural_pop perc_college_complete
<chr> <fct> <fct> <dbl> <dbl> <dbl>
1 Portugal Europe Southern Europe 2000 45.6 7.26
2 Armenia Asia Western Asia 2001 35.6 20.4
3 Bulgaria Europe Eastern Europe 2001 30.8 18.0
4 Portugal Europe Southern Europe 2001 45.0 7.57
5 Qatar Asia Western Asia 2004 2.91 20.9
6 Saudi Arabia Asia Western Asia 2004 19.2 14.9
7 Pakistan Asia Southern Asia 2005 66.0 3.92
# ... with 293 more rows
continents_vector <- c("Africa", "Asia")
asia_africa_results <- world_bank_data %>%
  select(country, continent, region, year, perc_rural_pop, perc_college_complete) %>%
  filter(continent %in% continents_vector)
asia_africa_results
# A tibble: 111 x 6
country continent region year perc_rural_pop perc_college_complete
<chr> <fct> <fct> <dbl> <dbl> <dbl>
1 Armenia Asia Western Asia 2001 35.6 20.4
2 Qatar Asia Western Asia 2004 2.91 20.9
3 Saudi Arabia Asia Western Asia 2004 19.2 14.9
4 Pakistan Asia Southern Asia 2005 66.0 3.92
5 Nigeria Africa Western Africa 2006 60.1 9.04
6 Pakistan Asia Southern Asia 2006 65.8 6.30
7 Singapore Asia South-Eastern Asia 2006 0 19.6
8 Azerbaijan Asia Western Asia 2007 47.2 14.9
9 Qatar Asia Western Asia 2007 2.08 25.1
10 Singapore Asia South-Eastern Asia 2007 0 20.1
# ... with 101 more rows
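The filter() step above matches against continents_vector with %in% rather than ==. A minimal sketch of why that matters, using a small hypothetical vector (these values are for illustration and are not taken from world_bank_data): %in% tests membership, while == compares position by position and recycles the shorter vector.

```r
# Hypothetical values for illustration only; not taken from world_bank_data
continent <- c("Asia", "Africa", "Europe", "Asia")
continents_vector <- c("Africa", "Asia")

# %in% asks, for each element, whether it appears anywhere in continents_vector
in_match <- continent %in% continents_vector

# == compares position by position, recycling continents_vector as
# "Africa", "Asia", "Africa", "Asia", so valid matches can be silently missed
eq_match <- continent == continents_vector

in_match
eq_match
```

Here in_match flags both Asia rows and the Africa row, whereas eq_match flags only the last Asia row, because that is the only position where the recycled values happen to line up.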
asia_africa_results <- asia_africa_results %>%
  mutate(perc_urban_pop = 100 - perc_rural_pop)
asia_africa_results
# A tibble: 111 x 7
country continent region year perc_rural_pop perc_college_complete perc_urban_pop
<chr> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 Armenia Asia Western Asia 2001 35.6 20.4 64.4
2 Qatar Asia Western Asia 2004 2.91 20.9 97.1
3 Saudi Arabia Asia Western Asia 2004 19.2 14.9 80.8
4 Pakistan Asia Southern Asia 2005 66.0 3.92 34.0
5 Nigeria Africa Western Africa 2006 60.1 9.04 39.9
6 Pakistan Asia Southern Asia 2006 65.8 6.30 34.2
7 Singapore Asia South-Eastern Asia 2006 0 19.6 100
8 Azerbaijan Asia Western Asia 2007 47.2 14.9 52.8
9 Qatar Asia Western Asia 2007 2.08 25.1 97.9
10 Singapore Asia South-Eastern Asia 2007 0 20.1 100
# ... with 101 more rows
asia_africa_results %>%
  group_by(region) %>%
  summarize(mean_urban = mean(perc_urban_pop))
# A tibble: 9 x 2
region mean_urban
<fct> <dbl>
1 Central Asia 49.2
2 Eastern Africa 19.5
3 Eastern Asia 74.2
4 Middle Africa 42.4
5 South-Eastern Asia 79.8
6 Southern Africa 64.8
7 Southern Asia 40.0
8 Western Africa 39.6
9 Western Asia 78.9
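The filter, mutate, group_by, and summarize steps shown so far can also be chained into a single pipeline. The sketch below assumes a tiny made-up tibble, toy_data, standing in for world_bank_data; the country names and percentages are hypothetical and chosen only so the result is easy to check by hand.

```r
library(dplyr)

# Hypothetical toy data standing in for world_bank_data (all values are made up)
toy_data <- tibble(
  country        = c("A", "B", "C", "D"),
  continent      = c("Asia", "Asia", "Africa", "Europe"),
  region         = c("Western Asia", "Western Asia",
                     "Western Africa", "Southern Europe"),
  perc_rural_pop = c(20, 40, 60, 45)
)

toy_summary <- toy_data %>%
  filter(continent %in% c("Africa", "Asia")) %>%     # keep rows from two continents
  mutate(perc_urban_pop = 100 - perc_rural_pop) %>%  # derive the urban share
  group_by(region) %>%                               # one group per region
  summarize(mean_urban = mean(perc_urban_pop))       # collapse to one row per region

toy_summary
```

With these made-up values, the European row is dropped by filter(), Western Asia averages 80 and 60 to give 70, and Western Africa has a single row with an urban share of 40.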