Be fruitful with dplyr

Programming with dplyr

Dr. Chester Ismay

Educator, Data Scientist, and R/Python Consultant

Course prerequisites

  • Joining Data with dplyr

  • Introduction to Writing Functions in R

Course overview

Chapter 1

  • Refresh dplyr pipelines
  • Choose columns based on patterns

Chapter 2

  • Relocate columns in your data
  • Perform transformations across multiple columns

Chapter 3

  • Deepen your knowledge of dplyr joins
  • Use set theory clauses to program with multiple data sources

Chapter 4

  • Build functions to bundle dplyr and ggplot2 code
  • Use the rlang package to understand tidy evaluation

The world_bank_data tibble

country       region           year  infant_mortality_rate  fertility_rate  perc_rural_pop
Saudi Arabia  Western Asia     2013                   13.3            2.64          17.260
Greece        Southern Europe  2014                    3.7            1.54          22.298
Latvia        Northern Europe  2014                    7.2            1.62          32.048
Romania       Eastern Europe   2014                   10.1            1.43          46.100
Netherlands   Western Europe   2015                    3.2            1.78           9.827
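
To follow along without the course dataset, the five rows shown above can be typed into a small tibble. This is only a sketch: the name world_bank_sample is hypothetical, the real world_bank_data has more rows and columns, and storing region as a factor is an assumption based on the <fct> type shown in later output.

```r
library(dplyr)

# Hypothetical stand-in for world_bank_data: only the five rows
# and six columns printed on this slide.
world_bank_sample <- tibble(
    country = c("Saudi Arabia", "Greece", "Latvia", "Romania", "Netherlands"),
    region = factor(c("Western Asia", "Southern Europe", "Northern Europe",
                      "Eastern Europe", "Western Europe")),
    year = c(2013, 2014, 2014, 2014, 2015),
    infant_mortality_rate = c(13.3, 3.7, 7.2, 10.1, 3.2),
    fertility_rate = c(2.64, 1.54, 1.62, 1.43, 1.78),
    perc_rural_pop = c(17.260, 22.298, 32.048, 46.100, 9.827)
)

# Compact overview of column names, types, and first values
glimpse(world_bank_sample)
```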

Columns of world_bank_data

names(world_bank_data)
 [1] "iso"                   "country"               "continent"            
 [4] "region"                "year"                  "infant_mortality_rate"
 [7] "fertility_rate"        "perc_electric_access"  "perc_college_complete"
[10] "perc_cvd_crd_70"       "unemployment_rate"     "perc_rural_pop" 

Select a few columns from world_bank_data

world_bank_data %>%
    select(country, continent, region, year, perc_rural_pop, perc_college_complete)
# A tibble: 300 x 6
   country      continent region           year perc_rural_pop perc_college_complete
   <chr>        <fct>     <fct>           <dbl>          <dbl>                 <dbl>
 1 Portugal     Europe    Southern Europe  2000          45.6                   7.26
 2 Armenia      Asia      Western Asia     2001          35.6                  20.4 
 3 Bulgaria     Europe    Eastern Europe   2001          30.8                  18.0 
 4 Portugal     Europe    Southern Europe  2001          45.0                   7.57
 5 Qatar        Asia      Western Asia     2004           2.91                 20.9 
 6 Saudi Arabia Asia      Western Asia     2004          19.2                  14.9 
 7 Pakistan     Asia      Southern Asia    2005          66.0                   3.92
# ... with 293 more rows

Filter rows by continent values

continents_vector <- c("Africa", "Asia")
asia_africa_results <- world_bank_data %>%
    select(country, continent, region, year, perc_rural_pop, perc_college_complete) %>%
    filter(continent %in% continents_vector)
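
The %in% test keeps a row when its continent matches any element of the vector, which amounts to chaining == comparisons with |. A minimal sketch of that equivalence, assuming a toy tibble (the hypothetical sample_rows) built from the seven rows printed in the select() output rather than the full dataset:

```r
library(dplyr)

# Hypothetical stand-in for world_bank_data: three of the columns
# from the seven rows printed in the select() output.
sample_rows <- tibble(
    country = c("Portugal", "Armenia", "Bulgaria", "Portugal",
                "Qatar", "Saudi Arabia", "Pakistan"),
    continent = factor(c("Europe", "Asia", "Europe", "Europe",
                         "Asia", "Asia", "Asia")),
    year = c(2000, 2001, 2001, 2001, 2004, 2004, 2005)
)

continents_vector <- c("Africa", "Asia")

# Membership test against the whole vector
with_in <- sample_rows %>%
    filter(continent %in% continents_vector)

# The same condition spelled out one value at a time
with_or <- sample_rows %>%
    filter(continent == "Africa" | continent == "Asia")

# Both keep the four Asia rows; the sample has no Africa rows
with_in
```

The vector form scales better: adding a continent means editing continents_vector, not the filter() call.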

Results of the row filter

asia_africa_results
# A tibble: 111 x 6
   country      continent region              year perc_rural_pop perc_college_complete
   <chr>        <fct>     <fct>              <dbl>          <dbl>                 <dbl>
 1 Armenia      Asia      Western Asia        2001          35.6                  20.4 
 2 Qatar        Asia      Western Asia        2004           2.91                 20.9 
 3 Saudi Arabia Asia      Western Asia        2004          19.2                  14.9 
 4 Pakistan     Asia      Southern Asia       2005          66.0                   3.92
 5 Nigeria      Africa    Western Africa      2006          60.1                   9.04
 6 Pakistan     Asia      Southern Asia       2006          65.8                   6.30
 7 Singapore    Asia      South-Eastern Asia  2006           0                    19.6 
 8 Azerbaijan   Asia      Western Asia        2007          47.2                  14.9 
 9 Qatar        Asia      Western Asia        2007           2.08                 25.1 
10 Singapore    Asia      South-Eastern Asia  2007           0                    20.1 
# ... with 101 more rows

Mutate a new column

asia_africa_results <- asia_africa_results %>%
    mutate(perc_urban_pop = 100 - perc_rural_pop)

Results of mutate

# A tibble: 111 x 7
   country      continent region              year perc_rural_pop perc_college_complete perc_urban_pop
   <chr>        <fct>     <fct>              <dbl>          <dbl>                 <dbl>          <dbl>
 1 Armenia      Asia      Western Asia        2001          35.6                  20.4            64.4
 2 Qatar        Asia      Western Asia        2004           2.91                 20.9            97.1
 3 Saudi Arabia Asia      Western Asia        2004          19.2                  14.9            80.8
 4 Pakistan     Asia      Southern Asia       2005          66.0                   3.92           34.0
 5 Nigeria      Africa    Western Africa      2006          60.1                   9.04           39.9
 6 Pakistan     Asia      Southern Asia       2006          65.8                   6.30           34.2
 7 Singapore    Asia      South-Eastern Asia  2006           0                    19.6           100  
 8 Azerbaijan   Asia      Western Asia        2007          47.2                  14.9            52.8
 9 Qatar        Asia      Western Asia        2007           2.08                 25.1            97.9
10 Singapore    Asia      South-Eastern Asia  2007           0                    20.1           100  
# ... with 101 more rows

Analyze urban percentage by region

asia_africa_results %>%
    group_by(region) %>%
    summarize(mean_urban = mean(perc_urban_pop))
# A tibble: 9 x 2
  region             mean_urban
  <fct>                   <dbl>
1 Central Asia             49.2
2 Eastern Africa           19.5
3 Eastern Asia             74.2
4 Middle Africa            42.4
5 South-Eastern Asia       79.8
6 Southern Africa          64.8
7 Southern Asia            40.0
8 Western Africa           39.6
9 Western Asia             78.9
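
The filter, mutate, group, and summarize steps from this lesson can also be chained into one pipeline. The sketch below runs on a hypothetical seven-row tibble (sample_rows) assembled from rows printed earlier, so its means will not match the figures above, which come from the full dataset.

```r
library(dplyr)

# Hypothetical stand-in for world_bank_data, using rows printed
# earlier in this lesson.
sample_rows <- tibble(
    country = c("Portugal", "Armenia", "Qatar", "Saudi Arabia",
                "Pakistan", "Nigeria", "Pakistan"),
    continent = factor(c("Europe", "Asia", "Asia", "Asia",
                         "Asia", "Africa", "Asia")),
    region = factor(c("Southern Europe", "Western Asia", "Western Asia",
                      "Western Asia", "Southern Asia", "Western Africa",
                      "Southern Asia")),
    year = c(2000, 2001, 2004, 2004, 2005, 2006, 2006),
    perc_rural_pop = c(45.6, 35.6, 2.91, 19.2, 66.0, 60.1, 65.8)
)

urban_by_region <- sample_rows %>%
    # Keep only rows from the two continents of interest
    filter(continent %in% c("Africa", "Asia")) %>%
    # Urban share is the complement of the rural share
    mutate(perc_urban_pop = 100 - perc_rural_pop) %>%
    # One output row per region that still has data
    group_by(region) %>%
    summarize(mean_urban = mean(perc_urban_pop))

urban_by_region
```

Because group_by() drops empty factor levels by default, the Southern Europe level does not appear in the result even though it remains a level of region after filtering.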

Let's practice!
