Fruitful with dplyr

Programming with dplyr

Dr. Chester Ismay

Educator, Data Scientist, and R/Python Consultant

Course prerequisites

  • Joining Data with dplyr

  • Introduction to Writing Functions in R

Course outline

Chapter 1

  • Review dplyr pipelines
  • Choose columns based on patterns

Chapter 2

  • Move columns in your data
  • Transform across multiple columns

Chapter 3

  • Strengthen dplyr joins
  • Use set theory clauses to improve programming with multiple data sources

Chapter 4

  • Create functions to wrap dplyr and ggplot2 code
  • Use the rlang package to understand tidy evaluation

The world_bank_data tibble

country       region           year  infant_mortality_rate  fertility_rate  perc_rural_pop
Saudi Arabia  Western Asia     2013                   13.3            2.64          17.260
Greece        Southern Europe  2014                    3.7            1.54          22.298
Latvia        Northern Europe  2014                    7.2            1.62          32.048
Romania       Eastern Europe   2014                   10.1            1.43          46.100
Netherlands   Western Europe   2015                    3.2            1.78           9.827

Columns of world_bank_data

names(world_bank_data)
 [1] "iso"                   "country"               "continent"            
 [4] "region"                "year"                  "infant_mortality_rate"
 [7] "fertility_rate"        "perc_electric_access"  "perc_college_complete"
[10] "perc_cvd_crd_70"       "unemployment_rate"     "perc_rural_pop" 

Select a few columns from world_bank_data

world_bank_data %>%
    select(country, continent, region, year, perc_rural_pop, perc_college_complete)
# A tibble: 300 x 6
   country      continent region           year perc_rural_pop perc_college_complete
   <chr>        <fct>     <fct>           <dbl>          <dbl>                 <dbl>
 1 Portugal     Europe    Southern Europe  2000          45.6                   7.26
 2 Armenia      Asia      Western Asia     2001          35.6                  20.4 
 3 Bulgaria     Europe    Eastern Europe   2001          30.8                  18.0 
 4 Portugal     Europe    Southern Europe  2001          45.0                   7.57
 5 Qatar        Asia      Western Asia     2004           2.91                 20.9 
 6 Saudi Arabia Asia      Western Asia     2004          19.2                  14.9 
 7 Pakistan     Asia      Southern Asia    2005          66.0                   3.92
# ... with 293 more rows

Filter rows by continent

continents_vector <- c("Africa", "Asia")
asia_africa_results <- world_bank_data %>%
    select(country, continent, region, year, perc_rural_pop, perc_college_complete) %>%
    filter(continent %in% continents_vector)
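
The select-then-filter pattern above can be tried on a small made-up tibble. This is only a sketch: toy_data, toy_results, and all of the values are hypothetical and are not drawn from world_bank_data.

```r
library(dplyr)

# Hypothetical toy data; names and values are illustrative only
toy_data <- tibble(
  country        = c("A", "B", "C", "D"),
  continent      = c("Asia", "Europe", "Africa", "Americas"),
  perc_rural_pop = c(35.6, 45.0, 60.1, 20.0)
)

continents_vector <- c("Africa", "Asia")

toy_results <- toy_data %>%
  select(country, continent, perc_rural_pop) %>%
  # %in% keeps rows whose continent matches any element of the vector
  filter(continent %in% continents_vector)

toy_results
```

Storing the continents in a vector keeps the filter condition unchanged when the list of continents grows or shrinks.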

Results of filtering rows

asia_africa_results
# A tibble: 111 x 6
   country      continent region              year perc_rural_pop perc_college_complete
   <chr>        <fct>     <fct>              <dbl>          <dbl>                 <dbl>
 1 Armenia      Asia      Western Asia        2001          35.6                  20.4 
 2 Qatar        Asia      Western Asia        2004           2.91                 20.9 
 3 Saudi Arabia Asia      Western Asia        2004          19.2                  14.9 
 4 Pakistan     Asia      Southern Asia       2005          66.0                   3.92
 5 Nigeria      Africa    Western Africa      2006          60.1                   9.04
 6 Pakistan     Asia      Southern Asia       2006          65.8                   6.30
 7 Singapore    Asia      South-Eastern Asia  2006           0                    19.6 
 8 Azerbaijan   Asia      Western Asia        2007          47.2                  14.9 
 9 Qatar        Asia      Western Asia        2007           2.08                 25.1 
10 Singapore    Asia      South-Eastern Asia  2007           0                    20.1 
# ... with 101 more rows

Create a new column with mutate

asia_africa_results <- asia_africa_results %>%
    mutate(perc_urban_pop = 100 - perc_rural_pop)

Results of mutate

# A tibble: 111 x 7
   country      continent region              year perc_rural_pop perc_college_complete perc_urban_pop
   <chr>        <fct>     <fct>              <dbl>          <dbl>                 <dbl>          <dbl>
 1 Armenia      Asia      Western Asia        2001          35.6                  20.4            64.4
 2 Qatar        Asia      Western Asia        2004           2.91                 20.9            97.1
 3 Saudi Arabia Asia      Western Asia        2004          19.2                  14.9            80.8
 4 Pakistan     Asia      Southern Asia       2005          66.0                   3.92           34.0
 5 Nigeria      Africa    Western Africa      2006          60.1                   9.04           39.9
 6 Pakistan     Asia      Southern Asia       2006          65.8                   6.30           34.2
 7 Singapore    Asia      South-Eastern Asia  2006           0                    19.6           100  
 8 Azerbaijan   Asia      Western Asia        2007          47.2                  14.9            52.8
 9 Qatar        Asia      Western Asia        2007           2.08                 25.1            97.9
10 Singapore    Asia      South-Eastern Asia  2007           0                    20.1           100  
# ... with 101 more rows
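
As a rough illustration of the mutate step, the sketch below derives an urban share from a rural share and then checks that the two sum to 100 in every row. The tibble toy_pop, the flag shares_sum_to_100, and the values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; values are illustrative only
toy_pop <- tibble(
  country        = c("A", "B", "C"),
  perc_rural_pop = c(35.6, 2.91, 0)
)

# mutate() adds perc_urban_pop while keeping the existing columns
toy_pop <- toy_pop %>%
  mutate(perc_urban_pop = 100 - perc_rural_pop)

# Sanity check: rural and urban shares should sum to 100 (within rounding)
shares_sum_to_100 <- all(
  abs(toy_pop$perc_rural_pop + toy_pop$perc_urban_pop - 100) < 1e-9
)
```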

Analyze urban percentage by region

asia_africa_results %>%
    group_by(region) %>%
    summarize(mean_urban = mean(perc_urban_pop))
# A tibble: 9 x 2
  region             mean_urban
  <fct>                   <dbl>
1 Central Asia             49.2
2 Eastern Africa           19.5
3 Eastern Asia             74.2
4 Middle Africa            42.4
5 South-Eastern Asia       79.8
6 Southern Africa          64.8
7 Southern Asia            40.0
8 Western Africa           39.6
9 Western Asia             78.9
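
A minimal sketch of the same group_by and summarize pattern on made-up data follows. The tibble toy_urban and its values are hypothetical. The na.rm = TRUE argument and the n() row count are additions for illustration and are not used in the slide code; without na.rm = TRUE, mean() returns NA for any group that contains a missing value.

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy_urban <- tibble(
  region         = c("North", "North", "South", "South"),
  perc_urban_pop = c(40, 60, 80, NA)
)

toy_summary <- toy_urban %>%
  group_by(region) %>%
  summarize(
    # na.rm = TRUE drops missing values before averaging
    mean_urban = mean(perc_urban_pop, na.rm = TRUE),
    # n() counts the rows in each group, including rows with NA
    n_rows = n()
  )

toy_summary
```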

Let's practice!