Being productive with dplyr

Programming with dplyr

Dr. Chester Ismay

Educator, Data Scientist, and R/Python Consultant

Course prerequisites

  • Joining Data with dplyr

  • Introduction to Writing Functions in R

Course outline

Chapter 1

  • Refreshing dplyr pipelines
  • Selecting columns by pattern

Chapter 2

  • Relocating columns in the data
  • Transforming multiple columns at once

Chapter 3

  • Strengthening your knowledge of dplyr joins
  • Using set theory clauses to improve programming with multiple data sources

Chapter 4

  • Creating functions that wrap dplyr and ggplot2 code
  • Using the rlang package to understand tidy evaluation

The world_bank_data tibble

country      region           year infant_mortality_rate fertility_rate perc_rural_pop
Saudi Arabia Western Asia     2013                  13.3           2.64         17.260
Greece       Southern Europe  2014                   3.7           1.54         22.298
Latvia       Northern Europe  2014                   7.2           1.62         32.048
Romania      Eastern Europe   2014                  10.1           1.43         46.100
Netherlands  Western Europe   2015                   3.2           1.78          9.827

Columns of world_bank_data

names(world_bank_data)
 [1] "iso"                   "country"               "continent"            
 [4] "region"                "year"                  "infant_mortality_rate"
 [7] "fertility_rate"        "perc_electric_access"  "perc_college_complete"
[10] "perc_cvd_crd_70"       "unemployment_rate"     "perc_rural_pop" 

Selecting a few columns from world_bank_data

world_bank_data %>%
    select(country, continent, region, year, perc_rural_pop, perc_college_complete)
# A tibble: 300 x 6
   country      continent region           year perc_rural_pop perc_college_complete
   <chr>        <fct>     <fct>           <dbl>          <dbl>                 <dbl>
 1 Portugal     Europe    Southern Europe  2000          45.6                   7.26
 2 Armenia      Asia      Western Asia     2001          35.6                  20.4 
 3 Bulgaria     Europe    Eastern Europe   2001          30.8                  18.0 
 4 Portugal     Europe    Southern Europe  2001          45.0                   7.57
 5 Qatar        Asia      Western Asia     2004           2.91                 20.9 
 6 Saudi Arabia Asia      Western Asia     2004          19.2                  14.9 
 7 Pakistan     Asia      Southern Asia    2005          66.0                   3.92
# ... with 293 more rows
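
When the column names are held in a variable rather than typed inline, the same selection can be written with `all_of()`. The sketch below is a minimal illustration, not the course's own code: `sample_data` is a hypothetical miniature built from a few rows printed above, and `columns_to_keep` is an assumed name.

```r
library(dplyr)

# Hypothetical miniature of world_bank_data, using rows printed in the slides
sample_data <- tibble(
  country               = c("Portugal", "Armenia", "Qatar"),
  continent             = c("Europe", "Asia", "Asia"),
  year                  = c(2000, 2001, 2004),
  perc_rural_pop        = c(45.6, 35.6, 2.91),
  perc_college_complete = c(7.26, 20.4, 20.9)
)

# Selecting by bare column names, as on the slide
selected_inline <- sample_data %>%
  select(country, year, perc_rural_pop)

# Selecting the same columns from a character vector (assumed variable name);
# all_of() raises an error if any listed column is missing
columns_to_keep <- c("country", "year", "perc_rural_pop")
selected_from_vector <- sample_data %>%
  select(all_of(columns_to_keep))

print(selected_from_vector)
```

Both calls should return the same three columns; the vector form is convenient when the column list is built elsewhere in a program.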

Filtering rows by continent value

continents_vector <- c("Africa", "Asia")
asia_africa_results <- world_bank_data %>%
    select(country, continent, region, year, perc_rural_pop, perc_college_complete) %>%
    filter(continent %in% continents_vector)
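
To see why `%in%` is used here rather than `==`, the following sketch runs both on a small table. It is an illustration under assumptions: `sample_data` is a hypothetical miniature built from rows shown in the slides, and `continent` is stored as character here rather than as a factor.

```r
library(dplyr)

# Hypothetical miniature of world_bank_data, using rows printed in the slides
sample_data <- tibble(
  country   = c("Portugal", "Armenia", "Qatar", "Nigeria"),
  continent = c("Europe", "Asia", "Asia", "Africa"),
  year      = c(2000, 2001, 2004, 2006)
)

continents_vector <- c("Africa", "Asia")

# %in% checks each row's continent for membership in the vector
kept <- sample_data %>%
  filter(continent %in% continents_vector)

# == compares element by element, recycling the shorter vector,
# so matching rows can be dropped without any warning
recycled <- sample_data %>%
  filter(continent == continents_vector)

print(kept)
print(recycled)
```

On this sample, `kept` retains every Asian and African row, while `recycled` keeps fewer rows because the comparison depends on row position.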

Result of filtering rows

asia_africa_results
# A tibble: 111 x 6
   country      continent region              year perc_rural_pop perc_college_complete
   <chr>        <fct>     <fct>              <dbl>          <dbl>                 <dbl>
 1 Armenia      Asia      Western Asia        2001          35.6                  20.4 
 2 Qatar        Asia      Western Asia        2004           2.91                 20.9 
 3 Saudi Arabia Asia      Western Asia        2004          19.2                  14.9 
 4 Pakistan     Asia      Southern Asia       2005          66.0                   3.92
 5 Nigeria      Africa    Western Africa      2006          60.1                   9.04
 6 Pakistan     Asia      Southern Asia       2006          65.8                   6.30
 7 Singapore    Asia      South-Eastern Asia  2006           0                    19.6 
 8 Azerbaijan   Asia      Western Asia        2007          47.2                  14.9 
 9 Qatar        Asia      Western Asia        2007           2.08                 25.1 
10 Singapore    Asia      South-Eastern Asia  2007           0                    20.1 
# ... with 101 more rows

Mutating a new column

asia_africa_results <- asia_africa_results %>%
    mutate(perc_urban_pop = 100 - perc_rural_pop)
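
The same `mutate()` step can be tried on a small self-contained table. This is a minimal sketch: `sample_results` is a hypothetical stand-in for `asia_africa_results`, holding three rural percentages taken from the rows printed earlier.

```r
library(dplyr)

# Hypothetical stand-in for asia_africa_results
sample_results <- tibble(
  country        = c("Armenia", "Qatar", "Pakistan"),
  perc_rural_pop = c(35.6, 2.91, 66.0)
)

# mutate() appends perc_urban_pop as a new last column;
# the subtraction is applied to every row at once
sample_results <- sample_results %>%
  mutate(perc_urban_pop = 100 - perc_rural_pop)

print(sample_results)
```

Because the result is assigned back to the same name, the original table is overwritten with the version that has the extra column, as on the slide.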

Result of the mutate

# A tibble: 111 x 7
   country      continent region              year perc_rural_pop perc_college_complete perc_urban_pop
   <chr>        <fct>     <fct>              <dbl>          <dbl>                 <dbl>          <dbl>
 1 Armenia      Asia      Western Asia        2001          35.6                  20.4            64.4
 2 Qatar        Asia      Western Asia        2004           2.91                 20.9            97.1
 3 Saudi Arabia Asia      Western Asia        2004          19.2                  14.9            80.8
 4 Pakistan     Asia      Southern Asia       2005          66.0                   3.92           34.0
 5 Nigeria      Africa    Western Africa      2006          60.1                   9.04           39.9
 6 Pakistan     Asia      Southern Asia       2006          65.8                   6.30           34.2
 7 Singapore    Asia      South-Eastern Asia  2006           0                    19.6           100  
 8 Azerbaijan   Asia      Western Asia        2007          47.2                  14.9            52.8
 9 Qatar        Asia      Western Asia        2007           2.08                 25.1            97.9
10 Singapore    Asia      South-Eastern Asia  2007           0                    20.1           100  
# ... with 101 more rows

Analyzing urban percentage across regions

asia_africa_results %>%
    group_by(region) %>%
    summarize(mean_urban = mean(perc_urban_pop))
# A tibble: 9 x 2
  region             mean_urban
  <fct>                   <dbl>
1 Central Asia             49.2
2 Eastern Africa           19.5
3 Eastern Asia             74.2
4 Middle Africa            42.4
5 South-Eastern Asia       79.8
6 Southern Africa          64.8
7 Southern Asia            40.0
8 Western Africa           39.6
9 Western Asia             78.9
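
The grouped summary can be reproduced on a small table to check how rows collapse into one per region. This is a sketch under assumptions: `sample_results` is a hypothetical subset built from six rows printed earlier, the `n_rows` column is added for illustration, and `na.rm = TRUE` is an optional safeguard the slide's code does not use.

```r
library(dplyr)

# Hypothetical subset of asia_africa_results, using rows printed in the slides
sample_results <- tibble(
  country        = c("Armenia", "Qatar", "Saudi Arabia",
                     "Pakistan", "Nigeria", "Pakistan"),
  region         = c("Western Asia", "Western Asia", "Western Asia",
                     "Southern Asia", "Western Africa", "Southern Asia"),
  year           = c(2001, 2004, 2004, 2005, 2006, 2006),
  perc_urban_pop = c(64.4, 97.1, 80.8, 34.0, 39.9, 34.2)
)

# One output row per region: a row count and the mean urban percentage
regional_means <- sample_results %>%
  group_by(region) %>%
  summarize(
    n_rows     = n(),
    mean_urban = mean(perc_urban_pop, na.rm = TRUE)
  )

print(regional_means)
```

The means here cover only the six sample rows, so they will not match the nine-region output shown above, which summarizes all 111 rows.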

Let's practice!
