Foundations of Tidy Machine Learning

Machine Learning in the Tidyverse

Dmitriy (Dima) Gorenshteyn

Lead Data Scientist, Memorial Sloan Kettering Cancer Center

The Core of Tidy Machine Learning

List Column Workflow

The Gapminder Dataset

  • dslabs package
  • Observations: 77 countries, 52 years per country (1960–2011)
  • Features:
    • year
    • infant_mortality
    • life_expectancy
    • fertility
    • population
    • gdpPercap

Step 1: Make a List Column - Nest Your Data

Nesting by Country

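library(tidyverse)

# Assumes `gapminder` (the prepared dataset described above) is already loaded.
# group_by() + nest() yields one row per country; the remaining columns
# are packed into a list column named `data`.
nested <- gapminder %>%
  group_by(country) %>%
  nest()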
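
To make the nesting step concrete, here is a minimal sketch on a small made-up tibble. The `toy` data and its values are hypothetical and are not taken from gapminder; only `group_by()` and `nest()` come from the slide above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two countries, three years each
toy <- tibble(
  country = c("A", "A", "A", "B", "B", "B"),
  year = c(1960L, 1961L, 1962L, 1960L, 1961L, 1962L),
  life_expectancy = c(50.1, 50.6, 51.0, 68.8, 69.7, 69.5)
)

# Same pattern as on the slide: group, then nest
toy_nested <- toy %>%
  group_by(country) %>%
  nest()

# One row per country; `data` is a list column holding one tibble per country
nrow(toy_nested)
class(toy_nested$data)
```

Each element of `toy_nested$data` keeps every column except the grouping column `country`, which stays in the outer tibble.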

Viewing a Nested Tibble

> nested$data[[4]]
# A tibble: 52 x 6
    year infant_mortality life_expectancy fertility population gdpPercap
   <int>            <dbl>           <dbl>     <dbl>      <dbl>     <int>
 1  1960             37.3            68.8      2.70    7065525      7415
 2  1961             35.0            69.7      2.79    7105654      7781
 3  1962             32.9            69.5      2.80    7151077      7937
 4  1963             31.2            69.6      2.82    7199962      8209
 5  1964             29.7            70.1      2.80    7249855      8652
 6  1965             28.3            69.9      2.70    7298794      8893
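
The indexing used above can be tried on a small example. The following is a sketch on hypothetical toy data (the `toy_nested` object and its values are made up for illustration); it shows that `[[ ]]` extracts the nested tibble itself, and that the matching country sits at the same row position in the outer tibble.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, nested by country
toy_nested <- tibble(
  country = c("A", "A", "B", "B"),
  year = c(1960L, 1961L, 1960L, 1961L),
  fertility = c(7.65, 7.65, 2.70, 2.79)
) %>%
  group_by(country) %>%
  nest()

# [[2]] returns the tibble stored in the second row of the list column
second <- toy_nested$data[[2]]

# The country it belongs to is at the same position in the outer tibble
second_country <- toy_nested$country[2]

second
second_country
```

Using single brackets, `toy_nested$data[2]`, would instead return a list of length one that still wraps the tibble.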

Step 3: Simplify List Columns - unnest()

nested %>% 
  unnest(data)

# A tibble: 4,004 x 7
   country  year infant_mortality life_expectancy fertility population   ...
   <fct>   <int>            <dbl>           <dbl>     <dbl>      <dbl>   ...
 1 Algeria  1960              148            47.5      7.65   11124892   ...
 2 Algeria  1961              148            48.0      7.65   11404859   ...
 3 Algeria  1962              148            48.6      7.65   11690152   ...
 4 Algeria  1963              148            49.1      7.65   11985130   ...
 5 Algeria  1964              149            49.6      7.65   12295973   ...
 6 Algeria  1965              149            50.1      7.66   12626953   ...
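
The 4,004 rows in the output above match the dataset description: 77 countries times 52 years. As a small illustration of the same round trip, the sketch below nests and then unnests a hypothetical toy tibble (names and values are made up) and checks that the original rows and columns come back.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two countries, two years each
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year = c(1960L, 1961L, 1960L, 1961L),
  fertility = c(7.65, 7.65, 2.70, 2.79)
)

toy_nested <- toy %>%
  group_by(country) %>%
  nest()

# unnest() expands the list column back into regular rows and columns
toy_flat <- toy_nested %>%
  unnest(data)

# Same number of rows and the same columns as before nesting
stopifnot(nrow(toy_flat) == nrow(toy))
stopifnot(identical(names(toy_flat), names(toy)))
```

Note that `toy_flat` is still grouped by `country`; calling `ungroup()` afterwards would drop the grouping if it is no longer needed.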

Let's Get Started!
