Modeling with tidymodels in R
David Svancer
Data Scientist
Nominal data: data that encodes characteristics or groups
Examples
Department within a company
Native language
Type of car
Nominal data must be transformed to numeric values for modeling
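One-Hot Encoding: one 0/1 indicator column for each distinct value
Dummy Variable Encoding: one 0/1 indicator column for each distinct value except a reference value, so n values give n - 1 columns
recipes: step_dummy() uses dummy variable encoding by default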
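The difference between the two encodings can be seen on a small example. The sketch below is not from the slides: the data frame `toy` and its columns `y` and `dept` are hypothetical, and it assumes a reasonably recent version of recipes in which `step_dummy()` accepts `one_hot = TRUE` and `bake()` accepts `new_data = NULL`.

```r
library(recipes)

# Hypothetical toy data: one nominal predictor with three levels (hr, it, sales)
toy <- data.frame(
  y    = c(1.2, 3.4, 2.2, 5.1),
  dept = factor(c("sales", "hr", "it", "sales"))
)

# Dummy variable encoding: k - 1 indicator columns, first level is the reference
dummy_encoded <- recipe(y ~ dept, data = toy) %>%
  step_dummy(dept) %>%
  prep(training = toy) %>%
  bake(new_data = NULL)

# One-hot encoding: k indicator columns, one per level
one_hot_encoded <- recipe(y ~ dept, data = toy) %>%
  step_dummy(dept, one_hot = TRUE) %>%
  prep(training = toy) %>%
  bake(new_data = NULL)

names(dummy_encoded)
names(one_hot_encoded)
```

With three levels, the dummy-encoded result carries two indicator columns and the one-hot result carries three; the extra column is the one for the reference level.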
Nominal predictor variables: lead_source and us_location
leads_training

# A tibble: 996 x 7
  purchased total_visits total_time pages_per_visit total_clicks lead_source    us_location
  <fct>            <dbl>      <dbl>           <dbl>        <dbl> <fct>          <fct>
1 yes                  7       1148               7           59 direct_traffic west
2 no                   5        228             2.5           25 email          southeast
3 no                   7        481            2.33           21 organic_search west
4 no                   4        177               4           37 direct_traffic west
5 no                   2       1273               2           26 email          midwest
# ... with 991 more rows
The step_dummy() function
recipe(purchased ~ ., data = leads_training) %>%
  step_dummy(lead_source, us_location) %>%
  prep(training = leads_training) %>%
  bake(new_data = leads_test)
# A tibble: 332 x 12
  total_visits ... lead_source_email lead_source_organic_search lead_source_direct_traffic us_location_southeast ... us_location_west
         <dbl> ...             <dbl>                      <dbl>                      <dbl>                 <dbl> ...            <dbl>
1            8 ...                 0                          0                          1                     0 ...                1
2            4 ...                 0                          0                          1                     0 ...                0
3            3 ...                 0                          1                          0                     0 ...                1
4            2 ...                 1                          0                          0                     0 ...                0
5            9 ...                 0                          0                          1                     0 ...                1
# ... with 327 more rows
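The pipeline above chains three stages: specifying the recipe, estimating it with prep(), and applying it with bake(). The sketch below separates those stages on small stand-in data. `toy_training`, `toy_test`, and `toy_recipe_prep` are hypothetical names, not objects from the course.

```r
library(recipes)

# Hypothetical stand-ins for leads_training and leads_test
toy_training <- data.frame(
  purchased   = factor(c("yes", "no", "no", "yes")),
  total_time  = c(1148, 228, 481, 177),
  lead_source = factor(c("direct_traffic", "email", "organic_search", "email"))
)
toy_test <- data.frame(
  purchased   = factor(c("no", "yes"), levels = c("no", "yes")),
  total_time  = c(1273, 350),
  lead_source = factor(c("email", "direct_traffic"),
                       levels = levels(toy_training$lead_source))
)

# Specify the recipe, then let prep() learn the factor levels from the training data
toy_recipe_prep <- recipe(purchased ~ ., data = toy_training) %>%
  step_dummy(lead_source) %>%
  prep(training = toy_training)

# bake() applies the same learned encoding to any data set
baked_training <- bake(toy_recipe_prep, new_data = toy_training)
baked_test     <- bake(toy_recipe_prep, new_data = toy_test)

# Both results have the same columns, even for levels absent from the test rows
identical(names(baked_training), names(baked_test))
```

Keeping the prepped recipe in its own object makes it possible to bake the training and test sets with exactly the same encoding.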
Selecting by column type using the all_nominal() and all_outcomes() selectors
-all_outcomes() excludes the nominal outcome variable, purchased
recipe(purchased ~ ., data = leads_training) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  prep(training = leads_training) %>%
  bake(new_data = leads_test)
# A tibble: 332 x 12
  total_visits ... lead_source_email lead_source_organic_search lead_source_direct_traffic ... us_location_west
         <dbl> ...             <dbl>                      <dbl>                      <dbl> ...            <dbl>
1            8 ...                 0                          0                          1 ...                1
2            4 ...                 0                          0                          1 ...                0
3            3 ...                 0                          1                          0 ...                1
4            2 ...                 1                          0                          0 ...                0
5            9 ...                 0                          0                          1 ...                1
# ... with 327 more rows
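To check that the selectors leave the outcome untouched, the same step can be run on a small data set and the result inspected. This is a minimal sketch: `toy_training` and `baked` are hypothetical names, and the data only mimics the shape of leads_training.

```r
library(recipes)

# Hypothetical data: a nominal outcome plus two nominal predictors
toy_training <- data.frame(
  purchased    = factor(c("yes", "no", "no", "yes")),
  total_visits = c(7, 5, 7, 4),
  lead_source  = factor(c("direct_traffic", "email", "organic_search", "email")),
  us_location  = factor(c("west", "southeast", "west", "midwest"))
)

# Encode every nominal column except the outcome
baked <- recipe(purchased ~ ., data = toy_training) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  prep(training = toy_training) %>%
  bake(new_data = NULL)

# The outcome is still a factor; only the nominal predictors became indicators
is.factor(baked$purchased)
names(baked)
```

Newer versions of recipes also provide all_nominal_predictors(), which selects the same columns in a single selector.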
Modeling engines in R
step_dummy()
The recipes package provides a standard way to prepare nominal predictors for modeling
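As one illustration of why a standard approach helps, some engines expand factors on their own. Base R's lm() and glm() do so through model.matrix(), and the resulting column names differ from the underscore-separated names that step_dummy() produces. The sketch below uses only base R; `toy` and `design` are hypothetical names.

```r
# Hypothetical toy data with one nominal predictor
toy <- data.frame(
  total_time  = c(1148, 228, 481, 177),
  lead_source = factor(c("direct_traffic", "email", "organic_search", "email"))
)

# model.matrix() is what lm() and glm() use internally to expand factors
design <- model.matrix(~ lead_source, data = toy)

# Column names concatenate variable and level without a separator
colnames(design)
```

Encoding the predictors in a recipe first means the columns look the same regardless of which engine later fits the model.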