Modeling with tidymodels in R
David Svancer
Data Scientist
Data that encodes characteristics or groups
Examples
Department within a company
Native language
Type of car
Nominal data must be transformed to numeric data for modeling
One-Hot Encoding
Dummy Variable Encoding
recipes
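The difference between the two encodings can be sketched with a small example. The data frame `toy` and its columns are hypothetical and not part of the course data; the sketch assumes a recipes version in which `step_dummy()` accepts `one_hot = TRUE` and `bake()` accepts `new_data = NULL`.

```r
library(recipes)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  y = c(1, 2, 3, 4),
  car_type = factor(c("sedan", "suv", "truck", "suv"))
)

# Dummy variable encoding: one indicator column per level except the
# first (reference) level, so 3 levels give 2 columns
dummy_encoded <- recipe(y ~ ., data = toy) %>%
  step_dummy(car_type) %>%
  prep(training = toy) %>%
  bake(new_data = NULL)

# One-hot encoding: one indicator column per level, so 3 levels give 3 columns
one_hot_encoded <- recipe(y ~ ., data = toy) %>%
  step_dummy(car_type, one_hot = TRUE) %>%
  prep(training = toy) %>%
  bake(new_data = NULL)

names(dummy_encoded)
names(one_hot_encoded)
```

Dummy variable encoding drops one level because it can be inferred from the others, which avoids redundant columns in models with an intercept.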
Nominal predictor variables - lead_source and us_location
leads_training
# A tibble: 996 x 7
  purchased total_visits total_time pages_per_visit total_clicks lead_source    us_location
  <fct>            <dbl>      <dbl>           <dbl>        <dbl> <fct>          <fct>
1 yes                  7       1148               7           59 direct_traffic west
2 no                   5        228             2.5           25 email          southeast
3 no                   7        481            2.33           21 organic_search west
4 no                   4        177               4           37 direct_traffic west
5 no                   2       1273               2           26 email          midwest
# ... with 991 more rows
The step_dummy() function
recipe(purchased ~ ., data = leads_training) %>%
  step_dummy(lead_source, us_location) %>%
  prep(training = leads_training) %>%
  bake(new_data = leads_test)
# A tibble: 332 x 12
total_visits ... lead_source_email lead_source_organic_search lead_source_direct_traffic us_location_southeast ... us_location_west
<dbl> ... <dbl> <dbl> <dbl> <dbl> <dbl>
1 8 ... 0 0 1 0 1
2 4 ... 0 0 1 0 0
3 3 ... 0 1 0 0 1
4 2 ... 1 0 0 0 0
5 9 ... 0 0 1 0 1
# ... with 327 more rows
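The prep-then-bake pattern above can be tried on a small stand-in data set. The objects `toy_training` and `toy_test` below are hypothetical, with invented values; they only mimic the column names of the leads data.

```r
library(recipes)

# Hypothetical stand-ins for leads_training / leads_test (invented values)
toy_training <- data.frame(
  purchased = factor(c("yes", "no", "no", "yes", "no", "no")),
  total_visits = c(7, 5, 7, 4, 2, 3),
  lead_source = factor(c("direct_traffic", "email", "organic_search",
                         "direct_traffic", "email", "organic_search")),
  us_location = factor(c("west", "southeast", "west",
                         "west", "midwest", "midwest"))
)
toy_test <- toy_training[c(1, 2, 5), ]

# prep() learns the factor levels from the training data only
leads_recipe <- recipe(purchased ~ ., data = toy_training) %>%
  step_dummy(lead_source, us_location) %>%
  prep(training = toy_training)

# bake() applies the encoding learned in prep() to new data
baked_test <- bake(leads_recipe, new_data = toy_test)
names(baked_test)
```

Keeping the prepped recipe in its own object, as in this sketch, lets the same encoding be applied to any later data set without re-estimating it.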
Selecting by column type using the all_nominal() and all_outcomes() selectors
-all_outcomes() excludes the nominal outcome variable, purchased
recipe(purchased ~ ., data = leads_training) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  prep(training = leads_training) %>%
  bake(new_data = leads_test)
# A tibble: 332 x 12
total_visits ... lead_source_email lead_source_organic_search lead_source_direct_traffic ... us_location_west
<dbl> ... <dbl> <dbl> <dbl> <dbl>
1 8 ... 0 0 1 1
2 4 ... 0 0 1 0
3 3 ... 0 1 0 1
4 2 ... 1 0 0 0
5 9 ... 0 0 1 1
# ... with 327 more rows
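As a side note, more recent recipes releases also provide all_nominal_predictors(), which selects nominal predictors directly. The sketch below assumes such a version is installed and uses a hypothetical data frame, `toy_leads`, with invented values; it compares the two spellings on the same data.

```r
library(recipes)

# Hypothetical stand-in for the leads data (invented values)
toy_leads <- data.frame(
  purchased = factor(c("yes", "no", "no", "yes", "no", "no")),
  total_visits = c(7, 5, 7, 4, 2, 3),
  lead_source = factor(c("direct_traffic", "email", "organic_search",
                         "direct_traffic", "email", "organic_search")),
  us_location = factor(c("west", "southeast", "west",
                         "west", "midwest", "midwest"))
)

# Selector pattern from the slide: all nominal columns minus the outcome
with_exclusion <- recipe(purchased ~ ., data = toy_leads) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  prep(training = toy_leads) %>%
  bake(new_data = NULL)

# Assumed alternative in newer recipes versions: nominal predictors only
with_predictor_selector <- recipe(purchased ~ ., data = toy_leads) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep(training = toy_leads) %>%
  bake(new_data = NULL)

# Compare the resulting column names
identical(names(with_exclusion), names(with_predictor_selector))
```

In both versions the outcome, purchased, is left as a factor rather than being converted to indicator columns.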
Modeling engines in R
step_dummy()
The recipes package provides a standard way to prepare nominal predictors for modeling