Feature Engineering in R
Jorge Zazueta
Research Professor and Head of the Modeling Group at the School of Economics, UASLP
Removing irrelevant or uninformative variables can bring several benefits.
Training a model with all features
lr_recipe_full <-
  recipe(Loan_Status ~ ., data = train) %>%
  update_role(Loan_ID, new_role = "ID")
lr_workflow_full <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_full)
lr_fit_full <-
  lr_workflow_full %>%
  fit(data = train)
Variable importance (vip) plot
lr_fit_full %>%
  extract_fit_parsnip() %>%
  vip(aesthetics = list(fill = "steelblue"))
Variable importance
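The scores behind the plot can also be pulled out as a table, which makes it easier to pick the top features programmatically. The following is a minimal sketch, assuming the vip package's vi() function and a hypothetical synthetic dataset (sim, with columns x1 to x4 and outcome y) standing in for the loan data. For a glm engine, vip typically bases these scores on the absolute value of each coefficient's test statistic.

```r
library(tidymodels)
library(vip)

set.seed(123)
n <- 500

# Hypothetical synthetic data: x1 and x2 drive the outcome, x3 and x4 are noise
sim <- tibble(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n),
  x4 = rnorm(n)
) %>%
  mutate(
    prob = plogis(1.5 * x1 - 1 * x2),
    y = factor(if_else(runif(n) < prob, "Y", "N"), levels = c("Y", "N"))
  ) %>%
  select(-prob)

lr_model <- logistic_reg() %>% set_engine("glm")

sim_fit <- workflow() %>%
  add_model(lr_model) %>%
  add_recipe(recipe(y ~ ., data = sim)) %>%
  fit(data = sim)

# vi() returns the importance scores as a tibble (Variable, Importance, Sign)
importance <- sim_fit %>%
  extract_fit_parsnip() %>%
  vi()

# Names of the two highest-scoring features
top_features <- importance %>%
  slice_max(Importance, n = 2) %>%
  pull(Variable)
```

The resulting character vector can then feed either of the selection approaches used in this deck.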

We can specify the features directly in the recipe using base R formula syntax.
# Create recipe
recipe_formula <-
  recipe(Loan_Status ~ Credit_History + Property_Area +
           LoanAmount, data = train)
# Bundle with model
workflow_formula <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(recipe_formula)
Alternatively, we can pass a vector of feature names to select the columns before training.
# Feature vector
features <- c("Credit_History", "Property_Area", "LoanAmount", "Loan_Status")
# Training and testing data
train_features <- train %>% select(all_of(features))
test_features <- test %>% select(all_of(features))
# Create recipe and bundle with model
recipe_features <- recipe(Loan_Status ~ ., data = train_features)
workflow_features <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(recipe_features)
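The two approaches can be bridged by building the formula from the feature vector. The sketch below uses base R's reformulate() on a hypothetical toy data frame (toy, with predictors a, b, z and outcome y, none of which come from the loan data) and checks that the explicit formula and the column-subset route yield the same coefficients.

```r
# Hypothetical toy data, for illustration only
set.seed(1)
toy <- data.frame(
  y = rbinom(200, 1, 0.5),
  a = rnorm(200),
  b = rnorm(200),
  z = rnorm(200)
)

# Predictor names only (the outcome is passed separately)
features <- c("a", "b")

# Build the formula y ~ a + b from the character vector
f <- reformulate(features, response = "y")

# Approach 1: explicit formula on the full data
fit_formula <- glm(f, data = toy, family = binomial)

# Approach 2: subset the columns first, then use y ~ .
fit_features <- glm(y ~ ., data = toy[, c(features, "y")], family = binomial)

# Both routes should estimate the same model
same_coefs <- all.equal(coef(fit_formula), coef(fit_features))
```

A formula built this way could presumably be passed to recipe() in place of a hand-written one, which keeps the feature list in a single place.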
Augmented objects for both approaches
lr_aug_formula <-
  workflow_formula %>%
  fit(data = train) %>%
  augment(new_data = test)
lr_aug_features <-
  workflow_features %>%
  fit(data = train_features) %>%
  augment(new_data = test_features)
Both methods give the same results
all_equal(lr_aug_features,
          lr_aug_formula %>%
            select(all_of(features),
                   starts_with(".pred")))
[1] TRUE
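Note that all_equal() was deprecated in dplyr 1.1.0, with base all.equal() suggested as the replacement once rows and columns are ordered by hand. A minimal sketch of that pattern, using two hypothetical toy tibbles (a and b) rather than the augmented loan predictions:

```r
library(dplyr)

# Hypothetical toy tibbles: same data, different column order
a <- tibble(id = 1:3, score = c(0.2, 0.5, 0.9))
b <- tibble(score = c(0.2, 0.5, 0.9), id = 1:3)

# Align the column order explicitly, then compare with base all.equal()
same <- all.equal(a, b %>% select(all_of(names(a))))

# all.equal() returns TRUE or a description of the differences,
# so wrap it in isTRUE() when a strict logical is needed
is_same <- isTRUE(same)
```

Unlike all_equal(), this comparison is sensitive to row order, so rows may need to be sorted with arrange() first when the two tables were built differently.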
Using all features
lr_fit_full <-   # Fit workflow
  lr_workflow_full %>%
  fit(data = train)
lr_aug_full <-   # Augment
  lr_fit_full %>%
  augment(test)
lr_aug_full %>%  # Evaluate
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_Y)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.842
2 roc_auc  binary         0.744
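class_evaluate() is not defined in this section. Given the two metrics in the output, it is presumably a yardstick metric set; the sketch below assumes it was created with metric_set(accuracy, roc_auc) and applies it to a hypothetical four-row prediction table (preds) instead of the loan data.

```r
library(yardstick)
library(tibble)

# Assumption: class_evaluate combines accuracy and ROC AUC
class_evaluate <- metric_set(accuracy, roc_auc)

# Hypothetical predictions; "Y" is the first factor level, so it is
# treated as the event of interest by default
preds <- tibble(
  truth = factor(c("Y", "Y", "N", "N"), levels = c("Y", "N")),
  .pred_Y = c(0.9, 0.4, 0.6, 0.1),
  .pred_class = factor(ifelse(.pred_Y >= 0.5, "Y", "N"), levels = c("Y", "N"))
)

# Class metrics use `estimate`; probability metrics use the unnamed column
results <- class_evaluate(
  preds,
  truth = truth,
  estimate = .pred_class,
  .pred_Y
)
```

Bundling both metrics in one set keeps the evaluation call identical across models, which is what makes the full and reduced models easy to compare.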
Using the top 3 features*
lr_fit_formula <-   # Fit workflow
  workflow_formula %>%
  fit(train)
lr_aug_formula <-   # Augment
  lr_fit_formula %>%
  augment(new_data = test)
lr_aug_formula %>%  # Evaluate
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_Y)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.842
2 roc_auc  binary         0.733