Reducing model features

Feature Engineering in R

Jorge Zazueta

Research Professor and Head of the Modeling Group at the School of Economics, UASLP

Why reduce the number of features

Removing irrelevant or uninformative variables can bring several benefits, including:

  • Lower variance without a large increase in bias
  • Better out-of-sample performance
  • Shorter computation time
  • Lower model complexity
  • Better interpretability

Filtering features with variable importance

Train a model with all features

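library(tidymodels)

# lr_model is assumed to be the logistic regression spec defined earlier,
# e.g. logistic_reg() %>% set_engine("glm")
lr_recipe_full <-
  recipe(Loan_Status ~ ., data = train) %>%
  # Keep Loan_ID in the data, but not as a predictor
  update_role(Loan_ID, new_role = "ID")

lr_workflow_full <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_full)

lr_fit_full <-
  lr_workflow_full %>%
  fit(data = train)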
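In the recipe, `update_role()` keeps `Loan_ID` in the data without using it as a predictor. As a rough base R analogue (an illustration only, not what tidymodels does internally), the sketch below assumes a hypothetical simulated data frame `loans` with an identifier column `loan_id`, and drops that column from the right-hand side of the formula with `. - loan_id`:

```r
# Hypothetical simulated loan data (not the course dataset)
set.seed(1)
n <- 200
loans <- data.frame(
  loan_id        = sprintf("LP%04d", seq_len(n)),  # identifier, not a predictor
  credit_history = rbinom(n, 1, 0.8),
  loan_amount    = rnorm(n, mean = 150, sd = 50)
)
eta <- 1 + 2.5 * loans$credit_history - 0.015 * loans$loan_amount
loans$status <- factor(ifelse(runif(n) < plogis(eta), "Y", "N"),
                       levels = c("N", "Y"))

# Use every column except the outcome and the ID as a predictor
fit_full <- glm(status ~ . - loan_id, data = loans, family = binomial)
print(coef(fit_full))
```

Only `credit_history` and `loan_amount` get coefficients; the ID stays in the data frame but never enters the model.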

Variable importance (vip) plot

library(vip)

lr_fit_full %>%
  extract_fit_parsnip() %>%  # pull out the underlying parsnip model fit
  vip(aesthetics = list(fill = "steelblue"))

Variable importance: bar chart of the importance of each variable.
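For a `glm` engine, the bars drawn by `vip()` are, as far as we understand its model-based method, the absolute values of each coefficient's test statistic (a z statistic for logistic regression). The base R sketch below reproduces that idea on hypothetical simulated data in which only `credit_history` and `loan_amount` affect the outcome, then keeps the top two predictors:

```r
# Hypothetical simulated loan data: noise1 and noise2 are irrelevant
set.seed(42)
n <- 500
loans <- data.frame(
  credit_history = rbinom(n, 1, 0.8),
  loan_amount    = rnorm(n, mean = 150, sd = 50),
  noise1         = rnorm(n),
  noise2         = rnorm(n)
)
eta <- 1 + 2.5 * loans$credit_history - 0.015 * loans$loan_amount
loans$status <- factor(ifelse(runif(n) < plogis(eta), "Y", "N"),
                       levels = c("N", "Y"))

# Full logistic regression (glm models the probability of the second level, "Y")
fit_full <- glm(status ~ ., data = loans, family = binomial)

# Importance = |z statistic| of each coefficient, intercept dropped
coefs <- summary(fit_full)$coefficients
importance <- sort(abs(coefs[-1, "z value"]), decreasing = TRUE)
print(round(importance, 2))

# Keep the top-k predictors for the reduced model
top_k <- names(importance)[1:2]
print(top_k)
```

With a factor predictor such as `Property_Area`, each dummy level would get its own bar, so the ranking is over coefficients rather than original columns.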

Building a reduced model with formula syntax

We can specify the features directly using base R formula syntax.

# Create recipe
recipe_formula <- 
  recipe(Loan_Status ~ Credit_History + Property_Area + 
           LoanAmount, data = train)

# Bundle with model
workflow_formula <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(recipe_formula)

Building a reduced model from a feature vector

Alternatively, pass a vector of feature names to select the columns before training.

# Feature vector (the outcome, Loan_Status, must be included)
features <- c("Credit_History", "Property_Area", "LoanAmount", "Loan_Status")

# Training and testing data
train_features <- train %>% select(all_of(features))
test_features <- test %>% select(all_of(features))

# Create recipe and bundle with model
recipe_features <- recipe(Loan_Status ~., data = train_features)
workflow_features <- workflow() %>% add_model(lr_model) %>%
  add_recipe(recipe_features) 
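Both reduced models start from the same list of names. In base R, `reformulate()` turns a character vector of predictor names into a model formula, which links the feature-vector and formula styles. The lowercase column names below are hypothetical stand-ins for the course variables:

```r
# Hypothetical predictor names, mirroring the three selected features
predictors <- c("credit_history", "property_area", "loan_amount")

# Builds the formula status ~ credit_history + property_area + loan_amount
f <- reformulate(predictors, response = "status")
print(f)
```

The resulting formula could then be passed to `glm()` or `recipe()` exactly like one typed by hand.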

Creating the augmented objects

Augmented objects for both approaches

lr_aug_formula <-
  workflow_formula %>%
  fit(data = train) %>%
  augment(new_data = test)
lr_aug_features <-
  workflow_features %>%
  fit(data = train_features) %>%
  augment(new_data = test_features)

Both methods give the same results

# dplyr::all_equal() ignores row and column order by default;
# it is deprecated as of dplyr 1.1.0
all_equal(lr_aug_features,
          lr_aug_formula %>%
            select(all_of(features), starts_with(".pred")))
[1] TRUE
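The same equivalence can be checked outside tidymodels. In this base R sketch on hypothetical simulated data, one `glm()` lists the predictors in the formula, while the other subsets the columns first and uses `status ~ .`. Both build the same design matrix with the same column order, so their test-set predictions match:

```r
# Hypothetical simulated loan data with one extra, unused column (income)
set.seed(7)
n <- 300
loans <- data.frame(
  credit_history = rbinom(n, 1, 0.8),
  property_area  = factor(sample(c("Rural", "Semiurban", "Urban"), n, replace = TRUE)),
  loan_amount    = rnorm(n, mean = 150, sd = 50),
  income         = rnorm(n, mean = 5000, sd = 1500)
)
eta <- 1 + 2.5 * loans$credit_history - 0.015 * loans$loan_amount
loans$status <- factor(ifelse(runif(n) < plogis(eta), "Y", "N"),
                       levels = c("N", "Y"))
train <- loans[1:200, ]
test  <- loans[201:300, ]

# Approach 1: list the predictors in the formula
fit_formula <- glm(status ~ credit_history + property_area + loan_amount,
                   data = train, family = binomial)

# Approach 2: subset the columns first, then use every remaining column
features <- c("credit_history", "property_area", "loan_amount", "status")
fit_features <- glm(status ~ ., data = train[, features], family = binomial)

# Predicted probabilities of "Y" on the test set
p_formula  <- predict(fit_formula,  newdata = test, type = "response")
p_features <- predict(fit_features, newdata = test[, features], type = "response")
print(all.equal(p_formula, p_features))  # [1] TRUE
```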

Comparing the full and reduced models

Using all features

lr_fit_full <- # Fit workflow
  lr_workflow_full %>%
  fit(data = train)
lr_aug_full <- # Augment
  lr_fit_full %>%
  augment(test)
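# class_evaluate is assumed to be a metric set defined earlier,
# e.g. metric_set(accuracy, roc_auc)
lr_aug_full %>% # Evaluate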
  class_evaluate(truth = Loan_Status, 
                 estimate = .pred_class,
                 .pred_Y)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.842
2 roc_auc  binary         0.744

Using the top 3 features*

lr_fit_formula <- # Fit workflow
  workflow_formula %>%
  fit(train)
lr_aug_formula <- # Augment
  lr_fit_formula %>%
  augment(new_data = test)
lr_aug_formula %>% # Evaluate
  class_evaluate(truth = Loan_Status, 
                 estimate = .pred_class,
                 .pred_Y)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.842
2 roc_auc  binary         0.733
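The comparison above reports accuracy and ROC AUC on the test set. As a base R sketch of those two metrics, assuming hypothetical simulated data in which only two of five predictors matter, accuracy uses a 0.5 threshold on the predicted probability of `"Y"` and AUC uses the rank-based (Mann-Whitney) formula:

```r
# Accuracy: share of rows whose predicted class (p > 0.5 -> "Y") matches the truth
base_accuracy <- function(truth, p_yes) {
  mean(ifelse(p_yes > 0.5, "Y", "N") == truth)
}

# ROC AUC via ranks: probability that a random "Y" row scores above a random "N" row
base_roc_auc <- function(truth, p_yes) {
  pos <- truth == "Y"
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(p_yes)  # average ranks handle ties
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical simulated loan data: noise1..noise3 are irrelevant
set.seed(123)
n <- 600
loans <- data.frame(
  credit_history = rbinom(n, 1, 0.8),
  loan_amount    = rnorm(n, mean = 150, sd = 50),
  noise1 = rnorm(n), noise2 = rnorm(n), noise3 = rnorm(n)
)
eta <- 1 + 2.5 * loans$credit_history - 0.015 * loans$loan_amount
loans$status <- factor(ifelse(runif(n) < plogis(eta), "Y", "N"),
                       levels = c("N", "Y"))
train <- loans[1:400, ]
test  <- loans[401:600, ]

fit_full    <- glm(status ~ ., data = train, family = binomial)
fit_reduced <- glm(status ~ credit_history + loan_amount,
                   data = train, family = binomial)

# One column per model, one row per metric
results <- sapply(list(full = fit_full, reduced = fit_reduced), function(fit) {
  p <- predict(fit, newdata = test, type = "response")
  c(accuracy = base_accuracy(test$status, p),
    roc_auc  = base_roc_auc(test$status, p))
})
print(round(results, 3))
```

`base_accuracy()` and `base_roc_auc()` are hypothetical helpers written for this sketch; in a tidymodels workflow the yardstick metrics would normally be used instead.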

Let's practice!
