Variable Importance

Feature Engineering in R

Jorge Zazueta

Research Professor. Head of the Modeling Group at the School of Economics, UASLP

Adding more predictors

A more complete model uses several predictors at once.

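library(tidymodels)
library(embed)  # provides step_lencode_glm()

lr_model <- logistic_reg()
# Replace each categorical predictor with a GLM-based likelihood encoding
lr_recipe <- 
  recipe(class ~ sponsor_code +
         contract_value_band +
         category_code, 
         data = grants_train) %>%
  step_lencode_glm(sponsor_code,
                   contract_value_band,
                   category_code, 
                   outcome = vars(class))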
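
The objects `lr_fit` and `lr_aug` used in the following slides are not built in this section. A minimal sketch of how they could be produced, assuming the usual tidymodels pattern of bundling the model and recipe into a workflow. The simulated `grants` tibble, the split, and the names `lr_workflow` and `grants_test` are hypothetical stand-ins so the example runs on its own; they are not the course data.

```r
library(tidymodels)
library(embed)  # provides step_lencode_glm()

set.seed(123)

# Hypothetical stand-in for the grants data (simulated, not the real data)
n <- 500
grants <- tibble(
  sponsor_code        = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)),
  contract_value_band = factor(sample(c("low", "mid", "high"), n, replace = TRUE)),
  category_code       = factor(sample(c("X", "Y"), n, replace = TRUE))
) %>%
  mutate(
    p = plogis(-0.5 + 1.2 * (sponsor_code == "A") - 0.8 * (category_code == "Y")),
    class = factor(if_else(runif(n) < p, "successful", "unsuccessful"),
                   levels = c("successful", "unsuccessful"))
  ) %>%
  select(-p)

grants_split <- initial_split(grants, strata = class)
grants_train <- training(grants_split)
grants_test  <- testing(grants_split)

lr_model <- logistic_reg()
lr_recipe <-
  recipe(class ~ sponsor_code + contract_value_band + category_code,
         data = grants_train) %>%
  step_lencode_glm(sponsor_code, contract_value_band, category_code,
                   outcome = vars(class))

# Bundle model and recipe, fit on the training set, then attach
# predictions (.pred_class, .pred_successful, ...) to the test set
lr_workflow <- workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe)

lr_fit <- lr_workflow %>% fit(data = grants_train)
lr_aug <- lr_fit %>% augment(new_data = grants_test)
```

Keeping the encoding step inside the workflow means it is estimated on the training data only and then applied unchanged to new data.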

With more appealing results.

lr_aug %>% class_evaluate(truth = class,
               estimate = .pred_class,
               .pred_successful)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.890
2 roc_auc  binary         0.951
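
`class_evaluate` is not defined in this section. The output above is consistent with a yardstick metric set that combines accuracy and ROC AUC, so a sketch under that assumption might look like the following. The `toy_aug` tibble is made up purely for illustration.

```r
library(tidymodels)

# Assumed definition: a metric set returning accuracy and ROC AUC together
class_evaluate <- metric_set(accuracy, roc_auc)

# Made-up predictions in the same shape that augment() returns
toy_aug <- tibble(
  class = factor(c("successful", "successful", "unsuccessful", "unsuccessful"),
                 levels = c("successful", "unsuccessful")),
  .pred_successful = c(0.9, 0.4, 0.3, 0.1),
  .pred_class = factor(
    ifelse(.pred_successful > 0.5, "successful", "unsuccessful"),
    levels = c("successful", "unsuccessful")
  )
)

# Class metrics use `estimate`; probability metrics use the unnamed column
toy_metrics <- toy_aug %>%
  class_evaluate(truth = class,
                 estimate = .pred_class,
                 .pred_successful)
toy_metrics
```

The unnamed `.pred_successful` argument is the predicted probability of the first factor level, which yardstick treats as the event of interest by default.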

Which variables matter most?

We can plot features ranked by importance with the vip() function from the vip package.

lr_fit %>%
  extract_fit_parsnip() %>%
  vip(aesthetics = 
      list(fill = "steelblue"))
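
To read the numbers behind the bars, the same package offers `vi()`, which returns the importance scores as a tibble. A small sketch on a plain `glm` fitted to the built-in `mtcars` data, used here only as a stand-in since the grants model is not available in this section; the names `toy_fit` and `scores` are hypothetical.

```r
library(vip)

# Stand-in logistic regression on a built-in dataset (not the grants data)
toy_fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# One row per model term, with its model-based importance score
scores <- vi(toy_fit)
scores
```

For a fitted workflow, the same call should work on the result of `extract_fit_parsnip()`, as in the plotting code above.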

[Figure: variable importance bar chart.]


Variable importance and feature engineering

Variable importance can be a powerful feedback mechanism for refining feature engineering based on domain knowledge.

Variable importance and feature engineering workflow.


Let's practice!
