Feature Engineering in R
Jorge Zazueta
Research Professor. Head of the Modeling Group at the School of Economics, UASLP
A more complete model includes many variables.
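library(tidymodels)
library(embed)  # provides step_lencode_glm()

lr_model <- logistic_reg()

# Replace each categorical predictor with a numeric likelihood encoding
lr_recipe <-
  recipe(class ~ sponsor_code +
           contract_value_band +
           category_code,
         data = grants_train) %>%
  step_lencode_glm(sponsor_code,
                   contract_value_band,
                   category_code,
                   outcome = vars(class))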
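As a rough illustration of what a GLM-based likelihood encoding computes, the sketch below uses only base R on a small made-up data frame (the `toy` data and its column names are hypothetical, not part of the grants data). It fits a no-intercept logistic regression so that each level gets one coefficient, which equals that level's log-odds of success. step_lencode_glm() may differ in details such as sign convention and the handling of unseen levels, so treat this as a sketch of the idea rather than the package's implementation.

```r
# Hypothetical toy data: one categorical predictor and a binary outcome.
toy <- data.frame(
  sponsor = factor(rep(c("A", "B", "C"), each = 4)),
  success = c(1, 1, 1, 0,   # A: 3 of 4 successful
              1, 0, 0, 0,   # B: 1 of 4 successful
              1, 1, 0, 0)   # C: 2 of 4 successful
)

# No-intercept logistic regression: one coefficient per factor level.
fit <- glm(success ~ 0 + sponsor, data = toy, family = binomial)
glm_encoding <- as.numeric(coef(fit))

# The same numbers by hand: log-odds of the observed success rate per level.
rate <- tapply(toy$success, toy$sponsor, mean)
manual_encoding <- as.numeric(log(rate / (1 - rate)))

encoding <- data.frame(
  level  = levels(toy$sponsor),
  glm    = glm_encoding,
  manual = manual_encoding
)
print(encoding)
```

Each level is replaced by a single number, so a predictor with many levels contributes one column to the model instead of one dummy column per level.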
With more appealing results.
lr_aug %>% class_evaluate(truth = class,
                          estimate = .pred_class,
                          .pred_successful)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.890
2 roc_auc  binary         0.951
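To make the two reported metrics concrete, here is a minimal base R sketch of how accuracy and ROC AUC can be computed by hand. The vectors `truth` and `prob` are made-up values, not the grants predictions, and the 0.5 cutoff is an assumption. The code does not reproduce `class_evaluate`, which appears to be a metric set defined elsewhere in the course.

```r
# Hypothetical labels (1 = successful) and predicted probabilities of success.
truth <- c(1, 1, 1, 0, 0, 0, 1, 0)
prob  <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1)

# Accuracy: share of hard class predictions (cutoff 0.5) that match the truth.
pred_class <- as.numeric(prob >= 0.5)
accuracy <- mean(pred_class == truth)

# ROC AUC via the rank formula: the probability that a randomly chosen
# positive case receives a higher score than a randomly chosen negative case.
n_pos <- sum(truth == 1)
n_neg <- sum(truth == 0)
ranks <- rank(prob)
auc <- (sum(ranks[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(c(accuracy = accuracy, roc_auc = auc))
```

Accuracy depends on the chosen cutoff, while ROC AUC uses only the ordering of the predicted probabilities, which is why the two metrics can disagree.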
We can plot features ranked by importance with help from the vip package.
lr_fit %>%
  extract_fit_parsnip() %>%
  vip(aesthetics =
        list(fill = "steelblue"))
Variable importance chart
Variable importance can be a powerful feedback mechanism for refining feature engineering based on domain knowledge.
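The ranking idea can be sketched in base R without vip. The example below assumes that importance for a logistic regression is scored by the absolute value of each coefficient's z-statistic, which we take to be how model-based importance is commonly computed for a GLM; vip's exact method may differ. The simulated data and the names `sim`, `x1`, `x2`, and `x3` are hypothetical.

```r
set.seed(42)

# Simulated data: x1 has a strong effect, x2 a weak one, x3 none.
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(1.5 * sim$x1 + 0.3 * sim$x2))

fit <- glm(y ~ x1 + x2 + x3, data = sim, family = binomial)

# Assumed importance score: absolute z-statistic of each predictor.
z_values <- summary(fit)$coefficients[, "z value"]
z_values <- z_values[names(z_values) != "(Intercept)"]
importance <- sort(abs(z_values), decreasing = TRUE)

print(importance)
```

A predictor that ranks low here is a candidate for re-encoding, combining with other features, or dropping, which is where domain knowledge comes in.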