Feature engineering in R
Jorge Zazueta
Research Professor. Head of the Modeling Group at the School of Economics, UASLP
A more complete model includes more variables.
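library(tidymodels)
library(embed)  # provides step_lencode_glm()

lr_model <- logistic_reg()

# Encode each categorical predictor as a single numeric column
lr_recipe <-
  recipe(class ~ sponsor_code +
           contract_value_band +
           category_code,
         data = grants_train) %>%
  step_lencode_glm(sponsor_code,
                   contract_value_band,
                   category_code,
                   outcome = vars(class))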
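To make the encoding step less of a black box, here is a minimal base R sketch of the idea behind GLM-based likelihood encoding: fit a logistic regression with no intercept on one categorical predictor, then replace each level with its coefficient (the log-odds of the outcome for that level). The `toy` data frame and its column names are hypothetical, and the actual implementation in embed may differ in details such as sign convention and the handling of unseen levels.

```r
# Toy data (hypothetical): one categorical predictor and a binary outcome.
toy <- data.frame(
  sponsor = factor(rep(c("A", "B", "C"), each = 4)),
  class = factor(
    c("yes", "yes", "yes", "no",   # A: 3 of 4 "yes"
      "yes", "no",  "no",  "no",   # B: 1 of 4 "yes"
      "yes", "yes", "no",  "no"),  # C: 2 of 4 "yes"
    levels = c("no", "yes")
  )
)

# No-intercept logistic regression: one coefficient per level,
# equal to that level's log-odds of class == "yes".
fit <- glm(class ~ 0 + sponsor, data = toy, family = binomial)

# Strip the "sponsor" prefix so the names match the factor levels.
encoding <- coef(fit)
names(encoding) <- sub("^sponsor", "", names(encoding))

# Replace each level with its numeric encoding.
toy$sponsor_encoded <- unname(encoding[as.character(toy$sponsor)])

print(round(encoding, 3))
```

In this toy data, level A maps to log(3), level B to -log(3), and level C to 0, so a many-level factor collapses into a single numeric column that carries information about the outcome.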
With more convincing results.
lr_aug %>%
  class_evaluate(truth = class,
                 estimate = .pred_class,
                 .pred_successful)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.890
2 roc_auc  binary         0.951
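The `class_evaluate()` call above is not defined on this slide. A plausible reading, given the two rows of output, is that it is a yardstick metric set combining accuracy and ROC AUC. The sketch below makes that assumption and runs on yardstick's built-in `two_class_example` data rather than on the grants data.

```r
library(yardstick)

# Assumption: class_evaluate bundles accuracy and ROC AUC into one function.
class_evaluate <- metric_set(accuracy, roc_auc)

# Example data shipped with yardstick: columns truth, Class1, Class2, predicted.
data("two_class_example", package = "yardstick")

# Class metrics use `estimate`; probability metrics use the unnamed
# column holding the predicted probability of the event level.
res <- class_evaluate(two_class_example,
                      truth = truth,
                      estimate = predicted,
                      Class1)
print(res)
```

The result has the same shape as the tibble above: one row per metric, with `.metric`, `.estimator`, and `.estimate` columns.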
We can rank features by importance and plot them with the vip() function from the vip package.
lr_fit %>%
  extract_fit_parsnip() %>%
  vip(aesthetics =
        list(fill = "steelblue"))
Variable importance plot
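For intuition about what such a ranking can be based on, here is a minimal base R sketch. It assumes that importance for a logistic regression is measured by the absolute value of each coefficient's test statistic, which to my understanding is how vip scores GLM fits by default. The simulated data and the names `x1`, `x2`, `x3` are hypothetical.

```r
set.seed(1)

# Simulated data (hypothetical): x1 has a strong effect, x2 a weak one, x3 none.
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(1.5 * sim$x1 + 0.5 * sim$x2))

fit <- glm(y ~ x1 + x2 + x3, data = sim, family = binomial)

# Importance score: absolute z statistic of each predictor (intercept dropped).
z <- summary(fit)$coefficients[, "z value"]
importance <- sort(abs(z[names(z) != "(Intercept)"]), decreasing = TRUE)

print(importance)
```

Passing `rev(importance)` to `barplot()` with `horiz = TRUE` would draw a chart comparable to the one vip produces, with the most important feature at the top.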

Variable importance is powerful feedback for refining feature engineering with domain knowledge.
