Feature Engineering in R
Jorge Zazueta
Research Professor and Head of the Modeling Group at the School of Economics, UASLP
Rows: 480
Columns: 6
$ Loan_Status <fct> N, Y, Y, Y, Y, Y, N, Y, N, Y, Y, N, Y, Y, N, N, ...
$ ApplicantIncome <dbl> 4583, 3000, 2583, 6000, 5417, 2333, 3036, 4006, ...
$ CoapplicantIncome <dbl> 1508, 0, 2358, 0, 4196, 1516, 2504, 1526, 10968,...
$ LoanAmount <dbl> 128, 66, 120, 141, 267, 95, 158, 168, 349, 70, 2...
$ Loan_Amount_Term <dbl> 360, 360, 360, 360, 360, 360, 360, 360, 360, 360...
$ Credit_History <fct> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, ...
Configure the recipe and set the workflow
lr_recipe_plain <-
  recipe(Loan_Status ~ ., data = train)
lr_workflow_plain <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_plain)
Fit and assess the workflow
lr_fit_plain <-
  lr_workflow_plain %>% fit(train)
lr_aug_plain <-
  lr_fit_plain %>% augment(test)
lr_aug_plain %>%
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_N)
Plain recipe results.
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.75
2 roc_auc binary 0.595
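The class_evaluate() helper used above is not defined on these slides. A minimal sketch, assuming it is a yardstick metric set that combines accuracy and ROC AUC (consistent with the two rows in the results), could look like this. The toy data frame is hypothetical and only illustrates the call pattern.

```r
library(yardstick)

# Assumption: class_evaluate is a metric set of accuracy and roc_auc.
# The slides do not show its definition.
class_evaluate <- metric_set(accuracy, roc_auc)

# Hypothetical toy predictions shaped like the output of augment()
toy <- data.frame(
  Loan_Status = factor(c("N", "Y", "Y", "N"), levels = c("N", "Y")),
  .pred_class = factor(c("N", "Y", "N", "N"), levels = c("N", "Y")),
  .pred_N     = c(0.9, 0.2, 0.6, 0.7)
)

# Class metrics use `estimate`; probability metrics use the unnamed column
toy_metrics <- class_evaluate(toy,
                              truth = Loan_Status,
                              estimate = .pred_class,
                              .pred_N)
toy_metrics
```

With a mixed metric set, the class prediction must be passed as a named estimate argument, while the class-probability column is passed unnamed, which matches the call on the slide.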
step_poly() applies a polynomial expansion to one or more variables and passes the expanded terms to our model.
lr_recipe_poly <-
  recipe(Loan_Status ~ ., data = train) %>%
  step_poly(all_numeric_predictors())
lr_workflow_poly <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_poly)
Results with step_poly()
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.75
2 roc_auc binary 0.703
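To see what the polynomial expansion does to the data itself, the following sketch preps and bakes a recipe on a small hypothetical data frame (the names toy, y and x are illustrative, not from the loan data). It assumes the default settings of step_poly(), which replace each selected variable with orthogonal polynomial terms of degree 2.

```r
library(recipes)

# Hypothetical toy data: one numeric predictor and a binary outcome
toy <- data.frame(
  y = factor(c("N", "Y", "N", "Y", "Y", "N")),
  x = c(1, 2, 3, 4, 5, 6)
)

poly_rec <-
  recipe(y ~ x, data = toy) %>%
  step_poly(all_numeric_predictors())

# prep() estimates the expansion on the training data;
# bake(new_data = NULL) returns the transformed training set
poly_baked <- poly_rec %>% prep() %>% bake(new_data = NULL)

# x is replaced by the columns x_poly_1 and x_poly_2
names(poly_baked)
```

Because the new columns are orthogonal, the linear and quadratic terms are uncorrelated, which tends to be friendlier to a logistic regression than raw powers of the same variable.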
step_percentile() estimates the empirical distribution of a variable from the training set and converts all values to percentiles.
lr_recipe_perc <-
  recipe(Loan_Status ~ ., data = train) %>%
  step_percentile(all_numeric_predictors())
lr_workflow_perc <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_perc)
Results with step_percentile()
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.769
2 roc_auc binary 0.677
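The effect of the percentile transformation is easiest to see on a skewed variable. The sketch below uses a small hypothetical data frame (toy, y and x are illustrative names) with one extreme value and assumes a version of recipes that provides step_percentile().

```r
library(recipes)

# Hypothetical toy data with one extreme value in x
toy <- data.frame(
  y = factor(c("N", "Y", "N", "Y", "Y")),
  x = c(10, 20, 30, 40, 1000)
)

perc_rec <-
  recipe(y ~ x, data = toy) %>%
  step_percentile(all_numeric_predictors())

# The percentiles are learned from the training data in prep()
perc_baked <- perc_rec %>% prep() %>% bake(new_data = NULL)

# x keeps its name but now holds percentiles on a 0 to 1 scale
perc_baked$x
```

The ordering of the values is preserved, but the extreme observation no longer sits far from the rest, so a single outlier has much less leverage on the fitted model.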