Feature Engineering in R
Jorge Zazueta
Research Professor and Head of the Modeling Group at the School of Economics, UASLP
Box-Cox
Yeo-Johnson
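For reference, the standard definitions of the two transformations are given below. In both, λ is a power parameter that the corresponding recipe step estimates from the training data. Box-Cox is defined only for strictly positive y. Yeo-Johnson extends the same idea to zero and negative values by applying a shifted power to y ≥ 0 and a mirrored one to y < 0.

```latex
% Box-Cox (requires y > 0)
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt]
\ln y, & \lambda = 0
\end{cases}

% Yeo-Johnson (defined for any real y)
y^{(\lambda)} =
\begin{cases}
\dfrac{(y + 1)^{\lambda} - 1}{\lambda}, & y \ge 0,\ \lambda \neq 0 \\[4pt]
\ln(y + 1), & y \ge 0,\ \lambda = 0 \\[4pt]
-\dfrac{(1 - y)^{2 - \lambda} - 1}{2 - \lambda}, & y < 0,\ \lambda \neq 2 \\[4pt]
-\ln(1 - y), & y < 0,\ \lambda = 2
\end{cases}
```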
glimpse(loans_num)
Rows: 480
Columns: 6
$ Loan_Status <fct> N, Y, Y, Y, Y, Y, N, Y, N, Y, Y, N, Y, Y, N...
$ ApplicantIncome <dbl> 4583, 3000, 2583, 6000, 5417, 2333, 3036, 4...
$ CoapplicantIncome <dbl> 1508, 0, 2358, 0, 4196, 1516, 2504, 1526, 1...
$ LoanAmount <dbl> 128, 66, 120, 141, 267, 95, 158, 168, 349, ...
$ Loan_Amount_Term <dbl> 360, 360, 360, 360, 360, 360, 360, 360, 360...
$ Credit_History <fct> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0...
Plain recipe
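lr_recipe_plain <- # Define recipe
  recipe(Loan_Status ~ ., data = train)
lr_workflow_plain <- # Bundle model and recipe into a workflow
  workflow() %>%
  add_model(lr_model) %>% # lr_model: model spec defined earlier (not shown)
  add_recipe(lr_recipe_plain)
lr_fit_plain <- # Fit
  lr_workflow_plain %>%
  fit(train)
lr_aug_plain <- # Augment; assumes a held-out `test` set split alongside `train`
  lr_fit_plain %>%
  augment(test)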
Assess performance
lr_aug_plain %>% # Assess
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_N)
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.817
2 roc_auc binary 0.641
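`class_evaluate` is not defined in this excerpt; judging by its output, it is presumably a yardstick metric set along the lines of `metric_set(accuracy, roc_auc)`. The base-R sketch below shows what those two numbers measure, using made-up toy predictions. It assumes, as yardstick does by default, that the first factor level (`N`) is the event class, which is why `.pred_N` is the probability column passed in. `accuracy_sketch()` and `roc_auc_sketch()` are hypothetical helpers, not yardstick functions.

```r
# Accuracy: share of cases whose predicted class matches the truth.
accuracy_sketch <- function(truth, estimate) {
  mean(truth == estimate)
}

# ROC AUC via the rank (Mann-Whitney) formulation: the probability that a
# randomly chosen event case gets a higher score than a randomly chosen
# non-event case. Assumes the event is the first factor level by default.
roc_auc_sketch <- function(truth, prob_event, event = levels(truth)[1]) {
  is_event <- truth == event
  r <- rank(prob_event)                # ties receive average ranks
  n_pos <- sum(is_event)
  n_neg <- sum(!is_event)
  (sum(r[is_event]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up toy predictions, for illustration only
truth  <- factor(c("N", "Y", "Y", "N", "Y", "Y"), levels = c("N", "Y"))
pred_N <- c(0.70, 0.20, 0.40, 0.30, 0.10, 0.35)
pred_class <- factor(ifelse(pred_N > 0.5, "N", "Y"), levels = c("N", "Y"))

accuracy_sketch(truth, pred_class)  # 5 of 6 correct: 0.8333
roc_auc_sketch(truth, pred_N)       # 6 of 8 event/non-event pairs ranked correctly: 0.75
```

Accuracy depends only on the hard class labels, while ROC AUC depends on how the probabilities rank the cases. Two recipes can therefore share the same accuracy yet differ in ROC AUC.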
Box-Cox recipe
lr_recipe_BC <- # Define recipe
  recipe(Loan_Status ~ ., data = train) %>%
  step_BoxCox(all_numeric())
lr_workflow_BC <- # Bundle model and recipe into a workflow
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_BC)
lr_fit_BC <- # Fit (this raises the warnings below)
  lr_workflow_BC %>%
  fit(train)
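When the recipe is fit, `step_BoxCox()` must choose a power λ for each selected column. A common approach is maximum likelihood under the assumption that the transformed values are normal. The base-R sketch below illustrates that idea on simulated right-skewed data. It is an assumption that this matches the step's internals only in spirit: `box_cox()`, `box_cox_loglik()` and `estimate_lambda()` are hypothetical helpers, and the search interval [-5, 5] is chosen to match what appears to be the step's default `limits`. Note the `log(y)` term in the likelihood, which is why every value must be strictly positive.

```r
# Box-Cox transform for a single lambda (y must be > 0)
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda, up to a constant, assuming the
# transformed values are normally distributed. The second term is the
# log-Jacobian of the transformation, which requires y > 0.
box_cox_loglik <- function(lambda, y) {
  n <- length(y)
  z <- box_cox(y, lambda)
  sigma2 <- mean((z - mean(z))^2)      # MLE of the variance
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(y))
}

# Maximize the log-likelihood over a bounded interval
estimate_lambda <- function(y, interval = c(-5, 5)) {
  optimize(box_cox_loglik, interval = interval, y = y, maximum = TRUE)$maximum
}

set.seed(42)
y <- rlnorm(500, meanlog = 0, sdlog = 1)  # simulated right-skewed data
lambda_hat <- estimate_lambda(y)
lambda_hat  # log-normal data, so the estimate should be close to 0 (a log transform)
```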
Warning Message
Box-Cox cannot process non-positive values, and CoapplicantIncome contains zeros.
Warning messages:
1: Non-positive values in selected variable.
2: No Box-Cox transformation could be estimated for: `CoapplicantIncome`
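Rather than discovering problem columns through warnings, you can check for non-positive values before building the recipe. The sketch below uses a hypothetical helper, `non_positive_cols()`, on a small toy data frame built from the first values that `glimpse()` shows for `loans_num`. Its result could then be excluded from `step_BoxCox()` (for example with tidyselect's `all_of()`) instead of hard-coding the column name.

```r
# Hypothetical helper: names of numeric columns that contain values <= 0
# and therefore cannot be Box-Cox transformed.
non_positive_cols <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]  # keep numeric columns only
  names(num)[vapply(num, function(x) any(x <= 0, na.rm = TRUE), logical(1))]
}

# Toy data: first four rows of the columns shown by glimpse(loans_num)
toy <- data.frame(
  Loan_Status       = factor(c("N", "Y", "Y", "Y")),
  ApplicantIncome   = c(4583, 3000, 2583, 6000),
  CoapplicantIncome = c(1508, 0, 2358, 0),
  LoanAmount        = c(128, 66, 120, 141),
  Loan_Amount_Term  = c(360, 360, 360, 360),
  Credit_History    = factor(c(1, 1, 1, 1))
)

non_positive_cols(toy)  # "CoapplicantIncome"

# log(0) is -Inf, so log-based quantities such as the geometric mean collapse
exp(mean(log(toy$CoapplicantIncome)))  # 0
```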
Box-Cox recipe (take two)
Now, let's deselect CoapplicantIncome to avoid the warning.
lr_recipe_BC <- # Define recipe
  recipe(Loan_Status ~ ., data = train) %>%
  step_BoxCox(all_numeric(),
              -CoapplicantIncome)
lr_workflow_BC <- # Bundle model and recipe into a workflow
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_BC)
lr_fit_BC <- # Fit
  lr_workflow_BC %>%
  fit(train)
lr_aug_BC <- # Augment; assumes the same held-out `test` set
  lr_fit_BC %>%
  augment(test)
Assess performance
lr_aug_BC %>% # Assess
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_N)
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.817
2 roc_auc binary 0.599
Yeo-Johnson recipe
lr_recipe_YJ <- # Define recipe
  recipe(Loan_Status ~ ., data = train) %>%
  step_YeoJohnson(all_numeric())
lr_workflow_YJ <- # Bundle model and recipe into a workflow
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_YJ)
lr_fit_YJ <- # Fit
  lr_workflow_YJ %>%
  fit(train)
lr_aug_YJ <- # Augment; assumes the same held-out `test` set
  lr_fit_YJ %>%
  augment(test)
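Unlike Box-Cox, Yeo-Johnson is defined for zero and negative values, so this recipe can keep CoapplicantIncome. The base-R sketch below implements the transformation for a fixed λ; `yeo_johnson()` is a hypothetical helper, and `step_YeoJohnson()` additionally estimates λ from the training data, which is not shown here.

```r
# Yeo-Johnson transform for a fixed lambda; works for any real y
yeo_johnson <- function(y, lambda, eps = 1e-8) {
  out <- numeric(length(y))
  pos <- y >= 0
  # Non-negative values: Box-Cox-style power applied to y + 1
  if (abs(lambda) < eps) {
    out[pos] <- log1p(y[pos])
  } else {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  }
  # Negative values: mirrored transform with power 2 - lambda
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -log1p(-y[!pos])
  } else {
    out[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

coapplicant <- c(1508, 0, 2358, 0)    # first CoapplicantIncome values from glimpse()
yeo_johnson(coapplicant, lambda = 0)  # equals log1p(coapplicant); zeros map to 0
```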
Assess performance
lr_aug_YJ %>% # Assess
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_N)
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.817
2 roc_auc binary 0.700
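To compare the three recipes side by side, the reported metrics can be collected in a small data frame. Accuracy is 0.817 for all three, so only ROC AUC separates them. The sketch simply re-enters the numbers printed above and computes each recipe's ROC AUC change relative to the plain recipe.

```r
# Metrics as printed by the three assessments above
results <- data.frame(
  recipe   = c("plain", "Box-Cox", "Yeo-Johnson"),
  accuracy = c(0.817, 0.817, 0.817),
  roc_auc  = c(0.641, 0.599, 0.700)
)

# ROC AUC change relative to the plain recipe
results$auc_change <- results$roc_auc - results$roc_auc[results$recipe == "plain"]

results[order(-results$roc_auc), ]  # Yeo-Johnson +0.059, Box-Cox -0.042
```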