Feature Engineering in R
Jorge Zazueta
Research Professor and Head of the Modeling Group at the School of Economics, UASLP
Box-Cox
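The standard Box-Cox transformation is defined only for strictly positive values y. It has a power parameter λ, which step_BoxCox() estimates separately for each selected column from the training data:

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
  \ln y,                            & \lambda = 0
\end{cases}
\qquad y > 0
```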

Yeo-Johnson
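Yeo-Johnson extends Box-Cox to zero and negative values by handling the two signs separately. step_YeoJohnson() likewise estimates one λ per column:

```latex
\psi(y, \lambda) =
\begin{cases}
  \dfrac{(y + 1)^{\lambda} - 1}{\lambda},                  & y \ge 0,\ \lambda \neq 0 \\[6pt]
  \ln(y + 1),                                             & y \ge 0,\ \lambda = 0 \\[6pt]
  -\dfrac{(1 - y)^{2 - \lambda} - 1}{2 - \lambda},        & y < 0,\ \lambda \neq 2 \\[6pt]
  -\ln(1 - y),                                            & y < 0,\ \lambda = 2
\end{cases}
```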

glimpse(loans_num)
Rows: 480
Columns: 6
$ Loan_Status <fct> N, Y, Y, Y, Y, Y, N, Y, N, Y, Y, N, Y, Y, N...
$ ApplicantIncome <dbl> 4583, 3000, 2583, 6000, 5417, 2333, 3036, 4...
$ CoapplicantIncome <dbl> 1508, 0, 2358, 0, 4196, 1516, 2504, 1526, 1...
$ LoanAmount <dbl> 128, 66, 120, 141, 267, 95, 158, 168, 349, ...
$ Loan_Amount_Term <dbl> 360, 360, 360, 360, 360, 360, 360, 360, 360...
$ Credit_History <fct> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0...
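Box-Cox needs strictly positive inputs, so it helps to check which numeric columns contain zeros or negatives before adding the step. The sketch below uses only the first six values of three columns copied from the glimpse() output above. The loans_head name is hypothetical.

```r
# First six values of three numeric columns, copied from the glimpse() output
loans_head <- data.frame(
  ApplicantIncome   = c(4583, 3000, 2583, 6000, 5417, 2333),
  CoapplicantIncome = c(1508, 0, 2358, 0, 4196, 1516),
  LoanAmount        = c(128, 66, 120, 141, 267, 95)
)

# Count values <= 0 in each column; any column with a positive count
# cannot be Box-Cox transformed as is
non_positive <- sapply(loans_head, function(x) sum(x <= 0, na.rm = TRUE))
non_positive
names(non_positive)[non_positive > 0]
```

On the full data, the same check could be run as `sapply(Filter(is.numeric, loans_num), function(x) sum(x <= 0, na.rm = TRUE))`.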
Plain recipe
lr_recipe_plain <- # Define recipe
  recipe(Loan_Status ~ ., data = train)
lr_workflow_plain <- # Assemble workflow
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_plain)
lr_fit_plain <- # Fit
  lr_workflow_plain %>%
  fit(train)
lr_aug_plain <- # Augment the test set with predictions
  lr_fit_plain %>%
  augment(test)
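These slides use lr_model, train and test without defining them. The sketch below makes the plain workflow runnable end to end under some assumptions: lr_model is a glm-based logistic regression, and train/test come from an rsample split. The simulated loans_sim data and the loan_split name are hypothetical.

```r
library(tidymodels)  # attaches recipes, workflows, parsnip, rsample, yardstick, dplyr

set.seed(42)
n <- 200
# Hypothetical stand-in for the loans data: skewed incomes, many zero co-incomes
loans_sim <- tibble(
  ApplicantIncome   = rlnorm(n, meanlog = 8, sdlog = 0.5),
  CoapplicantIncome = ifelse(runif(n) < 0.4, 0, rlnorm(n, meanlog = 7, sdlog = 0.8)),
  Loan_Status       = factor(sample(c("N", "Y"), n, replace = TRUE), levels = c("N", "Y"))
)

# Assumed train/test split (75/25, stratified on the outcome)
loan_split <- initial_split(loans_sim, prop = 0.75, strata = Loan_Status)
train <- training(loan_split)
test  <- testing(loan_split)

# Assumed model specification: plain logistic regression via glm
lr_model <- logistic_reg() %>% set_engine("glm")

lr_fit_plain <- workflow() %>%
  add_model(lr_model) %>%
  add_recipe(recipe(Loan_Status ~ ., data = train)) %>%
  fit(train)

# augment() appends .pred_class, .pred_N and .pred_Y to the test rows
lr_aug_plain <- augment(lr_fit_plain, test)
```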
Evaluate performance
lr_aug_plain %>% # Evaluate
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_N)
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.817
2 roc_auc binary 0.641
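The slides do not define class_evaluate(). Its output suggests a yardstick metric set that combines accuracy and ROC AUC. The sketch below assumes that definition and applies it to a hand-made toy tibble. Passing .pred_N as the probability column works because yardstick treats the first factor level (here N) as the event by default.

```r
library(tibble)
library(yardstick)

# Assumed definition: one call returns both accuracy and ROC AUC
class_evaluate <- metric_set(accuracy, roc_auc)

lv <- c("N", "Y")
toy_aug <- tibble(
  Loan_Status = factor(c("N", "Y", "Y", "N", "Y"), levels = lv),  # truth
  .pred_class = factor(c("N", "Y", "N", "N", "Y"), levels = lv),  # hard prediction
  .pred_N     = c(0.8, 0.2, 0.6, 0.7, 0.1)                        # P(Loan_Status == "N")
)

toy_metrics <- toy_aug %>%
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_N)
toy_metrics
```

Here 4 of the 5 hard predictions are correct. Every N row also has a higher `.pred_N` than every Y row, so the accuracy is 0.8 and the ROC AUC is 1.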
Box-Cox recipe
lr_recipe_BC <- # Define recipe
  recipe(Loan_Status ~ ., data = train) %>%
  step_BoxCox(all_numeric())
lr_workflow_BC <- # Assemble workflow
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_BC)
lr_fit_BC <- # Fit
  lr_workflow_BC %>%
  fit(train)
Warning message
Box-Cox does not handle non-positive values.
Warning messages:
1: Non-positive values in selected variable.
2: No Box-Cox transformation could be estimated for: `CoapplicantIncome`
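To see why CoapplicantIncome (which contains zeros) breaks Box-Cox but not Yeo-Johnson, here is a minimal base-R sketch of both transformations for a fixed λ. The helpers box_cox() and yeo_johnson() are hypothetical and not part of recipes. They apply the formulas directly and do not estimate λ.

```r
# Box-Cox for a fixed lambda: defined only for strictly positive y
box_cox <- function(y, lambda) {
  if (any(y <= 0)) stop("Box-Cox is undefined for non-positive values")
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Yeo-Johnson for a fixed lambda: separate branches for y >= 0 and y < 0
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  out[pos] <- if (abs(lambda) < 1e-8) {
    log1p(y[pos])                                   # ln(y + 1)
  } else {
    ((y[pos] + 1)^lambda - 1) / lambda
  }
  out[!pos] <- if (abs(lambda - 2) < 1e-8) {
    -log1p(-y[!pos])                                # -ln(1 - y)
  } else {
    -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

box_cox(c(1, 4), lambda = 0.5)            # positive inputs: fine
yeo_johnson(c(0, 3, -3), lambda = 0.5)    # zero and negative inputs: fine
try(box_cox(c(0, 1508), lambda = 0.5))    # a zero income: error
```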
Box-Cox recipe (revisited)
Now deselect CoapplicantIncome to avoid the warning.
lr_recipe_BC <- # Define recipe
  recipe(Loan_Status ~ ., data = train) %>%
  step_BoxCox(all_numeric(),
              -CoapplicantIncome)
lr_workflow_BC <- # Assemble workflow
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_BC)
lr_fit_BC <- # Fit
  lr_workflow_BC %>%
  fit(train)
lr_aug_BC <- # Augment the test set with predictions
  lr_fit_BC %>%
  augment(test)
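After prepping a Box-Cox step, it can help to check which λ was estimated for each column. The sketch below uses tidy() on a prepped recipe, with simulated strictly positive data. The df_pos columns income and amount are hypothetical.

```r
library(recipes)

set.seed(1)
# Hypothetical strictly positive, right-skewed columns
df_pos <- data.frame(
  income = rlnorm(100, meanlog = 8, sdlog = 0.6),
  amount = rlnorm(100, meanlog = 5, sdlog = 0.4)
)

bc_prepped <- recipe(~ ., data = df_pos) %>%
  step_BoxCox(all_numeric()) %>%
  prep()

# One row per transformed column; `value` holds the estimated lambda
bc_lambdas <- tidy(bc_prepped, number = 1)
bc_lambdas
```

For the fitted workflow above, the same table should be available via `lr_fit_BC %>% extract_recipe() %>% tidy(number = 1)`. Because the data here are log-normal, the estimated λ values should land near 0, which corresponds to a log transform.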
Evaluate performance
lr_aug_BC %>% # Evaluate
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_N)
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.817
2 roc_auc binary 0.599
Yeo-Johnson recipe
lr_recipe_YJ <- # Define recipe
  recipe(Loan_Status ~ ., data = train) %>%
  step_YeoJohnson(all_numeric())
lr_workflow_YJ <- # Assemble workflow
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_YJ)
lr_fit_YJ <- # Fit
  lr_workflow_YJ %>%
  fit(train)
lr_aug_YJ <- # Augment the test set with predictions
  lr_fit_YJ %>%
  augment(test)
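Unlike Box-Cox, Yeo-Johnson accepts zeros and negative values. That is why step_YeoJohnson() can keep CoapplicantIncome in its selection. The sketch below preps a Yeo-Johnson step on simulated data that includes a column with many exact zeros, then lists the estimated λ per column. The df_zero columns income and coincome are hypothetical.

```r
library(recipes)

set.seed(2)
# Hypothetical columns: one strictly positive, one with many exact zeros
df_zero <- data.frame(
  income   = rlnorm(100, meanlog = 8, sdlog = 0.6),
  coincome = c(rep(0, 40), rlnorm(60, meanlog = 7, sdlog = 0.8))
)

yj_prepped <- recipe(~ ., data = df_zero) %>%
  step_YeoJohnson(all_numeric()) %>%
  prep()

yj_lambdas <- tidy(yj_prepped, number = 1)        # estimated lambda per column
yj_baked   <- bake(yj_prepped, new_data = NULL)   # transformed training data
yj_lambdas
```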
Evaluate performance
lr_aug_YJ %>% # Evaluate
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_N)
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.817
2 roc_auc binary 0.700
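The three recipes can also be compared in one table instead of one tibble per slide. Stack the augmented outputs with bind_rows(.id = ...) and evaluate per group; yardstick metric sets respect group_by(). The sketch below assumes the same metric-set definition of class_evaluate. It uses two hypothetical toy prediction sets, aug_a and aug_b, in place of the real augment() outputs.

```r
library(dplyr)
library(yardstick)

class_evaluate <- metric_set(accuracy, roc_auc)  # assumed definition

lv <- c("N", "Y")
truth <- factor(c("N", "Y", "Y", "N", "Y", "Y"), levels = lv)

# Hypothetical predicted P(N) from two workflows on the same test rows
aug_a <- tibble(Loan_Status = truth, .pred_N = c(0.7, 0.2, 0.4, 0.6, 0.3, 0.1))
aug_b <- tibble(Loan_Status = truth, .pred_N = c(0.6, 0.3, 0.7, 0.2, 0.1, 0.4))

comparison <- bind_rows(plain = aug_a, YJ = aug_b, .id = "recipe") %>%
  # Hard class from a 0.5 threshold on P(N)
  mutate(.pred_class = factor(ifelse(.pred_N >= 0.5, "N", "Y"), levels = lv)) %>%
  group_by(recipe) %>%
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_N)
comparison
```

With the real objects, the equivalent call would be `bind_rows(plain = lr_aug_plain, BC = lr_aug_BC, YJ = lr_aug_YJ, .id = "recipe")`, followed by the same `group_by()` and `class_evaluate()`.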