Rekayasa Fitur di R
Jorge Zazueta
Research Professor. Head of the Modeling Group at the School of Economics, UASLP
Langkah pemodelan tingkat tinggi yang umum.

Langkah pemodelan tingkat tinggi yang umum.

Mulai dengan housekeeping dasar dan menyiapkan pembagian data.
loans <- # Basic housekeeping
loans %>%
mutate(across(where(is_character),
as_factor)) %>%
mutate(across(Credit_History,
as_factor))
set.seed(123) # Set up splits
split <- initial_split(loans,
strata = Loan_Status)
test <- testing(split)
train <- training(split)
glimpse(train)
Rows: 460
Columns: 13
$ Loan_ID <fct> LP001003...
$ Gender <fct> Male, Ma...
$ Married <fct> Yes, No,...
$ Dependents <fct> 1, 0, 0,...
$ Education <fct> Graduate...
$ Self_Employed <fct> No, No, ...
$ ApplicantIncome <dbl> 4583, 18...
$ CoapplicantIncome <dbl> 1508, 28...
$ LoanAmount <dbl> 128, 114...
$ Loan_Amount_Term <dbl> 360, 360...
$ Credit_History <fct> 1, 1, 0,...
$ Property_Area <fct> Rural, R...
$ Loan_Status <fct> N, N, N,...
Resep kita bisa sangat singkat atau kompleks.
recipe <- recipe(Loan_Status ~ .,
data = train) %>%
update_role(Loan_ID,
new_role = "ID") %>%
step_normalize(all_numeric_predictors()) %>%
step_impute_knn(all_predictors()) %>%
step_dummy(all_nominal_predictors())
recipe
Recipe
Inputs:
role #variables
ID 1
outcome 1
predictor 11
Operations:
Centering and scaling for all_numeric_predictors()
K-nearest neighbor imputation for all_predictors()
Dummy variables from all_nominal_predictors()
Siapkan workflow
lr_model <- logistic_reg() %>%
set_engine("glmnet") %>%
set_args(mixture = 1, penalty = tune())
lr_penalty_grid <- grid_regular(
penalty(range = c(-3, 1)),
levels = 30)
lr_workflow <-
workflow() %>%
add_model(lr_model) %>%
add_recipe(recipe)
lr_workflow
--Workflow -------------------------------
Preprocessor: Recipe
Model: logistic_reg()
-- Preprocessor --------------------------
3 Recipe Steps
- step_normalize()
- step_impute_knn()
- step_dummy()
-- Model ---------------------------------
Spesifikasi Model Regresi Logistik (klasifikasi)
Argumen Utama:
penalty = tune()
mixture = 1
Mesin komputasi: glmnet
Tuning penalty untuk Lasso
lr_tune_output <- tune_grid(
lr_workflow,
resamples = vfold_cv(train, v = 5),
metrics = metric_set(roc_auc),
grid = penalty_grid)
autoplot(tune_output)
ROC_AUC vs. Regularisasi

Melatih model final
best_penalty <-
select_by_one_std_err(lr_tune_output,
metric = 'roc_auc', desc(penalty))
lr_final_fit<-
finalize_workflow(lr_workflow, best_penalty) %>%
fit(data = train)
lr_final_fit %>%
augment(test) %>%
class_evaluate(truth = Loan_Status,
estimate = .pred_class,
.pred_Y)
Metrik kinerja
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.818
2 roc_auc binary 0.813
Rekayasa Fitur di R