Feature Engineering in R
Jorge Zazueta
Research Professor and Head of the Modeling Group at the School of Economics, UASLP
Removing irrelevant or low-information variables can bring several benefits.
Training a model with all features
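# Recipe: Loan_Status against every other column;
# Loan_ID stays in the data as an identifier, not a predictor
lr_recipe_full <-
  recipe(Loan_Status ~ ., data = train) %>%
  update_role(Loan_ID, new_role = "ID")
# Bundle the model specification (lr_model, defined earlier) with the recipe
lr_workflow_full <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_full)
# Fit the workflow on the training data
lr_fit_full <-
  lr_workflow_full %>%
  fit(data = train)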
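In the recipe above, `update_role()` keeps `Loan_ID` in the data while stopping it from acting as a predictor. For readers more familiar with base R, a roughly analogous effect comes from writing `- Loan_ID` in a model formula; this is only a comparison, not how recipes works internally. The sketch below uses a small synthetic data frame (`loans`, hypothetical values) whose column names merely mimic the loan data.

```r
# Synthetic stand-in for the loan data (hypothetical values, for illustration only)
set.seed(42)
n <- 200
loans <- data.frame(
  Loan_ID        = sprintf("LP%04d", seq_len(n)),
  Credit_History = rbinom(n, 1, 0.8),
  LoanAmount     = round(rnorm(n, 150, 40)),
  Loan_Status    = factor(sample(c("N", "Y"), n, replace = TRUE),
                          levels = c("N", "Y"))
)

# "." means "all other columns"; "- Loan_ID" then drops the identifier from the predictors
fit <- glm(Loan_Status ~ . - Loan_ID, data = loans, family = binomial)

# Loan_ID is still in the data frame but is not a model term
predictors <- attr(terms(fit), "term.labels")
print(predictors)
print(coef(fit))
```

The recipe version is arguably cleaner because the ID column's role is recorded once and travels with the preprocessing, instead of being repeated in every formula.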
Plotting variable importance with vip
# Extract the underlying parsnip fit and plot variable importance (vip package)
lr_fit_full %>%
  extract_fit_parsnip() %>%
  vip(aesthetics = list(fill = "steelblue"))
Variable importance
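For a logistic regression fitted with `glm`, model-based importance is commonly derived from the coefficients' test statistics; as far as we understand, `vip` uses the absolute value of each coefficient's z statistic in this case. The sketch below reproduces that ranking idea in base R on synthetic data (`x_strong`, `x_weak` and `x_noise` are hypothetical predictors with made-up effect sizes), so it illustrates the logic rather than vip's exact output.

```r
set.seed(1)
n <- 500
df <- data.frame(
  x_strong = rnorm(n),
  x_weak   = rnorm(n),
  x_noise  = rnorm(n)
)
# True log-odds depend strongly on x_strong, weakly on x_weak, and not at all on x_noise
eta <- 2 * df$x_strong + 0.5 * df$x_weak
df$y <- rbinom(n, 1, plogis(eta))

fit <- glm(y ~ x_strong + x_weak + x_noise, data = df, family = binomial)

# Importance = |z value| of each slope (intercept row dropped), sorted high to low
z <- summary(fit)$coefficients[-1, "z value"]
importance <- sort(abs(z), decreasing = TRUE)
print(round(importance, 2))
```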

Features can be selected directly with base R formula syntax: only the variables named on the right-hand side of the formula are used as predictors.
# Create the recipe with only three predictors
recipe_formula <-
  recipe(Loan_Status ~ Credit_History + Property_Area +
           LoanAmount, data = train)
# Bundle with model
workflow_formula <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(recipe_formula)
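When the chosen predictors already live in a character vector, base R's `reformulate()` can build the same formula instead of typing it out by hand. A minimal sketch follows; `top_features` is a hypothetical name for such a vector.

```r
# Hypothetical vector holding the three predictors used above
top_features <- c("Credit_History", "Property_Area", "LoanAmount")

# Build Loan_Status ~ Credit_History + Property_Area + LoanAmount programmatically
f <- reformulate(top_features, response = "Loan_Status")
print(f)

# all.vars() lists every variable the formula refers to, response first
print(all.vars(f))
```

The resulting formula should then be usable as `recipe(f, data = train)`, which keeps the feature list in one place if it changes later.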
Alternatively, a feature vector can be used to select the columns before training.
# Feature vector (Loan_Status is included so the outcome survives select())
features <- c("Credit_History", "Property_Area", "LoanAmount", "Loan_Status")
# Training and test data restricted to those columns
train_features <- train %>% select(all_of(features))
test_features <- test %>% select(all_of(features))
# Create recipe and bundle with model
recipe_features <- recipe(Loan_Status ~ ., data = train_features)
workflow_features <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(recipe_features)
Augmented objects (test data plus predictions) for both approaches
# Fit each workflow on its training data, then add predictions to the test data
lr_aug_formula <-
  workflow_formula %>%
  fit(data = train) %>%
  augment(new_data = test)
lr_aug_features <-
  workflow_features %>%
  fit(data = train_features) %>%
  augment(new_data = test_features)
Both approaches give the same results
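# Compare against the matching columns of the formula version;
# all_equal() ignores row and column order (it is deprecated as of dplyr 1.1.0)
all_equal(lr_aug_features,
          lr_aug_formula %>%
            select(all_of(features),
                   starts_with(".pred")))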
[1] TRUE
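The same equivalence can be checked without tidymodels: naming the predictors in a formula on the full data, or subsetting the columns first and using `~ .`, hands the model an identical design matrix. A sketch on synthetic data (`dat`, `keep_cols` and the columns `a`, `b`, `c` are hypothetical):

```r
set.seed(7)
n <- 300
dat <- data.frame(
  id = seq_len(n),
  a  = rnorm(n),
  b  = rnorm(n),
  c  = rnorm(n),          # extra column the reduced model should ignore
  y  = rbinom(n, 1, 0.5)
)
keep_cols <- c("a", "b", "y")

# Approach 1: name the predictors in the formula, keep the full data
fit_formula <- glm(y ~ a + b, data = dat, family = binomial)

# Approach 2: subset the columns first, then use "." for all remaining predictors
reduced <- dat[, keep_cols]
fit_subset <- glm(y ~ ., data = reduced, family = binomial)

# Coefficients and predicted probabilities should agree
p_formula <- predict(fit_formula, newdata = dat, type = "response")
p_subset  <- predict(fit_subset, newdata = reduced, type = "response")
print(all.equal(coef(fit_formula), coef(fit_subset)))
print(all.equal(p_formula, p_subset))
```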
Using all features
lr_fit_full <- # Fit workflow
  lr_workflow_full %>%
  fit(data = train)
lr_aug_full <- # Augment the test data with predictions
  lr_fit_full %>%
  augment(new_data = test)
lr_aug_full %>% # Evaluate (class_evaluate: metric set defined earlier)
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_Y)
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.842
2 roc_auc binary 0.744
Using the top 3 features
lr_fit_formula <- # Fit workflow
  workflow_formula %>%
  fit(data = train)
lr_aug_formula <- # Augment the test data with predictions
  lr_fit_formula %>%
  augment(new_data = test)
lr_aug_formula %>% # Evaluate
  class_evaluate(truth = Loan_Status,
                 estimate = .pred_class,
                 .pred_Y)
# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.842
2 roc_auc binary 0.733
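Both tables report accuracy and ROC AUC. Accuracy is the share of correct class predictions; ROC AUC is the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative one. The sketch below computes both by hand in base R on a made-up set of eight predictions, treating "Y" as the positive class (an assumption here; yardstick's own event-level handling is not reproduced).

```r
# Made-up example: true classes and predicted probability of "Y"
truth  <- factor(c("Y", "Y", "N", "Y", "N", "N", "Y", "N"), levels = c("N", "Y"))
prob_Y <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.65, 0.2, 0.1)

# Hard class predictions with a 0.5 threshold
pred_class <- factor(ifelse(prob_Y >= 0.5, "Y", "N"), levels = c("N", "Y"))

# Accuracy: share of rows where the predicted class matches the truth
acc_value <- mean(pred_class == truth)

# ROC AUC via the rank-sum (Mann-Whitney) identity, "Y" as the positive class
is_pos <- truth == "Y"
n_pos <- sum(is_pos)
n_neg <- sum(!is_pos)
r <- rank(prob_Y)  # average ranks, so tied scores would count as half a win
auc_value <- (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# accuracy = 0.625, roc_auc = 0.6875
print(c(accuracy = acc_value, roc_auc = auc_value))
```

Because ROC AUC uses the probabilities rather than the thresholded classes, two models can tie on accuracy (as with 0.842 above) while differing slightly in ROC AUC.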