Reducing the model's features

Feature Engineering in R

Jorge Zazueta

Research Professor and Head of the Modeling Group at the School of Economics, UASLP

Reasons to reduce the number of features

Removing irrelevant or low-information variables can bring several benefits, including:

  • Reducing model variance without adding much bias
  • Improving performance on test data
  • Speeding up computation
  • Lowering model complexity
  • Improving interpretability

Sifting through data with variable importance

Train a model with all features

# lr_model is assumed to be defined earlier in the course
# (a logistic regression model specification)
lr_recipe_full <-
  recipe(Loan_Status ~ ., data = train) %>%
  update_role(Loan_ID, new_role = "ID")

lr_workflow_full <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(lr_recipe_full)

lr_fit_full <-
  lr_workflow_full %>%
  fit(data = train)

Plot variable importance with vip

lr_fit_full %>%
  extract_fit_parsnip() %>%
  vip(aesthetics = list(fill = "steelblue"))

[Figure: bar chart of variable importance]
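
The chart ranks the variables visually. The same scores can also be pulled out as a table, which makes it possible to pick the top features programmatically. The following is a minimal sketch, not the course's own code: it assumes simulated stand-in data (`sim`, `x1` to `x3`, `noise` and `demo_fit` are hypothetical names) and applies `vi()` from the vip package to the extracted parsnip fit. For a glm engine, the scores are based on the absolute value of each coefficient's test statistic, and factor predictors would typically appear under their coefficient names rather than their original column names.

```r
library(tidymodels)
library(vip)

# Simulated stand-in data (hypothetical): x1 carries the strongest signal,
# `noise` is unrelated to the outcome
set.seed(123)
n_obs <- 300
sim <- tibble(
  x1 = rnorm(n_obs),
  x2 = rnorm(n_obs),
  x3 = rnorm(n_obs),
  noise = rnorm(n_obs)
) %>%
  mutate(
    y = factor(if_else(2 * x1 - x2 + 0.5 * x3 + rnorm(n_obs) > 0, "Y", "N"))
  )

# Fit a logistic regression on all predictors
demo_fit <- workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_recipe(recipe(y ~ ., data = sim)) %>%
  fit(data = sim)

# Importance scores as a tibble instead of a plot
importance <- demo_fit %>%
  extract_fit_parsnip() %>%
  vi()

# Names of the three highest-ranked terms
top_features <- importance %>%
  slice_max(Importance, n = 3, with_ties = FALSE) %>%
  pull(Variable)
```

A vector like `top_features` could then feed a reduced formula or a `select(all_of(...))` call, provided the term names map back to actual columns.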

Build a reduced model using the formula syntax

We can name the features to keep directly in the recipe using base R formula syntax.

# Create recipe
recipe_formula <-
  recipe(Loan_Status ~ Credit_History + Property_Area +
           LoanAmount, data = train)

# Bundle with model
workflow_formula <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(recipe_formula)

Build a reduced model by creating a feature vector

A feature vector can be used to select features before training.

# Feature vector (includes the outcome, Loan_Status)
features <- c("Credit_History", "Property_Area", "LoanAmount", "Loan_Status")

# Training and test data
train_features <- train %>% select(all_of(features))
test_features <- test %>% select(all_of(features))

# Create recipe and bundle with model
recipe_features <- recipe(Loan_Status ~ ., data = train_features)
workflow_features <-
  workflow() %>%
  add_model(lr_model) %>%
  add_recipe(recipe_features)

Creating the augmented objects

Augmented objects for both approaches

lr_aug_formula <-
  workflow_formula %>%
  fit(data = train) %>%
  augment(new_data = test)
lr_aug_features <-
  workflow_features %>%
  fit(data = train_features) %>%
  augment(new_data = test_features)

Both give the same result

all_equal(lr_aug_features,
          lr_aug_formula %>%
            select(all_of(features),
                   starts_with(".pred")))
[1] TRUE
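
`all_equal()` is deprecated as of dplyr 1.1.0. As a minimal sketch of the same check with base R's `all.equal()`, the example below uses two small hypothetical tibbles (`a` and `b`) in place of the augmented objects. Because `all.equal()` does not reorder columns for you, the columns are aligned explicitly before comparing.

```r
library(dplyr)

# Hypothetical stand-ins: same data, different column order
a <- tibble(id = 1:3, .pred_Y = c(0.2, 0.5, 0.9))
b <- tibble(.pred_Y = c(0.2, 0.5, 0.9), id = 1:3)

# Put b's columns in the same order as a's, then compare
b_aligned <- b %>% select(all_of(names(a)))
same <- isTRUE(all.equal(a, b_aligned))

# A changed prediction column should no longer compare as equal
changed <- b_aligned %>% mutate(.pred_Y = .pred_Y + 0.1)
different <- !isTRUE(all.equal(a, changed))
```

Wrapping the call in `isTRUE()` matters because `all.equal()` returns a character description of the differences, not `FALSE`, when the objects differ.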

Comparing the full and reduced models

Using all features

lr_fit_full <- # Fit workflow
  lr_workflow_full %>%
  fit(data = train)
lr_aug_full <- # Augment
  lr_fit_full %>%
  augment(test)
lr_aug_full %>% # Evaluate
  class_evaluate(truth = Loan_Status, 
                 estimate = .pred_class,
                 .pred_Y)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.842
2 roc_auc  binary         0.744

Using the top 3 features*

lr_fit_formula <- # Fit workflow
  workflow_formula %>%
  fit(train)
lr_aug_formula <- # Augment
  lr_fit_formula %>%
  augment(new_data = test)
lr_aug_formula %>% # Evaluate
  class_evaluate(truth = Loan_Status, 
                 estimate = .pred_class,
                 .pred_Y)
# A tibble: 2 × 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.842
2 roc_auc  binary         0.733
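
`class_evaluate()` is not defined in this section. Judging by its output, it behaves like a yardstick metric set that combines accuracy and ROC AUC. The sketch below assumes that definition and demonstrates it on yardstick's bundled `two_class_example` data rather than the loan data. By default, yardstick treats the first factor level as the event, so the probability column passed in should correspond to that level (or `event_level = "second"` should be set).

```r
library(dplyr)
library(yardstick)

# Assumed definition: one class metric plus one probability metric
class_evaluate <- metric_set(accuracy, roc_auc)

# Example data shipped with yardstick: truth, Class1, Class2, predicted
data(two_class_example, package = "yardstick")

demo_metrics <- two_class_example %>%
  class_evaluate(truth = truth,
                 estimate = predicted,
                 Class1)
```

The result is a tibble with one row per metric, in the same `.metric`, `.estimator`, `.estimate` layout as the outputs above.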

Let's practice!
