Random forest models

Dimensionality Reduction in R

Matt Pickard

Owner, Pickard Predictives, LLC

Random Forest

  • Ensemble model
    • “wisdom of the crowd” approach
  • Combines the predictions of many random trees
  • Uncorrelated random trees reduce error
  • Less prone to overfitting than a single decision tree
  • Accurate
  • Ranks features by importance, which supports feature selection

Diagram showing an ensemble model made up of several decision trees and how their votes are combined into a single final vote.
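
The vote-combining step in the diagram can be sketched in base R. This is a simplified illustration, not how ranger is implemented internally; the `votes` matrix and the class labels are made-up values used only for demonstration.

```r
# Hypothetical class votes: one row per observation, one column per tree
votes <- matrix(
  c("Good",     "Good", "Standard", "Good",     "Poor",
    "Poor",     "Poor", "Poor",     "Standard", "Good",
    "Standard", "Good", "Standard", "Standard", "Poor"),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, paste0("tree", 1:5))
)

# Return the most frequent label in a vector of votes.
# Ties go to the label that sorts first alphabetically.
majority_vote <- function(v) names(which.max(table(v)))

# Combine the trees' votes into one final prediction per observation
final_vote <- unname(apply(votes, 1, majority_vote))
final_vote
```

Each observation gets the label that most trees voted for, which is the "wisdom of the crowd" idea in its simplest form.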


Random Forest

This diagram shows how different subtrees are built from different subsets of features.
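
A minimal sketch of the idea in the diagram, assuming one random feature subset per tree. Random forest implementations typically draw a fresh subset at every split rather than once per tree, so this is a simplification. The names `mtry`, `n_trees`, and `feature_subsets` are illustrative choices.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

features <- c(
  "outstanding_debt", "interest_rate", "delay_from_due_date",
  "changed_credit_limit", "credit_history_months", "num_credit_card",
  "monthly_balance", "num_of_delayed_payment", "annual_income",
  "amount_invested_monthly"
)

# A common default: consider about sqrt(p) features at a time
mtry <- floor(sqrt(length(features)))
n_trees <- 3

# Draw one random subset of features (without replacement) for each tree
feature_subsets <- lapply(seq_len(n_trees), function(i) sample(features, mtry))
names(feature_subsets) <- paste0("tree", seq_len(n_trees))
feature_subsets
```

Because each tree sees a different slice of the features, the trees make different mistakes, which is what keeps them weakly correlated.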


Train a random forest

library(tidymodels)

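# Specify a 200-tree classification forest; impurity-based importance is stored for later use
rf <- rand_forest(mode = "classification", trees = 200) %>%
  set_engine("ranger", importance = "impurity")
# Fit on the training data, then attach the predicted classes to the test set
rf_fit <- rf %>% fit(credit_score ~ ., data = train)
predict_df <- test %>% bind_cols(predict = predict(rf_fit, test))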

Evaluate the model

f_meas(predict_df, credit_score, .pred_class)
0.6895
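
To make the F-measure concrete, here is a base R sketch that computes a per-class F1 score and then macro-averages it, which is how yardstick handles multiclass targets by default. The `truth` and `pred` vectors are made-up values, not the course data, and `f1_for_class` is a hypothetical helper.

```r
# Hypothetical true and predicted labels
truth <- c("Good", "Good", "Poor", "Poor", "Standard", "Standard")
pred  <- c("Good", "Poor", "Poor", "Poor", "Standard", "Good")

# F1 for one class: harmonic mean of precision and recall
f1_for_class <- function(truth, pred, cls) {
  tp <- sum(pred == cls & truth == cls)  # true positives
  precision <- tp / sum(pred == cls)     # share of predicted cls that are correct
  recall <- tp / sum(truth == cls)       # share of actual cls that were found
  2 * precision * recall / (precision + recall)
}

classes <- sort(unique(truth))
per_class_f1 <- vapply(
  classes,
  function(cls) f1_for_class(truth, pred, cls),
  numeric(1)
)

# Macro average: unweighted mean of the per-class scores
macro_f1 <- mean(per_class_f1)
per_class_f1
macro_f1
```

The sketch does not guard against a class that is never predicted, where precision would be undefined.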

Variable importance

library(vip)

rf_fit %>% vip()

Bar chart of variable importance.


Feature mask

top_features <- rf_fit %>% 
  vi(rank = TRUE) %>% 
  filter(Importance <= 10) %>% 
  pull(Variable)

top_features
 [1] "outstanding_debt"        "interest_rate"          
 [3] "delay_from_due_date"     "changed_credit_limit"   
 [5] "credit_history_months"   "num_credit_card"        
 [7] "monthly_balance"         "num_of_delayed_payment" 
 [9] "annual_income"           "amount_invested_monthly"
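
The rank-based filter above can be illustrated in base R. With `rank = TRUE`, the importance column holds ranks (1 is most important), so keeping ranks at or below a cutoff keeps the top features. The importance scores below are made-up values, and `top_k` is a hypothetical cutoff.

```r
# Hypothetical raw importance scores (higher means more important)
importance <- c(
  outstanding_debt = 0.31,
  interest_rate    = 0.27,
  annual_income    = 0.05,
  monthly_balance  = 0.12
)

# Convert scores to ranks: the largest score gets rank 1
ranks <- rank(-importance)

# Logical mask that is TRUE for the top_k highest-ranked features
top_k <- 2
mask <- ranks <= top_k

selected <- names(importance)[mask]
selected
```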

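Reduce the data

# Keep the target column with the selected features so the model can still be fit and scored
train_reduced <- train[c(top_features, "credit_score")]
test_reduced <- test[c(top_features, "credit_score")]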

Performance

rf_fit <- rf %>% 
  fit(credit_score ~ ., data = train_reduced) 

predict_reduced_df <- test_reduced %>% bind_cols(predict = predict(rf_fit, test_reduced))
f_meas(predict_reduced_df, credit_score, .pred_class)
0.6738 

F-score of the model without reduction:

0.6895 
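
The gap between the two scores can be worked out directly. This sketch uses only the two F-scores reported above; the variable names are illustrative.

```r
f_full <- 0.6895     # F-score with all features
f_reduced <- 0.6738  # F-score with the top 10 features

# Absolute and relative loss from dropping the lower-ranked features
absolute_drop <- f_full - f_reduced
relative_drop <- absolute_drop / f_full

round(absolute_drop, 4)
round(100 * relative_drop, 1)  # relative loss in percent
```

The reduced model gives up about 0.016 in F-score, a relative loss of roughly 2.3%, while using only 10 features.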

Let's practice!
