Random forest models

Dimensionality Reduction in R

Matt Pickard

Owner, Pickard Predictives, LLC

Random Forest

  • An ensemble model
    • a "wisdom of the crowds" approach
  • Aggregates predictions of many random trees
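  • Decorrelated trees make different errors, which aggregation tends to cancel out
  • Less prone to overfitting than a single decision tree
  • Typically accurate with little tuning
  • Ranks features by importance, which can guide feature selection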

A diagram showing an ensemble model consisting of several decision trees and how their votes are combined into one final vote.
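
The voting step in the diagram can be sketched in a few lines of base R. The votes matrix below is made up for illustration, and the binomial calculation assumes the trees err independently, which real forests only approximate.

```r
# Hypothetical votes: one row per observation, one column per tree
votes <- matrix(
  c("Good", "Good", "Poor", "Good", "Poor",
    "Poor", "Poor", "Poor", "Good", "Poor",
    "Good", "Poor", "Good", "Good", "Good"),
  nrow = 3, byrow = TRUE
)

# The forest's prediction is the most frequent class in each row
majority_vote <- function(v) names(which.max(table(v)))
final_vote <- apply(votes, 1, majority_vote)
final_vote

# Chance that a majority of n independent trees is right
# when each tree alone is right with probability p
n <- 101
p <- 0.6
majority_accuracy <- pbinom(floor(n / 2), n, p, lower.tail = FALSE)
majority_accuracy
```

Under the independence assumption, the majority is right far more often than any single tree, which is the "wisdom of the crowds" effect.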


Random Forest

This diagram shows how different subtrees are created using different subsets of features.
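
As a rough sketch of where that variety comes from, each tree can be given a bootstrap sample of rows and a random subset of features. The example uses the built-in iris data as a stand-in and draws the features once per tree; implementations such as ranger redraw the candidate features at every split, so this is a simplification.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

predictors <- setdiff(names(iris), "Species")
mtry <- floor(sqrt(length(predictors)))  # a common default for classification

# For each tree: a bootstrap sample of rows and a random subset of features
tree_inputs <- lapply(1:3, function(i) {
  list(
    rows = sample(nrow(iris), replace = TRUE),
    features = sample(predictors, mtry)
  )
})

# Each tree sees a different slice of the data
lapply(tree_inputs, function(x) x$features)
```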


Train a Random Forest

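library(tidymodels)

# Specify a 200-tree classification forest; ranger records impurity-based importance
rf <- rand_forest(mode = "classification", trees = 200) %>%
  set_engine("ranger", importance = "impurity")

# Fit on all predictors, then attach the predicted classes to the test set
rf_fit <- rf %>% fit(credit_score ~ ., data = train)
predict_df <- test %>% bind_cols(predict = predict(rf_fit, test))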
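
The slide assumes `train` and `test` already exist. As a self-contained sketch, the same workflow can be run on the built-in iris data, with `Species` standing in for `credit_score`. The split proportion, the seed, and every object name ending in `_iris` are arbitrary choices for illustration, not values from the course.

```r
library(tidymodels)

set.seed(123)  # arbitrary seed so the split and the forest are reproducible

# Stand-in data: iris replaces the credit data used in the slides
iris_split <- initial_split(iris, prop = 0.8, strata = Species)
train_iris <- training(iris_split)
test_iris <- testing(iris_split)

rf_spec <- rand_forest(mode = "classification", trees = 200) %>%
  set_engine("ranger", importance = "impurity")

rf_iris_fit <- rf_spec %>% fit(Species ~ ., data = train_iris)

# predict() returns a tibble with a .pred_class column
pred_iris <- test_iris %>% bind_cols(predict(rf_iris_fit, test_iris))
f_meas(pred_iris, Species, .pred_class)
```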

Evaluate the Model

f_meas(predict_df, credit_score, .pred_class)
0.6895
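
`f_meas()` reports the F-measure, which by default is the harmonic mean of precision and recall. For a two-class problem the arithmetic looks like this; the labels are invented. For a target with more than two classes, yardstick defaults to macro-averaging, meaning the per-class scores are averaged.

```r
# Invented labels for eight observations
truth <- c("yes", "yes", "yes", "no", "no", "no", "no", "yes")
pred  <- c("yes", "no",  "yes", "no", "yes", "no", "no", "yes")

tp <- sum(pred == "yes" & truth == "yes")  # true positives
fp <- sum(pred == "yes" & truth == "no")   # false positives
fn <- sum(pred == "no"  & truth == "yes")  # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

# F1 is the harmonic mean of precision and recall
f1 <- 2 * precision * recall / (precision + recall)
f1
```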

Variable Importance

library(vip)

# Plot the importance scores stored in the fitted model (top 10 by default)
rf_fit %>% vip()

A variable importance bar chart.
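
With `importance = "impurity"`, ranger scores a feature for classification by how much its splits reduce Gini impurity. The decrease for a single split can be worked through by hand. The class counts below are invented, and the score reported for a feature aggregates such decreases over the splits that use it.

```r
# Gini impurity of a vector of class labels
gini <- function(y) 1 - sum(prop.table(table(y))^2)

# Invented node: 5 "Good" and 5 "Poor" observations
parent <- rep(c("Good", "Poor"), each = 5)

# A candidate split sends most of each class to its own child node
left  <- c(rep("Good", 4), "Poor")
right <- c("Good", rep("Poor", 4))

# Weighted impurity of the children, then the decrease credited to the split
children <- (length(left) * gini(left) + length(right) * gini(right)) / length(parent)
decrease <- gini(parent) - children
decrease
```

A feature whose splits produce purer child nodes earns a larger decrease and therefore a taller bar in the chart.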


Feature Mask

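# rank = TRUE converts importance scores to ranks (1 = most important),
# so Importance <= 10 keeps the ten highest-ranked features
top_features <- rf_fit %>%
  vi(rank = TRUE) %>%
  filter(Importance <= 10) %>%
  pull(Variable)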

top_features
 [1] "outstanding_debt"        "interest_rate"          
 [3] "delay_from_due_date"     "changed_credit_limit"   
 [5] "credit_history_months"   "num_credit_card"        
 [7] "monthly_balance"         "num_of_delayed_payment" 
 [9] "annual_income"           "amount_invested_monthly"
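
The ranking step behind the mask can be illustrated in base R. The scores and feature names below are made up, and the cutoff is three rather than ten to keep the output short.

```r
# Made-up importance scores for five features
importance <- c(debt = 120.5, rate = 98.2, income = 40.1, age = 12.7, job = 3.3)

# Rank so that the largest score gets rank 1
ranks <- rank(-importance)

# Keep the names of the three highest-ranked features
top3 <- names(ranks)[ranks <= 3]
top3
```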

Reduce the data

# Keep the target column alongside the selected predictors
train_reduced <- train[c(top_features, "credit_score")]
test_reduced <- test[c(top_features, "credit_score")]

Performance

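# Refit the same specification on the reduced training set
rf_fit <- rf %>%
  fit(credit_score ~ ., data = train_reduced)

# Score the reduced model on the reduced test set
predict_reduced_df <- test_reduced %>%
  bind_cols(predict = predict(rf_fit, test_reduced))
f_meas(predict_reduced_df, credit_score, .pred_class)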
0.6738 

F-score of the unreduced model:

0.6895 
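
The cost of dropping features can be expressed as a relative change in F-score, using the two values reported above:

```r
f_full    <- 0.6895  # all features
f_reduced <- 0.6738  # top ten features only

absolute_drop <- f_full - f_reduced
relative_drop <- absolute_drop / f_full

round(100 * relative_drop, 1)  # percentage of the original F-score lost
```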

Let's practice!
