Building and tuning a random forest model

Machine Learning in the tidyverse

Dmitriy (Dima) Gorenshteyn

Lead Data Scientist, Memorial Sloan Kettering Cancer Center

Cross Validation Performance

Linear Regression Model

Validation Mean Absolute Error: 1.5 years
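
The mean absolute error (MAE) averages the absolute differences between actual and predicted values, so it is reported in the outcome's own units (years here). A minimal base-R sketch; the helper `mae()` and the numbers are hypothetical, not the course data:

```r
# Mean absolute error: average size of the prediction errors,
# in the same units as the outcome
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Hypothetical life-expectancy values (illustration only)
actual    <- c(72.1, 65.4, 80.2, 58.9)
predicted <- c(70.9, 67.0, 79.5, 60.4)

mae(actual, predicted)  # 1.25
```

Because errors are not squared, one large miss does not dominate the score the way it would with RMSE.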

Another Model

Random Forest Benefits

  • Can capture non-linear relationships between predictors and the outcome
  • Can capture interactions between predictors without specifying them in the formula
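
A small simulation can make both points concrete. This is a sketch under stated assumptions: the data are simulated (not the course data), and the outcome is built from a squared term and an `x1 * x2` interaction that a main-effects linear model cannot represent. On held-out rows the forest should show a clearly lower mean absolute error:

```r
library(ranger)

set.seed(42)
# Simulated data (hypothetical): non-linear term x1^2 plus interaction x1 * x2
n <- 1000
sim <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
sim$y <- sim$x1^2 + sim$x1 * sim$x2 + rnorm(n, sd = 0.1)

# Simple train/test split: first 800 rows for fitting, last 200 for scoring
train <- sim[1:800, ]
test  <- sim[801:1000, ]

# Same main-effects formula for both models
lm_fit <- lm(y ~ x1 + x2, data = train)
rf_fit <- ranger(y ~ x1 + x2, data = train, seed = 42)

# Mean absolute error on the held-out rows
mae_lm <- mean(abs(test$y - predict(lm_fit, test)))
mae_rf <- mean(abs(test$y - predict(rf_fit, test)$predictions))
c(linear = mae_lm, random_forest = mae_rf)
```

The linear model would need `I(x1^2)` and `x1:x2` written into its formula; the forest finds both from the splits alone.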

Basic Random Forest Tools

Model
rf_model <- ranger(formula = ___, data = ___, seed = ___)

Prediction
prediction <- predict(rf_model, new_data)$predictions
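
A runnable version of the two templates above, assuming the built-in `mtcars` data with `mpg` as the outcome in place of the course's life-expectancy data:

```r
library(ranger)

# Fit a regression forest (mtcars/mpg are stand-ins for the course data)
rf_model <- ranger(formula = mpg ~ ., data = mtcars, seed = 42)

# Predict the first five rows; these are training rows, so this only
# illustrates the API and is not an error estimate
new_data <- mtcars[1:5, ]
prediction <- predict(rf_model, new_data)$predictions

# A plain numeric vector, one prediction per row of new_data
prediction
```

`predict()` returns a prediction object rather than a bare vector, which is why the `$predictions` element is extracted.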

Build Basic Random Forest Models

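library(ranger)
library(dplyr)
library(purrr)

# Fit one random forest per cross-validation fold
cv_models_rf <- cv_data %>% 
 mutate(model = map(train, ~ranger(formula = life_expectancy ~ ., 
                                    data = .x, seed = 42)))

# Predict each fold's validation set with that fold's model
cv_prep_rf <- cv_models_rf %>% 
 mutate(validate_predicted = map2(model, validate, 
                                  ~predict(.x, .y)$predictions))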
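
A self-contained sketch of the same pipeline that also scores each fold. Assumptions: `mtcars` and `mpg` stand in for the course data, and `cv_data` is built from a simple random fold assignment (hypothetical `fold_id`) rather than the `rsplit` objects shown in the printed output. `validate_mae` holds each fold's mean absolute error:

```r
library(ranger)
library(dplyr)
library(purrr)
library(tibble)

set.seed(42)
# Hypothetical 5-fold assignment of the mtcars rows
fold_id <- sample(rep(1:5, length.out = nrow(mtcars)))
cv_data <- tibble(id = paste0("Fold", 1:5)) %>%
  mutate(train    = map(1:5, ~mtcars[fold_id != .x, ]),
         validate = map(1:5, ~mtcars[fold_id == .x, ]))

# One forest per fold
cv_models_rf <- cv_data %>%
  mutate(model = map(train, ~ranger(formula = mpg ~ ., data = .x, seed = 42)))

# Compare each fold's predictions with its actual values
cv_eval_rf <- cv_models_rf %>%
  mutate(validate_actual    = map(validate, ~.x$mpg),
         validate_predicted = map2(model, validate, ~predict(.x, .y)$predictions),
         validate_mae       = map2_dbl(validate_actual, validate_predicted,
                                       ~mean(abs(.x - .y))))

# Per-fold MAE and the cross-validated average
cv_eval_rf$validate_mae
mean(cv_eval_rf$validate_mae)
```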

ranger Hyper-Parameters

Model
rf_model <- ranger(formula = ___, data = ___, seed = ___, mtry = ___, num.trees = ___)
Hyper-Parameters
name       range                      default
mtry       1 to number of features    square root of number of features, rounded down
num.trees  1 to infinity              500
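
ranger documents the `mtry` default as the square root of the number of predictors, rounded down, and the fitted object records the values it actually used. A quick check on `mtcars` (10 predictors once `mpg` is the outcome). If each 7-column training tibble in this chapter is `life_expectancy` plus six predictors (an assumption), the default there would be floor(sqrt(6)) = 2.

```r
library(ranger)

# mtcars has 11 columns; with mpg as the outcome, 10 are predictors
n_features <- ncol(mtcars) - 1
floor(sqrt(n_features))   # 3

# Fit with every hyper-parameter left at its default
rf_default <- ranger(mpg ~ ., data = mtcars, seed = 42)
rf_default$mtry           # 3
rf_default$num.trees      # 500
```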

Tune The Hyper-Parameters

library(tidyr)
# Repeat every fold once per candidate mtry value
cv_tune <- cv_data %>% 
  crossing(mtry = 1:5)

cv_tune
# A tibble: 25 x 5
   splits       id    train                validate            mtry
   <list>       <chr> <list>               <list>             <int>
 1 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     1
 2 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     2
 3 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     3
 4 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     4
 5 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     5
 6 <S3: rsplit> Fold2 <tibble [2,402 × 7]> <tibble [601 × 7]>     1
 7 <S3: rsplit> Fold2 <tibble [2,402 × 7]> <tibble [601 × 7]>     2
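
The 25 rows are five folds times five candidate `mtry` values: `crossing()` from tidyr returns every combination of its inputs. A minimal sketch with a hypothetical five-row stand-in for `cv_data` that keeps only the fold ids:

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for cv_data: just the five fold ids
folds <- tibble(id = paste0("Fold", 1:5))

# Every fold paired with every mtry value
cv_tune_demo <- crossing(folds, mtry = 1:5)

nrow(cv_tune_demo)   # 25
head(cv_tune_demo, 6)
```

`crossing()` also sorts and de-duplicates its inputs; `tidyr::expand_grid()` is the variant that keeps input order as-is.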

Tune The Hyper-Parameters

# Fit one model per fold and mtry value; map2() passes mtry in as .y
cv_model_tunerf <- cv_tune %>% 
  mutate(model = map2(train, mtry, ~ranger(formula = life_expectancy ~ ., 
                                           data = .x, mtry = .y, seed = 42)))

cv_model_tunerf
# A tibble: 25 x 6
   splits       id    train                validate      mtry  model       
 * <list>       <chr> <list>               <list>        <int> <list>      
 1 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   1    <S3: ranger>
 2 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   2    <S3: ranger>
 3 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   3    <S3: ranger>
 4 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   4    <S3: ranger>
 5 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   5    <S3: ranger>
 6 <S3: rsplit> Fold2 <tibble [2,402 × 7]> <tibble [60…   1    <S3: ranger>
 7 <S3: rsplit> Fold2 <tibble [2,402 × 7]> <tibble [60…   2    <S3: ranger>
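
One way to compare the tuned models is to score every fold/`mtry` pair on its validation set, average across folds, and keep the `mtry` with the lowest mean absolute error. A self-contained sketch; assumptions: `mtcars` and `mpg` stand in for the course data, and folds come from a simple random assignment (hypothetical `fold_id`):

```r
library(ranger)
library(dplyr)
library(purrr)
library(tidyr)

set.seed(42)
# Hypothetical 5-fold assignment of the mtcars rows
fold_id <- sample(rep(1:5, length.out = nrow(mtcars)))

# Cross fold numbers with candidate mtry values, then build the data sets
cv_tune <- crossing(fold = 1:5, mtry = 1:5) %>%
  mutate(train    = map(fold, ~mtcars[fold_id != .x, ]),
         validate = map(fold, ~mtcars[fold_id == .x, ]))

# Fit, predict and score every fold/mtry pair
cv_eval_tunerf <- cv_tune %>%
  mutate(model = map2(train, mtry, ~ranger(formula = mpg ~ ., data = .x,
                                           mtry = .y, seed = 42)),
         validate_predicted = map2(model, validate, ~predict(.x, .y)$predictions),
         validate_mae = map2_dbl(validate, validate_predicted,
                                 ~mean(abs(.x$mpg - .y))))

# Average MAE across folds for each mtry and keep the best one
mtry_summary <- cv_eval_tunerf %>%
  group_by(mtry) %>%
  summarise(mean_mae = mean(validate_mae))

best_mtry <- mtry_summary$mtry[which.min(mtry_summary$mean_mae)]
best_mtry
```

Crossing the atomic `fold` and `mtry` columns first and building the train/validate sets afterwards keeps `crossing()` away from list columns.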

Let's practice!
