Building and tuning a random forest model

Machine Learning in the Tidyverse

Dmitriy (Dima) Gorenshteyn

Lead Data Scientist, Memorial Sloan Kettering Cancer Center

Cross Validation Performance

Linear Regression Model

 

Validate Mean Absolute Error: 1.5 years

Another Model

Random Forest Benefits

  • Can handle non-linear relationships
  • Can handle interactions
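
As a rough illustration of these two benefits, the sketch below fits a linear model and a random forest to simulated data whose response depends on a squared term and an interaction. The simulated data, the names (`sim`, `x1`, `x2`, `y`, `mae_lm`, `mae_rf`) and the sample sizes are assumptions made for this example only; they are not part of the course material.

```r
library(ranger)

set.seed(42)
n <- 500
# Assumption: simulated predictors, not the course data
sim <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
# Response with a non-linear term (x1^2) and an interaction (x1 * x2)
sim$y <- sim$x1^2 + sim$x1 * sim$x2 + rnorm(n, sd = 0.1)

train <- sim[1:400, ]
test  <- sim[401:500, ]

# The linear model only sees additive, linear effects of x1 and x2
lm_model <- lm(y ~ ., data = train)
rf_model <- ranger(y ~ ., data = train, seed = 42)

# Mean absolute error of each model on the held-out rows
mae_lm <- mean(abs(test$y - predict(lm_model, test)))
mae_rf <- mean(abs(test$y - predict(rf_model, test)$predictions))
c(linear = mae_lm, random_forest = mae_rf)
```

On data built this way the forest would be expected to show the lower error, since the linear formula contains neither the squared term nor the interaction.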

Basic Random Forest Tools

Model
rf_model <- ranger(formula = ___, data = ___, seed = ___)

 

Prediction
prediction <- predict(rf_model, new_data)$predictions
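
The two templates above can be filled in as follows. This is a minimal sketch that assumes the built-in `mtcars` data with `mpg` as the outcome in place of the course data; the row split (`train_rows`) is likewise an assumption chosen for illustration.

```r
library(ranger)

# Assumption: mtcars (built into R) stands in for the course data
train_rows <- 1:24

# Model: formula, training data and a seed for reproducibility
rf_model <- ranger(formula = mpg ~ ., data = mtcars[train_rows, ], seed = 42)

# Prediction: predict() returns an object; the values sit in $predictions
new_data <- mtcars[-train_rows, ]
prediction <- predict(rf_model, new_data)$predictions

# One predicted value per row of new_data
length(prediction) == nrow(new_data)
```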

Build Basic Random Forest Models

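library(ranger)
library(tidyverse)  # provides %>%, mutate() and map()/map2()

# Fit one random forest per fold, using that fold's training set
cv_models_rf <- cv_data %>% 
  mutate(model = map(train, ~ranger(formula = life_expectancy ~ ., 
                                    data = .x, seed = 42)))

# Predict life expectancy for each fold's validation set
cv_prep_rf <- cv_models_rf %>% 
  mutate(validate_predicted = map2(model, validate, 
                                   ~predict(.x, .y)$predictions))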

ranger Hyper-Parameters

Model
rf_model <- ranger(formula, data, seed, mtry, num.trees)
Hyper-Parameters
name        range                        default
mtry        $1:number\ of\ features$     $\sqrt{number\ of\ features}$
num.trees   $1:\infty$                   $500$
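
To make the two hyper-parameters concrete, the sketch below fits one forest with the defaults and one with explicit values, then reads the settings back from the fitted objects. It assumes the built-in `mtcars` data (10 predictors of `mpg`) in place of the course data; the chosen values `mtry = 5` and `num.trees = 200` are arbitrary.

```r
library(ranger)

# Assumption: mtcars stands in for the course data
rf_default <- ranger(mpg ~ ., data = mtcars, seed = 42)
rf_custom  <- ranger(mpg ~ ., data = mtcars, seed = 42,
                     mtry = 5, num.trees = 200)

# The fitted object records the values that were actually used;
# with 10 predictors the default mtry is the rounded-down square root of 10
c(mtry = rf_default$mtry, num.trees = rf_default$num.trees)
c(mtry = rf_custom$mtry,  num.trees = rf_custom$num.trees)

# Out-of-bag prediction error (mean squared error for regression)
c(default = rf_default$prediction.error,
  custom  = rf_custom$prediction.error)
```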

Tune The Hyper-Parameters

# Pair every fold with each candidate mtry value (5 folds x 5 values = 25 rows)
cv_tune <- cv_data %>% 
  crossing(mtry = 1:5)

cv_tune
# A tibble: 25 x 5
   splits       id    train                validate            mtry
   <list>       <chr> <list>               <list>             <int>
 1 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     1
 2 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     2
 3 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     3
 4 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     4
 5 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [601 × 7]>     5
 6 <S3: rsplit> Fold2 <tibble [2,402 × 7]> <tibble [601 × 7]>     1
 7 <S3: rsplit> Fold2 <tibble [2,402 × 7]> <tibble [601 × 7]>     2
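
The row count in the output above follows from how `crossing()` expands its inputs. The sketch below reproduces the expansion on a toy stand-in for `cv_data` that holds only the fold `id` column; the names `folds` and `tune_grid` are assumptions for this example.

```r
library(tidyr)
library(tibble)

# Assumption: a toy stand-in for cv_data with only the fold id column
folds <- tibble(id = paste0("Fold", 1:5))

# Every fold is repeated once per candidate mtry value
tune_grid <- crossing(folds, mtry = 1:5)

nrow(tune_grid)     # 5 folds x 5 mtry values
head(tune_grid, 7)  # same fold/mtry pattern as the rows shown on the slide
```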

Tune The Hyper-Parameters

# Fit one model per fold and mtry combination
cv_model_tunerf <- cv_tune %>% 
  mutate(model = map2(train, mtry, ~ranger(formula = life_expectancy ~ ., 
                                           data = .x, mtry = .y)))

cv_model_tunerf
# A tibble: 25 x 6
   splits       id    train                validate      mtry  model       
 * <list>       <chr> <list>               <list>        <int> <list>      
 1 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   1    <S3: ranger>
 2 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   2    <S3: ranger>
 3 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   3    <S3: ranger>
 4 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   4    <S3: ranger>
 5 <S3: rsplit> Fold1 <tibble [2,402 × 7]> <tibble [60…   5    <S3: ranger>
 6 <S3: rsplit> Fold2 <tibble [2,402 × 7]> <tibble [60…   1    <S3: ranger>
 7 <S3: rsplit> Fold2 <tibble [2,402 × 7]> <tibble [60…   2    <S3: ranger>
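
One possible way to carry the tuning through to a comparison is sketched below: fit a model per fold and `mtry` value, score each on its held-out fold with the mean absolute error, and average by `mtry`. This is a self-contained sketch under stated assumptions, not the course's own code: it uses the built-in `mtcars` data, three manually assigned folds (`fold_id`) instead of the `rsplit` objects in `cv_data`, and `num.trees = 100` to keep it quick.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ranger)

set.seed(42)
# Assumption: mtcars with 3 manually assigned folds stands in for cv_data
n_folds <- 3
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(mtcars)))

# One row per fold and mtry combination (3 x 3 = 9 rows)
tune_results <- crossing(fold = seq_len(n_folds), mtry = 1:3) %>%
  mutate(
    # Train on every row outside the fold
    model = map2(fold, mtry,
                 ~ranger(mpg ~ ., data = mtcars[fold_id != .x, ],
                         mtry = .y, num.trees = 100, seed = 42)),
    # Mean absolute error on the held-out fold
    validate_mae = map2_dbl(model, fold, function(m, f) {
      validate <- mtcars[fold_id == f, ]
      mean(abs(validate$mpg - predict(m, validate)$predictions))
    })
  )

# Average validation error for each candidate mtry
mae_by_mtry <- tune_results %>%
  group_by(mtry) %>%
  summarise(mean_mae = mean(validate_mae))
mae_by_mtry
```

The `mtry` value with the lowest `mean_mae` would be the natural candidate to keep, though with data this small the differences may be within noise.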

Let's practice!
