Selecting the best model

Modeling with tidymodels in R

David Svancer

Data Scientist

Detailed tuning results

The collect_metrics() function provides summarized results by default

  • Passing summarize = FALSE returns the result for every fold, metric, and hyperparameter combination
dt_tuning %>% 
  collect_metrics(summarize = FALSE)
# A tibble: 150 x 8
 id     cost_complexity tree_depth min_n .metric  ...  .estimate  .config
<chr>        <dbl>         <int>   <int>  <chr>   ...    <dbl>      <chr>  
Fold01    0.0000000758     14       39    sens    ...     0.75     Model1 
Fold01    0.0000000758     14       39    spec    ...     0.906    Model1 
Fold01    0.0000000758     14       39    roc_auc ...     0.888    Model1 
.....     ............     ..       ..    ......  ...     .....    ......
Fold10    0.00380          5        36    roc_auc ...     0.789    Model5

Exploring tuning results

Passing summarize = FALSE within collect_metrics() returns a tibble

  • Easy to explore results with dplyr
  • Exploring ROC AUC
    • Select roc_auc metric
    • Form groups by id column
    • Calculate .estimate summary statistics
dt_tuning %>% 
  collect_metrics(summarize = FALSE) %>% 
  filter(.metric == 'roc_auc') %>%
  group_by(id) %>%
  summarize(min_roc_auc = min(.estimate),
            median_roc_auc = median(.estimate),
            max_roc_auc = max(.estimate))
# A tibble: 10 x 4
 id     min_roc_auc  median_roc_auc  max_roc_auc
<chr>      <dbl>          <dbl>       <dbl>
Fold01     0.830          0.885       0.888
Fold02     0.857          0.882       0.885
Fold03     0.818          0.836       0.836
......     ....           ....        ....
Fold10     0.762          0.790       0.813
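
The same dplyr pattern can be turned around to compare models instead of folds. The sketch below is a minimal illustration, assuming a small hypothetical tibble (fold_results, with made-up values) shaped like the output of collect_metrics(summarize = FALSE). It groups by .config and computes the mean and a standard error taken as sd / sqrt(n), which is assumed to correspond to the mean and std_err columns of the summarized results.

```r
library(dplyr)
library(tibble)

# Hypothetical fold-level results (made-up values), shaped like
# the output of collect_metrics(summarize = FALSE)
fold_results <- tribble(
  ~id,      ~.metric,  ~.estimate, ~.config,
  "Fold01", "roc_auc", 0.80,       "Model1",
  "Fold02", "roc_auc", 0.90,       "Model1",
  "Fold03", "roc_auc", 0.85,       "Model1",
  "Fold01", "roc_auc", 0.70,       "Model2",
  "Fold02", "roc_auc", 0.80,       "Model2",
  "Fold03", "roc_auc", 0.75,       "Model2",
  "Fold01", "sens",    0.60,       "Model1",
  "Fold01", "sens",    0.65,       "Model2"
)

# Keep only ROC AUC, then summarize across folds for each model
model_summary <- fold_results %>%
  filter(.metric == "roc_auc") %>%
  group_by(.config) %>%
  summarize(
    mean_roc_auc = mean(.estimate),
    n_folds      = n(),
    std_err      = sd(.estimate) / sqrt(n_folds)
  )

model_summary
```

Grouping by id shows how hard each fold is. Grouping by .config shows how each hyperparameter combination performs on average.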

Viewing the best performing models

The show_best() function

  • Displays the top n performing models based on average value of metric
  • Model1 has the highest mean roc_auc (0.827), narrowly ahead of Model5 (0.825)
dt_tuning %>% 
  show_best(metric = 'roc_auc', n = 5)
# A tibble: 5 x 9
cost_complexity  tree_depth  min_n  .metric .estimator   mean    n    std_err  .config
    <dbl>           <int>    <int>    <chr>   <chr>      <dbl>  <int>  <dbl>    <chr>
0.0000000758         14       39     roc_auc  binary     0.827   10   0.0147   Model1 
0.00380               5       36     roc_auc  binary     0.825   10   0.0146   Model5 
0.0243                5       34     roc_auc  binary     0.823   10   0.0147   Model2 
0.00000443           11       8      roc_auc  binary     0.816   10   0.00786  Model3 
0.000000600           3       5      roc_auc  binary     0.814   10   0.0131   Model4

Selecting a model

The select_best() function

  • Pass dt_tuning results to select_best()
  • Select the metric on which to evaluate performance

Returns a one-row tibble with the hyperparameter values of the best performing model

best_dt_model <- dt_tuning %>% 
  select_best(metric = 'roc_auc')

best_dt_model
# A tibble: 1 x 4
cost_complexity tree_depth  min_n  .config
     <dbl>         <int>    <int>   <chr>  
0.0000000758        14       39     Model1
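
To make the ranking logic concrete, the sketch below reproduces it by hand with dplyr. It is only an approximation of what show_best() and select_best() do for a metric where higher is better. The tibble tuning_summary is a hypothetical stand-in whose values are copied from the show_best() output shown earlier.

```r
library(dplyr)
library(tibble)

# Stand-in for summarized tuning results; values copied from the
# show_best() output on the earlier slide
tuning_summary <- tribble(
  ~cost_complexity, ~tree_depth, ~min_n, ~mean, ~std_err, ~.config,
  0.0000000758,     14L,         39L,    0.827, 0.0147,   "Model1",
  0.00380,           5L,         36L,    0.825, 0.0146,   "Model5",
  0.0243,            5L,         34L,    0.823, 0.0147,   "Model2",
  0.00000443,       11L,          8L,    0.816, 0.00786,  "Model3",
  0.000000600,       3L,          5L,    0.814, 0.0131,   "Model4"
)

# Roughly what show_best(n = 3) reports: the top rows by mean metric value
top_three <- tuning_summary %>%
  arrange(desc(mean)) %>%
  slice_head(n = 3)

# Roughly what select_best() returns: the single best row,
# reduced to the hyperparameter columns and .config
best_by_hand <- tuning_summary %>%
  slice_max(mean, n = 1, with_ties = FALSE) %>%
  select(cost_complexity, tree_depth, min_n, .config)

best_by_hand
```

The gap between the top three models is smaller than their standard errors, so the ranking among them should be read with some caution.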

Finalizing the workflow

The finalize_workflow() function will finalize a workflow that contains a model object with tuning parameters

  • Pass a workflow object
  • Pass a tibble with one row of final model hyperparameter values
    • Column names must match hyperparameters in model object

Returns a workflow object with the hyperparameter values set

final_leads_wkfl <- leads_tune_wkfl %>% 
  finalize_workflow(best_dt_model)

final_leads_wkfl
== Workflow ========================================
Preprocessor: Recipe
Model: decision_tree()
-- Preprocessor ------------------------------------
3 Recipe Steps
* step_corr()
* step_normalize()
* step_dummy()
-- Model --------------------------------------------
Decision Tree Model Specification (classification)
Main Arguments:
  cost_complexity = 0.0000000758
  tree_depth = 14
  min_n = 39
Computational engine: rpart
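
The column-name matching can be seen in a small self-contained sketch. It assumes a hypothetical workflow (toy_wkfl) built with a formula on the built-in iris data instead of the leads recipe, and a hand-written one-row tibble (chosen_params) that reuses the hyperparameter values from the slide. Neither object is part of the course code.

```r
library(tidymodels)

# Decision tree with three hyperparameters marked for tuning
tune_spec <- decision_tree(cost_complexity = tune(),
                           tree_depth = tune(),
                           min_n = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

# Hypothetical workflow; the formula stands in for the leads recipe
toy_wkfl <- workflow() %>%
  add_model(tune_spec) %>%
  add_formula(Species ~ .)

# One-row tibble whose column names match the tune() placeholders
chosen_params <- tibble(cost_complexity = 0.0000000758,
                        tree_depth = 14L,
                        min_n = 39L)

# Replace the tune() placeholders with the chosen values
toy_final_wkfl <- toy_wkfl %>%
  finalize_workflow(chosen_params)

# Inspect the arguments stored in the finalized model specification
final_args <- extract_spec_parsnip(toy_final_wkfl)$args
final_args
```

In the slide, best_dt_model plays the role of chosen_params; its extra .config column is not a hyperparameter of the model.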

Model fitting

The finalized workflow object can be trained with last_fit() and the original data split object, leads_split

Behind the scenes

  • Training and test datasets created from the split
  • Recipe trained and applied
  • Tuned decision tree trained with entire training dataset
  • Predictions and metrics calculated on test data
leads_final_fit <- final_leads_wkfl %>% 
  last_fit(split = leads_split)

leads_final_fit %>% collect_metrics()
# A tibble: 2 x 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.771
2 roc_auc  binary         0.793
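
The "behind the scenes" steps can be sketched by hand and compared with last_fit(). The example below is an illustration under stated assumptions, not the course code: it uses the two_class_dat data from the modeldata package as a stand-in for the leads data, a formula instead of a recipe, and arbitrary fixed hyperparameter values. The names toy_split, toy_wkfl, and manual_fit are hypothetical.

```r
library(tidymodels)

# Stand-in data: two numeric predictors (A, B) and a two-level outcome (Class)
data(two_class_dat, package = "modeldata")

set.seed(123)
toy_split <- initial_split(two_class_dat, prop = 0.75, strata = Class)

# Workflow with fixed (already chosen) hyperparameter values
toy_wkfl <- workflow() %>%
  add_model(decision_tree(tree_depth = 5, min_n = 10) %>%
              set_engine("rpart") %>%
              set_mode("classification")) %>%
  add_formula(Class ~ A + B)

# One call: fit on the training portion, evaluate on the test portion
toy_last_fit <- toy_wkfl %>%
  last_fit(split = toy_split)

last_fit_metrics <- toy_last_fit %>% collect_metrics()

# The same steps by hand: train on training data, predict on test data
manual_fit <- fit(toy_wkfl, data = training(toy_split))

manual_auc <- predict(manual_fit, testing(toy_split), type = "prob") %>%
  bind_cols(testing(toy_split)) %>%
  roc_auc(truth = Class, .pred_Class1)

last_fit_metrics
manual_auc
```

The test data is used only once, at this final step, so the reported metrics are an estimate of performance on new data.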

Let's practice!
