Optimalkan boosted ensemble

Machine Learning dengan Model Berbasis Pohon di R

Sandro Raabe

Data Scientist

Titik awal: kinerja tanpa tuning

collect_metrics(cv_results)
# A tibble: 1 x 3
  .metric   .mean       n   
  <chr>     <dbl>   <int>             
1 roc_auc   0.951       5  
  • 95% — lumayan untuk model tanpa tuning!
Machine Learning dengan Model Berbasis Pohon di R

Alur tuning

fungsi tune

Machine Learning dengan Model Berbasis Pohon di R

Alur tuning

pembuatan grid

Machine Learning dengan Model Berbasis Pohon di R

Alur tuning

cross-validation

Machine Learning dengan Model Berbasis Pohon di R

Alur tuning

fungsi tune_grid

Machine Learning dengan Model Berbasis Pohon di R

Alur tuning

fungsi select_best

Machine Learning dengan Model Berbasis Pohon di R

Alur tuning

alur tuning

Machine Learning dengan Model Berbasis Pohon di R

Alur tuning

fungsi final fit

Machine Learning dengan Model Berbasis Pohon di R

Langkah 1: Buat spesifikasi tuning

# Create the specification with placeholders
boost_spec <- boost_tree(

trees = 500, learn_rate = tune(), tree_depth = tune(), sample_size = tune()) %>%
set_mode("classification") %>% set_engine("xgboost")
Boosted Tree Model Specification (classification)

Main Arguments:
  trees = 500
  tree_depth = tune()
  learn_rate = tune()
  sample_size = tune()

Computational engine: xgboost
  • Spesifikasi biasa
  • Perbedaan utama: gunakan tune() untuk membuat placeholder nilai yang akan di-tuning
Machine Learning dengan Model Berbasis Pohon di R

Langkah 2: Buat grid tuning

# Create a regular grid
tunegrid_boost <- grid_regular(parameters(boost_spec), 

levels = 2)
# A tibble: 8 x 3
  tree_depth     learn_rate sample_size
       <int>          <dbl>       <dbl>
1          1   0.0000000001         0.1
2         15   0.0000000001         0.1
3          1   0.1                  0.1
4         15   0.1                  0.1
5          1   0.0000000001         1  
6         15   0.0000000001         1  
7          1   0.1                  1  
8         15   0.1                  1
# Create a random grid
grid_random(parameters(boost_spec),

size = 8)
# A tibble: 8 x 3
  tree_depth    learn_rate   sample_size
       <int>         <dbl>         <dbl>
1         11   0.0000000249        0.858
2         12   0.00000000392       0.856
3         15   0.0000000131        0.220
4         15   0.0000216           0.125
5         10   0.00000000537       0.759
6         14   0.0395              0.270
7          2   0.000000828         0.904
8          9   0.0000254           0.473
Machine Learning dengan Model Berbasis Pohon di R

Langkah 3: Proses tuning

Argumen untuk tune_grid():

  • Spesifikasi dummy
  • Formula model
  • Resampling/fold
  • Grid parameter
  • Daftar metrik dalam metric_set()

Panggilan fungsi:

# Tune sepanjang grid
tune_results <- tune_grid(

boost_spec,
still_customer ~ .,
resamples = vfold_cv(customers_train, v = 6),
grid = tunegrid_boost,
metrics = metric_set(roc_auc))
Machine Learning dengan Model Berbasis Pohon di R

Visualisasikan hasil

# Plot the results
autoplot(tune_results)

hasil tuning

Machine Learning dengan Model Berbasis Pohon di R

Langkah 4: Finalkan model

# Select the final hyperparameters
best_params <- select_best(tune_results)

best_params
# A tibble: 1 x 4
  tree_depth learn_rate sample_size .config
       <int>      <dbl>       <dbl> <chr> 
1          8        0.1        0.55 Model17
# Finalize the specification
final_spec <- finalize_model(boost_spec, 
                             best_params)
final_spec
Boosted Tree Model Specification

Main Arguments:
  trees = 500
  tree_depth = 8
  learn_rate = 0.1
  sample_size = 0.55

Computational engine: xgboost
Machine Learning dengan Model Berbasis Pohon di R

Langkah terakhir: Latih model final

final_model <- final_spec %>% fit(formula = still_customer ~ .,
                                  data = customers_train)

final_model
Fit time:  2.3s 
##### xgb.Booster
raw: 343.8 Kb 
nfeatures : 37 
evaluation_log:
    iter training_error
       1       0.046403                  
     100       0.002592
Machine Learning dengan Model Berbasis Pohon di R

Giliran Anda!

Machine Learning dengan Model Berbasis Pohon di R

Preparing Video For Download...