Modeling with tidymodels in R
David Svancer
Data Scientist
Hyperparameters: model parameters whose values are set before training and which control model complexity
Hyperparameters of a parsnip decision tree:
- cost_complexity
- tree_depth
- min_n
The decision_tree() function sets default hyperparameter values:
- cost_complexity is set to 0.01
- tree_depth is set to 30
- min_n is set to 20
These may not be the best values for all datasets.
dt_model <- decision_tree() %>%
set_engine('rpart') %>%
set_mode('classification')
The tune() function from the tune package
Set tune() within the parsnip model specification:

dt_tune_model <- decision_tree(cost_complexity = tune(),
                               tree_depth = tune(),
                               min_n = tune()) %>%
  set_engine('rpart') %>%
  set_mode('classification')

dt_tune_model
Decision Tree Model Specification (classification)
Main Arguments:
cost_complexity = tune()
tree_depth = tune()
min_n = tune()
Computational engine: rpart
Workflow objects are easy to update
Pass the previous leads_wkfl to update_model() and supply the new decision tree model with tuning parameters:

leads_tune_wkfl <- leads_wkfl %>%
  update_model(dt_tune_model)

leads_tune_wkfl
== Workflow ===============
Preprocessor: Recipe
Model: decision_tree()
-- Preprocessor -----------
3 Recipe Steps
* step_corr()
* step_normalize()
* step_dummy()
-- Model ------------------
Decision Tree Model Specification (classification)
Main Arguments:
cost_complexity = tune()
tree_depth = tune()
min_n = tune()
Computational engine: rpart
Grid search: the most common method for tuning hyperparameters
| cost_complexity | tree_depth | min_n |
|---|---|---|
| 0.001 | 20 | 35 |
| 0.001 | 20 | 15 |
| 0.001 | 35 | 35 |
| 0.001 | 35 | 15 |
| 0.2 | 20 | 35 |
| ... | ... | ... |
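A grid like the one above can be built by crossing candidate values for each hyperparameter. A minimal sketch using tidyr::crossing(), with illustrative candidate values taken from the table:

```r
library(tidyr)

# Every combination of the candidate values (2 x 2 x 2 = 8 rows)
dt_manual_grid <- crossing(
  cost_complexity = c(0.001, 0.2),
  tree_depth = c(20, 35),
  min_n = c(15, 35)
)

dt_manual_grid
```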
The parameters() function from the dials package
Pass a parsnip model specification to identify the hyperparameters labeled with tune(), if any:

parameters(dt_tune_model)
Collection of 3 parameters for tuning
identifier type object
cost_complexity cost_complexity nparam[+]
tree_depth tree_depth nparam[+]
min_n min_n nparam[+]
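The default ranges that grid-generating functions draw from can be adjusted before building a grid. A sketch using dials' update() and tree_depth() functions, with an illustrative range (not shown in the slides):

```r
library(dials)

# Restrict tree_depth to values between 5 and 30 before generating a grid
dt_params <- parameters(dt_tune_model) %>%
  update(tree_depth = tree_depth(range = c(5L, 30L)))

dt_params
```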
Generating random combinations
The grid_random() function:
- takes the output of parameters()
- size sets the number of random combinations to generate
- call set.seed() before grid_random() for reproducibility

set.seed(214)
grid_random(parameters(dt_tune_model),
            size = 5)
# A tibble: 5 x 3
cost_complexity tree_depth min_n
<dbl> <int> <int>
1 0.0000000758 14 39
2 0.0243 5 34
3 0.00000443 11 8
4 0.000000600 3 5
5 0.00380 5 36
The first step in hyperparameter tuning
dt_grid contains 5 random combinations of hyperparameter values:

set.seed(214)
dt_grid <- grid_random(parameters(dt_tune_model),
                       size = 5)

dt_grid
# A tibble: 5 x 3
cost_complexity tree_depth min_n
<dbl> <int> <int>
1 0.0000000758 14 39
2 0.0243 5 34
3 0.00000443 11 8
4 0.000000600 3 5
5 0.00380 5 36
The tune_grid() function performs hyperparameter tuning
It accepts the following arguments:
- a workflow or parsnip model
- resamples
- grid
- optional metrics
It returns a tibble of results with a .metrics column per fold:

dt_tuning <- leads_tune_wkfl %>%
  tune_grid(resamples = leads_folds,
            grid = dt_grid,
            metrics = leads_metrics)
dt_tuning
# Tuning results
# 10-fold cross-validation using stratification
# A tibble: 10 x 4
splits id .metrics ..
<list> <chr> <list> ..
<split [896/100]> Fold01 <tibble [15 x 7]> ..
................ ...... ............... ..
<split [897/99]> Fold09 <tibble [15 x 7]> ..
<split [897/99]> Fold10 <tibble [15 x 7]> ..
By default, the collect_metrics() function summarizes results across folds
dt_tuning %>%
collect_metrics()
# A tibble: 15 x 9
cost_complexity tree_depth min_n .metric .estimator mean n std_err .config
<dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0.0000000758 14 39 roc_auc binary 0.827 10 0.0147 Model1
2 0.0000000758 14 39 sens binary 0.728 10 0.0277 Model1
3 0.0000000758 14 39 spec binary 0.865 10 0.0156 Model1
4 0.0243 5 34 roc_auc binary 0.823 10 0.0147 Model2
. ...... .. .. .... ...... ..... .. ..... ......
14 0.00380 5 36 sens binary 0.747 10 0.0209 Model5
15 0.00380 5 36 spec binary 0.858 10 0.0161 Model5
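collect_metrics() averages each metric across the 10 folds by default; passing summarize = FALSE returns the individual fold-level estimates instead. A sketch, assuming the dt_tuning object created above:

```r
# One row per fold, hyperparameter combination, and metric
dt_tuning %>%
  collect_metrics(summarize = FALSE)
```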