Modeling with tidymodels in R
David Svancer
Data Scientist
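By default, the collect_metrics() function returns summarized results
Setting summarize = FALSE returns every tuning result: one row per fold, candidate model, and metric (here 10 folds × 5 candidate models × 3 metrics = 150 rows)
dt_tuning %>%
  collect_metrics(summarize = FALSE)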
# A tibble: 150 x 8
id cost_complexity tree_depth min_n .metric ... .estimate .config
<chr> <dbl> <int> <int> <chr> ... <dbl> <chr>
Fold01 0.0000000758 14 39 sens ... 0.75 Model1
Fold01 0.0000000758 14 39 spec ... 0.906 Model1
Fold01 0.0000000758 14 39 roc_auc ... 0.888 Model1
..... ............ .. .. ...... ... ..... ......
Fold10 0.00380 5 36 roc_auc ... 0.789 Model5
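The summarized and unsummarized views are related in a simple way. By default, collect_metrics() reports one row per candidate model and metric, with the mean across folds, the number of folds n, and the standard error (standard deviation divided by the square root of n). The base R sketch below performs that aggregation on a small hypothetical data frame (`fold_results`, with made-up values, not the dt_tuning results). `summarize_folds()` is an illustrative helper, not part of tune.

```r
# Hypothetical fold-level results (illustrative values, not from dt_tuning)
fold_results <- data.frame(
  id = rep(c("Fold01", "Fold02", "Fold03"), times = 2),
  .metric = "roc_auc",
  .estimate = c(0.80, 0.85, 0.90, 0.70, 0.75, 0.80),
  .config = rep(c("Model1", "Model2"), each = 3)
)

# Hypothetical helper: one row per candidate model and metric, with the
# mean, number of folds, and standard error (sd / sqrt(n)) of .estimate
summarize_folds <- function(df) {
  groups <- split(df, list(df$.config, df$.metric), drop = TRUE)
  rows <- lapply(groups, function(g) {
    n <- nrow(g)
    data.frame(
      .metric = g$.metric[1],
      mean = mean(g$.estimate),
      n = n,
      std_err = sd(g$.estimate) / sqrt(n),
      .config = g$.config[1]
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

summary_tbl <- summarize_folds(fold_results)
print(summary_tbl)
```

Each of the two hypothetical models gets a single summary row from its three fold-level estimates. This is the same collapse that turns 150 fold-level rows into 15 summarized rows (5 candidate models × 3 metrics).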
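Setting summarize = FALSE in collect_metrics() returns a tibble
The result can be explored with dplyr. For example, compute the minimum, median, and maximum roc_auc across the candidate models within each cross-validation fold (id), using the .estimate column
dt_tuning %>%
  collect_metrics(summarize = FALSE) %>%
  filter(.metric == 'roc_auc') %>%
  group_by(id) %>%
  summarize(min_roc_auc = min(.estimate),
            median_roc_auc = median(.estimate),
            max_roc_auc = max(.estimate))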
# A tibble: 10 x 4
id min_roc_auc median_roc_auc max_roc_auc
<chr> <dbl> <dbl> <dbl>
Fold01 0.830 0.885 0.888
Fold02 0.857 0.882 0.885
Fold03 0.818 0.836 0.836
...... .... .... ....
Fold10 0.762 0.790 0.813
The show_best() function
Shows the top n models, ranked by the mean of a metric across folds; Model1 is the winner
dt_tuning %>%
  show_best(metric = 'roc_auc', n = 5)
# A tibble: 5 x 9
cost_complexity tree_depth min_n .metric .estimator mean n std_err .config
<dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
0.0000000758 14 39 roc_auc binary 0.827 10 0.0147 Model1
0.00380 5 36 roc_auc binary 0.825 10 0.0146 Model5
0.0243 5 34 roc_auc binary 0.823 10 0.0147 Model2
0.00000443 11 8 roc_auc binary 0.816 10 0.00786 Model3
0.000000600 3 5 roc_auc binary 0.814 10 0.0131 Model4
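Conceptually, show_best() keeps the summarized results for one metric, sorts them by their mean, and returns the top n rows. The base R sketch below reproduces that ranking on a subset of the columns printed above, with the rows deliberately listed in Model1 to Model5 order so the sort has work to do. `show_best_sketch()` is a hypothetical helper, not part of tune. It assumes larger values are better, which holds for roc_auc. The real function also accounts for metrics where smaller is better, such as RMSE.

```r
# Subset of the summarized roc_auc results shown above, in .config order
tuning_summary <- data.frame(
  cost_complexity = c(0.0000000758, 0.0243, 0.00000443, 0.000000600, 0.00380),
  tree_depth = c(14L, 5L, 11L, 3L, 5L),
  min_n = c(39L, 34L, 8L, 5L, 36L),
  .metric = "roc_auc",
  mean = c(0.827, 0.823, 0.816, 0.814, 0.825),
  .config = c("Model1", "Model2", "Model3", "Model4", "Model5")
)

# Hypothetical helper: keep one metric, sort by mean (descending), take top n.
# Assumes larger metric values are better, as they are for roc_auc.
show_best_sketch <- function(results, metric, n = 5) {
  subset_metric <- results[results$.metric == metric, ]
  ranked <- subset_metric[order(subset_metric$mean, decreasing = TRUE), ]
  head(ranked, n)
}

top_two <- show_best_sketch(tuning_summary, metric = "roc_auc", n = 2)
print(as.character(top_two$.config))
```

Asking for the top two returns Model1 and Model5, which matches the first two rows of the show_best() output.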
The select_best() function
Pass dt_tuning to select_best() and provide the metric used to evaluate performance
Returns a tibble with the best model's hyperparameter values
best_dt_model <- dt_tuning %>%
  select_best(metric = 'roc_auc')

best_dt_model
# A tibble: 1 x 4
cost_complexity tree_depth min_n .config
<dbl> <int> <int> <chr>
0.0000000758 14 39 Model1
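One way to read select_best() is as the single top row of the ranking, reduced to the hyperparameter columns and .config. That is the shape later passed to finalize_workflow(). The base R sketch below mirrors this on the same summarized rows. `select_best_sketch()` is a hypothetical helper, not part of tune. Its assumption that larger values are better holds for roc_auc but not for every metric.

```r
# Subset of the summarized roc_auc results from the show_best() output
tuning_summary <- data.frame(
  cost_complexity = c(0.0000000758, 0.0243, 0.00000443, 0.000000600, 0.00380),
  tree_depth = c(14L, 5L, 11L, 3L, 5L),
  min_n = c(39L, 34L, 8L, 5L, 36L),
  .metric = "roc_auc",
  mean = c(0.827, 0.823, 0.816, 0.814, 0.825),
  .config = c("Model1", "Model2", "Model3", "Model4", "Model5")
)

# Hypothetical helper: the single best row, keeping only the tuned
# hyperparameters and .config (assumes larger metric values are better)
select_best_sketch <- function(results, metric, params) {
  subset_metric <- results[results$.metric == metric, ]
  best_row <- subset_metric[which.max(subset_metric$mean), ]
  best_row[, c(params, ".config")]
}

best_sketch <- select_best_sketch(
  tuning_summary,
  metric = "roc_auc",
  params = c("cost_complexity", "tree_depth", "min_n")
)
print(best_sketch)
```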
The finalize_workflow() function finalizes a workflow whose model object contains tuning parameters
Pass the workflow to finalize_workflow() together with the tibble of best hyperparameter values
Returns a workflow object with the hyperparameter values set
final_leads_wkfl <- leads_tune_wkfl %>%
  finalize_workflow(best_dt_model)

final_leads_wkfl
== Workflow ========================================
Preprocessor: Recipe
Model: decision_tree()
-- Preprocessor ------------------------------------
3 Recipe Steps
* step_corr()
* step_normalize()
* step_dummy()
-- Model --------------------------------------------
Decision Tree Model Specification (classification)
Main Arguments:
cost_complexity = 0.0000000758
tree_depth = 14
min_n = 39
Computational engine: rpart
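You can picture what finalize_workflow() does to the model specification as filling in placeholders. Each argument marked with tune() is replaced by the matching column of the one-row tibble from select_best(). The sketch below represents the specification as a plain list (`dt_spec_args`, hypothetical) whose tunable entries hold unevaluated tune() calls. It illustrates only the substitution idea and is not how tune or workflows implement finalization. `finalize_args_sketch()` is likewise a hypothetical helper.

```r
# Hypothetical stand-in for a model specification: tunable arguments hold
# an unevaluated tune() call, as in decision_tree(cost_complexity = tune(), ...)
dt_spec_args <- list(
  cost_complexity = quote(tune()),
  tree_depth = quote(tune()),
  min_n = quote(tune())
)

# One-row result in the shape returned by select_best() (values from above)
best_params <- data.frame(
  cost_complexity = 0.0000000758,
  tree_depth = 14L,
  min_n = 39L,
  .config = "Model1"
)

# Hypothetical helper: replace each tune() placeholder with the value of
# the same-named column; non-placeholder arguments are left unchanged
finalize_args_sketch <- function(args, params) {
  for (name in names(args)) {
    is_placeholder <- is.call(args[[name]]) &&
      identical(args[[name]][[1]], as.name("tune"))
    if (is_placeholder && name %in% names(params)) {
      args[[name]] <- params[[name]][1]
    }
  }
  args
}

final_args <- finalize_args_sketch(dt_spec_args, best_params)
str(final_args)
```

The filled-in values match the Main Arguments printed for final_leads_wkfl.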
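The finalized workflow object can be trained with last_fit() using the original data split object, leads_split
Behind the scenes, the recipe is trained and applied, the tuned decision tree is fit to the entire training set, and metrics are computed on the test set
leads_final_fit <- final_leads_wkfl %>%
  last_fit(split = leads_split)

leads_final_fit %>%
  collect_metrics()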
# A tibble: 2 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.771
2 roc_auc binary 0.793
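The two test-set metrics above are standard classification metrics. Accuracy is the share of correct class predictions. ROC AUC is the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The base R sketch below computes both on a small hypothetical set of predictions. `truth`, `prob_yes`, the 0.5 cutoff, and `roc_auc_rank()` are all assumptions for illustration. In tidymodels these values come from yardstick's accuracy() and roc_auc(), whose settings (for example, which factor level is the event) are not modeled here.

```r
# Hypothetical test-set predictions: true class and predicted P(class = "yes")
truth <- c("yes", "yes", "no", "no", "yes", "no")
prob_yes <- c(0.90, 0.40, 0.35, 0.10, 0.80, 0.60)

# Accuracy: hard class predictions at a 0.5 cutoff, then the share correct
pred_class <- ifelse(prob_yes >= 0.5, "yes", "no")
accuracy <- mean(pred_class == truth)

# ROC AUC via the rank (Mann-Whitney) formulation: the proportion of
# (positive, negative) pairs in which the positive case scores higher,
# counting ties as one half
roc_auc_rank <- function(truth, prob, event = "yes") {
  pos <- prob[truth == event]
  neg <- prob[truth != event]
  pair_scores <- outer(pos, neg, function(p, q) (p > q) + 0.5 * (p == q))
  mean(pair_scores)
}

auc <- roc_auc_rank(truth, prob_yes)
print(c(accuracy = accuracy, roc_auc = auc))
```

In this toy example, 4 of 6 predictions are correct (accuracy 0.667), and 8 of the 9 positive/negative pairs are ordered correctly (ROC AUC 0.889).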