Classification with random forests

Machine Learning in the tidyverse

Dmitriy (Dima) Gorenshteyn

Lead Data Scientist, Memorial Sloan Kettering Cancer Center

ranger() for classification

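library(tidyverse)
library(ranger)

# Cross each cross-validation fold with the candidate mtry values
cv_tune <- cv_data %>%
  crossing(mtry = c(2, 4, 8, 16))

# Fit one random forest per combination of fold and mtry
cv_models_rf <- cv_tune %>%
  mutate(model = map2(train, mtry, ~ranger(formula = Attrition ~ .,
                                           data = .x, mtry = .y,
                                           num.trees = 100, seed = 42)))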
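
As a minimal sketch of what a single ranger() call in that map2() step does, the block below fits one classification forest on the built-in iris data. The iris data is only a stand-in for the attrition data (an assumption for illustration), and mtry is set to 2 because mtry cannot exceed the number of predictors, which is 4 here.

```r
library(ranger)

# iris is a built-in dataset, used here only as a stand-in for the attrition data
model <- ranger(formula = Species ~ ., data = iris,
                mtry = 2, num.trees = 100, seed = 42)

# A factor response makes ranger grow a classification forest
print(model$treetype)

# predict() returns an object whose $predictions element holds the predicted classes
predicted <- predict(model, iris)$predictions
print(table(predicted, actual = iris$Species))
```

Because the response is a factor, the predictions come back as a factor of class labels, which is why the later slides compare them with "Yes" to obtain TRUE/FALSE values.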

1) Prepare the actual classes

attrition  class
Yes        TRUE
No         FALSE
validate$Attrition
No  No  No  No  No  Yes No  Yes ... No  No  No
validate_actual <- validate$Attrition == "Yes"
validate_actual 
FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE ... FALSE FALSE FALSE
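
The conversion above can be tried on a small vector. This is a sketch with hypothetical toy labels (the name attrition and its values are made up for illustration, not taken from the validate data).

```r
# Hypothetical toy labels standing in for validate$Attrition
attrition <- factor(c("No", "No", "Yes", "No", "Yes"), levels = c("No", "Yes"))

# Comparing a factor with a string compares its labels and yields a logical vector
actual <- attrition == "Yes"
print(actual)

# TRUE counts as 1, so sum() and mean() give the number and share of "Yes" cases
n_yes <- sum(actual)
share_yes <- mean(actual)
```

Working with logical vectors keeps the actual and predicted classes in the same form, so they can be compared element by element.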

2) Prepare the predicted classes

P(attrition)  class
Yes           TRUE
No            FALSE
validate_classes <- predict(rf_model, rf_validate)$predictions
validate_classes
No  No  No  No  No  Yes No  No ... No  No  No
validate_predicted <- validate_classes == "Yes"
validate_predicted
FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE ... FALSE FALSE FALSE
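
Once both vectors are logical, they can be compared directly. The sketch below uses base R only and just the first eight values visible in the two outputs above; the elided values are not reconstructed, so the numbers describe this toy subset and not the real model.

```r
# First eight visible values from the slides; the elided rest is left out
actual    <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
predicted <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)

# Confusion table of actual versus predicted classes
confusion <- table(actual = actual, predicted = predicted)
print(confusion)

# Accuracy: share of cases where the prediction matches the actual class
accuracy <- mean(actual == predicted)

# Recall: share of actual TRUE cases that were predicted TRUE
recall <- sum(actual & predicted) / sum(actual)

# Precision: share of predicted TRUE cases that are actually TRUE
precision <- sum(actual & predicted) / sum(predicted)
```

Which of these measures matters most depends on the cost of missing an employee who leaves versus flagging one who stays; that choice is left open here.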

Build the best attrition model
