Input selection based on the AUC

Credit Risk Modeling in R

Lore Dirick

Manager of Data Science Curriculum at Flatiron School

ROC curves for 4 logistic regression models

[Figure: ROC curves of the four logistic regression models]
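The AUC summarized by these curves can also be read as a probability. Pick one defaulter and one non-defaulter at random. The AUC is the chance that the defaulter receives the higher predicted PD, with ties counting one half. The base-R sketch below computes this from ranks (the Mann-Whitney formulation). The helper name `auc_rank` and the toy numbers are hypothetical. In the course itself, `auc()` from the pROC package does this job.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC.
# `labels` is 0/1 (1 = default), `scores` are predicted PDs.
auc_rank <- function(labels, scores) {
  r <- rank(scores)            # average ranks, so ties count as 1/2
  n_pos <- sum(labels == 1)    # number of defaulters
  n_neg <- sum(labels == 0)    # number of non-defaulters
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example (made-up numbers): 3 defaulters, 3 non-defaulters
y <- c(1, 1, 1, 0, 0, 0)
p <- c(0.9, 0.6, 0.3, 0.5, 0.2, 0.1)
auc_rank(y, p)  # 8 of the 9 defaulter/non-defaulter pairs are ordered correctly: 8/9
```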

Pruning based on the AUC

1) Start with a model that contains all variables (here, 7) and compute its AUC

library(pROC)  # provides auc()

log_model_full <- glm(loan_status ~ loan_amnt + grade + home_ownership +
                      annual_inc + age + emp_cat + ir_cat,
                      family = "binomial", data = training_set)

# Predicted probabilities of default (PD) on the test set
predictions_model_full <- predict(log_model_full,
                                  newdata = test_set, type = "response")

AUC_model_full <- auc(test_set$loan_status, predictions_model_full)
AUC_model_full
Area under the curve: 0.6512

2) Build 7 new models, each leaving out one variable, and make PD predictions on the test set

# Leave out loan_amnt
log_1_remove_amnt <- glm(loan_status ~ grade + home_ownership + annual_inc + age + emp_cat + ir_cat,
                         family = "binomial",
                         data = training_set)

# Leave out grade
log_1_remove_grade <- glm(loan_status ~ loan_amnt + home_ownership + annual_inc + age + emp_cat + ir_cat,
                          family = "binomial",
                          data = training_set)

# Leave out home_ownership
log_1_remove_home <- glm(loan_status ~ loan_amnt + grade + annual_inc + age + emp_cat + ir_cat,
                         family = "binomial",
                         data = training_set)

# PD predictions on the test set for each reduced model
pred_1_remove_amnt <- predict(log_1_remove_amnt, newdata = test_set, type = "response")
pred_1_remove_grade <- predict(log_1_remove_grade, newdata = test_set, type = "response")
pred_1_remove_home <- predict(log_1_remove_home, newdata = test_set, type = "response")
...

3) Keep the model with the best AUC (AUC of the full model: 0.6512)

auc(test_set$loan_status, pred_1_remove_amnt)
Area under the curve: 0.6537
auc(test_set$loan_status, pred_1_remove_grade)
Area under the curve: 0.6438
auc(test_set$loan_status, pred_1_remove_home)
Area under the curve: 0.6537
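Writing out one `glm()` call per variable gets tedious. A compact sketch of steps 2 and 3 loops over the terms of the full model. It drops each term in turn with `update()` and reports the term whose removal gives the highest test AUC. The course's `training_set` and `test_set` are not available here, so the sketch uses R's built-in `infert` data (binary outcome `case`) purely as a stand-in. It also uses a hypothetical rank-based `auc_rank()` helper in place of `pROC::auc()`.

```r
# Stand-in data: R's built-in `infert` (binary outcome `case`), NOT credit data.
set.seed(123)
idx <- sample(nrow(infert), 170)
training_set <- infert[idx, ]
test_set     <- infert[-idx, ]

# Hypothetical helper: rank-based (Mann-Whitney) AUC, ties count 1/2
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Step 1: full model
log_model_full <- glm(case ~ age + parity + induced + spontaneous,
                      family = "binomial", data = training_set)

# Step 2: refit without each term in turn and compute the test AUC
vars <- attr(terms(log_model_full), "term.labels")
auc_drop_one <- sapply(vars, function(v) {
  fit <- update(log_model_full, as.formula(paste(". ~ . -", v)))
  pd  <- predict(fit, newdata = test_set, type = "response")
  auc_rank(test_set$case, pd)
})
auc_drop_one

# Step 3: the term whose removal gives the highest test AUC
names(which.max(auc_drop_one))
```

Because `sapply()` runs over a character vector, the resulting AUC vector is named after the dropped term. This makes the comparison in step 3 a single `which.max()`.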

4) Repeat until the AUC drops (significantly)
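The whole procedure can be wrapped in a loop. This is a minimal sketch on simulated data. All variables are hypothetical, including a pure-noise column `noise`. The function `prune_by_auc()` and the tolerance `tol` are assumptions, since the slides leave "significant" undefined. In each round, the loop removes the variable whose removal yields the highest test AUC. It stops when even that best removal lowers the AUC by more than `tol`.

```r
set.seed(1)

# Simulated stand-in data (hypothetical); names only mimic the course variables.
n <- 3000
d <- data.frame(
  loan_amnt  = rlnorm(n, log(10000), 0.5),
  annual_inc = rlnorm(n, log(60000), 0.4),
  age        = sample(20:70, n, replace = TRUE),
  noise      = rnorm(n),  # unrelated to default by construction
  grade      = factor(sample(LETTERS[1:4], n, replace = TRUE))
)
# True PD depends only on grade and annual_inc
d$loan_status <- rbinom(n, 1, plogis(-2 + 0.6 * (as.integer(d$grade) - 1) -
                                       0.5 * log(d$annual_inc / 60000)))
idx <- sample(n, 2000)
training_set <- d[idx, ]
test_set     <- d[-idx, ]

# Hypothetical helper: rank-based (Mann-Whitney) AUC, ties count 1/2
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Fit a logistic model on training_set and return its test-set AUC
test_auc <- function(vars) {
  fit <- glm(reformulate(vars, response = "loan_status"),
             family = "binomial", data = training_set)
  auc_rank(test_set$loan_status,
           predict(fit, newdata = test_set, type = "response"))
}

# Backward elimination on test AUC. `tol` is an assumed cut-off for a
# "significant" drop; the slides do not give a number.
prune_by_auc <- function(vars, tol = 0.005) {
  current_auc <- test_auc(vars)
  while (length(vars) > 1) {
    # Steps 2-3: AUC after dropping each remaining variable
    cand <- sapply(vars, function(v) test_auc(setdiff(vars, v)))
    best <- which.max(cand)
    # Step 4: stop once even the best removal costs more than tol
    if (cand[[best]] < current_auc - tol) break
    vars <- setdiff(vars, names(cand)[best])
    current_auc <- cand[[best]]
  }
  list(vars = vars, auc = current_auc)
}

result <- prune_by_auc(c("loan_amnt", "annual_inc", "age", "noise", "grade"))
result$vars
result$auc
```

With `tol = 0`, the loop only accepts removals that do not lower the test AUC at all. A larger `tol` prunes more aggressively. The test set also drives the selection, so the AUC reported for the final model is optimistic. A separate validation set would give a fairer estimate.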

Let's practice!
