Input selection based on the AUC

Credit Risk Modeling in R

Lore Dirick

Manager of Data Science Curriculum at Flatiron School

ROC curves for 4 logistic regression models

[Figure: ROC curves for 4 logistic regression models, shown over three slides]

AUC-based pruning

1) Start with a model that includes all variables (in our case, 7) and compute its AUC on the test set

log_model_full <- glm(loan_status ~ loan_amnt + grade + home_ownership + 
                      annual_inc + age + emp_cat + ir_cat, 
                      family = "binomial", data = training_set)

predictions_model_full <- predict(log_model_full, 
                                  newdata = test_set, type = "response")

# auc() is assumed to come from the pROC package
library(pROC)
AUC_model_full <- auc(test_set$loan_status, predictions_model_full)
AUC_model_full

Area under the curve: 0.6512
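To make the AUC value less of a black box, here is a minimal base-R sketch of one common way to compute it: as the normalised Mann-Whitney rank statistic, that is, the share of (defaulter, non-defaulter) pairs in which the defaulter gets the higher predicted probability. The helper name `auc_rank` and the toy data are hypothetical and not part of the course material. When higher scores indicate default, the result should agree with what `auc()` reports.

```r
# Hypothetical helper (not from the slides): AUC as the normalised
# Mann-Whitney U statistic.
# labels: 0/1 vector (1 = default); scores: predicted probabilities of default.
auc_rank <- function(labels, scores) {
  r <- rank(scores)                # average ranks, so tied scores count as 1/2
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data, made up for illustration only
labels <- c(0, 0, 1, 0, 1, 1)
scores <- c(0.10, 0.30, 0.35, 0.40, 0.80, 0.90)
auc_rank(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation, which is why the small differences between models on these slides are compared so carefully.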

2) Build 7 new models, each with one of the variables removed, and make probability-of-default (PD) predictions on the test set

log_1_remove_amnt <- glm(loan_status ~ grade + home_ownership + annual_inc + age + emp_cat + ir_cat,
                         family = "binomial",
                         data = training_set)

log_1_remove_grade <- glm(loan_status ~ loan_amnt + home_ownership + annual_inc + age + emp_cat + ir_cat,
                          family = "binomial",
                          data = training_set)

log_1_remove_home <- glm(loan_status ~ loan_amnt + grade + annual_inc + age + emp_cat + ir_cat,
                         family = "binomial",
                         data = training_set)

pred_1_remove_amnt <- predict(log_1_remove_amnt, newdata = test_set, type = "response")
pred_1_remove_grade <- predict(log_1_remove_grade, newdata = test_set, type = "response")
pred_1_remove_home <- predict(log_1_remove_home, newdata = test_set, type = "response")
...
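Writing out seven near-identical `glm()` calls by hand is error-prone. As a sketch of one alternative, the leave-one-out formulas can be generated with `reformulate()`. The list names (`remove_loan_amnt`, and so on) are hypothetical and differ from the object names used on the slide. The fitting step only runs if `training_set` and `test_set` exist in the session.

```r
# The seven candidate variables from the full model
vars <- c("loan_amnt", "grade", "home_ownership", "annual_inc",
          "age", "emp_cat", "ir_cat")

# One formula per variable, each leaving that variable out
formulas <- lapply(vars, function(v) {
  reformulate(setdiff(vars, v), response = "loan_status")
})
names(formulas) <- paste0("remove_", vars)  # hypothetical naming scheme

formulas$remove_grade  # formula without `grade`

# Fit and predict only when the course data sets are available
if (exists("training_set") && exists("test_set")) {
  models <- lapply(formulas, glm, family = "binomial", data = training_set)
  preds  <- lapply(models, predict, newdata = test_set, type = "response")
}
```

Keeping the models and predictions in named lists makes it straightforward to compute all seven AUC values in one pass in the next step.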

3) Keep the reduced model with the highest test-set AUC (full-model AUC for comparison: 0.6512)

auc(test_set$loan_status, pred_1_remove_amnt)

Area under the curve: 0.6537

auc(test_set$loan_status, pred_1_remove_grade)

Area under the curve: 0.6438

auc(test_set$loan_status, pred_1_remove_home)

Area under the curve: 0.6537

4) Repeat steps 2 and 3 on the retained model until removing any further variable decreases the AUC (significantly)
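The four steps above can be sketched as a single loop. This is an illustrative implementation under stated assumptions, not the course's own code: the function names (`auc_rank`, `fit_auc`, `prune_by_auc`), the tolerance `tol` standing in for "significantly", and the simulated data are all hypothetical. The AUC is computed with a base-R rank formula so the block runs without extra packages.

```r
# Hypothetical helper: rank-based AUC (labels are 0/1, 1 = default)
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Fit a logistic regression on `train` and return its AUC on `test`
fit_auc <- function(vars, response, train, test) {
  model <- glm(reformulate(vars, response = response),
               family = "binomial", data = train)
  auc_rank(test[[response]],
           predict(model, newdata = test, type = "response"))
}

# Backward pruning: drop one variable at a time while the best reduced
# model's AUC is no more than `tol` below the current model's AUC
prune_by_auc <- function(vars, response, train, test, tol = 0) {
  best_auc <- fit_auc(vars, response, train, test)
  while (length(vars) > 1) {
    # Step 2: AUC of every model with exactly one variable removed
    cand <- sapply(vars, function(v) {
      fit_auc(setdiff(vars, v), response, train, test)
    })
    # Step 4: stop once even the best removal lowers the AUC by more than tol
    if (max(cand) < best_auc - tol) break
    # Step 3: keep the reduced model with the highest AUC
    vars <- setdiff(vars, names(which.max(cand)))
    best_auc <- max(cand)
  }
  list(vars = vars, auc = best_auc)
}

# Simulated data for illustration only: y depends on x1 and x2, not on noise
set.seed(1)
n <- 2000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
d$y <- rbinom(n, 1, plogis(-1 + 1.5 * d$x1 + d$x2))
train <- d[1:1400, ]
test  <- d[1401:2000, ]

result <- prune_by_auc(c("x1", "x2", "noise"), "y", train, test, tol = 0.005)
result$vars
result$auc
```

With the course data, the call would presumably look like `prune_by_auc(vars, "loan_status", training_set, test_set)`, where `vars` holds the seven variable names. The choice of `tol` is a judgment call: `tol = 0` stops as soon as every removal lowers the AUC at all, while a small positive value tolerates negligible decreases in exchange for a simpler model.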

Let's practice!
