Logistic regression: predicting the probability of default

Credit Risk Modeling in R

Lore Dirick

Manager of Data Science Curriculum at Flatiron School

An example with "age" and "home ownership"

log_model_small <- glm(loan_status ~ age + home_ownership, family = "binomial", data = training_set)
log_model_small
Call:  glm(formula = loan_status ~ age + home_ownership, 
           family = "binomial", data = training_set)
Coefficients:
        (Intercept)                  age  home_ownershipOTHER    home_ownershipOWN   home_ownershipRENT
          -1.886396            -0.009308             0.129776            -0.019384             0.158581
Degrees of Freedom: 19393 Total (i.e. Null);  19389 Residual
Null Deviance:    13680 
Residual Deviance: 13660 AIC: 13670

$$P({\text{loan status}}=1|\text{age}, \text{home ownership}) = \frac{1}{1+e^{-(\hat{\beta_0} + \hat{\beta_1} \text{age} + \hat{\beta_2} \text{OTHER} + \hat{\beta_3} \text{OWN} +\hat{\beta_4} \text{RENT} )}}$$


Test set example

$P({\text{loan status}}=1|\text{age} = 33, \text{home ownership} = \text{RENT}) $

$= \dfrac{1}{1+e^{-(\hat{\beta_0} + \hat{\beta_1} 33 + \hat{\beta_2} 0 + \hat{\beta_3} 0 +\hat{\beta_4} 1 )}}$

$= \dfrac{1}{1+e^{-(-1.886396 + (-0.009308) \times 33 + 0.158581 \times 1)}}$

$= 0.115579$
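The hand calculation above can be reproduced in a few lines of R. This is a minimal sketch that assumes the rounded coefficients printed in the model output; the names `beta`, `x`, `log_odds`, and `p_default` are illustrative and do not come from the course code.

```r
# Coefficients copied from the printed model output (rounded to six decimals)
beta <- c(intercept = -1.886396, age = -0.009308,
          OTHER = 0.129776, OWN = -0.019384, RENT = 0.158581)

# Test case: age 33, home ownership RENT (dummies: OTHER = 0, OWN = 0, RENT = 1)
x <- c(intercept = 1, age = 33, OTHER = 0, OWN = 0, RENT = 1)

# Linear predictor (log-odds): beta_0 + beta_1 * 33 + beta_4 * 1
log_odds <- sum(beta * x)

# Logistic transform; plogis(z) computes 1 / (1 + exp(-z))
p_default <- plogis(log_odds)

log_odds
p_default
```

Because the coefficients are rounded, the result (about 0.1156) can differ from the output of `predict()` in the last digits.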

test_case <- as.data.frame(test_set[1,])
test_case
  loan_status loan_amnt grade home_ownership annual_inc age emp_cat ir_cat
1           0      5000     B           RENT      24000  33    0-15   8-11
predict(log_model_small, newdata = test_case)
       1 
-2.03499

$$\hat{\beta_0} + \hat{\beta_1} \text{age} + \hat{\beta_2} \text{OTHER} + \hat{\beta_3} \text{OWN} +\hat{\beta_4} \text{RENT}$$

predict(log_model_small, newdata = test_case, type = "response")
        1 
0.1155779
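The two `predict()` calls differ only in scale: the default returns the linear predictor (log-odds), and `type = "response"` returns the probability. The sketch below checks this on simulated data, since `training_set` is not available here. The data frame `toy`, its simulated values, and the `MORTGAGE` reference level are assumptions made for illustration and are not the course data.

```r
set.seed(1)

# Hypothetical stand-in for training_set: simulated values, not the course data
n <- 500
toy <- data.frame(
  age = sample(20:60, n, replace = TRUE),
  home_ownership = factor(sample(c("MORTGAGE", "OTHER", "OWN", "RENT"),
                                 n, replace = TRUE))
)
toy$loan_status <- rbinom(
  n, 1,
  plogis(-1.9 - 0.01 * toy$age + 0.15 * (toy$home_ownership == "RENT"))
)

toy_model <- glm(loan_status ~ age + home_ownership,
                 family = "binomial", data = toy)

# One new observation with the same factor levels as the training data
new_case <- data.frame(
  age = 33,
  home_ownership = factor("RENT", levels = levels(toy$home_ownership))
)

link_pred <- predict(toy_model, newdata = new_case)                     # log-odds
resp_pred <- predict(toy_model, newdata = new_case, type = "response")  # probability

# Applying the logistic function to the log-odds recovers the probability
same_result <- isTRUE(all.equal(unname(plogis(link_pred)), unname(resp_pred)))
same_result
```

For a binomial model with the default logit link, `plogis()` applied to the default prediction should match the `type = "response"` prediction.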

Let's practice!
