Credit Risk Modeling in R
Lore Dirick
Manager of Data Science Curriculum at Flatiron School
str(training_set)
'data.frame': 19394 obs. of 8 variables:
$ loan_status : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ loan_amnt : int 25000 16000 8500 9800 3600 6600 3000 7500 6000 22750 ...
$ grade : Factor w/ 7 levels "A","B","C","D",..: 2 4 1 2 1 1 1 2 1 1 ...
$ home_ownership: Factor w/ 4 levels "MORTGAGE","OTHER",..: 4 4 1 1 1 3 4 3 4 1 ...
$ annual_inc : num 91000 45000 110000 102000 40000 ...
$ age : int 34 25 29 24 59 35 24 24 26 25 ...
$ emp_cat : Factor w/ 5 levels "0-15","15-30",..: 1 1 1 1 1 2 1 1 1 1 ...
$ ir_cat : Factor w/ 5 levels "0-8","11-13.5",..: 2 3 1 4 1 1 1 4 1 1 ...
$$P({\text{loan status}}=1|x_1,...,x_m) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_m x_m)}}$$
$x_1,...,x_m$: Predictors (loan_amnt, grade, age, annual_inc, home_ownership, emp_cat, ir_cat)
$\beta_0,...,\beta_m$: Parameters to be estimated
$\beta_0 + \beta_1 x_1 + ... + \beta_m x_m$: Linear predictor
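As a minimal sketch of how the linear predictor is turned into a probability, the snippet below applies the logistic transformation by hand and compares it with base R's `plogis()`. The helper name `logistic` and the example values of the linear predictor are illustrative assumptions, not part of the course code.

```r
# Logistic transformation of a linear predictor (hypothetical helper name).
logistic <- function(eta) {
  1 / (1 + exp(-eta))
}

# Illustrative linear predictor values, not estimates from the data.
eta <- c(-2, 0, 2)

p_manual  <- logistic(eta)
p_builtin <- plogis(eta)  # base R's logistic distribution function

print(p_manual)
print(all.equal(p_manual, p_builtin))
```

Whatever the value of the linear predictor, the result stays strictly between 0 and 1, which is what makes it usable as a probability of default.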
log_model <- glm(loan_status ~ age,
                 family = "binomial", data = training_set)
log_model
Call: glm(formula = loan_status ~ age,
family = "binomial", data = training_set)
Coefficients:
(Intercept) age
-1.793566 -0.009726
Degrees of Freedom: 19393 Total (i.e. Null); 19392 Residual
Null Deviance:     13680
Residual Deviance: 13670    AIC: 13670
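The quantities in this printout can also be pulled from the fitted object programmatically. Because `training_set` is not reproduced here, the sketch below fits the same kind of model to simulated stand-in data. The data frame `sim`, the object `sim_model`, and the coefficients used to generate the data are all assumptions made for illustration, so the numbers will not match the output above.

```r
set.seed(1)

# Simulated stand-in for training_set (assumption: not the course data).
n <- 1000
sim <- data.frame(age = sample(20:70, n, replace = TRUE))

# Generate defaults from an assumed logistic model, for illustration only.
sim$loan_status <- factor(rbinom(n, 1, plogis(-1.8 - 0.01 * sim$age)))

# Same call shape as on the slide, applied to the simulated data.
sim_model <- glm(loan_status ~ age, family = "binomial", data = sim)

print(coef(sim_model))      # intercept and age coefficient
print(deviance(sim_model))  # residual deviance
print(AIC(sim_model))       # AIC
```

For a 0/1 response, the AIC equals the residual deviance plus twice the number of estimated coefficients, which is why the two rounded values in the printout look the same.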
$$P(\text{loan status}=1|\text{age}) = \frac{1}{1+e^{-(\hat{\beta}_0 + \hat{\beta}_1 \text{age})}}$$
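To make the fitted equation concrete, the sketch below plugs the printed estimates into it for two ages that appear in the `str()` output. The variable names `beta_0`, `beta_1`, `ages`, and `p_default` are illustrative choices, and the coefficients are the rounded values from the printout.

```r
# Estimates copied from the printed model output.
beta_0 <- -1.793566
beta_1 <- -0.009726

# Two ages that appear in the str() output.
ages <- c(34, 59)

# Linear predictor, then logistic transformation.
eta <- beta_0 + beta_1 * ages
p_default <- plogis(eta)

print(data.frame(age = ages, p_default = round(p_default, 4)))
```

With the model object itself available, `predict(log_model, newdata = data.frame(age = ages), type = "response")` should give the same probabilities, up to the rounding of the printed coefficients.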
$$P(\text{loan status}=1|x_1,...,x_m) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_m x_m)}} = \frac{e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}}{1 + e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}}$$
$$P(\text{loan status}=0|x_1,...,x_m) = 1 - \frac{e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}}{1 + e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}} = \frac{1}{1+e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}}$$
$$\frac{P(\text{loan status}=1|x_1,...,x_m)}{P(\text{loan status}=0|x_1,...,x_m)} = e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}$$
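The two identities in this derivation can be checked numerically. This is a small sketch on hypothetical values of the linear predictor; `eta`, `p1`, `p0`, and `odds` are names chosen here for illustration.

```r
# Hypothetical linear predictor values, not estimates from the data.
eta <- c(-3, -1, 0.5)

# The two probabilities, written as in the derivation.
p1 <- exp(eta) / (1 + exp(eta))  # P(loan status = 1)
p0 <- 1 / (1 + exp(eta))         # P(loan status = 0)

# Their ratio is the odds in favor of loan status = 1.
odds <- p1 / p0

print(all.equal(p1 + p0, rep(1, length(eta))))  # probabilities sum to 1
print(all.equal(odds, exp(eta)))                # odds equal exp(linear predictor)
```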
Odds in favor of loan_status = 1
Applied to our model:
If age goes up by 1, the odds are multiplied by $e^{\hat{\beta}_1} = e^{-0.009726} \approx 0.990$
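The effect of a one-year increase in age on the odds can be checked numerically from the printed estimates. In this sketch, the helper `odds` and the ages 34 and 35 are illustrative choices; any pair of ages one year apart gives the same ratio.

```r
# Estimates copied from the printed model output.
beta_0 <- -1.793566
beta_1 <- -0.009726

# Odds in favor of loan_status = 1 at a given age (hypothetical helper).
odds <- function(age) {
  exp(beta_0 + beta_1 * age)
}

# Ratio of the odds when age increases by 1.
ratio <- odds(35) / odds(34)

print(ratio)
print(exp(beta_1))  # the same multiplier, read directly from the coefficient
```

Because the coefficient is negative, the multiplier is slightly below 1: each additional year of age lowers the odds of default by roughly 1% under this model.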