Logistic regression: introduction

Credit Risk Modeling in R

Lore Dirick

Manager of Data Science Curriculum at Flatiron School

Final data structure

str(training_set)
'data.frame':	19394 obs. of  8 variables:
 $ loan_status   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ loan_amnt     : int  25000 16000 8500 9800 3600 6600 3000 7500 6000 22750 ...
 $ grade         : Factor w/ 7 levels "A","B","C","D",..: 2 4 1 2 1 1 1 2 1 1 ...
 $ home_ownership: Factor w/ 4 levels "MORTGAGE","OTHER",..: 4 4 1 1 1 3 4 3 4 1 ...
 $ annual_inc    : num  91000 45000 110000 102000 40000 ...
 $ age           : int  34 25 29 24 59 35 24 24 26 25 ...
 $ emp_cat       : Factor w/ 5 levels "0-15","15-30",..: 1 1 1 1 1 2 1 1 1 1 ...
 $ ir_cat        : Factor w/ 5 levels "0-8","11-13.5",..: 2 3 1 4 1 1 1 4 1 1 ...

What is logistic regression?

  • A regression model with output between 0 and 1

$$P({\text{loan status}}=1|x_1,...,x_m) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_m x_m)}}$$

  • $x_1,...,x_m$: loan_amnt, grade, age, annual_inc, home_ownership, emp_cat, ir_cat
  • $\beta_0,...,\beta_m$: Parameters to be estimated

  • $\beta_0 + \beta_1 x_1 + ... + \beta_m x_m$: Linear predictor
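As a minimal sketch of the formula above, the logistic function can be written directly in R. The helper name `logistic` and the coefficient values are illustrative assumptions, not part of the course material; base R's `plogis()` computes the same quantity.

```r
# Logistic (inverse-logit) function: maps any linear predictor to (0, 1).
logistic <- function(eta) 1 / (1 + exp(-eta))

# Hypothetical coefficients and one predictor value, for illustration only.
beta0 <- -2
beta1 <- 0.5
x1 <- 3

eta <- beta0 + beta1 * x1   # linear predictor
p <- logistic(eta)          # probability that loan_status = 1

# plogis() from base R (stats) is the built-in equivalent.
all.equal(p, plogis(eta))
```

Whatever value the linear predictor takes, the result stays strictly between 0 and 1, which is what makes it usable as a probability.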

Fitting a logistic model in R

log_model <- glm(loan_status ~ age , 
                 family= "binomial", data = training_set)
log_model
Call:  glm(formula = loan_status ~ age, 
           family = "binomial", data = training_set)
Coefficients:
(Intercept)          age  
  -1.793566    -0.009726  
Degrees of Freedom: 19393 Total (i.e. Null);  19392 Residual
Null Deviance:	    13680 
Residual Deviance: 13670 	AIC: 13670

$$P({\text{loan status}}=1|\text{age}) = \frac{1}{1+e^{-(\hat{\beta}_0 + \hat{\beta}_1 \text{age})}}$$
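To make the fitted equation concrete, the following sketch plugs the printed coefficient estimates into the formula by hand. The applicant age of 30 is a hypothetical value chosen only for illustration.

```r
# Coefficient estimates copied from the printed model output above.
b0 <- -1.793566
b1 <- -0.009726

# Hypothetical applicant age, chosen only for illustration.
age <- 30

eta <- b0 + b1 * age               # estimated linear predictor
p_default <- 1 / (1 + exp(-eta))   # estimated P(loan_status = 1 | age)
p_default                          # roughly 0.11

# With the fitted object available, the equivalent call would be:
# predict(log_model, newdata = data.frame(age = 30), type = "response")
```

The commented `predict()` call assumes the `log_model` object from the slide is in the workspace; `type = "response"` returns probabilities rather than values of the linear predictor.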

Probability of default

$$P({\text{loan status}}=1|x_1,...,x_m) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_m x_m)}} = \frac{e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}}{1 + e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}}$$

$$P({\text{loan status}}=0|x_1,...,x_m) = 1- \frac{e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}}{1 + e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}} = \frac{1}{1+e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}}$$

$$\frac{P({\text{loan status}}=1|x_1,...,x_m)}{P({\text{loan status}}=0|x_1,...,x_m)} = e^{\beta_0 + \beta_1 x_1 + ... + \beta_m x_m}$$

  • Odds in favor of loan_status = 1
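The identities on this slide can be checked numerically. This is a small sketch in which the value of the linear predictor is a hypothetical number, not one taken from the data.

```r
eta <- -2.085   # hypothetical value of the linear predictor

p1 <- exp(eta) / (1 + exp(eta))   # P(loan_status = 1)
p0 <- 1 / (1 + exp(eta))          # P(loan_status = 0)

odds <- p1 / p0

all.equal(p1 + p0, 1)        # the two probabilities sum to 1
all.equal(odds, exp(eta))    # the odds equal exp(linear predictor)
```

Both comparisons hold for any value of `eta`, since the shared denominator cancels in the ratio.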

Interpretation of coefficients

  • If variable $x_j$ goes up by 1 unit
    • The odds are multiplied by $e^{\beta_j}$
  • $\beta_j < 0$
    • $e^{\beta_j} < 1$
    • The odds decrease as $x_j$ increases
  • $\beta_j > 0$
    • $e^{\beta_j} > 1$
    • The odds increase as $x_j$ increases
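
The multiplication rule follows from the odds formula. Comparing the odds after and before a one-unit increase in $x_j$, with all other variables held fixed (an assumption implicit in this interpretation), every term except $\beta_j$ cancels:

$$\frac{e^{\beta_0 + ... + \beta_j (x_j + 1) + ... + \beta_m x_m}}{e^{\beta_0 + ... + \beta_j x_j + ... + \beta_m x_m}} = e^{\beta_j}$$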

Applied to our model:

  • If variable age goes up by 1
    • The odds are multiplied by $e^{-0.009726}$
    • The odds are multiplied by approximately 0.990
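
The multiplier can be computed directly in R. This sketch uses the age coefficient from the printed model output; the 10-year comparison is an added illustration, not part of the original slide.

```r
b_age <- -0.009726   # age coefficient from the printed model output

exp(b_age)        # odds multiplier for a 1-year increase in age, about 0.99
exp(10 * b_age)   # odds multiplier for a 10-year increase in age, about 0.907
```

Because the effect is multiplicative on the odds, a 10-year increase corresponds to applying the one-year multiplier ten times, which is the same as `exp(10 * b_age)`.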

Let's practice!