Logistic regression for probability of default

Credit Risk Modeling in Python

Michael Crabtree

Data Scientist, Ford Motor Company

Probability of default

The probability of default is the likelihood that someone will default on a loan
It is a value between 0 and 1, such as 0.86
loan_status is 1 for a default and 0 for a non-default

Probabilities of default as an outcome from machine learning
- Learn from data in columns (features)
Classification models (default, non-default)
Two most common models:
- Logistic regression
- Decision tree

Example of logistic regression and decision tree

Formula for linear regression and logistic regression
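
For reference, the standard textbook forms of the two models can be written as below. The notation is an assumption, since the slide showed the formulas as an image: \beta_0 is the intercept, \beta_1 ... \beta_n are the coefficients, and x_1 ... x_n are the feature columns.

```latex
% Linear regression: the output is an unbounded real number
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n

% Logistic regression: the same linear combination passed through the
% sigmoid function, which yields a value between 0 and 1
P(\text{loan\_status} = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)}}
```

Because the logistic output is bounded between 0 and 1, it can be read directly as a probability of default.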

Example graph of linear regression and logistic regression

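import numpy as np
from sklearn.linear_model import LogisticRegression

# Create the model with the L-BFGS solver
clf_logistic = LogisticRegression(solver='lbfgs')

# Fit on the training features; np.ravel flattens the label column to a 1-D array
clf_logistic.fit(training_columns, np.ravel(training_labels))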
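
As a minimal sketch of how a fitted model yields probabilities of default, the example below trains the same kind of classifier on a small synthetic dataset and reads the probabilities with predict_proba. The column names (interest_rate, employment_years), the synthetic data, and the random seed are illustrative assumptions, not the course's cr_loan data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for loan data (hypothetical columns, not cr_loan)
rng = np.random.default_rng(0)
n = 200
training_columns = pd.DataFrame({
    "interest_rate": rng.uniform(5, 20, n),
    "employment_years": rng.integers(0, 15, n),
})

# In this toy data, higher interest rates make a default (1) more likely
noisy_rate = training_columns["interest_rate"] + rng.normal(0, 3, n)
training_labels = pd.DataFrame({"loan_status": (noisy_rate > 13).astype(int)})

clf_logistic = LogisticRegression(solver="lbfgs")
clf_logistic.fit(training_columns, np.ravel(training_labels))

# predict_proba returns one row per loan: [P(non-default), P(default)]
probs = clf_logistic.predict_proba(training_columns)
prob_default = probs[:, 1]
print(prob_default[:5])
```

The second column of predict_proba corresponds to the class loan_status = 1, so it is the one to read as the probability of default.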

Data Subset	Usage	Portion
Train	Learn from the data to generate predictions	60%
Test	Test learning on new unseen data	40%

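from sklearn.model_selection import train_test_split

# Features are every column except the target, loan_status
X = cr_loan.drop('loan_status', axis=1)
y = cr_loan[['loan_status']]

# Hold out 40% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=123)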
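
To see the 60/40 split in action, the sketch below applies the same calls to a ten-row toy frame. The loan_amount column and all of the values are hypothetical stand-ins for the real cr_loan data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for cr_loan with 10 rows (hypothetical values)
cr_loan = pd.DataFrame({
    "loan_amount": np.arange(10) * 1000 + 1000,
    "loan_status": [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
})

# Separate the features from the target column
X = cr_loan.drop("loan_status", axis=1)
y = cr_loan[["loan_status"]]

# 40% of the rows go to the test set, the remaining 60% to training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123
)

print(len(X_train), len(X_test))
```

With ten rows, six end up in the training set and four in the test set, and the rows of X_train and y_train stay aligned by index.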
