Credit Risk Modeling in Python
Michael Crabtree
Data Scientist, Ford Motor Company
loan_status
probability of default

Loan | True loan status | Predicted loan status | Loan payoff value | Selling value | Gain/Loss |
---|---|---|---|---|---|
1 | 0 | 1 | $1,500 | $250 | -$1,250 |
2 | 0 | 1 | $1,200 | $250 | -$950 |
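The Gain/Loss column appears to be the selling value minus the loan payoff value. That rule is inferred from the two rows shown, not stated in the slides. Both loans have a true status of 0 (non-default) but a predicted status of 1 (default). A minimal sketch that reproduces the column under that assumption (the names `loans` and `gain_loss` are illustrative only):

```python
# Rows copied from the table above:
# (loan, true status, predicted status, payoff value, selling value)
loans = [
    (1, 0, 1, 1500, 250),
    (2, 0, 1, 1200, 250),
]


def gain_loss(payoff_value, selling_value):
    """Assumed rule: what the sale brings in minus what the loan would have paid off."""
    return selling_value - payoff_value


# Gain or loss per loan, keyed by loan number
results = {loan: gain_loss(payoff, sale) for loan, _, _, payoff, sale in loans}
total = sum(results.values())

for loan, value in results.items():
    print(f"Loan {loan}: {value:+,}")
print(f"Total: {total:+,}")
```

Summing the column gives the combined effect of the two misclassified loans.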
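Gradient boosted trees are part of the xgboost Python package, called xgb here.
They train with .fit(), just like the logistic regression model.

# X_train and y_train are assumed to come from an earlier train/test split
import numpy as np
import xgboost as xgb
from sklearn.linear_model import LogisticRegression

# Create a logistic regression model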
clf_logistic = LogisticRegression()
# Train the logistic regression
clf_logistic.fit(X_train, np.ravel(y_train))
# Create a gradient boosted tree model
clf_gbt = xgb.XGBClassifier()
# Train the gradient boosted tree
clf_gbt.fit(X_train, np.ravel(y_train))
Predict with both .predict() and .predict_proba()
.predict_proba() produces probabilities between 0 and 1, one column per class
.predict() produces a 1 or 0 for loan_status
# Predict probabilities of default
gbt_preds_prob = clf_gbt.predict_proba(X_test)
# Predict loan_status as a 1 or 0
gbt_preds = clf_gbt.predict(X_test)
# gbt_preds_prob
array([[0.059, 0.940], [0.121, 0.989]])
# gbt_preds
array([1, 1, 0, ...])
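For a binary target such as loan_status, the labels from .predict() correspond to cutting the default-probability column of .predict_proba() at 0.5. The sketch below shows that relationship in plain Python using the two probability rows printed above. It assumes the second column is the probability of default. The helper `to_loan_status` and the 0.95 threshold are hypothetical, added only for illustration.

```python
# Probabilities copied from the gbt_preds_prob output above.
# Assumed column order: [P(non-default), P(default)], following classes 0 and 1.
gbt_preds_prob = [[0.059, 0.940], [0.121, 0.989]]


def to_loan_status(prob_rows, threshold=0.5):
    """Label a loan 1 (default) when its default probability exceeds the threshold."""
    return [1 if row[1] > threshold else 0 for row in prob_rows]


# Default cut-off of 0.5, the assumed behavior behind .predict()
labels_default = to_loan_status(gbt_preds_prob)

# A stricter, purely illustrative cut-off
labels_strict = to_loan_status(gbt_preds_prob, threshold=0.95)

print(labels_default)
print(labels_strict)
```

Keeping the probabilities, rather than only the labels, leaves room to choose a different cut-off later.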
learning_rate: smaller values make each step more conservative
max_depth: sets how deep each tree can go; larger values mean more complex trees

xgb.XGBClassifier(learning_rate=0.2,
                  max_depth=4)
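To see what "more conservative" means for learning_rate, here is a toy sketch of shrinkage. It is not XGBoost's algorithm: each round fits a single constant (the mean residual) instead of a tree, and the labels are made up. The function `boost_constant` is hypothetical. max_depth has no counterpart in this toy because there are no trees to deepen.

```python
def boost_constant(targets, learning_rate, n_rounds):
    """Toy boosting with squared error: every round fits one constant
    (the mean residual) and adds learning_rate times that constant."""
    prediction = 0.0
    history = []
    for _ in range(n_rounds):
        residuals = [t - prediction for t in targets]
        step = sum(residuals) / len(residuals)
        prediction += learning_rate * step  # shrink the step before applying it
        history.append(prediction)
    return history


targets = [0.0, 1.0, 1.0, 0.0, 1.0]  # made-up labels with mean 0.6

slow = boost_constant(targets, learning_rate=0.2, n_rounds=5)
fast = boost_constant(targets, learning_rate=1.0, n_rounds=5)

print([round(p, 3) for p in slow])
print([round(p, 3) for p in fast])
```

With a learning rate of 1.0 the prediction jumps to the mean in one round. With 0.2 it moves only a fifth of the remaining distance each round, so more rounds are needed to reach the same fit.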