Credit model performance

Credit Risk Modeling in Python

Michael Crabtree

Data Scientist, Ford Motor Company

Model accuracy scoring

Calculate accuracy

Formula for accuracy: accuracy = number of correct predictions / total number of predictions

Use the .score() method from scikit-learn

# Check the accuracy against the test data
clf_logistic1.score(X_test,y_test)

0.81

81% of values for loan_status predicted correctly
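As a rough illustration of what the accuracy score measures, the sketch below computes it by hand. The labels are invented for this example (1 = default, 0 = non-default) and do not come from the course data.

```python
# Toy labels, invented for illustration (1 = default, 0 = non-default)
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = number of correct predictions / total number of predictions
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

print(accuracy)  # 0.8, since 8 of the 10 predictions match
```

For a fitted scikit-learn classifier, .score() reports this same ratio on the data passed to it.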

ROC curve charts

Receiver Operating Characteristic curve
- Plots true positive rate (sensitivity) against false positive rate (fall-out)

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
fallout, sensitivity, thresholds = roc_curve(y_test, prob_default)
plt.plot(fallout, sensitivity, color = 'darkorange')

Example ROC chart
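To see what roc_curve() returns, here is a minimal sketch on four hand-made observations. The labels and scores are assumptions chosen for illustration, not values from the loan data.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy data, invented for illustration: true labels and predicted probabilities
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Each threshold yields one (false positive rate, true positive rate) point
fallout, sensitivity, thresholds = roc_curve(y_true, scores)

for fpr, tpr in zip(fallout, sensitivity):
    print(f"fall-out={fpr:.2f}  sensitivity={tpr:.2f}")
```

The curve starts at (0, 0), where nothing is flagged as a default, and ends at (1, 1), where everything is.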

Analyzing ROC charts

Area Under Curve (AUC): area under the ROC curve; 0.5 matches random prediction, 1.0 is a perfect model

ROC chart example with annotation for lift and AUC
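One way to put a number on the chart is roc_auc_score() from scikit-learn. The sketch below uses the same kind of invented toy data and also integrates the ROC points with sklearn.metrics.auc() to show that both routes agree.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy data, invented for illustration
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# AUC computed directly from labels and scores
auc_direct = roc_auc_score(y_true, scores)

# AUC computed by integrating the ROC curve points (trapezoidal rule)
fallout, sensitivity, _ = roc_curve(y_true, scores)
auc_from_curve = auc(fallout, sensitivity)

print(auc_direct, auc_from_curve)  # both 0.75 for this toy data
```

A value of 0.75 here means that three of the four (default, non-default) pairs are ranked in the right order by the scores.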

Default thresholds

Threshold: the probability cutoff above which a loan is classified as a default

Diagram of probability threshold

Setting the threshold

Relabel loans based on our threshold of 0.5

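import pandas as pd
# Column 1 of predict_proba holds the probability of default (class 1)
preds = clf_logistic.predict_proba(X_test)
preds_df = pd.DataFrame(preds[:,1], columns = ['prob_default'])
# Label a loan as a default (1) when its probability exceeds 0.5
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.5 else 0)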

Data sample with probabilities and loan status
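The effect of moving the threshold can be sketched on a handful of made-up probabilities. The helper label_defaults() and the values below are hypothetical and only illustrate the relabeling step.

```python
import pandas as pd

# Hypothetical predicted probabilities of default, invented for illustration
preds_df = pd.DataFrame({'prob_default': [0.05, 0.30, 0.45, 0.55, 0.90]})

def label_defaults(prob_default, threshold):
    """Return 1 where the probability exceeds the threshold, else 0."""
    return (prob_default > threshold).astype(int)

at_050 = label_defaults(preds_df['prob_default'], 0.5)
at_040 = label_defaults(preds_df['prob_default'], 0.4)

print(at_050.tolist())  # [0, 0, 0, 1, 1]
print(at_040.tolist())  # [0, 0, 1, 1, 1]
```

Lowering the threshold from 0.5 to 0.4 flags one more loan as a default in this toy sample.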

Credit classification reports

classification_report() within scikit-learn

from sklearn.metrics import classification_report
# target_names: list of display labels for classes 0 and 1, defined elsewhere
print(classification_report(y_test, preds_df['loan_status'], target_names=target_names))

Example classification report
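When individual numbers from the report are needed in code, one option is the output_dict=True argument of classification_report(). The labels below and the names 'Non-Default' and 'Default' are assumptions made for this sketch.

```python
from sklearn.metrics import classification_report

# Toy labels, invented for illustration (1 = default, 0 = non-default)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0]

# Assumed display names for classes 0 and 1
target_names = ['Non-Default', 'Default']

# output_dict=True returns nested dictionaries instead of a formatted string
report = classification_report(
    y_true, y_pred, target_names=target_names, output_dict=True
)

default_recall = report['Default']['recall']
print(default_recall)  # 0.5, since 2 of the 4 true defaults were caught
```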

Selecting classification metrics

Select and store specific components from the classification_report()
Use the precision_recall_fscore_support() function from scikit-learn

Example classification report with default recall

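from sklearn.metrics import precision_recall_fscore_support
# Returns (precision, recall, fscore, support); [1] is recall, [1][1] is recall for class 1 (default)
precision_recall_fscore_support(y_test, preds_df['loan_status'])[1][1]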
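The chained indexing can be easier to read when the four returned arrays are unpacked by name. This is a small sketch on invented labels, not the course data.

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy labels, invented for illustration (1 = default, 0 = non-default)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1, 0, 1, 0, 0, 0]

# Each returned array has one entry per class, ordered by class label
precision, recall, fscore, support = precision_recall_fscore_support(y_true, y_pred)

default_recall = recall[1]    # same value as the [1][1] indexing
default_support = support[1]  # number of true defaults in y_true

print(default_recall, default_support)
```

Here default_recall is 0.5 because 2 of the 4 true defaults are predicted as defaults.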

Let's practice!
