Column selection for credit risk

Credit Risk Modeling in Python

Michael Crabtree

Data Scientist, Ford Motor Company

Choosing specific columns

  • We've been using all columns for predictions
# Selects a few specific columns
X_multi = cr_loan_prep[['loan_int_rate', 'person_emp_length']]
# Selects all data except loan_status
X = cr_loan_prep.drop('loan_status', axis=1)
  • How can you tell how important each column is?
    • Logistic Regression: column coefficients
    • Gradient Boosted Trees: ?
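For the logistic regression case, the coefficients can be paired with their column names for easier reading. The following is a minimal sketch on synthetic data: `X_demo`, `y_demo`, and the made-up relationship between interest rate and default are assumptions for illustration, not the course's `cr_loan_prep` data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for cr_loan_prep (values are made up for illustration)
rng = np.random.default_rng(0)
n = 500
X_demo = pd.DataFrame({
    "loan_int_rate": rng.uniform(5, 20, n),
    "person_emp_length": rng.integers(0, 30, n).astype(float),
})

# Hypothetical target: higher interest rates make default (1) more likely
logit = 0.4 * (X_demo["loan_int_rate"].to_numpy() - 12)
y_demo = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

clf_logistic = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# coef_ has shape (1, n_columns) for a binary target; label it by column name
coefficients = pd.Series(clf_logistic.coef_[0], index=X_demo.columns)
print(coefficients)
```

Coefficient sizes depend on each column's scale, so comparing them directly is only meaningful when the columns are on comparable scales.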

Column importances

  • Use the .get_booster() and .get_score() methods
    • Weight: the number of times the column is used to split the data across all trees
# Train the model
clf_gbt.fit(X_train, np.ravel(y_train))
# Print the feature importances
clf_gbt.get_booster().get_score(importance_type='weight')
{'person_home_ownership_RENT': 1, 'person_home_ownership_OWN': 2}
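To make the 'weight' definition concrete, the sketch below counts how often each column is used as a split across a small set of trees. The nested-dictionary tree layout and the `count_splits` helper are hypothetical and exist only for this illustration; they are not XGBoost's internal representation.

```python
from collections import Counter

# Hypothetical toy trees: internal nodes name the column they split on,
# leaves are plain floats. This is NOT XGBoost's internal format.
trees = [
    {"split": "person_home_ownership_OWN",
     "left": {"split": "person_home_ownership_RENT", "left": 0.1, "right": -0.2},
     "right": 0.3},
    {"split": "person_home_ownership_OWN", "left": -0.1, "right": 0.2},
]

def count_splits(node, counts):
    """Recursively count how many times each column is used to split."""
    if isinstance(node, dict):
        counts[node["split"]] += 1
        count_splits(node["left"], counts)
        count_splits(node["right"], counts)
    return counts

weights = Counter()
for tree in trees:
    count_splits(tree, weights)

print(dict(weights))
```

With these two toy trees, `person_home_ownership_OWN` is used in two splits and `person_home_ownership_RENT` in one, the same counts as in the slide's output.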

Column importance interpretation

# Column importances from importance_type = 'weight'
{'person_home_ownership_RENT': 1, 'person_home_ownership_OWN': 2}

Decision tree using XGBoost


Plotting column importances

  • Use the plot_importance() function
import xgboost as xgb
xgb.plot_importance(clf_gbt, importance_type='weight')
{'person_income': 315, 'loan_int_rate': 195, 'loan_percent_income': 146}

Plot of feature importances
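When a plot is not convenient, the same dictionary can be ranked directly. This is a minimal sketch using only the three values printed on the slide; the `share` figures are relative to those three columns alone, since the full model likely contains more columns than are shown here.

```python
# Importances as printed on the slide (importance_type = 'weight')
importances = {'person_income': 315, 'loan_int_rate': 195, 'loan_percent_income': 146}

# Sort columns from most to least frequently used in splits
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)

# Share of splits among the listed columns only
total = sum(importances.values())
for column, weight in ranked:
    share = weight / total
    print(f"{column:<22} {weight:>4}  {share:.1%}")
```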


Choosing training columns

  • Column importance is sometimes used to decide which columns to use for training
  • Different sets of columns affect model performance
Columns                                                | Importances | Model Accuracy | Model Default Recall
loan_int_rate, person_emp_length                       | (100, 100)  | 0.81           | 0.67
loan_int_rate, person_emp_length, loan_percent_income  | (98, 70, 5) | 0.84           | 0.52
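A comparison like this can be produced by looping over the candidate column sets and scoring each one. The sketch below rests on several assumptions: the data is synthetic, the relationship between columns and default is made up, and scikit-learn's `GradientBoostingClassifier` is swapped in for the XGBoost model used in the slides. Its numbers will not match the table.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for cr_loan_prep (made-up values and relationships)
rng = np.random.default_rng(42)
n = 1000
data = pd.DataFrame({
    "loan_int_rate": rng.uniform(5, 20, n),
    "person_emp_length": rng.integers(0, 30, n).astype(float),
    "loan_percent_income": rng.uniform(0.0, 0.6, n),
})
score = (0.3 * (data["loan_int_rate"].to_numpy() - 12)
         + 6 * (data["loan_percent_income"].to_numpy() - 0.3))
data["loan_status"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-score))).astype(int)

column_sets = [
    ["loan_int_rate", "person_emp_length"],
    ["loan_int_rate", "person_emp_length", "loan_percent_income"],
]

results = []
for columns in column_sets:
    # Same random_state so every column set sees the same train/test rows
    X_train, X_test, y_train, y_test = train_test_split(
        data[columns], data["loan_status"], test_size=0.4, random_state=123)
    model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
    preds = model.predict(X_test)
    results.append({
        "columns": ", ".join(columns),
        "accuracy": accuracy_score(y_test, preds),
        "default_recall": recall_score(y_test, preds),  # recall for class 1
    })

print(pd.DataFrame(results))
```

Fixing `random_state` in the split keeps the comparison fair, because each column set is evaluated on the same held-out rows.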

F1 scoring for models

  • Comparing accuracy and recall across different column groups is time-consuming
  • F1 score is a single metric that combines precision and recall (their harmonic mean)

Formula for F1 score
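The formula image did not survive extraction. For reference, the standard definition of the F1 score is given here, with TP, FP, and FN denoting true positives, false positives, and false negatives for the class of interest:

```latex
\[
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]
```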

  • Reported in the output of classification_report()

Classification report with F1 score highlighted
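As a rough check of where the F1 score comes from, the sketch below reads it from scikit-learn's `classification_report()` and recomputes it by hand for the default class. The labels in `y_true` and `y_pred` are hypothetical and chosen only to keep the arithmetic small.

```python
from sklearn.metrics import classification_report

# Hypothetical labels: 1 = default, 0 = non-default
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# output_dict=True returns the report as a nested dictionary
report = classification_report(
    y_true, y_pred, target_names=["Non-Default", "Default"], output_dict=True)

# Recompute the default-class F1 from the confusion counts
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_manual = 2 * precision * recall / (precision + recall)

print(report["Default"]["f1-score"], f1_manual)
```

Here precision is 2/3 and recall is 1/2, so both values come out to 4/7, roughly 0.57.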


Let's practice!
