Gradient boosted trees with XGBoost

Credit Risk Modeling in Python

Michael Crabtree

Data Scientist, Ford Motor Company

Decision trees

  • Create predictions similar to logistic regression: a probability that a loan defaults
  • Not structured like a regression; the data is split by a sequence of simple decision rules (see the sketch after the example below)

Example of a decision tree
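
A minimal sketch of the idea, not part of the course code: a single decision tree can be fit with scikit-learn's DecisionTreeClassifier, assuming the same X_train, y_train, and X_test used with the models later in this lesson.

from sklearn.tree import DecisionTreeClassifier
import numpy as np

# Create a small decision tree (shallow depth keeps the rules simple)
clf_tree = DecisionTreeClassifier(max_depth=2)

# Train with .fit(), just like logistic regression
clf_tree.fit(X_train, np.ravel(y_train))

# Like logistic regression, the tree can output probabilities for each class
tree_probs = clf_tree.predict_proba(X_test)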


Decision trees for loan status

  • A simple decision tree for predicting the loan_status probability of default (thresholded into a 0/1 prediction in the sketch below)

Decision tree prediction example
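
Continuing the hypothetical clf_tree sketch above, the tree's probability of default can be turned into a 0/1 loan_status prediction; the 0.5 threshold here is an assumption for illustration.

# Probability of default is the second column of predict_proba
prob_default = clf_tree.predict_proba(X_test)[:, 1]

# Threshold the probability at 0.5 to get a 0/1 loan_status prediction
tree_preds = (prob_default > 0.5).astype(int)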


Decision tree impact

Decision tree prediction example

Loan | True loan status | Pred. loan status | Loan payoff value | Selling value | Gain/Loss
   1 |                0 |                 1 |            $1,500 |          $250 |   -$1,250
   2 |                0 |                 1 |            $1,200 |          $250 |     -$950
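
The Gain/Loss column follows from the (assumed) interpretation that loans predicted to default are sold off at the selling value, forfeiting the payoff they would otherwise have produced; a quick check of loan 1 using the values in the table:

# Values for loan 1 from the table above
loan_payoff_value = 1500   # value if the loan had been kept and paid off
selling_value = 250        # price received when the loan is sold off

# Selling a loan that would have paid off forfeits the difference
gain_loss = selling_value - loan_payoff_value
print(gain_loss)  # -1250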

A forest of trees

  • XGBoost uses an ensemble of many simple trees
  • Each tree only needs to be slightly better than a coin toss; boosting combines them into a much stronger predictor (see the sketch after the example below)

Ensemble of example tree predictors
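
A rough sketch of the ensemble idea, assuming the same training data as the rest of this lesson; n_estimators and max_depth are real XGBClassifier parameters, but the values here are only illustrative.

import numpy as np
import xgboost as xgb

# The ensemble size is set with n_estimators; keeping max_depth small
# means each individual tree stays simplistic
clf_gbt_ensemble = xgb.XGBClassifier(n_estimators=100, max_depth=2)
clf_gbt_ensemble.fit(X_train, np.ravel(y_train))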


Creating and training trees

  • XGBClassifier is part of the xgboost Python package, imported here as xgb
  • Trains with .fit(), just like the logistic regression model
from sklearn.linear_model import LogisticRegression
import numpy as np
import xgboost as xgb

# Create a logistic regression model
clf_logistic = LogisticRegression()
# Train the logistic regression
clf_logistic.fit(X_train, np.ravel(y_train))

# Create a gradient boosted tree model
clf_gbt = xgb.XGBClassifier()
# Train the gradient boosted tree
clf_gbt.fit(X_train, np.ravel(y_train))

Default predictions with XGBoost

  • Predicts with both .predict() and .predict_proba()
    • .predict_proba() produces a probability between 0 and 1 for each class; the second column is the probability of default
    • .predict() produces a 1 or 0 for loan_status
# Predict probabilities of default
gbt_preds_prob = clf_gbt.predict_proba(X_test)
# Predict loan_status as a 1 or 0
gbt_preds = clf_gbt.predict(X_test)
# gbt_preds_prob
array([[0.059, 0.940], [0.121, 0.879]])
# gbt_preds
array([1, 1, 0...])
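
A small follow-up sketch under the same assumptions (the trained clf_gbt, X_test, and a y_test set of true loan_status values): the second column of .predict_proba() is the probability of default, and the 0/1 predictions can be scored against the true labels.

from sklearn.metrics import accuracy_score
import numpy as np

# Probability of default is the second column of predict_proba
prob_default = clf_gbt.predict_proba(X_test)[:, 1]

# Compare the 0/1 predictions with the true loan_status values
print(accuracy_score(np.ravel(y_test), gbt_preds))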

Hyperparameters of gradient boosted trees

  • Hyperparameters: model settings that are chosen before training and cannot be learned from the data
  • Some common hyperparameters for gradient boosted trees (a usage sketch follows the snippet below)
    • learning_rate: smaller values make each boosting step more conservative
    • max_depth: sets how deep each tree can grow; larger values mean more complex trees
xgb.XGBClassifier(learning_rate=0.2,
                  max_depth=4)
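
A minimal usage sketch, assuming the same training and test data as above; the hyperparameter values are only illustrative, not tuned results.

import numpy as np
import xgboost as xgb

# Create a gradient boosted tree model with explicit hyperparameters
clf_gbt_tuned = xgb.XGBClassifier(learning_rate=0.2, max_depth=4)

# Train it the same way as before
clf_gbt_tuned.fit(X_train, np.ravel(y_train))

# Predicted probabilities of default from the tuned model
tuned_prob_default = clf_gbt_tuned.predict_proba(X_test)[:, 1]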

Let's practice!

