Credit Risk Modeling in Python
Michael Crabtree
Data Scientist, Ford Motor Company
DMatrix, an internal structure optimized for XGBoost
# Import xgboost
import xgboost as xgb
# Set the number of folds
n_folds = 2
# Set early stopping number
early_stop = 5
# Set any specific parameters for cross validation
params = {'objective': 'binary:logistic',
          'seed': 99, 'eval_metric': 'auc'}
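The `'eval_metric': 'auc'` setting in the parameters above asks for the area under the ROC curve. As a rough illustration of what that number measures, the sketch below computes AUC as the share of (default, non-default) pairs in which the defaulting loan receives the higher predicted probability, with ties counted as half. The function name `pairwise_auc` and the sample labels and scores are hypothetical and chosen only for illustration; this is not how XGBoost computes the metric internally.

```python
from itertools import product


def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = default) and predicted default probabilities
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc_value = pairwise_auc(labels, scores)
print(auc_value)
```

Three of the four (default, non-default) pairs are ordered correctly here, so the sketch reports 0.75. A value of 0.5 corresponds to random ranking and 1.0 to a perfect one.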
'objective': 'binary:logistic' specifies classification for loan_status
'eval_metric': 'auc' tells XGBoost to evaluate performance by AUC
# Restructure the train data for xgboost
DTrain = xgb.DMatrix(X_train, label = y_train)
# Perform cross validation
xgb.cv(params, DTrain, num_boost_round = 5, nfold=n_folds,
       early_stopping_rounds=early_stop)
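The early stopping number passed to the cross validation call can be read as a patience value: training stops once the test metric has gone that many rounds without improving. The sketch below is a simplified version of that rule for a metric where higher is better, such as AUC. The function `stop_round` and the `history` values are hypothetical; XGBoost's own implementation may differ in detail.

```python
def stop_round(test_metric_by_round, patience):
    """Return (best_round, rounds_run) for a higher-is-better metric.

    Stops as soon as `patience` rounds have passed since the best score.
    Rounds are zero-indexed.
    """
    best_round, best_score = 0, float("-inf")
    for i, score in enumerate(test_metric_by_round):
        if score > best_score:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            return best_round, i + 1
    return best_round, len(test_metric_by_round)


# Hypothetical per-round test AUC values
history = [0.80, 0.84, 0.86, 0.85, 0.86, 0.85, 0.84, 0.83, 0.82]
best, rounds_run = stop_round(history, patience=5)
print(best, rounds_run)
```

In this made-up history the best score is reached at round 2 and the loop stops after 8 rounds. Under this simplified rule, a patience of 5 cannot trigger within only 5 boosting rounds, so a larger `num_boost_round` is needed for early stopping to have any effect.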
DMatrix() creates a special object for xgboost that is optimized for training
cross_val_score() from scikit-learn
# Import the module
from sklearn.model_selection import cross_val_score
# Create a gbt model
gbt = xgb.XGBClassifier(learning_rate = 0.4, max_depth = 10)
# Compute cross-validated accuracy scores 5 consecutive times (5 folds)
cross_val_score(gbt, X_train, y_train, cv = 5)
array([0.92748092, 0.92575308, 0.93975392, 0.93378608, 0.93336163])
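Each of the five scores above comes from training on four folds and testing on the remaining one. The sketch below shows one simple way such folds could be formed, using contiguous blocks without shuffling, and then averages the five reported scores. The helper `k_fold_indices` is hypothetical; scikit-learn's own splitters differ in detail, and for classifiers an integer `cv` is stratified by class.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds (train, test) index pairs.

    Test folds are contiguous blocks; no shuffling or stratification.
    """
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        # The first `extra` folds take one additional sample
        size = base + (1 if k < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


# Ten hypothetical rows split into five folds
folds = k_fold_indices(10, 5)
for train_idx, test_idx in folds:
    print(test_idx, len(train_idx))

# Average of the five accuracy scores reported above
fold_scores = [0.92748092, 0.92575308, 0.93975392, 0.93378608, 0.93336163]
mean_score = sum(fold_scores) / len(fold_scores)
print(round(mean_score, 4))
```

The mean of the five scores is about 0.932. Reporting the mean together with the spread across folds gives a more stable picture than any single fold.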