Credit Risk Modeling in Python
Michael Crabtree
Data Scientist, Ford Motor Company
Probability of default: a value between 0 and 1, such as 0.86
A loan_status of 1 is a default; 0 is a non-default

Probability of Default | Interpretation | Predicted loan status |
---|---|---|
0.4 | Unlikely to default | 0 |
0.90 | Very likely to default | 1 |
0.1 | Very unlikely to default | 0 |
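
The mapping in the table above can be sketched as a simple cutoff rule. The 0.5 threshold and the helper name `predict_loan_status` are assumptions made for illustration; the slides do not state which cutoff was used.

```python
# Hypothetical cutoff: probabilities at or above it are labeled as defaults.
THRESHOLD = 0.5

def predict_loan_status(prob_default, threshold=THRESHOLD):
    """Return 1 (default) if prob_default >= threshold, else 0 (non-default)."""
    return 1 if prob_default >= threshold else 0

# The three probabilities from the table.
probabilities = [0.4, 0.90, 0.1]
predicted = [predict_loan_status(p) for p in probabilities]
print(predicted)  # [0, 1, 0]
```

A stricter or looser cutoff would change which loans are flagged, so the threshold is a modeling choice rather than a fixed rule.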
Logistic regression outputs a probability between 0 and 1
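
Logistic regression keeps its output between 0 and 1 by passing a linear score through the logistic (sigmoid) function. A minimal sketch of that function, using only the standard library (the name `sigmoid` and the sample scores are chosen here for illustration):

```python
import math

def sigmoid(z):
    """Map any real-valued score z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Negative scores land below 0.5, positive scores above it.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```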
from sklearn.linear_model import LogisticRegression
clf_logistic = LogisticRegression(solver='lbfgs')
Use .fit() to train the model:
import numpy as np
clf_logistic.fit(training_columns, np.ravel(training_labels))
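
A runnable sketch of the training step, assuming NumPy and scikit-learn are installed. The data here is synthetic and purely hypothetical; it stands in for the course data, which is not included in these slides. After fitting, `predict_proba` returns the probability of each class per row.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training data (hypothetical values).
rng = np.random.default_rng(123)
training_columns = rng.normal(size=(200, 2))
# Label is 1 (default) when the two features sum to a positive number.
training_labels = (training_columns.sum(axis=1) > 0).astype(int).reshape(-1, 1)

clf_logistic = LogisticRegression(solver='lbfgs')
# np.ravel flattens the (200, 1) label column into the 1-D array fit() expects.
clf_logistic.fit(training_columns, np.ravel(training_labels))

# Column 0 is P(non-default), column 1 is P(default).
prob_default = clf_logistic.predict_proba(training_columns)[:, 1]
print(prob_default[:3])
```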
Training columns: every column except loan_status
Labels: loan_status (0,1)

Data Subset | Usage | Portion |
---|---|---|
Train | Learn from the data to generate predictions | 60% |
Test | Test learning on new unseen data | 40% |
X = cr_loan.drop('loan_status', axis = 1)
y = cr_loan[['loan_status']]
Use the train_test_split() function already within scikit-learn:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=123)
test_size: fraction of the data held out for the test set (.4 means 40%)
random_state: a random seed value for reproducibility
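
The split can be tried end to end on a small frame, assuming pandas and scikit-learn are installed. The ten rows and the feature columns `person_income` and `loan_amnt` are hypothetical values made up for this sketch; only `cr_loan` and `loan_status` come from the slides.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical ten-loan frame standing in for cr_loan.
cr_loan = pd.DataFrame({
    'person_income': [30000, 45000, 52000, 61000, 38000,
                      75000, 29000, 48000, 66000, 41000],
    'loan_amnt': [5000, 12000, 8000, 20000, 15000,
                  10000, 9000, 7000, 25000, 6000],
    'loan_status': [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
})

X = cr_loan.drop('loan_status', axis=1)   # training columns
y = cr_loan[['loan_status']]              # labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.4, random_state=123)
print(X_train.shape, X_test.shape)  # (6, 2) (4, 2)

# Repeating the call with the same random_state selects the same rows.
X_train_again, _, _, _ = train_test_split(
    X, y, test_size=.4, random_state=123)
print(X_train.index.equals(X_train_again.index))  # True
```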