Credit Risk Modeling in R
Lore Dirick
Manager of Data Science Curriculum at Flatiron School
printcp() and plotcp() for pruning purposes
printcp(tree_undersample)
Classification tree:
rpart(formula = loan_status ~ ., data = undersampled_training_set, method = "class",
control = rpart.control(cp = 0.001))
Variables actually used in tree construction:
age annual_inc emp_cat grade home_ownership ir_cat loan_amnt
Root node error: 2190/6570 = 0.33333
n= 6570
CP nsplit rel error xerror xstd
1 0.0059361 0 1.00000 1.00000 0.017447
2 0.0044140 4 0.97443 0.99909 0.017443
3 0.0036530 7 0.96119 0.98174 0.017366
4 0.0031963 8 0.95753 0.98904 0.017399
...
16 0.0010654 76 0.84247 1.02511 0.017554
17 0.0010000 79 0.83927 1.02511 0.017554
Prune at the CP with the lowest cross-validated error (xerror): CP = 0.003653
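The CP choice does not have to be read off the printed table by eye. The sketch below assumes the built-in `kyphosis` data (shipped with rpart) as a stand-in for `undersampled_training_set`, which is not available here, and the name `tree_demo` is hypothetical. It first calls plotcp() for a visual check. It then selects the row of the fitted tree's `cptable` with the smallest `xerror`. When several rows tie, `which.min()` returns the first one, which is the smallest tree with the largest CP.

```r
# Sketch: pick the complexity parameter (cp) with the lowest
# cross-validated error (xerror) from an rpart tree's cp table.
# Assumption: kyphosis stands in for undersampled_training_set.
library(rpart)

set.seed(345)  # xerror depends on the random cross-validation folds
tree_demo <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
                   control = rpart.control(cp = 0.001))

# Visual check: cross-validated relative error plotted against cp
plotcp(tree_demo)

# Numeric check: row of the cp table with the smallest xerror
cp_table <- tree_demo$cptable
best_row <- which.min(cp_table[, "xerror"])
best_cp <- cp_table[best_row, "CP"]
print(best_cp)
```

Because `xerror` comes from random folds, a different seed can select a different CP.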
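# Prune the tree at the selected complexity parameter
ptree_undersample <- prune(tree_undersample, cp = 0.003653)
# Draw the pruned tree with evenly spaced levels, then add split labels
plot(ptree_undersample, uniform = TRUE)
text(ptree_undersample)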
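To confirm that pruning actually shrinks the tree, the split counts before and after prune() can be compared. This sketch again assumes the built-in `kyphosis` data as a stand-in for the course data. The names `tree_demo`, `ptree_demo` and the helper `n_splits()` are hypothetical. The plot is guarded because the lowest-xerror CP can prune a small tree all the way back to its root, and plot() cannot draw a root-only tree.

```r
# Sketch: prune at the lowest-xerror cp and compare tree sizes.
# Assumption: kyphosis stands in for undersampled_training_set.
library(rpart)

set.seed(345)
tree_demo <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
                   control = rpart.control(cp = 0.001))
best_cp <- tree_demo$cptable[which.min(tree_demo$cptable[, "xerror"]), "CP"]

ptree_demo <- prune(tree_demo, cp = best_cp)

# Hypothetical helper: largest number of splits recorded in a cp table
n_splits <- function(fit) max(fit$cptable[, "nsplit"])
cat("splits before:", n_splits(tree_demo),
    " after:", n_splits(ptree_demo), "\n")

# Only plot when the pruned tree still has at least one split
if (nrow(ptree_demo$frame) > 1) {
  plot(ptree_demo, uniform = TRUE)
  text(ptree_demo, use.n = TRUE)
}
```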
# Redraw the pruned tree, adding the number of cases of each class to every node
plot(ptree_undersample, uniform = TRUE)
text(ptree_undersample, use.n = TRUE)
library(rpart.plot)
# Default prp() plot of the pruned tree
prp(ptree_undersample)
# extra = 1 adds the number of observations per class in each node
prp(ptree_undersample, extra = 1)