Pruning the decision tree

Credit Risk Modeling in R

Lore Dirick

Manager of Data Science Curriculum at Flatiron School

Problems with large decision trees

  • Too complex: no longer easy to interpret
  • Overfitting: poor performance when applied to the test set
  • Solution: use printcp() and plotcp() to decide where to prune

printcp() and tree_undersample

printcp(tree_undersample)
Classification tree:
rpart(formula = loan_status ~ ., data = undersampled_training_set, method = "class",
 control = rpart.control(cp = 0.001))
Variables actually used in tree construction:
age    annual_inc     emp_cat     grade    home_ownership   ir_cat     loan_amnt     
Root node error: 2190/6570 = 0.33333
n= 6570 
        CP    nsplit  rel error   xerror      xstd
1  0.0059361      0    1.00000   1.00000   0.017447
2  0.0044140      4    0.97443   0.99909   0.017443
3  0.0036530      7    0.96119   0.98174   0.017366
4  0.0031963      8    0.95753   0.98904   0.017399
               ...  
16 0.0010654     76    0.84247   1.02511   0.017554
17 0.0010000     79    0.83927   1.02511   0.017554
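
The same table can be inspected programmatically. The sketch below is an illustration rather than course material: it assumes the kyphosis data set bundled with rpart as a stand-in for undersampled_training_set, and the name demo_tree is hypothetical. It shows that rel error (training error relative to the root) can only fall as splits are added, while xerror (cross-validated error) need not, which is the sign of overfitting to look for.

```r
library(rpart)

set.seed(123)  # xerror comes from random cross-validation folds

# Assumption: kyphosis (bundled with rpart) replaces the loan data here
demo_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(cp = 0.001, minsplit = 5))

# printcp() prints this matrix; it is stored in the fitted object
cp_table <- demo_tree$cptable
print(cp_table)

# Training error never increases as the tree grows ...
rel_error_falls <- all(diff(cp_table[, "rel error"]) <= 0)

# ... while the cross-validated error may rise again for larger trees
xerror_by_size <- cp_table[, c("nsplit", "xerror")]
```

Comparing the two columns row by row is what motivates pruning: extra splits that lower rel error without lowering xerror are fitting noise.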

plotcp() and tree_undersample

plotcp() and tree_undersample

Chosen complexity parameter: $CP = 0.003653$ (row 3 of the printcp() table, xerror = 0.98174)
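
Instead of reading the value off the plot, the CP with the lowest cross-validated error can be looked up in the cptable. This is a minimal sketch under assumptions: the kyphosis data set bundled with rpart stands in for the loan data, and the names demo_tree, index and tree_min are illustrative only.

```r
library(rpart)

set.seed(123)  # cross-validation folds are random

# Assumption: kyphosis (bundled with rpart) replaces the loan data here
demo_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(cp = 0.001, minsplit = 5))

cp_table <- demo_tree$cptable

# Row of the table with the smallest cross-validated error
index <- which.min(cp_table[, "xerror"])

# CP value stored in that row; this is the value to pass to prune()
tree_min <- cp_table[index, "CP"]
tree_min
```

Because xerror depends on random folds, the selected row can change between runs unless the seed is fixed.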

Plot the pruned tree

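# Prune at the chosen complexity parameter
ptree_undersample <- prune(tree_undersample,
                           cp = 0.003653)

# uniform = TRUE uses equal vertical spacing between tree levels
plot(ptree_undersample,
     uniform = TRUE)

# Add the split labels to the plot
text(ptree_undersample)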

Plot the pruned tree

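# Prune at the chosen complexity parameter
ptree_undersample <- prune(tree_undersample,
                           cp = 0.003653)

plot(ptree_undersample,
     uniform = TRUE)

# use.n = TRUE adds the number of observations per class at each node
text(ptree_undersample,
     use.n = TRUE)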
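
To check what prune() actually did, one option is to compare the number of leaves before and after. The sketch below is illustrative only: it assumes the kyphosis data set bundled with rpart in place of the loan data, and count_leaves, full_tree and pruned_tree are hypothetical names, not part of rpart or the course.

```r
library(rpart)

set.seed(123)  # cross-validation folds are random

# Assumption: kyphosis (bundled with rpart) replaces the loan data here
full_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(cp = 0.001, minsplit = 5))

# CP with the lowest cross-validated error
cp_table <- full_tree$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Collapse every split that is not worth its complexity cost at best_cp
pruned_tree <- prune(full_tree, cp = best_cp)

# Hypothetical helper: terminal nodes are marked "<leaf>" in the frame
count_leaves <- function(tree) sum(tree$frame$var == "<leaf>")

count_leaves(full_tree)
count_leaves(pruned_tree)
```

The pruned tree never has more leaves than the full one; when the lowest xerror belongs to the last row of the table, the two trees are identical.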

prp() in the rpart.plot package

library(rpart.plot)
prp(ptree_undersample)

prp() in the rpart.plot package

library(rpart.plot)
# extra = 1 displays the number of observations per class in each node
prp(ptree_undersample, extra = 1)

Let's practice!
