Building decision trees using the rpart package

Credit Risk Modeling in R

Lore Dirick

Manager of Data Science Curriculum at Flatiron School

Imagine...

[Figure: decision_tree.gif]

The rpart package! But...

  • It is hard to build a useful decision tree for credit risk data
  • Main reason: unbalanced data (defaults are much rarer than non-defaults)
library(rpart)

# Fit a classification tree with default settings
fit_default <- rpart(loan_status ~ ., method = "class",
                     data = training_set)
plot(fit_default)
Error in plot.rpart(fit_default) : fit is not a tree, just a root
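
A quick way to see the imbalance behind this error is to tabulate the response variable. The sketch below uses a small synthetic stand-in for training_set, since the real data is not reproduced here. The name toy_set, the loan_amnt column, and the 10% default rate are assumptions for illustration only.

```r
# Hypothetical stand-in for training_set: about 10% defaults (assumed rate)
set.seed(1)
toy_set <- data.frame(
  loan_status = factor(rbinom(1000, 1, 0.10), levels = c(0, 1)),
  loan_amnt   = round(runif(1000, 1000, 35000))
)

# Share of non-defaults (0) and defaults (1) in the data
class_share <- prop.table(table(toy_set$loan_status))
print(class_share)
```

When one class dominates like this, a tree can reach high accuracy without making any split at all.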

Three techniques to overcome unbalance

  • Undersampling or oversampling
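    • Balancing the classes removes the accuracy problem (high accuracy from predicting non-default for every loan)
    • Apply to the training set only; leave the test set untouched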
  • Changing the prior probabilities
  • Including a loss matrix
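
The three techniques above might look roughly like this in rpart. This is a minimal sketch on synthetic data, not the course's data set: toy_set, its columns, the default rate, the prior of c(0.7, 0.3), and the loss of 10 are all assumed values chosen for illustration. The parms arguments prior and loss are the ones rpart accepts for method = "class".

```r
library(rpart)

# Hypothetical synthetic data standing in for training_set
set.seed(42)
n <- 2000
toy_set <- data.frame(
  int_rate   = runif(n, 5, 25),
  annual_inc = runif(n, 20000, 120000)
)
# Assumed relationship: default probability rises with the interest rate
p_default <- plogis(-4.5 + 0.15 * toy_set$int_rate)
toy_set$loan_status <- factor(rbinom(n, 1, p_default), levels = c(0, 1))

# 1. Undersampling: keep all defaults, sample an equal number of non-defaults
defaults     <- toy_set[toy_set$loan_status == 1, ]
non_defaults <- toy_set[toy_set$loan_status == 0, ]
undersampled <- rbind(
  defaults,
  non_defaults[sample(nrow(non_defaults), nrow(defaults)), ]
)
fit_undersample <- rpart(loan_status ~ ., method = "class",
                         data = undersampled)

# 2. Changing the prior probabilities: give defaults more weight
#    than their observed share (order follows the factor levels 0, 1)
fit_prior <- rpart(loan_status ~ ., method = "class", data = toy_set,
                   parms = list(prior = c(0.7, 0.3)))

# 3. Loss matrix: rows are actual classes, columns are predicted classes.
#    Here a default predicted as non-default costs 10, the reverse costs 1.
fit_loss <- rpart(loan_status ~ ., method = "class", data = toy_set,
                  parms = list(loss = matrix(c(0, 10, 1, 0), ncol = 2)))
```

The prior and loss values are tuning choices rather than fixed rules, so different settings are worth comparing.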

Validate each model to see which works best!

Let's practice!
