Evaluating model performance

Predictive Analytics using Networked Data in R

María Óskarsdóttir, Ph.D.

Post-doctoral researcher

Making predictions

library(pROC)
  • Logistic regression
logPredictions <- predict(logModel, newdata = test_set, type = "response")
  • Random forest
rfPredictions <- predict(rfModel, newdata = test_set, type = "prob")
rfPredictions
      0     1
C 0.136 0.864
attr(,"class")
[1] "matrix" "votes"
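The prediction calls above can be tried end to end on simulated data. This is only a sketch: the columns `degree` and `churn_neighbors`, the simulated values, and the 70/30 split are assumptions made for illustration, not the course data. Only the names `logModel`, `test_set`, `label` and `logPredictions` mirror the slide. It uses base R's `glm()` so that no extra package is needed.

```r
set.seed(42)

# Simulated customers (hypothetical features, not the course data)
n <- 500
customers <- data.frame(
  degree          = rpois(n, lambda = 5),
  churn_neighbors = rpois(n, lambda = 1)
)
# Churn is made more likely when more neighbours have churned
p <- plogis(-2 + 0.8 * customers$churn_neighbors)
customers$label <- rbinom(n, size = 1, prob = p)

# Simple train/test split
train_idx    <- sample(n, size = round(0.7 * n))
training_set <- customers[train_idx, ]
test_set     <- customers[-train_idx, ]

# Logistic regression; type = "response" returns probabilities in [0, 1]
logModel <- glm(label ~ degree + churn_neighbors,
                data = training_set, family = binomial)
logPredictions <- predict(logModel, newdata = test_set, type = "response")

head(logPredictions)
```

For the random forest, the output shown on the slide has one column per class, so the churn score would be the column named "1", for example `rfPredictions[, "1"]`.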

AUC

  • Probability that a randomly chosen churner gets a higher score than a randomly chosen non-churner
  • Summarizes the ROC curve, which displays the trade-off between the model's sensitivity and specificity
  • For a model that is at least as good as random, a number between:
    • 0.5: random model
    • 1: perfect model
library(pROC)
auc(test_set$label, logPredictions)
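The probabilistic reading of the AUC can be checked by hand on a toy example. The sketch below uses made-up labels and scores (assumptions for illustration only) and base R: it compares every churner with every non-churner, then confirms the result with the equivalent rank-based (Mann-Whitney) formula.

```r
# Toy labels (1 = churner) and scores; values are made up for illustration
labels <- c(1, 1, 1, 0, 0, 0, 0)
scores <- c(0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1)

churner_scores     <- scores[labels == 1]
non_churner_scores <- scores[labels == 0]

# Compare every churner with every non-churner
# (a tie counts as half a win)
wins <- outer(churner_scores, non_churner_scores, ">")
ties <- outer(churner_scores, non_churner_scores, "==")
auc_pairwise <- mean(wins + 0.5 * ties)

# Equivalent rank-based (Mann-Whitney) formula
r  <- rank(scores)
n1 <- sum(labels == 1)
n0 <- sum(labels == 0)
auc_rank <- (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

auc_pairwise
auc_rank
```

Here the churners win 10 of the 12 pairwise comparisons, so both values equal 10/12. With pROC installed, `auc(labels, scores)` should agree on this example.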

Top decile lift

  • Measures how much better the prediction model is at identifying churners than a random sample of customers
  • Computes the proportion of actual churners amongst the 10% of customers with the highest predicted churn probability and divides it by the proportion of churners in the whole population
  • Lift value greater than 1 means that the model is better than a random model
  • If 60% of the customers with the top 10% of scores are churners and 10% of the whole population are churners, then the lift is $60/10=6$
library(lift)
TopDecileLift(predictions, test_set$label)  # predicted scores first, then labels
plotLift(predictions, test_set$label)       # lift chart
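The 60/10 example above can be reproduced in a few lines of base R. This is a minimal sketch, not the `lift` package's implementation: `top_decile_lift()` is a hypothetical helper written for illustration, and the scores and labels are constructed so that 6 of the 10 highest-scored customers churn while the overall churn rate is 10%.

```r
# Hypothetical helper: churn rate in the top 10% of scores
# divided by the overall churn rate
top_decile_lift <- function(scores, labels) {
  n_top   <- ceiling(length(scores) / 10)
  top_idx <- order(scores, decreasing = TRUE)[seq_len(n_top)]
  mean(labels[top_idx]) / mean(labels)
}

# 100 customers with distinct scores, sorted from high to low
scores <- (100:1) / 100
labels <- rep(0, 100)
labels[c(1, 2, 3, 5, 6, 8)] <- 1   # 6 churners among the 10 highest scores
labels[c(20, 40, 60, 80)]   <- 1   # 4 churners elsewhere (10 in total)

top_decile_lift(scores, labels)
```

The result is 0.6 / 0.1, i.e. a lift of about 6, matching the worked example. The sketch does not treat tied scores at the decile boundary in any special way.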

Let's practice!
