Evaluating model performance

Predictive Analytics using Networked Data in R

María Óskarsdóttir, Ph.D.

Post-doctoral researcher

Making predictions

library(pROC)

logPredictions <- predict(logModel, newdata = test_set, type = "response")

rfPredictions <- predict(rfModel, newdata = test_set, type = "prob")
rfPredictions
      0     1
C 0.136 0.864
attr(,"class")
[1] "matrix" "votes"

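With type = "prob", the random forest returns one column per class rather than a single vector, so the churn score has to be picked out by column name before it can be passed to a metric. A minimal sketch in base R, assuming the positive class is labelled "1" as in the output above; the matrix `probs` and the row "D" are made-up stand-ins for the real `rfPredictions`:

```r
# Hypothetical stand-in for the matrix returned by
# predict(rfModel, newdata = test_set, type = "prob")
probs <- matrix(c(0.136, 0.864,
                  0.720, 0.280),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("C", "D"), c("0", "1")))

# Keep only the probability of the positive class (churn = "1")
churn_prob <- probs[, "1"]
churn_prob
```

With the real objects, the same indexing would read rfPredictions[, "1"].
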
Area under the ROC curve (AUC)

AUC: the probability that a randomly chosen churner gets a higher score than a randomly chosen non-churner
The ROC curve displays the trade-off between the model's sensitivity and specificity
AUC usually lies between:
- 0.5: random model
- 1: perfect model

library(pROC)
auc(test_set$label, logPredictions)
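The pairwise definition of AUC can be checked by hand. The sketch below uses base R only and made-up toy data (`labels` and `scores` are hypothetical, not the course data set): it compares every churner with every non-churner and counts how often the churner gets the higher score, with ties counted as one half.

```r
# Toy data (hypothetical): 1 = churner, 0 = non-churner
labels <- c(1, 1, 1, 0, 0, 0, 0)
scores <- c(0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1)

pos <- scores[labels == 1]  # scores of churners
neg <- scores[labels == 0]  # scores of non-churners

# Compare every churner with every non-churner; ties count as half a win
wins <- outer(pos, neg, ">") + 0.5 * outer(pos, neg, "==")

# Share of correctly ranked pairs: 11 of the 12 pairs here
manual_auc <- mean(wins)
manual_auc
```

On the same vectors, auc(labels, scores) from pROC should report the same value, since it estimates the same quantity.
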

Top decile lift

Measures how much better the prediction model is at identifying churners than a random sample of customers
Computed as the proportion of actual churners among the 10% of customers with the highest predicted churn probability, divided by the proportion of churners in the whole population
A lift greater than 1 means that the model is better than a random model
If 60% of the customers in the top 10% of scores are churners and 10% of the whole population are churners, then the lift is $60/10 = 6$

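library(lift)
# Assuming the lift package signature TopDecileLift(predicted, labels): scores first, labels second
TopDecileLift(logPredictions, test_set$label)
plotLift(logPredictions, test_set$label)  # lift chart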
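The 60/10 example can be reproduced by hand. A minimal sketch in base R, using made-up toy data (`labels` and `scores` are hypothetical): 100 customers, 10 of them churners, with 6 churners among the 10 highest scores.

```r
# Toy data (hypothetical): 100 customers, 10 churners (10% base rate)
labels <- c(rep(1, 6), rep(0, 4), rep(1, 4), rep(0, 86))
scores <- seq(1, 0.01, length.out = 100)  # distinct scores, highest first

# Indices of the 10% of customers with the highest scores
n_top <- round(0.1 * length(scores))
top <- order(scores, decreasing = TRUE)[seq_len(n_top)]

churn_rate_top <- mean(labels[top])  # share of churners in the top decile
churn_rate_all <- mean(labels)       # share of churners overall

# Lift is roughly 6 (60% / 10%), up to floating-point rounding
manual_lift <- churn_rate_top / churn_rate_all
manual_lift
```

Ties at the decile boundary are not handled specially here; a package implementation may treat them differently.
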
