In-sample model fit and thresholding

Machine Learning for Marketing Analytics in R

Verena Pflieger

Data Scientist at INWT Statistics

Pseudo $R^2$ statistics I

$$ \text{McFadden:~} \quad R^2 = 1-\frac{\ln L_{\text{full}}}{\ln L_{\text{null}}} $$

$$ \text{Cox \& Snell:~} R^2 = 1-\left(\frac{L_{\text{null}}}{L_{\text{full}}}\right)^{\frac{2}{n}} $$

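$$ \text{Nagelkerke:~} R^2 = \frac{1-\left(\frac{L_{\text{null}}}{L_{\text{full}}}\right)^{\frac{2}{n}}}{1-(L_{\text{null}})^{\frac{2}{n}}} $$

Here $L_{\text{full}}$ is the likelihood of the fitted model, $L_{\text{null}}$ is the likelihood of the intercept-only model, and $n$ is the number of observations.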

Interpretation (rules of thumb):

  • Reasonable if > 0.2

  • Good if > 0.4

  • Very good if > 0.5
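The three formulas can be checked by hand. This is a minimal sketch, not the `descr` implementation: the helper name `pseudoR2` is hypothetical, it assumes a binomial `glm` fitted to complete data (so the intercept-only refit via `update(model, . ~ 1)` uses the same rows), and it works with log-likelihoods so that $L^{2/n}$ does not underflow for large $n$. The built-in `mtcars` data only stands in for the churn data.

```r
# Hypothetical helper: McFadden, Cox & Snell and Nagelkerke pseudo R^2
# for a binomial glm, computed on the log-likelihood scale.
pseudoR2 <- function(model) {
  nullModel <- update(model, . ~ 1)            # intercept-only model
  llFull <- as.numeric(logLik(model))          # ln L_full
  llNull <- as.numeric(logLik(nullModel))      # ln L_null
  n <- nobs(model)
  # (L_null / L_full)^(2/n) = exp((2/n) * (ln L_null - ln L_full))
  coxSnell <- 1 - exp((2 / n) * (llNull - llFull))
  c(McFadden   = 1 - llFull / llNull,
    CoxSnell   = coxSnell,
    # divide by the maximum Cox & Snell value, 1 - L_null^(2/n)
    Nagelkerke = coxSnell / (1 - exp((2 / n) * llNull)))
}

# Stand-in logit model on built-in data
fit <- glm(am ~ wt, data = mtcars, family = binomial)
pseudoR2(fit)
```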


Pseudo $R^2$ statistics II

library(descr)

LogRegR2(logitModelNew)
Chi2                     1321.717
Df                       19
Sig.                     0
Cox and Snell Index      0.02879553
Nagelkerke Index         0.0469131
McFadden's R2            0.03071032
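The `Chi2`, `Df` and `Sig.` rows presumably report a likelihood-ratio test of the full model against the intercept-only model. A base-R sketch of that test follows, assuming (not verified here) that `LogRegR2` uses the usual definition: null deviance minus residual deviance, with degrees of freedom equal to the number of estimated slope coefficients. `lrTest` is a hypothetical helper, and `mtcars` again stands in for the churn data.

```r
# Hypothetical helper: likelihood-ratio test of a binomial glm against
# its intercept-only model, using the deviances stored in the fit.
lrTest <- function(model) {
  chi2 <- model$null.deviance - model$deviance  # -2 * (lnL_null - lnL_full) for 0/1 data
  df   <- model$df.null - model$df.residual     # number of slope coefficients
  c(Chi2 = chi2, Df = df, Sig = pchisq(chi2, df, lower.tail = FALSE))
}

fit <- glm(am ~ wt, data = mtcars, family = binomial)
lrTest(fit)
```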

Predict probabilities

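library(dplyr)  # provides %>% and select()

# Predicted probability that a customer returns (returnCustomer = 1)
churnData$predNew <- predict(logitModelNew,
                             type = "response",
                             na.action = na.exclude)
churnData %>% select(returnCustomer, predNew) %>% tail()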
      returnCustomer   predNew
45231              0 0.2843944
45232              0 0.1552756
45233              1 0.2522597
45234              1 0.1454276
45235              0 0.2698819
45236              0 0.2886988
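`type = "response"` returns probabilities on the 0 to 1 scale instead of log-odds. For a logit model these are the inverse logit of the linear predictor, which the short check below illustrates on `mtcars` (a stand-in, since `logitModelNew` and `churnData` are not reproduced here).

```r
# Logit model on built-in data as a stand-in for logitModelNew
fit <- glm(am ~ wt, data = mtcars, family = binomial)

eta  <- predict(fit, type = "link")      # linear predictor (log-odds)
prob <- predict(fit, type = "response")  # predicted P(am = 1)

# The response scale is the inverse logit of the link scale
all.equal(prob, plogis(eta))
```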

Confusion matrix

# Note that `SDMTools` cannot be downloaded from CRAN anymore.
# Install it instead via `remotes::install_version("SDMTools", "1.1-221.2")`
library(SDMTools)
confMatrixNew <- confusion.matrix(churnData$returnCustomer,
                                  churnData$predNew, threshold = 0.5)
confMatrixNew
     obs
pred     0    1
   0 36921 8242
   1    43   30
Prediction \ Truth   negative         positive
negative             true negative    false negative
positive             false positive   true positive
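If installing the archived `SDMTools` package is not an option, a cross-table in base R gives the same layout. This sketch assumes that a probability at or above the threshold counts as a positive prediction; whether `confusion.matrix` treats ties the same way is not verified here. The helper `confMatrixBase` and the toy vectors are hypothetical.

```r
# Hypothetical base-R replacement for SDMTools::confusion.matrix:
# rows = predicted class, columns = observed class.
confMatrixBase <- function(obs, prob, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)          # assumption: ties count as 1
  table(pred = factor(pred, levels = c(0, 1)),   # factors keep a 2 x 2 shape
        obs  = factor(obs,  levels = c(0, 1)))   # even if a class is absent
}

# Toy data for illustration only
obs  <- c(0, 0, 1, 1, 0)
prob <- c(0.2, 0.7, 0.4, 0.9, 0.1)
confMatrixBase(obs, prob)
```

With the course objects, the call would be `confMatrixBase(churnData$returnCustomer, churnData$predNew)`.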

Accuracy

# Accuracy: correctly classified cases (the diagonal) over all cases
accuracyNew <- sum(diag(confMatrixNew)) / sum(confMatrixNew)
accuracyNew
0.8168494
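Plugging the counts from the confusion matrix at threshold 0.5 into the definition reproduces this value:

```latex
\text{Accuracy} = \frac{TN + TP}{TN + FN + FP + TP}
= \frac{36921 + 30}{36921 + 8242 + 43 + 30}
= \frac{36951}{45236} \approx 0.817
```

For comparison, predicting "no return" for every customer would be correct for 36921 + 43 = 36964 of the 45236 customers, which is also about 0.817, so accuracy alone says little about this model.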

Finding the optimal threshold

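Payoff matrix (rows: predicted class, columns: observed class):

Prediction \ Truth    returnCustomer = 0    returnCustomer = 1
returnCustomer = 0                     5                   -15
returnCustomer = 1                     0                     0

Only customers predicted not to return affect the payoff: each correct prediction (true negative) earns 5, and each customer wrongly predicted not to return (false negative) costs 15.

payoff = 5 * true negatives - 15 * false negatives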

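As a check, the counts from the confusion matrix at threshold 0.5 (36921 true negatives, 8242 false negatives) give the payoff listed for that threshold:

```latex
\text{payoff}_{0.5} = 5 \cdot 36921 - 15 \cdot 8242 = 184605 - 123630 = 60975
```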
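Threshold   Accuracy   Payoff
0.5         0.817      60975
0.4         0.815      62180
0.3         0.794      65740   <- highest payoff
0.2         0.668      65670
0.1         0.241      10550

The payoff-maximizing threshold (0.3) lies below 0.5, even though accuracy drops from 0.817 to 0.794.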
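A table like this can be produced by looping over candidate thresholds. The sketch below is not the course's code: `evalThresholds` is a hypothetical helper that assumes a 0/1 outcome without missing values, classifies a probability at or above the threshold as 1, and defaults to the payoff weights from the matrix above. The toy vectors only show the mechanics.

```r
# Hypothetical helper: accuracy and payoff for a set of candidate thresholds.
# Assumes obs is a 0/1 vector without NAs and prob >= threshold means "1".
evalThresholds <- function(obs, prob,
                           thresholds = c(0.5, 0.4, 0.3, 0.2, 0.1),
                           gainTN = 5, lossFN = 15) {
  rows <- lapply(thresholds, function(t) {
    pred <- as.integer(prob >= t)
    tn <- sum(pred == 0 & obs == 0)   # correctly predicted non-returners
    fn <- sum(pred == 0 & obs == 1)   # returners wrongly predicted as 0
    data.frame(threshold = t,
               accuracy  = mean(pred == obs),
               payoff    = gainTN * tn - lossFN * fn)
  })
  do.call(rbind, rows)
}

# Toy data for illustration only
obs  <- c(0, 0, 0, 1, 1, 0)
prob <- c(0.05, 0.15, 0.35, 0.25, 0.65, 0.45)

res <- evalThresholds(obs, prob)
res
res[which.max(res$payoff), ]   # threshold with the highest payoff
```

On the toy data, accuracy peaks at 0.5 while the payoff peaks at 0.2, the same kind of trade-off the churn table shows.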

Overfitting


Let's try it out!

