Modeling and model selection

Machine Learning for Marketing Analytics in R

Verena Pflieger

Data Scientist at INWT Statistics

logitModelFull <- glm(returnCustomer ~ title + newsletter +
                        websiteDesign + ...,
                      family = binomial, data = churnData)
summary(logitModelFull)
## Coefficients:
##                          Estimate  Std.Error  z value  Pr(>|z|)
## (Intercept)              -1.49074  0.04930    -30.239  < 2e-16   ***
## titleCompany             -0.21215  0.05286    -4.013   5.99e-05  ***
## titleMrs                  0.03086  0.02953     1.045   0.29586
## newsletter1               0.52373  0.03031     17.280  < 2e-16   ***
## websiteDesign2           -0.45679  0.16267    -2.808   0.00498    **
## websiteDesign3           -0.28800  0.15899    -1.811   0.07007     .
## paymentMethodCredidCard  -0.24192  0.04843    -4.995   5.89e-07  ***
## tvEquipment               -0.51475  1.08141    -0.476   0.63408
...
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
...
## AIC: 41762

Statistical significance

## Coefficients:
##               Estimate  Std.Error  z value  Pr(>|z|)
## ...
## newsletter1    0.52373    0.03031   17.280  < 2e-16   ***
## ...
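
To read significance programmatically instead of from the stars, the coefficient table can be pulled out of the model summary. The following is a minimal sketch on simulated data: `toyData`, `toyModel`, and the `noise` predictor are hypothetical stand-ins, not the course's `churnData`.

```r
# Simulated stand-in data (hypothetical; not the course's churnData)
set.seed(1)
n <- 2000
toyData <- data.frame(
  newsletter = factor(rbinom(n, 1, 0.5)),
  noise      = rnorm(n)  # predictor with no true effect
)
logOdds <- -1.5 + 0.5 * (toyData$newsletter == "1")
toyData$returnCustomer <- rbinom(n, 1, plogis(logOdds))

toyModel <- glm(returnCustomer ~ newsletter + noise,
                family = binomial, data = toyData)

# The coefficient table is a matrix; the last column holds the p-values
coefTable <- coef(summary(toyModel))
pValues <- coefTable[, "Pr(>|z|)"]

# Names of the coefficients that are significant at the 5% level
significant <- names(pValues)[pValues < 0.05]
significant
```

A small p-value, like the `< 2e-16` for `newsletter1` on the slide, says the data would be very unlikely if the true coefficient were zero.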


Coefficient interpretation

Log odds equation: $\displaystyle \log \frac{P(\text{returnCustomer}=1)}{P(\text{returnCustomer}=0)} = -1.49 - 0.21 \cdot \text{titleCompany} + 0.52 \cdot \text{newsletter1} + \dots$

Transformation to odds:

library(magrittr)  # provides the %>% pipe
coefsExp <- coef(logitModelFull) %>% exp() %>% round(2)
coefsExp
## (Intercept)  titleCompany    titleMrs    titleOthers
## 0.23         0.81            1.03        1.77

## newsletter1  websiteDesign2  ...
## 1.69         0.63            ...
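
The arithmetic behind one of these values can be worked through by hand. This sketch takes the `newsletter1` and intercept estimates from the summary output above; the variable names are only illustrative.

```r
# Estimates copied from the model summary
newsletterCoef <- 0.52373
interceptCoef  <- -1.49074

# Exponentiating a log-odds coefficient gives an odds ratio
oddsRatio <- exp(newsletterCoef)
round(oddsRatio, 2)   # 1.69

# Percent change in the odds of returning for newsletter subscribers,
# holding the other predictors fixed
percentChange <- (oddsRatio - 1) * 100
round(percentChange)  # 69

# The intercept describes the case where every predictor is at its
# reference level or zero; plogis() converts log odds to a probability
baselineProb <- plogis(interceptCoef)
round(baselineProb, 2)  # 0.18
```

An odds ratio above 1 raises the odds of returning, and one below 1 lowers them. The 0.81 for `titleCompany` corresponds to odds about 19% lower than in the reference category.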

Model selection

library(MASS)
logitModelNew <- stepAIC(logitModelFull, trace = 0)
summary(logitModelNew)
## Coefficients:
##                          Estimate  Std.Error  z value  Pr(>|z|)
## (Intercept)              -1.49130  0.04928    -30.260  < 2e-16   ***
## titleCompany             -0.21131  0.05285    -3.998   6.38e-05  ***
## titleMrs                  0.03159  0.02951     1.071   0.28432
## newsletter1               0.52332  0.03030     17.269  < 2e-16   *** 
...
## videogameDownload         0.26474  0.05256     5.037   4.74e-07  ***
## prodRemitted              0.89528  0.07619     11.751  < 2e-16   ***

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
...
## AIC: 41756
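
The AIC of the full and the reduced model can also be compared directly, which is what the drop from 41762 to 41756 on the slides reflects. Below is a minimal sketch on simulated data: `toyData` and the predictors `x1` to `x3` are hypothetical, and only `x1` has a true effect.

```r
library(MASS)

# Simulated stand-in data (hypothetical; not the course's churnData)
set.seed(42)
n <- 1000
toyData <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toyData$y <- rbinom(n, 1, plogis(-1 + 0.8 * toyData$x1))

fullModel <- glm(y ~ x1 + x2 + x3, family = binomial, data = toyData)

# stepAIC() drops or re-adds terms as long as doing so lowers the AIC
reducedModel <- stepAIC(fullModel, trace = 0)

# Side-by-side comparison; lower AIC is better
AIC(fullModel, reducedModel)
formula(reducedModel)
```

Because each step is only taken when it lowers the AIC, the selected model never has a higher AIC than the starting model.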

Results of the step-AIC function

Removed Variables    Remaining Variables
tvEquipment          newsletter
prodOthers           paymentMethod
dvd
blueray
...
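
A table like this can be derived by comparing the term labels of the two models. This is a sketch on simulated data, with hypothetical variable names `x1` to `x3` standing in for the course's predictors.

```r
library(MASS)

# Simulated stand-in data (hypothetical; not the course's churnData)
set.seed(42)
n <- 1000
toyData <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toyData$y <- rbinom(n, 1, plogis(-1 + 0.8 * toyData$x1))

fullModel    <- glm(y ~ x1 + x2 + x3, family = binomial, data = toyData)
reducedModel <- stepAIC(fullModel, trace = 0)

# Term labels are the variable names on the right-hand side of the formula
fullTerms      <- attr(terms(fullModel), "term.labels")
remainingTerms <- attr(terms(reducedModel), "term.labels")
removedTerms   <- setdiff(fullTerms, remainingTerms)

removedTerms
remainingTerms
```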

Let's apply what I have shown you!
