Model validation, model fit, and prediction

Machine Learning for Marketing Analytics in R

Verena Pflieger

Data Scientist at INWT Statistics

Coefficient of Determination $R^2$

$R^2$ and F-test

summary(multipleLM2)
Residual standard error: 13.87 on 4179 degrees of freedom
Multiple R-squared:  0.3522,    Adjusted R-squared:  0.3504 
F-statistic: 206.5 on 11 and 4179 DF,  p-value: < 2.2e-16
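
The quantities in this output can be recomputed by hand, which helps to see what they measure. The following is a minimal sketch, not the course's own code: it uses the built-in mtcars data as a stand-in because the course data is not available here, and the names fit, rss, tss, r2, adjR2, fStat and pVal are chosen for illustration only.

```r
# Illustrative data: built-in mtcars stands in for the course data (assumption).
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)            # number of observations
p <- length(coef(fit)) - 1   # number of predictors, intercept excluded

rss <- sum(residuals(fit)^2)                    # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares

# R^2: share of the variance in the response explained by the model
r2 <- 1 - rss / tss

# Adjusted R^2: penalises additional predictors
adjR2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# F-statistic: tests whether all slope coefficients are jointly zero
fStat <- (r2 / p) / ((1 - r2) / (n - p - 1))
pVal  <- pf(fStat, p, n - p - 1, lower.tail = FALSE)

# Compare with the values reported by summary()
s <- summary(fit)
all.equal(r2, s$r.squared)
all.equal(adjR2, s$adj.r.squared)
all.equal(fStat, unname(s$fstatistic["value"]))
```

Applied to the output above (R-squared of 0.3522 with 11 and 4179 degrees of freedom), the same F formula gives roughly 206.5, up to rounding of the printed R-squared.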

Overfitting
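
Overfitting can be illustrated with a small simulation. The sketch below is an assumption-laden example rather than anything from the course: the data are simulated, and the names dat, train, test, simpleFit, complexFit and rmse are made up for illustration. A flexible model never fits the training data worse than a simpler nested model, but it may predict unseen data worse.

```r
set.seed(42)

# Simulated data (assumption): a linear signal plus random noise
n <- 60
x <- runif(n, -2, 2)
y <- 1 + 2 * x + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)

# First half for fitting, second half held back for validation
train <- dat[1:30, ]
test  <- dat[31:60, ]

simpleFit  <- lm(y ~ x, data = train)             # matches the true structure
complexFit <- lm(y ~ poly(x, 10), data = train)   # far more flexible than needed

# Root mean squared prediction error on a given dataset
rmse <- function(fit, data) {
  sqrt(mean((data$y - predict(fit, newdata = data))^2))
}

# In-sample fit: the complex model is never worse here
summary(simpleFit)$r.squared
summary(complexFit)$r.squared

# Out-of-sample error: compare how both models do on data they have not seen
rmse(simpleFit, test)
rmse(complexFit, test)
```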

Methods to avoid overfitting

  • AIC() from stats package
  • stepAIC() from MASS package
  • Out-of-sample model validation
  • Cross-validation
  • ...
AIC(multipleLM2)
33950.45
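
The methods listed above can be sketched as follows. This is an illustrative example under stated assumptions, not the course's code: it uses the built-in mtcars data, and fullModel, smallModel, stepModel, folds and cvRmse are hypothetical names. A single AIC value says little on its own; it becomes useful when comparing candidate models fitted to the same data, where a lower value is preferred.

```r
library(MASS)  # provides stepAIC()

# Illustrative data: built-in mtcars, not the course's customer data (assumption).
fullModel  <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)
smallModel <- lm(mpg ~ wt + hp, data = mtcars)

# Compare candidate models: lower AIC means a better trade-off
# between goodness of fit and model complexity
AIC(fullModel)
AIC(smallModel)

# Stepwise selection by AIC, starting from the full model
stepModel <- stepAIC(fullModel, direction = "both", trace = FALSE)
formula(stepModel)

# 5-fold cross-validation of one candidate model
set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold labels

cvRmse <- sapply(1:k, function(i) {
  trainData <- mtcars[folds != i, ]   # fit on k - 1 folds
  testData  <- mtcars[folds == i, ]   # validate on the remaining fold
  fit <- lm(mpg ~ wt + hp, data = trainData)
  sqrt(mean((testData$mpg - predict(fit, newdata = testData))^2))
})

mean(cvRmse)  # average out-of-sample prediction error
```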

New dataset clvData2

head(clvData2)
# A tibble: 6 x 14
  customerID nOrders nItems daysSinceLastOrder margin returnRatio 
       <int>   <int>  <int>              <int>  <dbl>       <dbl>         
1          2      16     40                  2  57.62        0.18
2          3       1      5                124  29.69        1.00
3          4      15     30                 68  56.26        0.16
4          5      23     41                103  58.84        0.03
5          6       2      4                104  29.31        0.00
6          7       6     10                 41  35.72        0.06
#... with 8 more variables: shareOwnBrand <dbl>, shareVoucher <dbl>,
#  shareSale <dbl>, gender <chr>, age <int>, marginPerOrder <dbl>,
#  marginPerItem <dbl>, itemsPerOrder <dbl>

Prediction

predMargin <- predict(multipleLM2, 
                      newdata = clvData2)
head(predMargin)
       1        2        3        4        5        6 
51.10204 31.63335 51.90008 52.62200 36.65194 33.84383
mean(predMargin, na.rm = TRUE)
33.95147

Learnings linear regression

You have learned...
  • to predict the future customer lifetime value
  • to use a linear regression to model a continuous variable
  • that the variables for modelling and prediction have to carry the same names
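
The last point can be made concrete with a small sketch. It assumes the built-in mtcars data, and the data frame newCars with its columns weight and horsepower is hypothetical, chosen only to show what happens when the names do not match.

```r
# Illustrative model on built-in data (assumption, not the course model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# New data whose columns carry different names (hypothetical)
newCars <- data.frame(weight = c(2.5, 3.2), horsepower = c(110, 150))

# predict() looks up "wt" and "hp" by name in newdata, so this call fails;
# tryCatch() captures the error instead of stopping the script
result <- tryCatch(predict(fit, newdata = newCars), error = function(e) e)
inherits(result, "error")

# Renaming the columns to match the model formula resolves the problem
names(newCars) <- c("wt", "hp")
pred <- predict(fit, newdata = newCars)
pred
```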

Learnings from the model

You have learned...
  • that the margin in one year is a good predictor for the margin in the following year
  • that the longer the time since the last order, the smaller the expected margin
  • that characteristics like gender and age don't seem to play a role in predicting margin
  • etc.
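
Conclusions of this kind are typically read off the coefficient table: the sign of an estimate gives the direction of the relationship, and the p-value indicates whether the predictor contributes noticeably. A minimal sketch, assuming the built-in mtcars data rather than the course data (fit and coefTable are illustrative names):

```r
# Illustrative model on built-in data (assumption, not the course model)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefTable <- coef(summary(fit))

# Sign of the estimate: direction of the relationship with the response
# p-value: evidence against the coefficient being zero
coefTable[, c("Estimate", "Pr(>|t|)")]

# Terms whose p-value exceeds the conventional 5% level
rownames(coefTable)[coefTable[, "Pr(>|t|)"] > 0.05]
```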

Alright, hands on!
