Multiple linear regression

Machine Learning for Marketing Analytics in R

Verena Pflieger

Data Scientist at INWT Statistics

Omitted variable bias

The more effort, the less success?

The more effort, the more success!
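The sign flip between the two slides above can be reproduced with simulated data. The sketch below is only an illustration: the variable names (`ability`, `effort`, `success`) and all coefficients are hypothetical and are not taken from the course data. It assumes a confounder that raises success while lowering the effort invested, so leaving it out of the model reverses the sign of the effort coefficient.

```r
# Simulated illustration of omitted variable bias.
# All names and numbers here are hypothetical, not from the course data.
set.seed(42)
n <- 1000

# 'ability' is the confounder: it raises success and lowers the effort invested.
ability <- rnorm(n)
effort  <- -1.5 * ability + rnorm(n)
success <- 1.0 * effort + 3.0 * ability + rnorm(n)  # true effect of effort is +1

simpleFit   <- lm(success ~ effort)            # omits the confounder
multipleFit <- lm(success ~ effort + ability)  # controls for the confounder

coef(simpleFit)["effort"]    # negative: "the more effort, the less success?"
coef(multipleFit)["effort"]  # close to the true value of 1
```

Once the confounder enters the model, the effort coefficient is estimated for observations with the same ability, which is why it recovers the positive effect.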

multipleLM <- lm(
    futureMargin ~ margin + nOrders + nItems + daysSinceLastOrder +
    returnRatio + shareOwnBrand + shareVoucher + shareSale + 
    gender + age + marginPerOrder + marginPerItem + 
    itemsPerOrder, data = clvData1)
summary(multipleLM)
Call:
lm(formula = futureMargin ~ margin + ..., data = clvData1)
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         22.528666  1.435062  15.699  < 2e-16 ***
margin              0.402783   0.027298  14.755  < 2e-16 ***
nOrders            -0.031825   0.122980  -0.259  0.79581    
...
itemsPerOrder       0.102576   0.540835   0.190  0.84958    
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 13.85 on 4177 degrees of freedom
Multiple R-squared:  0.3547,    Adjusted R-squared:  0.3527 
F-statistic: 176.6 on 13 and 4177 DF,  p-value: < 2.2e-16

Multicollinearity

Variance Inflation Factors

library(rms)
vif(multipleLM)
            margin            nOrders             nItems 
          3.658257          11.565731          13.141486 
daysSinceLastOrder        returnRatio      shareOwnBrand 
          1.368208           1.311476           1.363515 
      shareVoucher          shareSale         gendermale 
          1.181329           1.148697           1.003452 
               age     marginPerOrder      marginPerItem 
          1.026513           8.977661           7.782651 
     itemsPerOrder 
          6.657435  
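
A variance inflation factor can also be computed by hand: regress one predictor on all the others and take 1 / (1 - R²) of that auxiliary regression. The sketch below does this in base R on simulated data. The helper `manualVif` and the data frame `simData` are hypothetical names introduced for illustration, and the simulated columns only borrow names from the model above. Common rules of thumb treat values above 5 or 10 as a sign of problematic multicollinearity.

```r
# Manual VIF computation on simulated data (hypothetical example, base R only).
set.seed(1)
n <- 500
nOrders       <- rpois(n, 5) + 1
itemsPerOrder <- runif(n, 1, 4)
nItems        <- round(nOrders * itemsPerOrder)  # nearly determined by the other two
age           <- rnorm(n, 40, 10)                # unrelated to the other predictors
simData <- data.frame(nOrders, itemsPerOrder, nItems, age)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all remaining predictors.
manualVif <- function(data, predictor) {
  others <- setdiff(names(data), predictor)
  auxFit <- lm(reformulate(others, response = predictor), data = data)
  1 / (1 - summary(auxFit)$r.squared)
}

vifs <- sapply(names(simData), manualVif, data = simData)
round(vifs, 2)  # nItems is strongly inflated, age stays near 1
```

Because `nItems` is almost a function of `nOrders` and `itemsPerOrder`, its auxiliary R² is close to 1 and its VIF is large, which mirrors the pattern in the output above.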

New model

multipleLM2 <- lm(futureMargin ~ margin + nOrders + 
                  daysSinceLastOrder + returnRatio + shareOwnBrand + 
                  shareVoucher + shareSale + gender + age + 
                  marginPerItem + itemsPerOrder, 
                  data = clvData1)
vif(multipleLM2)                  
            margin            nOrders daysSinceLastOrder 
          3.561828           2.868060           1.354986 
       returnRatio      shareOwnBrand       shareVoucher 
          1.305490           1.353513           1.176411 
         shareSale         gendermale                age 
          1.146499           1.003132           1.021518 
     marginPerItem      itemsPerOrder 
          1.686746           1.550524 
summary(multipleLM2)
Call:
lm(formula = futureMargin ~ margin + nOrders + ..., data = clvData1)
Residuals:
    Min      1Q  Median      3Q     Max 
-55.659  -8.827   0.483   9.561  50.118 
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        22.798064   1.287806  17.703  < 2e-16 ***
margin              0.404200   0.026983  14.980  < 2e-16 ***
nOrders             0.220255   0.061347   3.590 0.000334 ***
daysSinceLastOrder -0.017180   0.002675  -6.422 1.49e-10 ***
returnRatio        -1.992829   0.601214  -3.315 0.000925 ***
shareOwnBrand       7.568686   0.677572  11.170  < 2e-16 ***
shareVoucher       -1.750877   0.669017  -2.617 0.008900 ** 
shareSale          -2.942525   0.691108  -4.258 2.11e-05 ***
gendermale          0.203813   0.430136   0.474 0.635643    
age                -0.015158   0.017245  -0.879 0.379462    
marginPerItem      -0.197277   0.051160  -3.856 0.000117 ***
itemsPerOrder      -0.270260   0.261458  -1.034 0.301354    
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
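
The coefficient table printed by `summary()` can also be accessed programmatically, which helps when filtering for significant predictors. The sketch below uses the built-in `mtcars` data rather than `clvData1`, since the course data is not available here. The 0.05 threshold and the names `exampleLM`, `coefTable` and `significant` are assumptions chosen for illustration.

```r
# Extracting the coefficient table from a fitted model (illustrative sketch
# on the built-in mtcars data, not on the course data).
exampleLM <- lm(mpg ~ wt + hp, data = mtcars)

# coef(summary(...)) returns a numeric matrix with one row per coefficient.
coefTable <- coef(summary(exampleLM))
colnames(coefTable)  # "Estimate" "Std. Error" "t value" "Pr(>|t|)"

# Keep only the rows whose p-value is below the chosen threshold (assumed: 0.05).
significant <- coefTable[coefTable[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
rownames(significant)
```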

Let's practice!
