Multiple linear regression

Machine Learning for Marketing Analytics in R

Verena Pflieger

Data Scientist at INWT Statistics

Omitted variable bias

The more effort, the less success?

The more effort, the more success!
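
The sign flip between the two slide titles can be reproduced with a small simulation. This is only a sketch on made-up data: the variable names `talent`, `effort`, and `success` and all effect sizes are assumptions for illustration, not the data behind the slides. Leaving out a variable that is correlated with both the predictor and the outcome makes the effect of `effort` look negative, while including it recovers the positive effect.

```r
# Hypothetical simulated data (not the course's data set)
set.seed(42)
n <- 500
talent  <- rnorm(n)                              # the variable that will be omitted
effort  <- -1.0 * talent + rnorm(n, sd = 0.5)    # less talented people invest more effort
success <- 1.0 * effort + 2.0 * talent + rnorm(n)  # true effect of effort is positive

simpleFit <- lm(success ~ effort)            # talent omitted
fullFit   <- lm(success ~ effort + talent)   # talent included

# Compare the estimated effect of effort in both models
effortSimple <- coef(simpleFit)["effort"]
effortFull   <- coef(fullFit)["effort"]
print(c(omitted = unname(effortSimple), included = unname(effortFull)))
```

In the simple model the coefficient of `effort` absorbs part of the effect of the omitted `talent`, which is why its sign changes once `talent` enters the regression.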

# Regress futureMargin on all candidate predictors in clvData1
multipleLM <- lm(
    futureMargin ~ margin + nOrders + nItems + daysSinceLastOrder +
    returnRatio + shareOwnBrand + shareVoucher + shareSale +
    gender + age + marginPerOrder + marginPerItem +
    itemsPerOrder, data = clvData1)
summary(multipleLM)
Call:
lm(formula = futureMargin ~ margin + ..., data = clvData1)
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         22.528666  1.435062  15.699  < 2e-16 ***
margin              0.402783   0.027298  14.755  < 2e-16 ***
nOrders            -0.031825   0.122980  -0.259  0.79581    
...
itemsPerOrder       0.102576   0.540835   0.190  0.84958    
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 13.85 on 4177 degrees of freedom
Multiple R-squared:  0.3547,    Adjusted R-squared:  0.3527 
F-statistic: 176.6 on 13 and 4177 DF,  p-value: < 2.2e-16

Multicollinearity

Variance Inflation Factors

library(rms)       # provides vif()
vif(multipleLM)    # one variance inflation factor per coefficient
            margin            nOrders             nItems 
          3.658257          11.565731          13.141486 
daysSinceLastOrder        returnRatio      shareOwnBrand 
          1.368208           1.311476           1.363515 
      shareVoucher          shareSale         gendermale 
          1.181329           1.148697           1.003452 
               age     marginPerOrder      marginPerItem 
          1.026513           8.977661           7.782651 
     itemsPerOrder 
          6.657435  
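
To make the numbers above less opaque, here is a minimal sketch of the textbook definition of a variance inflation factor: regress one predictor on all the others and compute 1 / (1 - R²). The data frame `simData` and the helper `vifByHand` are hypothetical and built only for this illustration; the column names merely echo the slides. A common rule of thumb treats values above 5 or 10 as a warning sign.

```r
# Hypothetical simulated data in which nItems is nearly determined
# by nOrders and itemsPerOrder, while margin is unrelated to them
set.seed(1)
n <- 1000
nOrders       <- rpois(n, 5) + 1
itemsPerOrder <- runif(n, 1, 4)
nItems        <- round(nOrders * itemsPerOrder)
margin        <- rnorm(n, mean = 30, sd = 10)
simData <- data.frame(nOrders, nItems, itemsPerOrder, margin)

# VIF of one predictor: regress it on all other predictors,
# then apply 1 / (1 - R^2) to the auxiliary regression
vifByHand <- function(data, target) {
  others <- setdiff(names(data), target)
  auxFit <- lm(reformulate(others, response = target), data = data)
  1 / (1 - summary(auxFit)$r.squared)
}

vifs <- sapply(names(simData), vifByHand, data = simData)
print(round(vifs, 2))
```

A predictor that the others explain well gets a large value, whereas an unrelated predictor such as `margin` in this sketch stays close to 1.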

New model

# Refit without nItems and marginPerOrder, two of the high-VIF predictors
multipleLM2 <- lm(futureMargin ~ margin + nOrders +
                  daysSinceLastOrder + returnRatio + shareOwnBrand +
                  shareVoucher + shareSale + gender + age +
                  marginPerItem + itemsPerOrder,
                  data = clvData1)
vif(multipleLM2)   # all values are now below 4
            margin            nOrders daysSinceLastOrder 
          3.561828           2.868060           1.354986 
       returnRatio      shareOwnBrand       shareVoucher 
          1.305490           1.353513           1.176411 
         shareSale         gendermale                age 
          1.146499           1.003132           1.021518 
     marginPerItem      itemsPerOrder 
          1.686746           1.550524 
summary(multipleLM2)
Call:
lm(formula = futureMargin ~ margin + nOrders + ..., data = clvData1)
Residuals:
    Min      1Q  Median      3Q     Max 
-55.659  -8.827   0.483   9.561  50.118 
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        22.798064   1.287806  17.703  < 2e-16 ***
margin              0.404200   0.026983  14.980  < 2e-16 ***
nOrders             0.220255   0.061347   3.590 0.000334 ***
daysSinceLastOrder -0.017180   0.002675  -6.422 1.49e-10 ***
returnRatio        -1.992829   0.601214  -3.315 0.000925 ***
shareOwnBrand       7.568686   0.677572  11.170  < 2e-16 ***
shareVoucher       -1.750877   0.669017  -2.617 0.008900 ** 
shareSale          -2.942525   0.691108  -4.258 2.11e-05 ***
gendermale          0.203813   0.430136   0.474 0.635643    
age                -0.015158   0.017245  -0.879 0.379462    
marginPerItem      -0.197277   0.051160  -3.856 0.000117 ***
itemsPerOrder      -0.270260   0.261458  -1.034 0.301354    
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
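
The coefficient table printed by `summary()` can also be accessed as a matrix, which helps when the significant predictors need to be picked out programmatically. The following sketch uses hypothetical simulated data (`x1`, `x2`, `y` are made-up names) rather than `clvData1`, and the 0.05 threshold is simply the conventional choice.

```r
# Hypothetical simulated data: x1 has a true effect on y, x2 does not
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# coef(summary(fit)) returns the matrix with the columns
# Estimate, Std. Error, t value and Pr(>|t|)
coefTable   <- coef(summary(fit))
significant <- rownames(coefTable)[coefTable[, "Pr(>|t|)"] < 0.05]

print(round(coefTable, 4))
print(significant)
```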

Let's practice!
