Predicting customer transactions

Machine Learning for Marketing in Python

Karolis Urbonas

Head of Analytics & Science, Amazon

Modeling approach

  • Linear regression to predict next month's transactions.
  • Same modeling steps as with logistic regression.

Modeling steps

  1. Split the data into training and testing sets
  2. Initialize the model
  3. Fit the model on the training data
  4. Predict values on the testing data
  5. Measure model performance on testing data
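
The five steps above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the course dataset: the feature matrix, the coefficients used to generate the target, the 25% test share, and the random seeds are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 200 customers, 3 hypothetical features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ np.array([1.5, -0.5, 0.3]) + rng.normal(scale=0.1, size=200)

# 1. Split data into training and testing sets
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.25, random_state=99)

# 2. Initialize the model
linreg = LinearRegression()

# 3. Fit the model on the training data
linreg.fit(train_X, train_Y)

# 4. Predict values on the testing data
test_pred_Y = linreg.predict(test_X)

# 5. Measure model performance on the testing data
mae_test = mean_absolute_error(test_Y, test_pred_Y)
print('MAE test: {:.3f}'.format(mae_test))
```

Holding out the testing data before fitting is what makes the final metric an estimate of performance on unseen customers.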

Regression performance metrics

Key metrics:

  • Root mean squared error (RMSE) - Square root of the average squared difference between predictions and actuals
  • Mean absolute error (MAE) - Average absolute difference between predictions and actuals
  • Mean absolute percentage error (MAPE) - Average absolute difference between predictions and actuals, expressed as a share of the actuals (undefined when any actual value is zero)
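
To make the three definitions concrete, they can be computed by hand with NumPy. The four actual and predicted values below are made up for illustration only.

```python
import numpy as np

# Hypothetical actuals and predictions (no zeros in actuals, so MAPE is defined)
actuals = np.array([2.0, 4.0, 5.0, 10.0])
predictions = np.array([3.0, 4.0, 3.0, 8.0])

errors = predictions - actuals

# RMSE: square root of the average squared error
rmse = np.sqrt(np.mean(errors ** 2))

# MAE: average absolute error
mae = np.mean(np.abs(errors))

# MAPE: average absolute error relative to the actual value
mape = np.mean(np.abs(errors / actuals))

print('RMSE: {:.3f}, MAE: {:.3f}, MAPE: {:.1%}'.format(rmse, mae, mape))
```

RMSE and MAE are in the units of the target (here, transactions), while MAPE is unitless.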

Additional regression and supervised learning metrics

  • R-squared - proportion of the variance in the target variable that is explained by the model, often expressed as a percentage. Applicable only to regression, not classification. Higher is better.

  • Coefficient p-values - probability of observing a regression (or classification) coefficient at least this far from zero if its true value were zero, that is, by chance alone. Lower is better. Typical significance thresholds are 5% and 10%.
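
As a small worked example of R-squared, the sketch below computes it from its definition (one minus the ratio of residual to total sum of squares) and checks the result against `r2_score` from scikit-learn. The input values are hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actuals and predictions
actuals = np.array([2.0, 4.0, 5.0, 10.0])
predictions = np.array([3.0, 4.0, 3.0, 8.0])

# Variance left unexplained by the model
ss_res = np.sum((actuals - predictions) ** 2)

# Total variance around the mean of the actuals
ss_tot = np.sum((actuals - actuals.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
r2_sklearn = r2_score(actuals, predictions)

print('R-squared (manual): {:.3f}'.format(r2_manual))
print('R-squared (sklearn): {:.3f}'.format(r2_sklearn))
```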


Fitting the model

# Import the linear regression module
from sklearn.linear_model import LinearRegression

# Initialize the regression instance
linreg = LinearRegression()

# Fit model on the training data
linreg.fit(train_X, train_Y)

# Predict values on both training and testing data
train_pred_Y = linreg.predict(train_X)
test_pred_Y = linreg.predict(test_X)

Measuring model performance

# Import NumPy (needed for np.sqrt) and performance measurement functions
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error

# Calculate metrics for training data
rmse_train = np.sqrt(mean_squared_error(train_Y, train_pred_Y))
mae_train = mean_absolute_error(train_Y, train_pred_Y)

# Calculate metrics for testing data
rmse_test = np.sqrt(mean_squared_error(test_Y, test_pred_Y))
mae_test = mean_absolute_error(test_Y, test_pred_Y)

# Print performance metrics
print('RMSE train: {:.3f}; RMSE test: {:.3f}\nMAE train: {:.3f}, MAE test: {:.3f}'.format(
    rmse_train, rmse_test, mae_train, mae_test))
RMSE train: 0.717; RMSE test: 1.216
MAE train: 0.514, MAE test: 0.555
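
In the output above, RMSE rises much more than MAE between training and testing. One possible reading, which the slides do not confirm, is that the testing data contains a few large errors, because RMSE squares each error and so weights large misses more heavily than MAE does. The sketch below illustrates that effect with made-up error values.

```python
import numpy as np

# Hypothetical absolute errors: ten equal, moderate misses
errors_even = np.full(10, 0.5)

# Same errors, but one prediction misses by a wide margin
errors_outlier = errors_even.copy()
errors_outlier[-1] = 5.0

def rmse(errors):
    # Square root of the average squared error
    return np.sqrt(np.mean(errors ** 2))

def mae(errors):
    # Average absolute error
    return np.mean(np.abs(errors))

print('Even errors:  RMSE {:.3f}, MAE {:.3f}'.format(
    rmse(errors_even), mae(errors_even)))
print('With outlier: RMSE {:.3f}, MAE {:.3f}'.format(
    rmse(errors_outlier), mae(errors_outlier)))
```

When all errors are the same size the two metrics coincide; a single large miss pulls RMSE up faster than MAE.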

Interpreting coefficients

  • Need to assess statistical significance
  • Introduction to statsmodels library
  • Gives in-depth model summary

Build regression model with statsmodels

# Import the library
import statsmodels.api as sm

# Convert target variable to `numpy` array
train_Y = np.array(train_Y)

# Initialize and fit the model
olsreg = sm.OLS(train_Y, train_X)
olsreg = olsreg.fit()

# Print model summary
print(olsreg.summary())

Regression summary table

OLS summary


Interpreting R-squared

R-squared


Interpreting coefficient p-values

Coefficient p-values


Let's build some regression models!
