Machine Learning for Marketing in Python
Karolis Urbonas
Head of Analytics & Science, Amazon
Key metrics:
R-squared - statistical measure of the proportion of variance in the target variable that is explained by the model, often expressed as a percentage. Only applicable to regression, not classification. Higher is better.
Coefficient p-values - probability of observing a coefficient at least this large purely by chance if the variable had no real effect. Applies to both regression and classification coefficients. Lower is better. Typical significance thresholds are 5% and 10%.
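To make the R-squared definition concrete, here is a minimal sketch that computes it directly from its formula, 1 minus the ratio of residual to total sum of squares. The helper name `r_squared` and the toy arrays are assumptions made for illustration and are not part of the course data.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Toy values, chosen only for illustration
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

r2 = r_squared(y_true, y_pred)
r2_mean_only = r_squared(y_true, np.full(4, y_true.mean()))
print('R-squared: {:.3f}'.format(r2))
```

A model that always predicts the mean of the target scores 0 under this definition, and a perfect fit scores 1.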
# Import the linear regression module
from sklearn.linear_model import LinearRegression

# Initialize the regression instance
linreg = LinearRegression()

# Fit model on the training data
linreg.fit(train_X, train_Y)

# Predict values on both training and testing data
train_pred_Y = linreg.predict(train_X)
test_pred_Y = linreg.predict(test_X)
# Import numpy (needed for np.sqrt) and the performance measurement functions
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error

# Calculate metrics for training data
rmse_train = np.sqrt(mean_squared_error(train_Y, train_pred_Y))
mae_train = mean_absolute_error(train_Y, train_pred_Y)

# Calculate metrics for testing data
rmse_test = np.sqrt(mean_squared_error(test_Y, test_pred_Y))
mae_test = mean_absolute_error(test_Y, test_pred_Y)

# Print performance metrics
print('RMSE train: {:.3f}; RMSE test: {:.3f}\nMAE train: {:.3f}, MAE test: {:.3f}'.format(
    rmse_train, rmse_test, mae_train, mae_test))
RMSE train: 0.717; RMSE test: 1.216
MAE train: 0.514, MAE test: 0.555
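The output above shows RMSE higher than MAE on both sets. That is always the case to some degree, because RMSE squares the errors before averaging, so a few large misses weigh more heavily. The sketch below illustrates this with made-up error values (an assumption for illustration, not the course data) and computes both metrics from their definitions.

```python
import numpy as np

# Hypothetical prediction errors: three small misses and one large one
errors = np.array([0.5, -0.5, 0.5, -3.0])

mae = np.mean(np.abs(errors))          # average absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root of the average squared error

print('MAE: {:.3f}; RMSE: {:.3f}'.format(mae, rmse))
```

A wide gap between the two metrics can be a hint that a small number of observations are predicted poorly, which is worth checking before drawing conclusions from either number alone.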
statsmodels library
# Import the library
import statsmodels.api as sm
# Convert target variable to `numpy` array
train_Y = np.array(train_Y)

# Initialize and fit the model
olsreg = sm.OLS(train_Y, train_X)
olsreg = olsreg.fit()

# Print model summary
print(olsreg.summary())