Machine Learning for Marketing in Python
Karolis Urbonas
Head of Analytics & Science, Amazon
Import the Logistic Regression classifier
from sklearn.linear_model import LogisticRegression
Initialize Logistic Regression instance
logreg = LogisticRegression()
Fit the model on the training data
logreg.fit(train_X, train_Y)
Key metrics:
from sklearn.metrics import accuracy_score
pred_train_Y = logreg.predict(train_X)
pred_test_Y = logreg.predict(test_X)
train_accuracy = accuracy_score(train_Y, pred_train_Y)
test_accuracy = accuracy_score(test_Y, pred_test_Y)
print('Training accuracy:', round(train_accuracy, 4))
print('Test accuracy:', round(test_accuracy, 4))
Training accuracy: 0.8108
Test accuracy: 0.8009
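Accuracy is the share of predictions that match the true labels. A minimal sketch on made-up labels (the arrays y_true and y_pred are illustrative assumptions, not the course data) suggests how accuracy_score relates to the manual calculation:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for illustration only (1 = positive class, 0 = negative class)
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])

# Accuracy by hand: fraction of positions where prediction equals truth
manual_accuracy = (y_true == y_pred).mean()

# The same quantity computed by scikit-learn
sklearn_accuracy = accuracy_score(y_true, y_pred)

print('Manual accuracy:', manual_accuracy)
print('sklearn accuracy:', sklearn_accuracy)
```

Six of the eight predictions match, so both calculations give 0.75.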
from sklearn.metrics import precision_score, recall_score
train_precision = round(precision_score(train_Y, pred_train_Y), 4)
test_precision = round(precision_score(test_Y, pred_test_Y), 4)
train_recall = round(recall_score(train_Y, pred_train_Y), 4)
test_recall = round(recall_score(test_Y, pred_test_Y), 4)
print('Training precision: {}, Training recall: {}'.format(train_precision, train_recall))
print('Test precision: {}, Test recall: {}'.format(test_precision, test_recall))
Training precision: 0.6725, Training recall: 0.5736
Test recall: 0.4835
(The source output listed 0.5736 as the test precision, which is the training recall value; the actual test precision is not given.)
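Precision and recall both come from the confusion-matrix counts: precision = TP / (TP + FP) and recall = TP / (TP + FN). A small sketch on made-up labels (y_true and y_pred are illustrative assumptions, not the course data) that compares the by-hand values against precision_score and recall_score:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels for illustration only (1 = positive class, 0 = negative class)
y_true = np.array([1, 0, 0, 1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision: of everything predicted positive, how much was truly positive
manual_precision = tp / (tp + fp)
# Recall: of everything truly positive, how much was found
manual_recall = tp / (tp + fn)

print('Manual precision:', round(manual_precision, 4), 'recall:', round(manual_recall, 4))
print('sklearn precision:', round(precision_score(y_true, y_pred), 4),
      'recall:', round(recall_score(y_true, y_pred), 4))
```

Here there are 2 true positives, 1 false positive and 2 false negatives, so precision is 2/3 and recall is 0.5.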
LogisticRegression from sklearn performs L2 regularization by default
L1 regularization has to be requested explicitly, with a solver that supports it
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
logreg.fit(train_X, train_Y)
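The practical difference between the two penalties is that L1 can shrink some coefficients exactly to zero, while L2 typically only makes them small. A rough sketch on synthetic data (the dataset from make_classification and the names X, y, l1_model and l2_model are assumptions for illustration, not the course data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration only: 20 features, 5 of them informative
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Default model: L2 regularization
l2_model = LogisticRegression(max_iter=1000)
l2_model.fit(X, y)

# L1 regularization with the same settings as in the slide
l1_model = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
l1_model.fit(X, y)

# Count the coefficients each model keeps away from zero
l2_nonzero = np.count_nonzero(l2_model.coef_)
l1_nonzero = np.count_nonzero(l1_model.coef_)
print('Non-zero coefficients with L2:', l2_nonzero)
print('Non-zero coefficients with L1:', l1_nonzero)
```

On data like this, the L1 model would be expected to keep fewer non-zero coefficients than the L2 model, which is why it can double as a feature selection step.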
The C parameter needs to be tuned to find the optimal value
import numpy as np
import pandas as pd
# C is the inverse of regularization strength: smaller values regularize more
C = [1, .5, .25, .1, .05, .025, .01, .005, .0025]
# One row per C value: C, non-zero coefficient count, accuracy, precision, recall
l1_metrics = np.zeros((len(C), 5))
l1_metrics[:, 0] = C
for index in range(len(C)):
    logreg = LogisticRegression(penalty='l1', C=C[index], solver='liblinear')
    logreg.fit(train_X, train_Y)
    pred_test_Y = logreg.predict(test_X)
    l1_metrics[index, 1] = np.count_nonzero(logreg.coef_)
    l1_metrics[index, 2] = accuracy_score(test_Y, pred_test_Y)
    l1_metrics[index, 3] = precision_score(test_Y, pred_test_Y)
    l1_metrics[index, 4] = recall_score(test_Y, pred_test_Y)
col_names = ['C', 'Non-Zero Coeffs', 'Accuracy', 'Precision', 'Recall']
print(pd.DataFrame(l1_metrics, columns=col_names))
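Once such a table exists, a value of C still has to be picked from it. One possible rule, shown here purely as an assumption rather than the course's method, is to keep the sparsest model whose recall stays within a chosen tolerance of the best recall. The numbers in the results table and the tolerance value below are made up for illustration:

```python
import pandas as pd

# Made-up metrics for illustration only; these are not results from the course data
results = pd.DataFrame({
    'C': [1.0, 0.1, 0.01, 0.001],
    'Non-Zero Coeffs': [20, 12, 5, 0],
    'Recall': [0.50, 0.49, 0.47, 0.00],
})

# Hypothetical tolerance: accept recall up to 0.05 below the best observed value
tolerance = 0.05
best_recall = results['Recall'].max()

# Keep only the rows whose recall is close enough to the best one
candidates = results[results['Recall'] >= best_recall - tolerance]

# Among those, prefer the model with the fewest non-zero coefficients
chosen = candidates.sort_values('Non-Zero Coeffs').iloc[0]
print('Chosen C:', chosen['C'])
```

With these illustrative numbers the rule selects C = 0.01, which keeps 5 coefficients at a recall of 0.47. The metric to protect and the tolerance are judgment calls that depend on the business problem.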