Model review and comparison

Predicting CTR with Machine Learning in Python

Kevin Huo

Instructor

Model review

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
  • Logistic regression: linear classifier that learns a linear decision boundary between classes
  • Decision trees: a tree of if/else conditions on feature values
  • Random Forests: ensemble of Decision Trees
  • Neural Networks (MLPs): layers using linear combinations of features with a nonlinear activation function
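
As a rough illustration of the four model families above, the sketch below fits each one on a small synthetic dataset. The data from make_classification, the hyperparameter values, and the names models and test_accuracy are assumptions made for this example; they are not the course's CTR dataset or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for click / no-click labels (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One instance of each model family reviewed on the slide.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "mlp": MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0),
}

# Fit every model on the same split and record its test accuracy.
test_accuracy = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    test_accuracy[name] = clf.score(X_test, y_test)

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
```

Accuracy is used here only as a quick sanity check; on imbalanced click data, precision, recall, and AUC are usually more informative.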

Model implementation

Similarities
  • Feature transformation and regularization
  • Fitting via classifier.fit(X_train, y_train)
  • Predictions via predict_proba() and predict()
Differences
  • Decision Trees: max_depth, min_samples_split
  • Random Forests: n_estimators, oob_score
  • Logistic Regression: fit_intercept, class_weight
  • Neural Networks: hidden_layer_sizes, max_iter
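
The shared fit / predict_proba / predict interface and the model-specific hyperparameters listed above can be sketched as follows. The synthetic data, the specific parameter values, and the names classifiers, probabilities, labels, and forest_oob are assumptions for illustration only. StandardScaler is one possible feature transformation, applied here to the two models that tend to be sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a CTR dataset (assumption).
X, y = make_classification(n_samples=600, n_features=12, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Each model exposes different hyperparameters (values are illustrative).
classifiers = {
    "tree": DecisionTreeClassifier(
        max_depth=4, min_samples_split=10, random_state=1
    ),
    "forest": RandomForestClassifier(
        n_estimators=100, oob_score=True, random_state=1
    ),
    "logreg": make_pipeline(
        StandardScaler(),
        LogisticRegression(
            fit_intercept=True, class_weight="balanced", max_iter=1000
        ),
    ),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=1),
    ),
}

# The fitting and prediction calls are identical across all four models.
probabilities = {}
labels = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    probabilities[name] = clf.predict_proba(X_test)  # one column per class
    labels[name] = clf.predict(X_test)               # hard 0/1 predictions

# With oob_score=True the forest also reports an out-of-bag accuracy estimate.
forest_oob = classifiers["forest"].oob_score_
print(f"Random forest OOB score: {forest_oob:.3f}")
```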

Model evaluation

  • Key evaluation metrics:
    • Confusion matrix: confusion_matrix(y_test, y_pred)
    • Precision: precision_score(y_test, y_pred)
    • Recall: recall_score(y_test, y_pred)
    • F-beta score: fbeta_score(y_test, y_pred, beta = 0.5)
    • AUC of ROC curve: roc_auc_score(y_test, y_score[:, 1]), where y_score = classifier.predict_proba(X_test)
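
A minimal sketch of computing these metrics together for one fitted classifier is shown below. The synthetic data and the choice of LogisticRegression are assumptions for the example; y_pred and y_score follow the naming used on the slide.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    fbeta_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for click / no-click data (assumption).
X, y = make_classification(
    n_samples=800, n_features=10, weights=[0.7, 0.3], class_sep=1.5,
    random_state=2,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)         # hard labels for the threshold metrics
y_score = clf.predict_proba(X_test)  # class probabilities for the AUC

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()

precision = precision_score(y_test, y_pred)    # tp / (tp + fp)
recall = recall_score(y_test, y_pred)          # tp / (tp + fn)
f_half = fbeta_score(y_test, y_pred, beta=0.5) # beta < 1 favors precision
auc = roc_auc_score(y_test, y_score[:, 1])     # uses P(class = 1)

print(cm)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f0.5={f_half:.3f} auc={auc:.3f}")
```

Note that the AUC is computed from the positive-class probability column, while the other metrics depend on the default 0.5 decision threshold used by predict().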

Main pros and cons of using neural networks

Pros

  • Scalability with data
  • Less need to do feature engineering
  • More transferable across domains

Cons

  • Less powerful on smaller datasets
  • Difficult to interpret
  • Computationally and financially more expensive

Let's practice!
