Variable selection

Introduction to Predictive Analytics in Python

Nele Verbiest, Ph.D.

Data Scientist @PythonPredictions

Candidate predictors

  • age
  • max_gift
  • income_low
  • min_gift, mean_gift, median_gift
  • country_USA, country_India, country_UK
  • number_gift_min50, number_gift_min100, number_gift_min150

Variable selection: motivation

Drawbacks of models with many variables:

  • Over-fitting
  • Hard to maintain or implement
  • Hard to interpret, multi-collinearity
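To make the multi-collinearity point concrete, here is a minimal sketch on synthetic data. It assumes that each donor has a typical gift size and that individual gifts vary around it; the data, the number of donors, and the variable `typical` are made up for illustration and do not come from the course data set. Under that assumption, candidate predictors such as min_gift, mean_gift and median_gift carry largely the same information.

```python
import numpy as np

rng = np.random.default_rng(0)
n_donors, n_gifts = 1000, 5

# Hypothetical data: each donor has a typical gift size,
# and each individual gift varies around that size.
typical = rng.uniform(10, 100, size=(n_donors, 1))
gifts = typical * rng.uniform(0.5, 1.5, size=(n_donors, n_gifts))

# Three candidate predictors derived from the same gifts.
min_gift = gifts.min(axis=1)
mean_gift = gifts.mean(axis=1)
median_gift = np.median(gifts, axis=1)

# Pairwise correlations; rows and columns are min, mean, median.
corr = np.corrcoef([min_gift, mean_gift, median_gift])
print(np.round(corr, 2))
```

Strongly correlated predictors like these make individual coefficients hard to interpret, which is one reason to keep only a subset of them.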

Model evaluation: AUC

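import numpy as np
from sklearn.metrics import roc_auc_score
# true_target: actual 0/1 values of the target
# prob_target: predicted probabilities that the target is 1
roc_auc_score(true_target, prob_target)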
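As a small worked example of the call above, the sketch below uses four made-up observations; the values in `true_target` and `prob_target` are illustrative only. AUC is the fraction of (target, non-target) pairs in which the target gets the higher predicted probability.

```python
from sklearn.metrics import roc_auc_score

# Illustrative values: two non-targets (0) and two targets (1).
true_target = [0, 0, 1, 1]
prob_target = [0.1, 0.4, 0.35, 0.8]

# Of the 4 (target, non-target) pairs, 3 are ranked correctly:
# 0.35 > 0.1, 0.8 > 0.1, 0.8 > 0.4, but 0.35 < 0.4.
auc = roc_auc_score(true_target, prob_target)
print(auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the metric can be used to compare models built on different sets of variables.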

Let's practice!

