Deciding on the number of variables

Introduction to Predictive Analytics in Python

Nele Verbiest, Ph.D

Data Scientist @PythonPredictions

Evaluating the AUC

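# auc(variables, target, basetable) is assumed to be the helper from the forward
# stepwise selection lesson: it fits a model on `variables` and returns its AUC.
auc_values = []
variables_evaluate = []

# Add the variables one at a time, in the order chosen by forward stepwise selection
for v in variables_forward:
    variables_evaluate.append(v)
    # AUC of the model built on the first len(variables_evaluate) variables
    auc_value = auc(variables_evaluate, ["target"], basetable)
    auc_values.append(auc_value)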
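
The loop can be run end to end in the self-contained sketch below. Several pieces are assumptions, not the course's own objects: the basetable is synthetic (generated with scikit-learn's make_classification), the auc helper is a hypothetical re-implementation that fits a logistic regression and scores it on the same rows it was trained on, and variables_forward is a made-up order standing in for the output of forward stepwise selection.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the basetable (assumption: 5 numeric predictors)
X_raw, y_raw = make_classification(n_samples=500, n_features=5, n_informative=3,
                                   n_redundant=0, random_state=0)
basetable = pd.DataFrame(X_raw, columns=[f"var_{i}" for i in range(5)])
basetable["target"] = y_raw

def auc(variables, target, basetable):
    """Hypothetical helper: fit a logistic regression on `variables` and
    return its AUC, evaluated on the same rows it was trained on."""
    X = basetable[variables]
    y = basetable[target].values.ravel()
    model = LogisticRegression()
    model.fit(X, y)
    predictions = model.predict_proba(X)[:, 1]
    return roc_auc_score(y, predictions)

# Hypothetical order in which forward selection added the variables
variables_forward = ["var_2", "var_0", "var_4", "var_1", "var_3"]

auc_values = []
variables_evaluate = []
for v in variables_forward:
    variables_evaluate.append(v)
    auc_values.append(auc(variables_evaluate, ["target"], basetable))

# One AUC per model size: 1 variable, 2 variables, ...
for n, value in enumerate(auc_values, start=1):
    print(f"{n} variable(s): AUC = {value:.3f}")
```

Plotting auc_values against the number of variables gives the AUC curve. Because this helper scores each model on its own training data, the curve says little about how the model performs on new observations, which is why over-fitting has to be checked separately.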

Over-fitting

Detecting over-fitting

Partitioning

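import pandas as pd
from sklearn.model_selection import train_test_split
# Separate the predictors from the target
X = basetable.drop(columns="target")
y = basetable["target"]
# Hold out 40% as test set; stratify=y keeps the target incidence equal in both parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, stratify=y)
# Re-attach the target so each partition is a complete basetable
train = pd.concat([X_train, y_train], axis=1)
test = pd.concat([X_test, y_test], axis=1)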
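
With a train and a test set, over-fitting can be detected by computing both the train AUC and the test AUC for every number of variables. The sketch below is self-contained and rests on assumptions: the basetable is synthetic (make_classification with three informative and seven pure-noise predictors), auc_train_test is a hypothetical helper that trains a logistic regression on train and scores it on both partitions, and variables_forward is a made-up order in which the informative variables come first.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic basetable (assumption); with shuffle=False, var_0..var_2 are informative
X_raw, y_raw = make_classification(n_samples=400, n_features=10, n_informative=3,
                                   n_redundant=0, shuffle=False, random_state=1)
basetable = pd.DataFrame(X_raw, columns=[f"var_{i}" for i in range(10)])
basetable["target"] = y_raw

X = basetable.drop(columns="target")
y = basetable["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=1)
train = pd.concat([X_train, y_train], axis=1)
test = pd.concat([X_test, y_test], axis=1)

def auc_train_test(variables, target, train, test):
    """Hypothetical helper: fit on train, return (train AUC, test AUC)."""
    model = LogisticRegression()
    model.fit(train[variables], train[target])
    train_auc = roc_auc_score(train[target], model.predict_proba(train[variables])[:, 1])
    test_auc = roc_auc_score(test[target], model.predict_proba(test[variables])[:, 1])
    return train_auc, test_auc

# Hypothetical forward-selection order: informative variables first, then noise
variables_forward = [f"var_{i}" for i in range(10)]

auc_values_train, auc_values_test = [], []
variables_evaluate = []
for v in variables_forward:
    variables_evaluate.append(v)
    auc_train, auc_test = auc_train_test(variables_evaluate, "target", train, test)
    auc_values_train.append(auc_train)
    auc_values_test.append(auc_test)

# A train AUC that keeps rising while the test AUC stalls or drops signals over-fitting
for n, (a_tr, a_te) in enumerate(zip(auc_values_train, auc_values_test), start=1):
    print(f"{n:2d} variables: train AUC = {a_tr:.3f}, test AUC = {a_te:.3f}")
```

Printing both curves side by side is a quick way to spot the point where extra variables no longer improve the test AUC, even if the train AUC still increases.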

Deciding the cut-off

  • High test AUC
  • Low number of variables
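
One way to turn these two criteria into a concrete rule, assumed here rather than taken from the slides, is to pick the smallest number of variables whose test AUC lies within a small tolerance of the best test AUC. The function name choose_cutoff, the tolerance of 0.005, and the AUC values below are all hypothetical.

```python
def choose_cutoff(auc_values_test, tolerance=0.005):
    """Return the smallest number of variables whose test AUC lies within
    `tolerance` of the best test AUC (a heuristic, not the course's rule)."""
    best = max(auc_values_test)
    for n, value in enumerate(auc_values_test, start=1):
        if value >= best - tolerance:
            return n
    # Not reached for a non-empty list: the best value itself always qualifies
    return len(auc_values_test)

# Made-up test AUCs, for illustration only (index 0 = 1 variable)
auc_values_test = [0.70, 0.75, 0.782, 0.785, 0.784, 0.781]
n_selected = choose_cutoff(auc_values_test)
print(n_selected)  # 3
```

A larger tolerance favours fewer variables at the cost of some test AUC; a tolerance of 0 simply returns the model size with the highest test AUC.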

Let's practice!
