Training and evaluating classification models

Artificial Intelligence (AI) Concepts in Python

Nemanja Radojkovic

Senior Data Scientist

Train/test splitting

Test data ≠ training data

Simplest approach (Hold-out method)

  • 60% of all data used for training
  • remaining 40% of data used for testing

Code example:

from sklearn.model_selection import train_test_split

# Hold out 40% of the rows for testing; the remaining 60% are used for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
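
To make the 60/40 split concrete, here is a minimal sketch on made-up data. The toy arrays and the `random_state` value are assumptions added for illustration; they are not part of the course material.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 3 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# 100 rows split into 60 training rows and 40 test rows
print(X_train.shape, X_test.shape)
```

Because the rows are shuffled before splitting, the exact rows that land in each part change from run to run unless `random_state` is set.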

Model training

Use the default model configuration/hyper-parameters:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()

Use a custom model configuration/hyper-parameters:

model = RandomForestClassifier(n_estimators=500, # Number of trees
                               max_depth=20)     # Tree depth

Start the training procedure:

model.fit(X_train, y_train)
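
The steps above can be put together in one runnable sketch. The synthetic dataset from `make_classification` is a stand-in (an assumption, not the course's data), and the forest is deliberately smaller than the 500-tree example so that it trains quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the course's dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Custom configuration: fewer, shallower trees than on the slide, to keep the run fast
model = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
model.fit(X_train, y_train)

# Mean accuracy on the held-out test set
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```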

Model testing

Generic syntax

model.predict(X=X_test)

Example: News title classifier

model.predict(X=['Denver Nuggets win against GSW and clinch playoff spot!'])
Out: ['Sport']
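
A classifier such as a random forest cannot consume raw strings directly, so a news title classifier needs a text-to-numbers step in front of it. The sketch below is one possible setup, assuming a TF-IDF vectorizer chained with a random forest in a scikit-learn pipeline. The headlines and labels are hypothetical, and with so little training data the predicted label is not guaranteed to be 'Sport'.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical, tiny training set of headlines and their categories
titles = [
    "Lakers win the championship after overtime thriller",
    "Striker scores twice as team clinches playoff spot",
    "Tennis star wins third title of the season",
    "Central bank raises interest rates again",
    "Stock markets fall as inflation fears grow",
    "Tech company reports record quarterly profit",
]
labels = ["Sport", "Sport", "Sport", "Business", "Business", "Business"]

# The vectorizer turns raw strings into numeric features for the forest
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(titles, labels)

prediction = model.predict(["Denver Nuggets win against GSW and clinch playoff spot!"])
print(prediction)
```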

Inspecting model outputs

y_predicted = model.predict(X_test)  # predictions for the whole test set
y_true = y_test                      # ground-truth labels held out earlier

Does y_predicted match y_true?

from sklearn.metrics import confusion_matrix
confusion_matrix(y_true, y_predicted)

The confusion matrix:

                   REALITY: YES   REALITY: NO
PREDICTION: YES         560            80
PREDICTION: NO           50           210
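
Note that scikit-learn's `confusion_matrix` puts the true labels in rows and the predictions in columns, the opposite of the table above. The sketch below rebuilds label arrays from the four counts in the table (encoding YES as 1 and NO as 0 is an assumption made for illustration) and transposes the result to recover the table's layout.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Rebuild label arrays from the four counts in the table (1 = YES, 0 = NO)
tp, fp, fn, tn = 560, 80, 50, 210
y_true = np.array([1] * tp + [0] * fp + [1] * fn + [0] * tn)
y_predicted = np.array([1] * tp + [1] * fp + [0] * fn + [0] * tn)

# scikit-learn layout: rows = true labels, columns = predicted labels.
# labels=[1, 0] lists the YES class first, as in the table.
cm = confusion_matrix(y_true, y_predicted, labels=[1, 0])

# Transpose to get the table's layout: rows = predictions, columns = reality
slide_layout = cm.T
print(slide_layout)
```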

Confusion matrix: True positives

                        Diabetes present   No diabetes
Diabetes predicted      TRUE POSITIVES
No diabetes predicted

TRUE POSITIVE = the model predicts diabetes and the patient is actually suffering from it.

Confusion matrix: True negatives

                        Diabetes present   No diabetes
Diabetes predicted      true positives
No diabetes predicted                      TRUE NEGATIVES

TRUE NEGATIVE = the model predicts no diabetes and the patient is actually healthy.

Confusion matrix: False positives

                        Diabetes present   No diabetes
Diabetes predicted      true positives     FALSE POSITIVES
No diabetes predicted                      true negatives

FALSE POSITIVE = the model predicts diabetes but the patient is actually healthy (Type I error).

Confusion matrix: False negatives

                        Diabetes present   No diabetes
Diabetes predicted      true positives     false positives
No diabetes predicted   FALSE NEGATIVES    true negatives

TRUE POSITIVE = the model predicts diabetes and the patient is actually suffering from it.

TRUE NEGATIVE = the model predicts no diabetes and the patient is actually healthy.

FALSE POSITIVE = the model predicts diabetes but the patient is actually healthy (Type I error).

FALSE NEGATIVE = the patient has diabetes but the model does not detect it (Type II error).
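
The four definitions translate directly into counting rules. A minimal sketch, assuming labels encoded as 1 for diabetes and 0 for no diabetes, with hypothetical values for eight patients:

```python
import numpy as np

# Hypothetical labels: 1 = diabetes, 0 = no diabetes
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_predicted = np.array([1, 1, 0, 0, 0, 1, 1, 0])

# Each count combines what the model predicted with what is actually true
true_positives = int(np.sum((y_predicted == 1) & (y_true == 1)))
true_negatives = int(np.sum((y_predicted == 0) & (y_true == 0)))
false_positives = int(np.sum((y_predicted == 1) & (y_true == 0)))  # Type I errors
false_negatives = int(np.sum((y_predicted == 0) & (y_true == 1)))  # Type II errors

print(true_positives, true_negatives, false_positives, false_negatives)
```

Every patient falls into exactly one of the four cells, so the counts always add up to the number of patients.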

Accuracy, precision, recall

Metrics:

  • Accuracy: "How often did I make the correct diagnosis?" (= (TP + TN) / all cases)
  • Precision: "How often was I correct when I said a person has diabetes?" (= TP / (TP + FP); lowered by Type I errors)
  • Recall: "What percentage of actual diabetes cases did my model detect?" (= TP / (TP + FN) = 1 - Type II error rate)
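
As a worked example, the three metrics can be computed by hand from the four counts of the confusion matrix shown earlier (560 true positives, 80 false positives, 50 false negatives, 210 true negatives). This is plain arithmetic, with no library involved:

```python
# Counts taken from the confusion matrix shown earlier
tp, fp, fn, tn = 560, 80, 50, 210

accuracy = (tp + tn) / (tp + fp + fn + tn)  # correct predictions / all predictions
precision = tp / (tp + fp)                  # correct YES predictions / all YES predictions
recall = tp / (tp + fn)                     # detected YES cases / all actual YES cases

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# prints: accuracy=0.856 precision=0.875 recall=0.918
```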

Code example using Python + scikit-learn

from sklearn.metrics import accuracy_score, precision_score, recall_score

accuracy_score(y_true, y_predicted) # Same arguments for precision and recall
Result: 0.88
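
For completeness, here is a small self-contained sketch that calls all three functions. The label arrays are hypothetical and chosen so that the three metrics differ. For binary labels, `precision_score` and `recall_score` treat the label 1 as the positive class by default.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels for ten patients: 1 = diabetes, 0 = no diabetes
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Here: 2 true positives, 2 false negatives, 1 false positive, 5 true negatives
accuracy = accuracy_score(y_true, y_predicted)    # (2 + 5) / 10
precision = precision_score(y_true, y_predicted)  # 2 / (2 + 1)
recall = recall_score(y_true, y_predicted)        # 2 / (2 + 2)

print(accuracy, precision, recall)
```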

Knowledge check!
