Novelty detection

Designing Machine Learning Workflows in Python

Dr. Chris Anagnostopoulos

Honorary Associate Professor

One-class classification

Training data without anomalies:

Two clusters of black points.

Future / test data with anomalies:

Two clusters of black points with some isolated points, too.

Workaround

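import numpy as np
from sklearn.neighbors import LocalOutlierFactor as lof  # alias assumed by these slides

# Fit on train and test together, then keep only the test-set predictions
preds = lof().fit_predict(
   np.concatenate([X_train, X_test]))

preds = preds[X_train.shape[0]:]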

Two clusters of black points with some isolated red points.
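The workaround can be tried end to end on synthetic data. This is a minimal sketch, assuming scikit-learn and NumPy are installed. The two Gaussian clusters, the four hand-placed isolated points, the seed, and the helper `two_clusters` are all made up for illustration, and `lof` is assumed to be an alias for `LocalOutlierFactor`.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor as lof  # assumed alias

rng = np.random.RandomState(0)

def two_clusters(n):
    # Hypothetical helper: n points around (-3, -3) and n points around (3, 3)
    return np.vstack([rng.normal(-3, 0.5, (n, 2)), rng.normal(3, 0.5, (n, 2))])

X_train = two_clusters(100)                    # training data without anomalies
X_far = np.array([[0.0, 9.0], [9.0, -9.0], [-9.0, 9.0], [9.0, 0.0]])
X_test = np.vstack([two_clusters(20), X_far])  # test data with isolated points

# Fit on train and test together, then keep only the test predictions
preds = lof().fit_predict(np.concatenate([X_train, X_test]))
preds = preds[X_train.shape[0]:]

print(preds[-4:])  # predictions for the four isolated points
```

In the output, -1 marks a point flagged as an outlier and 1 marks an inlier. Because the model is refit on train and test together, the test points take part in their own scoring.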

Novelty LOF

clf = lof(novelty=True)  # novelty mode: fit on clean data, predict on new data

clf.fit(X_train)
y_pred = clf.predict(X_test)

Two clusters of black points with some isolated red points.
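The novelty variant can be sketched on toy data as follows. The clusters and the three query points in `X_new` are invented for illustration, and the full class name is used instead of the `lof` alias.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)

# Hypothetical clean training data: two tight Gaussian clusters
X_train = np.vstack([rng.normal(-3, 0.5, (100, 2)),
                     rng.normal(3, 0.5, (100, 2))])

# Two points at the cluster centres and one isolated point
X_new = np.array([[-3.0, -3.0], [3.0, 3.0], [9.0, -9.0]])

clf = LocalOutlierFactor(novelty=True)
clf.fit(X_train)             # fit on the clean training data only
y_pred = clf.predict(X_new)  # 1 = inlier, -1 = novelty

print(y_pred)
```

With `novelty=True`, `fit_predict` is not available, and `predict` is meant for unseen data rather than the training set.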

clf = OneClassSVM()

clf.fit(X_train)
y_pred = clf.predict(X_test)

y_pred[:4]  # 1 = inlier, -1 = novelty

array([ 1,  1,  1, -1])

Two clusters of points with some isolated points. Most points are red, both from within the clusters and the isolated points.

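clf = OneClassSVM()
clf.fit(X_train)
# Higher scores mean more normal; lower scores mean more anomalous
y_scores = clf.score_samples(X_test)

# Flag the lowest-scoring 10% of test points as novelties
threshold = np.quantile(y_scores, 0.1)

y_pred = y_scores <= threshold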

Two clusters of black points with some isolated red points.
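To compare the default `predict()` rule with the quantile rule, the share of flagged test points can be computed for both. This is a rough sketch on synthetic data; the data, the seed, and the helper `two_clusters` are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

def two_clusters(n):
    # Hypothetical helper: n points around (-3, -3) and n points around (3, 3)
    return np.vstack([rng.normal(-3, 0.5, (n, 2)), rng.normal(3, 0.5, (n, 2))])

X_train = two_clusters(100)
X_far = np.array([[0.0, 9.0], [9.0, -9.0], [-9.0, 9.0], [9.0, 0.0]])
X_test = np.vstack([two_clusters(18), X_far])  # 36 cluster points + 4 isolated

clf = OneClassSVM().fit(X_train)

# Default rule: predict() returns -1 for points it treats as novelties
y_default = clf.predict(X_test) == -1

# Quantile rule: flag the lowest-scoring 10% of test points
y_scores = clf.score_samples(X_test)
threshold = np.quantile(y_scores, 0.1)
y_pred = y_scores <= threshold

print("share flagged by predict():    ", y_default.mean())
print("share flagged by quantile rule:", y_pred.mean())
```

The quantile rule fixes the share of flagged points in advance, so it suits cases where a rough anomaly rate is known or a fixed review budget applies.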

clf = IsolationForest()
clf.fit(X_train)
y_scores = clf.score_samples(X_test)

clf = LocalOutlierFactor(novelty=True)
clf.fit(X_train)
y_scores = clf.score_samples(X_test)

Two clusters of black points with some isolated red points.

clf_lof = LocalOutlierFactor(novelty=True).fit(X_train)
clf_isf = IsolationForest().fit(X_train)
clf_svm = OneClassSVM().fit(X_train)

roc_auc_score(y_test, clf_lof.score_samples(X_test))

0.9897

roc_auc_score(y_test, clf_isf.score_samples(X_test))

0.9692

roc_auc_score(y_test, clf_svm.score_samples(X_test))

0.9948
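For these AUC values to be meaningful, the labels must agree with the direction of the scores: `score_samples` is higher for more normal points, so normal points need to be the positive class. A self-contained sketch follows, assuming `y_test` is coded as 1 for normal and -1 for anomalous; the source does not state its encoding, and the data here is synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

def two_clusters(n):
    # Hypothetical helper: n points around (-3, -3) and n points around (3, 3)
    return np.vstack([rng.normal(-3, 0.5, (n, 2)), rng.normal(3, 0.5, (n, 2))])

X_train = two_clusters(100)
X_far = np.array([[0.0, 9.0], [9.0, -9.0], [-9.0, 9.0], [9.0, 0.0]])
X_test = np.vstack([two_clusters(20), X_far])
y_test = np.r_[np.ones(40), -np.ones(4)]  # assumed coding: 1 = normal, -1 = anomaly

models = {
    "lof": LocalOutlierFactor(novelty=True),
    "isf": IsolationForest(random_state=0),
    "svm": OneClassSVM(),
}

# AUC uses the continuous scores, so no threshold has to be chosen
aucs = {name: roc_auc_score(y_test, model.fit(X_train).score_samples(X_test))
        for name, model in models.items()}

for name, auc in aucs.items():
    print(name, round(auc, 4))
```

If the labels were coded the other way round (anomalies as the positive class), negating the scores before calling `roc_auc_score` would give the equivalent result.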

clf_lof = LocalOutlierFactor(novelty=True).fit(X_train)
clf_isf = IsolationForest().fit(X_train)
clf_svm = OneClassSVM().fit(X_train)

accuracy_score(y_test, clf_lof.predict(X_test))

0.9318

accuracy_score(y_test, clf_isf.predict(X_test))

0.9545

accuracy_score(y_test, clf_svm.predict(X_test))

0.5
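Unlike AUC, accuracy depends on where each model draws its decision boundary. One plausible reason for the low one-class SVM figure is its default `nu=0.5`, which allows up to half of the training points to fall outside the boundary. The sketch below compares the default `predict()` with the 10% quantile rule on synthetic data; the data and the ±1 coding of `y_test` are assumptions, not the course's dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

def two_clusters(n):
    # Hypothetical helper: n points around (-3, -3) and n points around (3, 3)
    return np.vstack([rng.normal(-3, 0.5, (n, 2)), rng.normal(3, 0.5, (n, 2))])

X_train = two_clusters(100)
X_far = np.array([[0.0, 9.0], [9.0, -9.0], [-9.0, 9.0], [9.0, 0.0]])
X_test = np.vstack([two_clusters(18), X_far])
y_test = np.r_[np.ones(36), -np.ones(4)]  # assumed coding: 1 = normal, -1 = anomaly

clf_svm = OneClassSVM().fit(X_train)

# Accuracy using the model's built-in decision threshold
acc_default = accuracy_score(y_test, clf_svm.predict(X_test))

# Accuracy after thresholding the scores at the 10% quantile instead
y_scores = clf_svm.score_samples(X_test)
threshold = np.quantile(y_scores, 0.1)
y_thresh = np.where(y_scores <= threshold, -1, 1)  # map back to the +1 / -1 coding
acc_thresh = accuracy_score(y_test, y_thresh)

print("default predict():", acc_default)
print("quantile rule:    ", acc_thresh)
```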
