Review of classification methods

Fraud Detection in Python

Charlotte Werger

Data Scientist

What is classification?

Goal of classification: Use known fraud cases to train a model to recognize new fraud cases

Examples:

  • Email: spam / not spam
  • Online transaction: fraudulent / not fraudulent
  • Tumor: malignant / benign

Variable to predict: $y \in \{0,1\}$

0: Negative class ("majority" normal cases)

1: Positive class ("minority" fraud cases)
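To make the 0/1 encoding concrete, here is a minimal sketch that counts both classes and computes the fraud rate. The label list `y` is made up for illustration and does not come from the course data.

```python
from collections import Counter

# Hypothetical labels for ten transactions: 1 = fraud (positive), 0 = normal (negative).
y = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

# Count how many cases fall into each class.
counts = Counter(y)

# Share of positive (fraud) cases among all cases.
fraud_rate = counts[1] / len(y)

print(counts[0], counts[1])   # 8 2
print(f"{fraud_rate:.1%}")    # 20.0%
```

In real fraud data the positive class is usually far rarer than in this toy list, which is why it is called the minority class.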

Classification methods commonly used for fraud detection

  • Logistic regression
  • Neural network
  • Decision trees
  • Random forests
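In scikit-learn, all of the methods listed above share the same `fit` / `predict` / `score` interface, so they can be swapped in and out of the same pipeline. The sketch below assumes a synthetic, imbalanced dataset from `make_classification` as a stand-in for real transactions; the dataset, the hyperparameters, and the `models` and `scores` names are illustrative choices, not part of the course material.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data with roughly 5% positive cases stands in for fraud data.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One estimator per method named on the slide.
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Neural network": MLPClassifier(max_iter=500, random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

The scores on synthetic data say nothing about which method works best on real fraud data; the point is only that the code shape stays the same across methods.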

Decision trees and random forests

  • A random forest is a collection (ensemble) of decision trees, each trained on a random sample of the data and using random subsets of the features
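The "collection of trees" idea can be inspected directly on a fitted scikit-learn model. This is a small sketch under assumed settings: the synthetic data, `n_estimators=25`, and `max_features="sqrt"` are chosen for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumption: a small synthetic dataset with 16 features.
X, y = make_classification(n_samples=500, n_features=16, random_state=0)

# bootstrap=True: each tree is trained on a random sample of the rows.
# max_features="sqrt": each split considers a random subset of sqrt(16) = 4 features.
forest = RandomForestClassifier(
    n_estimators=25, max_features="sqrt", bootstrap=True, random_state=0
)
forest.fit(X, y)

# The fitted forest exposes its individual decision trees.
n_trees = len(forest.estimators_)
features_per_split = forest.estimators_[0].max_features_

print(n_trees, features_per_split)  # 25 4
```

Because every tree sees different rows and different candidate features, the trees make different mistakes, and combining their predictions tends to be more stable than any single tree.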

Random forests for fraud detection

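from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier

# Fit a random forest on the training data (fixed seed for reproducibility)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict on the held-out test set and report the accuracy
predicted = model.predict(X_test)
print(metrics.accuracy_score(y_test, predicted))
0.991324200913242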
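The snippet above relies on `X_train`, `X_test`, `y_train`, and `y_test` being defined elsewhere. A self-contained variant might look like the sketch below. It assumes synthetic data from `make_classification` with about 1% positives, so its numbers will not match the accuracy shown on the slide. The majority-class baseline (`DummyClassifier`) is an addition for context: with so few fraud cases, always predicting "no fraud" already yields a high accuracy.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: synthetic, heavily imbalanced data stands in for real transactions.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.99, 0.01], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Same steps as on the slide: fit, predict, score.
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
model_accuracy = metrics.accuracy_score(y_test, model.predict(X_test))

# Baseline that always predicts the majority class (no fraud).
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_accuracy = metrics.accuracy_score(y_test, baseline.predict(X_test))

print(f"Random forest accuracy: {model_accuracy:.3f}")
print(f"Majority-class baseline: {baseline_accuracy:.3f}")
```

Comparing the two numbers gives a rough sense of how much of the accuracy comes from the model and how much simply reflects the class imbalance.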

Let's practice!
