Fraud detection algorithms in action

Fraud Detection in Python

Charlotte Werger

Data Scientist

Traditional fraud detection with rule-based systems

Drawbacks of using rule-based systems

Rule-based systems have their limitations:

  1. They rely on a fixed threshold per rule to determine fraud
  2. They are limited to yes/no outcomes
  3. They fail to capture interactions between features
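
The limitations above can be illustrated with a minimal sketch of a rule-based check. The thresholds, the function name `rule_based_flag`, and the sample transactions are hypothetical and chosen only for illustration; they do not come from the course data.

```python
# Hypothetical thresholds, chosen only for illustration.
MAX_AMOUNT = 1000.0   # flag any single transaction above this amount
MAX_DAILY_COUNT = 5   # flag any card used more than this many times a day


def rule_based_flag(amount, daily_count):
    """Return True (fraud) if any single rule fires, otherwise False."""
    return amount > MAX_AMOUNT or daily_count > MAX_DAILY_COUNT


# Each feature is checked on its own, so a transaction that sits just
# under both thresholds at once passes every rule.
transactions = [
    {"amount": 1200.0, "daily_count": 1},  # breaks the amount rule
    {"amount": 950.0, "daily_count": 5},   # just under both thresholds
    {"amount": 20.0, "daily_count": 2},    # ordinary transaction
]

flags = [rule_based_flag(t["amount"], t["daily_count"]) for t in transactions]
print(flags)
```

The second transaction is close to both limits yet receives the same plain "no" as the ordinary one: the output is a yes/no flag with no notion of degree, and the two features are never considered together.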

Why use machine learning for fraud detection?

  1. Machine learning models adapt to the data, so they can change over time
  2. They use all the data combined rather than a threshold per feature
  3. They can give a score rather than a yes/no answer
  4. They typically perform better and can be combined with rules
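
As a rough sketch of points 3 and 4, a classifier can return a fraud score between 0 and 1, and that score can be combined with a hand-written rule. The data below is synthetic, and the choice of `LogisticRegression`, the 0.5 score cut-off, and the 2.5 rule threshold are assumptions made for illustration, not the course's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic, made-up data: two numeric features per transaction.
n = 500
X = rng.normal(size=(n, 2))
# Label a transaction as fraud when both features are jointly high (plus noise).
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 1.5).astype(int)

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns one column per class; column 1 is the fraud class.
scores = model.predict_proba(X)[:, 1]

# Combine the model score with a hypothetical hand-written rule.
rule_flag = X[:, 0] > 2.5
needs_review = (scores > 0.5) | rule_flag

print(scores[:5])
print(needs_review.sum(), "transactions flagged for review")
```

Because the score is continuous, the cut-off can be tuned to the cost of a missed fraud case versus a false alarm, which a fixed yes/no rule does not allow.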

Refresher on machine learning models

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Step 1: Split your features and labels into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Step 2: Define which model you want to use
model = LinearRegression()
# Step 3: Fit the model to your training data
model.fit(X_train, y_train)
# Step 4: Obtain model predictions from your test data
y_predicted = model.predict(X_test)
# Step 5: Compare y_test to predictions and obtain performance metrics
print(metrics.r2_score(y_test, y_predicted))
0.821206237313
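
The five steps above assume that `X` and `y` already exist. A self-contained variant might look like the sketch below, which generates synthetic regression data with `make_regression`; the dataset, its size, and the `random_state` values are assumptions added here so the example runs on its own, and the resulting score will differ from the one shown on the slide.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Synthetic stand-in for real features and labels (assumption for this sketch).
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

# Step 1: Split features and labels into train and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# Step 2: Define which model you want to use
model = LinearRegression()
# Step 3: Fit the model to the training data
model.fit(X_train, y_train)
# Step 4: Obtain model predictions for the test data
y_predicted = model.predict(X_test)
# Step 5: Compare y_test to the predictions and obtain a performance metric
r2 = metrics.r2_score(y_test, y_predicted)
print(r2)
```

Fixing `random_state` makes the split and the data reproducible, which helps when comparing models on the same test set.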

What you'll be doing in the upcoming chapters

  • Chapter 2. Supervised learning: train a model using existing fraud labels

  • Chapter 3. Unsupervised learning: use your data to determine what is 'suspicious' behavior without labels

  • Chapter 4. Fraud detection using text data: learn how to augment your fraud detection models with text mining and topic modeling

Let's practice!
