Other clustering fraud detection methods

Fraud Detection in Python

Charlotte Werger

Data Scientist

There are many different clustering methods

And different ways of flagging fraud: using smallest clusters

In reality it looks more like this

DBSCAN versus K-means

  • No need to predefine the number of clusters
  • Adjust the maximum distance between points within a cluster (eps)
  • Set the minimum number of samples in a cluster (min_samples)
  • Better performance on irregularly shaped data
  • But: higher computational costs
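
The contrast in the list above can be sketched on toy data. This is a minimal illustration, not the course's fraud dataset: the two-moons data from `make_moons`, the parameter values, and the use of the adjusted Rand index as a score are all assumptions chosen for the example.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic, non-convex data (an assumption for illustration, not fraud data)
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# K-means needs the number of clusters up front
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

# DBSCAN takes a neighbourhood radius (eps) and a density threshold
# (min_samples) instead of a cluster count
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_scaled)

# Compare each labelling with the known moon membership
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)
print("K-means ARI: %0.3f" % kmeans_ari)
print("DBSCAN ARI:  %0.3f" % dbscan_ari)
```

On curved clusters like these, K-means tends to cut straight across the shapes, while DBSCAN follows the dense regions. Real transaction data rarely comes with true labels, so a comparison like this is only possible on synthetic data.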

Implementing DBSCAN

from sklearn.cluster import DBSCAN
db = DBSCAN(eps=0.5, min_samples=10, n_jobs=-1).fit(X_scaled)

# Get the cluster labels (aka numbers)
pred_labels = db.labels_
# Count the total number of clusters
n_clusters_ = len(set(pred_labels)) - (1 if -1 in pred_labels else 0)
# Print model results
print('Estimated number of clusters: %d' % n_clusters_)
Estimated number of clusters: 31

Checking the size of the clusters

from sklearn import metrics
# Print model results
print("Silhouette Coefficient: %0.3f" % metrics.silhouette_score(X_scaled, pred_labels))
Silhouette Coefficient: 0.359
import numpy as np
# Get sample counts in each cluster (noise points, labeled -1, are excluded)
counts = np.bincount(pred_labels[pred_labels >= 0])
print(counts)
[ 763  496  840  355 1086  676   63  306  560  134   28   18  262  128  332  22  
   22   13   31   38   36   28   14   12   30   10   11   10   21   10    5]
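
One way to turn cluster sizes like these into fraud flags, following the "smallest clusters" idea from earlier in the deck, is sketched below. The toy `pred_labels` array and the cut-off `n_smallest` are hypothetical values chosen for illustration. How many clusters to flag is a judgment call that the slides do not fix.

```python
import numpy as np

# Toy cluster labels in the form DBSCAN returns them; -1 marks noise
# (hypothetical values, not the course data)
pred_labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, -1, 0, 1])

# Count samples per cluster, ignoring noise
counts = np.bincount(pred_labels[pred_labels >= 0])

# Pick the n_smallest clusters (n_smallest is an assumed tuning choice)
n_smallest = 2
smallest_clusters = np.argsort(counts)[:n_smallest]

# Flag every sample that belongs to one of those clusters
flagged = np.isin(pred_labels, smallest_clusters)

print("Cluster sizes:", counts)
print("Smallest clusters:", smallest_clusters)
print("Flagged samples:", flagged.sum())
```

Noise points (label -1) are left unflagged in this sketch. Depending on the use case, they could reasonably be treated as suspicious too.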

Let's practice!
