Assigning fraud versus non-fraud

Fraud Detection in Python

Charlotte Werger

Data Scientist

Starting with clustered data

Assign the cluster centroids

Define distances from the cluster centroid

Flag fraud for those furthest away from cluster centroid

Flagging fraud based on distance to centroid

import numpy as np
from sklearn.cluster import KMeans

# Run the kmeans model on scaled data
kmeans = KMeans(n_clusters=6, random_state=42).fit(X_scaled)

# Get the cluster number for each datapoint
X_clusters = kmeans.predict(X_scaled)

# Save the cluster centroids
X_clusters_centers = kmeans.cluster_centers_

# Calculate the distance to the cluster centroid for each point
dist = [np.linalg.norm(x-y) for x,y in zip(X_scaled, X_clusters_centers[X_clusters])]

# Create predictions based on distance:
# points at or above the 93rd percentile of distance are flagged as fraud (1)
km_y_pred = np.array(dist)
km_y_pred[dist>=np.percentile(dist, 93)] = 1
km_y_pred[dist<np.percentile(dist, 93)] = 0
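
The steps above can also be run end to end on toy data. The sketch below is only an illustration: the synthetic blobs from make_blobs stand in for the real scaled features (an assumption, not the course dataset), n_init=10 is set explicitly, and the distances are computed in one vectorized call instead of a list comprehension. The 93rd percentile cut-off is kept from the slide.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled feature matrix (assumption, not course data)
X, _ = make_blobs(n_samples=1000, centers=6, n_features=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means and get the cluster number for each point
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42).fit(X_scaled)
X_clusters = kmeans.predict(X_scaled)

# Distance of every point to its own cluster centroid, vectorized
dist = np.linalg.norm(X_scaled - kmeans.cluster_centers_[X_clusters], axis=1)

# Flag the points at or above the 93rd percentile of distance
threshold = np.percentile(dist, 93)
km_y_pred = (dist >= threshold).astype(int)

print(km_y_pred.sum(), "of", len(km_y_pred), "points flagged")
```

Computing the threshold once and comparing against a NumPy array avoids relying on how a plain Python list compares with a NumPy scalar. The percentile itself is a tuning choice: a lower value flags more cases for review.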

Validating your model results

  • Check with the fraud analyst
  • Investigate and describe cases that are flagged in more detail
  • Compare to past known cases of fraud
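
When labels for past fraud cases are available, the comparison in the last bullet can be sketched with a confusion matrix. The arrays below are made up for illustration: y_true is a hypothetical vector of known outcomes and km_y_pred a hypothetical set of distance-based flags, neither taken from the course data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical known outcomes: 1 = confirmed fraud, 0 = legitimate
y_true = np.array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])
# Hypothetical flags produced by the distance-based rule
km_y_pred = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 0])

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, km_y_pred)
tn, fp, fn, tp = cm.ravel()

# Precision: share of flagged cases that were fraud
# Recall: share of fraud cases that were flagged
precision = precision_score(y_true, km_y_pred)
recall = recall_score(y_true, km_y_pred)

print(cm)
print("precision:", round(precision, 2), "recall:", round(recall, 2))
```

In fraud detection the false negatives (fraud that was not flagged) are usually the costliest cell, so recall tends to deserve as much attention as overall accuracy.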

Let's practice!
