Fraud Detection in Python
Charlotte Werger
Data Scientist
import numpy as np
from sklearn.cluster import KMeans

# X_scaled is assumed to be the scaled feature array from the previous step

# Run the k-means model on the scaled data
kmeans = KMeans(n_clusters=6, random_state=42).fit(X_scaled)

# Get the cluster number for each data point
X_clusters = kmeans.predict(X_scaled)

# Save the cluster centroids
X_clusters_centers = kmeans.cluster_centers_

# Calculate the distance from each point to its own cluster centroid
dist = np.array([np.linalg.norm(x - y) for x, y in zip(X_scaled, X_clusters_centers[X_clusters])])

# Create predictions based on distance: the top 7% of distances are flagged as fraud (1), the rest as non-fraud (0)
threshold = np.percentile(dist, 93)
km_y_pred = np.array(dist)
km_y_pred[dist >= threshold] = 1
km_y_pred[dist < threshold] = 0
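The distance-and-threshold step can also be written without the Python loop. The following is a minimal sketch using NumPy only; the function name flag_outliers, the toy points, and the hand-picked centroids are hypothetical and stand in for X_scaled, kmeans.predict(X_scaled), and kmeans.cluster_centers_ from the slide.

```python
import numpy as np


def flag_outliers(X, labels, centers, percentile=93):
    """Flag points whose distance to their assigned centroid is at or above the given percentile.

    Hypothetical helper: X, labels and centers play the roles of
    X_scaled, X_clusters and X_clusters_centers in the slide code.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # centers[labels] picks the assigned centroid for every row of X
    dist = np.linalg.norm(X - centers[labels], axis=1)
    threshold = np.percentile(dist, percentile)
    # 1 = flagged as suspicious, 0 = treated as normal
    flags = (dist >= threshold).astype(int)
    return dist, flags


# Toy data (made up for illustration): two clusters, with the last point far from its centroid
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [14, 13]])
labels = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([[0, 0], [10, 10]])

dist, flags = flag_outliers(X, labels, centers)
print(dist)
print(flags)
```

With the 93rd percentile as the cutoff, roughly 7% of the points are flagged regardless of how many are truly fraudulent, so the percentile is a tuning choice rather than something the model learns.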