Fraud Detection in Python
Charlotte Werger
Data Scientist




import numpy as np
from sklearn.cluster import KMeans

# Run the kmeans model on scaled data (X_scaled is the scaled feature matrix)
kmeans = KMeans(n_clusters=6, random_state=42).fit(X_scaled)

# Get the cluster number for each datapoint
X_clusters = kmeans.predict(X_scaled)

# Save the cluster centroids
X_clusters_centers = kmeans.cluster_centers_

# Calculate the distance to the cluster centroid for each point
dist = np.array([np.linalg.norm(x - y) for x, y in zip(X_scaled, X_clusters_centers[X_clusters])])

# Create predictions based on distance: flag the points at or above the 93rd percentile
threshold = np.percentile(dist, 93)
km_y_pred = np.array(dist)
km_y_pred[dist >= threshold] = 1
km_y_pred[dist < threshold] = 0
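The snippet above assumes X_scaled already exists. As a rough, self-contained sketch of the same idea, the version below wraps the steps in a helper and runs it on synthetic data. The helper name flag_outliers_by_centroid_distance, the use of MinMaxScaler for scaling, the explicit n_init=10, and the random demo data are all assumptions made for illustration; they do not come from the slide.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler


def flag_outliers_by_centroid_distance(X, n_clusters=6, percentile=93, random_state=42):
    """Hypothetical helper: flag points far from their assigned k-means centroid."""
    # Assumption: min-max scaling; the slide only says the data is "scaled"
    X_scaled = MinMaxScaler().fit_transform(X)

    # Fit k-means and assign each point to a cluster
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10).fit(X_scaled)
    labels = kmeans.predict(X_scaled)

    # Euclidean distance from each point to the centroid of its own cluster
    own_centers = kmeans.cluster_centers_[labels]
    dist = np.linalg.norm(X_scaled - own_centers, axis=1)

    # Points at or above the chosen percentile of distances are flagged as 1
    threshold = np.percentile(dist, percentile)
    y_pred = (dist >= threshold).astype(int)
    return y_pred, dist


# Synthetic demo data (not real transactions)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 4))

y_pred, dist = flag_outliers_by_centroid_distance(X_demo)
print("flagged:", int(y_pred.sum()), "of", len(y_pred))
```

With a 93rd-percentile cutoff, roughly the most distant 7% of points are flagged. The cutoff is a tuning choice: lowering it flags more points, which in a fraud setting typically trades more caught cases for more false positives.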