Differentially private clustering models

Data Privacy and Anonymization in Python

Rebeca Gonzalez

Data engineer

Comparing models

Image showing 3 clusters in data, after clustering with non-private k-means

Image showing 3 clusters in data, after clustering with private k-means

Comparing the models

Image showing the difference between the two clustering results displayed as pink points

Difference between the models' resulting clusters
The two models agree on the majority of cluster assignments
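One way to count the differing points (the pink points in the image) is sketched below. Cluster numbers are arbitrary, so the same group may be called 0 by one model and 2 by the other; the labels need to be aligned before they are compared. The helper `align_labels` and the tiny label arrays are hypothetical, made up for illustration only.

```python
from itertools import permutations

import numpy as np


def align_labels(reference, other, n_clusters):
    """Rename the labels in `other` so they match `reference` as often as possible.

    Hypothetical helper: tries every renaming of the cluster ids, which is
    only practical for a small number of clusters such as 3.
    """
    reference = np.asarray(reference)
    other = np.asarray(other)
    best_mapped, best_matches = other, -1
    for perm in permutations(range(n_clusters)):
        mapped = np.array(perm)[other]  # label i becomes perm[i]
        matches = int((mapped == reference).sum())
        if matches > best_matches:
            best_mapped, best_matches = mapped, matches
    return best_mapped


# Made-up labels standing in for the two models' outputs
non_private = np.array([0, 0, 1, 1, 2, 2])
private = np.array([2, 2, 0, 0, 1, 0])

aligned = align_labels(non_private, private, n_clusters=3)
differs = aligned != non_private  # True where the two models disagree
print("Points that differ:", int(differs.sum()))
print("Share in common:", 1 - differs.mean())
```

The boolean array `differs` can be used as a mask to highlight the disagreeing points in a scatter plot.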

Building differentially private clustering models

from diffprivlib.models import KMeans


# Computing the clusters with the DP model
model = KMeans(epsilon=1, n_clusters=3)


# Run the model and obtain clusters
clusters = model.fit_predict(X)

Improving DP clustering models

We can pre-process the data before clustering.
Use feature scaling, such as StandardScaler, and dimensionality reduction methods, such as PCA:
- To reduce the inertia of the model
- To get more accurate segmentation groups
This works with diffprivlib models just as it does with sklearn models.

Improving DP clustering models

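from sklearn.decomposition import PCA
from diffprivlib.models import KMeans as dp_Kmeans


# Initialize PCA (with no n_components set, all components are kept)
pca = PCA()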


# Fit transform data with PCA
X = pca.fit_transform(X)


# Computing the clusters with the DP model
model = dp_Kmeans(epsilon=1, n_clusters=3)


# Run the model and obtain clusters
clusters = model.fit_predict(X)

Improving DP clustering models

Image showing two resulting scatter plots with the clusters

Improving DP clustering models

Image showing the difference between the two clustering results displayed as pink points

Improved results by using data transformations

Elbow method

Image of the resulting plot after applying the elbow method on data
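The elbow method fits the model for several values of n_clusters and looks for the point where the inertia stops dropping sharply. The sketch below uses the non-private scikit-learn KMeans on synthetic data, both of which are assumptions made for illustration; the course's own data and plotting code are not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (illustration only): three well-separated 2-D blobs
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

# Fit k-means for k = 1..6 and record the inertia
# (sum of squared distances from each point to its cluster center)
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = float(km.inertia_)

for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

Plotting `inertias` against k gives the elbow curve; with data like this, the bend is expected around k = 3.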

Epsilon

from diffprivlib.models import KMeans as dp_Kmeans


# Computing the clusters with the DP model, using a smaller epsilon
model = dp_Kmeans(epsilon=0.2, n_clusters=3)


# Run the model and obtain clusters
clusters = model.fit_predict(X)

Epsilon

Image showing two resulting scatter plots with the clusters

Let's practice!
