Data Privacy and Anonymization in Python
Rebeca Gonzalez
Data Engineer



# Import the scikit-learn naive Bayes classifier
from sklearn.naive_bayes import GaussianNB

# Import the differentially private naive Bayes classifier
# (aliased so it does not shadow the scikit-learn class)
from diffprivlib.models import GaussianNB as dp_GaussianNB
from sklearn.naive_bayes import GaussianNB

# Create the non-private classifier
nonprivate_clf = GaussianNB()

# Train the model on the data
nonprivate_clf.fit(X_train, y_train)
print("The accuracy of the non-private model is ", nonprivate_clf.score(X_test, y_test))
The accuracy of the non-private model is 0.8333333333333334
from diffprivlib.models import GaussianNB as dp_GaussianNB

# Create the private classifier with an empty constructor
private_clf = dp_GaussianNB()

# Train the model and check the score
private_clf.fit(X_train, y_train)
print("The accuracy of the private model is ", private_clf.score(X_test, y_test))
The accuracy of the private model is 0.7
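With an empty constructor no bounds are given, so they are calculated from the data. The following is a minimal sketch, with made-up numbers, of why that matters (the `ages` list and the outlier value are hypothetical and not taken from the course data): the minimum and maximum of a dataset change when a single extreme record is added, so bounds derived from the data reveal something about that individual.

```python
# Hypothetical ages, for illustration only
ages = [23, 35, 41, 29, 52]
outlier = 97  # one individual with an extreme value

# Bounds computed directly from the data, with and without that individual
bounds_without = (min(ages), max(ages))
bounds_with = (min(ages + [outlier]), max(ages + [outlier]))

print("Bounds without the outlier:", bounds_without)
print("Bounds with the outlier:   ", bounds_with)

# The upper bound equals the outlier's exact value, so anyone who sees
# the data-derived bounds learns that this person is in the dataset.
leaked = bounds_with[1] == outlier
print("Upper bound exposes the outlier:", leaked)
```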
PrivacyLeakWarning: Bounds have not been specified and will be calculated
on the data provided. This will result in additional privacy leakage.
To ensure differential privacy and no additional privacy leakage, specify bounds for each dimension.
"privacy leakage, specify bounds for each dimension.", PrivacyLeakWarning)
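To avoid this leakage, we can replace the min and max calculated from the data by passing the bounds argument, a tuple of the form (min, max). It can be:
(0,100): scalar min and max that cover all features
([0,1,0,2],[10,80,5,70]): arrays with one min and one max per feature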
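The two forms above can be made concrete with a small sketch. The helper `expand_bounds` is hypothetical (it is not part of diffprivlib); under that assumption it only illustrates how a scalar pair applies to every feature, while a pair of lists gives one lower and one upper value per feature.

```python
def expand_bounds(bounds, n_features):
    """Return (lower, upper) as per-feature lists. Hypothetical helper."""
    lower, upper = bounds
    # A scalar bound is repeated for every feature
    if not isinstance(lower, (list, tuple)):
        lower = [lower] * n_features
    if not isinstance(upper, (list, tuple)):
        upper = [upper] * n_features
    lower, upper = list(lower), list(upper)
    # Per-feature bounds need exactly one entry per feature
    if len(lower) != n_features or len(upper) != n_features:
        raise ValueError("bounds must have one entry per feature")
    if any(lo > hi for lo, hi in zip(lower, upper)):
        raise ValueError("each lower bound must not exceed its upper bound")
    return lower, upper

# Scalar form: the same range for all four features
scalar_form = expand_bounds((0, 100), n_features=4)
# Per-feature form: one (min, max) pair per column
per_feature_form = expand_bounds(([0, 1, 0, 2], [10, 80, 5, 70]), n_features=4)

print(scalar_form)
print(per_feature_form)
```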
# Set the bounds to cover at least the min and max
bounds = (X_train.min(axis=0) - 1, X_train.max(axis=0) + 1)

# Create the classifier with epsilon 0.5
dp_clf = dp_GaussianNB(epsilon=0.5, bounds=bounds)

# Train the model and check the score
dp_clf.fit(X_train, y_train)
print("The accuracy of the private model is ", dp_clf.score(X_test, y_test))
The accuracy of the private model is 0.807000
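For some intuition about the `epsilon` argument, the sketch below computes a differentially private mean with the classic Laplace mechanism, using only the standard library. It illustrates the general idea and is not diffprivlib's internal implementation; the function names `laplace_noise`, `private_mean`, and `average_error`, as well as the sample values, are assumptions made for this example. Values are clipped to the bounds and noise with scale (upper - lower) / (n * epsilon) is added, so a smaller epsilon means more noise and stronger privacy.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two exponential draws follows a Laplace distribution
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    # Clip each value so that one record can shift the mean by at most
    # (upper - lower) / n, which is the sensitivity of the mean
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    scale = (upper - lower) / (n * epsilon)
    return sum(clipped) / n + laplace_noise(scale, rng)

def average_error(values, lower, upper, epsilon, rng, trials=2000):
    # Average absolute gap between the private mean and the true mean
    true_mean = sum(values) / len(values)
    errors = [abs(private_mean(values, lower, upper, epsilon, rng) - true_mean)
              for _ in range(trials)]
    return sum(errors) / trials

rng = random.Random(0)
values = [12, 48, 33, 27, 90, 61, 5, 74]  # hypothetical feature values

avg_errors = {eps: average_error(values, 0, 100, eps, rng) for eps in (0.1, 0.5, 5.0)}
for eps, err in avg_errors.items():
    print(f"epsilon={eps}: average absolute error {err:.2f}")
```

The average error shrinks as epsilon grows, which mirrors the trade-off between privacy and accuracy seen in the classifier scores.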
# Import the random module
import random

# Set the bounds to the min and max of the data plus some noise
# (12 random offsets, one per feature)
bounds = (X_train.min(axis=0) - random.sample(range(0, 30), 12),
          X_train.max(axis=0) + random.sample(range(0, 30), 12))

# Create the classifier with epsilon 0.5
dp_clf = dp_GaussianNB(epsilon=0.5, bounds=bounds)

# Train the model and check the score
dp_clf.fit(X_train, y_train)
print("The accuracy of private classifier with bounds is ", dp_clf.score(X_test, y_test))
The accuracy of private classifier with bounds is 0.7544444444
