Data Privacy and Anonymization in Python
Rebeca Gonzalez
Instructor








Differential privacy is a mathematical definition of privacy.
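
Stated formally, using the standard textbook form of the definition (the symbols $M$, $D$, $D'$ and $S$ are notation introduced here, not taken from the slides): a randomized mechanism $M$ is $\epsilon$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in one person's record and every set of possible outputs $S$,

$$
\Pr[M(D) \in S] \le e^{\epsilon} \cdot \Pr[M(D') \in S]
$$

In words: including or excluding any single person can change the probability of any result by a factor of at most $e^{\epsilon}$.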






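Greek letter epsilon $\epsilon$: how private and how noisy a data release is.
Smaller $\epsilon$ means more noise and stronger privacy; larger $\epsilon$ means less noise and weaker privacy.
For example, $\epsilon = 1$:
$e^{\epsilon} = e^1 \approx 2.72$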
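
To make the role of $\epsilon$ concrete, here is a minimal sketch that releases a single count with Laplace noise, using only the standard library. The helper names `laplace_noise` and `private_count` and the count of 1000 are hypothetical, chosen for illustration. It assumes a counting query, where adding or removing one person changes the result by at most 1.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(true_count, epsilon, rng):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    sensitivity = 1  # one person changes a count by at most 1 (assumption)
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_count = 1000       # hypothetical number of people in one salary band

for epsilon in (0.1, 1.0, 10.0):
    factor = math.exp(epsilon)  # e^epsilon, the bound on the probability ratio
    scale = 1 / epsilon         # noise scale shrinks as epsilon grows
    released = private_count(true_count, epsilon, rng)
    print(f"epsilon={epsilon}: e^epsilon={factor:.2f}, "
          f"noise scale={scale:.1f}, released count={released:.1f}")
```

With $\epsilon = 0.1$ the noise scale is 10, so the released count can easily be off by tens; with $\epsilon = 10$ the scale is 0.1 and the release is almost exact, at the cost of much weaker privacy.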

import numpy as np
import matplotlib.pyplot as plt

# Get counts and bins for the non-private histogram of salaries
counts, bins = np.histogram(salaries)

# Normalize counts to get the proportion in each bin
proportions = counts / counts.sum()

# Draw the histogram of proportions
plt.bar(bins[:-1], height=proportions, width=(bins[1] - bins[0]))
plt.show()

from diffprivlib import tools

# Get counts and bins for the private histogram of salaries with epsilon of 0.1
dp_counts, dp_bins = tools.histogram(salaries, epsilon=0.1)

# Normalize counts to get proportions
dp_proportions = dp_counts / dp_counts.sum()

# Draw the private histogram of proportions and compare it with the non-private one
plt.bar(dp_bins[:-1], dp_proportions, width=(dp_bins[1] - dp_bins[0]))
plt.show()
Non-private histogram

Resulting private histogram
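
The gap between the two histograms can also be measured numerically. The sketch below is an illustration only: it adds Laplace noise to each bin count by hand with NumPy and is not a description of how `diffprivlib` computes its histogram. The synthetic `salaries` array and the helper `noisy_proportions` are hypothetical stand-ins. The bin edges are taken from the data for simplicity, which a careful release would avoid by fixing the range in advance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical salary data standing in for the real `salaries` column
salaries = rng.normal(loc=60_000, scale=15_000, size=5_000)

# Non-private histogram, normalized to proportions
counts, bins = np.histogram(salaries, bins=10)
proportions = counts / counts.sum()

def noisy_proportions(counts, epsilon, rng):
    """Add Laplace(1/epsilon) noise to each bin count, then normalize."""
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0, None)  # negative counts are not meaningful
    return noisy / noisy.sum()

# Compare the noisy proportions with the exact ones for several epsilons
for epsilon in (0.01, 0.1, 1.0):
    dp_props = noisy_proportions(counts, epsilon, rng)
    max_error = np.abs(dp_props - proportions).max()
    print(f"epsilon={epsilon}: largest difference in bin proportion = {max_error:.4f}")
```

Smaller values of `epsilon` generally produce larger differences from the non-private proportions, which is the distortion visible when the two plots are placed side by side.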
