Data Privacy and Anonymization in Python
Rebeca Gonzalez
Instructor
Differential privacy is a mathematical definition of privacy.
The Greek letter epsilon ($\epsilon$) measures how private, and how noisy, a data release is: a smaller $\epsilon$ means more privacy and more noise.
For example, with $\epsilon = 1$:
$e^{\epsilon} = e^1 \approx 2.72$
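To make the role of $e^{\epsilon}$ concrete, here is a minimal sketch that evaluates it for a few values of $\epsilon$. In the standard definition of differential privacy, $e^{\epsilon}$ bounds how much the probability of any output can change when one person's record is added or removed. The helper name `privacy_bound` and the chosen values of $\epsilon$ are illustrative assumptions, not part of the slides.

```python
import math


def privacy_bound(epsilon):
    """Return e**epsilon, the largest allowed ratio between the output
    probabilities on two datasets that differ in one record.
    (Hypothetical helper, for illustration only.)"""
    return math.exp(epsilon)


# Smaller epsilon keeps the ratio closer to 1, so the two datasets are
# harder to tell apart from the released result.
for epsilon in (0.1, 1.0, 5.0):
    print(f"epsilon={epsilon}: e^epsilon = {privacy_bound(epsilon):.2f}")
```

With $\epsilon = 1$ this reproduces the value 2.72 from the example above.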
import numpy as np
import matplotlib.pyplot as plt

# Get counts and bin edges for the non-private histogram of salaries
counts, bins = np.histogram(salaries)
# Normalize counts to get the proportion of values in each bin
proportions = counts / counts.sum()
# Draw the histogram of proportions
plt.bar(bins[:-1], height=proportions, width=(bins[1] - bins[0]))
plt.show()
from diffprivlib import tools

# Get counts and bin edges for the private histogram of salaries with epsilon of 0.1
dp_counts, dp_bins = tools.histogram(salaries, epsilon=0.1)
# Normalize counts to get proportions
dp_proportions = dp_counts / dp_counts.sum()
# Draw the histogram of proportions and compare it with the non-private one
plt.bar(dp_bins[:-1], dp_proportions, width=(dp_bins[1] - dp_bins[0]))
plt.show()
Non-private histogram
Resulting private histogram
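Why the private histogram differs from the non-private one can be sketched with the Laplace mechanism, using only the standard library. This is an illustration under stated assumptions, not diffprivlib's actual implementation: the counts are made up, the helpers `laplace_noise` and `noisy_counts` are hypothetical, and each count is assumed to change by at most 1 when one record is added or removed, so the noise scale is $1/\epsilon$.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponential samples with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_counts(counts, epsilon, rng):
    """Add Laplace noise with scale 1/epsilon to each bin count.
    Assumes one record changes a single count by at most 1."""
    scale = 1.0 / epsilon
    # Clamping at zero is post-processing of the noisy value.
    return [max(0.0, c + laplace_noise(scale, rng)) for c in counts]


rng = random.Random(0)       # fixed seed so the sketch is repeatable
counts = [5, 12, 30, 22, 8]  # made-up bin counts, not the salaries data

# Smaller epsilon means larger noise, so the bars move further
# from the true counts.
for epsilon in (0.1, 1.0, 10.0):
    noisy = noisy_counts(counts, epsilon, rng)
    print(epsilon, [round(x, 1) for x in noisy])
```

With `epsilon=0.1` the noise has scale 10, which is comparable to these counts, so the private bars can differ visibly from the true ones; with `epsilon=10` they stay close.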