Introduction to differential privacy

Data Privacy and Anonymization in Python

Rebeca Gonzalez

Instructor

What is differential privacy (DP)?

Do you dye your hair?

Image of a hand pointing to one of two buttons, selecting the one with a check mark

Drawing of a blond woman

Image of a coin

Diagram of the coin-flip survey: flip a coin. If it lands heads (left branch), give the real answer. If it lands tails (right branch), flip the coin again and answer "no" if it lands heads or "yes" if it lands tails.
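
The coin-flip survey is known as randomized response. Below is a minimal sketch of it, assuming a simulated population in which 30% of people dye their hair. That data is hypothetical and does not come from the course. Each respondent follows the coin rules, and the analyst can still estimate the overall proportion. The first flip is truthful half the time, and the second flip says "yes" half the time, so P(yes) = 0.5 · p + 0.25. Solving for p gives p = 2 · P(yes) - 0.5.

```python
import numpy as np

def randomized_response(truth: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply the two-coin randomized response rule to an array of true booleans."""
    first_heads = rng.random(truth.size) < 0.5   # first flip heads: give the real answer
    second_tails = rng.random(truth.size) < 0.5  # second flip tails: answer "yes", heads: "no"
    # Keep the truth when the first flip is heads; otherwise the second coin decides.
    return np.where(first_heads, truth, second_tails)

def estimate_true_proportion(answers: np.ndarray) -> float:
    """Invert P(yes) = 0.5 * p + 0.25 to estimate the true proportion p."""
    return 2 * answers.mean() - 0.5

rng = np.random.default_rng(seed=42)
truth = rng.random(100_000) < 0.30          # hypothetical: about 30% dye their hair
answers = randomized_response(truth, rng)
print(f"Observed 'yes' rate: {answers.mean():.3f}")
print(f"Estimated true rate: {estimate_true_proportion(answers):.3f}")
```

No single answer reveals much about the person who gave it. Across many respondents, however, the random noise averages out, so the group-level estimate stays close to the true rate.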

What is differential privacy (DP)?

Differential privacy is a mathematical definition of privacy.

Drawing of a face divided in half. The right side has a question mark
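
The slide does not spell out the definition itself. For reference, this is the standard formal statement. A randomized mechanism $M$ is $\epsilon$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in one person's record and for every set of outputs $S$, the following holds:

$$\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S]$$

The coin survey fits this definition. A person whose true answer is "yes" reports "yes" with probability $\tfrac{1}{2} + \tfrac{1}{4} = \tfrac{3}{4}$. A person whose true answer is "no" reports "yes" with probability $\tfrac{1}{4}$. The largest ratio between the two is therefore $3$, so the survey is $\epsilon$-differentially private with $\epsilon = \ln 3 \approx 1.1$.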

Who uses differential privacy (DP)?

Logo of Apple

Apple's emoji keyboard

Global differential privacy

  • A trusted curator collects and protects the raw data
  • Noise is added to the output of queries

Diagram of global differential privacy
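
The slide does not name a specific mechanism. The Laplace mechanism is one standard choice, and it is used here as an assumption. Below is a minimal sketch for a counting query over a hypothetical, simulated list of ages. Adding or removing one person changes a count by at most 1 (its sensitivity is 1), so adding Laplace noise with scale 1/ε to the count gives ε-differential privacy. Only the curator ever sees the exact count.

```python
import numpy as np

def private_count(values, predicate, epsilon: float, rng: np.random.Generator) -> float:
    """Laplace mechanism for a counting query (sensitivity 1)."""
    true_count = sum(1 for v in values if predicate(v))
    # The curator releases only the noisy count; the noise scale is 1/epsilon.
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

rng = np.random.default_rng(seed=0)
ages = rng.integers(18, 90, size=10_000)  # hypothetical raw data held by the curator
released = private_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng)
print(f"Noisy count of people aged 65+: {released:.1f}")
```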

Local differential privacy

  • No trusted party is needed.
  • Each user adds noise to their own data before sharing it.

Diagram of local differential privacy
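
In the local model, the noise is added on each user's side, so the collector never sees raw values. Below is a minimal sketch. It assumes hypothetical per-user values that are known to lie in [0, 1]. Each user clips their value to that range and adds Laplace noise with scale (upper - lower)/ε. The untrusted collector then simply averages the reports. Each individual report is very noisy, and only the aggregate over many users is useful.

```python
import numpy as np

def local_report(value: float, epsilon: float, lower: float, upper: float,
                 rng: np.random.Generator) -> float:
    """Runs on the user's side: clip to [lower, upper], then add Laplace noise."""
    clipped = min(max(value, lower), upper)
    sensitivity = upper - lower  # one user's value can be anywhere in the range
    return clipped + rng.laplace(0.0, sensitivity / epsilon)

rng = np.random.default_rng(seed=7)
true_values = rng.random(50_000)  # hypothetical per-user values in [0, 1)
reports = [local_report(v, epsilon=1.0, lower=0.0, upper=1.0, rng=rng) for v in true_values]

# The collector only ever sees the noisy reports.
estimated_mean = float(np.mean(reports))
print(f"True mean: {true_values.mean():.3f}, estimated mean: {estimated_mean:.3f}")
```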

Epsilon-differential privacy

The Greek letter epsilon ($\epsilon$) measures how private and how noisy a data release is.

  • Higher values of $\epsilon$ give more accurate but less private data
  • Lower values of $\epsilon$ give noisier, more random, but more private data
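
The sketch below makes this trade-off concrete. It assumes the Laplace mechanism on a sensitivity-1 query, which is one common way to achieve ε-differential privacy rather than something this slide specifies. With that mechanism the noise scale is 1/ε, so the average error shrinks as ε grows. The true value of 1,000 is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
true_value = 1_000  # hypothetical true count

mean_abs_errors = {}
for epsilon in [0.1, 1.0, 10.0]:
    noisy = true_value + rng.laplace(0.0, 1.0 / epsilon, size=100_000)
    # For Laplace noise, the expected absolute error equals the scale, 1/epsilon.
    mean_abs_errors[epsilon] = float(np.mean(np.abs(noisy - true_value)))
    print(f"epsilon={epsilon:>5}: mean absolute error ~ {mean_abs_errors[epsilon]:.3f}")
```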

Epsilon is exponential

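The privacy guarantee bounds output probabilities by a factor of $e^{\epsilon}$, so compare $e^{\epsilon}$ values rather than $\epsilon$ itself. For example, $\epsilon = 1$:

$e^{1} \approx 2.72$

  • It's almost three times more private than $\epsilon = 2$.
    • $e^{2} \approx 7.39$, and $e^{2}/e^{1} = e \approx 2.72$
  • And over 8,000 times more private than $\epsilon = 10$.
    • $e^{10} \approx 22{,}026$, and $e^{10}/e^{1} = e^{9} \approx 8{,}103$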
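
These comparisons are ratios of $e^{\epsilon}$ values, and a few lines of Python reproduce them:

```python
import math

for eps in [1, 2, 10]:
    print(f"e^{eps} = {math.exp(eps):,.2f}")

# Ratios of the privacy bounds relative to epsilon = 1
ratio_2_vs_1 = math.exp(2) / math.exp(1)    # equals e, about 2.72
ratio_10_vs_1 = math.exp(10) / math.exp(1)  # equals e**9, about 8,103
print(f"epsilon=2 vs 1: {ratio_2_vs_1:.2f}x, epsilon=10 vs 1: {ratio_10_vs_1:,.0f}x")
```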

K-anonymity and differential privacy

k-anonymity provides "syntactic" guarantees

  • Still widely used
  • Not sufficient in many cases

Differential privacy is the current de facto standard privacy model

  • Used by companies such as Apple, Uber, and Google
  • The privacy loss of each release, and of several releases combined, can be quantified with $\epsilon$
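
One reason the privacy loss of several releases can be quantified is composition. Under basic sequential composition, answering queries on the same data with budgets $\epsilon_1, \dots, \epsilon_k$ costs at most $\epsilon_1 + \dots + \epsilon_k$ in total. Below is a minimal sketch of a budget tracker built on that rule. The `PrivacyBudget` class is hypothetical: diffprivlib ships its own `BudgetAccountant`, which is not reproduced here.

```python
class PrivacyBudget:
    """Hypothetical tracker using basic sequential composition (epsilons add up)."""

    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        # Refuse any release that would push the cumulative loss past the budget.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise ValueError(
                f"Budget exceeded: spent {self.spent}, requested {epsilon}, "
                f"total {self.total_epsilon}"
            )
        self.spent += epsilon

    @property
    def remaining(self) -> float:
        return self.total_epsilon - self.spent

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)  # e.g. a private histogram
budget.spend(0.3)  # e.g. a private mean
print(f"Spent {budget.spent:.2f}, remaining {budget.remaining:.2f}")
```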

Introduction to diffprivlib

diffprivlib v0.3, IBM's open-source differential privacy library for Python

IBM's logo

Histograms

import numpy as np
import matplotlib.pyplot as plt

# Get counts and bin edges for a non-private histogram of salaries (a 1-D array)
counts, bins = np.histogram(salaries)

# Normalize counts to get proportions for the bar heights
proportions = counts / counts.sum()

# Draw the histogram of proportions; align="edge" places each bar on its bin's left edge
plt.bar(bins[:-1], height=proportions, width=(bins[1] - bins[0]), align="edge")
plt.show()

Resulting non-private histogram

Private histogram

import matplotlib.pyplot as plt
from diffprivlib import tools

# Get counts and bin edges for a private histogram of salaries with epsilon of 0.1
dp_counts, dp_bins = tools.histogram(salaries, epsilon=0.1)
# Normalize counts to get proportions
dp_proportions = dp_counts / dp_counts.sum()
# Draw the histogram of proportions and compare it with the non-private one
plt.bar(dp_bins[:-1], dp_proportions, width=(dp_bins[1] - dp_bins[0]), align="edge")
plt.show()

Resulting non-private histogram

Resulting private histogram
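
The following NumPy-only sketch gives some intuition for what a private histogram does. It is not diffprivlib's implementation, which uses its own mechanism. Laplace noise is swapped in here because it is a common choice. The sketch follows four steps:

  • Fix the bin edges from a range assumed to be publicly known.
  • Count the values in each bin.
  • Add Laplace noise with scale 1/ε to each count.
  • Clip negative counts to zero.

Each person falls into exactly one bin, so adding or removing one person changes a single count by 1. diffprivlib's `tools.histogram` also accepts a `range` argument for the same purpose. When `range` is omitted, the bounds are taken from the data itself, which leaks information. The salary data below is simulated.

```python
import numpy as np

def laplace_histogram(data, epsilon: float, bins: int, value_range: tuple,
                      rng: np.random.Generator):
    """Noisy histogram: Laplace(1/epsilon) noise added to each bin count."""
    # Bin edges come from a public range, never from the data itself.
    counts, edges = np.histogram(data, bins=bins, range=value_range)
    noisy = counts + rng.laplace(0.0, 1.0 / epsilon, size=counts.shape)
    # Post-processing (rounding and clipping) does not weaken the privacy guarantee.
    return np.clip(np.round(noisy), 0, None), edges

rng = np.random.default_rng(seed=3)
salaries = rng.normal(50_000, 15_000, size=5_000).clip(0, 150_000)  # simulated data
dp_counts, dp_bins = laplace_histogram(salaries, epsilon=0.1, bins=10,
                                       value_range=(0, 150_000), rng=rng)
dp_proportions = dp_counts / dp_counts.sum()
print(np.round(dp_proportions, 3))
```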

Let's practice!
