Privacy budgets

Data Privacy and Anonymization in Python

Rebeca Gonzalez

Instructor

Definition of differential privacy

  • Cynthia Dwork presents differential privacy as a mathematical definition.

Diagram: the output of a differentially private mechanism is essentially the same whether or not any given person is in the original dataset.

  • Epsilon and accuracy are the most important quantities.
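For reference, here is the standard formal statement from the differential privacy literature (the slide shows it only as a diagram). A randomized mechanism $M$ satisfies $\epsilon$-differential privacy if, for every pair of datasets $D$ and $D'$ that differ in one individual's record, and for every set of possible outputs $S$:

```latex
\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S]
```

A smaller $\epsilon$ forces the two probabilities closer together, which is why it is the privacy parameter. Accuracy is what you give up in exchange, because a tighter bound requires more noise.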

$\epsilon$: the privacy parameter

  • A measure of privacy loss
  • The smaller the value, the better the privacy protection
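To make "smaller is better" concrete, here is a minimal sketch of the Laplace mechanism, a standard way to answer numeric queries with $\epsilon$-differential privacy. The slides don't name a mechanism at this point, so this choice is an assumption. The mechanism adds noise with scale sensitivity / $\epsilon$, so shrinking $\epsilon$ widens the noise. The helper names and the sensitivity of 1 (a counting query) are illustrative.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential samples with mean
    `scale` is Laplace-distributed with location 0 and that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Return true_value plus Laplace noise with scale sensitivity / epsilon."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon: float, trials: int = 10_000, seed: int = 0) -> float:
    """Average absolute error of a noisy count (sensitivity 1) at a given epsilon."""
    rng = random.Random(seed)
    true_count = 100.0
    errors = [abs(laplace_mechanism(true_count, 1.0, epsilon, rng) - true_count)
              for _ in range(trials)]
    return sum(errors) / trials


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps:>5}: mean absolute error ~ {mean_abs_error(eps):.3f}")
```

Because the mean absolute value of Laplace noise equals its scale, the printed errors should land near 1/$\epsilon$: roughly 10, 1 and 0.1.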


Privacy budget

Drawing: a data curator on the left and a third party. An arrow labeled $\epsilon$ = 1 points to the data curator, representing a query to the database.


Privacy budget

Drawing: the same data curator and third party, with a second arrow labeled $\epsilon$ = 1 pointing to the data curator, representing another query to the database.

Making the same private query twice with $\epsilon$ = 1 is like making a single query with $\epsilon$ = 2: the privacy losses add up.
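Under basic sequential composition, the $\epsilon$ values of separate queries on the same data add up. A toy helper (hypothetical, not part of any library) makes that arithmetic explicit:

```python
from typing import Iterable


def total_epsilon(query_epsilons: Iterable[float]) -> float:
    """Worst-case total privacy loss under basic sequential composition.

    Each query's epsilon is simply summed; tighter (advanced) composition
    bounds exist but are not modelled here.
    """
    return sum(query_epsilons)


# Two identical private queries, each with epsilon = 1
print(total_epsilon([1.0, 1.0]))  # 2.0

# Ten queries at epsilon = 0.5 cost as much as one query at epsilon = 5
print(total_epsilon([0.5] * 10))  # 5.0
```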


Privacy budget

Drawing: the third party performing calculations on the previously requested data.

Third parties can average the answers together, filtering out the noise.
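A small simulation shows why repeated queries are dangerous: averaging many independently noised answers to the same question converges on the true value. This sketch assumes Laplace noise and a counting query with sensitivity 1; the function names and the true count of 250 are made up for illustration.

```python
import random
import statistics


def noisy_count(true_count: float, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query (sensitivity 1) with Laplace noise of scale 1/epsilon."""
    scale = 1.0 / epsilon
    return true_count + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def averaged_estimate(k: int, true_count: float = 250.0,
                      epsilon: float = 1.0, seed: int = 42) -> float:
    """Average k independent noisy answers to the same query."""
    rng = random.Random(seed)
    answers = [noisy_count(true_count, epsilon, rng) for _ in range(k)]
    return statistics.fmean(answers)


if __name__ == "__main__":
    for k in (1, 10, 1_000):
        estimate = averaged_estimate(k)
        # Each query spends epsilon = 1, so k queries spend k in total.
        print(f"{k:>5} queries: estimate {estimate:8.2f}, total epsilon spent {k * 1.0}")
```

The estimate tightens around 250 as k grows, while the total epsilon spent grows with it, which is exactly what a privacy budget is meant to cap.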


Privacy budget

  • A limit on the privacy loss that any individual or group is allowed to accrue
  • Enforcing it requires tracking the queries made to the data

Diagram: a team extracting data from a database that adds noise before answering the queries.
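The idea of a budget that is tracked and enforced can be sketched in a few lines. `ToyBudgetAccountant` below is hypothetical and only handles pure $\epsilon$ with basic sequential composition; it is not diffprivlib's implementation, which also tracks $\delta$.

```python
class ToyBudgetAccountant:
    """Hypothetical minimal accountant: pure epsilon, basic sequential composition."""

    def __init__(self, epsilon: float):
        self.epsilon = epsilon
        self.spent = []  # epsilon spent by each recorded query

    def total(self) -> float:
        return sum(self.spent)

    def remaining(self) -> float:
        return self.epsilon - self.total()

    def spend(self, epsilon: float) -> None:
        # Refuse any query that would push total spending past the budget
        if epsilon > self.remaining() + 1e-12:
            raise ValueError(
                f"Privacy budget exceeded: requested {epsilon}, "
                f"remaining {self.remaining()}"
            )
        self.spent.append(epsilon)

    def __len__(self) -> int:
        return len(self.spent)


acc = ToyBudgetAccountant(epsilon=2.0)
acc.spend(1.0)
acc.spend(1.0)
try:
    acc.spend(0.5)  # would exceed the budget of 2.0
except ValueError as err:
    print("Refused:", err)
print(len(acc), acc.total(), acc.remaining())  # 2 2.0 0.0
```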


What's private enough?

  • How good a given epsilon is depends on the query as well as on the data
  • Epsilon can take a wide range of values

What's private enough?

Epsilon $\epsilon$

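  • Values between 0 and 1 are considered very good
  • Values between 1 and 10 are "better than nothing"
  • Values above 10 are not good

Remember that epsilon is exponential: it bounds how much one person's data can change the probability of any output by a factor of $e^{\epsilon}$.

  • A system with $\epsilon$ = 1 is over 8,000 times more private than one with $\epsilon$ = 10, since $e^{10}/e^{1} = e^{9} \approx 8{,}103$.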
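The factor $e^{\epsilon}$ grows quickly. This short calculation prints the worst-case ratio between output probabilities for a few values of $\epsilon$ and reproduces the 8,000× comparison for $\epsilon$ = 1 versus $\epsilon$ = 10.

```python
import math

# Worst-case factor by which one person's data can change an output probability
for eps in (0.1, 1.0, 5.0, 10.0):
    print(f"epsilon = {eps:>4}: factor up to {math.exp(eps):>10,.1f}")

# Comparing epsilon = 10 with epsilon = 1
ratio = math.exp(10) / math.exp(1)
print(f"e^10 / e^1 = {ratio:,.0f}")  # e^10 / e^1 = 8,103
```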

What's private enough?

Image: bar plot of emoji usage next to Apple's logo

¹ Screenshot of the top emoji for US English speakers, according to data collected by Apple.

Privacy budget: how to track it

from diffprivlib import BudgetAccountant

# Create an accountant with a total privacy budget of epsilon = 5
acc = BudgetAccountant(epsilon=5)
acc
BudgetAccountant(epsilon=5)

Privacy budget: how to track it

from diffprivlib import tools

# salaries is assumed to be a 1-D array of salary values loaded earlier
# Compute a private mean of the salaries using an epsilon of 0.5
# Use the BudgetAccountant acc and set the bounds from 0 to 230000
dp_mean = tools.mean(salaries, epsilon=0.5, accountant=acc, bounds=(0, 230000))

# Print the resulting private mean
print("Private mean: ", dp_mean)
Private mean: 82524.72611901595
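The `bounds` argument matters because a private mean needs a known range to calibrate its noise. The sketch below shows one common construction: clip each value to the bounds, then add Laplace noise scaled to (upper - lower) / (n · $\epsilon$). It is an assumption for illustration, not diffprivlib's internal algorithm; it treats the record count n as public, and the salary data is synthetic.

```python
import random


def private_mean(values, epsilon, bounds, rng):
    """Bounded private mean via clipping plus Laplace noise (illustrative sketch)."""
    lower, upper = bounds
    # Clip so that replacing one record moves the sum by at most (upper - lower)
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)  # assumed to be public
    sensitivity = (upper - lower) / n  # max change in the mean from one record
    scale = sensitivity / epsilon
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(clipped) / n + noise


rng = random.Random(7)
# Synthetic stand-in for the course's salary data
salaries = [rng.uniform(30_000, 150_000) for _ in range(5_000)]
dp_mean = private_mean(salaries, epsilon=0.5, bounds=(0, 230_000), rng=rng)
print("True mean:   ", sum(salaries) / len(salaries))
print("Private mean:", dp_mean)
```

Narrow, well-chosen bounds keep the noise small; bounds that are too tight clip real values and bias the result.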

Privacy budget: how to track it

# Total privacy spent
print("Total spent: ", acc.total())

# Privacy budget remaining
print("Remaining budget: ", acc.remaining())

# Total number of queries done so far
print("Number of queries recorded: ", len(acc))
Total spent: (epsilon=0.5, delta=0.0)
Remaining budget: (epsilon=4.5, delta=1.0)
Number of queries recorded: 1

Privacy budget: how to track it

# Privacy budget remaining for 2 queries
print("Remaining budget for 2 queries: ", acc.remaining(2))
Remaining budget for 2 queries: (epsilon=2.25, delta=1.0)
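With the settings used here (only $\epsilon$ spent so far), the epsilon reported by `remaining(2)` matches an even split of what is left: (5 - 0.5) / 2. The helper below is a hypothetical illustration of that arithmetic only; diffprivlib's own `remaining()` also accounts for $\delta$.

```python
def remaining_per_query(budget: float, spent: list, k: int = 1) -> float:
    """Epsilon available to each of k further queries under basic composition."""
    return (budget - sum(spent)) / k


print(remaining_per_query(5, [0.5]))       # 4.5
print(remaining_per_query(5, [0.5], k=2))  # 2.25
```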

Let's practice!
