Privacy of Numeric Queries Via Simple Value Perturbation. The Laplace Mechanism


Differential Privacy: A Basic Model

Let $\mathcal{X}$ represent an abstract data universe and let $D$ be a multiset of elements from $\mathcal{X}$, i.e. $D$ can contain multiple copies of an element $x \in \mathcal{X}$. It is convenient to represent $D$ as a histogram: $D \in \mathbb{N}^{|\mathcal{X}|}$, where $D_i = |\{x \in D : x = x_i\}|$.

Differential Privacy: A Basic Model

E.g. for a database of heights $D = \{5'2'', 6'1'', 5'8'', 5'8'', 6'0''\}$:

$D = [1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1, 0, \dots] \in \mathbb{N}^{|\mathcal{X}|}$

[Figure: histogram of $D$ over height bins from $4'8''$ through $6'1''$.]

Differential Privacy: A Basic Model

The size of a database is $n$:
As a set: $n = |D|$.
As a histogram: $n = \|D\|_1 = \sum_{i=1}^{|\mathcal{X}|} D_i$.

Definition ($\ell_1$ (Manhattan) distance): For $v \in \mathbb{R}^d$, $\|v\|_1 = \sum_{i=1}^{d} |v_i|$.

Differential Privacy: A Basic Model

The distance between two databases:
As sets: $|D \,\Delta\, D'|$ (the size of the symmetric difference).
As histograms: $\|D - D'\|_1$.

Differential Privacy: A Basic Model

E.g. for the database of heights $D = \{5'2'', 6'1'', 5'8'', 5'8'', 6'0''\}$:

$D = [1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1, 0, \dots]$
$D' = [2, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, \dots]$

$\|D\|_1 = 1 + 2 + 1 + 1 = 5$
$\|D'\|_1 = 2 + 1 + 1 + 1 + 1 = 6$
$\|D - D'\|_1 = 1 + 1 + 1 = 3$
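A small runnable sketch (my own; the bin order and the neighboring database $D'$ are choices made to reproduce the totals 5, 6, and 3 above) of the histogram view and the $\ell_1$ quantities:

    # Databases as histograms, plus the l1 norm and l1 distance defined above.
    from collections import Counter

    def histogram(multiset, universe):
        counts = Counter(multiset)
        return [counts[x] for x in universe]

    def l1(v):
        return sum(abs(x) for x in v)

    X  = ["4'8", "5'2", "5'8", "6'0", "6'1"]   # the data universe (height bins)
    D  = histogram(["5'2", "6'1", "5'8", "5'8", "6'0"], X)
    Dp = histogram(["4'8", "5'2", "5'2", "5'8", "6'0", "6'1"], X)

    print(l1(D))                                 # 5: the size of D
    print(l1(Dp))                                # 6: the size of D'
    print(l1([a - b for a, b in zip(D, Dp)]))    # 3: the l1 distance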

Basic Lower Bound: Blatant Non-Privacy

How much noise is necessary to guarantee privacy? A simple model: for simplicity, $D \in \{0,1\}^{|\mathcal{X}|}$ (i.e. no repeated elements). A query is a bit vector $Q \in \{0,1\}^{|\mathcal{X}|}$:

$Q(D) = \langle Q, D \rangle = \sum_{i : Q_i = 1} D[i]$

a subset-sum query. For $S \subseteq [n]$, write $Q_S$ for the vector:

$Q_S[i] = 1$ if $i \in S$, and $Q_S[i] = 0$ if $i \notin S$.

Basic Lower Bound: Blatant Non-Privacy

Definition: A mechanism $M : \{0,1\}^n \to \mathcal{R}$ is blatantly non-private if on any database $D$, an adversary can use $y = M(D)$ to reconstruct a $D' = A(y)$ that agrees with $D$ on all but $o(n)$ entries:

$\|D - D'\|_1 \le o(n)$

Basic Lower Bound: Blatant Non-Privacy

Answering all subset-sum queries requires linear noise.

Definition: A mechanism $M : \{0,1\}^{|\mathcal{X}|} \to \mathcal{R}$ adds noise bounded by $\alpha$ if for every $D \in \{0,1\}^{|\mathcal{X}|}$, $M(D) = y$ such that for every query $S \subseteq [n]$:

$|Q_S(D) - Q_S(y)| \le \alpha$

Basic Lower Bound: Blatant Non-Privacy

Theorem: Let $M$ be a mechanism that adds noise bounded by $\alpha$. Then there exists an adversary that, given $M(D)$, can construct a database $D'$ such that $\|D - D'\|_1 \le 4\alpha$.

So adding noise $o(n)$ leads to blatant non-privacy.

Basic Lower Bound: Blatant Non-Privacy

Proof: Consider the following adversary:
Let $r = M(D)$. For each $D' \in \{0,1\}^{|\mathcal{X}|}$: if $|Q_S(D') - Q_S(r)| \le \alpha$ for all $S \subseteq \mathcal{X}$, then output $D'$.

Claim 1: The algorithm always outputs some $D'$. Yes: $D' = D$ passes all tests.

Claim 2: $\|D - D'\|_1 \le 4\alpha$.
Let $S_0 = \{x \in \mathcal{X} : x \in D, x \notin D'\}$ and $S_1 = \{x \in \mathcal{X} : x \notin D, x \in D'\}$, and observe $\|D - D'\|_1 = |S_0| + |S_1|$. So if $\|D - D'\|_1 > 4\alpha$, then $\max(|S_0|, |S_1|) > 2\alpha$; WLOG assume $|S_0| > 2\alpha$. We know $Q_{S_0}(D') = 0$, so because $D'$ passed every test, $Q_{S_0}(r) \le \alpha$. But $Q_{S_0}(D) > 2\alpha$, so it must be that $|Q_{S_0}(D) - Q_{S_0}(r)| > 2\alpha - \alpha = \alpha$, contradicting the assumption that $M$ adds noise bounded by $\alpha$.
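A toy sketch of this adversary (my own; brute force over all $2^n$ candidate databases, so only feasible for tiny $n$). A mechanism whose every subset-sum answer is off by at most $\alpha$ still permits near-perfect reconstruction:

    from itertools import product, chain, combinations
    import random

    def all_subsets(n):
        return chain.from_iterable(combinations(range(n), k) for k in range(n + 1))

    def subset_sum(db, S):
        return sum(db[i] for i in S)

    def noisy_answers(db, alpha):
        """A mechanism that adds noise bounded by alpha to every subset sum."""
        return {S: subset_sum(db, S) + random.uniform(-alpha, alpha)
                for S in all_subsets(len(db))}

    def reconstruct(answers, n, alpha):
        # Output any candidate consistent with all answers to within alpha;
        # the true database always passes, and any output is within 4*alpha.
        for cand in product([0, 1], repeat=n):
            if all(abs(subset_sum(cand, S) - r) <= alpha
                   for S, r in answers.items()):
                return cand

    n, alpha = 8, 0.9
    D = tuple(random.randint(0, 1) for _ in range(n))
    D_hat = reconstruct(noisy_answers(D, alpha), n, alpha)
    print(sum(a != b for a, b in zip(D, D_hat)))  # Hamming distance <= 4*alpha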

Basic Lower Bound: Blatant Non-Privacy

Bad news! Accuracy $n/2$ is trivial (answering $|S|/2$ to every query $S$ adds noise at most $n/2$), and accuracy $n/40$ already lets an adversary reconstruct $9/10$ of the database entries! But this attack required answering all possible queries.

Guiding lower bound: going forward, we will only try to be accurate for restricted classes of queries.

Differential Privacy: A Basic Model

Definition: A randomized algorithm $M : \mathbb{N}^{|\mathcal{X}|} \to \mathcal{R}$ with domain $\mathbb{N}^{|\mathcal{X}|}$ and range $\mathcal{R}$ is $(\varepsilon, \delta)$-differentially private if:
1) For all pairs of databases $D, D' \in \mathbb{N}^{|\mathcal{X}|}$ such that $\|D - D'\|_1 \le 1$ (differing in one person's data), and
2) For all events $S \subseteq \mathcal{R}$:

$\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S] + \delta$.

Private algorithms must be randomized.

Resilience to Post-Processing

Proposition: Let $M : \mathbb{N}^{|\mathcal{X}|} \to \mathcal{R}$ be $(\varepsilon, \delta)$-differentially private and let $f : \mathcal{R} \to \mathcal{R}'$ be an arbitrary function. Then $f \circ M : \mathbb{N}^{|\mathcal{X}|} \to \mathcal{R}'$ is $(\varepsilon, \delta)$-differentially private.

Thinking about the output of $M$ can't make it less private.

Resilience to Post-Processing

Proof:
1) Consider any pair of databases $D, D' \in \mathbb{N}^{|\mathcal{X}|}$ with $\|D - D'\|_1 \le 1$.
2) Consider any event $S \subseteq \mathcal{R}'$.
3) Let $T \subseteq \mathcal{R}$ be defined as $T = \{r \in \mathcal{R} : f(r) \in S\}$. Now:

$\Pr[f(M(D)) \in S] = \Pr[M(D) \in T] \le e^{\varepsilon} \Pr[M(D') \in T] + \delta = e^{\varepsilon} \Pr[f(M(D')) \in S] + \delta$

Randomized mappings $f$ are just convex combinations of deterministic functions.

Resilience to Post-Processing

Take-away message:
1) $f$ as the adversary's analysis: it can incorporate arbitrary auxiliary information the adversary may have. The privacy guarantee holds no matter what he does.
2) $f$ as our algorithm: if we access the database in a differentially private way, we don't have to worry about how our algorithm post-processes the result. We only have to worry about the data-access steps.
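As a concrete instance of point 2, a minimal sketch (my own example, not from the lecture): any cleanup applied to a privately released count requires no further privacy analysis, since it is just a function $f$ of the private output.

    # Post-processing a differentially private output: since f(M(D)) is
    # (eps, delta)-DP for any f, this cleanup is "free" privacy-wise.
    def clean_up(noisy_count):
        # round to the nearest integer and clamp below at zero
        return max(0, round(noisy_count))

    print(clean_up(-2.7))  # 0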

Answering Numeric Queries

Suppose we have some numeric question about the private database that we want to know the answer to:

$Q : \mathbb{N}^{|\mathcal{X}|} \to \mathbb{R}^k$, $Q(D) = ?$

How do we do it privately? How much noise do we have to add? What are the relevant properties of $Q$?

Answering Numeric Queries

Definition: The $\ell_1$-sensitivity of a query $Q : \mathbb{N}^{|\mathcal{X}|} \to \mathbb{R}^k$ is:

$GS(Q) = \max_{D, D' : \|D - D'\|_1 \le 1} \|Q(D) - Q(D')\|_1$

i.e. how much can one person affect the value of the query?

"How many people in this room have brown eyes?": sensitivity 1.
"How many have brown eyes, how many have blue eyes, how many have green eyes, and how many have red eyes?": still sensitivity 1 (each person falls in exactly one bucket).
"How many have brown eyes and how many are taller than 6'?": sensitivity 2 (one person can change both counts).
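A sketch of the three examples (the function and field names are my own): for counting queries like these, $GS(Q)$ is the largest $\ell_1$ norm of a single person's contribution vector, since the query answer is the coordinate-wise sum of these vectors over the database.

    def brown_eyes(person):                  # sensitivity 1
        return [1 if person["eyes"] == "brown" else 0]

    def eye_color_counts(person):            # sensitivity 1: each person lands
        return [1 if person["eyes"] == c else 0      # in exactly one bucket
                for c in ("brown", "blue", "green", "red")]

    def brown_and_tall(person):              # sensitivity 2: one person can
        return [1 if person["eyes"] == "brown" else 0,       # move both counts
                1 if person["height_inches"] > 72 else 0]

    alice = {"eyes": "brown", "height_inches": 74}
    print(brown_eyes(alice), eye_color_counts(alice), brown_and_tall(alice))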

Answering Numeric Queries

The Laplace distribution: $\mathrm{Lap}(b)$ is the probability distribution with p.d.f.:

$p(x \mid b) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right)$

i.e. a symmetric exponential distribution. For $Y \sim \mathrm{Lap}(b)$:

$\mathbb{E}[|Y|] = b$, $\qquad \Pr[|Y| \ge t \cdot b] = e^{-t}$.
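Both facts follow by integrating the density directly; as a quick check (not on the slide):

$\Pr[|Y| \ge t] = 2\int_t^{\infty} \frac{1}{2b} e^{-x/b}\,dx = e^{-t/b}, \qquad \mathbb{E}[|Y|] = 2\int_0^{\infty} \frac{x}{2b}\, e^{-x/b}\,dx = b,$

and substituting $t = \tau b$ into the first identity gives $\Pr[|Y| \ge \tau \cdot b] = e^{-\tau}$.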

Answering Numeric Queries: The Laplace Mechanism

Laplace$(D,\; Q : \mathbb{N}^{|\mathcal{X}|} \to \mathbb{R}^k,\; \varepsilon)$:
1. Let $\Delta = GS(Q)$.
2. For $i = 1$ to $k$: let $Y_i \sim \mathrm{Lap}(\Delta / \varepsilon)$.
3. Output $Q(D) + (Y_1, \dots, Y_k)$.

Independently perturb each coordinate of the output with Laplace noise scaled to the sensitivity of the function.

Idea: this should be enough noise to hide the contribution of any single individual, no matter what the database was.
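A minimal runnable sketch of this mechanism (the function name and the use of NumPy are my choices; the caller supplies the sensitivity, since computing $GS(Q)$ is the analyst's job, not the mechanism's):

    import numpy as np

    def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
        """Return Q(D) + (Y_1, ..., Y_k) with Y_i ~ Lap(GS(Q)/epsilon) i.i.d."""
        rng = rng or np.random.default_rng()
        answer = np.asarray(true_answer, dtype=float)
        scale = sensitivity / epsilon   # Delta / epsilon, per step 2 above
        return answer + rng.laplace(loc=0.0, scale=scale, size=answer.shape)

    # e.g. a histogram query (sensitivity 1) released with epsilon = 0.5:
    print(laplace_mechanism([12, 40, 7, 1], sensitivity=1.0, epsilon=0.5))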


To Ponder

Where is there room for improvement? The Laplace mechanism adds independent noise to every coordinate. What happens if the user asks (essentially) the same question in every coordinate?

Read [Dinur, Nissim 03]: a computationally efficient attack that gives blatant non-privacy for any mechanism that adds noise bounded by $o(\sqrt{n})$.