Why (Bayesian) inference makes (Differential) privacy easy. Christos Dimitrakakis (1), Blaine Nelson (2,3), Aikaterini Mitrokotsa (1), Benjamin Rubinstein (4), Zuhe Zhang (4). (1) Chalmers University of Technology, (2) University of Potsdam, (3) Google, (4) University of Melbourne. 10/3/2015
Overview Example (Health insurance) We collect data x about treatments and patients. We disclose conclusions about treatment effectiveness. We want to hide individual patient information. Encryption does not help. The general problem We wish to estimate something from a dataset x ∈ S. We wish to communicate what we learn to a third party. How much can they learn about x?
Bayesian inference and differential privacy Bayesian estimation What are its robustness and privacy properties? How important is the selection of the prior? Limiting the communication channel How should we communicate information about our posterior? How much can an adversary learn from our posterior?
Setting Dramatis personae x: data. B: a (Bayesian) statistician. ξ: the statistician's prior belief. θ: a parameter. A: an adversary. He knows ξ, but should not learn x.
Setting Dramatis personae x: data. B: a (Bayesian) statistician. ξ: the statistician's prior belief. θ: a parameter. A: an adversary. He knows ξ, but should not learn x. The game 1. B selects a model family F and a prior ξ. 2. B observes data x and calculates the posterior ξ(θ | x). 3. A queries B. 4. B responds with a function of the posterior ξ(θ | x). 5. Go to 3.
Two related problem viewpoints. Bayesian inference view (diagram: Prior, Data, Posterior, Observer). Mechanism design view (diagram: Mechanism, Data, Query, Observer).
Two related problem viewpoints. Bayesian inference view (diagram: Prior, Data, Posterior, Observer). B has a prior belief about how well a drug works. B obtains some data about the drug. B's conclusion from the data is a posterior belief. B communicates its conclusion to an observer; the communication: 1. Must be accurate. 2. Must not reveal a lot about the data.
Two related problem viewpoints. Mechanism design view (diagram: Mechanism, Data, Query, Observer). We want to design a mechanism. The observer can ask queries. The mechanism returns a response that: 1. Is useful. 2. Does not reveal a lot about the data.
Two related problem viewpoints. Bayesian inference view (diagram: Prior, Data, Posterior, Observer). Mechanism design view (diagram: Mechanism, Data, Query, Observer). We'll discuss a specific mechanism that provides privacy. The amount of privacy depends on the prior.
Privacy properties of Bayesian inference Bayesian inference Robustness of the posterior distribution Posterior sampling query model
Bayesian inference Estimating a coin's bias. A fair coin comes up heads 50% of the time. We want to test an unknown coin, which we think may not be completely fair. Figure: Prior belief ξ about the coin bias θ.
Bayesian inference Figure: Prior belief ξ about the coin bias θ. For a sequence of throws $x_t \in \{0, 1\}$, $$P_\theta(x) = \prod_t \theta^{x_t} (1-\theta)^{1-x_t} = \theta^{\#\text{Heads}} (1-\theta)^{\#\text{Tails}}.$$
Bayesian inference Figure: Prior belief ξ about the coin bias θ and likelihood of θ for the data. Say we throw the coin 100 times and obtain 70 heads. Then we plot the likelihood P_θ(x) of the different models.
Bayesian inference Figure: Prior belief ξ(θ) about the coin bias θ, likelihood of θ for the data, and posterior belief ξ(θ | x). From these, we calculate a posterior distribution over the correct models. This represents our conclusion given our prior and the data.
Bayesian inference Setting Dataset space S. Distribution family $F \triangleq \{\, P_\theta : \theta \in \Theta \,\}$; each P_θ is a distribution on S. We wish to identify which θ generated the observed data x. Prior distribution ξ on Θ (i.e. initial belief). Posterior given data x ∈ S (i.e. conclusion): $$\xi(\theta \mid x) = \frac{P_\theta(x)\, \xi(\theta)}{\phi(x)}, \qquad \phi(x) \triangleq \sum_{\theta \in \Theta} P_\theta(x)\, \xi(\theta) \quad \text{(posterior and marginal, respectively).}$$ A standard calculation that can be done exactly or approximately.
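To make this calculation concrete for the coin example above, here is a minimal Python sketch assuming a conjugate Beta prior; the prior parameters and the counts are illustrative choices, not taken from the talk.

```python
import numpy as np
from scipy import stats

# Conjugate Beta-Bernoulli inference for the coin-bias example (illustrative numbers).
a, b = 2.0, 2.0                   # Beta(a, b) prior: a mild belief that the coin is roughly fair
heads, tails = 70, 30             # data x: 70 heads in 100 throws

# With a Beta prior, the posterior xi(theta | x) is again Beta: Beta(a + heads, b + tails).
posterior = stats.beta(a + heads, b + tails)

theta = np.linspace(0.01, 0.99, 99)
prior_pdf = stats.beta(a, b).pdf(theta)                 # prior belief xi(theta)
likelihood = theta**heads * (1 - theta)**tails          # P_theta(x) as a function of theta
posterior_pdf = posterior.pdf(theta)                    # conclusion xi(theta | x)

print("posterior mean of theta:", posterior.mean())     # close to the empirical frequency 0.7
```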
What we want to show If we assume the family F is well-behaved... ...or that the prior ξ is focused on the nice parts of F.
What we want to show If we assume the family F is well-behaved... ...or that the prior ξ is focused on the nice parts of F, then: Inference is robust. Our knowledge is private. There are also well-known families F satisfying our assumptions.
What we want to show If we assume the family F is well-behaved... ...or that the prior ξ is focused on the nice parts of F, then: Inference is robust. Our knowledge is private. There are also well-known families F satisfying our assumptions. First, we must generalise differential privacy...
Differential privacy of the conditional distribution ξ(· | x) Definition ((ɛ, δ)-differential privacy) ξ(· | x) is (ɛ, δ)-differentially private if, for all x ∈ S = X^n and all B ⊆ Θ, $$\xi(B \mid x) \le e^{\epsilon}\, \xi(B \mid y) + \delta$$ for all y in the Hamming-1 neighbourhood of x.
Differential privacy of the conditional distribution ξ(· | x) Definition ((ɛ, δ)-differential privacy) ξ(· | x) is (ɛ, δ)-differentially private if, for all x ∈ S = X^n and all B ⊆ Θ, $$\xi(B \mid x) \le e^{\epsilon}\, \xi(B \mid y) + \delta$$ for all y in the Hamming-1 neighbourhood of x. Definition ((ɛ, δ)-differential privacy under ρ) ξ(· | x) is (ɛ, δ)-differentially private under a pseudo-metric ρ : S × S → R₊ if, for all B ⊆ Θ and all x ∈ S, $$\xi(B \mid x) \le e^{\epsilon \rho(x, y)}\, \xi(B \mid y) + \delta\, \rho(x, y) \qquad \forall y \in S.$$ If two datasets x, y are close, then the distributions ξ(· | x) and ξ(· | y) are also close.
Sufficient conditions Assumption 1 (F is Lipschitz) For a given ρ on S, there is L > 0 such that, for all θ ∈ Θ, $$\left|\ln \frac{P_\theta(x)}{P_\theta(y)}\right| \le L\, \rho(x, y) \qquad \forall x, y \in S. \quad (1)$$ Figure: curves labelled x and y plotted against θ (linear scale).
Sufficient conditions Assumption 1 (F is Lipschitz) For a given ρ on S, there is L > 0 such that, for all θ ∈ Θ, $$\left|\ln \frac{P_\theta(x)}{P_\theta(y)}\right| \le L\, \rho(x, y) \qquad \forall x, y \in S. \quad (1)$$ Figure: curves labelled x and y plotted against θ (log scale).
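To see what Assumption 1 asks for in the coin example, here is a small sketch (my own illustration, not from the talk): if Θ is restricted to [θ_min, 1 − θ_min] and ρ is the Hamming distance, changing one throw changes the log-likelihood by at most |ln(θ/(1 − θ))|, so L = |ln(θ_min/(1 − θ_min))| suffices.

```python
import numpy as np

def log_lik(theta, x):
    """Log-likelihood ln P_theta(x) of a Bernoulli(theta) model for a 0/1 sequence x."""
    x = np.asarray(x)
    return float(np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta)))

theta_min = 0.1                                       # restrict Theta to [0.1, 0.9]
L = abs(np.log(theta_min / (1 - theta_min)))          # candidate Lipschitz constant

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=100)                      # a dataset of 100 throws
y = x.copy()
y[0] ^= 1                                             # neighbouring dataset: flip one throw
rho = int(np.sum(x != y))                             # Hamming distance rho(x, y) = 1

# Check |ln P_theta(x) - ln P_theta(y)| <= L * rho(x, y) across the restricted parameter set.
for theta in np.linspace(theta_min, 1 - theta_min, 9):
    assert abs(log_lik(theta, x) - log_lik(theta, y)) <= L * rho + 1e-9
print("Lipschitz condition holds with L =", round(L, 3))
```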
Stochastic Lipschitz condition Assumption 2 (The prior is concentrated on nice parts of F) Let the set of L-Lipschitz parameters be Θ_L. Then there is c > 0 such that $$\xi(\Theta_L) \ge 1 - \exp(-cL) \qquad \forall L. \quad (2)$$ Figure: curves labelled x and y plotted against θ (log scale).
Stochastic Lipschitz condition Assumption 2 (The prior is concentrated on nice parts of F) Let the set of L-Lipschitz parameters be Θ_L. Then there is c > 0 such that $$\xi(\Theta_L) \ge 1 - \exp(-cL) \qquad \forall L. \quad (2)$$ Figure: curves labelled x, y and prior plotted against θ (log scale).
Robustness of the posterior distribution Definition (KL divergence) $$D(P \,\|\, Q) \triangleq \int \ln \frac{dP}{dQ}\, dP. \quad (3)$$ Theorem
Robustness of the posterior distribution Definition (KL divergence) $$D(P \,\|\, Q) \triangleq \int \ln \frac{dP}{dQ}\, dP. \quad (3)$$ Theorem (i) Under Assumption 1, $$D(\xi(\cdot \mid x) \,\|\, \xi(\cdot \mid y)) \le 2L\, \rho(x, y). \quad (4)$$
Robustness of the posterior distribution Definition (KL divergence) $$D(P \,\|\, Q) \triangleq \int \ln \frac{dP}{dQ}\, dP. \quad (3)$$ Theorem (i) Under Assumption 1, $$D(\xi(\cdot \mid x) \,\|\, \xi(\cdot \mid y)) \le 2L\, \rho(x, y). \quad (4)$$ (ii) Under Assumption 2, $$D(\xi(\cdot \mid x) \,\|\, \xi(\cdot \mid y)) \le \frac{\kappa_{C_\xi}}{c}\, \rho(x, y). \quad (5)$$
Differential privacy of the posterior Theorem 1. Under Assumption 1, for all B ∈ S_Θ (measurable sets of parameters): $$\xi(B \mid x) \le e^{2L\rho(x, y)}\, \xi(B \mid y), \quad (6)$$ i.e. the posterior is (2L, 0)-DP under ρ.
Differential privacy of the posterior Theorem 1. Under Assumption 1, for all B ∈ S_Θ (measurable sets of parameters): $$\xi(B \mid x) \le e^{2L\rho(x, y)}\, \xi(B \mid y), \quad (6)$$ i.e. the posterior is (2L, 0)-DP under ρ. 2. Under Assumption 2, for all x, y ∈ S and all B ∈ S_Θ: $$|\xi(B \mid x) - \xi(B \mid y)| \le \frac{\kappa_{C_\xi}}{2c}\, \rho(x, y),$$ i.e. the posterior is (0, κ_{C_ξ}/(2c))-DP under ρ.
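A rough numerical check of part 1 for the coin example (my own sketch; it assumes a prior supported on [0.1, 0.9] so that the restricted family is L-Lipschitz, and computes posteriors by grid integration):

```python
import numpy as np

theta = np.linspace(0.1, 0.9, 801)             # Theta restricted so that Assumption 1 holds
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)                     # uniform prior on [0.1, 0.9]
prior /= prior.sum() * dtheta
L = abs(np.log(0.1 / 0.9))                      # Lipschitz constant of the restricted family

def posterior(heads, tails):
    post = prior * theta**heads * (1 - theta)**tails
    return post / (post.sum() * dtheta)

post_x = posterior(70, 30)                      # posterior given dataset x
post_y = posterior(69, 31)                      # posterior given a neighbouring y, rho(x, y) = 1

# Check xi(B | x) <= exp(2 * L * rho(x, y)) * xi(B | y) for a few interval events B.
for lo, hi in [(0.1, 0.5), (0.5, 0.9), (0.6, 0.8)]:
    B = (theta >= lo) & (theta <= hi)
    assert post_x[B].sum() * dtheta <= np.exp(2 * L) * post_y[B].sum() * dtheta + 1e-12
print("(2L, 0)-differential privacy bound holds on the tested events")
```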
Posterior sampling query model We select a prior ξ. We observe data x. We calculate a posterior ξ(· | x). An adversary has sampling-based access to the posterior.
Posterior sampling query model We select a prior ξ. We observe data x. We calculate a posterior ξ(· | x). An adversary has sampling-based access to the posterior. Interactive access model At time t, the adversary observes a sample from the posterior: θ_t ∼ ξ(θ | x).
Posterior sampling query model We select a prior ξ. We observe data x. We calculate a posterior ξ(· | x). An adversary has sampling-based access to the posterior. Interactive access model At time t, the adversary observes a sample from the posterior: θ_t ∼ ξ(θ | x). More generally, A can select a question q : Θ → R, where R is a response space, and receives r_t = q(θ_t).
Posterior sampling query model We select a prior ξ. We observe data x. We calculate a posterior ξ(· | x). An adversary has sampling-based access to the posterior. Interactive access model At time t, the adversary observes a sample from the posterior: θ_t ∼ ξ(θ | x). More generally, A can select a question q : Θ → R, where R is a response space, and receives r_t = q(θ_t). Non-interactive access model We simply release n samples from the posterior: θ_t ∼ ξ(θ | x), t ∈ {1, ..., n}.
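A minimal sketch of this access model for the coin example, reusing the Beta posterior from earlier; the function and query names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 2.0                   # prior parameters
heads, tails = 70, 30             # observed data x

def respond(query, n=1):
    """Answer a question q : Theta -> R using n posterior samples theta_t ~ xi(theta | x)."""
    samples = rng.beta(a + heads, b + tails, size=n)
    return [query(t) for t in samples]

# Interactive access: A picks a question, B answers with q(theta_t) for a fresh sample.
print(respond(lambda t: t > 0.5))             # "is the coin biased towards heads?"

# Non-interactive access: simply release n posterior samples (q is the identity).
print(respond(lambda t: t, n=5))
```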
Other differentially private mechanisms Laplace mechanism: add noise to responses to queries, $r = q(x) + \omega$, $\omega \sim \text{Laplace}(\lambda)$. Exponential mechanism: define a utility function u(x, r) and respond with r with probability density $p(r) \propto e^{\epsilon u(x, r)}\, \mu(r)$. Subsampling: perform inference on a random sample of the data. Random projections / compressed sensing: perform a random linear projection.
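For comparison, a standard Laplace-mechanism sketch (textbook differential privacy, not specific to this talk): for a counting query of sensitivity 1, adding Laplace(1/ɛ) noise gives ɛ-DP.

```python
import numpy as np

rng = np.random.default_rng(2)

def laplace_mechanism(x, query, sensitivity, epsilon):
    """Release query(x) plus Laplace(sensitivity / epsilon) noise."""
    return query(x) + rng.laplace(scale=sensitivity / epsilon)

x = rng.integers(0, 2, size=100)                          # the coin-throw dataset
noisy_heads = laplace_mechanism(x, np.sum, sensitivity=1, epsilon=0.5)
print("noisy number of heads:", round(float(noisy_heads), 2))
```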
Exponential mechanism Define a utility function u(x, r) and respond with r with probability density $p(r) \propto e^{\epsilon u(x, r)}\, \mu(r)$. Connection to the posterior mechanism: responses are parameters θ. Take $u(\theta, x) = \log P_\theta(x)$ and $\mu(\theta) = \xi(\theta)$. Then, for ɛ = 1, $p(\theta) = \xi(\theta \mid x)$. Rather than tuning ɛ, we can tune: the prior ξ, and the number of samples n.
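The connection can be verified directly on a discretised parameter grid: with u(θ, x) = log P_θ(x), µ = ξ and ɛ = 1, the exponential mechanism's sampling distribution coincides with the posterior. A small sketch under these assumptions:

```python
import numpy as np

theta = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(theta) / len(theta)                       # discretised prior xi(theta)
heads, tails = 70, 30
log_lik = heads * np.log(theta) + tails * np.log(1 - theta)    # u(theta, x) = log P_theta(x)

eps = 1.0
exp_mech = np.exp(eps * log_lik) * prior                       # p(theta) ∝ exp(eps * u) * mu
exp_mech /= exp_mech.sum()

bayes = np.exp(log_lik) * prior                                # Bayes' rule: xi(theta | x)
bayes /= bayes.sum()

assert np.allclose(exp_mech, bayes)
print("exponential mechanism with eps = 1 and u = log-likelihood equals the posterior")
```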
Reduction of privacy to a testing problem How many samples n should we release? Different datasets x, y give different posteriors ξ(· | x), ξ(· | y). How many samples are needed to differentiate them?
Reduction of privacy to a testing problem How many samples n should we release? Different datasets x, y give different posteriors ξ(· | x), ξ(· | y). How many samples are needed to differentiate them? Theorem Under Assumption 1, the adversary can distinguish between data x, y with probability 1 − δ if: $$\rho(x, y) \ge \frac{3}{4Ln} \ln \frac{1}{\delta}. \quad (7)$$ Under Assumption 2, this becomes: $$\rho(x, y) \ge \frac{3c}{2\kappa_{C_\xi} n} \ln \frac{1}{\delta}. \quad (8)$$
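Reading the bound in reverse gives a way to tune the number of released samples (a back-of-the-envelope sketch; the constants follow the reconstructed statement above and should be treated as indicative only):

```python
import numpy as np

L = abs(np.log(0.1 / 0.9))     # Lipschitz constant of the restricted coin family
delta = 0.05                   # adversary should fail to distinguish except w.p. delta
d = 1                          # distance between datasets we want to keep confusable

# Under Assumption 1, distinguishing requires rho(x, y) >= 3 ln(1/delta) / (4 L n),
# so, heuristically, datasets at distance d stay hard to distinguish if n is at most:
n_max = 3 * np.log(1 / delta) / (4 * L * d)
print("release at most n =", int(n_max), "posterior samples")
```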
Conclusion Bayesian inference is inherently robust and private. Privacy is achieved by posterior sampling [Dimitrakakis et al.]. It suffices to select the right prior. But how? By construction for certain cases [Zhang et al.]. If we know the constants L, c, C_ξ, we can tune n. But similar problems exist more generally for DP mechanisms [Dwork]. References: C. Dimitrakakis, B. Nelson, A. Mitrokotsa, B. Rubinstein, Robust and Private Bayesian Inference, ALT 2014. Z. Zhang, B. Rubinstein, C. Dimitrakakis, Hide-And-Peek: Differential Privacy in Probabilistic Graphical Models, under review. C. Dwork, Differential Privacy, ICALP 2006.
When do these assumptions hold? Example (Exponential families) Family F with sufficient statistic T: $$p_\theta(x) = h(x) \exp\!\left(\eta_\theta^\top T(x) - A(\eta_\theta)\right).$$ For a given θ ∈ Θ, we want to test whether $$\left|\ln \frac{h(x)}{h(y)} + \eta_\theta^\top \left(T(x) - T(y)\right)\right| \le L\, \rho(x, y) \qquad \forall x, y \in X.$$
Example (Exponential distribution) An exponential prior with parameter c > 0 satisfies Assumption 2. Example (Discrete Bayesian networks) $$\left|\ln \frac{P_\theta(x)}{P_\theta(y)}\right| \le \ln\!\left(\frac{1}{\epsilon}\right) \rho(x, y),$$ where ρ(x, y) is the number of edges and nodes influenced by the differences in x, y and ɛ is the smallest P_θ-mass placed on any event.
The Le Cam method for lower bounds Idea: use S as the parameter space of an estimator.
The Le Cam method for lower bounds Idea: use S as the parameter space of an estimator. The family of posterior measures: $$\Xi \triangleq \{\, \xi(\cdot \mid x) : x \in S \,\}. \quad (9)$$
The Le Cam method for lower bounds Idea: use S as the parameter space of an estimator. The family of posterior measures: $$\Xi \triangleq \{\, \xi(\cdot \mid x) : x \in S \,\}. \quad (9)$$ Lemma (Le Cam's method) Let ψ(θ) be an estimator of x. Let S_1, S_2 be 2δ-separated and suppose $x \in S_i \Rightarrow \xi(\cdot \mid x) \in \Xi_i \subseteq \Xi$. Then: $$\sup_{x \in S} E_\xi\left(\rho(\psi, x) \mid x\right) \ge \delta \sup_{\bar\xi_i \in \mathrm{co}(\Xi_i)} \left\|\bar\xi_1 \wedge \bar\xi_2\right\|. \quad (10)$$
The Le Cam method for lower bounds Idea: use S as the parameter space of an estimator. The family of posterior measures: $$\Xi \triangleq \{\, \xi(\cdot \mid x) : x \in S \,\}. \quad (9)$$ Lemma (Le Cam's method) Let ψ(θ) be an estimator of x. Let S_1, S_2 be 2δ-separated and suppose $x \in S_i \Rightarrow \xi(\cdot \mid x) \in \Xi_i \subseteq \Xi$. Then: $$\sup_{x \in S} E_\xi\left(\rho(\psi, x) \mid x\right) \ge \delta \sup_{\bar\xi_i \in \mathrm{co}(\Xi_i)} \left\|\bar\xi_1 \wedge \bar\xi_2\right\|. \quad (10)$$ Expected distance between the real and the guessed data: $$E_\xi\left(\rho(\psi, x) \mid x\right) = \int_\Theta \rho(\psi(\theta), x)\, d\xi(\theta \mid x).$$