Why (Bayesian) inference makes (Differential) privacy easy

Why (Bayesian) inference makes (Differential) privacy easy

Christos Dimitrakakis (Chalmers University of Technology), Blaine Nelson (University of Potsdam / Google), Aikaterini Mitrokotsa (Chalmers University of Technology), Benjamin Rubinstein (University of Melbourne), Zuhe Zhang (University of Melbourne)

10/3/2015

Overview

Example (Health insurance). We collect data x about treatments and patients. We disclose conclusions about treatment effectiveness. We want to hide individual patient information. Encryption does not help: the disclosed conclusions themselves can leak.

The general problem. We wish to estimate something from a dataset x ∈ S. We wish to communicate what we learn to a third party. How much can they learn about x?

Bayesian inference and differential privacy

Bayesian estimation. What are its robustness and privacy properties? How important is the selection of the prior?

Limiting the communication channel. How should we communicate information about our posterior? How much can an adversary learn from our posterior?

Setting

Dramatis personae: x, the data. B, a (Bayesian) statistician. ξ, the statistician's prior belief. θ, a parameter. A, an adversary, who knows ξ but should not learn x.

The game:
1. B selects a model family F and a prior ξ.
2. B observes data x and calculates the posterior ξ(θ | x).
3. A queries B.
4. B responds with a function of the posterior ξ(θ | x).
5. Goto 3.

Two related problem viewpoints

Bayesian inference view (Prior, Data, Posterior, Observer). B has a prior belief about how well a drug works. B obtains some data about the drug. B's conclusion from the data is a posterior belief. B communicates its conclusion to an observer. The conclusion: 1. must be accurate; 2. must not reveal a lot about the data.

Two related problem viewpoints

Mechanism design view (Mechanism, Data, Query, Observer). We want to design a mechanism. The observer can ask queries. The mechanism gives an answer that: 1. is useful; 2. does not reveal a lot about the data.

Two related problem viewpoints

We'll discuss a specific mechanism that provides privacy in both views. The amount of privacy depends on the prior.

Privacy properties of Bayesian inference

Outline: Bayesian inference; robustness of the posterior distribution; the posterior sampling query model.

Bayesian inference

Estimating a coin's bias. A fair coin comes up heads 50% of the time. We want to test an unknown coin, which we think may not be completely fair.

[Figure: Prior belief ξ about the coin bias θ.]

Bayesian inference

[Figure: Prior belief ξ about the coin bias θ.]

For a sequence of throws x_t ∈ {0, 1},

P_θ(x) = ∏_t θ^{x_t} (1 − θ)^{1 − x_t} = θ^{#Heads} (1 − θ)^{#Tails}.

Bayesian inference

[Figure: Prior belief ξ about the coin bias θ and likelihood of θ for the data.]

Say we throw the coin 100 times and obtain 70 heads. Then we plot the likelihood P_θ(x) of the different models.

Bayesian inference

[Figure: Prior belief ξ(θ) about the coin bias θ, likelihood of θ for the data, and posterior belief ξ(θ | x).]

From these, we calculate a posterior distribution over the correct models. This represents our conclusion given our prior and the data.

Bayesian inference

Setting. Dataset space S. Distribution family F ≜ { P_θ : θ ∈ Θ }, where each P_θ is a distribution on S. We wish to identify which θ generated the observed data x. Prior distribution ξ on Θ (i.e. initial belief).

Posterior given data x ∈ S (i.e. conclusion):

ξ(θ | x) = P_θ(x) ξ(θ) / φ(x),   (posterior)
φ(x) ≜ ∑_{θ ∈ Θ} P_θ(x) ξ(θ).   (marginal)

This is a standard calculation that can be done exactly or approximately.
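A minimal sketch of this posterior update for the coin example in Python; the Beta(2, 2) prior is an assumed choice, while the 70-heads-in-100-throws dataset is the one from the slides:

```python
import numpy as np
from scipy import stats

# Prior: Beta(2, 2), a mild belief that the coin is roughly fair.
a, b = 2.0, 2.0
heads, tails = 70, 30  # 100 throws, 70 heads (the data x)

# Conjugacy: Beta prior + Bernoulli likelihood => Beta posterior,
# so the marginal phi(x) never has to be computed explicitly.
posterior = stats.beta(a + heads, b + tails)

theta = np.linspace(0.01, 0.99, 99)
prior_pdf = stats.beta(a, b).pdf(theta)         # xi(theta)
likelihood = theta**heads * (1 - theta)**tails  # P_theta(x), unnormalised
posterior_pdf = posterior.pdf(theta)            # xi(theta | x)

print("posterior mean:", posterior.mean())  # (2 + 70) / (4 + 100) ~ 0.69
```

With a conjugate prior the posterior is available in closed form; in general the marginal φ(x) must be computed numerically or approximated.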

What we want to show

If we assume the family F is well-behaved, or that the prior ξ is focused on the nice parts of F, then: inference is robust, and our knowledge is private. There are also well-known families F satisfying our assumptions.

First, we must generalise differential privacy.

Differential privacy of the conditional distribution ξ(· | x)

Definition ((ε, δ)-differential privacy). ξ(· | x) is (ε, δ)-differentially private if, for all x ∈ S = X^n and all measurable B ⊆ Θ,

ξ(B | x) ≤ e^ε ξ(B | y) + δ,

for all y in the Hamming-1 neighbourhood of x.

Definition ((ε, δ)-differential privacy under ρ). ξ(· | x) is (ε, δ)-differentially private under a pseudo-metric ρ : S × S → ℝ₊ if, for all measurable B ⊆ Θ and all x, y ∈ S,

ξ(B | x) ≤ e^{ε ρ(x, y)} ξ(B | y) + δ ρ(x, y).

If two datasets x, y are close, then the distributions ξ(· | x) and ξ(· | y) are also close.
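To see the definition in action, here is a sketch that estimates the effective ε between the posteriors of two Hamming-1 neighbouring coin datasets (the Beta(2, 2) prior is the same assumption as above):

```python
import numpy as np
from scipy import stats

a, b = 2.0, 2.0
post_x = stats.beta(a + 70, b + 30)  # x: 70 heads out of 100
post_y = stats.beta(a + 69, b + 31)  # y: 69 heads, Hamming distance 1

theta = np.linspace(0.001, 0.999, 9999)
log_ratio = post_x.logpdf(theta) - post_y.logpdf(theta)

# A pointwise density-ratio bound implies the event-wise bound
# xi(B | x) <= e^eps * xi(B | y) with delta = 0.
eps = np.abs(log_ratio).max()
print("effective epsilon on this grid:", eps)
# For the unrestricted Bernoulli family this sup diverges as theta -> 0 or 1,
# which is exactly what the assumptions below rule out.
```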

Sufficient conditions

Assumption 1 (F is Lipschitz). For a given ρ on S, there is an L > 0 such that, for all θ ∈ Θ,

|ln (P_θ(x) / P_θ(y))| ≤ L ρ(x, y),   for all x, y ∈ S.   (1)

[Figures: the likelihoods of two datasets x and y as functions of θ, on linear and logarithmic scales.]
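A sketch of how Assumption 1 can be checked numerically for the coin model; the truncation Θ = [0.1, 0.9] and the choice ρ = Hamming distance are illustrative assumptions:

```python
import numpy as np

# Truncated Bernoulli family: Theta = [0.1, 0.9], rho = Hamming distance.
# Changing one flip changes the log-likelihood by +/- ln(theta / (1 - theta)),
# so L = sup over Theta of |ln(theta / (1 - theta))| = ln 9 here.
theta = np.linspace(0.1, 0.9, 1001)
L = np.abs(np.log(theta / (1 - theta))).max()
print("L =", L)  # ln 9 ~ 2.197, attained at the endpoints

# Empirical check on a pair of neighbouring datasets:
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=100)
y = x.copy()
y[0] ^= 1  # flip one coin, so rho(x, y) = 1

def loglik(th, data):
    return data.sum() * np.log(th) + (len(data) - data.sum()) * np.log(1 - th)

gap = np.abs(loglik(theta, x) - loglik(theta, y)).max()
print("sup_theta |ln P_theta(x) - ln P_theta(y)| =", gap, "<= L * rho =", L)
```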

Stochastic Lipschitz condition

Assumption 2 (The prior is concentrated on the nice parts of F). Let Θ_L be the set of L-Lipschitz parameters. Then there is a c > 0 such that

ξ(Θ_L) ≥ 1 − exp(−cL),   for all L.   (2)

[Figures: the likelihoods of two datasets x and y, and the prior, as functions of θ (log scale).]

Robustness of the posterior distribution

Definition (KL divergence). D(P ‖ Q) ≜ ∫ ln (dP/dQ) dP.   (3)

Theorem.
(i) Under Assumption 1, D(ξ(· | x) ‖ ξ(· | y)) ≤ 2 L ρ(x, y).   (4)
(ii) Under Assumption 2, D(ξ(· | x) ‖ ξ(· | y)) ≤ (κ_{C_ξ} / c) ρ(x, y).   (5)
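A numerical sanity check of bound (4) in the truncated coin model; the Beta(2, 2) prior, the truncation to [0.1, 0.9], and the neighbouring 70/69-heads datasets are assumptions carried over from the earlier sketches:

```python
import numpy as np
from scipy import integrate

lo, hi = 0.1, 0.9           # truncated Theta, so Assumption 1 holds
L = np.log(hi / (1 - hi))   # Lipschitz constant ln 9 for rho = Hamming
a, b = 2.0, 2.0             # Beta(2, 2) prior restricted to [lo, hi]

def posterior_pdf(heads, tails):
    """Normalised Beta-Bernoulli posterior density on [lo, hi]."""
    unnorm = lambda t: t**(a + heads - 1) * (1 - t)**(b + tails - 1)
    Z, _ = integrate.quad(unnorm, lo, hi)
    return lambda t: unnorm(t) / Z

p = posterior_pdf(70, 30)   # posterior for x
q = posterior_pdf(69, 31)   # posterior for a neighbour y, rho(x, y) = 1

kl, _ = integrate.quad(lambda t: p(t) * np.log(p(t) / q(t)), lo, hi)
print("D(xi(.|x) || xi(.|y)) =", kl, "<= 2 * L * rho =", 2 * L)
```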

Differential privacy of the posterior

Theorem.
1. Under Assumption 1, for all B ∈ S_Θ,

ξ(B | x) ≤ e^{2 L ρ(x, y)} ξ(B | y),   (6)

i.e. the posterior is (2L, 0)-DP under ρ.

2. Under Assumption 2, for all x, y ∈ S and all B ∈ S_Θ,

|ξ(B | x) − ξ(B | y)| ≤ √(κ_{C_ξ} ρ(x, y) / (2c)),

i.e. the posterior is (0, √(κ_{C_ξ} / (2c)))-DP under ρ.
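Repeating the earlier density-ratio check, now for the truncated model where Assumption 1 holds with L = ln 9, shows the finite (2L, 0)-DP guarantee of claim (6); this is a sketch under the same illustrative assumptions as before:

```python
import numpy as np
from scipy import integrate

lo, hi, a, b = 0.1, 0.9, 2.0, 2.0
L = np.log(hi / (1 - hi))  # ln 9

def posterior_pdf(heads, tails):
    unnorm = lambda t: t**(a + heads - 1) * (1 - t)**(b + tails - 1)
    Z, _ = integrate.quad(unnorm, lo, hi)
    return lambda t: unnorm(t) / Z

p = posterior_pdf(70, 30)  # xi(. | x)
q = posterior_pdf(69, 31)  # xi(. | y), rho(x, y) = 1

theta = np.linspace(lo, hi, 1001)
eps = np.abs(np.log(p(theta) / q(theta))).max()
print("effective epsilon:", eps, "<= 2 * L =", 2 * L)  # the (2L, 0)-DP claim
```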

Posterior sampling query model

We select a prior ξ. We observe data x. We calculate a posterior ξ(· | x). An adversary has sampling-based access to the posterior.

Interactive access model. At time t, the adversary observes a sample from the posterior: θ_t ∼ ξ(θ | x). More generally, A can select a question q : Θ → R, where R is a response space, and observes r_t = q(θ_t).

Non-interactive access model. We simply release n samples from the posterior: θ_t ∼ ξ(θ | x), t ∈ {1, ..., n}.
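A minimal sketch of the two access models for the illustrative Beta-Bernoulli posterior; the example queries are assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
posterior = stats.beta(2 + 70, 2 + 30)  # xi(theta | x) from the coin example

# Non-interactive access: release n posterior samples.
n = 10
released = posterior.rvs(size=n, random_state=rng)
print("released samples:", released)

# Interactive access: each round, answer a question q : Theta -> R
# evaluated on a fresh posterior sample theta_t.
def answer(q):
    theta_t = posterior.rvs(random_state=rng)
    return q(theta_t)

print(answer(lambda th: th > 0.5))      # "is the coin biased towards heads?"
print(answer(lambda th: round(th, 1)))  # a coarse point estimate
```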

Other differentially private mechanisms

Laplace mechanism: add noise to the responses to queries, r = q(x) + ω, with ω ∼ Laplace(λ).
Exponential mechanism: define a utility function u(x, r) and respond with probability p(r) ∝ e^{ε u(x, r)} µ(r).
Subsampling: perform inference on a random sample of the data.
Random projections / compressed sensing: perform a random linear projection.
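For contrast with posterior sampling, a sketch of the Laplace mechanism for a counting query of sensitivity 1 (the example data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(x, query, sensitivity, eps):
    """Return query(x) plus Laplace(sensitivity / eps) noise (eps-DP)."""
    return query(x) + rng.laplace(scale=sensitivity / eps)

x = np.array([1, 0, 1, 1, 0])     # e.g. per-patient treatment outcomes
count = lambda data: data.sum()   # counting query: sensitivity 1
print(laplace_mechanism(x, count, sensitivity=1.0, eps=0.5))
```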

Exponential mechanism

Define a utility function u(x, r) and respond with r with probability p(r) ∝ e^{ε u(x, r)} µ(r).

Connection to the posterior mechanism: responses are parameters θ. Take u(x, θ) = log P_θ(x) and µ(θ) = ξ(θ). Then, with ε = 1, p(θ) = ξ(θ | x). Rather than tuning ε, we can tune the prior ξ and the number of samples n.
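The connection can be verified directly on a grid: with u(x, θ) = log P_θ(x), µ = ξ and ε = 1, the exponential mechanism's output distribution coincides with the posterior. A sketch, assuming a uniform grid prior for illustration:

```python
import numpy as np

theta = np.linspace(0.01, 0.99, 99)                   # discretised grid
log_mu = np.log(np.full_like(theta, 1 / len(theta)))  # base measure mu = xi
heads, tails = 70, 30
log_lik = heads * np.log(theta) + tails * np.log(1 - theta)  # u = log P_theta(x)

def normalise(log_w):
    w = np.exp(log_w - log_w.max())  # stabilised softmax
    return w / w.sum()

p_mech = normalise(1.0 * log_lik + log_mu)  # exponential mechanism, eps = 1
p_bayes = normalise(log_lik + log_mu)       # Bayes' rule on the same grid
print(np.allclose(p_mech, p_bayes))         # True: the two coincide
```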

Reduction of privacy to a testing problem

How many samples n should we release? Different datasets x, y give different posteriors ξ(· | x), ξ(· | y). How many samples are needed to differentiate them?

Theorem. Under Assumption 1, the adversary can distinguish between data x, y with probability 1 − δ if

ρ(x, y) ≥ (3 / (4 L n)) ln(1/δ).   (7)

Under Assumption 2, this becomes

ρ(x, y) ≥ (3 c / (2 κ_{C_ξ} n)) ln(1/δ).   (8)
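Read as a design rule, (7) suggests keeping n below the point where the distinguishing condition kicks in. A sketch of that calculation, with illustrative constants; note that avoiding the sufficient condition for an attack is necessary for privacy, not by itself a proof of it:

```python
import numpy as np

def max_safe_n(L, rho, delta):
    """Largest n strictly below the threshold in (7),
    n >= 3 ln(1/delta) / (4 L rho), at which the adversary's
    1 - delta distinguishing test is guaranteed to succeed."""
    threshold = 3.0 * np.log(1.0 / delta) / (4.0 * L * rho)
    return max(0, int(np.ceil(threshold)) - 1)

# L = ln 9 from the truncated coin model, protecting neighbouring
# datasets (rho = 1) against a 95%-confident test.
print(max_safe_n(L=np.log(9), rho=1.0, delta=0.05))
```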

Conclusion

Bayesian inference is inherently robust and private. Privacy is achieved by posterior sampling [Dimitrakakis et al.]. It suffices to select the right prior. But how? By construction, for certain cases [Zhang et al.]. If we know the constants L, c, C_ξ, we can tune n. But similar problems exist more generally for DP mechanisms [Dwork].

References

C. Dimitrakakis, B. Nelson, A. Mitrokotsa, B. Rubinstein. Robust and Private Bayesian Inference. ALT 2014.
Z. Zhang, B. Rubinstein, C. Dimitrakakis. Hide-And-Peek: Differential Privacy in Probabilistic Graphical Models. Under review.
C. Dwork. Differential Privacy. ICALP 2006.

When do these assumptions hold?

Example (Exponential families). Family F with sufficient statistic T:

p_θ(x) = h(x) exp( η_θ⊤ T(x) − A(η_θ) ).

For a given θ ∈ Θ, we want to test whether

|ln(h(x)/h(y)) + η_θ⊤ (T(x) − T(y))| ≤ L ρ(x, y),   for all x, y ∈ X.

Example (Exponential distribution). An exponential prior with parameter c > 0 satisfies Assumption 2.

Example (Discrete Bayesian networks).

|ln (P_θ(x) / P_θ(y))| ≤ ln(1/ε) ρ(x, y),

where ρ(x, y) is the number of edges and nodes influenced by the differences between x and y, and ε is the smallest P_θ-mass placed on any event.

The Le Cam method for lower bounds

Idea: use S as the parameter space of an estimator. Consider the family of posterior measures

Ξ ≜ { ξ(· | x) : x ∈ S }.   (9)

Lemma (Le Cam's method). Let ψ(θ) be an estimator of x. Let S_1, S_2 be 2δ-separated and say x ∈ S_i ⇒ ξ(· | x) ∈ Ξ_i ⊆ Ξ. Then

sup_{x ∈ S} E_ξ(ρ(ψ, x) | x) ≥ δ sup_{ξ_i ∈ co(Ξ_i)} ‖ξ_1 ∧ ξ_2‖.   (10)

Expected distance between the real and guessed data:

E_ξ(ρ(ψ, x) | x) = ∫_Θ ρ(ψ(θ), x) dξ(θ | x).