Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies

Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies
Florian Tramèr, Zhicong Huang, Erman Ayday, Jean-Pierre Hubaux
ACM CCS 2015, Denver, Colorado, USA. October 15, 2015

Outline
- Data Privacy and Membership Disclosure
  - Differential Privacy
  - Positive Membership Privacy
  - Prior-Belief Families and Equivalence between DP and PMP
- Bounded Priors
  - Modeling Adversaries with Limited Background Knowledge
  - Example: Inference Attacks for Genome-Wide Association Studies
- Evaluation
  - Perturbation Mechanisms for GWAS
  - Trading Privacy, Medical Utility and Cost

Differential Privacy [1,2]
A mechanism A provides ε-DP iff for any datasets T₁ and T₂ differing in a single element, and any S ⊆ range(A), we have:
    Pr[A(T₁) ∈ S] ≤ e^ε · Pr[A(T₂) ∈ S]
[Diagram: unbounded DP (T₂ is T₁ with one element added or removed, i.e. belonging to the dataset vs. not belonging to it) vs. bounded DP (T₂ is T₁ with one element replaced).]
[1] Dwork. Differential privacy. Automata, Languages and Programming. 2006
[2] Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC '06. 2006
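To make the definition concrete, here is a minimal sketch (not from the talk) of the Laplace mechanism of reference [2], which achieves ε-DP for a numeric query by adding noise calibrated to the query's sensitivity; the counting query used below is an illustrative assumption.

    import numpy as np

    def laplace_mechanism(value, sensitivity, epsilon, rng=None):
        """Release a numeric query result under epsilon-DP by adding
        Laplace noise with scale sensitivity/epsilon."""
        rng = rng or np.random.default_rng()
        return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: a counting query ("how many patients are in the case group?")
    # changes by at most 1 when one record is added or removed, so its
    # sensitivity is 1.
    print(laplace_mechanism(value=1000, sensitivity=1.0, epsilon=np.log(1.5)))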

Positive Membership Privacy [1]
- Data privacy: protection against membership disclosure
- The adversary should not learn whether an entity from a universe U = {t₁, t₂, …} belongs to the dataset T
[Diagram: an adversary observing A(T) goes from "Is t in T?" to "I am now sure t is in T!"]
- Privacy: posterior belief ≈ prior belief for all entities
- Impossible in general! (no free lunch)
[1] Li et al. Membership privacy: a unifying framework for privacy definitions. CCS '13. 2013

Prior-Belief Families [1]
- Adversary's prior belief: a distribution D over 2^U
- Range of adversaries: a distribution family 𝔻
- A mechanism A satisfies (ε, 𝔻)-PMP iff for any S ⊆ range(A), any prior distribution D ∈ 𝔻, and any entity t ∈ U, we have:
    Pr[t ∈ T | A(T) ∈ S] ≤ e^ε · Pr[t ∈ T]
    Pr[t ∉ T | A(T) ∈ S] ≥ e^(−ε) · Pr[t ∉ T]
[1] Li et al. Membership privacy: a unifying framework for privacy definitions. CCS '13. 2013
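The posterior in these inequalities follows from Bayes' rule given the mechanism's output likelihoods. The following sketch (my own illustration, with made-up likelihood values) checks both inequalities for a single entity:

    import math

    def pmp_holds(prior, lik_in, lik_out, epsilon):
        """Check the two (epsilon, D)-PMP inequalities for one entity t.

        prior   = Pr[t in T]
        lik_in  = Pr[A(T) in S | t in T]
        lik_out = Pr[A(T) in S | t not in T]
        """
        evidence = lik_in * prior + lik_out * (1 - prior)
        post_in = lik_in * prior / evidence          # Pr[t in T | A(T) in S]
        post_out = lik_out * (1 - prior) / evidence  # Pr[t not in T | A(T) in S]
        return (post_in <= math.exp(epsilon) * prior
                and post_out >= math.exp(-epsilon) * (1 - prior))

    # A likelihood ratio of 2 (what a ln(2)-DP mechanism allows) against a
    # uniform prior of 1/2 satisfies the bounds for epsilon = ln(1.5).
    print(pmp_holds(prior=0.5, lik_in=0.6, lik_out=0.3, epsilon=math.log(1.5)))

This already previews the quantitative gap exploited later in the talk: a likelihood ratio bounded by 2 suffices for ε = ln(1.5) when the prior is ½.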

PMP ⇔ DP [1]
- Mutually independent distributions:
  - D ∈ 𝔻_I: each entity t is in T independently with probability p_t
  - D ∈ 𝔻_B: same as above, conditioned on |T| = k for some k (the adversary also knows the size of the dataset T)
- Theorem:
  - unbounded ε-DP ⇔ (ε, 𝔻_I)-PMP
  - bounded ε-DP ⇔ (ε, 𝔻_B)-PMP
- We focus on bounded DP (the results also hold for the unbounded case)
[1] Li et al. Membership privacy: a unifying framework for privacy definitions. CCS '13. 2013

Outline
- Data Privacy and Membership Disclosure
  - Differential Privacy
  - Positive Membership Privacy
  - Prior-Belief Families and Equivalence between DP and PMP
- Bounded Priors
  - Modeling Adversaries with Limited Background Knowledge
  - Example: Inference Attacks for Genome-Wide Association Studies
- Evaluation
  - Perturbation Mechanisms for GWAS
  - Trading Privacy, Medical Utility and Cost

Bounded Priors
- Observation: 𝔻_B includes adversarial priors with arbitrarily high certainty about all entities:
    Pr[t ∈ T] ∈ {0, 1} for all t ≠ t′ ∈ U, and Pr[t′ ∈ T] ∈ (0, 1)
- Do we care about such strong adversaries?
  - All entities except t′ have no privacy a priori (w.r.t. membership in T)
  - The membership status of t′ can also be known with high certainty: membership is extremely rare / extremely likely, or the adversary has strong background knowledge
- How do we model an adversary with limited a priori knowledge?

Bounded Priors
- We consider adversaries with the following priors:
  - Entities are independent (the size of the dataset is possibly known)
  - Pr[t ∈ T] ∈ {0, 1} for some entities: the adversary might know the membership status of some entities
  - a ≤ Pr[t ∈ T] ≤ b for the other entities, where a > 0 and b < 1: for an unknown entity, the membership status is uncertain a priori
- Denoted 𝔻_B^[a,b] (or 𝔻_B^a if a = b)
- Questions:
  - Is the model relevant in practice?
  - What utility can we gain by considering a relaxed adversarial setting?

Bounded Priors in Practice: Example
- Genome-Wide Association Studies:
  - Case-control study (typically N_case = N_ctrl)
  - Membership in the case group ⇒ the patient has some disease
  - Find out which genetic variations (SNPs) are associated with the disease
  - Ex: χ² test for each SNP (low p-value ⇒ conclude the SNP is probably associated); see the sketch after this slide
- Re-identification attacks [1,2]:
  - Collect published aggregate statistics for the case/control groups
  - Use a victim's DNA sample and statistical testing to distinguish between:
    H₀: the victim is not in the case group
    H₁: the victim is in the case group (the victim has the disease)
- Assumptions (some implicit):
  - N_case and N_ctrl are known (usually published)
  - Entities are independent
  - Prior: Pr[t ∈ T] = N_case / (N_case + N_ctrl), typically ½ in attack evaluations
- Attacks are taken seriously! (some statistics were removed from open databases) [3]
[1] Homer et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics. 2008
[2] Wang et al. Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study. CCS '09. 2009
[3] Zerhouni and Nabel. Protecting aggregate genomic data. Science. 2008
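For reference, the per-SNP test mentioned above can be run on a 2×3 contingency table of genotype counts for cases vs. controls. A minimal sketch using scipy, where the table values and the significance threshold are illustrative assumptions:

    from scipy.stats import chi2_contingency

    # Rows: case group, control group; columns: genotype counts (AA, Aa, aa).
    table = [[240, 510, 250],    # cases    (N_case = 1000)
             [300, 500, 200]]    # controls (N_ctrl = 1000)

    chi2, p_value, dof, _ = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
    # A GWAS would flag SNPs whose p-value falls below a genome-wide
    # significance threshold (commonly cited as 5e-8).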

Achieving PMP for Bounded Priors
- Recall: bounded ε-DP ⇔ (ε, 𝔻_B)-PMP
- (ε, 𝔻_B)-PMP:
    Pr[t ∈ T | A(T) ∈ S] ≤ e^ε · Pr[t ∈ T]
    Pr[t ∉ T | A(T) ∈ S] ≥ e^(−ε) · Pr[t ∉ T]
- These inequalities are tight iff Pr[t ∈ T] ∈ {0, 1}
- For bounded priors (Pr[t ∈ T] ∈ [a, b]) we have:
    ε-DP ⇒ (ε′, 𝔻_B^[a,b])-PMP, where ε′ < ε
- The perturbation required to achieve ε-PMP depends on [a, b]
- Minimal perturbation is required when a = b = ½
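The slides state the implication in the direction "ε-DP gives (ε′, 𝔻_B^[a,b])-PMP with ε′ < ε"; for calibration one usually solves the converse question: how large a DP budget still suffices for a target PMP level? The sketch below is my own rearrangement of the two PMP inequalities under Bayes' rule (worst case taken at the endpoints of [a, b]), not the paper's exact theorem statement; it does reproduce the talk's numbers.

    import math

    def sufficient_dp_epsilon(eps_pmp, a, b):
        """A DP budget eps' such that eps'-DP implies (eps_pmp, D_B^[a,b])-PMP.

        From Bayes' rule, with the likelihood ratio bounded by e^{eps'}:
          - the positive inequality is hardest at prior a:
              e^{eps'} <= e^{eps} (1 - a) / (1 - a e^{eps}), if a e^{eps} < 1
          - the negative inequality is hardest at prior b:
              e^{eps'} <= 1 + (e^{eps} - 1) / b
        """
        e = math.exp(eps_pmp)
        pos = e * (1 - a) / (1 - a * e) if a * e < 1 else math.inf
        neg = 1 + (e - 1) / b
        return math.log(min(pos, neg))

    # For eps_pmp = ln(1.5) and a = b = 1/2 this returns ln(2) ~ 0.693,
    # i.e. strictly less perturbation than plain ln(1.5)-DP would need.
    print(sufficient_dp_epsilon(math.log(1.5), 0.5, 0.5))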

Privacy-Utility Tradeoff
If we consider bounded adversaries with priors in 𝔻_B^[a,b] instead of adversaries with priors in 𝔻_B:
- Are we still protecting against relevant threats? (e.g., the attacks proposed on GWAS)
- Can we gain in utility? Less data perturbation is required; the actual gain remains to be evaluated.

Outline
- Data Privacy and Membership Disclosure
  - Differential Privacy
  - Positive Membership Privacy
  - Prior-Belief Families and Equivalence between DP and PMP
- Bounded Priors
  - Modeling Adversaries with Limited Background Knowledge
  - Example: Inference Attacks for Genome-Wide Association Studies
- Evaluation
  - Perturbation Mechanisms for GWAS
  - Trading Privacy, Medical Utility and Cost

Evaluation
- Statistical privacy for GWAS:
  - Laplace / exponential mechanisms based on χ² scores [1,2]; see the sketch after this slide
  - Exponential mechanism with a specialized distance metric [3]
- Tradeoffs:
  1. Privacy: mitigate inference attacks
  2. Output utility: the associated SNPs should be output
  3. Dataset size: privacy and cost depend on the number of patients
- What we want to achieve:
  1. ε-PMP for the adversarial setting of Homer et al. and Wang et al., compared to an unbounded adversary
  2. High probability of outputting the correct SNPs
  3. Also for small studies (N ≈ 2000) [4]
[1] Uhler, Slavkovic, and Fienberg. Privacy-Preserving Data Sharing for Genome-Wide Association Studies. Journal of Privacy and Confidentiality. 2013
[2] Yu et al. Scalable privacy-preserving data sharing methodology for genome-wide association studies. Journal of Biomedical Informatics. 2014
[3] Johnson and Shmatikov. Privacy-preserving Data Exploration in Genome-wide Association Studies. KDD '13. 2013
[4] Spencer et al. Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genetics. 2009
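As a rough illustration of the first family of mechanisms, here is a sketch of the exponential mechanism selecting one SNP with probability weighted by its χ² score. This is my own generic rendering, not the exact mechanism of [1,2]: in particular, the sensitivity of the χ² statistic depends on the study design and is derived in those papers, so it is left as a parameter.

    import numpy as np

    def exponential_mechanism(scores, epsilon, sensitivity, rng=None):
        """Select one index with probability proportional to
        exp(epsilon * score / (2 * sensitivity))."""
        rng = rng or np.random.default_rng()
        logits = epsilon * np.asarray(scores, dtype=float) / (2 * sensitivity)
        logits -= logits.max()          # shift for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    # To release the top k SNPs, a common approach is to draw k times
    # without replacement, splitting the budget (epsilon / k per draw).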

Evaluation
- GWAS simulation with 8532 SNPs, 2 associated SNPs
- Variable sample size N (N_case = N_ctrl)
- Satisfy PMP for ε = ln(1.5):
  - Mechanism A₁ protects against an adversary with unbounded prior 𝔻_B ⇒ A₁ must satisfy ε-DP
  - Mechanism A₂ protects against an adversary with bounded prior 𝔻_B^½ ⇒ it is sufficient for A₂ to satisfy ε′-DP for ε′ = ln(2)
- Results for the exponential mechanism from [1]:
[Plot: probability (0 to 1) that the mechanism returned 1 / both of the 2 associated SNPs, as a function of sample size (500 to 2500), for both A₁ and A₂.]
[1] Johnson and Shmatikov. Privacy-preserving Data Exploration in Genome-wide Association Studies. KDD '13. 2013

Conclusion
- Membership privacy is easier to guarantee against adversaries with bounded priors ⇒ less perturbation ⇒ higher utility
  - For GWAS: a better tradeoff between dataset size and utility of the output
- We can tailor privacy mechanisms to specific attacks/threats
  - Can we make reasonable assumptions about the adversary's prior beliefs? For GWAS: known attacks implicitly rely on such assumptions
  - Compute the appropriate level of noise to guarantee bounds on the adversary's posterior beliefs
- Future work:
  - Can we build stronger inference attacks on GWAS? Infer rare membership (disease status is typically rare in a population) [1]; known attacks are less successful when the prior Pr[t ∈ T] is very small
  - Direct comparison: attack success rate vs. data perturbation (utility) [2]
  - Promote a practice-oriented study of statistical privacy
[1] Sankararaman et al. Genomic privacy and limits of individual detection in a pool. Nature Genetics. 2009
[2] Fredrikson et al. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. Proceedings of USENIX Security. 2014