Pufferfish Privacy Mechanisms for Correlated Data
Shuang Song, Yizhen Wang, Kamalika Chaudhuri. University of California, San Diego


Sensitive Data with Correlation
- Data from social networks, e.g. the spreading of flu
- Time series data, e.g. exercise per week

Why Does Correlation Make Privacy Protection Hard?
- D = (X_1, ..., X_n), where X_i = 1(person i has flu)
- Goal:
  - Publish (approximately) the number of people with flu
  - Prevent anyone from learning whether a specific person has flu
- Correlation makes privacy protection hard:
  - Knowing information about your social network → inferring information about you

Previous Solution 1
- Differential Privacy: hide the effect of a single individual
- Number of infected people + noise with std ~1
- Assumes the best case: everyone is independent
- But there can be large connected groups, and the disease can be highly contagious
- Not enough protection!

Dwork, C., McSherry, F., Nissim, K., & Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, 2006.
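As a concrete reference point, here is a minimal sketch (our own illustration, not from the talk) of the standard Laplace mechanism for the count query: a count changes by at most 1 when one person's bit flips, so Laplace noise with scale 1/ε (std ~1 for constant ε) suffices under ordinary DP.

```python
import numpy as np

def dp_count(x, epsilon, rng=np.random.default_rng()):
    """Laplace mechanism for a count query under standard DP.

    A count has sensitivity 1 (one flipped bit changes it by at most 1),
    so Laplace noise with scale 1/epsilon gives epsilon-DP.
    """
    return np.sum(x) + rng.laplace(scale=1.0 / epsilon)

# x[i] = 1 if person i has flu
x = np.array([1, 0, 1, 1, 0])
print(dp_count(x, epsilon=1.0))
```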

Previous Solution 2
- Group-Differential Privacy: hide the effect of the entire group
- Number of infected people + noise with std ~group size
- Assumes the worst case: complete correlation within the group
- Group size can be large
- Poor utility!

The Hope for a Better Solution
- DP: everyone is independent → not enough protection!
- Group-DP: everyone is completely correlated → not enough utility!
- Observation: most real problems have low average correlation

Pufferfish Privacy: a framework that takes correlation into consideration

Kifer, D., & Machanavajjhala, A. A rigorous and customizable framework for privacy. PODS 2012.

3 Components of Pufferfish Privacy:
1. Secrets S: the set of information that needs to be protected
   - e.g. "Alice has flu", "Bob was sleeping at 10am"
2. Secret pairs Q ⊆ S × S: the set of pairs of secrets that need to be indistinguishable
   - e.g. (Alice has flu, Alice is healthy), (Bob was sleeping at 10am, Bob was exercising at 10am)
3. Θ: a set of data distributions that plausibly generate the data (this is where correlation is captured)
   - e.g. "flu is passed with probability 0.2", "Bob exercises only if he gets up before 8am"
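To make the three components concrete, a minimal sketch (our own illustration; every name and parameter below is hypothetical) of how one might encode them for the flu example:

```python
# Hypothetical encoding of the three Pufferfish components for the
# flu example; the contact graph and transmission rates are made up.

n = 5  # number of people

# 1. Secrets S: "person i is healthy" (0) / "person i has flu" (1)
S = [(i, v) for i in range(n) for v in (0, 1)]

# 2. Secret pairs Q: each person's two states must be indistinguishable
Q = [((i, 0), (i, 1)) for i in range(n)]

# 3. Theta: plausible generating distributions, e.g. flu spreads along
#    edges of a known contact graph with transmission probability p
Theta = [{"contact_graph": [(0, 1), (1, 2), (3, 4)], "p_transmit": p}
         for p in (0.1, 0.2, 0.3)]
```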

ε-Pufferfish Privacy
- A privacy mechanism M is ε-Pufferfish private with Pufferfish parameters (S, Q, Θ) if for all w ∈ Range(M), all (s_i, s_j) ∈ Q, and all θ ∈ Θ with X ~ θ,

  e^{-ε} ≤ P(M(X) = w | s_i, θ) / P(M(X) = w | s_j, θ) ≤ e^{ε}

  whenever P(s_i | θ) > 0 and P(s_j | θ) > 0.
- ε measures the privacy level: smaller ε → more privacy.

[Figure: the two output distributions P(M(X) = w | s_i, θ) and P(M(X) = w | s_j, θ) plotted over w]
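A quick numeric check (ours, under an assumed worst case of two perfectly correlated people) of why DP noise alone can violate this guarantee: conditioning on Alice's secret shifts the true count by 2, so the Laplace(1/ε) output ratio can reach e^{2ε} > e^{ε}.

```python
import numpy as np

def laplace_pdf(w, mu, scale):
    return np.exp(-abs(w - mu) / scale) / (2 * scale)

eps = 1.0
scale = 1.0 / eps  # DP noise scale for a count (sensitivity 1)

# Two perfectly correlated people: if Alice has flu, so does Bob.
# The count is 2 given "Alice has flu" and 0 given "Alice is healthy".
w = 2.0
ratio = laplace_pdf(w, 2, scale) / laplace_pdf(w, 0, scale)
print(ratio, np.exp(2 * eps))  # ratio reaches e^{2*eps}, exceeding e^{eps}
```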

Pufferfish Privacy
- Allows correlation
- Generalizes differential privacy
  - DP is a special case where Θ contains all independent distributions

How to achieve Pufferfish privacy?

Algorithms for Pufferfish Privacy
- Algorithms for special Pufferfish instantiations: [KM12, HMD12]
- Our contribution:
  - Wasserstein Mechanism (completely general)
  - Markov Quilt Mechanism (Bayesian networks)
- Concurrent work: [GK17]

[HMD12] He, X., Machanavajjhala, A., & Ding, B. Blowfish privacy: Tuning privacy-utility trade-offs using policies. SIGMOD 2014.
[GK17] Ghosh, A., & Kleinberg, R. Inferential privacy guarantees for differentially private mechanisms. ITCS 2017.

Outline
- Pufferfish privacy definition
- Our algorithms:
  - Wasserstein Mechanism (please come to our poster for details :))
  - Markov Quilt Mechanism
- Experimental results

Markov Quilt Mechanism (MQM)
- Bayesian network:
  - X = {X_1, ..., X_n} + DAG G = (X, E)
  - E represents parent-child relationships

MQM on a Markov Chain

X_1 → X_2 → ... → X_{i-1} → X_i → ... → X_n

- Θ: Markov chains over {X_1, ..., X_n}, with X_i ∈ [K]
- S = {X_i = s : i ∈ [n], s ∈ [K]}
- Q = {(X_i = s, X_i = t) : i ∈ [n], s, t ∈ [K], s ≠ t}
- f: 1-Lipschitz function, e.g. a count (extends to any Lipschitz function)

Markov Quilt Mechanism Intuition
- Remote nodes X_R: almost independent of X_i
- Neighboring nodes X_N: closely related to X_i

[Figure: chain X_1, X_2, ..., X_{i-a}, X_{i-1}, X_i, X_{i+b}, ..., X_n; the influence of X_i decays in both directions, separating the remote X_R from the nearby X_N]

Markov Quilt Mechanism Intuition
- Remote X_R: use a small correction factor (almost independent of X_i)
- Neighbor X_N: treat like complete correlation (closely related to X_i)

Three questions:
1. How do we formalize separation?
2. What is the small correction factor for X_R?
3. How much noise do we add for X_N?

1. Formalizing Separation
- Markov Quilt X_Q for X_i:
  - Deleting X_Q breaks the graph into X_N and X_R, with X_i ∈ X_N
  - Given X_Q, X_i and X_R are independent

[Figure: chain with X_Q separating X_R from X_N]

2. Small Correction Factor for the Remote Nodes
- Max-influence of X_i on any set of variables X_A under Θ:

  e(X_A | X_i) = max_{s, t ∈ [K], x_A, θ ∈ Θ} log [ P(X_A = x_A | X_i = s, θ) / P(X_A = x_A | X_i = t, θ) ]

- Lower max-influence → less correlated
- Correction term for X_R ∪ X_Q: e(X_R ∪ X_Q | X_i) = e(X_Q | X_i), since X_R is independent of X_i given X_Q
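For a stationary Markov chain, P(X_{i+d} | X_i) is just the d-step transition matrix, so the max-influence on a single node at distance d can be computed by brute force. A minimal sketch (ours, assuming a known transition matrix P whose d-step entries are all positive); for a quilt {X_{i-a}, X_{i+b}} on a chain, the two sides are conditionally independent given X_i, so their max-influences add.

```python
import numpy as np

def max_influence_at_distance(P, d):
    """Max-influence e(X_{i+d} | X_i) for a stationary Markov chain
    with transition matrix P:
        max_{s,t,x} log P(X_{i+d}=x | X_i=s) / P(X_{i+d}=x | X_i=t).
    Brute force over state pairs; assumes all entries of P^d > 0.
    """
    log_pd = np.log(np.linalg.matrix_power(P, d))
    # for each target state x, spread between the best and worst source state
    return np.max(log_pd.max(axis=0) - log_pd.min(axis=0))

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
for d in (1, 2, 5, 10):
    print(d, max_influence_at_distance(P, d))  # decays as the chain mixes
```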

3. Noise for the Neighboring Nodes
- Budget left: ε − e(X_Q | X_i)
- Treat X_N as the worst case:
  - Group-DP: X_N changes as a group
  - A 1-Lipschitz function changes by at most |X_N|
- Final std of noise: |X_N| / (ε − e(X_Q | X_i))

Putting Everything Together
- For each X_i, over all Markov Quilts X_Q of X_i:

  noise(X_i) = min_{X_Q} |X_N| / (ε − e(X_Q | X_i))

- noise(D) = max_{i ∈ [n]} noise(X_i)
- Output f(D) + noise with std ~noise(D)
- Theorem: MQM guarantees ε-Pufferfish privacy.
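Below is a minimal sketch (our own brute-force simplification, closer in spirit to MQM-Exact than to the authors' released code) of this procedure for a 1-Lipschitz query on a stationary Markov chain. It searches over quilts of the form {X_{i-a}, X_{i+b}} and falls back to the trivial empty quilt (plain group-DP over the whole chain) when no budget remains.

```python
import numpy as np

def max_influence(P, d):
    """e(X_{i+d} | X_i) for a stationary chain; assumes (P^d) > 0."""
    log_pd = np.log(np.linalg.matrix_power(P, d))
    return np.max(log_pd.max(axis=0) - log_pd.min(axis=0))

def mqm_noise_scale(P, n, eps, max_width=30):
    """Laplace scale for MQM on a 1-Lipschitz f over a stationary
    chain X_1..X_n with transition matrix P.

    Candidate quilts are {X_{i-a}, X_{i+b}}; X_N is the segment
    between them, and the two sides contribute additively to
    e(X_Q | X_i) since they are conditionally independent given X_i.
    """
    worst = 0.0
    for i in range(1, n + 1):
        best = n / eps  # trivial quilt: X_N is the whole chain (group-DP)
        for a in range(1, max_width):
            for b in range(1, max_width):
                # a side that runs off the chain boundary costs nothing
                e_q = (max_influence(P, a) if i - a >= 1 else 0.0) \
                    + (max_influence(P, b) if i + b <= n else 0.0)
                if e_q >= eps:
                    continue  # no budget left for the neighboring nodes
                size_N = min(i + b - 1, n) - max(i - a + 1, 1) + 1
                best = min(best, size_N / (eps - e_q))
        worst = max(worst, best)
    return worst

P = np.array([[0.8, 0.2], [0.3, 0.7]])
print(mqm_noise_scale(P, n=100, eps=1.0))  # far below group-DP's n/eps = 100
```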

Outline
- Pufferfish privacy definition
- Our algorithms:
  - Wasserstein Mechanism
  - Markov Quilt Mechanism
- Experimental results

Experiments: Privacy-Utility Trade-off
- Data: Markov chains
- Query function f: histogram of states
- Compare four algorithms:
  - Group-DP
  - [GK17]
  - MQM-Approx
  - MQM-Exact

On Physical Activity Data
- 36 overweight subjects, 16 older subjects, 40 cyclists
- One reading roughly every 12 s, over ~7 days during daytime (7 chains each, of length ~3000)
- 4 states: active, standing still, standing moving, sedentary
- Q: {(activity a at time t, activity b at time t)}
- Θ: obtained from the data
- Aggregate histograms over each group of subjects

Data from Research in Environment, Active Aging & Community Health (REACH), Department of Family Medicine & Public Health, UC San Diego.

[Figure: relative frequency of each state (active, stand still, stand moving, sedentary) under Group-DP, MQM-Approx, and MQM-Exact, for the Older, Overweight, and Cyclist groups]

- ε = 1
- Of the four algorithms (MQM-Exact, MQM-Approx, Group-DP, [GK17]), [GK17] does not apply here.

On Electricity Power Data
- Electricity power meter readings (Watts) from a single household in the greater Vancouver area, BC, Canada
- One reading every minute, over ~2 years (1 chain of length ~1,000,000)
- Readings discretized into 200 W bins spanning 200 W to 10,000 W (51 states)
- Q: {(power level a at time t, power level b at time t)}
- Θ: obtained from the data

Makonin, S., Ellert, B., Bajić, I. V., & Popowich, F. (2016). Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014. Scientific Data, 3.

Results (ε = 1), normalized ℓ1 error:
- MQM-Exact: 0.019
- MQM-Approx: 0.061
- Group-DP: 1
- [GK17]: does not apply

Running Time
Running time in seconds:

             Overweight   Older    Cyclist   Power Data
MQM-Approx   0.0028       0.0060   0.0064    0.0567
MQM-Exact    0.6299       1.2786   1.5186    282.2273

- Use MQM-Approx when:
  - the state space is large
  - there is enough data to mitigate the effect of the approximation

Summary
- Privacy definition: the Pufferfish privacy framework
- Our contributions:
  - Wasserstein Mechanism for general Pufferfish instantiations
  - Markov Quilt Mechanism for Bayesian networks
- Experimental results

Questions?

Wasserstein Mechanism
- How sensitive is the query function with respect to a secret pair (s_i, s_j)?
- Compare P(f(X) | s_i, θ) vs. P(f(X) | s_j, θ)
- We need the right distance measure between distributions: the Wasserstein distance

∞-Wasserstein Distance
- For two random variables X ~ μ, Y ~ ν:

  W_∞(μ, ν) = inf_{γ ∈ Γ(μ, ν)} max_{(x, y) ∈ supp(γ)} |x − y|

  - Γ(μ, ν): all possible joint distributions (couplings) of X and Y
  - supp(γ): the support of γ ∈ Γ(μ, ν)
- Intuition: the minimum effort needed to convert μ to ν, measured by the largest distance any probability mass must move

[Figure: example of two distributions with W_∞(μ, ν) = 1]

Wasserstein Mechanism
- Given any query function f, for any secret pair (s_i, s_j) ∈ Q and any θ ∈ Θ:

  μ_{i,θ} = P(f(X) | s_i, θ),  μ_{j,θ} = P(f(X) | s_j, θ)
  W_{i,j,θ} = W_∞(μ_{i,θ}, μ_{j,θ})
  W = sup_{(s_i, s_j) ∈ Q, θ ∈ Θ} W_{i,j,θ}
  M(D) = f(D) + Lap(W/ε)

- Theorem: the Wasserstein Mechanism guarantees ε-Pufferfish privacy.
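A minimal sketch (ours) of this recipe for discrete, real-valued f(X): in one dimension the monotone (quantile) coupling attains W_∞, which makes the distance easy to compute.

```python
import numpy as np

def w_inf(xs, mu, ys, nu):
    """W_infinity between discrete distribution mu on sorted points xs
    and nu on sorted points ys. In one dimension the monotone (quantile)
    coupling is optimal, so W_inf is the largest displacement
    |F_mu^{-1}(u) - F_nu^{-1}(u)| over quantile levels u."""
    cmu, cnu = np.cumsum(mu), np.cumsum(nu)
    us = np.unique(np.concatenate([cmu, cnu]))  # all quantile breakpoints
    us = us[us > 1e-12]                         # drop the degenerate level u = 0
    qmu = np.asarray(xs)[np.searchsorted(cmu, us - 1e-12)]
    qnu = np.asarray(ys)[np.searchsorted(cnu, us - 1e-12)]
    return np.max(np.abs(qmu - qnu))

def wasserstein_mechanism(f_value, pairs, eps, rng=np.random.default_rng()):
    """pairs holds ((xs, mu), (ys, nu)) for P(f(X) | s_i, theta) and
    P(f(X) | s_j, theta), over every secret pair in Q and theta in Theta."""
    W = max(w_inf(xs, mu, ys, nu) for (xs, mu), (ys, nu) in pairs)
    return f_value + rng.laplace(scale=W / eps)
```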

Always Better than Group-DP
- N: number of infected people

                 N = 0   N = 1   N = 2   N = 3   N = 4
P(N | X_i = 0)   0.2     0.225   0.5     0.075   0
P(N | X_i = 1)   0       0.075   0.5     0.225   0.2

- Wasserstein Mechanism: noise with std ~2
- Group-DP: noise with std ~4
- The Wasserstein Mechanism is never worse than Group-DP
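Plugging this table into the w_inf sketch above reproduces the two noise scales: under the monotone coupling no probability mass needs to move by more than 2, whereas Group-DP must cover the full range of N, which is 4.

```python
xs = [0, 1, 2, 3, 4]
mu = [0.2, 0.225, 0.5, 0.075, 0.0]  # P(N | X_i = 0)
nu = [0.0, 0.075, 0.5, 0.225, 0.2]  # P(N | X_i = 1)
print(w_inf(xs, mu, xs, nu))  # -> 2.0, vs. group sensitivity 4
```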