Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained Empirical Risk Minimization

Katrina Ligett (HUJI & Caltech), joint work with Seth Neel, Aaron Roth, Bo Waggoner, and Steven Wu. NIPS 2017.

Many of today's most interesting computations are on individuals' data.

Many times, we want to do this subject to privacy: legal/regulatory requirements, avoiding subpoena, public perception, brand identity.

source: http://edge.alluremedia.com.au/m/l/2015/03/appleprivacy.jpg

Sometimes, privacy requirements are clear and binding (legal/regulatory requirements, avoiding subpoena, public perception, brand identity). In that case there is a well-studied optimization problem: meet the privacy requirements while minimally impacting accuracy. But often the requirements are not so clear.

This paper: how do we find the best privacy we can give, subject to accuracy constraints?

Today: formalizing privacy (differential privacy); (private) empirical risk minimization (ERM); Accuracy-First private ERM.

Privacy: what to promise? Access to the output should not enable one to learn much more about an individual than could be learned via the same analysis omitting that individual from the database.

What to promise? Consider a database and a statistic computed from it:

name           DOB      sex  weight  smoker  lung cancer
John Doe       12/1/51  M    185     Y       N
Jane Smith     3/3/46   F    140     N       N
Ellen Jones    4/24/59  F    160     Y       Y
Jennifer Kim   3/1/70   F    135     N       N
Rachel Waters  9/5/43   F    140     N       N

lung cancer rate: 18%

Think of the output as randomized: instead of always reporting 18%, the mechanism reports a draw from a distribution over values (e.g., 5, 8, 18, 24, 32).

The promise: if you leave the database, no outcome will change in probability by very much.

Statistical database model: X is the set of possible entries/rows, with one row per person; a database D is a set of rows, D ∈ X* (as in the table above).

Analyst objective: compute on D ∈ X* (fit a model, compute a statistic, share sanitized data) while preserving the privacy of individuals. Design a randomized algorithm A mapping D into an outcome space that masks small changes in D.

Neighboring databases: what's a small change? Require nearly identical behavior on neighboring databases, differing by the addition or removal of a single row: ||D − D′||₁ ≤ 1 for D, D′ ∈ X*.

Differential privacy [DinurNissim03, DworkNissimMcSherrySmith06, Dwork06]: A randomized algorithm A : X* → O is ε-differentially private if for every pair of neighboring data sets D, D′ and for every event S ⊆ O: Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S]. Note that e^ε ≈ 1 + ε for small ε.

Differential privacy, illustrated on the example database: Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S]. Whether or not any one individual's row is present, the distribution over possible outputs (e.g., 16, 17, 18, 19, 20) changes by at most a factor of e^ε.

Differential privacy: Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S]. [C. Dwork]

(ε,δ)-differential privacy: Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S] + δ. [C. Dwork]

Post-processing: any subsequent computation on the result of a DP computation maintains the privacy guarantee.

Composition [DworkKenthapadiMcSherryMironovNaor06, DworkLei09]: if we run multiple DP algorithms, the εs and δs add up. This allows simple privacy analysis of complex algorithms, and holds even if subsequent computations are chosen as a function of previous results. More subtle analyses give even better guarantees.

Laplace mechanism: for numeric computations, direct noise addition of a particular form and magnitude preserves differential privacy.
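As a concrete illustration (not from the slides), here is a minimal sketch of the Laplace mechanism for a query of known sensitivity; the function name and the example query are hypothetical:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=np.random.default_rng()):
    """Release true_value plus Laplace noise of scale sensitivity/epsilon.

    This satisfies epsilon-differential privacy for a query whose value
    changes by at most `sensitivity` when one row is added or removed.
    """
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: privately release a count (sensitivity 1) with epsilon = 0.5.
noisy_count = laplace_mechanism(true_value=18, sensitivity=1.0, epsilon=0.5)
```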

Today: formalizing privacy (differential privacy); (private) empirical risk minimization (ERM); Accuracy-First private ERM.

What do companies do with our private data? They learn to predict what you'll type, what ads you'll click on, and what you'll buy. That is, they extrapolate from lots of data a rule that maps individual behavior into a specific outcome.

Empirical Risk Minimization (ERM). Setting: there are true labels for points, drawn from an underlying distribution. The learner has access to a training set and picks a hypothesis (a function mapping points to labels) from among a given set, so as to minimize mistakes on the training set.
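For concreteness (this is not part of the talk), a minimal ERM sketch: regularized logistic regression fit by gradient descent on a toy training set. All names and hyperparameters are illustrative.

```python
import numpy as np

def fit_logistic_erm(X, y, reg=0.1, lr=0.5, steps=500):
    """ERM: minimize average logistic loss plus an L2 penalty."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ theta)                    # labels y are in {-1, +1}
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0) + reg * theta
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200))
theta_hat = fit_logistic_erm(X, y)
```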

Prior work on private ERM: output and objective perturbation [Chaudhuri Monteleoni 2008, Chaudhuri Monteleoni Sarwate 2011, Kifer Smith Thakurta 2012, Rubinstein Bartlett Huang Taft 2009]; covariance perturbation [Smith Upadhyay Thakurta 2017]; exponential mechanism [Bassily Smith Thakurta 2014, McSherry Talwar 2007]; stochastic gradient descent [Bassily Smith Thakurta 2014, Duchi Jordan Wainwright 2013, Jain Kothari Thakurta 2012, Song Chaudhuri Sarwate 2013, Williams McSherry 2010].

Accuracy-First Private ERM, first idea: flip the theorem. Flip the utility theorem of an existing private ERM algorithm to solve for the smallest epsilon (and other parameters) consistent with the accuracy requirement, then run the existing algorithm with that epsilon. This was the only prior theoretically sound approach.
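A sketch of what "flipping the theorem" means in code, assuming a worst-case excess-risk bound of the hypothetical form alpha(eps) = C * d / (n * eps); both the form of the bound and the constant are illustrative, not the paper's.

```python
def smallest_epsilon_for_accuracy(alpha_target, n, d, C=10.0):
    """Invert a worst-case utility bound alpha(eps) = C * d / (n * eps)
    to find the smallest epsilon whose guarantee meets alpha_target."""
    return C * d / (n * alpha_target)

eps = smallest_epsilon_for_accuracy(alpha_target=0.05, n=100_000, d=20)
# Then run the existing private ERM algorithm once, with this fixed epsilon.
```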

Problem with flipping the theorem: utility theorems are worst-case (algorithms often provide much better accuracy/privacy than they promise), the constants are sloppy, and the specific dataset may allow a much better privacy/utility tradeoff.

Accuracy-First Private ERM, second idea: search for the best possible epsilon. Try values of epsilon until we find one that satisfies the accuracy constraint. Problems: the search is data-dependent, so it pays for every attempt; and it is not a priori clear how to bound the privacy loss under the usual notion, since the search could run a long time (forever?) and the selected privacy parameter is a function of the data.

Today: formalizing privacy (differential privacy); (private) empirical risk minimization (ERM); Accuracy-First private ERM.

Accuracy-First Private ERM, this paper: a principled version of epsilon search. We give a meta-method for this search, applicable to several classes of private learning algorithms.

High-level approach (a schematic sketch follows below):
- Initially, compute a very private hypothesis.
- Degrade the privacy guarantee by doubling epsilon until the accuracy guarantee is met.
- To avoid paying extra, use correlated noise across rounds, so the noise from the previous round can be subtracted to get the next hypothesis.
- Use a new algorithm to minimize the cost of checking whether the accuracy guarantee is met.
- Pay only the privacy cost of the final hypothesis (earlier attempts are free), plus the cost of checking.
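The following is a schematic sketch of that loop (not the paper's actual algorithm). It assumes a `noise_reduction_sequence` generator that yields increasingly accurate, correlated hypotheses and a differentially private `accuracy_check`; both helpers are hypothetical placeholders.

```python
def accuracy_first_erm(data, alpha_target, eps_schedule,
                       noise_reduction_sequence, accuracy_check):
    """Try hypotheses at increasing epsilon (decreasing noise) until one
    privately passes the accuracy check; return it with the epsilon spent.

    noise_reduction_sequence(data, eps_schedule) is assumed to yield
    hypotheses whose noise is correlated across rounds, so that stopping
    at round k costs only eps_schedule[k], not the sum over all attempts.
    """
    for k, hypothesis in enumerate(noise_reduction_sequence(data, eps_schedule)):
        if accuracy_check(data, hypothesis, alpha_target):  # itself a DP test
            return hypothesis, eps_schedule[k]
    return None, None  # accuracy target not reached within the schedule

# Example schedule: epsilon doubles from a very private starting point.
eps_schedule = [0.01 * (2 ** t) for t in range(10)]
```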

Ex post privacy. The algorithm doesn't satisfy a priori ε-DP for any fixed ε, but if it terminates after k rounds, it seems to satisfy a bounded ex post privacy loss (cf. privacy odometers [Rogers Roth Ullman Vadhan 2016]).

Ex post privacy. Definition: the ex-post privacy loss of a randomized algorithm A : X* → O on outcome o is the maximum over pairs of neighboring data sets D, D′ of log(Pr[A(D) = o] / Pr[A(D′) = o]). Definition: consider a function E : O → (ℝ≥0 ∪ {∞}) on the outcome of algorithm A : X* → O. Given outcome o = A(D), we say A satisfies E(o) ex-post differential privacy if for all o ∈ O, Loss(o) ≤ E(o).

Correlated noise, key idea [Koufogiannis Han Pappas 2017]. Construct a continuous random walk starting at the private data v, such that the marginal distribution at each point in time is Laplace centered at v, with variance increasing over time. More private points can then be derived from less private ones by running the process in reverse.
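A one-dimensional sketch of this noise-reduction idea (an illustrative reconstruction, not the paper's multi-dimensional implementation). It assumes the standard Markov kernel for Laplace noise reduction: the least noisy value is sampled first and noisier values are derived from it, so that the marginal at privacy level eps is Laplace(1/eps) centered at v.

```python
import numpy as np

def correlated_laplace_releases(v, eps_schedule, rng=np.random.default_rng()):
    """Generate correlated noisy copies of v at privacy parameters
    eps_schedule[0] < eps_schedule[1] < ... (noisiest copy revealed first).

    Walking from eps_hi down to eps_lo < eps_hi: keep the same value with
    probability (eps_lo / eps_hi)**2, otherwise add fresh Laplace(1/eps_lo)
    noise. Each release is marginally Laplace(1/eps) centered at v.
    """
    eps = list(eps_schedule)
    releases = [None] * len(eps)
    releases[-1] = v + rng.laplace(scale=1.0 / eps[-1])      # least noisy copy
    for i in range(len(eps) - 2, -1, -1):                     # derive noisier copies
        if rng.random() < (eps[i] / eps[i + 1]) ** 2:
            releases[i] = releases[i + 1]
        else:
            releases[i] = releases[i + 1] + rng.laplace(scale=1.0 / eps[i])
    return releases  # reveal in order: releases[0], releases[1], ...
```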

Checking algorithm, key ideas. The existing algorithm AboveThreshold takes a dataset and a sequence of (adaptively chosen) queries, and privately outputs the first query to exceed a given threshold, paying much less than the composition of all the queries. For us, the queries themselves depend on the data, so naively we would need to publish (and pay for) them all.
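For reference, a minimal sketch of the standard AboveThreshold mechanism for 1-sensitive queries (the textbook version, not the paper's InteractiveAboveThreshold variant):

```python
import numpy as np

def above_threshold(data, queries, threshold, epsilon, rng=np.random.default_rng()):
    """Return the index of the first query whose noisy value exceeds a noisy
    threshold; satisfies epsilon-DP for a stream of 1-sensitive queries."""
    noisy_threshold = threshold + rng.laplace(scale=2.0 / epsilon)
    for i, q in enumerate(queries):
        if q(data) + rng.laplace(scale=4.0 / epsilon) >= noisy_threshold:
            return i
    return None  # no query exceeded the threshold
```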

Applicable algorithms. Our approach applies to any ERM technique that can be described as a post-processing of a Laplace mechanism, e.g., output perturbation (add Laplace noise to the ERM solution) and covariance perturbation (perturb the covariance matrix of the data, then optimize using the noisy statistics).
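A toy sketch of output perturbation for ridge regression. The noise scale is left as an assumed input: calibrating it to the solution's sensitivity and the target epsilon is exactly what the cited output-perturbation papers work out, and is omitted here.

```python
import numpy as np

def output_perturbation_ridge(X, y, reg, noise_scale, rng=np.random.default_rng()):
    """Solve ridge regression exactly, then add Laplace noise to each
    coordinate of the solution (noise_scale must be calibrated separately)."""
    n, d = X.shape
    theta = np.linalg.solve(X.T @ X / n + reg * np.eye(d), X.T @ y / n)
    return theta + rng.laplace(scale=noise_scale, size=d)
```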

Empirical results, summary: our approach massively outperforms inverting the theory curve, and also improves on a baseline epsilon-doubling approach.

Future directions. Empirically, the privacy loss from testing hypotheses is significantly larger than the loss from generating them; is the analysis loose? (For example, we currently use a theoretical bound on the maximum norm of any hypothesis to compute the sensitivity of the queries.) Another direction: InteractiveAboveThreshold for (ε,δ)-differential privacy.

Theorem 3.2. The instantiation of CovNR(D, {ε₁, ..., ε_T}, α, γ) outputs a hypothesis θ̂ that with probability 1 − γ satisfies L(θ̂) − L(θ*) ≤ α. Moreover, it is E-ex-post differentially private, where the privacy loss function E : (([T] ∪ {⊥}) × ℝ^p) → (ℝ≥0 ∪ {∞}) is defined as E((k, ·)) = ε₀ + ε_k for any k ≠ ⊥, E((⊥, ·)) = ∞, and ε₀ = 16(√(1/λ) + 1)² log(2T/γ) / (αn) is the privacy loss incurred by IAT.

Theorem 3.4. The instantiation of OutputNR(D, ε₀, {ε₁, ..., ε_T}, α, γ) is E-ex-post differentially private and outputs a hypothesis θ̂ that with probability 1 − γ satisfies L(θ̂) − L(θ*) ≤ α, where the privacy loss function E : (([T] ∪ {⊥}) × ℝ^p) → (ℝ≥0 ∪ {∞}) is defined as E((k, ·)) = ε₀ + ε_k for any k ≠ ⊥, E((⊥, ·)) = ∞, and ε₀ ≤ 32 log(2T/γ) √(2 log(2)/λ) / (αn) is the privacy loss incurred by IAT.

Theorem A.1. For any sequence of 1-sensitive queries f₁, ..., f_T, InteractiveAboveThreshold is (α, γ)-accurate for α = 8(log(T) + log(2/γ)) / ε.

Figure 2: Empirical accuracies. Panels: (a) linear (ridge) regression, (b) regularized logistic regression, (c) linear (ridge) regression, (d) regularized logistic regression. The dashed line shows the requested accuracy level, while the others plot the actual accuracy achieved. Most likely due to a pessimistic analysis and the need to set a small testing threshold, accuracies are significantly better than requested for both methods.

Figure 3: Privacy breakdowns. Panels: (a) linear (ridge) regression, (b) regularized logistic regression. Shows the amount of empirical privacy loss due to computing the hypotheses themselves and the losses due to testing their accuracies.

Figure 4: L2 norms of final hypotheses. Panels: (a) linear (ridge) regression, (b) regularized logistic regression. Shows the average L2 norm of the output θ̂ for each method, versus the theoretical maximum of 1/√λ in the case of ridge regression and √(2 log(2)/λ) in the case of regularized logistic regression.