What Can We Learn Privately?


What Can We Learn Privately?
Sofya Raskhodnikova (Penn State University)
Joint work with Shiva Kasiviswanathan (Los Alamos), Homin Lee (UT Austin), Kobbi Nissim (Ben Gurion), and Adam Smith (Penn State)
To appear in the SICOMP special issue for FOCS 08

Private Learning
Goal: machine learning algorithms that protect the privacy of individual examples (people, organizations, ...).
Desiderata:
- Privacy: worst-case guarantee (differential privacy)
- Learning: distributional guarantee (e.g., PAC learning)
This work:
- Characterize the classification problems learnable privately
- Understand the power of popular models for private analysis

What Can We Compute Privately?
Prior work:
- Function evaluation [DiNi, DwNi, BDMN, EGS, DMNS, ...]
- Statistical Query (SQ) learning [Blum Dwork McSherry Nissim 05]
- Learning mixtures of Gaussians [Nissim Raskhodnikova Smith 08]
- Mechanism design [McSherry Talwar 07]
This work: PAC learning (in general, not captured by function evaluation).
Some subsequent work:
- Learning [Chaudhuri Monteleoni 08, McSherry Williams 09, Beimel Kasiviswanathan Nissim 10, Sarwate Chaudhuri Monteleoni]
- Statistical inference [Smith 08, Dwork Lei 09, Wasserman Zhou 09]
- Synthetic data [Machanavajjhala Kifer Abowd Gehrke Vilhuber 08, Blum Ligett Roth 08, Dwork Naor Reingold Rothblum Vadhan 09, Roth Roughgarden 10]
- Combinatorial optimization [Gupta Ligett McSherry Roth Talwar 10]
[Figure: a user asks the curator A "Tell me f(x)"; A returns f(x) + noise.]

Our Results 1: What Is Learnable Privately
[Venn diagram: SQ ⊆ PAC*; halfplanes and conjunctions are in SQ; parity is in PAC* but not in SQ; "privately PAC-learnable" coincides with "PAC-learnable".]
PAC* = Private PAC*
(PAC* = PAC learnable with polynomially many samples, not necessarily efficiently.)

Basic Privacy Models
[Diagrams: three models — local noninteractive, local (interactive), and centralized. Each participant x_i applies a local randomizer R_i; in the centralized model the raw data goes to a trusted curator A.]
Most work in data mining uses the local model: randomized response, input perturbation, Post Randomization Method (PRAM), Framework for High-Accuracy Strict-Privacy Preserving Mining (FRAPP) [W65, AS00, AA01, EGS03, HH02, MS06].
Advantages:
- private data never leaves a person's hands
- easy distribution of extracted information (e.g., CD, website)

Our Results 2: Power of Private Models
- Local noninteractive = nonadaptive SQ
- Local = SQ
- Centralized = PAC* (privately PAC-learnable)
[Diagram: masked parity separates the local noninteractive model from the local model; parity separates the local model from the centralized model.]
(PAC* = PAC learnable with polynomially many samples, not necessarily efficiently.)

Definition: Differential Privacy [DMNS06]
[Diagram: neighboring databases x = (x_1, ..., x_n) and x′, differing in one entry, are fed to algorithm A, producing A(x) and A(x′).]
Intuition: users learn roughly the same thing about me whether or not my data is in the database.
A randomized algorithm A is ε-differentially private if for all databases x, x′ that differ in one element, and for all sets of answers S:
Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x′) ∈ S]
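
As an illustrative sketch (not from the talk), the definition can be checked analytically for the classic one-bit randomized-response mechanism, which keeps a bit with probability 3/4 and flips it otherwise; the variable names below are ours:

```python
import math

# Randomized response on one bit: keep with probability 3/4, flip otherwise.
p_keep = 0.75

# Neighboring databases: x = (1) vs x' = (0). For the output set S = {1}:
pr_x = p_keep          # Pr[A(x) = 1]
pr_xp = 1 - p_keep     # Pr[A(x') = 1]

# The worst-case likelihood ratio over all outputs is 3, so the mechanism
# is eps-differentially private with eps = ln 3.
epsilon = math.log(pr_x / pr_xp)
assert pr_x <= math.exp(epsilon) * pr_xp
```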

Properties of Differential Privacy
- Composition: if algorithms A1 and A2 are ε-differentially private, then the algorithm that outputs (A1(x), A2(x)) is 2ε-differentially private.
- Meaningful in the presence of arbitrary external information.
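
A quick numeric check of the composition property, reusing the hypothetical randomized-response mechanism from above (our example, not the talk's):

```python
import math

p = 0.75                      # randomized response keeps a bit w.p. 3/4
eps = math.log(p / (1 - p))   # each single run is eps-DP with eps = ln 3

# Outputting both of two independent runs: likelihood ratios multiply,
# so the worst case over joint outputs is (e^eps)^2 = e^(2*eps).
worst_joint = (p / (1 - p)) ** 2
assert worst_joint <= math.exp(2 * eps) + 1e-9
```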

Learning: An Example*
A bank needs to decide which applicants are bad credit risks.
Goal: given a sample of past customers (labeled examples), produce a good prediction rule (hypothesis) for future loan applicants. Each row below is an example y_i with label z_i:

% down | Recent delinquency? | High debt? | Mmp/inc | Good Risk? (label z_i)
10     | No                  | No         | 0.32    | Yes
10     | No                  | Yes        | 0.25    | Yes
5      | Yes                 | No         | 0.30    | No
20     | No                  | No         | 0.31    | Yes
5      | No                  | No         | 0.32    | No
10     | Yes                 | Yes        | 0.38    | No

Reasonable hypotheses given this data:
- Predict YES iff (no recent delinquency) AND (% down > 5)
- Predict YES iff 100·(Mmp/inc) − (% down) < 25
*Example taken from Blum, FOCS03 tutorial.

PAC Learning: The Setting
The algorithm draws independent examples from some distribution P, labeled by some target function c.

PAC Learning: The Setting
The algorithm outputs a hypothesis h (a function from points to labels).

PAC Learning: The Setting
A new point is drawn from P. Hypothesis h is good if it mostly agrees with the target c:
Pr_{y~P}[h(y) ≠ c(y)] ≤ α   (accuracy)
Require that h is good with probability at least 1 − β   (confidence).

PAC* Learning
Definition [Valiant 84]. A concept class C is a set of functions {c : D → {0,1}} together with their representation.
Definition. Algorithm A PAC* learns concept class C if, for all c in C, all distributions P, and all α, β in (0, 1/2):
- given poly(1/α, 1/β, size(c)) examples drawn from P and labeled by some c in C,
- A outputs a good hypothesis (of accuracy α) of poly length, with probability ≥ 1 − β, in poly time.

Private Learning
Input: a database x = (x_1, x_2, ..., x_n), where x_i = (y_i, z_i), y_i ~ P, and z_i = c(y_i) (z_i is the label of example y_i).
Output: a hypothesis — e.g., "Predict YES iff 100·(Mmp/inc) − (% down) < 25".
Algorithm A privately PAC learns concept class C if:
- Utility: A PAC learns concept class C (an average-case guarantee)
- Privacy: A is ε-differentially private (a worst-case guarantee)

How Can We Design Private Learners?
Previous privacy work focused on function approximation.
First attempt: view the non-private learner as a function to be approximated.
Problem: a close hypothesis may mislabel many points.

PAC* = Private PAC*
Theorem. Every PAC*-learnable concept class can be learned privately, using a poly number of samples.
Proof: Adapt the exponential mechanism [MT07]:
- score(x, h) = number of examples in x correctly classified by hypothesis h
- output hypothesis h from C with probability proportional to e^{ε·score(x,h)}
(This may take exponential time.)
Privacy: for any hypothesis h,
Pr[h is output on input x] / Pr[h is output on input x′]
= (e^{ε·score(x,h)} / Σ_{h′} e^{ε·score(x,h′)}) · (Σ_{h′} e^{ε·score(x′,h′)} / e^{ε·score(x′,h)}) ≤ e^{2ε},
since changing one example changes every score by at most 1 (e.g., from score(x,h) = 3 to score(x′,h) = 4).

PAC* = Private PAC* (a private version of Occam's razor)
Theorem. Every PAC*-learnable concept class can be learned privately, using a poly number of samples.
Proof:
- score(x, h) = number of examples in x correctly classified by h
- output hypothesis h from C with probability proportional to e^{ε·score(x,h)}
Utility (learning): the best hypothesis correctly labels all examples, so its weight is e^{εn}. Bad hypotheses mislabel more than an α fraction of examples, so each has weight at most e^{ε(1−α)n}. Hence
Pr[output h is bad] ≤ (# bad hypotheses) · e^{ε(1−α)n} / e^{εn} ≤ |C| · e^{−εαn} ≤ β.
It suffices to ensure n ≥ (ln|C| + ln(1/β)) / (εα). Then, with probability ≥ 1 − β, the output h labels a 1 − α fraction of the examples correctly.
Occam's razor: if n ≥ (ln|C| + ln(1/β)) / α, then a hypothesis that does well on the examples also does well on the distribution P.
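
The mechanism in the proof can be sketched in a few lines. The function name and the tiny threshold concept class below are our own illustration; sampling h with probability proportional to e^{ε·score(x,h)} is done by a weighted draw, which is feasible only when C is small enough to enumerate:

```python
import math
import random

def private_learner(examples, hypotheses, eps, rng=random.Random(0)):
    """Exponential mechanism [MT07]: output h with probability proportional
    to exp(eps * score(x, h)), where score counts correctly labeled examples."""
    scores = [sum(1 for y, z in examples if h(y) == z) for h in hypotheses]
    m = max(scores)                                  # shift for numerical stability
    weights = [math.exp(eps * (s - m)) for s in scores]
    r = rng.random() * sum(weights)
    for h, w in zip(hypotheses, weights):
        r -= w
        if r <= 0:
            return h
    return hypotheses[-1]

# Toy concept class of thresholds over {0,...,9}: h_t(y) = 1 iff y >= t.
hypotheses = [lambda y, t=t: int(y >= t) for t in range(11)]
examples = [(y, int(y >= 4)) for y in range(10)]     # target threshold t = 4
h = private_learner(examples, hypotheses, eps=2.0)
```

With this ε and sample, the fully consistent hypothesis carries almost all of the weight, matching the utility argument above.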

Our Results: What Is Learnable Privately
[Venn diagram: SQ [BDMN05] ⊆ PAC*; halfplanes and conjunctions are in SQ; parity is in PAC* but not in SQ; "privately PAC-learnable" coincides with "PAC-learnable".]
PAC* = Private PAC*
Note: parity with noise is thought to be hard, yet parity is privately PAC-learnable even with noise.

Efficient Learner for Parity
Parity problems:
- Domain: D = {0,1}^d
- Concepts: c_r(y) = <r, y> mod 2
Input: x = ((y_1, c_r(y_1)), ..., (y_n, c_r(y_n))).
Each example (y_i, c_r(y_i)) is a linear constraint on r; e.g., (1101, 1) translates to r_1 + r_2 + r_4 = 1 (mod 2).
Non-private learning algorithm: find r by solving the set of linear equations over GF(2) imposed by the input x.
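
The non-private learner is just Gaussian elimination over GF(2). A minimal sketch (function name and example system are ours):

```python
def solve_parity(examples, d):
    """Recover r from examples (y, <r,y> mod 2) by Gaussian elimination
    over GF(2). Each y is a list of d bits. Returns one consistent r
    (free coordinates set to 0), or None if the system is inconsistent."""
    pivot_rows = []                      # echelon form: (lead column, row, label)
    for y, z in examples:
        y = y[:]                         # don't mutate the caller's example
        for col, py, pz in pivot_rows:   # reduce by existing pivot rows
            if y[col]:
                y = [a ^ b for a, b in zip(y, py)]
                z ^= pz
        lead = next((c for c in range(d) if y[c]), None)
        if lead is None:
            if z:                        # reduced to 0 = 1: inconsistent
                return None
            continue                     # dependent equation, nothing new
        pivot_rows.append((lead, y, z))
    r = [0] * d
    for col, y, z in sorted(pivot_rows, reverse=True):   # back-substitution
        r[col] = z ^ (sum(r[c] for c in range(col + 1, d) if y[c]) % 2)
    return r

# Examples consistent with r = 1011:
examples = [([1, 1, 0, 1], 0), ([0, 1, 1, 0], 1),
            ([1, 0, 1, 0], 0), ([0, 0, 1, 1], 0)]
r_hat = solve_parity(examples, 4)        # -> [1, 0, 1, 1]
```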

The Effect of a Single Example
Let V_i be the space of feasible solutions for the set of equations imposed by (y_1, c_r(y_1)), ..., (y_i, c_r(y_i)).
Add a fresh example (y_{i+1}, c_r(y_{i+1})) and consider the new solution space V_{i+1}. Then either |V_{i+1}| ≥ |V_i| / 2, or V_{i+1} = ∅ (the system becomes inconsistent).
The solution space changes drastically only when the non-private learner fails.
[Figure: a new constraint on two coordinates halves the solution space.]

Private Learner for Parity
Algorithm A:
1. With probability 1/2, output "fail". (This smooths out extreme jumps in |V_i|.)
2. Construct x^s by picking each example from x independently with probability ε.
3. Solve the system of equations imposed by the examples in x^s. Let V be the set of feasible solutions.
4. If V = ∅, output "fail". Otherwise, choose r from V uniformly at random and output c_r.
Lemma [utility]. Algorithm A PAC-learns parity with n = O((non-private sample size)/ε).
Proof idea: conditioned on passing step 1, we get the same utility as with εn examples. By repeating a few times, we pass step 1 w.h.p.
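
A self-contained toy sketch of Algorithm A (names are ours). It brute-forces the solution space V, which is feasible only for tiny d; the real algorithm uses Gaussian elimination. Setting ε = 1.0 keeps every example and is shown only to check correctness, not for privacy:

```python
import itertools
import random

def private_parity_learner(examples, d, eps, rng):
    """One run of Algorithm A: fail w.p. 1/2, subsample each example
    w.p. eps, then output a uniformly random consistent parity vector."""
    if rng.random() < 0.5:
        return None                                       # step 1: output fail
    sample = [e for e in examples if rng.random() < eps]  # step 2: subsample
    V = [r for r in itertools.product((0, 1), repeat=d)   # step 3: solve
         if all(sum(a * b for a, b in zip(r, y)) % 2 == z for y, z in sample)]
    if not V:
        return None                                       # step 4: inconsistent
    return list(rng.choice(V))                            # uniform over V

# Repeat until step 1 passes, as in the utility lemma.
rng = random.Random(7)
examples = [([1, 1, 0, 1], 0), ([0, 1, 1, 0], 1),
            ([1, 0, 1, 0], 0), ([0, 0, 1, 1], 0)]        # consistent with r = 1011
r = None
while r is None:
    r = private_parity_learner(examples, d=4, eps=1.0, rng=rng)
```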

Private Learner for Parity
Lemma. Algorithm A is 4ε-differentially private.
Proof: For inputs x and x′ that differ in position i, we show that for all outcomes the probability ratio is at most 1 + 4ε. The changed input x_i enters the sample S with probability ε.
The probability of "fail" goes up or down by at most ε/2:
Pr[A(x) fails] ≤ Pr[A(x′) fails] + ε/2 ≤ (1 + ε) · Pr[A(x′) fails], since Pr[A(x′) fails] ≥ 1/2.
For a hypothesis r (note that conditioned on x_i ∉ S, A behaves identically on x and x′):
Pr[A(x) = r] / Pr[A(x′) = r]
= (ε · Pr[A(x) = r | x_i ∈ S] + (1 − ε) · Pr[A(x) = r | x_i ∉ S]) / (ε · Pr[A(x′) = r | x_i ∈ S] + (1 − ε) · Pr[A(x′) = r | x_i ∉ S])
≤ 2ε/(1 − ε) + 1 ≤ 4ε + 1 for ε ≤ 1/2.
The second inequality uses Pr[A(x) = r | x_i ∈ S] ≤ 2 · Pr[A(x) = r | x_i ∉ S]; intuitively, this follows from |V_i| ≥ |V_{i−1}|/2.

Our Results: What Is Learnable Privately (recap)
[Venn diagram: SQ [BDMN05] ⊆ PAC*; halfplanes and conjunctions are in SQ; parity is in PAC* but not in SQ; "privately PAC-learnable" coincides with "PAC-learnable".]
PAC* = Private PAC*
Note: parity with noise is thought to be hard, yet parity is privately PAC-learnable even with noise.

Our Results 2: Power of Private Models
- Local noninteractive = nonadaptive SQ
- Local = SQ
- Centralized = PAC*
(PAC* = PAC learnable, ignoring computational efficiency.)

Reminder: Local Privacy-Preserving Protocols
[Diagram: each participant x_i applies a local randomizer R_i and sends the result to the user. In the interactive variant the user may query participants adaptively; in the non-interactive variant each participant sends a single message.]

Statistical Query (SQ) Learning [Kearns 93]
Same guarantees as the PAC model, but the algorithm no longer has access to individual examples. Instead, it sends the SQ oracle a query g : D × {0,1} → {0,1} with tolerance τ > 1/poly(...), and receives the probability that a random labeled example (drawn from P) satisfies g:
E_{y~P}[g(y, c(y))] ± τ
Requirements: g can be evaluated in poly time, and the algorithm has poly running time.
Theorem [BDMN05]. Any SQ algorithm can be simulated by a private algorithm.
Proof: [DMNS06] Perturb the query answers using Laplace noise.
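
A sketch of the [DMNS06]-style simulation (function names are ours): the empirical average answering one SQ has sensitivity 1/n, so Laplace noise of scale 1/(εn) makes the answer ε-differentially private.

```python
import math
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) by the inverse-CDF method."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def private_sq_answer(data, g, eps, rng):
    """Answer the statistical query E[g(y, c(y))] on labeled data, adding
    Laplace noise calibrated to the query's sensitivity 1/n."""
    n = len(data)
    true_answer = sum(g(y, z) for y, z in data) / n
    return true_answer + laplace(1 / (eps * n), rng)

rng = random.Random(0)
data = [(y, y % 2) for y in range(1000)]            # toy labeled sample
ans = private_sq_answer(data, lambda y, z: z, eps=0.5, rng=rng)
```

With n = 1000 and ε = 0.5, the noise scale is 0.002, so the answer stays close to the true average of 0.5.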

(Non-interactive) Local = (Non-adaptive) SQ
Theorem. Any (non-adaptive) SQ algorithm can be simulated by a (non-interactive) local algorithm.
Local protocol for an SQ query:
- For each i, participant i computes a noisy bit R(x_i) on their own.
- The sum of the noisy bits allows an approximation to the answer.
- R (applied by each participant) is differentially private.
- If all SQ queries are known in advance (non-adaptive), the protocol is non-interactive.
[Diagram: participants apply R locally and send the results through A to the user.]
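
A sketch of the local protocol for a single SQ (names are ours): each participant randomizes their bit g(x_i) with randomized response, and the user debiases the average of the noisy bits.

```python
import math
import random

def randomize_bit(bit, eps, rng):
    """Local randomizer R: keep the bit w.p. e^eps/(1+e^eps) and flip it
    otherwise -- eps-differentially private for each participant."""
    p_keep = math.exp(eps) / (1 + math.exp(eps))
    return bit if rng.random() < p_keep else 1 - bit

def estimate_sq_answer(noisy_bits, eps):
    """Debias: E[noisy average] = (2p-1)*m + (1-p); solve for the mean m."""
    p = math.exp(eps) / (1 + math.exp(eps))
    avg = sum(noisy_bits) / len(noisy_bits)
    return (avg - (1 - p)) / (2 * p - 1)

rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000                 # true mean 0.3
noisy = [randomize_bit(b, eps=1.0, rng=rng) for b in true_bits]
est = estimate_sq_answer(noisy, eps=1.0)            # close to 0.3
```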

(Non-interactive) Local = (Non-adaptive) SQ
Theorem. Any (non-interactive) local algorithm can be simulated by a (non-adaptive) SQ algorithm.
Technique: rejection sampling.
Proof idea [non-interactive case]: To simulate the randomizer R : D → W on entry z_i, we need to output w in W with probability p(w) = Pr_{z~P}[R(z) = w]. Let q(w) = Pr[R(0) = w]; by differential privacy of R, q approximates p up to a factor of e^ε.
1. Sample w from q(w).
2. With probability p(w) / (q(w) · e^ε), output w.
3. With the remaining probability, repeat from (1).
Use SQ queries to estimate p(w). Idea: p(w) = Pr_{z~P}[R(z) = w] = E_{z~P}[Pr[R(z) = w]].
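
The rejection-sampling step can be sketched as follows (names are ours; for concreteness, p and q are small explicit distributions satisfying p(w) ≤ e^ε · q(w), which is exactly what ε-differential privacy of R guarantees):

```python
import math
import random

def sample_from(dist, rng):
    """Draw one outcome from a finite distribution {outcome: probability}."""
    r = rng.random()
    for w, pr in dist.items():
        r -= pr
        if r <= 0:
            return w
    return next(reversed(dist))          # guard against rounding

def rejection_sample(p, q, eps, rng):
    """Output w with probability p(w), using only draws from q."""
    while True:
        w = sample_from(q, rng)                           # 1. sample from q
        if rng.random() < p[w] / (q[w] * math.exp(eps)):  # 2. accept
            return w                                      # 3. else repeat

rng = random.Random(0)
p = {"a": 0.7, "b": 0.3}
q = {"a": 0.5, "b": 0.5}             # here e^eps = 2 suffices: 0.7 <= 2 * 0.5
draws = [rejection_sample(p, q, math.log(2), rng) for _ in range(20000)]
frac_a = draws.count("a") / len(draws)                    # close to 0.7
```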

Our Results 2: Power of Private Models
- Local noninteractive = nonadaptive SQ
- Local = SQ
- Centralized = PAC*
[Diagram: masked parity separates the local noninteractive model from the local model; parity separates the local model from the centralized model.]
(PAC* = PAC learnable, ignoring computational efficiency.)

Non-interactive Local ⊊ Interactive Local
Masked parity problems. Concepts c_{r,a} : {0,1}^{d + log d + 1} → {+1, −1}, indexed by r ∈ {0,1}^d and a ∈ {0,1}:
c_{r,a}(y, i, b) = (−1)^{<r,y> mod 2 + a} if b = 0;  (−1)^{r_i} if b = 1.
An (adaptive) SQ learner needs only two rounds of communication, while a non-adaptive SQ learner needs 2^{d−1} samples.
The proof uses a Fourier-analytic argument, similar to the proof that parity is not in SQ.

Summary
- PAC* is privately learnable (with non-efficient learners).
- Known problems in PAC are efficiently privately learnable: parity and SQ [BDMN05]. What else is in PAC?
- Equivalence of the local model and SQ: Local = SQ, and Local non-interactive = non-adaptive SQ.
- Interactivity helps in the local model: Local non-interactive ⊊ Local, and non-adaptive SQ ⊊ SQ.
Open questions:
- Separate efficient learning from efficient private learning.
- Better private algorithms for SQ problems.
- Other learning models.