Differential Privacy in an RKHS


Differential Privacy in an RKHS Rob Hall (with Larry Wasserman and Alessandro Rinaldo) 2/20/2012 rjhall@cs.cmu.edu http://www.cs.cmu.edu/~rjhall 1

Overview Why care about privacy? Differential Privacy (a criterion for measuring privacy protection). Differentially private methods in finite dimensions (background material). Extending DP to function spaces; examples (new material). 2

Privacy Caricature 1. Agency has data about individuals.
Patient ID | Tobacco | Age | Weight | Heart Disease
0001 | Y | 36 | 170 | N
0002 | N | 26 | 150 | N
0003 | N | 45 | 165 | Y
2. They release some statistics (e.g., for researchers to examine). 3. Somebody combines this with some side-information and identifies some individuals in the database. Embarrassment to agency, individuals, etc. (at the very least). 3

Privacy Caricature 1. Agency has data about individuals.
Patient ID | Tobacco | Age | Weight | Heart Disease
0001 | Y | 36 | 170 | N
0002 | N | 26 | 150 | N
0003 | N | 45 | 165 | Y
2. They release some statistics (e.g., for researchers to examine). 3. Somebody combines this with some side-information and identifies some individuals in the database. Embarrassment to agency, individuals, etc. (at the very least). The goal of Privacy-Preserving Data Mining is to prevent identification, while maintaining utility in the release. 4

Privacy Caricature 1. Agency has data about individuals.
Patient ID | Tobacco | Age | Weight | Heart Disease
0001 | Y | 36 | 170 | N
0002 | N | 26 | 150 | N
0003 | N | 45 | 165 | Y
2. They release some statistics (e.g., for researchers to examine). 3. Somebody combines this with some side-information and identifies some individuals in the database. Embarrassment to agency, individuals, etc. (at the very least). The goal of Privacy-Preserving Data Mining is to prevent identification, while maintaining utility in the release. Our contribution is for when the released data is itself a function. 5

This is not a Fake Problem Seemingly innocuous data releases may lead to embarrassment, lawsuits, etc. 6

Background Consider algorithms which incorporate randomness: the private input data D (a table of individuals' records, e.g., Patient ID, Age, Weight) is mapped to an unreleased intermediate value, from which the output is released to the world. Such an algorithm is characterized by a family of probability measures indexed by D. 7

(α,β)-differential Privacy An algorithm is called (α,β)-differentially private when the defining inequality below holds for every pair of databases D and D' which differ by one element ("adjacent") and for every measurable set A in the output space. Remarks: The same holds when D and D' are reversed. It means that the distribution of the output doesn't change much when the input changes by one element. Sometimes called Approximate Differential Privacy (when β > 0). 8
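In standard notation, writing P_D for the distribution of the output when the input database is D, the defining inequality is:
\[
  P_D(A) \;\le\; e^{\alpha}\, P_{D'}(A) \;+\; \beta
  \qquad \text{for all measurable } A \text{ and all adjacent } D, D'.
\]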

Finite Dimensional Example For what Σ does this achieve (α,β)-DP? The private input data (the Patient ID, Age, Weight table) is mapped to an unreleased intermediate vector, to which Gaussian noise with covariance Σ is added before the output is released to the world. 9

Finite Dimensional Example For what Σ does this achieve (α,β)-DP?
Patient ID | Age | Weight
0001 | 36 | 170
0002 | 26 | 150
0003 | 45 | 165
Patient ID | Age | Weight
0001 | 36 | 170
0002 | 56 | 213
0003 | 45 | 165
Replacing one row. 10

Finite Dimensional Example For what Σ does this achieve (α,β)-DP?
Patient ID | Age | Weight
0001 | 36 | 170
0002 | 26 | 150
0003 | 45 | 165
Patient ID | Age | Weight
0001 | 36 | 170
0002 | 56 | 213
0003 | 45 | 165
Replacing one row. These two output distributions are close (in the DP sense) when Σ is appropriately scaled relative to the difference between the two unreleased vectors. 11

DP via Sensitivity Analysis For what Σ does this achieve (α,β)-DP? The key quantity is the Global Sensitivity: the largest distance between the unreleased vectors computed on any pair of adjacent databases. Remarks: Gaussian noise achieves DP for any family of vectors whose GS is appropriately bounded. If the GS is bounded by some C, then (α,β)-DP can be achieved via appropriate scaling of Σ. This typically requires the data to live in a compact set. Achieving (α,0)-DP requires Laplace noise, with sensitivity measured in the l1 distance. 12
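A minimal sketch of this finite-dimensional recipe, assuming the commonly used calibration σ = C·sqrt(2·log(2/β))/α for a statistic with global l2-sensitivity C (the exact constant in the talk's result may differ; the attribute ranges in the toy usage are made up for illustration):
```python
import numpy as np

def gaussian_mechanism(stat, alpha, beta, sensitivity, rng=None):
    """Release a vector statistic with (alpha, beta)-DP by adding isotropic
    Gaussian noise scaled to its global l2-sensitivity `sensitivity`, i.e.
    an upper bound on ||stat(D) - stat(D')||_2 over adjacent D, D'."""
    rng = np.random.default_rng() if rng is None else rng
    stat = np.asarray(stat, dtype=float)
    # Assumed calibration constant c(beta) = sqrt(2 log(2/beta)).
    sigma = sensitivity * np.sqrt(2.0 * np.log(2.0 / beta)) / alpha
    return stat + rng.normal(scale=sigma, size=stat.shape)

# Toy usage: privately release the column means of the Age/Weight table,
# assuming (hypothetically) Age lies in [0, 100] and Weight in [0, 300],
# so replacing one of the n rows moves each mean by at most range / n.
data = np.array([[36.0, 170.0], [26.0, 150.0], [45.0, 165.0]])
n = len(data)
sens = np.linalg.norm(np.array([100.0, 300.0]) / n)
print(gaussian_mechanism(data.mean(axis=0), alpha=1.0, beta=1e-3, sensitivity=sens))
```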

From Vectors to Functions The previous technique (and similar ones) works well for vectors: statistical point estimates, logistic regression, linear SVM, histograms, contingency tables. We consider methods which output functions: kernel density estimators, kernel SVM, point estimates in function spaces. Difficulties: no densities, a different σ-field, etc. 13

Measures on a Function Space Consider projections of functions obtained through evaluation on finite sets of points x1, x2, ... (illustrated by evaluating sample functions at x1 and x2). 14

Measures on a Function Space The inverse image of the Borel σ-field under the evaluation map f ↦ (f(x1), ..., f(xn)) is a σ-field on the space of functions. Taking the union of these σ-fields over all countable subsets {x1, x2, ...} yields the cylinder σ-field. 15

Measures on a Function Space Measures on the function space are defined by a consistent set of measures on the finite projections (cf. Kolmogorov's extension theorem). (Figure: level sets of such a measure on a finite projection.) 16

DP on a Function Space We say a family of measures over a function space achieves (α,β)-DP whenever it achieves (α,β)-DP for every finite projection. This implies: achieving (α,β)-DP for every countable projection, and achieving (α,β)-DP over the Borel σ-field (when the functions are separable/continuous, etc.). 17
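In symbols, with P_D the law of the released function under database D: for every finite set of evaluation points x_1, ..., x_n, every Borel set B in R^n, and every adjacent pair D, D',
\[
  P_D\bigl( \{ f : (f(x_1), \ldots, f(x_n)) \in B \} \bigr)
  \;\le\; e^{\alpha}\, P_{D'}\bigl( \{ f : (f(x_1), \ldots, f(x_n)) \in B \} \bigr) \;+\; \beta .
\]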

DP with a Gaussian Process The obvious analog of Gaussian noise addition: the private input data is mapped to an unreleased intermediate function, to which a sample path of a Gaussian process is added before the function is released to the world. When does this achieve differential privacy? 18

Gaussian Processes A collection of random variables indexed by T: {G(t) : t ∈ T}. For any finite subset {t1, ..., tn}, the distribution is multivariate normal: (G(t1), ..., G(tn)) ~ N(μ, Σ) with μi = m(ti) and Σij = K(ti, tj), determined by the mean function m and the covariance kernel K. Interpreted as a function via the sample path t ↦ G(t). 19
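A small sketch of what the next two slides plot: sample paths of a mean-zero Gaussian process on [0,1] for bandwidths h = 0.1 and h = 0.01 (the squared-exponential covariance used here is an assumption; the slides do not name the kernel family):
```python
import numpy as np

def gp_sample_paths(xs, h, n_paths=3, rng=None):
    """Sample paths of a mean-zero Gaussian process on the grid `xs`, with
    squared-exponential covariance K(s, t) = exp(-(s - t)**2 / h**2)."""
    rng = np.random.default_rng() if rng is None else rng
    xs = np.asarray(xs, dtype=float)
    K = np.exp(-np.subtract.outer(xs, xs) ** 2 / h ** 2)
    K += 1e-6 * np.eye(len(xs))          # jitter so the Cholesky factor exists
    return (np.linalg.cholesky(K) @ rng.standard_normal((len(xs), n_paths))).T

xs = np.linspace(0.0, 1.0, 200)
smooth = gp_sample_paths(xs, h=0.1)     # slowly varying paths ("blue" curves)
rough = gp_sample_paths(xs, h=0.01)     # much rougher paths ("red" curves)
```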

Gaussian Processes blue: h = 0.1, red: h = 0.01 20

Gaussian Processes blue: h = 0.1, red: h = 0.01 21

Gaussian Processes Over a higher-dimensional index space: 22

DP with a Gaussian Process Finite dimensional distributions are normal. Previous work demonstrates DP whenever the Global Sensitivity is bounded on every finite projection. Suppose all f_D lie in the RKHS corresponding to K; then the above is bounded by the RKHS distance between the functions on adjacent databases (see the sketch below). 23
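A sketch of the chain of bounds implicit here, under the assumed notation x = (x_1, ..., x_n) for a finite set of evaluation points, K_{xx} for the corresponding kernel matrix, ||·||_H for the RKHS norm of K, and P_x for the orthogonal projection onto span{K_{x_1}, ..., K_{x_n}}:
\[
  \sup_{D \sim D'}
  \sqrt{\bigl(f_D(x) - f_{D'}(x)\bigr)^{\!\top} K_{xx}^{-1} \bigl(f_D(x) - f_{D'}(x)\bigr)}
  \;=\; \sup_{D \sim D'} \bigl\| P_x (f_D - f_{D'}) \bigr\|_H
  \;\le\; \sup_{D \sim D'} \bigl\| f_D - f_{D'} \bigr\|_H .
\]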

Salient Facts about an RKHS An inner product space of functions: the completion of the span of the functions K_x = K(x, ·), where K is a positive definite kernel. Each RKHS corresponds to a unique K. 24

Salient Facts about an RKHS An inner product space of functions: the completion of the span of the functions K_x = K(x, ·). The inner product: for f = Σ_i a_i K_{x_i} and g = Σ_j b_j K_{y_j}, ⟨f, g⟩_H = Σ_i Σ_j a_i b_j K(x_i, y_j). 25

Salient Facts about an RKHS An inner product space of functions: the completion of the span of the functions K_x = K(x, ·), with inner product ⟨Σ_i a_i K_{x_i}, Σ_j b_j K_{y_j}⟩_H = Σ_i Σ_j a_i b_j K(x_i, y_j). This definition leads to two important properties: the reproducing property ⟨K_x, K_y⟩_H = K(x, y), and the K_x are the representers for the evaluation functionals: f(x) = ⟨f, K_x⟩_H. 26
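Since finite combinations f = Σ_i a_i K_{x_i} have ||f||_H² = aᵀ K a, with K the Gram matrix of the points, RKHS norms of such functions are easy to compute; a small sketch (the Gaussian kernel here is purely an example):
```python
import numpy as np

def gauss_kernel(x, y, h=0.1):
    """Example positive definite kernel K(x, y) = exp(-(x - y)**2 / h**2)."""
    return np.exp(-(np.asarray(x)[:, None] - np.asarray(y)[None, :]) ** 2 / h ** 2)

def rkhs_norm(coeffs, points, kernel=gauss_kernel):
    """RKHS norm of f = sum_i coeffs[i] * K(points[i], .),
    using ||f||_H^2 = a^T K a with K the Gram matrix of `points`."""
    a = np.asarray(coeffs, dtype=float)
    K = kernel(points, points)
    return float(np.sqrt(a @ K @ a))

# The reproducing property makes ||K_x - K_y||_H easy:
# ||K_x - K_y||_H^2 = K(x,x) + K(y,y) - 2 K(x,y) <= 2 when K(t,t) = 1.
print(rkhs_norm([1.0, -1.0], [0.3, 0.7]))   # never exceeds sqrt(2) for this kernel
```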

DP with a Gaussian Process The finite-dimensional sensitivity equals ||P(f_D − f_{D'})||_H, where P is an orthogonal projection onto a linear subspace of the RKHS; P corresponds to any finite projection, so the sensitivity is bounded by ||f_D − f_{D'}||_H. 27

DP with a Gaussian Process When does this achieve differential privacy? (At least) whenever the RKHS distance ||f_D − f_{D'}||_H is bounded over all adjacent databases; as before, this may be achieved by appropriate scaling of K. 28
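A minimal sketch of the resulting mechanism, under the assumption that the release is f_D evaluated on the requested points plus a Gaussian-process sample path scaled by an RKHS-sensitivity bound; the constant c(β) = sqrt(2·log(2/β)), the kernel, and the stand-in function in the usage example are assumptions for illustration:
```python
import numpy as np

def release_private_function(f_D, xs, kernel, alpha, beta, rkhs_sensitivity,
                             rng=None):
    """Evaluate the private function f_D on the points `xs` and add a
    Gaussian-process sample path with covariance `kernel`, scaled so that
    finite projections satisfy (alpha, beta)-DP, assuming `rkhs_sensitivity`
    upper-bounds ||f_D - f_D'||_H over all adjacent databases."""
    rng = np.random.default_rng() if rng is None else rng
    xs = np.asarray(xs, dtype=float)
    K = kernel(xs, xs) + 1e-6 * np.eye(len(xs))       # jitter for stability
    path = np.linalg.cholesky(K) @ rng.standard_normal(len(xs))
    c_beta = np.sqrt(2.0 * np.log(2.0 / beta))        # assumed calibration
    return f_D(xs) + (c_beta * rkhs_sensitivity / alpha) * path

# Toy usage (kernel, bandwidth, sensitivity, and function are illustrative).
kern = lambda a, b: np.exp(-np.subtract.outer(a, b) ** 2 / 0.1 ** 2)
noisy = release_private_function(np.sin, np.linspace(0.0, 1.0, 200), kern,
                                 alpha=1.0, beta=1e-3, rkhs_sensitivity=0.05)
```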

Example 1: Kernel Density Estimation A popular nonparametric density estimator in statistics: it averages a kernel bump (e.g., a Gaussian of bandwidth h) centered at each data point. 29

Example 1: Kernel Density Estimation On two adjacent datasets the difference of the estimates is (1/n) times the difference of two kernel bumps, one centered at the replaced point and one at its replacement. These functions are (up to normalization) in the generating set of the RKHS of a Gaussian kernel with matching bandwidth, since each bump is proportional to a representer K_{x_i}. 30
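A sketch of the computation, assuming a Gaussian KDE with bandwidth h in d dimensions and the RKHS of the kernel K(x, y) = exp(−||x − y||²/(2h²)); the talk's exact normalization may differ. With x_i the replaced point and x_i' its replacement:
\[
  \hat f_D - \hat f_{D'}
  \;=\; \frac{1}{n\,(2\pi h^2)^{d/2}} \bigl( K_{x_i} - K_{x_i'} \bigr),
  \qquad
  \bigl\| \hat f_D - \hat f_{D'} \bigr\|_H^2
  \;=\; \frac{2 - 2K(x_i, x_i')}{n^2 (2\pi h^2)^{d}}
  \;\le\; \frac{2}{n^2 (2\pi h^2)^{d}} .
\]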

Example 1: Kernel Density Estimation Therefore it is trivial to upper bound the RKHS norm of the difference, and hence DP is achieved by releasing the KDE plus a suitably scaled Gaussian process G with covariance K. Note E[G(x)²] = 1. 31

Example 1: Kernel Density Estimation Remarks: the released function is smooth, and the error is of the same order as the sampling error. 32

Example 2: Kernel SVM Consider regularized ERM in an RKHS (classification, regression, etc.), with data D = {z_i} = {(x_i, y_i)}. Bousquet & Elisseeff (JMLR 2002) give a technique to bound the RKHS distance between the functions learned on adjacent data; they use McDiarmid's inequality to get generalization error bounds. Used for Differential Privacy before by Ben Rubinstein. The requisite bound involves the Lipschitz constant of the loss function (see the sketch below). 33
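A sketch of the quantities involved, assuming the standard form of regularized ERM with an L-Lipschitz loss, a kernel bounded by K(x, x) ≤ κ², and regularization parameter λ; the exact constant in the stability bound varies between statements of the result:
\[
  f_D \;=\; \arg\min_{f \in \mathcal H}\;
  \frac{1}{n} \sum_{i=1}^{n} \ell\bigl( f(x_i), y_i \bigr) \;+\; \lambda \| f \|_H^2 ,
  \qquad
  \sup_{D \sim D'} \bigl\| f_D - f_{D'} \bigr\|_H \;\lesssim\; \frac{L\,\kappa}{\lambda\, n} .
\]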

Example 2: Kernel SVM 34

DP in a Sobolev Space The same method may be applied more generally. Consider the Sobolev space of (weakly differentiable, square-integrable) functions; this is an RKHS with an explicit kernel and norm (see the sketch below). In principle the technique may be applied to any family of functions for which the latter quantity (the Sobolev norm of the difference on adjacent data) is bounded above. There is an analog in higher dimensions. 35
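One standard instance of such a space, assuming the first-order Sobolev space on the real line (the talk may use a slightly different domain or normalization):
\[
  \mathcal H = H^1(\mathbb{R}), \qquad
  \| f \|_{H^1}^2 = \int \bigl( f(x)^2 + f'(x)^2 \bigr)\, dx ,
  \qquad
  K(x, y) = \tfrac{1}{2}\, e^{-|x - y|} .
\]
The corresponding Gaussian process is of Ornstein-Uhlenbeck type, whose sample paths are continuous but not differentiable (cf. the "not smooth" remark two slides below).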

DP in a Sobolev Space We redo the analysis for Kernel Density Estimation in the above RKHS, finding a bound on the squared Sobolev norm of the difference on adjacent data. Taking the square root, we find this to be of the same order as the bound above, with the constant off by a factor of less than 2. 36

Sobolev Technique Applied to KDE Remarks: the released function is not smooth, but the error is still of the correct order. 37

Remarks and Current Work Lower error bounds? How much error must necessarily be introduced to achieve DP? Upper error bounds? Easy for the l2 metric; what about l∞, classification error, etc.? Sobolev technique in higher dimensions. 38

Thanks. 39