CIS 800/002 The Algorithmic Foundations of Data Privacy October 13, Lecture 9. Database Update Algorithms: Multiplicative Weights

Lecturer: Aaron Roth    Scribe: Aaron Roth    October 13, 2011

We'll recall (again) some definitions from last time:

Definition 1 (Database Update Sequence). Let D ∈ N^|X| be any database and let {(D^t, Q_t, v_t)}_{t=1,...,L} ∈ (D × C × R)^L be a sequence of tuples. We say the sequence is a (U, D, C, α, L)-database update sequence if it satisfies the following properties:

1. D^1 = U(⊥, ·, ·),
2. for every t = 1, 2, ..., L, |Q_t(D) − Q_t(D^t)| ≥ α,
3. for every t = 1, 2, ..., L, |Q_t(D) − v_t| < α,
4. and for every t = 1, 2, ..., L − 1, D^{t+1} = U(D^t, Q_t, v_t).

Definition 2 (Database Update Algorithm (DUA)). Let U : D × C × R → D be an update rule and let B : R → R be a function. We say U is a B(α)-DUA for query class C if for every database D ∈ N^|X|, every (U, D, C, α, L)-database update sequence satisfies L ≤ B(α).

Using the exponential mechanism as a distinguisher, we proved the following utility theorem about the Iterative Construction (IC) mechanism:

Theorem 3. Given a B(α)-DUA, the Iterative Construction mechanism is (α, β)-accurate and ε-differentially private for:

    α ≥ (8 B(α/2) / (εn)) · log(2|C|B(α/2)/β)

and (ε, δ)-differentially private for:

    α ≥ (16 √(B(α/2) log(1/δ)) / (εn)) · log(2|C|B(α/2)/β)

(The B(α/2)/β factor inside the logarithms comes from allotting failure probability β/(2B(α/2)) to each of the at most B(α/2) update rounds.)

We then plugged in the Median Mechanism, which is based on the existence of small nets. Instantiating the IC mechanism with it, we get:

Theorem 4. Instantiated with the median mechanism, for which B(α) = log|X| log|C| / α², the Iterative Construction mechanism is (α, β)-accurate and ε-differentially private for:

    α ≥ (32 log|X| log|C| / (εα²n)) · log(2|C|B(α/2)/β),   i.e.   α ≤ Õ( ( log|X| log|C| log(1/β) / (εn) )^{1/3} )

and (ε, δ)-differentially private for:

    α ≥ (32 √(log|X| log|C| log(1/δ)) / (εαn)) · log(2|C|B(α/2)/β),   i.e.   α ≤ Õ( ( √(log|X| log|C| log(1/δ)) · log(1/β) / (εn) )^{1/2} )
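At a high level, the IC mechanism is just a loop around a distinguisher and an update rule. The following is a minimal, NON-PRIVATE sketch of that loop: the exact-error distinguisher below is a stand-in for the exponential mechanism (which would also answer the selected query with Laplace noise), queries are represented as vectors in [0,1]^|X|, databases are normalized to distributions, and all names are illustrative.

```python
import numpy as np

def iterative_construction(D, queries, update_rule, D1, alpha):
    """Non-private sketch of the IC loop: repeatedly find a query on which
    the current hypothesis errs by at least alpha and hand it, together
    with an answer on the true database, to the update rule.

    D           : true database, as a probability vector over the universe
    queries     : list of linear queries, each a vector in [0,1]^|X|
    update_rule : a database update algorithm, (D_t, Q_t, v_t) -> D_{t+1}
    D1          : initial hypothesis (e.g. the uniform distribution)
    alpha       : accuracy target
    """
    D_t = D1
    while True:
        # Distinguisher stand-in: the query the hypothesis errs on most.
        errors = [abs(q @ D - q @ D_t) for q in queries]
        t = int(np.argmax(errors))
        if errors[t] < alpha:       # no query errs by alpha: output D_t
            return D_t
        v_t = queries[t] @ D        # exact here; noisy in the real mechanism
        D_t = update_rule(D_t, queries[t], v_t)
```

Any B(α)-database update algorithm can be supplied as `update_rule`; the multiplicative weights rule developed in this lecture is one such instance.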

We now give a more sophisticated database update algorithm for linear queries. It will work by maintaining a distribution D^t over the data universe X. A linear query is a natural generalization of a counting query, which we considered earlier. Although this new mechanism will only apply to linear queries (the median mechanism worked for generic classes of queries), it will have significantly improved running time, and (slightly) improved accuracy.

Definition 5. A linear query is a vector Q ∈ [0,1]^|X|, evaluated as Q(D) = (1/n)⟨Q, D⟩. Equivalently, we can view Q as a function Q : X → [0,1], and evaluate:

    Q(D) = (1/n) Σ_{x_i ∈ X} Q(x_i) · D[i]

Algorithm 1: The Multiplicative Weights (MW) Algorithm. It is instantiated with a parameter η ≤ 1.

MW(D^t, Q_t, v_t):
    if D^t = ⊥ then
        Output D^1 with D^1[i] = 1/|X| for all i ∈ [|X|].
    end if
    if v_t < Q_t(D^t) then
        Let r_t = Q_t
    else
        Let r_t = 1 − Q_t (i.e., for all i, r_t[i] = 1 − Q_t[i])
    end if
    Update: For all i ∈ [|X|], let

        D̂^{t+1}[i] = exp(−η · r_t(x_i)) · D^t[i]

        D^{t+1}[i] = D̂^{t+1}[i] / Σ_{j=1}^{|X|} D̂^{t+1}[j]

    Output D^{t+1}.

Let's think about what the MW algorithm is trying to do. Recall that the median mechanism attempted to maintain a distribution over databases consistent with the queries seen so far. The MW mechanism, on the other hand, maintains an explicit probability distribution over the data universe. This will turn out to be sufficient for answering linear queries, and as a result, the algorithm will be more efficient.

Why a probability distribution? It turns out that for linear queries, we can think of databases as equivalent to distributions over the data universe. Recall that for a database D ∈ N^|X| and a linear query Q ∈ [0,1]^|X|, we defined Q(D) = (1/n)⟨Q, D⟩, where ‖D‖₁ = n. Suppose we consider a normalized version of our database, D̂ ∈ R^|X|, where D̂[i] = D[i]/n. Note that we have Σ_{i ∈ [|X|]} D̂[i] = 1: i.e., D̂ is a probability distribution over X. We also have

    Q(D̂) = ⟨Q, D̂⟩ = (1/n)⟨Q, D⟩ = Q(D)

i.e., normalizing D to be a probability distribution does not change the value of any linear query. We may therefore, without loss of generality, reason about D as if it were a probability distribution. The MW algorithm seeks to learn the probability distribution D, as it is reflected in the answers to a set of linear queries.
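In code, one MW update step is only a few lines. A minimal sketch assuming numpy, with the query given as a vector q ∈ [0,1]^|X| and the hypothesis stored as a probability vector (all names are illustrative):

```python
import numpy as np

def mw_update(D_t, q, v, eta):
    """One step of Algorithm 1: penalize the universe elements that push
    the hypothesis's answer in the wrong direction, then renormalize.

    D_t : hypothesis distribution over the universe X (sums to 1)
    q   : linear query, a vector in [0,1]^|X|
    v   : (possibly noisy) answer to q on the true database
    eta : learning rate, eta <= 1 (the analysis takes eta = alpha/2)
    """
    if v < q @ D_t:      # hypothesis answer too high: penalize large q[i]
        r = q
    else:                # hypothesis answer too low: penalize small q[i]
        r = 1.0 - q
    D_hat = D_t * np.exp(-eta * r)   # multiplicative update
    return D_hat / D_hat.sum()       # renormalize to a distribution
```

Starting from the uniform distribution `np.ones(N) / N` and repeatedly applying `mw_update` on violated queries drives the hypothesis toward the true distribution, which is exactly what the potential argument below quantifies.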
The strategy for analyzing the MW algorithm will be to keep track of a potential function Ψ measuring the similarity between the hypothesis database D^t at time t and the true database D. We will show:

1. The potential function does not start out too large.
2. The potential function decreases by a significant amount at each update round.
3. The potential function is always non-negative.

Together, these 3 facts will force us to conclude that there cannot be too many update rounds! Let us now begin the analysis:

Theorem 6. Setting the parameter η = α/2, the Multiplicative Weights algorithm is a B(α)-database update algorithm for B(α) = 4 log|X| / α², for every class of linear queries C.

Proof. We must show that any sequence {(D^t, Q_t, v_t)}_{t=1,...,L} with the property that |Q_t(D^t) − Q_t(D)| ≥ α and |v_t − Q_t(D)| < α cannot have L > 4 log|X| / α².

We define our potential function as follows. Recall that we here view the database as a probability distribution: i.e., we assume ‖D‖₁ = 1. (Of course this does not require actually modifying the real database.) The potential function that we use is the relative entropy, or KL divergence, between D and D^t:

    Ψ_t := KL(D ‖ D^t) = Σ_{i ∈ [|X|]} D[i] log( D[i] / D^t[i] )

We begin with a simple fact:

Proposition 7. For all t: Ψ_t ≥ 0, and Ψ_1 ≤ log|X|.

Proof. Relative entropy (KL divergence) is always a non-negative quantity, by the log-sum inequality. To see that Ψ_1 ≤ log|X|, recall that D^1[i] = 1/|X| for all i, and so Ψ_1 = Σ_i D[i] log(|X| · D[i]). Noting that D is a probability distribution, we see that this quantity is maximized when D[1] = 1 and D[i] = 0 for all i > 1, giving Ψ_1 ≤ log|X|. ∎

We will now argue that at each step, the potential function drops by at least α²/4. Because the potential begins at log|X|, and must always be non-negative, we therefore know that there can be at most L ≤ 4 log|X| / α² steps in the database update sequence. To begin, let us see exactly how much the potential drops at each step:

Lemma 8. Writing r_t(D) for Σ_i r_t(x_i) · D[i],

    Ψ_t − Ψ_{t+1} ≥ η ( r_t(D^t) − r_t(D) ) − η²

Proof.

    Ψ_t − Ψ_{t+1} = Σ_i D[i] log( D[i] / D^t[i] ) − Σ_i D[i] log( D[i] / D^{t+1}[i] )
                  = Σ_i D[i] log( D^{t+1}[i] / D^t[i] )
                  = Σ_i D[i] log( exp(−η r_t(x_i)) / ( Σ_j exp(−η r_t(x_j)) D^t[j] ) )
                  = −η Σ_i D[i] r_t(x_i) − log( Σ_j exp(−η r_t(x_j)) D^t[j] )

                  = −η r_t(D) − log( Σ_j exp(−η r_t(x_j)) D^t[j] )
                  ≥ −η r_t(D) − log( Σ_j D^t[j] ( 1 − η r_t(x_j) + η² ) )
                  = −η r_t(D) − log( 1 + η² − η r_t(D^t) )
                  ≥ −η r_t(D) + η r_t(D^t) − η²
                  = η ( r_t(D^t) − r_t(D) ) − η²

The first inequality follows from the fact that:

    exp(−η r_t(x_i)) ≤ 1 − η r_t(x_i) + η² r_t(x_i)² ≤ 1 − η r_t(x_i) + η²

The second inequality follows from the fact that log(1 + y) ≤ y for y > −1. ∎

The rest of the proof now follows easily. By the conditions of a database update sequence, |v_t − Q_t(D)| < α. Hence, because for each t, |Q_t(D) − Q_t(D^t)| ≥ α, we also have that Q_t(D) > Q_t(D^t) if and only if v_t > Q_t(D^t). In particular, r_t = Q_t if Q_t(D^t) − Q_t(D) ≥ α, and r_t = 1 − Q_t if Q_t(D) − Q_t(D^t) ≥ α; in either case, r_t(D^t) − r_t(D) ≥ α. Therefore, by Lemma 8 and the fact that η = α/2:

    Ψ_t − Ψ_{t+1} ≥ (α/2) ( r_t(D^t) − r_t(D) ) − (α/2)² ≥ α²/2 − α²/4 = α²/4

Finally, we know:

    0 ≤ Ψ_L ≤ Ψ_1 − L · α²/4 ≤ log|X| − L · α²/4

Solving, we find L ≤ 4 log|X| / α². This completes the proof. ∎

Finally, we can see what bounds we get by plugging the multiplicative weights DUA into the IC algorithm:

Theorem 9. Combining the multiplicative weights DUA with the exponential mechanism distinguisher, the IC algorithm is (α, β)-accurate and ε-differentially private for:

    α ≤ Õ( ( log|X| · log(|C|/β) / (εn) )^{1/3} )

and (ε, δ)-differentially private for:

    α ≤ Õ( ( log|X| log(1/δ) )^{1/4} · ( log(|C|/β) )^{1/2} / ( εn )^{1/2} )

Let's conclude by appreciating the magic that just happened. Unlike the median mechanism or the net mechanism, the multiplicative weights mechanism did not start with any baked-in information about the class of queries it was going to answer (such as the form of a net). In fact, the existence of the multiplicative weights mechanism gives another, unrelated proof that linear queries have small nets! Recall that we already proved via sampling arguments that any set of linear queries C has a net of size |X|^{log|C|/α²}, by arguing that for every database, there is another database of size only log|C|/α² that agrees with it (up to ±α) on every query in C. What has the multiplicative weights mechanism shown? It has shown that for any set of |C| linear queries, we can represent all of the answers (up to ±α) by a sequence of queries from C forming a

database update sequence of length log|X|/α² (suppressing constants). How many such sequences of queries are there from C? Exactly |C|^{log|X|/α²}. But this is exactly equal to |X|^{log|C|/α²} (both equal exp(log|C| · log|X| / α²)): that is, the MW mechanism proves the existence of a net of the same size for linear queries! This net is dual to the one we already demonstrated: rather than being a collection of databases, it is a collection of query sequences. Yet the net is the same size.

Bibliographic Information. The Multiplicative Weights Mechanism was given by Hardt and Rothblum, "A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis," 2010.
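To close, a quick empirical sanity check on Theorem 6: run the MW update against a point-mass database with exact query answers, and count how many update rounds a violating-query search can sustain; by the theorem, never more than 4 log|X|/α². A rough numpy sketch, in which the random-search "distinguisher" and all parameter values are illustrative only, not part of the theorem:

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 64, 0.2
eta = alpha / 2                      # as in Theorem 6
D = np.zeros(N); D[0] = 1.0          # "true" database: a point mass on x_1
D_t = np.ones(N) / N                 # MW starts at the uniform distribution
bound = 4 * np.log(N) / alpha**2     # Theorem 6: at most this many rounds

rounds = 0
while True:
    # Stand-in distinguisher: sample random linear queries in [0,1]^N and
    # keep one on which the hypothesis errs by at least alpha, if any.
    cands = rng.random((200, N))
    errs = np.abs(cands @ D - cands @ D_t)
    i = int(np.argmax(errs))
    if errs[i] < alpha:
        break                        # no violating query found: done
    q = cands[i]
    v = q @ D                        # exact answer, so |v - Q(D)| = 0 < alpha
    r = q if v < q @ D_t else 1.0 - q
    D_t = D_t * np.exp(-eta * r)     # one MW update round
    D_t = D_t / D_t.sum()
    rounds += 1

assert 0 < rounds <= bound
```

Every iteration here is a legitimate step of a database update sequence (error at least α, answers within α of the truth), so the potential argument guarantees both that the loop terminates and that `rounds` stays below the Theorem 6 bound.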