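CIS 800/002 The Algorithmic Foundations of Data Privacy
October 13, 2011
Lecture 9
Lecturer: Aaron Roth
Scribe: Aaron Roth

Database Update Algorithms: Multiplicative Weights

We'll recall (again) some definitions from last time:

Definition 1 (Database Update Sequence) Let $D \in \mathbb{N}^{|X|}$ be any database and let $\{(D^t, Q_t, v_t)\}_{t=1,\ldots,L} \in (\mathcal{D} \times C \times \mathbb{R})^L$ be a sequence of tuples. We say the sequence is a $(U, D, C, \alpha, L)$-database update sequence if it satisfies the following properties:
1. $D^1 = U(\emptyset, \cdot, \cdot)$,
2. for every $t = 1, 2, \ldots, L$, $|Q_t(D) - Q_t(D^t)| \geq \alpha$,
3. for every $t = 1, 2, \ldots, L$, $|Q_t(D) - v_t| < \alpha$,
4. and for every $t = 1, 2, \ldots, L-1$, $D^{t+1} = U(D^t, Q_t, v_t)$.

Definition 2 (Database Update Algorithm (DUA)) Let $U : \mathcal{D} \times C \times \mathbb{R} \to \mathcal{D}$ be an update rule and let $B : \mathbb{R} \to \mathbb{R}$ be a function. We say $U$ is a $B(\alpha)$-DUA for query class $C$ if for every database $D \in \mathbb{N}^{|X|}$, every $(U, D, C, \alpha, L)$-database update sequence satisfies $L \leq B(\alpha)$.

Using the exponential mechanism as a distinguisher, we proved the following utility theorem about the IC mechanism:

Theorem 3 Given a $B(\alpha)$-DUA, the Iterative Construction mechanism is $(\alpha, \beta)$-accurate and $\epsilon$-differentially private for:

$\alpha \geq \frac{8 B(\alpha/2)}{\epsilon n} \log\frac{|C|}{\gamma}$

and $(\epsilon, \delta)$-differentially private for:

$\alpha \geq \frac{16 \sqrt{B(\alpha/2) \log(1/\delta)}}{\epsilon n} \log\frac{|C|}{\gamma}$

so long as $\gamma \leq \beta/(2B(\alpha/2))$.

We plugged in the Median Mechanism, based on the existence of small nets, to get a $B(\alpha)$-DUA with:

$B(\alpha) = \frac{\log|X| \log|C|}{\alpha^2}$

Plugging this into the IC mechanism, we get:

Theorem 4 Instantiated with the median mechanism, the Iterative Construction mechanism is $(\alpha, \beta)$-accurate and $\epsilon$-differentially private for:

$\alpha \geq \frac{32 \log|X| \log|C|}{\epsilon \alpha^2 n} \log\frac{|C|}{\gamma}$, that is, $\alpha \geq \tilde{O}\left(\left(\frac{\log|X| \log|C| \log(1/\beta)}{\epsilon n}\right)^{1/3}\right)$

and $(\epsilon, \delta)$-differentially private for:

$\alpha \geq \frac{32 \sqrt{\log|X| \log|C| \log(1/\delta)}}{\epsilon \alpha n} \log\frac{|C|}{\gamma}$, that is, $\alpha \geq \tilde{O}\left(\frac{\left(\log|X| \log|C| \log(1/\delta) \log^2(1/\beta)\right)^{1/4}}{(\epsilon n)^{1/2}}\right)$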
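Definition 1 can be read as a checklist, and the following minimal Python sketch makes that reading concrete. It is an illustration only, not part of the lecture: the names `is_update_sequence`, `toy_update` and `identity` are hypothetical, queries are modelled as plain callables, and the empty hypothesis $\emptyset$ is represented by `None`.

```python
def is_update_sequence(update_rule, true_db, sequence, alpha):
    """Check conditions 1-4 of Definition 1 for a list of (D_t, Q_t, v_t) tuples.

    update_rule(D_t, Q_t, v_t) plays the role of U. By assumption, calling it
    with (None, None, None) returns the initial hypothesis D^1.
    """
    if not sequence:
        return True
    if sequence[0][0] != update_rule(None, None, None):      # condition 1
        return False
    for t, (d_t, q_t, v_t) in enumerate(sequence):
        if abs(q_t(true_db) - q_t(d_t)) < alpha:              # condition 2
            return False
        if abs(q_t(true_db) - v_t) >= alpha:                  # condition 3
            return False
        is_last = (t == len(sequence) - 1)
        if not is_last and sequence[t + 1][0] != update_rule(d_t, q_t, v_t):
            return False                                      # condition 4
    return True


# Toy instance (an assumption for illustration): a "database" is one number,
# the only query reads that number, and the update rule jumps to the answer v_t.
def toy_update(d_t, q_t, v_t):
    return 0.0 if d_t is None else v_t


def identity(db):
    return db


short = [(0.0, identity, 0.9)]
too_long = [(0.0, identity, 0.9), (0.9, identity, 1.0)]
```

With true database 1.0 and $\alpha = 0.25$, the list `short` passes the check, while `too_long` fails condition 2 at its second step because the hypothesis 0.9 is already within $\alpha$ of the truth. For this toy rule every update sequence has $L \leq 1$, which is the kind of bound a $B(\alpha)$-DUA guarantees in general.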
We now give a more sophisticated database update algorithm for linear queries. It will work by maintaining a distribution $D^t$ over the data universe $X$. A linear query is a natural generalization of a counting query, which we considered earlier. Although this new mechanism will only apply to linear queries (the median mechanism worked for generic classes of queries), it will have significantly improved running time, and (slightly) improved accuracy.

Definition 5 A linear query is a vector $Q \in [0, 1]^{|X|}$, evaluated as $Q(D) = \frac{1}{n} \langle Q, D \rangle$. Equivalently, we can view $Q$ as a function $Q : X \to [0, 1]$, and evaluate:

$Q(D) = \frac{1}{n} \sum_{x_i \in D} Q(x_i) = \frac{1}{n} \sum_{i=1}^{|X|} Q(x_i) D[i]$

Algorithm 1 The Multiplicative Weights (MW) Algorithm. It is instantiated with a parameter $\eta \leq 1$.

MW($D^t$, $Q_t$, $v_t$):
  if $D^t = \emptyset$ then
    Output: $D^1 \in \mathbb{R}^{|X|}$ with $D^1[i] = 1/|X|$ for all $x_i \in X$.
  end if
  if $v_t < Q_t(D^t)$ then
    Let $r_t = Q_t$
  else
    Let $r_t = 1 - Q_t$ (i.e. for all $i$, $r_t(x_i) = 1 - Q_t(x_i)$)
  end if
  Update: For all $i \in [|X|]$ let
    $\hat{D}^{t+1}[i] = \exp(-\eta r_t(x_i)) \cdot D^t[i]$
    $D^{t+1}[i] = \hat{D}^{t+1}[i] \,/\, \sum_{j=1}^{|X|} \hat{D}^{t+1}[j]$
  Output $D^{t+1}$.

Let's think about what the MW algorithm is trying to do. Recall that the median mechanism attempted to maintain a distribution over databases consistent with the queries seen so far. The MW mechanism, on the other hand, maintains an explicit probability distribution over the data universe. This will turn out to be sufficient for answering linear queries, and as a result, the algorithm will be more efficient.

Why a probability distribution? It turns out that for linear queries, we can think of databases as equivalent to distributions over the data universe. Recall that for a database $D \in \mathbb{N}^{|X|}$ and a linear query $Q \in [0, 1]^{|X|}$, we defined $Q(D) = \frac{1}{n} \langle Q, D \rangle$, where $\|D\|_1 = n$. Suppose we consider a normalized version of our database, $\hat{D} \in \mathbb{R}^{|X|}$, where $\hat{D}[i] = D[i]/n$. Note that we have $\sum_{i=1}^{|X|} \hat{D}[i] = 1$, i.e. $\hat{D}$ is a probability distribution over $X$. We also have

$Q(\hat{D}) = \langle Q, \hat{D} \rangle = \frac{1}{n} \langle Q, D \rangle = Q(D)$

i.e. normalizing $D$ to be a probability distribution does not change the value of any linear query. We may therefore without loss of generality reason about $D$ as if it is a probability distribution. The MW algorithm seeks to learn the probability distribution $D$, as it is reflected in the answers to a set of linear queries.
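To make the update rule and the normalization remark concrete, here is a minimal pure-Python sketch of one MW step. It is an illustration under stated assumptions, not the lecture's own code: the function names, the four-element universe, the histogram and the value $\eta = 0.25$ are all hypothetical choices, and the exact answer stands in for $v_t$.

```python
import math


def linear_query(q, d):
    """Evaluate Q(D) = <Q, D> / ||D||_1 for a histogram or distribution d."""
    total = sum(d)
    return sum(qi * di for qi, di in zip(q, d)) / total


def mw_init(universe_size):
    """Initial hypothesis D^1: the uniform distribution over the universe."""
    return [1.0 / universe_size] * universe_size


def mw_update(d_t, q_t, v_t, eta):
    """One multiplicative weights step: returns D^{t+1} from (D^t, Q_t, v_t)."""
    if v_t < linear_query(q_t, d_t):
        r_t = list(q_t)                     # hypothesis answers too high
    else:
        r_t = [1.0 - qi for qi in q_t]      # hypothesis answers too low
    unnormalized = [math.exp(-eta * r) * w for r, w in zip(r_t, d_t)]
    z = sum(unnormalized)
    return [w / z for w in unnormalized]


# Hypothetical toy instance: universe of 4 elements, database of n = 10 records.
database = [7, 1, 1, 1]                     # histogram D in N^{|X|}
normalized = [c / sum(database) for c in database]
query = [1.0, 0.0, 0.0, 0.0]                # fraction of records equal to x_1

d1 = mw_init(4)
true_answer = linear_query(query, database)
d2 = mw_update(d1, query, true_answer, eta=0.25)
```

In this toy run the query evaluates to the same value on `database` and on `normalized`, and the hypothesis answer `linear_query(query, d2)` moves from the uniform value toward `true_answer`, because the weights of the elements that the query does not count are scaled down.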
The strategy to analyze the MW algorithm will be to keep track of a potential function $\Psi$ measuring the similarity between the hypothesis database $D^t$ at time $t$ and the true database $D$. We will show:
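1. The potential function does not start out too large.
2. The potential function decreases by a significant amount at each update round.
3. The potential function is always non-negative.

Together, these 3 facts will force us to conclude that there cannot be too many update rounds! Let us now begin the analysis:

Theorem 6 Letting parameter $\eta = \alpha/2$, the Multiplicative Weights algorithm is a $B(\alpha)$-database update algorithm for $B(\alpha) = 4\log|X|/\alpha^2$ for every class of linear queries $C$.

Proof We must show that any sequence $\{(D^t, Q_t, v_t)\}_{t=1,\ldots,L}$ with the property that $|Q_t(D^t) - Q_t(D)| \geq \alpha$ and $|v_t - Q_t(D)| < \alpha$ cannot have $L > 4\log|X|/\alpha^2$.

We define our potential function as follows. Recall that we here view the database as a probability distribution, i.e. we assume $\|D\|_1 = 1$. Of course this does not require actually modifying the real database. The potential function that we use is the relative entropy, or KL divergence, between $D$ and $D^t$:

$\Psi_t \overset{\text{def}}{=} KL(D \| D^t) = \sum_{i=1}^{|X|} D[i] \log\left(\frac{D[i]}{D^t[i]}\right)$

We begin with a simple fact:

Proposition 7 For all $t$: $\Psi_t \geq 0$, and $\Psi_1 \leq \log|X|$.

Proof Relative entropy (KL divergence) is always a non-negative quantity, by the log-sum inequality. To see that $\Psi_1 \leq \log|X|$, recall that $D^1[i] = 1/|X|$ for all $i$, and so $\Psi_1 = \sum_{i=1}^{|X|} D[i] \log(|X| \cdot D[i])$. Noting that $D$ is a probability distribution, we see that this quantity is maximized when $D[1] = 1$ and $D[i] = 0$ for all $i > 1$, giving $\Psi_1 = \log|X|$.

We will now argue that at each step, the potential function drops by at least $\alpha^2/4$. Because the potential begins at no more than $\log|X|$, and must always be non-negative, we therefore know that there can be at most $L \leq 4\log|X|/\alpha^2$ steps in the database update sequence. To begin, let us see exactly how much the potential drops at each step:

Lemma 8 $\Psi_t - \Psi_{t+1} \geq \eta \left( r_t(D^t) - r_t(D) \right) - \eta^2$

Here $r_t(D) = \langle r_t, D \rangle$ denotes $r_t$ evaluated as a linear query on the distribution $D$.

Proof

$\Psi_t - \Psi_{t+1} = \sum_{i=1}^{|X|} D[i] \log\left(\frac{D[i]}{D^t[i]}\right) - \sum_{i=1}^{|X|} D[i] \log\left(\frac{D[i]}{D^{t+1}[i]}\right)$

$= \sum_{i=1}^{|X|} D[i] \log\left(\frac{D^{t+1}[i]}{D^t[i]}\right)$

$= \sum_{i=1}^{|X|} D[i] \log\left(\exp(-\eta r_t(x_i))\right) - \log\left(\sum_{j=1}^{|X|} \exp(-\eta r_t(x_j)) D^t[j]\right)$

$= -\sum_{i=1}^{|X|} D[i] \, \eta r_t(x_i) - \log\left(\sum_{j=1}^{|X|} \exp(-\eta r_t(x_j)) D^t[j]\right)$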
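$= -\eta r_t(D) - \log\left(\sum_{j=1}^{|X|} \exp(-\eta r_t(x_j)) D^t[j]\right)$

$\geq -\eta r_t(D) - \log\left(\sum_{j=1}^{|X|} D^t[j] \left(1 + \eta^2 - \eta r_t(x_j)\right)\right)$

$= -\eta r_t(D) - \log\left(1 + \eta^2 - \eta r_t(D^t)\right)$

$\geq \eta \left( r_t(D^t) - r_t(D) \right) - \eta^2$

The first inequality follows from the fact that:

$\exp(-\eta r_t(x_i)) \leq 1 - \eta r_t(x_i) + \eta^2 r_t(x_i)^2 \leq 1 - \eta r_t(x_i) + \eta^2$

The second inequality follows from the fact that $\log(1 + y) \leq y$ for $y > -1$.

The rest of the proof now follows easily. By the conditions of a database update algorithm, $|v_t - Q_t(D)| < \alpha$. Hence, because for each $t$: $|Q_t(D) - Q_t(D^t)| \geq \alpha$, we also have that $Q_t(D) > Q_t(D^t)$ if and only if $v_t > Q_t(D^t)$. In particular, $r_t = Q_t$ if $Q_t(D^t) - Q_t(D) \geq \alpha$, and $r_t = 1 - Q_t$ if $Q_t(D) - Q_t(D^t) \geq \alpha$. In both cases $r_t(D^t) - r_t(D) \geq \alpha$. Therefore, by Lemma 8 and the fact that $\eta = \alpha/2$:

$\Psi_t - \Psi_{t+1} \geq \frac{\alpha}{2} \left( r_t(D^t) - r_t(D) \right) - \frac{\alpha^2}{4} \geq \frac{\alpha}{2} \cdot \alpha - \frac{\alpha^2}{4} = \frac{\alpha^2}{4}$

Finally we know:

$0 \leq \Psi_{L+1} \leq \Psi_1 - L \cdot \frac{\alpha^2}{4} \leq \log|X| - \frac{L \alpha^2}{4}$

Solving, we find:

$L \leq \frac{4 \log|X|}{\alpha^2}$

This completes the proof.

Finally, we can see what bounds we get by plugging the multiplicative weights DUA into the IC algorithm:

Theorem 9 Combining the multiplicative weights DUA and the exponential mechanism distinguisher, the IC algorithm is $(\alpha, \beta)$-accurate and $\epsilon$-differentially private for:

$\alpha \geq \tilde{O}\left(\left(\frac{\log|X| \log(|C|/\beta)}{\epsilon n}\right)^{1/3}\right)$

and $(\epsilon, \delta)$-differentially private for:

$\alpha \geq \tilde{O}\left(\frac{\left(\log|X| \log(1/\delta)\right)^{1/4} \left(\log(|C|/\beta)\right)^{1/2}}{(\epsilon n)^{1/2}}\right)$

Let's conclude by appreciating the magic that just happened. Unlike the median mechanism or the net mechanism, the multiplicative weights mechanism did not start with any baked-in information about the class of queries it was going to answer (such as the form of a net). In fact, the existence of the multiplicative weights mechanism gives another, unrelated proof that linear queries have small nets! Recall that we already proved via sampling arguments that any set of linear queries $C$ has a net of size $|X|^{\log|C|/\alpha^2}$, by arguing that for every database, there is another database of size only $\log|C|/\alpha^2$ that agrees with it (up to $\pm\alpha$) on any set of linear queries $C$. What has the multiplicative weights mechanism shown? It has shown that for any set of $|C|$ linear queries, we can represent all of the answers (up to $\pm\alpha$) by a sequence of queries from $C$ forming a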
database update sequence of length at most $4\log|X|/\alpha^2$. How many such sequences of queries are there from $C$? At most $|C|^{4\log|X|/\alpha^2}$. But this is exactly equal to $|X|^{4\log|C|/\alpha^2}$: that is, up to the constant in the exponent, the MW mechanism proves the existence of the same size net for linear queries! This net is dual to the one we already demonstrated: rather than being a collection of databases, it is a collection of query sequences! Yet the net is the same size.

Bibliographic Information

The Multiplicative Weights Mechanism was given by Hardt and Rothblum, "A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis", 2010.
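As a numerical sanity check on Lemma 8 and Theorem 6, the following sketch (not part of the original notes) runs the MW update with $\eta = \alpha/2$ on a small instance and records how much the KL potential drops in each round. Everything concrete here is an assumption chosen for illustration: the eight-element universe, the distribution `true_dist`, the four queries, $\alpha = 0.1$, and the use of the exact answer as $v_t$ (which trivially satisfies $|v_t - Q_t(D)| < \alpha$).

```python
import math


def answer(q, dist):
    """Linear query on a probability distribution: <q, dist>."""
    return sum(qi * pi for qi, pi in zip(q, dist))


def kl(p, q):
    """Relative entropy KL(p || q); terms with p[i] = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mw_step(hyp, q, v, eta):
    """One multiplicative weights update of the hypothesis distribution."""
    r = list(q) if v < answer(q, hyp) else [1.0 - qi for qi in q]
    w = [math.exp(-eta * ri) * hi for ri, hi in zip(r, hyp)]
    z = sum(w)
    return [wi / z for wi in w]


def run(true_dist, queries, alpha):
    """Update on any query with hypothesis error >= alpha until none remains."""
    eta = alpha / 2
    hyp = [1.0 / len(true_dist)] * len(true_dist)
    drops = []
    while True:
        bad = [q for q in queries
               if abs(answer(q, true_dist) - answer(q, hyp)) >= alpha]
        if not bad:
            return drops
        q = bad[0]
        new_hyp = mw_step(hyp, q, answer(q, true_dist), eta)
        drops.append(kl(true_dist, hyp) - kl(true_dist, new_hyp))
        hyp = new_hyp


true_dist = [0.7, 0.1, 0.1, 0.05, 0.05, 0.0, 0.0, 0.0]
# Three "bit" queries plus a point query on the first universe element.
queries = [[1.0 if (i >> b) & 1 else 0.0 for i in range(8)] for b in range(3)]
queries.append([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
alpha = 0.1

drops = run(true_dist, queries, alpha)
bound = 4 * math.log(len(true_dist)) / alpha ** 2
```

Each entry of `drops` is $\Psi_t - \Psi_{t+1}$ for one update round, so Lemma 8 predicts that every entry is at least $\alpha^2/4$, and Theorem 6 predicts that `len(drops)` is at most `bound`. The loop stops only when every query is answered to within $\alpha$ by the hypothesis, which is the situation in which no further database update sequence step exists.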