CS 229r: Algorithms for Big Data, Fall 2015
Prof. Jelani Nelson
Lecture 5, September 17, 2015
Scribe: Yakir Reshef

1 Recap and overview

Last time we discussed the problem of norm estimation for $p$-norms with $p > 2$. We described an algorithm by [Andoni12] that, given $x \in \mathbb{R}^n$ updated under a turnstile model, approximates $\|x\|_p$ with constant multiplicative error. The algorithm generates two random matrices: $P \in \mathbb{R}^{m \times n}$ (with $m \ll n$) and $D \in \mathbb{R}^{n \times n}$. $P$ is sampled so that each of its columns contains all zeros except for one entry, which contains a random sign. $D$ is a diagonal matrix whose $i$-th diagonal entry is $u_i^{-1/p}$, where the $u_i$ are i.i.d. exponential random variables. The algorithm maintains $y = PDx$, and its output is $\|y\|_\infty = \max_i |y_i|$. In this lecture we will complete the proof of correctness of this algorithm and then move on from $p$-norm estimation to other problems related to linear sketching.

2 Completing the proof of correctness

From last time we have the following claim.

Claim 1. Let $Z = Dx$. Then
$$\Pr\left(\tfrac{1}{2}\|x\|_p \le \|Z\|_\infty \le 2\|x\|_p\right) \ge 3/4.$$

This claim establishes that if we could maintain $Z$ instead of $y$, then we would have a good solution to our problem. Remember, though, that we cannot store $Z$ in memory because it is $n$-dimensional and $n \gg m$. That is why we need to analyze $y = PZ \in \mathbb{R}^m$.

2.1 Overview of the analysis of $y = PZ$

The idea behind our analysis of $y = PZ$ is as follows: each entry of $y$ is a sort of counter, and the matrix $P$ takes each entry of $Z$, hashes it to a uniformly random counter, and adds that entry of $Z$ times a random sign to that counter. Since $n > m$ and there are only $m$ counters, there will be collisions, and these will cause different $Z_i$ to potentially cancel each other out or add together in a way that one might expect to cause problems. We will get around this by showing that there are very few large $Z_i$'s: so few relative to $m$ that with high probability none of them collide with each other. We still need to worry, because small $Z_i$'s and big $Z_i$'s might collide with each other. But remember that when we add the small $Z_i$'s, we multiply them by a random sign.
So the expectation of the aggregate contribution of the small $Z_i$'s to each bucket is $0$. We will bound their variance as well,
which will show that if they collide with big $Z_i$'s, then with high probability this will not substantially change the relevant counter. All of this together will show that the maximal counter value (i.e., $\|y\|_\infty$) is close to the maximal $|Z_i|$, and therefore to $\|x\|_p$, with high probability.

2.2 Analysis of $y = PZ$

We make the following definitions. Let $T = \|x\|_p$. Define the heavy indices as $H = \{j : |Z_j| \ge T/\sqrt{c \lg n}\}$; think of $c$ as big, to be set later. Define the light indices as $L = [n] \setminus H$.

2.2.1 Analyzing the heavy indices

We begin by showing that there will not be many heavy indices.

Claim 2. For any $l > 0$, we have
$$\mathbb{E}\left|\{i \in [n] : |Z_i| > T/l\}\right| < l^p.$$

Before we prove this claim, let us reflect: if $l = \sqrt{c \lg n}$, then we get $\operatorname{polylog}(n)$ heavy indices, which is minuscule compared to the $m = O(n^{1-2/p} \lg n)$ counters. Birthday-paradox-type reasoning will then translate this bound into the idea that with high probability there will be no collisions between big $Z_j$'s.

Proof. Let
$$Q_i = \begin{cases} 1 & |Z_i| > T/l \\ 0 & \text{else,} \end{cases}$$
so that the number of indices with $|Z_i| > T/l$ equals $\sum_i Q_i$. We then have
$$\mathbb{E} \sum_i Q_i = \sum_i \mathbb{E}(Q_i) = \sum_i \Pr\left(|x_i|/u_i^{1/p} > T/l\right) = \sum_i \Pr\left(u_i < |x_i|^p l^p / T^p\right) = \sum_i \left(1 - e^{-|x_i|^p l^p / T^p}\right) \qquad (u_i \text{ exponentially distributed})$$
$$\le \sum_i \frac{l^p |x_i|^p}{T^p} \qquad (1 + x \le e^x \text{ for } x \in \mathbb{R})$$
$$= \frac{l^p \|x\|_p^p}{T^p} = l^p,$$
which completes the proof.
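As a quick sanity check on Claim 2, the following Monte Carlo sketch estimates $\mathbb{E}\,|\{i : |Z_i| > T/l\}|$ and compares it to $l^p$. The specific values of $n$, $p$, $l$, and $x$ below are arbitrary illustrative choices, not parameters from the lecture:

```python
import numpy as np

# Monte Carlo check of Claim 2: with Z_i = x_i * u_i^(-1/p) for i.i.d.
# exponential u_i and T = ||x||_p, the expected number of indices with
# |Z_i| > T/l should be below l^p. All parameters here are illustrative.
rng = np.random.default_rng(0)
n, p, l = 100, 3, 2.0
x = np.ones(n)
x[0] = 10.0                        # one dominant coordinate
T = np.sum(np.abs(x) ** p) ** (1.0 / p)

trials = 2000
counts = []
for _ in range(trials):
    u = rng.exponential(size=n)    # i.i.d. Exp(1) variables
    Z = np.abs(x) * u ** (-1.0 / p)
    counts.append(np.sum(Z > T / l))

print(np.mean(counts), "vs bound", l ** p)  # empirical mean sits well below l^p = 8
```

The bound is loose here because the dominant coordinate's term $1 - e^{-a}$ saturates at $1$ even though its contribution to the upper bound $l^p$ is large.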
2.2.2 Recalling Bernstein's inequality

To analyze the light indices, we will need Bernstein's inequality.

Theorem 1 (Bernstein's inequality). Suppose $R_1, \ldots, R_n$ are independent, $|R_i| \le K$ for all $i$, and $\operatorname{var}(\sum_i R_i) = \sigma^2$. Then for all $t > 0$,
$$\Pr\left(\left|\sum_i R_i - \mathbb{E}\sum_i R_i\right| > t\right) \lesssim e^{-ct^2/\sigma^2} + e^{-ct/K}.$$

2.2.3 Analyzing the light indices

We now establish that the light indices together will not distort any of the heavy indices by too much. Before we write down our specific claim, let us parametrize $P$ as follows: we have a function $h : [n] \to [m]$ and a function $\sigma : [n] \to \{-1, 1\}$, both chosen at random. (One can show that these can be chosen to be $k$-wise independent hash functions, but we won't do so in this lecture.) We then write
$$P_{ij} = \begin{cases} \sigma(j) & \text{if } h(j) = i \\ 0 & \text{else.} \end{cases}$$
So essentially, $h$ tells us which element of column $j$ to make non-zero, and $\sigma$ tells us which sign to use for column $j$. We can now state our claim about the light indices.

Claim 3. With constant probability, for all $i \in [m]$,
$$\left|\sum_{j \in L : h(j) = i} \sigma(j) Z_j\right| < T/10.$$

Let us see how this claim completes our argument. It means that:

1. If $y_i$ received no heavy indices, then the magnitude of $y_i$ is much less than $T$, so it won't interfere with our estimate.

2. If $y_i$ received the maximal $Z_j$, then by our previous claim that is the only heavy index assigned to $y_i$. In that case, this claim means that the light indices assigned to $y_i$ change it by at most $T/10$, and since that $|Z_j|$ is within a factor of $2$ of $T$, $y_i$ will still be within a constant multiplicative factor of $T$.

3. If $y_i$ received some other heavy index, then the corresponding $|Z_j|$ is by definition less than $2T$, since it is less than the maximal $|Z_j|$. In that case, this claim again tells us that $|y_i|$ will be at most $2.1T$.

To put this more formally:
$$y_i = \sum_{j : h(j) = i} \sigma(j) Z_j = \sum_{j \in L : h(j) = i} \sigma(j) Z_j + \sigma(j_{\text{heavy}}) Z_{j_{\text{heavy}}},$$
where the second term is present only if $y_i$ received some heavy index, in which case we can assume it received at most one. The triangle inequality then implies that
$$|y_i| = |Z_{j_{\text{heavy}}}| \pm \left|\sum_{j \in L : h(j) = i} \sigma(j) Z_j\right| = |Z_{j_{\text{heavy}}}| \pm T/10.$$
Applying this to the bucket that received the maximal $Z_j$ gives that that bucket of $y$ should contain at least $0.4T$, and applying it to all other buckets gives that they should contain at most $2.1T$.

Let us now prove the claim.

Proof of Claim 3. Fix $i \in [m]$. We use Bernstein's inequality on the sum in question. For $j \in L$, define
$$\delta_j = \begin{cases} 1 & \text{if } h(j) = i \\ 0 & \text{else.} \end{cases}$$
Then the sum we seek to bound equals
$$\sum_{j \in L} \delta_j \sigma(j) Z_j.$$
We will call the $j$-th term of the sum $R_j$ and then use Bernstein's inequality. The brunt of the proof is computing the relevant quantities to see what the inequality gives us. First, the easy ones:

1. We have $\mathbb{E}(R_j) = 0$, since the $\sigma(j)$ are random signs.

2. We also have $K \le T/\sqrt{c \lg n}$, since $|\delta_j| \le 1$, $|\sigma(j)| \le 1$, and we only iterate over light indices, so $|Z_j| \le T/\sqrt{c \lg n}$.

It remains only to bound $\sigma^2 = \operatorname{var}(\sum_j R_j)$. If we condition on $Z$, then a problem from the problem set implies that
$$\operatorname{var}\left(\sum_j R_j \,\middle|\, Z\right) \le \frac{\|Z\|_2^2}{m}.$$
This isn't enough, of course: we need a bound that takes the randomness of $Z$ into account. However, instead of computing the unconditional variance of our sum, we will prove that $\sigma^2$ is small with high probability over the choice of $Z$. We will do this by computing the unconditional expectation of $\|Z\|_2^2$ and then using Markov's inequality. We write
$$\mathbb{E}\left(\|Z\|_2^2\right) = \sum_j x_j^2 \, \mathbb{E}\left(u_j^{-2/p}\right)$$
and
$$\mathbb{E}\left(u_j^{-2/p}\right) = \int_0^\infty e^{-x} x^{-2/p}\,dx = \int_0^1 e^{-x} x^{-2/p}\,dx + \int_1^\infty e^{-x} x^{-2/p}\,dx \le \int_0^1 x^{-2/p}\,dx + \int_1^\infty e^{-x}\,dx \qquad \text{(trivial bounds on } e^{-x} \text{ and } x^{-2/p}\text{)}.$$
The second integral trivially converges, and the first converges because $p > 2$. This gives $\mathbb{E}(\|Z\|_2^2) = O(\|x\|_2^2)$, so with high probability $\sigma^2 \le O(\|x\|_2^2)/m$.

To use Bernstein's inequality, we want to convert this bound on $\sigma^2$, which is currently stated in terms of $\|x\|_2$, into a bound in terms of $\|x\|_p$. We will do this using a standard argument based on Hölder's inequality, which we re-state without proof below.

Theorem 2 (Hölder's inequality). Let $f, g \in \mathbb{R}^n$. Then for any $a, b \ge 1$ satisfying $1/a + 1/b = 1$,
$$\langle f, g \rangle \le \|f\|_a \|g\|_b.$$

Setting $f_i = x_i^2$, $g_i = 1$, $a = p/2$, and $b = 1/(1 - 2/p)$ then gives
$$\|x\|_2^2 = \langle f, g \rangle \le \left(\sum_i (x_i^2)^{p/2}\right)^{2/p} \left(\sum_i 1\right)^{1 - 2/p} = \|x\|_p^2 \, n^{1 - 2/p}. \qquad \text{(Hölder)}$$

Using the fact that we chose $m = \Theta(n^{1-2/p} \lg n)$, we then obtain the following bound on $\sigma^2$, which holds with high probability:
$$\sigma^2 \le O\left(\frac{\|x\|_2^2}{m}\right) \le O\left(\frac{T^2 n^{1-2/p}}{m}\right) \qquad \text{(Hölder trick)}$$
$$\le O\left(\frac{T^2}{\lg n}\right). \qquad \text{(choice of } m\text{)}$$
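The Hölder step above is easy to verify numerically. The following sketch (with arbitrary illustrative $n$, $p$, and a random Gaussian $x$, chosen only for demonstration) checks that $\|x\|_2^2 \le \|x\|_p^2 \, n^{1-2/p}$:

```python
import numpy as np

# Numerical check of the Hoelder consequence used above:
# for p > 2, ||x||_2^2 <= ||x||_p^2 * n^(1 - 2/p).
rng = np.random.default_rng(1)
n, p = 1000, 3
x = rng.standard_normal(n)

lhs = np.sum(x ** 2)                                            # ||x||_2^2
rhs = np.sum(np.abs(x) ** p) ** (2.0 / p) * n ** (1 - 2.0 / p)  # ||x||_p^2 * n^(1-2/p)
print(lhs <= rhs)  # True
```

The inequality is tight exactly when all the $|x_i|$ are equal, which is why the worst-case factor $n^{1-2/p}$ appears in the choice of $m$.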
We now apply Bernstein's inequality and show that it gives the desired result. With $t = T/10$, the inequality gives
$$\Pr\left(\left|\sum_j R_j\right| > T/10\right) \lesssim e^{-c(T/10)^2 / O(T^2/\lg n)} + e^{-c(T/10)\sqrt{c \lg n}/T} \le e^{-C \lg n} = n^{-C} \qquad \text{(for some new constant } C\text{)}.$$
So the probability that the noise in a given bucket exceeds $T/10$ can be made polynomially small in $n$. Since there are at most $n$ buckets, a union bound gives that with at least constant probability, all of the light-index contributions are at most $T/10$.

3 Wrap-up

Thus far we have presented algorithms for $p$-norm estimation for $p = 2$, $p \le 2$, and $p > 2$ separately. (Of course, the $p \le 2$ algorithm can be used for $p = 2$ as well.) We noticed that $p = 2$ seems to be a critical point above which we appear to need a different algorithm. Later in the course we will see space lower bounds showing that once $p > 2$, we really do need as much space as the algorithm we presented for $p > 2$ requires.

We conclude our current treatment of norm estimation and approximate counting by briefly noting some motivating applications for these problems. For example, distinct elements is used in SQL to efficiently count distinct entries in some column of a data table. It is also used in network anomaly detection to, say, track the rate at which a worm is spreading: you run distinct elements on a router to count how many distinct entities are sending packets with the worm signature through your router. Another example: how many distinct people visited a website?

For more general moment estimation, there are other motivating examples as well. Imagine $x_i$ is the number of packets sent to IP address $i$. Estimating $\|x\|_\infty$ would give an approximation to the highest load experienced by any server. Of course, as we just mentioned, $\|x\|_\infty$ is difficult to approximate in small space, so in practice people settle for the closest possible norm to the $\infty$-norm, which is the $2$-norm. And they do in fact use the $2$-norm algorithm developed in the problem set for this task.

4 Some setup for next time

Next time we will talk about two new, related problems that sites like Google Trends solve.
They are called the Heavy Hitters problem and the Point Query problem.

In Point Query, we are given some $x \in \mathbb{R}^n$ updated in a turnstile model, with $n$ large. (You might imagine, for instance, that $x$ has a coordinate for each string your search engine could see, and $x_i$ is the number of times you have seen string $i$.) We seek a function $\operatorname{query}(i)$ that, for any $i \in [n]$, returns a value in $x_i \pm \varepsilon \|x\|_1$.

In Heavy Hitters, we have the same $x$, but we seek to compute a set $L \subseteq [n]$ such that

1. $|x_i| \ge \varepsilon \|x\|_1 \implies i \in L$, and

2. $|x_i| < \frac{\varepsilon}{2} \|x\|_1 \implies i \notin L$.
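To make the two guarantees concrete, here is a toy example (the vector $x$ and the value of $\varepsilon$ are hypothetical, chosen only for illustration) computing which indices any valid output $L$ must contain and which it must exclude:

```python
import numpy as np

# Toy illustration of the Heavy Hitters guarantee. Any valid L must contain
# every i with |x_i| >= eps*||x||_1 and no i with |x_i| < (eps/2)*||x||_1.
eps = 0.1
x = np.array([50.0, 9.0, 4.0, 1.0, 1.0, 35.0])
norm1 = np.sum(np.abs(x))                                  # ||x||_1 = 100

must_include = np.where(np.abs(x) >= eps * norm1)[0]       # |x_i| >= 10
must_exclude = np.where(np.abs(x) < (eps / 2) * norm1)[0]  # |x_i| < 5
print(must_include.tolist())  # [0, 5]
print(must_exclude.tolist())  # [2, 3, 4]
```

Index $1$ (with $x_1 = 9$) falls in the gray zone $[\frac{\varepsilon}{2}\|x\|_1, \varepsilon\|x\|_1)$, so a correct algorithm may either include or omit it.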
As an observation: if we can solve Point Query with bounded space, then we can solve Heavy Hitters with bounded space as well (though not necessarily with efficient run-time). To do this, we just run Point Query with parameter $\varepsilon/10$ on each $i \in [n]$ and output the set of indices for which we had large estimates of $x_i$.

4.1 Deterministic solution to Point Query

Let us begin a more detailed discussion of Point Query. We start by defining an incoherent matrix.

Definition 1. $\Pi \in \mathbb{R}^{m \times n}$ is $\varepsilon$-incoherent if

1. for all $i$, $\|\Pi_i\|_2 = 1$, and

2. for all $i \ne j$, $|\langle \Pi_i, \Pi_j \rangle| \le \varepsilon$.

We also define a related object: a code.

Definition 2. An $(\varepsilon, t, q, N)$-code is a set $\mathcal{C} = \{C_1, \ldots, C_N\} \subseteq [q]^t$ such that for all $i \ne j$, $\Delta(C_i, C_j) \ge (1 - \varepsilon)t$, where $\Delta$ denotes Hamming distance.

The key property of a code can be summarized verbally: any two distinct words in the code agree in at most $\varepsilon t$ entries.

There is a relationship between incoherent matrices and codes.

Claim 4. The existence of an $(\varepsilon, t, q, n)$-code implies the existence of an $\varepsilon$-incoherent $\Pi$ with $m = qt$.

Proof. We construct $\Pi$ from $\mathcal{C}$. We have a column of $\Pi$ for each $C_i \in \mathcal{C}$, and we break each column vector into $t$ blocks, each of size $q$. The $j$-th block contains a binary string of length $q$ whose $a$-th bit is $1$ if the $j$-th symbol of $C_i$ is $a$, and $0$ otherwise. Scaling the whole matrix by $1/\sqrt{t}$ gives the desired result.

We will start next time by showing the following two claims.

Claim 5 (to be shown next time). Given an $\varepsilon$-incoherent matrix, we can create a linear sketch to solve Point Query.

Claim 6 (to be shown next time). A random code with $q = O(1/\varepsilon)$ and $t = O(\frac{1}{\varepsilon} \log N)$ is an $(\varepsilon, t, q, N)$-code.

References

[Andoni12] Alexandr Andoni. High frequency moments via max-stability. Manuscript, 2012. http://web.mit.edu/andoni/www/papers/fkstable.pdf