CS 229r: Algorithms for Big Data, Fall 2015
Prof. Jelani Nelson
Lecture 5, September 17, 2015
Scribe: Yakir Reshef

1 Recap and overview

Last time we discussed the problem of norm estimation for p-norms with p > 2. We had described an algorithm by [Andoni12] that, given $x \in \mathbb{R}^n$ updated under a turnstile model, approximates $\|x\|_p$ with constant multiplicative error. The algorithm generates two random matrices $P \in \mathbb{R}^{m \times n}$ (with $m \ll n$) and $D \in \mathbb{R}^{n \times n}$. $P$ is sampled so that each of its columns contains all zeros except for one entry, which contains a random sign. $D$ is a diagonal matrix whose $i$-th diagonal entry is $u_i^{-1/p}$, where the $u_i$ are i.i.d. exponential random variables. The algorithm then maintains $y = PDx$, and its output is $\|y\|_\infty = \max_i |y_i|$. In this lecture we will complete the proof of correctness of this algorithm and then move on from p-norm estimation to other problems related to linear sketching.
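To make the construction concrete, here is a minimal Python sketch of the algorithm just described (our own illustration, not from the original notes), with $P$ and $D$ materialized as dense matrices; a real streaming implementation would keep only the $m$ counters of $y$ plus hash-function seeds. The toy sizes and test vector below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_sketch_matrices(n, p):
    """Sample P in R^{m x n} (each column has a single random +/-1 entry) and
    diagonal D in R^{n x n} with D_ii = u_i^{-1/p}, u_i i.i.d. Exp(1)."""
    m = max(1, int(round(n ** (1 - 2 / p) * np.log2(n))))  # m = Theta(n^{1-2/p} lg n)
    P = np.zeros((m, n))
    P[rng.integers(0, m, size=n), np.arange(n)] = rng.choice([-1.0, 1.0], size=n)
    D = np.diag(rng.exponential(1.0, size=n) ** (-1.0 / p))
    return P, D

def estimate(x, P, D):
    """The algorithm maintains y = P D x and outputs ||y||_inf."""
    y = P @ (D @ x)
    return np.abs(y).max()

# Toy check: the output should be within a constant factor of ||x||_p
# with constant probability (sizes here are purely illustrative).
n, p = 2000, 3.0
x = np.zeros(n)
x[:40] = np.linspace(1.0, 8.0, 40)
P, D = build_sketch_matrices(n, p)
print(estimate(x, P, D), np.linalg.norm(x, ord=p))
```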

2 Completing the proof of correctness

From last time we have the following claim.

Claim 1. Let $Z = Dx$. Then
$$\Pr\left[\tfrac{1}{2}\|x\|_p \le \|Z\|_\infty \le 2\|x\|_p\right] \ge 3/4.$$

This claim establishes that if we could maintain $Z$ instead of $y$ then we would have a good solution to our problem. Remember though that we can't store $Z$ in memory because it is $n$-dimensional and $n \gg m$. That is why we need to analyze $y = PZ \in \mathbb{R}^m$.

2.1 Overview of analysis of y = PZ

The idea behind our analysis of $y = PZ$ is as follows: each entry in $y$ is a sort of counter. The matrix $P$ takes each entry of $Z$, hashes it to a perfectly random counter, and adds that entry of $Z$ times a random sign to that counter. Since $n > m$ and there are only $m$ counters, there will be collisions, and these will cause different $Z_i$ to potentially cancel each other out or add together in a way that one might expect to cause problems. We'll get around this by showing that there are very few large $Z_i$'s, so few relative to $m$ that with high probability none of them will collide with each other. We still need to worry, because small $Z_i$'s and big $Z_i$'s might collide with each other. But remember that when we add the small $Z_i$'s, we multiply them by a random sign. So the expectation of the aggregate contribution of the small $Z_i$'s to each bucket is 0. We'll bound their variance as well, which will show that if they collide with big $Z_i$'s then with high probability this won't substantially change the relevant counter. All of this together will show that the maximal counter value (i.e., $\|y\|_\infty$) is close to the maximal $Z_i$ and therefore to $\|x\|_p$ with high probability.

2.2 Analysis of y = PZ

We make the following definitions. Let $T = \|x\|_p$. Define the heavy indices as $H = \{j : |Z_j| > T/(c\lg n)\}$. Think of $c$ as big; we'll set it later. Define the light indices as $L = [n] \setminus H$.

2.2.1 Analyzing the heavy indices

We begin by showing that there will not be many heavy indices.

Claim 2. For any $l > 0$, we have
$$\mathbb{E}\,\big|\{i \in [n] : |Z_i| > T/l\}\big| < l^p.$$

Before we prove this claim, let's reflect: if $l = c\lg n$ then we get $\mathrm{polylog}(n)$ heavy indices, which is minuscule compared to the $m = O(n^{1-2/p}\lg n)$ counters. Birthday paradox-type reasoning will then translate this bound into the idea that with high probability there will not be collisions between big $Z_j$'s.

Proof. Let
$$Q_i = \begin{cases} 1 & |Z_i| > T/l \\ 0 & \text{else} \end{cases}$$
so that the number of indices with $|Z_i| > T/l$ equals $\sum_i Q_i$. We then have
$$\mathbb{E}\sum_i Q_i = \sum_i \mathbb{E}(Q_i) = \sum_i \Pr\!\left(|x_i|/u_i^{1/p} > T/l\right) = \sum_i \Pr\!\left(u_i < |x_i|^p l^p / T^p\right)$$
$$= \sum_i \left(1 - e^{-|x_i|^p l^p / T^p}\right) \qquad (u_i \text{ exponentially distributed})$$
$$\le \sum_i \frac{l^p |x_i|^p}{T^p} \qquad (1 + x \le e^x \text{ for all } x \in \mathbb{R})$$
$$= \frac{l^p \|x\|_p^p}{T^p} = l^p,$$
which completes the proof.
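As a quick numerical sanity check of Claim 2 (our own addition; the test vector, $p$, $l$, and trial count are arbitrary), one can simulate the $u_i$ and count how many $|Z_i|$ exceed $T/l$:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_large_count(x, p, l, trials=200):
    """Monte Carlo estimate of E|{i : |Z_i| > T/l}|, where Z_i = x_i / u_i^{1/p},
    u_i i.i.d. Exp(1), and T = ||x||_p.  Claim 2 says this expectation is < l^p."""
    T = np.linalg.norm(x, ord=p)
    counts = [
        np.sum(np.abs(x) / rng.exponential(1.0, size=x.size) ** (1.0 / p) > T / l)
        for _ in range(trials)
    ]
    return float(np.mean(counts))

x = rng.standard_normal(5000)
p, l = 3.0, 4.0
print(mean_large_count(x, p, l), "should be below", l ** p)
```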

2.2.2 Recalling Bernstein's inequality

To analyze the light indices, we'll need to recall Bernstein's inequality.

Theorem 1 (Bernstein's inequality). Suppose $R_1, \ldots, R_n$ are independent, and for all $i$, $|R_i| \le K$, and $\mathrm{var}(\sum_i R_i) = \sigma^2$. Then for all $t > 0$
$$\Pr\!\left(\Big|\sum_i R_i - \mathbb{E}\sum_i R_i\Big| > t\right) \lesssim e^{-ct^2/\sigma^2} + e^{-ct/K}.$$

2.2.3 Analyzing the light indices

We now establish that the light indices together will not distort any of the heavy indices by too much. Before we write down our specific claim, let's parametrize $P$ as follows. We have a function $h : [n] \to [m]$ as well as a function $\sigma : [n] \to \{-1, 1\}$, both chosen at random. (One can show that these can be chosen to be $k$-wise independent hash functions, but we won't do so in this lecture.) We then write
$$P_{ij} = \begin{cases} \sigma(j) & \text{if } h(j) = i \\ 0 & \text{else.} \end{cases}$$
So essentially, $h$ tells us which element of column $j$ to make non-zero, and $\sigma$ tells us which sign to use for column $j$.
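In this notation the sketch is easy to maintain under turnstile updates using only the $m$ counters of $y$. The class below is an illustrative sketch of ours (for simplicity it stores $h$, $\sigma$, and the diagonal of $D$ as explicit arrays, whereas the small-space version would generate them from $k$-wise independent hash functions as noted above).

```python
import numpy as np

class MaxStabilitySketch:
    """Maintains y = P D x under turnstile updates x_i += delta.

    h[j]     : which of the m counters column j hashes to
    sigma[j] : the random sign in that column of P
    d[j]     : u_j^{-1/p}, the j-th diagonal entry of D
    """

    def __init__(self, n, p, seed=0):
        rng = np.random.default_rng(seed)
        self.m = max(1, int(round(n ** (1 - 2 / p) * np.log2(n))))
        self.h = rng.integers(0, self.m, size=n)
        self.sigma = rng.choice([-1.0, 1.0], size=n)
        self.d = rng.exponential(1.0, size=n) ** (-1.0 / p)
        self.y = np.zeros(self.m)

    def update(self, i, delta):
        # only one coordinate of y = PDx changes: y[h(i)] += sigma(i) * d_i * delta
        self.y[self.h[i]] += self.sigma[i] * self.d[i] * delta

    def query(self):
        # the estimate of ||x||_p is ||y||_inf
        return np.abs(self.y).max()
```

For example, `sk = MaxStabilitySketch(n=100_000, p=3.0); sk.update(7, 2.5); sk.query()`.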

We can now write our claim about the light indices.

Claim 3. It holds with constant probability that for all $i \in [m]$,
$$\Big|\sum_{j \in L : h(j) = i} \sigma(j) Z_j\Big| < T/10.$$

Let us see how this claim completes our argument. It means that:

- If $y_i$ didn't get any heavy indices, then the magnitude of $y_i$ is much less than $T$, so it won't interfere with our estimate.

- If $y_i$ got assigned the maximal $Z_j$, then by our previous claim that is the only heavy index assigned to $y_i$. In that case, this claim means that all the light indices assigned to $y_i$ won't change it by more than $T/10$, and since that $Z_j$ is within a factor of 2 of $T$, $y_i$ will still be within a constant multiplicative factor of $T$.

- If $y_i$ got assigned some other heavy index, then the corresponding $Z_j$ is by definition less than $2T$, since it is less than the maximal $Z_j$. In that case, this claim again tells us that $|y_i|$ will be at most $2.1T$.

To put this more formally:
$$y_i = \sum_{j : h(j) = i} \sigma(j) Z_j = \sum_{j \in L : h(j) = i} \sigma(j) Z_j + \sigma(j_{\text{heavy}}) Z_{j_{\text{heavy}}},$$
where the second term is added only if $y_i$ got some heavy index, in which case we can assume it received at most one. The triangle inequality then implies that
$$|y_i| = |Z_{j_{\text{heavy}}}| \pm \Big|\sum_{j \in L : h(j) = i} \sigma(j) Z_j\Big| = |Z_{j_{\text{heavy}}}| \pm T/10.$$
Applying this to the bucket that got the maximal $Z_j$ then gives that that bucket of $y$ should contain at least $0.4T$. And applying this to all other buckets gives that they should contain at most $2.1T$.

Let us now prove the claim.

Proof of Claim 3. Fix $i \in [m]$. We use Bernstein on the sum in question. For $j \in L$, define
$$\delta_j = \begin{cases} 1 & \text{if } h(j) = i \\ 0 & \text{else.} \end{cases}$$
Then the sum we seek to bound equals
$$\sum_{j \in L} \delta_j \sigma(j) Z_j.$$
We will call the $j$-th term of the summand $R_j$ and then use Bernstein's inequality. The brunt of the proof will be computing the relevant quantities to see what the inequality gives us. First, the easy ones:

1. We have $\mathbb{E}(\sum_j R_j) = 0$, since the $\sigma(j)$ are random signs.

2. We also have $K = T/(c\lg n)$, since $|\delta_j| \le 1$, $|\sigma(j)| \le 1$, and we only iterate over light indices, so $|Z_j| \le T/(c\lg n)$.

It remains only to compute $\sigma^2 = \mathrm{var}(\sum_j R_j)$. If we condition on $Z$, then a problem from the problem set implies that
$$\mathrm{var}\Big(\sum_j R_j \,\Big|\, Z\Big) \le \frac{\|Z\|_2^2}{m}.$$
This isn't enough of course: we need to get something that takes the randomness of $Z$ into account. However, instead of computing the unconditional variance of our sum, we will prove that $\sigma^2$ is small with high probability over the choice of $Z$. We'll do this by computing the unconditional expectation of $\|Z\|_2^2$ and then using Markov. We write
$$\mathbb{E}\left(\|Z\|_2^2\right) = \sum_j x_j^2\, \mathbb{E}\left(u_j^{-2/p}\right)$$

and
$$\mathbb{E}\left(u_j^{-2/p}\right) = \int_0^\infty e^{-x} x^{-2/p}\, dx = \int_0^1 e^{-x} x^{-2/p}\, dx + \int_1^\infty e^{-x} x^{-2/p}\, dx \le \int_0^1 x^{-2/p}\, dx + \int_1^\infty e^{-x}\, dx \qquad \text{(trivial bounds on } e^{-x} \text{ and } x^{-2/p}\text{)}.$$
The second integral trivially converges, and the first one converges because $p > 2$. This gives that
$$\mathbb{E}\left(\|Z\|_2^2\right) = O\left(\|x\|_2^2\right),$$
which gives that with high probability we will have $\sigma^2 \le O(\|x\|_2^2)/m$.

To use Bernstein's inequality, we'll want to relate this bound on $\sigma^2$, which is currently stated in terms of $\|x\|_2$, to a bound in terms of $\|x\|_p$. We will do this using a standard argument based on Hölder's inequality, which we re-state without proof below.

Theorem 2 (Hölder's inequality). Let $f, g \in \mathbb{R}^n$. Then for any $a, b \ge 1$ satisfying $1/a + 1/b = 1$,
$$\sum_i f_i g_i \le \|f\|_a \|g\|_b.$$

Setting $f_i = x_i^2$, $g_i = 1$, $a = p/2$, $b = 1/(1 - 2/p)$ then gives
$$\|x\|_2^2 = \sum_i f_i g_i \le \left(\sum_i (x_i^2)^{p/2}\right)^{2/p} \left(\sum_i 1\right)^{1 - 2/p} = \|x\|_p^2\, n^{1-2/p}. \qquad \text{(Hölder)}$$

Using the fact that we chose $m = \Theta(n^{1-2/p}\lg n)$, we can then obtain the following bound on $\sigma^2$ with high probability:
$$\sigma^2 \le O\!\left(\frac{\|x\|_2^2}{m}\right) \le O\!\left(\frac{T^2 n^{1-2/p}}{m}\right) \qquad \text{(Hölder trick)}$$
$$\le O\!\left(\frac{T^2 n^{1-2/p}}{n^{1-2/p}\lg n}\right) \qquad \text{(choice of } m\text{)}$$
$$= O\!\left(\frac{T^2}{\lg n}\right).$$

We now need to apply Bernstein's inequality and show that it gives us the desired result. Initially, the inequality gives us the following guarantee:
$$\Pr\!\left(\Big|\sum_j R_j\Big| > T/10\right) \lesssim e^{-c(T/10)^2\cdot\Omega(\lg n)/T^2} + e^{-c(T/10)\cdot c\lg n/T} \le e^{-C\lg n} = n^{-C}$$
for some new constant $C$, using $\sigma^2 = O(T^2/\lg n)$ and $K = T/(c\lg n)$. So the probability that the noise exceeds $T/10$ can be made polynomially small in $n$. But there are at most $n$ buckets, which means that a union bound gives us that with at least constant probability all of the light index contributions are at most $T/10$.

3 Wrap-up

Thus far we have presented algorithms for p-norm estimation for p = 2, p < 2, and p > 2 separately. (Of course, the algorithm for p < 2 can be used for p = 2 as well.) We noticed that at p = 2 there seems to be a critical point above which we appeared to need a different algorithm. Later in the course we'll see that there are space lower bounds that say that once p > 2 we really do need as much space as the algorithm we presented for p > 2 required.

We conclude our current treatment of norm estimation and approximate counting by briefly noting some motivating applications for these problems. For example, distinct elements is used in SQL to efficiently count distinct entries in some column of a data table. It is also used in network anomaly detection to, say, track the rate at which a worm is spreading: you run distinct elements on a router to count how many distinct entities are sending packets with the worm signature through your router. Another example: how many distinct people visited a website?

For more general moment estimation, there are other motivating examples as well. Imagine $x_i$ is the number of packets sent to IP address $i$. Estimating $\|x\|_\infty$ would give an approximation to the highest load experienced by any server. Of course, as we just mentioned, $\|x\|_\infty$ is difficult to approximate in small space, so in practice people settle for the closest possible norm to the $\infty$-norm, which is the 2-norm. And they do in fact use the 2-norm algorithm developed in the problem set for this task.

4 Some setup for next time

Next time we'll talk about two new, related problems that sites like Google Trends solve. They are called the heavy hitters problem and the point query problem.

In Point Query, we're given some $x \in \mathbb{R}^n$ updated in a turnstile model, with $n$ large. (You might imagine, for instance, that $x$ has a coordinate for each string your search engine could see, and $x_i$ is the number of times you've seen string $i$.) We seek a function $\mathrm{query}(i)$ that, for $i \in [n]$, returns a value in $x_i \pm \varepsilon\|x\|_1$.

In Heavy Hitters, we have the same $x$, but we seek to compute a set $L \subset [n]$ such that
1. $|x_i| \ge \varepsilon\|x\|_1 \implies i \in L$
2. $|x_i| < \frac{\varepsilon}{2}\|x\|_1 \implies i \notin L$

As an observation: if we can solve Point Query with bounded space then we can solve Heavy Hitters with bounded space as well (though not necessarily with efficient run-time). To do this, we just run Point Query with parameter $\varepsilon/10$ on each $i \in [n]$ and output the set of indices for which we had large estimates of $x_i$.
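Here is a sketch of that reduction (our own illustration; it assumes a black-box `query(i)` with the Point Query guarantee at parameter $\varepsilon/10$, and that $\|x\|_1$, or an estimate of it, is available; the particular threshold is one valid choice under these assumptions).

```python
def heavy_hitters_from_point_query(n, query, x_l1, eps):
    """Return a candidate heavy-hitters set L from a Point Query oracle.

    Assumes query(i) is within (eps/10)*||x||_1 of x_i and x_l1 = ||x||_1.
    With the threshold below, any i with |x_i| >= eps*||x||_1 is included
    (its estimate is at least 0.9*eps*||x||_1), and any i with
    |x_i| < (eps/2)*||x||_1 is excluded (its estimate is below 0.6*eps*||x||_1)."""
    threshold = 0.75 * eps * x_l1
    return [i for i in range(n) if abs(query(i)) >= threshold]
```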

4.1 Deterministic solution to Point Query

Let us begin a more detailed discussion of Point Query. We begin by defining an incoherent matrix.

Definition 1. $\Pi \in \mathbb{R}^{m \times n}$ is $\varepsilon$-incoherent if
1. For all $i$, $\|\Pi_i\|_2 = 1$.
2. For all $i \ne j$, $|\langle \Pi_i, \Pi_j \rangle| \le \varepsilon$.

We also define a related object: a code.

Definition 2. An $(\varepsilon, t, q, N)$-code is a set $\mathcal{C} = \{C_1, \ldots, C_N\} \subseteq [q]^t$ such that for all $i \ne j$, $\Delta(C_i, C_j) \ge (1 - \varepsilon)t$, where $\Delta$ denotes Hamming distance. The key property of a code can be summarized verbally: any two distinct words in the code agree in at most $\varepsilon t$ entries.

There is a relationship between incoherent matrices and codes.

Claim 4. Existence of an $(\varepsilon, t, q, n)$-code implies existence of an $\varepsilon$-incoherent $\Pi$ with $m = qt$.

Proof. We construct $\Pi$ from $\mathcal{C}$. We have a column of $\Pi$ for each $C_i \in \mathcal{C}$, and we break each column vector into $t$ blocks, each of size $q$. Then, the $j$-th block contains a binary string of length $q$ whose $a$-th bit is 1 if the $j$-th element of $C_i$ is $a$ and 0 otherwise. Scaling the whole matrix by $1/\sqrt{t}$ gives the desired result.

We'll start next time by showing the following two claims.

Claim 5 (to be shown next time). Given an $\varepsilon$-incoherent matrix, we can create a linear sketch to solve Point Query.

Claim 6 (shown next time). A random code with $q = O(1/\varepsilon)$ and $t = O(\frac{1}{\varepsilon}\log N)$ is an $(\varepsilon, t, q, N)$-code.

References

[Andoni12] Alexandr Andoni. High frequency moments via max-stability. Manuscript, 2012. http://web.mit.edu/andoni/www/papers/fkstable.pdf