Block Heavy Hitters
Alexandr Andoni, Khanh Do Ba, and Piotr Indyk


Computer Science and Artificial Intelligence Laboratory
Technical Report MIT-CSAIL-TR-2008-024
May 2, 2008

Block Heavy Hitters
Alexandr Andoni, Khanh Do Ba, and Piotr Indyk

Massachusetts Institute of Technology, Cambridge, MA 02139 USA
www.csail.mit.edu

Block Heavy Hitters

Alexandr Andoni, Khanh Do Ba, Piotr Indyk

April 11, 2008

Abstract

We study a natural generalization of the heavy hitters problem in the streaming context. We term this generalization block heavy hitters and define it as follows. We stream over a matrix $A$, and must report all rows that are heavy, where a row is heavy if its $\ell_1$-norm is at least a $\phi$ fraction of the $\ell_1$-norm of the entire matrix $A$. By comparison, the standard heavy hitters problem asks us to report the matrix entries that are heavy. As is common in streaming, we solve the problem approximately: we return all rows with weight at least $\phi$, but possibly also some other rows, each of weight no less than $(1-\epsilon)\phi$. To solve the block heavy hitters problem, we show how to construct a linear sketch of $A$ from which we can recover the heavy rows of $A$.

The block heavy hitters problem has already found applications to other streaming problems. In particular, it is a crucial building block in a streaming algorithm of [AIK08] that constructs a small-size sketch for the Ulam metric, a metric on non-repetitive strings under the edit (Levenshtein) distance.

We prove the following theorem. Let $\mathcal{M}_{n,m}$ be the set of real matrices $A$ of size $n$ by $m$, with entries from $E = \frac{1}{nm}\{0, 1, \ldots, nm\}$. For a matrix $A$, let $A_i$ denote its $i$-th row.

Theorem 0.1. Fix some $\epsilon > 0$, $n, m \geq 1$, and $\phi \in [0, 1]$. There exists a randomized linear map (sketch) $\mu : \mathcal{M}_{n,m} \to \{0,1\}^s$, where $s = O(\frac{1}{\epsilon^5 \phi^2} \log n)$, such that the following holds. For a matrix $A \in \mathcal{M}_{n,m}$, it is possible, given $\mu(A)$, to find a set $W \subseteq [n]$ of rows such that, with probability at least $1 - 1/n$, we have: for any $i \in W$, $\frac{\|A_i\|_1}{\|A\|_1} \geq (1-\epsilon)\phi$, and if $\frac{\|A_i\|_1}{\|A\|_1} \geq \phi$, then $i \in W$. Moreover, $\mu$ can be of the form $\mu(A) = \mu'(\zeta(A_1), \zeta(A_2), \ldots, \zeta(A_n))$, where $\zeta : E^m \to \mathbb{R}^k$ and $\mu' : \mathbb{R}^{kn} \to \{0,1\}^s$ are randomized linear mappings. That is, the sketch $\mu$ is obtained by first sketching the rows of $A$ (using the same function $\zeta$ for each) and then sketching those sketches.

Our construction is inspired by the Count-Min sketch of [CM05], and may be seen as a Count-Min sketch applied to projections of the rows of $A$.
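To pin down the guarantee, here is a naive, non-streaming reference implementation of the exact version of the problem (a minimal sketch of ours in Python; the function name and NumPy interface are illustrative, not from the paper):

```python
import numpy as np

def exact_block_heavy_hitters(A: np.ndarray, phi: float) -> set:
    """Report every row i with ||A_i||_1 >= phi * ||A||_1 (exact, offline).

    The streaming algorithm below must approximate this set from a small
    linear sketch of A, and may also report rows of weight >= (1-eps)*phi.
    """
    row_mass = np.abs(A).sum(axis=1)   # ||A_i||_1 for each row i
    total_mass = row_mass.sum()        # ||A||_1
    return set(np.flatnonzero(row_mass >= phi * total_mass))
```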

Proof. Construction of the sketch. We define the function $\zeta$ as an $\ell_1$ projection into a space with $k = O(\frac{1}{\epsilon^2} \log n)$ dimensions, achieved through a standard Cauchy distribution projection. Namely, the function $\zeta$ is determined by $k$ vectors $c_1, \ldots, c_k \in \mathbb{R}^m$, with coordinates chosen i.i.d. from the Cauchy distribution with pdf $f(x) = \frac{1}{\pi} \cdot \frac{1}{1+x^2}$. Then $\zeta(x)$, for $x \in E^m$, is given by

$$\zeta(x) = (c_1 \cdot x, \; c_2 \cdot x, \; \ldots, \; c_k \cdot x).$$

The function $\mu'$ takes as input $\zeta(A_1), \ldots, \zeta(A_n)$, and produces $k$ hash tables, each having $l = O(\frac{1}{\epsilon^2 \phi})$ cells. The $j$-th cell of the $i$-th hash table $H^{(i)}$, for $j \in [l]$, is given by

$$H^{(i)}_j = \sum_{q : h_i(q) = j} [\zeta(A_q)]_i.$$

See Figure 1 for an illustration.

[Figure 1: Illustration of $\mu$ as a double sketch: the rows $A_1, \ldots, A_n \in \mathbb{R}^m$ are first projected by $\zeta$ into $\mathbb{R}^k$, and the projections are then hashed by $h_1, \ldots, h_k$ into $k$ tables of $l$ cells each.]

Reconstruction. Given a sketch $\mu(A) = \mu'(\zeta(A_1), \ldots, \zeta(A_n))$, we construct the desired set $W$ as follows. For each $w \in [n]$, consider the vector $r_w = \big( H^{(i)}_{h_i(w)} \big)_{i \in [k]}$. Then $w$ is included in $W$ iff $\mathrm{median}(|r_w|) \geq (1 - \epsilon/2)\phi$. In words, for any block $w$ we consider the cell of each hash table $H^{(i)}$ into which $w$ falls (one for each $i$). If the majority of these cells contain a value of magnitude at least $(1 - \epsilon/2)\phi$, then $w$ is included in $W$.

Sketch size. As described, the sketch $\mu(A) = \mu'(\zeta(A_1), \ldots, \zeta(A_n))$ consists of $k \cdot l = O(\frac{1}{\epsilon^4 \phi} \log n)$ real numbers. We note that, by the usual arguments, it is enough to store the real numbers up to precision $O(\epsilon\phi)$, cutting them off when their absolute value exceeds a constant such as 2. The resulting size of the sketch (in bits) is $s = O(\frac{1}{\epsilon^5 \phi^2} \log n)$.
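As a concrete illustration, the following Python class simulates the whole construction: the Cauchy projection $\zeta$, the Count-Min-style hashing $\mu'$, and the median-based recovery rule. This is a minimal sketch of ours, not the paper's code; the class name, the NumPy interface, and the treatment of parameters are all illustrative assumptions.

```python
import numpy as np

class BlockHeavyHittersSketch:
    """Simulation of mu(A) = mu'(zeta(A_1), ..., zeta(A_n)): zeta is a
    k-dimensional Cauchy (l1) projection of each row, and mu' hashes the
    n projected rows into k tables of l cells each."""

    def __init__(self, n: int, m: int, k: int, l: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.C = rng.standard_cauchy((k, m))  # projection vectors c_1..c_k
        self.h = rng.integers(0, l, (k, n))   # hash functions h_i: [n] -> [l]
        self.n, self.k, self.l = n, k, l

    def sketch(self, A: np.ndarray) -> np.ndarray:
        """H[i, j] = sum of [zeta(A_q)]_i over rows q with h_i(q) = j.
        Linear in A, as required of the sketch."""
        P = A @ self.C.T                      # P[q, i] = c_i . A_q
        H = np.zeros((self.k, self.l))
        for i in range(self.k):
            np.add.at(H[i], self.h[i], P[:, i])
        return H

    def recover(self, H: np.ndarray, phi: float, eps: float) -> set:
        """W = { w : median_i |H^{(i)}_{h_i(w)}| >= (1 - eps/2) * phi }.
        Assumes A was scaled so that ||A||_1 = 1, as in the analysis."""
        cells = np.abs(H[np.arange(self.k)[:, None], self.h])  # shape (k, n)
        medians = np.median(cells, axis=0)
        return set(np.flatnonzero(medians >= (1 - eps / 2) * phi))
```

A quick experiment would scale $A$ by $1/\|A\|_1$ before sketching (mirroring the w.l.o.g. step in the analysis below) and take, say, $k \approx c\,\epsilon^{-2} \log n$ and $l \approx c'\epsilon^{-2}\phi^{-1}$ for moderate constants $c, c'$.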

Analysis of correctness. We proceed to prove that the set $W$ satisfies the desired properties. Since our sketches are linear, we assume without loss of generality that $\|A\|_1 = 1$.

First, consider any $w$ such that $\|A_w\|_1 \geq \phi$. We would like to prove that $w \in W$ w.h.p. For this purpose, it is sufficient to prove that, for a fixed $i \in [k]$, we have $|H^{(i)}_{h_i(w)}| > (1 - \epsilon/2)\phi$ with probability $1/2 + \Omega(\epsilon)$. A standard application of the Chernoff bound then implies that $\mathrm{median}(|r_w|) > (1 - \epsilon/2)\phi$ w.h.p.

So fix some $i \in [k]$, and consider the cell $h_i(w)$ of the hash table $H^{(i)}$. Let $\chi[E]$ denote the indicator variable of an event $E$. The mass that falls into this cell is equal to

$$H^{(i)}_{h_i(w)} = [\zeta(A_w)]_i + \sum_{j \neq w} [\zeta(A_j)]_i \cdot \chi[h_i(j) = h_i(w)] = c_i \cdot A_w + c_i \cdot \sum_{j \neq w} A_j \, \chi[h_i(j) = h_i(w)] = c_i \cdot \Big( A_w + \sum_{j \neq w} A_j \, \chi[h_i(j) = h_i(w)] \Big). \qquad (*)$$

Now, consider the vector $z = \sum_{j \neq w} A_j \, \chi[h_i(j) = h_i(w)]$. The expected norm of $z$ is at most

$$\mathbb{E}_{h_i}[\|z\|_1] \leq \sum_{j \neq w} \|A_j\|_1 \cdot \frac{1}{l} \leq \frac{1}{l} = O(\epsilon^2 \phi).$$

By Markov's inequality, with probability at least $1 - O(\epsilon)$ we have $\|z\|_1 \leq \epsilon\phi/4$, and thus $\|A_w + z\|_1 \geq (1 - \epsilon/4)\phi$. It follows that the random variable $(A_w + z) \cdot c_i$ has a Cauchy distribution whose scale, i.e., the median of its absolute value, is $\|A_w + z\|_1 \geq (1 - \epsilon/4)\phi$. By standard properties of Cauchy distributions (for a standard Cauchy variable $C$ and $\delta \in (0,1)$, $\Pr[|C| \leq t] = \frac{2}{\pi}\arctan t$, so $\Pr[|C| \geq 1-\delta]$ and $\Pr[|C| \leq 1+\delta]$ both equal $1/2 + \Omega(\delta)$), we have $|H^{(i)}_{h_i(w)}| \geq (1 - \epsilon/4) \cdot (1 - \epsilon/4)\phi > (1 - \epsilon/2)\phi$ with probability at least $(1/2 + \Omega(\epsilon))(1 - O(\epsilon)) = 1/2 + \Omega(\epsilon)$.

Next we prove that if $\|A_w\|_1 \leq (1 - \epsilon)\phi$, then $w \notin W$ w.h.p. As above, we just need to prove that $|H^{(i)}_{h_i(w)}| < (1 - \epsilon/2)\phi$ with probability $1/2 + \Omega(\epsilon)$. We again consider the vector $z = \sum_{j \neq w} A_j \, \chi[h_i(j) = h_i(w)]$ from $(*)$, and similarly deduce that, with probability at least $1 - O(\epsilon)$, we have $\|z\|_1 \leq \epsilon\phi/4$ and thus $\|A_w + z\|_1 \leq (1 - \frac{3}{4}\epsilon)\phi$. Again by standard properties of Cauchy distributions, we conclude that $|H^{(i)}_{h_i(w)}| \leq (1 + \epsilon/4)(1 - \frac{3}{4}\epsilon)\phi < (1 - \epsilon/2)\phi$ with probability at least $(1/2 + \Omega(\epsilon))(1 - O(\epsilon)) = 1/2 + \Omega(\epsilon)$. (A short numerical check of these Cauchy estimates appears after the references.)

References

[AIK08] Alexandr Andoni, Piotr Indyk, and Robert Krauthgamer. Overcoming the $\ell_1$ non-embeddability barrier: Algorithms for product metrics. Manuscript, 2008.

[CM05] G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1):58-75, 2005.
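The Cauchy properties invoked in the analysis can be verified empirically. The following short Python check (our addition; it uses only the definitions from the proof) confirms, first, the 1-stability fact that $c \cdot v$ with i.i.d. standard Cauchy coordinates in $c$ is Cauchy with scale $\|v\|_1$, so the median of its absolute value is $\|v\|_1$; and second, the tail estimate $\Pr[|C| \geq 1 - \delta] = 1 - \frac{2}{\pi}\arctan(1 - \delta)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# (1) 1-stability: <c, v> is Cauchy with scale ||v||_1, and the median of
# |Cauchy(scale s)| equals s. The printed ratio should be close to 1.
v = rng.uniform(0.0, 1.0, size=50)
samples = rng.standard_cauchy((200_000, 50)) @ v
print(np.median(np.abs(samples)) / np.abs(v).sum())

# (2) Tail estimate: for a standard Cauchy C and delta = 0.1,
# Pr[|C| >= 1 - delta] = 1 - (2/pi) * arctan(0.9) ~ 0.533 > 1/2.
C = rng.standard_cauchy(1_000_000)
print(np.mean(np.abs(C) >= 0.9), 1 - (2 / np.pi) * np.arctan(0.9))
```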