Improved Concentration Bounds for Count-Sketch


1 Improved Concentration Bounds for Count-Sketch
Gregory T. Minton (MIT → MSR New England), Eric Price (MIT → IBM Almaden → UT Austin)

2 Count-Sketch: a classic streaming algorithm [Charikar, Chen, Farach-Colton 2002]
- Solves the heavy hitters problem.
- Estimates a vector x ∈ R^n from a low-dimensional sketch Ax ∈ R^m.
- Nice algorithm: simple, and used in Google's MapReduce standard library.
[CCF02] bounds the maximum error over all coordinates. We show, for the same algorithm:
- Most coordinates have asymptotically better estimation accuracy.
- The average accuracy over many coordinates is asymptotically better with high probability.
- Experiments show our asymptotics are correct.
Caveat: we assume fully independent hash functions.

3 Outline
1. Robust Estimation of Symmetric Variables (Lemma; Relevance to Count-Sketch)
2. Electoral Colleges and Direct Elections (Lemma; Relevance to Count-Sketch)
3. Experiments!

4 Outline
1. Robust Estimation of Symmetric Variables (Lemma; Relevance to Count-Sketch)
2. Electoral Colleges and Direct Elections (Lemma; Relevance to Count-Sketch)
3. Experiments!

5 Estimating a symmetric random variable's mean
[Figure: a symmetric two-point distribution X with mass 1/2 at each of μ ± σ; mean μ, standard deviation σ.]
Unknown distribution X over R, symmetric about an unknown μ.
Given samples x_1, …, x_R ∼ X, how do we estimate μ?

6 Estimating a symmetric random variable's mean
[Figure: the same two-point distribution with mass 1/2 at each of μ ± σ.]
Unknown distribution X over R, symmetric about an unknown μ.
Given samples x_1, …, x_R ∼ X, how do we estimate μ?
- Mean: converges to μ at rate σ/√R.

7 Estimating a symmetric random variable's mean
[Figure: the same two-point distribution with mass 1/2 at each of μ ± σ.]
Unknown distribution X over R, symmetric about an unknown μ.
Given samples x_1, …, x_R ∼ X, how do we estimate μ?
- Mean: converges to μ at rate σ/√R, but has no robustness to outliers.

8 Estimating a symmetric random variable's mean
[Figure: the same two-point distribution with mass 1/2 at each of μ ± σ.]
Unknown distribution X over R, symmetric about an unknown μ.
Given samples x_1, …, x_R ∼ X, how do we estimate μ?
- Mean: converges to μ at rate σ/√R, but has no robustness to outliers.
- Median: extremely robust, but doesn't necessarily converge to μ.

9 Estimating a symmetric random variable's mean
[Figure: the distribution with atoms at μ − σ and μ + σ; the sample median doesn't converge.]
Consider instead the median of pairwise means:
  μ̂ = median over i ∈ {1, 3, 5, …} of (x_i + x_{i+1})/2.
This converges at rate O(σ/√R), even with outliers. That is: the median of (X + X′)/2 converges. [See also: the Hodges-Lehmann estimator.]
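
As a quick illustration of why this matters, here is a minimal simulation (mine, not from the talk) using the two-point distribution X = μ ± σ from the figure: the plain sample median stays about σ away from μ, while the median of disjoint pairwise means locks onto μ.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, R, trials = 5.0, 1.0, 1000, 200

err_med, err_pair = [], []
for _ in range(trials):
    x = mu + sigma * rng.choice([-1.0, 1.0], size=R)   # X = mu +/- sigma, prob 1/2 each
    err_med.append(abs(np.median(x) - mu))
    pair_means = (x[0::2] + x[1::2]) / 2               # means of disjoint pairs
    err_pair.append(abs(np.median(pair_means) - mu))

print("plain sample median, mean error:", np.mean(err_med))    # stays near sigma
print("median of pairwise means, error:", np.mean(err_pair))   # essentially 0
```

For this discrete X the pairwise-mean median lands exactly on μ with high probability; for continuous distributions it converges at the O(σ/√R) rate claimed on the slide.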

10 Why does the median converge for X + X′?
WLOG μ = 0. Define the Fourier transform F_X of X:
  F_X(t) = E_{x∼X}[cos(τxt)],  where τ = 2π ≈ 6.28
(the standard Fourier transform of the PDF, specialized to symmetric X).
Convolution becomes multiplication: F_{X+X′}(t) = (F_X(t))² ≥ 0 for all t.

Theorem. Let Y be symmetric about 0 with F_Y(t) ≥ 0 for all t and E[Y²] = σ². Then for all ε ≤ 1, Pr[|Y| ≤ εσ] ≳ ε.

Standard Chernoff bounds then give: the median of y_1, …, y_R converges at rate σ/√R.

11 Proof
Theorem. Let F_Y(t) ≥ 0 for all t and E[Y²] = 1. Then for all ε ≤ 1, Pr[|Y| ≤ ε] ≳ ε.

Since cos θ ≥ 1 − θ²/2, we have F_Y(t) = E[cos(τyt)] ≥ 1 − τ²t²/2, so F_Y(t) ≥ 1/2 for |t| ≤ 1/τ. Lower-bound the indicator of [−ε, ε] by the triangle function (1 − |y|/ε)₊, whose Fourier transform is non-negative, has height Θ(ε), and is spread over |t| ≲ 1/ε. Because F_Y ≥ 0 everywhere,
  Pr[|y| ≤ ε] ≥ E[(1 − |y|/ε)₊]
is at least the contribution to the Fourier-side integral from |t| ≤ 1/τ ≤ 1/ε, which is Ω(ε).
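
A small Monte Carlo check of the theorem (my sketch, not from the talk): taking X uniform on [−1, 1], the sum Y = X + X′ has non-negative Fourier transform (the square of X's transform), and the observed Pr[|Y| ≤ εσ] is indeed Θ(ε).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6
y = rng.uniform(-1, 1, N) + rng.uniform(-1, 1, N)   # Y = X + X', X symmetric about 0
sigma = np.sqrt(np.mean(y**2))

for eps in [0.5, 0.2, 0.1, 0.05, 0.01]:
    p = np.mean(np.abs(y) <= eps * sigma)           # empirical Pr[|Y| <= eps*sigma]
    print(f"eps={eps:<5}: Pr[|Y| <= eps*sigma] = {p:.5f}, ratio p/eps = {p/eps:.2f}")
```

The printed ratio p/ε stays bounded away from 0 as ε shrinks, matching the Ω(ε) anti-concentration bound.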

12 Outline
1. Robust Estimation of Symmetric Variables (Lemma; Relevance to Count-Sketch)
2. Electoral Colleges and Direct Elections (Lemma; Relevance to Count-Sketch)
3. Experiments!

13 Count-Sketch
Want to estimate x ∈ R^n from a small sketch.
Hash to k buckets and sum up with random signs:
- Choose random h : [n] → [k] and s : [n] → {±1}.
- Store y_j = Σ_{i : h(i)=j} s(i) x_i.
- Estimate x_i by x̂_i = y_{h(i)} · s(i).
[Figure: an R × k array of buckets, one hash table per row.]

14 Count-Sketch
Want to estimate x ∈ R^n from a small sketch.
Hash to k buckets and sum up with random signs:
- Choose random h : [n] → [k] and s : [n] → {±1}.
- Store y_j = Σ_{i : h(i)=j} s(i) x_i.
- Estimate x_i by x̂_i = y_{h(i)} · s(i).
- Repeat R times independently and take the median.
For each row,
  x̂_i − x_i = Σ_{j ≠ i} δ_j,  where δ_j = ±x_j with probability 1/k (uniform random sign) and 0 otherwise.
This error is symmetric with a non-negative Fourier transform.
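
A minimal, self-contained sketch of the algorithm as described above (function and variable names are mine; fully independent hashing is simulated by storing explicit random tables):

```python
import numpy as np

def count_sketch(x, k, R, rng):
    """Sketch x into R independent rows of k signed buckets."""
    n = len(x)
    h = rng.integers(0, k, size=(R, n))       # row-r hash h_r : [n] -> [k]
    s = rng.choice([-1.0, 1.0], size=(R, n))  # row-r signs s_r : [n] -> {+-1}
    y = np.zeros((R, k))
    for r in range(R):
        np.add.at(y[r], h[r], s[r] * x)       # y[r, j] = sum_{i: h_r(i)=j} s_r(i) x_i
    return y, h, s

def cs_estimate(y, h, s):
    """x_hat_i = median over rows r of s_r(i) * y[r, h_r(i)]."""
    R = h.shape[0]
    per_row = s * y[np.arange(R)[:, None], h]  # shape (R, n): one estimate per row
    return np.median(per_row, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
x[:10] += 100                                  # plant a few heavy hitters
y, h, s = count_sketch(x, k=200, R=11, rng=rng)
x_hat = cs_estimate(y, h, s)
print("max heavy-hitter error:", np.abs(x_hat[:10] - x[:10]).max())
```

The sketch uses R·k numbers instead of n; the heavy hitters are recovered to within roughly σ/√R, as the next slide quantifies.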

15 Count-Sketch Analysis
Let σ² = (1/k) · min_{k-sparse x^(k)} ‖x − x^(k)‖₂² be the typical error for a single row of Count-Sketch with k columns.

Theorem. For any coordinate i and all t ≤ √R,
  Pr[|x̂_i − x_i| > (t/√R) σ] ≤ e^{−Ω(t²)}.
([CCF02] is the case t = √R with R = O(log n), i.e. ‖x̂ − x‖_∞ ≤ σ w.h.p.)

Corollary. Excluding events of probability e^{−Ω(R)}, we have for each i that E[(x̂_i − x_i)²] = O(σ²/R).

16 Estimation of multiple coordinates?
What about the average error on a set S of k coordinates?
Linearity of expectation: E[‖x̂_S − x_S‖₂²] = O(kσ²/R). Does it concentrate?
  Pr[‖x̂_S − x_S‖₂² > O(kσ²/R)] < p = ???
- By expectation (Markov): p = Θ(1).
- If the errors were independent: p = e^{−Ω(k)}.
- It is a sum of many variables, but not independent ones…
- Chebyshev's inequality, bounding the covariance of the errors, is feasible to analyze (though kind of nasty). Ideally this gives p = 1/√k; we can get p = 1/k^{1/14}.
Can we at least get high probability, i.e. 1/k^c for an arbitrary constant c?

17 Boosting the error probability in a black-box manner
We know that ‖x̂_S − x_S‖₂ is small with all but k^{−1/14} probability.
A way to get all but k^{−c} probability: repeat 100c times and take the median of the results.
- With all but k^{−c} probability, more than 75c of the estimates x̂_S^(i) have small error.
- The median of the results then has error at most 3 times that small error.
But the resulting algorithm is stupid:
- Run Count-Sketch with R′ = O(cR) rows.
- Arbitrarily partition the rows into blocks of R.
- The estimate is the median (over blocks) of the median (within each block) of the individual estimates.
Can we show that the direct median is as good as this median-of-medians?
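
For concreteness, here is a sketch of the black-box median-of-medians estimator (my code, assuming per-row estimates like the per_row array in the Count-Sketch sketch above):

```python
import numpy as np

def median_of_medians(per_row_estimates, block_size):
    """Median (over blocks) of median (within each block) of per-row estimates.

    per_row_estimates: array of shape (R_total, n), one estimate of every
    coordinate from each Count-Sketch row.
    """
    R_total, n = per_row_estimates.shape
    nblocks = R_total // block_size
    blocks = per_row_estimates[:nblocks * block_size].reshape(nblocks, block_size, n)
    return np.median(np.median(blocks, axis=1), axis=0)

# The direct estimator is simply np.median(per_row_estimates, axis=0); the
# question on this slide is whether it inherits the median-of-medians bound.
```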

18 Outline
1. Robust Estimation of Symmetric Variables (Lemma; Relevance to Count-Sketch)
2. Electoral Colleges and Direct Elections (Lemma; Relevance to Count-Sketch)
3. Experiments!

19 Electoral Colleges
Suppose you have a two-party election for k offices.
- Voters come from a distribution X over {0,1}^k.
- There is a true majority slate of candidates x ∈ {0,1}^k.
- On election day, we receive ballots x_1, …, x_n ∼ X.
How do we best estimate x? For each office, compare:
- x_majority: the coordinate-wise majority over all ballots x_1, …, x_n.
- x_electoral: the majority over states (CA, TX, …) of each state's internal majority.
[Figure: the ballots x_1, …, x_n grouped into states CA, TX, ….]
Is x_majority better than x_electoral in every way? Is
  Pr[‖x_majority − x‖ > α] ≤ Pr[‖x_electoral − x‖ > α]
for all α?

20 Electoral Colleges
Is x_majority better than x_electoral in every way, so that
  Pr[‖x_majority − x‖ > α] ≤ Pr[‖x_electoral − x‖ > α]
for all α? We don't know, but:

Theorem. Pr[‖x_majority − x‖ > 3α] ≤ 4 Pr[‖x_electoral − x‖ > α] for all p-norms.

21 Proof
Theorem. Pr[‖x_majority − x‖ > 3α] ≤ 4 Pr[‖x_electoral − x‖ > α] for all p-norms.
Follows easily from:

Lemma (median³). For any x_1, …, x_n ∈ R^k,
  median over partitions into states of ( median over states of ( median within each state of x_i ) ) = median over the populace of x_i.

(With failure probability 4p, at least 3/4 of the partitions have error at most α; their median, which by the lemma is x_majority, then has error at most 3α.)
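
A toy Monte Carlo comparison of the two estimators from the election setup (my illustration; the ballot model, a true slate corrupted by i.i.d. per-vote noise, and all parameter choices are assumptions for demonstration only):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_states, state_size, trials, p_wrong = 25, 30, 30, 2000, 0.45
n = n_states * state_size
x_true = rng.integers(0, 2, size=k)                   # true slate in {0,1}^k

wrong_maj = wrong_ec = 0
for _ in range(trials):
    flip = rng.random((n, k)) < p_wrong               # each vote wrong w.p. p_wrong
    ballots = np.where(flip, 1 - x_true, x_true)
    x_maj = (ballots.mean(axis=0) > 0.5).astype(int)  # direct coordinate-wise majority
    state_win = ballots.reshape(n_states, state_size, k).mean(axis=1) > 0.5
    x_ec = (state_win.mean(axis=0) > 0.5).astype(int) # majority of state majorities
    wrong_maj += (x_maj != x_true).sum()
    wrong_ec += (x_ec != x_true).sum()

print("avg offices wrong, x_majority: ", wrong_maj / trials)
print("avg offices wrong, x_electoral:", wrong_ec / trials)
```

In this model the direct majority makes no more errors than the electoral estimate, consistent with the theorem's one-directional comparison.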

22 Outline
1. Robust Estimation of Symmetric Variables (Lemma; Relevance to Count-Sketch)
2. Electoral Colleges and Direct Elections (Lemma; Relevance to Count-Sketch)
3. Experiments!

23 Concentration for sets
We know that a median-of-medians variant of Count-Sketch would give good estimation of sets with high probability. Therefore the standard Count-Sketch does as well.

Theorem. For any constant c and any set S of coordinates,
  Pr[‖x̂_S − x_S‖₂ > O(√(|S|/R) · σ)] ≤ |S|^{−c}.

24 Outline
1. Robust Estimation of Symmetric Variables (Lemma; Relevance to Count-Sketch)
2. Electoral Colleges and Direct Elections (Lemma; Relevance to Count-Sketch)
3. Experiments!

25 Experiments
Claims:
1. Individual coordinates have error that concentrates like a Gaussian with standard deviation σ/√R.
2. Sets of k coordinates have error O(σ√(k/R)) with high probability.
We evaluate on a power-law distribution with typical parameters.
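
A sketch of this experiment under illustrative parameters (my code, reusing count_sketch and cs_estimate from the sketch after slide 14; the printed ratio normalizes the error by σ/√R, in the spirit of the m_{R,C} normalization used in the plots that follow):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 50
x = (1.0 + np.arange(n)) ** -0.8                   # power-law coordinate magnitudes
sigma = np.sqrt(np.sum(np.sort(x**2)[:-k]) / k)    # tail energy per bucket (slide 15)

for R in [5, 20, 80]:
    y, h, s = count_sketch(x, k=k, R=R, rng=rng)
    err = np.abs(cs_estimate(y, h, s) - x)
    ratio = np.median(err) / (sigma / np.sqrt(R))
    print(f"R={R:3d}: median |err| / (sigma/sqrt(R)) = {ratio:.2f}")
```

If claim 1 holds, the printed ratio should stay roughly constant as R grows.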

26 Experiments
Claim 1: individual coordinates have error that concentrates like a Gaussian with standard deviation σ/√R. Compare the observed error to the expected error for various R, C.
[Plot: probability density of the ratio |x̂_i − x_i| / m_{R,C}, with one curve per parameter pair, for R ∈ {20, 50, 100, 200, 500, 1000} and C ranging from 20 up to 10000.]

27 Experiments
Claim 2: sets of coordinates have error O(σ√(k/R)) with high probability (for large enough R, C). Compare the observed error to the expected error for various R, C.
[Left plot: probability density of E_k / (m_{R,C} √k) for various C ∈ {20, 50, 100, 200, 500, 1000} with n = 10000, k = 25, and fixed R. Right plot: the same for various R ∈ {10, 20, 50, 100, 200, 500, 1000, 2000} with n = 10000, k = 25, and fixed C.]

28 Conclusions
- We present an improved analysis of Count-Sketch, a classic algorithm used in practice.
- Experiments show it gives the right asymptotics.
- More applications of our lemmas? Independence?
Thank you!

