Data Streams & Communication Complexity


Lecture 1: Simple Stream Statistics in Small Space
Andrew McGregor, UMass Amherst

Data Stream Model

Stream: $m$ elements from a universe of size $n$, e.g.,
$$x_1, x_2, \ldots, x_m = 3, 5, 3, 7, 5, 4, \ldots$$

Goal: Compute some function of the stream, e.g., number of distinct elements, frequent items, longest increasing subsequence, a clustering, graph connectivity properties, ...

Catch:
1. Limited working memory, sublinear in $n$ and $m$
2. Access data sequentially
3. Process each element quickly

The model has its origins in the seventies but has become popular in the last ten years...

Why has it become popular?

Practical Appeal: Faster networks, cheaper data storage, and ubiquitous data-logging result in massive amounts of data to be processed. Applications include network monitoring, query planning, I/O efficiency for massive data, sensor network aggregation...

Theoretical Appeal: Problems are easy to state but hard to solve. Links to communication complexity, compressed sensing, metric embeddings, pseudo-random generators, approximation...

This Lecture: Basic Numerical Statistics

Given a stream of $m$ elements from universe $[n] = \{1, 2, \ldots, n\}$, e.g.,
$$x_1, x_2, \ldots, x_m = 3, 5, 3, 7, 5, 4, \ldots$$
let $f \in \mathbb{N}^n$ be the frequency vector, where $f_i$ is the frequency of $i$.

Problems: What can we approximate in sublinear space?
- Frequency moments: $F_k = \sum_i f_i^k$
- Max frequency: $F_\infty = \max_i f_i$
- Number of distinct elements: $F_0 = \sum_i f_i^0$
- Median: $j$ such that $f_1 + f_2 + \cdots + f_j \geq m/2$

Algorithms are often randomized and guarantees will be probabilistic.

Keep things simple: We could allow the $f_i$ to be both increased and decreased, but for this talk we'll focus on unit increments. We will also assume algorithms have an unlimited store of random bits.
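
To make these definitions concrete, here is a small illustrative Python snippet (not from the lecture) that computes the statistics exactly; the streaming algorithms below approximate them in far less space.

```python
from collections import Counter

def exact_stats(stream, k=2):
    """Compute the statistics above exactly (for reference/testing)."""
    f = Counter(stream)                    # (sparse) frequency vector
    m = len(stream)
    F_k = sum(c ** k for c in f.values()) # frequency moment F_k
    F_inf = max(f.values())               # max frequency F_inf
    F_0 = len(f)                          # number of distinct elements
    total, median = 0, None
    for j in sorted(f):                   # median: least j with prefix >= m/2
        total += f[j]
        if total >= m / 2:
            median = j
            break
    return F_k, F_inf, F_0, median

print(exact_stats([3, 5, 3, 7, 5, 4]))    # stream from the running example
```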

Outline
- Sampling
- Sketching: The Basics
- Count-Min and Applications
- Count-Sketch: Count-Min with a Twist
- $\ell_p$ Sampling and Frequency Moments

Sampling and Statistics

Sampling is a general technique for tackling massive amounts of data.

Example: To find an $\epsilon$-approximate median, i.e., $j$ such that $f_1 + f_2 + \cdots + f_j = m/2 \pm \epsilon m$, sampling $O(\epsilon^{-2})$ stream elements and returning the sample median works with good probability.

Beyond basic sampling: There are more powerful forms of sampling and other techniques that make better use of the limited space.
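
A minimal sketch of this sampling approach, assuming random access to the stream for brevity (a true one-pass implementation would maintain the sample via reservoir sampling):

```python
import random

def approx_median(stream, eps, c=3):
    """Sample ~c/eps^2 elements uniformly; return the sample median."""
    t = int(c / eps ** 2)
    sample = sorted(random.choice(stream) for _ in range(t))
    return sample[t // 2]

print(approx_median([3, 5, 3, 7, 5, 4] * 100, eps=0.1))
```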

AMS Sampling

Problem: Estimate $\sum_i g(f_i)$ for some function $g$ with $g(0) = 0$.

Basic Estimator: Sample $x_J$ where $J \in_R [m]$, compute $r = |\{j \geq J : x_j = x_J\}|$, and output $X = m(g(r) - g(r-1))$.

Expectation:
$$E[X] = \sum_i P[x_J = i]\, E[X \mid x_J = i] = \sum_i \frac{f_i}{m} \left(\sum_{r=1}^{f_i} \frac{m(g(r) - g(r-1))}{f_i}\right) = \sum_i g(f_i)$$

For high confidence: Compute $t$ estimators in parallel and average.
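
A minimal simulation of the basic estimator (illustrative Python; in a genuine one-pass implementation $J$ is chosen by reservoir sampling and $r$ is maintained as a running counter):

```python
import random

def ams_estimate(stream, g, t=100):
    """Average t independent basic AMS estimators for sum_i g(f_i)."""
    m = len(stream)
    total = 0.0
    for _ in range(t):
        J = random.randrange(m)            # uniform random position J
        # r = |{j >= J : x_j = x_J}|, a counter in a one-pass implementation
        r = sum(1 for j in range(J, m) if stream[j] == stream[J])
        total += m * (g(r) - g(r - 1))     # X = m(g(r) - g(r-1))
    return total / t
```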

Example: Frequency Moments

Frequency Moments: Define $F_k = \sum_i f_i^k$ for $k \in \{1, 2, 3, \ldots\}$.

Use the AMS estimator with $X = m(r^k - (r-1)^k)$.

Expectation: $E[X] = F_k$

Range: $0 \leq X \leq k m F_\infty^{k-1} \leq k n^{1-1/k} F_k$

Repeat $t$ times and let $\tilde{F}_k$ be the average value. By Chernoff,
$$P\big[|\tilde{F}_k - F_k| \geq \epsilon F_k\big] \leq 2\exp\left(-\frac{t F_k \epsilon^2}{3 k n^{1-1/k} F_k}\right) = 2\exp\left(-\frac{t \epsilon^2}{3 k n^{1-1/k}}\right)$$

If $t = 3\epsilon^{-2} k n^{1-1/k} \log(2\delta^{-1})$ then $P\big[|\tilde{F}_k - F_k| \geq \epsilon F_k\big] \leq \delta$.

Thm: In $\tilde{O}(\epsilon^{-2} n^{1-1/k})$ space we can find a $(1 \pm \epsilon)$ approximation for $F_k$ with probability at least $1 - \delta$.
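
For example, a hedged usage of the ams_estimate sketch above with $g(x) = x^k$:

```python
stream = [3, 5, 3, 7, 5, 4] * 50
k = 3
print(ams_estimate(stream, lambda x: x ** k, t=5000))  # approximates F_3
```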


Random Projections

Many stream algorithms use a random projection $Z \in \mathbb{R}^{w \times n}$, $w \ll n$:
$$s = Zf = \begin{pmatrix} z_{1,1} & \cdots & z_{1,n} \\ \vdots & & \vdots \\ z_{w,1} & \cdots & z_{w,n} \end{pmatrix} \begin{pmatrix} f_1 \\ \vdots \\ f_n \end{pmatrix} = \begin{pmatrix} s_1 \\ \vdots \\ s_w \end{pmatrix}$$

Updatable: We can maintain the sketch $s$ in $\tilde{O}(w)$ space since incrementing $f_i$ corresponds to $s \leftarrow s + (z_{1,i}, \ldots, z_{w,i})^T$.

Useful: Choose a distribution for the $z_{i,j}$ such that the relevant function of $f$ can be estimated from $s$ with high probability for sufficiently large $w$.

Examples

- If $z_{i,j} \in_R \{-1, +1\}$, we can estimate $F_2$ with $w = O(\epsilon^{-2} \log \delta^{-1})$.
- If $z_{i,j} \sim D$ where $D$ is $p$-stable, $p \in (0, 2]$, we can estimate $F_p$ with $w = O(\epsilon^{-2} \log \delta^{-1})$. For example, the 1-stable and 2-stable distributions are
$$\mathrm{Cauchy}(x) = \frac{1}{\pi(1 + x^2)} \qquad \mathrm{Gaussian}(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$
- Note that $F_0 = (1 \pm \epsilon) F_p$ if $p = \log(1 + \epsilon)/\log m$.

For the rest of the lecture we'll focus on hash-based sketches: given a random hash function $h : [n] \to [w]$, the only non-zero entries of $Z$ are $z_{h(i),i}$, i.e., $Z$ is a 0/1 matrix with exactly one 1 per column, in row $h(i)$ for column $i$.
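
A minimal simulation of the first example, the $\pm 1$ "tug-of-war" sketch for $F_2$ (illustrative; each row gives one estimator $(z \cdot f)^2$ and we average over $w$ rows):

```python
import random

def f2_estimate(stream, w=200):
    """Average w independent (z . f)^2 estimators with z in {-1,+1}^n."""
    universe = sorted(set(stream))
    # fully random signs for simplicity; real implementations use
    # 4-wise independent hash functions to generate them in small space
    sign = [{i: random.choice((-1, 1)) for i in universe} for _ in range(w)]
    s = [0] * w
    for x in stream:                     # streaming update: s <- s + column
        for row in range(w):
            s[row] += sign[row][x]
    return sum(v * v for v in s) / w     # E[(z . f)^2] = F_2
```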


Count-Min Sketch

Maintain a vector $s \in \mathbb{N}^w$ via a random hash function $h : [n] \to [w]$. (Diagram: each coordinate $f[i]$ is hashed to one of the buckets $s[1], \ldots, s[w]$.)

Update: For each increment of $f_i$, increment $s_{h(i)}$. Hence,
$$s_k = \sum_{j : h(j) = k} f_j \qquad \text{e.g., } s_3 = f_6 + f_7 + f_{13}$$

Query: Use $\tilde{f}_i = s_{h(i)}$ to estimate $f_i$.

Lemma: $\tilde{f}_i \geq f_i$ and $P\big[\tilde{f}_i \geq f_i + 2m/w\big] \leq 1/2$.

Thm: Let $w = 2/\epsilon$. Repeat the hashing $\lg(\delta^{-1})$ times in parallel and take the minimum estimate for $f_i$; then
$$P\big[f_i \leq \tilde{f}_i \leq f_i + \epsilon m\big] \geq 1 - \delta$$
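
A minimal Count-Min implementation in Python (illustrative; Python's built-in hash is a stand-in for the pairwise-independent hash families a real implementation would use):

```python
import math, random

class CountMin:
    """Count-Min: f_i <= estimate(i) <= f_i + eps*m with prob. >= 1-delta."""
    def __init__(self, eps, delta):
        self.w = math.ceil(2 / eps)                       # width w = 2/eps
        self.d = max(1, math.ceil(math.log2(1 / delta)))  # lg(1/delta) rows
        self.s = [[0] * self.w for _ in range(self.d)]
        self.seed = [random.random() for _ in range(self.d)]

    def _h(self, row, i):              # stand-in for a pairwise-indep. hash
        return hash((self.seed[row], i)) % self.w

    def update(self, i):
        for row in range(self.d):
            self.s[row][self._h(row, i)] += 1

    def estimate(self, i):
        return min(self.s[row][self._h(row, i)] for row in range(self.d))
```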

Proof of Lemma

Define $E$ by $\tilde{f}_i = f_i + E$, so $E = \sum_{j \neq i : h(j) = h(i)} f_j$.

Since all $f_j \geq 0$, we have $E \geq 0$.

Since $P[h(i) = h(j)] = 1/w$,
$$E[E] = \sum_{j \neq i} f_j\, P[h(i) = h(j)] \leq m/w$$

By an application of the Markov bound, $P[E \geq 2m/w] \leq 1/2$.

Range Queries

Range Query: For $i, j \in [n]$, estimate $f_{[i,j]} = f_i + f_{i+1} + \cdots + f_j$.

Dyadic Intervals: Restrict attention to intervals of the form $[1 + (i-1)2^j,\, i \cdot 2^j]$ where $j \in \{0, 1, \ldots, \lg n\}$ and $i \in \{1, 2, \ldots, n/2^j\}$, since any range can be partitioned into $O(\log n)$ such intervals. E.g.,
$$[48, 106] = [48, 48] \cup [49, 64] \cup [65, 96] \cup [97, 104] \cup [105, 106]$$

To support dyadic intervals, construct Count-Min sketches corresponding to intervals of width 1, 2, 4, 8, ... E.g., for intervals of width 2 we sketch the vector $g$ with $g_i = f_{2i-1} + f_{2i}$, and the update rule is now: for an increment of $f_{2i-1}$ or $f_{2i}$, increment $s_{h(i)}$.
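
A small helper (illustrative) that computes the dyadic decomposition of a range; each piece would then be looked up in the Count-Min sketch of the matching width:

```python
def dyadic_decompose(lo, hi):
    """Split [lo, hi] (1-indexed, inclusive) into maximal dyadic intervals."""
    pieces = []
    while lo <= hi:
        size = 1
        # grow while the interval stays dyadically aligned and inside [lo, hi]
        while (lo - 1) % (2 * size) == 0 and lo + 2 * size - 1 <= hi:
            size *= 2
        pieces.append((lo, lo + size - 1))
        lo += size
    return pieces

print(dyadic_decompose(48, 106))
# [(48, 48), (49, 64), (65, 96), (97, 104), (105, 106)]
```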

Quantiles and Heavy Hitters

Quantiles: Find $j$ such that $f_1 + \cdots + f_j \approx m/2$. We can approximate the median via binary search over range queries.

Heavy Hitters Problem: Find a set $S \subseteq [n]$ where
$$\{i : f_i \geq \phi m\} \subseteq S \subseteq \{i : f_i \geq (\phi - \epsilon) m\}$$

Rather than checking each $f_i$ individually, we can save time by exploiting the fact that if $f_{[i,k]} < \phi m$ then $f_j < \phi m$ for all $j \in [i, k]$; see the search sketch below.
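
A hedged sketch of this hierarchical search; estimate_range is a hypothetical stand-in for a query to the Count-Min sketch of the appropriate width:

```python
def heavy_hitters(estimate_range, n, m, phi):
    """Descend the dyadic tree, pruning intervals with estimated mass < phi*m.

    estimate_range(lo, hi) returns an (over)estimate of f_[lo,hi].
    """
    out, stack = [], [(1, n)]
    while stack:
        lo, hi = stack.pop()
        if estimate_range(lo, hi) < phi * m:
            continue                        # no heavy item can hide in here
        if lo == hi:
            out.append(lo)                  # candidate heavy hitter
        else:
            mid = (lo + hi) // 2
            stack += [(lo, mid), (mid + 1, hi)]
    return out
```

For a quick sanity check one can pass exact counts, e.g., estimate_range = lambda lo, hi: sum(f[lo-1:hi]) for a frequency list f.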


Count-Sketch: Count-Min with a Twist

Maintain $s \in \mathbb{Z}^w$ via hash functions $h : [n] \to [w]$ and $r : [n] \to \{-1, 1\}$. (Diagram: each $f[i]$ is multiplied by the random sign $r_i$ and hashed to one of the buckets $s[1], \ldots, s[w]$.)

Update: For each increment of $f_i$, $s_{h(i)} \leftarrow s_{h(i)} + r_i$. Hence,
$$s_k = \sum_{j : h(j) = k} f_j r_j \qquad \text{e.g., } s_3 = f_6 - f_7 - f_{13}$$

Query: Use $\tilde{f}_i = s_{h(i)} r_i$ to estimate $f_i$.

Lemma: $E[\tilde{f}_i] = f_i$ and $V[\tilde{f}_i] \leq F_2/w$.

Thm: Let $w = O(1/\epsilon^2)$. Repeating $O(\lg \delta^{-1})$ times in parallel and taking the median estimate ensures
$$P\big[f_i - \epsilon\sqrt{F_2} \leq \tilde{f}_i \leq f_i + \epsilon\sqrt{F_2}\big] \geq 1 - \delta$$
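
A minimal Count-Sketch implementation (illustrative, with the same stand-in hashing caveat as before; note the rows are combined with a median rather than a minimum):

```python
import random

class CountSketch:
    """Count-Sketch: median of d unbiased single-row estimates of f_i."""
    def __init__(self, w, d):
        self.w, self.d = w, d
        self.s = [[0] * w for _ in range(d)]
        self.seed = [random.random() for _ in range(d)]

    def _h(self, row, i):                 # stand-in bucket hash h: [n]->[w]
        return hash((self.seed[row], 'h', i)) % self.w

    def _r(self, row, i):                 # stand-in sign hash r: [n]->{-1,1}
        return 1 if hash((self.seed[row], 'r', i)) % 2 else -1

    def update(self, i):
        for row in range(self.d):
            self.s[row][self._h(row, i)] += self._r(row, i)

    def estimate(self, i):
        vals = sorted(self.s[row][self._h(row, i)] * self._r(row, i)
                      for row in range(self.d))
        return vals[self.d // 2]          # median over the d rows
```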

Proof of Lemma

Define $E$ by $\tilde{f}_i = f_i + E r_i$, so $E = \sum_{j \neq i : h(j) = h(i)} f_j r_j$.

Expectation: Since $E[r_j] = 0$,
$$E[E] = \sum_{j \neq i : h(j) = h(i)} f_j\, E[r_j] = 0$$

Variance: Similarly,
$$V[E] \leq E\Big[\Big(\sum_{j \neq i : h(j) = h(i)} f_j r_j\Big)^2\Big] = \sum_{j, k \neq i} f_j f_k\, E[r_j r_k]\, P[h(i) = h(j) = h(k)] = \sum_{j \neq i} f_j^2\, P[h(i) = h(j)] \leq F_2/w$$


$\ell_p$ Sampling

$\ell_p$ Sampling: Return random values $I \in [n]$ and $R \in \mathbb{R}$ where
$$P[I = i] = (1 \pm \epsilon)\frac{|f_i|^p}{F_p} \qquad \text{and} \qquad R = (1 \pm \epsilon) f_I$$

Applications:
- We will use $\ell_2$ sampling to get an optimal algorithm for $F_k$, $k > 2$.
- We will use $\ell_0$ sampling for processing graph streams.
- Many other stream problems can be solved via $\ell_p$ sampling, e.g., duplicate finding, triangle counting, entropy estimation.

Let's see an algorithm for $p = 2$.

$\ell_2$ Sampling Algorithm

Weight $f_i$ by $\gamma_i = 1/\sqrt{u_i}$ where $u_i \in_R [0, 1]$ to form the vector $g$:
$$f = (f_1, f_2, \ldots, f_n) \quad \longrightarrow \quad g = (g_1, g_2, \ldots, g_n) \text{ where } g_i = \gamma_i f_i$$

Return $(i, f_i)$ if $g_i^2 \geq t := F_2(f)/\epsilon$.

Probability $(i, f_i)$ is returned:
$$P\big[g_i^2 \geq t\big] = P\big[u_i \leq f_i^2/t\big] = f_i^2/t$$

The probability that some value is returned is $\sum_i f_i^2/t = \epsilon$, so repeating $O(\epsilon^{-1} \log \delta^{-1})$ times ensures a value is returned with probability $\geq 1 - \delta$.

Lemma: Using a Count-Sketch of size $O(\epsilon^{-1} \log^2 n)$ ensures a $(1 \pm \epsilon)$ approximation of any $g_i$ that passes the threshold.
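
A non-streaming simulation of one round of the sampler on an explicit frequency vector (illustrative; the real algorithm only accesses $g$ through the Count-Sketch, as the lemma describes):

```python
import math, random

def l2_sample_round(f, eps):
    """One round of the sampler on an explicit frequency vector f."""
    F2 = sum(v * v for v in f)
    t = F2 / eps                              # threshold t = F_2(f)/eps
    for i, fi in enumerate(f):
        u = random.random() or 1e-12          # u_i in (0,1]; guard u == 0
        g = fi / math.sqrt(u)                 # g_i = f_i / sqrt(u_i)
        if g * g >= t:
            return (i, fi)                    # success: output (i, f_i)
    return None                               # failed round: repeat

# repeat O(eps^-1 log delta^-1) rounds until some round succeeds
```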

Proof of Lemma

Exercise: $P[F_2(g)/F_2(f) \leq c \log n] \geq 99/100$ for some large $c > 0$, so we'll condition on this event.

Set $w = 9c\epsilon^{-1} \log n$. A Count-Sketch in $O(w \log^2 n)$ space ensures
$$\tilde{g}_i = g_i \pm \sqrt{F_2(g)/w}$$

Then $g_i^2 \geq F_2(f)/\epsilon$ implies
$$\sqrt{F_2(g)/w} \leq \sqrt{F_2(f)/(9\epsilon^{-1})} \leq \sqrt{\epsilon\, g_i^2/(9\epsilon^{-1})} = \epsilon |g_i|/3$$

and hence $\tilde{g}_i^2 = (1 \pm \epsilon/3)^2 g_i^2 = (1 \pm \epsilon) g_i^2$ as required.

Under-the-rug: Need to ensure that the conditioning doesn't affect the sampling probability too much.

$F_k$ Revisited

Earlier we used $\tilde{O}(n^{1-1/k})$ space to approximate $F_k = \sum_i f_i^k$.

Algorithm: Let $(I, R)$ be a $(1 + \gamma)$-approximate $\ell_2$ sample. Return $T = \tilde{F}_2 R^{k-2}$ where $\tilde{F}_2$ is a $(1 \pm \gamma)$ approximation for $F_2$.

Expectation: Setting $\gamma = \epsilon/(4k)$,
$$E[T] = \tilde{F}_2 \sum_i P[I = i]\, \big((1 \pm \gamma) f_i\big)^{k-2} = (1 \pm \gamma)^{k-2}\, \tilde{F}_2 \sum_i \frac{f_i^2}{F_2} f_i^{k-2} = (1 \pm \tfrac{\epsilon}{2}) F_k$$

Range: $0 \leq T \leq (1 + \gamma) F_2 F_\infty^{k-2} \leq (1 + \gamma) n^{1-2/k} F_k$.

Averaging over $t = O(\epsilon^{-2} n^{1-2/k} \log \delta^{-1})$ parallel repetitions gives
$$P\big[|\tilde{F}_k - F_k| \geq \epsilon F_k\big] \leq \delta$$

Thm: In $\tilde{O}(\epsilon^{-2} n^{1-2/k})$ space we can find a $(1 \pm \epsilon)$ approximation for $F_k$ with probability at least $1 - \delta$.
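
A simulation of the estimator (illustrative; exact $F_2$ and an exact $\ell_2$ sample stand in for their sketch-based approximations):

```python
import random

def fk_from_l2(f, k, t=20000):
    """Average t copies of T = F_2 * R^(k-2) with (I, R) an exact l2 sample."""
    F2 = sum(v * v for v in f)
    weights = [v * v / F2 for v in f]          # P[I = i] = f_i^2 / F_2
    total = 0.0
    for _ in range(t):
        i = random.choices(range(len(f)), weights)[0]
        total += F2 * f[i] ** (k - 2)          # T = F_2 * R^(k-2)
    return total / t

f = [10, 7, 7, 3, 1, 1, 1]
print(fk_from_l2(f, 3), sum(v ** 3 for v in f))  # estimate vs exact F_3
```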

Summary

- Basic Sampling: We can sample $i$ with probability proportional to $f_i$, but we can be much smarter via sketches.
- Count-Min: $f_i \leq \tilde{f}_i \leq f_i + \epsilon F_1$ in $O(\epsilon^{-1})$ space. (Sketch matrix $Z$ has 0/1 entries, one 1 per column.)
- Count-Sketch: $f_i - \epsilon\sqrt{F_2} \leq \tilde{f}_i \leq f_i + \epsilon\sqrt{F_2}$ in $O(\epsilon^{-2})$ space. ($Z$ has entries in $\{0, -1, +1\}$.)
- The above sketches solve range queries, quantiles, heavy hitters, ...
- $\ell_p$-Sampling: Select $i$ with probability (roughly) proportional to $f_i^p$ in $O(\epsilon^{-2})$ space. ($Z$ has entries 0 and $\pm\gamma_i$.)
