Linear Sketches: A Useful Tool in Streaming and Compressive Sensing


1 Linear Sketches: A Useful Tool in Streaming and Compressive Sensing. Qin Zhang

2 Linear sketch: a random linear projection M : R^n → R^k (k ≪ n) that preserves properties of any v ∈ R^n with high probability. The sketch is Mv, from which the answer is computed. Simple and useful: statistics/graph/algebraic problems in data streams, compressive sensing, ... And rich in theory! You will see in this course.
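A minimal numpy illustration of why such a sketch fits the streaming setting (the matrix, dimensions, and update format below are assumptions made for the example, not part of the slides): by linearity, the sketch can be maintained under coordinate updates and merged across streams.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 20                      # ambient dimension and sketch size, k << n
M = rng.normal(size=(k, n))          # a random linear projection M : R^n -> R^k

sketch = np.zeros(k)                 # sketch of the all-zeros vector
v = np.zeros(n)

# Stream of (coordinate, delta) updates: v[i] += delta.
for i, delta in [(3, 1.0), (7, 2.0), (3, -1.0), (42, 5.0)]:
    v[i] += delta
    sketch += delta * M[:, i]        # linearity: M(v + delta*e_i) = Mv + delta*(M e_i)

assert np.allclose(sketch, M @ v)    # the maintained sketch equals Mv

# Linearity also gives mergeability: sketch(v1 + v2) = sketch(v1) + sketch(v2).
```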

5 Data streams: the model (Alon, Matias and Szegedy 1996). [Figure: a data stream read once by a CPU with a small RAM.] Applications, etc., and a list of theoretical problems.

8 Why hard? Game 1: a sequence of numbers arrives one at a time. Q: What's the median? A: 33.

22 Game 2: relationships between Alice, Bob, Carol, Dave, Eva and Paul arrive as a stream of updates: Alice and Bob become friends; Carol and Eva become friends; Eva and Bob become friends; Dave and Paul become friends; Alice and Paul become friends; Eva and Bob unfriend; Alice and Dave become friends; Bob and Paul become friends; Dave and Paul unfriend; Dave and Carol become friends. Q: Are Eva and Bob connected by friends? A: YES: Eva - Carol - Dave - Alice - Bob. Why hard? Short of memory!

36 A simple example: distinct elements. (Q: why can a linear sketch be maintained in the streaming model?) The problem: how many distinct elements are in the stream? Approximation is needed. Search version and decision version. Decision version: let D be the number of distinct elements; if D ≥ T(1 + ɛ), answer YES; if D ≤ T/(1 + ɛ), answer NO. For the search version, try T = 1, (1 + ɛ), (1 + ɛ)^2, ...

39 Now, the decision problem. The algorithm: 1. Select a random set S ⊆ {1, 2,..., n} such that for each i, independently, Pr[i ∈ S] = 1/T. 2. Make a pass over the stream, maintaining Sum_S(x) = Σ_{i∈S} x_i. Note: this is a linear sketch. 3. If Sum_S(x) > 0, return YES; otherwise return NO. Lemma: let P = Pr[Sum_S(x) = 0]. If T is large enough and ɛ is small enough, then: if D ≥ T(1 + ɛ), then P < 1/e - ɛ/3; if D ≤ T/(1 + ɛ), then P > 1/e + ɛ/3. (Introduce a few useful probabilistic basics.)
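A small Python sketch of the single-set decision test above (the function name, test stream, and parameter choices are illustrative assumptions):

```python
import numpy as np

def decide_distinct(stream, n, T, rng):
    """One run of the decision sketch: YES if (we believe) D >= T(1+eps)."""
    in_S = rng.random(n) < 1.0 / T        # random set S, Pr[i in S] = 1/T
    sum_S = 0                             # Sum_S(x) = sum_{i in S} x_i, a linear sketch
    for item in stream:                   # one pass; each item increments x_item by 1
        if in_S[item]:
            sum_S += 1
    return sum_S > 0                      # YES iff some element of S appeared

rng = np.random.default_rng(1)
stream = [rng.integers(0, 50) for _ in range(500)]       # about 50 distinct elements
print(decide_distinct(stream, n=1000, T=10, rng=rng))    # likely YES (D >> 10)
print(decide_distinct(stream, n=1000, T=400, rng=rng))   # likely NO  (D << 400)
```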

41 Amplify the success probability. Repeat to amplify the success probability: 1. Select k sets S_1,..., S_k as in the previous algorithm, for k = C log(1/δ)/ɛ^2, C > 0. 2. Let Z be the number of indices j ∈ {1,..., k} for which Sum_{S_j}(x) = 0. 3. If Z < k/e, report YES; otherwise report NO. Lemma: if the constant C is large enough, then this algorithm reports a correct answer with probability 1 - δ. Theorem: the number of distinct elements can be (1 ± ɛ)-approximated with probability 1 - δ using O(log n · log(1/δ)/ɛ^3) words.
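A sketch of the amplified test, under the assumption C = 16 (the slides only require the constant to be large enough); it maintains all k linear sketches in one pass and applies the Z < k/e rule:

```python
import numpy as np

def amplified_decision(stream, n, T, eps, delta, rng):
    """Repeat the single-set test k times and threshold the number of zero sketches."""
    k = int(np.ceil(16 * np.log(1 / delta) / eps**2))  # k = C log(1/delta)/eps^2, C = 16 assumed
    in_S = rng.random((k, n)) < 1.0 / T                # k independent random sets S_1..S_k
    sums = np.zeros(k)
    for item in stream:                                # one pass, maintaining all k linear sketches
        sums += in_S[:, item]
    Z = np.count_nonzero(sums == 0)                    # number of sketches equal to 0
    return Z < k / np.e                                # YES if fewer than k/e are zero

rng = np.random.default_rng(2)
stream = rng.integers(0, 64, size=2000)                # about 64 distinct elements
print(amplified_decision(stream, n=1000, T=16, eps=0.5, delta=0.05, rng=rng))   # expect YES
print(amplified_decision(stream, n=1000, T=256, eps=0.5, delta=0.05, rng=rng))  # expect NO
```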

44 Course plan. That's all for lecture 1. Thank you.

46 Frequency moments and norms. Frequency moments: F_p = Σ_i (f_i)^p, where f_i is the frequency of item i. F_0: number of distinct items. F_1: total number of items. F_2: size of the self-join. A very good measurement of the skewness of the dataset. Norms: L_p = F_p^{1/p}.

49 L_2 estimation. The sketch for L_2: a linear sketch Rx = [Z_1,..., Z_k], where each entry of the k × n matrix R (k = O(1/ɛ^2)) has distribution N(0, 1). Each Z_i is drawn from N(0, ‖x‖_2^2); alternatively, Z_i = ‖x‖_2 · G_i, where G_i is drawn from N(0, 1). The estimator: Y = median{|Z_1|,..., |Z_k|} / median{|G|}, where G ~ N(0, 1). (M is the median of a random variable R if Pr[|R| ≤ M] = 1/2.) Sounds like magic? The intuition behind it: for nice-looking distributions (e.g., the Gaussian), the median of the samples, for a large enough number of samples, converges to the median of the distribution.
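A numpy sketch of the median estimator (the constant in k, the test vector, and the numerical value 0.6745 for median{|G|} are assumptions/approximations for the demo):

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps = 2000, 0.1
k = int(10 / eps**2)                              # k = O(1/eps^2); the constant 10 is assumed

x = rng.normal(size=n) * (rng.random(n) < 0.05)   # an arbitrary sparse-ish test vector
R = rng.normal(size=(k, n))                       # each entry of the k x n matrix R is N(0,1)
Z = R @ x                                         # each Z_i ~ N(0, ||x||_2^2)

median_abs_gaussian = 0.6745                      # median of |G| for G ~ N(0,1), approx.
Y = np.median(np.abs(Z)) / median_abs_gaussian

print(Y, np.linalg.norm(x))                       # Y is within about (1 +/- eps) of ||x||_2
```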

52 The proof. Closeness in probability: let U_1,..., U_k be i.i.d. real random variables chosen from any distribution having continuous c.d.f. F and median M. Defining U = median{U_1,..., U_k}, there is an absolute constant C > 0 such that Pr[F(U) ∈ (1/2 - ɛ, 1/2 + ɛ)] ≥ 1 - e^{-Ckɛ^2}. Closeness in value: let F be the c.d.f. of a random variable G, where G is drawn from N(0, 1). There exists an absolute constant C′ > 0 such that if for any z ≥ 0 we have F(z) ∈ (1/2 - ɛ, 1/2 + ɛ), then z = M ± C′ɛ. Theorem: Y = ‖x‖_2 (M ± C′ɛ)/M = ‖x‖_2 (1 ± C″ɛ), w.h.p.

55 Generalization. Key property of the Gaussian distribution: if U_1,..., U_n and U are i.i.d. drawn from the Gaussian distribution, then x_1 U_1 + ... + x_n U_n ~ ‖x‖_p · U for p = 2. Such distributions are called p-stable [Indyk 06]. Good news: p-stable distributions exist for any p ∈ (0, 2]. For p = 1, we get the Cauchy distribution, with density function f(x) = 1/[π(1 + x^2)].
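For p = 1 the same median trick works with Cauchy (1-stable) entries; a minimal sketch, with the sketch size and test vector chosen arbitrarily for illustration (median{|Cauchy(0,1)|} = 1, so no rescaling is needed):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 2000, 1000                          # sketch size chosen for illustration only

x = np.zeros(n)
x[rng.integers(0, n, size=30)] = rng.normal(size=30) * 5   # arbitrary test vector

C = rng.standard_cauchy(size=(k, n))       # Cauchy entries: the 1-stable distribution
Z = C @ x                                  # each Z_i ~ ||x||_1 * Cauchy(0,1), by 1-stability

# median(|Cauchy(0,1)|) = 1, so the median of |Z_i| estimates ||x||_1 directly.
est = np.median(np.abs(Z))
print(est, np.sum(np.abs(x)))
```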

58 L_p (p > 2) (not a linear mapping, but important). We instead approximate F_p = Σ_{i=1}^n x_i^p = ‖x‖_p^p. First attempt: use two passes. 1. Pick a random element i from the stream in the 1st pass. (Q: how?) 2. Compute i's frequency x_i in the 2nd pass. 3. Finally, return Y = m · x_i^{p-1}. Second attempt: collapse the two passes above. 1. Pick a random element i from the stream, and count the number of occurrences of i in the rest of the stream, denoted by r. 2. Now we use r instead of x_i to construct the estimator: Y = m(r^p - (r - 1)^p).
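A sketch of the second attempt (the classic AMS-style estimator): reservoir-sample a position, count occurrences of that element from the sampled position onward, and return m(r^p - (r-1)^p). The constants, test stream, and the averaging over 200 independent copies (the usual way to reduce the variance) are assumptions:

```python
import numpy as np

def ams_fp_estimate(stream, p, rng):
    """One-pass estimator with E[Y] = F_p = sum_i f_i^p (unbiased but high variance)."""
    m = 0
    current = None   # the sampled element i
    r = 0            # occurrences of `current` from its sampled position onward
    for item in stream:
        m += 1
        if rng.random() < 1.0 / m:   # reservoir sampling: replace with probability 1/m
            current, r = item, 1
        elif item == current:
            r += 1
    return m * (r**p - (r - 1)**p)

rng = np.random.default_rng(5)
stream = list(rng.integers(0, 20, size=5000))
p = 3
true_Fp = sum(int(c)**p for c in np.bincount(stream))
ests = [ams_fp_estimate(stream, p, np.random.default_rng(s)) for s in range(200)]
print(np.mean(ests), true_Fp)        # averaging many independent copies tames the variance
```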

61 Heavy hitters. L_p heavy hitter set: HH^p_φ(x) = {i : |x_i| ≥ φ‖x‖_p}. L_p heavy hitter problem: given φ, φ′ (often φ′ = φ - ɛ), return a set S such that HH^p_φ(x) ⊆ S ⊆ HH^p_{φ′}(x). L_p point query problem: given ɛ, after reading the whole stream and given i, report x̂_i = x_i ± ɛ‖x‖_p.

64 L_2 point query. The algorithm [Gilbert, Kotidis, Muthukrishnan and Strauss 01]: maintain a sketch Rx such that s = ‖Rx‖_2 = (1 ± ɛ)‖x‖_2, where R is an O(1/ɛ^2 · log(1/δ)) × n matrix, which can be constructed, e.g., by taking each cell to be N(0, 1). Estimator: x̂_i = (1 - ‖Rx/s - Re_i‖_2^2 / 2) · s. Johnson-Lindenstrauss Lemma: for any x with ‖x‖_2^2 = l, we have (1 - ɛ)l ≤ ‖Rx‖_2^2/k ≤ (1 + ɛ)l w.p. 1 - δ. Theorem: we can solve the L_2 point query problem, with approximation ɛ and failure probability δ, by storing O(1/ɛ^2 · log(1/δ)) numbers.
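A numpy illustration of the point-query estimator above; the 1/√k scaling of R (so that ‖Rv‖_2 ≈ ‖v‖_2), the constant in k, and the test vector are assumptions made for the demo:

```python
import numpy as np

rng = np.random.default_rng(6)
n, eps = 500, 0.1
k = int(4 / eps**2)                        # number of rows; the constant 4 is assumed

x = np.zeros(n)
x[7], x[123], x[300] = 10.0, -4.0, 2.0
x += 0.1 * rng.normal(size=n)              # small noise everywhere

R = rng.normal(size=(k, n)) / np.sqrt(k)   # scaled so that ||Rv||_2 ~ ||v||_2 (JL)
Rx = R @ x                                 # the linear sketch
s = np.linalg.norm(Rx)                     # s = ||Rx||_2 = (1 +/- eps)||x||_2

def point_query(i):
    # Estimator from the slide: x_i ~ (1 - ||Rx/s - R e_i||_2^2 / 2) * s
    return (1 - np.linalg.norm(Rx / s - R[:, i])**2 / 2) * s

for i in (7, 123, 300, 50):
    print(i, point_query(i), x[i])         # error is on the order of eps * ||x||_2
```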

67 L_1 point query. The algorithm, for x ≥ 0 [Cormode and Muthukrishnan 05]: pick d = log(1/δ) random hash functions h_1,..., h_d, where h_t : {1,..., n} → {1,..., w} (w = 2/ɛ). Maintain d vectors Z^1,..., Z^d, where Z^t = (Z^t_1,..., Z^t_w) and Z^t_j = Σ_{i: h_t(i)=j} x_i. Estimator: x̂_i = min_t Z^t_{h_t(i)}. Theorem: we can solve the L_1 point query problem, with approximation ɛ and failure probability δ, by storing O(1/ɛ · log(1/δ)) numbers.
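This is the Count-Min sketch; below is a compact, self-contained Python version (the hash family, the exact constants, and the Zipf test stream are illustrative choices rather than the slide's precise construction):

```python
import numpy as np

class CountMin:
    """Count-Min sketch for L_1 point queries on non-negative streams (x >= 0)."""
    def __init__(self, eps, delta, seed=0):
        self.w = int(np.ceil(2 / eps))                 # buckets per row, w = 2/eps
        self.d = int(np.ceil(np.log(1 / delta)))       # number of rows / hash functions
        rng = np.random.default_rng(seed)
        self.a = rng.integers(1, 2**31 - 1, size=self.d)   # simple random hashing for illustration
        self.b = rng.integers(0, 2**31 - 1, size=self.d)
        self.p = 2**31 - 1                             # a Mersenne prime
        self.Z = np.zeros((self.d, self.w))

    def _buckets(self, i):
        return ((self.a * i + self.b) % self.p) % self.w

    def update(self, i, delta=1):                      # x_i += delta
        self.Z[np.arange(self.d), self._buckets(i)] += delta

    def query(self, i):                                # estimate of x_i: min over the d rows
        return self.Z[np.arange(self.d), self._buckets(i)].min()

cm = CountMin(eps=0.01, delta=0.01)
rng = np.random.default_rng(7)
stream = rng.zipf(1.5, size=20000) % 1000              # a skewed stream over [0, 1000)
for item in stream:
    cm.update(int(item))
true = np.bincount(stream, minlength=1000)
for i in (1, 2, 3, 500):
    print(i, cm.query(i), true[i])                     # estimate = x_i + at most ~eps*||x||_1 w.h.p.
```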

69 Compressive sensing. The model (Candès-Romberg-Tao 04; Donoho 04). Applications: medical imaging reconstruction, single-pixel camera, compressive sensor networks, etc.

70 Formalization: Lp/Lq guarantee. The goal is to acquire a signal x = [x_1,..., x_n] (e.g., a digital image). The acquisition proceeds by computing a measurement vector Ax of dimension m ≪ n. Then, from Ax, we want to recover a k-sparse approximation x* of x so that ‖x - x*‖_q ≤ C · min_{‖x′‖_0 ≤ k} ‖x - x′‖_p (*), where the minimum on the right-hand side is denoted Err^k_p(x). Often studied: L1/L1, L1/L2 and L2/L2. "For each": given a (random) matrix A, for each signal x, (*) holds w.h.p. "For all": one matrix A for all signals x. Stronger.

74 Results [a table of known results up to a given year, copied from Indyk's talk].

75 For each (L1/L1). The algorithm for L_1 point query gives an L1/L1 sparse approximation. Recall the L_1 point query problem: given ɛ, after reading the whole stream and given i, report x̂_i = x_i ± ɛ‖x‖_1. Set ɛ = α/k and δ = 1/n^2 in the L_1 point query, and then return a vector x* consisting of the k largest (in magnitude) elements of x̂. This gives, w.p. 1 - δ, ‖x - x*‖_1 ≤ (1 + 3α) · Err^k_1. Total measurements: m = O(k/α · log n).

78 For all (L1/L2). A matrix A satisfies the (k, δ)-RIP (Restricted Isometry Property) if for every k-sparse vector x we have (1 - δ)‖x‖_2 ≤ ‖Ax‖_2 ≤ (1 + δ)‖x‖_2. Johnson-Lindenstrauss Lemma: for any x with ‖x‖_2 = 1, we have 7/8 ≤ ‖Ax‖_2 ≤ 8/7 w.p. 1 - e^{-O(m)}. Theorem: if each entry of A is i.i.d. N(0, 1) and m = O(k log(n/k)), then A satisfies the (k, 1/3)-RIP w.h.p. Main Theorem: suppose A has the (6k, 1/3)-RIP. Let x* be the solution to the LP: minimize ‖x′‖_1 subject to Ax′ = Ax. Then ‖x - x*‖_2 ≤ (C/√k) · Err^k_1 for any x.
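The LP in the Main Theorem (basis pursuit) can be written as a standard linear program; here is a sketch using numpy and scipy.optimize.linprog, where the measurement count, the constants, and the test signal are assumptions for the demo:

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, b):
    """Basis pursuit: minimize ||x||_1 subject to Ax = b, as an LP over (x, t)."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])       # minimize sum(t)
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])                 # x - t <= 0 and -x - t <= 0, i.e. |x_i| <= t_i
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])              # Ax = b
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=bounds, method="highs")
    return res.x[:n]

rng = np.random.default_rng(8)
n, k = 200, 5
m = 4 * k * int(np.log(n / k)) + 10                      # m = O(k log(n/k)); constants assumed
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)                 # i.i.d. Gaussian rows, normalized
x_star = l1_recover(A, A @ x)
print(np.linalg.norm(x - x_star))                        # ~0 when x is exactly k-sparse
```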

82 The lower bounds. What's known: there exists an m × n matrix A with m = O(k log n) (which can be improved to m = O(k log(n/k))), and an L1/L1 recovery algorithm R, so that for each x, x* = R(Ax) satisfies, w.h.p., ‖x - x*‖_1 ≤ C · min_{‖x′‖_0 ≤ k} ‖x - x′‖_1. We are going to show that this is optimal, that is, m = Ω(k log(n/k)) [Do Ba et al., SODA 10]. To show this we need: communication complexity and coding theory.

84 Communication complexity. Alice holds an input x and Bob holds an input y; they exchange messages in rounds (Round 1, Round 2, ...) and want to jointly compute some function f(x, y). We would like to minimize the total bits of communication and the number of rounds of communication (today we only consider 1-round protocols).

86 Augmented indexing. Promise input: Alice gets x = (x_1, x_2,..., x_d) ∈ {0, 1}^d; Bob gets y = (y_1, y_2,..., y_d) ∈ {0, 1, ⊥}^d such that for some (unique) i: 1. y_i ∈ {0, 1}; 2. y_k = x_k for all k > i; 3. y_1 = y_2 = ... = y_{i-1} = ⊥. Output: does x_i = y_i? (YES/NO). Theorem: any 1-round protocol for Augmented-Indexing that succeeds w.p. 1 - δ, for some small constant δ, has communication complexity Ω(d).

88 The proof. Let X be a maximal set of k-sparse n-dimensional binary vectors with minimum Hamming distance k. We have log|X| = Ω(k log(n/k)). Restatement of the input for Augmented-Indexing (AI): Alice is given y ∈ {0, 1}^d and Bob is given i ≤ d and y_{i+1}, y_{i+2},..., y_d. Goal: Bob wants to learn y_i. A protocol using L1/L1 recovery for AI: set d = log|X| · log n, and let D = 2C + 3 (C is the constant in the sparse recovery). The protocol is on the next slide.

91 The proof (cont.). A protocol using L1/L1 recovery for AI: 1. Alice splits her string y into log n contiguous chunks y^1,..., y^{log n}, each containing log|X| bits. She uses y^j as an index into X to choose x^j. Alice defines x = D^1 x^1 + D^2 x^2 + ... + D^{log n} x^{log n}. 2. Alice and Bob use shared randomness to choose a random matrix A with orthonormal rows, and round it to A′ with b = O(log n) bits per entry. Alice computes A′x and sends it to Bob. 3. Bob uses i to compute the j = j(i) for which the bit y_i occurs in y^j. Bob also uses y_{i+1},..., y_d to compute x^{j+1},..., x^{log n}, so he can compute z = D^{j+1} x^{j+1} + D^{j+2} x^{j+2} + ... + D^{log n} x^{log n}. 4. Set w = x - z. Bob then computes A′w using A′z and A′x. 5. From A′w, Bob can recover w* such that ‖w + u - w*‖_1 ≤ C · min_{‖w′‖_0 ≤ k} ‖w + u - w′‖_1, where u ∈_R B^n_1(k) (the L_1 ball of radius k). 6. From w* he can recover x^j, thus y^j, and thus the bit y_i.

92 Next topic: Graph Algorithms 28-1

93 L_0 sampling. Goal: sample an element from the support of a ∈ R^n. Algorithm: maintain F̃_0, a (1 ± 0.1)-approximation to F_0. Hash items using h_j : [n] → [0, 2^j - 1] for j ∈ [log n]. For each j, maintain: D_j = (1 ± 0.1) · |{t : h_j(t) = 0}|; S_j = Σ_{t: h_j(t)=0} f_t · i_t; C_j = Σ_{t: h_j(t)=0} f_t. Lemma: at level j = 2 + log F̃_0, there is a unique element in the stream that maps to 0 with constant probability. Uniqueness is verified if D_j = 1 ± 0.1. If unique, then S_j / C_j gives the identity of the element and C_j is its count.
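A simplified Python illustration of the level structure (function and variable names are hypothetical; unlike a real sketch, the uniqueness check D_j is computed exactly here rather than via a (1 ± 0.1)-approximate survivor count):

```python
import numpy as np

def l0_sample(updates, n, seed=0):
    # Levels j = 0..log n subsample items with probability 2^-j via a random hash;
    # at each level keep S_j = sum f_t * t and C_j = sum f_t over surviving items.
    rng = np.random.default_rng(seed)
    levels = int(np.ceil(np.log2(n))) + 1
    h = rng.integers(0, 2**levels, size=n)       # item i survives level j iff h[i] % 2^j == 0
    S = np.zeros(levels)
    C = np.zeros(levels)
    f = np.zeros(n)                              # kept only to compute D_j exactly below
    for i, delta in updates:                     # turnstile stream of (item, delta) updates
        f[i] += delta
        for j in range(levels):
            if h[i] % (2**j) == 0:
                S[j] += delta * i
                C[j] += delta
    for j in range(levels):
        D_j = sum(1 for i in range(n) if f[i] != 0 and h[i] % (2**j) == 0)
        if D_j == 1:                             # unique survivor: S_j / C_j is its identity
            return int(round(S[j] / C[j]))
    return None                                  # fails with constant probability; rerun

updates = [(3, 2), (10, 1), (3, -2), (77, 4), (500, 1)]   # net support is {10, 77, 500}
samples = [l0_sample(updates, n=1024, seed=s) for s in range(10)]
print([s for s in samples if s is not None])     # each success returns one support element
```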

97 Graphs. In semi-streaming, we want to process a graph defined by edges e_1,..., e_m with Õ(n) memory, reading the sequence in order. For example: connectivity is easy with Õ(n) space if edges are only inserted. But what if edges get deleted? A sketch matrix M of dimension Õ(n) × n^2 suffices! To delete e from G: update MA_G ← MA_G - MA_e = MA_{G-e}, where A_G is the adjacency matrix of G. Magic? Mmm, the information of connectivity is Õ(n) :-)

101 Connectivity. Basic algorithm (spanning forest): 1. For each node, sample a random incident edge. 2. Contract the selected edges. Repeat until no edges remain. Lemma: this takes O(log n) steps, and the selected edges include a spanning forest. Graph representation: for node i, let a_i be a vector indexed by node pairs, with non-zero entries (one per edge (i, j) of the graph): a_i[i, j] = 1 if j > i and a_i[i, j] = -1 if j < i. Lemma: for any subset of nodes S ⊆ V, support(Σ_{i∈S} a_i) = E[S, V\S].
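The support identity in the last lemma is easy to check directly; a small numpy example on an assumed 6-node graph (the node names and edges are made up for illustration):

```python
import numpy as np
from itertools import combinations

# Edges of a small example graph on V = {0, ..., 5}.
V = range(6)
E = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (2, 5)}

pairs = list(combinations(V, 2))                 # index set for the vectors a_i
idx = {p: t for t, p in enumerate(pairs)}

def a(i):
    """a_i[{i,j}] = +1 if (i,j) in E and j > i, -1 if (i,j) in E and j < i, else 0."""
    v = np.zeros(len(pairs))
    for (u, w) in E:
        if i == u:
            v[idx[(u, w)]] = +1
        elif i == w:
            v[idx[(u, w)]] = -1
    return v

S = {0, 1, 2}
total = sum(a(i) for i in S)
cut = {pairs[t] for t in np.flatnonzero(total)}  # support of the sum
print(cut)                                       # edges crossing (S, V\S): {(0,4), (2,3), (2,5)}
```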

105 Connectivity (cont.). Sketch: apply log n sketches C_1,..., C_{log n} to each a_j. Run the previous algorithm in sketch space: 1. Use C_1 a_j to get an incident edge on each node j. 2. For i = 2 to t: to get an incident edge on a supernode S ⊆ V, use Σ_{j∈S} C_i a_j = C_i (Σ_{j∈S} a_j), and use the L_0 sampling algorithm to sample an edge e ∈ support(Σ_{j∈S} a_j) = E[S, V\S].
