Linear Sketches: A Useful Tool in Streaming and Compressive Sensing


1 Linear Sketches: A Useful Tool in Streaming and Compressive Sensing. Qin Zhang

2 Linear sketch: a random linear projection M : R^n → R^k (k ≪ n) that preserves properties of any v ∈ R^n with high probability. The sketch is Mv, from which the answer is computed. Simple and useful: statistics/graph/algebraic problems in data streams, compressive sensing, ... And rich in theory! You will see in this course.
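A minimal numpy illustration of why such a sketch fits the streaming setting (the matrix, dimensions, and update format below are assumptions made for the example, not part of the slides): by linearity, the sketch can be maintained under coordinate updates and merged across streams.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 20                      # ambient dimension and sketch size, k << n
M = rng.normal(size=(k, n))          # a random linear projection M : R^n -> R^k

sketch = np.zeros(k)                 # sketch of the all-zeros vector
v = np.zeros(n)

# Stream of (coordinate, delta) updates: v[i] += delta.
for i, delta in [(3, 1.0), (7, 2.0), (3, -1.0), (42, 5.0)]:
    v[i] += delta
    sketch += delta * M[:, i]        # linearity: M(v + delta*e_i) = Mv + delta*(M e_i)

assert np.allclose(sketch, M @ v)    # the maintained sketch equals Mv

# Linearity also gives mergeability: sketch(v1 + v2) = sketch(v1) + sketch(v2).
```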

5 Data streams: the model (Alon, Matias and Szegedy 1996). [Figure: a data stream read once by a CPU with a small RAM.] Applications, etc., and a list of theoretical problems.

8 Why hard? Game 1: a sequence of numbers arrives one at a time. Q: What's the median? A: 33.

22 Game 2: relationships between Alice, Bob, Carol, Dave, Eva and Paul arrive as a stream of updates: Alice and Bob become friends; Carol and Eva become friends; Eva and Bob become friends; Dave and Paul become friends; Alice and Paul become friends; Eva and Bob unfriend; Alice and Dave become friends; Bob and Paul become friends; Dave and Paul unfriend; Dave and Carol become friends. Q: Are Eva and Bob connected by friends? A: YES: Eva - Carol - Dave - Alice - Bob. Why hard? Short of memory!

36 A simple example: distinct elements. (Q: why can a linear sketch be maintained in the streaming model?) The problem: how many distinct elements are in the stream? Approximation is needed. Search version and decision version. Decision version: let D be the number of distinct elements; if D ≥ T(1 + ɛ), answer YES; if D ≤ T/(1 + ɛ), answer NO. For the search version, try T = 1, (1 + ɛ), (1 + ɛ)^2, ...

39 Now, the decision problem. The algorithm: 1. Select a random set S ⊆ {1, 2,..., n} such that for each i, independently, Pr[i ∈ S] = 1/T. 2. Make a pass over the stream, maintaining Sum_S(x) = Σ_{i∈S} x_i. Note: this is a linear sketch. 3. If Sum_S(x) > 0, return YES; otherwise return NO. Lemma: let P = Pr[Sum_S(x) = 0]. If T is large enough and ɛ is small enough, then: if D ≥ T(1 + ɛ), then P < 1/e - ɛ/3; if D ≤ T/(1 + ɛ), then P > 1/e + ɛ/3. (Introduce a few useful probabilistic basics.)
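A small Python sketch of the single-set decision test above (the function name, test stream, and parameter choices are illustrative assumptions):

```python
import numpy as np

def decide_distinct(stream, n, T, rng):
    """One run of the decision sketch: YES if (we believe) D >= T(1+eps)."""
    in_S = rng.random(n) < 1.0 / T        # random set S, Pr[i in S] = 1/T
    sum_S = 0                             # Sum_S(x) = sum_{i in S} x_i, a linear sketch
    for item in stream:                   # one pass; each item increments x_item by 1
        if in_S[item]:
            sum_S += 1
    return sum_S > 0                      # YES iff some element of S appeared

rng = np.random.default_rng(1)
stream = [rng.integers(0, 50) for _ in range(500)]       # about 50 distinct elements
print(decide_distinct(stream, n=1000, T=10, rng=rng))    # likely YES (D >> 10)
print(decide_distinct(stream, n=1000, T=400, rng=rng))   # likely NO  (D << 400)
```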

41 Amplify the success probability. Repeat to amplify the success probability: 1. Select k sets S_1,..., S_k as in the previous algorithm, for k = C log(1/δ)/ɛ^2, C > 0. 2. Let Z be the number of indices j ∈ {1,..., k} for which Sum_{S_j}(x) = 0. 3. If Z < k/e, report YES; otherwise report NO. Lemma: if the constant C is large enough, then this algorithm reports a correct answer with probability 1 - δ. Theorem: the number of distinct elements can be (1 ± ɛ)-approximated with probability 1 - δ using O(log n · log(1/δ)/ɛ^3) words.
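A sketch of the amplified test, under the assumption C = 16 (the slides only require the constant to be large enough); it maintains all k linear sketches in one pass and applies the Z < k/e rule:

```python
import numpy as np

def amplified_decision(stream, n, T, eps, delta, rng):
    """Repeat the single-set test k times and threshold the number of zero sketches."""
    k = int(np.ceil(16 * np.log(1 / delta) / eps**2))  # k = C log(1/delta)/eps^2, C = 16 assumed
    in_S = rng.random((k, n)) < 1.0 / T                # k independent random sets S_1..S_k
    sums = np.zeros(k)
    for item in stream:                                # one pass, maintaining all k linear sketches
        sums += in_S[:, item]
    Z = np.count_nonzero(sums == 0)                    # number of sketches equal to 0
    return Z < k / np.e                                # YES if fewer than k/e are zero

rng = np.random.default_rng(2)
stream = rng.integers(0, 64, size=2000)                # about 64 distinct elements
print(amplified_decision(stream, n=1000, T=16, eps=0.5, delta=0.05, rng=rng))   # expect YES
print(amplified_decision(stream, n=1000, T=256, eps=0.5, delta=0.05, rng=rng))  # expect NO
```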

44 Course plan. That's all for lecture 1. Thank you.

46 Frequency moments and norms. Frequency moments: F_p = Σ_i (f_i)^p, where f_i is the frequency of item i. F_0: number of distinct items. F_1: total number of items. F_2: size of the self-join. A very good measurement of the skewness of the dataset. Norms: L_p = F_p^{1/p}.

49 L_2 estimation. The sketch for L_2: a linear sketch Rx = [Z_1,..., Z_k], where each entry of the k × n matrix R (k = O(1/ɛ^2)) has distribution N(0, 1). Each Z_i is drawn from N(0, ‖x‖_2^2); alternatively, Z_i = ‖x‖_2 · G_i, where G_i is drawn from N(0, 1). The estimator: Y = median{|Z_1|,..., |Z_k|} / median{|G|}, where G ~ N(0, 1). (M is the median of a random variable R if Pr[|R| ≤ M] = 1/2.) Sounds like magic? The intuition behind it: for nice-looking distributions (e.g., the Gaussian), the median of the samples, for a large enough number of samples, converges to the median of the distribution.
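A numpy sketch of the median estimator (the constant in k, the test vector, and the numerical value 0.6745 for median{|G|} are assumptions/approximations for the demo):

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps = 2000, 0.1
k = int(10 / eps**2)                              # k = O(1/eps^2); the constant 10 is assumed

x = rng.normal(size=n) * (rng.random(n) < 0.05)   # an arbitrary sparse-ish test vector
R = rng.normal(size=(k, n))                       # each entry of the k x n matrix R is N(0,1)
Z = R @ x                                         # each Z_i ~ N(0, ||x||_2^2)

median_abs_gaussian = 0.6745                      # median of |G| for G ~ N(0,1), approx.
Y = np.median(np.abs(Z)) / median_abs_gaussian

print(Y, np.linalg.norm(x))                       # Y is within about (1 +/- eps) of ||x||_2
```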

52 The proof. Closeness in probability: let U_1,..., U_k be i.i.d. real random variables chosen from any distribution having continuous c.d.f. F and median M. Defining U = median{U_1,..., U_k}, there is an absolute constant C > 0 such that Pr[F(U) ∈ (1/2 - ɛ, 1/2 + ɛ)] ≥ 1 - e^{-Ckɛ^2}. Closeness in value: let F be the c.d.f. of a random variable G, where G is drawn from N(0, 1). There exists an absolute constant C′ > 0 such that if for any z ≥ 0 we have F(z) ∈ (1/2 - ɛ, 1/2 + ɛ), then z = M ± C′ɛ. Theorem: Y = ‖x‖_2 (M ± C′ɛ)/M = ‖x‖_2 (1 ± C″ɛ), w.h.p.

55 Generalization. Key property of the Gaussian distribution: if U_1,..., U_n and U are i.i.d. drawn from the Gaussian distribution, then x_1 U_1 + ... + x_n U_n ~ ‖x‖_p · U for p = 2. Such distributions are called p-stable [Indyk 06]. Good news: p-stable distributions exist for any p ∈ (0, 2]. For p = 1, we get the Cauchy distribution, with density function f(x) = 1/[π(1 + x^2)].
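For p = 1 the same median trick works with Cauchy (1-stable) entries; a minimal sketch, with the sketch size and test vector chosen arbitrarily for illustration (median{|Cauchy(0,1)|} = 1, so no rescaling is needed):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 2000, 1000                          # sketch size chosen for illustration only

x = np.zeros(n)
x[rng.integers(0, n, size=30)] = rng.normal(size=30) * 5   # arbitrary test vector

C = rng.standard_cauchy(size=(k, n))       # Cauchy entries: the 1-stable distribution
Z = C @ x                                  # each Z_i ~ ||x||_1 * Cauchy(0,1), by 1-stability

# median(|Cauchy(0,1)|) = 1, so the median of |Z_i| estimates ||x||_1 directly.
est = np.median(np.abs(Z))
print(est, np.sum(np.abs(x)))
```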

58 L_p (p > 2) (not a linear mapping, but important). We instead approximate F_p = Σ_{i=1}^n x_i^p = ‖x‖_p^p. First attempt: use two passes. 1. Pick a random element i from the stream in the 1st pass. (Q: how?) 2. Compute i's frequency x_i in the 2nd pass. 3. Finally, return Y = m · x_i^{p-1}. Second attempt: collapse the two passes above. 1. Pick a random element i from the stream, and count the number of occurrences of i in the rest of the stream, denoted by r. 2. Now we use r instead of x_i to construct the estimator: Y = m(r^p - (r - 1)^p).
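A sketch of the second attempt (the classic AMS-style estimator): reservoir-sample a position, count occurrences of that element from the sampled position onward, and return m(r^p - (r-1)^p). The constants, test stream, and the averaging over 200 independent copies (the usual way to reduce the variance) are assumptions:

```python
import numpy as np

def ams_fp_estimate(stream, p, rng):
    """One-pass estimator with E[Y] = F_p = sum_i f_i^p (unbiased but high variance)."""
    m = 0
    current = None   # the sampled element i
    r = 0            # occurrences of `current` from its sampled position onward
    for item in stream:
        m += 1
        if rng.random() < 1.0 / m:   # reservoir sampling: replace with probability 1/m
            current, r = item, 1
        elif item == current:
            r += 1
    return m * (r**p - (r - 1)**p)

rng = np.random.default_rng(5)
stream = list(rng.integers(0, 20, size=5000))
p = 3
true_Fp = sum(int(c)**p for c in np.bincount(stream))
ests = [ams_fp_estimate(stream, p, np.random.default_rng(s)) for s in range(200)]
print(np.mean(ests), true_Fp)        # averaging many independent copies tames the variance
```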

61 Heavy hitters. L_p heavy hitter set: HH^p_φ(x) = {i : |x_i| ≥ φ‖x‖_p}. L_p heavy hitter problem: given φ, φ′ (often φ′ = φ - ɛ), return a set S such that HH^p_φ(x) ⊆ S ⊆ HH^p_{φ′}(x). L_p point query problem: given ɛ, after reading the whole stream and given i, report x̂_i = x_i ± ɛ‖x‖_p.

64 L_2 point query. The algorithm [Gilbert, Kotidis, Muthukrishnan and Strauss 01]: maintain a sketch Rx such that s = ‖Rx‖_2 = (1 ± ɛ)‖x‖_2, where R is an O(1/ɛ^2 · log(1/δ)) × n matrix, which can be constructed, e.g., by taking each cell to be N(0, 1). Estimator: x̂_i = (1 - ‖Rx/s - Re_i‖_2^2 / 2) · s. Johnson-Lindenstrauss Lemma: for any x with ‖x‖_2^2 = l, we have (1 - ɛ)l ≤ ‖Rx‖_2^2/k ≤ (1 + ɛ)l w.p. 1 - δ. Theorem: we can solve the L_2 point query problem, with approximation ɛ and failure probability δ, by storing O(1/ɛ^2 · log(1/δ)) numbers.
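A numpy illustration of the point-query estimator above; the 1/√k scaling of R (so that ‖Rv‖_2 ≈ ‖v‖_2), the constant in k, and the test vector are assumptions made for the demo:

```python
import numpy as np

rng = np.random.default_rng(6)
n, eps = 500, 0.1
k = int(4 / eps**2)                        # number of rows; the constant 4 is assumed

x = np.zeros(n)
x[7], x[123], x[300] = 10.0, -4.0, 2.0
x += 0.1 * rng.normal(size=n)              # small noise everywhere

R = rng.normal(size=(k, n)) / np.sqrt(k)   # scaled so that ||Rv||_2 ~ ||v||_2 (JL)
Rx = R @ x                                 # the linear sketch
s = np.linalg.norm(Rx)                     # s = ||Rx||_2 = (1 +/- eps)||x||_2

def point_query(i):
    # Estimator from the slide: x_i ~ (1 - ||Rx/s - R e_i||_2^2 / 2) * s
    return (1 - np.linalg.norm(Rx / s - R[:, i])**2 / 2) * s

for i in (7, 123, 300, 50):
    print(i, point_query(i), x[i])         # error is on the order of eps * ||x||_2
```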

67 L_1 point query. The algorithm, for x ≥ 0 [Cormode and Muthukrishnan 05]: pick d = log(1/δ) random hash functions h_1,..., h_d, where h_t : {1,..., n} → {1,..., w} (w = 2/ɛ). Maintain d vectors Z^1,..., Z^d, where Z^t = (Z^t_1,..., Z^t_w) and Z^t_j = Σ_{i: h_t(i)=j} x_i. Estimator: x̂_i = min_t Z^t_{h_t(i)}. Theorem: we can solve the L_1 point query problem, with approximation ɛ and failure probability δ, by storing O(1/ɛ · log(1/δ)) numbers.
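This is the Count-Min sketch; below is a compact, self-contained Python version (the hash family, the exact constants, and the Zipf test stream are illustrative choices rather than the slide's precise construction):

```python
import numpy as np

class CountMin:
    """Count-Min sketch for L_1 point queries on non-negative streams (x >= 0)."""
    def __init__(self, eps, delta, seed=0):
        self.w = int(np.ceil(2 / eps))                 # buckets per row, w = 2/eps
        self.d = int(np.ceil(np.log(1 / delta)))       # number of rows / hash functions
        rng = np.random.default_rng(seed)
        self.a = rng.integers(1, 2**31 - 1, size=self.d)   # simple random hashing for illustration
        self.b = rng.integers(0, 2**31 - 1, size=self.d)
        self.p = 2**31 - 1                             # a Mersenne prime
        self.Z = np.zeros((self.d, self.w))

    def _buckets(self, i):
        return ((self.a * i + self.b) % self.p) % self.w

    def update(self, i, delta=1):                      # x_i += delta
        self.Z[np.arange(self.d), self._buckets(i)] += delta

    def query(self, i):                                # estimate of x_i: min over the d rows
        return self.Z[np.arange(self.d), self._buckets(i)].min()

cm = CountMin(eps=0.01, delta=0.01)
rng = np.random.default_rng(7)
stream = rng.zipf(1.5, size=20000) % 1000              # a skewed stream over [0, 1000)
for item in stream:
    cm.update(int(item))
true = np.bincount(stream, minlength=1000)
for i in (1, 2, 3, 500):
    print(i, cm.query(i), true[i])                     # estimate = x_i + at most ~eps*||x||_1 w.h.p.
```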

69 Compressive sensing. The model (Candès-Romberg-Tao 04; Donoho 04). Applications: medical imaging reconstruction, single-pixel camera, compressive sensor networks, etc.

70 Formalization: Lp/Lq guarantee. The goal is to acquire a signal x = [x_1,..., x_n] (e.g., a digital image). The acquisition proceeds by computing a measurement vector Ax of dimension m ≪ n. Then, from Ax, we want to recover a k-sparse approximation x* of x so that ‖x - x*‖_q ≤ C · min_{‖x′‖_0 ≤ k} ‖x - x′‖_p (*), where the minimum on the right-hand side is denoted Err^k_p(x). Often studied: L1/L1, L1/L2 and L2/L2. "For each": given a (random) matrix A, for each signal x, (*) holds w.h.p. "For all": one matrix A for all signals x. Stronger.

74 Results [a table of known results up to a given year, copied from Indyk's talk].

75 For each (L1/L1). The algorithm for L_1 point query gives an L1/L1 sparse approximation. Recall the L_1 point query problem: given ɛ, after reading the whole stream and given i, report x̂_i = x_i ± ɛ‖x‖_1. Set ɛ = α/k and δ = 1/n^2 in the L_1 point query, and then return a vector x* consisting of the k largest (in magnitude) elements of x̂. This gives, w.p. 1 - δ, ‖x - x*‖_1 ≤ (1 + 3α) · Err^k_1. Total measurements: m = O(k/α · log n).

78 For all (L1/L2). A matrix A satisfies the (k, δ)-RIP (Restricted Isometry Property) if for every k-sparse vector x we have (1 - δ)‖x‖_2 ≤ ‖Ax‖_2 ≤ (1 + δ)‖x‖_2. Johnson-Lindenstrauss Lemma: for any x with ‖x‖_2 = 1, we have 7/8 ≤ ‖Ax‖_2 ≤ 8/7 w.p. 1 - e^{-O(m)}. Theorem: if each entry of A is i.i.d. N(0, 1) and m = O(k log(n/k)), then A satisfies the (k, 1/3)-RIP w.h.p. Main Theorem: suppose A has the (6k, 1/3)-RIP. Let x* be the solution to the LP: minimize ‖x′‖_1 subject to Ax′ = Ax. Then ‖x - x*‖_2 ≤ (C/√k) · Err^k_1 for any x.
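The LP in the Main Theorem (basis pursuit) can be written as a standard linear program; here is a sketch using numpy and scipy.optimize.linprog, where the measurement count, the constants, and the test signal are assumptions for the demo:

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, b):
    """Basis pursuit: minimize ||x||_1 subject to Ax = b, as an LP over (x, t)."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])       # minimize sum(t)
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])                 # x - t <= 0 and -x - t <= 0, i.e. |x_i| <= t_i
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])              # Ax = b
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=bounds, method="highs")
    return res.x[:n]

rng = np.random.default_rng(8)
n, k = 200, 5
m = 4 * k * int(np.log(n / k)) + 10                      # m = O(k log(n/k)); constants assumed
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)                 # i.i.d. Gaussian rows, normalized
x_star = l1_recover(A, A @ x)
print(np.linalg.norm(x - x_star))                        # ~0 when x is exactly k-sparse
```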

82 The lower bounds. What's known: there exists an m × n matrix A with m = O(k log n) (which can be improved to m = O(k log(n/k))), and an L1/L1 recovery algorithm R, so that for each x, x* = R(Ax) satisfies, w.h.p., ‖x - x*‖_1 ≤ C · min_{‖x′‖_0 ≤ k} ‖x - x′‖_1. We are going to show that this is optimal, that is, m = Ω(k log(n/k)) [Do Ba et al., SODA 10]. To show this we need: communication complexity and coding theory.

84 Communication complexity. Alice holds an input x and Bob holds an input y; they exchange messages in rounds (Round 1, Round 2, ...) and want to jointly compute some function f(x, y). We would like to minimize the total bits of communication and the number of rounds of communication (today we only consider 1-round protocols).

86 Augmented indexing. Promise input: Alice gets x = (x_1, x_2,..., x_d) ∈ {0, 1}^d; Bob gets y = (y_1, y_2,..., y_d) ∈ {0, 1, ⊥}^d such that for some (unique) i: 1. y_i ∈ {0, 1}; 2. y_k = x_k for all k > i; 3. y_1 = y_2 = ... = y_{i-1} = ⊥. Output: does x_i = y_i? (YES/NO). Theorem: any 1-round protocol for Augmented-Indexing that succeeds w.p. 1 - δ, for some small constant δ, has communication complexity Ω(d).

88 The proof. Let X be a maximal set of k-sparse n-dimensional binary vectors with minimum Hamming distance k. We have log|X| = Ω(k log(n/k)). Restatement of the input for Augmented-Indexing (AI): Alice is given y ∈ {0, 1}^d and Bob is given i ≤ d and y_{i+1}, y_{i+2},..., y_d. Goal: Bob wants to learn y_i. A protocol using L1/L1 recovery for AI: set d = log|X| · log n, and let D = 2C + 3 (C is the constant in the sparse recovery). The protocol is on the next slide.

91 The proof (cont.). A protocol using L1/L1 recovery for AI: 1. Alice splits her string y into log n contiguous chunks y^1,..., y^{log n}, each containing log|X| bits. She uses y^j as an index into X to choose x^j. Alice defines x = D^1 x^1 + D^2 x^2 + ... + D^{log n} x^{log n}. 2. Alice and Bob use shared randomness to choose a random matrix A with orthonormal rows, and round it to A′ with b = O(log n) bits per entry. Alice computes A′x and sends it to Bob. 3. Bob uses i to compute the j = j(i) for which the bit y_i occurs in y^j. Bob also uses y_{i+1},..., y_d to compute x^{j+1},..., x^{log n}, so he can compute z = D^{j+1} x^{j+1} + D^{j+2} x^{j+2} + ... + D^{log n} x^{log n}. 4. Set w = x - z. Bob then computes A′w using A′z and A′x. 5. From A′w, Bob can recover w* such that ‖w + u - w*‖_1 ≤ C · min_{‖w′‖_0 ≤ k} ‖w + u - w′‖_1, where u ∈_R B^n_1(k) (the L_1 ball of radius k). 6. From w* he can recover x^j, thus y^j, and thus the bit y_i.

92 Next topic: Graph Algorithms 28-1

93 L_0 sampling. Goal: sample an element from the support of a ∈ R^n. Algorithm: maintain F̃_0, a (1 ± 0.1)-approximation to F_0. Hash items using h_j : [n] → [0, 2^j - 1] for j ∈ [log n]. For each j, maintain: D_j = (1 ± 0.1) · |{t : h_j(t) = 0}|; S_j = Σ_{t: h_j(t)=0} f_t · i_t; C_j = Σ_{t: h_j(t)=0} f_t. Lemma: at level j = 2 + log F̃_0, there is a unique element in the stream that maps to 0 with constant probability. Uniqueness is verified if D_j = 1 ± 0.1. If unique, then S_j / C_j gives the identity of the element and C_j is its count.
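A simplified Python illustration of the level structure (function and variable names are hypothetical; unlike a real sketch, the uniqueness check D_j is computed exactly here rather than via a (1 ± 0.1)-approximate survivor count):

```python
import numpy as np

def l0_sample(updates, n, seed=0):
    # Levels j = 0..log n subsample items with probability 2^-j via a random hash;
    # at each level keep S_j = sum f_t * t and C_j = sum f_t over surviving items.
    rng = np.random.default_rng(seed)
    levels = int(np.ceil(np.log2(n))) + 1
    h = rng.integers(0, 2**levels, size=n)       # item i survives level j iff h[i] % 2^j == 0
    S = np.zeros(levels)
    C = np.zeros(levels)
    f = np.zeros(n)                              # kept only to compute D_j exactly below
    for i, delta in updates:                     # turnstile stream of (item, delta) updates
        f[i] += delta
        for j in range(levels):
            if h[i] % (2**j) == 0:
                S[j] += delta * i
                C[j] += delta
    for j in range(levels):
        D_j = sum(1 for i in range(n) if f[i] != 0 and h[i] % (2**j) == 0)
        if D_j == 1:                             # unique survivor: S_j / C_j is its identity
            return int(round(S[j] / C[j]))
    return None                                  # fails with constant probability; rerun

updates = [(3, 2), (10, 1), (3, -2), (77, 4), (500, 1)]   # net support is {10, 77, 500}
samples = [l0_sample(updates, n=1024, seed=s) for s in range(10)]
print([s for s in samples if s is not None])     # each success returns one support element
```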

97 Graphs. In semi-streaming, we want to process a graph defined by edges e_1,..., e_m with Õ(n) memory, reading the sequence in order. For example: connectivity is easy with Õ(n) space if edges are only inserted. But what if edges get deleted? A sketch matrix M of dimension Õ(n) × n^2 suffices! To delete e from G: update MA_G ← MA_G - MA_e = MA_{G-e}, where A_G is the adjacency matrix of G. Magic? Mmm, the information of connectivity is Õ(n) :-)

101 Connectivity. Basic algorithm (spanning forest): 1. For each node, sample a random incident edge. 2. Contract the selected edges. Repeat until no edges remain. Lemma: this takes O(log n) steps, and the selected edges include a spanning forest. Graph representation: for node i, let a_i be a vector indexed by node pairs, with non-zero entries (one per edge (i, j) of the graph): a_i[i, j] = 1 if j > i and a_i[i, j] = -1 if j < i. Lemma: for any subset of nodes S ⊆ V, support(Σ_{i∈S} a_i) = E[S, V\S].
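The support identity in the last lemma is easy to check directly; a small numpy example on an assumed 6-node graph (the node names and edges are made up for illustration):

```python
import numpy as np
from itertools import combinations

# Edges of a small example graph on V = {0, ..., 5}.
V = range(6)
E = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (2, 5)}

pairs = list(combinations(V, 2))                 # index set for the vectors a_i
idx = {p: t for t, p in enumerate(pairs)}

def a(i):
    """a_i[{i,j}] = +1 if (i,j) in E and j > i, -1 if (i,j) in E and j < i, else 0."""
    v = np.zeros(len(pairs))
    for (u, w) in E:
        if i == u:
            v[idx[(u, w)]] = +1
        elif i == w:
            v[idx[(u, w)]] = -1
    return v

S = {0, 1, 2}
total = sum(a(i) for i in S)
cut = {pairs[t] for t in np.flatnonzero(total)}  # support of the sum
print(cut)                                       # edges crossing (S, V\S): {(0,4), (2,3), (2,5)}
```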

105 Connectivity (cont.). Sketch: apply log n sketches C_1,..., C_{log n} to each a_j. Run the previous algorithm in sketch space: 1. Use C_1 a_j to get an incident edge on each node j. 2. For i = 2 to t: to get an incident edge on a supernode S ⊆ V, use Σ_{j∈S} C_i a_j = C_i (Σ_{j∈S} a_j), and use the L_0 sampling algorithm to sample an edge e ∈ support(Σ_{j∈S} a_j) = E[S, V\S].
