Sketching and Streaming for Distributions


Piotr Indyk (Massachusetts Institute of Technology)
Andrew McGregor (University of California, San Diego)

Main Material:
Stable distributions, pseudo-random generators, embeddings, and data stream computation. Piotr Indyk (FOCS 2000)
Sketching information divergences. Sudipto Guha, Piotr Indyk, Andrew McGregor (COLT 2007)
Declaring independence via the sketching of sketches. Piotr Indyk, Andrew McGregor (SODA 2008)

The Problem

A list of m red values and m green values in [n]:
3,5,3,7,5,4,8,5,3,7,5,4,8,6,3,2,6,4,7,3,4,...
Define distributions p = (p1,..., pn) and q = (q1,..., qn) from the empirical frequencies of the red and green values.
How different are p and q?

Variational: Σ_i |p_i − q_i|
Euclidean: (Σ_i (p_i − q_i)²)^{1/2}
Kullback-Leibler: Σ_i p_i log(p_i/q_i)
Hellinger: Σ_i (√p_i − √q_i)²

More generally, f-divergences and Bregman divergences:
D_f(p, q) = Σ_i p_i f(q_i/p_i)
B_F(p, q) = Σ_i [F(p_i) − F(q_i) − (p_i − q_i) F′(q_i)]
where f and F are convex and f(1) = 0.

The Catch...

What if m and n are huge and you can't store the list?
Applications: monitoring internet traffic, I/O-efficient external memory, processing huge log files, database query planning, sensor networks, ...

Data Stream Model:
No control over the order of the stream.
Limited working memory, e.g., polylog(n, m) space.
Limited time to process each element.

Previous work: quantiles, frequency moments, histograms, clustering, entropy, graph problems, ... See, e.g., Muthukrishnan, "Data Streams: Algorithms and Applications".

Today's Talk

Sketching Lp distances (0 < p ≤ 2): a (1+ε)-approximation with probability 1−δ in Õ(ε⁻² ln δ⁻¹) space, via stable distributions and pseudo-random generators. [Indyk, FOCS 2000]

Impossibility of extending to other divergences: can we sketch other divergences such as Hellinger? Lower bounds via communication complexity. [Guha, Indyk, McGregor, COLT 2007]

Using sketches to test independence: testing independence between data streams. [Indyk, McGregor, SODA 2008]

1. Sketching Lp distances: p-stable distributions, pseudo-random generators
2. The Unsketchables: information divergences, communication complexity
3. Sketching Sketches: identifying correlations in data streams

Stable Distributions

A p-stable distribution μ has the following property: if X, Y, Z ~ μ (independent) and a, b ∈ R, then
aX + bY ~ (|a|^p + |b|^p)^{1/p} Z.

Examples:
Normal(0,1) is 2-stable, with density (1/√(2π)) e^{−x²/2}.
Cauchy is 1-stable, with density 1/(π(1 + x²)).
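The stability property above is easy to check numerically. The following sketch (not part of the talk; NumPy is assumed) samples from the standard Cauchy distribution and compares aX + bY against (|a| + |b|)Z, using the median of absolute values as a scale statistic since the Cauchy has no moments:

```python
import numpy as np

# Numerical check of 1-stability for the Cauchy distribution:
# if X, Y ~ Cauchy (independent) and a, b are scalars, then aX + bY has
# the same distribution as (|a| + |b|) Z for Z ~ Cauchy.
rng = np.random.default_rng(0)
a, b, N = 2.0, -3.0, 1_000_000

X, Y, Z = rng.standard_cauchy((3, N))
combo = a * X + b * Y                # aX + bY
scaled = (abs(a) + abs(b)) * Z       # (|a|^1 + |b|^1)^(1/1) Z

# Compare medians of absolute values -- a scale statistic that is robust
# to the Cauchy's heavy tails (its moments do not exist).
m1 = np.median(np.abs(combo))
m2 = np.median(np.abs(scaled))
print(m1, m2)                        # both close to |a| + |b| = 5
```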

Approximating L1 and L2

Let μ be a p-stable distribution (0 < p ≤ 2).

Ideal Algorithm:
For i = 1 to k:
  Let x be a length-n vector with x_j ~ μ
  Compute t_i = x·(p − q)
Return median(|t_1|, |t_2|, ..., |t_k|) / median(|μ|)

It is easy to compute x·(p − q) incrementally: for the stream 3,5,3,7,5,... compute x_3 − x_5 + x_3 − x_7 − x_5 − ... (adding x_j for red values, subtracting for green) and scale.

Lemma: This returns (1 ± ε) L_p(p − q) with probability 1 − δ, if k = Õ(ε⁻² ln δ⁻¹).
Proof: Each t_i is distributed as L_p(p − q) · μ by the p-stability property. Apply Chernoff bounds.
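A minimal sketch of the ideal algorithm for p = 1, with NumPy assumed, is below. The Cauchy distribution is 1-stable and median(|Cauchy|) = 1, so the median of the |t_i| needs no extra rescaling. This version stores the full k × n matrix of sketch vectors, so it is the "ideal" (not yet small-space) algorithm:

```python
import numpy as np

# Indyk-style L1 sketch estimator using Cauchy (1-stable) sketch vectors.
def l1_sketch_estimate(p, q, k=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_cauchy((k, len(p)))       # k independent sketch vectors
    t = x @ (np.asarray(p) - np.asarray(q))    # t_i = x^i . (p - q)
    return np.median(np.abs(t))                # estimates L1(p - q)

p = [0.5, 0.3, 0.2, 0.0]
q = [0.1, 0.3, 0.1, 0.5]
est = l1_sketch_estimate(p, q)
true_l1 = sum(abs(a - b) for a, b in zip(p, q))   # = 1.0
print(est, true_l1)
```

With k = 2000 repetitions the estimate lands within a few percent of the true L1 distance.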

Sketches and Space

Sketch/embedding into a small dimension:
Let x¹, x², ..., x^k be length-n vectors with x^i_j ~ μ.
Let C(y) = (x¹·y, ..., x^k·y).
Approximate L_p(p − q) from C(p) and C(q).
CAUTION: This is not an embedding into a normed space.

Can we also construct the sketch in small space?
Storing all the x^i requires Ω(nk) space.
Instead, generate the x^i with Nisan's pseudo-random generator.
The seed can be stored in O(polylog n) space.

Thm: Can (1+ε)-approximate L_p(p − q) in Õ(ε⁻² ln δ⁻¹) space.
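The small-space trick can be illustrated as follows: never store the length-n vector x; instead, re-derive x_j from a short seed whenever item j arrives. In this sketch a seeded library PRG stands in for Nisan's generator (an illustrative shortcut, not the generator the space bound actually requires), and a Cauchy entry is derived via the inverse CDF:

```python
import math
import random

def cauchy_entry(seed, j):
    """Deterministically re-derive x_j ~ Cauchy via the inverse CDF of a
    seeded uniform draw; only the integer seed is ever stored."""
    u = random.Random(seed * 1_000_003 + j).random()
    return math.tan(math.pi * (u - 0.5))

# Streaming update of t = x . (p - q): add x_j for red items, subtract
# x_j for green items, regenerating each entry on demand.
seed, t = 7, 0.0
for item, sign in [(3, +1), (5, -1), (3, +1), (7, -1)]:
    t += sign * cauchy_entry(seed, item)
print(t)
```

Because `cauchy_entry` is deterministic in (seed, j), every occurrence of an item sees the same sketch entry, which is exactly what the incremental update needs.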

2. The Unsketchables: information divergences, communication complexity

Results

Thm (Shift Invariance): A t-approximation of Σ_i φ(p_i, q_i) needs Ω(n) space if, for some a, b, c and m = an/4 + bn + cn/2,
φ(a/m, (a+c)/m) > (t² n / 4) · [ φ((b+c)/m, b/m) + φ(b/m, (b+c)/m) ].

Corollary: A poly(n)-approximation of D_f requires Ω(n) space if f is twice differentiable and strictly convex.

Corollary: A poly(n)-approximation of B_F requires Ω(n) space if there exist ρ, z₀ with
F″(z₁)/F″(z₂) ≥ (z₁/z₂)^ρ for all 0 ≤ z₂ ≤ z₁ ≤ z₀, or
F″(z₁)/F″(z₂) ≥ (z₂/z₁)^ρ for all 0 ≤ z₂ ≤ z₁ ≤ z₀.

The only exceptions are L1 and L2!

BREAKING NEWS: Many of these lower bounds also apply to randomly ordered streams [Chakrabarti, Cormode, McGregor 2007].

The Lower Bound via Set-Disjointness

Alice holds x ∈ {0,1}ⁿ of weight n/4; Bob holds y ∈ {0,1}ⁿ of weight n/4.
Question: Are x and y disjoint, i.e., is x·y = 0?
Thm (Razborov '92): This needs Ω(n) communication.

Alice's part of the stream: a·x_i + b(1 − x_i) copies of i in each of the red and green streams (i ∈ [n]); b copies of i+n in each stream (i ∈ [n/4]).
Bob's part: c·y_i copies of i in the green stream (i ∈ [n]); c copies of i+n in the green stream (i ∈ [n/4]).

If x·y = 0 then the divergence is (n/4) · [ φ(b/m, (b+c)/m) + φ((b+c)/m, b/m) ].
If x·y = 1 then the divergence is at least φ(a/m, (a+c)/m).
By assumption, these differ by a factor of more than t².

Let A be a t-approximation algorithm:
1. Alice runs A on the first half of the stream.
2. She transmits the memory state.
3. Bob instantiates A with that state.
4. He continues A on the second half.

Thm: Any t-approximation algorithm for the divergence requires Ω(n) memory.

Corollary for D_f

Corollary: Any poly(n)-approximation of D_f(p, q) = Σ_i p_i f(q_i/p_i) requires Ω(n) space if f″ exists and is strictly positive.

Proof: Set a = c = 1 and b = t² n (f″(1) + 1) / (8 f(2)).
Take the Taylor expansion of f around 1 (using f(1) = 0 and, WLOG, f′(1) = 0):
φ(b/m, (b+c)/m) = (b/m) [ f(1) + f′(1)/b + f″(1+γ)/(2b²) ]  for some 0 < γ < 1/b
  ≤ (f″(1) + 1) / (2mb)
  ≤ 8 φ(a/m, (a+c)/m) / (t² n).
Similarly, φ((b+c)/m, b/m) ≤ 8 φ(a/m, (a+c)/m) / (t² n).
The result follows from the Shift Invariance Theorem.

Additive Error Algorithms

f-divergences:
Thm: If D_f is bounded, O_ε(ln δ⁻¹) space suffices for a ±ε approximation with probability 1 − δ.
Thm: If D_f is unbounded, Ω(n) space is needed.

Bregman divergences:
Thm: If F(0), F′(0), and F″(·) exist, O_ε(ln δ⁻¹) space suffices for a ±ε approximation with probability 1 − δ.
Thm: If F(0) or F′(0) is infinite, Ω(n) space is needed.

3. Sketching Sketches: identifying correlations in data streams

New Problem

A list of m pairs in [n] × [n]:
(3,5), (5,3), (2,7), (3,4), (7,1), (1,2), (3,9), (6,6), ...
The stream defines random variables:
X_p with distribution (p_1, ..., p_n)  (the first coordinates)
X_q with distribution (q_1, ..., q_n)  (the second coordinates)
(X_p, X_q) with joint distribution (r_11, r_12, ..., r_nn)

How independent are X_p and X_q? That is, how far is the joint distribution from the product distribution (s_11, s_12, ..., s_nn), where s_ij = p_i q_j?

Consider L1(s − r), L2(s − r), or the mutual information:
I(X_p; X_q) = H(X_p) + H(X_q) − H(X_p, X_q) = Σ_{i,j} r_ij lg( r_ij / (p_i q_j) )
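For concreteness, the quantities above can be computed directly (offline, with the whole list in memory) for the sample stream of pairs; the function and variable names below are for illustration only:

```python
import math
from collections import Counter

# Empirical joint (r), marginals (p, q), product (s), and the three
# measures of dependence, for a small stream of pairs.
pairs = [(3, 5), (5, 3), (2, 7), (3, 4), (7, 1), (1, 2), (3, 9), (6, 6)]
m = len(pairs)

r = {ij: c / m for ij, c in Counter(pairs).items()}
p = {i: c / m for i, c in Counter(i for i, _ in pairs).items()}
q = {j: c / m for j, c in Counter(j for _, j in pairs).items()}
s = {(i, j): p[i] * q[j] for i in p for j in q}   # product distribution

# support(r) is contained in support(s), so summing over s covers both.
l1 = sum(abs(s[ij] - r.get(ij, 0.0)) for ij in s)
l2 = math.sqrt(sum((s[ij] - r.get(ij, 0.0)) ** 2 for ij in s))
mi = sum(rij * math.log2(rij / s[ij]) for ij, rij in r.items())
print(l1, l2, mi)
```

All three measures are zero exactly when the joint distribution equals the product distribution; for this small dependent-looking sample they are strictly positive.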

Results

Estimating L2(s − r):
Thm: (1+ε)-factor approximation (w.p. 1 − δ) in Õ(ε⁻² ln δ⁻¹) space.

Estimating L1(s − r):
Thm: O(ln n)-factor approximation (w.p. 1 − δ) in Õ(ln δ⁻¹) space.
Thm: ±ε approximation (w.p. 1 − δ) in Õ(ε⁻⁴ ln δ⁻¹) space (2-pass).

Estimating I(X_p; X_q):
Thm: No 5/4-factor approximation (w.p. 4/5) in o(n) space.
Thm: ±ε approximation (w.p. 1 − δ) in Õ(ε⁻² ln δ⁻¹) space.

L2 Sketching Revisited

Let x ∈ {−1, 1}ⁿ where the x_i are unbiased and 4-wise independent. Compute x·(p − q)...

E[(x·(p−q))²] = Σ_{i,j} E[x_i x_j] (p_i − q_i)(p_j − q_j) = (L2(p − q))²
Var[(x·(p−q))²] ≤ E[(x·(p−q))⁴] = Σ_{i,j,k,l} E[x_i x_j x_k x_l] (p_i − q_i)(p_j − q_j)(p_k − q_k)(p_l − q_l) ≤ 3 (L2(p − q))⁴

Thm: By Chebyshev bounds, the average of O(ε⁻² ln δ⁻¹) repetitions yields (1 ± ε) L2(p − q) with probability 1 − δ. [Alon, Matias, Szegedy 1996]
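A sketch of this AMS-style estimator follows, with the 4-wise independent ±1 entries generated from a random degree-3 polynomial modulo a prime (one standard construction; the parity step makes it only essentially unbiased, with bias of order 1/P):

```python
import random

P = 2**61 - 1  # prime modulus for the hash polynomial

def fourwise_sign(seed, n):
    """One unbiased +/-1 vector over [n] that is (essentially) 4-wise
    independent: evaluate a random degree-3 polynomial mod a prime and
    take the parity of the result."""
    rnd = random.Random(seed)
    a, b, c, d = [rnd.randrange(P) for _ in range(4)]
    return [1 if ((a*j**3 + b*j**2 + c*j + d) % P) % 2 == 0 else -1
            for j in range(n)]

def l2_estimate(p, q, k=1000, seed=0):
    n = len(p)
    acc = 0.0
    for t in range(k):
        x = fourwise_sign(seed * 100_003 + t, n)
        dot = sum(x[j] * (p[j] - q[j]) for j in range(n))
        acc += dot * dot                  # E[dot^2] = (L2(p - q))^2
    return (acc / k) ** 0.5

p = [0.5, 0.3, 0.2, 0.0]
q = [0.1, 0.3, 0.1, 0.5]
est = l2_estimate(p, q)
true_l2 = sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
print(est, true_l2)
```

Only the four polynomial coefficients per repetition need to be stored, which is the source of the small space bound.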

Testing L2 Independence

Idea: Estimate L2(r − s) using z ∈ {−1, 1}^{n×n} where the z_ij are unbiased and 4-wise independent.

Problem: We can't compute the sketch of the product distribution!

Solution: Let x, y ∈ {−1, 1}ⁿ be 4-wise independent and set z_ij = x_i y_j. Then the sketch of the product distribution factors:
z·s = Σ_{ij} z_ij s_ij = (x·p)(y·q)

The entries z_ij are no longer 4-wise independent, but that's okay. Let a_ij = r_ij − s_ij and consider T = Σ_{ij} z_ij a_ij:
Var[T²] ≤ E[T⁴] = Σ_{i1,j1,i2,j2,i3,j3,i4,j4} E[z_{i1 j1} z_{i2 j2} z_{i3 j3} z_{i4 j4}] a_{i1 j1} a_{i2 j2} a_{i3 j3} a_{i4 j4} ≤ 3 (L2(r − s))⁴

Repeat O(ε⁻² ln δ⁻¹) times to deduce (1 ± ε) L2(r − s).
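The key factoring identity, and the resulting estimator, can be checked numerically. The sketch below (NumPy assumed) uses fully random signs for brevity; the slide's 4-wise independent construction would replace the sign generation:

```python
import numpy as np

# Check of the "sketching of sketches" identity: with z_ij = x_i * y_j,
# the sketch of the product distribution factors as
#     z . s = (x . p)(y . q),
# so it can be computed from the two marginal sketches alone.
rng = np.random.default_rng(1)
n = 6
r = rng.random((n, n)); r /= r.sum()       # an arbitrary joint distribution
p, q = r.sum(axis=1), r.sum(axis=0)        # its marginals
s = np.outer(p, q)                         # the product distribution

k = 10_000
acc = 0.0
for _ in range(k):
    x = rng.choice([-1.0, 1.0], size=n)
    y = rng.choice([-1.0, 1.0], size=n)
    z = np.outer(x, y)
    assert np.isclose((z * s).sum(), (x @ p) * (y @ q))   # the identity
    acc += ((z * (r - s)).sum()) ** 2 / k                  # average of T^2

print(acc ** 0.5, np.linalg.norm(r - s))   # both approximate L2(r - s)
```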

Summary

Small-space sketches of L1 and L2 exist, using p-stable distributions.
No small-space sketches exist for other information divergences.
Sketching ideas can be used to estimate independence.

Main Material:
Stable distributions, pseudo-random generators, embeddings, and data stream computation. Piotr Indyk (FOCS 2000)
Sketching information divergences. Sudipto Guha, Piotr Indyk, Andrew McGregor (COLT 2007)
Declaring independence via the sketching of sketches. Piotr Indyk, Andrew McGregor (SODA 2008)


More information

Streaming and communication complexity of Hamming distance

Streaming and communication complexity of Hamming distance Streaming and communication complexity of Hamming distance Tatiana Starikovskaya IRIF, Université Paris-Diderot (Joint work with Raphaël Clifford, ICALP 16) Approximate pattern matching Problem Pattern

More information

Testing k-wise Independence over Streaming Data

Testing k-wise Independence over Streaming Data Testing k-wise Independence over Streaming Data Kai-Min Chung Zhenming Liu Michael Mitzenmacher Abstract Following on the work of Indyk and McGregor [5], we consider the problem of identifying correlations

More information

Lecture 11: Quantum Information III - Source Coding

Lecture 11: Quantum Information III - Source Coding CSCI5370 Quantum Computing November 25, 203 Lecture : Quantum Information III - Source Coding Lecturer: Shengyu Zhang Scribe: Hing Yin Tsang. Holevo s bound Suppose Alice has an information source X that

More information

Application of Information Theory, Lecture 7. Relative Entropy. Handout Mode. Iftach Haitner. Tel Aviv University.

Application of Information Theory, Lecture 7. Relative Entropy. Handout Mode. Iftach Haitner. Tel Aviv University. Application of Information Theory, Lecture 7 Relative Entropy Handout Mode Iftach Haitner Tel Aviv University. December 1, 2015 Iftach Haitner (TAU) Application of Information Theory, Lecture 7 December

More information

Graph & Geometry Problems in Data Streams

Graph & Geometry Problems in Data Streams Graph & Geometry Problems in Data Streams 2009 Barbados Workshop on Computational Complexity Andrew McGregor Introduction Models: Graph Streams: Stream of edges E = {e 1, e 2,..., e m } describe a graph

More information

arxiv: v1 [cs.ds] 8 Aug 2014

arxiv: v1 [cs.ds] 8 Aug 2014 Asymptotically exact streaming algorithms Marc Heinrich, Alexander Munteanu, and Christian Sohler Département d Informatique, École Normale Supérieure, Paris, France marc.heinrich@ens.fr Department of

More information

Grothendieck s Inequality

Grothendieck s Inequality Grothendieck s Inequality Leqi Zhu 1 Introduction Let A = (A ij ) R m n be an m n matrix. Then A defines a linear operator between normed spaces (R m, p ) and (R n, q ), for 1 p, q. The (p q)-norm of A

More information

Lecture 6 September 13, 2016

Lecture 6 September 13, 2016 CS 395T: Sublinear Algorithms Fall 206 Prof. Eric Price Lecture 6 September 3, 206 Scribe: Shanshan Wu, Yitao Chen Overview Recap of last lecture. We talked about Johnson-Lindenstrauss (JL) lemma [JL84]

More information

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming

More information

The query register and working memory together form the accessible memory, denoted H A. Thus the state of the algorithm is described by a vector

The query register and working memory together form the accessible memory, denoted H A. Thus the state of the algorithm is described by a vector 1 Query model In the quantum query model we wish to compute some function f and we access the input through queries. The complexity of f is the number of queries needed to compute f on a worst-case input

More information

On the Optimality of the Dimensionality Reduction Method

On the Optimality of the Dimensionality Reduction Method On the Optimality of the Dimensionality Reduction Method Alexandr Andoni MIT andoni@mit.edu Piotr Indyk MIT indyk@mit.edu Mihai Pǎtraşcu MIT mip@mit.edu Abstract We investigate the optimality of (1+)-approximation

More information

Information Complexity and Applications. Mark Braverman Princeton University and IAS FoCM 17 July 17, 2017

Information Complexity and Applications. Mark Braverman Princeton University and IAS FoCM 17 July 17, 2017 Information Complexity and Applications Mark Braverman Princeton University and IAS FoCM 17 July 17, 2017 Coding vs complexity: a tale of two theories Coding Goal: data transmission Different channels

More information

14.1 Finding frequent elements in stream

14.1 Finding frequent elements in stream Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours

More information

Range-efficient computation of F 0 over massive data streams

Range-efficient computation of F 0 over massive data streams Range-efficient computation of F 0 over massive data streams A. Pavan Dept. of Computer Science Iowa State University pavan@cs.iastate.edu Srikanta Tirthapura Dept. of Elec. and Computer Engg. Iowa State

More information

Pseudorandom Generators

Pseudorandom Generators 8 Pseudorandom Generators Great Ideas in Theoretical Computer Science Saarland University, Summer 2014 andomness is one of the fundamental computational resources and appears everywhere. In computer science,

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Streaming Algorithms for Set Cover. Piotr Indyk With: Sepideh Mahabadi, Ali Vakilian

Streaming Algorithms for Set Cover. Piotr Indyk With: Sepideh Mahabadi, Ali Vakilian Streaming Algorithms for Set Cover Piotr Indyk With: Sepideh Mahabadi, Ali Vakilian Set Cover Input: a collection S of sets S 1...S m that covers U={1...n} I.e., S 1 S 2. S m = U Output: a subset I of

More information

Lecture 2 Sept. 8, 2015

Lecture 2 Sept. 8, 2015 CS 9r: Algorithms for Big Data Fall 5 Prof. Jelani Nelson Lecture Sept. 8, 5 Scribe: Jeffrey Ling Probability Recap Chebyshev: P ( X EX > λ) < V ar[x] λ Chernoff: For X,..., X n independent in [, ],

More information

EECS 750. Hypothesis Testing with Communication Constraints

EECS 750. Hypothesis Testing with Communication Constraints EECS 750 Hypothesis Testing with Communication Constraints Name: Dinesh Krithivasan Abstract In this report, we study a modification of the classical statistical problem of bivariate hypothesis testing.

More information

Lecture 4: Hashing and Streaming Algorithms

Lecture 4: Hashing and Streaming Algorithms CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 4: Hashing and Streaming Algorithms Lecturer: Shayan Oveis Gharan 01/18/2017 Scribe: Yuqing Ai Disclaimer: These notes have not been subjected

More information

Graph Sparsification I : Effective Resistance Sampling

Graph Sparsification I : Effective Resistance Sampling Graph Sparsification I : Effective Resistance Sampling Nikhil Srivastava Microsoft Research India Simons Institute, August 26 2014 Graphs G G=(V,E,w) undirected V = n w: E R + Sparsification Approximate

More information

CSE 190, Great ideas in algorithms: Pairwise independent hash functions

CSE 190, Great ideas in algorithms: Pairwise independent hash functions CSE 190, Great ideas in algorithms: Pairwise independent hash functions 1 Hash functions The goal of hash functions is to map elements from a large domain to a small one. Typically, to obtain the required

More information

ECON Fundamentals of Probability

ECON Fundamentals of Probability ECON 351 - Fundamentals of Probability Maggie Jones 1 / 32 Random Variables A random variable is one that takes on numerical values, i.e. numerical summary of a random outcome e.g., prices, total GDP,

More information

LECTURE 3. Last time:

LECTURE 3. Last time: LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate

More information

10. Show that the conclusion of the. 11. Prove the above Theorem. [Th 6.4.7, p 148] 4. Prove the above Theorem. [Th 6.5.3, p152]

10. Show that the conclusion of the. 11. Prove the above Theorem. [Th 6.4.7, p 148] 4. Prove the above Theorem. [Th 6.5.3, p152] foot of the altitude of ABM from M and let A M 1 B. Prove that then MA > MB if and only if M 1 A > M 1 B. 8. If M is the midpoint of BC then AM is called a median of ABC. Consider ABC such that AB < AC.

More information

18.409: Topics in TCS: Embeddings of Finite Metric Spaces. Lecture 6

18.409: Topics in TCS: Embeddings of Finite Metric Spaces. Lecture 6 Massachusetts Institute of Technology Lecturer: Michel X. Goemans 8.409: Topics in TCS: Embeddings of Finite Metric Spaces September 7, 006 Scribe: Benjamin Rossman Lecture 6 Today we loo at dimension

More information

Sublinear Algorithms for Big Data

Sublinear Algorithms for Big Data Sublinear Algorithms for Big Data Qin Zhang 1-1 2-1 Part 2: Sublinear in Communication Sublinear in communication The model x 1 = 010011 x 2 = 111011 x 3 = 111111 x k = 100011 Applicaitons They want to

More information

Space- Efficient Es.ma.on of Sta.s.cs over Sub- Sampled Streams

Space- Efficient Es.ma.on of Sta.s.cs over Sub- Sampled Streams Space- Efficient Es.ma.on of Sta.s.cs over Sub- Sampled Streams Andrew McGregor University of Massachuse7s mcgregor@cs.umass.edu A. Pavan Iowa State University pavan@cs.iastate.edu Srikanta Tirthapura

More information

Algorithms for Streaming Data

Algorithms for Streaming Data Algorithms for Piotr Indyk Problems defined over elements P={p 1,,p n } The algorithm sees p 1, then p 2, then p 3, Key fact: it has limited storage Can store only s

More information

CMPSCI 711: More Advanced Algorithms

CMPSCI 711: More Advanced Algorithms CMPSCI 711: More Advanced Algorithms Section 1-1: Sampling Andrew McGregor Last Compiled: April 29, 2012 1/14 Concentration Bounds Theorem (Markov) Let X be a non-negative random variable with expectation

More information

Capacity of AWGN channels

Capacity of AWGN channels Chapter 3 Capacity of AWGN channels In this chapter we prove that the capacity of an AWGN channel with bandwidth W and signal-tonoise ratio SNR is W log 2 (1+SNR) bits per second (b/s). The proof that

More information

CS Foundations of Communication Complexity

CS Foundations of Communication Complexity CS 2429 - Foundations of Communication Complexity Lecturer: Sergey Gorbunov 1 Introduction In this lecture we will see how to use methods of (conditional) information complexity to prove lower bounds for

More information

Heavy Hitters. Piotr Indyk MIT. Lecture 4

Heavy Hitters. Piotr Indyk MIT. Lecture 4 Heavy Hitters Piotr Indyk MIT Last Few Lectures Recap (last few lectures) Update a vector x Maintain a linear sketch Can compute L p norm of x (in zillion different ways) Questions: Can we do anything

More information

Information Theory. Lecture 5 Entropy rate and Markov sources STEFAN HÖST

Information Theory. Lecture 5 Entropy rate and Markov sources STEFAN HÖST Information Theory Lecture 5 Entropy rate and Markov sources STEFAN HÖST Universal Source Coding Huffman coding is optimal, what is the problem? In the previous coding schemes (Huffman and Shannon-Fano)it

More information

The Central Limit Theorem: More of the Story

The Central Limit Theorem: More of the Story The Central Limit Theorem: More of the Story Steven Janke November 2015 Steven Janke (Seminar) The Central Limit Theorem:More of the Story November 2015 1 / 33 Central Limit Theorem Theorem (Central Limit

More information

Challenges in Privacy-Preserving Analysis of Structured Data Kamalika Chaudhuri

Challenges in Privacy-Preserving Analysis of Structured Data Kamalika Chaudhuri Challenges in Privacy-Preserving Analysis of Structured Data Kamalika Chaudhuri Computer Science and Engineering University of California, San Diego Sensitive Structured Data Medical Records Search Logs

More information

Fast Dimension Reduction

Fast Dimension Reduction Fast Dimension Reduction MMDS 2008 Nir Ailon Google Research NY Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes (with Edo Liberty) The Fast Johnson Lindenstrauss Transform (with Bernard

More information

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science

More information

List coloring hypergraphs

List coloring hypergraphs List coloring hypergraphs Penny Haxell Jacques Verstraete Department of Combinatorics and Optimization University of Waterloo Waterloo, Ontario, Canada pehaxell@uwaterloo.ca Department of Mathematics University

More information

A Holevo-type bound for a Hilbert Schmidt distance measure

A Holevo-type bound for a Hilbert Schmidt distance measure Journal of Quantum Information Science, 205, *,** Published Online **** 204 in SciRes. http://www.scirp.org/journal/**** http://dx.doi.org/0.4236/****.204.***** A Holevo-type bound for a Hilbert Schmidt

More information

Lecture 1: Introduction to Sublinear Algorithms

Lecture 1: Introduction to Sublinear Algorithms CSE 522: Sublinear (and Streaming) Algorithms Spring 2014 Lecture 1: Introduction to Sublinear Algorithms March 31, 2014 Lecturer: Paul Beame Scribe: Paul Beame Too much data, too little time, space for

More information

Information Theory + Polyhedral Combinatorics

Information Theory + Polyhedral Combinatorics Information Theory + Polyhedral Combinatorics Sebastian Pokutta Georgia Institute of Technology ISyE, ARC Joint work Gábor Braun Information Theory in Complexity Theory and Combinatorics Simons Institute

More information

Communication Complexity

Communication Complexity Communication Complexity Jaikumar Radhakrishnan School of Technology and Computer Science Tata Institute of Fundamental Research Mumbai 31 May 2012 J 31 May 2012 1 / 13 Plan 1 Examples, the model, the

More information

18.5 Crossings and incidences

18.5 Crossings and incidences 18.5 Crossings and incidences 257 The celebrated theorem due to P. Turán (1941) states: if a graph G has n vertices and has no k-clique then it has at most (1 1/(k 1)) n 2 /2 edges (see Theorem 4.8). Its

More information

1 Quantum states and von Neumann entropy

1 Quantum states and von Neumann entropy Lecture 9: Quantum entropy maximization CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: February 15, 2016 1 Quantum states and von Neumann entropy Recall that S sym n n

More information

Explicit bounds on the entangled value of multiplayer XOR games. Joint work with Thomas Vidick (MIT)

Explicit bounds on the entangled value of multiplayer XOR games. Joint work with Thomas Vidick (MIT) Explicit bounds on the entangled value of multiplayer XOR games Jop Briët Joint work with Thomas Vidick (MIT) Waterloo, 2012 Entanglement and nonlocal correlations [Bell64] Measurements on entangled quantum

More information

Big Data. Big data arises in many forms: Common themes:

Big Data. Big data arises in many forms: Common themes: Big Data Big data arises in many forms: Physical Measurements: from science (physics, astronomy) Medical data: genetic sequences, detailed time series Activity data: GPS location, social network activity

More information

The Count-Min-Sketch and its Applications

The Count-Min-Sketch and its Applications The Count-Min-Sketch and its Applications Jannik Sundermeier Abstract In this thesis, we want to reveal how to get rid of a huge amount of data which is at least dicult or even impossible to store in local

More information

An exponential separation between quantum and classical one-way communication complexity

An exponential separation between quantum and classical one-way communication complexity An exponential separation between quantum and classical one-way communication complexity Ashley Montanaro Centre for Quantum Information and Foundations, Department of Applied Mathematics and Theoretical

More information

Direct product theorem for discrepancy

Direct product theorem for discrepancy Direct product theorem for discrepancy Troy Lee Rutgers University Robert Špalek Google Direct product theorems: Why is Google interested? Direct product theorems: Why should Google be interested? Direct

More information

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012

Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 2012 Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry Roscoff, France, June 25-29, 202 BOUNDS AND ASYMPTOTICS FOR FISHER INFORMATION IN THE CENTRAL LIMIT THEOREM

More information

Combining geometry and combinatorics

Combining geometry and combinatorics Combining geometry and combinatorics A unified approach to sparse signal recovery Anna C. Gilbert University of Michigan joint work with R. Berinde (MIT), P. Indyk (MIT), H. Karloff (AT&T), M. Strauss

More information

Lecture 6: Expander Codes

Lecture 6: Expander Codes CS369E: Expanders May 2 & 9, 2005 Lecturer: Prahladh Harsha Lecture 6: Expander Codes Scribe: Hovav Shacham In today s lecture, we will discuss the application of expander graphs to error-correcting codes.

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick,

Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick, Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick, Coventry, UK Keywords: streaming algorithms; frequent items; approximate counting, sketch

More information

Lecture 01 August 31, 2017

Lecture 01 August 31, 2017 Sketching Algorithms for Big Data Fall 2017 Prof. Jelani Nelson Lecture 01 August 31, 2017 Scribe: Vinh-Kha Le 1 Overview In this lecture, we overviewed the six main topics covered in the course, reviewed

More information

Spectral Sparsification in Dynamic Graph Streams

Spectral Sparsification in Dynamic Graph Streams Spectral Sparsification in Dynamic Graph Streams Kook Jin Ahn 1, Sudipto Guha 1, and Andrew McGregor 2 1 University of Pennsylvania {kookjin,sudipto}@seas.upenn.edu 2 University of Massachusetts Amherst

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Effective computation of biased quantiles over data streams

Effective computation of biased quantiles over data streams Effective computation of biased quantiles over data streams Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Flip Korn flip@research.att.com Divesh Srivastava divesh@research.att.com

More information

DATA STREAMS. Elena Ikonomovska Jožef Stefan Institute Department of Knowledge Technologies

DATA STREAMS. Elena Ikonomovska Jožef Stefan Institute Department of Knowledge Technologies QUERYING AND MINING DATA STREAMS Elena Ikonomovska Jožef Stefan Institute Department of Knowledge Technologies Outline Definitions Datastream models Similarity measures Historical background Foundations

More information

Finite Metric Spaces & Their Embeddings: Introduction and Basic Tools

Finite Metric Spaces & Their Embeddings: Introduction and Basic Tools Finite Metric Spaces & Their Embeddings: Introduction and Basic Tools Manor Mendel, CMI, Caltech 1 Finite Metric Spaces Definition of (semi) metric. (M, ρ): M a (finite) set of points. ρ a distance function

More information

Algorithms for Querying Noisy Distributed/Streaming Datasets

Algorithms for Querying Noisy Distributed/Streaming Datasets Algorithms for Querying Noisy Distributed/Streaming Datasets Qin Zhang Indiana University Bloomington Sublinear Algo Workshop @ JHU Jan 9, 2016 1-1 The big data models The streaming model (Alon, Matias

More information

Random Feature Maps for Dot Product Kernels Supplementary Material

Random Feature Maps for Dot Product Kernels Supplementary Material Random Feature Maps for Dot Product Kernels Supplementary Material Purushottam Kar and Harish Karnick Indian Institute of Technology Kanpur, INDIA {purushot,hk}@cse.iitk.ac.in Abstract This document contains

More information

Zero-One Laws for Sliding Windows and Universal Sketches

Zero-One Laws for Sliding Windows and Universal Sketches Zero-One Laws for Sliding Windows and Universal Sketches Vladimir Braverman 1, Rafail Ostrovsky 2, and Alan Roytman 3 1 Department of Computer Science, Johns Hopkins University, USA vova@cs.jhu.edu 2 Department

More information

Supremum of simple stochastic processes

Supremum of simple stochastic processes Subspace embeddings Daniel Hsu COMS 4772 1 Supremum of simple stochastic processes 2 Recap: JL lemma JL lemma. For any ε (0, 1/2), point set S R d of cardinality 16 ln n S = n, and k N such that k, there

More information

Chapter 4: More Applications of Differentiation

Chapter 4: More Applications of Differentiation Chapter 4: More Applications of Differentiation Autumn 2017 Department of Mathematics Hong Kong Baptist University 1 / 68 In the fall of 1972, President Nixon announced that, the rate of increase of inflation

More information

Probabilistic verification and approximation schemes

Probabilistic verification and approximation schemes Probabilistic verification and approximation schemes Richard Lassaigne Equipe de Logique mathématique, CNRS-Université Paris 7 Joint work with Sylvain Peyronnet (LRDE/EPITA & Equipe de Logique) Plan 1

More information

StreamSVM Linear SVMs and Logistic Regression When Data Does Not Fit In Memory

StreamSVM Linear SVMs and Logistic Regression When Data Does Not Fit In Memory StreamSVM Linear SVMs and Logistic Regression When Data Does Not Fit In Memory S.V. N. (vishy) Vishwanathan Purdue University and Microsoft vishy@purdue.edu October 9, 2012 S.V. N. Vishwanathan (Purdue,

More information