Tight Bounds for Distributed Functional Monitoring
Tight Bounds for Distributed Functional Monitoring. Qin Zhang, MADALGO, Aarhus University. Joint work with David Woodruff, IBM Almaden. NII Shonan meeting, Japan, Jan.
The distributed streaming model (a.k.a. distributed functional/continuous monitoring). There are k sites S_1, ..., S_k and a coordinator C. Items arrive at the sites over time (assume one item arrives per time unit); A(t) is the set of elements received up to time t from all sites. The coordinator needs to maintain f(A(t)) for all t. Goal: minimize the communication cost.
Problems in the distributed streaming model. Static case (a one-shot computation at the end): top-k, heavy hitters, ... Dynamic case: sampling, frequency moments (F0, F1, F2, ...), heavy hitters, quantiles, entropy, non-linear functions.
This talk. What you would like to see: efficient algorithms/protocols and practical heuristics. What you (probably) do not want to see: useless impossibility results and complicated proofs. Unfortunately, in the next 30 minutes...
The multiparty communication model: a model for proving lower bounds. Each of k players holds an input x_i, and we want to compute f(x_1, x_2, ..., x_k), where f can be bit-wise XOR, OR, AND, MAJ, ... Two variants: blackboard (one speaks, everyone else hears) and message passing (if player 1 talks to player 2, the others cannot hear). The message-passing model matches the distributed streaming picture: a coordinator C with sites S_1, S_2, ..., S_k.
Previously: some works in the blackboard model, and almost nothing in the message-passing model. At this SODA, with Jeff Phillips and Elad Verbin, we proposed a general and elegant technique called symmetrization, which works in both variants. In particular, we obtained (in the message-passing model): 1. Ω(nk) for bit-wise XOR/OR/AND/MAJ; 2. Ω(nk) for connectivity. Artificial? Well... In any case, let's look at really important problems.
Now, important problems: sampling (solved); frequency moments (F0, F1, F2, ...), heavy hitters, quantiles, and entropy (our work).
Results (today): (almost) tight bounds for all these questions. Static lower bounds (almost) match dynamic upper bounds, up to polylog factors.
F0 upper bound (Cormode, Muthu and Yi 2008)
The (1+ε)-approximate F0 problem. We have k sites S_1, S_2, ..., S_k; site S_i holds a set X_i. Our goal: compute F0(∪_{i∈[k]} X_i), the number of distinct items, up to a (1+ε) factor. A fundamental problem in data analysis. Current best UB: Õ(k/ε²) (Cormode, Muthu, Yi 2008), which holds in the dynamic case.
General idea for the one-shot computation: each site generates a sketch via a small-space streaming algorithm; the coordinator combines (via communication) the sketches from the k sites into a global sketch, from which we can extract the answer.
The FM sketch. Take a pairwise-independent random hash function h : {1, ..., n} → {1, ..., 2^d}, where 2^d > n. For each incoming element x, compute h(x) and count the number of trailing zeros in its binary representation. Remember Y, the maximum number of trailing zeros over all h(x) seen so far. One can show E[2^Y] = #distinct elements.
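As a concreteness check, the sketch above fits in a few lines. This is a minimal sketch, assuming a standard (a·x + b mod p)-style hash family truncated to d bits (the particular family and the class name are my own choices, not from the slides):

```python
import random

def trailing_zeros(v, d):
    """Number of trailing zero bits in the d-bit value v (v == 0 counts as d)."""
    if v == 0:
        return d
    t = 0
    while v % 2 == 0:
        v //= 2
        t += 1
    return t

class FMSketch:
    """FM sketch: remember the max number of trailing zeros of h(x)
    over all elements x seen so far; estimate #distinct as 2^Y."""
    def __init__(self, n, seed=0):
        self.d = n.bit_length() + 1              # hash range size 2^d > n
        rng = random.Random(seed)
        self.p = (1 << 61) - 1                   # a Mersenne prime, > 2^d
        self.a = rng.randrange(1, self.p)        # h(x) = ((a*x + b) mod p) mod 2^d,
        self.b = rng.randrange(self.p)           # an (almost) pairwise-independent family
        self.y = 0                               # max trailing zeros so far
    def _h(self, x):
        return ((self.a * x + self.b) % self.p) % (1 << self.d)
    def add(self, x):
        self.y = max(self.y, trailing_zeros(self._h(x), self.d))
    def estimate(self):
        return 2 ** self.y                       # E[2^Y] ≈ #distinct elements

sk = FMSketch(n=1000, seed=1)
for x in [3, 1, 4, 1, 5, 9, 2, 6]:
    sk.add(x)
# duplicates do not move Y, so the sketch depends only on the set of distinct items
```

Note the duplicate-insensitivity: inserting an element twice never changes Y, which is exactly why 2^Y tracks the number of distinct elements rather than the stream length.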
One-shot case, the FM sketch (cont.). So 2^Y is an unbiased estimator for the number of distinct elements; however, it has a large variance. Standard techniques [Bar-Yossef et al. 2002] produce an estimator that, with probability 1−δ, is within relative error ε; the space increases to Õ(1/ε²). The FM sketch can also be merged: if Y_1 is computed from A and Y_2 from B (using the same hash function), then 2^max{Y_1, Y_2} estimates the number of distinct items in A ∪ B. Thus we can design a one-shot algorithm with communication Õ(k/ε²).
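The merge step can be sketched end to end: each site computes its own Y_i with a shared hash function and sends that single number, and the coordinator combines by taking the max. A toy sketch of this protocol (the hash family and function names are illustrative choices of mine):

```python
import random

D = 16                                        # hash range {0, ..., 2^D - 1}
P = (1 << 61) - 1                             # a Mersenne prime
rng = random.Random(42)
A, B = rng.randrange(1, P), rng.randrange(P)  # shared randomness: all sites use h

def h(x):
    return ((A * x + B) % P) % (1 << D)

def trailing_zeros(v):
    if v == 0:
        return D
    t = 0
    while v % 2 == 0:
        v //= 2
        t += 1
    return t

def site_sketch(items):
    """What one site sends to the coordinator: a single number Y_i."""
    return max((trailing_zeros(h(x)) for x in items), default=0)

def coordinator_estimate(ys):
    """2^max{Y_1, ..., Y_k} estimates F0 of the union of the sites' inputs."""
    return 2 ** max(ys)

sites = [[1, 2, 3], [3, 4], [4, 5, 1]]
ys = [site_sketch(s) for s in sites]
union = [x for s in sites for x in s]
# the merged sketch is exactly the sketch of the union
assert coordinator_estimate(ys) == coordinator_estimate([site_sketch(union)])
```

The assertion at the end is the key point: max-of-maxes equals the max over the union, so merging loses nothing relative to running one sketch over all streams.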
F0 lower bound
The F0 problem (recap): compute F0(∪_{i∈[k]} X_i) up to a (1+ε) factor. Current best UB: Õ(k/ε²) (Cormode, Muthu, Yi 2008), which holds in the dynamic case. Previous LBs: Ω(k) (Cormode, Muthu, Yi 2008) and Ω(1/ε²) (by reduction from Gap-Hamming). Our LB: Ω(k/ε²), which holds already in the static case and in the message-passing model. Tight!
The proof framework. Step 1: introduce a simpler problem called k-GAP-MAJ. Step 2: compose k-GAP-MAJ with the set-disjointness problem, using information cost, to prove a lower bound for F0.
k-GAP-MAJ. We have k sites S_1, S_2, ..., S_k; S_i holds a bit Z_i which is 1 w.p. β and 0 w.p. 1−β, where ω(1/k) ≤ β ≤ 1/2 is fixed in advance. Our goal: compute
GAP-MAJ(Z_1, Z_2, ..., Z_k) = 0 if Σ_{i∈[k]} Z_i ≤ βk − √(βk); 1 if Σ_{i∈[k]} Z_i ≥ βk + √(βk); ⋆ otherwise,
where ⋆ means that the answer can be arbitrary.
Lemma 1: if a protocol P computes k-GAP-MAJ correctly with large constant probability, then w.p. Ω(1) the protocol has to learn Ω(k) of the Z_i, each to within Ω(1) bits (that is, H(Z_i | Π) ≤ H_b(0.01β) for Ω(k) indices i, where Π is the transcript). Alternatively: I(Z_1, Z_2, ..., Z_k ; Π) = Ω(k).
Set disjointness (2-DISJ). Alice holds x ∈ {0,1}^n and Bob holds y ∈ {0,1}^n; they want to decide whether x ∩ y = ∅. A classical hard instance, distribution µ: X and Y are both random subsets of [n] of size l = (n+1)/4, such that |X ∩ Y| = 1 w.p. β and |X ∩ Y| = 0 w.p. 1−β. Razborov [1990] shows an Ω(n) lower bound for this hard distribution, for error a small constant fraction of β.
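The hard distribution µ is easy to sample, which makes the promise concrete. A minimal sketch, with n chosen so that l = (n+1)/4 is an integer (the function name is my own):

```python
import random

def sample_mu(n, beta, rng):
    """Sample (X, Y) from µ: both random subsets of [n] of size l = (n+1)/4,
    with |X ∩ Y| = 1 w.p. beta and |X ∩ Y| = 0 otherwise."""
    l = (n + 1) // 4
    perm = list(range(n))
    rng.shuffle(perm)
    if rng.random() < beta:
        shared = perm[0]                         # the single common element
        x = {shared} | set(perm[1:l])
        y = {shared} | set(perm[l:2 * l - 1])
    else:
        x = set(perm[:l])
        y = set(perm[l:2 * l])
    return x, y

rng = random.Random(0)
for _ in range(100):
    x, y = sample_mu(43, 0.3, rng)               # n = 43, so l = 11
    assert len(x) == len(y) == 11
    assert len(x & y) in (0, 1)
```

Drawing a fresh random permutation per sample keeps both marginals uniform over size-l sets, matching the promise that the intersection is either empty or a single element.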
Next step: compose k-GAP-MAJ with 2-DISJ. Parameters: n = Θ(1/ε²), l = (n+1)/4, β = 1/(kε²). Step 1: pick Y = y ⊆ [n] of size l uniformly at random (held on the coordinator side). Step 2: pick X_1, ..., X_k ⊆ [n] independently at random from µ conditioned on Y = y (X_i goes to site S_i). Question: what is F0(X_1, X_2, ..., X_k)?
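The two steps of the construction can be sketched directly. Here β is set to a plain constant for illustration rather than the 1/(kε²) of the actual reduction, and the helper names are my own:

```python
import random

def sample_x_given_y(n, y, beta, rng):
    """Draw one site's input X from µ conditioned on Y = y:
    |X ∩ y| = 1 w.p. beta, and X disjoint from y otherwise."""
    l = len(y)
    outside = [e for e in range(n) if e not in y]
    if rng.random() < beta:
        shared = rng.choice(sorted(y))
        return {shared} | set(rng.sample(outside, l - 1))
    return set(rng.sample(outside, l))

n, k, beta = 43, 20, 0.25
rng = random.Random(7)
y = set(rng.sample(range(n), (n + 1) // 4))                 # Step 1: pick Y = y
xs = [sample_x_given_y(n, y, beta, rng) for _ in range(k)]  # Step 2: pick the X_i
z = [len(x & y) for x in xs]                                # hidden k-GAP-MAJ bits Z_i
assert all(zi in (0, 1) for zi in z)
assert all(len(x) == len(y) for x in xs)
```

The last two lines make the composition visible: each site's intersection indicator Z_i = |X_i ∩ Y| is exactly a Bernoulli(β) bit, which is what the next slide exploits.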
The proof. Let Z_i = |X_i ∩ Y|, which is 1 w.p. β and 0 w.p. 1−β. Computing F0(X_1, X_2, ..., X_k) solves k-GAP-MAJ(Z_1, Z_2, ..., Z_k), which by Lemma 1 requires learning Ω(k) of the Z_i well, which requires Ω(k/ε²) bits, since learning each Z_i = |X_i ∩ Y| well needs Ω(n) = Ω(1/ε²) bits by the 2-DISJ lower bound. Q.E.D.
Proof sketch of Lemma 1. Recall the statement: if a protocol P computes k-GAP-MAJ correctly with large constant probability, then w.p. Ω(1), for Ω(k) of the Z_i we have H(Z_i | Π) ≤ H_b(0.01β). Proof: 1. Suppose Π does not satisfy this. 2. Since the Z_i are independent given Π, (Σ_{i=1}^k Z_i) | Π is a sum of independent Bernoulli random variables. 3. Since most of the H(Z_i | Π) are large, by anti-concentration both of the following events occur with constant probability: (Σ_{i=1}^k Z_i) | Π > βk + √(βk) and (Σ_{i=1}^k Z_i) | Π < βk − √(βk). 4. So P cannot succeed with large probability.
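Step 3 is the heart of the argument, and its unconditioned analogue can be checked numerically: a sum of k independent Bernoulli(β) bits lands below βk − √(βk), and above βk + √(βk), each with constant probability. A quick simulation with illustrative parameters of my choosing:

```python
import random

def gap_maj_value(zs, beta):
    """k-GAP-MAJ on a bit vector: 0 below the gap, 1 above it, None inside."""
    k, s = len(zs), sum(zs)
    gap = (beta * k) ** 0.5
    if s <= beta * k - gap:
        return 0
    if s >= beta * k + gap:
        return 1
    return None

k, beta, trials = 400, 0.25, 2000
rng = random.Random(1)
outcomes = [gap_maj_value([1 if rng.random() < beta else 0 for _ in range(k)], beta)
            for _ in range(trials)]
frac0 = outcomes.count(0) / trials
frac1 = outcomes.count(1) / trials
# both tails occur with constant probability: a protocol that has not pinned
# down many Z_i cannot reliably tell the two promise cases apart
assert frac0 > 0.05 and frac1 > 0.05
```

The gap √(βk) is about one standard deviation of the sum, which is exactly why both tails carry constant mass.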
F2 lower bound
The F2 problem. We have k sites S_1, S_2, ..., S_k; site S_i holds a set X_i. Our goal: compute F2(∪_{i∈[k]} X_i), the self-join size, up to a (1+ε) factor. Another fundamental problem in data analysis. Previous UB: Õ(k²/ε + k^1.5/ε³) (Cormode, Muthu, Yi 2008). Our UB: Õ(k/poly(ε)) with a one-way protocol, which holds in the dynamic case. Previous LB: Ω(k) (Cormode, Muthu, Yi 2008). Our LB: Ω(k/ε²), which holds in the static case and in the blackboard model. Almost tight!
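The slides do not spell out the F2 protocol, but the classic single-site estimator behind such bounds is the AMS sketch: hash each item to a random sign, sum the signs over the stream, and square. A toy version, using fully random signs for simplicity (the analysis only needs 4-wise independence; the function name is my own):

```python
import random

def ams_f2_estimate(stream, trials, rng):
    """Basic AMS estimator for F2 = sum_i f_i^2: for each trial, map every
    distinct item to a random sign, sum the signs over the stream, square;
    then average the squared sums over the trials."""
    estimates = []
    for _ in range(trials):
        signs = {}
        acc = 0
        for x in stream:
            if x not in signs:
                signs[x] = rng.choice((-1, 1))   # fresh random sign per distinct item
            acc += signs[x]
        estimates.append(acc * acc)              # unbiased: E[(Σ f_i s_i)^2] = F2
    return sum(estimates) / trials

stream = [1] * 3 + [2] * 2 + [3]                 # frequencies 3, 2, 1, so F2 = 14
est = ams_f2_estimate(stream, 2000, random.Random(0))
assert 12 <= est <= 16                           # the average concentrates around 14
```

AMS sketches are linear in the frequency vector, so per-site sketches can be summed by a coordinator; the slides' Õ(k/poly(ε)) one-way protocol is presumably a more refined relative of this idea.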
A quick glance: (1+ε)-approximate F2. Two building blocks. 2-party Gap-Hamming (GHD): Alice holds X = (X_1, X_2, ..., X_{1/ε²}) and Bob holds Y = (Y_1, Y_2, ..., Y_{1/ε²}); they want to compute
GHD(X, Y) = 0 if Σ_{i∈[1/ε²]} X_i ⊕ Y_i ≤ 1/(2ε²) − 1/ε; 1 if Σ_{i∈[1/ε²]} X_i ⊕ Y_i ≥ 1/(2ε²) + 1/ε; ⋆ otherwise,
where ⋆ means that the answer can be arbitrary.
k-DISJ: we have k sites S_1, S_2, ..., S_k; S_i holds a set Z_i. We promise that either the Z_i (i = 1, ..., k) are all disjoint, or they all intersect in one common element and are otherwise disjoint (a sunflower). The goal is to find out which is the case.
Taking the XOR of 2 copies of GHD and composing with k-DISJ via information cost yields a problem k-BTA with CC(k-BTA) = Ω(k/ε²). Finally, we reduce F2 to k-BTA.
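The GHD promise problem transcribes directly into code: with t = 1/ε² coordinates, the thresholds 1/(2ε²) ∓ 1/ε become t/2 ∓ √t. A small sketch:

```python
def ghd(x, y):
    """Gap-Hamming on t = 1/eps^2 coordinates: 0 if the Hamming distance is
    at most t/2 - sqrt(t), 1 if it is at least t/2 + sqrt(t), and None
    inside the gap (where the answer may be arbitrary)."""
    t = len(x)
    dist = sum(a != b for a, b in zip(x, y))
    if dist <= t / 2 - t ** 0.5:
        return 0
    if dist >= t / 2 + t ** 0.5:
        return 1
    return None

assert ghd([0] * 100, [0] * 100) == 0               # distance 0 <= 40
assert ghd([0] * 100, [1] * 100) == 1               # distance 100 >= 60
assert ghd([0] * 100, [1] * 50 + [0] * 50) is None  # distance 50: inside the gap
```

The gap width √t is one standard deviation of the Hamming distance of random inputs, mirroring the √(βk) gap in k-GAP-MAJ.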
The end. Thank you! Q and A.
More informationOn Quantum vs. Classical Communication Complexity
On Quantum vs. Classical Communication Complexity Dmitry Gavinsky Institute of Mathematics, Praha Czech Academy of Sciences Introduction Introduction The setting of communication complexity is one of the
More informationCombinatorial Auctions Do Need Modest Interaction
Combinatorial Auctions Do Need Modest Interaction SEPEHR ASSADI, University of Pennsylvania We study the necessity of interaction for obtaining efficient allocations in combinatorial auctions with subadditive
More informationLecture 11: Quantum Information III - Source Coding
CSCI5370 Quantum Computing November 25, 203 Lecture : Quantum Information III - Source Coding Lecturer: Shengyu Zhang Scribe: Hing Yin Tsang. Holevo s bound Suppose Alice has an information source X that
More informationSettling the Query Complexity of Non-Adaptive Junta Testing
Settling the Query Complexity of Non-Adaptive Junta Testing Erik Waingarten, Columbia University Based on joint work with Xi Chen (Columbia University) Rocco Servedio (Columbia University) Li-Yang Tan
More informationLecture 6 September 13, 2016
CS 395T: Sublinear Algorithms Fall 206 Prof. Eric Price Lecture 6 September 3, 206 Scribe: Shanshan Wu, Yitao Chen Overview Recap of last lecture. We talked about Johnson-Lindenstrauss (JL) lemma [JL84]
More informationHardness of MST Construction
Lecture 7 Hardness of MST Construction In the previous lecture, we saw that an MST can be computed in O( n log n+ D) rounds using messages of size O(log n). Trivially, Ω(D) rounds are required, but what
More informationGrothendieck Inequalities, XOR games, and Communication Complexity
Grothendieck Inequalities, XOR games, and Communication Complexity Troy Lee Rutgers University Joint work with: Jop Briët, Harry Buhrman, and Thomas Vidick Overview Introduce XOR games, Grothendieck s
More informationOptimal Random Sampling from Distributed Streams Revisited
Optimal Random Sampling from Distributed Streams Revisited Srikanta Tirthapura 1 and David P. Woodruff 2 1 Dept. of ECE, Iowa State University, Ames, IA, 50011, USA. snt@iastate.edu 2 IBM Almaden Research
More informationNew Characterizations in Turnstile Streams with Applications
New Characterizations in Turnstile Streams with Applications Yuqing Ai 1, Wei Hu 1, Yi Li 2, and David P. Woodruff 3 1 The Institute for Theoretical Computer Science (ITCS), Institute for Interdisciplinary
More informationdistributed simulation and distributed inference
distributed simulation and distributed inference Algorithms, Tradeoffs, and a Conjecture Clément Canonne (Stanford University) April 13, 2018 Joint work with Jayadev Acharya (Cornell University) and Himanshu
More informationSparse Johnson-Lindenstrauss Transforms
Sparse Johnson-Lindenstrauss Transforms Jelani Nelson MIT May 24, 211 joint work with Daniel Kane (Harvard) Metric Johnson-Lindenstrauss lemma Metric JL (MJL) Lemma, 1984 Every set of n points in Euclidean
More informationSimultaneous Communication Protocols with Quantum and Classical Messages
Simultaneous Communication Protocols with Quantum and Classical Messages Oded Regev Ronald de Wolf July 17, 2008 Abstract We study the simultaneous message passing model of communication complexity, for
More informationPERFECTLY secure key agreement has been studied recently
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 2, MARCH 1999 499 Unconditionally Secure Key Agreement the Intrinsic Conditional Information Ueli M. Maurer, Senior Member, IEEE, Stefan Wolf Abstract
More informationAlgebraic Problems in Computational Complexity
Algebraic Problems in Computational Complexity Pranab Sen School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai 400005, India pranab@tcs.tifr.res.in Guide: Prof. R.
More informationTracking Distributed Aggregates over Time-based Sliding Windows
Tracking Distributed Aggregates over Time-based Sliding Windows Graham Cormode and Ke Yi Abstract. The area of distributed monitoring requires tracking the value of a function of distributed data as new
More informationLecture 5: February 21, 2017
COMS 6998: Advanced Complexity Spring 2017 Lecture 5: February 21, 2017 Lecturer: Rocco Servedio Scribe: Diana Yin 1 Introduction 1.1 Last Time 1. Established the exact correspondence between communication
More information1 Secure two-party computation
CSCI 5440: Cryptography Lecture 7 The Chinese University of Hong Kong, Spring 2018 26 and 27 February 2018 In the first half of the course we covered the basic cryptographic primitives that enable secure
More information1-Pass Relative-Error L p -Sampling with Applications
-Pass Relative-Error L p -Sampling with Applications Morteza Monemizadeh David P Woodruff Abstract For any p [0, 2], we give a -pass poly(ε log n)-space algorithm which, given a data stream of length m
More informationCommunication Complexity (for Algorithm Designers)
arxiv:1509.06257v1 [cs.cc] 21 Sep 2015 Communication Complexity (for Algorithm Designers) Tim Roughgarden c Tim Roughgarden 2015 Preface The best algorithm designers prove both possibility and impossibility
More informationFunctional Monitoring without Monotonicity
Functional Monitoring without Monotonicity Chrisil Arackaparambil, Joshua Brody, and Amit Chakrabarti Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA Abstract. The notion of distributed
More informationRange-efficient computation of F 0 over massive data streams
Range-efficient computation of F 0 over massive data streams A. Pavan Dept. of Computer Science Iowa State University pavan@cs.iastate.edu Srikanta Tirthapura Dept. of Elec. and Computer Engg. Iowa State
More informationNorms, XOR lemmas, and lower bounds for GF(2) polynomials and multiparty protocols
Norms, XOR lemmas, and lower bounds for GF(2) polynomials and multiparty protocols Emanuele Viola, IAS (Work partially done during postdoc at Harvard) Joint work with Avi Wigderson June 2007 Basic questions
More informationA Mergeable Summaries
A Mergeable Summaries Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff M. Phillips, Zhewei Wei, and Ke Yi We study the mergeability of data summaries. Informally speaking, mergeability requires
More informationENEE 457: Computer Systems Security 09/19/16. Lecture 6 Message Authentication Codes and Hash Functions
ENEE 457: Computer Systems Security 09/19/16 Lecture 6 Message Authentication Codes and Hash Functions Charalampos (Babis) Papamanthou Department of Electrical and Computer Engineering University of Maryland,
More informationCache-Oblivious Hashing
Cache-Oblivious Hashing Zhewei Wei Hong Kong University of Science & Technology Joint work with Rasmus Pagh, Ke Yi and Qin Zhang Dictionary Problem Store a subset S of the Universe U. Lookup: Does x belong
More informationLecture 2: Perfect Secrecy and its Limitations
CS 4501-6501 Topics in Cryptography 26 Jan 2018 Lecture 2: Perfect Secrecy and its Limitations Lecturer: Mohammad Mahmoody Scribe: Mohammad Mahmoody 1 Introduction Last time, we informally defined encryption
More informationA Stable Marriage Requires Communication
A Stable Marriage Requires Communication Yannai A. Gonczarowski Noam Nisan Rafail Ostrovsky Will Rosenbaum Abstract The Gale-Shapley algorithm for the Stable Marriage Problem is known to take Θ(n 2 ) steps
More information4/26/2017. More algorithms for streams: Each element of data stream is a tuple Given a list of keys S Determine which tuples of stream are in S
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationCell-Probe Lower Bounds for Prefix Sums and Matching Brackets
Cell-Probe Lower Bounds for Prefix Sums and Matching Brackets Emanuele Viola July 6, 2009 Abstract We prove that to store strings x {0, 1} n so that each prefix sum a.k.a. rank query Sumi := k i x k can
More informationLecture 3: Oct 7, 2014
Information and Coding Theory Autumn 04 Lecturer: Shi Li Lecture : Oct 7, 04 Scribe: Mrinalkanti Ghosh In last lecture we have seen an use of entropy to give a tight upper bound in number of triangles
More informationA Mergeable Summaries
A Mergeable Summaries Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff M. Phillips, Zhewei Wei, and Ke Yi We study the mergeability of data summaries. Informally speaking, mergeability requires
More informationLecture 7 Limits on inapproximability
Tel Aviv University, Fall 004 Lattices in Computer Science Lecture 7 Limits on inapproximability Lecturer: Oded Regev Scribe: Michael Khanevsky Let us recall the promise problem GapCVP γ. DEFINITION 1
More informationLecture 7: Fingerprinting. David Woodruff Carnegie Mellon University
Lecture 7: Fingerprinting David Woodruff Carnegie Mellon University How to Pick a Random Prime How to pick a random prime in the range {1, 2,, M}? How to pick a random integer X? Pick a uniformly random
More informationInterval Selection in the streaming model
Interval Selection in the streaming model Pascal Bemmann Abstract In the interval selection problem we are given a set of intervals via a stream and want to nd the maximum set of pairwise independent intervals.
More information12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters)
12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters) Many streaming algorithms use random hashing functions to compress data. They basically randomly map some data items on top of each other.
More informationLeveraging Big Data: Lecture 13
Leveraging Big Data: Lecture 13 http://www.cohenwang.com/edith/bigdataclass2013 Instructors: Edith Cohen Amos Fiat Haim Kaplan Tova Milo What are Linear Sketches? Linear Transformations of the input vector
More informationA Tight Lower Bound for High Frequency Moment Estimation with Small Error
A Tight Lower Bound for High Frequency Moment Estimation with Small Error Yi Li Department of EECS University of Michigan, Ann Arbor leeyi@umich.edu David P. Woodruff IBM Research, Almaden dpwoodru@us.ibm.com
More informationThe one-way communication complexity of the Boolean Hidden Matching Problem
The one-way communication complexity of the Boolean Hidden Matching Problem Iordanis Kerenidis CRS - LRI Université Paris-Sud jkeren@lri.fr Ran Raz Faculty of Mathematics Weizmann Institute ran.raz@weizmann.ac.il
More informationPart 1: Hashing and Its Many Applications
1 Part 1: Hashing and Its Many Applications Sid C-K Chau Chi-Kin.Chau@cl.cam.ac.u http://www.cl.cam.ac.u/~cc25/teaching Why Randomized Algorithms? 2 Randomized Algorithms are algorithms that mae random
More information