Arthur-Merlin Streaming Complexity

Weizmann Institute of Science Joint work with Ran Raz

Data Streams The data stream model is an abstraction commonly used for algorithms that process network traffic using sublinear space. A data stream is a sequence of elements from a given domain. A streaming algorithm has sequential, one-pass access to a data stream. The algorithm is required to compute a function of the data stream, using as little space as possible.

Data Streams - A bit more formally Let $\epsilon, \delta > 0$. Let $m, n \in \mathbb{N}$. A data stream $\sigma = (a_1, \ldots, a_m)$ is a sequence of elements, each from $[n] = \{1, \ldots, n\}$. We say that the length of the stream is $m$, and the alphabet size is $n$. Definition A $\delta$-error, $\epsilon$-approximation data stream algorithm $A_{\epsilon,\delta}$ for approximating a function $f$ is a probabilistic algorithm that gets sequential, one-pass access to a data stream $\sigma = (a_1, \ldots, a_m)$ (where each $a_i$ is a member of $[n]$), and satisfies: $\Pr[\,|A_{\epsilon,\delta}(\sigma) - f(\sigma)| > \epsilon f(\sigma)\,] < \delta$.

Distinct Elements Among the most fundamental problems in the data stream model is the problem of (counting the number of) Distinct Elements. The problem has been studied extensively in the last two decades. It has a wide variety of applications (covering IP routing, database operations and text compression). It also gives theoretical insight into the nature of computation in the data stream model.

Distinct Elements - Definition Definition The Distinct Elements problem is the data stream problem of computing the exact number of distinct elements in a data stream $\sigma = (a_1, \ldots, a_m)$ (where $a_i \in [n]$ for every $i$), i.e., computing (exactly): $F_0(\sigma) = |\{i \in \mathbb{N} : \exists j \in [m],\ a_j = i\}|$. Note that the $k$-th frequency moment of $\sigma$ is given by $F_k(\sigma) = \sum_{j \in [n]} f_j^k$ (where $f_j$ is the frequency of the $j$-th element). Thus, if we define $0^0 = 0$, then this is exactly the $0$-th frequency moment of the stream. Hence the notation $F_0$.
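As a point of reference for the definition above, here is a minimal Python sketch that computes frequency moments exactly. It is not a small-space streaming algorithm: it stores one counter per distinct element. The function name `frequency_moment` is illustrative and does not come from the slides.

```python
from collections import Counter

def frequency_moment(stream, k):
    """Exact k-th frequency moment F_k = sum_j f_j^k, with the convention 0^0 = 0."""
    freqs = Counter(stream)  # f_j for every element j that occurs in the stream
    # Elements that never occur have f_j = 0 and are simply absent from `freqs`,
    # which for k = 0 is exactly the 0^0 = 0 convention.
    return sum(f ** k for f in freqs.values())

stream = [3, 1, 3, 2, 3, 1]
print(frequency_moment(stream, 0))  # F_0: number of distinct elements
print(frequency_moment(stream, 1))  # F_1: length of the stream
print(frequency_moment(stream, 2))  # F_2
```

The space used here grows with the number of distinct elements, which is the cost that the Ω(n) lower bound for exact computation says cannot be avoided without help from a prover.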

Distinct Elements - History A long line of research on the complexity of the problem started with the seminal paper by Flajolet and Martin (1983). Alon et al. have shown a lower bound of $\Omega(n)$ on the streaming complexity of the computation of the exact number of distinct elements. Recently, Kane et al. (2010) gave the first optimal approximation algorithm: Theorem Given $\epsilon > 0$, a $(1 \pm \epsilon)$ multiplicative approximation of $F_0$ (with $2/3$ success probability) can be computed using $O(\epsilon^{-2} + \log n)$ bits of space.

Distinct Elements - Probabilistic proof systems A natural approach for reducing the space complexity of streaming algorithms, without settling for an approximation, is to consider a probabilistic proof system. Important stepping stones include: 1 Chakrabarti et al. (CCM 2009) have shown data stream with annotations algorithms for several data stream problems, using a probabilistic proof system that is very similar to MA. 2 This line of work continued in (CMT 2010), for numerous graph problems. 3 Chakrabarti et al. (CMT 2011) provided a practical instantiation of the general-purpose constructions for arbitrary computations, due to Goldwasser et al. (GKR 2008).

Merlin-Arthur (MA) probabilistic proof systems MA (Merlin-Arthur) proofs are the probabilistic extension of the notion of NP proofs. These protocols start with an omniscient prover that sends a proof to a computationally bounded verifier. After seeing both the input and the proof, the verifier should be able to verify a proof of a correct statement w.h.p., and reject every possible alleged proof for a wrong statement w.h.p.

AM probabilistic proof systems An AM proof is defined almost the same as an MA proof, except that we assume that both the prover and the verifier have access to a common source of randomness. The notion of AM and MA proof systems can be extended to many computational models. In this work we consider both the communication complexity analogue of MA, and the data stream analogues of MA and AM.

Arthur-Merlin proofs and cloud computing Probabilistic proof systems for streaming algorithms provide an abstraction for delegation of computation to a cloud: A user receives (or generates) a massive amount of data, which the user cannot afford to store locally. Instead, the user can stream the data to the cloud, keeping only a short certificate of the streamed data. Then, the cloud can perform the calculations and send the result to the user, who can verify it via the short certificate.

Informal overview of our results The main contributions of this work are as follows. In the data stream model: (1) a scheme for constructing AM streaming algorithms for a wide class of data stream problems, including the Distinct Elements problem; (2) a lower bound on the MA streaming complexity of the Distinct Elements problem. In communication complexity: a tight lower bound on the MA communication complexity of the Gap Hamming Distance problem.

An Important Notation Given a data stream $\sigma = (a_1, \ldots, a_m)$ (over alphabet $[n]$), Definition For every position $i \in [m]$ of $\sigma$, let $\chi_i^\sigma : [n] \to \{0, 1\}$ be the element indicator (of the $i$-th position of $\sigma$), given by $\chi_i^\sigma(j) = 1$ if $a_i = j$, and $\chi_i^\sigma(j) = 0$ otherwise. In order to simplify notation, we omit the superscript and write $\chi_i(j) \equiv \chi_i^\sigma(j)$.

Generic scheme for AM streaming algorithm (informal) Say we have a data stream problem $P$. Let $C$ be a Boolean circuit with a small number of unbounded fan-in gates, such that $P(\sigma) = \sum_{j=1}^{n} C(\chi_1(j), \ldots, \chi_m(j))$. For example, Example (Distinct Elements) $F_0(\sigma) = \sum_{j=1}^{n} (\chi_1(j) \vee \ldots \vee \chi_m(j))$.
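The Distinct Elements example can be checked directly: for each alphabet symbol j, the disjunction of the element indicators is 1 exactly when j occurs somewhere in the stream. The following Python sketch evaluates that sum by brute force; the helper names `chi` and `f0_via_disjunctions` are illustrative only.

```python
def chi(stream, i, j):
    """Element indicator chi_i(j): 1 if the i-th stream element (1-indexed) equals j."""
    return 1 if stream[i - 1] == j else 0

def f0_via_disjunctions(stream, n):
    """F_0 as the sum over j in [n] of OR(chi_1(j), ..., chi_m(j))."""
    m = len(stream)
    total = 0
    for j in range(1, n + 1):
        # The circuit C is a single unbounded fan-in OR gate.
        total += int(any(chi(stream, i, j) for i in range(1, m + 1)))
    return total

print(f0_via_disjunctions([3, 1, 3, 2, 3, 1], n=5))
```

This takes time proportional to m times n and is only meant to make the formula concrete; the point of the scheme is that the verifier never evaluates the sum this way.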

Generic scheme for AM streaming algorithm (informal) Say we have a data stream problem $P$. Let $C$ be a Boolean circuit with a small number of unbounded fan-in gates, such that $P(\sigma) = \sum_{j=1}^{n} C(\chi_1(j), \ldots, \chi_m(j))$. Then, Theorem (informal) For every $s, w \in \mathbb{N}$ such that $s \cdot w \geq n$, we give an AM streaming algorithm for $P(\sigma)$ with space complexity $\tilde{O}(s)$ and proof complexity $\tilde{O}(w)$.

Lower bound for Distinct Elements Recall that the scheme above implies the following: Upper Bound For every $s, w \in \mathbb{N}$ such that $s \cdot w \geq n$, we give an AM streaming algorithm for the exact computation of $F_0(\sigma)$ with space complexity $\tilde{O}(s)$ and proof complexity $\tilde{O}(w)$. This raises the following (natural) question: Open Question Is it possible to show a matching lower bound?

Lower bound for Distinct Elements While unable to answer the question above, we give a lower bound on a similar model: MA streaming complexity. The MA lower bound also works for the approximation variant of Distinct Elements: Lower Bound Let $\epsilon > 1/\sqrt{n}$. Every MA streaming algorithm (with space complexity $s$ and proof complexity $w$) that $(1 \pm \epsilon)$-approximates $F_0(\sigma)$ must satisfy $s \cdot w = \Omega(1/\epsilon^2)$.

The communication complexity of GHD The Gap Hamming Distance problem is the problem of deciding whether the Hamming distance of the inputs of the players is greater than $n/2 + \sqrt{n}$, or less than $n/2 - \sqrt{n}$. We show the following: Lower Bound For every MA communication complexity protocol for GHD that communicates $t$ bits and uses a proof of size $w$, we have $t \cdot w = \Omega(n)$. We prove the tightness of the lower bound. Upper Bound For every $t, w \in \mathbb{N}$ such that $t \cdot w \geq n$, there exists an MA communication complexity protocol for GHD, which communicates $O(t \log n)$ bits and uses a proof of size $O(w \log n)$.
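To make the promise in GHD concrete, here is a small Python sketch of the function the two players are trying to compute, evaluated centrally with both inputs in hand. The thresholds n/2 ± √n follow the usual statement of the problem; returning `None` when the promise is violated is a choice made for this illustration.

```python
import math

def hamming_distance(x, y):
    """Number of coordinates in which the two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(x, y))

def gap_hamming(x, y):
    """1 if the distance exceeds n/2 + sqrt(n), 0 if it is below n/2 - sqrt(n)."""
    n = len(x)
    d = hamming_distance(x, y)
    if d > n / 2 + math.sqrt(n):
        return 1
    if d < n / 2 - math.sqrt(n):
        return 0
    return None  # promise violated: any answer is acceptable on such inputs

x = [0] * 16
print(gap_hamming(x, [1] * 16), gap_hamming(x, x), gap_hamming(x, [1] * 8 + [0] * 8))
```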

Generic scheme for AM streaming algorithm (formal) Let $0 \leq \epsilon < 1/2$. Let $P$ be a data stream problem such that for every $m, n \in \mathbb{N}$ there exists an unbounded fan-in Boolean circuit $C : \{0, 1\}^m \to \{0, 1\}$ with $k = k(m, n)$ non-input gates, such that for every data stream $\sigma = (a_1, \ldots, a_m)$ with alphabet $[n]$, $(1 - \epsilon) P(\sigma) \leq \sum_{j=1}^{n} C(\chi_1(j), \ldots, \chi_m(j)) \leq (1 + \epsilon) P(\sigma)$. Assuming that $C$ can be efficiently determined by the verifier, we have the following AM streaming algorithm for computing $P(\sigma)$:

Generic scheme for AM streaming algorithm (formal) For every $0 < \delta \leq 1$ and every $s, w \in \mathbb{N}$ such that $s \cdot w \geq n$, we give an AM streaming algorithm, with error probability $\delta$, for approximating $P(\sigma)$ within a multiplicative factor of $1 \pm \epsilon$. The algorithm uses space $O(s k \cdot \mathrm{polylog}(n, \delta^{-1}))$, a proof of size $W = O(w k \cdot \mathrm{polylog}(n, \delta^{-1}))$, and randomness complexity $\mathrm{polylog}(n, \delta^{-1})$.

Sketch of proof - MA streaming algorithm (easy case) Let $p = \mathrm{poly}(m, n)$ be a prime number. Say we have a data stream problem $P$ for which there exists a low degree polynomial $g : \mathbb{F}_p^m \to \mathbb{F}_p$ such that $P(\sigma) = \sum_{j \in [n]} g(\chi_1(j), \ldots, \chi_m(j))$ (where the summation is over $\mathbb{Z}$). Example (Frequency Moments) $F_k(\sigma) = \sum_{j \in [n]} f_j^k = \sum_{j \in [n]} (\chi_1(j) + \ldots + \chi_m(j))^k$.

Sketch of proof - MA streaming algorithm (easy case) Let $p = \mathrm{poly}(m, n)$ be a prime number. Say we have a data stream problem $P$ for which there exists a low degree polynomial $g : \mathbb{F}_p^m \to \mathbb{F}_p$ such that $P(\sigma) = \sum_{j \in [n]} g(\chi_1(j), \ldots, \chi_m(j))$ (where the summation is over $\mathbb{Z}$). Theorem For every $s, w$ with $s \cdot w \geq n$ we can obtain an MA streaming algorithm for $P$ that uses $\tilde{O}(s)$ space and a proof of size $\tilde{O}(w)$.

Sketch of proof - MA streaming algorithm (easy case) Recall that $P(\sigma) = \sum_{j \in [n]} g(\chi_1(j), \ldots, \chi_m(j))$. Fix $D_s \subseteq \mathbb{F}_p$ with $s$ elements. Fix $D_w \subseteq \mathbb{F}_p$ with $w$ elements. For every $i \in [m]$, we can write $\chi_i : [n] \to \{0, 1\}$ as $\chi_i : D_w \times D_s \to \{0, 1\}$. Then, we take the unique low-degree extension: $\tilde{\chi}_i : \mathbb{F}_p^2 \to \mathbb{F}_p$.

Sketch of proof - MA streaming algorithm (easy case) So we can write $P(\sigma) = \sum_{j \in [n]} g(\chi_1(j), \ldots, \chi_m(j))$ as $P(\sigma) = \sum_{x \in D_w} \sum_{y \in D_s} g(\tilde{\chi}_1(x, y), \ldots, \tilde{\chi}_m(x, y))$.
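The rewriting as a double sum can be sketched in Python for the frequency moment example. The sketch makes several assumptions that are not in the slides: D_w = {0, ..., w-1} and D_s = {0, ..., s-1}, the symbol j in [n] is encoded as the pair `divmod(j - 1, s)`, and s·w = n. The low-degree extension of an element indicator is then a product of two Lagrange basis polynomials.

```python
def lagrange_basis(domain, a, x, p):
    """Value at x of the polynomial over F_p that is 1 at a and 0 elsewhere on domain."""
    num, den = 1, 1
    for b in domain:
        if b != a:
            num = num * (x - b) % p
            den = den * (a - b) % p
    return num * pow(den, -1, p) % p  # modular inverse, Python 3.8+

def chi_ext(a_i, x, y, w, s, p):
    """Low-degree extension of the indicator of stream element a_i, evaluated at (x, y)."""
    xi, yi = divmod(a_i - 1, s)  # hypothetical encoding of [n] as D_w x D_s
    return lagrange_basis(range(w), xi, x, p) * lagrange_basis(range(s), yi, y, p) % p

def double_sum(stream, w, s, p, k):
    """Sum over D_w x D_s of g(chi_ext_1, ..., chi_ext_m) for g(z) = (z_1 + ... + z_m)^k."""
    total = 0
    for x in range(w):
        for y in range(s):
            freq = sum(chi_ext(a, x, y, w, s, p) for a in stream) % p
            total = (total + pow(freq, k, p)) % p
    return total

# n = 6 is split as w = 2, s = 3; the prime p = 101 exceeds the true answer.
print(double_sum([3, 1, 3, 2, 3, 1], w=2, s=3, p=101, k=2))
```

On grid points the extension agrees with the 0/1 indicator, so the double sum reproduces the second frequency moment of the stream modulo p. Off the grid the extension takes arbitrary field values, which is what the verifier exploits later.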

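Sketch of proof - MA streaming algorithm (easy case) Recall that $P(\sigma) = \sum_{x \in D_w} \sum_{y \in D_s} g(\tilde{\chi}_1(x, y), \ldots, \tilde{\chi}_m(x, y))$. Define the following proof polynomial $\omega : \mathbb{F}_p \to \mathbb{F}_p$: $\omega(x) = \sum_{y \in D_s} g(\tilde{\chi}_1(x, y), \ldots, \tilde{\chi}_m(x, y))$. Note that $\deg(\omega) = w - 1$, and that $\sum_{x \in D_w} \omega(x) = P(\sigma)$.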

Sketch of proof - MA streaming algorithm (easy case) Upon getting a polynomial $\omega'$ of degree $w - 1$ from the prover, the verifier chooses $r \in \mathbb{F}_p$ at random, and checks whether $\omega'(r) = \omega(r)$. Since $\sum_{x \in D_w} \omega(x) = P(\sigma)$, if the prover is honest, then $\omega' = \omega$, and we are done. If instead the prover sent a polynomial $\omega'(x) \neq \omega(x)$, then $\Pr[\omega'(r) = \omega(r)] \leq \frac{w - 1}{p} = o(1)$.
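The whole easy-case protocol can be sketched in Python for the frequency moment F_k. This is an illustration under stated assumptions, not the authors' implementation: D_w = {0, ..., w-1}, D_s = {0, ..., s-1}, symbols are encoded by `divmod(a - 1, s)`, the proof is a list of evaluations of the proof polynomial at the points 0, ..., k(w-1), and the proof is held in memory instead of being streamed. Because g has degree k here, the proof polynomial has degree at most k(w-1). All function names are hypothetical.

```python
def lagrange_basis(domain, a, x, p):
    """Value at x of the polynomial over F_p that is 1 at a and 0 elsewhere on domain."""
    num, den = 1, 1
    for b in domain:
        if b != a:
            num = num * (x - b) % p
            den = den * (a - b) % p
    return num * pow(den, -1, p) % p

def interpolate(points, values, x, p):
    """Evaluate at x the unique polynomial of degree < len(points) through the values."""
    return sum(v * lagrange_basis(points, a, x, p) for a, v in zip(points, values)) % p

def sketch_row(stream, w, s, p, r):
    """One pass, s field elements: the extended frequency vector at (r, y), y in D_s."""
    row = [0] * s
    for a in stream:
        xi, yi = divmod(a - 1, s)
        # The extended indicator of a at (r, y) vanishes unless y == yi.
        row[yi] = (row[yi] + lagrange_basis(range(w), xi, r, p)) % p
    return row

def omega_at(stream, w, s, p, k, x):
    """Proof polynomial at x: sum over y in D_s of (extended frequency at (x, y))^k."""
    return sum(pow(v, k, p) for v in sketch_row(stream, w, s, p, x)) % p

def honest_proof(stream, w, s, p, k):
    """Prover side: enough evaluations to determine a polynomial of degree k(w-1)."""
    return [omega_at(stream, w, s, p, k, x) for x in range(k * (w - 1) + 1)]

def verify(stream, proof, w, s, p, k, r):
    """Verifier side: r is drawn before the stream is read; returns F_k mod p or None."""
    if len(proof) != k * (w - 1) + 1:
        return None
    points = list(range(len(proof)))
    if interpolate(points, proof, r, p) != omega_at(stream, w, s, p, k, r):
        return None  # the claimed polynomial disagrees at the random point: reject
    return sum(proof[:w]) % p  # sum of the claimed polynomial over D_w

stream, w, s, p, k = [3, 1, 3, 2, 3, 1], 2, 3, 101, 2
proof = honest_proof(stream, w, s, p, k)
print(verify(stream, proof, w, s, p, k, r=57))
```

A proof that differs from the honest one in any evaluation describes a different polynomial of degree at most k(w-1), so it can agree with the honest polynomial on at most k(w-1) of the p possible choices of r.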

Sketch of proof - AM streaming algorithm However, in our case we can only express $P(\sigma)$ as $\sum_{j=1}^{n} C(\chi_1(j), \ldots, \chi_m(j))$, or, after arithmetization, as $\sum_{x \in D_w} \sum_{y \in D_s} p_C(\tilde{\chi}_1(x, y), \ldots, \tilde{\chi}_m(x, y))$ (where the summation is over $\mathbb{F}_p$). Unfortunately, $p_C$ is not a low degree polynomial.

Sketch of proof - AM streaming algorithm We would have liked to lower the degree of $\sum_{x \in D_w} \sum_{y \in D_s} p_C(\tilde{\chi}_1(x, y), \ldots, \tilde{\chi}_m(x, y))$ by using the approximation method of [Raz86,Smo87]. In principle, the latter allows us to construct a low degree approximation polynomial $\tilde{p}_C$, such that w.h.p. $\tilde{p}_C = p_C$. However, in order for $\tilde{p}_C$ to be a low degree polynomial, it must be over a small finite field. But in our case we need to work over a large field in order to express $P(\sigma)$ (which may be as large as $n$).

Sketch of proof - AM streaming algorithm Instead we apply the method of [Raz86,Smo87] to approximate $\{P(\sigma) \pmod{q}\}_{q \in Q}$ for a set $Q$ of $O(\log n)$ primes, each of size at most $O(\log n)$. This way, each approximation polynomial that we get is over a finite field of cardinality $O(\log n)$, and of sufficiently low degree. Then, we use the Chinese Remainder Theorem to extract the value of $P(\sigma)$ from $\{P(\sigma) \pmod{q}\}_{q \in Q}$.
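The final extraction step is ordinary Chinese remaindering. A minimal Python sketch follows, assuming the moduli are distinct primes whose product exceeds the value being recovered; the function name `crt` is illustrative.

```python
def crt(residues, moduli):
    """Recover the unique x in [0, prod(moduli)) with x = residues[i] mod moduli[i]."""
    x, modulus = 0, 1
    for r, q in zip(residues, moduli):
        # x is correct modulo `modulus`; add a multiple of `modulus` to also match r mod q.
        t = (r - x) * pow(modulus, -1, q) % q
        x += modulus * t
        modulus *= q
    return x

primes = [2, 3, 5, 7]  # product 210
value = 123            # stands in for P(sigma), which must be below the product
print(crt([value % q for q in primes], primes))
```

Because every prime has size O(log n) and there are O(log n) of them, the residues together take only polylogarithmically many bits.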

Proof: Making life easier For simplicity, we show the AM scheme for the particular problem of Distinct Elements. Let $s, w \in \mathbb{N}$ such that $s \cdot w = n$. We show that there exists an explicit AM streaming algorithm for computing $F_0(\sigma)$ with space complexity $S = \tilde{O}(s)$, proof size $W = \tilde{O}(w)$, and randomness complexity $O(\log n)$.

Proof: Arithmetizing the Element Indicators Recall that we can write $F_0$ as follows: $F_0(\sigma) = \sum_{j=1}^{n} (\chi_1(j) \vee \ldots \vee \chi_m(j))$. We wish to write $F_0(\sigma)$ as a summation of a low degree polynomial over some domain. As before, we start by representing the element indicators $\{\chi_i\}_{i \in [m]}$ as bivariate polynomials over a finite field.

Proof: Arithmetizing the Element Indicators Again, let $p$ be a prime number. Fix $D_s \subseteq \mathbb{F}_p$ with $s$ elements. Fix $D_w \subseteq \mathbb{F}_p$ with $w$ elements. For every $i \in [m]$, we can write $\chi_i : [n] \to \{0, 1\}$ as $\chi_i : D_w \times D_s \to \{0, 1\}$. Choosing the low-degree extension, we have $\tilde{\chi}_i : \mathbb{F}_p^2 \to \mathbb{F}_p$. Note that for every $\xi \in D_s$, the degree of the univariate polynomial $\tilde{\chi}_i(\cdot, \xi) : \mathbb{F}_p \to \mathbb{F}_p$ is at most $w - 1$.

Proof: Arithmetizing the Element Indicators Plugging in the polynomial extensions of the element indicators yields that $F_0(\sigma) = \sum_{x \in D_w} \sum_{y \in D_s} (\tilde{\chi}_1(x, y) \vee \ldots \vee \tilde{\chi}_m(x, y))$ (where the summation is over $\mathbb{F}_p$).

Proof: Arithmetizing the Disjunction Next, we replace the disjunction clause with a low-degree polynomial (over a small finite field) that approximates it. Towards this end, we show the following lemma (originating in [Raz86,Smo87]): Lemma (Circuit Approximation) Let $\delta > 0$, let $q$ be a prime number, and let $C$ be a disjunction clause over $m$ variables. There exists a family of polynomials $\{p_C : \mathbb{F}_q^m \to \mathbb{F}_q\}$ of degree $O(q \log(1/\delta))$, such that for every $x \in \{0, 1\}^m$, $\Pr[p_C(x) = C(x)] \geq 1 - \delta$ (where the probability is taken over the selection of $p_C$).

Proof: Arithmetizing the Disjunction Recall that the arithmetization of the disjunction function is $(z_1 \vee \ldots \vee z_m) = 1 - \prod_{l=1}^{m} (1 - z_l)$. The Lemma is proved by picking a linear error correcting code $\mathrm{ECC} : \mathbb{F}_q^m \to \mathbb{F}_q^{100m}$ with relative distance $1/3$, and defining the following polynomial: Definition $\eta(z_1, \ldots, z_m) = 1 - \prod_{l=1}^{\log(1/\delta)} \left(1 - \left(\mathrm{ECC}(z_1, \ldots, z_m)_{\iota_l}\right)^{q-1}\right)$.
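A minimal Python sketch of this kind of approximating polynomial follows, under a clearly labeled substitution: in place of an explicit error correcting code read at random coordinates, it draws t fresh uniformly random linear combinations over F_q, which is the textbook Razborov-Smolensky form. For any nonzero 0/1 input each combination vanishes with probability 1/q, so the error is at most q^(-t), and the all-zero input is always handled correctly. The names `sample_or_approximation`, `t` and `rng` are illustrative.

```python
import random

def sample_or_approximation(m, q, t, rng):
    """Sample eta: F_q^m -> F_q of degree t(q-1) that equals OR(z) on 0/1 inputs w.h.p."""
    # Stand-in for t random coordinates of a linear code: t random linear forms.
    coeffs = [[rng.randrange(q) for _ in range(m)] for _ in range(t)]

    def eta(z):
        prod = 1
        for row in coeffs:
            lin = sum(c * zi for c, zi in zip(row, z)) % q
            # By Fermat's little theorem, lin^(q-1) is 1 if lin != 0 and 0 otherwise.
            prod = prod * (1 - pow(lin, q - 1, q)) % q
        return (1 - prod) % q

    return eta

rng = random.Random(0)
eta = sample_or_approximation(m=4, q=3, t=40, rng=rng)
print(eta([0, 0, 0, 0]), eta([0, 1, 0, 1]))
```

The error is one-sided: the polynomial can only output 0 where the disjunction is 1, never the reverse.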

Proof: Arithmetizing the Disjunction Observe that by applying the circuit approximation lemma with δ = 2 /3 and p as the prime number, we can represent F 0 (σ) as a summation over a polynomial. However, the degree of this polynomial (which is dominated by p), is too high for our needs. Instead, we approximate F 0 (σ) by O(log p) low degree polynomials.

Proof: Congruences Arithmetization Let $Q = \{q_1, \ldots, q_{\rho(c \log p)}\}$ be the set of all prime numbers that are less than or equal to $c \log p$, where $c$ is a constant such that $\prod_{q \in Q} q > n$. For every $q \in Q$ denote $H_q := \mathbb{F}_{q^{\lambda_q}}$, where $\lambda_q$ is the minimum integer that satisfies $q^{\lambda_q} > n \log n$.
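The parameters on this slide can be computed with a few lines of Python. Two assumptions are made for illustration: the set of primes is taken to be the shortest initial segment of the primes whose product exceeds n (instead of fixing a bound of the form c log p), and the logarithm in the field-size condition is taken to base 2. The names `choose_primes` and `extension_degree` are hypothetical.

```python
import math

def choose_primes(n):
    """Shortest initial segment of the primes whose product exceeds n."""
    primes, product, candidate = [], 1, 2
    while product <= n:
        # `primes` holds every prime below `candidate`, so trial division suffices.
        if all(candidate % q for q in primes):
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes

def extension_degree(q, n):
    """Minimum lambda with q**lambda > n * log2(n), i.e. |H_q| > n log n."""
    target = n * math.log2(n)
    lam = 1
    while q ** lam <= target:
        lam += 1
    return lam

n = 1000
for q in choose_primes(n):
    print(q, extension_degree(q, n))
```

The field H_q is large enough for the random-point check to be sound, while the base field F_q stays small enough to keep the degree of the approximating polynomial low.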

Proof: Congruences Arithmetization For every $q \in Q$ we have $F_0^q(\sigma) \equiv F_0(\sigma) \pmod{q} = \sum_{x \in D_w} \sum_{y \in D_s} (\tilde{\chi}_1^q(x, y) \vee \ldots \vee \tilde{\chi}_m^q(x, y))$. Here $D_w, D_s$ are subsets of $H_q$, and $\tilde{\chi}_i^q$ is the unique low-degree extension of $\chi_i$ to $H_q$.

Proof: Congruences Arithmetization For every $q \in Q$ we apply the circuit approximation lemma with $\delta = O((n \log n)^{-1})$, and get a degree $\tilde{O}(q)$ polynomial $p_q : \mathbb{F}_q^m \to \mathbb{F}_q$ such that for every $(x_1, \ldots, x_m) \in \{0, 1\}^m$, $\Pr[p_q(x_1, \ldots, x_m) = (x_1 \vee \ldots \vee x_m)] \geq 1 - O\left(\frac{1}{n \log n}\right)$ (where the probability is taken over the random coin flips performed during the construction of $p_q$). Since $\mathbb{F}_q$ is a subfield of $H_q$, we can view $p_q$ as a polynomial $p_q : H_q^m \to H_q$.

Proof: Congruences Arithmetization Applying a union bound yields that $\Pr\left[F_0^q(\sigma) = \sum_{x \in D_w} \sum_{y \in D_s} p_q(\tilde{\chi}_1^q(x, y), \ldots, \tilde{\chi}_m^q(x, y))\right] \geq 1 - O\left(\frac{1}{\log n}\right)$.
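The union bound step can be spelled out as follows, assuming (as in this part of the proof) that s·w = n, so that the grid D_w × D_s has exactly n points and at each of them the inputs to the approximating polynomial are 0/1 values:

```latex
\begin{align*}
\Pr\Big[\exists (x,y) \in D_w \times D_s :\;
   p_q\big(\tilde{\chi}^q_1(x,y), \ldots, \tilde{\chi}^q_m(x,y)\big)
   \neq \textstyle\bigvee_{i=1}^{m} \tilde{\chi}^q_i(x,y)\Big]
 &\leq \sum_{(x,y) \in D_w \times D_s} O\!\left(\frac{1}{n \log n}\right) \\
 &= n \cdot O\!\left(\frac{1}{n \log n}\right)
  = O\!\left(\frac{1}{\log n}\right).
\end{align*}
```

If no grid point is bad, the sum of the approximating polynomial over the grid equals the sum of the disjunctions, which is the stated event.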

Proof: Merlin's Proof We define the polynomial $\omega_q : H_q \to H_q$ by $\omega_q(x) = \sum_{y \in D_s} p_q(\tilde{\chi}_1^q(x, y), \ldots, \tilde{\chi}_m^q(x, y))$. Merlin's proof stream $\phi$ consists of all the proof polynomials $\{\omega_q\}_{q \in Q}$ (total size $\tilde{O}(w)$).

Proof: Extracting the answer from Merlin's Proof Note that $\deg(\omega_q) = \tilde{O}(w)$, and that $\Pr\left[F_0^q(\sigma) = \sum_{x \in D_w} \omega_q(x)\right] \geq 1 - O\left(\frac{1}{\log n}\right)$. Together with the Chinese Remainder Theorem, we get: Lemma Given access to the proof stream, it is possible to compute $P(\sigma)$ with high probability.

Proof: Detecting Merlin's lies We show that w.h.p. Merlin cannot cheat Arthur. That is, by evaluating the actual proof polynomials at a randomly chosen point, Arthur can detect a false proof. Lemma For every $q \in Q$, given a polynomial $\hat{\omega}_q : H_q \to H_q$ of degree $\tilde{O}(w)$ with $\hat{\omega}_q \neq \omega_q$, we have $\Pr_{r \in H_q}[\hat{\omega}_q(r) \neq \omega_q(r)] \geq 1 - o(1)$. This is due to our choice of finite fields and the Schwartz-Zippel lemma.

Proof: Verification (Arthur's Streaming Algorithm) For every $q \in Q$, Arthur selects $r \in H_q$ uniformly at random and computes $\omega_q(r)$ from the input stream. If Merlin's proof disagrees with the above computations, reject. Otherwise, extract $F_0(\sigma)$ from the proof stream.

Proof: Efficient evaluation at a single point Last, we need to show that Arthur can compute $\omega_q(r)$ from the input stream (within the space limitations). We show: Lemma For every $q \in Q$ and $r \in H_q$, there exists a streaming algorithm that can evaluate $\omega_q(r)$ using $\tilde{O}(s)$ bits of space. Recall that $\omega_q(r) = \sum_{y \in D_s} p_q(\tilde{\chi}_1^q(r, y), \ldots, \tilde{\chi}_m^q(r, y))$ and that $p_q$ is of the form $1 - \prod_{l=1}^{\log(1/\delta)} \left(1 - \left(\mathrm{ECC}(z_1, \ldots, z_m)_{\iota_l}\right)^{q-1}\right)$.

Proof: Efficient evaluation at a single point Hence it suffices to know the values of $\mathrm{ECC}(\tilde{\chi}_1^q(r, y), \ldots, \tilde{\chi}_m^q(r, y))_{\iota_l}$ for all $s$ possible values of $y \in D_s$. By the linearity of the error correcting code we incrementally compute these values, adding $\mathrm{ECC}(0, \ldots, 0, \tilde{\chi}_i^q(r, y), 0, \ldots, 0)$ when reading the $i$-th element of the stream.
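The linearity argument can be illustrated with a Python sketch that compares a one-pass evaluation against a direct evaluation that stores the whole input vector. Several simplifications are assumptions of this sketch and not part of the slides: arithmetic is done in the prime field F_q itself (extension degree 1, so q must be at least max(w, s)), D_w = {0, ..., w-1}, D_s = {0, ..., s-1}, symbols are encoded by `divmod(a - 1, s)`, and the code coordinate indexed by a field element alpha is the Reed-Solomon-style linear form that maps z to the sum of z_i · alpha^i.

```python
def lagrange_basis(domain, a, x, q):
    """Value at x of the polynomial over F_q that is 1 at a and 0 elsewhere on domain."""
    num, den = 1, 1
    for b in domain:
        if b != a:
            num = num * (x - b) % q
            den = den * (a - b) % q
    return num * pow(den, -1, q) % q

def or_from_coords(coords, q):
    """1 - prod_l (1 - coord_l^(q-1)): the approximating polynomial on code coordinates."""
    prod = 1
    for v in coords:
        prod = prod * (1 - pow(v, q - 1, q)) % q
    return (1 - prod) % q

def stream_eval(stream, w, s, q, r, alphas):
    """One pass; keeps len(alphas) field elements for each y in D_s."""
    acc = [[0] * len(alphas) for _ in range(s)]
    for i, a in enumerate(stream, start=1):
        xi, yi = divmod(a - 1, s)
        chi_r = lagrange_basis(range(w), xi, r, q)  # extended indicator at (r, yi)
        for l, alpha in enumerate(alphas):
            # Linearity: add the coordinate of ECC(0, ..., chi_r, ..., 0).
            acc[yi][l] = (acc[yi][l] + pow(alpha, i, q) * chi_r) % q
    return sum(or_from_coords(row, q) for row in acc) % q

def batch_eval(stream, w, s, q, r, alphas):
    """Reference computation that materialises the whole length-m vector for each y."""
    total = 0
    for y in range(s):
        vec = []
        for a in stream:
            xi, yi = divmod(a - 1, s)
            vec.append(lagrange_basis(range(w), xi, r, q)
                       * lagrange_basis(range(s), yi, y, q) % q)
        coords = [sum(pow(alpha, i, q) * z for i, z in enumerate(vec, start=1)) % q
                  for alpha in alphas]
        total = (total + or_from_coords(coords, q)) % q
    return total

stream, w, s, q, alphas = [3, 1, 3, 2, 3, 1], 2, 3, 7, [3, 5]
print(all(stream_eval(stream, w, s, q, r, alphas) == batch_eval(stream, w, s, q, r, alphas)
          for r in range(q)))
```

The streaming version touches only the accumulator row for the y-coordinate of the current element, because the extended indicator vanishes at every other point of D_s.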

Thank you!