The Communication Complexity of Correlation
Prahladh Harsha, Rahul Jain, David McAllester, Jaikumar Radhakrishnan
Transmitting Correlated Variables
(X, Y): a pair of correlated random variables.
x --> Alice --z--> Bob --> y
Input (to Alice): x ←_R X
Output (from Bob): y ←_R Y|_{X=x} (i.e., according to the conditional distribution)
Question: What is the minimum "expected" number of bits (i.e., E[|Z|]) Alice needs to send Bob? (say, T(X : Y))
Easy to check: T(X : Y) ≥ I[X : Y] (the mutual information)
Correlated variables: an example
W = (i, b): a random variable uniformly distributed over [n] × {0, 1}.
X and Y: two n-bit strings such that X[i] = Y[i] = b; the remaining 2(n − 1) bits are chosen independently and uniformly.
Exercise: I[X : Y] = o(1), but T(X : Y) = Θ(log n).
Information Theory Preliminaries
(X, Y): a pair of random variables.
1. Entropy: H[X] ≜ Σ_x p(x) log(1/p(x)), where p(x) = Pr[X = x]
   (the minimum expected number of bits to encode X, up to ±1).
2. Conditional Entropy: H[Y | X] ≜ Σ_x Pr[X = x] · H[Y|_{X=x}]
3. Joint Entropy: H[XY] = H[X] + H[Y | X]
4. Mutual Information: I[X : Y] ≜ H[X] + H[Y] − H[XY] = H[Y] − H[Y | X]
Information Theory Preliminaries (Contd)
1. Independence: X and Y are independent if Pr[X = x, Y = y] = Pr[X = x] · Pr[Y = y] for all x, y. Equivalently, I[X : Y] = 0.
2. Markov Chain: X - Z - Y is called a Markov chain if X and Y are conditionally independent given Z (i.e., I[X : Y | Z] = 0).
3. Data Processing Inequality: X - Z - Y  ⟹  I[X : Z] ≥ I[X : Y].
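Since the rest of the talk leans on these definitions, here is a minimal Python sketch (not from the slides; helper names and the brute-force check are my own) that computes entropy and mutual information from explicit distributions and numerically checks the exercise above for small n:

```python
import itertools
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy H (in bits) of a dict mapping outcomes to probabilities."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """I[X : Y] = H[X] + H[Y] - H[XY] for a dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return entropy(px) + entropy(py) - entropy(joint)

def example_joint(n):
    """Joint distribution of the example: W = (i, b) uniform over [n] x {0,1},
    X[i] = Y[i] = b, all other bits uniform and independent.  Marginalizing out W,
    p(x, y) is proportional to the number of positions where x and y agree."""
    norm = 2 * n * 4 ** (n - 1)
    joint = {}
    for x in itertools.product((0, 1), repeat=n):
        for y in itertools.product((0, 1), repeat=n):
            agree = sum(a == b for a, b in zip(x, y))
            if agree:
                joint[(x, y)] = agree / norm
    return joint

for n in (2, 4, 6, 8):
    print(n, mutual_information(example_joint(n)))   # I[X : Y] shrinks as n grows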
Transmitting Correlated Variables
x --> Alice --z--> Bob --> y
Input (to Alice): x ←_R X
Output (from Bob): y ←_R Y|_{X=x} (i.e., according to the conditional distribution)
Question: What is the minimum "expected" number of bits (i.e., E[|Z|]) Alice needs to send Bob? (say, T(X : Y))
T(X : Y) = min_Z H[Z],
where the minimum is over all Markov chains X - Z - Y (i.e., over all Z such that X and Y are independent given Z).
Asymptotic Version (with error)
(x_1, x_2, ..., x_m) --> Alice --z--> Bob --> (ỹ_1, ỹ_2, ..., ỹ_m)
Input: x_1, x_2, ..., x_m, i.i.d. samples from X.
Output: ỹ_1, ỹ_2, ..., ỹ_m such that
‖ ((X_1, Y_1), ..., (X_m, Y_m)) − ((X_1, Ỹ_1), ..., (X_m, Ỹ_m)) ‖_1 ≤ λ
T_λ(X^m : Y^m) = min E[|Z|]
Common Information: C(X : Y) ≜ lim_{λ→0} lim inf_{m→∞} T_λ(X^m : Y^m) / m
Asymptotic Version (with error)
Theorem (Wyner 1975): C(X : Y) = min_Z I[XY : Z],
where the minimum is taken over all Z such that X - Z - Y.
For instance, in the example above, C(X : Y) = 2 ± o(1), i.e., one can send significantly less in the asymptotic case (about 2 bits per copy on average) than in the one-shot case (Θ(log n) bits).
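A hedged sketch of the upper-bound direction for the example (the calculation is not on the slide): take Z = W = (i, b) in Wyner's formula. Given Z, the remaining bits of X and Y are filled in independently, so X - Z - Y is a Markov chain, and

I[XY : Z] = H[Z] − H[Z | XY] ≈ log(2n) − log(n/2) = 2,

since, conditioned on (X, Y), the pair (i, b) is roughly uniform over the ≈ n/2 positions where X and Y agree (with b determined by the common bit). Hence C(X : Y) ≤ 2 + o(1); the matching lower bound claimed on the slide makes the 2 essentially tight.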
Asymptotic Version with Shared Randomness
Alice and Bob share a random string R.
(x_1, x_2, ..., x_m) --> Alice --z--> Bob --> (ỹ_1, ỹ_2, ..., ỹ_m)
Output: ỹ_1, ỹ_2, ..., ỹ_m such that
‖ ((X_1, Y_1), ..., (X_m, Y_m)) − ((X_1, Ỹ_1), ..., (X_m, Ỹ_m)) ‖_1 ≤ λ
T^R_λ(X^m : Y^m) = min E[|Z|]
Theorem (Winter 2002): lim_{λ→0} lim inf_{m→∞} T^R_λ(X^m : Y^m) / m = I[X : Y]
Main Result: One-shot Version
Alice and Bob share a random string R; x --> Alice --z--> Bob --> y
Input (to Alice): x ←_R X
Output (from Bob): y ←_R Y|_{X=x} (i.e., according to the conditional distribution)
Communication: T^R(X : Y) = min E[|Z|]
Theorem (Main Result): I[X : Y] ≤ T^R(X : Y) ≤ I[X : Y] + 2 log I[X : Y] + O(1)
This gives a characterization of mutual information (up to lower-order logarithmic terms).
One-shot vs. Asymptotic
Typical sets: for large n, the n i.i.d. samples of X fall (with high probability) in a "typical set".
In a typical set, all elements are roughly equally probable, and the number of elements is ≈ 2^{nH[X]}.
Asymptotic statements are arguably properties of typical sets rather than of the underlying distributions.
Moreover, the asymptotic versions are not strong enough for our applications.
Outline
- Introduction
- Proof of Main Result: Rejection Sampling Procedure
- Applications of Main Result (Communication Complexity: Direct Sum Result)
Generating one distribution from another
P, Q: two distributions such that Supp(P) ⊆ Supp(Q).
Rejection Sampling Procedure:
Input: an infinite stream q_1, q_2, q_3, ... of independently drawn samples from Q.
Output: an index i* such that q_{i*} is a sample from P.
Question: What is the minimum expected length of the index (i.e., E[ℓ(i*)]) over all such procedures?
Naive procedure
Sample according to P to obtain an item x.
Wait till item x appears in the stream and output the corresponding index.
E[ℓ(i*)] ≈ Σ_x p(x) log(1/q(x))
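A minimal Python sketch of this naive procedure (my own illustration, with hypothetical toy distributions), comparing the empirical E[log i*] against the expression above:

```python
import math
import random

def naive_index(p, q, rng):
    """Sample x ~ P privately, then scan an i.i.d. stream of Q-samples and
    return the (1-based) index of the first occurrence of x."""
    x = rng.choices(list(p), weights=list(p.values()))[0]
    q_items, q_weights = list(q), list(q.values())
    i = 1
    while rng.choices(q_items, weights=q_weights)[0] != x:
        i += 1
    return i

p = {'a': 0.5, 'b': 0.5}          # hypothetical target distribution P
q = {'a': 0.9, 'b': 0.1}          # hypothetical stream distribution Q
rng = random.Random(0)
bound = sum(px * math.log2(1 / q[x]) for x, px in p.items())
trials = [math.log2(naive_index(p, q, rng)) for _ in range(20000)]
print(bound, sum(trials) / len(trials))   # empirical E[log i*] is at most the bound
```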
Relative Entropy
P, Q: two distributions. S(P ‖ Q) = Σ_x p(x) log( p(x)/q(x) )
Properties:
- Asymmetric
- S(P ‖ Q) < ∞  ⟺  Supp(P) ⊆ Supp(Q)
- S(P ‖ Q) = 0  ⟺  P ≡ Q
- S(P ‖ Q) ≥ 0
Rejection Sampling Lemma
Lemma (Rejection Sampling Lemma): There exists a rejection sampling procedure that generates P from Q such that
E[ℓ(i*)] ≤ S(P ‖ Q) + 2 log S(P ‖ Q) + O(1).
Proof of Main Result
Fact: I[X : Y] = E_{x←X}[ S( Y|_{X=x} ‖ Y ) ]
Proof.
- Common random string: a sequence of i.i.d. samples from the marginal distribution of Y.
- On input x, Alice performs the rejection sampling procedure to generate Y|_{X=x} from Y over this shared stream, and sends the resulting index to Bob.
- E[|Z|] ≤ E_{x←X}[ S(Y|_{X=x} ‖ Y) + 2 log S(Y|_{X=x} ‖ Y) + O(1) ] ≤ I[X : Y] + 2 log I[X : Y] + O(1), using the fact above and the concavity of log.
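To make the protocol concrete, here is a minimal, hedged Python sketch of the one-shot simulation (names and toy distributions are my own): the shared random string is modelled as a pre-agreed seed from which both parties derive the same stream of Y-samples, and the index is chosen by the naive rule from the earlier slide; achieving the bound in the theorem requires the greedy rejection sampling procedure developed next.

```python
import random

def shared_stream(seed, marginal_y):
    """Both parties derive the same i.i.d. samples of Y from the shared randomness."""
    rng = random.Random(seed)
    items, weights = list(marginal_y), list(marginal_y.values())
    while True:
        yield rng.choices(items, weights=weights)[0]

def alice(x, cond_y_given_x, marginal_y, seed):
    """Alice samples y ~ Y|X=x privately and sends the index of its first
    occurrence in the shared stream (naive rule; the greedy procedure does better)."""
    dist = cond_y_given_x[x]
    y = random.choices(list(dist), weights=list(dist.values()))[0]
    for i, sample in enumerate(shared_stream(seed, marginal_y), start=1):
        if sample == y:
            return i                     # only this index is communicated

def bob(i, marginal_y, seed):
    """Bob outputs the i-th sample of the shared stream; it is distributed as Y|X=x."""
    stream = shared_stream(seed, marginal_y)
    for _ in range(i - 1):
        next(stream)
    return next(stream)

# hypothetical toy distributions, just to exercise the protocol
marginal_y = {'0': 0.5, '1': 0.5}
cond = {'a': {'0': 0.9, '1': 0.1}, 'b': {'0': 0.1, '1': 0.9}}
idx = alice('a', cond, marginal_y, seed=42)
print(idx, bob(idx, marginal_y, seed=42))
```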
Greedy Approach to Rejection Sampling
Input: an infinite stream q_1, q_2, q_3, ... of independently drawn samples from Q.
Output: an index i* such that q_{i*} is a sample from P.
Greedy Approach: at each iteration, fill the target distribution P with the best possible sub-distribution of Q, while maintaining that the accumulated mass never exceeds P.
Greedy Approach
1. Set p_1(x) ← p(x)   [p_i(x) = probability of item x that still needs to be satisfied]
2. Set s_0 ← 0   [s_i = Pr[Greedy stops before examining the (i + 1)-th sample]]
3. For i ← 1 to ∞:
   3.1 Examine sample q_i.
   3.2 If q_i = x, then with probability min{ p_i(x) / ((1 − s_{i−1}) q(x)), 1 }, output i.
   3.3 Updates: for all x, set
       p_{i+1}(x) ← p_i(x) − (1 − s_{i−1}) q(x) · min{ p_i(x) / ((1 − s_{i−1}) q(x)), 1 } = p_i(x) − α_{i,x}
       Set s_i ← s_{i−1} + Σ_x α_{i,x}.
Clearly, the distribution generated is P.
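Below is a minimal runnable sketch of this greedy procedure (my own translation of the pseudocode into Python, with hypothetical toy distributions); it also checks empirically that the output distribution is P and compares E[log i] with S(P ‖ Q).

```python
import math
import random
from collections import Counter

def greedy_rejection_sample(p, q, rng):
    """Greedy rejection sampling: scan i.i.d. samples from Q and return (i, x),
    where x = q_i is distributed exactly according to P."""
    items, q_weights = list(q), list(q.values())
    p_rem = dict(p)        # p_i(x): probability mass of x not yet satisfied
    s = 0.0                # s_{i-1}: probability that Greedy has already stopped
    i = 0
    while True:
        i += 1
        x = rng.choices(items, weights=q_weights)[0]          # examine sample q_i
        if rng.random() < min(p_rem.get(x, 0.0) / ((1 - s) * q[x]), 1.0):
            return i, x
        # step 3.3: compute alpha_{i,y}, update residual masses and stopping prob.
        alphas = {y: min(p_rem.get(y, 0.0), (1 - s) * q[y]) for y in q}
        for y, a in alphas.items():
            p_rem[y] = max(p_rem.get(y, 0.0) - a, 0.0)
        s += sum(alphas.values())

def relative_entropy(p, q):
    """S(P || Q) in bits."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

rng = random.Random(1)
p = {'a': 0.7, 'b': 0.2, 'c': 0.1}     # hypothetical target distribution P
q = {'a': 0.1, 'b': 0.3, 'c': 0.6}     # hypothetical source distribution Q
runs = [greedy_rejection_sample(p, q, rng) for _ in range(20000)]
print(Counter(x for _, x in runs))                  # roughly 70% / 20% / 10%, i.e., P
print(sum(math.log2(i) for i, _ in runs) / len(runs), relative_entropy(p, q))
```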
Expected Index Length
s_i = Pr[Greedy stops before examining sample q_{i+1}]
α_{i,x} = Pr[Greedy outputs x in iteration i]
Note: p(x) = Σ_i α_{i,x}.
Suppose α_{i+1,x} > 0, i.e., the previous iterations were not sufficient to provide the required probability p(x) to item x (so in every earlier round, x received its maximum possible share (1 − s_{j−1}) q(x)). Hence,
Σ_{j=1}^{i} (1 − s_{j−1}) q(x) < p(x)  ⟹  i (1 − s_i) q(x) < p(x)  ⟹  i < (1 / (1 − s_i)) · (p(x) / q(x)).
Expected Index Length
E[ℓ(i)] ≈ E[log i] = Σ_x Σ_i α_{i,x} log i
  ≤ Σ_x Σ_i α_{i,x} log( (1/(1 − s_{i−1})) · (p(x)/q(x)) )
  = Σ_x Σ_i α_{i,x} ( log(p(x)/q(x)) + log(1/(1 − s_{i−1})) )
  = Σ_x p(x) log(p(x)/q(x)) + Σ_i ( Σ_x α_{i,x} ) log(1/(1 − s_{i−1}))
  ≤ S(P ‖ Q) + ∫_0^1 log(1/(1 − s)) ds
  = S(P ‖ Q) + O(1)
Actually, ℓ(i) = log i + 2 log log i (a prefix-free encoding of the index), hence the extra log term in the final result.
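The slides do not fix the exact encoding of the index; a standard Elias-delta-style prefix-free code achieves length about log i + 2 log log i, as in this hedged sketch:

```python
def prefix_free_encode(i):
    """Elias-delta style prefix-free code for a positive integer i,
    of length about log i + 2 log log i bits (one possible choice of encoding)."""
    assert i >= 1
    b = bin(i)[2:]              # binary of i; about log i bits, leading bit is 1
    l = bin(len(b))[2:]         # binary of that length; about log log i bits
    return '0' * (len(l) - 1) + l + b[1:]   # unary prefix, length field, i minus leading 1

def prefix_free_decode(code):
    """Inverse of prefix_free_encode (reads one codeword from the front)."""
    k = 0
    while code[k] == '0':
        k += 1
    length = int(code[k:2 * k + 1], 2)       # k zeros, then a (k+1)-bit length field
    bits = '1' + code[2 * k + 1: 2 * k + length]
    return int(bits, 2)

assert all(prefix_free_decode(prefix_free_encode(i)) == i for i in range(1, 2000))
```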
Greedy Approach (Contd)
Clearly, the greedy approach generates the target distribution P.
Furthermore, it can be shown that the expected index length is at most
E[ℓ(i)] ≤ S(P ‖ Q) + 2 log S(P ‖ Q) + O(1).
Outline
- Introduction
- Proof of Main Result: Rejection Sampling Procedure
- Applications of Main Result (Communication Complexity: Direct Sum Result)
Two Party Communication Complexity Model [Yao]
f : X × Y → Z
Alice gets x, Bob gets y; they exchange messages m_1, m_2, m_3, m_4, ..., m_k, at the end of which f(x, y) is computed.
This is a k-round protocol computing f.
Question: How many bits must Alice and Bob exchange to compute f?
Direct Sum Question
Alice gets (x_1, x_2, ..., x_t), Bob gets (y_1, y_2, ..., y_t); they exchange messages m_1, ..., m_k and must output (f(x_1, y_1), f(x_2, y_2), ..., f(x_t, y_t)).
Direct Sum Question: Does the number of bits communicated increase t-fold?
Direct Product Question: Keeping the number of bits communicated fixed, does the success probability fall exponentially in t?
Communication Complexity Measures
Randomized communication complexity:
R^k_ε(f) = min_Π max_{(x,y)} (number of bits communicated),
where Π ranges over k-round public-coin randomized protocols that compute f correctly with probability at least 1 − ε on each input (x, y).
Distributional communication complexity: for a distribution μ on the inputs X × Y,
D^{μ,k}_ε(f) = min_Π max_{(x,y)} (number of bits communicated),
where Π ranges over "deterministic" k-round protocols for f with average error at most ε under μ.
Theorem (Yao's minmax principle): R^k_ε(f) = max_μ D^{μ,k}_ε(f)
Direct Sum Results
R^k_ε(f) vs. R^k_ε(f^t), and D^{μ,k}_ε(f) vs. D^{μ^t,k}_ε(f^t)
1. [Chakrabarti, Shi, Wirth and Yao 2001] Direct sum result for the Equality function in the simultaneous message passing model.
2. [Jain, Radhakrishnan and Sen 2005] Extended the above result to all functions in the simultaneous message passing model and the one-way communication model.
3. [Jain, Radhakrishnan and Sen 2003] For bounded-round communication models, for any f and any product distribution μ,
   D^{μ^t,k}_ε(f^t) ≥ t ( (δ²/(2k)) · D^{μ,k}_{ε+2δ}(f) − 2 )
Improved Direct Sum Result
Theorem: For any function f and any product distribution μ,
D^{μ^t,k}_ε(f^t) ≥ (t/2) ( δ · D^{μ,k}_{ε+δ}(f) − O(k) ).
Applying Yao's minmax principle,
R^k_ε(f^t) ≥ max_{product μ} (t/2) ( δ · D^{μ,k}_{ε+δ}(f) − O(k) ).
Information Cost
Π: a private-coin protocol with inputs X (Alice) and Y (Bob), and transcript M.
Information cost of Π with respect to μ: IC_μ(Π) = I[XY : M]
For a function f, IC^{μ,k}_ε(f) = min_Π IC_μ(Π), where Π ranges over k-round private-coin protocols for f with error at most ε under μ.
Three Lemmata
Lemma (Direct Sum for Information Cost): For a product distribution μ, IC^{μ^t,k}_ε(f^t) ≥ t · IC^{μ,k}_ε(f).
Lemma (IC lower bounds distributional complexity): IC^{μ,k}_ε(f) ≤ D^{μ,k}_ε(f).
Lemma ((Improved) Message Compression): D^{μ,k}_{ε+δ}(f) ≤ (1/δ) [ 2 · IC^{μ,k}_ε(f) + O(k) ].
The three lemmata together imply the improved direct sum result (see the chain below).
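Chaining the three lemmata (a hedged reconstruction of the step left implicit on the slide):

D^{μ^t,k}_ε(f^t) ≥ IC^{μ^t,k}_ε(f^t) ≥ t · IC^{μ,k}_ε(f) ≥ (t/2) ( δ · D^{μ,k}_{ε+δ}(f) − O(k) ),

where the first inequality is the second lemma applied to f^t and μ^t, the second is the direct sum lemma for information cost, and the third rearranges the message compression lemma.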
Direct Sum for Information Cost
For a product distribution μ, IC^{μ^t,k}_ε(f^t) ≥ t · IC^{μ,k}_ε(f).
Let Π be a private-coin protocol achieving I = IC^{μ^t,k}_ε(f^t), with inputs X_1, X_2, ..., X_t (Alice) and Y_1, Y_2, ..., Y_t (Bob), and transcript M.
Chain rule: I[XY : M] ≥ Σ_{i=1}^t I[X_i Y_i : M]
Claim: For each i, I[X_i Y_i : M] ≥ IC^{μ,k}_ε(f).
Proof: On input X_i and Y_i, Alice and Bob fill in the other components themselves (using their private coins, according to the product distribution μ) and run the above protocol.
IC lower bounds distributional complexity
IC^{μ,k}_ε(f) ≤ D^{μ,k}_ε(f).
Proof. Let Π be a protocol that achieves D^{μ,k}_ε(f) and let M be its transcript. Then
D^{μ,k}_ε(f) ≥ E[|M|] ≥ H[M] ≥ I[XY : M] ≥ IC^{μ,k}_ε(f).
Message Compression
To prove: D^{μ,k}_{ε+δ}(f) ≤ (1/δ) [ 2 · IC^{μ,k}_ε(f) + O(k) ].
Given a private-coin protocol Π with messages m_1, m_2, ..., m_k and information cost I, construct a public-coin protocol Π' in which each message m_i is replaced by a compressed message z_i (from which m̃_i is recovered), with Σ_{i=1}^k E[|Z_i|] ≤ 2I + O(k).
I = I[XY : M] = I[XY : M_1 M_2 ... M_k] = Σ_{i=1}^k I[XY : M_i | M_1 M_2 ... M_{i−1}]
Sufficient to prove: for all i, E[|Z_i|] ≤ 2 · I[XY : M_i | M_1 M_2 ... M_{i−1}] + O(1).
Message Compression (Contd)
To prove: for all i, E[|Z_i|] ≤ 2 · I[XY : M_i | M_1 M_2 ... M_{i−1}] + O(1).
(Replace the messages one by one: earlier messages m_1, ..., m_{i−1} have already been compressed to z_1, ..., z_{i−1}; now compress m_i.)
Conditioned on all earlier messages, the message M_i is independent of the receiver's input (say Bob's input Y, when Alice is the sender). Hence,
I_i = I[XY : M_i | m_1 ... m_{i−1}] = I[X : M_i | m_1 ... m_{i−1}]
The Main Result implies that M_i can be generated on Bob's side by sending only I_i + 2 log I_i + O(1) ≤ 2 I_i + O(1) bits.
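A hedged sketch of how the per-round bound yields the compression lemma (the slides leave this step implicit): summing over the k rounds gives

Σ_{i=1}^k E[|Z_i|] ≤ Σ_{i=1}^k ( 2 I_i + O(1) ) = 2I + O(k).

By Markov's inequality, the total communication exceeds (1/δ)(2I + O(k)) with probability at most δ, so truncating the protocol at that length increases the error by at most δ, giving D^{μ,k}_{ε+δ}(f) ≤ (1/δ) [ 2 · IC^{μ,k}_ε(f) + O(k) ].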
Summarizing Results
- A characterization of mutual information and relative entropy in terms of communication complexity (modulo lower-order log terms).
- An improved direct sum result for communication complexity.
Thank You