The Communication Complexity of Correlation
Prahladh Harsha, Rahul Jain, David McAllester, Jaikumar Radhakrishnan


Transmitting Correlated Variables

(X, Y): a pair of correlated random variables.

  x → Alice --Z--> Bob → y

Input (to Alice): x ∈_R X
Output (from Bob): y ∈_R Y|_{X=x} (i.e., drawn from the conditional distribution)

Question: What is the minimum expected number of bits |Z| that Alice needs to send Bob? (Call it T(X : Y).)

Easy to check: T(X : Y) ≥ I[X : Y] (the mutual information).

Correlated variables: an example

W = (i, b): a random variable uniformly distributed over [n] × {0, 1}.
X and Y: two n-bit strings such that X[i] = Y[i] = b, with the remaining 2(n − 1) bits chosen independently and uniformly.

Exercise: I[X : Y] = o(1), but T(X : Y) = Θ(log n).

Information Theory Preliminaries

(X, Y): a pair of random variables.
1. Entropy: H[X] ≜ Σ_x p(x) log(1 / p(x)), where p(x) = Pr[X = x]; the minimum expected number of bits needed to encode X (up to ±1).
2. Conditional Entropy: H[Y | X] ≜ Σ_x Pr[X = x] · H[Y|_{X=x}].
3. Joint Entropy: H[XY] = H[X] + H[Y | X].
4. Mutual Information: I[X : Y] ≜ H[X] + H[Y] − H[XY] = H[Y] − H[Y | X].
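To make these definitions and the exercise above concrete, here is a minimal Python sketch (our own illustration, not part of the talk; all function names are ours). It computes entropies and mutual information from an explicit joint distribution and checks, for small n, that I[X : Y] in the correlated-strings example is indeed small:

```python
from collections import defaultdict
from itertools import product
from math import log2

def marginal(joint, axis):
    """Marginal of a joint pmf {(x, y): prob}; axis 0 gives X, axis 1 gives Y."""
    m = defaultdict(float)
    for xy, prob in joint.items():
        m[xy[axis]] += prob
    return dict(m)

def entropy(pmf):
    """H[.] in bits."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def mutual_information(joint):
    """I[X : Y] = H[X] + H[Y] - H[XY]."""
    return (entropy(marginal(joint, 0)) + entropy(marginal(joint, 1))
            - entropy(joint))

def correlated_strings_joint(n):
    """Joint pmf of the example: W = (i, b) uniform over [n] x {0, 1},
    X[i] = Y[i] = b, remaining 2(n - 1) bits independent and uniform."""
    joint = defaultdict(float)
    weight = 1.0 / (2 * n * 2 ** (2 * (n - 1)))
    for i in range(n):
        for b in (0, 1):
            for rest_x in product((0, 1), repeat=n - 1):
                for rest_y in product((0, 1), repeat=n - 1):
                    x = rest_x[:i] + (b,) + rest_x[i:]
                    y = rest_y[:i] + (b,) + rest_y[i:]
                    joint[(x, y)] += weight
    return dict(joint)

if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, round(mutual_information(correlated_strings_joint(n)), 4))
        # I[X : Y] shrinks toward 0 as n grows, while T(X : Y) = Theta(log n)
```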

Information Theory Preliminaries (Contd)

1. Independence: X and Y are independent if Pr[X = x, Y = y] = Pr[X = x] · Pr[Y = y] for all x, y; equivalently, I[X : Y] = 0.
2. Markov Chain: X − Z − Y is called a Markov chain if X and Y are conditionally independent given Z (i.e., I[X : Y | Z] = 0).
3. Data Processing Inequality: X − Z − Y  ⟹  I[X : Z] ≥ I[X : Y].

Transmitting Correlated Variables

Input (to Alice): x ∈_R X
Output (from Bob): y ∈_R Y|_{X=x} (i.e., drawn from the conditional distribution)
Question: What is the minimum expected number of bits |Z| Alice needs to send Bob? (Call it T(X : Y).)

T(X : Y) = min_Z H[Z],
where the minimum is over all Markov chains X − Z − Y (i.e., over all Z such that X and Y are independent given Z).

Asymptotic Version (with error)

Input (to Alice): x_1, x_2, ..., x_m, i.i.d. samples from X.
Output (from Bob): ỹ_1, ỹ_2, ..., ỹ_m such that
  ‖ ((X_1, Y_1), ..., (X_m, Y_m)) − ((X_1, Ỹ_1), ..., (X_m, Ỹ_m)) ‖_1 ≤ λ.

T_λ(X^m : Y^m) = min E[|Z|]

Common Information: C(X : Y) ≜ lim_{λ→0} lim inf_{m→∞} T_λ(X^m : Y^m) / m

Asymptotic Version (with error)

Theorem (Wyner 1975)
C(X : Y) = min_Z I[XY : Z],
where the minimum is taken over all Z such that X − Z − Y.

For instance, in the example above, C(X : Y) = 2 ± o(1): taking Z = (i, b) gives I[XY : Z] = H[Z] − H[Z | XY] ≈ (log n + 1) − (log n − 1) = 2, since X and Y agree in about n/2 positions. That is, one can send significantly less in the asymptotic case (about 2 bits per sample on average) than in the one-shot case (Θ(log n) bits).
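As a numerical sanity check (our own illustration, not from the talk), one can compute the witness value I[XY : W] for W = (i, b) in the example for small n; since X − W − Y is a Markov chain, this quantity upper-bounds C(X : Y), and it tends to 2 bits as n grows:

```python
from collections import defaultdict
from itertools import product
from math import log2

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def common_information_witness(n):
    """I[XY : W] for W = (i, b) in the correlated-strings example."""
    joint = defaultdict(float)                     # keys: ((i, b), (x, y))
    weight = 1.0 / (2 * n * 2 ** (2 * (n - 1)))
    for i in range(n):
        for b in (0, 1):
            for rest_x in product((0, 1), repeat=n - 1):
                for rest_y in product((0, 1), repeat=n - 1):
                    x = rest_x[:i] + (b,) + rest_x[i:]
                    y = rest_y[:i] + (b,) + rest_y[i:]
                    joint[((i, b), (x, y))] += weight
    w_marg, xy_marg = defaultdict(float), defaultdict(float)
    for (w, xy), p in joint.items():
        w_marg[w] += p
        xy_marg[xy] += p
    # I[XY : W] = H[W] + H[XY] - H[W, XY]
    return entropy(w_marg) + entropy(xy_marg) - entropy(joint)

if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, round(common_information_witness(n), 4))
```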

Asymptotic Version with Shared Randomness

Alice and Bob share a random string R.
Input (to Alice): x_1, x_2, ..., x_m, i.i.d. samples from X.
Output (from Bob): ỹ_1, ỹ_2, ..., ỹ_m such that
  ‖ ((X_1, Y_1), ..., (X_m, Y_m)) − ((X_1, Ỹ_1), ..., (X_m, Ỹ_m)) ‖_1 ≤ λ.

T^R_λ(X^m : Y^m) = min E[|Z|]

Theorem (Winter 2002)
  lim_{λ→0} lim inf_{m→∞} T^R_λ(X^m : Y^m) / m = I[X : Y]

Main Result: One-shot Version

Alice and Bob share a random string R.
Input (to Alice): x ∈_R X
Output (from Bob): y ∈_R Y|_{X=x} (i.e., drawn from the conditional distribution)
Communication: T^R(X : Y) = min E[|Z|]

Theorem (Main Result)
  I[X : Y] ≤ T^R(X : Y) ≤ I[X : Y] + 2 log I[X : Y] + O(1)

This is a characterization of mutual information in terms of one-shot communication, up to lower-order logarithmic terms.

One-shot vs. Asymptotic

Typical sets: for large n, n i.i.d. samples of X fall in a "typical set". Within a typical set, all elements are (roughly) equally probable and the number of elements is about 2^{nH[X]}. Asymptotic statements are arguably properties of typical sets rather than of the underlying distributions.

Applications: asymptotic versions are not strong enough for the applications we have in mind.

Outline
- Introduction
- Proof of Main Result (Rejection Sampling Procedure)
- Applications of Main Result (Communication Complexity: Direct Sum Result)

Generating one distribution from another

P, Q: two distributions with Supp(P) ⊆ Supp(Q).

Rejection Sampling Procedure:
Input: an infinite stream q_1, q_2, q_3, q_4, q_5, ... of independently drawn samples from Q.
Output: an index i* such that q_{i*} is a sample from P.

Question: What is the minimum expected length of the index, E[ℓ(i*)], over all such procedures?

Naive procedure

Sample an item x according to P; wait until x appears in the stream and output the corresponding index.

Since the expected waiting time for x is 1/q(x), this gives
  E[ℓ(i*)] ≈ Σ_x p(x) log(1 / q(x)) = S(P‖Q) + H[P],
which can be much larger than the relative entropy S(P‖Q) defined below.
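A minimal Python sketch of the naive procedure (our own illustration; distributions are dicts mapping items to probabilities, and Supp(P) ⊆ Supp(Q) is assumed):

```python
import random

def naive_index(p, q):
    """Sample x ~ P, then scan an i.i.d. Q-stream until x appears and return
    that (1-based) index.  The stream element at the returned index is an exact
    sample from P; the index for item x is Geometric(q(x)), so encoding it
    takes about log(1/q(x)) bits, i.e. sum_x p(x) log(1/q(x)) in expectation."""
    x = random.choices(list(p), weights=list(p.values()))[0]
    q_items, q_weights = list(q), list(q.values())
    i = 1
    while random.choices(q_items, weights=q_weights)[0] != x:
        i += 1
    return i
```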

Relative Entropy

P, Q: two distributions.  S(P‖Q) ≜ Σ_x p(x) log( p(x) / q(x) ).

Properties:
- Asymmetric: in general, S(P‖Q) ≠ S(Q‖P).
- S(P‖Q) < ∞  ⟺  Supp(P) ⊆ Supp(Q).
- S(P‖Q) = 0  ⟺  P ≡ Q.
- S(P‖Q) ≥ 0.

Rejection Sampling Lemma

Lemma (Rejection Sampling Lemma)
There exists a rejection sampling procedure that generates P from Q such that
  E[ℓ(i*)] ≤ S(P‖Q) + 2 log S(P‖Q) + O(1).

Proof of Main Result

Fact: I[X : Y] = E_{x←X}[ S( Y|_{X=x} ‖ Y ) ].

Proof.
Common random string: a sequence of independent samples from the marginal distribution of Y.
On input x, Alice runs the rejection sampling procedure to generate Y|_{X=x} from Y and sends Bob the index it outputs. Then
  E[|Z|] ≤ E_{x←X}[ S(Y|_{X=x} ‖ Y) + 2 log S(Y|_{X=x} ‖ Y) + O(1) ] ≤ I[X : Y] + 2 log I[X : Y] + O(1).
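For completeness, a short derivation of the Fact and of the last inequality (standard, but not spelled out on the slides):

\[
\mathop{\mathrm{E}}_{x \leftarrow X}\bigl[S\bigl(Y|_{X=x} \,\|\, Y\bigr)\bigr]
= \sum_x p(x) \sum_y p(y \mid x) \log \frac{p(y \mid x)}{p(y)}
= \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}
= I[X:Y],
\]
and \(\mathrm{E}_x\bigl[2 \log S(Y|_{X=x} \,\|\, Y)\bigr] \le 2 \log \mathrm{E}_x\bigl[S(Y|_{X=x} \,\|\, Y)\bigr] = 2 \log I[X:Y]\) by Jensen's inequality (concavity of log).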

Greedy Approach to Rejection Sampling

Input: an infinite stream q_1, q_2, q_3, ... of independently drawn samples from Q.
Output: an index i* such that q_{i*} is a sample from P.

Greedy approach: in each iteration, cover as much as possible of the target distribution P with the best available sub-distribution of Q, while maintaining that the total mass covered never exceeds P.

1. Set p_1(x) ← p(x).   [p_i(x) = probability mass of item x that still needs to be covered]
2. Set s_0 ← 0.   [s_i = Pr[Greedy stops before examining the (i+1)-th sample]]
3. For i ← 1 to ∞:
   3.1 Examine sample q_i.
   3.2 If q_i = x, output i with probability min{ p_i(x) / ((1 − s_{i−1}) q(x)), 1 }.
   3.3 Updates: for all x, set
        p_{i+1}(x) ← p_i(x) − (1 − s_{i−1}) q(x) · min{ p_i(x) / ((1 − s_{i−1}) q(x)), 1 } = p_i(x) − α_{i,x},
        s_i ← s_{i−1} + Σ_x α_{i,x}.

Clearly, the distribution generated is P.
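A short Python sketch of this greedy procedure (our own paraphrase of the pseudocode above; it is meant as an illustration for small, explicitly given distributions, not an optimized implementation):

```python
import random

def greedy_rejection_index(p, q, rng=None):
    """Greedy rejection sampling: returns (i, x), where i is a 1-based index into
    an i.i.d. Q-stream and x = q_i is distributed according to P.
    p, q: dicts mapping items to probabilities, with Supp(P) contained in Supp(Q)."""
    rng = rng or random.Random()
    items, q_weights = list(q), [q[x] for x in q]
    p_res = {x: p.get(x, 0.0) for x in q}   # p_i(x): mass of x still to be covered
    s = 0.0                                 # s_{i-1}: Pr[procedure stopped before round i]
    i = 0
    while True:
        i += 1
        x = rng.choices(items, weights=q_weights)[0]          # q_i ~ Q
        denom = (1.0 - s) * q[x]
        accept = 1.0 if denom <= 0.0 else min(p_res[x] / denom, 1.0)
        if rng.random() < accept:
            return i, x
        # Deterministic bookkeeping for the next round (independent of the coin flips):
        alphas = {y: min(p_res[y], (1.0 - s) * q[y]) for y in q}
        for y in q:
            p_res[y] -= alphas[y]
        s += sum(alphas.values())
```

In the one-shot protocol of the main result, the Q-stream plays the role of the shared random string: Alice runs the sampler with P = Y|_{X=x} and Q = the marginal of Y, sends Bob the index i, and Bob outputs the i-th element of the shared stream.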

Expected Index Length

s_i = Pr[Greedy stops before examining sample q_{i+1}],
α_{i,x} = Pr[Greedy outputs x in iteration i].

Note that p(x) = Σ_i α_{i,x}.

Suppose α_{i+1,x} > 0, i.e., the first i iterations were not sufficient to provide the required probability p(x) to item x. Hence
  Σ_{j=1}^{i} (1 − s_{j−1}) q(x) < p(x),
and since the s_j are nondecreasing,
  i (1 − s_i) q(x) < p(x),   i.e.,   i < (1 / (1 − s_i)) · (p(x) / q(x)).

Expected Index Length (Contd)

E[ℓ(i)] ≈ E[log i] = Σ_x Σ_i α_{i,x} log i
  ≤ Σ_x Σ_i α_{i,x} log( (1 / (1 − s_{i−1})) · (p(x) / q(x)) )
  = Σ_x Σ_i α_{i,x} ( log(1 / (1 − s_{i−1})) + log(p(x) / q(x)) )
  = Σ_x p(x) log(p(x) / q(x)) + Σ_i Σ_x α_{i,x} log(1 / (1 − s_{i−1}))
  ≈ S(P‖Q) + ∫_0^1 log(1 / (1 − s)) ds
  = S(P‖Q) + O(1)     (the integral evaluates to log e ≈ 1.44 bits).

Actually, ℓ(i) = log i + 2 log log i (to make the encoding of the index prefix-free); this accounts for the extra 2 log S(P‖Q) term in the final result.

Greedy Approach (Contd)

Clearly, the greedy approach generates the target distribution P. Furthermore, it can be shown that the expected index length is at most
  E[ℓ(i)] ≤ S(P‖Q) + 2 log S(P‖Q) + O(1).

Outline
- Introduction
- Proof of Main Result (Rejection Sampling Procedure)
- Applications of Main Result (Communication Complexity: Direct Sum Result)

Two-Party Communication Complexity Model [Yao]

f : X × Y → Z. Alice receives x ∈ X, Bob receives y ∈ Y. They exchange messages m_1, m_2, m_3, m_4, ..., m_k and output f(x, y): a k-round protocol computing f.

Question: How many bits must Alice and Bob exchange to compute f?

Direct Sum Question

Alice receives (x_1, x_2, ..., x_t) and Bob receives (y_1, y_2, ..., y_t); exchanging messages m_1, m_2, ..., m_k, they must output (f(x_1, y_1), f(x_2, y_2), ..., f(x_t, y_t)).

Direct Sum Question: Does the number of bits communicated increase t-fold?

Direct Product Question: Keeping the number of bits communicated fixed, does the success probability fall exponentially in t?

Communication Complexity Measures

Randomized communication complexity:
  R^k_ε(f) = min_Π max_{(x,y)} (number of bits communicated),
where Π ranges over k-round public-coin randomized protocols that compute f correctly with probability at least 1 − ε on each input (x, y).

Distributional communication complexity: for a distribution μ on the inputs X × Y,
  D^{μ,k}_ε(f) = min_Π max_{(x,y)} (number of bits communicated),
where Π ranges over deterministic k-round protocols for f with average error at most ε under μ.

Theorem (Yao's minmax principle)
  R^k_ε(f) = max_μ D^{μ,k}_ε(f)

Direct Sum Results

R^k_ε(f) vs. R^k_ε(f^t), and D^{μ,k}_ε(f) vs. D^{μ^t,k}_ε(f^t).

1. [Chakrabarti, Shi, Wirth and Yao 2001] Direct sum result for the Equality function in the simultaneous message passing model.
2. [Jain, Radhakrishnan and Sen 2005] Extended the above result to all functions in the simultaneous message passing model and the one-way communication model.
3. [Jain, Radhakrishnan and Sen 2003] For bounded-round communication models, for any f and any product distribution μ,
     D^{μ^t,k}_ε(f^t) ≥ t ( (δ² / (2k)) D^{μ,k}_{ε+2δ}(f) − 2 ).

Improved Direct Sum Result

Theorem
For any function f and any product distribution μ,
  D^{μ^t,k}_ε(f^t) ≥ (t / 2) ( δ · D^{μ,k}_{ε+δ}(f) − O(k) ).

Applying Yao's minmax principle,
  R^k_ε(f^t) ≥ max_{product μ} (t / 2) ( δ · D^{μ,k}_{ε+δ}(f) − O(k) ).

Information Cost

Alice holds X and Bob holds Y, distributed according to μ; Π is a private-coins protocol with transcript M.

Information cost of Π with respect to μ:  IC_μ(Π) = I[XY : M].

For a function f, let IC^{μ,k}_ε(f) = min_Π IC_μ(Π), where Π ranges over k-round private-coins protocols for f with error at most ε under μ.
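As a minimal numerical illustration of the definition (our own toy example, not from the talk): for the deterministic one-round protocol in which Alice simply sends her input bit, the transcript is M = X, so IC_μ(Π) = I[XY : X] = H[X]. The sketch below, with helper names of our choosing, checks this under the uniform distribution on two bits:

```python
from collections import defaultdict
from math import log2

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def information_cost(mu, transcript):
    """IC_mu(Pi) = I[XY : M] for a deterministic protocol with transcript(x, y);
    mu is a dict {(x, y): prob}."""
    joint = defaultdict(float)                       # over pairs ((x, y), m)
    for (x, y), prob in mu.items():
        joint[((x, y), transcript(x, y))] += prob
    xy_marg, m_marg = defaultdict(float), defaultdict(float)
    for (xy, m), prob in joint.items():
        xy_marg[xy] += prob
        m_marg[m] += prob
    return entropy(xy_marg) + entropy(m_marg) - entropy(joint)

uniform = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(information_cost(uniform, lambda x, y: x))     # 1.0 = H[X]
```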

Three Lemmata

Lemma (Direct Sum for Information Cost)
For a product distribution μ,  IC^{μ^t,k}_ε(f^t) ≥ t · IC^{μ,k}_ε(f).

Lemma (IC lower bounds distributional complexity)
IC^{μ,k}_ε(f) ≤ D^{μ,k}_ε(f).

Lemma ((Improved) Message Compression)
D^{μ,k}_{ε+δ}(f) ≤ (1 / δ) [ 2 IC^{μ,k}_ε(f) + O(k) ].

The three lemmata imply the improved direct sum result.
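The slides do not display how the three lemmata combine; the chain (our reconstruction, with the notation above) is:

\[
D^{\mu^t,k}_{\varepsilon}(f^t)
\;\ge\; \mathrm{IC}^{\mu^t,k}_{\varepsilon}(f^t)
\;\ge\; t \cdot \mathrm{IC}^{\mu,k}_{\varepsilon}(f)
\;\ge\; \frac{t}{2}\Bigl(\delta\, D^{\mu,k}_{\varepsilon+\delta}(f) - O(k)\Bigr),
\]
where the first inequality is the second lemma (applied to f^t and μ^t), the second is the direct sum lemma for information cost, and the third is the message compression lemma rearranged.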

Direct Sum for Information Cost

For a product distribution μ,  IC^{μ^t,k}_ε(f^t) ≥ t · IC^{μ,k}_ε(f).

Let Π be a private-coins protocol achieving I = IC^{μ^t,k}_ε(f^t): Alice holds X_1, X_2, ..., X_t, Bob holds Y_1, Y_2, ..., Y_t, and M is the transcript.

Chain rule: I[XY : M] ≥ Σ_{i=1}^t I[X_i Y_i : M].

Claim: For each i, I[X_i Y_i : M] ≥ IC^{μ,k}_ε(f).

Proof: On input X_i and Y_i, Alice and Bob fill in the other coordinates themselves (using private coins, which is possible because μ is a product distribution) and run the protocol Π above.
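The chain-rule inequality uses the independence of the coordinate pairs under the product distribution μ^t; written out (our reconstruction):

\[
I[XY : M] \;=\; \sum_{i=1}^{t} I[X_i Y_i : M \mid X_{<i} Y_{<i}]
\;=\; \sum_{i=1}^{t} \bigl( H[X_i Y_i] - H[X_i Y_i \mid M, X_{<i} Y_{<i}] \bigr)
\;\ge\; \sum_{i=1}^{t} I[X_i Y_i : M],
\]
using \(H[X_i Y_i \mid X_{<i} Y_{<i}] = H[X_i Y_i]\) (independence) and the fact that conditioning does not increase entropy.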

IC lower bounds distributional complexity

IC^{μ,k}_ε(f) ≤ D^{μ,k}_ε(f).

Proof.
Let Π be a protocol that achieves D^{μ,k}_ε(f) and let M be its transcript. Then
  D^{μ,k}_ε(f) ≥ E[|M|] ≥ H[M] ≥ I[XY : M] ≥ IC^{μ,k}_ε(f).

Message Compression

To prove: D^{μ,k}_{ε+δ}(f) ≤ (1 / δ) [ 2 IC^{μ,k}_ε(f) + O(k) ].

The private-coins protocol Π, with messages m_1, m_2, ..., m_k and information cost I, is simulated by a public-coins protocol Π′ in which the i-th message m_i is replaced by a compressed message z_i, with
  Σ_{i=1}^k E[|Z_i|] ≤ 2I + O(k).

By the chain rule,
  I = I[XY : M] = I[XY : M_1 M_2 ... M_k] = Σ_{i=1}^k I[XY : M_i | M_1 M_2 ... M_{i−1}].

So it suffices to prove: for all i,  E[|Z_i|] ≤ 2 I[XY : M_i | M_1 M_2 ... M_{i−1}] + O(1).

Message Compression (Contd)

To prove: for all i,  E[|Z_i|] ≤ 2 I[XY : M_i | M_1 M_2 ... M_{i−1}] + O(1).

Suppose Alice sends the i-th message. Conditioned on all earlier messages m_1, ..., m_{i−1}, the message M_i is independent of Bob's input Y. Hence,
  I_i = I[XY : M_i | m_1 ... m_{i−1}] = I[X : M_i | m_1 ... m_{i−1}].

The Main Result now implies that M_i can be generated on Bob's side with Alice sending only
  I_i + 2 log I_i + O(1) ≤ 2 I_i + O(1)
bits: Bob knows the distribution of M_i given m_1, ..., m_{i−1}, Alice knows its distribution given x and m_1, ..., m_{i−1}, and the expected relative entropy between the two is exactly I_i.
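The slides leave the final accounting implicit; a plausible reconstruction of the standard argument (summing the per-round bounds, then converting expected to worst-case communication via Markov's inequality and fixing the public coins) is:

\[
\sum_{i=1}^{k} \mathrm{E}[|Z_i|] \;\le\; \sum_{i=1}^{k} \bigl(2 I_i + O(1)\bigr) \;=\; 2I + O(k).
\]
Truncating the public-coins protocol \(\Pi'\) whenever its total communication exceeds \((2I + O(k))/\delta\) adds at most \(\delta\) to the error (Markov's inequality), and fixing the best setting of the public coins yields a deterministic k-round protocol, so
\[
D^{\mu,k}_{\varepsilon+\delta}(f) \;\le\; \frac{1}{\delta}\bigl[\, 2\,\mathrm{IC}^{\mu,k}_{\varepsilon}(f) + O(k) \,\bigr].
\]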

Summarizing Results

- A characterization of mutual information and relative entropy in terms of communication complexity (modulo lower-order log terms).
- An improved direct sum result for communication complexity.

Thank You