
Communication Complexity 16:198:671, 2/15/2010

Lecture 6

Lecturer: Nikos Leonardos    Scribe: Troy Lee

1 Information theory lower bounds

1.1 Entropy basics

Let $\Omega$ be a finite set and $P$ a probability distribution on $\Omega$. The entropy of a random variable $X$ distributed according to $P$ is defined as
$$H(X) = \sum_{x \in \Omega} P(x) \log \frac{1}{P(x)}.$$
Intuitively, entropy quantifies the amount of uncertainty in a distribution. If the distribution is concentrated on a single element the entropy is zero; for the uniform distribution on $\Omega$ the entropy is maximal, $\log |\Omega|$. These examples mark the extremes of the range of entropy: $0 \le H(X) \le \log |\Omega|$. Entropy can also be interpreted in terms of compression: Shannon's source coding theorem states that the optimal expected codeword length for encoding a random variable $X$ is the entropy of $X$, up to one bit.

We will also make use of conditional entropy. First consider conditioning on a single outcome:
$$H(X \mid Y = y) = \sum_x P(x \mid y) \log \frac{1}{P(x \mid y)}.$$
This quantity can actually be larger than $H(X)$. Imagine the case where $Y$ is a uniformly random bit and
$$X = \begin{cases} \text{a fresh random bit} & \text{if } Y = 0 \\ 0 & \text{if } Y = 1. \end{cases}$$
Then $H(X) < 1$, yet $H(X \mid Y = 0) = 1$. The conditional entropy is the expectation of this quantity over $Y$:
$$H(X \mid Y) = \mathbb{E}_y\left[H(X \mid Y = y)\right] \le H(X).$$
Unlike conditioning on a single outcome, this quantity is at most the entropy of $X$. Finally, the joint entropy of $X, Y$ satisfies the chain rule $H(X, Y) = H(X) + H(Y \mid X)$.
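As a quick numerical check of the example above (a minimal Python sketch, not part of the original notes; the helper name entropy is just illustrative), we can compute $H(X)$, $H(X \mid Y = 0)$, and $H(X \mid Y)$ assuming $Y$ is a uniform bit:

import math

def entropy(dist):
    # Shannon entropy (in bits) of a distribution given as {outcome: probability}.
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

# Joint distribution of (X, Y) for the example above: Y is a uniform bit,
# X is a fresh random bit when Y = 0 and is fixed to 0 when Y = 1.
joint = {(0, 0): 1/4, (1, 0): 1/4, (0, 1): 1/2}   # keys are (x, y)

p_y = {y: sum(p for (x2, y2), p in joint.items() if y2 == y) for y in (0, 1)}
p_x = {x: sum(p for (x2, y2), p in joint.items() if x2 == x) for x in (0, 1)}
cond_x_given_y = {y: {x2: p / p_y[y] for (x2, y2), p in joint.items() if y2 == y}
                  for y in (0, 1)}

H_X = entropy(p_x)                           # H(1/4), about 0.811 bits
H_X_given_Y0 = entropy(cond_x_given_y[0])    # 1 bit, larger than H(X)
H_X_given_Y = sum(p_y[y] * entropy(cond_x_given_y[y]) for y in (0, 1))  # 0.5 bits
print(H_X, H_X_given_Y0, H_X_given_Y)

This prints approximately 0.811, 1.0, and 0.5, illustrating that conditioning on a particular outcome can increase entropy while conditioning on $Y$ as a whole cannot.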

1.2 Mutual information

For lower bounds we will use mutual information. For two random variables $Z, \Pi$ it is defined as
$$I(Z; \Pi) = H(Z) - H(Z \mid \Pi) = H(\Pi) - H(\Pi \mid Z) = H(\Pi) + H(Z) - H(\Pi, Z).$$
In applications to communication complexity, typically $Z$ will be the pair of inputs drawn from $X \times Y$ and $\Pi$ will be the protocol transcript.

To see how this can be used to show lower bounds, let us look at a warmup example, the index function. The index function is defined as $\mathrm{Index}: \{0,1\}^n \times [n] \to \{0,1\}$, where $\mathrm{Index}(x, i) = x_i$. Consider the one-way complexity of the index function from Alice to Bob. Let $\Pi(X, R)$ be a random variable over the messages of Alice, which depends on the distribution $X$ over Alice's inputs and Alice's random coins $R$. Notice that $H(\Pi)$ is a lower bound on the maximum length of a message of Alice. We will actually lower bound the potentially smaller quantity $I(X; \Pi)$.

Use the uniform distribution over Alice's inputs. Then we have
$$I(X; \Pi) = H(X) - H(X \mid \Pi) = n - H(X \mid \Pi).$$
Now it remains to upper bound $H(X \mid \Pi)$. Using the chain rule $H(Y, Z) = H(Y) + H(Z \mid Y)$, together with the fact that conditioning does not increase entropy, we have
$$H(X \mid \Pi) \le \sum_j H(X_j \mid \Pi).$$
As we are dealing with one-way communication, conditioning on a particular input to Bob does not change Alice's message. Thus we have $H(X_j \mid \Pi) = H(X_j \mid \Pi, B = j)$, that is, given that Bob's input is actually $j$. Finally, by correctness of the protocol this entropy must be small: at most $H(\epsilon)$, the binary entropy, if the error probability is $\epsilon$ (this is Fano's inequality, since Bob can predict $X_j$ from $\Pi$ with error at most $\epsilon$). Putting everything together we have
$$R_\epsilon^{A \to B}(\mathrm{Index}) \ge I(X; \Pi) \ge (1 - H(\epsilon))\, n.$$
This example comes from Ablayev, "Lower bounds for one-way probabilistic communication complexity and their application to space complexity," Theoretical Computer Science, Vol. 157(2), pp. 139-159, 1996.
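To get a feel for the strength of the bound, here is a minimal sketch (my own, not from the notes; binary_entropy and index_lower_bound are made-up helper names) that evaluates $(1 - H(\epsilon))\, n$ for a few error rates:

import math

def binary_entropy(eps):
    # Binary entropy H(eps) in bits.
    if eps in (0.0, 1.0):
        return 0.0
    return eps * math.log2(1 / eps) + (1 - eps) * math.log2(1 / (1 - eps))

def index_lower_bound(n, eps):
    # The one-way lower bound (1 - H(eps)) * n from the argument above.
    return (1 - binary_entropy(eps)) * n

# With error 1/8 the bound is about 0.456*n; with error 1/3 it is about 0.082*n.
for eps in (1/8, 1/4, 1/3):
    print(eps, index_lower_bound(1000, eps))

So for any constant error probability, one-way protocols for Index need linear communication.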

2 Set Intersection Lower Bound

We will now show an $\Omega(n)$ lower bound on the randomized two-party complexity of the set intersection problem. We will follow the proof of Bar-Yossef et al., "An information statistics approach to data stream and communication complexity," JCSS 68, pp. 702-732, 2004. We refer the reader there for full details.

Consider again a random variable $\Pi((X, Y), R_A, R_B)$ over transcripts. This variable depends on the distribution of the inputs $(X, Y)$, Alice's random bits $R_A$, and Bob's random bits $R_B$. It is actually important for this framework that we work with private-coin complexity. Define the information cost of the protocol $\Pi$ as $\mathrm{IC}(\Pi) = I((X, Y); \Pi)$. The information cost of a function is then
$$\mathrm{IC}(f) = \min_{\Pi \text{ correct}} \mathrm{IC}(\Pi).$$

2.1 Direct sum

The idea of the proof is to show a direct sum theorem for the information cost measure: the information cost of set intersection must be at least $n$ times the information cost of the one-bit AND function. To do this we will use the following fact.

Fact 1. If $Z = (Z_1, \ldots, Z_n)$ are mutually independent, then $I(Z; \Pi) \ge I(Z_1; \Pi) + \cdots + I(Z_n; \Pi)$.

We will define a distribution on inputs where $(X_1, Y_1), \ldots, (X_n, Y_n)$ are mutually independent. In this case,
$$I((X, Y); \Pi) \ge I((X_1, Y_1); \Pi) + \cdots + I((X_n, Y_n); \Pi).$$
The goal will now be to relate $I((X_i, Y_i); \Pi)$ to the information cost of the one-bit AND function. The distribution we will use on $(X_i, Y_i)$ is
$$P(0, 0) = 1/2, \quad P(1, 0) = 1/4, \quad P(0, 1) = 1/4.$$
Notice that this distribution only gives weight to inputs which evaluate to zero on AND. This is still OK: in the definition of information cost we minimize over correct protocols, so the trivial protocol which always outputs zero, although it works on this distribution, is excluded.

It will be useful to view this distribution another way, as a mixture of product distributions. Introduce another variable $D_i$ uniformly distributed over $\{0, 1\}$, and define
$$X_i = \begin{cases} 0 & \text{if } D_i = 0 \\ \text{a random bit} & \text{if } D_i = 1 \end{cases} \qquad Y_i = \begin{cases} \text{a random bit} & \text{if } D_i = 0 \\ 0 & \text{if } D_i = 1. \end{cases}$$
Now for a fixed value of $D_i$ we have a product distribution on $(X_i, Y_i)$, and the mixture of these two product distributions is exactly the distribution defined above.
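The mixture view is easy to sanity-check by simulation. The following sketch (illustrative only, not from the notes; sample_pair is a made-up helper) samples $(X_i, Y_i)$ through $D_i$ and confirms the marginal matches $P(0,0) = 1/2$, $P(1,0) = P(0,1) = 1/4$:

import random
from collections import Counter

def sample_pair(rng):
    # Sample (X, Y) via the auxiliary variable D, as in the mixture description above.
    d = rng.randint(0, 1)
    if d == 0:
        return (0, rng.randint(0, 1))   # X fixed to 0, Y a random bit
    else:
        return (rng.randint(0, 1), 0)   # X a random bit, Y fixed to 0

rng = random.Random(0)
N = 100000
counts = Counter(sample_pair(rng) for _ in range(N))
# Empirically close to P(0,0)=1/2, P(1,0)=1/4, P(0,1)=1/4, and (1,1) never occurs.
for pair in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(pair, counts[pair] / N)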

The key to the direct sum property is the following claim.

Claim 2. $I((X_i, Y_i); \Pi \mid D) \ge \mathrm{IC}(\mathrm{AND})$.

Proof. We design a protocol for AND by simulating the protocol for set intersection with the variables outside of $(X_i, Y_i)$ fixed. For notational convenience, let $i = 1$. Then
$$I((X_1, Y_1); \Pi \mid D) = \mathbb{E}_{d_2, \ldots, d_n}\left[I((X_1, Y_1); \Pi \mid D_1, D_2 = d_2, \ldots, D_n = d_n)\right].$$
We will show that each term of this expectation is at least $\mathrm{IC}(\mathrm{AND})$, which then gives the claim. Consider a fixing $D_2 = d_2, \ldots, D_n = d_n$. We design a protocol for $\mathrm{AND}(A, B)$ using the protocol $\Pi$. As $D_j$ is fixed for $j = 2, \ldots, n$, the distribution over $(X_j, Y_j)$ is a product distribution and so can be sampled by Alice and Bob without communication using their private random bits. Alice and Bob then run the protocol $\Pi$ on the input $(A, B), (x_2, y_2), \ldots, (x_n, y_n)$.

This claim gives that $\mathrm{IC}(\mathrm{SI}) \ge n \cdot \mathrm{IC}(\mathrm{AND})$. Now we just have to show a lower bound on the information complexity of the AND function on one bit.

2.2 One-bit AND function

We want to lower bound
$$I((A, B); \Pi \mid D) = \frac{1}{2}\, I((A, B); \Pi \mid D = 0) + \frac{1}{2}\, I((A, B); \Pi \mid D = 1).$$
As these two terms are symmetric, we can just focus on $I((A, B); \Pi \mid D = 1) = I(A; \Pi(A, 0) \mid D = 1)$. If the distribution on transcripts is very different for $\Pi(0, 0)$ and $\Pi(1, 0)$, then we will be able to determine $A$ from looking at the transcript, implying the information complexity is large.

To do this, we will transform the problem from one about mutual information to one about a metric, the Hellinger distance. Instead of viewing $\Pi(a, b)$ as a probability distribution, consider instead $\Psi(a, b)$, the unit vector which is the entrywise square root of $\Pi(a, b)$. With this transformation, Hellinger distance simply becomes a scaled version of Euclidean distance:
$$h(\Psi_1, \Psi_2) = \frac{1}{\sqrt{2}}\, \|\Psi_1 - \Psi_2\|.$$

We will need three key properties of Hellinger distance and its relation to mutual information. See the above paper for proofs of these statements.

1. Mutual information and Hellinger distance: Let $u, v \in \{0,1\}^2$ be two inputs to AND, and let $U$ be uniformly distributed over $\{u, v\}$. As before, let $\Psi(u)$ be the unit vector formed by the entrywise square root of $\Pi(u)$. Then
$$I(U; \Pi) \ge \frac{1}{2}\, \|\Psi(u) - \Psi(v)\|^2.$$

2. Soundness: If $\mathrm{AND}(u) \ne \mathrm{AND}(v)$ and $\Pi$ is a protocol with error at most $\epsilon$, then
$$\frac{1}{2}\, \|\Psi(u) - \Psi(v)\|^2 \ge 1 - 2\sqrt{\epsilon}.$$

3. Cut and paste: Let $u = (x, y)$, $v = (x', y')$ and $u' = (x, y')$, $v' = (x', y)$. Then $\|\Psi(u) - \Psi(v)\| = \|\Psi(u') - \Psi(v')\|$.
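To make the square-root-vector view concrete, here is a small sketch (not from the notes) that computes the Hellinger distance of two toy transcript distributions as $\frac{1}{\sqrt{2}}$ times the Euclidean distance of their square-root vectors; the distributions are made up purely for illustration:

import math

def hellinger(p, q):
    # Hellinger distance between two distributions over the same finite set,
    # computed as (1/sqrt(2)) times the Euclidean distance of the square-root vectors.
    keys = set(p) | set(q)
    sq = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(sq / 2.0)

# Two toy "transcript" distributions over three possible transcripts t1, t2, t3.
P_a = {"t1": 0.5, "t2": 0.3, "t3": 0.2}
P_b = {"t1": 0.1, "t2": 0.3, "t3": 0.6}
print(hellinger(P_a, P_b))   # strictly between 0 and 1
print(hellinger(P_a, P_a))   # 0 for identical distributions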

Given these three properties we can quickly finish the proof:
$$\frac{1}{2}\|\Psi(0,0) - \Psi(1,0)\|^2 + \frac{1}{2}\|\Psi(0,0) - \Psi(0,1)\|^2 \ge \frac{1}{4}\left(\|\Psi(0,0) - \Psi(1,0)\| + \|\Psi(0,0) - \Psi(0,1)\|\right)^2$$
$$\ge \frac{1}{4}\|\Psi(1,0) - \Psi(0,1)\|^2 = \frac{1}{4}\|\Psi(0,0) - \Psi(1,1)\|^2 \ge \frac{1}{2}\left(1 - 2\sqrt{\epsilon}\right).$$
Here the first inequality follows by Cauchy-Schwarz, the second by the triangle inequality, the equality by cut and paste, and the last inequality by soundness. By the first property, the left-hand side is a lower bound on $I((A,B); \Pi \mid D = 1) + I((A,B); \Pi \mid D = 0) = 2\, I((A,B); \Pi \mid D)$, so $I((A,B); \Pi \mid D) \ge \frac{1}{4}(1 - 2\sqrt{\epsilon})$. Combined with the direct sum argument, this gives an $\Omega(n)$ lower bound on the randomized communication complexity of set intersection for any constant error $\epsilon < 1/4$.
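The first two steps of this chain hold for arbitrary vectors, so they can be checked mechanically; the cut-and-paste and soundness steps, by contrast, rely on properties of protocol transcripts. Here is a minimal sketch (my own, not from the notes) verifying the two generic steps on random unit vectors:

import math
import random

rng = random.Random(1)

def rand_unit(dim):
    # A random unit vector, standing in for some square-root transcript vector Psi.
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# The first two steps of the chain hold for arbitrary vectors:
#   (a^2 + b^2)/2 >= (a + b)^2 / 4                     (Cauchy-Schwarz)
#   a + b >= ||u - w|| when a = ||v - u||, b = ||v - w||  (triangle inequality)
u, v, w = rand_unit(8), rand_unit(8), rand_unit(8)
a, b = dist(v, u), dist(v, w)
assert (a**2 + b**2) / 2 >= (a + b)**2 / 4 - 1e-12
assert a + b >= dist(u, w) - 1e-12
print("Cauchy-Schwarz and triangle-inequality steps verified on random vectors.")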