Information Complexity and Applications. Mark Braverman, Princeton University and IAS. FoCM'17, July 17, 2017.

Coding vs complexity: a tale of two theories. Coding theory: the goal is data transmission; there are different channels; the big questions are answered with theorems (e.g., a BSC(1/3) can transmit 0.052 trits per application). Computational complexity: the goal is computation; there are models of computation; the big questions are conjectures ("one day, we'll prove that EXP requires > n^3 NAND gates").
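As a sanity check on the "0.052 trits" figure, here is a minimal Python computation, assuming "BSC 1/3" means a binary symmetric channel with crossover probability 1/3: its Shannon capacity is 1 - H(1/3) bits per use, converted to trits by dividing by log_2(3).

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

capacity_bits = 1 - h2(1 / 3)                    # Shannon capacity of BSC(1/3), bits per use
capacity_trits = capacity_bits / math.log2(3)    # convert bits to trits

print(f"{capacity_bits:.4f} bits/use = {capacity_trits:.4f} trits/use")
# -> 0.0817 bits/use = 0.0515 trits/use, i.e. about 0.052 trits per channel application
```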

A key difference. Information theory is a very effective language: it fits many coding situations perfectly. Shannon's channel coding theory is "continuous": turn the channel into a continuous resource; separate the communication channel from how it is used.

Theory of computation is discrete. Von Neumann (~1948): "Thus formal logic is, by the nature of its approach, cut off from the best cultivated portions of mathematics, and forced onto the most difficult part of the mathematical terrain, into combinatorics. The theory of automata will have to share this unattractive property of formal logic. It will have to be, from the mathematical point of view, combinatorial rather than analytical."

Overview. Today we will discuss extending the language of information theory so that it applies to problems in complexity theory.

Background: Shannon's entropy. Assume a lossless binary channel. A message X is distributed according to some prior μ. The inherent number of bits it takes to transmit X is given by its entropy H(X) = Σ_x μ[X = x] log_2(1/μ[X = x]). (Figure: A sends X ~ μ over the communication channel to B.)

Shannon's Noiseless Coding Theorem. The cost of communicating many copies of X scales as H(X). Shannon's source coding theorem: let C_n(X) be the cost of transmitting n independent copies of X. Then the amortized transmission cost satisfies lim_{n→∞} C_n(X)/n = H(X). This operationalizes H(X).

H(X) is nicer than C_n(X). Consider sending a uniform trit T ∈ {1,2,3}. Using the prefix-free encoding {0, 10, 11}, sending one trit T_1 costs C_1 = 5/3 ≈ 1.667 bits. Sending two trits (T_1, T_2) costs C_2 = 29/9 bits using the encoding {000, 001, 010, 011, 100, 101, 110, 1110, 1111}. The cost per trit is 29/18 ≈ 1.611 < C_1, so C_1 + C_1 > C_2.

H(X) is nicer than C_n(X). C_1 = 15/9, C_2 = 29/9, so C_1 + C_1 > C_2. The entropy is H(T) = log_2(3) ≈ 1.585. We have H(T_1 T_2) = log_2(9) = H(T_1) + H(T_2): H(T) is additive over independent variables. C_n = n log_2(3) ± o(n).
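The per-trit costs and the entropy above can be checked directly; the following short Python sketch recomputes C_1, C_2 and H(T) for the two prefix-free codes from the slides.

```python
import math

# Cost of the prefix-free code {0, 10, 11} for a single uniform trit
code1 = {1: "0", 2: "10", 3: "11"}
C1 = sum(len(w) for w in code1.values()) / 3       # = 5/3 ≈ 1.667 bits per trit

# Cost of the 9-codeword prefix-free code for a pair of trits
code2_words = ["000", "001", "010", "011", "100", "101", "110", "1110", "1111"]
C2 = sum(len(w) for w in code2_words) / 9          # = 29/9 ≈ 3.222 bits, i.e. ≈ 1.611 bits per trit

H = math.log2(3)                                   # entropy of one uniform trit ≈ 1.585 bits
print(C1, C2 / 2, H)  # per-trit cost drops toward H(T) as more trits are blocked together
```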

Today. We will discuss generalizing information and coding theory to interactive computation scenarios: using interaction over a channel to solve a computational problem. In computer science, the amount of communication needed to solve a problem is studied in the area of communication complexity.

Communication complexity [Yao '79]. Considers functionalities requiring interactive computation. We focus on the two-party setting first: A and B implement a functionality F(X,Y), e.g. F(X,Y) = "X = Y?". (Figure: A holds X, B holds Y; together they compute F(X,Y).)

Communication complexity. Goal: implement a functionality F(X,Y). A protocol π(X,Y) computing F(X,Y): using shared randomness R, A (holding X) and B (holding Y) exchange messages m_1(X,R), m_2(Y,m_1,R), m_3(X,m_1,m_2,R), ..., until both parties can output F(X,Y). The communication cost CC(π) is the number of bits exchanged.

Communication complexity. (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε), where the error ε is measured w.r.t. μ: CC(F, μ, ε) := min over protocols π with Pr_μ[π(X,Y) ≠ F(X,Y)] ≤ ε of CC(π). (Randomized, worst-case) communication complexity: CC(F, ε), with error ≤ ε on all inputs. Yao's minimax: CC(F, ε) = max_μ CC(F, μ, ε).

A tool for unconditional lower bounds about computation: streaming; data structures; distributed computing; VLSI design lower bounds; circuit complexity. One of two main tools for unconditional lower bounds. Connections to other problems in complexity theory (e.g. hardness amplification).

Set disjointness and intersection. Alice and Bob are each given a set, X ⊆ {1,...,n} and Y ⊆ {1,...,n} (these can be viewed as vectors in {0,1}^n). Intersection: Int_n(X,Y) = X ∩ Y. Disjointness: Disj_n(X,Y) = 1 if X ∩ Y = ∅, and 0 otherwise. A non-trivial theorem [Kalyanasundaram-Schnitger '87, Razborov '92]: CC(Disj_n, 1/4) = Ω(n). Exercise: solve Disj_n with error close to 0 (say, 1/n) in 0.9n bits of communication. Can you do 0.6n? 0.4n?

Direct sum. Int_n is just n copies of the 2-bit AND. Disj_n is (the negation of) a disjunction of 2-bit ANDs. What is the connection between the communication cost of one AND and the communication cost of n ANDs? Understanding the connection between the hardness of a problem and the hardness of its pieces is a natural approach to lower bounds.

How does CC scale with copies? How does CC(F^n, μ^n, ε)/n compare to CC(F, μ, ε)? Recall: lim_{n→∞} C_n(X)/n = H(X). Information complexity is the corresponding scaling limit for CC(F^n, μ^n, ε)/n. It helps us understand problems composed of smaller problems.

Interactive information complexity. Information complexity is to communication complexity as Shannon's entropy is to transmission cost.

Information theory in two slides. For two (potentially correlated) variables X, Y, the conditional entropy of X given Y is the amount of uncertainty left in X given Y: H(X|Y) := E_{y~Y} H(X | Y = y). One can show H(XY) = H(Y) + H(X|Y); this important fact is known as the chain rule. If X is independent of Y, then H(XY) = H(X) + H(Y|X) = H(X) + H(Y).

Mutual information. The mutual information is defined as I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X): how much does knowing X reduce the uncertainty of Y? Conditional mutual information: I(X;Y | Z) := H(X|Z) - H(X|YZ). These have a simple, intuitive interpretation.
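A small illustration of these definitions: the sketch below computes I(X;Y) = H(X) + H(Y) - H(XY) from a joint probability table (the particular joint distribution is a made-up example, not taken from the talk).

```python
import math

def entropy(p):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(XY) for a joint distribution {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    return entropy(px) + entropy(py) - entropy(joint)

# Toy joint distribution: X is a uniform bit, and Y = X with probability 3/4
joint = {(0, 0): 3/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 3/8}
print(mutual_information(joint))  # ≈ 0.189 bits: knowing X removes that much uncertainty about Y
```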

The information cost of a protocol. Prior distribution: (X, Y) ~ μ. A (holding X) and B (holding Y) run protocol π, producing transcript Π. IC(π, μ) = I(Π; Y | X) + I(Π; X | Y): what Alice learns about Y plus what Bob learns about X. (Note that IC depends on both π and μ.)

Example. F is "X = Y?". μ is a distribution where X = Y w.p. ½, and (X, Y) are independent random strings w.p. ½. Protocol: Alice sends SHA-256(X) [256 bits]; Bob replies with "X = Y?" [1 bit]. IC(π, μ) = I(Π; Y | X) + I(Π; X | Y) ≈ 1 + 129 = 130 bits: what Alice learns about Y plus what Bob learns about X.
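A minimal sketch of this protocol using Python's hashlib. The message lengths (256 + 1 bits) match the slide; the ≈130-bit information cost is a property of the prior μ and is not visible in the code itself.

```python
import hashlib

def equality_protocol(x: bytes, y: bytes):
    """Alice sends SHA-256(x); Bob compares it with SHA-256(y) and announces the answer.
    Hash collisions make the answer wrong only with negligible probability."""
    msg_alice = hashlib.sha256(x).digest()               # Alice's 256-bit message
    answer = msg_alice == hashlib.sha256(y).digest()     # Bob's 1-bit reply: "X = Y?"
    return msg_alice, answer

print(equality_protocol(b"hello", b"hello")[1])  # True
print(equality_protocol(b"hello", b"world")[1])  # False
```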

The information complexity of a problem. Communication complexity: CC(F, μ, ε) = min over π computing F with error ε of CC(π). Analogously: IC(F, μ, ε) := inf over π computing F with error ε of IC(π, μ); the infimum is needed! (Easy) fact: IC(F, μ, ε) ≤ CC(F, μ, ε).

Information = amortized communication. Recall: lim_{n→∞} C_n(X)/n = H(X). Theorem [B.-Rao '11]: lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε). Corollary: lim_{n→∞} CC(Int_n, 0^+)/n = IC(AND, 0).

The two-bit AND. Alice and Bob each have a bit: X, Y ∈ {0,1}, distributed according to some μ on {0,1}^2. They want to compute X ∧ Y while revealing as little as possible about their inputs to each other (w.r.t. the worst μ). The answer, IC(AND, 0), is a number between 1 and 2.

The two-bit AND. Results [B.-Garg-Pankratov-Weinstein '13]: IC(AND, 0) ≈ 1.4922 bits. We find the value of IC(AND, μ, 0) for all priors μ and exhibit the information-theoretically optimal protocol for computing the AND of two bits. Studying IC(AND, μ, 0) as a function R_+^4 / R_+ → R_+ is a functional-minimization problem subject to a family of constraints (cf. the construction of harmonic functions).

The two-bit AND. We adopt a guess-and-verify strategy, although the general question of computing the information complexity of a function from its truth table is a very interesting one.

The optimal protocol for AND. Inputs X ∈ {0,1} and Y ∈ {0,1}. If X = 1, Alice sets A = 1; if X = 0, she draws A ~ U[0,1]. If Y = 1, Bob sets B = 1; if Y = 0, he draws B ~ U[0,1]. (Figure: a shared clock rising from 0 to 1.)

The optimal protocol for AND: raise your hand when your number is reached. The shared clock sweeps continuously from 0 to 1, and each player raises a hand when it reaches their value (A or B); the values A, B are set from X, Y as on the previous slide.
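A Monte-Carlo sketch of this protocol: a hand raised before the clock reaches 1 reveals a 0 input (so the answer is 0), and if no hand is raised by time 1 both inputs must be 1. The sketch only checks correctness; the information-optimality analysis from [B.-Garg-Pankratov-Weinstein '13] is the substantive part and is not reproduced here.

```python
import random

def and_protocol(x: int, y: int) -> int:
    """'Raise your hand' protocol for AND.
    Each player's value is 1 if their input bit is 1, and uniform in [0,1) if it is 0.
    The first player whose value the clock reaches speaks, revealing a 0 input;
    if nobody speaks before time 1, both inputs are 1."""
    a = 1.0 if x == 1 else random.random()
    b = 1.0 if y == 1 else random.random()
    first_to_speak = min(a, b)
    return 1 if first_to_speak >= 1.0 else 0

# Sanity check: the protocol always outputs AND(x, y)
for x in (0, 1):
    for y in (0, 1):
        assert all(and_protocol(x, y) == (x & y) for _ in range(1000))
```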

Corollary: communication complexity of intersection. Corollary: lim_{ε→0} CC(Int_n, ε) ≈ 1.4922·n ± o(n); specifically, e.g., CC(Int_n, 1/n) ≈ 1.4922·n ± o(n). Note: this requires ω(1) rounds of interaction; using r rounds results in an extra Θ(n/r^2) cost!

Communication complexity of Disjointness. With some additional work, we obtain a tight bound on the communication complexity of Disj_n with tiny error: lim_{ε→0} CC(Disj_n, ε) = C_DISJ·n ± o(n), where C_DISJ := max_{μ: μ(1,1)=0} IC(AND, μ, 0) ≈ 0.4827. Intuition: Disj_n is an n-wise repetition of AND where the probability of a (1,1) input is very low (≪ 1).

Beyond two parties. Disjointness in the coordinator model [B.-Ellen-Oshman-Pitassi-Vaikuntanathan '13]. There are k players, each player p_i holding a subset S_i ⊆ {1,...,n}. They want to decide whether the intersection ∩_i S_i is empty.

Disj in the coordinator model. k players, input length n. Naïve protocol: O(n·k) communication. This turns out to be asymptotically optimal! The argument uses information complexity; the hard part is to design the hard distribution and the right information-cost measure.

The hard distribution. S_i ⊆ {1,...,n}; we want to decide whether the intersection ∩_i S_i is empty. The distribution should have very few (close to 0) intersections.

Attempt #1: the hard distribution. Plant many 0's (e.g. 50%). Then the coordinator can keep querying players in each coordinate until she finds a 0: ~O(n) communication. (Figure: a k×n 0/1 matrix with many 0's in every column.)

Attempt #2: the hard distribution. Plant one zero in each coordinate. Then each player can simply send its 0's: still only O(n log n) communication. (Figure: a k×n 0/1 matrix with exactly one 0 in each column.)

The hard distribution. Mix the two attempts. Each coordinate i has a random variable M_i ~ B_{1/3}. If M_i = 0, plant many 0's (e.g. 50%); if M_i = 1, plant a single 0. (Figure: a k×n matrix whose columns follow one of the two patterns according to M_i.)
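A sketch of sampling from this mixed distribution. The 1/3 bias for M_i and the 50% density of planted 0's follow the slide; the remaining details (how the single 0 is placed, independence across players) are simplifying assumptions for illustration, not the paper's precise construction.

```python
import random

def hard_instance(k: int, n: int):
    """Sample a k-player, n-coordinate instance from the mixed hard distribution (sketch).
    M[i] = 0: plant many 0's (each player's bit is 0 with prob 1/2).
    M[i] = 1: plant a single 0 at one random player; everyone else gets 1."""
    M = [1 if random.random() < 1 / 3 else 0 for _ in range(n)]
    X = [[1] * n for _ in range(k)]              # X[j][i] = bit of player j in coordinate i
    for i in range(n):
        if M[i] == 0:
            for j in range(k):
                X[j][i] = 0 if random.random() < 0.5 else 1
        else:
            X[random.randrange(k)][i] = 0        # the single planted 0
    return M, X

M, X = hard_instance(k=4, n=10)
```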

The information cost notion. Assume the coordinator knows the M_i's, and player j knows the X_ij's. The information cost: what the coordinator learns about the X_ij's plus what each player learns about the M_i's. Proving that the sum total is Ω(n·k) requires some work, but the hardest part is the definitions.

Intuition for hardness. Focus on a single (i, j) pair: the i-th coordinate and the j-th player. (M_i, X_ij) is equally likely to be (0,0), (0,1), or (1,1). If M_i = 1, then the coordinator needs to know X_ij (which is almost certainly 1 in this case). So either player j will learn about M_i, or the protocol will reveal too much about X_ij when M_i = 0.

Multiparty information complexity. We don't have a multiparty information complexity theory for general distributions. There is a fair chance that the difficulty is conceptual. One key difference between 2 players and 3+ players is the existence of secure multiparty computation.

Beyond communication. Many applications to other interactive communication regimes: distributed joint computation and estimation; streaming; noisy coding. We will briefly discuss a non-communication application: two-prover games.

Two-prover games. Closely connected to hardness of approximation: Probabilistically Checkable Proofs and the Unique Games Conjecture. A nice way of looking at constraint satisfaction problems.

The odd cycle game. Alice and Bob want to convince Victoria (the verifier) that the 7-cycle is 2-colorable. She asks each of them for the color of a vertex, where the two queried vertices are either the same or adjacent, and accepts if the answers are consistent (equal colors on the same vertex, different colors across an edge).

The odd cycle game. (Figures: three example rounds; Victoria sends vertex V_a to Alice and V_b to Bob and checks the returned colors for consistency.)

The odd cycle game. An example of a unique game. If the cycle is even, Alice and Bob win with probability 1. For an odd cycle of length m, they win with probability p_1 = 1 - 1/(2m). What about winning many copies of the game simultaneously?
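A simulation of a shared-randomness strategy achieving 1 - 1/(2m): the players agree on a random "defect" edge and use a 2-coloring that is proper everywhere else, so they lose only when the challenge is exactly that edge. The verifier's question distribution (same vertex w.p. 1/2, a uniformly random edge w.p. 1/2) is an assumption made for this sketch.

```python
import random

def play_odd_cycle_game(m: int) -> bool:
    """One round of the odd-cycle game (m odd). The players share a random 'defect' edge d
    and colour vertex v by its distance from d+1 along the cycle, mod 2; this colouring is
    proper on every edge except (d, d+1). Returns True iff the verifier accepts."""
    d = random.randrange(m)                        # shared randomness: defect edge (d, d+1)
    colour = lambda v: ((v - (d + 1)) % m) % 2     # proper 2-colouring except across edge d

    if random.random() < 0.5:                      # same vertex to both players
        v = random.randrange(m)
        return colour(v) == colour(v)              # always consistent
    else:                                          # adjacent vertices across a random edge
        e = random.randrange(m)
        return colour(e) != colour((e + 1) % m)    # fails only when e == d

m, trials = 7, 200_000
wins = sum(play_odd_cycle_game(m) for _ in range(trials))
print(wins / trials, 1 - 1 / (2 * m))              # both ≈ 0.9286
```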

Simultaneous challenges. Alice gets (V_a^1, V_a^2, V_a^3, ..., V_a^k) and returns a vector of colors. Bob gets (V_b^1, V_b^2, V_b^3, ..., V_b^k) and returns a vector of colors. They avoid jail if all color pairs are consistent.

Parallel repetition. Play k = m^2 copies. A naïve (independent) strategy wins with probability (1 - 1/(2m))^{m^2} = e^{-Θ(m)} ≪ 1. Can one do better?

Parallel repetition. Play k = m^2 copies. A naïve strategy: (1 - 1/(2m))^{m^2} = e^{-Θ(m)} ≪ 1. It turns out that one can win m^2 copies of the odd cycle game with constant probability [Raz '08]. Proof by exhibiting a strategy.
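A quick numeric check of the e^{-Θ(m)} estimate for independent play (m = 15 is an arbitrary illustrative choice):

```python
import math

m = 15
naive = (1 - 1 / (2 * m)) ** (m * m)   # winning probability of independent play on m^2 copies
print(naive, math.exp(-m / 2))         # both ≈ e^{-m/2}: exponentially small in m
```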

Connection to foams. Connected to "foams": tilings of R^d by a shape A such that A + Z^d = R^d. How small can the surface area of A be? [Feige-Kindler-O'Donnell '07]

Connection to foams. Obvious upper bound: O(d) (the cube). Obvious lower bound (sphere of volume 1): Ω(√d). [Feige-Kindler-O'Donnell '07] noticed a connection between the two problems. [Kindler-O'Donnell-Rao-Wigderson '08]: a construction of foams of surface area O(√d) based on Raz's strategy.

An information-theoretic view. Advice on where to "cut" the cycle lets the players win the game with probability 1 whenever the cut does not pass through the challenge edge. (Figure: A receives V_a, B receives V_b.)

An information-theoretic view. Merlin can give such advice at information cost O(1/m^2).

The distribution. The KL divergence between the two (advice) distributions is Θ(1/m^2), while their statistical distance is Θ(1/m). (Figure: V_a, V_b on the cycle.)
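The gap between KL divergence Θ(1/m^2) and statistical distance Θ(1/m) is the crux. The toy pair of distributions below is not the game's actual advice distribution; it only exhibits the same quadratic gap, comparing the uniform distribution with one whose probabilities are diffusely perturbed by ±1/m^2.

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in bits, for distributions given as lists of probabilities."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tv(p, q):
    """Total-variation (statistical) distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

m = 100                                              # toy support size
uniform = [1 / m] * m
# Each probability perturbed by ±1/m^2; with m even this sums to exactly 1
tilted = [(1 + (1 / m if i % 2 == 0 else -1 / m)) / m for i in range(m)]

print(tv(uniform, tilted), kl(tilted, uniform))
# TV = 1/(2m) = Θ(1/m), while KL ≈ Θ(1/m^2): quadratically smaller, as in the slide.
```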

Taking m^2 copies. Total information revealed by Merlin: m^2 · O(1/m^2) = O(1). This can be simulated with O(1) communication, or with no communication with success probability Ω(1).

Parallel repetition. Using similar intuition (but more technical work) one can obtain a general tight parallel repetition theorem in the small-value regime. [B.-Garg '15]: if one copy of a game G has success probability δ < 1/2, then m copies have success probability < δ^{Ω(m)} (*for projection games; for general games the tight bound is a bit more complicated). [Dinur-Steurer '14] obtained the result for projection games using spectral techniques.

Challenges. Information complexity beyond two communicating parties. Continuous measures of complexity.

Thank You!