CS 395T: Sublinear Algorithms, Fall 2014
Prof. Eric Price
Lecture 16, Oct 21, 2014
Scribe: Chi-Kit Lam

1 Overview

In this lecture we will talk about information and compression, where Huffman coding achieves an average number of bits sent almost exactly equal to the entropy. Then we will talk about communication complexity and information cost, the asymptotic number of bits two parties need to transmit in a conversation in order to compute some function of their inputs. We will also talk about adaptive sparse recovery, which is like the conversational version of sparse recovery.

2 Information and compression

2.1 Entropy

Suppose we have a distribution P over the letters. We want to know how many bits we need to send to transmit a letter. In particular, we want a prefix code x ↦ c_x ∈ {0,1}*, which is basically a binary tree; in the example in Figure 1, the string TAN is encoded by concatenating the codewords of T, A, and N. We want to minimize the average length E_{x∼P}[|c_x|], which we will call l.

Figure 1: A binary tree corresponding to a prefix code (leaves labeled E, T, A, O, I, N).

Theorem 1 ([Sha48]). l ≥ H(P), where

    H(P) = Σ_x P(x) log_2(1/P(x)) = E_{x∼P}[log_2(1/P(x))]

is the entropy.

Huffman [Huf52] gave a simple optimal construction of a prefix code with l < H(P) + 1.
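To make the definitions concrete, here is a minimal Python sketch (not from the lecture): it computes H(P) and the expected codeword length l for a small, made-up distribution and prefix code, and checks the bound of Theorem 1.

```python
# A minimal sketch: entropy H(P) of a letter distribution and the expected
# length l of a prefix code, checking Theorem 1's bound l >= H(P).
# The distribution and code below are illustrative assumptions, not the
# ones shown in Figure 1.
import math

P = {"E": 0.5, "T": 0.25, "A": 0.125, "N": 0.125}      # hypothetical distribution
code = {"E": "0", "T": "10", "A": "110", "N": "111"}   # a prefix code (a binary tree)

H = sum(p * math.log2(1 / p) for p in P.values())      # H(P) = E[log2(1/P(x))]
l = sum(P[x] * len(c) for x, c in code.items())        # l = E[|c_x|]

print(f"H(P) = {H:.3f} bits, l = {l:.3f} bits")        # here l equals H(P) exactly
assert l >= H - 1e-9                                   # Shannon's lower bound
```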

Suppose p = {1 with probability 1/100, 0 with probability 99/100}. Then

    H(p) = (99/100) log_2(100/99) + (1/100) log_2(100) ≈ 0.08.

In this case, Huffman's code still needs about 1 bit. So, the entropy is essentially rounded up. In general, if x_1, ..., x_n ∼ P, then we can encode (x_1, ..., x_n) in at most H(x_1, ..., x_n) + 1 = nH(P) + 1 bits, while it cannot be done in fewer than nH(P) bits. This is a pretty strong direct sum statement: solving 1 copy costs H(P) ≤ l < H(P) + 1, while solving n copies costs, per copy on average, H(P) ≤ l < H(P) + 1/n. This shows that one can gain a little, but not much, by encoding multiple copies together.

2.2 Huffman coding

The algorithm for Huffman coding is actually quite simple. Given the distribution, the algorithm sorts the letters by frequency. Then, the two least frequent letters are merged and their frequencies are summed up into a new node. This process repeats until everything is merged together, resulting in a binary tree. Figure 2 shows the first two iterations of the algorithm.

Figure 2: The first two iterations of the algorithm. (i) Initially, the letters are sorted by frequency. (ii) In the first iteration, X and Q are merged. (iii) In the second iteration, K and Z are merged.

We can show that |c_x| < log_2(1/P(x)) + 1. (Intuitively, if P(x) = 2^{-k}, then there are at most 2^k nodes with larger frequencies and they can fit into a complete binary tree, so |c_x| ≤ k.) This implies l = E[|c_x|] < H(P) + 1.
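The merge procedure described above is short to code. The following is a hedged Python sketch: the huffman_code helper and the toy frequency table are illustrative choices, not part of the lecture.

```python
# A sketch of the Huffman merge procedure: repeatedly merge the two
# least-frequent nodes until one tree remains, then read codewords off the
# tree. The frequency table is a made-up example, not the English letter
# frequencies from Figure 2.
import heapq
from itertools import count

def huffman_code(freq):
    # Heap entries are (frequency, tiebreaker, tree), where a tree is either
    # a letter (leaf) or a pair (left, right) of subtrees.
    tie = count()
    heap = [(f, next(tie), letter) for letter, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)     # the two smallest frequencies
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    _, _, tree = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse on children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: record the codeword
            codes[node] = prefix or "0"
    walk(tree, "")
    return codes

print(huffman_code({"E": 0.40, "T": 0.25, "A": 0.20, "K": 0.10, "Q": 0.05}))
```

The output is an optimal prefix code, so its expected length satisfies the bound l < H(P) + 1 shown above.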

3 Communication Complexity and Information Cost

3.1 Communication Complexity

In the communication setting, Alice has some input x and Bob has some input y. They share some public randomness and want to compute f(x, y). Alice sends some message m_1, then Bob responds with m_2, then Alice responds with m_3, and so on. At the end, Bob outputs f(x, y). They can choose a protocol Π, which decides what to send next based on the messages seen so far and one's own input. The total number of bits transferred is |Π| = Σ_i |m_i|.

The communication complexity of the protocol Π is CC_µ(Π) = E_µ[|Π|], where µ is a distribution over the inputs (x, y) and the expectation is also over the randomness of the protocol. The communication complexity of the function f for a distribution µ is

    CC_µ(f) = min_{Π solves f with probability 3/4} CC_µ(Π).

The communication complexity of the function f is

    CC(f) = max_µ CC_µ(f).

3.2 Information Cost

Information cost is related to communication complexity in the way entropy is related to compression. Recall that the entropy is H(X) = Σ_x p(x) log(1/p(x)). Now, the mutual information I(X; Y) = H(X) − H(X|Y) between X and Y measures how much the variable Y tells you about X. It is actually interesting that we also have I(X; Y) = H(Y) − H(Y|X).

The information cost of a protocol Π is

    IC(Π) = I(X; Π | Y) + I(Y; Π | X).

This is how much Bob learns from the protocol about X plus how much Alice learns from the protocol about Y. The information cost of a function f is

    IC(f) = min_{Π solves f} IC(Π).

For every protocol Π, we have IC(Π) ≤ E[|Π|] = CC(Π), because the protocol can carry at most b bits of information if only b bits are transmitted. Taking the minimum over all protocols implies IC(f) ≤ CC(f). This is analogous to Shannon's result that H ≤ l.

What is really interesting is that the corresponding asymptotic statement is true. Suppose we want to solve n copies of the communication problem: Alice is given x_1, ..., x_n and Bob is given y_1, ..., y_n, and they want to compute f(x_1, y_1), ..., f(x_n, y_n), each failing at most 1/4 of the time. We call this problem the direct sum f^n. Then, for all functions f, it is not hard to show that IC(f^n) = n · IC(f).

Theorem 2 ([BR11]). CC(f^n)/n → IC(f) as n → ∞.

In the limit, this theorem suggests that information cost is the right notion.
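Before moving on, here is a small numeric sanity check, again an illustrative sketch rather than anything from the lecture, of the two expressions for mutual information used above, I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X), on an arbitrary toy joint distribution.

```python
# A numeric check that I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X).
# The joint distribution p_xy is an arbitrary toy example (an assumption).
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}   # joint distribution

def H(dist):
    """Entropy of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

# Marginals of X and Y.
p_x = {x: sum(p for (a, b), p in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (a, b), p in p_xy.items() if b == y) for y in (0, 1)}

# Conditional entropies: H(X|Y) = sum_y p(y) H(X | Y=y), and symmetrically.
H_x_given_y = sum(p_y[y] * H({x: p_xy[(x, y)] / p_y[y] for x in (0, 1)}) for y in (0, 1))
H_y_given_x = sum(p_x[x] * H({y: p_xy[(x, y)] / p_x[x] for y in (0, 1)}) for x in (0, 1))

I1 = H(p_x) - H_x_given_y
I2 = H(p_y) - H_y_given_x
print(f"I(X;Y) = {I1:.4f} = {I2:.4f}")   # the two expressions agree
```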

3.3 Separation of Information and Communication

The remaining question is, for a single function, whether CC(f) ≈ IC(f), in particular whether CC(f) = IC(f) · O(1) + O(1). If this were true, it would prove the direct sum conjecture that CC(f^n) ≥ n · CC(f)/O(1) − O(n). The recent paper by Ganor, Kol and Raz [GKR14] showed that it is not true. They gave a function f for which IC(f) = k and CC(f) ≥ 2^{Ω(k)}. This is the strongest possible separation, because it was already known that CC(f) ≤ 2^{O(IC(f))}. The function that they gave has input size 2^{2^{2^k}}. So, it is still open whether CC(f) ≤ O(IC(f) · log log(input size)).

In their construction, a binary tree of depth 2^{2^k} is split into levels of width k. For every node v in the tree, there are two associated values x_v and y_v. There is a random special level of width k. Outside this special level, we have x_v = y_v for all v. We think of x_v and y_v as which direction you ought to go: if they are both 0, you want to go in one direction, and if they are both 1, you want to go in the other. Within the special level, the values x_v and y_v are uniform. At the bottom of the special level, a node v is good if the path to v mostly (80%) follows the directions. The goal is to agree on any leaf that is a descendant of some good vertex.

What makes this tricky is that you do not know where the special level is: if you knew where it was, then O(k) communication would suffice. You can try binary searching to find the special level, which takes O(2^k) communication, and apparently this is essentially the best you can do.

We can, however, construct a protocol with information cost only O(k). It is okay to transmit something very large as long as the amount of information contained in it is small. Alice can transmit her whole path and Bob can just follow it; that is a large amount of communication, but not so much information, because Bob already knows what the part above the special level would be. The issue is that it still gives Bob 2^k bits of information by revealing where the special level is. The idea is instead that Alice chooses a noisy path which follows her directions 90% of the time and deviates 10% of the time. This path is transmitted to Bob. It can be shown that this protocol has only O(k) information cost. Therefore, solving many copies can be done more efficiently.

4 Adaptive Sparse Recovery

Adaptive sparse recovery is like the conversational version of sparse recovery. In non-adaptive sparse recovery, Alice has i ∈ [n] and sets x = e_i + w. She transmits y = Ax = Ae_i + Aw. Bob receives y and recovers x̂ ≈ x, from which he obtains î ≈ i.
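As a rough illustration of this non-adaptive setting, here is a toy numpy simulation sketch. The matched-filter decoder and all parameter choices (n, m, sigma) are assumptions made for the demo; they are not from the lecture or [PW13].

```python
# A toy simulation of the non-adaptive setting above: x = e_i + w, Bob sees
# y = Ax for a random Gaussian A with m rows and guesses i by matched
# filtering. All parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 1000, 60, 0.5              # sigma ~ total noise energy ||w||_2

i = rng.integers(n)                      # Alice's index
w = rng.normal(0, sigma / np.sqrt(n), size=n)   # noise with ||w||_2 roughly sigma
x = np.zeros(n)
x[i] += 1.0                              # x = e_i + w
x += w

A = rng.normal(size=(m, n))              # public Gaussian measurement matrix
y = A @ x                                # the m numbers Alice transmits

i_hat = int(np.argmax(A.T @ y))          # Bob's matched-filter guess for i
print(i, i_hat, i == i_hat)              # succeeds w.h.p. once m is a large
                                         # enough multiple of log n
```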

In this one-way conversation, we showed in the last class that

    I(î; i) ≤ I(y; i) ≤ m · (1/2) log(1 + SNR) ≲ m,
    I(î; i) = H(i) − H(i | î) ≥ log n − (0.25 log n + 1),

so m ≳ log n, i.e., m = Ω(log n) measurements are needed.

In the adaptive case, we have something more like a conversation. Alice knows x. Bob sends v_1 and Alice sends back ⟨v_1, x⟩. Then Bob sends v_2 and Alice sends back ⟨v_2, x⟩. Then Bob sends v_3 and Alice sends back ⟨v_3, x⟩, and so on.

To show a lower bound, consider stage r. Define P as the distribution of (i | y_1, ..., y_r). Then the information observed by round r is

    b = log n − H(P) = E_{i∼P}[log(n p_i)].

For a fixed v depending on P, as i ∼ P, we know that

    I(⟨v, x⟩; i) ≤ (1/2) log(1 + E_{i∼P}[v_i^2] / (‖v‖_2^2 / n)).

With some algebra (Lemma 3.1 in [PW13]), we can bound the above expression by O(b + 1). It means that, on average, the number of bits you get from the next measurement is at most about twice what you had accumulated before. This implies that R rounds take Ω(R log^{1/R} n) measurements, and in general it takes Ω(log log n) measurements.

References

[BR11] M. Braverman, A. Rao. Information Equals Amortized Communication. FOCS 2011: 748-757.

[GKR14] A. Ganor, G. Kol, R. Raz. Exponential Separation of Information and Communication. ECCC, Report No. 49 (2014).

[Huf52] D. A. Huffman. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, Sept. 1952.

[PW13] E. Price, D. P. Woodruff. Lower Bounds for Adaptive Sparse Recovery. SODA 2013: 652-663.

[Sha48] C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, vol. 27, no. 4, pp. 623-656, Oct. 1948.