
Common Information Abbas El Gamal Stanford University Viterbi Lecture, USC, April 2014

Andrew Viterbi's Fabulous Formula, IEEE Spectrum, 2010 El Gamal (Stanford University) Disclaimer Viterbi Lecture 2 / 36

Andrew Viterbi's Fabulous Formula, IEEE Spectrum, 2010 A couple of professors at Stanford were being quoted in the press saying that CDMA violated the laws of physics. El Gamal (Stanford University) Disclaimer Viterbi Lecture 2 / 36

About this lecture Common information between correlated information sources El Gamal (Stanford University) Outline Viterbi Lecture 3 / 36

About this lecture Common information between correlated information sources The lecture is tutorial in nature with many examples El Gamal (Stanford University) Outline Viterbi Lecture 3 / 36

About this lecture Common information between correlated information sources The lecture is tutorial in nature with many examples No proofs, but a potentially interesting framework and new result El Gamal (Stanford University) Outline Viterbi Lecture 3 / 36

About this lecture Common information between correlated information sources The lecture is tutorial in nature with many examples No proofs, but a potentially interesting framework and new result Outline: Brief introduction to information theory Information sources and measuring information El Gamal (Stanford University) Outline Viterbi Lecture 3 / 36

About this lecture Common information between correlated information sources The lecture is tutorial in nature with many examples No proofs, but a potentially interesting framework and new result Outline: Brief introduction to information theory Information sources and measuring information Brief introduction to network information theory Correlated information sources and measuring common information El Gamal (Stanford University) Outline Viterbi Lecture 3 / 36

About this lecture Common information between correlated information sources The lecture is tutorial in nature with many examples No proofs, but a potentially interesting framework and new result Outline: Brief introduction to information theory Information sources and measuring information Brief introduction to network information theory Correlated information sources and measuring common information Some of the basic models, ideas, and results of information theory El Gamal (Stanford University) Outline Viterbi Lecture 3 / 36

Information theory The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning... These semantic aspects of communication are irrelevant to the engineering problem. A Mathematical Theory of Communication, Shannon (1948) El Gamal (Stanford University) Information theory Viterbi Lecture 4 / 36

Information theory Mathematical models for the source and the channel El Gamal (Stanford University) Information theory Viterbi Lecture 5 / 36

Information theory Mathematical models for the source and the channel Measures of information (entropy and mutual information) Limits on compression and communication El Gamal (Stanford University) Information theory Viterbi Lecture 5 / 36

Information theory Mathematical models for the source and the channel Measures of information (entropy and mutual information) Limits on compression and communication Bits as a universal interface between the source and the channel El Gamal (Stanford University) Information theory Viterbi Lecture 5 / 36

Information theory Coding schemes that achieve the Shannon limits Viterbi decoding algorithm (Viterbi 1967) Lempel–Ziv compression algorithm (Ziv–Lempel 1977, 1978) Trellis-coded modulation (Ungerboeck 1982) Turbo codes (Berrou–Glavieux 1996) LDPC codes (Gallager 1963, Richardson–Urbanke 2008) Polar codes (Arıkan 2009) El Gamal (Stanford University) Information theory Viterbi Lecture 5 / 36

Information source An information source which produces a message or sequence of messages to be communicated to the receiving terminal. The message may be of various types: e.g. (a) A sequence of letters as in a telegraph or teletype system; (b) A single function of time f(t) as in radio or telephony;... A Mathematical Theory of Communication, Shannon (1948) El Gamal (Stanford University) Information sources Viterbi Lecture 6 / 36

Modeling the information source We can think of a discrete source as generating the message, symbol by symbol...a physical system, or a mathematical model of a system which produces such a sequence of symbols governed by a set of probabilities, is known as a stochastic process. El Gamal (Stanford University) Information sources Viterbi Lecture 7 / 36

Measuring information of the source Can we define a quantity which will measure, in some sense, how much information is produced by such a process, or better, at what rate information is produced? El Gamal (Stanford University) Information sources Viterbi Lecture 8 / 36

Measuring information of the source Suppose we have a set of possible events whose probabilities of occurrence are p_1, p_2, ..., p_n....if there is such a measure, say H(p_1, p_2, ..., p_n), it is reasonable to require of it the following properties: 1. H should be continuous in the p_i. 2. If all the p_i are equal, H should be a monotonic increasing function of n. 3. If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H. El Gamal (Stanford University) Information sources Viterbi Lecture 8 / 36

Measuring information of the source Theorem 2: The only H satisfying the three above assumptions is of the form: H = −K Σ_{i=1}^{n} p_i log p_i El Gamal (Stanford University) Information sources Viterbi Lecture 8 / 36

Measuring information of the source Theorem 2: The only H satisfying the three above assumptions is of the form: H = −K Σ_{i=1}^{n} p_i log p_i This theorem, and the assumptions required for its proof, are in no way necessary for the present theory.... The real justification of these definitions, however, will reside in their implications. A Mathematical Theory of Communication, Shannon (1948) El Gamal (Stanford University) Information sources Viterbi Lecture 8 / 36
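
As a quick numerical illustration (not part of the lecture), the following Python sketch computes H in bits (taking K = 1 and base-2 logarithms) and checks the grouping property on the decomposition H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Property 3 (grouping): a choice among {1/2, 1/3, 1/6} can be broken into a fair
# binary choice followed, half of the time, by a choice between {2/3, 1/3}.
lhs = entropy([1/2, 1/3, 1/6])
rhs = entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3])
print(lhs, rhs)  # both are ~1.459 bits
```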

Setups for measuring information Entropy arises naturally as a measure of information in many settings El Gamal (Stanford University) Measuring information Viterbi Lecture 9 / 36

Setups for measuring information Entropy arises naturally as a measure of information in many settings Will describe three of them Zero-error compression Randomness extraction Source simulation El Gamal (Stanford University) Measuring information Viterbi Lecture 9 / 36

Setups for measuring information Entropy arises naturally as a measure of information in many settings Will describe three of them Zero-error compression Randomness extraction Source simulation Assume a discrete memoryless source (X, p(x)) (DMS X) X generates independent identically distributed (i.i.d.) X_1, X_2, ... ~ p(x) El Gamal (Stanford University) Measuring information Viterbi Lecture 9 / 36

Setups for measuring information Entropy arises naturally as a measure of information in many settings Will describe three of them Zero-error compression Randomness extraction Source simulation Assume a discrete memoryless source (X, p(x)) (DMS X) X generates independent identically distributed (i.i.d.) X_1, X_2, ... ~ p(x) Example: Bernoulli source with parameter p ∈ [0, 1] (Bern(p) source) X = {0,1}, p_X(1) = p; generates i.i.d. X_1, X_2, ... ~ Bern(p) El Gamal (Stanford University) Measuring information Viterbi Lecture 9 / 36

Setups for measuring information Entropy arises naturally as a measure of information in many settings Will describe three of them Zero-error compression Randomness extraction Source simulation Assume a discrete memoryless source (X, p(x)) (DMS X) X generates independent identically distributed (i.i.d.) X_1, X_2, ... ~ p(x) Example: Bernoulli source with parameter p ∈ [0, 1] (Bern(p) source) X = {0,1}, p_X(1) = p; generates i.i.d. X_1, X_2, ... ~ Bern(p) Example: DNA source, X = {A,G,C,T}; p(x) = [p_1 p_2 p_3 p_4] El Gamal (Stanford University) Measuring information Viterbi Lecture 9 / 36

Zero-error compression DMS X → X^n → Encoder → c(X^n) ∈ {0,1}* → Decoder → X̂^n X^n = (X_1, X_2, ..., X_n) i.i.d. ~ p(x) Variable-length prefix-free code: c(x^n) ∈ {0,1}* = {0, 1, 00, 01, 10, 11, ...} El Gamal (Stanford University) Measuring information Viterbi Lecture 10 / 36

Zero-error compression DMS X → X^n → Encoder → c(X^n) ∈ {0,1}* → Decoder → X̂^n X^n = (X_1, X_2, ..., X_n) i.i.d. ~ p(x) Variable-length prefix-free code: c(x^n) ∈ {0,1}* = {0, 1, 00, 01, 10, 11, ...} Example: X = {A,G,C,T}, p(x) = [5/8 1/4 1/16 1/16], n = 1; code tree: A → 0, G → 10, C → 110, T → 111 El Gamal (Stanford University) Measuring information Viterbi Lecture 10 / 36

Zero-error compression DMS X → X^n → Encoder → c(X^n) ∈ {0,1}* → Decoder → X̂^n X^n = (X_1, X_2, ..., X_n) i.i.d. ~ p(x) Variable-length prefix-free code: c(x^n) ∈ {0,1}* = {0, 1, 00, 01, 10, 11, ...} Example: X = {A,G,C,T}, p(x) = [5/8 1/4 1/16 1/16], n = 1; code tree: A → 0, G → 10, C → 110, T → 111 Let L_n be the codeword length and R_n = (1/n) E(L_n) bits/symbol E(L_1) = 1·(5/8) + 2·(1/4) + 3·(1/8) = 1.5 bits El Gamal (Stanford University) Measuring information Viterbi Lecture 10 / 36

Zero-error compression DMS X → X^n → Encoder → c(X^n) ∈ {0,1}* → Decoder → X̂^n X^n = (X_1, X_2, ..., X_n) i.i.d. ~ p(x) Variable-length prefix-free code: c(x^n) ∈ {0,1}* = {0, 1, 00, 01, 10, 11, ...} Let L_n be the codeword length and R_n = (1/n) E(L_n) bits/symbol Measure the information rate of X by: R* = inf_n min_codes R_n El Gamal (Stanford University) Measuring information Viterbi Lecture 10 / 36

Zero-error compression DMS X → X^n → Encoder → c(X^n) ∈ {0,1}* → Decoder → X̂^n X^n = (X_1, X_2, ..., X_n) i.i.d. ~ p(x) Variable-length prefix-free code: c(x^n) ∈ {0,1}* = {0, 1, 00, 01, 10, 11, ...} Let L_n be the codeword length and R_n = (1/n) E(L_n) bits/symbol Measure the information rate of X by: R* = inf_n min_codes R_n Information rate is the entropy (Shannon 1948): R* = H(X) = −Σ_x p(x) log p(x) bits/symbol El Gamal (Stanford University) Measuring information Viterbi Lecture 10 / 36
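
To see the rate R_n = (1/n) E(L_n) approach H(X), here is a small Python sketch (mine, not from the lecture) that builds optimal prefix-free (Huffman) codes over blocks of n symbols of the DNA source; the function names are my own:

```python
import heapq
import itertools
import math

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def huffman_expected_length(pmf):
    """Expected codeword length of an optimal (Huffman) prefix-free code for pmf."""
    heap = [(p, i, 0.0) for i, p in enumerate(pmf.values())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, acc1 = heapq.heappop(heap)
        p2, _, acc2 = heapq.heappop(heap)
        # Each merge adds one bit to every codeword below it, i.e., p1 + p2 to E(L).
        heapq.heappush(heap, (p1 + p2, tie, acc1 + acc2 + p1 + p2))
        tie += 1
    return heap[0][2]

# DNA source from the slide: p(x) = [5/8, 1/4, 1/16, 1/16]
p = {'A': 5/8, 'G': 1/4, 'C': 1/16, 'T': 1/16}
print("H(X) =", entropy(p))                       # ~1.424 bits
for n in (1, 2, 3):
    block = {xs: math.prod(p[x] for x in xs)
             for xs in itertools.product(p, repeat=n)}
    print(f"n = {n}: R_n = {huffman_expected_length(block) / n:.4f} bits/symbol")
```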

Entropy examples X ~ Bern(p): H(X) = H(p) = −p log p − (1−p) log(1−p) [Plot: the binary entropy function H(p) versus p ∈ [0,1], maximized at p = 1/2] El Gamal (Stanford University) Measuring information Viterbi Lecture 11 / 36

Entropy examples X ~ Bern(p): H(X) = H(p) = −p log p − (1−p) log(1−p) [Plot: the binary entropy function H(p) versus p ∈ [0,1], maximized at p = 1/2] For the DNA source: H(X) = 1.424 bits (E(L_1) = 1.5) El Gamal (Stanford University) Measuring information Viterbi Lecture 11 / 36

Randomness extraction...,x 2,X 1 Extractor B n X is a DMS; B n is an i.i.d. Bern(1/2) sequence Let L n be length of X sequence and R n = n/e(l n ) bits/symbol El Gamal (Stanford University) Measuring information Viterbi Lecture 12 / 36

Randomness extraction...,x 2,X 1 Extractor B n X is a DMS; B n is an i.i.d. Bern(1/2) sequence Let L n be length of X sequence and R n = n/e(l n ) bits/symbol Example: Let X be Bern(1/3) source, n = 1 (B 1 Bern(1/2)) X 1 = 0 X 1 = 1 X 2 = 0 1 0 X 2 = 1 01 0 10 1 0001 0 0010 1 1101 0 1110 1 El Gamal (Stanford University) Measuring information Viterbi Lecture 12 / 36

Randomness extraction...,x 2,X 1 Extractor B n X is a DMS; B n is an i.i.d. Bern(1/2) sequence Let L n be length of X sequence and R n = n/e(l n ) bits/symbol Example: Let X be Bern(1/3) source, n = 1 (B 1 Bern(1/2)) X 1 = 0 X 1 = 1 X 2 = 0 1 0 X 2 = 1 01 0 10 1 0001 0 0010 1 1101 0 1110 1 E(L 1 ) = 4 9 2+ 5 9 (2+E(L 1)) R 1 = 1/E(L 1 ) = 2 9 El Gamal (Stanford University) Measuring information Viterbi Lecture 12 / 36

Randomness extraction...,x 2,X 1 Extractor B n X is a DMS; B n is an i.i.d. Bern(1/2) sequence Let L n be length of X sequence and R n = n/e(l n ) bits/symbol The information rate of X: R = sup n max extractor R n El Gamal (Stanford University) Measuring information Viterbi Lecture 12 / 36

Randomness extraction...,x 2,X 1 Extractor B n X is a DMS; B n is an i.i.d. Bern(1/2) sequence Let L n be length of X sequence and R n = n/e(l n ) bits/symbol The information rate of X: R = sup n max extractor R n Information rate is the entropy (Elias 1972) R = H(X) bits/symbol El Gamal (Stanford University) Measuring information Viterbi Lecture 12 / 36

Source simulation...,b 2,B 1 Generator X n B is a Bern(1/2) source; X n i.i.d. p(x) sequence Let L n be length of the B sequence and R n = (1/n)E(L n ) bits/symbol El Gamal (Stanford University) Measuring information Viterbi Lecture 13 / 36

Source simulation...,b 2,B 1 Generator X n B is a Bern(1/2) source; X n i.i.d. p(x) sequence Let L n be length of the B sequence and R n = (1/n)E(L n ) bits/symbol Example: Let X be Bern(1/3) source, n = 1 B 1 = 0 0 0 B 1 = 1 B 2 = 0 B 2 = 1 10 1 110 0 1110 1 P{X 1 = 1} = 1 4 + 1 4 2 + 1 4 3 + = 1 3 E(L 1 ) = 1 2 1+ 1 4 2+ 1 4 (2+E(L 1)) E(L 1 ) = 2 El Gamal (Stanford University) Measuring information Viterbi Lecture 13 / 36

Source simulation...,b 2,B 1 Generator X n B is a Bern(1/2) source; X n i.i.d. p(x) sequence Let L n be length of the B sequence and R n = (1/n)E(L n ) bits/symbol The information rate of X: R = inf n min generator R n El Gamal (Stanford University) Measuring information Viterbi Lecture 13 / 36

Source simulation...,b 2,B 1 Generator X n B is a Bern(1/2) source; X n i.i.d. p(x) sequence Let L n be length of the B sequence and R n = (1/n)E(L n ) bits/symbol The information rate of X: R = inf n min generator R n Information rate is the entropy (Knuth Yao 1976) R = H(X) bits/symbol El Gamal (Stanford University) Measuring information Viterbi Lecture 13 / 36

Summary Entropy arises naturally as a measure of information rate: the minimum average description length in bits/symbol of X; the maximum number of bits/symbol that can be extracted from X; the minimum number of bits/symbol needed to simulate X El Gamal (Stanford University) Measuring information Viterbi Lecture 14 / 36

Correlated sources Sensor network [map with sensors in Los Angeles, San Francisco, San Diego] El Gamal (Stanford University) Correlated sources Viterbi Lecture 15 / 36

Correlated sources Wireless multipath [received signals X_1, X_2, X_3] El Gamal (Stanford University) Correlated sources Viterbi Lecture 15 / 36

Correlated sources Distributed secret key generation [terminals observing X and Y each generate a Key] El Gamal (Stanford University) Correlated sources Viterbi Lecture 15 / 36

Correlated sources Distributed simulation [common randomness W used to generate X_1, X_2, X_3, X_4] El Gamal (Stanford University) Correlated sources Viterbi Lecture 15 / 36

Correlated sources How to measure common information between correlated sources El Gamal (Stanford University) Measuring common information Viterbi Lecture 16 / 36

Network information theory [communication network of sources and nodes] Establishes limits on communication/distributed processing in networks El Gamal (Stanford University) Measuring common information Viterbi Lecture 16 / 36

Network information theory [communication network of sources and nodes] Establishes limits on communication/distributed processing in networks First paper (Shannon 1961): two-way communication channels Significant progress in the 1970s (Cover 1972, Slepian–Wolf 1973) Internet and wireless communication revived interest (El Gamal–Kim 2011) El Gamal (Stanford University) Measuring common information Viterbi Lecture 16 / 36

Measuring common information Setups for measuring common information: Distributed compression Distributed key generation Distributed simulation El Gamal (Stanford University) Measuring common information Viterbi Lecture 17 / 36

Measuring common information Setups for measuring common information: Distributed compression Distributed key generation Distributed simulation Assume a 2-discrete memoryless source (X × Y, p(x, y)) (2-DMS (X,Y)) (X,Y) generates the i.i.d. sequence (X_1,Y_1), (X_2,Y_2), ... ~ p(x, y) El Gamal (Stanford University) Measuring common information Viterbi Lecture 17 / 36

Measuring common information Setups for measuring common information: Distributed compression Distributed key generation Distributed simulation Assume a 2-discrete memoryless source (X × Y, p(x, y)) (2-DMS (X,Y)) (X,Y) generates the i.i.d. sequence (X_1,Y_1), (X_2,Y_2), ... ~ p(x, y) Example: doubly symmetric binary source (DSBS): X ~ Bern(1/2), Y is X passed through a binary symmetric channel with crossover probability p El Gamal (Stanford University) Measuring common information Viterbi Lecture 17 / 36

Measuring common information Setups for measuring common information: Distributed compression Distributed key generation Distributed simulation Assume a 2-discrete memoryless source (X × Y, p(x, y)) (2-DMS (X,Y)) (X,Y) generates the i.i.d. sequence (X_1,Y_1), (X_2,Y_2), ... ~ p(x, y) Example: doubly symmetric binary source (DSBS): X ~ Bern(1/2), Y is X passed through a binary symmetric channel with crossover probability p Example: symmetric binary erasure source (SBES): X ~ Bern(1/2), Y = X with probability 1 − p and Y = e (erasure) otherwise El Gamal (Stanford University) Measuring common information Viterbi Lecture 17 / 36
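
For concreteness, a small Python sketch (illustrative only; the function names are mine) that samples symbol pairs from the DSBS and the SBES:

```python
import random

def sample_dsbs(p, n):
    """Doubly symmetric binary source: X ~ Bern(1/2), Y = X flipped with probability p."""
    out = []
    for _ in range(n):
        x = random.randint(0, 1)
        y = x ^ (1 if random.random() < p else 0)
        out.append((x, y))
    return out

def sample_sbes(p, n):
    """Symmetric binary erasure source: X ~ Bern(1/2), Y = X with prob. 1-p, else 'e'."""
    out = []
    for _ in range(n):
        x = random.randint(0, 1)
        y = x if random.random() >= p else 'e'
        out.append((x, y))
    return out

random.seed(0)
print(sample_dsbs(0.1, 5))
print(sample_sbes(0.3, 5))
```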

Distributed zero-error compression X^n → Encoder 1 → c_1(X^n), Y^n → Encoder 2 → c_2(Y^n), Decoder → X^n, Y^n Let (X,Y) be a 2-DMS; prefix-free codes c_1(x^n), c_2(y^n) Let L_jn be the length of c_j and R_jn = (1/n) E(L_jn), j = 1, 2 El Gamal (Stanford University) Measuring common information Viterbi Lecture 18 / 36

Distributed zero-error compression X^n → Encoder 1 → c_1(X^n), Y^n → Encoder 2 → c_2(Y^n), Decoder → X^n, Y^n Let (X,Y) be a 2-DMS; prefix-free codes c_1(x^n), c_2(y^n) Let L_jn be the length of c_j and R_jn = (1/n) E(L_jn), j = 1, 2 Measure the common information rate between X and Y by R_C = H(X) + H(Y) − inf_n min_codes (R_1n + R_2n) bits/symbol-pair El Gamal (Stanford University) Measuring common information Viterbi Lecture 18 / 36

Distributed zero-error compression X^n → Encoder 1 → c_1(X^n), Y^n → Encoder 2 → c_2(Y^n), Decoder → X^n, Y^n Let (X,Y) be a 2-DMS; prefix-free codes c_1(x^n), c_2(y^n) Let L_jn be the length of c_j and R_jn = (1/n) E(L_jn), j = 1, 2 Measure the common information rate between X and Y by R_C = H(X) + H(Y) − inf_n min_codes (R_1n + R_2n) bits/symbol-pair Example: X = (U,W), Y = (V,W), where U, V, W are independent: R_C = (H(U)+H(W)) + (H(V)+H(W)) − (H(U)+H(V)+H(W)) = H(W) El Gamal (Stanford University) Measuring common information Viterbi Lecture 18 / 36

Distributed zero-error compression X^n → Encoder 1 → c_1(X^n), Y^n → Encoder 2 → c_2(Y^n), Decoder → X^n, Y^n Measure the common information rate between X and Y by R_C = H(X) + H(Y) − inf_n min_codes (R_1n + R_2n) bits/symbol-pair Example: SBES (H(X) = 1, H(Y) = (1 − p) + H(p)): X ~ Bern(1/2), Y = X with probability 1 − p and Y = e otherwise El Gamal (Stanford University) Measuring common information Viterbi Lecture 18 / 36

Distributed zero-error compression X^n → Encoder 1 → c_1(X^n), Y^n → Encoder 2 → c_2(Y^n), Decoder → X^n, Y^n Example: SBES (H(X) = 1, H(Y) = (1 − p) + H(p)) Can show that R_C = 1 − p: Encoder 1 sends X^n: R_1n = 1 bit/symbol Encoder 2 sends the erasure location sequence: inf_n R_2n = H(p) bits/symbol Hence R_C ≥ H(X) + H(Y) − (R_1n + inf_n R_2n) = 1 − p El Gamal (Stanford University) Measuring common information Viterbi Lecture 18 / 36
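
A quick numeric check of this bound (not part of the slides; the value of p is an arbitrary choice):

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.3                          # erasure probability
H_X = 1.0                        # X ~ Bern(1/2)
H_Y = (1 - p) + h(p)             # Y is ternary: 0, 1, or e
sum_rate = 1.0 + h(p)            # encoder 1 sends X^n, encoder 2 the erasure pattern
print("H(X) + H(Y) - sum rate =", H_X + H_Y - sum_rate)  # equals 1 - p
```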

Distributed zero-error compression X^n → Encoder 1 → c_1(X^n), Y^n → Encoder 2 → c_2(Y^n), Decoder → X^n, Y^n Example: For the SBES, R_C = 1 − p Example: For the DSBS (X ~ Bern(1/2), Y is X through a BSC(p)), R_C = 0 for 0 < p < 1 El Gamal (Stanford University) Measuring common information Viterbi Lecture 18 / 36

Distributed zero-error compression X^n → Encoder 1 → c_1(X^n), Y^n → Encoder 2 → c_2(Y^n), Decoder → X^n, Y^n Example: For the SBES, R_C = 1 − p Example: For the DSBS, R_C = 0 for 0 < p < 1 No computable expression for R_C is known (Körner–Orlitsky 1998) El Gamal (Stanford University) Measuring common information Viterbi Lecture 18 / 36

Distributed secret key generation Alice observes X^n, Bob observes Y^n; each generates the same key K_n (X,Y) is a 2-DMS Define the key rate as R_n = (1/n) H(K_n) bits/symbol-pair El Gamal (Stanford University) Measuring common information Viterbi Lecture 19 / 36

Distributed secret key generation Alice observes X^n, Bob observes Y^n; each generates the same key K_n (X,Y) is a 2-DMS Define the key rate as R_n = (1/n) H(K_n) bits/symbol-pair The common information rate: R* = sup_n max_extractors R_n El Gamal (Stanford University) Measuring common information Viterbi Lecture 19 / 36

Distributed secret key generation Alice observes X^n, Bob observes Y^n; each generates the same key K_n (X,Y) is a 2-DMS Define the key rate as R_n = (1/n) H(K_n) bits/symbol-pair The common information rate: R* = sup_n max_extractors R_n Example: X = (U,W), Y = (V,W), where U, V, W are independent Then again R* = H(W) El Gamal (Stanford University) Measuring common information Viterbi Lecture 19 / 36

Distributed secret key generation Alice observes X^n, Bob observes Y^n; each generates the same key K_n Example: Consider (X,Y) with the pmf below (the W = 1 block covers X ∈ {1,2,3}, Y ∈ {1,2}; the W = 2 block covers X = 4, Y ∈ {3,4}):
        Y = 1   Y = 2   Y = 3   Y = 4
X = 1    0.1     0.2      0       0
X = 2    0.1     0.1      0       0
X = 3    0.1     0.1      0       0
X = 4     0       0      0.2     0.1
El Gamal (Stanford University) Measuring common information Viterbi Lecture 19 / 36

Distributed secret key generation Alice observes X^n, Bob observes Y^n; each generates the same key K_n Example: Consider (X,Y) with the pmf above Alice and Bob can agree on the i.i.d. key W^n, so R_n = H(W) El Gamal (Stanford University) Measuring common information Viterbi Lecture 19 / 36

Distributed secret key generation Alice observes X^n, Bob observes Y^n; each generates the same key K_n Example: Consider (X,Y) with the pmf above Alice and Bob can agree on the i.i.d. key W^n, so R_n = H(W) Turns out: R* = H(W) = 0.881 bits/symbol-pair El Gamal (Stanford University) Measuring common information Viterbi Lecture 19 / 36

Gács–Körner (1973), Witsenhausen (1975) common information Arrange p(x, y) in block-diagonal form with the largest number of blocks, indexed by W = 1, 2, ..., k (all entries outside the blocks are zero) W is called the common part between X and Y It is the highest-entropy r.v. that X and Y can agree on K(X;Y) = H(W) is called the GKW common information El Gamal (Stanford University) Measuring common information Viterbi Lecture 20 / 36

Gács–Körner (1973), Witsenhausen (1975) common information Arrange p(x, y) in block-diagonal form with the largest number of blocks, indexed by W = 1, 2, ..., k (all entries outside the blocks are zero) W is called the common part between X and Y The common part between X^n and Y^n is W^n El Gamal (Stanford University) Measuring common information Viterbi Lecture 20 / 36

Gács–Körner (1973), Witsenhausen (1975) common information Arrange p(x, y) in block-diagonal form with the largest number of blocks, indexed by W = 1, 2, ..., k (all entries outside the blocks are zero) Common information rate is the GKW common information: R* = K(X;Y) = H(W) El Gamal (Stanford University) Measuring common information Viterbi Lecture 20 / 36
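
A sketch (my own, not from the lecture) that computes K(X;Y) by taking the blocks to be the connected components of the support of p(x,y); the pmf is the key-generation example above, as reconstructed there:

```python
import math
from collections import defaultdict

def gkw_common_information(pmf):
    """K(X;Y) = H(W), where W indexes the connected components (blocks) of the
    bipartite graph with an edge between x and y whenever p(x, y) > 0."""
    parent = {}
    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)

    for (x, y), p in pmf.items():
        if p > 0:
            union(('x', x), ('y', y))
    block_prob = defaultdict(float)
    for (x, y), p in pmf.items():
        if p > 0:
            block_prob[find(('x', x))] += p
    return -sum(q * math.log2(q) for q in block_prob.values())

pmf = {(1, 1): 0.1, (1, 2): 0.2,
       (2, 1): 0.1, (2, 2): 0.1,
       (3, 1): 0.1, (3, 2): 0.1,
       (4, 3): 0.2, (4, 4): 0.1}
print("K(X;Y) =", gkw_common_information(pmf))  # H(0.7, 0.3) ~ 0.881 bits
```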

Distributed simulation Alice and Bob both observe W_n; Alice generates X^n, Bob generates Y^n W_n is a common information random variable Wish to generate (X^n, Y^n) i.i.d. ~ p(x, y) El Gamal (Stanford University) Measuring common information Viterbi Lecture 21 / 36

Distributed simulation Alice and Bob both observe W_n; Alice generates X^n, Bob generates Y^n W_n is a common information random variable Wish to generate (X^n, Y^n) i.i.d. ~ p(x, y) Define the distributed simulation rate R_n = (1/n) H(W_n) The common information rate: R_S = inf_n min_{W_n, generators} R_n El Gamal (Stanford University) Measuring common information Viterbi Lecture 21 / 36

Distributed simulation Alice and Bob both observe W_n; Alice generates X^n, Bob generates Y^n W_n is a common information random variable Wish to generate (X^n, Y^n) i.i.d. ~ p(x, y) Define the distributed simulation rate R_n = (1/n) H(W_n) The common information rate: R_S = inf_n min_{W_n, generators} R_n Example: X = (U,W), Y = (V,W), where U, V, W are independent Then again R_S = H(W) (set W_n = W^n) El Gamal (Stanford University) Measuring common information Viterbi Lecture 21 / 36

Distributed simulation Alice and Bob both observe W_n; Alice generates X^n, Bob generates Y^n W_n is a common information random variable Wish to generate (X^n, Y^n) i.i.d. ~ p(x, y) Define the distributed simulation rate R_n = (1/n) H(W_n) The common information rate: R_S = inf_n min_{W_n, generators} R_n No computable expression for R_S is known El Gamal (Stanford University) Measuring common information Viterbi Lecture 21 / 36

Distributed simulation Example: SBES (Kumar–Li–El Gamal, ISIT 2014): X ~ Bern(1/2), Y = X with probability 1 − p and Y = e otherwise El Gamal (Stanford University) Measuring common information Viterbi Lecture 22 / 36

Distributed simulation Example: SBES (Kumar–Li–El Gamal, ISIT 2014) R_S = 1 if p ≤ 1/2 and R_S = H(p) if p > 1/2 [Plot: R_S versus p] El Gamal (Stanford University) Measuring common information Viterbi Lecture 22 / 36

Summary Entropy is a universal measure of information El Gamal (Stanford University) Measuring common information Viterbi Lecture 23 / 36

Summary Entropy is a universal measure of information For X = (U,W), Y = (V,W), where U, V, W are independent, common information is measured by H(W) El Gamal (Stanford University) Measuring common information Viterbi Lecture 23 / 36

Summary Entropy is a universal measure of information In general, common information has several measures: For zero-error distributed compression: R_C (no computable expression) For distributed key generation: K(X;Y) (GKW common information) For distributed simulation: R_S (no computable expression) El Gamal (Stanford University) Measuring common information Viterbi Lecture 23 / 36

Summary Entropy is a universal measure of information In general, common information has several measures: For zero-error distributed compression: R_C (no computable expression) For distributed key generation: K(X;Y) (GKW common information) For distributed simulation: R_S (no computable expression) We can say more by considering relaxed versions of these setups: distributed lossless compression; distributed approximate simulation (common information for approximate key generation is the same as for exact) El Gamal (Stanford University) Measuring common information Viterbi Lecture 23 / 36

Lossless compression X^n → Encoder → M_n ∈ [1 : 2^{nR}] → Decoder → X̂^n Let X be a DMS Use fixed-length (block) codes (M_n ∈ {1, ..., 2^{nR}} is an nR-bit sequence) Probability of error: P_e^(n) = P{X̂^n ≠ X^n} El Gamal (Stanford University) Approximate common information Viterbi Lecture 24 / 36

Lossless compression X^n → Encoder → M_n ∈ [1 : 2^{nR}] → Decoder → X̂^n Let X be a DMS Use fixed-length (block) codes (M_n ∈ {1, ..., 2^{nR}} is an nR-bit sequence) Probability of error: P_e^(n) = P{X̂^n ≠ X^n} R is achievable if there exists a sequence of codes such that lim_{n→∞} P_e^(n) = 0 The information rate of X: R* = inf{R : R is achievable} El Gamal (Stanford University) Approximate common information Viterbi Lecture 24 / 36

Lossless compression X^n → Encoder → M_n ∈ [1 : 2^{nR}] → Decoder → X̂^n Let X be a DMS Use fixed-length (block) codes (M_n ∈ {1, ..., 2^{nR}} is an nR-bit sequence) Probability of error: P_e^(n) = P{X̂^n ≠ X^n} R is achievable if there exists a sequence of codes such that lim_{n→∞} P_e^(n) = 0 The information rate of X: R* = inf{R : R is achievable} Information rate is the entropy (Shannon 1948), same as in zero-error! R* = H(X) El Gamal (Stanford University) Approximate common information Viterbi Lecture 24 / 36

Distributed lossless compression X^n → Encoder 1 → M_1n ∈ [1 : 2^{nR_1}], Y^n → Encoder 2 → M_2n ∈ [1 : 2^{nR_2}], Decoder → X̂^n, Ŷ^n Again assume fixed-length codes Probability of error: P_e^(n) = P{(X̂^n, Ŷ^n) ≠ (X^n, Y^n)} (R_1, R_2) is achievable if there exists a sequence of codes such that lim_{n→∞} P_e^(n) = 0 El Gamal (Stanford University) Approximate common information Viterbi Lecture 25 / 36

Distributed lossless compression X^n → Encoder 1 → M_1n ∈ [1 : 2^{nR_1}], Y^n → Encoder 2 → M_2n ∈ [1 : 2^{nR_2}], Decoder → X̂^n, Ŷ^n Again assume fixed-length codes Probability of error: P_e^(n) = P{(X̂^n, Ŷ^n) ≠ (X^n, Y^n)} (R_1, R_2) is achievable if there exists a sequence of codes such that lim_{n→∞} P_e^(n) = 0 The common information rate R* = H(X) + H(Y) − inf{R_1 + R_2 : (R_1, R_2) is achievable} El Gamal (Stanford University) Approximate common information Viterbi Lecture 25 / 36

Distributed lossless compression The common information rate R* = H(X) + H(Y) − inf{R_1 + R_2 : (R_1, R_2) is achievable} Common information rate is the mutual information (Slepian–Wolf 1973): R* = I(X;Y) = H(X) + H(Y) − H(X,Y) The same sum compression rate as if X and Y were compressed together! El Gamal (Stanford University) Approximate common information Viterbi Lecture 26 / 36

Distributed lossless compression The common information rate R* = H(X) + H(Y) − inf{R_1 + R_2 : (R_1, R_2) is achievable} Common information rate is the mutual information (Slepian–Wolf 1973): R* = I(X;Y) = H(X) + H(Y) − H(X,Y) Example: (X,Y) is the SBES, I(X;Y) = 1 − p = R_C (zero-error) El Gamal (Stanford University) Approximate common information Viterbi Lecture 26 / 36

Distributed lossless compression The common information rate R* = H(X) + H(Y) − inf{R_1 + R_2 : (R_1, R_2) is achievable} Common information rate is the mutual information (Slepian–Wolf 1973): R* = I(X;Y) = H(X) + H(Y) − H(X,Y) Example: (X,Y) is the SBES, I(X;Y) = 1 − p = R_C (zero-error) Example: (X,Y) is the DSBS, I(X;Y) = 1 − H(p), while R_C = 0 for 0 < p < 1 El Gamal (Stanford University) Approximate common information Viterbi Lecture 26 / 36

Distributed lossless compression The common information rate R* = H(X) + H(Y) − inf{R_1 + R_2 : (R_1, R_2) is achievable} Common information rate is the mutual information (Slepian–Wolf 1973): R* = I(X;Y) = H(X) + H(Y) − H(X,Y) Example: (X,Y) is the SBES, I(X;Y) = 1 − p = R_C (zero-error) Example: (X,Y) is the DSBS, I(X;Y) = 1 − H(p), while R_C = 0 for 0 < p < 1 Also, I(X;Y) = 0 only if X and Y are independent Hence, it captures dependence better than the previous measures El Gamal (Stanford University) Approximate common information Viterbi Lecture 26 / 36
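
A small Python check (mine, not from the lecture) of these mutual-information values, computed directly from the joint pmfs:

```python
import math

def mutual_information(pmf):
    """I(X;Y) in bits from a joint pmf given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in pmf.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pmf.items() if p > 0)

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def sbes(p):
    return {(0, 0): (1 - p) / 2, (0, 'e'): p / 2,
            (1, 1): (1 - p) / 2, (1, 'e'): p / 2}

def dsbs(p):
    return {(0, 0): (1 - p) / 2, (0, 1): p / 2,
            (1, 0): p / 2, (1, 1): (1 - p) / 2}

p = 0.2
print(mutual_information(sbes(p)), 1 - p)     # SBES: I(X;Y) = 1 - p
print(mutual_information(dsbs(p)), 1 - h(p))  # DSBS: I(X;Y) = 1 - H(p)
```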

Distributed approximate simulation Alice and Bob both observe W_n ∈ [1 : 2^{nR}]; Alice generates X̂^n, Bob generates Ŷ^n W_n is uniformly distributed over [1 : 2^{nR}] (a random nR-bit sequence) Wish to simulate (X^n, Y^n) i.i.d. ~ p(x, y) approximately El Gamal (Stanford University) Approximate common information Viterbi Lecture 27 / 36

Distributed approximate simulation Alice and Bob both observe W_n ∈ [1 : 2^{nR}]; Alice generates X̂^n, Bob generates Ŷ^n W_n is uniformly distributed over [1 : 2^{nR}] (a random nR-bit sequence) Wish to simulate (X^n, Y^n) i.i.d. ~ p(x, y) approximately Total variation distance between (X̂^n, Ŷ^n) and (X^n, Y^n): d_n = Σ_{(x^n, y^n)} | p_{X̂^n,Ŷ^n}(x^n, y^n) − ∏_{i=1}^n p_{X,Y}(x_i, y_i) | El Gamal (Stanford University) Approximate common information Viterbi Lecture 27 / 36

Distributed approximate simulation Alice and Bob both observe W_n ∈ [1 : 2^{nR}]; Alice generates X̂^n, Bob generates Ŷ^n W_n is uniformly distributed over [1 : 2^{nR}] (a random nR-bit sequence) Wish to simulate (X^n, Y^n) i.i.d. ~ p(x, y) approximately Total variation distance between (X̂^n, Ŷ^n) and (X^n, Y^n): d_n = Σ_{(x^n, y^n)} | p_{X̂^n,Ŷ^n}(x^n, y^n) − ∏_{i=1}^n p_{X,Y}(x_i, y_i) | R is achievable if there exists a sequence of generators with lim_{n→∞} d_n = 0 The common information rate: R* = inf{R : R is achievable} El Gamal (Stanford University) Approximate common information Viterbi Lecture 27 / 36

Distributed approximate simulation Common information rate is the Wyner (1975) common information: R* = J(X;Y) = min_{W : X − W − Y} I(X,Y; W), the minimum over all W for which X and Y are conditionally independent given W El Gamal (Stanford University) Approximate common information Viterbi Lecture 28 / 36
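
The sketch below (mine) does not carry out the minimization over all W; it only evaluates Wyner's objective I(X,Y;W) for one feasible candidate, W = X, in the SBES. Since W = X makes X − W − Y a Markov chain and is a function of (X,Y), the objective equals H(X) = 1 bit, which by the SBES result on the following slides is optimal when p ≤ 1/2:

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

p = 0.3  # erasure probability (arbitrary choice)
# Joint pmf of (X, Y, W) for the SBES with the candidate W = X.
pxyw = {}
for x in (0, 1):
    pxyw[(x, x, x)] = (1 - p) / 2     # no erasure: Y = X
    pxyw[(x, 'e', x)] = p / 2         # erasure: Y = e
pxy, pw = {}, {}
for (x, y, w), q in pxyw.items():
    pxy[(x, y)] = pxy.get((x, y), 0.0) + q
    pw[w] = pw.get(w, 0.0) + q

# I(X,Y;W) = H(X,Y) + H(W) - H(X,Y,W); since W = X is a function of (X,Y),
# this reduces to H(W) = H(X) = 1 bit.
print("I(X,Y;W) for W = X:", entropy(pxy) + entropy(pw) - entropy(pxyw))
```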

Distributed approximate simulation Common information rate is the Wyner (1975) common information: R* = J(X;Y) = min_{W : X − W − Y} I(X,Y; W) Example: SBES: X ~ Bern(1/2), Y = X with probability 1 − p and Y = e otherwise El Gamal (Stanford University) Approximate common information Viterbi Lecture 28 / 36

Distributed approximate simulation Common information rate is the Wyner (1975) common information: R* = J(X;Y) = min_{W : X − W − Y} I(X,Y; W) Example: SBES [Plot: J(X;Y) versus p] El Gamal (Stanford University) Approximate common information Viterbi Lecture 28 / 36

Distributed approximate simulation Common information rate is the Wyner (1975) common information: R* = J(X;Y) = min_{W : X − W − Y} I(X,Y; W) Example: SBES, J(X;Y) = R_S (the exact common information) [Plot: J(X;Y) versus p] El Gamal (Stanford University) Approximate common information Viterbi Lecture 28 / 36

Summary There are several well-motivated measures of common information El Gamal (Stanford University) Summary Viterbi Lecture 29 / 36

Summary There are several well-motivated measures of common information For distributed compression: Zero-error: R_C (no computable expression) Lossless: I(X;Y) (mutual information) El Gamal (Stanford University) Summary Viterbi Lecture 29 / 36

Summary There are several well-motivated measures of common information For distributed compression: Zero-error: R_C (no computable expression) Lossless: I(X;Y) (mutual information) For distributed key generation: K(X;Y) (GKW common information) Same for approximate key agreement El Gamal (Stanford University) Summary Viterbi Lecture 29 / 36

Summary There are several well-motivated measures of common information For distributed compression: Zero-error: R_C (no computable expression) Lossless: I(X;Y) (mutual information) For distributed key generation: K(X;Y) (GKW common information) Same for approximate key agreement For distributed simulation: Exact simulation: R_S (no computable expression) Approximate simulation: J(X;Y) (Wyner common information) El Gamal (Stanford University) Summary Viterbi Lecture 29 / 36

Comparing common information measures 0 ≤ K(X;Y) ≤ R_C ≤ I(X;Y) ≤ J(X;Y) ≤ R_S ≤ min{H(X), H(Y)} El Gamal (Stanford University) Comparing common information measures Viterbi Lecture 30 / 36
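
Collecting the SBES values quoted on these slides into one numeric check of the chain (my own sketch; p is an arbitrary choice):

```python
import math

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.3                               # erasure probability of the SBES
K   = 0.0                             # GKW common information
R_C = 1 - p                           # zero-error distributed compression
I   = 1 - p                           # mutual information (lossless, Slepian-Wolf)
J   = 1.0 if p <= 0.5 else h(p)       # Wyner common information
R_S = J                               # exact simulation (equals J for the SBES)
upper = min(1.0, (1 - p) + h(p))      # min{H(X), H(Y)}
print(K, R_C, I, J, R_S, upper)       # 0 <= K <= R_C <= I <= J <= R_S <= min{H(X),H(Y)}
```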

Comparing common information measures 0 ≤ K(X;Y) ≤ R_C ≤ I(X;Y) ≤ J(X;Y) ≤ R_S ≤ min{H(X), H(Y)} Common information between X and Y ≤ the information of each Strict inequality for the SBES: H(X) = 1, H(Y) = (1 − p) + H(p), while R_S = H(p) if p > 1/2 El Gamal (Stanford University) Comparing common information measures Viterbi Lecture 30 / 36

Comparing common information measures 0 ≤ K(X;Y) ≤ R_C ≤ I(X;Y) ≤ J(X;Y) ≤ R_S ≤ min{H(X), H(Y)} Common information for approximate distributed simulation ≤ that for exact Equal for the SBES (ISIT 2014) Open problem: Are they equal in general? El Gamal (Stanford University) Comparing common information measures Viterbi Lecture 30 / 36

Comparing common information measures 0 ≤ K(X;Y) ≤ R_C ≤ I(X;Y) ≤ J(X;Y) ≤ R_S ≤ min{H(X), H(Y)} Strict inequality for the SBES: I(X;Y) = 1 − p, while J(X;Y) = 1 if p ≤ 1/2 El Gamal (Stanford University) Comparing common information measures Viterbi Lecture 30 / 36

Comparing common information measures 0 ≤ K(X;Y) ≤ R_C ≤ I(X;Y) ≤ J(X;Y) ≤ R_S ≤ min{H(X), H(Y)} Common information for distributed zero-error compression ≤ that for lossless Strict inequality for the DSBS: R_C = 0, while I(X;Y) = 1 − H(p) El Gamal (Stanford University) Comparing common information measures Viterbi Lecture 30 / 36

Comparing common information measures 0 ≤ K(X;Y) ≤ R_C ≤ I(X;Y) ≤ J(X;Y) ≤ R_S ≤ min{H(X), H(Y)} The common part needs to be sent only once in distributed compression Strict inequality for the SBES: K(X;Y) = 0, while R_C = 1 − p El Gamal (Stanford University) Comparing common information measures Viterbi Lecture 30 / 36

Conclusion Entropy is a universal measure of information For both zero-error and lossless compression For randomness generation and source simulation El Gamal (Stanford University) Conclusion Viterbi Lecture 31 / 36

Conclusion Entropy is a universal measure of information The story for common information is much richer: Five well-motivated common information measures: K ≤ R_C ≤ I ≤ J ≤ R_S They are not always equal! Some don't seem to be computable in general Different for zero-error distributed compression than for lossless Are they also different for exact distributed simulation than for approximate? El Gamal (Stanford University) Conclusion Viterbi Lecture 31 / 36

Acknowledgments Work on exact distributed simulation is joint with Gowtham Kumar and Cheuk Ting Li. Young-Han Kim provided many valuable suggestions El Gamal (Stanford University) Acknowledgments Viterbi Lecture 32 / 36

Thank You!

References
Arıkan, E. (2009). Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inf. Theory, 55(7), 3051–3073.
Berrou, C. and Glavieux, A. (1996). Near optimum error correcting coding and decoding: Turbo-codes. IEEE Trans. Comm., 44(10), 1261–1271.
Cover, T. M. (1972). Broadcast channels. IEEE Trans. Inf. Theory, 18(1), 2–14.
El Gamal, A. and Kim, Y.-H. (2011). Network Information Theory. Cambridge University Press, Cambridge.
Elias, P. (1972). The efficient construction of an unbiased random sequence. The Annals of Mathematical Statistics, 43(3), 865–870.
Gács, P. and Körner, J. (1973). Common information is far less than mutual information. Probl. Control Inf. Theory, 2(2), 149–162.
Gallager, R. G. (1963). Low-Density Parity-Check Codes. MIT Press, Cambridge, MA.
Knuth, D. E. and Yao, A. C. (1976). The complexity of random number generation. In Algorithms and Complexity (Proc. Symp., Carnegie Mellon Univ., Pittsburgh, Pa., 1976), pp. 357–428. Academic Press, New York.
El Gamal (Stanford University) Acknowledgments Viterbi Lecture 34 / 36

References (cont.)
Körner, J. and Orlitsky, A. (1998). Zero-error information theory. IEEE Trans. Inf. Theory, 44(6), 2207–2229.
Orlitsky, A. and Roche, J. R. (2001). Coding for computing. IEEE Trans. Inf. Theory, 47(3), 903–917.
Richardson, T. and Urbanke, R. (2008). Modern Coding Theory. Cambridge University Press, Cambridge.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J., 27(3), 379–423 and 27(4), 623–656.
Shannon, C. E. (1961). Two-way communication channels. In Proc. 4th Berkeley Symp. Math. Statist. Probab., vol. I, pp. 611–644. University of California Press, Berkeley.
Slepian, D. and Wolf, J. K. (1973). Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory, 19(4), 471–480.
Ungerboeck, G. (1982). Channel coding with multilevel/phase signals. IEEE Trans. Inf. Theory, 28(1), 55–67.
Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory, 13(2), 260–269.
El Gamal (Stanford University) Acknowledgments Viterbi Lecture 35 / 36

References (cont.)
Witsenhausen, H. S. (1975). On sequences of pairs of dependent random variables. SIAM J. Appl. Math., 28(1), 100–113.
Wyner, A. D. (1975). The common information of two dependent random variables. IEEE Trans. Inf. Theory, 21(2), 163–179.
Ziv, J. and Lempel, A. (1977). A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory, 23(3), 337–343.
Ziv, J. and Lempel, A. (1978). Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory, 24(5), 530–536.
El Gamal (Stanford University) Acknowledgments Viterbi Lecture 36 / 36