15-859: Information Theory and Applications in TCS                CMU: Spring 2013

Lecture 8: Channel and source-channel coding theorems; BEC & linear codes

February 7, 2013

Lecturer: Venkatesan Guruswami                        Scribe: Dan Stahlke

1 Intuitive justification for upper bound on channel capacity

[Figure: the universe of typical input sequences X^n (size ≈ 2^{nH(X)}) is mapped by the channel into the universe of typical output sequences Y^n (size ≈ 2^{nH(Y)}); each codeword X^n(1), X^n(2), ... lands in a set of about 2^{nH(Y|X)} outputs on average, and we want these sets to be mostly disjoint.]

Suppose that we are sending codewords from a set X^n into a channel and we receive elements of Y^n. The number of different codewords we have available at the input side of the channel is approximately 2^{nH(X)}, the number of typical sequences. Similarly, the number of possibilities at the output is approximately 2^{nH(Y)}. Each of the input codewords can go to several different outputs; on average there are about 2^{nH(Y|X)} possible outputs each input can be sent to by the channel. If we hope for the images of the inputs to be mostly disjoint, then the number of inputs times the size of their image under the channel must be at most the size of the typical channel output. This gives

    # codewords  ≤  2^{nH(Y)} / 2^{nH(Y|X)}  =  2^{nI(X;Y)},                (1)

therefore the capacity is no greater than I(X;Y) (maximized over all input probability distributions P(X)).
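
To make the counting in (1) concrete, here is a minimal Python sketch (my own illustration, not part of the lecture; the choice of a binary symmetric channel and the values of p and n are assumptions). With a uniform input, H(Y) = 1 and H(Y|X) = h(p), so at most about 2^{n(1 - h(p))} codewords can have mostly disjoint images.

import math

def h(p):
    """Binary entropy in bits; h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# BSC with crossover probability p and a uniform input X (assumed example values).
p = 0.11
n = 1000

H_Y_given_X = h(p)      # each output bit is the input flipped with probability p
H_Y = 1.0               # a uniform input makes the output uniform
I = H_Y - H_Y_given_X   # I(X;Y) = H(Y) - H(Y|X) = 1 - h(p)

# Bound (1): # codewords <= 2^{nH(Y)} / 2^{nH(Y|X)} = 2^{nI(X;Y)}.
print(f"I(X;Y) = {I:.4f} bits per channel use")
print(f"n = {n}: at most about 2^{n * I:.0f} codewords with mostly disjoint images")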

2 Achievability

This is an intuitive explanation of why the channel capacity C = max_{P(X)} I(X;Y) is achievable. Make a matrix with a row for each typical input sequence X^n (there are about 2^{nH(X)} of these) and a column for each typical output sequence Y^n (there are about 2^{nH(Y)} of these). Mark each cell that corresponds to a jointly typical sequence (there are about 2^{nH(X,Y)} of these).

[Figure: matrix with rows indexed by the ≈ 2^{nH(X)} typical X^n sequences, columns indexed by the ≈ 2^{nH(Y)} typical Y^n sequences, and the ≈ 2^{nH(X,Y)} jointly typical pairs marked.]

Suppose we transmit X^n(1) and receive Y^n. Then it is likely that (X^n(1), Y^n) is jointly typical, and so the receiver can determine X^n(1) as long as there is no other possible input X^n(2) such that (X^n(2), Y^n) is jointly typical. The density of the jointly typical sequences, as depicted in the matrix diagram, is the number of jointly typical sequences divided by the size of the matrix,

    2^{nH(X,Y)} / (2^{nH(X)} · 2^{nH(Y)})  =  2^{-nI(X;Y)}.                 (2)

So if the number of possible input codewords is 2^{nR} with R ≤ I(X;Y) − ɛ, then the probability of error is at most

    2^{nR} · 2^{-nI(X;Y)}  ≤  2^{-ɛn}  →  0   as n → ∞.                     (3)
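
As a sanity check on (2) and (3), the sketch below (my own; the values of p, ɛ and n are assumptions) plugs in numbers for a binary symmetric channel with a uniform input, where H(X) = H(Y) = 1 and H(X,Y) = 1 + h(p): the density of jointly typical cells is 2^{-n(1 - h(p))}, and the union bound 2^{nR} · 2^{-nI(X;Y)} decays like 2^{-ɛn} once R = I(X;Y) − ɛ.

import math

def h(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.11                 # BSC crossover probability (assumed)
eps = 0.05               # slack below mutual information (assumed)

H_X, H_Y = 1.0, 1.0      # uniform input => uniform output
H_XY = 1.0 + h(p)        # H(X,Y) = H(X) + H(Y|X)
I = H_X + H_Y - H_XY     # = 1 - h(p)
R = I - eps              # rate just below I(X;Y), as in (3)

for n in (100, 500, 1000):
    log2_density = n * (H_XY - H_X - H_Y)   # eq. (2): density = 2^{-n I(X;Y)}
    log2_union = n * R + log2_density       # eq. (3): 2^{nR} * 2^{-n I(X;Y)} = 2^{-eps n}
    print(f"n={n:4d}: density 2^{log2_density:.0f}, union bound on error 2^{log2_union:.0f}")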

3 Upper bound on rate

We will show that communication is only possible at rates that don't exceed channel capacity. First of all, it is interesting to note that plotting P_e against rate gives something resembling a step function, with P_e jumping rather abruptly from close to 0 (due to the achievability shown last time) to close to 1 (due to what we will show now) as the rate goes above channel capacity.

Suppose a source W ∈ {1, ..., 2^{Rn}} is to be sent through a channel:

    Source → W ∈ {1, ..., 2^{Rn}} → [Encode] → X^n → [Channel] → Y^n → [Decode] → Ŵ

We will show that if P_e = Pr(Ŵ ≠ W) → 0 as n → ∞, then R ≤ C = max_{P(X)} I(X;Y).

As a warm-up exercise, first consider the case P_e = 0. Take W to be uniform on {1, ..., 2^{Rn}}. Call the decoding operation g. Since we have perfect decoding (i.e. P_e = 0), we get W = Ŵ = g(Y^n) and H(W|Y^n) = 0, leading to

    nR = H(W)                                                               (4)
       = H(W|Y^n) + I(W;Y^n)                                                (5)
       = I(W;Y^n).                                                          (6)

Now apply the data processing inequality. Since we have the Markov chain W → X^n → Y^n, we get I(W;Y^n) ≤ I(X^n;Y^n). It is tempting at this point to just say nR ≤ I(X^n;Y^n) ≤ Σ_i I(X_i;Y_i) and call it quits; however, mutual information doesn't work that way in general (in this case it does, but this requires proof!). What we have is

    nR ≤ I(X^n;Y^n)                                                         (7)
       = H(Y^n) − H(Y^n|X^n)                                                (8)
       ≤ Σ_{i=1}^n H(Y_i) − H(Y^n|X^n)                                      (9)
       = Σ_{i=1}^n H(Y_i) − Σ_{i=1}^n H(Y_i|Y_1, ..., Y_{i−1}, X^n).  (chain rule)  (10)

Since the channel is memoryless, Y_i depends only on X_i: conditioned on X_i, it is independent of the other X_j's and of Y_1, ..., Y_{i−1}. So this reduces to

    nR ≤ Σ_{i=1}^n H(Y_i) − Σ_{i=1}^n H(Y_i|Y_1, ..., Y_{i−1}, X^n)         (11)
       = Σ_{i=1}^n H(Y_i) − Σ_{i=1}^n H(Y_i|X_i)                            (12)
       = Σ_{i=1}^n I(X_i;Y_i)                                               (13)
       ≤ n · max_i I(X_i;Y_i)                                               (14)
       ≤ nC.                                                                (15)

Therefore R ≤ C.

Now for the general case, with P_e → 0. What changes now is that we no longer have W = g(Y^n), and so H(W|Y^n) ≠ 0 (but Ŵ = g(Y^n) still holds). Recycling the previous work from the warm-up exercise gives nR ≤ H(W|Y^n) + nC. Fano's inequality applies here, giving

    h(P_e) + P_e log(2^{Rn}) ≥ H(W|Y^n).                                    (16)

This is a bit too complicated, so simplify it with the bounds h(P_e) ≤ 1 and log(2^{Rn}) = Rn to get

    1 + nRP_e ≥ H(W|Y^n).                                                   (17)

Plugging this into nR ≤ nC + H(W|Y^n) gives R ≤ C + 1/n + RP_e. Then, as n → ∞ and P_e → 0, this becomes R ≤ C + o(1).

Comment: If R < C there exists a code with P_e ≤ 2^{−Θ(R,C)·n} as n → ∞, where Θ(R,C) is some constant depending on R and C.

Comment: If R > C then for all coding/decoding schemes, P_e → 1 as n → ∞ (note: n → ∞ is essential, since with n = 1 a coin toss gives P_e = 1/2).
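
The step from (16) to (17) is just Fano's inequality plus two crude bounds, and Fano's inequality itself is easy to check numerically on a toy example. The sketch below is my own (the 4x4 joint distribution is randomly generated and is not from the lecture); it computes H(W|Y), the error probability of the MAP decoder g, and verifies H(W|Y) ≤ h(P_e) + P_e log|W| ≤ 1 + P_e log|W|.

import math
import random

random.seed(0)

# A small random joint pmf over W in {0,...,3} and Y in {0,...,3} (assumed example).
W_vals, Y_vals = range(4), range(4)
raw = [[random.random() for _ in Y_vals] for _ in W_vals]
total = sum(sum(row) for row in raw)
p = [[v / total for v in row] for row in raw]          # p[w][y] = Pr(W=w, Y=y)

def H_W_given_Y():
    """Conditional entropy H(W|Y) = -sum p(w,y) log2 p(w|y)."""
    out = 0.0
    for y in Y_vals:
        py = sum(p[w][y] for w in W_vals)
        for w in W_vals:
            if p[w][y] > 0:
                out -= p[w][y] * math.log2(p[w][y] / py)
    return out

def binary_entropy(q):
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

# MAP decoder g(y) = argmax_w p(w|y); P_e = Pr(g(Y) != W).
P_e = sum(sum(p[w][y] for w in W_vals) - max(p[w][y] for w in W_vals) for y in Y_vals)

lhs = H_W_given_Y()
fano = binary_entropy(P_e) + P_e * math.log2(len(W_vals))
loose = 1 + P_e * math.log2(len(W_vals))
print(f"H(W|Y) = {lhs:.4f} <= {fano:.4f} (Fano) <= {loose:.4f} (as in (17))")
assert lhs <= fano + 1e-9 <= loose + 1e-9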

4 Joint source coding (a.k.a. source-channel separation) theorem

Often we want to communicate a source that is not uniformly distributed. Consider a source V and define H := H(V). We could first compress this source using a source code, and then encode that with a channel encoding:

    V^n → [Source Code] → W → [Channel Encode] → X^n → ...

The source code is possible if R > H and transmission is possible if R < C, so this two-stage process is possible if C > H. However, is it possible to do better by combining the source coding and the channel encoding? In fact this would not help, and we will show that C > H is both necessary and sufficient for communication (sufficiency is clear because of the two-stage process; necessity is what we will now show).

Theorem. If V_1, ..., V_n are i.i.d. samples from V, then there exists a source-channel code with P_e = Pr(V̂^n ≠ V^n) → 0 if and only if C > H(V).

Proof. By the definition of mutual information, nH(V) = H(V^n) = H(V^n|V̂^n) + I(V^n; V̂^n). Fano's inequality bounds the first term,

    H(V^n|V̂^n) ≤ h(P_e) + P_e log(|V|^n)                                    (18)
               ≤ 1 + P_e n log(|V|).                                        (19)

The data processing inequality applies to the second term, I(V^n; V̂^n), because of the Markov chain V^n → X^n → Y^n → V̂^n, giving I(V^n; V̂^n) ≤ I(X^n; Y^n). Putting it all together,

    nH(V) = H(V^n|V̂^n) + I(V^n; V̂^n)                                        (20)
          ≤ [1 + P_e n log(|V|)] + I(X^n; Y^n)                              (21)
          ≤ 1 + P_e n log(|V|) + nC.                                        (22)

Taking the limit n → ∞ and P_e → 0 then gives H(V) ≤ C.

(Footnote: Cover+Thomas has nH(V) ≤ H(V^n) rather than equality, but I don't understand why.)
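
As a quick numerical reading of the theorem (my own example values, not from the notes): a Bernoulli(q) source has H(V) = h(q), and a BSC_p channel has capacity C = 1 − h(p) (as noted for the BSC in Section 5 below), so transmitting one source symbol per channel use is possible exactly when h(q) < 1 − h(p).

import math

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Source V ~ Bernoulli(q) over a BSC_p channel (assumed example values).
for q, p in [(0.1, 0.05), (0.3, 0.11), (0.4, 0.2)]:
    H_V = h(q)        # source entropy, bits per source symbol
    C = 1 - h(p)      # channel capacity, bits per channel use
    ok = "possible" if H_V < C else "impossible"
    print(f"q={q}, p={p}: H(V)={H_V:.3f}, C={C:.3f} -> reliable transmission {ok}")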

5 Linear codes

The code presented last class is slow, and it takes exponential space to even store the codebook. Linear codes allow for efficient codebook storage and, in some cases, efficient decoding. We will focus on two particular channels:

- Binary symmetric channel (BSC_p): each transmitted bit is flipped (0 ↔ 1) independently with probability p. Capacity C = 1 − h(p).
- Binary erasure channel (BEC_α): each transmitted bit is erased (replaced by ?) independently with probability α. Capacity C = 1 − α.

We will encode {1, ..., 2^{Rn}} → X^n ∈ {0,1}^n. For simplicity, assume that k = Rn is an integer. This assumption doesn't hurt in the n → ∞ limit. With this, we can say we are encoding {1, ..., 2^k} → X^n ∈ {0,1}^n, in other words, from a k-bit string to an n-bit string. Think of these bit strings as vectors. The encoding operation will be a linear transformation, x ↦ Gx, where G is an n × k matrix. Arithmetic is done modulo 2 here; in other words, we are working in the finite field F_2. Such a code is called a linear code. Note that G can be stored efficiently.

The claim (which we do not prove here) is that for both BEC_α and BSC_p, the linear code obtained by choosing a random G will achieve the channel capacity.

How to decode? One option is to use joint typicality. For BSC_p, (a, b) is jointly typical if Δ(a, b) ≈ pn, where Δ is the Hamming distance. So to decode y ∈ {0,1}^n, just search for an x ∈ {0,1}^k such that (p − ɛ)n ≤ Δ(Gx, y) ≤ (p + ɛ)n. Actually, one need not be concerned with picking out vectors that are too similar, so it works just as well to use Δ(Gx, y) ≤ (p + ɛ)n. But finding such vectors directly is hard to do.

Another option is maximum likelihood decoding: find the x which minimizes Δ(Gx, y). This is also hard to do (in general NP-hard). It always works, but it is harder to deal with than joint typicality.

For BEC_α there is an easy decoding strategy. This channel is nice enough to tell us where the errors are: the erased positions arrive marked with ?. Just throw out the erased bits to get a reduced vector y_S, where S is the set of non-erased positions. Also keep only the corresponding rows of G to get G_S. The remaining bits went through with no error, so we have y_S = G_S x exactly. If |S| is a bit larger than k (i.e. G_S has a few more rows than columns), then G_S is likely to have full column rank, and y_S = G_S x can be solved for the unique solution x using standard linear algebra techniques.
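
To make the BEC decoding strategy concrete, here is a small Python sketch (my own illustration; the parameters k, n, α and the use of numpy are assumptions, and the random G is just for demonstration, with no claim about its rate). It encodes x ↦ Gx over F_2, erases each output bit independently with probability α, keeps the non-erased rows G_S and entries y_S, and recovers x by Gaussian elimination over F_2 whenever G_S has full column rank.

import numpy as np

rng = np.random.default_rng(1)

k, n, alpha = 20, 50, 0.3      # message bits, block length, erasure probability (assumed)

# Random generator matrix G (n x k) over F_2 and a random message x.
G = rng.integers(0, 2, size=(n, k))
x = rng.integers(0, 2, size=k)
codeword = G @ x % 2           # encoding: x -> Gx over F_2

# BEC_alpha: erase each bit independently with probability alpha (-1 marks '?').
erased = rng.random(n) < alpha
y = np.where(erased, -1, codeword)

# Decoding: keep the non-erased positions S and solve y_S = G_S x over F_2.
S = np.flatnonzero(~erased)
A = np.concatenate([G[S], y[S].reshape(-1, 1)], axis=1)   # augmented system [G_S | y_S]

# Gaussian elimination over F_2 (XOR is addition mod 2).
row = 0
for col in range(k):
    pivot_rows = np.flatnonzero(A[row:, col]) + row
    if pivot_rows.size == 0:
        raise ValueError("G_S not full column rank; try larger n or smaller alpha")
    A[[row, pivot_rows[0]]] = A[[pivot_rows[0], row]]     # move a pivot into place
    others = np.flatnonzero(A[:, col])
    others = others[others != row]
    A[others] ^= A[row]                                   # clear column col in all other rows
    row += 1

x_hat = A[:k, k]   # after reduction the first k rows read [I_k | solution]
print("decoded correctly:", np.array_equal(x_hat, x))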