Lecture 4 Noisy Channel Coding

Lecture 4 Noisy Channel Coding. I-Hsiang Wang, Department of Electrical Engineering, National Taiwan University (ihwang@ntu.edu.tw). October 9, 2015.

The Channel Coding Problem

(Block diagram: w → Channel Encoder → x^N → Noisy Channel → y^N → Channel Decoder → ŵ)

Meta description:
1. Message: a random message W ~ Unif[1 : 2^K].
2. Channel: consists of an input alphabet X, an output alphabet Y, and a family of conditional distributions {p(y_k | x^k, y^{k-1})}_{k∈ℕ} determining the stochastic relationship between the output symbol y_k and the input symbol x_k along with all past signals (x^{k-1}, y^{k-1}).
3. Encoder: encode the message w into a length-N codeword x^N ∈ X^N.
4. Decoder: reconstruct a message ŵ from the channel output y^N.
5. Efficiency: maximize the code rate R ≜ K/N bits/channel use, subject to a given decoding criterion.

Decoding Criterion: Vanishing Error Probability

A key performance measure is the error probability P_e^{(N)} ≜ P{W ≠ Ŵ}.

Question: Is it possible to get zero error probability?
Ans: Probably not, unless the channel noise has some special structure.

Following the development of lossless source coding, Shannon turned the attention to answering the following question: Is it possible to have a sequence of encoder/decoder pairs such that P_e^{(N)} → 0 as N → ∞? If so, what is the largest possible code rate R at which vanishing error probability is possible?

Recall: In lossless source coding, we saw that the infimum of compression rates at which vanishing error probability is possible is the entropy rate H({S_i}).

Rate R, Block Length N, Probability of Error P_e^{(N)}

- Capacity: take N → ∞ and require P_e^{(N)} → 0 ⟹ sup R = C.
- Error exponent: take N → ∞ and fix the rate R ⟹ min P_e^{(N)} ≈ 2^{-N E(R)} for some error exponent E(R) as a function of R.
- Finite block length: fix N and require P_e^{(N)} ≤ ε ⟹ sup R = C - √(V/N) Q^{-1}(ε) + O(log N / N).

Remark: For source coding, one can establish a similar framework.
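
As a concrete illustration of the finite-blocklength regime, the sketch below evaluates the normal approximation C - √(V/N) Q^{-1}(ε) for a BSC(p). The BSC dispersion formula V = p(1-p) log₂²((1-p)/p) is assumed here; it is a standard result that is not derived in this lecture, and the parameters are arbitrary.

```python
import numpy as np
from scipy.stats import norm

def Hb(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_normal_approx(p, N, eps):
    """Normal approximation C - sqrt(V/N) * Q^{-1}(eps) for a BSC(p), in bits/channel use."""
    C = 1 - Hb(p)
    V = p * (1 - p) * np.log2((1 - p) / p) ** 2   # BSC dispersion (assumed standard formula)
    return C - np.sqrt(V / N) * norm.isf(eps)     # norm.isf(eps) = Q^{-1}(eps)

p, eps = 0.11, 1e-3
print("C =", round(1 - Hb(p), 4))
for N in (100, 1000, 10000):
    print(N, round(bsc_normal_approx(p, N, eps), 4))   # approaches C as N grows
```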

In this lecture we focus only on capacity. In other words, we ignore the issue of finite block length (FBL); FBL performance can be obtained via techniques extending the central limit theorem (CLT). We do not pursue finer analysis of the error probability via large-deviation techniques either.

Discrete Memoryless Channel (DMC)

In order to demonstrate the key ideas in channel coding, in this lecture we shall focus on discrete memoryless channels (DMC), defined below.

Definition 1 (Discrete Memoryless Channel). A discrete channel (X, {p(y_k | x^k, y^{k-1})}_{k∈ℕ}, Y) is memoryless if, for all k ∈ ℕ, p(y_k | x^k, y^{k-1}) = p_{Y|X}(y_k | x_k). In other words, Y_k - X_k - (X^{k-1}, Y^{k-1}) forms a Markov chain. The conditional p.m.f. p_{Y|X} is called the channel law or channel transition function.

Question: is our definition of a channel sufficient to specify p(y^N | x^N), the stochastic relationship between the channel input (codeword) x^N and the channel output y^N?

p(y^N | x^N) = p(x^N, y^N) / p(x^N), where

p(x^N, y^N) = ∏_{k=1}^N p(x_k, y_k | x^{k-1}, y^{k-1}) = ∏_{k=1}^N p(y_k | x^k, y^{k-1}) p(x_k | x^{k-1}, y^{k-1}).

Hence, we need to further specify {p(x_k | x^{k-1}, y^{k-1})}_{k∈ℕ}, which cannot be obtained from p(x^N).

Interpretation: {p(x_k | x^{k-1}, y^{k-1})}_{k∈ℕ} is induced by the encoding function, which implies that the encoder can potentially make use of the past channel output, i.e., feedback.

DMC without Feedback

(Block diagrams: without feedback, the encoder maps w to x_k; with feedback, the encoder also observes the channel output y^{k-1} through a unit delay.)

Suppose the encoder has no knowledge of the realization of the channel output. Then p(x_k | x^{k-1}, y^{k-1}) = p(x_k | x^{k-1}) for all k ∈ ℕ, and the channel is said to have no feedback. In this case, specifying {p(y_k | x^k, y^{k-1})}_{k∈ℕ} suffices to specify p(y^N | x^N).

Proposition 1 (DMC without Feedback). For a DMC (X, p_{Y|X}, Y) without feedback,
p(y^N | x^N) = ∏_{i=1}^N p_{Y|X}(y_i | x_i).
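
A minimal numerical sketch of Proposition 1, assuming a hypothetical BSC-like channel matrix: for a DMC without feedback, the block likelihood is simply the product of per-symbol channel laws.

```python
import numpy as np

# Hypothetical example channel: a BSC with crossover probability 0.1.
p_Y_given_X = np.array([[0.9, 0.1],   # row x = 0
                        [0.1, 0.9]])  # row x = 1

def block_likelihood(x, y, W=p_Y_given_X):
    """p(y^N | x^N) = prod_i p_{Y|X}(y_i | x_i) for a DMC without feedback."""
    return float(np.prod([W[xi, yi] for xi, yi in zip(x, y)]))

print(block_likelihood([0, 1, 1, 0], [0, 1, 0, 0]))   # 0.9 * 0.9 * 0.1 * 0.9 = 0.0729
```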

Overview

In this lecture, we would like to establish the following (informally stated) noisy channel coding theorem due to Shannon:

For a DMC (X, p_{Y|X}, Y), the maximum code rate with vanishing error probability is the channel capacity C ≜ max_{p_X(·)} I(X; Y).

The above holds regardless of the availability of feedback. To demonstrate this result, we organize the lecture as follows:
1. Give the problem formulation, state the main theorem, and visit a couple of examples to show how to compute channel capacity.
2. Prove the converse part: an achievable rate cannot exceed C.
3. Prove the achievability part with a random coding argument.

Outline

1. Channel Capacity and the Weak Converse
   - Channel Capacity
   - Proof of the Weak Converse
   - Feedback Capacity
2. (Achievability of the channel coding theorem and joint source-channel coding)

Channel Coding without Feedback: Problem Setup

(Block diagram: w → Channel Encoder → x^N → Noisy Channel → y^N → Channel Decoder → ŵ)

1. A (2^{NR}, N) channel code consists of
   - an encoding function (encoder) enc_N : [1 : 2^K] → X^N that maps each message w to a length-N codeword x^N, where K ≜ ⌈NR⌉;
   - a decoding function (decoder) dec_N : Y^N → [1 : 2^K] ∪ {∗} that maps a channel output sequence y^N to a reconstructed message ŵ or an error message ∗.
2. The error probability is defined as P_e^{(N)} ≜ P{W ≠ Ŵ}.
3. A rate R is said to be achievable if there exists a sequence of (2^{NR}, N) codes such that P_e^{(N)} → 0 as N → ∞. The channel capacity is defined as C ≜ sup{R : R is achievable}.

Channel Coding Theorem for Discrete Memoryless Channels

Theorem 1 (Channel Coding Theorem for DMC without Feedback). The capacity C of the DMC p(y|x) without feedback is given by

    C = max_{p(x)} I(X; Y).    (1)

The capacity formula (1) is intuitive, since I(X; Y) represents the amount of information about the channel input X that one can infer from the channel output Y. The maximization over p(x) stands for choosing the best possible input distribution so that the amount of information transfer is maximized.

Rest of the lecture:
1. First we give some examples of noisy channels to show how to compute capacity.
2. Then we prove that for any rate R > C, it is impossible to have vanishing error probability (converse).
3. Finally we prove that for any R < C, there exists a sequence of encoding/decoding schemes such that the error probability vanishes as the blocklength tends to infinity (achievability), based on a probabilistic argument called random coding.

Binary Symmetric Channel

A binary symmetric channel (BSC) consists of
- binary input/output alphabets X = Y = {0, 1};
- channel law p(y|x) given by the transition matrix [[1-p, p], [p, 1-p]] (rows indexed by x, columns by y).

The capacity of the BSC is C_BSC = 1 - H_b(p).

To compute the BSC capacity, observe that I(X; Y) = H(Y) - H(Y|X), and
- H(Y | X=0) = H(Y | X=1) = H_b(p) ⟹ H(Y|X) = H_b(p);
- H(Y) ≤ log 2 = 1, with equality iff Y is uniform.

Question: Is it possible to choose a p(x) such that Y is uniform?
Ans: Yes, choose X to be uniform ⟹ C = max_{p(x)} I(X; Y) = 1 - H_b(p).
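
As a sanity check, the sketch below brute-forces max I(X;Y) over input distributions (1-a, a) for a BSC with a hypothetical crossover probability p = 0.2 and compares the result with 1 - H_b(p).

```python
import numpy as np

def entropy_bits(pmf):
    pmf = np.asarray(pmf, dtype=float)
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log2(pmf)))

def mutual_information(px, W):
    """I(X;Y) = H(Y) - H(Y|X) in bits, for input pmf px and channel matrix W (rows: x)."""
    py = px @ W
    HY_given_X = sum(px[i] * entropy_bits(W[i]) for i in range(len(px)))
    return entropy_bits(py) - HY_given_X

p = 0.2                                       # hypothetical crossover probability
W = np.array([[1 - p, p], [p, 1 - p]])
grid = np.linspace(0.0, 1.0, 1001)
best = max(mutual_information(np.array([1 - a, a]), W) for a in grid)
print(round(best, 4), round(1 - entropy_bits([p, 1 - p]), 4))   # both ≈ 0.2781
```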

Binary Erasure Channel

A binary erasure channel (BEC) consists of
- binary input X = {0, 1} and output with erasure Y = {0, ∗, 1};
- channel law p(y|x) given by the transition matrix [[1-p, p, 0], [0, p, 1-p]] (rows indexed by x ∈ {0, 1}, columns by y ∈ {0, ∗, 1}).

The capacity of the BEC is C_BEC = 1 - p.

Suppose we begin with I(X; Y) = H(Y) - H(Y|X). Then
- H(Y | X=0) = H(Y | X=1) = H_b(p) ⟹ H(Y|X) = H_b(p);
- H(Y) ≤ log 3, with equality iff Y is uniform.

Question: Is it possible to choose a p(x) such that Y is uniform?
Ans: No. So we cannot say that max_{p(x)} H(Y) = log 3.

(Figure: the BEC and its reverse channel from Y back to X.)

Instead, we can start with I(X; Y) = H(X) - H(X|Y). The reverse channel law p(x|y) puts all mass on x = 0 when y = 0, all mass on x = 1 when y = 1, and equals (α, 1-α) when y = ∗, where α ≜ P{X = 0}. Then
- H(X | Y=0) = H(X | Y=1) = 0 and H(X | Y=∗) = H_b(α) = H(X) ⟹ H(X|Y) = P{Y = ∗} H(X) = p H(X);
- H(X) ≤ 1, with equality iff X is uniform.

Hence, C_BEC = max_{p(x)} (1 - p) H(X) = 1 - p.
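
The same kind of brute-force check works for the BEC. A minimal sketch, assuming an erasure probability p = 0.3 for concreteness: the maximizing input is uniform and the maximum equals 1 - p.

```python
import numpy as np

def entropy_bits(pmf):
    pmf = np.asarray(pmf, dtype=float)
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log2(pmf)))

def mutual_information(px, W):
    py = px @ W
    return entropy_bits(py) - sum(px[i] * entropy_bits(W[i]) for i in range(len(px)))

p = 0.3                                # hypothetical erasure probability
W = np.array([[1 - p, p, 0.0],         # x = 0 -> y in (0, *, 1)
              [0.0, p, 1 - p]])        # x = 1
grid = np.linspace(0.0, 1.0, 1001)
best_a = max(grid, key=lambda a: mutual_information(np.array([1 - a, a]), W))
print(best_a, round(mutual_information(np.array([1 - best_a, best_a]), W), 4), 1 - p)
# maximizer a = 0.5, maximum value 0.7 = 1 - p
```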

Erasure Channel

We can generalize the BEC to the following erasure channel:
- input X, output Y = X ∪ {∗};
- channel law p(y|x) = 1 - p if y = x; p if y = ∗; 0 otherwise.

A motivation for this model comes from networking, where the erasure models a packet drop.

Exercise 1. Show that the capacity of the erasure channel is C_EC = (1 - p) log |X|.

Symmetric Channel

In computing the capacity of the BSC, we observed that
1. H(Y|X) = H_b(p) regardless of p(x). Why? Because all rows of p(y|x) are permutations of the same probability vector [p, 1-p].
2. H(Y) = log |Y| can be attained, that is, Y can be made uniform by choosing X to be uniform. Why? Because all columns of p(y|x) have the same sum Σ_x p(y|x).

Definition 2 (Symmetric Channel). A symmetric channel is a channel whose channel law p(y|x) satisfies (1) all rows of p(y|x) are permutations of the same probability vector p, and (2) all columns of p(y|x) have the same sum Σ_x p(y|x).

Exercise 2. Show that the capacity of a symmetric channel is log |Y| - H(p).
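
A small sketch related to Exercise 2: check the two symmetry conditions of Definition 2 and evaluate log|Y| - H(p) for a hypothetical symmetric channel matrix.

```python
import numpy as np

def entropy_bits(pmf):
    pmf = np.asarray(pmf, dtype=float)
    pmf = pmf[pmf > 0]
    return float(-np.sum(pmf * np.log2(pmf)))

def symmetric_channel_capacity(W):
    """log|Y| - H(p) for a symmetric channel: rows are permutations of p, columns share one sum."""
    rows_ok = all(np.allclose(np.sort(W[0]), np.sort(row)) for row in W)
    cols_ok = np.allclose(W.sum(axis=0), W.sum(axis=0)[0])
    assert rows_ok and cols_ok, "channel matrix is not symmetric in the sense of Definition 2"
    return np.log2(W.shape[1]) - entropy_bits(W[0])

W = np.array([[0.7, 0.2, 0.1],         # hypothetical symmetric channel
              [0.1, 0.7, 0.2],
              [0.2, 0.1, 0.7]])
print(round(symmetric_channel_capacity(W), 4))   # log2(3) - H([0.7, 0.2, 0.1]) ≈ 0.4282
```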

Computing the Capacity of a DMC via Convex Optimization

For a DMC, we are able to find the capacity efficiently by invoking efficient algorithms for solving convex programs, since I(X; Y) is a concave function of p(x) for fixed p(y|x).

Proposition 2. I(X; Y) is a concave function of p(x) for fixed p(y|x).

pf: By definition, I(X; Y) = H(Y) - H(Y|X).
- H(Y|X) = Σ_x p(x) H(Y | X=x) is a linear function of p(x), because H(Y | X=x) = -Σ_y p(y|x) log p(y|x) is constant for fixed p(y|x).
- H(Y) is a concave function of p(y), and p(y) is a linear function of p(x) for fixed p(y|x). Hence H(Y) is a concave function of p(x) for fixed p(y|x).
Putting the above together, we complete the proof.
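
One classical algorithm that exploits this concavity is the Blahut-Arimoto algorithm; the sketch below is a minimal implementation (not derived in this lecture), checked against the BSC formula 1 - H_b(p).

```python
import numpy as np

def blahut_arimoto(W, iters=500):
    """Iteratively maximize I(X;Y) over p(x) for a DMC with transition matrix W (rows: x)."""
    nx = W.shape[0]
    p = np.full(nx, 1.0 / nx)                          # start from the uniform input
    cap = 0.0
    for _ in range(iters):
        py = p @ W                                     # induced output distribution
        ratio = np.divide(W, py[None, :], out=np.ones_like(W), where=W > 0)
        D = np.sum(W * np.log2(ratio), axis=1)         # D( W(.|x) || p_Y ) in bits, per x
        cap = float(np.sum(p * D))                     # current value of I(X;Y)
        p = p * 2.0 ** D
        p /= p.sum()                                   # multiplicative update of p(x)
    return cap, p

p_err = 0.2
W_bsc = np.array([[1 - p_err, p_err], [p_err, 1 - p_err]])
C, p_star = blahut_arimoto(W_bsc)
print(round(C, 4), np.round(p_star, 3))   # ≈ 0.2781 = 1 - H_b(0.2), maximizer ≈ [0.5 0.5]
```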

Proof of the (Weak) Converse (1)

We would like to show that for every sequence of (2^{NR}, N) codes such that P_e^{(N)} → 0 as N → ∞, the rate R ≤ max_{p(x)} I(X; Y).

pf: Note that W ~ Unif[1 : 2^K] and hence K = H(W).

NR ≤ H(W) = I(W; Ŵ) + H(W | Ŵ)    (2)
   ≤ I(W; Y^N) + (1 + P_e^{(N)} log(2^K + 1))    (3)
   ≤ Σ_{k=1}^N I(W; Y_k | Y^{k-1}) + (1 + P_e^{(N)} (NR + 2))    (4)

(2) is due to K = ⌈NR⌉ ≥ NR and the chain rule.
(3) is due to W - Y^N - Ŵ and Fano's inequality.
(4) is due to the chain rule and 2^K + 1 ≤ 2^{NR+1} + 1 ≤ 2 · 2^{NR+1}.

Proof of the (Weak) Converse (2)

Set ε_N ≜ (1/N)(1 + P_e^{(N)}(NR + 2)); then ε_N → 0 as N → ∞ because lim_{N→∞} P_e^{(N)} = 0.

The next step is to relate Σ_{k=1}^N I(W; Y_k | Y^{k-1}) to I(X; Y), by the following manipulation:

I(W; Y_k | Y^{k-1}) ≤ I(W, Y^{k-1}; Y_k) ≤ I(W, Y^{k-1}, X_k; Y_k)    (5)
                    = I(X_k; Y_k) ≤ max_{p(x)} I(X; Y)    (6)

(5) is due to the fact that conditioning reduces entropy.
(6) is due to the DMC property: p(y_k | x^k, y^{k-1}, w) = p(y_k | x^k, y^{k-1}) = p(y_k | x_k) ⟹ Y_k - X_k - (W, X^{k-1}, Y^{k-1}) ⟹ Y_k - X_k - (W, Y^{k-1}).

Proof of the (Weak) Converse (3)

Hence, we have

NR ≤ Σ_{k=1}^N I(W; Y_k | Y^{k-1}) + N ε_N ≤ N max_{p(x)} I(X; Y) + N ε_N
⟹ R ≤ max_{p(x)} I(X; Y) + ε_N, ∀ N.

Taking N → ∞, we have: R ≤ max_{p(x)} I(X; Y) if R is achievable.

Remark: Similar to the source coding problem, a stronger version of the converse holds in the channel coding problem as well: if R > C, then P_e^{(N)} → 1 as N → ∞ for any encoding/decoding functions.

Channel Coding with Feedback: Problem Setup

(Block diagram: as before, but the encoder also observes the channel output through a unit delay.)

1. A (2^{NR}, N) channel code consists of
   - an encoding function (encoder) enc_N : [1 : 2^K] × Y^{N-1} → X^N that maps each message w to a length-N codeword x^N, where K ≜ ⌈NR⌉ and the k-th symbol x_k is a function of (w, y^{k-1}) for all k ∈ [1 : N];
   - a decoding function (decoder) dec_N : Y^N → [1 : 2^K] ∪ {∗} that maps a channel output sequence y^N to a reconstructed message ŵ or an error message ∗.
2. The error probability is defined as P_e^{(N)} ≜ P{W ≠ Ŵ}.
3. A rate R is said to be achievable if there exists a sequence of (2^{NR}, N) codes such that P_e^{(N)} → 0 as N → ∞. The channel capacity is defined as C ≜ sup{R : R is achievable}.

Dependency Graph: Without vs. With Feedback

(Figure: dependency graph among W, X^N, Y^N, and Ŵ through enc_N, p_{Y|X}, and dec_N, for the case without feedback.)

(Figure: the corresponding dependency graph for the case with feedback, where each X_k may depend on Y^{k-1}.)

Feedback Capacity

Theorem 2 (Channel Coding Theorem for DMC with Feedback). The capacity of the DMC p(y|x) with feedback is given by (1), the same as that without feedback. In other words, feedback does not increase the channel capacity of a DMC.

The proof is immediate because in the converse proof of the channel coding theorem without feedback, we do not make use of the assumption that there is no feedback. In other words, the proof is identical even with feedback.

Remark: Although feedback does not increase capacity, it does greatly improve the reliability (error exponent) and the finite-blocklength performance. Furthermore, the design and the complexity of the coding scheme may also be greatly simplified and reduced due to feedback. The details are out of the scope of this lecture.

Overview

In order to prove the achievability part of Theorem 1, we need to show the following mathematical statement:

For every R < C with R ≥ 0, there exists a sequence of (2^{NR}, N) codes such that lim_{N→∞} P_e^{(N)} = 0.

In general, to prove the existence of certain objects satisfying some desirable properties, there are two possible ways:
1. Explicitly construct an object and prove that the properties hold.
2. Assume that no object can satisfy the properties, and derive a contradiction.

The achievability proof presented in this lecture is more of the second flavor, and in fact belongs to the so-called probabilistic method.

The Probabilistic Method

What is the probabilistic method? Roughly speaking, in order to show the existence of certain objects satisfying some desirable properties:
- One first imposes a particular probability distribution over the space of possible objects.
- Then, by showing that on average the properties hold, or that the properties hold with non-zero probability, one concludes the existence of such objects.

Example 1. Given a set of n-dimensional unit vectors {v_1, v_2, ..., v_k}, show that there exist x_i ∈ {±1}, i = 1, ..., k, such that ‖Σ_{i=1}^k x_i v_i‖ ≥ √k.

pf: Let {X_i}_{i=1}^k be i.i.d. random variables with P{X_i = 1} = P{X_i = -1} = 1/2. Define V ≜ Σ_{i=1}^k X_i v_i and compute E[‖V‖²] as follows:

E[‖V‖²] = E[VᵀV] = E[(Σ_{i=1}^k X_i v_i)ᵀ (Σ_{j=1}^k X_j v_j)] = Σ_{i=1}^k Σ_{j=1}^k E[X_i X_j] v_iᵀ v_j.

Since the {X_i} are mutually independent, E[X_i X_j] = E[X_i] E[X_j] = 0 for i ≠ j. Hence

E[‖V‖²] = Σ_{i=1}^k E[X_i²] ‖v_i‖² = k.

Therefore there exist x_i ∈ {±1}, i = 1, ..., k, such that ‖Σ_{i=1}^k x_i v_i‖ ≥ √k. Otherwise, E[‖V‖²] would be less than k, leading to a contradiction.
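
A quick Monte Carlo illustration of the computation above: the empirical average of ‖Σ_i X_i v_i‖² matches k for an arbitrary (here randomly chosen) set of unit vectors, and some sign pattern reaches at least √k.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 8, 20, 100000

# Arbitrary unit vectors v_1, ..., v_k (randomly chosen here; the identity
# E[||V||^2] = k holds for any choice of unit vectors).
v = rng.standard_normal((k, n))
v /= np.linalg.norm(v, axis=1, keepdims=True)

signs = rng.choice([-1.0, 1.0], size=(trials, k))       # i.i.d. uniform +-1 signs
norms_sq = np.sum((signs @ v) ** 2, axis=1)             # ||sum_i X_i v_i||^2 per trial
print(round(norms_sq.mean(), 3))                        # ≈ k = 20
print(norms_sq.min() <= k <= norms_sq.max())            # some pattern attains >= sqrt(k)
```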

(Photo: Paul Erdős, 1913–1996.)

Coding over Noisy Channel

Before we prove the main theorem, let us set up a few notations related to coding over a noisy channel.
1. Codebook: c = {x^N(1), x^N(2), ..., x^N(2^K)} consists of the 2^K codewords and is the range of the encoding function.
2. ML decoder (maximum likelihood): the optimal decoder that minimizes the probability of error P_e^{(N)} when the messages are uniformly chosen (uniform prior): ŵ_ML = arg max_{w ∈ [1:2^K]} p(y^N | x^N(w)).
3. Probability of error of message m: λ_m ≜ P{Ŵ ≠ W | W = m}.

In principle, one can derive the ML decoding rule and compute P_e^{(N)} for a given codebook. But there are some challenges toward proving the channel coding theorem.

Challenges and Work-Arounds

First, the expression for the error probability of ML decoding is usually intractable, and it is hard to obtain any insight regarding its asymptotic behavior. Second, it is unclear how to construct the codebook and the corresponding decoding scheme.

In summary, to prove the achievability part of the channel coding theorem, there are two main challenges we shall overcome:
1. How to show the existence of good codebooks? We circumvent the issue of explicit construction by using a random coding argument (an instance of the probabilistic method).
2. How to analyze the error probability? We circumvent the issue of ML decoding error analysis by using a suboptimal decoder and deriving upper bounds on its probability of error.

Proof Program

1. Random codebook generation: generate an ensemble of codebooks according to a certain probability distribution. Hence the codebook C becomes a random object.
2. Error probability analysis:
   Goal: show that as N → ∞, E_C[P_{e,ML}^{(N)}(C)] → 0, and conclude that there must exist a codebook c such that the decoding error probability P_{e,ML}^{(N)}(c) → 0.
   To simplify the analysis, we shall introduce suboptimal decoders and give a tractable upper bound on the error probability using the union of events bound.

Random Codebook Generation

A simple way is to generate the 2^K codewords i.i.d., each codeword drawn according to p(x^N) = ∏_{i=1}^N p_X(x_i). In other words, if we stack all 2^K codewords into a 2^K × N matrix C, the elements of C are i.i.d. according to p_X (each row is a codeword):

c = [ X_1(1)    X_2(1)    ...  X_N(1)
      X_1(2)    X_2(2)    ...  X_N(2)
      ...
      X_1(2^K)  X_2(2^K)  ...  X_N(2^K) ]

and p(c) ≜ P{C = c} = ∏_{w=1}^{2^K} ∏_{i=1}^N p_X(x_i(w)).

It turns out that the symmetry of this codebook ensemble distribution helps simplify the analysis.
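
A minimal sketch of random codebook generation for a binary input alphabet; the values of K, N, and p_X below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_codebook(K, N, p_X, alphabet=(0, 1)):
    """2^K codewords of length N, entries i.i.d. ~ p_X; row w is the codeword x^N(w)."""
    return rng.choice(alphabet, size=(2 ** K, N), p=p_X)

codebook = random_codebook(K=4, N=10, p_X=[0.5, 0.5])
print(codebook.shape)   # (16, 10)
```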

Encoding and Decoding

For a realization c of the random codebook ensemble C, we describe the encoding and decoding methods below.

Encoding: for a message m ∈ [1 : 2^K], choose the m-th row of the codebook c and send it out.

Decoding: ideally one would like to use the ML decoding rule ŵ_ML = arg max_{w ∈ [1:2^K]} p(y^N | x^N(w)). However, the performance of the ML decoder is usually not tractable, as mentioned before. Instead, we introduce a suboptimal decoder based on typical sequences:

ŵ_T ≜ the unique w such that (x^N(w), y^N) ∈ T_ε^{(N)}(X, Y).

Note: there are other suboptimal decoders that can be used, such as threshold decoders.
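
A sketch of the typicality decoder described above, using the empirical joint p.m.f. π(a, b | x^N, y^N); the channel, input distribution, and ε below are hypothetical choices.

```python
import numpy as np

def jointly_typical(x, y, p_XY, eps):
    """Check |pi(a,b | x,y) - p_XY[a,b]| <= eps * p_XY[a,b] for every symbol pair (a,b)."""
    N = len(x)
    counts = np.zeros_like(p_XY)
    for a, b in zip(x, y):
        counts[a, b] += 1
    return bool(np.all(np.abs(counts / N - p_XY) <= eps * p_XY))

def typicality_decode(codebook, y, p_XY, eps):
    """Return the unique jointly typical codeword index, or None (decoding error)."""
    hits = [w for w in range(len(codebook)) if jointly_typical(codebook[w], y, p_XY, eps)]
    return hits[0] if len(hits) == 1 else None

# Hypothetical example: BSC(0.1) with uniform input, so p_XY[a, b] = 0.5 * p(b | a).
p = 0.1
p_XY = 0.5 * np.array([[1 - p, p], [p, 1 - p]])
rng = np.random.default_rng(0)
N = 5000
x = rng.integers(0, 2, size=N)
y = x ^ (rng.random(N) < p)                     # pass x through the BSC(0.1)
print(jointly_typical(x, y, p_XY, eps=0.2))     # True with high probability for large N
```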

Error Probability Analysis (1)

Since the ML decoder is optimal, we can analyze the performance of the typicality decoder and use it as an upper bound. Hence our goal becomes proving lim_{N→∞} E_C[P_{e,T}^{(N)}(C)] = 0.

1. The first step is to use the symmetry of the codebook ensemble to simplify E_C[P_{e,T}^{(N)}(C)] and argue that we can focus on analyzing the error probability of the first codeword X^N(1), averaged over C:

E_C[P_{e,T}^{(N)}(C)] = E_C[2^{-K} Σ_{m=1}^{2^K} λ_m(C)] = 2^{-K} Σ_m E_C[λ_m(C)] = 2^{-K} Σ_m E_C[λ_1(C)] = E_C[λ_1(C)] = P{Error, averaged over C | W = 1}.

Error Probability Analysis (2)

2. For notational simplicity, let E denote the error event and drop "averaged over C". Our next focus is to upper bound P{E | W = 1} ≜ P_1(E). The trick here is to distinguish two different kinds of errors:

E_a ≜ {(X^N(1), Y^N) ∉ T_ε^{(N)}},   E_t ≜ {(X^N(w), Y^N) ∈ T_ε^{(N)} for some w ≠ 1},   E = E_a ∪ E_t.

The core question is whether or not the joint sequence (X^N(w), Y^N) is ε-typical. Let us define A_w ≜ {(X^N(w), Y^N) ∈ T_ε^{(N)}}. We can then rewrite E_a = A_1^c, E_t = ∪_{w≠1} A_w, and hence E = E_a ∪ E_t = A_1^c ∪ (∪_{w≠1} A_w).

Error Probability Analysis (3)

3. We are now ready to apply the union of events bound:

P_1{E} = P_1{A_1^c ∪ (∪_{w≠1} A_w)} ≤ P_1{A_1^c} + Σ_{w=2}^{2^K} P_1{A_w}.

Next, we shall develop upper bounds on
- the probability that the actual transmitted codeword X^N(1) and the actual received signal Y^N are not jointly typical;
- the probability that some other (random) codeword X^N(w), w ≠ 1, and the actual received signal Y^N are jointly typical.

Lemma 1 (A Key Lemma). P_1{A_1} ≥ 1 - ε for N large enough, and P_1{A_w} ≤ 2^{-N(I(X;Y) - δ(ε))} for all w ≠ 1, where δ(ε) → 0 as ε → 0.

Error Probability Analysis (4)

4. Finally, let us put all the above together and apply Lemma 1:

E_C[P_{e,T}^{(N)}(C)] = P{E} = P{E | W = 1} = P_1{E}
  ≤ P_1{A_1^c} + Σ_{w=2}^{2^K} P_1{A_w}
  ≤ ε + Σ_{w=2}^{2^K} 2^{-N(I(X;Y) - δ(ε))}
  ≤ ε + 2^{NR} · 2^{-N(I(X;Y) - δ(ε))} = ε + 2^{-N(I(X;Y) - δ(ε) - R)}.

As long as R < I(X; Y) - δ(ε), we can make P{E} ≤ 2ε for N large enough; since ε > 0 is arbitrary, this is equivalent to lim_{N→∞} E_C[P_{e,T}^{(N)}(C)] = 0.

Completion of the Achievability Proof

We have shown that as long as R < I(X; Y) - δ(ε), lim_{N→∞} E_C[P_{e,T}^{(N)}(C)] = 0, and hence there must exist a realization of the codebook, c, such that P_{e,T}^{(N)}(c) → 0 as N → ∞.

Finally, taking the codebook-generating distribution p_X = arg max_{p(x)} I(X; Y), we conclude that every R < C = max_{p(x)} I(X; Y) is achievable.
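
To see the random coding argument in action, the sketch below estimates the ensemble-average error probability of random coding over a BSC(0.11) at rate R = 0.25 < C ≈ 0.5. For tractability it uses the minimum-distance (ML) decoder mentioned earlier rather than the typicality decoder, and the parameters are arbitrary, so it only illustrates the decreasing trend of P_e with N.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_coding_error(N, R, p, trials=300):
    """Monte Carlo estimate of the ensemble-average error probability of random coding
    over a BSC(p): 2^(floor(NR)) i.i.d. uniform codewords, minimum-distance decoding."""
    K = int(np.floor(N * R))
    errors = 0
    for _ in range(trials):
        codebook = rng.integers(0, 2, size=(2 ** K, N))   # fresh random codebook each trial
        y = codebook[0] ^ (rng.random(N) < p)             # transmit codeword 0 over the BSC
        dists = np.count_nonzero(codebook != y, axis=1)   # Hamming distances to y
        best = dists.min()                                # count any tie as an error, too
        errors += int(dists[0] > best or np.count_nonzero(dists == best) > 1)
    return errors / trials

p, R = 0.11, 0.25          # C = 1 - H_b(0.11) ≈ 0.5 > R, so the error probability should decay
for N in (16, 32, 48):
    print(N, random_coding_error(N, R, p))
```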

Proof of Lemma 1 (1): Recap of Typicality

Recall: by definition, an ε-typical (vector) sequence (x^n, y^n) satisfies

|π(a, b | x^n, y^n) - p_{X,Y}(a, b)| ≤ ε p_{X,Y}(a, b), ∀ (a, b) ∈ X × Y.

(Note: we can think of (X, Y) as a single random variable and apply the same definition of typicality.)

Hence, if (X^n, Y^n) ~ ∏_{i=1}^n p_{X,Y}(x_i, y_i), then we have:
0. (x^n, y^n) ∈ T_ε^{(n)}(X, Y) ⟹ x^n ∈ T_ε^{(n)}(X) and y^n ∈ T_ε^{(n)}(Y).
1. (x^n, y^n) ∈ T_ε^{(n)}(X, Y) ⟹ |-(1/n) log p(x^n, y^n) - H(X, Y)| ≤ δ(ε), where δ(ε) = ε H(X, Y).
2. P{(X^n, Y^n) ∈ T_ε^{(n)}(X, Y)} ≥ 1 - ε for n large enough.
3. |T_ε^{(n)}(X, Y)| ≤ 2^{n(H(X,Y) + δ(ε))}.
4. |T_ε^{(n)}(X, Y)| ≥ (1 - ε) 2^{n(H(X,Y) - δ(ε))} for n large enough.

Proof of Lemma 1 (2): Typicality with the Actual Codeword

Let us first consider P_1{A_1} = P{(X^N(1), Y^N) ∈ T_ε^{(N)} | W = 1}. We are averaging over the random codebook ensemble C, and the random codebook is generated element-by-element i.i.d. based on p_X. The DMC without feedback implies p(y^N | x^N) = ∏_{i=1}^N p_{Y|X}(y_i | x_i). Hence, given W = 1, (X^N(1), Y^N) has the following joint distribution:

p(x^N, y^N) = p(x^N) p(y^N | x^N) = ∏_{i=1}^N p_X(x_i) ∏_{i=1}^N p_{Y|X}(y_i | x_i) = ∏_{i=1}^N p_{X,Y}(x_i, y_i).

By Property 2 (LLN), we see that for N large enough,

P_1{A_1} = P{(X^N(1), Y^N) ∈ T_ε^{(N)} | W = 1} ≥ 1 - ε.

Proof of Lemma 1 (3): Typicality with a Wrong Codeword

Consider P_1{A_w} = P{(X^N(w), Y^N) ∈ T_ε^{(N)} | W = 1} for w ≠ 1. Again we are averaging over the random codebook ensemble C, generated element-by-element i.i.d. based on p_X. Hence, although X^N(1) and X^N(w) have the same marginal distribution p_X, they are actually independent. Due to the DMC, (X^N(1), Y^N) is independent of X^N(w); hence Y^N is independent of X^N(w), and

P_1{A_w} = Σ_{(x^N, y^N) ∈ T_ε^{(N)}} p(x^N) p(y^N)
        ≤ 2^{N(1+ε)H(X,Y)} · 2^{-N(1-ε)H(X)} · 2^{-N(1-ε)H(Y)}
        = 2^{-N(I(X;Y) - δ(ε))},

where the first factor is the cardinality upper bound on the typical set, the other two factors are upper bounds on the probability of a typical sequence, and δ(ε) = ε (H(X,Y) + H(X) + H(Y)) → 0 as ε → 0.

Some Reflections

Reflection 1: Mutual independence of codewords. In the random coding argument of the proof, the 2^K · N elements of the codebook matrix C are generated i.i.d., and hence the 2^K rows {X^N(1), ..., X^N(2^K)} are mutually independent. However, in the proof we only require pairwise independence: X^N(1) independent of X^N(w) for every w ≠ 1.

Reflection 2: Typicality decoder. We use the typicality decoder rather than the optimal ML decoder to find tractable upper bounds on the error probability. There are other suboptimal decoders that can be used. For example, the following threshold decoder also works:

ŵ_th ≜ the unique w such that i(x^N(w); y^N) > β,

where i(x^N; y^N) ≜ log [p(x^N, y^N) / (p(x^N) p(y^N))] = Σ_{k=1}^N log [p_{Y|X}(y_k | x_k) / p_Y(y_k)], and β ≜ N(I(X;Y) - ε).
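
A small sketch of the information density used by the threshold decoder, for a hypothetical BSC(0.1) with uniform input.

```python
import numpy as np

p = 0.1
W = np.array([[1 - p, p], [p, 1 - p]])   # hypothetical BSC(0.1); rows: x, cols: y
p_X = np.array([0.5, 0.5])
p_Y = p_X @ W

def information_density(x, y):
    """i(x^N; y^N) = sum_k log2( p_{Y|X}(y_k | x_k) / p_Y(y_k) ), in bits."""
    return float(sum(np.log2(W[xk, yk] / p_Y[yk]) for xk, yk in zip(x, y)))

x = [0, 0, 1, 1, 0]
y = [0, 0, 1, 0, 0]
print(round(information_density(x, y), 3))
# The threshold decoder declares the unique w with i(x^N(w); y^N) > N * (I(X;Y) - eps).
```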

Joint Source-Channel Coding: Problem Setup

(Block diagram: Source {S_i} → s^{N_s} → Encoder → x^{N_c} → p_{Y|X} → y^{N_c} → Decoder → ŝ^{N_s} → Destination)

Source model: discrete stationary ergodic source with entropy rate H({S_i}).
Channel model: DMC p_{Y|X} with channel capacity C(p_{Y|X}).

1. A (|S|^{N_c R}, N_c) joint source-channel code consists of
   - an encoding function (encoder) enc_{N_c} : S^{N_s} → X^{N_c} that maps each source sequence s^{N_s} to a length-N_c codeword x^{N_c}, where N_s ≜ ⌈N_c R⌉;
   - a decoding function (decoder) dec_{N_c} : Y^{N_c} → S^{N_s} that maps a channel output sequence y^{N_c} to a reconstructed sequence ŝ^{N_s}.
2. The error probability is defined as P_e^{(N_c)} ≜ P{S^{N_s} ≠ Ŝ^{N_s}}.
3. A rate R is said to be achievable if there exists a sequence of (|S|^{N_c R}, N_c) codes such that P_e^{(N_c)} → 0 as N_c → ∞.

Joint Source-Channel Coding Theorem

Theorem 3 (Joint Source-Channel Coding Theorem).
1. If R < C / H({S_i}), then R is achievable, i.e., lossless reconstruction of the source {S_i} is possible via the noisy channel p_{Y|X}.
2. Conversely, if R > C / H({S_i}), then R is not achievable, i.e., lossless reconstruction is impossible.

(Block diagram of the separation architecture: Source → s^{N_s} → Source Encoder → b^K → Channel Encoder → x^{N_c} → Noisy Channel → y^{N_c} → Channel Decoder → b̂^K → Source Decoder → ŝ^{N_s} → Destination, where b^K is the binary interface.)

Proof of Achievability

pf (achievability part):
- Choose a (2^{N_s R_s}, N_s) lossless source code with R_s = H({S_i}) + ε_s.
- Choose a (2^{N_c R_c}, N_c) channel code with R_c = C - ε_c.
- Due to the channel coding theorem, the binary sequence b^K living at the digital interface between the source and channel coders can be decoded with vanishing error probability.
- Due to the lossless source coding theorem, the source sequence can be reconstructed with vanishing error probability as long as the bit sequence b^K is successfully decoded by the channel decoder.
- Concatenating the above two codes, we see that as long as N_s R_s < N_c R_c, i.e., N_s / N_c < R_c / R_s = (C - ε_c) / (H({S_i}) + ε_s), the separation scheme is able to reconstruct the source sequence with vanishing error probability.
- Since ε_s and ε_c can be made arbitrarily small, any R < C / H({S_i}) is achievable.

Proof of Converse

pf (converse part): We shall prove that for every achievable R, R ≤ C / H({S_i}).

N_s H({S_i}) ≤ H(S^{N_s}) = I(S^{N_s}; Ŝ^{N_s}) + H(S^{N_s} | Ŝ^{N_s})    (7)
  ≤ I(S^{N_s}; Y^{N_c}) + (1 + P_e^{(N_c)} N_s log |S|)    (8)
  ≤ Σ_{k=1}^{N_c} I(S^{N_s}; Y_k | Y^{k-1}) + (1 + P_e^{(N_c)} N_s log |S|)
  ≤ N_c (C + ε_{N_c}), where ε_{N_c} → 0 as N_c → ∞.    (9)

(7) is due to the property of entropy rate and the chain rule.
(8) is due to S^{N_s} - Y^{N_c} - Ŝ^{N_s} and Fano's inequality.
(9) is due to steps similar to those in the converse proof of the channel coding theorem.

Hence, R ≈ N_s / N_c ≤ C / H({S_i}) if R is achievable.

Summary

- Channel coding theorem: C = max_{p(x)} I(X; Y) for a DMC p_{Y|X}, with or without feedback.
- Weak converse: Fano's inequality, the data processing inequality, and the DMC assumption.
- Achievability: random coding argument, typicality decoder.
- Feedback does not increase the capacity of a DMC.
- Symmetric channel capacity = log |Y| - H(p), where p is the probability vector of which every row of p_{Y|X} is a permutation.
- Erasure channel capacity = (1 - p) log |X|.
- Joint source-channel coding theorem: R < C / H({S_i}) ⟹ R is achievable; R > C / H({S_i}) ⟹ R is not achievable. Source-channel separation is optimal.