Lecture 5: Channel Capacity. Copyright G. Caire (Sample Lectures).

Definitions and Problem Setup

[Block diagram: Message M → Encoder → X^n → Channel p(y|x) → Y^n → Decoder → Estimate M̂]

Definition 11. Discrete Memoryless Channel (DMC): a (stationary) DMC $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ consists of an input alphabet $\mathcal{X}$, an output alphabet $\mathcal{Y}$ and a transition pmf $P_{Y|X}$ such that
$$P(Y^n = y \mid X^n = x) = \prod_{i=1}^{n} P_{Y|X}(y_i \mid x_i)$$
The memoryless property implies that
$$P\big(Y_i = y_i \mid M = m, X^i = (x_1(m), \ldots, x_i(m)), Y^{i-1} = (y_1, \ldots, y_{i-1})\big) = P_{Y|X}(y_i \mid x_i(m))$$
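To make the memoryless property concrete, here is a minimal Python sketch (not part of the original slides; the transition matrix is an arbitrary example) of a DMC acting on an input sequence one symbol at a time:

    import numpy as np

    rng = np.random.default_rng(0)
    # example DMC with |X| = 2 and |Y| = 3: row r is the pmf P_{Y|X}(. | x = r)
    P = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.1, 0.8]])

    def dmc(x, P, rng):
        # memoryless channel use: each y_i is drawn from P[x_i, :], independently of the past
        return np.array([rng.choice(P.shape[1], p=P[xi]) for xi in x])

    x = np.array([0, 1, 1, 0, 1])
    print(dmc(x, P, rng))   # e.g. [0 2 2 0 2] most of the time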

Channel Coding

Definition 12. A block code $\mathcal{C}$ with rate $R$ and block length $n$ (an $(R, n)$-code) consists of
1. A message set $\mathcal{M} = [1 : 2^{nR}] = \{1, \ldots, 2^{nR}\}$.
2. A codebook $\{x(1), \ldots, x(2^{nR})\}$, i.e., an array of dimension $2^{nR} \times n$ over $\mathcal{X}$, each row of which is a codeword.
3. An encoding function $f : \mathcal{M} \to \mathcal{X}^n$, such that $f(m) = x(m)$ for $m \in \mathcal{M}$.
4. A decoding function $g : \mathcal{Y}^n \to \mathcal{M}$, such that $\hat{m} = g(y)$ is the decoded message.

Probability of Error

Definition 13. Individual message probability of error: the conditional probability of error given that message $m$ is transmitted is
$$P_{e,m}(\mathcal{C}) = P(g(Y^n) \neq m \mid X^n = x(m))$$
Definition 14. Maximal probability of error:
$$P_{e,\max}(\mathcal{C}) = \max_{m \in \mathcal{M}} P_{e,m}(\mathcal{C})$$
Definition 15. Average probability of error:
$$P_e(\mathcal{C}) = 2^{-nR} \sum_{m=1}^{2^{nR}} P_{e,m}(\mathcal{C})$$

Achievable Rates and Capacity

Definition 16. Achievable rate: A rate $R$ is said to be achievable if there exists a sequence of $(R, n)$-codes $\{\mathcal{C}_n\}$ with probability of error $P_{e,\max}(\mathcal{C}_n) \to 0$ as $n \to \infty$.
Definition 17. Channel capacity: The channel capacity $C$ is the supremum of all achievable rates.
The above is an operational definition of capacity. A coding theorem (in information theory) consists of finding a formula, i.e., an explicit expression, for $C$ in terms of the characteristics of the problem, i.e., in terms of $P_{Y|X}$.

Role of Mutual Information

When the input $X^n$ is i.i.d. $\sim P_X$, the pairs $(X_i, Y_i)$ are i.i.d. with $(X_i, Y_i) \sim P_X P_{Y|X}$, and $Y^n$ has the induced marginal distribution $Y_i \sim P_Y$.
There are about $2^{nH(Y)}$ typical output sequences. If the input is typical, the probability of a non-typical output is negligible.
For $\epsilon > \epsilon' > 0$ and $x \in T_{\epsilon'}^{(n)}(X)$, there are about $2^{nH(Y|X)}$ conditionally typical outputs in $T_{\epsilon}^{(n)}(Y|x)$.
How many non-overlapping typical output sets can we pack in $T_{\epsilon}^{(n)}(Y)$? Roughly
$$\frac{2^{nH(Y)}}{2^{nH(Y|X)}} = 2^{n(H(Y) - H(Y|X))} = 2^{nI(X;Y)}$$

The Channel Coding Theorem

Theorem 11. Channel Coding Theorem: The capacity of the DMC $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ is given by
$$C = \max_{P_X} I(X; Y)$$
Example: Capacity of the BSC: A BSC is defined by $Y = X \oplus Z$, where $\mathcal{X} = \mathcal{Y} = \{0, 1\}$, addition is modulo 2, and $Z \sim$ Bernoulli$(p)$. We have
$$C = \max_{P_X} I(X; Y) = \max_{P_X} \{H(Y) - H(Y|X)\} = \max_{P_X} \{H(Y) - H(X \oplus Z \mid X)\} = \max_{P_X} H(Y) - H_2(p) = 1 - H_2(p)$$
since $H(X \oplus Z \mid X) = H(Z) = H_2(p)$, and $H(Y) \le 1$ with equality for $X \sim$ Bernoulli$(1/2)$.
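As a quick numerical sanity check (a sketch, not part of the lecture; the helper names are illustrative), the closed form $1 - H_2(p)$ can be compared with a brute-force maximization of $I(X;Y)$ over the BSC input distribution $P_X(1) = \alpha$:

    import numpy as np

    def H2(p):
        # binary entropy in bits (clipping avoids log(0))
        p = np.clip(p, 1e-15, 1 - 1e-15)
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def bsc_mutual_information(alpha, p):
        # I(X;Y) = H(Y) - H(Y|X) for a BSC(p) with P_X(1) = alpha
        return H2(alpha * (1 - p) + (1 - alpha) * p) - H2(p)

    p = 0.11
    C_bruteforce = max(bsc_mutual_information(a, p) for a in np.linspace(0, 1, 1001))
    print(C_bruteforce, 1 - H2(p))   # both approximately 0.50, attained at alpha = 1/2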

Capacity of the BEC

Example: Capacity of the BEC: A BEC with erasure probability $e$ maps input 0 to output 0 with probability $1 - e$ and to the erasure symbol $?$ with probability $e$, and likewise maps input 1 to output 1 with probability $1 - e$ and to $?$ with probability $e$. We have
$$C = \max_{P_X} I(X; Y) = \max_{P_X} \{H(X) - H(X|Y)\} = \max_{P_X} \{H(X) - e H(X)\} = \max_{P_X} (1 - e) H(X) = 1 - e$$
where $H(X|Y) = e H(X)$ because $X$ is known exactly unless $Y = ?$, and the last equality holds by choosing $X$ to be Bernoulli$(1/2)$.
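The BEC result can be checked the same way (again a sketch with illustrative names): write the generic mutual information $I(\mathbf{p}, P)$ of a DMC given its transition matrix and maximize it over the input probability by a grid search:

    import numpy as np

    def mutual_information(p_x, P):
        # I(X;Y) in bits for input pmf p_x and |X| x |Y| transition matrix P
        p_y = p_x @ P
        return sum(p_x[r] * P[r, s] * np.log2(P[r, s] / p_y[s])
                   for r in range(P.shape[0]) for s in range(P.shape[1])
                   if p_x[r] > 0 and P[r, s] > 0)

    e = 0.3
    # BEC transition matrix, outputs ordered as (0, ?, 1)
    P_bec = np.array([[1 - e, e, 0.0],
                      [0.0,   e, 1 - e]])
    C_bruteforce = max(mutual_information(np.array([a, 1 - a]), P_bec)
                       for a in np.linspace(0, 1, 1001))
    print(C_bruteforce, 1 - e)   # both approximately 0.7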

Symmetric channels

Strongly symmetric channels: the transition matrix $P$ with elements $P_{r,s} = P_{Y|X}(y = s \mid x = r)$ has the property that every row is a permutation of the first row, and every column is a permutation of the first column.
Weakly symmetric channels: every row of $P$ is a permutation of the first row.
For strongly symmetric channels,
$$C = \log |\mathcal{Y}| - H(P_{1,1}, \ldots, P_{1,|\mathcal{Y}|})$$
achieved by $X \sim$ Uniform on $\mathcal{X}$.
For weakly symmetric channels,
$$C = \max_{P_X} H(Y) - H(P_{1,1}, \ldots, P_{1,|\mathcal{Y}|})$$

Additive-noise channels

A discrete memoryless additive-noise channel is defined by $\mathcal{X} = \mathcal{Y} = \mathbb{F}_q$ (or, more generally, some additive group). $P_{Y|X}$ is induced by the random mapping
$$Y_i = x_i + Z_i$$
It follows that
$$P_{Y|X}(y \mid x) = P_Z(y - x)$$
Hence, additive-noise channels over $\mathbb{F}_q$ are always strongly symmetric, and have capacity $C = \log q - H(Z)$.
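To illustrate the last two slides together (a self-contained sketch; the noise pmf is an arbitrary example), one can build the transition matrix of an additive-noise channel over $\mathbb{F}_q$, check that it is strongly symmetric, and verify that the uniform input attains $\log q - H(Z)$:

    import numpy as np

    q = 5
    P_Z = np.array([0.7, 0.1, 0.1, 0.05, 0.05])     # an arbitrary noise pmf on F_q
    # transition matrix of Y = X + Z over F_q: P[x, y] = P_Z((y - x) mod q)
    P = np.array([[P_Z[(y - x) % q] for y in range(q)] for x in range(q)])

    # strong symmetry: every row and every column is a permutation of P_Z
    assert all(sorted(row) == sorted(P_Z) for row in P)
    assert all(sorted(col) == sorted(P_Z) for col in P.T)

    # with the uniform input, I(X;Y) equals log q - H(Z)
    H_Z = -np.sum(P_Z * np.log2(P_Z))
    p_x = np.full(q, 1.0 / q)
    p_y = p_x @ P
    I = sum(p_x[x] * P[x, y] * np.log2(P[x, y] / p_y[y])
            for x in range(q) for y in range(q) if P[x, y] > 0)
    print(I, np.log2(q) - H_Z)   # both approximately 0.87 bits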

Computing capacity: convex maximization

Maximization of the mutual information is a convex optimization problem:
$$\text{maximize} \quad I(\mathbf{p}, P) = \sum_r \sum_s p_r P_{r,s} \log \frac{P_{r,s}}{\sum_{r'} p_{r'} P_{r',s}}$$
$$\text{subject to} \quad \sum_r p_r = 1, \qquad 0 \le p_r \le 1 \;\; \forall\, r$$
Recall that we have proved that the mutual information $I(\mathbf{p}, P)$, seen as a function of the input probability vector $\mathbf{p}$ for fixed transition matrix $P$, is a concave function.
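The slides only state the optimization problem; one standard numerical method for it (not covered in the lecture) is the Blahut-Arimoto algorithm. A minimal sketch, assuming a finite DMC in which every output symbol is reachable:

    import numpy as np

    def blahut_arimoto(P, n_iter=200):
        # approximate C = max_p I(p, P) for a DMC with |X| x |Y| transition matrix P
        nx, ny = P.shape
        p = np.full(nx, 1.0 / nx)                    # start from the uniform input
        for _ in range(n_iter):
            q = p[:, None] * P                       # joint pmf p(x) P(y|x)
            q = q / q.sum(axis=0, keepdims=True)     # posterior q(x|y)
            log_r = np.sum(P * np.log(q + 1e-300), axis=1)
            p = np.exp(log_r - log_r.max())          # p(x) proportional to exp(sum_y P(y|x) log q(x|y))
            p = p / p.sum()
        p_y = p @ P
        I = sum(p[x] * P[x, y] * np.log2(P[x, y] / p_y[y])
                for x in range(nx) for y in range(ny) if p[x] > 0 and P[x, y] > 0)
        return I, p

    eps = 0.11
    P_bsc = np.array([[1 - eps, eps], [eps, 1 - eps]])
    C, p_opt = blahut_arimoto(P_bsc)
    print(C, p_opt)   # C approximately 0.50 bits, optimal input approximately [0.5, 0.5]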

Proof of the Channel Coding Theorem (1)

Direct part (achievability): we wish to prove that for any $R < C$ there exists a sequence of $(R, n)$-codes with vanishing error probability.
Random coding: instead of building a specific family of codes (very difficult), we average over a random ensemble of codes. Fix $P_X$ and generate a $2^{nR} \times n$ codebook at random with i.i.d. entries $\sim P_X$. The codebook (with the natural ordering as encoding function) is revealed to transmitter and receiver before the communication takes place.
Encoding: $x(m)$ is the $m$-th row of the generated codebook.
Decoding: joint typicality decoding. Let $y$ denote the received channel output. Then $g(y) = \hat{m} \in \mathcal{M}$ if $\hat{m}$ is the unique index such that $(x(\hat{m}), y) \in T_{\epsilon}^{(n)}(X, Y)$; declare an error otherwise.

Proof of the Channel Coding Theorem (2)

Analysis of the probability of error, averaged over the random codebook ensemble:
$$\begin{aligned} P_e^{(n)} &= \sum_{\mathcal{C}} P(\mathcal{C}) P_e^{(n)}(\mathcal{C}) = \sum_{\mathcal{C}} P(\mathcal{C})\, 2^{-nR} \sum_{m=1}^{2^{nR}} P_{e,m}(\mathcal{C}) = 2^{-nR} \sum_{m=1}^{2^{nR}} \sum_{\mathcal{C}} P(\mathcal{C}) P_{e,m}(\mathcal{C}) \\ &= 2^{-nR} \sum_{m=1}^{2^{nR}} \sum_{\mathcal{C}} P(\mathcal{C}) P_{e,1}(\mathcal{C}) = \sum_{\mathcal{C}} P(\mathcal{C}) P_{e,1}(\mathcal{C}) = P(g(Y^n) \neq 1 \mid M = 1) \end{aligned}$$
where the fourth equality holds because, by the symmetry of the random codebook construction, $\sum_{\mathcal{C}} P(\mathcal{C}) P_{e,m}(\mathcal{C})$ does not depend on $m$.

Proof of the Channel Coding Theorem (3)

We let $E = \{g(Y^n) \neq 1\}$ denote the conditional error event, and notice that $E \subseteq E_1 \cup E_2$, where
$$E_1 = \{(X^n(1), Y^n) \notin T_{\epsilon}^{(n)}(X, Y)\}$$
and
$$E_2 = \{(X^n(m), Y^n) \in T_{\epsilon}^{(n)}(X, Y) \text{ for some } m \neq 1\}$$
By the union bound we have
$$P(g(Y^n) \neq 1 \mid M = 1) = P(E \mid M = 1) \le P(E_1 \cup E_2 \mid M = 1) \le P(E_1 \mid M = 1) + P(E_2 \mid M = 1)$$

Proof of the Channel Coding Theorem (4)

For the first term, notice that since $(X^n(1), Y^n)$ is jointly distributed according to $\prod_{i=1}^{n} P_X(x_i) P_{Y|X}(y_i \mid x_i)$, then by the LLN
$$P(E_1 \mid M = 1) \to 0, \quad \text{as } n \to \infty$$
For the second term, for $m \neq 1$ we have that $X^n(m)$ and $Y^n$ are distributed as the product of marginals $\prod_{i=1}^{n} P_X(x_i) P_Y(y_i)$, therefore
$$P(E_2 \mid M = 1) \le 2^{nR}\, 2^{-n(I(X;Y) - \delta(\epsilon))}$$
by the Packing Lemma. It follows that for any $\epsilon > 0$ and sufficiently large $n$,
$$P_e^{(n)} \le \epsilon + 2^{-n(I(X;Y) - R - \delta(\epsilon))} \le 2\epsilon, \quad \text{for } R < I(X; Y) - \delta(\epsilon)$$

Proof of the Channel Coding Theorem (5)

Consequences:
1. For any $n$ there exists at least one code that performs no worse than the ensemble average;
2. We can choose $P_X$ in order to maximize $I(X; Y)$;
3. $\delta(\epsilon)$ vanishes by considering smaller and smaller $\epsilon$.
From average to maximal error probability (expurgation). Fix $\epsilon > 0$, and let $\mathcal{C}_n^\star$ be a code with $P_e(\mathcal{C}_n^\star) \le \epsilon$ and rate $R > C - \delta(\epsilon)$. Sort the codewords such that
$$P_{e,1}(\mathcal{C}_n^\star) \le P_{e,2}(\mathcal{C}_n^\star) \le \cdots \le P_{e,2^{nR}}(\mathcal{C}_n^\star)$$

Proof of the Channel Coding Theorem (6)

Define the expurgated code $\tilde{\mathcal{C}}_n^\star = \{x(1), \ldots, x(2^{nR-1})\}$ (the best half of the codewords). It follows that
$$P_{e,\max}(\tilde{\mathcal{C}}_n^\star) = P_{e,2^{nR-1}}(\mathcal{C}_n^\star) \le 2\epsilon$$
since otherwise more than half of the codewords of $\mathcal{C}_n^\star$ would have error probability larger than $2\epsilon$, contradicting $P_e(\mathcal{C}_n^\star) \le \epsilon$; the rate loss is only $1/n$.
END OF THE DIRECT PART
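The random-coding argument can also be illustrated numerically (a Monte Carlo sketch with arbitrary parameters, not part of the lecture): draw a Bernoulli(1/2) codebook for a BSC, decode with the Hamming-distance form of joint typicality, and observe that the empirical error rate is small when $R$ is well below $1 - H_2(p)$:

    import numpy as np

    rng = np.random.default_rng(0)
    p, n, R, eps = 0.05, 60, 0.15, 0.08      # R = 0.15 is well below C = 1 - H_2(0.05) ~ 0.71
    M = 2 ** int(n * R)                       # 2^9 = 512 messages
    trials, errors = 2000, 0
    for _ in range(trials):
        codebook = rng.integers(0, 2, size=(M, n))     # i.i.d. Bernoulli(1/2) codebook
        m = rng.integers(M)                            # transmitted message
        y = codebook[m] ^ (rng.random(n) < p)          # pass x(m) through the BSC
        # joint typicality for the BSC with uniform input: d_H(x, y)/n must be close to p
        d = np.count_nonzero(codebook ^ y, axis=1) / n
        typical = np.flatnonzero(np.abs(d - p) <= eps)
        if len(typical) != 1 or typical[0] != m:
            errors += 1
    print(errors / trials)   # small, since R is well below capacity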

Proof of the Channel Coding Theorem (7)

Proof of the converse part: there exist no codes with rate $R > C$ and arbitrarily small error probability. It is more convenient to prove the following equivalent statement: suppose that a sequence of $(R, n)$-codes $\{\mathcal{C}_n\}$ exists, such that $P_e(\mathcal{C}_n) = P_e^{(n)} \to 0$. Then it must be that $R \le C$.
We start from Fano's inequality: consider the joint $n$-letter distribution induced by the message $M \sim$ Uniform on $\mathcal{M}$, by the encoding function $f$ and by the channel $P_{Y|X}$. Then...

Proof of the Channel Coding Theorem (8)

$$\begin{aligned} nR = H(M) &= H(M) - H(M \mid Y^n) + H(M \mid Y^n) \\ &\le I(M; Y^n) + 1 + n P_e^{(n)} R \\ &= \sum_{i=1}^{n} I(M; Y_i \mid Y^{i-1}) + n\epsilon_n \\ &\le \sum_{i=1}^{n} I(M, Y^{i-1}; Y_i) + n\epsilon_n \\ &= \sum_{i=1}^{n} I(M, Y^{i-1}, X_i; Y_i) + n\epsilon_n \\ &= \sum_{i=1}^{n} I(X_i; Y_i) + n\epsilon_n \\ &\le nC + n\epsilon_n \end{aligned}$$
where $n\epsilon_n = 1 + n P_e^{(n)} R$, and the last two equalities use that $X_i$ is a function of $M$ and that, by memorylessness, $Y_i - X_i - (M, Y^{i-1})$ forms a Markov chain. Dividing by $n$ and letting $n \to \infty$ gives $R \le C$.
END OF THE CONVERSE PART

Feedback Capacity (1)

[Block diagram: Message M → Encoder → X_i → Channel p(y|x) → Y_i → Decoder → Estimate M̂, with the past outputs Y^{i-1} fed back to the encoder]

By memoryless we mean that, even with feedback, when the DMC $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ is used over $n$ transmissions with message $M$ and input $X^n$, the output $Y_i$ at time $i \in [1:n]$ is distributed according to
$$P(Y_i = y_i \mid X^i = x^i, Y^{i-1} = y^{i-1}, M = m) = P_{Y|X}(y_i \mid x_i)$$
An $(R, n)$-code with feedback is defined by a sequence of encoding functions
$$f_i : \mathcal{M} \times \mathcal{Y}^{i-1} \to \mathcal{X}, \quad i = 1, \ldots, n$$
such that $x_i = f_i(m, y_1, \ldots, y_{i-1})$.

Feedback Capacity (2)

This model, referred to as the Shannon feedback channel, can be seen as an idealization of several protocols implemented today (e.g., ARQ, incremental redundancy, power control, rate allocation in wireless channels).
Theorem 12. The feedback capacity of a discrete memoryless channel is given by
$$C_{\mathrm{fb}} = C = \max_{P_X} I(X; Y)$$
Proof: The converse for the channel without feedback holds verbatim for the case with feedback (check!). The achievability, obviously, also holds.

Feedback Capacity (3)

Memoryless channels: feedback may greatly simplify operations and achieve a much better behavior of the error probability versus $n$, at fixed rate $R < C$. Example: the BEC with automatic repeat request (ARQ).
Channels with memory: feedback may achieve a higher capacity.
Multiuser networks: feedback may achieve a (much) larger capacity.
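To illustrate the ARQ remark for the BEC (a simulation sketch with arbitrary parameters, not part of the lecture), repeating each bit until it arrives unerased uses the feedback to achieve zero error at an empirical rate approaching the capacity $1 - e$:

    import numpy as np

    rng = np.random.default_rng(0)
    e, n_bits = 0.3, 100_000
    channel_uses = 0
    for _ in range(n_bits):
        # with perfect feedback, the transmitter repeats the bit until it is not erased
        while True:
            channel_uses += 1
            if rng.random() >= e:      # not erased: the receiver gets the bit exactly
                break
    print(n_bits / channel_uses, 1 - e)   # empirical rate close to the capacity 1 - e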

Source-Channel Separation Theorem (1)

We wish to transmit a stationary ergodic information source $\{V_i\}$ over the finite alphabet $\mathcal{V}$ across a discrete memoryless channel $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$.
We fix the compression ratio $\rho = n/k$, in terms of channel uses per source symbol.
A joint source-channel code for this setup is defined by an encoding function
$$\phi : \mathcal{V}^k \to \mathcal{X}^n$$
and by a decoding function
$$\psi : \mathcal{Y}^n \to \mathcal{V}^k$$

Source-Channel Separation Theorem (2)

The error probability is defined by
$$P_e^{(k,n)} = P\big(V^k \neq \psi(Y^n)\big), \qquad \text{where } X^n = \phi(V^k)$$
We say that the source is transmissible over the channel with compression ratio $\rho$ if there exists a sequence of codes for $k \to \infty$ and $n = \rho k$ such that $P_e^{(k,n)} \to 0$.

Source-Channel Separation Theorem (3)

Theorem 13. Source-channel coding: A discrete memoryless source $\{V_i\}$ with $V_i \in \mathcal{V}$ is transmissible over the discrete memoryless channel $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ with compression ratio $\rho$ if
$$H(V) < \rho C$$
Conversely, if $H(V) > \rho C$, the source is not transmissible over the channel.
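As an illustrative check of the condition (the numbers are arbitrary, not from the lecture): a Bernoulli(0.1) DMS sent over a BSC(0.11) with $\rho = 1$ channel use per source symbol satisfies $H(V) < \rho C$, hence it is transmissible:

    import numpy as np

    def H2(x):
        return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

    rho = 1.0
    H_V = H2(0.1)          # source entropy, ~0.47 bits/symbol
    C = 1 - H2(0.11)       # BSC capacity, ~0.50 bits/channel use
    print(H_V < rho * C)   # True: the source is transmissible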

Source-Channel Separation Theorem (4)

Proof of Achievability: Separation approach: we concatenate an almost-lossless source code with a channel code.
Almost-lossless source code: if $V^k \in T_{\epsilon}^{(k)}(V)$, encode it with $k(H(V) + \delta)$ bits; otherwise, declare an error.
Choose a sequence of capacity-achieving channel codes $\mathcal{C}_n^\star$ of rate $R > C - \delta$ such that
$$nR \ge k(H(V) + \delta)$$
It follows that an error probability not larger than $2\epsilon$ can be achieved if
$$\rho(C - \delta) \ge H(V) + \delta$$
If $H(V) < \rho C$, we can find $\delta$ small enough and $k$ large enough such that the above conditions are satisfied.

Source-Channel Separation Theorem (5)

Proof of Converse: We let $\hat{V}^k = \psi(Y^n)$ denote the decoder output; then Fano's inequality yields
$$H(V^k \mid \hat{V}^k) \le 1 + P_e^{(k,n)}\, k \log |\mathcal{V}|$$
Assume that there exists a sequence of source-channel codes for $k \to \infty$ and $n = \rho k$ such that $P_e^{(k,n)} \to 0$. Then...

Source-Channel Separation Theorem (6)

$$\begin{aligned} H(V) = \frac{1}{k} H(V^k) &= \frac{1}{k} I(V^k; \hat{V}^k) + \frac{1}{k} H(V^k \mid \hat{V}^k) \\ &\le \frac{1}{k} I(V^k; \hat{V}^k) + \frac{1}{k} + P_e^{(k,n)} \log |\mathcal{V}| \\ &\le \frac{1}{k} I(X^n; Y^n) + \epsilon_k \\ &\le \frac{n}{k}\, C + \epsilon_k = \rho C + \epsilon_k \end{aligned}$$
where the second inequality uses the data processing inequality ($V^k - X^n - Y^n - \hat{V}^k$) and the third uses $I(X^n; Y^n) \le nC$ for a DMC. We conclude that if such a sequence of codes exists, then $H(V) \le \rho C$.

Capacity-Cost Function (1)

In certain problems it is meaningful to associate a cost function with the channel input. Let $b : \mathcal{X} \to \mathbb{R}_+$, such that the per-letter cost of an input sequence $x$ is defined as
$$b(x) = \frac{1}{n} \sum_{i=1}^{n} b(x_i)$$
Example: Hamming weight cost: for $\mathcal{X} = \mathbb{F}_q$, $b(x) = 1\{x \neq 0\}$.
Example: Quadratic cost (related to transmit power): for $\mathcal{X} \subseteq \mathbb{R}$, $b(x) = x^2$.

Capacity-Cost Function (2)

Theorem 14. Capacity-Cost Function: The capacity-cost function of the DMC $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ with input cost function $b : \mathcal{X} \to \mathbb{R}_+$ is given by
$$C(B) = \max_{P_X : E[b(X)] \le B} I(X; Y)$$
Proof of Achievability (Sketch): For $\delta > 0$, choose an input distribution $P_X$ such that $E[b(X)] \le B - \delta$. Use this $P_X$ in the random coding argument. Define an additional encoding error event as follows: if the selected codeword $x(m)$ violates the input cost constraint, i.e., if $\frac{1}{n}\sum_{i=1}^{n} b(x_i(m)) > B$, then declare an error.

Capacity-Cost Function (3)

Include this error event in the union bound, and use the typical average lemma: if $x(m) \in T_{\epsilon}^{(n)}(X)$, then
$$(1 - \epsilon)(B - \delta) \le \frac{1}{n} \sum_{i=1}^{n} b(x_i(m)) \le (1 + \epsilon)(B - \delta)$$
Choose $\epsilon$ and $\delta$ such that $(1 + \epsilon)(B - \delta) < B$, and conclude that the probability of encoding error can be made smaller than $\epsilon$ for sufficiently large $n$.

Capacity-Cost Function (4)

Proof of Converse (Sketch): Assume that there exists a sequence of codes $\{\mathcal{C}_n\}$ with rate $R$ such that $P_e^{(n)} \to 0$, and such that
$$2^{-nR} \sum_{m=1}^{2^{nR}} \frac{1}{n} \sum_{i=1}^{n} b(x_i(m)) \le B$$
(notice that here we consider a relaxed version of the input constraint, which holds on average over all codewords and not for each individual codeword).
Define the function
$$C(\beta) = \max_{P_X : E[b(X)] \le \beta} I(X; Y)$$
Notice that $C(\beta)$ is non-decreasing and concave in $\beta$; in fact,
$$\lambda C(\beta_1) + (1 - \lambda) C(\beta_2) \le C(\lambda \beta_1 + (1 - \lambda) \beta_2)$$

Capacity-Cost Function (5)

For the $n$-letter distribution induced by using the codewords of $\mathcal{C}_n$ with uniform probability over the message $M$, using Fano's inequality as before we obtain
$$\begin{aligned} nR &\le I(M; Y^n) + 1 + n P_e^{(n)} R \\ &\le \sum_{i=1}^{n} I(X_i; Y_i) + n\epsilon_n \\ &\le \sum_{i=1}^{n} C(E[b(X_i)]) + n\epsilon_n \\ &\le n\, C\!\left(\frac{1}{n} \sum_{i=1}^{n} E[b(X_i)]\right) + n\epsilon_n \\ &\le n\, C(B) + n\epsilon_n \end{aligned}$$
where the fourth inequality uses the concavity of $C(\cdot)$ (Jensen's inequality), and the last one uses that $C(\cdot)$ is non-decreasing together with the average cost constraint.

Capacity-Cost Function (6)

Example: Capacity of the BSC with a Hamming weight input constraint: we can write
$$I(X; Y) = H(Y) - H(Y|X) = H(Y) - H_2(p)$$
Hence, we have to maximize $H(Y)$ subject to $E[1\{X = 1\}] \le B$. Assume $P_X(1) = \alpha \in [0, 1]$; then
$$P_Y(0) = (1 - \alpha)(1 - p) + \alpha p, \qquad P_Y(1) = (1 - \alpha) p + \alpha (1 - p)$$
We use the compact notation $\alpha * p = \alpha(1 - p) + (1 - \alpha) p$ to indicate the (cyclic) binary convolution of the probability vectors. Then
$$H(Y) = H(\alpha * p) = H_2((1 - \alpha) p + \alpha (1 - p))$$
It can be checked that (for $p < 1/2$) this is monotonically increasing in $\alpha$ for $\alpha \in [0, 1/2]$ and decreasing for $\alpha \in [1/2, 1]$. Hence
$$C(B) = \begin{cases} H_2(B * p) - H_2(p), & 0 \le B \le \frac{1}{2} \\ 1 - H_2(p), & \frac{1}{2} < B \le 1 \end{cases}$$
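A quick numerical check of this expression (a sketch; the function names are illustrative): the closed form is compared with a brute-force maximization of $H_2(\alpha * p) - H_2(p)$ over the constrained input probability $\alpha \le B$:

    import numpy as np

    def H2(x):
        x = np.clip(x, 1e-15, 1 - 1e-15)
        return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

    def C_cost_closed_form(B, p):
        # C(B) = H2(B * p) - H2(p) for B <= 1/2, and 1 - H2(p) otherwise
        return H2(B * (1 - p) + (1 - B) * p) - H2(p) if B <= 0.5 else 1 - H2(p)

    def C_cost_bruteforce(B, p):
        # maximize I(X;Y) = H2(alpha * p) - H2(p) over alpha = P_X(1) <= B
        alphas = np.linspace(0, min(B, 1.0), 2001)
        return max(H2(a * (1 - p) + (1 - a) * p) for a in alphas) - H2(p)

    p = 0.1
    for B in (0.1, 0.3, 0.5, 0.8):
        print(B, C_cost_closed_form(B, p), C_cost_bruteforce(B, p))   # the two columns agree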

End of Lecture 5