School of Computer and Communication Sciences. Information Theory and Coding Notes on Random Coding December 12, 2003.

Similar documents
Stochastic Interpretation for the Arimoto Algorithm

LECTURE 10. Last time: Lecture outline

A bound for block codes with delayed feedback related to the sphere-packing bound

Computation of Total Capacity for Discrete Memoryless Multiple-Access Channels

Shannon s noisy-channel theorem

Universal Anytime Codes: An approach to uncertain channels in control

(each row defines a probability distribution). Given n-strings x X n, y Y n we can use the absence of memory in the channel to compute

EE 4TM4: Digital Communications II. Channel Capacity

Bounds on the Maximum Likelihood Decoding Error Probability of Low Density Parity Check Codes

Lecture 4 Noisy Channel Coding

6.451 Principles of Digital Communication II Wednesday, May 4, 2005 MIT, Spring 2005 Handout #22. Problem Set 9 Solutions

Notes on Convexity. Roy Radner Stern School, NYU. September 11, 2006

Lecture 4 Channel Coding

for some error exponent E( R) as a function R,

Chapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University

Error Exponent Region for Gaussian Broadcast Channels

National University of Singapore Department of Electrical & Computer Engineering. Examination for

Lecture 11: Quantum Information III - Source Coding

(Classical) Information Theory III: Noisy channel coding

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16

Covert Communication with Channel-State Information at the Transmitter

Spotlight on the Extended Method of Frobenius

Random Access: An Information-Theoretic Perspective

Lecture 6 I. CHANNEL CODING. X n (m) P Y X

ELEC546 Review of Information Theory

Lecture 1: The Multiple Access Channel. Copyright G. Caire 12

Lecture 5: Channel Capacity. Copyright G. Caire (Sample Lectures) 122

Variable Length Codes for Degraded Broadcast Channels

Mismatched Multi-letter Successive Decoding for the Multiple-Access Channel

EE/Stat 376B Handout #5 Network Information Theory October, 14, Homework Set #2 Solutions

Lecture 7. Union bound for reducing M-ary to binary hypothesis testing

Lecture 3: Channel Capacity

X 1 : X Table 1: Y = X X 2

Chapter 6* The PPSZ Algorithm

Capacity of a channel Shannon s second theorem. Information Theory 1/33

arxiv: v1 [cs.it] 5 Feb 2016

Lecture 18: Shanon s Channel Coding Theorem. Lecture 18: Shanon s Channel Coding Theorem

I - Information theory basics

Frans M.J. Willems. Authentication Based on Secret-Key Generation. Frans M.J. Willems. (joint work w. Tanya Ignatenko)

Exact Random Coding Error Exponents of Optimal Bin Index Decoding

Arimoto Channel Coding Converse and Rényi Divergence

Reliable Computation over Multiple-Access Channels

Linear programming: Theory

EE229B - Final Project. Capacity-Approaching Low-Density Parity-Check Codes

Capacity of AWGN channels

Channel Polarization and Blackwell Measures

An Improved Sphere-Packing Bound for Finite-Length Codes over Symmetric Memoryless Channels

Flows and Connectivity

Entropies & Information Theory

EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15

Unequal Error Protection Querying Policies for the Noisy 20 Questions Problem

Lecture 8: Channel and source-channel coding theorems; BEC & linear codes. 1 Intuitive justification for upper bound on channel capacity

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information

Equivalence for Networks with Adversarial State

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

A Comparison of Superposition Coding Schemes

Source Coding with Lists and Rényi Entropy or The Honey-Do Problem

On MIMO Fading Channels with Side Information at the Transmitter

IN [1], Forney derived lower bounds on the random coding

Algebraic Geometry Codes. Shelly Manber. Linear Codes. Algebraic Geometry Codes. Example: Hermitian. Shelly Manber. Codes. Decoding.

Intermittent Communication

Can Negligible Cooperation Increase Network Reliability?

UC San Francisco UC San Francisco Previously Published Works

Low-Density Parity-Check Codes

LECTURE 15. Last time: Feedback channel: setting up the problem. Lecture outline. Joint source and channel coding theorem

An introduction to basic information theory. Hampus Wessman

Approaching Blokh-Zyablov Error Exponent with Linear-Time Encodable/Decodable Codes

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

LDPC Codes. Intracom Telecom, Peania

Dispersion of the Gilbert-Elliott Channel

Lecture 7. The Cluster Expansion Lemma

Lecture 12: November 6, 2017

A Half-Duplex Cooperative Scheme with Partial Decode-Forward Relaying

Solutions to Homework Set #3 Channel and Source coding

The Method of Types and Its Application to Information Hiding

A Formula for the Capacity of the General Gel fand-pinsker Channel

Relay Networks With Delays

EECS 750. Hypothesis Testing with Communication Constraints

Refined Bounds on the Empirical Distribution of Good Channel Codes via Concentration Inequalities

Interference Channels with Source Cooperation

Midterm Exam Information Theory Fall Midterm Exam. Time: 09:10 12:10 11/23, 2016

Shannon s Noisy-Channel Coding Theorem

11.4. RECURRENCE AND TRANSIENCE 93

EE376A - Information Theory Final, Monday March 14th 2016 Solutions. Please start answering each question on a new page of the answer booklet.

Multimedia Communications. Mathematical Preliminaries for Lossless Compression

Accurate Statistical Analysis of High-Speed Links with PAM-4 Modulation

Error Exponent Regions for Gaussian Broadcast and Multiple Access Channels

Lecture 3: Error Correcting Codes

Covert Timing Channel Analysis of Rate Monotonic Real-Time Scheduling Algorithm in MLS systems

A. Derivation of regularized ERM duality

Lecture 4: Proof of Shannon s theorem and an explicit code

Introduction to Convolutional Codes, Part 1

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

Lecture 15: Conditional and Joint Typicaility

Practical Polar Code Construction Using Generalised Generator Matrices

LECTURE 13. Last time: Lecture outline

QUADRATIC GRAPHS ALGEBRA 2. Dr Adrian Jannetta MIMA CMath FRAS INU0114/514 (MATHS 1) Quadratic Graphs 1/ 16 Adrian Jannetta

Section 1.5 Formal definitions of limits

The PPM Poisson Channel: Finite-Length Bounds and Code Design

Transcription:

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE School of Computer and Communication Sciences Handout 8 Information Theor and Coding Notes on Random Coding December 2, 2003 Random Coding In this note we prove the achievabilit part of the channel coding theorem for discrete memorless channels without feedback. The discussion is largel based on chapter 5 of R. G. Gallager, Information Theor and Reliable Communication, Wile, 968.. Discrete Memorless Channels Without Feedback Throughout this note we will fix the channel we wish to communicate over. Let X and Y denote the input and output alphabets of this channel. We will assume that the channel is discrete, i.e., that X and Y are finite sets. The behavior of the channel will be completel described b specifing for each k the function P k : X k Y k R, (x k, k ) P k ( k x k, k ), which gives the probabilit of receiving the letter k at the output at time k, given the past and current inputs to the channel, and the past outputs of the channel. We make the two following assumptions: the channel is memorless and used without feedback. We will call a channel memorless if P k ( k x k, k ) = P ( k x k ), which in words means that the channel keeps no memor of the past inputs and outputs in determining the output of time k. Also note that the channel does not behave differentl at different times: the function P on the right hand side is not an explicit function of k. If a memorless channel is used without feedback, i.e., if P Xk X k,y k (x k x k, k ) = P Xk X k (x k x k ) (in words: if the channel inputs do not depend on the past channel outputs) then P Y n X n(n x n ) = P X n,y n(xn, n ) P X n (x n ) n k= = P X k,y k X k,y (x k k, k x k, k ) P X n (x n ) n k= = P X k X k,y (x k k x k, k )P Yk X k k,y ( k x k, k ) P X n (x n ) n k= = P X k X (x k k x k )P Yk X k ( k x k ) P X n (x n ) n = P ( k x k ) k=

where we use the memorless and without feedback conditions at the fourth equalit. From now on, we will restrict our attention of channels used without feedback. With some abuse of notation we will let P ( x) denote the probabilit of receiving the sequence = n at the output of the channel when the channel input is the sequence x = x n. If the channel is memorless, we see from above that 2. Block Codes P ( x) = n P ( k x k ). k= A block code with M messages and block length n is a mapping from a set of M messages {,..., M} to channel input sequences of length n. Thus, a block code is specified when we specif the M channel input sequences c = (c,,..., c,n ),..., c M = (c M,,..., c M,n ) the messages are mapped into. We will call c m the codeword for message m. To send message m with such a block code we simpl give the sequence c m to the channel as input. A decoder for such a block code is a mapping from channel output sequences Y n to the set of M messages {,..., M}. For a given decoder, let D m Y n denote the set of channel outputs which are mapped to message m. Since an output sequence is mapped to exactl one message, D m s form a collection of disjoint sets whose union is Y n. We define the rate of a block code with M messages and block length n as ln M n, and given such a code and a decoder we define P e,m = D m P ( c m ), the probabilit of a decoding error when message m is sent. Further define P e,ave = M M m= P e,m and P e,max = max m M P e,m as the average and maximal (both over the possible messages) error probabilit of such a code and decoder. Among man possible decoding methods, the rule that minimizes P e,ave is the maximum likelihood rule. Given a channel output sequence, the maximum likelihood rule decodes a message m for which P ( c m ) P ( c m ) for ever m m, and if there are more than one such m chooses one of them arbitraril. We will restrict ourselves in the following to the maximum likelihood rule. 3. Error probabilit for two codewords Consider now the case when M = 2, so the block code consists of two codewords, c and c 2. We will find a bound on P e,m for the maximum likelihood decoding rule. 2

Suppose message is to be sent. The channel input is then c, and the probabilit of receiving at the channel output is P ( c ). An error will occur if the received sequence is not in D. Since for ever not in D, P ( c 2 ) P ( c ), (but there ma be s for which P ( c 2 ) = P ( c ) and D ) we have P e, = D P ( c ) : P ( c 2 ) P ( c ) P ( c ) = P ( c ) P ( c2 ) P ( c ) :P ( c 2 ) P ( c ) P ( c ) P ( c 2) s P ( c ) s P ( c ) P ( c 2) s P ( c ) s for an s 0 = P ( c ) s P ( c 2 ) s The choice s = /2 gives us P ( c )P ( c 2 ), P e, and b smmetr, the same quantit also upper bounds P e,2. For a memorless channel P ( c m ) = n k= P ( k c m,k ) and we obtain P e,m P ( c )P ( c 2 ) = P ( c, )P ( c 2, )... P ( n c,n P ( n c 2,n ) n [ ] [ ] = P ( c, )P ( c 2, )... P ( n c,n )P ( n c 2,n ) = n [ ] P ( c,k )P ( c 2,k ). k= For example, for a binar smmetric channel with crossover probabilit ɛ, we see that { if c,k = c 2,k P ( c,k )P ( c 2,k ) = 2 ɛ( ɛ) else, and we obtain n P e,m [4ɛ( ɛ)] d/2 where d is the number of places c and c 2 are different (i.e., d = {k : c,k c 2,k } ). 4. Error Probabilit for two randoml chosen codewords Suppose now that the codewords c and c 2 are chosen randoml and independentl each from a distribution Q on X n. Observe that the codewords are random variables C and 3

C 2 and the probabilit that the block code is the particular block code with codewords c and c 2 is given b Q(c )Q(c 2 ). The error probabilities P e,m are now random variables P e,m since the are functions of C and C 2. Let P e,m denote the expectation of P e,m. We will now give an upper bound on P e,m in two different was. The first is the more straightforward, but the second wa will turn out to be conceptuall more useful. We will take m =, but is is clear b smmetr that P e, = P e,2. Method : Write P e, = E[P e, ] = c c 2 Q(c )Q(c 2 )E[P e, C = c, C 2 = c 2 ]. But when we are given c and c 2, P e, is no longer random, and we can use the bound of the last section on the probabilit of error to get P e, Q(c )Q(c 2 ) P ( c ) P ( c 2 ) c c 2 = [ Q(c ) P ( c )][ Q(c 2 ) P ( c2 )] c c 2 = [ Q(c) ] 2 P ( c) c where the last equalit follows b noting that the sums in the brackets in the next to last line are identical since the differ onl b the index of summation. Method 2: Write P e, = E[P e, ] = Q(c )P ( c )E[P e, C = c, Y = ]. c Note that here we are computing the expectation b first conditioning on the transmitted and received sequences. Now, observe that given C = c and the received sequence, an error is sure not to occur if C 2 is chosen such that P ( C 2 ) < P ( c ), and otherwise we can upper bound P e, b. Thus, given C = c and, we have { if P ( C 2 ) P ( c ) P e, 0 if P ( C 2 ) < P ( c ) Taking expectations we see that = P ( C2 ) P ( c ). E[P e, C = c, Y = ] Pr(B 2 ) where B 2 is the event that C 2 is chosen such that P ( C 2 ) P ( c ). We can bound Pr(B 2 ) b Pr(B 2 ) = c 2 Q(c 2 ) P ( c2 ) P ( c ) c 2 Q(c 2 ) P ( c 2) s P ( c ) s for an s 0. 4

Taking s = /2 and substituting back we obtain the same bound as before, P e, [ Q(c) 2. P ( c)] c For memorless channels the bound simplifies if Q(c) is chosen to be Q(c) = n k= Q(c k). In this case we obtain [ [ P e, Q(x) ] 2 n P ( x)]. 5. Average error probabilit of a randoml chosen code x Consider now a code with M codewords, each codeword c m chosen independentl according to the probabilit distribution Q. Just as in the above section, the probabilit that the block code constructed is a particular block code with codewords c,..., c M is M m= Q(c m). The error probabilities P e,m are also random variables, and again let P e,m denote the expectation of P e,m. B the smmetr with respect to the permutation of the codewords in the construction, we see that P e,m does not depend on m, and it will suffice to analze P e,. We will take the extension of method 2 in the previous section as our analsis method, and write P e, = c Q(c )P ( c )E[P e, C = c, Y = ]. Now, for a given c and define for each m 2, B m as the event that codeword C m is chosen such that P ( C m ) P ( c ), i.e., that codeword m is at least as likel as the transmitted codeword. Then just as in the previous section, ( M E[P e, C = c, Y = ] Pr m=2 { min, B m ) M m=2 } Pr(B m ) [ M ρ Pr(B m )] for all ρ [0, ]. m=2 The second inequalit above is just the union bound, the third inequalit is because for ρ [0, ], x x ρ when x [0, ], and x ρ when x. Observe now that Pr(B m ) = c m Q(c m ) P ( cm) P ( c ) c m = c Q(c m ) P ( c m) s P ( c ) s for an s 0 Q(c) P ( c)s P ( c ) s and thus E[P e, C = c, Y = ] [ (M ) c 5 Q(c) P ] ρ ( c)s. P ( c ) s

Substituting back we get P e, (M ) ρ [ ][ ] ρ. Q(c )P ( c ) sρ Q(c)P ( c) s c c Choosing now s = /( + ρ) (this choice in fact minimizes the bound) and observing that for this choice sρ = s, and that [ ] [ ] Q(c )P ( c ) sρ = Q(c)P ( c) s c since the two summations differ onl b the summation index, we get P e, (M ) [ ] +ρ. ρ Q(c)P ( c) /(+ρ) c If we now specialize this theorem to discrete memorless channels and if we choose Q(c) = n i= Q(c i), we get that for ever ρ [0, ], P e, (M ) ρ [ c [ ] ] +ρ n Q(x)P ( x) /(+ρ) If M = e nr, then (M ) e nr, and we can summarize the above as x Theorem. Given a discrete memorless channel described b P ( x), for an blocklength n, and an R 0 consider constructing a random block code with M = e nr codewords b choosing each letter of each codeword independentl according to a distribution Q on X. Then, the expected average error probabilit of this random code satisfies P e,ave exp { n max [E 0 (ρ, Q) ρr] } ρ [0,] where E 0 (ρ, Q) = ln [ ] +ρ. Q(x)P ( x) /(+ρ) x Since the expected error probabilit cannot be better than the error probabilit of the best code we also get Corollar. Given a discrete memorless channel described b P ( x), for an distribution Q on X, an blocklength n and an R > 0, there exists a code of block length n and rate at least R with P e,ave exp { ne r (R, Q) }. where E r (R, Q) = max [E 0 (ρ, Q) ρr]. ρ [0,] Note that the above corollar establishes the existence of codes of a certain rate with a guarantee on their P e,ave, but suggests no mechanism to find them. However, if one does carr out the experiment of constructing a code b randoml choosing its codewords, the probabilit that the code obtained will be much worse than the average is small, in particular, Markov inequalit tells us that the probabilit that a code constructed as such has P e,ave larger than α P e,ave is small: Pr[P e,ave α P e,ave ] /α for α >. 6

ρi(q) E 0 (ρ, Q) ρ Figure : E 0 (ρ, Q) Thus, the difficult in finding good practical codes is not that the good codes are rare, but the difficult is in find good codes for which there are practical methods to encode and decode. The corollar above makes guarantees on P e,ave, but for a code constructed b random coding the value of P e,max ma be much higher than P e,ave. Can we ensure the existence of codes with small P e,max? Theorem 2. Given a discrete memorless channel described b P ( x), for an distribution Q on X, an blocklength n and an R > 0, there exists a block code of length n and rate at least R with P e,max 4 exp { ne r (R, Q) }. Proof. Let M = 2e nr. We know that there is a code with M codewords and satisfies P e,ave = M M m= P e,m (M ) ρ exp { ne 0 (ρ, Q) } for ever ρ [0, ]. Since we see that for this code (M ) ρ 2 ρ e nρr 2e nρr, P e,ave = M M m= P e,m 2 exp { ne r (R, Q) }. Now, among the M numbers P e,,..., P e,m there cannot be more than M /2 which exceed twice the average value P e,ave. Thus, among these M numbers there exist at least e nr of them which satisf P e,m 2P e,ave 4 exp { ne r (R, Q) }. Keeping onl these codewords, and noticing that in the maximum likelihood decoder that corresponds to this code the decoding sets can onl be enlarged, we see that we have constructed a code with the desired properties. 5.. Properties of E 0 and E r The value of the existence theorems we just proved depend on whether the bound we have on the probabilit of error can be made arbitraril small for a set of rates of interest. Define I(Q) = x Q(x)P ( x) ln P ( x) x Q(x )P ( x ) 7

as the value of mutual information between the input and output when the input distribution is Q. We will now show that for ever rate R < I(Q), the exponent E r (R, Q) > 0, and thus for ever rate R < I(Q) we can find codes with arbitraril small probabilit of error (b taking n large enough). Since this statement holds for ever input distribution Q, it also follows that for ever rate R < max Q I(Q) = max px I(X; Y ) there exist codes of arbitraril small probabilit of error. This proves that an rate R below capacit is achievable (achievabilit part of the coding theorem). A tedious but straightforward calculation also ields that E 0 (ρ, Q) ρ = I(Q). ρ=0 Using the fact that whenever Q and A are nonnegative, the function t [0, ) [ x Q(x)A(x)/t] t is nonincreasing, it is eas to show that E0 (ρ, Q) is nondecreasing in ρ. We do not need it here, but it is also possible to show that E 0 (ρ, Q) is a concave function of ρ. Thus, Figure shows the tpical behavior of E 0 as a function of ρ. Observe now that since the function R E r (R, Q) is a maximum of the linear functions R E 0 (ρ, Q) ρr, (see Figure 2), we see that E r (R, Q) > 0 whenever for some ρ [0, ], E 0 (ρ, Q) ρr > 0 is satisfied. E 0 (, Q) E 0 ( 2, Q) E 0 ( 4, Q) E r (R, Q) I(Q) R Figure 2: E r (R) as a maximum of linear functions. It now follows that if R < E 0 (ρ, Q)/ρ for some ρ (0, ] and Q, then E 0 (ρ, Q) ρr > 0 and thus E r (R, Q) > 0. But, I(Q) = E 0(ρ, Q) E 0 (ρ, Q) E 0 (0, Q) E 0 (ρ, Q) ρ = lim = lim. ρ 0 ρ=0 ρ ρ 0 ρ where the last equalit follows from fact that E 0 (0, Q) = ln x, Q(x)P ( x) = ln = 0. Thus, b the definition of the limit above, for ever R < I(Q), we can find a ρ such that E 0 (ρ, Q)/ρ > R. Hence for ever R < I(Q) we have E r (R, Q) > 0. Since the above argument holds for an input distribution Q, it holds in particular for the Q that maximizes I(Q). This proves the existence of codes with arbitraril small probabilit of error for ever rate less than the capacit max Q I(Q). The approach we used here ields to the function E r (R, Q) called random coding error exponent. To prove the achievabilit part of the coding theorem it would suffice to show that for an rate R below capacit there exists a sequence of rate R codes of length n with corresponding error probabilit that tends to zero as n tends to infinit. The random coding error exponent argument in addition to prove it, also characterizes the deca of the probabilit of error as n tends to infinit. 8