LECTURE 10
Last time: Joint AEP, Coding Theorem
Lecture outline: Error Exponents, Strong Coding Theorem
Reading: Gallager, Chapter 5.
Review
Joint AEP
• $A_\epsilon^{(n)}(X)$ and $A_\epsilon^{(n)}(Y)$ vs. $A_\epsilon^{(n)}(X,Y)$: roughly $2^{nH(X)} \cdot 2^{nH(Y)} \gg 2^{nH(X,Y)}$.
• Passing $x^n$ through the channel to obtain $y^n$, $(x^n, y^n)$ are jointly typical with high probability.
• For another independently chosen $\tilde{x}^n$, $(\tilde{x}^n, y^n)$ are jointly typical with probability $\approx 2^{-nI(X;Y)}$.
Coding Theorem
• Random coding.
• Joint typicality decoding.
• Converse proved by using Fano's inequality.
A possible confusion: i.i.d. input distribution vs. transmitting independent symbols.
Remaining Topics
• Can we get rid of the random coding? Instead, we will take a closer look at random coding.
• Joint typicality decoding vs. maximum likelihood decoding.
• Example: binary source sequence passing through a BSC.
• Finite codeword length $n$.
Our plan for the next two lectures
• Maximum likelihood decoding.
• Upper bound on the error probability.
• Random coding error exponent.
• Binary symmetric channel as an example.
Maximum Likelihood Decoding
Notation
• Message $W$ uniformly distributed in $\{1, 2, \ldots, M\}$, with $M = 2^{nR}$.
• The encoder transmits codeword $x^n(m)$ if the incoming message is $W = m$.
• The receiver observes $y^n$ and finds the most likely transmitted codeword:
$$\hat{W} = \arg\max_m P_{Y^n|X^n}(y^n \mid x^n(m)).$$
• Define $\mathcal{Y}_m$ as the set of $y^n$'s that decode to $m$. The probability of error conditioned on $W = m$ being transmitted is
$$P_{e,m} = \sum_{y^n \in \mathcal{Y}_m^c} P_{Y^n|X^n}(y^n \mid x^n(m)).$$
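To make the decoding rule concrete, here is a minimal brute-force sketch of maximum likelihood decoding over a memoryless channel; the channel matrix, codebook, and received sequence are made-up toy values, not from the lecture.

```python
import numpy as np

# Toy DMC: P_{Y|X}(y|x) as a |X| x |Y| matrix (rows sum to 1).
# These numbers are illustrative only.
P_Y_given_X = np.array([[0.9, 0.1],
                        [0.2, 0.8]])

def ml_decode(y, codebook, P):
    """Return argmax_m P(y^n | x^n(m)) for a memoryless channel.

    y        : received sequence, shape (n,)
    codebook : array of codewords, shape (M, n)
    P        : channel matrix, P[x, y]
    """
    # Work in the log domain to avoid underflow for long blocks.
    log_likelihoods = [np.sum(np.log(P[xm, y])) for xm in codebook]
    return int(np.argmax(log_likelihoods))

# Example: two codewords of length 5 (the repetition pair used later for the BSC).
codebook = np.array([[0, 0, 0, 0, 0],
                     [1, 1, 1, 1, 1]])
y = np.array([0, 1, 0, 0, 1])
print(ml_decode(y, codebook, P_Y_given_X))  # decodes to the more likely codeword
```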
Pairwise Error Probability
Consider the case $M = 2$:
$$P_{e,1} = \sum_{y^n \in \mathcal{Y}_1^c} P(y^n \mid x^n(1)).$$
• We really hate the $\mathcal{Y}_1^c$, since we have to figure out the decision region. Can we sum over the entire set $\mathcal{Y}^n$ without dealing with the decision boundary?
• Consider any $y^n \in \mathcal{Y}_1^c$; by definition $P(y^n \mid x^n(2)) \ge P(y^n \mid x^n(1))$, so we can bound
$$P_{e,1} \le \sum_{y^n \in \mathcal{Y}_1^c} P(y^n \mid x^n(1)) \left[\frac{P(y^n \mid x^n(2))}{P(y^n \mid x^n(1))}\right]^s = \sum_{y^n \in \mathcal{Y}_1^c} P(y^n \mid x^n(1))^{1-s}\, P(y^n \mid x^n(2))^s \le \sum_{y^n} P(y^n \mid x^n(1))^{1-s}\, P(y^n \mid x^n(2))^s$$
for any $s \in (0, 1)$.
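As a sanity check on the bound above (my own sketch, with an assumed crossover probability and a tiny block length so that all $y^n$ can be enumerated), the following compares the exact $P_{e,1}$ for two fixed BSC codewords with the relaxed sum over all of $\mathcal{Y}^n$ for a few values of $s$.

```python
import itertools
import numpy as np

eps = 0.1          # assumed BSC crossover probability (illustrative)
n = 6              # toy block length, small enough to enumerate all y^n
x1 = np.zeros(n, dtype=int)   # codeword x^n(1): all zeros
x2 = np.ones(n, dtype=int)    # codeword x^n(2): all ones

def likelihood(y, x):
    """P(y^n | x^n) for a BSC with crossover probability eps."""
    flips = int(np.sum(y != x))
    return eps**flips * (1 - eps)**(n - flips)

# Exact P_{e,1}: sum P(y|x1) over the y^n that the ML decoder maps to codeword 2.
p_e1 = 0.0
for y_tuple in itertools.product([0, 1], repeat=n):
    y = np.array(y_tuple)
    if likelihood(y, x2) >= likelihood(y, x1):   # decoder prefers x^n(2)
        p_e1 += likelihood(y, x1)

# Relaxed bound: sum over ALL y^n of P(y|x1)^(1-s) * P(y|x2)^s.
def chernoff_bound(s):
    return sum(likelihood(np.array(y), x1)**(1 - s) * likelihood(np.array(y), x2)**s
               for y in itertools.product([0, 1], repeat=n))

print("exact P_e,1 =", p_e1)
for s in (0.25, 0.5, 0.75):
    print(f"bound with s = {s}:", chernoff_bound(s))
```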
Random Codewords
Choose the codewords $x^n(1)$ and $x^n(2)$ i.i.d. from the distribution $P_X$ (or equivalently $P_{X^n}$):
$$\bar{P}_{e,1} = \sum_{x^n(1)} \sum_{y^n} P_{X^n}(x^n(1))\, P(y^n \mid x^n(1))\, P(\text{error} \mid W = 1, x^n(1), y^n)$$
and
$$P(\text{error} \mid W = 1, x^n(1), y^n) \le \sum_{x^n(2)} P_{X^n}(x^n(2)) \left[\frac{P(y^n \mid x^n(2))}{P(y^n \mid x^n(1))}\right]^s.$$
• In general, this is a good bound for both fixed and random codewords.
• Random coding allows for generalization to many codewords.
• The upper bound allows us to compute the error probability without dealing with specific decision regions.
Example: Binary Source / BSC
• Let $x^n(1)$ and $x^n(2)$ be the all-0 and the all-1 words.
• Using the DMC property, we have for $m = 1, 2$ and any $s \in (0, 1)$,
$$P_{e,m} \le \sum_{y_1} \cdots \sum_{y_n} \prod_{i=1}^{n} P(y_i \mid x_{1,i})^{1-s}\, P(y_i \mid x_{2,i})^s = \prod_{i=1}^{n} \sum_{y_i} P(y_i \mid x_{1,i})^{1-s}\, P(y_i \mid x_{2,i})^s.$$
Example: Binary Source / BSC
Plug in the specific choices of codewords:
$$\sum_{y_i} P(y_i \mid x_{1,i})^{1-s}\, P(y_i \mid x_{2,i})^s = \epsilon^{1-s}(1-\epsilon)^s + \epsilon^s(1-\epsilon)^{1-s}.$$
Optimize to get $s = 1/2$, and
$$P_{e,m} \le \left[2\sqrt{\epsilon(1-\epsilon)}\right]^n.$$
Alternative Approach
Conditioned on the all-0 word being transmitted, an error occurs when there are more than $n/2$ 1's:
$$P_{e,1} \le 2^{-n D\left(\frac{1}{2} \| \epsilon\right)} = 2^{n\left[\log 2 + \frac{1}{2}\log\epsilon + \frac{1}{2}\log(1-\epsilon)\right]}.$$
• The upper bound is quite tight!
• Similar development can be done with random codewords.
• We are now one step away from the case with many codewords.
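A quick numerical check of the two calculations on this slide (my own sketch; the crossover probability and block length are assumed values): it minimizes the per-letter factor over $s$, recovering $s = 1/2$, and compares the resulting bound $[2\sqrt{\epsilon(1-\epsilon)}]^n$ with the exact two-codeword error probability given by the binomial tail.

```python
import numpy as np
from math import comb

eps, n = 0.1, 51   # assumed crossover probability and (odd) block length

# Per-letter factor as a function of s; the minimum should be at s = 1/2.
s_grid = np.linspace(0.01, 0.99, 99)
factor = eps**(1 - s_grid) * (1 - eps)**s_grid + eps**s_grid * (1 - eps)**(1 - s_grid)
s_star = s_grid[np.argmin(factor)]
print("minimizing s ~", s_star,
      "  min factor =", factor.min(),
      "  2*sqrt(eps*(1-eps)) =", 2 * np.sqrt(eps * (1 - eps)))

# Bound on P_{e,1} vs. the exact probability of more than n/2 flips (binomial tail).
bound = (2 * np.sqrt(eps * (1 - eps)))**n
exact = sum(comb(n, k) * eps**k * (1 - eps)**(n - k) for k in range(n // 2 + 1, n + 1))
print("bound =", bound, "  exact =", exact)
```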
BSC with Many Codewords
• Can we generalize this to many codewords by using the union bound?
• Assume there are $2^{nR}$ codewords; the union bound gives
$$P_{e,m} \le \sum_{m' \ne m} P(m \to m') \le 2^{nR}\, 2^{-n D\left(\frac{1}{2} \| \epsilon\right)}.$$
• The error probability goes to 0 if $R - D\!\left(\tfrac{1}{2} \,\middle\|\, \epsilon\right) < 0$.
• This says the probability of error decays exponentially with $n$.
The Error Exponent
Rewrite the union bound as
$$P_{e,m,\text{union}} \le 2^{-n\left(D\left(\frac{1}{2} \| \epsilon\right) - R\right)}.$$
The error exponent is $E_u(R) = D\!\left(\tfrac{1}{2} \,\middle\|\, \epsilon\right) - R$. As long as $R$ is small enough that $E_u(R) > 0$, the error probability decays exponentially with $n$.
Question: does this give the capacity?
Too bad! The maximum data rate $D\!\left(\tfrac{1}{2} \,\middle\|\, \epsilon\right)$ from this bound is not the capacity $1 - H(\epsilon)$: comparing
$$1 - H(\epsilon) = \log 2 + \epsilon\log\epsilon + (1-\epsilon)\log(1-\epsilon) \quad\text{and}\quad D\!\left(\tfrac{1}{2} \,\middle\|\, \epsilon\right) = -\left[\log 2 + \tfrac{1}{2}\log\epsilon + \tfrac{1}{2}\log(1-\epsilon)\right],$$
the two coincide only at $\epsilon = 1/2$.
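To see the gap concretely, here is a short numerical comparison (my own check, with an assumed crossover probability) of the union-bound threshold $D(\tfrac{1}{2} \| \epsilon)$ and the BSC capacity $1 - H(\epsilon)$:

```python
import numpy as np

eps = 0.1  # assumed crossover probability, purely for illustration

H_eps = -eps * np.log2(eps) - (1 - eps) * np.log2(1 - eps)            # binary entropy H(eps)
capacity = 1 - H_eps                                                  # BSC capacity 1 - H(eps)
D_half = 0.5 * np.log2(0.5 / eps) + 0.5 * np.log2(0.5 / (1 - eps))    # D(1/2 || eps)

print("1 - H(eps)    =", capacity)
print("D(1/2 || eps) =", D_half)
```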
A Better Way than the Union Bound
Lemma: For any $\rho \in (0, 1]$,
$$P\left(\bigcup_m A_m\right) \le \left[\sum_m P(A_m)\right]^\rho.$$
Proof:
$$P\left(\bigcup_m A_m\right) \le \min\left\{\sum_m P(A_m),\; 1\right\} \le \left[\sum_m P(A_m)\right]^\rho,$$
since $\min\{t, 1\} \le t^\rho$ for every $t \ge 0$ and $\rho \in (0, 1]$.
Idea: use $\rho$ to compensate for serious overlap, at a cost.
Now define the event $A_{m'} = \{\text{decoder prefers } m' \mid W = m, x^n(m), y^n\}$ for $m' \ne m$, and we have just computed
$$P(A_{m'}) \le \sum_{x^n(m')} P_{X^n}(x^n(m')) \left[\frac{P(y^n \mid x^n(m'))}{P(y^n \mid x^n(m))}\right]^s.$$
Upper Bound for the Error Probability
$$P(\text{error} \mid W = m, x^n(m), y^n) = P\left(\bigcup_{m' \ne m} A_{m'}\right) \le (M-1)^\rho \left[\sum_{x^n(m')} P_{X^n}(x^n(m')) \left(\frac{P(y^n \mid x^n(m'))}{P(y^n \mid x^n(m))}\right)^s\right]^\rho,$$
where the $(M-1)^\rho$ appears because the $M-1$ error events admit identical bounds under random coding. Average over $x^n(m)$, $y^n$:
$$\bar{P}_{e,m} \le (M-1)^\rho \sum_{y^n} \sum_{x^n(m)} P_{X^n}(x^n(m))\, P(y^n \mid x^n(m))^{1 - s\rho} \left[\sum_{x^n(m')} P_{X^n}(x^n(m'))\, P(y^n \mid x^n(m'))^s\right]^\rho$$
for any $s, \rho \in (0, 1]$. Take $s = 1/(1+\rho)$.
Upper Bound
Theorem:
$$\bar{P}_{e,m} \le (M-1)^\rho \sum_{y^n} \left[\sum_{x^n} P_{X^n}(x^n)\, P(y^n \mid x^n)^{\frac{1}{1+\rho}}\right]^{1+\rho}.$$
Corollary (apply the DMC property):
$$\bar{P}_{e,m} \le (M-1)^\rho \prod_{i=1}^{n} \sum_{y_i} \left[\sum_{x_i} P_X(x_i)\, P_{Y|X}(y_i \mid x_i)^{\frac{1}{1+\rho}}\right]^{1+\rho} = (M-1)^\rho \left(\sum_{y} \left[\sum_{x} P_X(x)\, P_{Y|X}(y \mid x)^{\frac{1}{1+\rho}}\right]^{1+\rho}\right)^n.$$
Random Coding Error Exponent
For a fixed input distribution $P_X$, define
$$E_0(\rho) = -\log \sum_{y} \left[\sum_{x} P_X(x)\, P_{Y|X}(y \mid x)^{\frac{1}{1+\rho}}\right]^{1+\rho}.$$
Then the average probability of error satisfies
$$\bar{P}_{e,m} \le 2^{-n\left(E_0(\rho) - \rho R\right)}.$$
As long as the random coding error exponent
$$E_r(R) = \max_{\rho \in [0,1]} \left[E_0(\rho) - \rho R\right]$$
is positive, the error probability can be driven to 0 as $n \to \infty$.
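A small sketch (mine, for an assumed BSC with a uniform input distribution) of the single-letter Gallager function $E_0(\rho)$ defined above; logs are base 2 so that rates come out in bits per channel use.

```python
import numpy as np

def E0(rho, P_X, P_Y_given_X):
    """Gallager function E_0(rho) = -log2 sum_y [sum_x P_X(x) P(y|x)^(1/(1+rho))]^(1+rho)."""
    inner = (P_X[:, None] * P_Y_given_X ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2(np.sum(inner ** (1.0 + rho)))

# Assumed example: BSC with crossover 0.1 and the uniform input distribution.
eps = 0.1
P_X = np.array([0.5, 0.5])
P_Y_given_X = np.array([[1 - eps, eps],
                        [eps, 1 - eps]])

for rho in (0.0, 0.25, 0.5, 1.0):
    print(f"E_0({rho}) = {E0(rho, P_X, P_Y_given_X):.4f}")
# E_0(0) = 0, E_0 is increasing and concave in rho, and E_0(1) is the
# cutoff rate for this input distribution.
```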
The Behavior of the Error Exponent
Facts:
• $E_0(\rho) \ge 0$, with equality only at $\rho = 0$.
• $\frac{\partial E_0(\rho)}{\partial \rho} \ge 0$.
• $\frac{\partial E_0(\rho)}{\partial \rho} \le I(X;Y)$, with equality at $\rho = 0$.
• $E_0(\rho)$ is concave in $\rho$.
Consider
$$E_r(R) = \max_{\rho \in [0,1]} \left[E_0(\rho) - \rho R\right].$$
Ignoring the constraint, the maximum occurs at the $\rho$ satisfying $R = \frac{\partial E_0(\rho)}{\partial \rho}$. The maximizing $\rho$ lies in $[0, 1]$ if
$$\left.\frac{\partial E_0(\rho)}{\partial \rho}\right|_{\rho=1} \le R \le \left.\frac{\partial E_0(\rho)}{\partial \rho}\right|_{\rho=0} = I(X;Y).$$
For any $R < I(X;Y)$, we get a positive error exponent, and the error probability can be driven to 0 as $n \to \infty$.
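Continuing in the same spirit (same assumed BSC and uniform input), this sketch maximizes $E_0(\rho) - \rho R$ over a grid of $\rho \in [0,1]$ and checks that the resulting exponent is positive exactly for rates below $I(X;Y) = 1 - H(\epsilon)$.

```python
import numpy as np

eps = 0.1                              # assumed BSC crossover probability
P_X = np.array([0.5, 0.5])             # uniform input distribution
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])         # channel matrix P_{Y|X}

def E0(rho):
    """E_0(rho) for the fixed (P_X, P) above, logs base 2."""
    inner = (P_X[:, None] * P ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2(np.sum(inner ** (1.0 + rho)))

def Er(R, num=2001):
    """Random coding exponent E_r(R) = max over rho in [0,1] of E_0(rho) - rho*R, on a grid."""
    return max(E0(rho) - rho * R for rho in np.linspace(0.0, 1.0, num))

capacity = 1 + eps * np.log2(eps) + (1 - eps) * np.log2(1 - eps)   # I(X;Y) = 1 - H(eps)
for R in (0.1, 0.3, 0.5, capacity, 0.6):
    print(f"R = {R:.3f}   E_r(R) = {Er(R):.4f}")
# E_r(R) > 0 for every R < capacity; for R >= capacity the maximum sits at rho = 0 and equals 0.
```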
Summary
• We have proved the coding theorem in another way.
• For $R < C$, the error probability decays exponentially with $n$.
Remaining Questions
• Is this a good bound?
• We have chosen random codes and computed the average performance. Is there a specific code that can do better than this?
• The two pieces of the error exponent curve remain mysterious.