Lecture 6 Channel Coding over Continuous Channels

Lecture 6 Channel Coding over Continuous Channels. I-Hsiang Wang, Department of Electrical Engineering, National Taiwan University. ihwang@ntu.edu.tw. November 9, 2015.

We have investigated the measures of information for continuous r.v.'s: The amount of uncertainty (entropy) is mostly infinite. Mutual information and KL divergence are well defined. Differential entropy is a useful quantity for computing and manipulating measures of information for continuous r.v.'s. Question: How about coding theorems? Is there a general way or framework to extend coding theorems from discrete (memoryless) sources/channels to continuous (memoryless) sources/channels?

Discrete Memoryless Channel: w → Channel Encoder → x^N → Channel p_{Y|X} → y^N → Channel Decoder → ŵ, with capacity C(B) = max_{X: E[b(X)] ≤ B} I(X; Y). Question: for the continuous counterpart w → Channel Encoder → x^N → Channel f_{Y|X} → y^N → Channel Decoder → ŵ, is the capacity C(B) = sup_{X: E[b(X)] ≤ B} I(X; Y)?

Coding Theorems: from Discrete to Continuous (1). Two main techniques for extending the achievability part of coding theorems from the discrete world to the continuous world: 1 Discretization: Discretize the source and channel input/output to create a discrete system, and then make the discretization finer and finer to prove the achievability. 2 New typicality: Extend weak typicality to continuous r.v.'s and repeat the arguments in a similar way. In particular, replace the entropy terms in the definitions of weakly typical sequences by differential entropy terms. Using discretization to derive the achievability of the Gaussian channel capacity follows Gallager [] and El Gamal & Kim [6]. Cover & Thomas [1] and Yeung [5] use weak typicality for continuous r.v.'s. Moser [4] uses a threshold decoder, similar to weak typicality.

Coding Theorems: from Discrete to Continuous (2). In this lecture, we use discretization for the achievability proof. Pros: No need for new tools (e.g., typicality) for continuous r.v.'s. Extends naturally to multi-terminal settings, so one can focus on discrete memoryless networks. Cons: Technical; not much insight on how to achieve capacity. Hence, we also use a geometric argument to provide insight into how to achieve capacity. Disclaimer: We will not be 100% rigorous in deriving the results in this lecture. Instead, you can find rigorous treatments in the references.

Outline. 1 First, we formulate the channel coding problem over continuous memoryless channels (CMC), state the coding theorem, and sketch the converse and achievability proofs. 2 Second, we introduce the additive Gaussian noise (AGN) channel, derive the Gaussian channel capacity, and provide insights based on geometric arguments. 3 We then explore extensions, including parallel Gaussian channels, correlated Gaussian channels, and continuous-time bandlimited Gaussian channels.

Continuous Memoryless Channel (CMC): w → Channel Encoder → x^N → Channel f_{Y|X} → y^N → Channel Decoder → ŵ. 1 Input/output alphabet X = Y = R. 2 Channel law: governed by the conditional density (p.d.f.) f_{Y|X}. Memoryless: Y_k ⊥ (X^{k−1}, Y^{k−1}) given X_k. 3 Average input cost constraint B: (1/N) Σ_{k=1}^{N} b(x_k) ≤ B, where b : R → [0, ∞) is the (single-letter) cost function. The definitions of error probability, achievable rate, and capacity are the same as those in channel coding over a DMC.

Channel Coding Theorem. Theorem 1 (Capacity): The capacity of the CMC (R, f_{Y|X}, R) with input cost constraint B is C = sup_{X: E[b(X)] ≤ B} I(X; Y). (1) Note: The input distribution of the r.v. X need not have a density; in other words, it could also be discrete. How to compute h(Y | X) when X has no density? Recall h(Y | X) = E_X[ −∫_{supp(Y)} f(y | X) log f(y | X) dy ], where f(y | x) is the conditional density of Y given X = x. Converse proof: Exactly the same as in the DMC case.
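For instance (a worked example added here for concreteness, not from the slides): take the AWGN channel Y = X + Z with Z ~ N(0, σ²), Z ⊥ X, and the discrete (BPSK) input X uniform on {−√P, +√P}. For each value x, the conditional density f(y | x) is the N(x, σ²) density, so h(Y | X = x) = (1/2) log(2πeσ²) for both values of x, and hence h(Y | X) = E_X[ h(Y | X = x) ] = (1/2) log(2πeσ²), even though X itself has no density.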

Sketch of the Achievability (1): Discretization. w → ENC → [Q_in → f_{Y|X} → Q_out] → DEC → ŵ. The proof of achievability makes use of discretization, so that one can apply the result for the DMC with input cost: Q_in: (single-letter) discretization that maps X ∈ R to X_d ∈ X_d. Q_out: (single-letter) discretization that maps Y ∈ R to Y_d ∈ Y_d. Note that both X_d and Y_d are discrete (countable) alphabets. Idea: With the two discretization blocks Q_in and Q_out, one can build an equivalent DMC (X_d, p_{Y_d|X_d}, Y_d) as shown above.

Sketch of the Achievability (2): Arguments. w → New ENC [ENC → Q_in] → x_d^N → Equivalent DMC p_{Y_d|X_d} [f_{Y|X} → Q_out] → y_d^N → DEC → ŵ. 1 Random codebook generation: Generate the codebook randomly based on the original (continuous) r.v. X, satisfying E[b(X)] ≤ B. 2 Choice of discretization: Choose Q_in such that the cost constraint will not be violated after discretization. Specifically, E[b(X_d)] ≤ B. 3 Achievability in the equivalent DMC: By the achievability part of the channel coding theorem for the DMC with input constraint, any rate R < I(X_d; Y_d) is achievable. 4 Achievability in the original CMC: Prove that when the discretization in Q_in and Q_out gets finer and finer, I(X_d; Y_d) → I(X; Y).

Additive White Gaussian Noise (AWGN) Channel: w → Channel Encoder → x^N → (+ z^N) → y^N → Channel Decoder → ŵ. 1 Input/output alphabet X = Y = R. 2 AWGN channel: the conditional p.d.f. f_{Y|X} is given by Y = X + Z, Z ~ N(0, σ²), Z ⊥ X. {Z_k} form an i.i.d. (white) Gaussian random process with Z_k ~ N(0, σ²) for all k. Memoryless: Z_k ⊥ (W, X^{k−1}, Z^{k−1}). Without feedback: Z^N ⊥ X^N. 3 Average input power constraint P: (1/N) Σ_{k=1}^{N} x_k² ≤ P.

Channel Coding Theorem for the Gaussian Channel. Theorem 2: The capacity of the AWGN channel with input power constraint P and noise variance σ² is given by C = sup_{X: E[X²] ≤ P} I(X; Y) = (1/2) log(1 + P/σ²). (2) Note: For the AWGN channel, the supremum is actually attained by the Gaussian input X ~ N(0, P), that is, the input has density f_X(x) = (1/√(2πP)) e^{−x²/(2P)}, as shown on the next slide.

Evaluation of Capacity. Let us compute the capacity of the AWGN channel (2) as follows:
I(X; Y) = h(Y) − h(Y | X) = h(Y) − h(X + Z | X) = h(Y) − h(Z | X) = h(Y) − h(Z) (since Z ⊥ X) = h(Y) − (1/2) log(2πe σ²) ≤(a) (1/2) log(2πe (P + σ²)) − (1/2) log(2πe σ²) = (1/2) log(1 + P/σ²).
Here (a) is due to the fact that h(Y) ≤ (1/2) log(2πe Var[Y]) and Var[Y] = Var[X] + Var[Z] ≤ P + σ², since Var[X] ≤ E[X²] ≤ P. Finally, note that the above inequalities hold with equality when X ~ N(0, P).
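As a quick numeric illustration of C = (1/2) log(1 + P/σ²) in bits, here is a minimal Python sketch (added for illustration; the function name and the SNR values are my own choices):

import numpy as np

def awgn_capacity(P, sigma2):
    # C = (1/2) * log2(1 + P / sigma^2), in bits per (real) channel use
    return 0.5 * np.log2(1 + P / sigma2)

for snr_db in (0, 10, 20):
    snr = 10 ** (snr_db / 10)                 # SNR = P / sigma^2
    print(f"{snr_db:2d} dB -> {awgn_capacity(snr, 1.0):.3f} bits/use")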

Achievability Proof (1): Discretization. Here we use a simple quantizer to construct the discretization blocks Q_in and Q_out: For m ∈ N, let Q_m := { l/m : l = 0, ±1, ..., ±m² } be the set of quantization points. For any r ∈ R, quantize r to the closest point [r]_m ∈ Q_m such that |[r]_m| ≤ |r|. Discretization: for two given m, n ∈ N, define the channel input discretization Q_in(·) = [·]_m and the channel output discretization Q_out(·) = [·]_n. In other words, the alphabets are X_d = Q_m and Y_d = Q_n, while X_d = [X]_m and Y_d = [X_d + Z]_n = [[X]_m + Z]_n.
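A minimal Python sketch of this quantizer (illustrative; it assumes the grid Q_m = {l/m : l = 0, ±1, ..., ±m²} with rounding toward zero as described above, and the helper name is mine):

import numpy as np

def quantize(r, m):
    # [r]_m: closest grid point l/m with |[r]_m| <= |r| (round toward zero),
    # saturated at the grid boundary ±m.
    l = np.clip(np.trunc(np.asarray(r, dtype=float) * m), -m**2, m**2)
    return l / m

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 5)
print(x)
print(quantize(x, m=4))     # coarse grid, spacing 1/4
print(quantize(x, m=100))   # finer grid: [x]_m approaches x as m grows
# Since |quantize(x, m)| <= |x|, E[quantize(X, m)**2] <= E[X**2]: the power constraint is preserved.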

Achievability Proof (2): Equivalent DMC. Now we have an equivalent DMC with input X_d = [X]_m and output Y_d = [Y(m)]_n, where Y(m) ≜ [X]_m + Z. Note that for any original input r.v. X with E[X²] ≤ P, the discretized [X]_m also satisfies the power constraint: E[[X]_m²] ≤ E[X²] ≤ P. Hence, by the achievability result for the DMC with input cost constraint, any R < I([X]_m; [Y(m)]_n) (evaluated under f_X(x) = (1/√(2πP)) e^{−x²/(2P)}) is indeed achievable for the equivalent DMC under power constraint P. The only thing left to be shown is that I([X]_m; [Y(m)]_n) can be made arbitrarily close to I(X; Y) = (1/2) log(1 + P/σ²) as m, n → ∞.

Achievability Proof (3): Convergence. Due to the data processing inequality and the Markov chain [X]_m − Y(m) − [Y(m)]_n, we have I([X]_m; [Y(m)]_n) ≤ I([X]_m; Y(m)) = h(Y(m)) − h(Z). Since Var[Y(m)] ≤ P + σ², we have h(Y(m)) ≤ (1/2) log(2πe(P + σ²)), and hence the upper bound I([X]_m; [Y(m)]_n) ≤ (1/2) log(1 + P/σ²). For the lower bound, we would like to prove lim inf_{m→∞} lim_{n→∞} I([X]_m; [Y(m)]_n) ≥ (1/2) log(1 + P/σ²). We skip the details here; see Appendix 3A of El Gamal & Kim [6].

Geometric Intuition: Sphere Packing. Consider y = x + z in R^N. By the LLN, as N → ∞, most outputs y (= y^N) will lie inside the N-dimensional sphere of radius √(N(P + σ²)). Also by the LLN, as N → ∞, y will lie near the surface of the N-dimensional sphere centered at x with radius √(Nσ²). Vanishing error probability criterion ⟹ non-overlapping spheres. Question: How many non-overlapping small spheres can be packed into the large sphere? Maximum # of non-overlapping spheres = maximum # of codewords that can be reliably delivered.
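A quick Monte Carlo check of these two LLN statements (an illustrative sketch; the blocklength, power, and noise variance are my own choices):

import numpy as np

rng = np.random.default_rng(0)
N, P, sigma2 = 100_000, 4.0, 1.0
x = rng.normal(0.0, np.sqrt(P), N)        # a random codeword with power ~ P
z = rng.normal(0.0, np.sqrt(sigma2), N)   # white Gaussian noise
y = x + z
print(np.mean(y ** 2))        # ~ P + sigma2 = 5: y concentrates near radius sqrt(N(P + sigma2))
print(np.mean((y - x) ** 2))  # ~ sigma2 = 1:     y concentrates near the noise sphere around x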

Geometric Intuition: Sphere Packing. Back-of-envelope calculation: 2^{NR} ≤ (√(N(P + σ²)))^N / (√(Nσ²))^N ⟹ R ≤ (1/N) log( (√(N(P + σ²)))^N / (√(Nσ²))^N ) = (1/2) log(1 + P/σ²). Hence, intuitively any achievable rate R cannot exceed C = (1/2) log(1 + P/σ²). How to achieve it?

Achieving Capacity via Good Packing. Random codebook generation: Generate 2^{NR} N-dimensional vectors (codewords) {x_1, ..., x_{2^{NR}}} lying in the x-sphere of radius √(NP). Decoding: let α ≜ P/(P + σ²) (the MMSE coefficient); upon receiving y, compute αy and output the nearest-neighbor codeword x̂. By the LLN, we have ||αy − x_1|| = ||αz + (α − 1)x_1|| ≈ √( α²·Nσ² + (α − 1)²·NP ) = √( N·Pσ²/(P + σ²) ).

Achieving Capacity via Good Packing. Performance analysis: When does an error occur? When another codeword, say x_2, falls inside the uncertainty sphere centered at αy. What is that probability? It is the ratio of the volumes of the two spheres: P{x_1 → x_2} = (√(N·Pσ²/(P + σ²)))^N / (√(NP))^N = (σ²/(P + σ²))^{N/2}.

Achieving Capacity via Good Packing. By the union of events bound, the total probability of error P{E} ≤ 2^{NR} · (σ²/(P + σ²))^{N/2} = 2^{N(R − (1/2) log(1 + P/σ²))}, which vanishes as N → ∞ if R < (1/2) log(1 + P/σ²). Hence, any R < (1/2) log(1 + P/σ²) is achievable.
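A small Monte Carlo sketch of this random-coding argument (illustrative only: the blocklength, rate, and SNR below are my own choices, and N is kept small so that all 2^{NR} codewords can be enumerated):

import numpy as np

rng = np.random.default_rng(0)
N, P, sigma2 = 16, 4.0, 1.0              # capacity = 0.5*log2(1 + 4) ~ 1.16 bits/use
R = 0.5                                  # operate below capacity
M = int(2 ** (N * R))                    # 256 codewords
alpha = P / (P + sigma2)                 # MMSE scaling coefficient

errors, trials = 0, 2000
for _ in range(trials):
    codebook = rng.normal(0.0, np.sqrt(P), size=(M, N))    # i.i.d. N(0, P) codewords
    y = codebook[0] + rng.normal(0.0, np.sqrt(sigma2), N)  # transmit codeword 0
    x_hat = np.argmin(np.sum((codebook - alpha * y) ** 2, axis=1))  # nearest neighbor to alpha*y
    errors += int(x_hat != 0)

print("empirical error rate:", errors / trials)  # small since R < C; it vanishes as N grows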

Practical Relevance of the Gaussian Noise Model. In communication engineering, additive Gaussian noise is the most widely used model for a noisy channel with real (or complex) input/output. Reasons: 1 Gaussian is a good model for noise that consists of many small perturbations, due to the Central Limit Theorem. 2 Analytically, Gaussian is highly tractable. 3 Consider an input-power-constrained channel with independent additive noise. Within the family of noise distributions with the same noise variance, Gaussian noise is the worst-case noise. The last point is important: it suggests that for an additive-noise channel with input power constraint P and noise variance σ², the capacity is lower bounded by the Gaussian channel capacity (1/2) log(1 + P/σ²).

Gaussian Noise is the Worst-Case Noise. Proposition 1: Consider a Gaussian r.v. X_G ~ N(0, P) and Y = X_G + Z, where Z has density f_Z(z), variance Var[Z] = σ², and Z ⊥ X_G. Then I(X_G; Y) ≥ (1/2) log(1 + P/σ²). With Proposition 1, we immediately obtain the following theorem: Theorem 3 (Gaussian is the Worst-Case Additive Noise): Consider a CMC f_{Y|X}: Y = X + Z, Z ⊥ X, with input power constraint P and noise variance σ², where the additive noise has a density. Then the capacity C is minimized when Z ~ N(0, σ²), and C ≥ C_G ≜ (1/2) log(1 + P/σ²). Proof: C ≥ I(X_G; X_G + Z) ≥ (1/2) log(1 + P/σ²).

Proof of Proposition 1. Let Z_G ~ N(0, σ²), and denote Y_G ≜ X_G + Z_G. We aim to prove I(X_G; Y) ≥ I(X_G; Y_G). First note that I(X_G; Y) = h(Y) − h(Z) does not change if we shift Z by a constant. Hence, WLOG assume E[Z] = 0. Since both X_G and Z are zero-mean, so is Y. Note that Y_G ~ N(0, P + σ²) and Z_G ~ N(0, σ²). Hence,
h(Y_G) = E_{Y_G}[ −log f_{Y_G}(Y_G) ] = (1/2) log(2π(P + σ²)) + (log e)/(2(P + σ²)) · E_{Y_G}[ (Y_G)² ] = (1/2) log(2π(P + σ²)) + (log e)/(2(P + σ²)) · E_Y[ Y² ] = E_Y[ −log f_{Y_G}(Y) ].

The key in the above is to realize that Y and Y_G have the same variance. Similarly, h(Z_G) = E_Z[ −log f_{Z_G}(Z) ]. Therefore,
I(X_G; Y_G) − I(X_G; Y) = { h(Y_G) − h(Y) } − { h(Z_G) − h(Z) }
= { E_Y[ −log f_{Y_G}(Y) ] − E_Y[ −log f_Y(Y) ] } − { E_Z[ −log f_{Z_G}(Z) ] − E_Z[ −log f_Z(Z) ] }
= E_Y[ log( f_Y(Y) / f_{Y_G}(Y) ) ] − E_Z[ log( f_Z(Z) / f_{Z_G}(Z) ) ]
= E_{Y,Z}[ log( f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ) ]
≤ log( E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ] ). (Jensen's inequality)
To finish the proof, we shall prove that E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ] = 1.

Let us calculate E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ] as follows:
E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ] = ∫∫ f_{Y,Z}(y, z) · f_Y(y) f_{Z_G}(z) / ( f_{Y_G}(y) f_Z(z) ) dz dy
= ∫∫ f_Z(z) f_{X_G}(y − z) · f_Y(y) f_{Z_G}(z) / ( f_{Y_G}(y) f_Z(z) ) dz dy   (∵ Y = X_G + Z)
= ∫∫ [ f_{X_G}(y − z) f_{Z_G}(z) ] · f_Y(y) / f_{Y_G}(y) dz dy
= ∫∫ f_{Y_G, Z_G}(y, z) · f_Y(y) / f_{Y_G}(y) dz dy   (∵ Y_G = X_G + Z_G)
= ∫ ( f_Y(y) / f_{Y_G}(y) ) ( ∫ f_{Y_G, Z_G}(y, z) dz ) dy
= ∫ ( f_Y(y) / f_{Y_G}(y) ) f_{Y_G}(y) dy = ∫ f_Y(y) dy = 1.
Hence, the proof is complete.
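A numeric sanity check of Proposition 1 (a sketch using simple grid integration; the Laplacian comparison noise, the grid, and the parameter values are my own choices): for a Gaussian input and additive noise of the same variance, I(X_G; Y) = h(Y) − h(Z) should be at least (1/2) log(1 + P/σ²), with equality for Gaussian noise.

import numpy as np

P, sigma2 = 1.0, 1.0
t = np.linspace(-15, 15, 6001)            # integration grid
dt = t[1] - t[0]

def gauss_pdf(u, var):
    return np.exp(-u ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def laplace_pdf(u, var):
    b = np.sqrt(var / 2)                  # Laplace(0, b) has variance 2*b^2
    return np.exp(-np.abs(u) / b) / (2 * b)

def diff_entropy(pdf_vals):
    p = pdf_vals[pdf_vals > 0]
    return -np.sum(p * np.log2(p)) * dt   # differential entropy in bits

def mutual_info(noise_pdf):
    fX = gauss_pdf(t, P)
    fZ = noise_pdf(t, sigma2)
    fY = np.convolve(fX, fZ, mode="same") * dt   # f_Y = f_X * f_Z on the grid
    return diff_entropy(fY) - diff_entropy(fZ)   # I(X;Y) = h(Y) - h(Z)

print("Gaussian noise :", mutual_info(gauss_pdf))    # ~ 0.5*log2(1 + P/sigma2) = 0.5 bits
print("Laplacian noise:", mutual_info(laplace_pdf))  # slightly larger, as Proposition 1 predicts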

Motivation. We have investigated the capacity of the (discrete-time) memoryless Gaussian channel, an elementary model in digital communications. In wireless communications, however, due to various effects such as frequency selectivity and inter-symbol interference, a single Gaussian channel may not model the system well. Instead, a parallel Gaussian channel, which consists of several Gaussian channels with a common total power constraint, is more relevant. For example, OFDM (Orthogonal Frequency Division Multiplexing) is a widely used technique in LTE and WiFi that mitigates frequency selectivity and inter-symbol interference. The parallel Gaussian channel is the equivalent channel model under OFDM.

Channel Coding over Parallel Gaussian Channels. Model: w → ENC → X = (X_1, ..., X_L) → [Y_l = X_l + Z_l, Z_l ~ N(0, σ_l²), l = 1, ..., L] → Y = (Y_1, ..., Y_L) → DEC → ŵ. 1 Input/output alphabet X = Y = R^L, the L-dimensional space. 2 Channel law f_{Y|X}: Y = X + Z, Z ~ N(0, diag(σ_1², ..., σ_L²)), Z ⊥ X. Note that (Z_1, ..., Z_L) are mutually independent. 3 Average input power constraint P: (1/N) Σ_{k=1}^{N} ||x[k]||² ≤ P, where ||x[k]||² = Σ_{l=1}^{L} x_l[k]².

Capacity of the Parallel Gaussian Channel. Invoking Theorem 1, the capacity of the parallel Gaussian channel is C = sup_{X: E[||X||²] ≤ P} I(X; Y). The main issue is how to compute it. Let P_l ≜ E[X_l²]. Observe that
I(X; Y) = I(X_1, ..., X_L; Y_1, ..., Y_L) = h(Y_1, ..., Y_L) − h(Z_1, ..., Z_L) = h(Y_1, ..., Y_L) − Σ_{l=1}^{L} (1/2) log(2πe σ_l²) ≤(a) Σ_{l=1}^{L} h(Y_l) − Σ_{l=1}^{L} (1/2) log(2πe σ_l²) ≤(b) Σ_{l=1}^{L} (1/2) log(1 + P_l/σ_l²).
(a) holds since joint differential entropy ≤ sum of the marginal ones. (b) is due to h(Y_l) ≤ (1/2) log(2πe Var[Y_l]) ≤ (1/2) log(2πe (P_l + σ_l²)).

Channel Coding over Parallel Gaussian Channels. Hence, I(X; Y) ≤ Σ_{l=1}^{L} (1/2) log(1 + P_l/σ_l²) for any input X with P_l = E[X_l²], l = 1, ..., L. Furthermore, to satisfy the power constraint, P ≥ E[||X||²] = Σ_{l=1}^{L} E[X_l²] = Σ_{l=1}^{L} P_l. Question: Can we achieve this upper bound? Yes, by choosing (X_1, ..., X_L) mutually independent with X_l ~ N(0, P_l), that is, X ~ N(0, diag(P_1, ..., P_L)), satisfying (1) Σ_{l=1}^{L} P_l ≤ P and (2) P_l ≥ 0, l = 1, 2, ..., L.

Computation of Capacity: a Power Allocation Problem. Intuition: The optimal scheme is to treat each branch separately, with the l-th branch allocated transmit power P_l, for l = 1, 2, ..., L. In the l-th branch (sub-channel), the input X_l ~ N(0, P_l), and the inputs are mutually independent across the L sub-channels. The characterization of capacity boils down to the following optimization:
Power Allocation Problem: C(P, σ_1², ..., σ_L²) = max_{(P_1, ..., P_L)} Σ_{l=1}^{L} (1/2) log(1 + P_l/σ_l²), subject to Σ_{l=1}^{L} P_l ≤ P and P_l ≥ 0, l = 1, 2, ..., L.

Optimal Power Allocation: Water-Filling. The optimal solution (P_1*, ..., P_L*) of the above power allocation problem turns out to be the following (notation: (x)+ ≜ max(x, 0)):
Water-Filling Solution: P_l* = (ν − σ_l²)+ for l = 1, ..., L, where the water level ν satisfies Σ_{l=1}^{L} (ν − σ_l²)+ = P.
(Figure: water filled over the noise levels σ_1², ..., σ_L² of the L sub-channels; the total shaded area equals P.)
When the power budget P ≫ max_l σ_l² (high SNR regime), the optimal allocation is roughly uniform: P_l* ≈ P/L. When the power budget P ≪ min_l σ_l² (low SNR regime), the optimal allocation is roughly choose-the-best: P_l* ≈ P · 1{ l = arg min_l σ_l² }.

(Figure: water-filling power allocations versus sub-channel index. (a) High SNR: the allocation is nearly uniform across all L sub-channels. (b) Low SNR: almost all of the power budget P goes to the best (lowest-noise) sub-channel.)
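Before turning to the optimality proof, here is a minimal water-filling sketch in Python (an assumed implementation via bisection on the water level ν; the function name and the example noise variances are my own choices):

import numpy as np

def water_filling(noise_vars, P, tol=1e-9):
    # Find nu such that sum((nu - sigma_l^2)^+) = P, then P_l = (nu - sigma_l^2)^+.
    noise_vars = np.asarray(noise_vars, dtype=float)
    lo, hi = noise_vars.min(), noise_vars.max() + P     # the water level lies in [lo, hi]
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - noise_vars, 0.0).sum() > P:
            hi = nu                                     # too much water: lower the level
        else:
            lo = nu                                     # not enough water: raise the level
    nu = 0.5 * (lo + hi)
    return np.maximum(nu - noise_vars, 0.0), nu

noise_vars = np.array([1.0, 2.0, 4.0])
powers, nu = water_filling(noise_vars, P=5.0)
print(powers, nu)                                       # -> [3, 2, 0], water level nu = 4
print(0.5 * np.log2(1 + powers / noise_vars).sum())     # resulting capacity: 1.5 bits per channel use

In this example the weakest sub-channel (σ² = 4) sits at the water level and gets no power, matching the low/high-SNR intuition above.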

A Primer on Convex Optimization (1). To show that the water-filling solution attains capacity (i.e., optimality in the Power Allocation Problem), let us give a quick overview of convex optimization, the Lagrangian function, and the Karush-Kuhn-Tucker theorem. Convex Optimization: minimize f(x) subject to g_i(x) ≤ 0, i = 1, ..., m, and h_i(x) = 0, i = 1, ..., p. (3) The above minimization problem is convex if: the objective function f is convex; the inequality constraint functions g_1, ..., g_m are convex; the equality constraint functions h_1, ..., h_p are affine, i.e., h_i(x) = a_i^T x + b_i.

A Primer on Convex Optimization (2). Lagrangian Function: For the minimization problem (3), its Lagrangian function is a weighted sum of the objective and the constraints: L(x, λ, µ) ≜ f(x) + Σ_{i=1}^{m} λ_i g_i(x) + Σ_{i=1}^{p} µ_i h_i(x). (4) Karush-Kuhn-Tucker (KKT) Theorem: For a convex optimization problem with differentiable objective function f and inequality constraints g_1, ..., g_m, suppose that there exists a point x in the interior of the domain that is strictly feasible (g_i(x) < 0, i = 1, ..., m, and h_i(x) = 0, i = 1, ..., p). Then a feasible x* attains the optimum in (3) iff there exist (λ*, µ*) such that λ_i* ≥ 0 and λ_i* g_i(x*) = 0 for i = 1, 2, ..., m, and ∇_x L(x, λ, µ) |_{(x, λ, µ) = (x*, λ*, µ*)} = 0. (5) Condition (5) together with the feasibility of x* are called the KKT conditions.

Optimality of Water-Filling. Proposition 2 (Water-Filling): For a given (σ_1², ..., σ_L²), the following maximization problem: maximize Σ_{l=1}^{L} log(P_l + σ_l²) subject to Σ_{l=1}^{L} P_l = P and P_l ≥ 0, l = 1, ..., L, (6) has the solution P_l* = (ν − σ_l²)+, l = 1, ..., L, where ν satisfies Σ_{l=1}^{L} (ν − σ_l²)+ = P. The proof is based on evaluating the KKT conditions.

Proof: First, rewrite (6) in the following equivalent form: minimize −Σ_{l=1}^{L} log(P_l + σ_l²) subject to −P_l ≤ 0, l = 1, ..., L, and Σ_{l=1}^{L} P_l − P ≤ 0. (7) It can be easily checked that (7) is a convex optimization problem. Hence, the Lagrangian function is L(P_1, ..., P_L, λ_1, ..., λ_L, µ) = −Σ_{l=1}^{L} log(P_l + σ_l²) − Σ_{l=1}^{L} λ_l P_l + µ ( Σ_{l=1}^{L} P_l − P ).

The proof is complete by finding P_1, ..., P_L, λ_1, ..., λ_L ≥ 0 and µ such that
Σ_{l=1}^{L} P_l = P;
∂L/∂P_l = −(log e)/(P_l + σ_l²) − λ_l + µ = 0, l = 1, ..., L;
λ_l P_l = 0, l = 1, ..., L.
If µ < (log e)/σ_l²: the condition λ_l = µ − (log e)/(P_l + σ_l²) ≥ 0 can only hold if P_l > 0; then λ_l P_l = 0 forces λ_l = 0, so µ = (log e)/(P_l + σ_l²) and P_l = (log e)/µ − σ_l².
If µ ≥ (log e)/σ_l²: the conditions λ_l = µ − (log e)/(P_l + σ_l²) ≥ 0 and λ_l P_l = 0 imply that P_l = 0.
Hence, P_l = max( (log e)/µ − σ_l², 0 ) for l = 1, 2, ..., L. Finally, by renaming ν ≜ (log e)/µ and plugging into the condition Σ_{l=1}^{L} P_l = P, we complete the proof by the KKT theorem.

Having determined the capacity of the parallel Gaussian channel, let us generalize the result to the case where the noises in the L branches are correlated. The idea behind our technique is simple: apply a pre-processor and a post-processor such that the end-to-end system is again a parallel Gaussian channel with independent noise components.

Channel Coding over Parallel Gaussian Channels with Colored Noise. Model: w → ENC → X = (X_1, ..., X_L) → Y = X + Z → Y = (Y_1, ..., Y_L) → DEC → ŵ. 1 Input/output alphabet X = Y = R^L, the L-dimensional space. 2 Channel law f_{Y|X}: Y = X + Z, Z ~ N(0, K_Z), Z ⊥ X. Note that (Z_1, ..., Z_L) are not mutually independent anymore. 3 Average input power constraint P: (1/N) Σ_{k=1}^{N} ||x[k]||² ≤ P, where ||x[k]||² = Σ_{l=1}^{L} x_l[k]².

Eigenvalue Decomposition of a Covariance Matrix. To get to the main idea, we introduce some basic matrix theory. Definition 1 (Positive Semidefinite (PSD) Matrix): A Hermitian matrix A ∈ C^{L×L} is positive semidefinite (A ⪰ 0) iff x^H A x ≥ 0 for all x ≠ 0 in C^L. Here (·)^H denotes the transpose of the complex conjugate of a matrix, and a Hermitian matrix A is a square matrix with A^H = A. The following important lemma plays a key role in our development. Lemma 1 (Eigenvalue Decomposition of a PSD Matrix): If A ⪰ 0, then A = QΛQ^H, where Q is unitary, i.e., QQ^H = Q^H Q = I, and Λ = diag(λ_1, ..., λ_L), where {λ_i ≥ 0 : i = 1, ..., L} are the eigenvalues of A. The j-th column of Q, q_j, is the eigenvector of A associated with λ_j.

Fact 1: A valid covariance matrix is PSD. Proof: By definition, a valid covariance matrix K = E[YY^H] for some complex zero-mean r.v. Y. Therefore, K is Hermitian because K^H = (E[YY^H])^H = E[(YY^H)^H] = E[YY^H] = K. Moreover, it is PSD since for all non-zero x ∈ C^L, x^H K x = x^H E[YY^H] x = E[x^H Y Y^H x] = E[|Y^H x|²] ≥ 0. Hence, for the covariance matrix K_Z, we can always decompose it as K_Z = QΛ_Z Q^H, where Λ_Z = diag(σ_1², ..., σ_L²).

Pre-Processor Q and Post-Processor Q^H. Based on the eigenvalue decomposition K_Z = QΛ_Z Q^H, we insert a pre-processor Q and a post-processor Q^H as follows: X̃ → [Q] → X → (+ Z, Z ~ N(0, K_Z)) → Y → [Q^H] → Ỹ. The end-to-end relationship between X̃ and Ỹ is characterized by the following equivalent channel: Ỹ = X̃ + Z̃, where X̃ ≜ Q^H X, Ỹ ≜ Q^H Y, Z̃ ≜ Q^H Z, and Z̃ is zero-mean Gaussian with covariance matrix Q^H K_Z Q = Q^H QΛ_Z Q^H Q = Λ_Z = diag(σ_1², ..., σ_L²).

Equivalent Input Power Constraint P. For the above equivalent channel f_{Ỹ|X̃}, where Ỹ = X̃ + Z̃ with Z̃ ~ N(0, Λ_Z), observe that the noise terms in the L branches are now mutually independent. Furthermore, note that for this channel the input power is the same as in the original channel: ||x||² = x^H x = x̃^H Q^H Q x̃ = x̃^H x̃ = ||x̃||² (since QQ^H = Q^H Q = I). Hence, we can use the water-filling solution to find the capacity of this channel, denoted by C̃.

No Loss in Optimality of the Pre-/Post-Processors. C ≥ C̃, since any scheme for f_{Ỹ|X̃} can be transformed into one for f_{Y|X}. On the other hand, consider inserting another pre-processor Q^H and post-processor Q around the equivalent channel (Figure: X̄ → [Q^H] → X̃ → [Q] → X → (+ Z, Z ~ N(0, K_Z)) → Y → [Q^H] → Ỹ → [Q] → Ȳ). The resulting channel f_{Ȳ|X̄} is the same as the original channel f_{Y|X}. Let C̄ be the capacity of this channel; by the same argument, C̃ ≥ C̄, while C̄ = C. Hence C = C̄ ≤ C̃ ≤ C, which gives C̃ = C.

Summary: Capacity of Theorem 4 (Capacity of ) For the L-branch Gaussian parallel channel with average input power constraint P and noise covariance matrix K Z, the channel capacity is L l=1 ( ) 1 log 1 + P l σl, where { σ1,..., σl} are the L eigenvalues of KZ, and the optimal power allocation {P 1,..., P L } is given by the following water-filling solution: P l = ( ν σl ) +, l = 1,..., L ν satisfies L ( ) ν σ + l = P l=1 59 / 59 I-Hsiang Wang IT Lecture 6