Lecture 6 Channel Coding over Continuous Channels

Lecture 6 Channel Coding over Continuous Channels. I-Hsiang Wang, Department of Electrical Engineering, National Taiwan University. ihwang@ntu.edu.tw. November 9, 2015.

We have investigated the measures of information for continuous r.v.'s: The amount of uncertainty (entropy) is mostly infinite. Mutual information and KL divergence are well defined. Differential entropy is a useful quantity for computing and manipulating measures of information for continuous r.v.'s. Question: How about coding theorems? Is there a general way or framework to extend coding theorems from discrete (memoryless) sources/channels to continuous (memoryless) sources/channels?

Discrete Memoryless Channel: w → Channel Encoder → x^N → Channel p_{Y|X} → y^N → Channel Decoder → ŵ, with capacity C(B) = max_{X: E[b(X)] ≤ B} I(X; Y). Question: for the continuous counterpart w → Channel Encoder → x^N → Channel f_{Y|X} → y^N → Channel Decoder → ŵ, is the capacity C(B) = sup_{X: E[b(X)] ≤ B} I(X; Y)?

Coding Theorems: from Discrete to Continuous (1). Two main techniques for extending the achievability part of coding theorems from the discrete world to the continuous world: 1 Discretization: Discretize the source and channel input/output to create a discrete system, and then make the discretization finer and finer to prove the achievability. 2 New typicality: Extend weak typicality to continuous r.v.'s and repeat the arguments in a similar way. In particular, replace the entropy terms in the definitions of weakly typical sequences by differential entropy terms. Using discretization to derive the achievability of the Gaussian channel capacity follows Gallager [] and El Gamal & Kim [6]. Cover & Thomas [1] and Yeung [5] use weak typicality for continuous r.v.'s. Moser [4] uses a threshold decoder, similar to weak typicality.

Coding Theorems: from Discrete to Continuous (2). In this lecture, we use discretization for the achievability proof. Pros: No need for new tools (e.g., typicality) for continuous r.v.'s. Extends naturally to multi-terminal settings, so one can focus on discrete memoryless networks. Cons: Technical; not much insight on how to achieve capacity. Hence, we also use a geometric argument to provide insight into how to achieve capacity. Disclaimer: We will not be 100% rigorous in deriving the results in this lecture. Instead, you can find rigorous treatments in the references.

Outline. 1 First, we formulate the channel coding problem over continuous memoryless channels (CMC), state the coding theorem, and sketch the converse and achievability proofs. 2 Second, we introduce the additive Gaussian noise (AGN) channel, derive the Gaussian channel capacity, and provide insights based on geometric arguments. 3 We then explore extensions, including parallel Gaussian channels, correlated Gaussian channels, and continuous-time bandlimited Gaussian channels.

Continuous Memoryless Channel (CMC): w → Channel Encoder → x^N → Channel f_{Y|X} → y^N → Channel Decoder → ŵ. 1 Input/output alphabet X = Y = R. 2 Channel law: governed by the conditional density (p.d.f.) f_{Y|X}. Memoryless: Y_k ⊥ (X^{k−1}, Y^{k−1}) given X_k. 3 Average input cost constraint B: (1/N) Σ_{k=1}^{N} b(x_k) ≤ B, where b : R → [0, ∞) is the (single-letter) cost function. The definitions of error probability, achievable rate, and capacity are the same as those in channel coding over a DMC.

Channel Coding Theorem. Theorem 1 (Capacity): The capacity of the CMC (R, f_{Y|X}, R) with input cost constraint B is C = sup_{X: E[b(X)] ≤ B} I(X; Y). (1) Note: The input distribution of the r.v. X need not have a density; in other words, it could also be discrete. How to compute h(Y | X) when X has no density? Recall h(Y | X) = E_X[ −∫_{supp(Y)} f(y | X) log f(y | X) dy ], where f(y | x) is the conditional density of Y given X = x. Converse proof: Exactly the same as in the DMC case.
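For instance (a worked example added here for concreteness, not from the slides): take the AWGN channel Y = X + Z with Z ~ N(0, σ²), Z ⊥ X, and the discrete (BPSK) input X uniform on {−√P, +√P}. For each value x, the conditional density f(y | x) is the N(x, σ²) density, so h(Y | X = x) = (1/2) log(2πeσ²) for both values of x, and hence h(Y | X) = E_X[ h(Y | X = x) ] = (1/2) log(2πeσ²), even though X itself has no density.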

Sketch of the Achievability (1): Discretization. w → ENC → [Q_in → f_{Y|X} → Q_out] → DEC → ŵ. The proof of achievability makes use of discretization, so that one can apply the result for the DMC with input cost: Q_in: (single-letter) discretization that maps X ∈ R to X_d ∈ X_d. Q_out: (single-letter) discretization that maps Y ∈ R to Y_d ∈ Y_d. Note that both X_d and Y_d are discrete (countable) alphabets. Idea: With the two discretization blocks Q_in and Q_out, one can build an equivalent DMC (X_d, p_{Y_d|X_d}, Y_d) as shown above.

Sketch of the Achievability (2): Arguments. w → New ENC [ENC → Q_in] → x_d^N → Equivalent DMC p_{Y_d|X_d} [f_{Y|X} → Q_out] → y_d^N → DEC → ŵ. 1 Random codebook generation: Generate the codebook randomly based on the original (continuous) r.v. X, satisfying E[b(X)] ≤ B. 2 Choice of discretization: Choose Q_in such that the cost constraint will not be violated after discretization. Specifically, E[b(X_d)] ≤ B. 3 Achievability in the equivalent DMC: By the achievability part of the channel coding theorem for the DMC with input constraint, any rate R < I(X_d; Y_d) is achievable. 4 Achievability in the original CMC: Prove that when the discretization in Q_in and Q_out gets finer and finer, I(X_d; Y_d) → I(X; Y).

Additive White Gaussian Noise (AWGN) Channel: w → Channel Encoder → x^N → (+ z^N) → y^N → Channel Decoder → ŵ. 1 Input/output alphabet X = Y = R. 2 AWGN channel: the conditional p.d.f. f_{Y|X} is given by Y = X + Z, Z ~ N(0, σ²), Z ⊥ X. {Z_k} form an i.i.d. (white) Gaussian random process with Z_k ~ N(0, σ²) for all k. Memoryless: Z_k ⊥ (W, X^{k−1}, Z^{k−1}). Without feedback: Z^N ⊥ X^N. 3 Average input power constraint P: (1/N) Σ_{k=1}^{N} x_k² ≤ P.

Channel Coding Theorem for the Gaussian Channel. Theorem 2: The capacity of the AWGN channel with input power constraint P and noise variance σ² is given by C = sup_{X: E[X²] ≤ P} I(X; Y) = (1/2) log(1 + P/σ²). (2) Note: For the AWGN channel, the supremum is actually attained by the Gaussian input X ~ N(0, P), that is, the input has density f_X(x) = (1/√(2πP)) e^{−x²/(2P)}, as shown on the next slide.

Evaluation of Capacity. Let us compute the capacity of the AWGN channel (2) as follows:
I(X; Y) = h(Y) − h(Y | X) = h(Y) − h(X + Z | X) = h(Y) − h(Z | X) = h(Y) − h(Z) (since Z ⊥ X) = h(Y) − (1/2) log(2πe σ²) ≤(a) (1/2) log(2πe (P + σ²)) − (1/2) log(2πe σ²) = (1/2) log(1 + P/σ²).
Here (a) is due to the fact that h(Y) ≤ (1/2) log(2πe Var[Y]) and Var[Y] = Var[X] + Var[Z] ≤ P + σ², since Var[X] ≤ E[X²] ≤ P. Finally, note that the above inequalities hold with equality when X ~ N(0, P).
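As a quick numeric illustration of C = (1/2) log(1 + P/σ²) in bits, here is a minimal Python sketch (added for illustration; the function name and the SNR values are my own choices):

import numpy as np

def awgn_capacity(P, sigma2):
    # C = (1/2) * log2(1 + P / sigma^2), in bits per (real) channel use
    return 0.5 * np.log2(1 + P / sigma2)

for snr_db in (0, 10, 20):
    snr = 10 ** (snr_db / 10)                 # SNR = P / sigma^2
    print(f"{snr_db:2d} dB -> {awgn_capacity(snr, 1.0):.3f} bits/use")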

Achievability Proof (1): Discretization. Here we use a simple quantizer to construct the discretization blocks Q_in and Q_out: For m ∈ N, let Q_m := { l/m : l = 0, ±1, ..., ±m² } be the set of quantization points. For any r ∈ R, quantize r to the closest point [r]_m ∈ Q_m such that |[r]_m| ≤ |r|. Discretization: for two given m, n ∈ N, define the channel input discretization Q_in(·) = [·]_m and the channel output discretization Q_out(·) = [·]_n. In other words, the alphabets are X_d = Q_m and Y_d = Q_n, while X_d = [X]_m and Y_d = [X_d + Z]_n = [[X]_m + Z]_n.
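A minimal Python sketch of this quantizer (illustrative; it assumes the grid Q_m = {l/m : l = 0, ±1, ..., ±m²} with rounding toward zero as described above, and the helper name is mine):

import numpy as np

def quantize(r, m):
    # [r]_m: closest grid point l/m with |[r]_m| <= |r| (round toward zero),
    # saturated at the grid boundary ±m.
    l = np.clip(np.trunc(np.asarray(r, dtype=float) * m), -m**2, m**2)
    return l / m

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 5)
print(x)
print(quantize(x, m=4))     # coarse grid, spacing 1/4
print(quantize(x, m=100))   # finer grid: [x]_m approaches x as m grows
# Since |quantize(x, m)| <= |x|, E[quantize(X, m)**2] <= E[X**2]: the power constraint is preserved.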

Achievability Proof (2): Equivalent DMC. Now we have an equivalent DMC with input X_d = [X]_m and output Y_d = [Y(m)]_n, where Y(m) ≜ [X]_m + Z. Note that for any original input r.v. X with E[X²] ≤ P, the discretized [X]_m also satisfies the power constraint: E[[X]_m²] ≤ E[X²] ≤ P. Hence, by the achievability result for the DMC with input cost constraint, any R < I([X]_m; [Y(m)]_n) (evaluated under f_X(x) = (1/√(2πP)) e^{−x²/(2P)}) is indeed achievable for the equivalent DMC under power constraint P. The only thing left to be shown is that I([X]_m; [Y(m)]_n) can be made arbitrarily close to I(X; Y) = (1/2) log(1 + P/σ²) as m, n → ∞.

Achievability Proof (3): Convergence. Due to the data processing inequality and the Markov chain [X]_m − Y(m) − [Y(m)]_n, we have I([X]_m; [Y(m)]_n) ≤ I([X]_m; Y(m)) = h(Y(m)) − h(Z). Since Var[Y(m)] ≤ P + σ², we have h(Y(m)) ≤ (1/2) log(2πe(P + σ²)), and hence the upper bound I([X]_m; [Y(m)]_n) ≤ (1/2) log(1 + P/σ²). For the lower bound, we would like to prove lim inf_{m→∞} lim_{n→∞} I([X]_m; [Y(m)]_n) ≥ (1/2) log(1 + P/σ²). We skip the details here; see Appendix 3A of El Gamal & Kim [6].

Geometric Intuition: Sphere Packing. Consider y = x + z in R^N. By the LLN, as N → ∞, most outputs y (= y^N) will lie inside the N-dimensional sphere of radius √(N(P + σ²)). Also by the LLN, as N → ∞, y will lie near the surface of the N-dimensional sphere centered at x with radius √(Nσ²). Vanishing error probability criterion ⟹ non-overlapping spheres. Question: How many non-overlapping small spheres can be packed into the large sphere? Maximum # of non-overlapping spheres = maximum # of codewords that can be reliably delivered.
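A quick Monte Carlo check of these two LLN statements (an illustrative sketch; the blocklength, power, and noise variance are my own choices):

import numpy as np

rng = np.random.default_rng(0)
N, P, sigma2 = 100_000, 4.0, 1.0
x = rng.normal(0.0, np.sqrt(P), N)        # a random codeword with power ~ P
z = rng.normal(0.0, np.sqrt(sigma2), N)   # white Gaussian noise
y = x + z
print(np.mean(y ** 2))        # ~ P + sigma2 = 5: y concentrates near radius sqrt(N(P + sigma2))
print(np.mean((y - x) ** 2))  # ~ sigma2 = 1:     y concentrates near the noise sphere around x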

Geometric Intuition: Sphere Packing. Back-of-envelope calculation: 2^{NR} ≤ (√(N(P + σ²)))^N / (√(Nσ²))^N ⟹ R ≤ (1/N) log( (√(N(P + σ²)))^N / (√(Nσ²))^N ) = (1/2) log(1 + P/σ²). Hence, intuitively any achievable rate R cannot exceed C = (1/2) log(1 + P/σ²). How to achieve it?

Achieving Capacity via Good Packing. Random codebook generation: Generate 2^{NR} N-dimensional vectors (codewords) {x_1, ..., x_{2^{NR}}} lying in the x-sphere of radius √(NP). Decoding: let α ≜ P/(P + σ²) (the MMSE coefficient); upon receiving y, compute αy and output the nearest-neighbor codeword x̂. By the LLN, we have ||αy − x_1|| = ||αz + (α − 1)x_1|| ≈ √( α²·Nσ² + (α − 1)²·NP ) = √( N·Pσ²/(P + σ²) ).

Achieving Capacity via Good Packing. Performance analysis: When does an error occur? When another codeword, say x_2, falls inside the uncertainty sphere centered at αy. What is that probability? It is the ratio of the volumes of the two spheres: P{x_1 → x_2} = (√(N·Pσ²/(P + σ²)))^N / (√(NP))^N = (σ²/(P + σ²))^{N/2}.

Achieving Capacity via Good Packing. By the union of events bound, the total probability of error P{E} ≤ 2^{NR} · (σ²/(P + σ²))^{N/2} = 2^{N(R − (1/2) log(1 + P/σ²))}, which vanishes as N → ∞ if R < (1/2) log(1 + P/σ²). Hence, any R < (1/2) log(1 + P/σ²) is achievable.
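A small Monte Carlo sketch of this random-coding argument (illustrative only: the blocklength, rate, and SNR below are my own choices, and N is kept small so that all 2^{NR} codewords can be enumerated):

import numpy as np

rng = np.random.default_rng(0)
N, P, sigma2 = 16, 4.0, 1.0              # capacity = 0.5*log2(1 + 4) ~ 1.16 bits/use
R = 0.5                                  # operate below capacity
M = int(2 ** (N * R))                    # 256 codewords
alpha = P / (P + sigma2)                 # MMSE scaling coefficient

errors, trials = 0, 2000
for _ in range(trials):
    codebook = rng.normal(0.0, np.sqrt(P), size=(M, N))    # i.i.d. N(0, P) codewords
    y = codebook[0] + rng.normal(0.0, np.sqrt(sigma2), N)  # transmit codeword 0
    x_hat = np.argmin(np.sum((codebook - alpha * y) ** 2, axis=1))  # nearest neighbor to alpha*y
    errors += int(x_hat != 0)

print("empirical error rate:", errors / trials)  # small since R < C; it vanishes as N grows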

Practical Relevance of the Gaussian Noise Model. In communication engineering, additive Gaussian noise is the most widely used model for a noisy channel with real (or complex) input/output. Reasons: 1 Gaussian is a good model for noise that consists of many small perturbations, due to the Central Limit Theorem. 2 Analytically, Gaussian is highly tractable. 3 Consider an input-power-constrained channel with independent additive noise. Within the family of noise distributions with the same noise variance, Gaussian noise is the worst-case noise. The last point is important: it suggests that for an additive-noise channel with input power constraint P and noise variance σ², the capacity is lower bounded by the Gaussian channel capacity (1/2) log(1 + P/σ²).

Gaussian Noise is the Worst-Case Noise. Proposition 1: Consider a Gaussian r.v. X_G ~ N(0, P) and Y = X_G + Z, where Z has density f_Z(z), variance Var[Z] = σ², and Z ⊥ X_G. Then I(X_G; Y) ≥ (1/2) log(1 + P/σ²). With Proposition 1, we immediately obtain the following theorem: Theorem 3 (Gaussian is the Worst-Case Additive Noise): Consider a CMC f_{Y|X}: Y = X + Z, Z ⊥ X, with input power constraint P and noise variance σ², where the additive noise has a density. Then the capacity C is minimized when Z ~ N(0, σ²), and C ≥ C_G ≜ (1/2) log(1 + P/σ²). Proof: C ≥ I(X_G; X_G + Z) ≥ (1/2) log(1 + P/σ²).

Proof of Proposition 1. Let Z_G ~ N(0, σ²), and denote Y_G ≜ X_G + Z_G. We aim to prove I(X_G; Y) ≥ I(X_G; Y_G). First note that I(X_G; Y) = h(Y) − h(Z) does not change if we shift Z by a constant. Hence, WLOG assume E[Z] = 0. Since both X_G and Z are zero-mean, so is Y. Note that Y_G ~ N(0, P + σ²) and Z_G ~ N(0, σ²). Hence,
h(Y_G) = E_{Y_G}[ −log f_{Y_G}(Y_G) ] = (1/2) log(2π(P + σ²)) + (log e)/(2(P + σ²)) · E_{Y_G}[ (Y_G)² ] = (1/2) log(2π(P + σ²)) + (log e)/(2(P + σ²)) · E_Y[ Y² ] = E_Y[ −log f_{Y_G}(Y) ].

The key in the above is to realize that Y and Y_G have the same variance. Similarly, h(Z_G) = E_Z[ −log f_{Z_G}(Z) ]. Therefore,
I(X_G; Y_G) − I(X_G; Y) = { h(Y_G) − h(Y) } − { h(Z_G) − h(Z) }
= { E_Y[ −log f_{Y_G}(Y) ] − E_Y[ −log f_Y(Y) ] } − { E_Z[ −log f_{Z_G}(Z) ] − E_Z[ −log f_Z(Z) ] }
= E_Y[ log( f_Y(Y) / f_{Y_G}(Y) ) ] − E_Z[ log( f_Z(Z) / f_{Z_G}(Z) ) ]
= E_{Y,Z}[ log( f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ) ]
≤ log( E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ] ). (Jensen's inequality)
To finish the proof, we shall prove that E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ] = 1.

Let us calculate E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ] as follows:
E_{Y,Z}[ f_Y(Y) f_{Z_G}(Z) / ( f_{Y_G}(Y) f_Z(Z) ) ] = ∫∫ f_{Y,Z}(y, z) · f_Y(y) f_{Z_G}(z) / ( f_{Y_G}(y) f_Z(z) ) dz dy
= ∫∫ f_Z(z) f_{X_G}(y − z) · f_Y(y) f_{Z_G}(z) / ( f_{Y_G}(y) f_Z(z) ) dz dy   (∵ Y = X_G + Z)
= ∫∫ [ f_{X_G}(y − z) f_{Z_G}(z) ] · f_Y(y) / f_{Y_G}(y) dz dy
= ∫∫ f_{Y_G, Z_G}(y, z) · f_Y(y) / f_{Y_G}(y) dz dy   (∵ Y_G = X_G + Z_G)
= ∫ ( f_Y(y) / f_{Y_G}(y) ) ( ∫ f_{Y_G, Z_G}(y, z) dz ) dy
= ∫ ( f_Y(y) / f_{Y_G}(y) ) f_{Y_G}(y) dy = ∫ f_Y(y) dy = 1.
Hence, the proof is complete.
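A numeric sanity check of Proposition 1 (a sketch using simple grid integration; the Laplacian comparison noise, the grid, and the parameter values are my own choices): for a Gaussian input and additive noise of the same variance, I(X_G; Y) = h(Y) − h(Z) should be at least (1/2) log(1 + P/σ²), with equality for Gaussian noise.

import numpy as np

P, sigma2 = 1.0, 1.0
t = np.linspace(-15, 15, 6001)            # integration grid
dt = t[1] - t[0]

def gauss_pdf(u, var):
    return np.exp(-u ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def laplace_pdf(u, var):
    b = np.sqrt(var / 2)                  # Laplace(0, b) has variance 2*b^2
    return np.exp(-np.abs(u) / b) / (2 * b)

def diff_entropy(pdf_vals):
    p = pdf_vals[pdf_vals > 0]
    return -np.sum(p * np.log2(p)) * dt   # differential entropy in bits

def mutual_info(noise_pdf):
    fX = gauss_pdf(t, P)
    fZ = noise_pdf(t, sigma2)
    fY = np.convolve(fX, fZ, mode="same") * dt   # f_Y = f_X * f_Z on the grid
    return diff_entropy(fY) - diff_entropy(fZ)   # I(X;Y) = h(Y) - h(Z)

print("Gaussian noise :", mutual_info(gauss_pdf))    # ~ 0.5*log2(1 + P/sigma2) = 0.5 bits
print("Laplacian noise:", mutual_info(laplace_pdf))  # slightly larger, as Proposition 1 predicts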

Motivation. We have investigated the capacity of the (discrete-time) memoryless Gaussian channel, an elementary model in digital communications. In wireless communications, however, due to various effects such as frequency selectivity and inter-symbol interference, a single Gaussian channel may not model the system well. Instead, a parallel Gaussian channel, which consists of several Gaussian channels with a common total power constraint, is more relevant. For example, OFDM (Orthogonal Frequency Division Multiplexing) is a widely used technique in LTE and WiFi that mitigates frequency selectivity and inter-symbol interference. The parallel Gaussian channel is the equivalent channel model under OFDM.

Channel Coding over Parallel Gaussian Channels. Model: w → ENC → X = (X_1, ..., X_L) → [Y_l = X_l + Z_l, Z_l ~ N(0, σ_l²), l = 1, ..., L] → Y = (Y_1, ..., Y_L) → DEC → ŵ. 1 Input/output alphabet X = Y = R^L, the L-dimensional space. 2 Channel law f_{Y|X}: Y = X + Z, Z ~ N(0, diag(σ_1², ..., σ_L²)), Z ⊥ X. Note that (Z_1, ..., Z_L) are mutually independent. 3 Average input power constraint P: (1/N) Σ_{k=1}^{N} ||x[k]||² ≤ P, where ||x[k]||² = Σ_{l=1}^{L} x_l[k]².

Capacity of the Parallel Gaussian Channel. Invoking Theorem 1, the capacity of the parallel Gaussian channel is C = sup_{X: E[||X||²] ≤ P} I(X; Y). The main issue is how to compute it. Let P_l ≜ E[X_l²]. Observe that
I(X; Y) = I(X_1, ..., X_L; Y_1, ..., Y_L) = h(Y_1, ..., Y_L) − h(Z_1, ..., Z_L) = h(Y_1, ..., Y_L) − Σ_{l=1}^{L} (1/2) log(2πe σ_l²) ≤(a) Σ_{l=1}^{L} h(Y_l) − Σ_{l=1}^{L} (1/2) log(2πe σ_l²) ≤(b) Σ_{l=1}^{L} (1/2) log(1 + P_l/σ_l²).
(a) holds since joint differential entropy ≤ sum of the marginal ones. (b) is due to h(Y_l) ≤ (1/2) log(2πe Var[Y_l]) ≤ (1/2) log(2πe (P_l + σ_l²)).

Channel Coding over Parallel Gaussian Channels. Hence, I(X; Y) ≤ Σ_{l=1}^{L} (1/2) log(1 + P_l/σ_l²) for any input X with P_l = E[X_l²], l = 1, ..., L. Furthermore, to satisfy the power constraint, P ≥ E[||X||²] = Σ_{l=1}^{L} E[X_l²] = Σ_{l=1}^{L} P_l. Question: Can we achieve this upper bound? Yes, by choosing (X_1, ..., X_L) mutually independent with X_l ~ N(0, P_l), that is, X ~ N(0, diag(P_1, ..., P_L)), satisfying (1) Σ_{l=1}^{L} P_l ≤ P and (2) P_l ≥ 0, l = 1, 2, ..., L.

Computation of Capacity: a Power Allocation Problem. Intuition: The optimal scheme is to treat each branch separately, with the l-th branch allocated transmit power P_l, for l = 1, 2, ..., L. In the l-th branch (sub-channel), the input X_l ~ N(0, P_l), and the inputs are mutually independent across the L sub-channels. The characterization of capacity boils down to the following optimization:
Power Allocation Problem: C(P, σ_1², ..., σ_L²) = max_{(P_1, ..., P_L)} Σ_{l=1}^{L} (1/2) log(1 + P_l/σ_l²), subject to Σ_{l=1}^{L} P_l ≤ P and P_l ≥ 0, l = 1, 2, ..., L.

Optimal Power Allocation: Water-Filling. The optimal solution (P_1*, ..., P_L*) of the above power allocation problem turns out to be the following (notation: (x)+ ≜ max(x, 0)):
Water-Filling Solution: P_l* = (ν − σ_l²)+ for l = 1, ..., L, where the water level ν satisfies Σ_{l=1}^{L} (ν − σ_l²)+ = P.
(Figure: water filled over the noise levels σ_1², ..., σ_L² of the L sub-channels; the total shaded area equals P.)
When the power budget P ≫ max_l σ_l² (high SNR regime), the optimal allocation is roughly uniform: P_l* ≈ P/L. When the power budget P ≪ min_l σ_l² (low SNR regime), the optimal allocation is roughly choose-the-best: P_l* ≈ P · 1{ l = arg min_l σ_l² }.

(Figure: water-filling power allocations versus sub-channel index. (a) High SNR: the allocation is nearly uniform across all L sub-channels. (b) Low SNR: almost all of the power budget P goes to the best (lowest-noise) sub-channel.)
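Before turning to the optimality proof, here is a minimal water-filling sketch in Python (an assumed implementation via bisection on the water level ν; the function name and the example noise variances are my own choices):

import numpy as np

def water_filling(noise_vars, P, tol=1e-9):
    # Find nu such that sum((nu - sigma_l^2)^+) = P, then P_l = (nu - sigma_l^2)^+.
    noise_vars = np.asarray(noise_vars, dtype=float)
    lo, hi = noise_vars.min(), noise_vars.max() + P     # the water level lies in [lo, hi]
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - noise_vars, 0.0).sum() > P:
            hi = nu                                     # too much water: lower the level
        else:
            lo = nu                                     # not enough water: raise the level
    nu = 0.5 * (lo + hi)
    return np.maximum(nu - noise_vars, 0.0), nu

noise_vars = np.array([1.0, 2.0, 4.0])
powers, nu = water_filling(noise_vars, P=5.0)
print(powers, nu)                                       # -> [3, 2, 0], water level nu = 4
print(0.5 * np.log2(1 + powers / noise_vars).sum())     # resulting capacity: 1.5 bits per channel use

In this example the weakest sub-channel (σ² = 4) sits at the water level and gets no power, matching the low/high-SNR intuition above.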

A Primer on Convex Optimization (1). To show that the water-filling solution attains capacity (i.e., optimality in the Power Allocation Problem), let us give a quick overview of convex optimization, the Lagrangian function, and the Karush-Kuhn-Tucker theorem. Convex Optimization: minimize f(x) subject to g_i(x) ≤ 0, i = 1, ..., m, and h_i(x) = 0, i = 1, ..., p. (3) The above minimization problem is convex if: the objective function f is convex; the inequality constraint functions g_1, ..., g_m are convex; the equality constraint functions h_1, ..., h_p are affine, i.e., h_i(x) = a_i^T x + b_i.

A Primer on Convex Optimization (2). Lagrangian Function: For the minimization problem (3), its Lagrangian function is a weighted sum of the objective and the constraints: L(x, λ, µ) ≜ f(x) + Σ_{i=1}^{m} λ_i g_i(x) + Σ_{i=1}^{p} µ_i h_i(x). (4) Karush-Kuhn-Tucker (KKT) Theorem: For a convex optimization problem with differentiable objective function f and inequality constraints g_1, ..., g_m, suppose that there exists a point x in the interior of the domain that is strictly feasible (g_i(x) < 0, i = 1, ..., m, and h_i(x) = 0, i = 1, ..., p). Then a feasible x* attains the optimum in (3) iff there exist (λ*, µ*) such that λ_i* ≥ 0 and λ_i* g_i(x*) = 0 for i = 1, 2, ..., m, and ∇_x L(x, λ, µ) |_{(x, λ, µ) = (x*, λ*, µ*)} = 0. (5) Condition (5) together with the feasibility of x* are called the KKT conditions.

Optimality of Water-Filling. Proposition 2 (Water-Filling): For a given (σ_1², ..., σ_L²), the following maximization problem: maximize Σ_{l=1}^{L} log(P_l + σ_l²) subject to Σ_{l=1}^{L} P_l = P and P_l ≥ 0, l = 1, ..., L, (6) has the solution P_l* = (ν − σ_l²)+, l = 1, ..., L, where ν satisfies Σ_{l=1}^{L} (ν − σ_l²)+ = P. The proof is based on evaluating the KKT conditions.

Proof: First, rewrite (6) in the following equivalent form: minimize −Σ_{l=1}^{L} log(P_l + σ_l²) subject to −P_l ≤ 0, l = 1, ..., L, and Σ_{l=1}^{L} P_l − P ≤ 0. (7) It can be easily checked that (7) is a convex optimization problem. Hence, the Lagrangian function is L(P_1, ..., P_L, λ_1, ..., λ_L, µ) = −Σ_{l=1}^{L} log(P_l + σ_l²) − Σ_{l=1}^{L} λ_l P_l + µ ( Σ_{l=1}^{L} P_l − P ).

The proof is complete by finding P_1, ..., P_L, λ_1, ..., λ_L ≥ 0 and µ such that
Σ_{l=1}^{L} P_l = P;
∂L/∂P_l = −(log e)/(P_l + σ_l²) − λ_l + µ = 0, l = 1, ..., L;
λ_l P_l = 0, l = 1, ..., L.
If µ < (log e)/σ_l²: the condition λ_l = µ − (log e)/(P_l + σ_l²) ≥ 0 can only hold if P_l > 0; then λ_l P_l = 0 forces λ_l = 0, so µ = (log e)/(P_l + σ_l²) and P_l = (log e)/µ − σ_l².
If µ ≥ (log e)/σ_l²: the conditions λ_l = µ − (log e)/(P_l + σ_l²) ≥ 0 and λ_l P_l = 0 imply that P_l = 0.
Hence, P_l = max( (log e)/µ − σ_l², 0 ) for l = 1, 2, ..., L. Finally, by renaming ν ≜ (log e)/µ and plugging into the condition Σ_{l=1}^{L} P_l = P, we complete the proof by the KKT theorem.

Having determined the capacity of the parallel Gaussian channel, let us generalize the result to the case where the noises in the L branches are correlated. The idea behind our technique is simple: apply a pre-processor and a post-processor such that the end-to-end system is again a parallel Gaussian channel with independent noise components.

Channel Coding over Parallel Gaussian Channels with Colored Noise. Model: w → ENC → X = (X_1, ..., X_L) → Y = X + Z → Y = (Y_1, ..., Y_L) → DEC → ŵ. 1 Input/output alphabet X = Y = R^L, the L-dimensional space. 2 Channel law f_{Y|X}: Y = X + Z, Z ~ N(0, K_Z), Z ⊥ X. Note that (Z_1, ..., Z_L) are not mutually independent anymore. 3 Average input power constraint P: (1/N) Σ_{k=1}^{N} ||x[k]||² ≤ P, where ||x[k]||² = Σ_{l=1}^{L} x_l[k]².

Eigenvalue Decomposition of a Covariance Matrix. To get to the main idea, we introduce some basic matrix theory. Definition 1 (Positive Semidefinite (PSD) Matrix): A Hermitian matrix A ∈ C^{L×L} is positive semidefinite (A ⪰ 0) iff x^H A x ≥ 0 for all x ≠ 0 in C^L. Here (·)^H denotes the transpose of the complex conjugate of a matrix, and a Hermitian matrix A is a square matrix with A^H = A. The following important lemma plays a key role in our development. Lemma 1 (Eigenvalue Decomposition of a PSD Matrix): If A ⪰ 0, then A = QΛQ^H, where Q is unitary, i.e., QQ^H = Q^H Q = I, and Λ = diag(λ_1, ..., λ_L), where {λ_i ≥ 0 : i = 1, ..., L} are the eigenvalues of A. The j-th column of Q, q_j, is the eigenvector of A associated with λ_j.

Fact 1: A valid covariance matrix is PSD. Proof: By definition, a valid covariance matrix K = E[YY^H] for some complex zero-mean r.v. Y. Therefore, K is Hermitian because K^H = (E[YY^H])^H = E[(YY^H)^H] = E[YY^H] = K. Moreover, it is PSD since for all non-zero x ∈ C^L, x^H K x = x^H E[YY^H] x = E[x^H Y Y^H x] = E[|Y^H x|²] ≥ 0. Hence, for the covariance matrix K_Z, we can always decompose it as K_Z = QΛ_Z Q^H, where Λ_Z = diag(σ_1², ..., σ_L²).

Pre-Processor Q and Post-Processor Q^H. Based on the eigenvalue decomposition K_Z = QΛ_Z Q^H, we insert a pre-processor Q and a post-processor Q^H as follows: X̃ → [Q] → X → (+ Z, Z ~ N(0, K_Z)) → Y → [Q^H] → Ỹ. The end-to-end relationship between X̃ and Ỹ is characterized by the following equivalent channel: Ỹ = X̃ + Z̃, where X̃ ≜ Q^H X, Ỹ ≜ Q^H Y, Z̃ ≜ Q^H Z, and Z̃ is zero-mean Gaussian with covariance matrix Q^H K_Z Q = Q^H QΛ_Z Q^H Q = Λ_Z = diag(σ_1², ..., σ_L²).

Equivalent Input Power Constraint P. For the above equivalent channel f_{Ỹ|X̃}, where Ỹ = X̃ + Z̃ with Z̃ ~ N(0, Λ_Z), observe that the noise terms in the L branches are now mutually independent. Furthermore, note that for this channel the input power is the same as in the original channel: ||x||² = x^H x = x̃^H Q^H Q x̃ = x̃^H x̃ = ||x̃||² (since QQ^H = Q^H Q = I). Hence, we can use the water-filling solution to find the capacity of this channel, denoted by C̃.

No Loss in Optimality of the Pre-/Post-Processors. C ≥ C̃, since any scheme for f_{Ỹ|X̃} can be transformed into one for f_{Y|X}. On the other hand, consider inserting another pre-processor Q^H and post-processor Q around the equivalent channel (Figure: X̄ → [Q^H] → X̃ → [Q] → X → (+ Z, Z ~ N(0, K_Z)) → Y → [Q^H] → Ỹ → [Q] → Ȳ). The resulting channel f_{Ȳ|X̄} is the same as the original channel f_{Y|X}. Let C̄ be the capacity of this channel; by the same argument, C̃ ≥ C̄, while C̄ = C. Hence C = C̄ ≤ C̃ ≤ C, which gives C̃ = C.

Summary: Capacity of Theorem 4 (Capacity of ) For the L-branch Gaussian parallel channel with average input power constraint P and noise covariance matrix K Z, the channel capacity is L l=1 ( ) 1 log 1 + P l σl, where { σ1,..., σl} are the L eigenvalues of KZ, and the optimal power allocation {P 1,..., P L } is given by the following water-filling solution: P l = ( ν σl ) +, l = 1,..., L ν satisfies L ( ) ν σ + l = P l=1 59 / 59 I-Hsiang Wang IT Lecture 6