Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157


Differential entropy (1)

Definition 18. The (joint) differential entropy of a continuous random vector $X^n \sim p_{X^n}(x)$ over $\mathbb{R}^n$ is
$$ h(X^n) = -\int p_{X^n}(x) \log p_{X^n}(x)\, dx = -\mathbb{E}\left[\log p_{X^n}(X^n)\right]. $$
$h(X^n)$ is a concave function of the pdf $p_{X^n}(x)$, but it is not necessarily nonnegative. Sometimes we write $h(p_X)$ when we wish to stress that $h(\cdot)$ is a function of the pdf $p_X$.

Translation: $h(X^n + a) = h(X^n)$. Scaling: $h(\mathbf{A} X^n) = h(X^n) + \log |\det \mathbf{A}|$.

Copyright G. Caire (Sample Lectures) 158

Differential entropy (2)

Example: Uniform distribution: $p_X(x) = \frac{1}{a}\,\mathbf{1}\{x \in [0,a]\}$. Then we have
$$ h(X) = -\int_0^a \frac{1}{a} \log \frac{1}{a}\, dx = \log a. $$
More generally, for a uniform distribution in $\mathbb{R}^n$ over a compact domain $\mathcal{D}$ with volume $\mathrm{Vol}(\mathcal{D})$, we have $h(X^n) = \log \mathrm{Vol}(\mathcal{D})$.

Copyright G. Caire (Sample Lectures) 159

Gaussian distribution

The standard normal distribution $X \sim \mathcal{N}(0,1)$ has density
$$ p_X(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right). $$
The real multivariate Gaussian distribution $X^n \sim \mathcal{N}(\mu, \Sigma)$ has density
$$ p_{X^n}(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right). $$

Copyright G. Caire (Sample Lectures) 160

Real Gaussian random vectors

We notice that, for Gaussian densities, $\log p_{X^n}(x)$ is a quadratic function of $x$. E.g., for the real case we have
$$ \log p_{X^n}(x) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \log e. $$
Since $h(X^n)$ is invariant under translations, it must be independent of $\mu$. Hence, there is no loss of generality in assuming $\mu = 0$. Taking expectation and changing sign, and noticing that $x^T \Sigma^{-1} x = \mathrm{tr}(x x^T \Sigma^{-1})$, we have
$$ h(X^n) = \frac{n}{2}\log(2\pi) + \frac{1}{2}\log|\Sigma| + \frac{1}{2}\mathrm{tr}\left(\mathbb{E}[X X^T] \Sigma^{-1}\right) \log e = \frac{1}{2}\log\left((2\pi e)^n |\Sigma|\right). $$

Copyright G. Caire (Sample Lectures) 161
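As a quick sanity check of the closed-form expression above, here is a minimal Python sketch (assuming NumPy and SciPy are available; the covariance matrix is an arbitrary example) comparing $\tfrac{1}{2}\ln\left((2\pi e)^n |\Sigma|\right)$ with the differential entropy reported by scipy.stats.multivariate_normal, both in nats.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Example covariance matrix (any symmetric positive-definite matrix works).
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n = Sigma.shape[0]

# Closed form from the slide, evaluated with the natural log (nats).
h_formula = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(Sigma))

# SciPy's differential entropy of the same zero-mean Gaussian (also in nats).
h_scipy = multivariate_normal(mean=np.zeros(n), cov=Sigma).entropy()

print(h_formula, h_scipy)  # the two values agree up to floating-point error
```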

Complex Gaussian random vectors

The complex circularly symmetric Gaussian distribution $X^n \sim \mathcal{CN}(\mu, \Sigma)$, where $X^n = X_R^n + j X_I^n$ with $X_R^n \sim \mathcal{N}(\mathrm{Re}\{\mu\}, \frac{1}{2}\Sigma)$, $X_I^n \sim \mathcal{N}(\mathrm{Im}\{\mu\}, \frac{1}{2}\Sigma)$ and $X_R^n, X_I^n$ statistically independent, has density
$$ p_{X^n}(x) = \frac{1}{\pi^n |\Sigma|} \exp\left(-(x-\mu)^H \Sigma^{-1} (x-\mu)\right). $$
Repeating the arguments used before, for the complex circularly symmetric case we have
$$ h(X^n) = \log\left((\pi e)^n |\Sigma|\right). $$

Copyright G. Caire (Sample Lectures) 162
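A scalar sanity check of the complex case, assuming only NumPy (the variance value is illustrative): for $X \sim \mathcal{CN}(0, \sigma^2)$, writing $X = X_R + jX_I$ with independent $\mathcal{N}(0, \sigma^2/2)$ parts, the sum of the two real differential entropies equals $\ln(\pi e \sigma^2)$.

```python
import numpy as np

sigma2 = 2.5                                   # complex variance E|X|^2 (illustrative)

# Real and imaginary parts are independent N(0, sigma2/2); entropies add.
h_real_part = 0.5 * np.log(2 * np.pi * np.e * sigma2 / 2)
h_complex = 2 * h_real_part                    # h(X_R) + h(X_I)

print(h_complex, np.log(np.pi * np.e * sigma2))  # both values agree (in nats)
```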

Conditional differential entropy

Definition 19. For two jointly distributed continuous random vectors $X^n, Y^m$ over $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, with joint pdf $p_{X^n,Y^m}(x,y)$, the conditional differential entropy of $X^n$ given $Y^m$ is
$$ h(X^n | Y^m) = -\int\!\!\int p_{X^n,Y^m}(x,y) \log p_{X^n|Y^m}(x|y)\, dx\, dy = -\mathbb{E}\left[\log p_{X^n|Y^m}(X^n | Y^m)\right]. $$

Copyright G. Caire (Sample Lectures) 163

Divergence and mutual information

Definition 20. Let $p_X$ and $q_X$ be pdfs defined on $\mathbb{R}$. The divergence (Kullback-Leibler distance) between $p_X$ and $q_X$ is defined as
$$ D(p_X \| q_X) = \int p_X(x) \log \frac{p_X(x)}{q_X(x)}\, dx. $$
Notice that $D(p_X \| q_X) = \infty$ unless $\mathrm{supp}(p_X) \subseteq \mathrm{supp}(q_X)$.

Definition 21. The mutual information of $X, Y \sim p_{X,Y}$ is defined by
$$ I(X;Y) = \int p_{X,Y}(x,y) \log \frac{p_{X,Y}(x,y)}{p_X(x)\, p_Y(y)}\, dx\, dy. $$

Copyright G. Caire (Sample Lectures) 164
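To make the divergence definition concrete, here is a small sketch (SciPy assumed; the two Gaussians and their parameters are illustrative) that evaluates $D(p\|q)$ by numerical integration and cross-checks it against the standard closed form for two scalar Gaussians, a known result not derived on these slides.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Two scalar Gaussian pdfs p = N(mu1, s1^2), q = N(mu2, s2^2).
mu1, s1 = 0.0, 1.0
mu2, s2 = 1.0, 2.0
p, q = norm(mu1, s1), norm(mu2, s2)

# Numerical evaluation of D(p || q) = \int p(x) log(p(x)/q(x)) dx (in nats).
integrand = lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x))
D_numeric, _ = quad(integrand, -np.inf, np.inf)

# Standard Gaussian closed form (in nats), used here only as a cross-check.
D_closed = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

print(D_numeric, D_closed)  # both ≈ 0.443 nats for these parameters
```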

Relations (similar to the discrete case)

We have the following expressions:
$$ \begin{aligned}
I(X;Y) &= \mathbb{E}\left[\log \frac{p_{X,Y}(X,Y)}{p_X(X)\, p_Y(Y)}\right] \\
&= \mathbb{E}\left[\log \frac{p_{X|Y}(X|Y)}{p_X(X)}\right] = h(X) - h(X|Y) \\
&= \mathbb{E}\left[\log \frac{p_{Y|X}(Y|X)}{p_Y(Y)}\right] = h(Y) - h(Y|X) \\
&= D\left(p_{Y|X} \,\|\, p_Y \mid p_X\right) \\
&= D\left(p_{Y|X} \,\|\, q_Y \mid p_X\right) - D(p_Y \| q_Y),
\end{aligned} $$
where the last line holds for any pdf $q_Y$ for which the divergences are finite.

Copyright G. Caire (Sample Lectures) 165

Properties (1)

Non-negativity and convexity of $D(p_X \| q_X)$ hold verbatim as in the discrete case.

$I(X;Y) \geq 0$, therefore $h(Y|X) \leq h(Y)$ (conditioning reduces differential entropy).

The chain rule for entropy holds for differential entropy as well:
$$ h(X^n) = \sum_{i=1}^{n} h(X_i | X^{i-1}). $$
Independence upper bound:
$$ h(X^n) \leq \sum_{i=1}^{n} h(X_i). $$

Copyright G. Caire (Sample Lectures) 166

Properties (2)

As an application of the independence bound to Gaussian differential entropies, we get Hadamard's inequality:
$$ \frac{1}{2}\log\left((2\pi e)^n |\Sigma|\right) \leq \sum_{i=1}^{n} \frac{1}{2}\log\left(2\pi e\, \Sigma_{i,i}\right) \;\Longrightarrow\; |\Sigma| \leq \prod_{i=1}^{n} \Sigma_{i,i}. $$
The chain rule for mutual information holds as well:
$$ I(X; Y^n) = \sum_{i=1}^{n} I(X; Y_i | Y^{i-1}), $$
where the conditional mutual information is $I(X;Y|Z) = h(Y|Z) - h(Y|X,Z)$.

Copyright G. Caire (Sample Lectures) 167
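A quick numerical illustration of Hadamard's inequality, assuming only NumPy (the covariance matrix is generated at random for the example): the determinant never exceeds the product of the diagonal entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric positive-definite (covariance) matrix.
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)

det_Sigma = np.linalg.det(Sigma)
prod_diag = np.prod(np.diag(Sigma))

# Hadamard's inequality: det(Sigma) <= product of the diagonal entries.
print(det_Sigma, prod_diag, det_Sigma <= prod_diag)  # last value is True
```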

Extremal property of differential entropy

Theorem 15. Differential entropy maximizer: Let $X^n$ be a real random vector with mean zero and covariance matrix $\mathbb{E}[X^n (X^n)^T] = \mathrm{Cov}(X^n) = \Sigma$. Then,
$$ h(X^n) \leq \frac{1}{2}\log\left((2\pi e)^n |\Sigma|\right). $$
Let $X^n$ be a complex random vector with mean zero and covariance matrix $\mathbb{E}[X^n (X^n)^H] = \mathrm{Cov}(X^n) = \Sigma$. Then,
$$ h(X^n) \leq \log\left((\pi e)^n |\Sigma|\right). $$

Copyright G. Caire (Sample Lectures) 168

Proof: Let $g(x)$ denote the Gaussian pdf with mean zero and covariance matrix $\Sigma$. Then
$$ \begin{aligned}
0 &\leq D(p_{X^n} \| g) \\
&= \int p_{X^n}(x) \log \frac{p_{X^n}(x)}{g(x)}\, dx \\
&= -h(X^n) - \int p_{X^n}(x) \log g(x)\, dx \\
&= -h(X^n) - \int g(x) \log g(x)\, dx \\
&= -h(X^n) + h(g) \\
&= -h(X^n) + \frac{1}{2}\log\left((2\pi e)^n |\Sigma|\right),
\end{aligned} $$
where the third equality holds because $\log g(x)$ is a quadratic function of $x$ and $p_{X^n}$ and $g$ have the same (zero) mean and the same covariance $\Sigma$.

Copyright G. Caire (Sample Lectures) 169
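To illustrate Theorem 15 in the scalar case, the sketch below (SciPy assumed; the chosen families are examples) compares the differential entropy of a few unit-variance distributions against the Gaussian bound $\frac{1}{2}\ln(2\pi e)$; only the Gaussian attains it.

```python
import numpy as np
from scipy.stats import norm, laplace, uniform

# Gaussian bound for a unit-variance scalar random variable (in nats).
h_max = 0.5 * np.log(2 * np.pi * np.e)

# Unit-variance members of a few families:
#   Gaussian: N(0, 1)
#   Laplace:  scale b with variance 2*b^2 = 1  ->  b = 1/sqrt(2)
#   Uniform:  width w with variance w^2/12 = 1 ->  w = sqrt(12)
dists = {
    "gaussian": norm(0, 1),
    "laplace": laplace(scale=1 / np.sqrt(2)),
    "uniform": uniform(loc=-np.sqrt(3), scale=np.sqrt(12)),
}

for name, d in dists.items():
    print(f"{name:9s} h = {d.entropy():.4f} nats  (bound = {h_max:.4f})")
# Only the Gaussian meets the bound; the others fall strictly below it.
```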

Some consequences

Theorem 16. Differential entropy and MMSE estimation: Let $X$ be a continuous random variable and $\widehat{X}$ an estimator of $X$. Then
$$ \mathbb{E}\left[|X - \widehat{X}|^2\right] \geq \frac{1}{2\pi e}\, e^{2 h(X)}, $$
with equality if and only if $X$ is Gaussian and $\widehat{X} = \mathbb{E}[X]$.

Proof: We have
$$ \mathbb{E}\left[|X - \widehat{X}|^2\right] \geq \min_{\widehat{X}} \mathbb{E}\left[|X - \widehat{X}|^2\right] = \mathbb{E}\left[|X - \mathbb{E}[X]|^2\right] = \mathrm{Var}(X) \geq \frac{1}{2\pi e}\, e^{2 h(X)}, $$

Copyright G. Caire (Sample Lectures) 170

where we have used the fact that the minimum mean-square error (MMSE) estimator of $X$ (with no observation) is its mean, and the last inequality follows from Theorem 15.

Corollary 8. Let $X$ and $Y$ be jointly distributed continuous random variables. For any estimator function $\widehat{x}(Y)$ of $X$ from $Y$, we have
$$ \mathbb{E}\left[|X - \widehat{x}(Y)|^2\right] \geq \frac{1}{2\pi e}\, e^{2 h(X|Y)}. $$
(Proof: HOMEWORK)

Copyright G. Caire (Sample Lectures) 171
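A quick numerical check of the variance lower bound in Theorem 16 for a non-Gaussian example (NumPy only; the uniform range is an arbitrary choice): for $X \sim \mathrm{Unif}[0,a]$, the bound $\frac{1}{2\pi e}e^{2h(X)}$ sits strictly below $\mathrm{Var}(X)$.

```python
import numpy as np

# X ~ Uniform[0, a]: h(X) = ln(a) nats, Var(X) = a^2 / 12.
a = 3.0
h = np.log(a)
bound = np.exp(2 * h) / (2 * np.pi * np.e)   # (1 / 2*pi*e) * e^{2 h(X)}
var = a**2 / 12

print(bound, var, bound <= var)  # bound ≈ 0.527 < 0.75, strict inequality (X not Gaussian)
```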

Corollary 9. Let $X^n$ and $Y^m$ be jointly distributed continuous random vectors, and let $\widehat{x}(Y^m) = \mathbb{E}[X^n | Y^m]$ and $E^n = X^n - \widehat{x}(Y^m)$ denote the MMSE estimator of $X^n$ given $Y^m$ and the corresponding estimation error sequence. Let $\Sigma_e = \mathrm{Cov}(X^n - \widehat{x}(Y^m))$ denote the covariance matrix of the MMSE error. Then,
$$ h(X^n | Y^m) \leq \frac{1}{2}\log\left((2\pi e)^n |\Sigma_e|\right), $$
with equality iff $X^n$ and $Y^m$ are jointly Gaussian. (Proof: HOMEWORK)

Copyright G. Caire (Sample Lectures) 172

MMSE estimation for jointly Gaussian RVs

When $X^n, Y^m$ are jointly zero-mean Gaussian (sequences are considered as column vectors here), the MMSE estimator is given by
$$ \widehat{x}(Y^m) = \Sigma_{xy} \Sigma_y^{-1} Y^m, \qquad \Sigma_e = \Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{yx}, $$
where $\Sigma_x = \mathrm{Cov}(X^n)$, $\Sigma_y = \mathrm{Cov}(Y^m)$, $\Sigma_{xy} = \mathbb{E}[X^n (Y^m)^T]$ and $\Sigma_{yx} = \Sigma_{xy}^T$.

For two scalar jointly Gaussian zero-mean random variables $X$ and $Y$, we have
$$ h(X|Y) = \frac{1}{2}\log\left(2\pi e\, \frac{\sigma_x^2 \sigma_y^2 - \sigma_{xy}^2}{\sigma_y^2}\right), $$
where $\sigma_x^2, \sigma_y^2$ are the variances and $\sigma_{xy} = \mathbb{E}[XY]$.

Copyright G. Caire (Sample Lectures) 173
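A small simulation sketch of the scalar case (NumPy only; the variances and cross-covariance are arbitrary illustrative values): it estimates the residual error variance of the linear MMSE estimator $\widehat{x}(Y) = \frac{\sigma_{xy}}{\sigma_y^2} Y$ and compares $\frac{1}{2}\ln(2\pi e\, \sigma_e^2)$ with the conditional entropy formula above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Jointly Gaussian zero-mean (X, Y) with a chosen covariance structure.
sx2, sy2, sxy = 1.0, 2.0, 0.8          # variances and cross-covariance
C = np.array([[sx2, sxy], [sxy, sy2]])
X, Y = rng.multivariate_normal([0, 0], C, size=200_000).T

# Linear MMSE estimator of X from Y and its empirical error variance.
x_hat = (sxy / sy2) * Y
err_var = np.var(X - x_hat)

# Conditional entropy: slide formula vs. entropy of the Gaussian error (nats).
h_formula = 0.5 * np.log(2 * np.pi * np.e * (sx2 * sy2 - sxy**2) / sy2)
h_from_err = 0.5 * np.log(2 * np.pi * np.e * err_var)

print(err_var, (sx2 * sy2 - sxy**2) / sy2)  # empirical vs. theoretical error variance
print(h_formula, h_from_err)                # nearly identical
```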

From discrete to continuous

Let $\mathcal{X} \subseteq \mathbb{R}$ be the range of $X$ (you can generalize this to random vectors in $\mathbb{R}^n$, of course). A partition $\mathcal{P}$ of $\mathcal{X}$ is a finite collection of disjoint sets $P_i$ such that $\bigcup_i P_i = \mathcal{X}$. We denote by $[X]_{\mathcal{P}}$ the quantization of $X$ by the partition $\mathcal{P}$, such that
$$ \mathbb{P}([X]_{\mathcal{P}} = i) = \mathbb{P}(X \in P_i) = \int_{P_i} p_X(x)\, dx. $$
For two random variables $X, Y$ with joint pdf $p_{X,Y}$, let $[X]_{\mathcal{P}}$ and $[Y]_{\mathcal{Q}}$ denote their discretized versions and define the DMC transition probability
$$ P_{[Y]_{\mathcal{Q}} | [X]_{\mathcal{P}}}(s|r) = \mathbb{P}(Y \in Q_s \mid X \in P_r) = \frac{\int_{P_r}\int_{Q_s} p_{X,Y}(x,y)\, dx\, dy}{\int_{P_r} p_X(x)\, dx}. $$

Copyright G. Caire (Sample Lectures) 174

Continuous memoryless channels

As an application of the data processing inequality we have
$$ I(X;Y) = \sup_{\mathcal{P},\mathcal{Q}} I\left([X]_{\mathcal{P}};\, [Y]_{\mathcal{Q}}\right). $$
The proof of the channel coding theorem for continuous-input, continuous-output, discrete-time channels with transition density $p_{Y|X}(y|x)$ follows by defining a suitable sequence of discretized inputs (input partitions) and discretized outputs (output partitions), and taking the sup over all possible finite partitions.

This step is standard, and applies to any well-behaved channel with a well-behaved transition probability density. Hence, from now on, we will just use the channel coding theorem without further questions.

Copyright G. Caire (Sample Lectures) 175

Gaussian channels

[Block diagram: the input X and the noise Z are added to produce the output Y.]

Discrete-time, continuous-alphabet channel with $\mathcal{X} = \mathcal{Y} = \mathbb{R}$ and $Z \sim \mathcal{N}(0, N_0/2)$. Input-output relation: $Y = x + Z$.

Average transmit power constraint:
$$ \frac{1}{n} \sum_{i=1}^{n} |x_i(m)|^2 \leq E_s, \qquad \forall\, m = 1, \ldots, 2^{nR}. $$
Codewords are points in $\mathbb{R}^n$ contained in a sphere of radius $\sqrt{n E_s}$.

Copyright G. Caire (Sample Lectures) 176

Capacity of the Gaussian channel

Theorem 17. The capacity of the AWGN channel with received signal-to-noise ratio $\mathsf{SNR} = 2 E_s / N_0$ is given by
$$ C(\mathsf{SNR}) = \frac{1}{2}\log(1 + \mathsf{SNR}). $$
Proof: From the channel capacity formula with input cost, we have
$$ C(E_s) = \sup_{p_X :\, \mathbb{E}[|X|^2] \leq E_s} I(X;Y). $$

Copyright G. Caire (Sample Lectures) 177

Then:
$$ \begin{aligned}
C(E_s) &= \sup_{p_X :\, \mathbb{E}[|X|^2] \leq E_s} h(Y) - h(Y|X) \\
&= \sup_{p_X :\, \mathbb{E}[|X|^2] \leq E_s} h(Y) - h(X + Z | X) \\
&= \sup_{p_X :\, \mathbb{E}[|X|^2] \leq E_s} h(Y) - h(Z) \\
&= \sup_{p_X :\, \mathbb{E}[|X|^2] \leq E_s} h(X + Z) - \frac{1}{2}\log(\pi e N_0) \\
&= \frac{1}{2}\log\left(2\pi e (N_0/2 + E_s)\right) - \frac{1}{2}\log(\pi e N_0) \\
&= \frac{1}{2}\log(1 + 2 E_s/N_0),
\end{aligned} $$
where the maximum is achieved by letting $X \sim \mathcal{N}(0, E_s)$.

Copyright G. Caire (Sample Lectures) 178

For the converse, we consider a sequence of codes $\{\mathcal{C}_n\}$ of rate $R$, with $P_e^{(n)} \to 0$ as $n \to \infty$, and the relaxed (average) input cost constraint
$$ 2^{-nR} \sum_{m=1}^{2^{nR}} \frac{1}{n} \sum_{i=1}^{n} |x_i(m)|^2 \leq E_s. $$
Using the Fano and data processing inequalities, we have
$$ \begin{aligned}
nR = H(M) &\leq I(X^n; Y^n) + n\epsilon_n \\
&= h(Y^n) - h(Z^n) + n\epsilon_n \\
&= h(Y^n) - \frac{n}{2}\log(\pi e N_0) + n\epsilon_n \\
&\leq \sum_{i=1}^{n} \frac{1}{2}\log\left(2\pi e (N_0/2 + \mathrm{Var}(X_i))\right) - \frac{n}{2}\log(\pi e N_0) + n\epsilon_n \\
&\leq \frac{n}{2}\log\left(2\pi e \left(N_0/2 + \frac{1}{n}\sum_{i=1}^{n}\mathrm{Var}(X_i)\right)\right) - \frac{n}{2}\log(\pi e N_0) + n\epsilon_n \\
&\leq \frac{n}{2}\log\left(2\pi e (N_0/2 + E_s)\right) - \frac{n}{2}\log(\pi e N_0) + n\epsilon_n,
\end{aligned} $$
where the last three steps use the independence bound together with the Gaussian maximum-entropy bound, concavity of $\log$ (Jensen's inequality), and the relaxed power constraint, respectively. Dividing by $n$ and letting $n \to \infty$ gives $R \leq \frac{1}{2}\log(1 + 2E_s/N_0)$.

Copyright G. Caire (Sample Lectures) 179
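Before the geometric interpretation, a minimal numerical illustration of Theorem 17 (NumPy; the values of $E_s$ and $N_0$ are arbitrary examples), in bits per channel use:

```python
import numpy as np

def awgn_capacity(snr_linear):
    """Capacity of the real AWGN channel in bits per channel use."""
    return 0.5 * np.log2(1 + snr_linear)

# Example symbol energy and noise level, chosen only for illustration.
Es, N0 = 1.0, 0.5
snr = 2 * Es / N0                      # SNR = 2*Es/N0, as defined on the slide
print(f"SNR = {snr:.1f} ({10*np.log10(snr):.1f} dB)")
print(f"C = {awgn_capacity(snr):.3f} bits/channel use")  # ≈ 1.161 for SNR = 4
```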

Geometric interpretation: sphere packing

The volume of a sphere in $\mathbb{R}^n$ of radius $r$ is given by $V_n r^n$, where $V_n$ is the volume of the unit sphere (known formula, irrelevant in this context).

The received random vector $Y^n = X^n + Z^n$, with high probability (law of large numbers), is inside a sphere of radius $\sqrt{n(N_0/2 + E_s)}$. The noise vector $Z^n$, with high probability, is inside a sphere of radius $\sqrt{n N_0/2}$.

For large $n$, it is possible to pack noise spheres into the typical output sphere, with negligible empty space in between:
$$ \text{number of spheres} = \frac{V_n\, n^{n/2} (N_0/2 + E_s)^{n/2}}{V_n\, n^{n/2} (N_0/2)^{n/2}} = \left(1 + 2E_s/N_0\right)^{n/2}. $$

Copyright G. Caire (Sample Lectures) 180

The resulting rate is
$$ R = \frac{1}{n}\log(\text{number of spheres}) = \frac{1}{2}\log\left(1 + 2E_s/N_0\right). $$

Copyright G. Caire (Sample Lectures) 181

Continuous-time baseband channels

[Figure: flat noise power spectral density of height N_0/2 over the band [-W, W], plotted versus frequency f.]

An informal argument: consider a baseband waveform channel with single-sided bandwidth W Hz.

Shannon's sampling theorem: any (infinite-duration) baseband signal can be represented uniquely from its samples, taken at the so-called Nyquist rate 2W.

Copyright G. Caire (Sample Lectures) 182

Noise power:
$$ N = \frac{N_0}{2} \cdot 2W = N_0 W. $$
Signal power:
$$ P_x = E_s \cdot 2W. $$
Signal-to-noise ratio (SNR):
$$ \mathsf{SNR} = \frac{P_x}{N} = \frac{2 E_s}{N_0}. $$
Capacity in bit/s: multiply the capacity in bit/symbol by the number of symbols/s:
$$ C(\mathsf{SNR}) = 2W \cdot \frac{1}{2}\log\left(1 + \frac{2E_s}{N_0}\right) = W \log(1 + \mathsf{SNR}). $$

Copyright G. Caire (Sample Lectures) 183
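A small numerical sketch of the bandlimited AWGN capacity in bit/s (NumPy; the values of W, P_x and N_0 are illustrative and not from the slides):

```python
import numpy as np

def baseband_capacity_bps(W_hz, P_signal, N0):
    """Capacity in bit/s of a baseband AWGN channel of single-sided bandwidth W."""
    snr = P_signal / (N0 * W_hz)       # SNR = P_x / (N_0 * W)
    return W_hz * np.log2(1 + snr)

# Illustrative numbers: 1 MHz of bandwidth, 1 mW signal power, N0 = 1e-10 W/Hz.
W, Px, N0 = 1e6, 1e-3, 1e-10
print(f"SNR = {Px / (N0 * W):.0f}  ->  C = {baseband_capacity_bps(W, Px, N0)/1e6:.2f} Mbit/s")
```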

Continuous-time passband channels

Consider a passband waveform channel with carrier frequency $f_0 \gg W$ and single-sided bandwidth W Hz centered around $f_0$. The complex baseband equivalent system has single-sided bandwidth W/2, complex signal $X$ and complex circularly symmetric noise $Z \sim \mathcal{CN}(0, N_0)$.

Noise power:
$$ N = N_0 W. $$
Signal power (optimal input $X \sim \mathcal{CN}(0, E_s)$):
$$ P_x = E_s W. $$

Copyright G. Caire (Sample Lectures) 184

Signal-to-noise ratio (SNR):
$$ \mathsf{SNR} = \frac{P_x}{N} = \frac{E_s}{N_0}. $$
Repeating the capacity calculation for complex circularly symmetric input and output, we find
$$ C(\mathsf{SNR}) = \log(1 + \mathsf{SNR}). $$
Capacity in bit/s: multiply the capacity in bit/symbol by the number of symbols/s:
$$ C(\mathsf{SNR}) = W \log\left(1 + \frac{E_s}{N_0}\right) = W \log(1 + \mathsf{SNR}). $$

Copyright G. Caire (Sample Lectures) 185

Parallel Gaussian channels: power allocation

Consider the parallel complex circularly symmetric Gaussian channel defined by
$$ \mathbf{Y} = \mathbf{G} \mathbf{x} + \mathbf{Z}, $$
where $\mathbf{x}, \mathbf{Y}, \mathbf{Z} \in \mathbb{C}^K$, $\mathbf{G} = \mathrm{diag}(G_1, \ldots, G_K)$ is a diagonal matrix of fixed coefficients, and $\mathbf{Z} \sim \mathcal{CN}(0, N_0 \mathbf{I}_K)$.

Notice that the input and output alphabets are the K-dimensional vector space $\mathbb{C}^K$, and codewords are sequences of vectors $\mathbf{x}(m) = (\mathbf{x}_1(m), \ldots, \mathbf{x}_n(m))$.

Input constraint: codewords $\mathbf{x}(m)$ must satisfy
$$ \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} |x_{k,i}(m)|^2 \leq E_s. $$

Copyright G. Caire (Sample Lectures) 186

The capacity is given by
$$ \begin{aligned}
C(E_s) &= \max_{p_{\mathbf{X}} :\, \mathbb{E}[\|\mathbf{X}\|^2] \leq E_s} I(\mathbf{X}; \mathbf{Y}) \\
&= \max_{p_{\mathbf{X}} :\, \mathbb{E}[\|\mathbf{X}\|^2] \leq E_s} h(\mathbf{Y}) - h(\mathbf{Z}) \\
&= \max_{p_{\mathbf{X}} :\, \mathbb{E}[\|\mathbf{X}\|^2] \leq E_s} h(\mathbf{Y}) - K \log(\pi e N_0) \\
&= \max_{\sum_{k=1}^{K} E_{s,k} \leq E_s} \; \sum_{k=1}^{K} \log\left(\pi e (N_0 + |G_k|^2 E_{s,k})\right) - K \log(\pi e N_0) \\
&= \max_{\sum_{k=1}^{K} E_{s,k} \leq E_s} \; \sum_{k=1}^{K} \log\left(1 + \frac{|G_k|^2 E_{s,k}}{N_0}\right).
\end{aligned} $$

Copyright G. Caire (Sample Lectures) 187

Power allocation: convex optimization

We must maximize the mutual information with respect to the power allocation over the parallel channels:
$$ \begin{aligned}
\text{maximize} \quad & \sum_{k=1}^{K} \log\left(1 + \frac{|G_k|^2 E_{s,k}}{N_0}\right) \\
\text{subject to} \quad & \sum_{k=1}^{K} E_{s,k} \leq E_s, \\
& E_{s,k} \geq 0, \quad \forall\, k.
\end{aligned} $$

Copyright G. Caire (Sample Lectures) 188

The Lagrangian function is
$$ \mathcal{L}(\{E_{s,k}\}, \lambda) = \sum_{k=1}^{K} \log\left(1 + \frac{|G_k|^2 E_{s,k}}{N_0}\right) - \lambda\left(\sum_{k=1}^{K} E_{s,k} - E_s\right). $$
We impose the KKT conditions
$$ \frac{\partial \mathcal{L}}{\partial E_{s,k}} = \frac{|G_k|^2 / N_0}{1 + \frac{|G_k|^2 E_{s,k}}{N_0}} - \lambda \leq 0, $$
with equality for all $k$ such that $E_{s,k}^\star > 0$. Solving for $E_{s,k}$ we find
$$ E_{s,k}^\star = \frac{1}{\lambda} - \frac{N_0}{|G_k|^2}. $$

Copyright G. Caire (Sample Lectures) 189

We let $1/\lambda = \nu$, and argue that the KKT conditions are satisfied if we let
$$ E_{s,k}^\star = \left[\nu - \frac{N_0}{|G_k|^2}\right]_+. $$
In fact, if $\nu \geq \frac{N_0}{|G_k|^2}$, then $E_{s,k}^\star > 0$ and $\frac{\partial \mathcal{L}}{\partial E_{s,k}} = 0$. Otherwise, if $\nu < \frac{N_0}{|G_k|^2}$, then $E_{s,k}^\star = 0$ and $\frac{\partial \mathcal{L}}{\partial E_{s,k}} < 0$.

This optimal power allocation is called waterfilling. Intuitively, we spend more power on the better channels (the ones with higher gain). Substituting the waterfilling solution into the mutual information expression, we obtain
$$ C(E_s) = \sum_{k=1}^{K} \left[\log\left(\frac{\nu\, |G_k|^2}{N_0}\right)\right]_+. $$

Copyright G. Caire (Sample Lectures) 190
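The following sketch (NumPy only; the gains, noise level, and power budget are illustrative values, not from the slides) implements the waterfilling allocation by bisecting on the water level $\nu$ and then evaluates the resulting capacity.

```python
import numpy as np

def waterfill(gains, N0, Es, iters=100):
    """Waterfilling power allocation for parallel Gaussian channels.

    gains : array of channel gain magnitudes |G_k|.
    Returns (per-channel powers E_sk, water level nu).
    """
    inv_snr = N0 / np.abs(gains) ** 2          # N0 / |G_k|^2 (the "ground level")
    lo, hi = 0.0, np.max(inv_snr) + Es          # the water level nu lies in [lo, hi]
    for _ in range(iters):                      # bisection: total power is monotone in nu
        nu = 0.5 * (lo + hi)
        powers = np.maximum(nu - inv_snr, 0.0)
        if powers.sum() > Es:
            hi = nu
        else:
            lo = nu
    return np.maximum(nu - inv_snr, 0.0), nu

# Illustrative parallel channel: 4 subchannels with unequal gains.
gains = np.array([1.0, 0.8, 0.5, 0.1])
N0, Es = 1.0, 4.0
powers, nu = waterfill(gains, N0, Es)

capacity = np.sum(np.log2(1 + np.abs(gains) ** 2 * powers / N0))  # bits per vector use
print("water level nu =", round(nu, 3))
print("power per subchannel =", np.round(powers, 3), "sum =", round(powers.sum(), 3))
print("capacity =", round(capacity, 3), "bit per (vector) channel use")
```

With these numbers the two weakest subchannels receive no power at all, which is exactly the behavior the slide describes: power is poured where the ground level $N_0/|G_k|^2$ is lowest.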

End of Lecture 6 Copyright G. Caire (Sample Lectures) 191