5237S, Wireless Communications II, Spring 2017

Lecture 2: Capacity of the Gaussian Channel

Review of basic concepts in information theory (Cover & Thomas: Elements of Information Theory; Tse & Viswanath: Appendix B)
AWGN channel capacity (Chapter 5.1-5.3, Appendix B)
Resources (power and bandwidth) of the AWGN channel
Linear time-invariant Gaussian channels:
1. Single-input multiple-output (SIMO) channel
2. Multiple-input single-output (MISO) channel
3. Frequency-selective channel
2.2 Entropy

Entropy of a discrete random variable $x$ with alphabet $\mathcal{X}$ and probability mass function $p_x(i) = \Pr(x = i)$, $i \in \mathcal{X}$:

$$H(x) = \sum_{i \in \mathcal{X}} p_x(i) \log \frac{1}{p_x(i)} \qquad (2.1)$$

$H(x)$ is the average amount of uncertainty associated with the random variable $x$, i.e., the information obtained when observing $x$:

$0 \le H(x) \le \log |\mathcal{X}|$
$H(x) = 0$: no uncertainty, $x$ is deterministic
$H(x) = \log |\mathcal{X}|$: all symbols equally likely (uniform distribution)

All logarithms are taken to base 2 unless specified otherwise.
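Definition (2.1) translates directly into a few lines of code. Below is a minimal numpy sketch (the function name `entropy` and the example pmfs are illustrative, not from the lecture) that evaluates $H(x)$ for an arbitrary pmf, using the convention $0 \cdot \log(1/0) = 0$:

```python
import numpy as np

def entropy(pmf):
    """H(x) = sum_i p(i) log2(1/p(i)), with 0*log(1/0) taken as 0."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]                              # drop zero-probability symbols
    return float(np.sum(p * np.log2(1.0 / p)))

# A uniform 4-symbol source attains the upper bound log2|X| = 2 bits
print(entropy([0.25, 0.25, 0.25, 0.25]))      # 2.0
# A deterministic source has no uncertainty
print(entropy([1.0, 0.0, 0.0, 0.0]))          # 0.0
```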
2.3 Example: Binary Entropy Function

[Figure: plot of the binary entropy function $H_B(p)$ versus $p \in [0, 1]$, rising from 0 at $p = 0$ to its maximum of 1 at $p = 0.5$ and falling back to 0 at $p = 1$]

$$H_B(p) = -p \log p - (1 - p) \log(1 - p) \qquad (2.2)$$
2.4 Joint and Conditional Entropy

The joint entropy $H(x, y)$ of a pair of discrete random variables $(x, y)$ with joint distribution $p_{x,y}$ is defined as

$$H(x, y) = \sum_{i \in \mathcal{X}} \sum_{j \in \mathcal{Y}} p_{x,y}(i, j) \log \frac{1}{p_{x,y}(i, j)} \qquad (2.3)$$

The conditional entropy

$$H(y|x) = \sum_{i \in \mathcal{X}} p_x(i) H(y \mid x = i) \qquad (2.4)$$
$$= \sum_{i \in \mathcal{X}} p_x(i) \sum_{j \in \mathcal{Y}} p_{y|x}(j|i) \log \frac{1}{p_{y|x}(j|i)} \qquad (2.5)$$
$$= \sum_{i \in \mathcal{X}, j \in \mathcal{Y}} p_{x,y}(i, j) \log \frac{1}{p_{y|x}(j|i)} \qquad (2.6)$$

is the average amount of uncertainty left in $y$ after observing $x$.
2.5 Chain Rule

The chain rule for entropies:

$$H(x, y) = H(x) + H(y|x) = H(y) + H(x|y) \qquad (2.7)$$

Note that $H(x|y) = H(x)$ and $H(y|x) = H(y)$ if $x$ and $y$ are independent; thus

$$H(x, y) = H(x) + H(y|x) \le H(x) + H(y) \qquad (2.8)$$

$H(y|x) = 0$ if $y$ can be fully recovered after observing $x$: no uncertainty left in $y$.
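Equations (2.3)-(2.8) are easy to check numerically from any joint pmf. The sketch below (illustrative helper name `H`, an arbitrary example joint distribution) computes the joint entropy, the marginals, and $H(y|x)$ via (2.6), and verifies the chain rule (2.7) and subadditivity (2.8):

```python
import numpy as np

def H(p):
    """Entropy of a pmf given as any array shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

# Example joint pmf p_{x,y}(i, j); rows index x, columns index y
Pxy = np.array([[0.25, 0.25],
                [0.40, 0.10]])
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)

# Conditional entropy via (2.6): H(y|x) = sum_{i,j} p(i,j) log2(1/p(j|i))
Pj_given_i = Pxy / Px[:, None]
mask = Pxy > 0
H_y_given_x = float(np.sum(Pxy[mask] * np.log2(1.0 / Pj_given_i[mask])))

assert np.isclose(H(Pxy), H(Px) + H_y_given_x)    # chain rule (2.7)
assert H(Pxy) <= H(Px) + H(Py) + 1e-12            # subadditivity (2.8)
```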
2.6 Mutual Information

Relative entropy between two pmfs $p_x$ and $q_x$: $\sum_{i \in \mathcal{X}} p_x(i) \log \frac{p_x(i)}{q_x(i)}$

Mutual information $I(x; y)$: the relative entropy between the joint distribution $p_{x,y}$ and the product distribution $p_x p_y$

$$I(x; y) = \sum_{i \in \mathcal{X}} \sum_{j \in \mathcal{Y}} p_{x,y}(i, j) \log \frac{p_{x,y}(i, j)}{p_x(i) p_y(j)} \qquad (2.9)$$
$$= H(x) + H(y) - H(x, y) \qquad (2.10)$$
$$= H(x) - H(x|y) = H(y) - H(y|x) \qquad (2.11)$$

A measure of the amount of (mutual) information that $y$ (or $x$) contains about $x$ (or $y$): the reduction in uncertainty of $x$ (or $y$) due to the knowledge of $y$ (or $x$).
2.7 Entropy and Mutual Information

[Figure: Venn diagram of $H(x)$ and $H(y)$ inside $H(x, y)$, with regions $H(x|y)$, $I(x; y)$, $H(y|x)$]

$H(x, y) = H(x) + H(y|x)$
$H(x, y) = H(y) + H(x|y)$
$H(x, y) \le H(x) + H(y)$
$0 \le H(x|y) \le H(x)$
$0 \le H(y|x) \le H(y)$
$I(x; y) = H(x) - H(x|y)$
$I(x; y) = H(y) - H(y|x)$
$I(x; y) = H(x) + H(y) - H(x, y)$
$I(x; y) = I(y; x)$
$I(x; x) = H(x)$
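Continuing the sketch from slide 2.5 (same hypothetical `H` helper and joint pmf `Pxy`), these identities can be confirmed by computing $I(x; y)$ both from the definition (2.9) and from the entropy identities (2.10)-(2.11):

```python
# I(x;y) from the definition (2.9), then via the entropy identities
mask = Pxy > 0
I_def = float(np.sum(Pxy[mask] * np.log2(Pxy[mask] / np.outer(Px, Py)[mask])))
I_ent = H(Px) + H(Py) - H(Pxy)                  # (2.10)
assert np.isclose(I_def, I_ent)
assert np.isclose(I_ent, H(Py) - H_y_given_x)   # (2.11)
assert I_ent >= 0                               # mutual information is non-negative
```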
2.8 Channel Capacity

Discrete memoryless channel (DMC): input $x[m] \in \mathcal{X}$ and output $y[m] \in \mathcal{Y}$, transition probability $p(y|x)$

Convey one of $M = |\mathcal{C}|$ equally likely messages by mapping each to its $N$-length codeword in the codebook $\mathcal{C} = \{x_1, \ldots, x_M\}$

Input sequence: $N$-dimensional random vector $x = (x[1], \ldots, x[N])$

Message $i$ → Encoder → $x_i = (x_i[1], \ldots, x_i[N])$ → Channel $p(y|x)$ → $y = (y[1], \ldots, y[N])$ → Decoder → $\hat{i}$

What is the maximum achievable bit rate

$$R = \frac{\log M}{N} \qquad (2.12)$$

such that the average probability of error

$$P_e = \Pr(i \neq \hat{i}) \qquad (2.13)$$

tends to 0 as $N \to \infty$?
2.9 Channel Capacity

Entropy $H(x) = \log M = NR$, and $H(x|y) \to 0$ is required for reliable communication ($P_e \to 0$). From $I(x; y) = H(x) - H(x|y)$,

$$R \le \frac{1}{N} I(x; y) \qquad (2.14)$$

Upper bound: note that

$$\frac{1}{N} I(x; y) \le \max_{p_x} \frac{1}{N} I(x; y) \qquad (2.15)$$

and, for a memoryless channel,

$$I(x; y) \le \sum_{m=1}^{N} I(x[m]; y[m]) \qquad (2.16)$$

Equality is attained if the inputs are made independent over time:

$$\max_{p_x} \frac{1}{N} I(x; y) = \frac{1}{N} \sum_{m=1}^{N} \max_{p_{x[m]}} I(x[m]; y[m]) = \max_{p_{x[m]}} I(x[m]; y[m]) \qquad (2.17)$$

The $N$-dimensional combinatorial problem is reduced to an optimization problem over input distributions on single symbols.
2.10 Channel Capacity

Is there a code that achieves a rate close to (2.17) such that $P_e \to 0$?

Shannon: such codes exist if $N$ is chosen large enough; see the detailed proofs in Cover & Thomas, Elements of Information Theory, Chapter 7.

The channel capacity of a discrete memoryless channel is

$$C = \max_{p_x} I(x; y) \qquad (2.18)$$

where the maximum is taken over all input distributions $p_x$.

$I(x; y)$ is a concave function of $p_x$ for fixed $p_{y|x}$ → convex optimization problem (Theorem 2.7.4 in Cover & Thomas).
2.11 Example: Binary Symmetric Channel

$\mathcal{X} = \mathcal{Y} = \{0, 1\}$, $p(0|1) = p(1|0) = p$, $p(1|1) = p(0|0) = 1 - p$

[Figure: BSC transition diagram: each input is received correctly with probability $1 - p$ and crosses over with probability $p$]

$$I(x; y) = H(y) - H(y|x) = H(y) - \sum_{i \in \mathcal{X}} p_x(i) H(y \mid x = i) = H(y) - \sum_{i \in \mathcal{X}} p_x(i) H_B(p) = H(y) - H_B(p)$$

Since $H(y) \le 1$, with equality when the input distribution $p_x$ is uniform, the capacity is

$$C = \max_{p_x} I(x; y) = 1 - H_B(p) \qquad (2.19)$$
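As a sanity check, (2.19) can be compared against a brute-force maximisation of $I(x; y)$ over the input distribution. A minimal sketch (illustrative function names, assuming numpy):

```python
import numpy as np

def Hb(p):
    """Binary entropy function (2.2)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_mutual_info(q, p):
    """I(x;y) for input Pr(x=1)=q over a BSC with crossover probability p."""
    r = q * (1 - p) + (1 - q) * p      # Pr(y = 1)
    return Hb(r) - Hb(p)               # H(y) - H(y|x)

p = 0.11
grid = np.linspace(0.0, 1.0, 1001)
C_numeric = max(bsc_mutual_info(q, p) for q in grid)
print(C_numeric, 1 - Hb(p))            # both ~0.5 bit; maximum at q = 0.5
```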
2.12 Differential Entropy

Entropy of a continuous random variable: for a continuous RV $x$ with pdf $f_x$,

$$h(x) = \int f_x(u) \log \frac{1}{f_x(u)} \, du \qquad (2.20)$$

Similarly, the mutual information between $x$ and $y$ with joint pdf $f_{x,y}$ is

$$I(x; y) = \int\!\!\int f_{x,y}(u, v) \log \frac{f_{x,y}(u, v)}{f_x(u) f_y(v)} \, du \, dv \qquad (2.21)$$

The properties of $I(x; y)$ are the same as in the discrete case.

Example: normal distribution, $f_x(u) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-u^2/(2\sigma^2)}$ (Example 8.1.2 in Cover & Thomas):

$$h(x) = \frac{1}{2} \log 2\pi e \sigma^2 \qquad (2.22)$$
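The closed form (2.22) can be checked by Monte Carlo: since $h(x) = E[\log_2(1/f_x(x))]$, averaging $-\log_2 f_x$ over samples of $x$ converges to the differential entropy. A minimal sketch (illustrative, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 2.0
x = rng.normal(0.0, np.sqrt(sigma2), size=1_000_000)

# h(x) = E[log2(1/f(x))], estimated by a sample average of -log2 f(x)
log_f = -0.5 * np.log2(2 * np.pi * sigma2) - (x**2 / (2 * sigma2)) * np.log2(np.e)
h_mc = -log_f.mean()
h_exact = 0.5 * np.log2(2 * np.pi * np.e * sigma2)   # (2.22)
print(h_mc, h_exact)                                 # agree to ~3 decimals
```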
2.13 The Gaussian Channel

Impose an average power constraint on every codeword $x_n$ in the codebook $\mathcal{C}$:

$$\frac{1}{N} \sum_{m=1}^{N} x_n^2[m] \le P, \quad \forall n \qquad (2.23)$$

The capacity of the continuous-valued channel with power constraint $P$ can be shown to be

$$C = \max_{f_x : E[x^2] \le P} I(x; y) \qquad (2.24)$$

The proof consists of three steps:
1. discretise the continuous-valued input and output of the channel
2. approximate it by discrete memoryless channels with increasing alphabet sizes
3. take limits appropriately
2.14 Gaussian Channel

[Diagram: message $\omega$ → encoder → $x[m]$ → adder with noise $w[m]$ (independent of $x$) → $y[m]$ → decoder → $\hat{\omega}$]

Now $h(w) = \frac{1}{2} \log 2\pi e \sigma^2$ and $E[y^2] = P + \sigma^2$:

$$I(x; y) = h(y) - h(y|x) = h(y) - h(x + w \mid x) = h(y) - h(w|x) = h(y) - h(w)$$

Also, $h(y)$ is maximised by choosing $x$ from $\mathcal{N}(0, P)$:

$$C = \max_{f_x : E[x^2] \le P} I(x; y) = \frac{1}{2} \log 2\pi e (P + \sigma^2) - \frac{1}{2} \log 2\pi e \sigma^2 = \frac{1}{2} \log\left(1 + \frac{P}{\sigma^2}\right) \qquad (2.25)$$

Complex baseband AWGN channel: $C = \log\left(1 + \frac{P}{\sigma^2}\right)$ bits per complex dimension!
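Equation (2.25) is a one-liner in code; the sketch below (illustrative names) evaluates the real and complex AWGN capacities and illustrates that the complex channel, being two uses of a real channel, carries twice the rate at the same SNR:

```python
import numpy as np

def awgn_capacity_real(snr):
    """C = 0.5*log2(1 + P/sigma^2), bits per real dimension, (2.25)."""
    return 0.5 * np.log2(1 + snr)

def awgn_capacity_complex(snr):
    """C = log2(1 + P/sigma^2), bits per complex dimension."""
    return np.log2(1 + snr)

snr = 10 ** (10 / 10)   # 10 dB
print(awgn_capacity_real(snr), awgn_capacity_complex(snr))  # ~1.73, ~3.46
```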
2.15 Sphere Packing Interpretation

[Figure: a sphere of radius $\sqrt{N(P + \sigma^2)}$ packed with noise spheres of radius $\sqrt{N\sigma^2}$ centred on codewords of norm $\sqrt{NP}$]

Assume $N \to \infty$. The $N$-dim RX vector $y = x + w$ lies within a sphere of radius $r_y = \sqrt{N(P + \sigma^2)}$

By the law of large numbers, $\frac{1}{N} \sum_{m=1}^{N} w[m]^2 \to \sigma^2$; thus $y$ lies near the surface of the noise sphere of radius $r_w = \sqrt{N\sigma^2}$ around the transmitted codeword

The maximum number of codewords is the ratio between the two volumes, $V_y(r_y)$ and $V_w(r_w)$

The volume of an $N$-dimensional sphere of radius $r$ is proportional to $r^N$; thus the maximum number of bits per symbol is

$$\frac{1}{N} \log \frac{\left(\sqrt{N(P + \sigma^2)}\right)^N}{\left(\sqrt{N\sigma^2}\right)^N} = \frac{1}{2} \log\left(1 + \frac{P}{\sigma^2}\right) \qquad (2.26)$$
2.16 Power and Bandwidth Constrained Capacity

Consider a continuous-time AWGN channel with bandwidth $W$ [Hz], power constraint $P$ [Watts] and Gaussian noise with power spectral density $N_0/2$ [Watts/Hz]

Discrete-time complex baseband signal:

$$y[m] = x[m] + w[m] \qquad (2.27)$$

where $w[m] \sim \mathcal{CN}(0, N_0)$

Independent noise in both I and Q branches → 2 uses of a real AWGN channel:

$$C = 2 \cdot \frac{1}{2} \log\left(1 + \frac{P}{N_0 W}\right) \text{ bits per complex dimension} \qquad (2.28)$$

$W$ complex samples per second:

$$C(P, W) = W \log\left(1 + \frac{P}{N_0 W}\right) \text{ bits/s} \qquad (2.29)$$
2.17 Power and Bandwidth Constrained Capacity

Maximum achievable spectral efficiency: $C(\gamma) = \log(1 + \gamma)$, where the SNR $\gamma = \frac{P}{N_0 W}$

Low SNR region: $C(\gamma) \approx \gamma \log_2 e$ → linear as a function of $\gamma$
High SNR region: $C(\gamma) \approx \log_2 \gamma$ → logarithmic as a function of $\gamma$

[Figure: $\log(1 + \mathrm{SNR})$ plotted against SNR from 0 to 100]
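The two regimes are easy to see numerically. A minimal sketch (illustrative values) comparing $C(\gamma)$ with its low- and high-SNR approximations, plus the infinite-bandwidth limit of $C(P, W)$ shown on the next slide:

```python
import numpy as np

for snr_db in (-20, 0, 20):
    g = 10 ** (snr_db / 10)
    exact = np.log2(1 + g)
    low = g * np.log2(np.e)     # low-SNR (linear) approximation
    high = np.log2(g)           # high-SNR (logarithmic) approximation
    print(snr_db, exact, low, high)
# At -20 dB the linear approximation is tight; at +20 dB the log one is.

# Infinite-bandwidth limit of C(P, W) = W log2(1 + P/(N0 W)):
P_over_N0 = 1e6
for W in (1e6, 1e7, 1e8):
    print(W * np.log2(1 + P_over_N0 / W))   # -> (P/N0) log2(e) ~ 1.44e6 bits/s
```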
2.18 Power and Bandwidth Constrained Capacity

[Figure: $C(W)$ in Mbps versus bandwidth $W$ in MHz for $P/N_0 = 10^6$; the curve rises steeply in the bandwidth-limited region and flattens in the power-limited region, approaching the capacity limit $\frac{P}{N_0} \log_2 e$ as $W \to \infty$]
2.19 Linear Time-invariant Gaussian Channels

Examples of channels closely related to the simple AWGN channel:

Single-input multiple-output (SIMO) channel
Multiple-input single-output (MISO) channel
Frequency-selective channel → parallel Gaussian channels

Time-invariant: the optimal code can be constructed directly from AWGN-optimal codes, and the capacity is easy to compute

[Figure: (c) channel impulse response, amplitude (linear scale) versus time (ns); (d) power spectrum (dB) of a 40 MHz band versus frequency (GHz)]
2.20 Single-input Multiple-output (SIMO) Channel

SIMO channel with one TX antenna and $L$ RX antennas:

$$y_l[m] = h_l x[m] + w_l[m], \quad l = 1, \ldots, L \qquad (2.30)$$

$h_l$ is the fixed complex channel gain between the TX and the $l$th RX antenna, and $w_l[m] \sim \mathcal{CN}(0, N_0)$ is i.i.d. noise across antennas

Detection of $x[m]$ from $y[m] = [y_1[m], \ldots, y_L[m]]^T$:

$$\hat{x}[m] = f^H y[m] = f^H h \, x[m] + f^H w[m] \qquad (2.31)$$

where $h = [h_1, \ldots, h_L]^T$ and $w[m] = [w_1[m], \ldots, w_L[m]]^T$

Optimal $f = h$: maximum ratio combining (MRC), also known as matched filtering (MF)

SIMO capacity with $\gamma = \frac{E[|h^H h \, x[m]|^2]}{E[|h^H w[m]|^2]} = \frac{P \|h\|^2}{N_0}$:

$$C = \log\left(1 + \frac{P \|h\|^2}{N_0}\right) \text{ bits/s/Hz} \qquad (2.32)$$
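A short simulation makes the MRC result concrete. The sketch below (illustrative, assuming numpy; parameter values arbitrary) applies the matched filter $f = h$ to a SIMO observation and confirms the post-combining SNR $P\|h\|^2/N_0$ of (2.32):

```python
import numpy as np

rng = np.random.default_rng(1)
L, P, N0, M = 4, 1.0, 0.5, 100_000

h = rng.normal(size=L) + 1j * rng.normal(size=L)          # fixed complex channel
x = np.sqrt(P / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))
w = np.sqrt(N0 / 2) * (rng.normal(size=(L, M)) + 1j * rng.normal(size=(L, M)))
y = np.outer(h, x) + w                                    # y_l[m] = h_l x[m] + w_l[m]

x_hat = h.conj() @ y                                      # MRC: f = h, x_hat = h^H y
signal = np.vdot(h, h) * x                                # signal part, (h^H h) x[m]
noise = x_hat - signal                                    # noise part, h^H w[m]
snr_emp = np.mean(np.abs(signal)**2) / np.mean(np.abs(noise)**2)
gamma = P * np.linalg.norm(h)**2 / N0                     # theoretical post-MRC SNR
print(snr_emp, gamma)                                     # close agreement
print(np.log2(1 + gamma))                                 # SIMO capacity (2.32)
```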
2.21 Multiple-input Single-output (MISO) Channel

MISO channel with one RX antenna and $L$ TX antennas:

$$y[m] = h^H x[m] + w[m] \qquad (2.33)$$

where $h = [h_1, \ldots, h_L]^T$, and $h_l$ is the fixed complex channel gain between the $l$th TX antenna and the RX antenna

Reciprocal to the SIMO channel → the optimal TX strategy is to align the transmitted vector with $h$ using a beamformer $f$, $\|f\| = 1$:

$$x[m] = f \, x[m] = \frac{h}{\|h\|} \, x[m] \qquad (2.34)$$

MISO capacity with $\gamma = E\left[\left|\frac{h^H h}{\|h\|} x[m]\right|^2\right] \Big/ E\left[|w[m]|^2\right] = \frac{P \|h\|^2}{N_0}$:

$$C = \log\left(1 + \frac{P \|h\|^2}{N_0}\right) \text{ bits/s/Hz} \qquad (2.35)$$

$P$ is the total power constraint across the $L$ antennas. Requires CSI at the transmitter!
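The reciprocal transmit-beamforming strategy (2.34) can be checked the same way, continuing the SIMO sketch above (same hypothetical `h`, `x`, `M`, `P`, `N0`, `rng`); the unit-norm beamformer keeps the total transmit power at $P$:

```python
f = h / np.linalg.norm(h)               # unit-norm beamformer aligned with h, (2.34)
X = np.outer(f, x)                      # per-antenna TX signals; total power still P
w1 = np.sqrt(N0 / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))
y_miso = h.conj() @ X + w1              # y[m] = h^H x[m] + w[m]; effective gain ||h||
snr_emp = np.mean(np.abs(h.conj() @ X)**2) / np.mean(np.abs(w1)**2)
print(snr_emp, P * np.linalg.norm(h)**2 / N0)   # again gamma = P ||h||^2 / N0, (2.35)
```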
2.22 Frequency-selective Channel

$L$-tap frequency-selective AWGN channel:

$$y[m] = \sum_{l=0}^{L-1} h_l x[m - l] + w[m] \qquad (2.36)$$

OFDM converts (2.36) into $N_C$ parallel (sub-)channels, where each $\tilde{h}_n$ is an AWGN channel:

$$\tilde{y}_n = \tilde{h}_n d_n + \tilde{w}_n, \quad n = 1, \ldots, N_C \qquad (2.37)$$

Given the power allocation $p_n$, $\forall n$, the maximum achievable rate per OFDM symbol is

$$\sum_{n=1}^{N_C} \log\left(1 + \frac{p_n |\tilde{h}_n|^2}{N_0}\right) \text{ bits/OFDM symbol} \qquad (2.38)$$
2.23 Frequency-selective Channel: Optimal Power Allocation

Power allocation to maximise (2.38) subject to the power constraint $\frac{1}{N_C} \sum_n E\left[|d_n|^2\right] \le P$

The optimal power allocation is the solution to

$$\max_{p_1, \ldots, p_{N_C}} \frac{1}{N_C} \sum_{n=1}^{N_C} \log\left(1 + \frac{p_n |\tilde{h}_n|^2}{N_0}\right) \qquad (2.39)$$
$$\text{s.t.} \quad \frac{1}{N_C} \sum_{n=1}^{N_C} p_n = P, \quad p_n \ge 0, \ \forall n \qquad (2.40)$$

where the variables are $p_1, \ldots, p_{N_C}$

Concave objective & linear constraints → convex optimisation problem

The optimal power allocation can be found explicitly
2.24 Waterfilling

Lagrangian:

$$L(\nu, \lambda_1, \ldots, \lambda_{N_C}, p_1, \ldots, p_{N_C}) = \sum_{n=1}^{N_C} \log\left(1 + \frac{p_n |\tilde{h}_n|^2}{N_0}\right) - \nu\left(\sum_{n=1}^{N_C} p_n - N_C P\right) + \sum_{n=1}^{N_C} \lambda_n p_n \qquad (2.41)$$

where $\nu$ and $\lambda_1, \ldots, \lambda_{N_C}$ are Lagrange multipliers

Karush-Kuhn-Tucker (KKT) conditions:

$$p_n \ge 0 \ \forall n, \qquad \frac{1}{N_C} \sum_{n=1}^{N_C} p_n = P \qquad (2.42)$$
$$\lambda_n \ge 0, \quad \lambda_n p_n = 0 \ \forall n \qquad (2.43)$$
$$\frac{1}{p_n + N_0/|\tilde{h}_n|^2} - \nu + \lambda_n = 0 \ \forall n \qquad (2.44)$$
$$\left(\nu - \frac{1}{p_n + N_0/|\tilde{h}_n|^2}\right) p_n = 0 \ \forall n, \qquad \frac{1}{p_n + N_0/|\tilde{h}_n|^2} \le \nu \ \forall n \qquad (2.45)$$
2.25 Waterfilling

From (2.42)-(2.45), the optimal power allocation is

$$p_n^* = \left(\frac{1}{\nu} - \frac{N_0}{|\tilde{h}_n|^2}\right)^+ \qquad \text{(waterfilling)}$$

where $(z)^+ = \max(z, 0)$. The optimal $\nu$ can be found by bisection, for example, from the power constraint

$$\frac{1}{N_C} \sum_{n=1}^{N_C} \left(\frac{1}{\nu} - \frac{N_0}{|\tilde{h}_n|^2}\right)^+ = P \qquad (2.46)$$

[Figure: waterfilling illustration; the water level $1/\nu$ is poured over the noise-to-gain profile $N_0/|\tilde{h}_n|^2$ across subcarriers $n$, giving allocations $p_2^*, p_3^*, \ldots$, and $p_1^* = 0$ on a subcarrier whose floor exceeds the water level]
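A minimal waterfilling sketch (illustrative function name, assuming numpy) that finds $\nu$ by bisection on the water level $1/\nu$, exactly as suggested above:

```python
import numpy as np

def waterfill(gains, P, N0=1.0, iters=100):
    """Waterfilling over N_C subchannels with gains |h_n|^2.

    Returns p_n = max(1/nu - N0/|h_n|^2, 0) with (1/N_C) sum p_n = P, (2.46).
    """
    floors = N0 / np.asarray(gains, dtype=float)   # N0 / |h_n|^2 per subcarrier
    lo, hi = 0.0, floors.max() + P * len(floors)   # bracket for the water level 1/nu
    for _ in range(iters):                         # bisection on the water level
        level = 0.5 * (lo + hi)
        p = np.maximum(level - floors, 0.0)
        if p.mean() > P:
            hi = level
        else:
            lo = level
    return p

gains = np.array([2.0, 1.0, 0.25, 0.05])
p = waterfill(gains, P=1.0)
print(p, p.mean())                          # strong subcarriers get more power
print(np.mean(np.log2(1 + p * gains)))      # achievable rate per subcarrier, (2.39)
```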