EE 5407 Part II: Spatial Based Wireless Communications

Instructor: Prof. Rui Zhang
E-mail: rzhang@i2r.a-star.edu.sg
Website: http://www.ece.nus.edu.sg/stfpage/elezhang/
Lecture IV: MIMO Systems
March 21, 2011

Introduction to MIMO Systems

So far, channel fading has been treated as a disadvantageous factor for reliable transmission. From now on, we will look at how to make use of channel fading rather than compensate for it.

Consider a MIMO system with $t > 1$ transmit antennas and $r > 1$ receive antennas: there are in total $t \times r$ physical links (very likely experiencing different channels due to independent fading).

Intuitively, if each transmit antenna sends one (possibly independent) data stream, the total transmission rate can be increased dramatically over that of the SISO system; this technique is known as spatial multiplexing.

However, each receive antenna receives a combination of the signals transmitted from all transmit antennas. Also, each modulated symbol can be transmitted over all or some of the transmit antennas. Thus there may exist cross-talk among different data streams at the receiver.

One major challenge for MIMO spatial multiplexing is the transceiver design, i.e., how the symbols transmitted in multiple data streams can be recovered at the receiver with reasonable complexity and yet reasonably good performance.

We will see that whether or not the CSI is known at the transmitter (CSIT) leads to very different transceiver designs for MIMO spatial multiplexing.

Outline

- Review of capacity results for SISO AWGN (additive white Gaussian noise) and fading channels
- MIMO AWGN channel
  - CSIT-known case: capacity; transceiver design (eigenmode transmission)
  - CSIT-unknown case: capacity; transceiver design (horizontal encoding, linear vs. nonlinear receivers)
- MIMO fading channel
  - Ergodic capacity with or without CSIT
  - Outage capacity with or without CSIT

Channel Capacity Analysis

Channel capacity describes the maximum information rate that can be reliably transmitted over a channel subject to a given transmit power constraint.

- The capacity of AWGN channels was derived by Claude Shannon in 1948.
- The capacity of SISO fading channels has been studied by Goldsmith and Varaiya [a], Caire and Shamai [b], et al.
- The capacity of MIMO fading channels has been studied by Telatar [c], Foschini and Gans [d], et al.

[a] A. J. Goldsmith and P. P. Varaiya, "Capacity of fading channels with channel side information," IEEE Transactions on Information Theory, vol. 43, no. 6, pp. 1986-1992, November 1997.
[b] G. Caire and S. Shamai (Shitz), "On the capacity of some channels with channel state information," IEEE Transactions on Information Theory, vol. 45, no. 6, pp. 2007-2019, September 1999.
[c] I. E. Telatar, "Capacity of multi-antenna Gaussian channels," European Transactions on Telecommunications, vol. 10, no. 6, pp. 585-595, November 1999.
[d] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, no. 3, pp. 311-335, March 1998.

Capacity of SISO AWGN Channel

Consider the following SISO AWGN channel:
$$y(n) = h\,x(n) + z(n), \quad n = 1,\ldots,N \qquad (1)$$
- $h \in \mathbb{C}$ denotes the channel, which is constant and assumed known at the receiver
- $E[|x(n)|^2] \le P$, where $P$ is the transmit power constraint
- $z(n) \sim \mathcal{CN}(0, \sigma_z^2)$ is independent over $n$ (AWGN)

Channel capacity is the maximum information rate that can be transmitted over the channel with an arbitrarily small probability of decoding error as the code length $N$ goes to infinity.

Claude Shannon proved that the capacity of the SISO AWGN channel is equal to the maximum mutual information between $x$ and $y$, where the mutual information is given by
$$I(x;y) = h(y) - h(y|x) = h(y) - h(z) \qquad (2)$$
where $h(X)$ is the differential entropy of a RV $X$ (measuring the amount of uncertainty in $X$), and $h(Y|X)$ denotes the conditional differential entropy of $Y$ given $X$ (measuring the amount of uncertainty in $Y$ conditional on knowing $X$).

For a CSCG RV $X \sim \mathcal{CN}(0, \sigma_X^2)$, $h(X) = \log_2(\pi e \sigma_X^2)$ bits. Since $z(n) \sim \mathcal{CN}(0, \sigma_z^2)$, we have $h(z) = \log_2(\pi e \sigma_z^2)$. The mutual information is then written as
$$I(x;y) = h(y) - \log_2(\pi e \sigma_z^2) \qquad (3)$$
The capacity is defined as the maximum mutual information over all possible distributions of $x$ subject to $E[|x(n)|^2] \le P$.

Note that $E[|y(n)|^2] \le |h|^2 P + \sigma_z^2$. Among all complex RVs $X$ with zero mean and variance $\sigma_X^2$, $h(X)$ is maximized when $X$ is CSCG distributed. To maximize $h(y)$, $y(n)$ should thus be CSCG distributed with the maximum variance, so that $h(y) = \log_2(\pi e(|h|^2P + \sigma_z^2))$. This requires $x$ to be CSCG distributed as well, with $E[|x(n)|^2] = P$.

It follows that the capacity of the SISO AWGN channel is achieved when $x(n) \sim \mathcal{CN}(0, P)$ (i.e., $x(n)$ is drawn from a Gaussian codebook), and is given by
$$C = \log_2\big(\pi e(|h|^2 P + \sigma_z^2)\big) - \log_2(\pi e \sigma_z^2) = \log_2\left(1 + \frac{|h|^2 P}{\sigma_z^2}\right) \qquad (4)$$
where the capacity unit is bits/second/Hz (bps/Hz).

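Note that the capacity is equal to $\log_2(1+\gamma)$, where $\gamma = |h|^2P/\sigma_z^2$ is the receiver SNR.

- In the asymptotically low-SNR regime, i.e., $\gamma \to 0$, since $\log(1+x) \approx x$ as $x \to 0$ (where $\log$ denotes the natural logarithm), we have
$$C(\gamma \to 0) \approx \frac{\gamma}{\log 2} \qquad (5)$$
Thus the capacity doubles for every 3 dB increase in SNR.
- In the asymptotically high-SNR regime, i.e., $\gamma \to \infty$, since $\log(1+x) \approx \log x$ as $x \to \infty$, we have
$$C(\gamma \to \infty) \approx \log_2 \gamma \qquad (6)$$
Thus the capacity increases by 1 bps/Hz for every 3 dB increase in SNR.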
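The two asymptotic regimes above can be checked numerically. The following minimal Python sketch evaluates (4) together with the approximations (5) and (6); SNR values are on a linear scale, and the function names are illustrative only.

```python
import math

def awgn_capacity(snr):
    """SISO AWGN capacity (4), C = log2(1 + snr), with snr on a linear scale."""
    return math.log2(1.0 + snr)

def low_snr_approx(snr):
    """Low-SNR approximation (5): C ~ snr / ln 2."""
    return snr / math.log(2.0)

def high_snr_approx(snr):
    """High-SNR approximation (6): C ~ log2(snr)."""
    return math.log2(snr)

if __name__ == "__main__":
    for snr_db in (-20, 0, 20, 40):
        snr = 10 ** (snr_db / 10)
        print(f"{snr_db:>4} dB: C = {awgn_capacity(snr):.4f}, "
              f"low-SNR approx = {low_snr_approx(snr):.4f}, "
              f"high-SNR approx = {high_snr_approx(snr):.4f}")
    # Doubling the SNR (about +3 dB) roughly doubles C at low SNR ...
    print(awgn_capacity(2e-4) / awgn_capacity(1e-4))
    # ... and adds roughly 1 bps/Hz at high SNR.
    print(awgn_capacity(2e4) - awgn_capacity(1e4))
```

A factor of 2 in linear SNR is used as a stand-in for "3 dB" (strictly, 3 dB is a factor of about 1.995).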

Capacity of SISO Fading Channel

Consider the following SISO fading channel, defined similarly to the SISO AWGN channel (with the symbol index $n$ dropped):
$$y = hx + z \qquad (7)$$
where the channel $h \in \mathbb{C}$ is now a RV, which is constant during each transmission block but can change from one block to another (i.e., block fading); the instantaneous channel $h$ is assumed known at the receiver. Each transmission block consists of $N$ symbols.

The instantaneous mutual information (IMI) between $x$ and $y$ conditional on the channel state $h$ is given by
$$I(x;y|h) = \log_2\left(1 + \frac{|h|^2P}{\sigma_z^2}\right) \qquad (8)$$
Note that the IMI is a RV, whose PDF is determined by the distribution of $|h|^2$.

Two types of fading channel capacity are defined as follows:
- Ergodic capacity is defined as the statistical average of the IMI:
$$C_{\rm erg} = E\left[\log_2\left(1 + \frac{|h|^2P}{\sigma_z^2}\right)\right] \qquad (9)$$
where the expectation is taken over $|h|^2$.
- Outage capacity is defined as the information rate below which the IMI falls with a prescribed probability $q\%$:
$$\Pr\left(\log_2\left(1 + \frac{|h|^2P}{\sigma_z^2}\right) < C_{{\rm out},q\%}\right) = q\% \qquad (10)$$
where $C_{{\rm out},q\%}$ is called the $q\%$ outage capacity.

Ergodic capacity has two important applications:
- In the case where the instantaneous CSI on $h$ is unknown at the transmitter, the transmitter can employ a long code that spans many different fading states of the channel to achieve reliable communication, provided that the information rate of the code is less than $C_{\rm erg}$ of the fading channel. This technique is known as coded diversity. In this case, $N$ need not be large, since each codeword spans many blocks.
- In the CSIT-known case, the transmitter can use different codes for different channel fading states. As long as the code selected for each fading state has a rate smaller than the IMI, reliable communication is ensured. The maximum achievable average rate over all fading states is then given by $C_{\rm erg}$ of the fading channel. This technique is known as adaptive coding. In this case, $N$ needs to be sufficiently large.

According to Jensen's inequality (stated in the note below), we have
$$E\left[\log_2\left(1 + \frac{|h|^2P}{\sigma_z^2}\right)\right] \le \log_2\left(1 + \frac{E[|h|^2]P}{\sigma_z^2}\right) \qquad (11)$$
Thus, the ergodic capacity of a fading channel is no larger than that of an AWGN channel with a constant channel gain equal to the average channel gain of the fading channel.

Note (Jensen's inequality): for a concave function $f(x)$ where $x$ is a RV, it holds that $E[f(x)] \le f(E[x])$. A function $f(x)$ is concave iff for any two values $x_1$ and $x_2$ and any $\lambda \in [0,1]$, we have $\lambda f(x_1) + (1-\lambda) f(x_2) \le f(\lambda x_1 + (1-\lambda)x_2)$.

Outage capacity is usually applicable to data traffic with stringent delay requirements (e.g., voice, real-time video), where the information in each transmission block has a constant rate and needs to be decoded at the receiver with a block decoding error probability less than the prescribed outage probability target. In this case, $N$ needs to be sufficiently large to achieve reliable transmission in each block.

For a given outage probability target $q\%$, the effective capacity is defined as the maximum average rate of the transmitted information successfully decoded at the receiver:
$$C_{{\rm eff},q\%} = (1 - q\%)\,C_{{\rm out},q\%} \qquad (12)$$

The outage probability has different meanings in the CSIT-known and CSIT-unknown cases:
- In the CSIT-known case, the transmitter knows the instantaneous channel and thus the IMI. Thus, if an outage event is going to occur at the receiver, the transmitter knows it in advance; this is called transmitter-aware outage.
- In the CSIT-unknown case, an outage event is known only at the receiver but not at the transmitter, and is thus called receiver-aware outage.

Let $\gamma = |h|^2P/\sigma_z^2$ denote the instantaneous receiver SNR. Suppose that the transmitter employs an optimal Gaussian codebook with information rate equal to $C_{{\rm out},q\%}$. Then from (10) it follows that an outage event occurs iff
$$\gamma < 2^{C_{{\rm out},q\%}} - 1 \triangleq \gamma_{\min} \qquad (13)$$
However, practical modulation and coding schemes (MCS) cannot perform as well as the optimal Gaussian code. As a result, there is usually a positive SNR gap between the theoretical minimum SNR $\gamma_{\min}$ and the actually required operating SNR $\bar{\gamma}$, where $\bar{\gamma} > \gamma_{\min}$. Thus the outage probability is generally defined as
$$p_{\rm out} = \Pr(\gamma < \bar{\gamma}) \qquad (14)$$

SISO Fading Channel: Ergodic Capacity

Assume iid Rayleigh fading with $\sigma_h^2 = E[|h|^2] = 1$, and $\sigma_z^2 = 1$.

[Figure: AWGN channel capacity and fading channel ergodic capacity (bps/Hz) versus SNR (dB), for SNR from -10 dB to 20 dB.]
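The curves on this slide can be approximated by Monte Carlo simulation. The sketch below assumes, as on the slide, iid Rayleigh fading with $E[|h|^2] = 1$ and $\sigma_z^2 = 1$, so that $|h|^2$ is exponentially distributed with unit mean; the sample count, seed, and function names are arbitrary choices for illustration.

```python
import math
import random

def awgn_capacity(snr):
    """Capacity (4) of an AWGN channel whose gain equals E[|h|^2] = 1."""
    return math.log2(1.0 + snr)

def ergodic_capacity_rayleigh(snr, num_samples=100_000, seed=1):
    """Monte Carlo estimate of the ergodic capacity (9) under Rayleigh fading.

    With E[|h|^2] = 1, |h|^2 ~ Exp(1); snr = P / sigma_z^2 on a linear scale.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        gain = rng.expovariate(1.0)            # one fading state |h|^2
        total += math.log2(1.0 + gain * snr)   # IMI (8) for that state
    return total / num_samples

if __name__ == "__main__":
    for snr_db in range(-10, 25, 5):
        snr = 10 ** (snr_db / 10)
        # By Jensen's inequality (11), the ergodic value stays below the AWGN value.
        print(f"{snr_db:>4} dB: AWGN {awgn_capacity(snr):.3f}, "
              f"ergodic {ergodic_capacity_rayleigh(snr):.3f} bps/Hz")
```

Averaging over fading states, rather than plugging in the average gain, is exactly what separates the two curves.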

SISO Fading Channel: Outage Capacity

Assume iid Rayleigh fading with $\sigma_h^2 = E[|h|^2] = 1$, $\sigma_z^2 = 1$, and $P = 5$.

[Figure, two panels: outage probability versus outage capacity (bps/Hz); effective capacity (bps/Hz) versus outage capacity (bps/Hz), for outage capacity from 0 to 5 bps/Hz.]
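For Rayleigh fading the outage probability in (10) has a closed form, because $\Pr(|h|^2 < g) = 1 - e^{-g}$ when $|h|^2 \sim {\rm Exp}(1)$. The sketch below (hypothetical function names; $P = 5$ and $\sigma_z^2 = 1$ taken from the slide) evaluates the outage probability, inverts (10) to obtain $C_{{\rm out},q\%}$, and scans the effective capacity (12) over a grid of rates.

```python
import math

def outage_probability(rate, snr):
    """Pr(log2(1 + |h|^2 snr) < rate) under Rayleigh fading, |h|^2 ~ Exp(1)."""
    gain_threshold = (2.0 ** rate - 1.0) / snr   # smallest |h|^2 supporting `rate`
    return 1.0 - math.exp(-gain_threshold)       # CDF of Exp(1)

def outage_capacity(q, snr):
    """Inverse of (10): the rate whose outage probability equals q (0 <= q < 1)."""
    return math.log2(1.0 - snr * math.log(1.0 - q))

def effective_capacity(rate, snr):
    """(12): (1 - p_out) * rate, with p_out the outage probability at that rate."""
    return (1.0 - outage_probability(rate, snr)) * rate

if __name__ == "__main__":
    snr = 5.0                                    # P = 5, sigma_z^2 = 1 as on the slide
    rates = [k / 100 for k in range(0, 501)]     # 0 to 5 bps/Hz
    best = max(rates, key=lambda c: effective_capacity(c, snr))
    print(f"10% outage capacity: {outage_capacity(0.10, snr):.3f} bps/Hz")
    print(f"best rate {best:.2f} bps/Hz, effective capacity "
          f"{effective_capacity(best, snr):.3f} bps/Hz")
```

The grid search illustrates the trade-off in the lower panel: a higher target rate raises $C_{{\rm out},q\%}$ but also the outage probability, so the effective capacity peaks at an intermediate rate.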

Capacity of MIMO AWGN Channel

Consider the following $r \times t$ MIMO channel (with the symbol index $n$ dropped):
$$y = Hx + z \qquad (15)$$
- $H \in \mathbb{C}^{r\times t}$ is a constant matrix, which is known at the receiver
- Assume Gaussian signals, i.e., $x \sim \mathcal{CN}(0, S_x)$, where $S_x \triangleq E[xx^H]$ is the covariance matrix of the transmitted signal vector
  - If ${\rm Rank}(S_x) = 1$, it is called beamforming mode; if ${\rm Rank}(S_x) > 1$, it is called spatial multiplexing mode
  - ${\rm Tr}(S_x) = E[\|x\|^2] \le P$, where $P$ is the sum-power constraint at the transmitter
- $z \sim \mathcal{CN}(0, \sigma_z^2 I_r)$: spatially white Gaussian noise

The capacity of the MIMO AWGN channel for a given $S_x$ is equal to the mutual information between $x$ and $y$, which is given by
$$C = I(x;y) = h(y) - h(y|x) = h(y) - h(z) \qquad (16)$$
For a CSCG random vector $x \sim \mathcal{CN}(0, S_x)$, $h(x) = \log_2\det(\pi e S_x)$ bits. Note that $y \sim \mathcal{CN}(0, HS_xH^H + \sigma_z^2 I_r)$ and $z \sim \mathcal{CN}(0, \sigma_z^2 I_r)$. For square matrices $A$ and $B$ (with $B$ invertible), $\det(A)/\det(B) = \det(AB^{-1})$. Thus we have
$$C = \log_2\det\big(\pi e(HS_xH^H + \sigma_z^2 I_r)\big) - \log_2\det(\pi e \sigma_z^2 I_r) = \log_2\det\left(I_r + \frac{1}{\sigma_z^2}HS_xH^H\right) \qquad (17)$$
where the unit is bits/second/Hz (bps/Hz).

Next, we study the design of $S_x$ that maximizes the MIMO AWGN channel capacity in the CSIT-known and CSIT-unknown cases, respectively, as well as the corresponding MIMO transceiver designs.
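Formula (17) translates directly into code. The minimal NumPy sketch below (the function name is hypothetical) uses `numpy.linalg.slogdet` for numerical stability and includes a SISO sanity check against (4); the random channel in the usage part is only an illustrative choice.

```python
import numpy as np

def mimo_capacity(H, Sx, noise_var):
    """Mutual information (17): log2 det(I_r + H Sx H^H / sigma_z^2) in bps/Hz."""
    r = H.shape[0]
    M = np.eye(r) + H @ Sx @ H.conj().T / noise_var   # Hermitian positive definite
    _, logdet = np.linalg.slogdet(M)                   # natural-log determinant
    return float(logdet / np.log(2.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r, t, P = 3, 2, 10.0
    # One channel realization with iid CN(0, 1) entries (an illustrative choice).
    H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
    print(mimo_capacity(H, P / t * np.eye(t), 1.0))
    # SISO sanity check against (4): |h|^2 = 1, P = 3, sigma_z^2 = 1 gives 2 bps/Hz.
    print(mimo_capacity(np.array([[0.6 + 0.8j]]), np.array([[3.0]]), 1.0))
```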

MIMO AWGN Channel With CSIT

Assume that $H$ is perfectly known at the transmitter. We want to find the optimal $S_x$ that maximizes the MIMO channel capacity subject to ${\rm Tr}(S_x) \le P$.

First, let the truncated SVD of $H$ be denoted by
$$H = \tilde{U}\Lambda\tilde{V}^H \qquad (18)$$
where $\tilde{U} \in \mathbb{C}^{r\times m}$ with $m = {\rm Rank}(H)$ and $\tilde{U}^H\tilde{U} = I_m$; $\tilde{V} \in \mathbb{C}^{t\times m}$ with $\tilde{V}^H\tilde{V} = I_m$; and $\Lambda$ is an $m\times m$ strictly positive diagonal matrix whose diagonal elements (the singular values) are $\lambda_1 \ge \ldots \ge \lambda_m > 0$.

Then the MIMO AWGN channel capacity given in (17) is written as
$$C = \log_2\det\left(I_r + \frac{1}{\sigma_z^2}\tilde{U}\Lambda\tilde{V}^H S_x \tilde{V}\Lambda\tilde{U}^H\right) \qquad (19)$$

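Let $\tilde{S}_x = \tilde{V}^H S_x \tilde{V}$. Using the fact that $\det(I + AB) = \det(I + BA)$ (applied twice), we have
$$C = \log_2\det\left(I_m + \frac{1}{\sigma_z^2}\Lambda^2\tilde{S}_x\right) = \log_2\det\left(I_m + \frac{1}{\sigma_z^2}\Lambda\tilde{S}_x\Lambda\right) \qquad (20)$$
Hadamard's inequality states that the determinant of any positive semidefinite matrix $A$ is no larger than the product of its diagonal elements, i.e.,
$$\det(A) \le \prod_i [A]_{i,i} \qquad (21)$$
with equality iff $A$ is a diagonal matrix (for positive definite $A$). Applying (21) to the positive definite matrix $I_m + \frac{1}{\sigma_z^2}\Lambda\tilde{S}_x\Lambda$, whose $i$th diagonal element is $1 + \lambda_i^2[\tilde{S}_x]_{i,i}/\sigma_z^2$, we have
$$C \le \sum_{i=1}^m \log_2\left(1 + \frac{\lambda_i^2[\tilde{S}_x]_{i,i}}{\sigma_z^2}\right) \qquad (22)$$
with equality iff $\tilde{S}_x$ is a diagonal matrix, i.e.,
$$\tilde{S}_x = \tilde{V}^H S_x \tilde{V} = \Sigma \qquad (23)$$
where $\Sigma$ is an $m\times m$ non-negative diagonal matrix. We then conclude that the optimal transmit covariance that maximizes the MIMO channel capacity is of the form
$$S_x^{\rm opt} = \tilde{V}\Sigma\tilde{V}^H \qquad (24)$$
Let the diagonal elements of $\Sigma$ be denoted by $p_1,\ldots,p_m$. Then the resulting MIMO channel capacity is given by
$$C_{\rm opt} = \sum_{i=1}^m \log_2\left(1 + \frac{\lambda_i^2 p_i}{\sigma_z^2}\right) \qquad (25)$$
Consider now the power constraint:
$${\rm Tr}(S_x^{\rm opt}) \le P \qquad (26)$$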

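Since ${\rm Tr}(AB) = {\rm Tr}(BA)$, we have
$${\rm Tr}(S_x^{\rm opt}) = {\rm Tr}(\tilde{V}\Sigma\tilde{V}^H) = {\rm Tr}(\Sigma\tilde{V}^H\tilde{V}) = {\rm Tr}(\Sigma) = \sum_{i=1}^m p_i \le P \qquad (27)$$
One method to set the $p_i$'s is the equal-power (EP) allocation:
$$p_i = \frac{P}{m}, \quad i = 1,\ldots,m \qquad (28)$$
The resulting capacity is
$$C_{\rm EP} = \sum_{i=1}^m \log_2\left(1 + \frac{\lambda_i^2 P}{\sigma_z^2 m}\right) \qquad (29)$$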
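The eigenmode capacity (25), the EP allocation (28)-(29), and the covariance form (24) can be computed from the SVD of $H$. The sketch below is a minimal NumPy illustration with hypothetical function names; it takes $m$ as the numerical rank reported by `numpy.linalg.matrix_rank`, which is an assumption about how "rank" is decided in floating point.

```python
import numpy as np

def nonzero_singular_values(H):
    """lambda_1 >= ... >= lambda_m, the m = Rank(H) nonzero singular values of H."""
    s = np.linalg.svd(H, compute_uv=False)      # sorted in descending order
    return s[: np.linalg.matrix_rank(H)]

def eigenmode_capacity(H, powers, noise_var):
    """(25): sum_i log2(1 + lambda_i^2 p_i / sigma_z^2)."""
    lam = nonzero_singular_values(H)
    return float(np.sum(np.log2(1.0 + lam**2 * np.asarray(powers) / noise_var)))

def equal_power_capacity(H, P, noise_var):
    """(28)-(29): p_i = P / m on each of the m eigenmodes."""
    m = np.linalg.matrix_rank(H)
    return eigenmode_capacity(H, np.full(m, P / m), noise_var)

def eigenmode_covariance(H, powers):
    """(24): S_x = V~ Sigma V~^H, with V~ the first m right singular vectors of H."""
    m = len(powers)
    _, _, Vh = np.linalg.svd(H)
    V_tilde = Vh.conj().T[:, :m]
    return V_tilde @ np.diag(powers) @ V_tilde.conj().T

if __name__ == "__main__":
    # Example: the 2 x 2 channel H = [[1, 0.5], [0.5, 1]], P = 4, sigma_z^2 = 1.
    H = np.array([[1.0, 0.5], [0.5, 1.0]])
    print(equal_power_capacity(H, 4.0, 1.0))    # log2(5.5 * 1.5), about 3.044
    Sx = eigenmode_covariance(H, [2.0, 2.0])
    print(np.trace(Sx))                         # equals P = 4 up to rounding, as in (27)
```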

Eigenmode Transmission

According to $S_x^{\rm opt}$, we can design the transmitted signal vector as
$$x = \tilde{V}\Sigma^{1/2}s \qquad (30)$$
where
- $\tilde{V}$ is called the precoding matrix
- $\Sigma$ is called the power allocation matrix
- $s \in \mathbb{C}^{m\times 1}$ is the information signal vector, $s \sim \mathcal{CN}(0, I_m)$

Check the power constraint:
$$E[\|x\|^2] = E[x^Hx] = E[s^H\Sigma^{1/2}\tilde{V}^H\tilde{V}\Sigma^{1/2}s] = E[s^H\Sigma s] = \sum_{i=1}^m p_i$$

Substituting this form of $x$ into the MIMO channel (15) yields
$$y = H\tilde{V}\Sigma^{1/2}s + z = \tilde{U}\Lambda\tilde{V}^H\tilde{V}\Sigma^{1/2}s + z = \tilde{U}\Lambda\Sigma^{1/2}s + z \qquad (31)$$
Suppose that the receiver multiplies the received signal vector by the decoding matrix $\tilde{U}^H$. We thus have
$$\tilde{y} = \tilde{U}^H y = \tilde{U}^H\tilde{U}\Lambda\Sigma^{1/2}s + \tilde{U}^H z = \Lambda\Sigma^{1/2}s + \tilde{z} \qquad (32)$$
where $\tilde{z} = \tilde{U}^H z \sim \mathcal{CN}(0, \sigma_z^2 I_m)$.

Let $\tilde{y} = [\tilde{y}_1,\ldots,\tilde{y}_m]^T$, $s = [s_1,\ldots,s_m]^T$, and $\tilde{z} = [\tilde{z}_1,\ldots,\tilde{z}_m]^T$. Since $\Lambda\Sigma^{1/2}$ is a diagonal matrix, we see from (32) that the MIMO channel has been decomposed into a set of $m$ parallel SISO channels represented by
$$\tilde{y}_i = \lambda_i\sqrt{p_i}\,s_i + \tilde{z}_i, \quad i = 1,\ldots,m \qquad (33)$$
The receiver SNR of the $i$th decomposed SISO channel is given by
$$\gamma_i = \frac{\lambda_i^2 p_i}{\sigma_z^2} \qquad (34)$$
and its capacity is given by
$$C_i = \log_2(1+\gamma_i) = \log_2\left(1 + \frac{\lambda_i^2p_i}{\sigma_z^2}\right) \qquad (35)$$
Then the sum capacity over the $m$ SISO channels is given by
$$\sum_{i=1}^m C_i = \sum_{i=1}^m\log_2\left(1 + \frac{\lambda_i^2p_i}{\sigma_z^2}\right) = C_{\rm opt} \qquad (36)$$
Thus, the joint deployment of the linear precoder $\tilde{V}$ and the linear decoder $\tilde{U}^H$ achieves the MIMO AWGN channel capacity in the case of known CSIT. The transceiver design based on the SVD of the MIMO channel matrix is usually called eigenmode transmission.
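The decomposition (31)-(33) can be verified numerically: with precoder $\tilde{V}\Sigma^{1/2}$ and decoder $\tilde{U}^H$, the end-to-end matrix should be the diagonal matrix $\Lambda\Sigma^{1/2}$. The NumPy sketch below is illustrative only; the function name, the random channel, and the power values are assumptions.

```python
import numpy as np

def eigenmode_transceiver(H, powers):
    """Precoder (30) and decoder (32) of eigenmode transmission.

    Returns (F, G, lam): F = V~ Sigma^{1/2}, G = U~^H, and the singular values
    lambda_1..lambda_m, where m = len(powers) <= Rank(H).
    """
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    m = len(powers)
    F = Vh.conj().T[:, :m] @ np.diag(np.sqrt(powers))   # x = F s
    G = U[:, :m].conj().T                                # y~ = G y
    return F, G, s[:m]

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    r, t = 4, 3
    H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
    powers = np.array([1.0, 0.5, 0.25])
    F, G, lam = eigenmode_transceiver(H, powers)
    # The effective channel G H F should be diagonal, Lambda Sigma^{1/2}, as in (32)-(33).
    print(np.round(G @ H @ F, 6))
    print(lam * np.sqrt(powers))
```

Because $G$ has orthonormal rows, the noise after decoding stays white, which is why each decoded branch behaves as an independent SISO AWGN channel.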

Water-Filling Power Allocation

We can further optimize the $p_i$'s subject to $\sum_{i=1}^m p_i \le P$ to maximize $C_{\rm opt}$ given in (25). This yields the following optimization problem:
$$\begin{aligned} \max_{\{p_i\}} \quad & \sum_{i=1}^m \log_2\left(1 + \frac{\lambda_i^2 p_i}{\sigma_z^2}\right) \\ \text{s.t.} \quad & \sum_{i=1}^m p_i \le P \\ & p_i \ge 0, \quad i = 1,\ldots,m \end{aligned} \qquad (37)$$
Applying the Lagrange multiplier method to solve this problem, the cost function is written as
$$J(p_1,\ldots,p_m) = \sum_{i=1}^m \log_2\left(1 + \frac{\lambda_i^2 p_i}{\sigma_z^2}\right) - \upsilon\sum_{i=1}^m p_i \qquad (38)$$
where $\upsilon \ge 0$ is the Lagrange multiplier associated with the sum-power constraint. The cost function is maximized when the derivatives of $J(\cdot)$ with respect to $p_i$, $i = 1,\ldots,m$, are all equal to zero. Thus we have
$$\frac{\lambda_i^2/\sigma_z^2}{1 + \lambda_i^2p_i/\sigma_z^2} = \upsilon\log 2, \quad i = 1,\ldots,m \qquad (39)$$
Let $\mu = \frac{1}{\upsilon\log 2}$. We have
$$p_i = \mu - \frac{\sigma_z^2}{\lambda_i^2}, \quad i = 1,\ldots,m \qquad (40)$$

Furthermore, using the fact that $p_i \ge 0$ yields the optimal power allocation as
$$p_i = \left(\mu - \frac{\sigma_z^2}{\lambda_i^2}\right)^+, \quad i = 1,\ldots,m \qquad (41)$$
where $(x)^+ \triangleq \max(0,x)$. The constant $\mu$ can be determined from the equality
$$\sum_{i=1}^m\left(\mu - \frac{\sigma_z^2}{\lambda_i^2}\right)^+ = P \qquad (42)$$
The above power allocation rule is called water-filling (WF), where
- $\mu$: the fixed water level for all sub-channels
- $\sigma_z^2/\lambda_i^2$: the noise power of the $i$th sub-channel, normalized by the sub-channel power gain
- $p_i$: the water poured (power allocated) into the $i$th sub-channel

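The capacity of the MIMO AWGN channel with WF power allocation is
$$C_{\rm WF} = \sum_{i=1}^m\log_2\left(1 + \frac{\lambda_i^2}{\sigma_z^2}\left(\mu - \frac{\sigma_z^2}{\lambda_i^2}\right)^+\right) = \sum_{i=1}^m\left(\log_2\frac{\mu\lambda_i^2}{\sigma_z^2}\right)^+ \qquad (43)$$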
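One common way to solve (42) for the water level $\mu$ is to sort the sub-channels from strongest to weakest and try $k = m, m-1, \ldots$ active sub-channels until the weakest active one receives positive power. The NumPy sketch below implements that search (function names are hypothetical); the example gains $\lambda_i^2 = 2.25, 0.25$ are the squared singular values 1.5 and 0.5 of $H = [1\ 0.5;\ 0.5\ 1]$.

```python
import numpy as np

def water_filling(gains, P, noise_var):
    """Water-filling (41)-(42) over parallel sub-channels.

    gains: sub-channel power gains lambda_i^2 (all > 0).
    Returns (powers, mu), with powers p_i = (mu - sigma_z^2 / lambda_i^2)^+.
    """
    gains = np.asarray(gains, dtype=float)
    floors = noise_var / gains                  # normalized noise levels sigma_z^2 / lambda_i^2
    sorted_floors = np.sort(floors)             # strongest sub-channel (lowest floor) first
    for k in range(len(gains), 0, -1):
        mu = (P + sorted_floors[:k].sum()) / k  # water level if the k best sub-channels are active
        if mu > sorted_floors[k - 1]:           # the weakest active sub-channel still gets p > 0
            break
    powers = np.maximum(mu - floors, 0.0)
    return powers, mu

def wf_capacity(gains, P, noise_var):
    """C_WF in (43)."""
    powers, _ = water_filling(gains, P, noise_var)
    return float(np.sum(np.log2(1.0 + np.asarray(gains) * powers / noise_var)))

if __name__ == "__main__":
    gains = [2.25, 0.25]
    for P in (2.0, 4.0):
        powers, mu = water_filling(gains, P, 1.0)
        print(P, powers, mu, wf_capacity(gains, P, 1.0))
```

At $P = 2$ only the stronger sub-channel is filled, while at $P = 4$ the water level rises above the weaker sub-channel's floor $\sigma_z^2/\lambda_2^2 = 4$ and both receive power.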

Optimality of Beamforming

First, consider the $r\times 1$ SIMO AWGN channel:
$$y = hx + z \qquad (44)$$
The (truncated) SVD of $h$ is
$$h = \frac{h}{\|h\|}\,\|h\|\cdot 1 \qquad (45)$$
Thus the optimal precoder is trivially 1, and the optimal decoder is
$$u^H = \frac{h^H}{\|h\|} \qquad (46)$$
This is the same as MRC receive beamforming.

The receiver SNR after applying MRC beamforming is
$$\gamma_{\rm SIMO} = \frac{\|h\|^2P}{\sigma_z^2} \qquad (47)$$
The SIMO AWGN channel capacity is
$$C_{\rm SIMO} = \log_2\det\left(I_r + \frac{1}{\sigma_z^2}hPh^H\right) = \log_2\left(1 + \frac{P}{\sigma_z^2}h^Hh\right) = \log_2\left(1 + \frac{\|h\|^2P}{\sigma_z^2}\right) = \log_2(1+\gamma_{\rm SIMO}) \qquad (48)$$
Thus, MRC receive beamforming achieves the SIMO AWGN channel capacity.

Next, consider the following $1\times t$ MISO channel:
$$y = h^Tx + z \qquad (49)$$

The (truncated) SVD of $h^T$ is
$$h^T = 1\cdot\|h\|\cdot\frac{h^T}{\|h\|} \qquad (50)$$
Thus the optimal decoder is trivially 1, and the optimal precoder is
$$v = \frac{h^*}{\|h\|} \qquad (51)$$
This is the same as P-MRC transmit beamforming (without the power scaling $\sqrt{P}$). The optimal transmit covariance matrix is
$$S_x^{\rm opt} = vPv^H = \frac{P}{\|h\|^2}h^*h^T \qquad (52)$$
Note that ${\rm Rank}(S_x^{\rm opt}) = 1$. Thus beamforming mode is optimal.

The SNR of the MISO channel after applying P-MRC transmit beamforming is
$$\gamma_{\rm MISO} = \frac{\|h\|^2P}{\sigma_z^2}$$
The MISO AWGN channel capacity is
$$C_{\rm MISO} = \log_2\left(1 + \frac{1}{\sigma_z^2}h^TS_x^{\rm opt}h^*\right) = \log_2\left(1 + \frac{P}{\sigma_z^2}\frac{\|h\|^4}{\|h\|^2}\right) = \log_2\left(1 + \frac{\|h\|^2P}{\sigma_z^2}\right) \qquad (53)$$
$$= \log_2(1+\gamma_{\rm MISO}) \qquad (54)$$
Thus, P-MRC transmit beamforming achieves the MISO AWGN channel capacity.

Last, consider the strongest eigenmode beamforming (SEB) scheme for the MIMO AWGN channel:
$$y = Hx + z \qquad (55)$$

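Recall that the precoder for this scheme is given by
$$w_t^{\rm opt} = \sqrt{P}v_1 \qquad (56)$$
where $v_1$ is the first column of $V$ (recall that the SVD of $H$ is $H = U\Lambda V^H$). Thus the transmit covariance matrix for SEB is
$$S_x^{\rm SEB} = Pv_1v_1^H \qquad (57)$$
Note that ${\rm Rank}(S_x^{\rm SEB}) = 1$; thus beamforming mode is used. With this transmit covariance, the maximum achievable rate is
$$R_{\rm SEB} = \log_2\det\left(I_r + \frac{P}{\sigma_z^2}Hv_1v_1^HH^H\right) = \log_2\left(1 + \frac{P}{\sigma_z^2}\|Hv_1\|^2\right) = \log_2\left(1 + \frac{P}{\sigma_z^2}\|u_1\lambda_1\|^2\right) = \log_2\left(1 + \frac{\lambda_1^2P}{\sigma_z^2}\right) \qquad (58)$$
where $u_1$ is the first column of $U$.

Applying the receive decoder $w_r^{\rm opt} = u_1$ yields the receiver SNR
$$\gamma_{\rm SEB} = \frac{|u_1^HHv_1|^2P}{E[|u_1^Hz|^2]} = \frac{\lambda_1^2P}{\sigma_z^2} \qquad (59)$$
Then it follows that
$$R_{\rm SEB} = \log_2(1+\gamma_{\rm SEB}) = \log_2\left(1 + \frac{\lambda_1^2P}{\sigma_z^2}\right) \qquad (60)$$
Thus, SEB achieves the maximum rate over the MIMO AWGN channel among all rank-one transmit covariance matrices (i.e., when beamforming mode is used).

However, in general, the capacity of the MIMO AWGN channel is achieved by WF eigenmode transmission, for which ${\rm Rank}(S_x^{\rm opt})$ can exceed one (spatial multiplexing mode):
$$C_{\rm WF} = \sum_{i=1}^m\log_2\left(1 + \frac{\lambda_i^2}{\sigma_z^2}\left(\mu - \frac{\sigma_z^2}{\lambda_i^2}\right)^+\right) \qquad (61)$$
Clearly, $R_{\rm SEB}$ is equal to $C_{\rm WF}$ iff
$$\mu - \frac{\sigma_z^2}{\lambda_1^2} = P \qquad (62)$$
$$\mu - \frac{\sigma_z^2}{\lambda_i^2} \le 0, \quad i = 2,\ldots,m \qquad (63)$$
Since $\lambda_2 \ge \lambda_3 \ge \ldots \ge \lambda_m$, the conditions in (62) and (63) are satisfied iff
$$\left(P + \frac{\sigma_z^2}{\lambda_1^2}\right) - \frac{\sigma_z^2}{\lambda_2^2} \le 0 \iff P \le \frac{\sigma_z^2}{\lambda_2^2} - \frac{\sigma_z^2}{\lambda_1^2} \qquad (64)$$
Thus SEB is capacity-optimal only when $P$ is sufficiently small.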
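Condition (64) can be checked numerically by comparing $R_{\rm SEB}$ in (60) with $C_{\rm WF}$ in (61). The sketch below is a minimal NumPy illustration with hypothetical function names; it uses the example channel $H = [1\ 0.5;\ 0.5\ 1]$ with $\sigma_z^2 = 1$, whose singular values are 1.5 and 0.5, so the threshold in (64) is $4 - 1/2.25 \approx 3.56$.

```python
import numpy as np

def wf_capacity(lam, P, noise_var):
    """C_WF in (61) for singular values lam (descending), via water-filling (41)-(42)."""
    floors = noise_var / np.asarray(lam, dtype=float) ** 2   # increasing, since lam descends
    for k in range(len(floors), 0, -1):
        mu = (P + floors[:k].sum()) / k       # water level with k active eigenmodes
        if mu > floors[k - 1]:
            break
    powers = np.maximum(mu - floors, 0.0)
    return float(np.sum(np.log2(1.0 + powers / floors)))

def seb_rate(lam, P, noise_var):
    """(60): rate of strongest eigenmode beamforming."""
    return float(np.log2(1.0 + lam[0] ** 2 * P / noise_var))

def seb_threshold(lam, noise_var):
    """(64): SEB is capacity optimal iff P <= sigma^2/lam_2^2 - sigma^2/lam_1^2."""
    if len(lam) < 2:
        return np.inf            # rank-one channel: beamforming is always optimal
    return noise_var / lam[1] ** 2 - noise_var / lam[0] ** 2

if __name__ == "__main__":
    lam = np.linalg.svd(np.array([[1.0, 0.5], [0.5, 1.0]]), compute_uv=False)
    print("threshold on P:", seb_threshold(lam, 1.0))
    for P in (1.0, 3.0, 5.0, 20.0):
        print(P, seb_rate(lam, P, 1.0), wf_capacity(lam, P, 1.0))
```

Below the threshold the two rates coincide; above it, water-filling starts pouring power into the second eigenmode and strictly outperforms SEB.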

Asymptotic Capacity Analysis

We have shown that in the asymptotically low-power regime, i.e., $P \to 0$, the MIMO AWGN channel capacity is achieved by beamforming mode and behaves as
$$C_{\rm MIMO}(P\to 0) = \log_2\left(1 + \frac{\lambda_1^2P}{\sigma_z^2}\right) \approx \frac{\lambda_1^2P}{(\log 2)\sigma_z^2} \qquad (65)$$
Recall that the SISO AWGN channel capacity in the asymptotically low-power regime behaves as
$$C_{\rm SISO}(P\to 0) \approx \frac{|h|^2P}{(\log 2)\sigma_z^2} \qquad (66)$$
Thus the capacity gain of the MIMO system over the SISO system in the asymptotically low-power regime is given by
$$\frac{C_{\rm MIMO}(P\to 0)}{C_{\rm SISO}(P\to 0)} = \frac{\lambda_1^2}{|h|^2} \qquad (67)$$
which is achievable by strongest eigenmode beamforming (SEB).

On the other hand, consider the asymptotically high-power regime, i.e., $P\to\infty$. The MIMO AWGN channel capacity is achieved by spatial multiplexing mode and behaves as (lower-bounded by the EP allocation)
$$C_{\rm MIMO}(P\to\infty) \ge \sum_{i=1}^m\log_2\left(1 + \frac{\lambda_i^2P}{\sigma_z^2m}\right) \approx c + m\log_2 P \qquad (68)$$
where $c = \sum_{i=1}^m\log_2\frac{\lambda_i^2}{\sigma_z^2m}$ is a constant independent of $P$.

Notice that the MIMO AWGN channel capacity with the optimal WF power allocation is upper-bounded by (assuming each sub-channel is allocated the full power $P$)
$$C_{\rm MIMO}(P\to\infty) \le \sum_{i=1}^m\log_2\left(1 + \frac{\lambda_i^2P}{\sigma_z^2}\right) \approx \bar{c} + m\log_2 P \qquad (69)$$
where $\bar{c} = \sum_{i=1}^m\log_2\frac{\lambda_i^2}{\sigma_z^2}$ is also a constant independent of $P$.

With these upper and lower capacity bounds, the asymptotic ratio between the MIMO channel capacity and the base-2 logarithm of the transmit power as the power goes to infinity, the so-called spatial multiplexing gain, is given by
$$\lim_{P\to\infty}\frac{C_{\rm MIMO}}{\log_2 P} = m \qquad (70)$$
Note that for very large values of $P$, the MIMO channel capacity (using either EP or WF power allocation) increases by $m$ bps/Hz for every 3 dB increase in $P$.

Since for the SISO system the channel capacity as $P\to\infty$ behaves as
$$C_{\rm SISO}(P\to\infty) \approx \hat{c} + \log_2 P \qquad (71)$$
where $\hat{c} = \log_2\frac{|h|^2}{\sigma_z^2}$, it follows that
$$\lim_{P\to\infty}\frac{C_{\rm MIMO}(P)}{C_{\rm SISO}(P)} = m \qquad (72)$$
Note that the spatial multiplexing gain is equal to $m$, the rank of the MIMO channel matrix, and is not necessarily equal to $\min(t,r)$ unless the channel matrix is full-rank, which occurs with probability one for the iid Rayleigh fading MIMO channel. Thus, independent channel fading is an advantageous factor for maximizing the spatial multiplexing gain of the MIMO channel.
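The spatial multiplexing gain (70) can be estimated as the slope of the capacity against $\log_2 P$ at high power. The NumPy sketch below uses the EP capacity (29) as a proxy (a simplification, justified by the bounds (68)-(69)); the power points $10^8$ and $10^{10}$ and the function names are arbitrary assumptions. A full-rank channel should give a slope near $m = 2$ and a rank-one channel a slope near 1.

```python
import numpy as np

def ep_capacity(H, P, noise_var=1.0):
    """(29): equal-power eigenmode capacity over the m = Rank(H) nonzero eigenmodes."""
    m = np.linalg.matrix_rank(H)
    lam = np.linalg.svd(H, compute_uv=False)[:m]
    return float(np.sum(np.log2(1.0 + lam**2 * P / (noise_var * m))))

def multiplexing_slope(H, P_low=1e8, P_high=1e10):
    """Finite-difference estimate of the ratio in (70): dC / d(log2 P) at high power."""
    return (ep_capacity(H, P_high) - ep_capacity(H, P_low)) / np.log2(P_high / P_low)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    H_iid = (rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))) / np.sqrt(2)
    for name, H in [("2x2 full rank", np.array([[1.0, 0.5], [0.5, 1.0]])),
                    ("2x2 rank one", np.ones((2, 2))),
                    ("3x2 iid Rayleigh", H_iid)]:
        print(name, np.linalg.matrix_rank(H), round(multiplexing_slope(H), 4))
```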

MIMO AWGN Channel Capacity With CSIT

Assume $t = r = 2$, $H = \begin{bmatrix} 1 & \rho \\ \rho & 1\end{bmatrix}$ with $\rho = 0.5$, and $\sigma_z = 1$.

[Figure: achievable rate (bps/Hz) versus SNR (dB), from -10 dB to 20 dB, for: SISO capacity; MISO/SIMO capacity; MIMO strongest eigenmode beamforming (SEB); MIMO eigenmode transmission with EP; MIMO capacity (eigenmode transmission with WF).]

MIMO AWGN Channel Without CSIT

For the CSIT-known case, we have shown that linear precoders and decoders based on the MIMO channel SVD (eigenmode transmission), together with WF power allocation, achieve the MIMO channel capacity. Furthermore, the MIMO channel is decomposed into parallel SISO AWGN channels, which greatly simplifies the transceiver design.

Now consider the case where the channel $H$ is known only at the receiver, but is unknown at the transmitter. Assume that the transmitter employs isotropic transmission with the white signal covariance matrix given by
$$S_x^{(w)} = \frac{P}{t}I_t \qquad (73)$$
Note that ${\rm Tr}(S_x^{(w)}) = P$.

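With the white transmit covariance, the capacity of the MIMO AWGN channel is given by
$$\begin{aligned} C^{(w)} &= \log_2\det\left(I_r + \frac{1}{\sigma_z^2}HS_x^{(w)}H^H\right) = \log_2\det\left(I_r + \frac{P}{\sigma_z^2t}HH^H\right) \\ &= \log_2\det\left(I_r + \frac{P}{\sigma_z^2t}\tilde{U}\Lambda\tilde{V}^H\tilde{V}\Lambda\tilde{U}^H\right) = \log_2\det\left(I_m + \frac{P}{\sigma_z^2t}\Lambda^2\right) \\ &= \sum_{i=1}^m\log_2\left(1 + \frac{\lambda_i^2P}{\sigma_z^2t}\right) \end{aligned} \qquad (74)$$
Note that $C^{(w)}$ is identical to $C_{\rm EP}$ in the case of known CSIT with equal-power allocation if $t = m$, i.e., $t \le r$ and $H$ is full-rank.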
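The remark after (74) can be illustrated numerically: for a tall full-rank channel ($t = m$) the white-covariance capacity equals $C_{\rm EP}$, while for a wide channel ($t > m$) part of the isotropic power falls into the null space of $H$ and $C^{(w)} < C_{\rm EP}$. The NumPy sketch below uses hypothetical function names and randomly drawn channels chosen only for illustration.

```python
import numpy as np

def white_capacity(H, P, noise_var):
    """(74): log2 det(I_r + P/(sigma^2 t) H H^H), evaluated via the singular values."""
    t = H.shape[1]
    lam = np.linalg.svd(H, compute_uv=False)   # zero singular values contribute log2(1) = 0
    return float(np.sum(np.log2(1.0 + lam**2 * P / (noise_var * t))))

def ep_capacity(H, P, noise_var):
    """(29): equal power P/m over the m = Rank(H) eigenmodes (requires CSIT)."""
    m = np.linalg.matrix_rank(H)
    lam = np.linalg.svd(H, compute_uv=False)[:m]
    return float(np.sum(np.log2(1.0 + lam**2 * P / (noise_var * m))))

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    cn = lambda shape: (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    H_tall, H_wide = cn((4, 2)), cn((2, 4))    # t = m = 2 versus t = 4 > m = 2
    for name, H in (("4x2", H_tall), ("2x4", H_wide)):
        print(name, white_capacity(H, 10.0, 1.0), ep_capacity(H, 10.0, 1.0))
```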

It follows from (74) that the spatial multiplexing gain of the MIMO AWGN channel with the white transmit covariance is $m$, the rank of the MIMO channel matrix, as in the CSIT-known case. However, we will soon see that the transceiver design needed to achieve $C^{(w)}$ in the CSIT-unknown case is in general far more complex than eigenmode transmission in the CSIT-known case.

Before that, let us take a look at the special case with $t = 2$ and $r = 1$. Recall that in Lecture III we introduced the Alamouti code, whose transmit covariance matrix can be shown to be $S_x^{(w)} = \frac{P}{2}I_2$. In this case, from (74) we see that the MISO AWGN channel capacity with the white transmit covariance is given by
$$C^{(w)} = \log_2\left(1 + \frac{\|h\|^2P}{2\sigma_z^2}\right) = \log_2(1+\gamma_{\rm AC}) \qquad (75)$$
where $\gamma_{\rm AC} = \frac{\|h\|^2P}{2\sigma_z^2}$ is the SNR after decoding the Alamouti code. Thus the Alamouti code achieves the MISO AWGN channel capacity with the white transmit covariance for $t = 2$.

In Lecture III, we also introduced a heuristic scheme (which achieves the same diversity order of 2 as the Alamouti code) using alternate transmission and repetition coding over two transmit antennas. The maximum achievable rate of this scheme is given by
$$R = \frac{1}{2}\log_2\left(1 + \frac{\|h\|^2P}{\sigma_z^2}\right) \qquad (76)$$
where the factor $\frac{1}{2}$ in front of the log function is due to repetition coding. Since $\log\left(1 + \frac{x}{2}\right) \ge \frac{1}{2}\log(1+x)$ for $x \ge 0$ (because $(1+x/2)^2 = 1 + x + x^2/4 \ge 1+x$), we can show that for any $P \ge 0$
$$R \le C^{(w)} \qquad (77)$$

As $P\to\infty$, we have
$$\lim_{P\to\infty}\frac{R}{C^{(w)}} = \lim_{P\to\infty}\frac{\frac{1}{2}\log_2 P + \frac{1}{2}\log_2\frac{\|h\|^2}{\sigma_z^2}}{\log_2 P + \log_2\frac{\|h\|^2}{2\sigma_z^2}} = \frac{1}{2} \qquad (78)$$
Thus, the achievable rate of the heuristic scheme is half of that of the Alamouti code (the channel capacity) in the high-power regime, due to repetition coding.

On the other hand, as $P\to 0$, we have
$$\lim_{P\to 0}\frac{R}{C^{(w)}} = \lim_{P\to 0}\frac{\frac{\|h\|^2P}{(2\log 2)\sigma_z^2}}{\frac{\|h\|^2P}{(2\log 2)\sigma_z^2}} = 1 \qquad (79)$$
Thus, the achievable rate of the heuristic scheme is equal to that of the Alamouti code (the channel capacity) in the low-power regime.
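The limits (78) and (79) can be observed numerically. The sketch below takes $\|h\|^2$ as a scalar input and evaluates (75) and (76) directly; the function names and the chosen power points are illustrative assumptions. Note that the high-power ratio approaches $1/2$ only logarithmically slowly.

```python
import math

def alamouti_capacity(h_norm2, P, noise_var):
    """(75): C^(w) = log2(1 + ||h||^2 P / (2 sigma_z^2)) for t = 2, r = 1."""
    return math.log2(1.0 + h_norm2 * P / (2.0 * noise_var))

def repetition_rate(h_norm2, P, noise_var):
    """(76): R = 0.5 log2(1 + ||h||^2 P / sigma_z^2) for alternate transmission + repetition."""
    return 0.5 * math.log2(1.0 + h_norm2 * P / noise_var)

if __name__ == "__main__":
    for P in (1e-6, 1e-2, 1.0, 1e2, 1e6, 1e12):
        R, C = repetition_rate(1.0, P, 1.0), alamouti_capacity(1.0, P, 1.0)
        print(f"P = {P:8.0e}: R = {R:.6g}, C = {C:.6g}, R/C = {R / C:.4f}")
```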

Next, consider the case of $t = 2$ and $r \ge 2$. In Lecture III, we introduced the joint transmit diversity (using the Alamouti code) and receive beamforming (using MRC) scheme to achieve the full diversity order $tr$ for the iid Rayleigh fading MIMO channel. Assume that $H$ is full-rank and thus $m = \min(t,r) = 2$. The maximum achievable rate of this scheme is given by
$$R = \log_2\left(1 + \frac{\Omega_{\rm sum}P}{2\sigma_z^2}\right) = \log_2\left(1 + \frac{(\lambda_1^2+\lambda_2^2)P}{2\sigma_z^2}\right) \qquad (80)$$
The capacity of the MIMO channel with $S_x^{(w)} = \frac{P}{2}I_2$ is given by
$$C^{(w)} = \log_2\left(1 + \frac{\lambda_1^2P}{2\sigma_z^2}\right) + \log_2\left(1 + \frac{\lambda_2^2P}{2\sigma_z^2}\right) \qquad (81)$$

As $P\to\infty$, we have
$$\lim_{P\to\infty}\frac{R}{C^{(w)}} = \lim_{P\to\infty}\frac{\log_2 P + \log_2\frac{\lambda_1^2+\lambda_2^2}{2\sigma_z^2}}{2\log_2 P + \sum_{i=1}^2\log_2\frac{\lambda_i^2}{2\sigma_z^2}} = \frac{1}{2} \qquad (82)$$
Thus, the achievable rate of this scheme is half of the channel capacity in the high-power regime, despite having the same $S_x^{(w)}$. This is because the Alamouti code on average transmits only one data stream over the two transmit antennas.

As a counterpart, as $P\to 0$, we have
$$\lim_{P\to 0}\frac{R}{C^{(w)}} = \lim_{P\to 0}\frac{\frac{(\lambda_1^2+\lambda_2^2)P}{(2\log 2)\sigma_z^2}}{\frac{(\lambda_1^2+\lambda_2^2)P}{(2\log 2)\sigma_z^2}} = 1 \qquad (83)$$
Thus, the achievable rate of this scheme is equal to the channel capacity in the low-power regime.
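The comparison between (80) and (81) can be reproduced for a concrete channel. The sketch below assumes, consistently with (80), that $\Omega_{\rm sum} = \|H\|_F^2 = \lambda_1^2 + \lambda_2^2$; the example channel $H = [1\ 0.5;\ 0.5\ 1]$, the power points, and the function names are illustrative choices.

```python
import numpy as np

def alamouti_mrc_rate(H, P, noise_var):
    """(80): log2(1 + ||H||_F^2 P / (2 sigma_z^2)) for t = 2 (Alamouti + MRC)."""
    omega_sum = np.linalg.norm(H, "fro") ** 2        # = lambda_1^2 + lambda_2^2
    return float(np.log2(1.0 + omega_sum * P / (2.0 * noise_var)))

def white_capacity_t2(H, P, noise_var):
    """(81): capacity with S_x = (P/2) I_2."""
    lam = np.linalg.svd(H, compute_uv=False)
    return float(np.sum(np.log2(1.0 + lam**2 * P / (2.0 * noise_var))))

if __name__ == "__main__":
    H = np.array([[1.0, 0.5], [0.5, 1.0]])
    for P in (1e-6, 1.0, 1e3, 1e12):
        R, C = alamouti_mrc_rate(H, P, 1.0), white_capacity_t2(H, P, 1.0)
        print(f"P = {P:8.0e}: R/C = {R / C:.4f}")
```

The inequality $R \le C^{(w)}$ holds for all $P$ because $(1+a)(1+b) \ge 1 + a + b$ for $a, b \ge 0$.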

Horizontal Encoding

For general MIMO system configurations, achieving or approaching the channel capacity $C^{(w)}$ with the white transmit covariance by practical transceivers is a challenging task in the CSIT-unknown case.

A practical transmitter design for spatial multiplexing is horizontal encoding, where the transmitted signal vector is given by
$$x = \sqrt{\frac{P}{t}}\,[s_1,\ldots,s_t]^T \qquad (84)$$
where $s_1,\ldots,s_t$ correspond to independently encoded data streams with information rates $R_1,\ldots,R_t$, and $E[|s_i|^2] = 1$, $i = 1,\ldots,t$. The transmission sum-rate is then $R_{\rm sum} = \sum_{i=1}^t R_i$.

The above spatial multiplexing scheme is also popularly known as the Bell Labs Layered Space-Time (BLAST) transmission scheme.

For horizontal encoding, each receive antenna receives the combined signal of all transmitted data streams. Thus spatial signal processing over all receive antennas is needed to recover each data stream by suppressing the interference from the other data streams.

There are in general two classes of receivers for horizontal encoding:
- Linear receivers: zero-forcing (ZF) receiver; minimum-mean-squared-error (MMSE) receiver
- Nonlinear receivers with successive interference cancellation (SIC): ZF-SIC receiver; MMSE-SIC receiver

Note that another possible scheme for MIMO spatial multiplexing is vertical encoding, where $s_1,\ldots,s_t$ are jointly encoded rather than independently encoded as in horizontal encoding. At the receiver, iterative (soft) MIMO detection and channel decoding based on the maximum-likelihood (ML) principle are applied to decode the information from all data streams.

Linear Receiver

Consider the MIMO AWGN channel
$$y = Hx + z \qquad (85)$$
with $S_x^{(w)} = \frac{P}{t}I_t$ and $z \sim \mathcal{CN}(0, \sigma_z^2I_r)$. Assume that $H$ is full-rank.

A linear receiver first applies a decoding matrix $T \in \mathbb{C}^{t\times r}$ to extract the signals of the different data streams and then decodes them separately:
$$\tilde{y} = Ty = THx + Tz \qquad (86)$$
Two design criteria are commonly adopted for linear receivers:
- ZF: $T$ is designed to satisfy $TH = I_t$, i.e., the interference in each received data stream due to all other data streams is removed.
- MMSE: $T$ is designed to minimize the MSE between $\tilde{y}$ and $x$: $E[\|\tilde{y} - x\|^2]$.

ZF Receiver

For the ZF receiver, in order to have a feasible $T$ such that $TH = I_t$, the channel matrix $H$ needs to be square or tall, i.e., $r \ge t$.
- If $r = t$ and $H$ is a square matrix, the only candidate for $T$ that makes $TH = I_t$ is
$$T = H^{-1} \qquad (87)$$
- However, if $r > t$ and $H$ is a tall matrix, there is more than one candidate for $T$ that makes $TH = I_t$. In this case, one commonly adopted choice for $T$ is the pseudo-inverse of $H$, given by
$$T_{\rm ZF} = \left(H^HH\right)^{-1}H^H \triangleq H^\dagger \qquad (88)$$

Note that $T_{\rm ZF}$ becomes $H^{-1}$ if $r = t$.

The reason for choosing $H^\dagger$ as $T_{\rm ZF}$ in the case of $r > t$ is given next. Suppose that there is no noise at the receiver, i.e., $z = 0$. The MIMO channel then becomes
$$y = Hx \qquad (89)$$
The above MIMO system consists of $r$ linear equations with $t < r$ unknowns, and is thus over-determined. Let $\hat{x}$ be an estimate of $x$. It is desirable to minimize the squared error between $H\hat{x}$ and $y$, denoted by $J(\hat{x}) = \|H\hat{x} - y\|^2$. Then we have
$$J(\hat{x}) = (H\hat{x} - y)^H(H\hat{x} - y) = \hat{x}^HH^HH\hat{x} - y^HH\hat{x} - \hat{x}^HH^Hy + y^Hy \qquad (90)$$

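For any complex vector $x$, taking derivatives with respect to the conjugate vector $x^*$ (treating $x$ and $x^*$ as independent variables), we have
$$\frac{\partial\,x^HAx}{\partial x^*} = Ax, \quad \frac{\partial\,x^Hy}{\partial x^*} = y, \quad \frac{\partial\,y^Hx}{\partial x^*} = 0 \qquad (91)$$
In order to find the $\hat{x}$ that minimizes the squared error, we set
$$\frac{\partial J(\hat{x})}{\partial\hat{x}^*} = 0 \;\Rightarrow\; H^HH\hat{x} = H^Hy \qquad (92)$$
Thereby we obtain the so-called least-squares (LS) estimate of $x$ as
$$\hat{x}_{\rm LS} = \left(H^HH\right)^{-1}H^Hy = T_{\rm ZF}\,y \qquad (93)$$
Thus the LS estimation matrix is identical to the ZF decoding matrix.

For $r \ge t$ and full-rank $H$, the truncated SVD of $H$ is given by $H = \tilde{U}\Lambda V^H$, where $\tilde{U} \in \mathbb{C}^{r\times t}$ with $\tilde{U}^H\tilde{U} = I_t$; $V \in \mathbb{C}^{t\times t}$ with $V^HV = VV^H = I_t$; and $\Lambda$ is a $t\times t$ diagonal matrix whose diagonal elements (the singular values) are $\lambda_1 \ge \ldots \ge \lambda_t > 0$.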

Then it can be verified that
$$T_{\rm ZF} = \left(H^HH\right)^{-1}H^H = \left(V\Lambda\tilde{U}^H\tilde{U}\Lambda V^H\right)^{-1}V\Lambda\tilde{U}^H = \left(V\Lambda^{-2}V^H\right)V\Lambda\tilde{U}^H = V\Lambda^{-1}\tilde{U}^H \qquad (94)$$
Substituting the above alternative expression for $T_{\rm ZF}$ into (86) yields
$$\tilde{y} = x + \tilde{z} \qquad (95)$$
where $\tilde{z} = V\Lambda^{-1}\tilde{U}^Hz \sim \mathcal{CN}(0, \sigma_z^2V\Lambda^{-2}V^H)$, with $V\Lambda^{-2}V^H = (H^HH)^{-1}$.

For the above equivalent channel, the total signal power is $E[\|x\|^2] = P$, and the total noise power is $E[\|\tilde{z}\|^2] = \sum_{i=1}^t\frac{\sigma_z^2}{\lambda_i^2}$. Thus, the average SNR is given by
$$\gamma_{\rm avg}^{\rm ZF} = \frac{E[\|x\|^2]}{E[\|\tilde{z}\|^2]} = \frac{P}{\sum_{i=1}^t\sigma_z^2/\lambda_i^2} \le \frac{\lambda_t^2P}{\sigma_z^2} \qquad (96)$$
where the upper bound is determined by the smallest singular value $\lambda_t$. This disadvantageous phenomenon of the ZF receiver is called noise enhancement; it arises because the LS estimator ignores the noise at the receiver.

Let $x = [x_1,\ldots,x_t]^T$, $\tilde{z} = [\tilde{z}_1,\ldots,\tilde{z}_t]^T$, and $v_{ij} = [V]_{i,j}$, $i = 1,\ldots,t$, $j = 1,\ldots,t$. The SNR of the $i$th data stream with the ZF receiver is
$$\gamma_i^{\rm ZF} = \frac{E[|x_i|^2]}{E[|\tilde{z}_i|^2]} = \frac{P/t}{\sigma_z^2\sum_{j=1}^t|v_{ij}|^2/\lambda_j^2} = \frac{P}{t\sigma_z^2[(H^HH)^{-1}]_{i,i}}, \quad i = 1,\ldots,t \qquad (97)$$
Thus, the achievable sum-rate of the ZF receiver is
$$R_{\rm sum}^{\rm ZF} = \sum_{i=1}^t\log_2(1+\gamma_i^{\rm ZF}) \qquad (98)$$
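The ZF receiver (88), its SVD form (94), and the per-stream SNR (97) can be sketched in NumPy as follows. The function names and the random $4\times 2$ channel are assumptions for illustration; `numpy.linalg.pinv` is used only as an independent cross-check of (88).

```python
import numpy as np

def zf_matrix(H):
    """(88): T_ZF = (H^H H)^{-1} H^H for an r x t channel with r >= t, full column rank."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H, Hh)          # solves (H^H H) T = H^H

def zf_stream_snrs(H, P, noise_var):
    """(97): gamma_i = P / (t sigma_z^2 [(H^H H)^{-1}]_{ii})."""
    t = H.shape[1]
    inv_diag = np.real(np.diag(np.linalg.inv(H.conj().T @ H)))
    return P / (t * noise_var * inv_diag)

def zf_sum_rate(H, P, noise_var):
    """(98): sum of per-stream rates log2(1 + gamma_i^ZF)."""
    return float(np.sum(np.log2(1.0 + zf_stream_snrs(H, P, noise_var))))

if __name__ == "__main__":
    rng = np.random.default_rng(11)
    r, t, P = 4, 2, 10.0
    H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
    T = zf_matrix(H)
    print(np.round(T @ H, 10))                  # identity: inter-stream interference removed
    print(zf_stream_snrs(H, P, 1.0), zf_sum_rate(H, P, 1.0))
```

Solving the normal equations with `numpy.linalg.solve` instead of forming the inverse explicitly is a common numerical choice; both give (88) in exact arithmetic.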

MMSE Receiver

The MMSE receiver is the optimal linear receiver that maximizes the receiver SNR of each data stream, where the noise includes both the additive noise and the interference from all other data streams.

In general, the MMSE receiver works for any $t$ and $r$. However, in order to achieve a spatial multiplexing gain equal to $t$, we need $r \ge t$.

Let the $i$th row of the MMSE decoding matrix $T_{\rm MMSE}$ be $t_i^H$, $i = 1,\ldots,t$, where $t_i \in \mathbb{C}^{r\times 1}$. The total MSE to be minimized by $T_{\rm MMSE}$ is expressed as
$$E\left[\|T_{\rm MMSE}\,y - x\|^2\right] = \sum_{i=1}^tE\left[|t_i^Hy - x_i|^2\right] \qquad (99)$$
Thus the total MSE can be minimized by optimizing the $t_i$'s independently.

Let $J_i(t_i) = E[|t_i^Hy - x_i|^2]$, $i = 1,\ldots,t$, be the MSE of the $i$th data stream, which can be further expressed as
$$J_i(t_i) = E[(t_i^Hy - x_i)(y^Ht_i - x_i^*)] = t_i^HR_{yy}t_i - r_{xy}^Ht_i - t_i^Hr_{xy} + E[|x_i|^2] \qquad (100)$$
where
$$R_{yy} = E[yy^H] = E[(Hx+z)(Hx+z)^H] = \frac{P}{t}HH^H + \sigma_z^2I_r, \qquad r_{xy} = E[yx_i^*] = E[(Hx+z)x_i^*] = \frac{P}{t}h_i \qquad (101)$$
Note that $H = [h_1,\ldots,h_t]$. The MMSE estimator $t_i$ is then obtained from
$$\frac{\partial J_i(t_i)}{\partial t_i^*} = 0 \;\Rightarrow\; R_{yy}t_i = r_{xy} \qquad (102)$$
Thus we have
$$t_i = R_{yy}^{-1}r_{xy} = \left(HH^H + \frac{1}{\Gamma}I_r\right)^{-1}h_i, \quad i = 1,\ldots,t \qquad (103)$$
where $\Gamma = \frac{P}{t\sigma_z^2}$. Substituting $t_i = R_{yy}^{-1}r_{xy}$ into (100) yields the minimum MSE of the $i$th data stream as
$$J_i^{\min} = E[|x_i|^2] - r_{xy}^HR_{yy}^{-1}r_{xy} \qquad (104)$$
Furthermore, we have
$$T_{\rm MMSE} = H^H\left(HH^H + \frac{1}{\Gamma}I_r\right)^{-1} \qquad (105)$$
As $\Gamma\to\infty$, i.e., when the noise effect can be safely ignored, it can be shown that $T_{\rm MMSE}\to T_{\rm ZF}$.

Let H { i} be the matrix H with the ith column being deleted, i = 1,...,t. Using the matrix inversion lemma, (A BD 1 C) 1 = A 1 + A 1 B(D CA 1 B) 1 CA 1 (106) with A = H { i} H H { i} + (1/Γ)I r, B = h i, C = h H i, and D = 1, we have ( HH H + 1 ) 1 Γ I r = A 1 A 1 h i h H i A 1 (107) 1 + h H i A 1 h i Thus, from (103), we obtain an alternative expression for t i as where β i = h H i A 1 h i. t i = 1 1 + β i A 1 h i (108) Using this new expression for t i, from (104) it follows that the minimum 63

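MSE is given by
$$J_i^{\min} = E[|x_i|^2] - \mathbf{r}_{xy}^H\mathbf{t}_i = \frac{P}{t} - \frac{P}{t}\frac{\mathbf{h}_i^H\mathbf{A}^{-1}\mathbf{h}_i}{1+\beta_i} = \frac{P}{t} - \frac{P}{t}\frac{\beta_i}{1+\beta_i} = \frac{P}{t}\frac{1}{1+\beta_i} \tag{109}$$
Now consider the equivalent SIMO channel for decoding the $i$th data stream given by
$$\mathbf{y} = \mathbf{h}_i x_i + \mathbf{z}_i \tag{110}$$
where $\mathbf{z}_i = \sum_{j\neq i}\mathbf{h}_j x_j + \mathbf{z}$ denotes the effective noise for the $i$th data stream, including both the interference from all other data streams and the additive noise (the SIMO channel with correlated noise studied in Lecture II). Note that $\mathbf{z}_i \sim \mathcal{CN}(\mathbf{0}, (P/t)\mathbf{A})$.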

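Let $\tilde{\mathbf{y}} = [\tilde{y}_1,\ldots,\tilde{y}_t]^T$. Applying $\mathbf{t}_i$ given in (108) to $\mathbf{y}$ yields
$$\tilde{y}_i = \mathbf{t}_i^H\mathbf{y} = \mathbf{t}_i^H\mathbf{h}_i x_i + \mathbf{t}_i^H\mathbf{z}_i = \frac{\beta_i}{1+\beta_i}x_i + \tilde{z}_i \tag{111}$$
where $\tilde{z}_i = \mathbf{t}_i^H\mathbf{z}_i \sim \mathcal{CN}(0, \sigma_{\tilde{z}}^2)$. Note that
$$\sigma_{\tilde{z}}^2 = \mathbf{t}_i^H E[\mathbf{z}_i\mathbf{z}_i^H]\mathbf{t}_i = \left(\frac{1}{1+\beta_i}\right)^2\frac{P}{t}\mathbf{h}_i^H\mathbf{A}^{-1}\mathbf{A}\mathbf{A}^{-1}\mathbf{h}_i = \left(\frac{1}{1+\beta_i}\right)^2\frac{P}{t}\beta_i$$
Thus the MMSE receiver SNR for the $i$th data stream is
$$\gamma_i^{\rm MMSE} = \frac{\left(\frac{\beta_i}{1+\beta_i}\right)^2E[|x_i|^2]}{\sigma_{\tilde{z}}^2} = \frac{\left(\frac{\beta_i}{1+\beta_i}\right)^2\frac{P}{t}}{\left(\frac{1}{1+\beta_i}\right)^2\frac{P}{t}\beta_i} = \beta_i \tag{112}$$
Notice that for the SIMO channel given in (110) with correlated noise, we have shown in Lecture II that the optimal receive beamforming vector is $\mathbf{w}_{\rm opt} = \left(\frac{P}{t}\mathbf{A}\right)^{-1}\mathbf{h}_i$, which is identical to the MMSE decoding vector $\mathbf{t}_i$ in (108) up to a multiplicative constant.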

Furthermore, $\mathbf{w}_{\rm opt}$ maximizes the receiver SNR as (see (51) of Lecture II)
$$\gamma_{\max} = \mathbf{h}_i^H\left(\frac{P}{t}\mathbf{A}\right)^{-1}\mathbf{h}_i\frac{P}{t} = \mathbf{h}_i^H\mathbf{A}^{-1}\mathbf{h}_i = \beta_i \tag{113}$$
which is the same as that achieved by the MMSE decoder. The achievable sum-rate for the MMSE receiver is
$$R_{\rm sum}^{\rm MMSE} = \sum_{i=1}^{t}\log_2\left(1 + \gamma_i^{\rm MMSE}\right) \tag{114}$$
where the achievable rate for the $i$th data stream is $R_i^{\rm MMSE} = \log_2(1 + \gamma_i^{\rm MMSE}) = \log_2(1 + \beta_i)$. It can be shown that $R_i^{\rm MMSE}$ is indeed the capacity of the SIMO channel given in (110) for the $i$th data stream.
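The result $\gamma_i^{\rm MMSE} = \beta_i$ can be verified by measuring the output SINR of $\mathbf{t}_i = \mathbf{R}_{yy}^{-1}\mathbf{r}_{xy}$ directly (desired-signal power over interference-plus-noise power at the decoder output) and comparing it with $\mathbf{h}_i^H\mathbf{A}^{-1}\mathbf{h}_i$. This is a minimal NumPy sketch; `mmse_sinrs`, the random $3\times 3$ channel, and the values of $P$ and $\sigma_z^2$ are illustrative assumptions.

```python
import numpy as np

def mmse_sinrs(H, P, sigma2):
    """Return (measured output SINR of t_i, beta_i) for every stream i."""
    r, t = H.shape
    Gamma = P / (t * sigma2)
    Ryy = (P / t) * H @ H.conj().T + sigma2 * np.eye(r)       # (101)
    measured, beta = [], []
    for i in range(t):
        h = H[:, i]
        Hm = np.delete(H, i, axis=1)                          # H_{-i}
        A = Hm @ Hm.conj().T + np.eye(r) / Gamma
        beta.append(np.real(h.conj() @ np.linalg.solve(A, h)))   # beta_i
        ti = np.linalg.solve(Ryy, (P / t) * h)                # t_i = R_yy^{-1} r_xy, (103)
        sig = (P / t) * abs(ti.conj() @ h) ** 2               # desired-signal power
        cov = (P / t) * Hm @ Hm.conj().T + sigma2 * np.eye(r) # covariance of z_i = (P/t) A
        noise = np.real(ti.conj() @ cov @ ti)                 # interference-plus-noise power
        measured.append(sig / noise)
    return np.array(measured), np.array(beta)

# Illustrative assumption: i.i.d. CN(0, 1) channel entries
rng = np.random.default_rng(3)
H = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))) / np.sqrt(2)
sinr, beta = mmse_sinrs(H, P=20.0, sigma2=1.0)
print(np.allclose(sinr, beta), np.log2(1 + beta).sum())      # R_sum^MMSE, (114)
```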

Nonlinear Receiver
The nonlinear receiver applies successive interference cancellation (SIC) to decode the different data streams. Assume that the SIC decoding order is the reverse of the transmit antenna index, i.e., the data stream from the 1st transmit antenna is decoded last, the data stream from the 2nd transmit antenna is decoded second to last, ..., and the data stream from the $t$th transmit antenna is decoded first. The SIC then proceeds as follows.
Step 1: Apply a linear decoder $\mathbf{t}_t$ to the received signal $\mathbf{y}$, and extract the data stream from the $t$th transmit antenna:
$$\tilde{y}_t = \mathbf{t}_t^H\mathbf{y} \tag{115}$$

Step 2: Decode from $\tilde{y}_t$ the information $s_t$ of the $t$th data stream, and reconstruct $x_t$.
Step 3: Subtract $\mathbf{h}_t x_t$ from $\mathbf{y}$ to obtain
$$\mathbf{y}_{-t} = \mathbf{y} - \mathbf{h}_t x_t = \mathbf{H}_{-t}\mathbf{x}_{-t} + \mathbf{z} \tag{116}$$
where $\mathbf{H}_{-t}$ is $\mathbf{H}$ with the $t$th column deleted, and $\mathbf{x}_{-t}$ is $\mathbf{x}$ with the $t$th element deleted.
Step 4: Return to Step 1 to extract and decode the $(t-1)$th data stream from $\mathbf{y}_{-t}$, then subtract the corresponding signal component from $\mathbf{y}_{-t}$, and repeat until all $t$ data streams are decoded.
For each iteration of Step 1, if $\mathbf{t}_i$, $i = t, t-1,\ldots,2$, is designed using the ZF criterion, the nonlinear receiver is called ZF-SIC; if the MMSE criterion is used, it is called MMSE-SIC. Note that $\mathbf{t}_1$ is given by receive MRC for both ZF-SIC and MMSE-SIC.
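Steps 1 to 4 can be sketched as a loop. In this illustration (NumPy; not the lecture's implementation), uncoded QPSK hard decisions stand in for the channel decoder of Step 2, each stream is assumed to carry power $P/t$, and the noise is made very small so that every decision is correct. The ZF branch uses the last row of $\mathbf{H}_{[i]}^{\dagger}$ for $\mathbf{H}_{[i]} = [\mathbf{h}_1,\ldots,\mathbf{h}_i]$, and the MMSE branch uses $\mathbf{R}_{yy}^{-1}\mathbf{r}_{xy}$ restricted to the streams not yet cancelled; `slice_qpsk` and `sic_detect` are hypothetical helper names.

```python
import numpy as np

def slice_qpsk(v, amp):
    """Hard decision to the nearest QPSK point of energy amp^2 (stand-in for Step 2's decoder)."""
    return amp * (np.sign(v.real) + 1j * np.sign(v.imag)) / np.sqrt(2)

def sic_detect(y, H, P, sigma2, mode="mmse"):
    """Steps 1-4 with decoding order t, t-1, ..., 1; each x_i carries power P/t."""
    r, t = H.shape
    amp = np.sqrt(P / t)
    y = y.astype(complex)                          # working copy, cancelled step by step
    x_hat = np.zeros(t, dtype=complex)
    for i in range(t - 1, -1, -1):                 # 0-based index of stream i+1
        Hi = H[:, : i + 1]                         # streams not yet cancelled
        if mode == "zf":
            ti = np.linalg.pinv(Hi)[-1].conj()     # t_i^H = last row of H_[i]^dagger
        else:
            Ryy = (P / t) * Hi @ Hi.conj().T + sigma2 * np.eye(r)
            ti = np.linalg.solve(Ryy, (P / t) * H[:, i])   # MMSE vector for this step
        y_tilde = ti.conj() @ y                    # Step 1: extract stream i
        gain = ti.conj() @ H[:, i]                 # effective gain t_i^H h_i
        x_hat[i] = slice_qpsk(y_tilde / gain, amp) # Step 2: (uncoded) decision
        y = y - H[:, i] * x_hat[i]                 # Step 3: cancel; the loop is Step 4
    return x_hat

# Illustrative assumptions: i.i.d. CN(0, 1) channel, r = 4, t = 3, tiny noise
rng = np.random.default_rng(4)
r, t, P, sigma2 = 4, 3, 3.0, 1e-6
H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
amp = np.sqrt(P / t)
x = amp * (rng.choice([-1.0, 1.0], t) + 1j * rng.choice([-1.0, 1.0], t)) / np.sqrt(2)
z = np.sqrt(sigma2 / 2) * (rng.standard_normal(r) + 1j * rng.standard_normal(r))
y = H @ x + z
x_zf = sic_detect(y, H, P, sigma2, mode="zf")
x_mmse = sic_detect(y, H, P, sigma2, mode="mmse")
print(np.allclose(x_zf, x), np.allclose(x_mmse, x))
```

For $i = 1$ both branches reduce to a scaled $\mathbf{h}_1$, i.e., receive MRC, as noted above.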

ZF-SIC
For each iteration of SIC, the ZF decoding vector $\mathbf{t}_i^H$, $i = 1,\ldots,t$, can be obtained as the last row of the pseudo-inverse $\mathbf{H}_{[i]}^{\dagger}$, where
$$\mathbf{H}_{[i]} = [\mathbf{h}_1,\ldots,\mathbf{h}_i] \tag{117}$$
Alternatively, the decoding vectors for ZF-SIC can be derived from the QR decomposition of the MIMO channel matrix. Assuming that $r \geq t$ and $\mathbf{H}$ has full rank, the truncated QR decomposition of $\mathbf{H}$ is given by
$$\mathbf{H} = \mathbf{Q}\mathbf{R} \tag{118}$$
where $\mathbf{Q} \in \mathbb{C}^{r\times t}$ satisfies $\mathbf{Q}^H\mathbf{Q} = \mathbf{I}_t$, and $\mathbf{R} \in \mathbb{C}^{t\times t}$ is an upper-triangular matrix with $\phi_{ij} = [\mathbf{R}]_{i,j} = 0$ for $i > j$. The QR decomposition of $\mathbf{H}$ can be obtained using the Gram-Schmidt

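procedure as presented next. Let $\mathbf{Q} = [\mathbf{e}_1,\ldots,\mathbf{e}_t]$, where $\|\mathbf{e}_i\| = 1$, $i = 1,\ldots,t$, and $\mathbf{e}_i^H\mathbf{e}_j = 0$, $i \neq j$. First, since $\mathbf{h}_1 = \phi_{11}\mathbf{e}_1$, we obtain immediately
$$\mathbf{e}_1 = \frac{\mathbf{h}_1}{\|\mathbf{h}_1\|}, \quad \phi_{11} = \|\mathbf{h}_1\| \tag{119}$$
Second, since $\mathbf{h}_2 = \phi_{12}\mathbf{e}_1 + \phi_{22}\mathbf{e}_2$, we multiply both sides on the left by $\mathbf{e}_1^H$ to obtain
$$\phi_{12} = \mathbf{e}_1^H\mathbf{h}_2 \tag{120}$$
Then by letting $\mathbf{u}_2 = \mathbf{h}_2 - \phi_{12}\mathbf{e}_1$, we obtain
$$\phi_{22} = \|\mathbf{u}_2\|, \quad \mathbf{e}_2 = \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|} \tag{121}$$
Thus in general, for any $i \in \{1,\ldots,t\}$ with $\mathbf{h}_i = \sum_{j<i}\phi_{ji}\mathbf{e}_j + \phi_{ii}\mathbf{e}_i$, we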

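can obtain first
$$\phi_{ji} = \mathbf{e}_j^H\mathbf{h}_i, \quad j < i \tag{122}$$
then $\mathbf{u}_i = \mathbf{h}_i - \sum_{j<i}\phi_{ji}\mathbf{e}_j$, and finally
$$\phi_{ii} = \|\mathbf{u}_i\|, \quad \mathbf{e}_i = \frac{\mathbf{u}_i}{\|\mathbf{u}_i\|} \tag{123}$$
The decoding matrix for ZF-SIC is given by
$$\mathbf{T}_{\rm ZF\text{-}SIC} = \mathbf{Q}^H \tag{124}$$
Note that the $i$th row of $\mathbf{T}_{\rm ZF\text{-}SIC}$ is $\mathbf{t}_i^H = \mathbf{e}_i^H$, $i = 1,\ldots,t$. Applying $\mathbf{T}_{\rm ZF\text{-}SIC}$ to $\mathbf{y}$ yields
$$\tilde{\mathbf{y}} = \mathbf{T}_{\rm ZF\text{-}SIC}\mathbf{y} = \mathbf{Q}^H\mathbf{Q}\mathbf{R}\mathbf{x} + \mathbf{Q}^H\mathbf{z} = \mathbf{R}\mathbf{x} + \tilde{\mathbf{z}} \tag{125}$$
where $\tilde{\mathbf{z}} = \mathbf{Q}^H\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \sigma_z^2\mathbf{I}_t)$.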
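The recursion (119) to (123) translates directly into code. This is a sketch assuming NumPy, using classical Gram-Schmidt exactly as written (adequate for small, well-conditioned $\mathbf{H}$; modified Gram-Schmidt or Householder QR is numerically safer for larger matrices). The helper name `gram_schmidt_qr` and the random channel are illustrative assumptions.

```python
import numpy as np

def gram_schmidt_qr(H):
    """Truncated QR decomposition H = Q R via the Gram-Schmidt steps (119)-(123)."""
    r, t = H.shape
    Q = np.zeros((r, t), dtype=complex)
    R = np.zeros((t, t), dtype=complex)
    for i in range(t):
        u = H[:, i].astype(complex)
        for j in range(i):
            R[j, i] = Q[:, j].conj() @ H[:, i]   # phi_ji = e_j^H h_i, (122)
            u = u - R[j, i] * Q[:, j]            # u_i = h_i - sum_{j<i} phi_ji e_j
        R[i, i] = np.linalg.norm(u)              # phi_ii = ||u_i||, (123)
        Q[:, i] = u / R[i, i]                    # e_i = u_i / ||u_i||
    return Q, R

# Illustrative assumptions: i.i.d. CN(0, 1) channel, QPSK symbols, sigma_z^2 = 0.01
rng = np.random.default_rng(5)
r, t, sigma2 = 4, 3, 0.01
H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
Q, R = gram_schmidt_qr(H)
x = (rng.choice([-1.0, 1.0], t) + 1j * rng.choice([-1.0, 1.0], t)) / np.sqrt(2)
y = H @ x + np.sqrt(sigma2 / 2) * (rng.standard_normal(r) + 1j * rng.standard_normal(r))
y_tilde = Q.conj().T @ y                         # (124)-(125): R x plus white noise
print(np.allclose(Q @ R, H), np.allclose(Q.conj().T @ Q, np.eye(t)))
```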

Thus we have the equivalent channel for decoding the $i$th data stream as
$$\tilde{y}_i = \phi_{ii}x_i + \sum_{j>i}\phi_{ij}x_j + \tilde{z}_i, \quad i = 1,\ldots,t \tag{126}$$
When the $i$th data stream is decoded, all data streams with index $j > i$ have already been decoded, so their contributions can be subtracted from $\tilde{y}_i$. The effective channel for the $i$th data stream thus becomes
$$\hat{y}_i = \tilde{y}_i - \sum_{j>i}\phi_{ij}x_j = \phi_{ii}x_i + \tilde{z}_i, \quad i = 1,\ldots,t \tag{127}$$
and the SNR for the $i$th data stream with ZF-SIC is given by
$$\gamma_i^{\rm ZF\text{-}SIC} = \frac{\phi_{ii}^2 P}{t\sigma_z^2} \tag{128}$$

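The achievable sum-rate for ZF-SIC is given by
$$R_{\rm sum}^{\rm ZF\text{-}SIC} = \sum_{i=1}^{t}\log_2\left(1 + \gamma_i^{\rm ZF\text{-}SIC}\right) \tag{129}$$
Next, we show that the decoding vectors obtained from the channel QR decomposition are identical to those obtained from the last rows of the $\mathbf{H}_{[i]}^{\dagger}$'s. Without loss of generality, consider $i = t$ and thus $\mathbf{H}_{[t]} = \mathbf{H}$. We then have
$$\mathbf{H}^{\dagger} = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H = (\mathbf{R}^H\mathbf{Q}^H\mathbf{Q}\mathbf{R})^{-1}\mathbf{R}^H\mathbf{Q}^H = \mathbf{R}^{-1}\mathbf{R}^{-H}\mathbf{R}^H\mathbf{Q}^H = \mathbf{R}^{-1}\mathbf{Q}^H$$
The last row of $\mathbf{H}^{\dagger}$ is thus given by
$$[\mathbf{R}^{-1}]_{t,t}\,\mathbf{e}_t^H = \frac{1}{\phi_{tt}}\mathbf{e}_t^H \tag{130}$$
where we have used the fact that for an upper-triangular matrix $\mathbf{R}$, $\mathbf{R}^{-1}$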

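is also upper-triangular (so its last row has only the $(t,t)$ entry nonzero), and $[\mathbf{R}]_{i,i}$ and $[\mathbf{R}^{-1}]_{i,i}$ are inverses of each other. This row vector is thus identical to $\mathbf{t}_t^H = \mathbf{e}_t^H$ up to the multiplicative constant $1/\phi_{tt}$. Furthermore, we obtain an alternative expression for the SNR of the $t$th data stream with the linear ZF receiver as
$$\gamma_t^{\rm ZF} = \gamma_t^{\rm ZF\text{-}SIC} = \frac{\phi_{tt}^2 P}{t\sigma_z^2} \tag{131}$$
Comparing this with (97), we have
$$\frac{1}{[(\mathbf{H}^H\mathbf{H})^{-1}]_{t,t}} = \phi_{tt}^2 \tag{132}$$
In general, it can be shown that
$$\gamma_i^{\rm ZF} \leq \gamma_i^{\rm ZF\text{-}SIC}, \quad 1 \leq i < t \tag{133}$$
with equality, e.g., when the columns of $\mathbf{H}$ are mutually orthogonal.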
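Relations (131) to (133) can be checked with a library QR decomposition. Because `np.linalg.qr` may return diagonal entries of $\mathbf{R}$ with a nonunit phase, the sketch below (NumPy; the random $4\times 3$ channel and power values are illustrative assumptions) uses $|[\mathbf{R}]_{i,i}|^2$, which equals $\phi_{ii}^2$ from Gram-Schmidt.

```python
import numpy as np

# Illustrative assumptions: i.i.d. CN(0, 1) channel, P = 10, sigma_z^2 = 1
rng = np.random.default_rng(6)
r, t, P, sigma2 = 4, 3, 10.0, 1.0
H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)

_, Rq = np.linalg.qr(H)                          # reduced QR; diagonal may carry a phase
phi2 = np.abs(np.diag(Rq)) ** 2                  # phi_ii^2 is phase-independent
G = np.linalg.inv(H.conj().T @ H)
gamma_zf = P / (t * sigma2 * np.real(np.diag(G)))    # linear ZF, (97)
gamma_sic = phi2 * P / (t * sigma2)                  # ZF-SIC, (128)

print(np.isclose(1 / np.real(G[-1, -1]), phi2[-1]))  # (132): last stream identical
print(gamma_zf <= gamma_sic * (1 + 1e-9))            # (133), elementwise
```

The inequality holds because ZF projects $\mathbf{h}_i$ away from all other columns, whereas ZF-SIC projects it away only from $\mathbf{h}_1,\ldots,\mathbf{h}_{i-1}$.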

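Thus we have
$$R_{\rm sum}^{\rm ZF} \leq R_{\rm sum}^{\rm ZF\text{-}SIC} \tag{134}$$
So far, we have assumed that the decoding order for ZF-SIC is fixed as the reverse of the antenna index. Notice that there are in total $t!$ different decoding orders for ZF-SIC, each of which corresponds to a different channel QR decomposition and thus to different $\phi_{ii}$'s and $\gamma_i^{\rm ZF\text{-}SIC}$'s, as well as a different $R_{\rm sum}^{\rm ZF\text{-}SIC}$. Let $\mathbf{P}$ be a $t\times t$ permutation matrix (each row/column has one element equal to one and zeros elsewhere; not to be confused with the transmit power $P$), which specifies the decoding order. Then consider the following QR decomposition:
$$\mathbf{H}\mathbf{P} = \mathbf{Q}_p\mathbf{R}_p \tag{135}$$
Note that $\mathbf{H} = \mathbf{Q}_p\mathbf{R}_p\mathbf{P}^{-1} = \mathbf{Q}_p\mathbf{R}_p\mathbf{P}^T$ since $\mathbf{P}^{-1} = \mathbf{P}^T$.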

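By letting $\mathbf{T}_{\rm ZF\text{-}SIC} = \mathbf{Q}_p^H$, we obtain
$$\tilde{\mathbf{y}} = \mathbf{T}_{\rm ZF\text{-}SIC}\mathbf{y} = \mathbf{Q}_p^H\mathbf{Q}_p\mathbf{R}_p\mathbf{P}^T\mathbf{x} + \mathbf{Q}_p^H\mathbf{z} = \mathbf{R}_p\mathbf{x}_p + \tilde{\mathbf{z}} \tag{136}$$
where $\mathbf{x}_p = \mathbf{P}^T\mathbf{x}$ is a permuted version of $\mathbf{x}$ and $\tilde{\mathbf{z}} = \mathbf{Q}_p^H\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \sigma_z^2\mathbf{I}_t)$. In the case $r = t$ and in the asymptotically high-power regime, i.e., $P \to \infty$, we have, with $\phi_{ii} = [\mathbf{R}_p]_{i,i}$,
$$R_{\rm sum}^{\rm ZF\text{-}SIC} = \sum_{i=1}^{t}\log_2\left(1 + \frac{\phi_{ii}^2 P}{t\sigma_z^2}\right) \approx t\log_2\frac{P}{\sigma_z^2 t} + \log_2\prod_{i=1}^{t}\phi_{ii}^2 = t\log_2\frac{P}{\sigma_z^2 t} + \log_2|\det(\mathbf{H})|^2 \tag{137}$$
since $|\det(\mathbf{H})|^2 = \det(\mathbf{H}^H\mathbf{H}) = \det(\mathbf{R}_p^H\mathbf{R}_p) = |\det(\mathbf{R}_p)|^2 = \prod_{i=1}^{t}\phi_{ii}^2$. Thus the achievable rate of ZF-SIC is independent of the decoding order in the high-power regime.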
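The identity $\prod_i \phi_{ii}^2 = |\det(\mathbf{H})|^2$ for every decoding order, and hence the order-independence of (137) at high power, can be illustrated by enumerating all $t!$ permutation matrices. Minimal NumPy sketch; the $3\times 3$ random channel and $P = 10^6$ are illustrative assumptions.

```python
import itertools
import numpy as np

# Illustrative assumptions: r = t = 3, i.i.d. CN(0, 1) channel, P = 1e6, sigma_z^2 = 1
rng = np.random.default_rng(7)
t, P, sigma2 = 3, 1e6, 1.0
H = (rng.standard_normal((t, t)) + 1j * rng.standard_normal((t, t))) / np.sqrt(2)
det2 = abs(np.linalg.det(H)) ** 2

prods, rates = [], []
for perm in itertools.permutations(range(t)):      # all t! decoding orders
    Pm = np.eye(t)[:, list(perm)]                  # (H Pm)[:, k] = h_{perm[k]}
    _, Rp = np.linalg.qr(H @ Pm)                   # H P = Q_p R_p, (135)
    phi2 = np.abs(np.diag(Rp)) ** 2
    prods.append(np.prod(phi2))
    rates.append(np.log2(1 + phi2 * P / (t * sigma2)).sum())   # exact ZF-SIC sum-rate

approx = t * np.log2(P / (sigma2 * t)) + np.log2(det2)         # right-hand side of (137)
print(np.allclose(prods, det2), max(rates) - min(rates), approx)
```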

MMSE-SIC
For MMSE-SIC, the effective SIMO channel for decoding the $i$th data stream, with the data streams of index $j > i$ already decoded and subtracted from $\mathbf{y}$, is given by
$$\hat{\mathbf{y}}_i = \mathbf{h}_i x_i + \mathbf{z}_i \tag{138}$$
where $\mathbf{z}_i = \sum_{j<i}\mathbf{h}_j x_j + \mathbf{z}$ denotes the total noise, including both the interference from data streams with index $j < i$ and the additive noise. From our previous study, we know that the optimal MMSE decoding vector for the channel given in (138) is
$$\mathbf{t}_i = \frac{1}{1+\alpha_i}\mathbf{A}_i^{-1}\mathbf{h}_i \tag{139}$$

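where
$$\mathbf{A}_i = \sum_{j<i}\mathbf{h}_j\mathbf{h}_j^H + \frac{1}{\Gamma}\mathbf{I}_r = \sum_{j<i}\mathbf{h}_j\mathbf{h}_j^H + \frac{t\sigma_z^2}{P}\mathbf{I}_r \tag{140}$$
$$\alpha_i = \mathbf{h}_i^H\mathbf{A}_i^{-1}\mathbf{h}_i \tag{141}$$
The resultant SNR for the $i$th data stream is given by
$$\gamma_i^{\rm MMSE\text{-}SIC} = \alpha_i \tag{142}$$
The achievable sum-rate for MMSE-SIC is given by
$$R_{\rm sum}^{\rm MMSE\text{-}SIC} = \sum_{i=1}^{t}\log_2\left(1 + \gamma_i^{\rm MMSE\text{-}SIC}\right) \tag{143}$$
Next, we show that $R_{\rm sum}^{\rm MMSE\text{-}SIC}$ is indeed equal to the MIMO channel capacity $C^{(w)}$ with the white transmit covariance given in (74), i.e., the MMSE-SIC receiver is capacity-optimal.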

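First, we express the achievable rate for the $i$th data stream, $i > 1$, as
$$R_i^{\rm MMSE\text{-}SIC} = \log_2(1 + \alpha_i) \tag{144}$$
$$= \log_2\left(1 + \mathbf{h}_i^H\left(\sum_{j<i}\mathbf{h}_j\mathbf{h}_j^H + \frac{t\sigma_z^2}{P}\mathbf{I}_r\right)^{-1}\mathbf{h}_i\right) \tag{145}$$
$$= \log_2\left(1 + \frac{P}{t\sigma_z^2}\mathbf{h}_i^H\left(\frac{P}{t\sigma_z^2}\sum_{j<i}\mathbf{h}_j\mathbf{h}_j^H + \mathbf{I}_r\right)^{-1}\mathbf{h}_i\right) \tag{146}$$
$$= \log_2\det\left(\mathbf{I}_r + \frac{P}{t\sigma_z^2}\left(\frac{P}{t\sigma_z^2}\sum_{j<i}\mathbf{h}_j\mathbf{h}_j^H + \mathbf{I}_r\right)^{-1}\mathbf{h}_i\mathbf{h}_i^H\right) \tag{147}$$
$$= \log_2\det\left(\left(\frac{P}{t\sigma_z^2}\sum_{j<i}\mathbf{h}_j\mathbf{h}_j^H + \mathbf{I}_r\right)^{-1}\left(\frac{P}{t\sigma_z^2}\sum_{j\leq i}\mathbf{h}_j\mathbf{h}_j^H + \mathbf{I}_r\right)\right) \tag{148}$$
where (147) follows from $\det(\mathbf{I}_r + \mathbf{a}\mathbf{b}^H) = 1 + \mathbf{b}^H\mathbf{a}$.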

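Since $\det(\mathbf{B}^{-1}\mathbf{A}) = \frac{\det(\mathbf{A})}{\det(\mathbf{B})}$, we have
$$R_i^{\rm MMSE\text{-}SIC} = \underbrace{\log_2\det\left(\frac{P}{t\sigma_z^2}\sum_{j\leq i}\mathbf{h}_j\mathbf{h}_j^H + \mathbf{I}_r\right)}_{I_i} - \underbrace{\log_2\det\left(\frac{P}{t\sigma_z^2}\sum_{j<i}\mathbf{h}_j\mathbf{h}_j^H + \mathbf{I}_r\right)}_{I_{i-1}} \tag{149}$$
Thus we have
$$R_{\rm sum}^{\rm MMSE\text{-}SIC} = \sum_{i=1}^{t}R_i^{\rm MMSE\text{-}SIC} = I_1 + \sum_{i=2}^{t}(I_i - I_{i-1}) = I_t = \log_2\det\left(\frac{P}{t\sigma_z^2}\sum_{i=1}^{t}\mathbf{h}_i\mathbf{h}_i^H + \mathbf{I}_r\right) = \log_2\det\left(\mathbf{I}_r + \frac{P}{t\sigma_z^2}\mathbf{H}\mathbf{H}^H\right) = C^{(w)}$$
Since the above proof holds regardless of the decoding order, the achievable sum-rate for MMSE-SIC is independent of the decoding order.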
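The telescoping argument can be confirmed numerically: for every decoding order, the per-stream MMSE-SIC rates differ but always sum to $C^{(w)}$. The sketch assumes NumPy, a random channel, and hypothetical helper names (`mmse_sic_rates`, `capacity_white`); it deliberately uses $r = 2 < t = 3$, since nothing in the proof requires $r \geq t$.

```python
import itertools
import numpy as np

def mmse_sic_rates(H, P, sigma2, order):
    """Per-stream rates log2(1 + alpha_i) from (140)-(144); streams decoded in `order`."""
    r, t = H.shape
    undecoded = list(range(t))
    rates = {}
    for i in order:
        undecoded.remove(i)                               # interference: still-undecoded streams
        A = (t * sigma2 / P) * np.eye(r, dtype=complex)
        for j in undecoded:
            A += np.outer(H[:, j], H[:, j].conj())        # A_i, (140)
        alpha = np.real(H[:, i].conj() @ np.linalg.solve(A, H[:, i]))   # (141)
        rates[i] = np.log2(1 + alpha)
    return rates

def capacity_white(H, P, sigma2):
    """C^(w) = log2 det(I_r + P / (t sigma2) H H^H)."""
    r, t = H.shape
    _, logdet = np.linalg.slogdet(np.eye(r) + (P / (t * sigma2)) * H @ H.conj().T)
    return logdet / np.log(2)

# Illustrative assumptions: i.i.d. CN(0, 1) channel with r < t on purpose
rng = np.random.default_rng(9)
r, t, P, sigma2 = 2, 3, 10.0, 1.0
H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
C = capacity_white(H, P, sigma2)
sums = []
for order in itertools.permutations(range(t)):
    rates = mmse_sic_rates(H, P, sigma2, order)
    sums.append(sum(rates.values()))
    print(order, {k: round(float(v), 3) for k, v in rates.items()})
print(C, np.allclose(sums, C))
```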

However, different decoding orders result in different allocations of the transmission rates over the $t$ data streams, while their sum $R_{\rm sum}^{\rm MMSE\text{-}SIC}$ always equals $C^{(w)}$.
Question: Do we need $r \geq t$ in the above proof for MMSE-SIC? In general, ZF-based receivers (linear ZF or ZF-SIC) require $r \geq t$; for MMSE-based receivers (linear MMSE or MMSE-SIC), $r$ can be smaller than $t$.

MIMO AWGN Channel Capacity Without CSIT
Assume $t = r = 2$, $\mathbf{H} = \begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix}$ with $\rho = 0.5$, and $\sigma_z = 1$.
[Figure: achievable rate (bps/Hz) versus SNR (dB) from 0 to 30 dB for Capacity/MMSE-SIC, ZF-SIC, MMSE, and ZF.]

MIMO AWGN Channel Capacity Without CSIT
Assume $t = r = 2$, $\mathbf{H} = \begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix}$ with $\rho = 0.9$, and $\sigma_z = 1$.
[Figure: achievable rate (bps/Hz) versus SNR (dB) from 0 to 30 dB for Capacity/MMSE-SIC, ZF-SIC, MMSE, and ZF.]
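The four curves for these $2\times 2$ examples can be recomputed from (97) to (98), (112) to (114), (128) to (129), and $C^{(w)}$. The sketch below (NumPy) is an assumption-laden reconstruction: in particular, it takes the SNR axis to be $P/\sigma_z^2$, which the slides do not state explicitly, and the function name `rates_vs_snr` is hypothetical.

```python
import numpy as np

def rates_vs_snr(rho, snr_db, sigma2=1.0):
    """Sum-rates (C^(w) = MMSE-SIC, ZF-SIC, MMSE, ZF) for H = [[1, rho], [rho, 1]].

    Assumption: SNR = P / sigma_z^2 (not stated explicitly on the slides)."""
    H = np.array([[1.0, rho], [rho, 1.0]])
    r, t = H.shape
    P = sigma2 * 10 ** (snr_db / 10)
    Gamma = P / (t * sigma2)
    cap = np.log2(np.linalg.det(np.eye(r) + Gamma * H @ H.T))      # C^(w) = MMSE-SIC
    _, R = np.linalg.qr(H)
    zf_sic = np.log2(1 + Gamma * np.diag(R) ** 2).sum()            # (128)-(129)
    beta = []
    for i in range(t):
        h_other = np.delete(H, i, axis=1)                          # H_{-i}
        A = h_other @ h_other.T + np.eye(r) / Gamma
        beta.append(H[:, i] @ np.linalg.solve(A, H[:, i]))         # (112)
    mmse = np.log2(1 + np.array(beta)).sum()                       # (114)
    zf = np.log2(1 + Gamma / np.diag(np.linalg.inv(H.T @ H))).sum()   # (97)-(98)
    return cap, zf_sic, mmse, zf

for rho in (0.5, 0.9):
    for snr_db in (0, 10, 20, 30):
        cap, zf_sic, mmse, zf = rates_vs_snr(rho, snr_db)
        print(f"rho={rho}, SNR={snr_db:2d} dB: C={cap:5.2f}, ZF-SIC={zf_sic:5.2f}, "
              f"MMSE={mmse:5.2f}, ZF={zf:5.2f}")
```

Under this SNR assumption, the sketch gives $C^{(w)} \approx 17.11$ bps/Hz for $\rho = 0.5$ at 30 dB.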

Capacity of MIMO Fading Channel
Consider the following $r\times t$ MIMO fading channel:
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z} \tag{150}$$
$\mathbf{H}$ is constant during each transmission block, but can change from one block to another (i.e., block fading); the instantaneous channel $\mathbf{H}$ is assumed known at the receiver.
$\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{S}_x)$, where the transmit covariance matrix $\mathbf{S}_x$ is constant over all transmission blocks if the channel is unknown at the transmitter, but may change over transmission blocks according to the instantaneous channel if it is known at the transmitter.
$\mathrm{Tr}(\mathbf{S}_x) \leq P$ for any transmission block.
$\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \sigma_z^2\mathbf{I}_r)$.

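The IMI of the MIMO fading channel for a given pair of $\mathbf{H}$ and $\mathbf{S}_x$ is given by
$$I(\mathbf{x};\mathbf{y}|\mathbf{H}) = \log_2\det\left(\mathbf{I}_r + \frac{1}{\sigma_z^2}\mathbf{H}\mathbf{S}_x\mathbf{H}^H\right) \tag{151}$$
The ergodic capacity of the MIMO fading channel is then defined as
$$C_{\rm erg} = E\left[I(\mathbf{x};\mathbf{y}|\mathbf{H})\right] \tag{152}$$
where the expectation is taken over $\mathbf{H}$. The $q\%$ outage capacity of the MIMO fading channel, denoted by $C_{{\rm out},q\%}$, is defined by
$$\Pr\left(I(\mathbf{x};\mathbf{y}|\mathbf{H}) < C_{{\rm out},q\%}\right) = q\% \tag{153}$$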
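Both definitions can be estimated by Monte Carlo simulation: $C_{\rm erg}$ as the sample mean of (151) over channel draws, and $C_{{\rm out},q\%}$ as the empirical $q\%$ quantile of the same samples. The sketch assumes NumPy, $\mathbf{S}_x = (P/t)\mathbf{I}_t$, i.i.d. $\mathcal{CN}(0,1)$ channel entries, and a hypothetical helper `imi_samples`; these are illustrative choices, not requirements of (152) and (153).

```python
import numpy as np

def imi_samples(r, t, P, sigma2, n, rng):
    """Samples of (151) with S_x = (P/t) I_t and i.i.d. CN(0, 1) channel entries (assumed)."""
    H = (rng.standard_normal((n, r, t)) + 1j * rng.standard_normal((n, r, t))) / np.sqrt(2)
    M = np.eye(r) + (P / (t * sigma2)) * H @ H.conj().transpose(0, 2, 1)
    _, logdet = np.linalg.slogdet(M)           # batched over the n channel draws
    return logdet / np.log(2)

rng = np.random.default_rng(8)
imi = imi_samples(r=2, t=2, P=100.0, sigma2=1.0, n=50_000, rng=rng)
C_erg = imi.mean()                             # (152)
C_out_10 = np.quantile(imi, 0.10)              # (153) with q% = 10%
print(f"C_erg ~ {C_erg:.2f} bps/Hz, C_out,10% ~ {C_out_10:.2f} bps/Hz")
```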

MIMO Fading Channel With CSIT
In the CSIT-known case, $\mathbf{S}_x$ for each transmission block can be optimized based on the instantaneous channel $\mathbf{H}$.
In the ergodic capacity case, $\mathbf{S}_x$ for each transmission block should be chosen to maximize the IMI, which is equivalent to achieving the MIMO AWGN channel capacity for the given $\mathbf{H}$. Thus, the optimal $\mathbf{S}_x$ is designed using eigenmode transmission and WF power allocation.
In the outage capacity case, suppose we can find the minimum outage probability $q\%$ for a constant transmission rate $R$ such that
$$\Pr\left(I(\mathbf{x};\mathbf{y}|\mathbf{H}) < R\right) \leq q\% \tag{154}$$
Then we can claim that $R$ is the $q\%$ (transmitter-aware) outage capacity. Thus maximizing the outage capacity for a given outage probability

target is equivalent to minimizing the outage probability for a given constant rate target, i.e., $C_{{\rm out},q\%}$ and $q\%$ have a one-to-one correspondence.
To minimize the outage probability for a given constant transmission rate, it is desirable to maximize the IMI for each transmission block based on the instantaneous channel $\mathbf{H}$ given $P$. Thus the optimal $\mathbf{S}_x$ for each transmission block is given by eigenmode transmission and WF power allocation, the same as for the ergodic capacity case.
However, for transmission blocks with superior channel conditions, the $\mathbf{S}_x$ that satisfies the power constraint and results in an IMI larger than the given rate target $R$ may not be unique. In such cases, it is usually desirable to find the $\mathbf{S}_x$ that minimizes the transmit power subject to the resultant IMI being equal to $R$. For example, consider a SISO fading channel with constant transmission

rate $R$. A transmission outage will not occur if
$$\log_2\left(1 + \frac{P|h|^2}{\sigma_z^2}\right) \geq R$$
Thus if $|h|^2 > \frac{(2^R-1)\sigma_z^2}{P}$, any transmit power between $\frac{(2^R-1)\sigma_z^2}{|h|^2}$ and $P$ achieves a non-outage transmission, while the minimum power is $\frac{(2^R-1)\sigma_z^2}{|h|^2}$.
For MIMO fading channels, the transmit power needed to support a given rate $R$ is minimized by eigenmode transmission with WF power allocation:
$$p_i = \left(\mu - \frac{\sigma_z^2}{\lambda_i^2}\right)^+, \quad i = 1,\ldots,m \tag{155}$$
where the water level $\mu$ in this case should be chosen such that the rate constraint is satisfied with equality, i.e.,
$$\sum_{i=1}^{m}\left(\log_2\frac{\mu\lambda_i^2}{\sigma_z^2}\right)^+ = R \tag{156}$$
If $\sum_{i=1}^{m}p_i > P$, a transmitter-aware outage occurs; otherwise, no outage occurs.
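Once the set of active eigenmodes is known, (156) can be solved for $\mu$ in closed form, so a simple approach is to try the $k = m, m-1, \ldots$ strongest modes until the weakest active mode receives positive power. The following pure-Python sketch (the function name and numeric values are hypothetical) returns the powers in (155); comparing their sum with $P$ then decides whether a transmitter-aware outage occurs.

```python
import math

def min_power_for_rate(lam2, R, sigma2):
    """Minimum-power water-filling (155)-(156) for a rate target R > 0.

    lam2 holds the squared singular values lambda_i^2 (assumed strictly positive).
    Returns the water level mu and the per-mode powers p_i in the input order."""
    g = sorted(lam2, reverse=True)               # strongest eigenmodes first
    for k in range(len(g), 0, -1):               # try k active modes, dropping the weakest
        active = g[:k]
        # (156) with k active modes: k*log2(mu) + sum_i log2(lambda_i^2 / sigma2) = R
        log2_mu = (R - sum(math.log2(x / sigma2) for x in active)) / k
        mu = 2.0 ** log2_mu
        if mu > sigma2 / active[-1]:             # weakest active mode must get p_i > 0
            return mu, [max(mu - sigma2 / x, 0.0) for x in lam2]   # (155)
    raise ValueError("rate target R must be positive")

# SISO check against (2^R - 1) sigma_z^2 / |h|^2 with |h|^2 = 0.5, R = 2
_, p_siso = min_power_for_rate([0.5], R=2.0, sigma2=1.0)
print(p_siso[0])                                 # 6.0

# Three eigenmodes, R = 4 bps/Hz, hypothetical power budget P = 10
lam2_mimo = [2.0, 0.5, 0.01]
_, p_mimo = min_power_for_rate(lam2_mimo, R=4.0, sigma2=1.0)
P_budget = 10.0
print(p_mimo, "outage" if sum(p_mimo) > P_budget else "no outage")   # [3.5, 2.0, 0.0] no outage
```

In the three-mode example the weakest mode is switched off, and the two active modes carry 3 and 1 bps/Hz, respectively.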

MIMO Fading Channel Without CSIT
In the CSIT-unknown case, a constant transmit covariance matrix $\mathbf{S}_x$ is used for all transmission blocks.
In the ergodic capacity case, the optimal $\mathbf{S}_x$ should maximize the statistical average of the IMI given by
$$E\left[\log_2\det\left(\mathbf{I}_r + \frac{1}{\sigma_z^2}\mathbf{H}\mathbf{S}_x\mathbf{H}^H\right)\right] \tag{157}$$
if the distribution of $\mathbf{H}$ is known at the transmitter. In the case of $\mathbf{H} \sim \mathbf{H}_w$, it can be shown that the optimal transmit covariance is the white covariance given by
$$\mathbf{S}_x^{(w)} = \frac{P}{t}\mathbf{I}_t \tag{158}$$

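From (74), it follows that the resultant ergodic capacity is given by
$$C_{\rm erg} = E\left[\sum_{i=1}^{m}\log_2\left(1 + \frac{\lambda_i^2 P}{\sigma_z^2 t}\right)\right] \tag{159}$$
Thus the ergodic capacity is determined by the distribution of the squared singular values of the random MIMO channel matrix. Note that the $\lambda_i^2$'s are the eigenvalues of the matrix $\mathbf{H}^H\mathbf{H}$ if $r \geq t = m$, since with the truncated SVD of $\mathbf{H}$, we have
$$\mathbf{H}^H\mathbf{H} = \mathbf{V}\boldsymbol{\Lambda}\tilde{\mathbf{U}}^H\tilde{\mathbf{U}}\boldsymbol{\Lambda}\mathbf{V}^H = \mathbf{V}\boldsymbol{\Lambda}^2\mathbf{V}^H \tag{160}$$
Similarly, it can be shown that the $\lambda_i^2$'s are the eigenvalues of $\mathbf{H}\mathbf{H}^H$ if $t > r = m$.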

Define the following matrix
$$\mathbf{W} = \begin{cases}\mathbf{H}^H\mathbf{H}, & r \geq t\\ \mathbf{H}\mathbf{H}^H, & r < t\end{cases} \tag{161}$$
The distribution of $\mathbf{W}$ when $\mathbf{H} \sim \mathbf{H}_w$ is called the Wishart distribution, for which the joint distribution of the eigenvalues $\lambda_i^2$'s is known (details are omitted here). With the Wishart distribution for $\mathbf{W}$, it can be shown that the spatial multiplexing gain for the ergodic capacity of the MIMO fading channel is given by
$$\lim_{P\to\infty}\frac{C_{\rm erg}}{\log_2 P} = \min(t, r) \tag{162}$$
This is consistent with the spatial multiplexing gain we have obtained for the MIMO AWGN channel case, which was shown to be equal to the rank of the
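The limit (162) can be illustrated by estimating (159) by Monte Carlo at two large SNRs ten times apart: the increase in $C_{\rm erg}$ divided by $\log_2 10$ should approach $\min(t,r)$. This is a sketch assuming NumPy, i.i.d. $\mathcal{CN}(0,1)$ entries for $\mathbf{H}_w$, SNR $= P/\sigma_z^2$, and a hypothetical helper `ergodic_capacity`.

```python
import numpy as np

def ergodic_capacity(r, t, snr, n=20_000, seed=0):
    """Monte Carlo estimate of (159) with S_x = (P/t) I_t.

    Assumes H ~ H_w with i.i.d. CN(0, 1) entries and snr = P / sigma_z^2."""
    rng = np.random.default_rng(seed)
    H = (rng.standard_normal((n, r, t)) + 1j * rng.standard_normal((n, r, t))) / np.sqrt(2)
    lam2 = np.linalg.svd(H, compute_uv=False) ** 2     # m = min(t, r) values per draw
    return np.log2(1 + lam2 * snr / t).sum(axis=1).mean()

slopes = {}
for r, t in [(2, 2), (4, 2), (2, 4), (4, 4)]:
    # Same seed -> same channel draws at both SNRs, so the difference has low variance
    c_lo, c_hi = ergodic_capacity(r, t, 1e4), ergodic_capacity(r, t, 1e5)
    slopes[(r, t)] = (c_hi - c_lo) / np.log2(10)       # bps/Hz gained per 10x power
    print(f"r={r}, t={t}: slope ~ {slopes[(r, t)]:.3f}, min(t, r) = {min(t, r)}")
```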

given MIMO channel matrix, since for $\mathbf{H} \sim \mathbf{H}_w$, the rank of the random matrix $\mathbf{H}$ is $\min(t, r)$ with probability one.
In the CSIT-unknown case, $C_{\rm erg}$ can be achieved by horizontal encoding at the transmitter and MMSE-SIC decoding at the receiver, similarly to the MIMO AWGN channel case. However, it is worth noting that in the MIMO fading channel case, each data stream of horizontal encoding needs to span all the different fading states (i.e., to achieve coded diversity), and the transmission rate of each data stream needs to be set appropriately according to the channel distribution as well as the decoding order at the receiver.
Next, consider the outage capacity of the MIMO fading channel without CSIT. In this case, it is desirable to find a constant transmit covariance $\mathbf{S}_x$ that minimizes the outage probability for a given constant

transmission rate $R$:
$$\Pr\left(\log_2\det\left(\mathbf{I}_r + \frac{1}{\sigma_z^2}\mathbf{H}\mathbf{S}_x\mathbf{H}^H\right) < R\right) \tag{163}$$
assuming that the channel distribution is known at the transmitter. For the case of $\mathbf{H} \sim \mathbf{H}_w$, it has been conjectured that the optimal $\mathbf{S}_x$ has the following form:
$$\mathbf{S}_x = \frac{P}{k}\mathrm{diag}(\underbrace{1,\ldots,1}_{k\ \text{ones}},\underbrace{0,\ldots,0}_{t-k\ \text{zeros}}) \tag{164}$$
i.e., the transmitter selects (arbitrarily) a subset of $k$ out of $t$ transmit antennas for transmission with the white covariance.
For example, consider the MISO fading channel with $t = 2$, $r = 1$, and $\mathbf{h} \sim \mathbf{h}_w$. If $k = 1$ and thus $\mathbf{S}_x = P\,\mathrm{diag}(1, 0)$, i.e., only the first transmit antenna is used and the MISO fading channel becomes a SISO fading channel, the resultant outage probability for a constant

transmission rate $R$ is
$$\Pr\left(\log_2\left(1 + \frac{P|h_1|^2}{\sigma_z^2}\right) < R\right) \tag{165}$$
Since $|h_1|^2$ is exponentially distributed with mean $\sigma_h^2$, we obtain the outage probability as
$$p_{\rm out}^{(1)} = 1 - e^{-\gamma} \tag{166}$$
where $\gamma = \frac{(2^R-1)\sigma_z^2}{\sigma_h^2 P}$. On the other hand, if $k = t = 2$ and thus $\mathbf{S}_x = (P/2)\mathbf{I}_2$ (e.g., using the Alamouti code), the outage probability for the MISO fading channel with the same rate $R$ is given by
$$\Pr\left(\log_2\left(1 + \frac{P(|h_1|^2 + |h_2|^2)}{2\sigma_z^2}\right) < R\right) \tag{167}$$
Since $|h_1|^2 + |h_2|^2$ is chi-square distributed with 4 degrees of freedom, we

obtain the outage probability as
$$p_{\rm out}^{(2)} = 1 - e^{-2\gamma}(1 + 2\gamma) \tag{168}$$
Thus, we conclude that if $e^{\gamma} > 1 + 2\gamma$, then $p_{\rm out}^{(1)} < p_{\rm out}^{(2)}$, i.e., using a single transmit antenna results in a lower outage probability than using both antennas, and vice versa. The Alamouti code can therefore be suboptimal in terms of outage capacity!
In general, with other parameters fixed, the smaller the rate target $R$ is, the larger the optimal number of active antennas $k$ is. This is because a smaller $R$ corresponds to a smaller outage probability, which is achieved by more diversity, i.e., more transmit antennas.
For general MIMO systems without CSIT, the outage capacity of the MIMO fading channel with the (truncated) white transmit covariance is achieved by vertical encoding (e.g., space-time codes) and an iterative receiver.
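The comparison between (166) and (168) reduces to locating the positive root $\gamma^\star$ of $e^{\gamma} = 1 + 2\gamma$: for $\gamma > \gamma^\star$ one antenna is better, and for $\gamma < \gamma^\star$ both antennas are better. The pure-Python sketch below (helper names are hypothetical; $\sigma_h^2 = 1$ is assumed in the Monte Carlo check) finds $\gamma^\star$ by bisection and cross-checks both closed forms by simulation.

```python
import math
import random

def p_out_single(gamma):
    """(166): k = 1, only antenna 1 active."""
    return 1 - math.exp(-gamma)

def p_out_both(gamma):
    """(168): k = 2, S_x = (P/2) I_2."""
    return 1 - math.exp(-2 * gamma) * (1 + 2 * gamma)

def crossover(lo=0.5, hi=3.0, iters=60):
    """Bisection for the positive root of e^gamma = 1 + 2*gamma."""
    f = lambda g: math.exp(g) - 1 - 2 * g
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def mc_outage(gamma, k, n=200_000, seed=1):
    """Monte Carlo with sigma_h^2 = 1: outage iff the sum of k Exp(1) gains < k * gamma."""
    rnd = random.Random(seed)
    hits = sum(sum(rnd.expovariate(1.0) for _ in range(k)) < k * gamma for _ in range(n))
    return hits / n

g_star = crossover()
print(f"crossover gamma* ~ {g_star:.4f}")
for gamma in (0.5, 2.0):
    print(gamma, p_out_single(gamma), p_out_both(gamma),
          mc_outage(gamma, 1), mc_outage(gamma, 2))
```

Since $\gamma = (2^R-1)\sigma_z^2/(\sigma_h^2 P)$ grows with $R$, the root $\gamma^\star \approx 1.256$ marks the rate above which single-antenna transmission has the lower outage probability, consistent with the remark that smaller rate targets favor more active antennas.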

If horizontal encoding is used along with linear/nonlinear receivers, a practical design is to assign all data streams/transmit antennas the same transmission rate $R/t$. However, horizontal encoding with equal-rate allocation in general does not achieve the outage capacity, even with the MMSE-SIC receiver and $r \geq t$.

Summary
MIMO AWGN channel
CSIT-known case: eigenmode transmission and WF power allocation achieve the capacity
CSIT-unknown case: horizontal encoding and the nonlinear MMSE-SIC receiver achieve the capacity
For both cases, the spatial multiplexing gain is equal to the rank of the MIMO channel matrix
MIMO fading channel
Ergodic capacity
CSIT-known case: eigenmode transmission and WF power allocation based on the instantaneous CSI achieve the capacity
CSIT-unknown case: horizontal encoding and nonlinear MMSE-SIC