Capacity of Block Rayleigh Fading Channels Without CSI

Mainak Chowdhury and Andrea Goldsmith, Fellow, IEEE
Department of Electrical Engineering, Stanford University, USA
Email: mainakch@stanford.edu, andrea@wsl.stanford.edu

Abstract—A system with a single antenna at the transmitter and receiver and no channel state information at either is considered. The channel experiences block Rayleigh fading with a coherence time of T symbol times, and the fading statistics are assumed to be known perfectly. The system operates with a finite average transmit power. It is shown that the capacity-achieving input distribution in the T-dimensional space is the product of the distribution of an isotropically distributed unit vector and a distribution on the norm which is discrete with a finite number of points in its support. Numerical evaluations of this distribution and the associated capacity for a channel with fading and Gaussian noise and a coherence time T = 2 are presented for representative SNRs. It is also shown numerically that an implicit channel estimation is performed by the capacity-optimal scheme.

Index Terms—Block fading channels, no CSI, capacity-achieving input distribution, noncoherent communications

I. INTRODUCTION

Channel estimation and the subsequent use of the channel estimates for data transmission lie at the basis of many wireless communication systems in use today. In this work, we explore an alternative paradigm in which channel estimation and data transmission are not performed one after the other but rather jointly, with the end goal of maximizing the data rate. This maximum data rate equals the channel's Shannon capacity under the assumption that only the channel statistics are known at the transmitter and the receiver.

Capacity results in this setting are few and far between. However, there is a rich history of work investigating many special cases. One such example is the finite state Markov channel, whose capacity under no CSI was studied in [1], [2]. In these works, the Markov property of the channel was used both to compute good bounds on the capacity as well as exact capacities for some special classes of channels. The capacity of i.i.d. as well as block fading channels with no CSI (but with perfect knowledge of the channel statistics) has also been extensively investigated. In particular, the capacity-achieving distribution for i.i.d. Rayleigh and Ricean fading channels without CSI was derived in [3], [4]. Based on a characterization of the Karush-Kuhn-Tucker (KKT) conditions associated with the convex optimization problem of maximizing the mutual information of these channels, the authors established that the optimal capacity-achieving input distribution is discrete with a finite number of mass points in the norm.

A series of fundamental contributions were made starting in the early 2000s on the capacity of block fading channels without CSI at the transmitter or the receiver, also called noncoherent channels. One such contribution is the notion of unitarily invariant codes proposed in [5] and [6] for noncoherent MIMO channels. In the asymptotically large SNR regime, the capacity-achieving schemes depend only on the fading distribution and perform space-time coding over the Grassmann manifold associated with the channel matrix. Multiuser counterparts of these ideas can be found in [7], [8].
Results about optimal random codes for general block fading channels in the low-to-moderate SNR regime are harder to come by, since the codes depend not only on the fading distribution but also on the noise distribution. One example of such a work is [9]. In this work, the authors established that the probability distribution of the error-exponent-optimal random block code for a SISO channel is supported on a finite number of discrete mass points in the norm of the block code.

In our work we investigate the capacity and the capacity-achieving distribution of a block fading model with a coherence time T > 1 at any SNR. Our analysis determines that, similar to known results for T = 1 and for the error-exponent-optimal distribution for T > 1, the capacity for T > 1 is achieved by a distribution on the norm ‖x‖ which is supported on a finite number of mass points. Based on this observation, we present numerical results for the capacity and the capacity-achieving distributions for channels with fading and Gaussian noise and a coherence time of T. We find, based on the capacity-achieving distribution for T = 2, that sequential channel estimation using pilot symbols and subsequent data transmission achieves strictly lower data rates than the capacity. We also examine the mutual information between the channel output and the channel state under the capacity-achieving distribution. We show that this mutual information is non-zero and increases with SNR, which indicates that some form of implicit channel estimation is inherent in optimal decoding.

These results are relevant to signal design in many existing or emerging wireless systems where, on the one hand, the effects of an imprecise channel estimate on achievable data rates are poorly understood and, on the other hand, precise channel state information may be expensive to acquire. In such cases, joint channel estimation and data transmission, or simply noncoherent transmission, may be better than separate channel estimation and transmission. A line of work exploring the cost of separate channel estimation is [10]. Specifically, this work explores the utility of channel state information under schemes which involve separate channel estimation and transmission (henceforth referred to as partially-coherent schemes).

This work assumes a certain channel estimation overhead and makes precise various aspects of the optimal learning overhead needed to achieve good performance. A surprising outcome of this line of work is that the overhead needed to achieve good rates (as measured by the capacity) is often not that large. Our results suggest, in addition, that even when the coherence times are small, the channel output when data symbols are transmitted already contains information about the channel state. This suggests that a form of joint channel estimation and data transmission might achieve better performance in practice than the commonly used pilot-based channel estimation.

The rest of the paper is organized as follows. We present the system model in Section II, describe some properties of the output distribution in Section III, and characterize the structure of the optimal capacity-achieving input distribution in Section IV. Based on a numerical optimization of these expressions for our channel model, we present the capacity and the capacity-achieving distribution as a function of the SNR in Section V-A. We discuss the implications of our results relative to pilot-based channel estimation in Section V-B and finally present our concluding thoughts in Section VI.

II. SYSTEM MODEL

We consider a single-antenna transmitter and a single-antenna receiver. The system across a single block of T symbol times may be represented as

y = hx + ν,   (1)

with y, ν ∈ R^T, h ∈ R, and x ∈ R^T. Each ν_i ~ N(0, σ²), and h ~ N(0, 1). We restrict attention to real-valued channel coefficients for simplicity of the exposition and of the numerical optimization. Extensions to the complex domain follow very similar lines and are presented in the extended version of this work [11]. We use capital letters to refer to a random variable and lowercase letters to refer to a realization. We use p_Y(·) to refer to the density function of the continuous random variable Y, and µ_X(·) to refer to the probability measure on the random variable X.

We assume a block fading model with a coherence time T. We assume no instantaneous CSI at the transmitter or the receiver, an average transmit power of 1, and that the receiver does not know the instantaneous channel realization at the beginning of each new channel block. We consider coding across blocks and seek to understand the optimal signaling strategies to achieve capacity. Note that since this channel can be thought of as a memoryless system with conditional density p_{Y|x}(·), the fundamental limit on the achievable rates of this system is attained by a distribution on the T-dimensional space of all possible inputs over T time slots (i.e., a space-time random code).

Fig. 1: The system model.

The channel in Fig. 1 is completely specified by the conditional density p_{Y|x}(·), which is specified as follows: if x ∈ R^T is the channel input and y ∈ R^T is the T-dimensional output of the channel, then, given x, y is distributed as y ~ N(0, Σ_x), where the (q, r)-th entry of the matrix Σ_x is given by

Σ_x(q, r) = x_q x_r + σ² I(q = r),

with the indicator function I equal to 1 if the condition is satisfied and zero otherwise.
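For concreteness, the model in (1) and the induced conditional covariance Σ_x are straightforward to simulate. The sketch below is illustrative only; the parameter values and helper names (sample_output, covariance) are our own choices, not the authors'.

```python
import numpy as np

def sample_output(x, sigma2, rng):
    """One channel use: y = h*x + nu with h ~ N(0, 1) and nu_i ~ N(0, sigma2)."""
    h = rng.standard_normal()
    nu = np.sqrt(sigma2) * rng.standard_normal(x.shape[0])
    return h * x + nu

def covariance(x, sigma2):
    """Sigma_x with entries Sigma_x(q, r) = x_q * x_r + sigma2 * 1{q == r}."""
    return np.outer(x, x) + sigma2 * np.eye(x.shape[0])

rng = np.random.default_rng(0)
T, sigma2 = 2, 0.5            # coherence time and noise variance (SNR = 1 / sigma2)
x = np.array([1.0, 0.7])      # one candidate input block with ||x||^2 <= T
print(sample_output(x, sigma2, rng))
print(covariance(x, sigma2))
```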
III. PROPERTIES OF THE COVARIANCE MATRIX Σ_x

In this section we point out some properties of the covariance matrix Σ_x, in addition to the ones listed in Sections II-C and IV of [12]. These properties are useful in understanding the nature of the optimal input distributions and are also used in establishing the results in Lemma 1. More specifically, the positive definiteness of the matrix Σ_x at all points in the domain is used to establish the existence of the linear transformation that yields the contradiction. Proofs of the identities listed below are included in the extended version of this work [11].

(a) Σ_x has T − 1 eigenvalues with value σ² and a single eigenvalue with value ‖x‖² + σ².
(b) The (unnormalized) i-th eigenvector corresponding to the first T − 1 eigenvalues σ² is along (−x_{i+1}/x_1, e_i), where e_i is the unit row vector of length T − 1 with its only nonzero entry (unity) at position i. The T-th eigenvector is along (x_1/x_T, ..., x_{T−1}/x_T, 1).
(c) Σ_x is positive definite.
(d) The determinant of Σ_x is a function of ‖x‖ only.
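Properties (a), (c), and (d) are easy to confirm numerically for a random input block; the short check below (with an arbitrary T and noise variance of our choosing, not taken from the paper) is a sanity check rather than a proof.

```python
import numpy as np

T, sigma2 = 4, 0.3
rng = np.random.default_rng(1)
x = rng.standard_normal(T)
Sigma = np.outer(x, x) + sigma2 * np.eye(T)

eigvals = np.sort(np.linalg.eigvalsh(Sigma))
# (a) T - 1 eigenvalues equal sigma^2, one equals ||x||^2 + sigma^2
assert np.allclose(eigvals[:-1], sigma2)
assert np.isclose(eigvals[-1], x @ x + sigma2)
# (c) positive definiteness
assert np.all(eigvals > 0)
# (d) the determinant depends on x only through ||x||
assert np.isclose(np.linalg.det(Sigma), sigma2 ** (T - 1) * (x @ x + sigma2))
print("properties (a), (c), (d) verified numerically")
```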

IV. CHARACTERIZING THE CAPACITY-ACHIEVING DISTRIBUTION

The problem of maximizing the mutual information for the channel described in Fig. 1 can be written as

sup_{µ_X(·)} I(Y; X)  subject to  (1/T) ∫ ‖x‖² dµ_X(x) ≤ 1,   (2)

or equivalently as

inf_{µ_X(·)} −I(Y; X)  subject to  (1/T) ∫ ‖x‖² dµ_X(x) ≤ 1.   (3)

The above optimization is performed over all distributions µ_X(·). I(Y; X) is the mutual information between Y and X and can be expressed as

I(Y; X) = −E_Y[log p_Y(Y)] − E_X[h(Y | X = x)],

where h(Y | X = x) is the differential entropy of Y given a fixed value x. The first expectation is performed with respect to the distribution induced on Y by the distribution µ_X(·), i.e., p_Y(·) = ∫ p_{Y|x}(·) dµ_X(x).

Many structural properties of the capacity-achieving distribution have been derived in [12]. According to this work, the capacity-achieving distribution is the product of the distribution associated with an isotropically distributed unit vector in the T-dimensional space and a distribution on the norm r = ‖x‖. The rest of the discussion in this section focuses on finding the optimal distribution associated with the norm r = ‖x‖.

We observe that the objective function in (3) is convex in µ_X(·). It can also be shown that the limit point of any sequence {µ_X^{(n)}(·)} of measures lying in S = {µ(·) : ∫ ‖x‖² dµ(x) ≤ T} also lies in S. Thus the infimum in (3) is attained by an optimal µ*_X(·). Necessary and sufficient conditions for the optimality of the solution µ*_X(·) can be obtained by writing down the KKT conditions. In particular, the Lagrangian L(µ_X(·), λ_1, λ_2) of the above optimization problem can be expressed as

L(µ_X(·), λ_1, λ_2) = ∫_y (∫ p_{Y|x}(y) dµ_X(x)) log(∫ p_{Y|x}(y) dµ_X(x)) dy
    + ∫_x 0.5 log((2πe)^T det Σ_x) dµ_X(x)
    + λ_1 (∫ ‖x‖² dµ_X(x) − T) + λ_2 (∫ dµ_X(x) − 1),

where λ_1 ∈ R_+ and λ_2 ∈ R. In the above we used the fact that h(Y | X = x) = 0.5 log((2πe)^T det Σ_x). The first-order necessary condition for the optimal µ*_X(·) states that whenever µ*_X(·) assigns positive measure to a neighborhood around x (i.e., µ*_X(B^x_δ) > 0 for all δ < δ_0, where δ_0 > 0 and B^x_δ ≜ {z : ‖z − x‖ < δ}), the following must hold:

∫_y (1 + log p*_Y(y)) p_{Y|x}(y) dy + 0.5 log((2πe)^T det Σ_x) + λ_1 ‖x‖² + λ_2 ≜ g(p*_Y(·), x) = 0,   (4)

where p*_Y is the distribution on y induced by µ*_X(·), and g(·, ·) is defined so that the above relation holds. We now state some properties of g(·, ·). These may be proved by observing that the optimal µ_X(·) is only a function of r = ‖x‖ (referred to as µ_R(·) afterwards), which in turn follows from the results in [5].

Lemma 1. The following hold:
(a) If there exists an x such that, for every neighborhood around x (i.e., B^x_δ := {z : ‖z − x‖ ≤ δ} for any positive δ), g(p_Y(·), x) is zero at some point inside the neighborhood, then p_Y(·) cannot be a valid probability distribution.
(b) There exists an R < ∞ such that g(p_Y(·), x) > 0 for all x such that ‖x‖ > R.
(c) The optimal distribution µ*_X(·) assigns a non-zero measure to 0.

Proof Sketch. We present a brief sketch of the proofs below. Proof details are presented in the extended version of this manuscript [11].

(a) This follows from the fact that if such a case exists then, in particular, there exists a linear transformation under which g(·, x) is zero in an interval around x along a transformed coordinate. Thus, by the Identity Theorem from complex analysis [13], g(·, ·) is identically zero along that coordinate. One can then use methods very similar to those used in [3] to argue that the probability density function p_Y(·) is non-integrable (i.e., observing that the relation defines a Laplace transform and that the relation can be inverted uniquely to a non-integrable distribution, as described in Section IV-A of [3]).

(b) We first observe that under a power constraint, ∫ log(p*_Y(y)) p_{Y|x}(y) dy is bounded by a term logarithmic in the norm of x, and that λ_2 is fixed regardless of x. The result follows by noting that, since λ_1 > 0, as ‖x‖ → ∞, λ_1 ‖x‖² − c log(‖x‖² + σ²) is unbounded for any finite constant c and hence cannot be equal to zero.

(c) This may be established by contradiction.
If all points in the support of the optimal distribution have a norm greater than zero, then, by arguments similar to those in [3], the mutual information is increased by bringing any coordinate closer to zero, while still meeting the power constraint.

The following corollary results from using this lemma together with results from real and complex analysis:

Corollary 1. The support of the measure µ_R(·) corresponding to the optimal µ*_X(·) is bounded and finite in r = ‖x‖.

Proof. The invariance under unitary transformations follows from [12]. The support is discrete in ‖x‖ because otherwise (a) of Lemma 1 would imply a contradiction. The support of µ*_X(·) and µ_R(·) is bounded by (b) of Lemma 1. The number of points with a non-zero probability mass in ‖x‖ is finite because otherwise, by the Bolzano-Weierstrass theorem, there would be a limit point and (a) of Lemma 1 would again imply a contradiction.

Note that the arguments used to establish this result are very similar to those presented in [3]. The only difference in the analysis is due to the fact that T > 1. To establish the result in this case, we apply a linear transformation (at a limit point of a sequence of points with non-zero probability measure under µ_X(·)) to reduce it to the case considered in [3] and hence establish a contradiction.
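The objective in (2)-(3) can be estimated for any candidate norm distribution by Monte Carlo. The sketch below is a rough, noisy estimator for an input whose norm is supported on a few mass points and whose direction is isotropic; the helper names, mass points, and sample sizes are illustrative assumptions and do not reproduce the paper's numerical procedure.

```python
import numpy as np
from scipy.stats import multivariate_normal

def cond_entropy(r, sigma2, T):
    """h(Y | X = x) = 0.5*log((2*pi*e)^T det(Sigma_x)); depends on x only via r = ||x||."""
    det = sigma2 ** (T - 1) * (r ** 2 + sigma2)
    return 0.5 * np.log((2 * np.pi * np.e) ** T * det)

def mutual_information_nats(radii, probs, sigma2, T, n_out=2000, n_dir=200, seed=0):
    rng = np.random.default_rng(seed)
    # random directions approximate the isotropic part of mu_X
    dirs = rng.standard_normal((n_dir, T))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    covs = [np.outer(r * u, r * u) + sigma2 * np.eye(T) for r in radii for u in dirs]
    weights = np.repeat(probs, n_dir) / n_dir

    # sample Y from the Gaussian mixture and estimate h(Y) = -E[log p_Y(Y)]
    comp = rng.choice(len(covs), size=n_out, p=weights)
    ys = np.array([rng.multivariate_normal(np.zeros(T), covs[c]) for c in comp])
    p_y = sum(w * multivariate_normal(np.zeros(T), c).pdf(ys) for w, c in zip(weights, covs))
    h_y = -np.mean(np.log(p_y))
    h_y_given_x = sum(p * cond_entropy(r, sigma2, T) for p, r in zip(probs, radii))
    return h_y - h_y_given_x

# two mass points on the norm (one at zero), T = 2, SNR = 1 / sigma2 = 2
print(mutual_information_nats(radii=[0.0, 1.6], probs=[0.4, 0.6], sigma2=0.5, T=2))
```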

Based on Corollary 1, we now proceed to compute the capacity and the capacity-achieving distribution for our channel model. In Section V-B we point out connections of these results with channel estimation.

V. NUMERICAL RESULTS

In this section we consider an average transmit power of 1 unit and take the SNR to be completely specified by the noise variance σ², i.e., SNR = 1/σ². We study the effect of SNR on the capacity of the block fading channel as well as on the information that can be extracted about the channel under the capacity-achieving distribution. These distributions are specified by a finite number of support points in ‖x‖ and their corresponding probability mass functions. To obtain these distributions, the number of points in the capacity-achieving distribution was increased until, within the numerical tolerances, the mutual information did not increase further and the dual variables λ_1 and λ_2 stayed the same. The optimizations were performed using the fmin_slsqp routine in SciPy [14]. Multiple random starting points were used to test the numerical stability of the optimization problem and the optimization routine; the capacity was found to be the same regardless of the starting point whenever the optimization completed successfully. We present capacity results in Section V-A and discuss the implications for channel estimation in Section V-B.
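To make the flavor of this optimization concrete, the following self-contained sketch solves the analogous problem for T = 1, where the output density is a finite Gaussian mixture and the objective can be evaluated by deterministic quadrature. It uses scipy.optimize.minimize with the SLSQP method (the same algorithm behind fmin_slsqp); the number of mass points, bounds, and starting values are our own illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

SIGMA2 = 0.5   # noise variance; SNR = 1 / SIGMA2
K = 3          # number of candidate mass points on the norm r = |x|

def neg_mutual_information(z):
    r, p = z[:K], z[K:]
    var = r ** 2 + SIGMA2                      # Var(Y | |X| = r) for T = 1
    def p_y(y):
        return np.sum(p * np.exp(-y ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))
    h_y = quad(lambda y: -p_y(y) * np.log(p_y(y) + 1e-300), -40, 40, limit=200)[0]
    h_y_given_x = np.sum(p * 0.5 * np.log(2 * np.pi * np.e * var))
    return -(h_y - h_y_given_x)                # negative I(X; Y), in nats

z0 = np.concatenate([np.linspace(0.0, 1.5, K), np.full(K, 1.0 / K)])
cons = [{"type": "eq",   "fun": lambda z: np.sum(z[K:]) - 1.0},               # probabilities sum to one
        {"type": "ineq", "fun": lambda z: 1.0 - np.sum(z[K:] * z[:K] ** 2)}]  # average power <= 1
bounds = [(0.0, 10.0)] * K + [(0.0, 1.0)] * K
res = minimize(neg_mutual_information, z0, method="SLSQP", bounds=bounds, constraints=cons)
print("capacity estimate (bits/symbol):", -res.fun / np.log(2))
print("mass points:", res.x[:K], "probabilities:", res.x[K:])
```

In the paper's procedure, the number of mass points is then increased until the mutual information and the dual variables stop changing; the sketch above fixes K for brevity.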
A. Capacity results

Fig. 2: Capacity (bits per symbol time) vs. SNR for T = 1 (dotted) and T = 2 (solid).

We first show how the capacity of the channel per symbol time changes with increasing SNR in Fig. 2. We note that, as expected, coding across time improves performance on the order of a few dB of coding gain. We next present a visualization of the optimal µ_X(·) in the 2D space of all x in Figs. 3 and 4. Note that these figures specify µ_X(·) for both T = 1 (i.i.d.) and T = 2 (block fading).

Fig. 3: µ_X(·) for SNR = 0.5. Blue cylinders represent the product distribution over two time slots based on the optimal distribution for T = 1, whereas the red cylinders represent the optimal distribution for T = 2. The height of a cylinder is proportional to the radial probability mass at a particular radius (T = 2) or at a particular point (T = 1); the axes are x_1 and x_2. The blue cylinders are staggered slightly for visibility.

Fig. 4: µ_X(·) for SNR = 1. Same comments as those in the caption of Fig. 3.

We observe that there is a significant mass point at x = 0 for both T = 1 and T = 2. We also observe that the capacity-achieving distribution for T = 1 is not optimal for T = 2. Moreover, in the domain of all x ∈ R², a pilot-based scheme performing channel estimation only would correspond to just a single point with a probability mass of 1, whereas a scheme corresponding to pilot-based channel estimation and subsequent use of Gaussian codebooks would correspond to a distribution supported on a one-dimensional line in the 2D space. The observation that the capacity-achieving scheme in Figs. 3-4 is neither of these demonstrates the suboptimality (from a capacity point of view) of pilot-based channel estimation for maximizing the achievable rates of the block fading channel.

B. Information about channel state

Many existing communication systems separately estimate the channel using pilot symbols and then use the estimate for subsequent data transmission, either assuming that the estimate is perfect or modeling the channel estimation error. In this section, we discuss how the capacity-achieving distribution can inform channel estimation. In Fig. 5 we plot I(H; Y) under the optimal signaling distribution µ*_X for different SNRs and compare it with the mutual information I(H; Y) computed under the distribution corresponding to using pilots for channel estimation over T symbol times, namely, µ_X(x) = 1 if and only if x = x_0, where x_0 is a vector whose norm satisfies the power constraint. The latter is just the AWGN capacity expression 0.5 log₂(1 + T · SNR).
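The pilot-only baseline quoted above has a simple closed form because (H, Y) are then jointly Gaussian. The minimal check below evaluates it on an arbitrary SNR grid of our choosing; it is a sketch of the stated formula, not the paper's plotting code.

```python
import numpy as np

def pilot_mutual_information_bits(snr, T):
    # Y = H * x0 + nu with H ~ N(0, 1), nu ~ N(0, (1/snr) * I), and ||x0||^2 = T,
    # so I(H; Y) = 0.5 * log2(1 + ||x0||^2 * snr) = 0.5 * log2(1 + T * snr).
    return 0.5 * np.log2(1.0 + T * snr)

for snr_db in [-10, 0, 10, 20]:
    snr = 10.0 ** (snr_db / 10.0)
    print(f"{snr_db:>4} dB: {pilot_mutual_information_bits(snr, T=2):.3f} bits")
```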

Fig. 5: Mutual information between the channel and the channel output with pilot symbols (dashed line) and with µ*_X (solid line) for T = 2.

We observe from the figure that even with the capacity-achieving distribution µ*_X, the information content about the channel h in the output y, as measured by the mutual information I(H; Y), is nonzero. This suggests that the capacity-achieving input distribution also allows information about the channel state to be obtained at the decoder even without any pilot symbols. This has implications for both the theory and practice of joint channel estimation and data transmission. The figures show conclusively that, ignoring computational complexity constraints, data transmission at channel capacity does not preclude channel estimation.

VI. CONCLUSIONS

We consider the capacity of a block Rayleigh fading channel without instantaneous channel state information at either the transmitter or the receiver, but with knowledge of the fading statistics. We establish that, similar to known results for the capacity of the Rayleigh and Ricean i.i.d. fading channels, the capacity of the block Rayleigh fading channel is also achieved by an input distribution µ_X(·) which is only a function of r = ‖x‖ and in which the measure µ_R(·) on the norm r is discrete with a finite number of mass points. We use this result to present numerical estimates of the capacity and the corresponding capacity-achieving distributions, and demonstrate numerically that pilot-based channel estimation achieves strictly lower rates than capacity. In addition, our numerical results show that under the capacity-achieving distribution the mutual information between the channel state and the output is non-zero, suggesting that channel estimation is implicitly performed by the capacity-optimal decoder. Further investigation of this phenomenon and its implications for practical system design are topics for future work.

REFERENCES

[1] M. Mushkin and I. Bar-David, "Capacity and coding for the Gilbert-Elliott channels," IEEE Trans. Inf. Theory, vol. 35, no. 6, pp. 1277-1290, 1989.
[2] A. J. Goldsmith and P. P. Varaiya, "Capacity, mutual information, and coding for finite-state Markov channels," IEEE Trans. Inf. Theory, vol. 42, no. 3, pp. 868-886, 1996.
[3] I. C. Abou-Faycal et al., "The capacity of discrete-time memoryless Rayleigh-fading channels," IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1290-1301, 2001.
[4] M. C. Gursoy et al., "The noncoherent Rician fading channel - Part I: Structure of the capacity-achieving input," IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2193-2206, 2005.
[5] B. M. Hochwald and T. L. Marzetta, "Unitary space-time modulation for multiple-antenna communications in Rayleigh flat fading," IEEE Trans. Inf. Theory, vol. 46, no. 2, pp. 543-564, 2000.
[6] L. Zheng and D. N. C. Tse, "Communication on the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna channel," IEEE Trans. Inf. Theory, vol. 48, no. 2, pp. 359-383, 2002.
[7] S. Shamai and T. L. Marzetta, "Multiuser capacity in block fading with no channel state information," IEEE Trans. Inf. Theory, vol. 48, no. 4, pp. 938-942, 2002.
[8] S. Murugesan et al., "Optimization of training and scheduling in the non-coherent SIMO multiple access channel," IEEE J. Sel. Areas Commun., vol. 25, no. 7, pp. 1446-1456, 2007.
[9] I. Abou-Faycal and B. M. Hochwald, "Coding requirements for multiple-antenna channels with unknown Rayleigh fading," Bell Labs Technical Memorandum, 1999.
[10] N. Jindal and A. Lozano, "Optimum pilot overhead in wireless communication: A unified treatment of continuous and block-fading channels," arXiv preprint arXiv:0903.1379, 2009.
[11] M. Chowdhury and A. Goldsmith, "Capacity of block fading SIMO channels without CSI," to be submitted.
[12] T. L. Marzetta and B. M. Hochwald, "Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading," IEEE Trans. Inf. Theory, vol. 45, no. 1, pp. 139-157, 1999.
[13] W. Rudin, Real and Complex Analysis. Tata McGraw-Hill Education, 1987.
[14] E. Jones et al., "SciPy: Open source scientific tools for Python," 2001. [Online]. Available: http://www.scipy.org/