Feedback Capacity of the First-Order Moving Average Gaussian Channel

Young-Han Kim*
Information Systems Laboratory, Stanford University, Stanford, CA 94305, USA
Email: yhk@stanford.edu

(* This work was partly supported by NSF Grant CCR-0311633.)

Abstract: The feedback capacity of the stationary Gaussian additive noise channel has been open, except for the case where the noise is white. Here we obtain the closed-form feedback capacity of the first-order moving average additive Gaussian noise channel. Specifically, the channel is given by $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, where the input $\{X_i\}$ satisfies an average power constraint and the noise $\{Z_i\}$ is a first-order moving average Gaussian process defined by $Z_i = \alpha U_{i-1} + U_i$, $|\alpha| \le 1$, with white Gaussian innovations $\{U_i\}_{i=0}^{\infty}$. We show that the feedback capacity of this channel is $-\log x_0$, where $x_0$ is the unique positive root of the equation $\rho x^2 = (1 - x^2)(1 - |\alpha| x)^2$ and $\rho$ is the ratio of the average input power per transmission to the variance of the noise innovation $U_i$. The optimal coding scheme parallels the simple linear signalling scheme by Schalkwijk and Kailath for the additive white Gaussian noise channel: the transmitter sends a real-valued information-bearing signal at the beginning of communication and subsequently processes the fed-back noise process through a simple linear stationary first-order autoregressive filter to help the receiver recover the information by maximum likelihood decoding. The resulting probability of error decays doubly exponentially in the duration of the communication. This feedback capacity of the first-order moving average Gaussian channel is very similar in form to the best known achievable rate for the first-order autoregressive Gaussian noise channel studied by Butman, Wolfowitz, and Tiernan, although the optimality of the latter is yet to be established.

I. INTRODUCTION

Consider the additive Gaussian noise channel with feedback depicted in Fig. 1. The channel

$Y_i = X_i + Z_i, \quad i = 1, 2, \ldots,$

has additive Gaussian noise $Z_1, Z_2, \ldots$, where $Z^n = (Z_1, \ldots, Z_n) \sim N_n(0, K_Z)$.

[Fig. 1. Gaussian channel with feedback: the transmitter maps the message $W$ and the past outputs $Y^{i-1}$, fed back through a unit delay, into the input $X_i(W, Y^{i-1})$; the receiver maps $Y^n$ into the decoded message $\hat{W}$.]

We wish to communicate a message $W \in \{1, 2, \ldots, 2^{nR}\}$ reliably over the channel $Y^n = X^n + Z^n$. The channel output is causally fed back to the transmitter. We specify a $(2^{nR}, n)$ code with codewords $(X_1(W), X_2(W, Y_1), \ldots, X_n(W, Y^{n-1}))$ satisfying the expected power constraint

$\frac{1}{n} \sum_{i=1}^{n} E X_i^2(W, Y^{i-1}) \le P,$  (1)

and a decoding function $\hat{W}_n : \mathbb{R}^n \to \{1, 2, \ldots, 2^{nR}\}$. The probability of error is defined by

$P_e^{(n)} := \Pr\{\hat{W}_n(Y^n) \ne W\},$

where the message $W$ is independent of $Z^n$ and is uniformly distributed over $\{1, 2, \ldots, 2^{nR}\}$. We define the $n$-block feedback capacity $C_{n,FB}$ as

$C_{n,FB} = \max \frac{1}{2n} \log \frac{\det\big((B + I) K_Z (B + I)^T + K_V\big)}{\det(K_Z)},$  (2)

where the maximization is over all $n \times n$ nonnegative definite matrices $K_V$ and $n \times n$ strictly lower triangular matrices $B$ such that $\operatorname{tr}(B K_Z B^T + K_V) \le nP$. Cover and Pombra [1] showed that for every $\epsilon > 0$ there exists a sequence of $(2^{n(C_{n,FB} - \epsilon)}, n)$ codes with $P_e^{(n)} \to 0$ as $n \to \infty$, and that for every $\epsilon > 0$ and every sequence of codes with $2^{n(C_{n,FB} + \epsilon)}$ codewords, $P_e^{(n)}$ is bounded away from zero for all $n$. When the noise process $\{Z_n\}$ is stationary, the $n$-block capacity is superadditive in the sense that

$n\, C_{n,FB} + m\, C_{m,FB} \le (n + m)\, C_{n+m,FB} \quad$ for all $n, m$.

The feedback capacity $C_{FB}$ is then well-defined as $C_{FB} = \lim_{n\to\infty} C_{n,FB}$.
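As a concrete companion to (2), the following minimal numpy sketch (ours, not from the paper; helper names are our own) evaluates the $n$-block objective and the trace power constraint for a candidate pair $(B, K_V)$:

```python
# Sketch: evaluate the Cover-Pombra n-block objective (2) for a candidate (B, K_V).
import numpy as np

def nblock_objective(B, K_V, K_Z):
    """(1/2n) * [log det((B+I) K_Z (B+I)^T + K_V) - log det(K_Z)], in nats."""
    n = K_Z.shape[0]
    I = np.eye(n)
    _, logdet_num = np.linalg.slogdet((B + I) @ K_Z @ (B + I).T + K_V)
    _, logdet_den = np.linalg.slogdet(K_Z)
    return (logdet_num - logdet_den) / (2 * n)

def power_used(B, K_V, K_Z):
    """tr(B K_Z B^T + K_V); a feasible pair satisfies power_used <= n * P."""
    return np.trace(B @ K_Z @ B.T + K_V)
```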
To obtain a closed-form expression for the feedback capacity $C_{FB}$, however, we need to go further than (2), since this characterization gives no hint about the sequence (in $n$) of pairs $(B, K_V)$ maximizing $C_{n,FB}$, nor about its limiting behavior. In this paper, we study in detail the case where the additive Gaussian noise process $\{Z_i\}_{i=1}^{\infty}$ is a moving average process of order one (MA(1)). We define the Gaussian MA(1) noise process $\{Z_i\}_{i=1}^{\infty}$ with parameter $\alpha$, $|\alpha| \le 1$, as

$Z_i = \alpha U_{i-1} + U_i,$  (3)

where $\{U_i\}_{i=0}^{\infty}$ is a white Gaussian innovation process. Without loss of generality, we assume that $U_i$, $i = 0, 1, \ldots$, has unit variance. Note also that the condition $|\alpha| \le 1$ is not restrictive: when $|\alpha| > 1$, it can be readily verified that the process $\{Z_i\}$ has the same distribution as the process $\{\tilde{Z}_i\}$ defined by $\tilde{Z}_i = \alpha(\beta U_{i-1} + U_i)$, where the moving average parameter $\beta$ is given by $\beta = 1/\alpha$, so that $|\beta| < 1$.
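For illustration, a minimal sampler (ours) for the MA(1) noise path in (3), assuming unit-variance innovations:

```python
# Sketch: sample Z_1, ..., Z_n from the MA(1) model (3), Z_i = alpha*U_{i-1} + U_i,
# with i.i.d. unit-variance Gaussian innovations U_0, ..., U_n.
import numpy as np

def ma1_noise(n, alpha, seed=0):
    U = np.random.default_rng(seed).standard_normal(n + 1)  # U_0, ..., U_n
    return alpha * U[:-1] + U[1:]                            # Z_1, ..., Z_n
```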

We now state the main theorem, whose proof is given in Section II.

Theorem 1: For the additive Gaussian MA(1) noise channel $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, with the Gaussian MA(1) noise process $\{Z_i\}$ defined in (3), the feedback capacity $C_{FB}$ under the power constraint $\sum_{i=1}^{n} E X_i^2 \le nP$ is given by

$C_{FB} = -\log x_0,$

where $x_0$ is the unique positive root of the fourth-order polynomial

$P x^2 = (1 - x^2)(1 - |\alpha| x)^2.$  (4)

As will be shown in Sections II and III, the feedback capacity $C_{FB}$ is achieved by an asymptotically stationary ergodic input process $\{X_i\}$ satisfying $E X_i^2 = P$ for all $i$. Thus, by the ergodic theorem, the feedback capacity does not diminish under the more restrictive almost-sure power constraint $\frac{1}{n} \sum_{i=1}^{n} X_i^2(W, Y^{i-1}) \le P$.

The literature on Gaussian feedback channels is vast; we mention some prior work closely related to our main discussion. In early work, Schalkwijk and Kailath [2], [3] considered feedback over the additive white Gaussian noise channel and proposed a simple linear signalling scheme that achieves the feedback capacity. Their coding scheme can be summarized as follows. Let $\theta$ be one of $2^{nR}$ equally spaced real numbers on some interval, say $[0, 1]$. At time $k$, the receiver forms the maximum likelihood estimate $\hat{\theta}_k(Y_1, \ldots, Y_k)$ of $\theta$. Using the feedback information, at time $k+1$ the transmitter sends $X_{k+1} = \gamma_k (\theta - \hat{\theta}_k)$, where $\gamma_k$ is a scaling factor chosen to meet the power constraint. After $n$ transmissions, the receiver declares the value among the $2^{nR}$ alternatives that is closest to $\hat{\theta}_n$. This simple signalling scheme, essentially without any coding, achieves the feedback capacity. As Shannon [4] showed, feedback does not increase the capacity of memoryless channels; the benefit of feedback, however, does not consist of the simplicity of coding alone. The probability of decoding error of the Schalkwijk-Kailath scheme decays doubly exponentially in the duration of communication, compared to the exponential decay attainable without feedback.

Butman [5] generalized the linear coding scheme from white noise processes to autoregressive (AR) noise processes. For first-order autoregressive (AR(1)) processes $\{Z_i\}_{i=1}^{\infty}$ with regression parameter $\alpha$, $|\alpha| < 1$, defined by $Z_i = \alpha Z_{i-1} + U_i$, he obtained a lower bound on the feedback capacity of $-\log x_0$, where $x_0$ is the unique positive root of the fourth-order polynomial

$P x^2 = \dfrac{1 - x^2}{(1 + |\alpha| x)^2}.$  (5)

This rate has been shown to be optimal among a certain class of linear feedback schemes by Wolfowitz [6] and Tiernan [7], and it is strongly believed to be the AR(1) feedback capacity. Tiernan and Schalkwijk [8] found an upper bound on the AR(1) feedback capacity that meets Butman's lower bound at very low and very high signal-to-noise ratios. Butman [9] also obtained capacity upper and lower bounds for AR processes of higher order.

For moving average (MA) noise processes there are far fewer results in the literature, although MA processes are usually more tractable than AR processes of the same order. Ozarow [10], [11] gave upper and lower bounds on the feedback capacity of AR(1) and MA(1) channels and showed that feedback strictly increases the capacity. Substantial progress was made by Ordentlich [12]; he observed that the optimal $K_V$ in (2) has rank at most $k$ for an MA noise process of order $k$.
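Both (4) and (5) determine their rates through a single root in $(0, 1)$. The following sketch (ours) computes the two rates numerically; it assumes scipy's brentq root finder, and the bracketing relies on the sign change of each polynomial over $(0, 1)$:

```python
# Sketch: the MA(1) feedback capacity of Theorem 1 and Butman's AR(1) lower bound,
# each equal to -log x0 for the unique root x0 in (0, 1) of (4) or (5).
import numpy as np
from scipy.optimize import brentq

def cfb_ma1(P, alpha):
    # (4): P x^2 = (1 - x^2)(1 - |alpha| x)^2; negative near 0+, positive at 1
    f = lambda x: P * x**2 - (1 - x**2) * (1 - abs(alpha) * x)**2
    return -np.log(brentq(f, 1e-12, 1.0))

def butman_ar1_rate(P, alpha):
    # (5): P x^2 = (1 - x^2) / (1 + |alpha| x)^2, cleared of the denominator
    f = lambda x: P * x**2 * (1 + abs(alpha) * x)**2 - (1 - x**2)
    return -np.log(brentq(f, 1e-12, 1.0))

# Example: with alpha = 0 both reduce to the white-noise capacity 0.5*log(1+P) nats.
```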
He also showed that the optimal $(K_V, B)$ necessarily has the property that the current input signal $X_k$ is orthogonal to the past outputs $(Y_1, \ldots, Y_{k-1})$. For the special case of MA(1) processes, this development, combined with the arguments given by Wolfowitz [6], suggests that a linear signalling scheme similar to the Schalkwijk-Kailath scheme should be optimal. Our Theorem 1 confirms the optimality of the modified Schalkwijk-Kailath scheme and provides the first feedback capacity proof for a nonwhite stationary colored Gaussian channel. This development links the Cover-Pombra formulation of feedback capacity with the Schalkwijk-Kailath scheme and its generalizations to stationary colored channels, and casts new hope on the optimality of the achievable rate for the AR(1) channel obtained by Butman [5].

A recent report by Yang, Kavčić, and Tatikonda [13] studies the feedback capacity of the general ARMA(k) case using a state-space model and offers a conjecture on the feedback capacity as the solution to an optimization problem that does not depend on the horizon $n$. For the special case $k = 1$, with the noise process $\{Z_i\}_{i=1}^{\infty}$ defined by $Z_i = \beta Z_{i-1} + \alpha U_{i-1} + U_i$, $|\alpha| \le 1$, $|\beta| < 1$, they conjecture that the Schalkwijk-Kailath-Butman scheme is optimal. The corresponding achievable rate can be written in closed form as $-\log x_0$, where $x_0$ is the unique positive root of the fourth-order polynomial

$P x^2 = \dfrac{(1 - x^2)(1 - \sigma \alpha x)^2}{(1 + \sigma \beta x)^2},$

where

$\sigma = \begin{cases} 1, & \alpha + \beta \ge 0, \\ -1, & \alpha + \beta < 0. \end{cases}$

By taking $\beta = 0$ or $\alpha = 0$, we easily recover (4) and (5), respectively. Thus, our Theorem 1 gives a partial confirmation of the Yang-Kavčić-Tatikonda conjecture.
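The conjectured ARMA(1,1) rate is equally easy to evaluate numerically; in this sketch (ours, based on the formula as reconstructed above), a handy sanity check is that $\beta = 0$ and $\alpha = 0$ reproduce (4) and (5), respectively:

```python
# Sketch: conjectured ARMA(1,1) rate of Yang-Kavcic-Tatikonda, as -log x0 where
# P x^2 (1 + s*beta*x)^2 = (1 - x^2)(1 - s*alpha*x)^2 and s = sign(alpha + beta).
import numpy as np
from scipy.optimize import brentq

def arma11_rate(P, alpha, beta):
    s = 1.0 if alpha + beta >= 0 else -1.0
    f = lambda x: (P * x**2 * (1 + s * beta * x)**2
                   - (1 - x**2) * (1 - s * alpha * x)**2)
    return -np.log(brentq(f, 1e-12, 1.0))

# Reduction checks: arma11_rate(P, alpha, 0.0) matches (4),
# and arma11_rate(P, 0.0, beta) matches (5).
```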

II. PROOF OF THEOREM 1

We give a sketch of the proof; refer to [14] for details.

Proof outline. We first transform the optimization problem (2) into a series of (asymptotically) equivalent forms. We then solve the problem under individual power constraints $(P_1, \ldots, P_n)$ on the input signals, and subsequently optimize over $(P_1, \ldots, P_n)$ under the average power constraint $P_1 + \cdots + P_n \le nP$. Using Lemma 1 below, we prove that the uniform power allocation $P_1 = \cdots = P_n = P$ is asymptotically optimal. This leads to the closed-form solution given in Theorem 1.

Step 1. Transformation into equivalent optimization problems. Recall that we wish to solve the optimization problem

maximize $\log\det\big((B + I) K_Z (B + I)^T + K_V\big)$  (6)

over all nonnegative definite $K_V$ and strictly lower triangular $B$ satisfying $\operatorname{tr}(B K_Z B^T + K_V) \le nP$. We approximate the covariance matrix $K_Z$ of the given MA(1) noise process with parameter $\alpha$ by another covariance matrix $\tilde{K}_Z$. Define $\tilde{K}_Z = H_Z H_Z^T$, where the lower triangular Toeplitz matrix $H_Z$ is given by

$(H_Z)_{ij} = \begin{cases} 1, & i = j, \\ \alpha, & i = j + 1, \\ 0, & \text{otherwise.} \end{cases}$

The matrix $\tilde{K}_Z$ is the covariance matrix of the noise process conditioned on the zeroth innovation $U_0$. It is not hard to verify that there is no asymptotic difference in capacity between the channel with the original noise covariance $K_Z$ and the channel with $\tilde{K}_Z$; throughout, we therefore assume that the noise covariance matrix of the given channel is $\tilde{K}_Z$.

Now, identifying $K_V$ with $F_V F_V^T$ for some lower triangular $F_V$ and writing $F_Z = B H_Z$ for some strictly lower triangular $F_Z$, we transform the optimization problem (6) into one with the new variables $(F_V, F_Z)$:

maximize $\log\det\big(F_V F_V^T + (F_Z + H_Z)(F_Z + H_Z)^T\big)$
subject to $\operatorname{tr}(F_V F_V^T + F_Z F_Z^T) \le nP$.

We use $2n$-dimensional row vectors $f_i$ and $h_i$ to denote the $i$-th rows of $F := [F_V \; F_Z]$ and $H := [0_{n \times n} \; H_Z]$, respectively. There is an obvious identification between the time-$i$ input signal $X_i$ and the vector $f_i$, for we can regard $f_i$ as a point in the Hilbert space with the innovations of $V^n$ and $Z^n$ as a basis. We can similarly identify $Z_i$ with $h_i$ and $Y_i$ with $f_i + h_i$. We also introduce new variables $(P_1, \ldots, P_n)$ representing a power constraint for each input $f_i$. The optimization problem now takes the following equivalent form:

maximize $\log\det\big((F + H)(F + H)^T\big)$
subject to $\|f_i\|^2 \le P_i$, $i = 1, \ldots, n$, and $\sum_{i=1}^{n} P_i \le nP$.  (7)

Here $\|\cdot\|$ denotes the Euclidean norm of a $2n$-dimensional vector. Note that the variables $(f_1, \ldots, f_n)$ must satisfy the triangularity conditions inherited from $(F_V, F_Z)$.

Step 2. Optimization under individual power constraints on each signal. We solve the optimization problem (7) in $(f_1, \ldots, f_n)$ after fixing $(P_1, \ldots, P_n)$. We need some notation first. Define the $n \times 2n$ matrix $S := F + H$, and denote by $S_k$ the $k \times 2n$ submatrix of $S$ consisting of its first $k$ rows. Define the sequence of $2n \times 2n$ matrices $\{\Pi_k\}_{k=1}^{n}$ by

$\Pi_k = I - S_k^T (S_k S_k^T)^{-1} S_k.$

Note that $S_k$ has full rank, so $(S_k S_k^T)^{-1}$ always exists. We can view $\Pi_k$ as the map taking a $2n$-dimensional row vector (acting from the right) to its component orthogonal to the subspace spanned by the rows $s_1, \ldots, s_k$ of $S_k$. (Equivalently, $\Pi_k$ maps a generic random variable $A$ to $A - E(A \mid Y^k)$.) We observe that, for $k = 2, \ldots, n$,

$\det(S_k S_k^T) = \det(S_{k-1} S_{k-1}^T)\, \big(s_k \Pi_{k-1} s_k^T\big).$  (8)

Finally, we define the intermediate objective functions of the maximization (7) as

$J_k(P_1, \ldots, P_k) := \max_{f_1, \ldots, f_k :\ \|f_i\|^2 \le P_i} \log\det(S_k S_k^T),$

so that $C_{n,FB} = \max_{\{P_i\} :\ \sum_i P_i \le nP} \frac{1}{2n} J_n(P_1, \ldots, P_n)$. From (8), we have

$J_n = \max_{f_1, \ldots, f_n} \Big( \log\det(S_{n-1} S_{n-1}^T) + \log\big(s_n \Pi_{n-1} s_n^T\big) \Big) = \max_{f_1, \ldots, f_{n-1}} \Big( \log\det(S_{n-1} S_{n-1}^T) + \max_{f_n} \log\big(s_n \Pi_{n-1} s_n^T\big) \Big).$  (9)
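The determinant identity (8) is the Gram-determinant recursion and is easy to validate numerically; a small self-contained check (ours) for a random full-rank $S$:

```python
# Sketch: verify det(S_k S_k^T) = det(S_{k-1} S_{k-1}^T) * (s_k Pi_{k-1} s_k^T),
# with Pi_k = I - S_k^T (S_k S_k^T)^{-1} S_k as defined in the text.
import numpy as np

rng = np.random.default_rng(1)
n = 5
S = rng.standard_normal((n, 2 * n))          # rows s_1, ..., s_n
for k in range(2, n + 1):
    Sk, Skm1, sk = S[:k], S[:k - 1], S[k - 1]
    Pi = np.eye(2 * n) - Skm1.T @ np.linalg.solve(Skm1 @ Skm1.T, Skm1)
    lhs = np.linalg.det(Sk @ Sk.T)
    rhs = np.linalg.det(Skm1 @ Skm1.T) * (sk @ Pi @ sk)
    assert np.isclose(lhs, rhs), (k, lhs, rhs)
```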
For fixed $(f_1, \ldots, f_{n-1})$ we can easily maximize the second term of (9) by some $f_n^*$ satisfying $f_n^* = f_n^* \Pi_{n-1}$. Furthermore, thanks to the special structure of the noise covariance matrix $\tilde{K}_Z$, we can show that both terms of (9) are maximized by the same $(f_1^*, \ldots, f_{n-1}^*)$. In other words, if $(f_1^*, \ldots, f_{n-1}^*)$ maximizes $J_{n-1}(P_1, \ldots, P_{n-1})$, then $(f_1^*, \ldots, f_{n-1}^*, f_n^*)$ maximizes $J_n(P_1, \ldots, P_n)$ for some $f_n^*$. Thus the maximization of $J_n$ can be solved in a greedy fashion by sequentially maximizing $J_1, J_2, \ldots, J_n$ through $f_1^*, f_2^*, \ldots, f_n^*$, whence we obtain the recursive relationship

$J_k = J_{k-1} + \log\Big(1 + \big(\sqrt{P_k} + |\alpha| \sqrt{1 - e^{J_{k-2} - J_{k-1}}}\big)^2\Big), \quad k \ge 1,$  (10)

with $J_{-1} = J_0 = 0$.

Step 3. Optimal power allocation over time. In the previous step we solved the optimization problem (7) under a fixed power allocation $(P_1, \ldots, P_n)$; thanks to the special structure of the MA(1) noise process, this brute-force optimization was tractable via backward dynamic programming. Here we optimize the power allocation $(P_1, \ldots, P_n)$ under the constraint $\sum_{i=1}^{n} P_i \le nP$. When $\alpha = 0$, that is, when the noise is white, we can use the concavity of the logarithm to show that, for all $n$,

$C_{n,FB} = \max_{\{P_i\} :\ \sum_i P_i \le nP} \frac{1}{2n} \sum_{i=1}^{n} \log(1 + P_i) = \frac{1}{2} \log(1 + P),$

with the maximum attained by $P_1 = \cdots = P_n = P$.
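The recursion (10), as reconstructed above, involves only the pair $(J_{k-2}, J_{k-1})$; the following sketch (ours) evaluates $J_n$ for a given power allocation. With $\alpha = 0$ and uniform powers it collapses to $n \log(1 + P)$, consistent with the white-noise expression above:

```python
# Sketch: evaluate the recursion (10); (1/2n) * J_n is the n-block rate for the
# given power allocation, and under uniform allocation it approaches C_FB.
import numpy as np

def J_n(powers, alpha):
    J_prev2, J_prev1 = 0.0, 0.0          # J_{-1} = J_0 = 0
    for Pk in powers:
        gain = np.sqrt(Pk) + abs(alpha) * np.sqrt(1.0 - np.exp(J_prev2 - J_prev1))
        J_prev2, J_prev1 = J_prev1, J_prev1 + np.log1p(gain**2)
    return J_prev1

# Example: J_n([P] * n, alpha) / (2 * n) increases toward C_FB as n grows.
```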

When $\alpha \ne 0$, it is not tractable to optimize $(P_1, \ldots, P_n)$ for $J_n$ in (10) and obtain a closed-form solution of $C_{n,FB}$ for finite $n$. The following lemma, however, enables us to identify the asymptotically optimal power allocation and to obtain a closed-form solution for $C_{FB} = \lim_{n\to\infty} C_{n,FB}$.

Lemma 1: Let $\psi : [0, \infty) \times [0, \infty) \to [0, \infty)$ be such that the following conditions hold: (i) $\psi(\xi, \zeta)$ is continuous, concave in $(\xi, \zeta)$, and strictly concave in $\zeta$; (ii) $\psi(\xi, \zeta)$ is increasing in $\xi$ and in $\zeta$; and (iii) for any $\zeta > 0$, there is a unique solution $\xi = \xi^*(\zeta) > 0$ to the equation $\xi = \psi(\xi, \zeta)$. For some fixed $P > 0$, let $\{P_i\}_{i=1}^{\infty}$ be any infinite sequence of nonnegative numbers satisfying

$\limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} P_i \le P.$  (11)

Let $\{\xi_i\}_{i=0}^{\infty}$ be defined recursively as $\xi_0 := 0$ and $\xi_i = \psi(\xi_{i-1}, P_i)$, $i = 1, 2, \ldots$. Then, writing $\xi^* := \xi^*(P)$,

$\limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \xi_i \le \xi^*.$

Furthermore, if $P_i \equiv P$, $i = 1, 2, \ldots$, then the corresponding $\xi_i$ converge to $\xi^*$.

To apply the lemma to our problem, we define

$\psi(\xi, \zeta) := \frac{1}{2} \log\Big(1 + \big(\sqrt{\zeta} + |\alpha| \sqrt{1 - e^{-2\xi}}\big)^2\Big).$

Conditions (i)-(iii) of Lemma 1 are easily checked. Now from (10) and Lemma 1, applied with $\xi_i = (J_i - J_{i-1})/2$, we can show that

$\limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n) = \limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \xi_i \le \xi^*,$

where $\xi^*$ is the unique solution to

$\xi = \psi(\xi, P) = \frac{1}{2} \log\Big(1 + \big(\sqrt{P} + |\alpha| \sqrt{1 - e^{-2\xi}}\big)^2\Big).$

Since our choice of $\{P_i\}$ is arbitrary, we conclude that

$\sup \limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n) = \xi^*,$  (12)

where the supremum (in fact, maximum) is over all infinite sequences $\{P_i\}$ satisfying the asymptotic average power constraint (11). Finally, we show that $\xi^*$ is indeed the feedback capacity by observing that

$C_{FB} = \lim_{n\to\infty} \max_{\{P_i\} :\ \sum_i P_i \le nP} \frac{1}{2n} J_n(P_1, \ldots, P_n)$  (13)

$= \sup_{\{P_i\}} \limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n) = \xi^*.$  (14)

(The interchange of the order of limit and supremum in (13) and (14) can be justified using standard techniques in analysis.)

We conclude this section by characterizing the capacity $C_{FB} = \xi^*$ in an alternative form. Let $x_0 = \exp(-\xi^*)$, or equivalently, $\xi^* = -\log x_0$. Recalling that $\xi^*$ satisfies $\xi^* = \psi(\xi^*, P)$, it is easy to verify that $0 < x_0 < 1$ is the unique positive solution to

$x^2 = \dfrac{1}{1 + \big(\sqrt{P} + |\alpha| \sqrt{1 - x^2}\big)^2},$

or equivalently,

$P x^2 = (1 - x^2)(1 - |\alpha| x)^2.$

This establishes the feedback capacity $C_{FB}$ of the first-order moving average additive Gaussian noise channel with parameter $\alpha$.
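Because $\psi(\cdot, P)$ is increasing and has a unique positive fixed point, simple iteration from $\xi = 0$ converges to $\xi^* = C_{FB}$; a sketch (ours) that also cross-checks the root characterization (4):

```python
# Sketch: iterate xi <- psi(xi, P) to the fixed point xi* = C_FB, then check that
# x0 = exp(-xi*) solves the polynomial (4), P x^2 = (1 - x^2)(1 - |alpha| x)^2.
import numpy as np

def cfb_fixed_point(P, alpha, iters=500):
    xi = 0.0
    for _ in range(iters):
        xi = 0.5 * np.log1p((np.sqrt(P) + abs(alpha) * np.sqrt(1 - np.exp(-2 * xi)))**2)
    return xi

P, alpha = 10.0, 0.5
xi_star = cfb_fixed_point(P, alpha)
x0 = np.exp(-xi_star)
residual = P * x0**2 - (1 - x0**2) * (1 - abs(alpha) * x0)**2
print(xi_star, residual)   # residual should be ~0
```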
III. DISCUSSION

The derived asymptotically optimal feedback input signal, or equivalently, the sequence of matrices $(K_V^{(n)}, B^{(n)})$, has two prominent properties. First, the optimal $(K_V, B)$ for the $n$-block can be found sequentially, built on the optimal $(K_V, B)$ for the $(n-1)$-block. Although this property may sound quite natural, it fails in general for other channel models; later in this section we give an MA(2) counterexample. As a corollary of this sequentiality property, the optimal $K_V$ has rank one, which agrees with the previous result by Ordentlich [12]. Second, the current input signal $X_k$ is orthogonal to the past output signals $(Y_1, \ldots, Y_{k-1})$; in the notation of Section II, $f_k S_{k-1}^T = 0$. This orthogonality property is in fact a necessary condition on the optimal $(K_V, B)$ for any noise covariance matrix $K_Z$ (see [12], [15]).

We next explore the possibility of extending the current proof technique to a more general class of noise processes. The answer is negative; we comment on two simple cases, MA(2) and AR(1). Consider the following MA(2) noise process, which is essentially two interleaved MA(1) processes:

$Z_i = U_i + \alpha U_{i-2}, \quad i = 1, 2, \ldots.$

It is easy to see that this channel has the same capacity as the MA(1) channel with parameter $\alpha$, attained by signalling separately over each interleaved MA(1) channel. This shows that the sequentiality property does not hold for this example: sequentially optimizing the $n$-block capacity yields only the rate $-\log x_0$, where $x_0$ is the unique positive root of the sixth-order polynomial

$P x^2 = (1 - x^2)(1 - |\alpha| x^2)^2.$

It is not hard to see that this rate is strictly less than the MA(1) feedback capacity unless $\alpha = 0$. A similar argument shows that Butman's conjecture on the AR(k) capacity [9, Abstract] is not true in general for $k > 1$.
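A quick numerical comparison (ours; the MA(1) root finder is repeated for self-containment) of the sequentially optimized interleaved-MA(2) rate against the MA(1) feedback capacity of (4) illustrates the strict gap for $\alpha \ne 0$:

```python
# Sketch: sequentially optimized rate for the interleaved MA(2) noise,
# -log x0 with P x^2 = (1 - x^2)(1 - |alpha| x^2)^2, versus the MA(1)
# feedback capacity of (4); the former is strictly smaller unless alpha = 0.
import numpy as np
from scipy.optimize import brentq

def rate_ma2_interleaved(P, alpha):
    f = lambda x: P * x**2 - (1 - x**2) * (1 - abs(alpha) * x**2)**2
    return -np.log(brentq(f, 1e-12, 1.0))

def cfb_ma1(P, alpha):
    f = lambda x: P * x**2 - (1 - x**2) * (1 - abs(alpha) * x)**2
    return -np.log(brentq(f, 1e-12, 1.0))

print(rate_ma2_interleaved(10.0, 0.5) < cfb_ma1(10.0, 0.5))  # True
```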

In contrast to MA(1) channels, for AR(1) channels we are missing two basic ingredients: the optimality of rank-one $K_V$ and the asymptotic optimality of the uniform power allocation. Under these two conditions, both of which are yet to be justified, it is known [6], [7] that the optimal achievable rate is given by (5). Despite the striking similarity, or reciprocity, between the polynomial (5) and the polynomial (4) for the MA(1) feedback capacity, there is a major difficulty in establishing the above two conditions by the two-stage optimization strategy of the previous section, namely, first maximizing over $(f_1, \ldots, f_n)$ and then over $(P_1, \ldots, P_n)$. For certain values of the individual signal power constraints $(P_1, \ldots, P_n)$, the optimal $(f_1, \ldots, f_n)$ does not satisfy the sequentiality property, resulting in a $K_V$ of rank higher than one. Hence we cannot obtain the recursion formula for the $n$-block capacity [6, Section 5] that corresponds to (10) through a greedy maximization of $J_n(P_1, \ldots, P_n)$.

Finally, we show that the feedback capacity of the MA(1) channel can be achieved using a simple stationary filter of the noise innovation process. At the beginning of communication, we send $X_1 \sim N(0, P)$. For subsequent transmissions, we use a first-order autoregressive filter and transmit the filtered version of the noise innovation process:

$X_k = \beta X_{k-1} + \sigma U_{k-1}, \quad k \ge 2.$  (15)

Here $\beta = -\operatorname{sgn}(\alpha)\, x_0$, with $x_0$ the same unique positive root of the fourth-order polynomial (4) in Theorem 1, and the scaling factor $\sigma$ is chosen to satisfy the power constraint, $\sigma = \operatorname{sgn}(\alpha) \sqrt{P (1 - \beta^2)}$. This input process and the MA(1) noise process (3) yield the output process $\{Y_i\}_{i=1}^{\infty}$ given by

$Y_1 = X_1 + \alpha U_0 + U_1,$
$Y_k = \beta Y_{k-1} - \alpha\beta U_{k-2} + (\alpha - \beta + \sigma) U_{k-1} + U_k, \quad k \ge 2,$

which is asymptotically stationary with power spectral density

$S_Y(\omega) = \frac{1}{\beta^2} \big|1 + \alpha \beta^2 e^{j\omega}\big|^2.$

It is easy to check that the entropy rate of the output process $\{Y_k\}$ is $\frac{1}{2} \log(2\pi e\, \beta^{-2})$; hence we achieve the feedback capacity $C_{FB} = -\log x_0$. Furthermore, it can be shown that the mean-square error of $X_1$ given the observations $Y_1, \ldots, Y_n$ decays exponentially with rate $\beta^2 = 2^{-2 C_{FB}}$; in other words,

$\operatorname{var}(X_1 \mid Y_1, \ldots, Y_n) \doteq P \cdot 2^{-2 n C_{FB}}.$  (16)

We can interpret the signal $X_k$ as the adjustment of the receiver's estimate of the message-bearing signal $X_1$ after observing $(Y_1, \ldots, Y_{k-1})$. The connection to the Schalkwijk-Kailath coding scheme is now apparent. Recall that there is a simple linear relationship between the minimum mean-square error estimate (in other words, the minimum-variance biased estimate) of the Gaussian input $X_1$ and the maximum likelihood estimate (equivalently, the minimum-variance unbiased estimate) of an arbitrary real input $\theta$. Thus we can easily transform the above coding scheme, based on the asymptotic equipartition property [1], into a Schalkwijk-like linear coding scheme based on maximum likelihood nearest-neighbor decoding of $2^{nR}$ uniformly spaced points. More specifically, we send as $X_1$ one of the $2^{nR}$ signals

$\theta \in \Theta := \{-\sqrt{P},\ -\sqrt{P} + \Delta,\ -\sqrt{P} + 2\Delta,\ \ldots,\ \sqrt{P} - 2\Delta,\ \sqrt{P} - \Delta,\ \sqrt{P}\},$

where $\Delta = 2\sqrt{P} / (2^{nR} - 1)$. Subsequent transmissions follow (15). The receiver forms the maximum likelihood estimate $\hat{\theta}_n(Y_1, \ldots, Y_n)$ and finds the signal point in $\Theta$ nearest to $\hat{\theta}_n$. The analysis of the error probability for this coding scheme follows Schalkwijk [3] and Butman [5]. From (16) and the standard relationship between the minimum-variance unbiased and biased estimation errors, the maximum likelihood estimate $\hat{\theta}_n$ is, conditioned on $\theta$, Gaussian with mean $\theta$ and variance decaying exponentially with rate $\beta^2 = 2^{-2 C_{FB}}$.
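As a numerical check of the scheme (15) and the decay (16), with $\beta$ and $\sigma$ as reconstructed above (the signs are our reconstruction of the garbled original), the exact linear MMSE of $X_1$ can be computed by writing each $Y_k$ as a linear combination of $X_1$ and the innovations:

```python
# Sketch: exact var(X_1 | Y_1, ..., Y_n) under the filter scheme (15), with
# beta = -sgn(alpha) * x0 and sigma = sgn(alpha) * sqrt(P (1 - beta^2)), compared
# with the predicted decay P * x0^(2n) from (16). Assumes alpha != 0 (for
# alpha = 0 the scheme degenerates to a single channel use).
import numpy as np
from scipy.optimize import brentq

def mmse_decay(n, P, alpha):
    f = lambda x: P * x**2 - (1 - x**2) * (1 - abs(alpha) * x)**2
    x0 = brentq(f, 1e-12, 1.0)
    beta = -np.sign(alpha) * x0
    sigma = np.sign(alpha) * np.sqrt(P * (1 - beta**2))
    # coefficients over the orthonormal basis (X_1/sqrt(P), U_0, U_1, ..., U_n)
    dim = n + 2                       # index 0 -> X_1/sqrt(P); index j+1 -> U_j
    X1 = np.zeros(dim); X1[0] = np.sqrt(P)
    Xk = X1.copy()
    A = np.zeros((n, dim))            # row k-1 holds the coefficients of Y_k
    for k in range(1, n + 1):
        if k > 1:
            Xk = beta * Xk            # X_k = beta * X_{k-1} + sigma * U_{k-1}
            Xk[k] += sigma
        A[k - 1] = Xk
        A[k - 1, k] += alpha          # Z_k = alpha * U_{k-1} + U_k
        A[k - 1, k + 1] += 1.0
    c = A @ X1                        # cov(Y^n, X_1)
    mmse = P - c @ np.linalg.solve(A @ A.T, c)
    return mmse, P * x0**(2 * n)      # exact MMSE vs. the predicted decay in (16)

print(mmse_decay(8, 10.0, 0.5))       # should agree up to subexponential factors
```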
Thus, the nearest-neighbor decoding error, ignoring lower-order terms, is given by

$E_\theta\Big[\Pr\Big(|\hat{\theta}_n - \theta| \ge \tfrac{\Delta}{2} \,\Big|\, \theta\Big)\Big] \doteq \operatorname{erfc}\Big(\sqrt{\tfrac{3}{2\sigma_\theta^2}}\; 2^{n(C_{FB} - R)}\Big),$

where $\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} \exp(-t^2)\, dt$ and $\sigma_\theta^2$ is the variance of the input signal $\theta$ chosen uniformly over $\Theta$. As long as $R < C_{FB}$, the decoding error decays doubly exponentially in $n$. Note that this coding scheme uses only the second moments of the noise process; this implies that the rate $C_{FB}$ is achievable for the additive noise channel with any non-Gaussian noise process having the same covariance structure.

REFERENCES

[1] T. M. Cover and S. Pombra, "Gaussian feedback capacity," IEEE Trans. Inform. Theory, vol. IT-35, pp. 37-43, January 1989.
[2] J. P. M. Schalkwijk and T. Kailath, "A coding scheme for additive noise channels with feedback-I: No bandwidth constraint," IEEE Trans. Inform. Theory, vol. IT-12, pp. 172-182, April 1966.
[3] J. P. M. Schalkwijk, "A coding scheme for additive noise channels with feedback-II: Band-limited signals," IEEE Trans. Inform. Theory, vol. IT-12, pp. 183-189, April 1966.
[4] C. E. Shannon, "The zero error capacity of a noisy channel," IRE Trans. Inform. Theory, vol. IT-2, pp. 8-19, September 1956.
[5] S. A. Butman, "A general formulation of linear feedback communication systems with solutions," IEEE Trans. Inform. Theory, vol. IT-15, pp. 392-400, May 1969.
[6] J. Wolfowitz, "Signalling over a Gaussian channel with feedback and autoregressive noise," J. Appl. Probab., vol. 12, pp. 713-723, 1975.
[7] J. C. Tiernan, "Analysis of the optimum linear system for the autoregressive forward channel with noiseless feedback," IEEE Trans. Inform. Theory, vol. IT-22, pp. 359-363, May 1976.
[8] J. C. Tiernan and J. P. M. Schalkwijk, "An upper bound to the capacity of the band-limited Gaussian autoregressive channel with noiseless feedback," IEEE Trans. Inform. Theory, vol. IT-20, pp. 311-316, May 1974.
[9] S. A. Butman, "Linear feedback rate bounds for regressive channels," IEEE Trans. Inform. Theory, vol. IT-22, pp. 363-366, May 1976.
[10] L. H. Ozarow, "Random coding for additive Gaussian channels with feedback," IEEE Trans. Inform. Theory, vol. IT-36, pp. 17-22, January 1990.
[11] L. H. Ozarow, "Upper bounds on the capacity of Gaussian channels with feedback," IEEE Trans. Inform. Theory, vol. IT-36, pp. 156-161, January 1990.
[12] E. Ordentlich, "A class of optimal coding schemes for moving average additive Gaussian noise channels with feedback," in Proc. IEEE Int. Symp. Inform. Theory, p. 467, June 1994.
[13] S. Yang, A. Kavčić, and S. Tatikonda, "Linear Gaussian channels: feedback capacity under power constraints," in Proc. IEEE Int. Symp. Inform. Theory, p. 72, June 2004.
[14] Y.-H. Kim, "Feedback capacity of the first-order moving average Gaussian channel," submitted to IEEE Trans. Inform. Theory, November 2004. Preprint available at arXiv:cs.IT/0411036.
[15] S. Ihara, "On the capacity of the discrete time Gaussian channel with feedback," in Trans. Eighth Prague Conf. Inform. Theory, Czech. Acad. Sci., 1979.