Coding Along Hermite Polynomials for Gaussian Noise Channels

Emmanuel A. Abbe
IPG, EPFL, Lausanne, 1015 CH
Email: emmanuel.abbe@epfl.ch

Lizhong Zheng
LIDS, MIT, Cambridge, MA 02139
Email: lizhong@mit.edu

Abstract — This paper shows that the capacity achieving input distribution for a fading Gaussian broadcast channel is not Gaussian in general. The construction of non-Gaussian distributions that strictly outperform Gaussian ones, for certain characterized fading distributions, is provided. The ability to analyze non-Gaussian input distributions with closed form expressions is made possible in a local setting. It is shown that there exists a specific coordinate system, based on Hermite polynomials, which parametrizes Gaussian neighborhoods and which is particularly suitable to study the entropic operators encountered with Gaussian noise.

I. INTRODUCTION

Let a memoryless additive white Gaussian noise (AWGN) channel be described by Y = X + Z, where Z ~ N(0, v) is independent of X. If the input is subject to an average power constraint E X² ≤ P, the input distribution maximizing the mutual information is Gaussian. This is due to the fact that, under a second moment constraint, the Gaussian distribution maximizes the entropy, hence

  max_{X: E X² = P} h(X + Z)  ⟹  X ~ N(0, P).    (1)

On the other hand, for an additive noise channel, if we use a Gaussian input distribution, i.e., X ~ N(0, P), among noises with bounded second moment, the noise minimizing the mutual information is again Gaussian. This can be shown by using the entropy power inequality (EPI), which reduces in this setting to

  min_{Z: h(Z) = (1/2) log 2πev} h(X + Z)  ⟹  Z ~ N(0, v)    (2)

and implies

  min_{Z: E Z² = v} h(X + Z) − h(Z)  ⟹  Z ~ N(0, v).    (3)

Hence, in the single-user setting, Gaussian inputs are the best inputs to fight Gaussian noise, and Gaussian noise is the worst noise to face with Gaussian inputs. This provides a game equilibrium between user and nature, as defined in [3], p. 263. With these results, many problems in information theory dealing with Gaussian noise can be solved.
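The maximum-entropy fact behind (1) can be checked against closed-form entropies: among zero-mean densities with variance P, the Gaussian entropy (1/2) log(2πeP) exceeds, for instance, the uniform and Laplace entropies at the same variance. A minimal numerical sketch (illustrative only, not from the paper):

```python
import numpy as np

P = 1.0  # common variance

# Differential entropies (in nats) of three zero-mean densities with variance P:
h_gauss = 0.5 * np.log(2 * np.pi * np.e * P)      # N(0, P)
h_unif = np.log(2 * np.sqrt(3 * P))               # uniform on [-a, a] with a^2/3 = P
h_laplace = 1 + np.log(2 * np.sqrt(P / 2))        # Laplace with 2b^2 = P

# The Gaussian maximizes entropy under the second-moment constraint:
assert h_gauss > h_unif and h_gauss > h_laplace
```

The same ranking holds for any P > 0, since all three entropies shift by the common additive term (1/2) log P under scaling.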
However, in the network information theory setting, two interesting new phenomena make the search for the optimal codes more complex. First, the interference. With interference, a frustration phenomenon appears. Assuming Gaussian additive noise for each user, if the users treat interference as noise, the game equilibrium described previously now provides a conflicting situation: for inputs that are Gaussian distributed, we also have the worst kind of interference (Gaussian distributed). Should interference not be treated as noise, or should non-Gaussian ensembles be considered? These are still open questions. Another interesting difference between the single user and network settings concerns the fading. In a single user setting, for an AWGN channel with a coherent fading, (1) allows one to show that the Gaussian input distribution achieves the capacity (no matter what the fading distribution is), just like in the case with no fading. In a Gaussian broadcast channel (BC), (1) and (3) allow one to show that Gaussian inputs are again capacity achieving. But if we consider a fading Gaussian BC, with a coherent fading (preserving the degradedness property), there are no known results using (1) and (3) which allow one to prove that Gaussian inputs are capacity achieving. Either we are missing theorems to prove this assertion, or it is simply wrong. In this paper, we show that this assertion is wrong. Even for the following simple structure of coherent fading Gaussian BC, where

  Y_i = H X + Z_i,  i = 1, 2,

with Z_1 ~ N(0, v_1), Z_2 ~ N(0, v_2), v_1 < v_2, and H arbitrarily distributed but the same for both users, the input distributions achieving the capacity region boundaries are unknown. Since this is a degraded BC, the capacity region is given by all rate pairs (I(X; Y_1|U, H), I(U; Y_2|H)) with U → X → (Y_1, Y_2). The optimal input distributions, i.e., the distributions of (U, X) achieving the capacity region boundary, are given by the following optimization, for a given µ:

  max_{(U,X): U → X → (Y_1,Y_2), E X² ≤ P}  I(X; Y_1|U, H) + µ I(U; Y_2|H).    (4)

Note that the objective function in the above maximization is

  h(Y_1|U, H) − h(Z_1) + µ h(Y_2|H) − µ h(Y_2|U, H).

Now, each term in the above is individually maximized by a Gaussian distribution, but these terms are combined with different signs, so there is a competitive situation and the maximizer is not obvious. When µ ≤ 1, one can show that Gaussian distributions are optimal. Also, if H is compactly supported, and if v_2 is small enough so as to make the supports of
H and (1/v_2)H non-overlapping, the optimal input distribution is Gaussian (cf. [5]). However, in general the optimal distributions are unknown. Note that a similar competitive situation occurs for the interference channel. Let the inputs be X_1 and X_2, the interference coefficients a and b, and the noises Z_1 and Z_2 (independent standard Gaussian). The following expression,

  I(X_1; X_1 + aX_2 + Z_1) + I(X_2; X_2 + bX_1 + Z_2)
    = h(X_1 + aX_2 + Z_1) − h(aX_2 + Z_1) + h(X_2 + bX_1 + Z_2) − h(bX_1 + Z_2),

is a lower bound to the sum-capacity, which is tight by considering X_1 and X_2 of arbitrary block length n. Now, Gaussian distributions maximize each entropy term in the above; hence, since the terms appear with different signs, there is a competitive situation. Would we then prefer to take X_1 and X_2 Gaussian, very non-Gaussian or slightly non-Gaussian? Although these questions are not formally defined, the dilemma posed by them can still be understood.

II. PROBLEM STATEMENT

We will use the fading Gaussian BC problem as a motivation for our more general goal. For this specific problem, we want to know if/when the distribution of (U, X) maximizing (4) is Gaussian or not. Our more general goal is to better understand the problem posed by the competitive situations described in the Introduction. For this purpose, we formulate a mathematical problem in the next section.

III. LOCAL GEOMETRY AND HERMITE COORDINATES

Let g_P denote the Gaussian density with zero mean and variance P. We start by changing the notation and rewrite (1) and (3) as optimizations over the input distributions, i.e.,

  max_{f: m_2(f) = P} h(f ∗ g_v)  ⟹  f = g_P    (5)
  min_{f: m_2(f) = P} h(f ∗ g_v) − h(f)  ⟹  f = g_P    (6)

where the functions f are density functions on ℝ, i.e., positive integrable functions integrating to 1, having a well-defined entropy and second moment m_2(f) = ∫ x² f(x) dx. We now consider the local analysis. We define densities of the form

  f_ε(x) = g_P(x)(1 + ε L(x)),  x ∈ ℝ,    (7)

where L : ℝ → ℝ satisfies

  inf_{x∈ℝ} L(x) > −∞    (8)
  ∫ L(x) g_P(x) dx = 0.    (9)

Hence, with these two constraints, f_ε is a valid density for ε sufficiently small. It is a perturbed Gaussian density, in a direction L. Observe that

  m_2(f_ε) = P  iff  M_2(L) := ∫ x² L(x) g_P(x) dx = 0.    (10)

We are now interested in analyzing how these perturbations affect the output through an AWGN channel. Note that, if the input is a Gaussian g_P perturbed in the direction L, the output is a Gaussian g_{P+v} perturbed in the direction (g_P L ∗ g_v)/g_{P+v}, since

  f_ε ∗ g_v = g_{P+v} (1 + ε (g_P L ∗ g_v)/g_{P+v}).

Convention: g_P L ∗ g_v refers to (g_P L) ∗ g_v, i.e., the multiplicative operator precedes the convolution one. For simplicity, let us assume in the following that the function L is a polynomial satisfying (8), (9).

Lemma 1: We have

  D(f_ε ‖ g_P) = (1/2) ε² ‖L‖²_{g_P} + o(ε²)
  D(f_ε ∗ g_v ‖ g_P ∗ g_v) = (1/2) ε² ‖(g_P L ∗ g_v)/g_{P+v}‖²_{g_{P+v}} + o(ε²),

where ‖L‖²_{g_P} = ∫ L²(x) g_P(x) dx. Moreover, note that for any density f, if its first and second moments are m_1(f) = a and m_2(f) = P + a², we have

  h(f) = h(g_{a,P}) − D(f ‖ g_{a,P}),    (11)

where g_{a,P} denotes the Gaussian density with mean a and variance P. Hence, the extremal entropic results of (5) and (6) are locally expressed as

  min_{L: M_2(L)=0} ‖(g_P L ∗ g_v)/g_{P+v}‖²_{g_{P+v}}  ⟹  L = 0    (12)
  max_{L: M_2(L)=0} ‖(g_P L ∗ g_v)/g_{P+v}‖²_{g_{P+v}} − ‖L‖²_{g_P}  ⟹  L = 0,    (13)

where 0 denotes here the zero function. While (12) is obvious, (13) requires a check, which will be done in Section V. Now, if we want to make headway on the competitive situations presented in the Introduction, we need more refined results than the ones above. Let us define the following mapping,

  Γ^(+) : L ∈ L²(g_P) ↦ (g_P L ∗ g_v)/g_{P+v} ∈ L²(g_{P+v}),    (14)

where L²(g_P) denotes the space of real functions having a finite g_P norm. This linear mapping gives, for a given perturbed direction L of a Gaussian input g_P, the resulting perturbed direction of the output through additive Gaussian noise g_v. The norm of each direction in its respective space, i.e., L²(g_P) and L²(g_{P+v}), gives how far from the Gaussian distribution these perturbations are. Note that if L satisfies (8)-(9), so does Γ^(+) L for the measure g_{P+v}. The result in (13) (worst noise case) tells us that this mapping is a contraction, but for our goal, what would be helpful is a spectral analysis of this operator, to allow more quantitative results than the extreme-case results of (12) and (13).
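The leading-order expansion in Lemma 1 can be sanity-checked numerically: with f_ε = g_P(1 + εL), the divergence D(f_ε ‖ g_P) = ∫ f_ε log(f_ε/g_P) should be approximately (ε²/2)‖L‖²_{g_P}. A sketch with L a centered, normalized cubic (proportional to the Hermite direction H_3 used later); the grid and parameter choices are mine:

```python
import numpy as np

P, eps = 1.0, 1e-3
x = np.linspace(-8.0, 8.0, 20001)   # truncated grid; Gaussian tail mass beyond is negligible
dx = x[1] - x[0]

g_P = np.exp(-x**2 / (2 * P)) / np.sqrt(2 * np.pi * P)
L = (x**3 - 3 * x) / np.sqrt(6)     # normalized He_3 direction: ∫ L g_P dx = 0, ‖L‖²_{g_P} = 1

f_eps = g_P * (1 + eps * L)         # perturbed density, positive on this grid for small eps
assert np.all(1 + eps * L > 0)

# KL divergence D(f_eps || g_P), computed by Riemann sum
D = np.sum(f_eps * np.log1p(eps * L)) * dx
predicted = 0.5 * eps**2 * np.sum(L**2 * g_P) * dx   # (eps²/2) ‖L‖²_{g_P}

assert abs(D / predicted - 1) < 0.05
```

The relative error shrinks further as ε decreases, consistent with the o(ε²) remainder in the lemma.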
In order to do so, one can express Γ^(+) as an operator defined and valued in the same space, namely L² with the Lebesgue measure λ, which is done by inserting the Gaussian measure in the operator argument. We then proceed to a singular function/value analysis. Formally, let K = L √g_P, which gives ‖K‖_λ = ‖L‖_{g_P}, and let

  Λ : K ∈ L²(λ; ℝ) ↦ ((√g_P K) ∗ g_v)/√g_{P+v} ∈ L²(λ; ℝ),    (15)

which gives ‖Γ^(+) L‖_{g_{P+v}} = ‖ΛK‖_λ. We want to find the singular functions of Λ, i.e., denoting by Λ^t the adjoint operator of Λ, we want to find the eigenfunctions K of Λ^t Λ.

IV. RESULTS

A. General Result

The following proposition gives the singular functions and values of the operator Λ defined in the previous section.

Proposition 1: Λ^t Λ K = γ K, K ≠ 0, holds for each pair

  (K, γ) ∈ { ( √g_P H_k^{[P]}, (P/(P+v))^k ) }_{k ≥ 0},

where

  H_k^{[P]}(x) = (1/√(k!)) H_k(x/√P)  and  H_k(x) = (−1)^k e^{x²/2} (d^k/dx^k) e^{−x²/2},  k ≥ 0, x ∈ ℝ.

The polynomials H_k^{[P]} are the normalized Hermite polynomials for a Gaussian distribution having variance P, and the √g_P H_k^{[P]} are called the Hermite functions. For any P > 0, {H_k^{[P]}}_{k≥0} is an orthonormal basis of L²(g_P); this can be found in [4]. One can check that H_1, respectively H_2, perturbs a Gaussian distribution into another Gaussian distribution, with a different first moment, respectively second moment. For k ≥ 3, the H_k perturbations do not modify the first two moments and move away from Gaussian distributions. Since H_0^{[P]} = 1, the orthogonality property implies that H_k^{[P]} satisfies (9) for any k > 0. However, it is formally only for even values of k that (8) is verified (although we will see in Section V that essentially any k can be considered in our problems). The following result contains the property of Hermite polynomials mostly used in our problems, and expresses Proposition 1 with the Gaussian measures.

Proposition 2:

  Γ^(+) H_k^{[P]} = (g_P H_k^{[P]} ∗ g_v)/g_{P+v} = (P/(P+v))^{k/2} H_k^{[P+v]},    (16)
  Γ^(−) H_k^{[P+v]} = H_k^{[P+v]} ∗ g_v = (P/(P+v))^{k/2} H_k^{[P]}.    (17)

The last proposition implies Proposition 1, since Γ^(−) Γ^(+) L = γL ⟺ Λ^t Λ K = γK for K = L √g_P.

Comment: these properties of Hermite polynomials and Gaussian measures are likely to be already known, in a different context or with different notations. However, what is particularly interesting here is not only these properties by themselves, but mainly the fact that they emerge precisely from our information theoretic setting and are helpful to solve our problems.
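The eigenrelation (16) can be verified numerically by discretizing the convolution. A sketch (grid resolution, the test indices k, and the helper names are mine), using the probabilists' Hermite recurrence He_{n+1}(x) = x He_n(x) − n He_{n−1}(x):

```python
import numpy as np
from math import factorial

def He(k, x):
    """Probabilists' Hermite polynomial He_k via He_{n+1} = x He_n - n He_{n-1}."""
    h_prev, h = np.ones_like(x), x.copy()
    if k == 0:
        return h_prev
    for n in range(1, k):
        h_prev, h = h, x * h - n * h_prev
    return h

def H_norm(k, var, x):
    """Normalized Hermite polynomial H_k^{[var]}(x) = He_k(x/sqrt(var)) / sqrt(k!)."""
    return He(k, x / np.sqrt(var)) / np.sqrt(factorial(k))

def gauss(var, x):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

P, v, k = 1.0, 0.5, 3
x = np.linspace(-20.0, 20.0, 40001)  # symmetric grid, odd number of points
dx = x[1] - x[0]

# Left side of (16): (g_P H_k^{[P]}) * g_v, by discrete convolution
lhs = np.convolve(gauss(P, x) * H_norm(k, P, x), gauss(v, x), mode="same") * dx
# Right side of (16): (P/(P+v))^{k/2} g_{P+v} H_k^{[P+v]}
rhs = (P / (P + v)) ** (k / 2) * gauss(P + v, x) * H_norm(k, P + v, x)

assert np.max(np.abs(lhs - rhs)) < 1e-6
```

The odd number of grid points keeps the `mode="same"` output of `np.convolve` aligned with the grid x; the same check passes for other k and for other values of P and v.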
In words, we just saw that H_k is an eigenfunction of the input/output perturbation operator Γ^(+), in the sense that Γ^(+) H_k^{[P]} = (P/(P+v))^{k/2} H_k^{[P+v]}. Hence, over an additive Gaussian noise channel g_v, if we perturb the input g_P in the direction H_k^{[P]} by an amount ε, we perturb the output in the direction H_k^{[P+v]} by an amount ε (P/(P+v))^{k/2}. Such a perturbation in H_k implies that the output entropy is reduced (compared to not perturbing) by (ε²/2)(P/(P+v))^k (if k ≥ 3).

B. Fading Gaussian Broadcast Result

Proposition 3: Let Y_i = H X + Z_i, i = 1, 2, with X such that E X² ≤ P, Z_1 ~ N(0, v), 0 < v < 1 and Z_2 ~ N(0, 1). There exist fading distributions and values of v for which the capacity achieving input distribution is non-Gaussian. More precisely, let U be any auxiliary random variable, with U → X → (Y_1, Y_2). Then, there exist P, v, a distribution of H and µ such that

  (U, X) ↦ I(X; Y_1|U, H) + µ I(U; Y_2|H)    (18)

is maximized by a non jointly Gaussian distribution.

In the proof, we find a counter-example to Gaussian being optimal for H binary (and of course other counter-examples can be found). In order to defeat Gaussian codes, we construct codes using the Hermite coordinates. The proof also gives conditions on the fading distribution and the noise variance v for which these codes are strictly improving on Gaussian ones.

V. PROOFS

We start by reviewing the proof of (13), as it brings interesting facts. We then prove the main result.

Proof of (13): We assume first that we insist on constraining f_ε to have zero mean and variance exactly P. Using the Hermite basis, we express L as L = Σ_{k≥3} α_k H_k^{[P]} (L must have such an expansion, since it must have a finite L²(g_P) norm to make sense of the original expressions). Using (16), we can then express (13) as

  Σ_{k≥3} α_k² (P/(P+v))^k − Σ_{k≥3} α_k²,    (19)

which is clearly negative. Hence, we have proved that

  ‖(g_P L ∗ g_v)/g_{P+v}‖²_{g_{P+v}} ≤ ‖L‖²_{g_P}    (20)

and (13) is maximized by taking L = 0. Note that we can get tighter bounds than the previous inequality; indeed the tightest, holding for L spanned by the H_k, k ≥ 3, is given by

  ‖(g_P L ∗ g_v)/g_{P+v}‖²_{g_{P+v}} ≤ (P/(P+v))³ ‖L‖²_{g_P}    (21)

(this clearly holds if written as a series like in (19)).
Hence, locally the contraction property can be tightened, and locally, we have stronger EPIs, or worst noise results. Namely, if ν ≥ (P/(P+v))³, we have

  min_{f: m_1(f)=0, m_2(f)=P} h(f ∗ g_v) − ν h(f)  ⟹  f = g_P    (22)

and if ν < (P/(P+v))³, g_P is outperformed by non-Gaussian distributions. Now, if we consider the constraint m_2(f) ≤ P, which in particular now allows m_1(f) > 0 and m_2(f) = P, we get that if ν ≥ (P/(P+v))²,

  min_{f: m_2(f) ≤ P} h(f ∗ g_v) − ν h(f)  ⟹  f = g_P    (23)

and if ν < (P/(P+v))², g_P is outperformed by g_{P−δ} for some δ > 0. It would then be interesting to study if these tighter results hold in greater generality than in the local setting.

Proof of Proposition 3: We refer to (18) as the mu-rate. Let us first consider Gaussian codes, i.e., when (U, X) is jointly Gaussian, and see what mu-rate they can achieve. Without loss of generality, we can assume that X = U + V, with U and V independent and Gaussian, with respective variances Q and P̄ satisfying P = Q + P̄. Then, (18) becomes

  (1/2) E[log(1 + H²P̄/v)] + (µ/2) E[log((1 + H²P)/(1 + H²P̄))].    (24)

Now, we pick a µ and look for the optimal power P̄ that should be allocated to V in order to maximize the above expression. We are interested in cases for which the optimum is not at the boundary but at an extremum of (24), and if the maximum is unique, the optimal P̄ is found by the first derivative check, which gives E[H²/(v + P̄H²)] = µ E[H²/(1 + P̄H²)]. Since we will look for µ, v, P̄ with P̄ > 0, the previous condition can be written as

  E[H²/(v + P̄H²)] = µ E[H²/(1 + P̄H²)].    (25)

We now check if we can improve on (24) by moving away from the optimal jointly Gaussian (U, X). There are several ways to perturb (U, X); we consider here a first example. We keep U and V independent, but perturb them away from Gaussians in the following way:

  ρ_{U_ε}(u) = g_Q(u) (1 + ε H_3^{[Q]}(u) + δ H_4^{[Q]}(u))    (26)
  ρ_{V_ε}(v) = g_{P̄}(v) (1 − ε H_3^{[P̄]}(v) − δ H_4^{[P̄]}(v))    (27)

with ε, δ > 0 small enough. Note that these are valid density functions and that they preserve the first two moments of U and V. The reason why we add δ H_4 is to ensure that the positivity condition (8) is satisfied, but we will see that for our purpose this can essentially be neglected. Then, using Lemma 2, the new distribution of X is given by

  ρ_X(x) = g_P(x) ( 1 + ε ((Q/P)^{3/2} − (P̄/P)^{3/2}) H_3^{[P]}(x) + δ ((Q/P)² − (P̄/P)²) H_4^{[P]}(x) ) + f(δ),

where f(δ) gathers the higher order product terms and tends to zero when δ tends to zero.
Now, by picking Q = P̄, we have

  ρ_X(x) = g_P(x) + f(δ).    (28)

Hence, by taking δ arbitrarily small, the distribution of X is arbitrarily close to the Gaussian distribution with variance P. We now want to evaluate how these Hermite perturbations perform, given that we want to maximize (18), i.e.,

  h(Y_1|U, H) − h(Z_1) + µ h(Y_2|H) − µ h(Y_2|U, H).    (29)

We wonder if, by moving away from Gaussians, the gain achieved through the term µ h(Y_2|U, H) is higher than the loss suffered from the other terms. With the Hermite structure described in the previous section, we are able to precisely measure this, and we get

  h(Y_1|U = u, H = h) = h( g_{hu, v+h²P̄} (1 − ε (h²P̄/(v + h²P̄))^{3/2} H_3^{[hu, v+h²P̄]}) ) + o(δ)
    = (1/2) log 2πe(v + h²P̄) − (ε²/2) (h²P̄/(v + h²P̄))³ + o(ε²) + o(δ),

  h(Y_2|U = u, H = h) = (1/2) log 2πe(1 + h²P̄) − (ε²/2) (h²P̄/(1 + h²P̄))³ + o(ε²) + o(δ)

and, because of (28),

  h(Y_2|H = h) = (1/2) log 2πe(1 + h²P) + o(ε²) + o(δ).

Therefore, collecting all terms, we find that for U_ε and V_ε defined in (26) and (27), expression (29) reduces to

  I_G − (ε²/2) E[(H²P̄/(v + H²P̄))³] + µ (ε²/2) E[(H²P̄/(1 + H²P̄))³] + o(ε²) + o(δ),    (30)

where I_G is equal to (24) (which is the mu-rate obtained with Gaussian inputs). Hence, if for some distribution of H and some v, we have

  µ E[(H²P̄/(1 + H²P̄))^k] − E[(H²P̄/(v + H²P̄))^k] > 0    (31)

when k = 3 and P̄ is optimal for µ, we can take ε and δ small enough in order to make (30) strictly larger than I_G. We have shown how, if verified, inequality (31) leads to counter-examples of the Gaussian optimality; with similar expansions, we would also get counter-examples if the inequality holds for any power k instead of 3, as long as k ≥ 3. Let us summarize what we obtained. Let P̄ be optimal for µ, which means that (25) holds (if there is only one maximum, not at the border). Then, non-Gaussian codes along Hermites strictly outperform Gaussian codes if, for some k ≥ 3, (31) holds. If the maximum is unique, this becomes

  E[T^k(v)]/E[T(v)] < E[T^k(1)]/E[T(1)],

where

  T(v) = H²P̄/(v + H²P̄).

So we want the Jensen gap of T(v) for the power k to be small enough compared to the Jensen gap of T(1).
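The summarized condition can be checked numerically. A sketch for the binary fading example of this section (H ∈ {1, 10} equiprobable, v = 1/4, µ = 5/4, P̄ = 0.6043154): for these parameters the left side of (31) is negative at k = 3 but positive at k = 8, matching the choice k = 8 in the example.

```python
import numpy as np

# Binary fading example of this section: H in {1, 10} w.p. 1/2 each,
# v = 1/4, mu = 5/4, and the (approximately) optimal power split Pbar.
mu, v, Pbar = 5 / 4, 1 / 4, 0.6043154
h2 = np.array([1.0, 100.0])          # values of H^2
p = np.array([0.5, 0.5])             # probabilities

def T(noise_var):
    # T = Pbar H^2 / (noise_var + Pbar H^2), as a vector over the support of H
    return Pbar * h2 / (noise_var + Pbar * h2)

# Stationarity condition (25): E[H^2/(v+Pbar H^2)] = mu E[H^2/(1+Pbar H^2)]
# (holds up to the precision of the quoted Pbar)
assert np.isclose(p @ (h2 / (v + Pbar * h2)), mu * (p @ (h2 / (1 + Pbar * h2))), rtol=1e-2)

def lhs31(k):
    # Left-hand side of (31): mu E[T(1)^k] - E[T(v)^k]
    return mu * (p @ T(1.0) ** k) - p @ T(v) ** k

assert lhs31(3) < 0   # Gaussian codes are not beaten along H_3 here...
assert lhs31(8) > 0   # ...but they are strictly beaten along H_8
```

This also illustrates the Jensen-gap reformulation: raising T to a high power k penalizes the more dispersed variable T(1) less than T(v) for this fading law.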
We now give an example of fading distribution for which the above conditions can be verified. Let H be binary, taking values 1 and 10 with probability one half, and let v = 1/4. Let µ = 5/4; then for any value of P, the maximizer of (24) is at P̄ = 0.6043154, cf. Figure 1, which corresponds in this case to the unique value of P̄ for which (25) is satisfied. Hence, if P is larger than this value of P̄, there is a corresponding fading BC for which the best Gaussian code splits the power on U and V with P̄ = 0.6043154 to achieve the best mu-rate with µ = 5/4. To fit the counter-examples with the choice of Hermite perturbations made previously, we pick Q = P̄. Finally, for these values of µ and P̄, (31) can be verified for k = 8, cf. Figure 2, and the corresponding Hermite code (along H_8) strictly outperforms any Gaussian code.

Fig. 1. Gaussian mu-rate, i.e., expression (24), plotted as a function of P̄ for µ = 5/4, v = 1/4, P = 1.4086308 and H binary on {1; 10}. Maximum at P̄ = 0.6043154.

Fig. 2. LHS of (31) as a function of P̄, for µ = 5/4, v = 1/4, k = 8 and H binary on {1; 10}; positive at P̄ = 0.6043154.

Note that we can consider other non-Gaussian encoders, such as when U and V are independent with U Gaussian and V non-Gaussian along Hermites. In this case, we get a different condition than (31), which is stronger in general for fixed values of the parameters, but which can still be verified, making V non-Gaussian strictly better than Gaussian.

VI. DISCUSSION

We have introduced the use of encoders drawn from non-Gaussian distributions along Hermite polynomials. While the performance of non-Gaussian inputs is usually hard to analyze, we showed how the neighborhoods of Gaussian inputs can be analyzed in closed form by use of the Hermite coordinates. This allowed us to use nuanced versions of the usual extremal entropic results and, in particular, to show that Gaussian inputs are in general not optimal for degraded fading Gaussian BCs, although they might still be optimal for many fading distributions.
The Hermite technique provides not only potential counter-examples to the optimality of Gaussian inputs, but it also gives insight on problems for which a competitive situation does not imply the obvious optimality of Gaussian inputs. For example, in the considered problem, the Hermite technique gives a condition identifying for which kind of fading distribution and degradedness (values of v) non-Gaussian inputs must be used. It also tells us how, locally, the optimal encoder is defined. In this paper, we considered fading BCs, but many other competitive situations can be tackled with this tool, particularly since a multi-letter generalization of the current technique can be carried out (to appear). Finally, in a different context, local results could be lifted to corresponding global results in [1]. There, the localization is made with respect to the channels and not the input distributions; yet, it would be interesting to compare the local with the global behavior for the current problem too. A work in progress aims to replace the local neighborhood with a global one, consisting of all sub-Gaussian distributions.

ACKNOWLEDGMENT

The authors would like to thank Tie Liu and Shlomo Shamai for pointing out problems relevant to the application of the proposed technique, as well as Daniel Stroock for discussions on Hermite polynomials. For the first author, this work has partially been done under a postdoctoral fellowship at LIDS, MIT.

REFERENCES

[1] E. Abbe and L. Zheng, "Linear universal decoding: a local to global geometric approach," submitted to IEEE Trans. Inform. Theory, 2008.
[2] T. M. Cover, "Comments on broadcast channels," IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2524-2530, Oct. 1998.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, New York, NY, 1991.
[4] D. W. Stroock, A Concise Introduction to the Theory of Integration, Third Edition, Birkhäuser, 1999.
[5] D. Tuninetti and S. Shamai, "On two-user fading Gaussian broadcast channels with perfect channel state information at the receivers," Int. Symp. on Inform. Theory (ISIT), 2003.