On Conditions for Linearity of Optimal Estimation

Emrah Akyol, Kumar Viswanatha and Kenneth Rose
{eakyol, kumar, rose}@ece.ucsb.edu
Department of Electrical and Computer Engineering
University of California at Santa Barbara, CA 93106

Abstract—When is optimal estimation linear? It is well known that, in the case of a Gaussian source contaminated with Gaussian noise, a linear estimator minimizes the mean square estimation error. This paper analyzes, more generally, the conditions for linearity of optimal estimators. Given a noise (or source) distribution and a specified signal-to-noise ratio (SNR), we derive conditions for the existence and uniqueness of a source (or noise) distribution that renders the L_p norm optimal estimator linear. We then show that, if the noise and source variances are equal, the matching source is distributed identically to the noise. Moreover, we prove that the Gaussian source-channel pair is unique in that it is the only source-channel pair for which the MSE optimal estimator is linear at more than one SNR value.

Index Terms—Optimal estimation, linear estimation, source-channel matching

I. INTRODUCTION

Consider the basic problem in estimation theory, namely, source estimation from a signal received through a channel with additive noise, given the statistics of both the source and the channel. The optimal estimator that minimizes the mean square estimation error is usually a nonlinear function of the observation [1]. A frequently exploited result in estimation theory concerns the special case of a Gaussian source and Gaussian channel noise, a case in which the optimal estimator is guaranteed to be linear. An open follow-up question considers the existence of other cases exhibiting such a coincidence and, more generally, the characterization of conditions for linearity of optimal estimators under general distortion measures. This problem also has practical importance beyond its theoretical interest, mainly due to significant complexity issues in both the design and the operation of estimators.
Specifically, the optimal estimator generally involves the entire probability distributions, whereas linear estimators require only up to second-order statistics for their design. Moreover, unlike the optimal estimator, which can be an arbitrarily complex function that is difficult to implement, the resulting linear estimator consists of a simple matrix-vector operation. Hence, linear estimators are more prevalent in practice, despite their generally suboptimal performance. They also represent a significant temptation to assume that processes are Gaussian, sometimes despite overwhelming evidence to the contrary. The results in this paper identify the cases where a linear estimator is optimal and, hence, justify the use of linear estimators in practice without recourse to complexity arguments.

The estimation problem in general has been studied intensively in the literature. It is known that, for stable distributions (which of course include the Gaussian case), the optimal estimator is linear [2], [3], [4], [5] at any signal-to-noise ratio (SNR). Stable distributions are a subset of the infinitely divisible distributions which, as we show in this paper, satisfy the proposed necessary condition for a matching distribution to exist at any SNR level. Our main contribution relative to the prior works (which studied linearity at all SNR levels) focuses on the linearity of optimal estimation under the L_p norm and its dependence on the SNR level. We present the optimality conditions for linear estimators given a specified SNR and the L_p norm. As a special case, we investigate the p = 2 case (mean square error) in detail. Note that a similar problem was studied in [5], [6] for p = 2, albeit without analysis of the existence of distributions satisfying the necessary condition. We show that the necessary condition of [5], [6] is indeed a special case of our necessary and sufficient conditions, and present a detailed analysis of the MSE case.

(This work is supported by the NSF under grant CCF-078986.)
Four results are provided on the optimality of linear estimation. First, we show that if the noise (alternatively, source) distribution satisfies certain conditions, there always exists a unique source (alternatively, noise) distribution of a given power under which the optimal estimator is linear. We further identify conditions under which such a matching distribution does not exist. Second, we show that if the source and the noise have the same variance, they must be identically distributed to ensure linearity of the optimal estimator. As a third result, we show that the MSE optimal estimator converges to a linear estimator for any source and Gaussian noise at asymptotically low SNR, and, vice versa, for any noise and Gaussian source at asymptotically high SNR. Having established more general conditions for linearity of optimal estimation, one wonders in what precise sense the Gaussian case may be special. This question is answered by the fourth result. We consider the optimality of linear estimation at multiple SNR values. Let the random variables X and N be the source and the channel noise, respectively, and allow scaling of either to produce varying levels of SNR. We show that if the optimal estimator is linear at more than one SNR value, then both the source X and the noise N must be Gaussian.¹ In other words, the Gaussian source-noise pair is unique in that it offers linearity of optimal estimators at multiple SNR values.

The paper is organized as follows: we present the problem formulation in Section II, the main result in Section III, the

¹ Of course, in this case optimal estimators are linear at all SNR levels.
specific result for MSE in Section IV, the corollaries in Section V, and comments on the vector case in Section VI, and provide conclusions in Section VII.

[Fig. 1. The general setup of the problem: the source X and the noise N yield the observation Y = X + N, from which the estimator h(Y) produces the reconstruction X̂.]

II. PROBLEM FORMULATION

A. Preliminaries and notation

We consider the problem of estimating the source X given the observation Y = X + N, where X and N are independent, as shown in Figure 1. Without loss of generality, we assume that X and N are scalar, zero-mean random variables with distributions f_X(·) and f_N(·). Their respective characteristic functions are denoted F_X(ω) and F_N(ω). A distribution f(x) is said to be symmetric if it is an even function: f(x) = f(−x) ∀x ∈ ℝ.² The SNR is γ = σ_x²/σ_n². All distributions are constrained to have finite variance, i.e., σ_x² < ∞, σ_n² < ∞. All logarithms in the paper are natural logarithms and can in general be complex-valued. The optimal estimator h(·) is the function of the observation that minimizes the cost functional

J(h(·)) = E{Φ(X − h(Y))}    (1)

for the distortion measure Φ.

² Note that this definition can be generalized to symmetry about any point when one drops the assumption of zero-mean distributions.

B. Optimality condition for the L_p norm

Rewriting (1) more explicitly,

J(h(·)) = ∫∫ Φ(x − h(y)) f_X(x) f_{Y|X}(y|x) dx dy    (2)

To obtain the necessary conditions for optimality, we apply the standard method of variational calculus [7]:

(∂/∂ε) J[h(y) + εη(y)] |_{ε=0} = 0    (3)

for all admissible variation functions η(y). If Φ is differentiable, (3) yields

∫∫ Φ′(x − h(y)) η(y) f_X(x) f_{Y|X}(y|x) dx dy = 0    (4)

or

E{Φ′(X − h(Y)) η(Y)} = 0    (5)

where Φ′ is the derivative of Φ. This necessary condition is also sufficient for all convex Φ (d²Φ/dx² > 0), in which case (∂²/∂ε²) J[h(y) + εη(y)] |_{ε=0} > 0 for any variation function η(y). Hereafter, we will specialize our results to the case of the L_p norm, i.e., Φ(x) = |x|^p, which is convex ∀x ∈ ℝ∖{0}, ensuring the sufficiency of (5). Note that for odd p, (d/dx)|x|^p = p x^p/|x| ∀x ∈ ℝ∖{0}.
Hence, for odd p,

E{ ([X − h(Y)]^p / |X − h(Y)|) η(Y) } = 0    (6)

whereas for even p,

E{ [X − h(Y)]^{p−1} η(Y) } = 0    (7)

Note that when Φ(x) = x², this condition reduces to the well-known orthogonality condition of MSE, i.e.,

E{ [X − h(Y)] η(Y) } = 0    (8)

for any function η(·). Note that when p = 2, the optimal estimator h(Y) = E{X|Y} can be obtained from (7):

∫ { ∫ [x − h(y)] f_X(x) f_{Y|X}(y|x) dx } η(y) dy = 0    (9)

For (9) to hold for any η, the term in braces must be zero, yielding h(Y) = E{X|Y} using Bayes' rule. Note that, for p = 1, this expression boils down to h(Y) being the conditional median, which is known as the centroid condition for the L_1 norm (see, e.g., [8]).

C. Optimal linear estimation for the L_p norm

The linear estimator that minimizes the L_p norm is derived using linear variation functions. Plugging η(y) = ay (for some a ∈ ℝ) into (7) and omitting some straightforward steps, we obtain the optimality condition (for even p) as

E{ (X − kY)^{p−1} Y } = 0    (10)

The optimal scaling coefficient k can be found by plugging Y = X + N into (10). Observe that for p = 2, we get the well-known result k = γ/(γ + 1).

D. Gaussian source and channel case

We next consider the special case in which both X and N are Gaussian, X ∼ N(0, σ_x²) and N ∼ N(0, σ_n²). Plugging the distributions into h(Y) = E{X|Y}, we obtain the well-known result

h(Y) = (γ/(γ + 1)) Y    (11)

In this case, the optimal estimator is linear at all SNR (γ) levels. Also note that it renders the estimation error X − h(Y) independent of Y. It is straightforward to show that this linear estimator satisfies (6) and (7) and is hence optimal for the L_p norm. This is not a new result; it is known that the optimal estimator is linear for the L_p norm if both source and noise are Gaussian, see also [9].

E. Problem statement

We attempt to answer the following question: are there other source-channel distribution pairs for which the optimal estimator turns out to be linear? More precisely, we wish to find the entire set of source and channel distributions such that h(Y) = kY is the optimal estimator for some k.
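Before proceeding, the claims of Sections II-C and II-D can be checked with a small Monte Carlo sketch (our illustration, not part of the paper; the variances σ_x = 2, σ_n = 1 and the grid of candidate slopes are arbitrary choices). For a Gaussian pair with γ = 4, the empirical cost of the linear estimator h(y) = ky should bottom out near k = γ/(γ+1) = 0.8 for the L_2 norm and, per Section II-D, for the L_4 norm as well:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sx, sn = 2.0, 1.0                  # sigma_x, sigma_n  ->  gamma = 4
gamma = (sx / sn) ** 2
k_star = gamma / (gamma + 1.0)     # predicted optimal slope, eq. (11)

X = rng.normal(0.0, sx, n)
Y = X + rng.normal(0.0, sn, n)

# Scan linear estimators h(y) = k*y and locate the empirical cost minimum
# of E|X - kY|^p for p = 2 and p = 4.
ks = np.linspace(0.6, 1.0, 201)
best = {p: float(ks[int(np.argmin([np.mean(np.abs(X - k * Y) ** p)
                                   for k in ks]))])
        for p in (2, 4)}
print("empirical minimizers:", best, " predicted:", k_star)
```

Both minimizers land on (essentially) the same slope, illustrating that for the Gaussian pair one linear estimator is simultaneously L_2- and L_4-optimal.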
III. MAIN RESULT FOR THE L_p NORM

In this section we derive the necessary and sufficient conditions for linearity of the optimal estimator in terms of the characteristic functions of the source and the noise.

Theorem 1: For a given L_p distortion measure (p even), a given noise N with characteristic function F_N(ω), and a source X with characteristic function F_X(ω), the optimal estimator h(Y) is linear, h(Y) = kY, if and only if the following differential equation is satisfied:

Σ_{m=0}^{p−1} C(p−1, m) ((k−1)/k)^m F_X^{(m)}(ω) F_N^{(p−1−m)}(ω) = 0    (12)

where C(·, ·) denotes the binomial coefficient and F^{(m)} the m-th derivative.

Proof: Plugging f_{Y|X}(y|x) = f_N(y − x) into (7), we obtain

∫ (x − ky)^{p−1} f_X(x) f_N(y − x) dx = 0, ∀y    (13)

Using the binomial expansion, we get

Σ_{m=0}^{p−1} C(p−1, m) (−ky)^m ∫ x^{p−1−m} f_X(x) f_N(y − x) dx = 0    (14)

Let ∗ denote the convolution operator and rewrite (14) as

Σ_{m=0}^{p−1} C(p−1, m) (−ky)^m [ (y^{p−1−m} f_X(y)) ∗ f_N(y) ] = 0    (15)

Taking the Fourier transform (assuming the Fourier transform exists), we obtain

Σ_{m=0}^{p−1} C(p−1, m) (−k)^m (d^m/dω^m) [ (d^{p−1−m} F_X(ω)/dω^{p−1−m}) F_N(ω) ] = 0    (16)

After some straightforward algebra, we obtain (12). The converse part of the theorem follows from the sufficiency of the necessary conditions (6), (7), due to the convexity of the L_p norm. Note that a similar condition can be obtained for odd p with the noise F_N(ω) replaced by its Hilbert transform; the details are left out due to space constraints.

IV. SPECIALIZING TO MSE

In this section, we specialize the conditions to mean square error, p = 2. More precisely, we wish to find the entire set of source and channel distributions such that h(Y) = (γ/(γ+1))Y is the optimal estimator for a given γ. Note that this condition was derived, in another context, in [5], [6], albeit without consideration of the important implications we focus on, including the conditions for the existence of a matching noise for a given source (and vice versa), or applications of such matching conditions.
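As a numerical sanity check of Theorem 1 (our illustration, not part of the paper), one can evaluate the left-hand side of (12) for a Gaussian pair with p = 4 and k = γ/(γ+1), using the N(0, s) characteristic function F(ω) = exp(−sω²/2) and its first three derivatives in closed form; the residual vanishes to machine precision:

```python
import numpy as np
from math import comb

def gauss_cf_derivs(w, s):
    """Characteristic function F(w) = exp(-s*w^2/2) of N(0, s) and its
    first three derivatives, returned as [F, F', F'', F''']."""
    F = np.exp(-s * w**2 / 2)
    return [F, -s*w*F, (s**2 * w**2 - s)*F, (3*s**2*w - s**3 * w**3)*F]

sx2, sn2 = 4.0, 1.0                 # sigma_x^2, sigma_n^2  ->  gamma = 4
gamma = sx2 / sn2
k = gamma / (gamma + 1.0)
c = (k - 1.0) / k                   # the factor (k-1)/k, equal to -1/gamma

p = 4
w = np.linspace(-3.0, 3.0, 121)
FX = gauss_cf_derivs(w, sx2)
FN = gauss_cf_derivs(w, sn2)
# Left-hand side of eq. (12): sum over m of C(p-1,m) ((k-1)/k)^m FX^(m) FN^(p-1-m).
lhs = sum(comb(p - 1, m) * c**m * FX[m] * FN[p - 1 - m] for m in range(p))
resid = float(np.abs(lhs).max())
print("max |eq. (12) residual| for the Gaussian pair, p = 4:", resid)
```

The same check with a non-Gaussian F_X (or a mismatched k) leaves a nonzero residual, which is the content of the "only if" direction.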
We identify the conditions for existence (and uniqueness) of a source distribution that matches the noise in a way that makes the optimal estimator coincide with a linear one. We state the main result for MSE in the following theorem.

Theorem 2: For a given SNR level γ, and given noise N with density f_N(n) and characteristic function F_N(ω), there exists a source X for which the optimal estimator is linear if and only if the function F(ω) = [F_N(ω)]^γ is a legitimate characteristic function. Moreover, if F(ω) is legitimate, then it is the characteristic function of the matching source, i.e., F_X(ω) = F(ω). An equivalent theorem holds where we exchange the roles of noise and source, i.e., given the source and an SNR level, we have a condition for the existence of a matching noise.

Proof: Plugging p = 2 into (12) yields

(1/γ) (1/F_X(ω)) (dF_X(ω)/dω) = (1/F_N(ω)) (dF_N(ω)/dω)    (17)

or, more compactly,

(1/γ) (d/dω) log F_X(ω) = (d/dω) log F_N(ω)    (18)

The solution to this differential equation is given by

log F_X(ω) = γ log F_N(ω) + C    (19)

where C is a constant. Imposing F_N(0) = F_X(0) = 1, we obtain C = 0, hence

F_X(ω) = [F_N(ω)]^γ    (20)

Hence, given a noise distribution, the necessary and sufficient condition for the existence of a matching source distribution boils down to the requirement that [F_N(ω)]^γ be a valid characteristic function. Moreover, if such a matching source exists, we have a recipe for deriving its distribution. Bochner's theorem [4] states that a continuous function F : ℝ → ℂ with F(0) = 1 is a characteristic function if and only if it is positive semi-definite. Hence, the existence of a matching source depends on the positive semi-definiteness of [F_N(ω)]^γ.

Definition: Let f : ℝ → ℂ be a complex-valued function, and let t_1, …, t_s be a set of points in ℝ. Then f is said to be positive semi-definite (non-negative definite) if for any t_i ∈ ℝ and a_i ∈ ℂ, i = 1, …, s, we have

Σ_{i=1}^{s} Σ_{j=1}^{s} a_i ā_j f(t_i − t_j) ≥ 0    (21)

where ā_j is the complex conjugate of a_j.
Equivalently, we require that the s × s matrix constructed with the entries f(t_i − t_j) be positive semi-definite. If a function f is positive semi-definite, its Fourier transform satisfies F(ω) ≥ 0, ∀ω ∈ ℝ. Hence, in the case of our candidate characteristic function, this requirement ensures that the corresponding density is indeed non-negative everywhere. We note that characterizing the entire set of F_N(ω) for which [F_N(ω)]^γ is positive semi-definite may be a difficult task. Instead, we illustrate with various cases of interest where [F_N(ω)]^γ is or is not positive semi-definite. Let us start with a simple but useful case.
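Before that case, the matrix version of the test above can be sketched numerically (our illustration, not part of the paper; the grid of points t_i is an arbitrary choice). For uniform noise on [−1, 1], F_N(ω) = sin(ω)/ω: the candidate [F_N(ω)]² passes the grid test, since it is the characteristic function of a triangular density (a sum of two uniforms), while F_N itself takes negative values, so for non-integer γ the candidate [F_N(ω)]^γ is not even real-valued:

```python
import numpy as np

def min_eig_psd(f, ts):
    """Smallest eigenvalue of the matrix [f(t_i - t_j)].  Non-negativity
    (up to round-off) is the grid-restricted version of the positive
    semi-definiteness required by Bochner's theorem."""
    M = f(ts[:, None] - ts[None, :])
    return float(np.linalg.eigvalsh((M + M.conj().T) / 2).min())

ts = np.linspace(-12.0, 12.0, 49)
F_uni = lambda w: np.sinc(w / np.pi)      # sin(w)/w (np.sinc is sin(pi x)/(pi x))

# gamma = 2: [F_N]^2 is a genuine characteristic function -> test passes.
me2 = min_eig_psd(lambda w: F_uni(w) ** 2, ts)
# F_N dips below zero on the grid -> fractional powers are problematic.
neg = float(F_uni(ts).min())
print("min eigenvalue for [F_N]^2:", me2, " min of F_N on grid:", neg)
```

A failing grid test is conclusive (no matching source on that evidence), while a passing one is only necessary, since it inspects finitely many points.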
Corollary 1: If γ ∈ ℤ₊, a matching source distribution exists, regardless of the noise distribution.

Proof: From (20), an integer γ yields the valid characteristic function of the random variable

X = Σ_{i=1}^{γ} N_i    (22)

where the N_i are independent and identically distributed as N.

Let us recall the concept of infinite divisibility, which is closely related to our problem.

Definition [10]: A distribution with characteristic function F(ω) is called infinitely divisible if, for each integer k ≥ 1, there exists a characteristic function F_k(ω) such that F(ω) = (F_k(ω))^k.

Infinitely divisible distributions have been studied extensively in probability theory [10], [11]. It is known that the Poisson, exponential, geometric, and Gamma distributions, as well as the set of stable distributions (which includes the Gaussian), are infinitely divisible. On the other hand, it is easy to see that distributions of discrete random variables with finite alphabets are not infinitely divisible.

Corollary 2: A matching source distribution exists for any positive γ ∈ ℝ if f_N(n) is infinitely divisible.

Proof: It is easy to show from the definition of infinite divisibility that (F_N(ω))^r is a valid characteristic function for all rational r > 0, using Corollary 1. Using the fact that every γ ∈ ℝ is the limit of a sequence of rational numbers r_n, and by the continuity theorem [12], we conclude that F_X(ω) = [F_N(ω)]^γ is a valid characteristic function.

However, the converse of the above corollary is not true: there can exist a matching source even though f_N is not infinitely divisible. For example, a finite-alphabet discrete random variable v is not infinitely divisible but can still be k-divisible for some k < |V| − 1, where |V| is the cardinality of the alphabet of v. Hence, when γ = 1/k, there might exist a matching source even though the noise is not infinitely divisible.

Let us now identify a case in which a matching source does not exist. When F_N(ω) is real and negative for some ω, i.e.,
f_N(n) is symmetric but not positive semi-definite, a matching source does not exist. We state this in the form of a corollary.

Corollary 3: For γ ∉ ℤ, if F_N(ω) ∈ ℝ for all ω and there exists ω such that F_N(ω) < 0, a matching source distribution does not exist.

Proof: We prove this corollary by contradiction. Let [F_N(ω)]^γ be a valid characteristic function. Recall the orthogonality property of the optimal estimator for MSE, i.e., (8). Let η(Y) = Y^m for m = 1, 2, 3, …, M. Plugging in the best linear estimator h(Y) = (γ/(γ+1))Y and replacing Y with X + N, we obtain the condition

E{ [X − (γ/(γ+1))(X + N)] (X + N)^m } = 0  for m = 1, …, M    (23)

Expressing (X + N)^m as a binomial expansion,

(X + N)^m = Σ_{i=0}^{m} C(m, i) X^i N^{m−i}    (24)

and rearranging the terms, we obtain M linear equations that recursively connect all moments of f_X(x) up to order M + 1, i.e., for each m = 1, …, M we have

(1/γ) E(X^{m+1}) = E(N^{m+1}) + Σ_{i=1}^{m−1} A(γ, m, i) E(N^{m−i+1}) E(X^i)    (25)

where A(γ, m, i) = C(m, i) − (1/γ) C(m, i−1). It follows from (25) that, if all odd moments of N are zero, then so are all odd moments of X. Hence, when the noise is symmetric, the matching source must also be symmetric. However, if γ ∉ ℤ, it follows from (20) that F_X(ω) is not real, and hence f_X(x) is not symmetric. This contradiction shows that no matching source exists when γ ∉ ℤ and the noise distribution is symmetric but not positive semi-definite.

V. SPECIAL CASES

In this section, we return to the L_p norm and investigate some special cases obtained by varying γ.

Theorem 3: Given a source and noise of equal variance, the optimal estimator is linear if and only if the noise and source distributions are identical.

Proof: For MSE, it is straightforward to see from (20) that, at γ = 1, the characteristic functions must be identical. The characteristic function uniquely determines the distribution [12]. Alternatively, it can be observed directly from (12) for the L_p norm that F_N(ω) = F_X(ω) satisfies the optimality condition.

Theorem 4 (for MSE only): In the limit γ → 0, the MSE optimal estimator converges to a linear estimator in probability if the channel is Gaussian, regardless of the source.
Similarly, as γ → ∞, the MSE optimal estimator converges to a linear estimator in probability if the source is Gaussian, regardless of the channel.

Proof: The proof applies the law of large numbers to (20) for MSE. For the Gaussian channel at asymptotically low SNR, the (asymptotic) optimality of linear estimation can also be deduced from Eq. 91 of [13]. We conjecture that this theorem also holds for the L_p norm, although we currently do not have a proof.

Let us consider a setup with given source and noise variables which may be scaled to vary the SNR. Can the optimal estimator be linear at different values of γ? This question is motivated by the practical setting where γ is not known in advance or may vary (e.g., in the design stage of a communication system). It is well known that the Gaussian source-Gaussian noise pair makes the optimal estimator linear at all γ levels. Below, we show that this is the only source-channel pair whose optimal estimators are linear at multiple γ values.

Theorem 5: Let the source or channel variables be scaled to vary the SNR, γ. The L_p norm optimal estimator is linear at two different values γ_1 and γ_2 if and only if both the source and the channel noise are Gaussian.

Proof: This theorem can be proved from the set of moment equations (25). Let us say the noise is scaled by
α ∈ ℝ, i.e., N′ = αN. The relation between the moments of the original and the scaled noise is

E(N′^m) = α^m E(N^m)  for m = 1, …, M + 1    (26)

Also, a set of moment equations of the form (25) must hold for both γ_1 and γ_2. For clarity, we focus on the MSE norm, but the proof for the L_p norm follows the same lines. The key observation is that, as mentioned in Section II-D, the same linear estimator is optimal for a Gaussian source-channel pair under every L_p norm. For m = 1, …, M and j = 1, 2,

(1/γ_j) E(X^{m+1}) = E(N_j^{m+1}) + Σ_{i=1}^{m−1} A(γ_j, m, i) E(N_j^{m−i+1}) E(X^i)    (27)

where N_1 = N, N_2 = N′, and A(γ, m, i) = C(m, i) − (1/γ) C(m, i−1). Note that every equation introduces a new variable E(X^{m+1}), for m = 1, …, M, so each new equation is independent of its predecessors. Let us consider solving these equations recursively, starting from m = 1. At each m, we have three unknowns (E(X^{m+1}), E(N^{m+1}), E(N′^{m+1})) that are related linearly. Since the number of equations is equal to the number of unknowns for each m, there must exist a unique solution. We know that the moments of the Gaussian source-channel pair satisfy (27). For the Gaussian random variable, the moments uniquely determine the distribution [14], so the Gaussian source and noise are the only solution.

Alternate proof: Theorem 5 can be proved, for MSE only, in an alternative way. Assume the same terminology as above. Then σ_{n′}² = α²σ_n² and F_{N′}(ω) = F_N(αω). Let

γ_1 = σ_x²/σ_n²,  γ_2 = σ_x²/(α²σ_n²)    (28)

Using (20),

F_X(ω) = [F_N(ω)]^{γ_1},  F_X(ω) = [F_N(αω)]^{γ_2}    (29)

Taking the logarithm on both sides of (29) and plugging (28) into (29), we obtain

α² = log F_N(αω) / log F_N(ω)    (30)

Note that (30) must be satisfied for both α and −α, since they yield the same γ_2. Plugging α = −1 into (30), we obtain F_N(ω) = F_N(−ω), ∀ω. Using the fact that every characteristic function is conjugate symmetric (i.e., F_N(−ω) = F̄_N(ω)), we get F_N(ω) ∈ ℝ, ∀ω. As log F_N(ω) is a function from ℝ to ℂ, the Weierstrass theorem [15] guarantees that there is a sequence of polynomials that converges uniformly to it: log F_N(ω) = k_0 + k_1ω + k_2ω² + k_3ω³ + …, where k_i ∈ ℂ. Hence, by (30) we obtain

α² = (k_0 + k_1(ωα) + k_2(ωα)² + k_3(ωα)³ + …) /
(k_0 + k_1ω + k_2ω² + k_3ω³ + …), ∀ω ∈ ℝ    (31)

which is satisfied for all ω only if all coefficients k_i vanish except k_2, i.e., log F_N(ω) = k_2ω², or log F_N(ω) = 0 ∀ω ∈ ℝ (the solution α² = 1 is not relevant in this case). The latter is not a characteristic function of a valid noise distribution, and the former is the Gaussian characteristic function, F_N(ω) = e^{k_2ω²} (where we use the established fact that F_N(ω) ∈ ℝ). Since a characteristic function determines the distribution uniquely, the Gaussian source and noise must be the only such pair.

VI. COMMENTS ON THE EXTENSION TO HIGHER DIMENSIONS

Extension of the conditions to the vector case is nontrivial, due to the fact that the individual SNR values of the vector components can differ. We do have the solution for this extension, but it is left out due to space constraints.

VII. CONCLUSION

In this paper, we derived conditions under which the optimal estimator is linear for the L_p norm. We identified the conditions for the existence and uniqueness of a source distribution that matches the noise in a way that ensures linearity of the optimal estimator for the special case of p = 2. One trivial example of this type of matching occurs for a Gaussian source and Gaussian noise, at all SNR levels. Another instance of matching happens when the source and noise are identically distributed, in which case the optimal estimator is h(Y) = Y/2. We also showed that the Gaussian source-channel pair is unique in that it is the only source-channel pair for which the optimal estimator is linear at more than one SNR value. Moreover, we showed the asymptotic linearity of MSE optimal estimators at low SNR if the channel is Gaussian, regardless of the source, and, vice versa, at high SNR if the source is Gaussian, regardless of the channel.

REFERENCES

[1] S.M. Kay, Fundamentals of Statistical Signal Processing, Prentice Hall PTR, 1993.
[2] V.P. Skitovic, "Linear combinations of independent random variables and the normal distribution law," Selected Translations in Mathematical Statistics and Probability, p. 211, 1962.
[3] S.G. Ghurye and I. Olkin, "A characterization of the multivariate normal distribution," The Annals of Mathematical Statistics, pp. 533–541, 1962.
[4] M.M. Rao and R.J. Swift, Probability Theory with Applications, Springer, 2005.
[5] R.G. Laha, "On a characterization of the stable law with finite expectation," The Annals of Mathematical Statistics, vol. 27, no. 1, pp. 187–195, 1956.
[6] A. Balakrishnan, "On a characterization of processes for which optimal mean-square systems are of specified form," IEEE Transactions on Information Theory, vol. 6, no. 4, pp. 490–500, 1960.
[7] D.G. Luenberger, Optimization by Vector Space Methods, John Wiley & Sons Inc, 1969.
[8] A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Springer, 1992.
[9] S. Sherman, "Non-mean-square error criteria," IEEE Transactions on Information Theory, vol. 4, no. 3, pp. 125–126, 1958.
[10] E. Lukacs, Characteristic Functions, Charles Griffin and Company, 1960.
[11] F.W. Steutel and K. Van Harn, Infinite Divisibility of Probability Distributions on the Real Line, CRC, 2003.
[12] P. Billingsley, Probability and Measure, John Wiley & Sons Inc, 2008.
[13] D. Guo, S. Shamai, and S. Verdu, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1261–1282, 2005.
[14] J.A. Shohat and J.D. Tamarkin, The Problem of Moments, New York, 1943.
[15] R.M. Dudley, Real Analysis and Probability, Cambridge Univ. Press, 2002.