On Conditions for Linearity of Optimal Estimation


Emrah Akyol, Kumar Viswanatha and Kenneth Rose
{eakyol, kumar, rose}@ece.ucsb.edu
Department of Electrical and Computer Engineering
University of California at Santa Barbara, CA 93106

(This work is supported by the NSF under grant CCF-078986.)

Abstract—When is optimal estimation linear? It is well known that, in the case of a Gaussian source contaminated with Gaussian noise, a linear estimator minimizes the mean square estimation error. This paper analyzes, more generally, the conditions for linearity of optimal estimators. Given a noise (or source) distribution and a specified signal-to-noise ratio (SNR), we derive conditions for the existence and uniqueness of a source (or noise) distribution that renders the L_p norm optimal estimator linear. We then show that, if the noise and source variances are equal, the matching source is distributed identically to the noise. Moreover, we prove that the Gaussian source-channel pair is unique in that it is the only source-channel pair for which the MSE optimal estimator is linear at more than one SNR value.

Index Terms—Optimal estimation, linear estimation, source-channel matching

I. INTRODUCTION

Consider the basic problem in estimation theory, namely, estimation of a source from a signal received through a channel with additive noise, given the statistics of both the source and the channel. The optimal estimator that minimizes the mean square estimation error is usually a nonlinear function of the observation [1]. A frequently exploited result in estimation theory concerns the special case of a Gaussian source and Gaussian channel noise, a case in which the optimal estimator is guaranteed to be linear. An open follow-up question considers the existence of other cases exhibiting such a coincidence and, more generally, the characterization of conditions for linearity of optimal estimators for general distortion measures. This problem also has practical importance beyond theoretical interest, mainly due to significant complexity issues in both the design and the operation of estimators. Specifically, the optimal estimator generally involves entire probability distributions, whereas linear estimators require only up to second-order statistics for their design. Moreover, unlike the optimal estimator, which can be an arbitrarily complex function that is difficult to implement, the resulting linear estimator consists of a simple matrix-vector operation. Hence, linear estimators are more prevalent in practice, despite their suboptimal performance in general. There is also a significant temptation to assume that processes are Gaussian, sometimes despite overwhelming evidence to the contrary. The results in this paper identify the cases where a linear estimator is optimal and, hence, justify the use of linear estimators in practice without recourse to complexity arguments.

The estimation problem in general has been studied intensively in the literature. It is known that, for stable distributions (which of course include the Gaussian case), the optimal estimator is linear [2], [3], [4], [5] at any signal-to-noise ratio (SNR). Stable distributions are a subset of the infinitely divisible distributions, which, as we show in this paper, satisfy the proposed necessary condition for a matching distribution to exist at any SNR level. Our main contribution relative to the prior works (which studied linearity at all SNR levels) focuses on the linearity of optimal estimation for the L_p norm and its dependence on the SNR level. We present the optimality conditions for linear estimators given a specified SNR, and for the L_p norm.
As a special case, we investigate p = 2 (mean square error) in detail. Note that a similar problem was studied in [5], [6] for p = 2, without analysis of the existence of distributions satisfying the necessary condition. We show that the necessary condition of [5], [6] is indeed a special case of our necessary and sufficient conditions, and present a detailed analysis of the MSE case. Four results are provided on the optimality of linear estimation. First, we show that if the noise (alternatively, source) distribution satisfies certain conditions, there always exists a unique source (alternatively, noise) distribution of a given power under which the optimal estimator is linear. We further identify conditions under which such a matching distribution does not exist. Second, we show that if the source and the noise have the same variance, they must be identically distributed to ensure linearity of the optimal estimator. As a third result, we show that the MSE optimal estimator converges to a linear estimator for any source and Gaussian noise at asymptotically low SNR, and, conversely, for any noise and Gaussian source at asymptotically high SNR. Having established more general conditions for linearity of optimal estimation, one wonders in what precise sense the Gaussian case may be special. This question is answered by the fourth result. We consider the optimality of linear estimation at multiple SNR values. Let random variables X and N be the source and channel noise, respectively, and allow for scaling of either to produce varying levels of SNR. We show that if the optimal estimator is linear at more than one SNR value, then both the source X and the noise N must be Gaussian (of course, in this case, optimal estimators are linear at all SNR levels). In other words, the Gaussian source-noise pair is unique in that it offers linearity of optimal estimators at multiple SNR values.

The paper is organized as follows: we present the problem formulation in Section II, the main result in Section III, the specific result for MSE in Section IV, and corollaries in Section V; we comment on the vector case in Section VI and provide conclusions in Section VII.

[Fig. 1. The general setup of the problem: the source X is corrupted by additive noise N, and the estimator h(·) forms the reconstruction X̂ from the observation Y = X + N.]

II. PROBLEM FORMULATION

A. Preliminaries and notation

We consider the problem of estimating the source X given the observation Y = X + N, where X and N are independent, as shown in Figure 1. Without loss of generality, we assume that X and N are scalar zero-mean random variables with distributions f_X(·) and f_N(·). Their respective characteristic functions are denoted F_X(ω) and F_N(ω). A distribution f(x) is said to be symmetric if it is an even function: f(x) = f(−x), ∀x ∈ R (this definition can be generalized to symmetry about any point when one drops the assumption of zero-mean distributions). The SNR is γ = σ_x²/σ_n². All distributions are constrained to have finite variance, i.e., σ_x² < ∞, σ_n² < ∞. All logarithms in the paper are natural logarithms and may in general be complex. The optimal estimator h(·) is the function of the observation that minimizes the cost functional

J(h(·)) = E{Φ(X − h(Y))}    (1)

for the distortion measure Φ.

B. Optimality condition for the L_p norm

Rewriting (1) more explicitly,

J(h(·)) = \int\int Φ(x − h(y)) f_X(x) f_{Y|X}(y|x) dx dy    (2)

To obtain the necessary conditions for optimality, we apply the standard method of variational calculus [7]:

\frac{∂}{∂ε} J[h(y) + εη(y)] \Big|_{ε=0} = 0    (3)

for all admissible variation functions η(y). If Φ is differentiable, (3) yields

\int\int Φ'(x − h(y)) η(y) f_X(x) f_{Y|X}(y|x) dx dy = 0    (4)

or

E{Φ'(X − h(Y)) η(Y)} = 0    (5)

where Φ' is the derivative of Φ. This necessary condition is also sufficient for all convex Φ (d²Φ/dx² > 0), in which case (∂²/∂ε²) J[h(y) + εη(y)] |_{ε=0} > 0 for any variation function η(y).

Hereafter, we specialize our results to the case of the L_p norm, i.e., Φ(x) = |x|^p, which satisfies d²Φ/dx² > 0 for all x ∈ R\{0}, ensuring the sufficiency of (5). Note that for odd p, d/dx |x|^p = p|x|^p/x, ∀x ∈ R\{0}. Hence, for odd p,

E{ (|X − h(Y)|^p / (X − h(Y))) η(Y) } = 0    (6)

whereas for even p,

E{ [X − h(Y)]^{p−1} η(Y) } = 0    (7)

Note that when Φ(x) = x², this condition reduces to the well-known orthogonality condition of MSE, i.e.,

E{ [X − h(Y)] η(Y) } = 0    (8)

for any function η(·). Note that for p = 2, the optimal estimator h(Y) = E{X|Y} can be obtained from (7):

\int \Big\{ \int [x − h(y)] f_X(x) f_{Y|X}(y|x) dx \Big\} η(y) dy = 0    (9)

For (9) to hold for any η, the term in braces must be zero, yielding h(y) = E{X|Y = y}, using Bayes' rule. Note that, for p = 1, this condition requires h(Y) to be the median, which is known as the centroid condition for the L_1 norm (see, e.g., [8]).

C. Optimal linear estimation for the L_p norm

The linear estimator that minimizes the L_p norm is derived using linear variation functions. Plugging η(y) = ay (for some a ∈ R) into (7) and omitting some straightforward steps, we obtain the optimality condition (for even p) as

E{ (X − kY)^{p−1} Y } = 0    (10)

The optimal scaling coefficient k can be found by plugging Y = X + N into (10). Observe that for p = 2, we get the well-known result k = γ/(γ+1).

D. Gaussian source and channel case

We next consider the special case in which both X and N are Gaussian, X ~ N(0, σ_x²) and N ~ N(0, σ_n²). Plugging the distributions into h(Y) = E{X|Y}, we obtain the well-known result

h(Y) = (γ/(γ+1)) Y    (11)

In this case, the optimal estimator is linear at all SNR (γ) levels. Also note that it renders the estimation error X − h(Y) independent of Y. It is straightforward to show that this linear estimator satisfies (6) and (7) and is hence optimal for the L_p norm.
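As a quick numerical illustration of (11), consider the following minimal Monte Carlo sketch in Python (ours, not part of the paper; the variances and bin choices are purely illustrative). It approximates E{X|Y} by averaging X within narrow bins of Y and compares the result with kY:

```python
# Minimal Monte Carlo sketch (illustration only): for Gaussian X and N, the
# conditional mean E[X | Y] should coincide with the linear estimator in (11),
# h(Y) = k*Y with k = gamma / (gamma + 1). Bin-averaging approximates E[X | Y].
import numpy as np

rng = np.random.default_rng(0)
sigma_x, sigma_n = 2.0, 1.0
gamma = sigma_x**2 / sigma_n**2      # SNR as defined in Section II-A
k = gamma / (gamma + 1.0)            # linear coefficient from (11)

X = rng.normal(0.0, sigma_x, 1_000_000)
Y = X + rng.normal(0.0, sigma_n, 1_000_000)

bins = np.linspace(-6.0, 6.0, 61)    # narrow bins over the bulk of Y
idx = np.digitize(Y, bins)
for b in (15, 25, 35, 45):           # a few interior bins
    y_mid = 0.5 * (bins[b - 1] + bins[b])
    print(f"y ~ {y_mid:+.2f}:  E[X|Y] ~ {X[idx == b].mean():+.4f},  k*y = {k * y_mid:+.4f}")
```

The binned conditional means fall on the line ky up to sampling error; repeating the experiment with a non-Gaussian source (e.g., uniform) shows visible curvature, which is exactly the phenomenon studied below.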
This is not a new result; it is known that the optimal estimator is linear for the L_p norm if both the source and the noise are Gaussian, see also [9].

E. Problem statement

We attempt to answer the following question: Are there other source-channel distribution pairs for which the optimal estimator turns out to be linear? More precisely, we wish to find the entire set of source and channel distributions such that h(Y) = kY is the optimal estimator for some k.

III. MAIN RESULT FOR THE L_p NORM

In this section we derive the necessary and sufficient conditions for linearity of the optimal estimator, in terms of the characteristic functions of the source and the noise.

Theorem 1: For a given L_p distortion measure (p even), a given noise N with characteristic function F_N(ω), and a source X with characteristic function F_X(ω), the optimal estimator h(Y) is linear, h(Y) = kY, if and only if the following differential equation is satisfied:

\sum_{i=0}^{p-1} \binom{p-1}{i} \Big(\frac{k-1}{k}\Big)^{i} F_X^{(i)}(ω)\, F_N^{(p-1-i)}(ω) = 0    (12)

Proof: Plugging f_{Y|X}(y|x) = f_N(y − x) into (7), we obtain

\int (x − ky)^{p-1} f_X(x) f_N(y − x) dx = 0,  ∀y    (13)

Using the binomial expansion, we get

\sum_{m=0}^{p-1} \binom{p-1}{m} (−ky)^m \int x^{p-1-m} f_X(x) f_N(y − x) dx = 0    (14)

Letting ∗ denote the convolution operator, we rewrite (14) as

\sum_{m=0}^{p-1} \binom{p-1}{m} (−ky)^m \big[ (y^{p-1-m} f_X(y)) ∗ f_N(y) \big] = 0    (15)

Taking the Fourier transform (assuming the Fourier transform exists), we obtain

\sum_{m=0}^{p-1} \binom{p-1}{m} (−k)^m \frac{d^m}{dω^m} \big[ F_X^{(p-1-m)}(ω) F_N(ω) \big] = 0    (16)

Expanding the derivatives of the products by the Leibniz rule and collecting terms, we obtain (12). The converse part of the theorem follows from the sufficiency of the necessary conditions (6), (7), due to the convexity of the L_p norm.

Note that a similar condition can be obtained for odd p, with the noise characteristic function F_N(ω) replaced by its Hilbert transform; the details are left out due to space constraints.

IV. SPECIALIZING TO MSE

In this section, we specialize the conditions to mean square error, p = 2. More precisely, we wish to find the entire set of source and channel distributions such that h(Y) = (γ/(γ+1))Y is the optimal estimator for a given γ. Note that this condition was derived, in another context, in [5], [6], albeit without consideration of the important implications we focus on here, including the conditions for the existence of a matching noise for a given source (and vice versa), and applications of such matching conditions. We identify the conditions for the existence (and uniqueness) of a source distribution that matches the noise in a way that makes the optimal estimator coincide with a linear one. We state the main result for MSE in the following theorem.

Theorem 2: For a given SNR level γ, and given noise N with density f_N(n) and characteristic function F_N(ω), there exists a source X for which the optimal estimator is linear if and only if the function F(ω) = F_N^γ(ω) is a legitimate characteristic function. Moreover, if F(ω) is legitimate, then it is the characteristic function of the matching source, i.e., F_X(ω) = F(ω). An equivalent theorem holds with the roles of noise and source interchanged, i.e., given a source and an SNR level, we have a condition for the existence of a matching noise.

Proof: Plugging p = 2 into (12) yields

\frac{1}{γ} \frac{1}{F_X(ω)} \frac{dF_X(ω)}{dω} = \frac{1}{F_N(ω)} \frac{dF_N(ω)}{dω}    (17)

or, more compactly,

\frac{d}{dω} \log F_X(ω) = γ \frac{d}{dω} \log F_N(ω)    (18)

The solution to this differential equation is given by

\log F_X(ω) = γ \log F_N(ω) + C    (19)

where C is a constant. Imposing F_N(0) = F_X(0) = 1, we obtain C = 0; hence,

F_X(ω) = F_N^γ(ω)    (20)

Hence, given a noise distribution, the necessary and sufficient condition for the existence of a matching source distribution boils down to the requirement that F_N^γ(ω) be a valid characteristic function. Moreover, if such a matching source exists, we have a recipe for deriving its distribution. Bochner's theorem [4] states that a continuous F: R → C with F(0) = 1 is a characteristic function if and only if it is positive semi-definite. Hence, the existence of a matching source depends on the positive semi-definiteness of F_N^γ(ω).
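The "recipe" implied by Theorem 2 and (20) is easy to carry out numerically. The following Python sketch (ours, not from the paper; the Laplace noise, γ = 2.5, and the integration grids are arbitrary illustrative choices) forms F_N(ω) by quadrature, raises it to the power γ, and inverts the result to obtain the candidate matching-source density, which can then be inspected for negativity:

```python
# Sketch of the Theorem 2 recipe (illustration only): form F_N(w) by
# quadrature, raise it to the power gamma as in (20), invert, and inspect
# the candidate source density for negativity.
import numpy as np

gamma = 2.5                                   # SNR level to be matched
x = np.linspace(-10.0, 10.0, 2001)            # support grid
dx = x[1] - x[0]
f_N = 0.5 * np.exp(-np.abs(x))                # Laplace noise (infinitely divisible)

w = np.linspace(-15.0, 15.0, 1201)
dw = w[1] - w[0]
# F_N(w) = E[e^{jwN}], computed by a Riemann sum over the density grid
F_N = (f_N[None, :] * np.exp(1j * w[:, None] * x[None, :])).sum(axis=1) * dx

F_X = F_N ** gamma                            # candidate characteristic function, (20)
# f_X(x) = (1/2pi) * integral of F_X(w) e^{-jwx} dw
f_X = (F_X[None, :] * np.exp(-1j * x[:, None] * w[None, :])).sum(axis=1).real * dw / (2 * np.pi)

print("min of candidate density:", f_X.min())        # ~0 up to quadrature error: valid
print("integral of candidate density:", f_X.sum() * dx)  # ~1
```

For this (infinitely divisible) noise the recovered candidate is a genuine density; for noise distributions that fail the positive semi-definiteness requirement, the same procedure produces a "density" with visibly negative regions.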
Definition: Let f: R → C be a complex-valued function, and let t_1, ..., t_s be a set of points in R. Then f is said to be positive semi-definite (non-negative definite) if, for any t_i ∈ R and a_i ∈ C, i = 1, ..., s, we have

\sum_{i=1}^{s} \sum_{j=1}^{s} a_i \bar{a}_j f(t_i − t_j) ≥ 0    (21)

where \bar{a}_j is the complex conjugate of a_j. Equivalently, we require that the s × s matrix constructed from f(t_i − t_j) be positive semi-definite. If a function f is positive semi-definite, its Fourier transform satisfies F(ω) ≥ 0, ∀ω ∈ R. Hence, in the case of our candidate characteristic function, this requirement ensures that the corresponding density is indeed non-negative everywhere. We note that characterizing the entire set of F_N(ω) for which F_N^γ(ω) is positive semi-definite may be a difficult task. Instead, we illustrate with various cases of interest where F_N^γ(ω) is or is not positive semi-definite.
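The matrix form of this definition suggests a simple numerical screen. The sketch below (ours; the uniform-noise example and the values of γ are arbitrary) evaluates a candidate F = F_N^γ on random point differences and checks the smallest eigenvalue of the Hermitian part of the resulting matrix; a clearly negative value rules out positive semi-definiteness:

```python
# Numerical Bochner test (illustration only): build the matrix [F(t_i - t_j)]
# for a candidate F = F_N^gamma at random points and check the smallest
# eigenvalue of its Hermitian part, cf. (21). A negative value rules F out
# as a characteristic function.
import numpy as np

rng = np.random.default_rng(1)

def min_eig(F, s=60):
    """Smallest eigenvalue of the Hermitian part of [F(t_i - t_j)]_{i,j}."""
    t = rng.uniform(-8.0, 8.0, s)
    M = F(t[:, None] - t[None, :])
    M = 0.5 * (M + M.conj().T)        # a valid c.f. makes this matrix psd
    return np.linalg.eigvalsh(M)[0]

sinc = lambda w: np.sinc(w / np.pi)   # c.f. of uniform noise on [-1, 1]

# gamma = 3 (integer): sinc^3 is the c.f. of a sum of 3 uniforms -> psd.
print("gamma = 3  :", min_eig(lambda w: sinc(w) ** 3))        # >= 0 up to roundoff
# gamma = 0.5 (non-integer): sinc takes negative values, so sinc^0.5 is not
# even real; the test typically reports a clearly negative eigenvalue.
print("gamma = 0.5:", min_eig(lambda w: (sinc(w) + 0j) ** 0.5))
```

The failure in the non-integer case foreshadows Corollary 3 below: a symmetric noise whose characteristic function goes negative admits no matching source at non-integer γ.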

Let us start with a simple but useful case.

Corollary 1: If γ ∈ Z⁺, a matching source distribution exists, regardless of the noise distribution.

Proof: From (20), integer γ yields the valid characteristic function of the random variable

X = \sum_{i=1}^{γ} N_i    (22)

where the N_i are independent and identically distributed as N.

Let us recall the concept of infinite divisibility, which is closely related to our problem.

Definition [10]: A distribution with characteristic function F(ω) is called infinitely divisible if, for each integer k ≥ 1, there exists a characteristic function F_k(ω) such that F(ω) = (F_k(ω))^k.

Infinitely divisible distributions have been studied extensively in probability theory [10], [11]. It is known that the Poisson, exponential, geometric, and Gamma distributions, as well as the set of stable distributions (which includes the Gaussian distribution), are infinitely divisible. On the other hand, it is easy to see that distributions of discrete random variables with finite alphabets are not infinitely divisible.

Corollary 2: A matching source distribution exists for any positive γ ∈ R if f_N(n) is infinitely divisible.

Proof: It is easy to show from the definition of infinite divisibility, using Corollary 1, that (F_N(ω))^r is a valid characteristic function for all rational r > 0. Using the fact that every γ ∈ R is the limit of a sequence of rational numbers r_n, and by the continuity theorem [12], we conclude that F_X(ω) = (F_N(ω))^γ is a valid characteristic function.

However, the converse of the above corollary is not true: there can exist a matching source even though f_N is not infinitely divisible. For example, a finite-alphabet discrete random variable V is not infinitely divisible but can still be k-divisible for some k < |V| − 1, where |V| is the cardinality of the alphabet of V. Hence, when γ = 1/k, there might exist a matching source even though the noise is not infinitely divisible.

Let us now identify a case in which a matching source does not exist: when F_N(ω) is real and negative for some ω, i.e., f_N(n) is symmetric but F_N(ω) is not non-negative. We state this in the form of a corollary.

Corollary 3: For γ ∉ Z, if F_N(ω) ∈ R, ∀ω, and there exists ω such that F_N(ω) < 0, then a matching source distribution does not exist.

Proof: We prove this corollary by contradiction. Suppose F_N^γ(ω) is a valid characteristic function. Recall the orthogonality property (8) of the MSE optimal estimator. Let η(Y) = Y^m for m = 1, 2, 3, ..., M. Plugging in the best linear estimator h(Y) = (γ/(γ+1))Y and replacing Y with X + N, we obtain the condition

E\Big\{ \Big[X − \frac{γ}{γ+1}(X + N)\Big] (X + N)^m \Big\} = 0  for m = 1, ..., M    (23)

Expressing (X + N)^m as a binomial expansion,

(X + N)^m = \sum_{i=0}^{m} \binom{m}{i} X^i N^{m-i}    (24)

and rearranging terms, we obtain M linear equations that recursively connect all moments of f_X(x) up to order M + 1, i.e., for each m = 1, ..., M we have

E(X^{m+1}) = γ E(N^{m+1}) + \sum_{i=2}^{m} A(γ, m, i) E(X^i) E(N^{m-i+1})    (25)

where A(γ, m, i) = γ\binom{m+1}{i} − (γ+1)\binom{m}{i-1}. It follows from (25) that, if all odd moments of N are zero, then so are all odd moments of X. Hence, when the noise is symmetric, the matching source must also be symmetric. However, if γ ∉ Z, then since F_N(ω) < 0 for some ω, (20) implies that F_X(ω) is not real, and hence f_X(x) is not symmetric. This contradiction shows that no matching source exists when γ ∉ Z and the noise distribution is symmetric but its characteristic function is not non-negative.
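Corollary 1 is easy to verify by simulation. In the sketch below (ours; zero-mean exponential noise and γ = 3 are illustrative choices), the matching source is built as in (22), and the binned conditional mean of X given Y is compared against the line ky with k = γ/(γ+1) = 3/4:

```python
# Corollary 1 checked by simulation (illustration only): with gamma = 3 and
# zero-mean exponential noise, the matching source is the sum of 3 i.i.d.
# copies of the noise, per (22); E[X | Y] should then sit on the line (3/4)*y.
import numpy as np

rng = np.random.default_rng(2)
n_samples, gamma = 2_000_000, 3
k = gamma / (gamma + 1.0)

noise = lambda size: rng.exponential(1.0, size) - 1.0   # zero mean, unit variance
X = noise((gamma, n_samples)).sum(axis=0)               # matching source, (22)
Y = X + noise(n_samples)

bins = np.linspace(-3.0, 6.0, 46)
idx = np.digitize(Y, bins)
for b in (10, 18, 26, 34):
    y_mid = 0.5 * (bins[b - 1] + bins[b])
    print(f"y ~ {y_mid:+.2f}:  E[X|Y] ~ {X[idx == b].mean():+.4f},  k*y = {k * y_mid:+.4f}")
```

Here linearity can also be seen directly: Y is the sum of four i.i.d. terms, so by symmetry E{N_i|Y} = Y/4 for each term and E{X|Y} = 3Y/4, in agreement with k = γ/(γ+1).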
V. SPECIAL CASES

In this section, we return to the L_p norm and investigate some special cases obtained by varying γ.

Theorem 3: Given a source and noise of equal variance, the optimal estimator is linear if and only if the noise and source distributions are identical.

Proof: For MSE, it is straightforward to see from (20) that, at γ = 1, the characteristic functions must be identical, and the characteristic function uniquely determines the distribution [12]. Alternatively, it can be observed directly from (12) that, for the L_p norm, F_N(ω) = F_X(ω) satisfies the optimality condition.

Theorem 4 (for MSE only): In the limit γ → 0, the MSE optimal estimator converges to a linear estimator in probability if the channel is Gaussian, regardless of the source. Similarly, as γ → ∞, the MSE optimal estimator converges to a linear estimator in probability if the source is Gaussian, regardless of the channel.

Proof: The proof applies the law of large numbers to (20). For the Gaussian channel at asymptotically low SNR, the (asymptotic) optimality of linear estimation can also be deduced from Eq. 91 of [13]. We conjecture that this theorem also holds for the L_p norm, although we currently do not have a proof.

Let us consider a setup with given source and noise variables, either of which may be scaled to vary the SNR. Can the optimal estimator be linear at different values of γ? This question is motivated by practical settings where γ is not known in advance or may vary (e.g., in the design stage of a communication system). It is well known that the Gaussian source-Gaussian noise pair makes the optimal estimator linear at all γ levels. Below, we show that this is the only source-channel pair whose optimal estimators are linear at multiple γ values.

Theorem 5: Let the source or channel variables be scaled to vary the SNR γ. The L_p norm optimal estimator is linear at two different SNR values γ_1 and γ_2 if and only if both the source and the channel noise are Gaussian.

Proof: This theorem can be proved from the set of moment equations (25). Say the noise is scaled by α ∈ R, i.e., N' = αN.

α R, i.e N = αn. The relation between the oents of the original and scaled noise E(N ) = α E(N ) for = 1,.., M + 1 (6) Also, a set of oent equations should hold for 1 and. For clarity, we focus on the MSE nor, but the proof for L p nor follows the sae lines. The key observation is that, as entioned in Sec II.D, the sae linear estiator is optial for a Gaussian source-channel pair with L p nor. 1 E(X +1 ) = j E(N +1 )+ A( j,, i)e(n i+1 )E(X i ) (7) where = 1,.., M, j = 1, and A(,, i) = ( ) ( i i+1). Note that every equation introduces a new variable E(X +1 ), for = 1,.., M, so each new equation is independent of its predecessors. Let us consider solving these equations recursively, starting fro = 1. At each, we have three unknowns (E(X +1 ), E(N +1 ), E(N +1 )) that are related linearly. Since the nuber of equations is equal to the nuber of unknowns for each, there ust exist a unique solution. We know that the oents of the Gaussian sourcechannel pair satisfy (7). For the Gaussian rando variable, the oents uniquely deterine the distribution [14], so Gaussian source and noise are the only solution. Alternate Proof: Theore 5 can be proved, only for MSE, in an alternative way. Assue the sae terinology as above. Then, σn = α σn and F N (ω) = F N (ωα). Let, Using (0), 1 = σ x σn, = σ x α σn (8) F X (ω) = F N (ω) 1, F X (ω) = F N (ωα) (9) Taking the logarith on both sides of (9) and plugging (8) into (9), we obtain α = log F N(αω) log F N (ω) (30) Note that (30) should be satisfied for both α and α since they yield the sae. Plugging α = 1 in (30), we obtain F N (ω) = F N ( ω), ω. Using the fact that every characteristic function should be conjugate syetric (i.e. F N ( ω) = F N (ω)), we get F N(ω) R, ω. As log F N (ω) is R C, the Weierstrass theore [15] guarantees that there is a sequence of polynoials that uniforly converges to it: log F N (ω) = k 0 +k 1 ω+k ω +k 3 ω 3..., where k i C. Hence, by (30) we obtain: α = k 0 + k 1 ωα + k (ωα) + k 3 (ωα) 3... k 0 + k 1 ω + k ω + k 3 ω 3, ω R, (31)... which is satisfied for all ω only if all coefficients k i vanish, except for k, i.e. log F N (ω) = k ω, or log F N (ω) = 0 ω R (the solution α = 1 is not relevant in this case). The latter is not a characteristic function, and the forer is the Gaussian characteristic function, F N (ω) = e kω (where we use the established fact that F N (ω) R.) Since a characteristic function deterines the distribution uniquely, the Gaussian source and noise ust be the only such pair. VI. COMMENTS ON THE EXTENSION TO HIGHER DIMENSIONS Extension of the conditions to the vector case is nontrivial due to the fact that individual SNR values for each vector coponent can differ. Currently we do have the solution for this extension, but it is left out due to space constraints. VII. CONCLUSION In this paper, we derived conditions under which the optial estiator linear for L p nor. We identified the conditions for the existence and uniqueness of a source distribution that atches the noise in a way that ensures linearity of the optial estiator for the special case of p =. One trivial exaple of this type of atching occurs for Gaussian source and Gaussian noise at all SNR levels. Another instance of atching happens when the source and noise are identically distributed where the optial estiator is h(y ) = 1 Y. We also show that Gaussian source-channel pair is unique in that it is the only source-channel pair for which the optial estiator is linear at ore than one SNR value. 
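The pivotal identity (30) of the alternate proof can also be checked directly. In the sketch below (ours; the unit-variance Gaussian and Laplace characteristic functions are illustrative), the ratio log F_N(αω)/log F_N(ω) is constant in ω (and equal to α²) for the Gaussian case, but varies with ω for the Laplace case, so linearity at two SNR values is impossible there:

```python
# Quick check of (30) (illustration only): linearity at two SNRs forces
# log F_N(alpha*w) / log F_N(w) = alpha^2 for every w. This holds for the
# Gaussian c.f. but fails for a non-Gaussian one such as the Laplace c.f.
import numpy as np

alpha = 2.0
w = np.array([0.5, 1.0, 2.0, 4.0])

log_F_gauss = lambda w: -0.5 * w**2                 # Gaussian, unit variance
log_F_laplace = lambda w: -np.log1p(0.5 * w**2)     # Laplace, unit variance

print("Gaussian ratio:", log_F_gauss(alpha * w) / log_F_gauss(w))      # all 4.0
print("Laplace  ratio:", log_F_laplace(alpha * w) / log_F_laplace(w))  # varies with w
```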
VII. CONCLUSION

In this paper, we derived conditions under which the optimal estimator is linear for the L_p norm. We identified the conditions for the existence and uniqueness of a source distribution that matches the noise in a way that ensures linearity of the optimal estimator for the special case p = 2. One trivial example of this type of matching occurs for a Gaussian source and Gaussian noise, at all SNR levels. Another instance of matching occurs when the source and noise are identically distributed, in which case the optimal estimator is h(Y) = Y/2. We also showed that the Gaussian source-channel pair is unique in that it is the only source-channel pair for which the optimal estimator is linear at more than one SNR value. Moreover, we showed the asymptotic linearity of MSE optimal estimators at low SNR if the channel is Gaussian, regardless of the source, and, conversely, at high SNR if the source is Gaussian, regardless of the channel.

REFERENCES

[1] S.M. Kay, Fundamentals of Statistical Signal Processing, Prentice Hall PTR, 1993.
[2] V.P. Skitovic, "Linear combinations of independent random variables and the normal distribution law," Selected Translations in Mathematical Statistics and Probability, p. 211, 1962.
[3] S.G. Ghurye and I. Olkin, "A characterization of the multivariate normal distribution," The Annals of Mathematical Statistics, pp. 533–541, 1962.
[4] M.M. Rao and R.J. Swift, Probability Theory with Applications, Springer, 2005.
[5] R.G. Laha, "On a characterization of the stable law with finite expectation," The Annals of Mathematical Statistics, vol. 27, no. 1, pp. 187–195, 1956.
[6] A. Balakrishnan, "On a characterization of processes for which optimal mean-square systems are of specified form," IEEE Transactions on Information Theory, vol. 6, no. 4, pp. 490–500, 1960.
[7] D.G. Luenberger, Optimization by Vector Space Methods, John Wiley & Sons, 1969.
[8] A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Springer, 1992.
[9] S. Sherman, "Non-mean-square error criteria," IEEE Transactions on Information Theory, vol. 4, no. 3, pp. 125–126, 1958.
[10] E. Lukacs, Characteristic Functions, Charles Griffin and Company, 1960.
[11] F.W. Steutel and K. Van Harn, Infinite Divisibility of Probability Distributions on the Real Line, CRC, 2003.
[12] P. Billingsley, Probability and Measure, John Wiley & Sons, 2008.
[13] D. Guo, S. Shamai, and S. Verdu, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1261–1282, 2005.
[14] J.A. Shohat and J.D. Tamarkin, The Problem of Moments, American Mathematical Society, New York, 1943.
[15] R.M. Dudley, Real Analysis and Probability, Cambridge University Press, 2002.