Robust Least-Squares Filtering With a Relative Entropy Constraint


Bernard C. Levy
ECE Department, University of California, Davis, CA 95616, USA

Ramine Nikoukhah
INRIA, Domaine de Voluceau, Rocquencourt, France

Abstract

We formulate a robust Wiener filtering problem for wide-sense stationary (WSS) Gaussian processes in the presence of modelling errors. It requires solving a minimax problem that consists of finding the best filter for the least-favorable statistical model within a neighborhood of the nominal model. The neighborhood is formed by models whose relative entropy with respect to the nominal model is less than a fixed constant. The standard noncausal Wiener filter turns out to be optimal for this problem, but the causal Wiener filter is not, and a characterization is provided for the best causal filter and the corresponding least-favorable model.

1 Introduction

We examine the least-squares filtering problem for a signal based on observations over an infinite interval. When exact joint second-order statistics are available for the signal and observations, the signal estimate is generated by a causal or noncausal Wiener filter, depending on whether a causality constraint is imposed on the estimation filter. Unfortunately, in most situations the statistics of the signal and observations are known only imprecisely. In such situations, the estimates produced by an ordinary Wiener filter based on the nominal model may be less accurate than those generated by more conservative filters that take into account the possible existence of modelling errors.

In this context, the recent prominence of $H_\infty$ and set-membership robust filtering techniques [1, 2] appears to have overshadowed another fruitful approach proposed approximately 20 years ago by Kassam, Poor, and their collaborators [3, 4, 5, 6]. In this approach, which was inspired by Huber's pioneering work [7] in robust statistics, the actual joint spectral density of the signal and observations is assumed to belong to a neighborhood of the nominal model. This neighborhood or uncertainty class can be specified in a variety of ways. It can be based on an $\epsilon$-contamination model of the type originally considered by Huber, a total variation model [4, 5], or a spectral band model [3, 6] wherein the power spectral densities (PSDs) specifying the signal and observations are required to stay within a band centered on the nominal PSD. Other classes that have been considered in the literature [6] include p-point models, which allocate fixed amounts of power to certain spectral bands, and moment-constrained models, where some moments are fixed. The robust filtering problem then reduces to the solution of a minimax problem where one seeks the best filter for the worst set of statistics in the specified uncertainty class. This approach is straightforward at the conceptual level, but sometimes difficult to implement in practice: even when the minimax problem is of convex-concave type, so that a saddle point exists, it is not always possible to give a simple parametrization of the saddle point.

In this paper, the neighborhood of the nominal model we consider is formed by models whose relative entropy with respect to the nominal model is bounded by a fixed constant.

Section 2 considers the robust noncausal least-squares filtering problem for WSS Gaussian processes. In this case, it is shown that the noncausal Wiener filter is robust with respect to the class of perturbations considered. On the other hand, for causal filters, it is shown in Section 3 that the standard Wiener filter is not robust, and a characterization is provided for the optimum causal filter and the corresponding least-favorable statistical model. Section 4 presents some conclusions and directions for further research.

2 Noncausal Wiener Filtering

Let $X(t)$ and $Y(t)$ be two jointly stationary scalar Gaussian processes over $\mathbb{Z}$. The nominal and actual Gaussian probability measures $P$ and $\bar{P}$ of the process
$$Z(t) = \begin{bmatrix} X(t) \\ Y(t) \end{bmatrix} \tag{2.1}$$
are specified respectively by the mean and autocovariance functions
$$m_Z = E_P[Z(t)] = \begin{bmatrix} m_X \\ m_Y \end{bmatrix} \tag{2.2}$$
$$K_Z(k) = E_P[(Z(t+k) - m_Z)(Z(t) - m_Z)^T] = \begin{bmatrix} K_X(k) & K_{XY}(k) \\ K_{YX}(k) & K_Y(k) \end{bmatrix} \tag{2.3}$$
and
$$\bar{m}_Z = E_{\bar{P}}[Z(t)] = \begin{bmatrix} \bar{m}_X \\ \bar{m}_Y \end{bmatrix} \tag{2.4}$$
$$\bar{K}_Z(k) = E_{\bar{P}}[(Z(t+k) - \bar{m}_Z)(Z(t) - \bar{m}_Z)^T] = \begin{bmatrix} \bar{K}_X(k) & \bar{K}_{XY}(k) \\ \bar{K}_{YX}(k) & \bar{K}_Y(k) \end{bmatrix}. \tag{2.5}$$
Then, if
$$R_Z(k) = m_Z m_Z^T + K_Z(k) \tag{2.6}$$
$$\bar{R}_Z(k) = \bar{m}_Z \bar{m}_Z^T + \bar{K}_Z(k) \tag{2.7}$$
denote respectively the nominal and actual autocorrelation functions of $Z(t)$, the nominal and true power spectral densities of $Z(t)$ can be written as
$$S_Z(e^{j\theta}) = \sum_{k=-\infty}^{\infty} R_Z(k) e^{-jk\theta} = \begin{bmatrix} S_X(e^{j\theta}) & S_{XY}(e^{j\theta}) \\ S_{YX}(e^{j\theta}) & S_Y(e^{j\theta}) \end{bmatrix} \tag{2.8}$$
and
$$\bar{S}_Z(e^{j\theta}) = \sum_{k=-\infty}^{\infty} \bar{R}_Z(k) e^{-jk\theta} = \begin{bmatrix} \bar{S}_X(e^{j\theta}) & \bar{S}_{XY}(e^{j\theta}) \\ \bar{S}_{YX}(e^{j\theta}) & \bar{S}_Y(e^{j\theta}) \end{bmatrix}. \tag{2.9}$$
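For illustration, the following Python sketch builds a nominal joint spectrum $\Sigma_Z(e^{j\theta})$ on a frequency grid. The model (an AR(1) signal observed in independent white noise), the parameter values, and all variable names are hypothetical assumptions, not taken from the paper; later sketches reuse these arrays.

```python
import numpy as np

# Hypothetical nominal model (illustrative only): X is an AR(1) signal and
# Y(t) = X(t) + N(t) with N white and independent of X, so that
#   S_X(e^{j theta}) = sigma_x^2 / |1 - a e^{-j theta}|^2,
#   S_XY = S_X,   S_Y = S_X + sigma_n^2.
theta = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)
a, sigma_x2, sigma_n2 = 0.8, 1.0, 0.5

S_X = sigma_x2 / np.abs(1.0 - a * np.exp(-1j * theta)) ** 2
S_XY = S_X.copy()            # cross-spectrum of X and Y = X + N
S_Y = S_X + sigma_n2         # observation spectrum

# Zero-mean case: Sigma_Z coincides with S_Z; shape (N, 2, 2) over the grid
Sigma_Z = np.stack([np.stack([S_X, S_XY], axis=-1),
                    np.stack([np.conj(S_XY), S_Y], axis=-1)], axis=-2)
```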

They can be decomposed as
$$S_Z(e^{j\theta}) = 2\pi\, m_Z m_Z^T \delta(\theta) + \Sigma_Z(e^{j\theta}) \tag{2.10}$$
$$\bar{S}_Z(e^{j\theta}) = 2\pi\, \bar{m}_Z \bar{m}_Z^T \delta(\theta) + \bar{\Sigma}_Z(e^{j\theta}), \tag{2.11}$$
where $\delta(\cdot)$ denotes the impulse function, and $\Sigma_Z(e^{j\theta})$ and $\bar{\Sigma}_Z(e^{j\theta})$ are respectively the discrete-time Fourier transforms of $K_Z(k)$ and $\bar{K}_Z(k)$.

For stationary Gaussian processes, the relative entropy between the true and nominal models takes the form of the Itakura-Saito spectral distortion measure [8]
$$I(\bar{m}_Z, \bar{\Sigma}_Z; m_Z, \Sigma_Z) = \Delta m_Z^T \Sigma_Z^{-1}(1) \Delta m_Z + \frac{1}{2\pi} \int_0^{2\pi} \left[ \mathrm{tr}\left\{ (\Sigma_Z^{-1}\bar{\Sigma}_Z)(e^{j\theta}) - I_2 \right\} - \ln\det(\Sigma_Z^{-1}\bar{\Sigma}_Z)(e^{j\theta}) \right] d\theta, \tag{2.12}$$
where
$$\Delta m_Z = \bar{m}_Z - m_Z. \tag{2.13}$$
In order to ensure the existence of $I$, we assume that the nominal spectral density matrix $\Sigma_Z(e^{j\theta})$ is positive definite for all $\theta \in [0, 2\pi]$. Note that since $\ln\det M$ is a concave function of the positive definite matrix $M$ [9, pp. 51-52], $I$ is a convex functional of $\bar{m}_Z$ and $\bar{\Sigma}_Z(\cdot)$.

We consider the noncausal Wiener filtering problem where the goal is to estimate $X(t)$ given the observations $\{Y(s),\ s \in \mathbb{Z}\}$. Let
$$\hat{X}(t) = \sum_{k=-\infty}^{\infty} G_k Y(t-k) + h \tag{2.14}$$
denote the estimate of $X(t)$ produced by the pair $(G, h)$ formed by the filter
$$G(z) = \sum_{k=-\infty}^{\infty} G_k z^{-k} \tag{2.15}$$
and the additive constant $h$. We assume that the filter $G(z)$ is BIBO stable, which ensures the existence of the discrete-time Fourier transform $G(e^{j\theta})$. The estimation error is given by
$$E(t) = X(t) - \hat{X}(t) = \sum_{k=-\infty}^{\infty} \begin{bmatrix} \delta_k & -G_k \end{bmatrix} Z(t-k) - h, \tag{2.16}$$
where $\delta_k$ denotes the Kronecker delta, so that the mean and variance of the estimation error under the probability measure $\bar{P}$ can be expressed as
$$m_E = E_{\bar{P}}[E(t)] = \begin{bmatrix} 1 & -G(1) \end{bmatrix} \bar{m}_Z - h \tag{2.17}$$
and
$$K_E = E_{\bar{P}}[(E(t) - m_E)^2] = \frac{1}{2\pi} \int_0^{2\pi} \begin{bmatrix} 1 & -G(e^{j\theta}) \end{bmatrix} \bar{\Sigma}_Z(e^{j\theta}) \begin{bmatrix} 1 \\ -G(e^{-j\theta}) \end{bmatrix} d\theta. \tag{2.18}$$
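Returning to the distortion measure (2.12), it can be evaluated numerically on the grid introduced above. This is a sketch under our own grid and naming conventions; the uniform average over $\theta$ stands in for $(1/2\pi)\int_0^{2\pi}(\cdot)\,d\theta$.

```python
def relative_entropy(Sigma_bar, Sigma, dm=None):
    """Evaluate the Itakura-Saito relative entropy (2.12) numerically.

    Sigma_bar, Sigma: stacked spectral densities of shape (N, 2, 2) on a
    uniform theta-grid; dm: mean difference Delta m_Z of (2.13), or None
    when the means agree.  Sigma_Z(1) is the spectrum at theta = 0.
    """
    M = np.linalg.solve(Sigma, Sigma_bar)              # Sigma^{-1} Sigma_bar
    tr_term = np.trace(M, axis1=-2, axis2=-1).real - 2.0
    logdet_term = np.linalg.slogdet(M)[1]              # log det(M) > 0 here
    I = np.mean(tr_term - logdet_term)
    if dm is not None:
        I += dm @ np.linalg.solve(Sigma[0], dm)        # mean term of (2.12)
    return I

# The relative entropy of the nominal model with respect to itself is zero
assert abs(relative_entropy(Sigma_Z, Sigma_Z)) < 1e-12
```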

Consider now the convex ball
$$\mathcal{B} = \left\{ (\bar{m}_Z, \bar{\Sigma}_Z) : I(\bar{m}_Z, \bar{\Sigma}_Z; m_Z, \Sigma_Z) \le c \right\} \tag{2.19}$$
centered on the nominal statistics $(m_Z, \Sigma_Z)$. Let also $\mathcal{S}$ be the class of estimators $(G, h)$ such that $G(z)$ is BIBO stable. We seek to solve the minimax Wiener filtering problem
$$\min_{(G,h) \in \mathcal{S}} \; \max_{(\bar{m}_Z, \bar{\Sigma}_Z) \in \mathcal{B}} J(\bar{m}_Z, \bar{\Sigma}_Z; G, h) \tag{2.20}$$
where
$$J(\bar{m}_Z, \bar{\Sigma}_Z; G, h) = E_{\bar{P}}[E^2(t)] = m_E^2 + K_E. \tag{2.21}$$
The solution of the minimax problem is obtained by exhibiting a saddle point of the objective function $J$ [10, Chap. 6].

Theorem 2.1 The function $J(\bar{m}_Z, \bar{\Sigma}_Z; G, h)$ admits a saddle point $((\bar{m}^*, \bar{\Sigma}^*), (G^*, h^*)) \in \mathcal{B} \times \mathcal{S}$. The estimator
$$G^*(z) = \Sigma_{XY}(z) \Sigma_Y^{-1}(z), \qquad h^* = m_X - G^*(1) m_Y \tag{2.22}$$
is the optimal noncausal Wiener filter for both the nominal and least-favorable statistics of $Z(t)$, and the least-favorable statistics have the structure
$$\bar{m}^* = m_Z, \qquad \bar{\Sigma}^*(z) = \begin{bmatrix} \bar{\Sigma}_X(z) & \Sigma_{XY}(z) \\ \Sigma_{YX}(z) & \Sigma_Y(z) \end{bmatrix}, \tag{2.23}$$
so that only $\Sigma_X(z)$ is perturbed. If
$$\bar{S}_E(z) = \bar{\Sigma}_X(z) - \Sigma_{XY}(z) \Sigma_Y^{-1}(z) \Sigma_{YX}(z) \tag{2.24}$$
denotes the power spectrum of the error for the least-favorable statistics and
$$S_E(z) = \Sigma_X(z) - \Sigma_{XY}(z) \Sigma_Y^{-1}(z) \Sigma_{YX}(z) \tag{2.25}$$
represents the error power spectrum for the nominal statistics, $\bar{S}_E(z)$ can be expressed in terms of $S_E(z)$ as
$$\bar{S}_E(z) = \left( S_E^{-1}(z) - \lambda^{-1} \right)^{-1}, \tag{2.26}$$
where there exists a unique Lagrange multiplier
$$\lambda > r(S_E) = \max_{\theta \in [0, 2\pi]} S_E(e^{j\theta}) \tag{2.27}$$
such that $I(\bar{m}^*, \bar{\Sigma}^*; m_Z, \Sigma_Z) = c$.

Proof: We need to show that
$$J(\bar{m}_Z, \bar{\Sigma}_Z; G^*, h^*) \le J(\bar{m}^*, \bar{\Sigma}^*; G^*, h^*) \le J(\bar{m}^*, \bar{\Sigma}^*; G, h) \tag{2.28}$$
for all $(\bar{m}_Z, \bar{\Sigma}_Z) \in \mathcal{B}$ and $(G, h) \in \mathcal{S}$. The second inequality is just a consequence of the fact that $(G^*, h^*)$ is the optimum noncausal Wiener filter for the pair $(\bar{m}^*, \bar{\Sigma}^*)$.
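Continuing the numerical sketch, Theorem 2.1 translates into a few grid operations: the noncausal Wiener filter (2.22), the nominal error spectrum (2.25), and the least-favorable error spectrum (2.26) for a multiplier $\lambda$ satisfying (2.27). How $\lambda$ is pinned down by the budget $c$ is made explicit in the proof below.

```python
G_star = S_XY / S_Y                          # noncausal Wiener filter (2.22)
S_E = S_X - np.abs(S_XY) ** 2 / S_Y          # nominal error spectrum (2.25)

def least_favorable_error_spectrum(S_E, lam):
    """Least-favorable error spectrum via identity (2.26)."""
    assert lam > S_E.max()                   # condition (2.27)
    return 1.0 / (1.0 / S_E - 1.0 / lam)
```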

To verify that $(\bar{m}^*, \bar{\Sigma}^*)$ maximizes $J(\bar{m}_Z, \bar{\Sigma}_Z; G^*, h^*)$ in the ball $\mathcal{B}$, we form the Lagrangian
$$L(\bar{m}_Z, \bar{\Sigma}_Z, \lambda) = J(\bar{m}_Z, \bar{\Sigma}_Z; G^*, h^*) + \lambda \left( c - I(\bar{m}_Z, \bar{\Sigma}_Z; m_Z, \Sigma_Z) \right). \tag{2.29}$$
The first-order Gateaux derivatives of $L$ with respect to $\bar{m}_Z$ in the direction of $u \in \mathbb{R}^2$ and with respect to $\bar{\Sigma}_Z(z)$ in the direction of a para-symmetric $2 \times 2$ matrix function $V(z)$ are given respectively by
$$\nabla_{\bar{m}_Z, u} L = 2\, \Delta m_Z^T \left( \begin{bmatrix} 1 \\ -G^*(1) \end{bmatrix} \begin{bmatrix} 1 & -G^*(1) \end{bmatrix} - \lambda \Sigma_Z^{-1}(1) \right) u \tag{2.30}$$
$$\nabla_{\bar{\Sigma}_Z, V} L = \frac{1}{2\pi} \int_0^{2\pi} \mathrm{tr}\left\{ \left( \begin{bmatrix} 1 \\ -G^*(e^{-j\theta}) \end{bmatrix} \begin{bmatrix} 1 & -G^*(e^{j\theta}) \end{bmatrix} - \lambda \left( \Sigma_Z^{-1}(e^{j\theta}) - \bar{\Sigma}_Z^{-1}(e^{j\theta}) \right) \right) V(e^{j\theta}) \right\} d\theta. \tag{2.31}$$
Recall that a matrix function $V(z)$ is said to be para-symmetric if it admits the symmetry $V(z) = V^{\#}(z)$, with
$$V^{\#}(z) = V^T(z^{-1}). \tag{2.32}$$
We have also
$$\nabla^2_{\bar{m}_Z, u} L = 2\, u^T \left( \begin{bmatrix} 1 \\ -G^*(1) \end{bmatrix} \begin{bmatrix} 1 & -G^*(1) \end{bmatrix} - \lambda \Sigma_Z^{-1}(1) \right) u \tag{2.33}$$
$$\nabla_{\bar{m}_Z, u} \nabla_{\bar{\Sigma}_Z, V} L = 0 \tag{2.34}$$
$$\nabla^2_{\bar{\Sigma}_Z, V} L = -\frac{\lambda}{2\pi} \int_0^{2\pi} \mathrm{tr}\left\{ (\bar{\Sigma}_Z^{-1} V \bar{\Sigma}_Z^{-1} V)(e^{j\theta}) \right\} d\theta = -\frac{\lambda}{2\pi} \int_0^{2\pi} \mathrm{tr}\left\{ (\bar{F} V \bar{\Sigma}_Z^{-1} V \bar{F}^{\#})(e^{j\theta}) \right\} d\theta, \tag{2.35}$$
where $\bar{F}(z)$ is an arbitrary matrix spectral factor of $\bar{\Sigma}_Z^{-1}(z)$, i.e.,
$$\bar{\Sigma}_Z^{-1}(z) = \bar{F}^{\#}(z) \bar{F}(z). \tag{2.36}$$
Thus, provided that the Lagrange multiplier $\lambda$ is such that the matrix
$$W = \begin{bmatrix} 1 \\ -G^*(1) \end{bmatrix} \begin{bmatrix} 1 & -G^*(1) \end{bmatrix} - \lambda \Sigma_Z^{-1}(1) \tag{2.37}$$
is negative definite, the Hessian
$$H_{\bar{m}_Z, \bar{\Sigma}_Z}(u, V) = \nabla^2_{\bar{m}_Z, u} L + 2\, \nabla_{\bar{m}_Z, u} \nabla_{\bar{\Sigma}_Z, V} L + \nabla^2_{\bar{\Sigma}_Z, V} L \tag{2.38}$$
will be negative definite for all $\bar{m}_Z$ and $\bar{\Sigma}_Z$, so that $L$ is a concave function of $\bar{m}_Z$ and $\bar{\Sigma}_Z$. Let $Q$ be an arbitrary matrix square root of $\Sigma_Z(1)$, i.e., $\Sigma_Z(1) = Q Q^T$. Premultiplying $W$ on the left by $Q^T$ and on the right by $Q$, and denoting
$$M = \begin{bmatrix} 1 & -G^*(1) \end{bmatrix} Q, \tag{2.39}$$

we find that $W$ is congruent to
$$\tilde{W} = M^T M - \lambda I_2, \tag{2.40}$$
which will be negative definite provided
$$\lambda > M M^T = S_E(1). \tag{2.41}$$
The Lagrangian $L(\bar{m}_Z, \bar{\Sigma}_Z, \lambda)$ is then maximized by setting $\nabla_{\bar{m}_Z, u} L = 0$ and $\nabla_{\bar{\Sigma}_Z, V} L = 0$ for all $u$ and $V(z)$, which yields $\Delta m_Z = 0$ and
$$\bar{\Sigma}^{*-1}(z) = \Sigma_Z^{-1}(z) - \frac{1}{\lambda} \begin{bmatrix} 1 \\ -G^*(z^{-1}) \end{bmatrix} \begin{bmatrix} 1 & -G^*(z) \end{bmatrix}. \tag{2.42}$$
Noting that $\Sigma_Z(z)$ admits the block spectral factorization
$$\Sigma_Z(z) = \begin{bmatrix} 1 & G^*(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} S_E(z) & 0 \\ 0 & \Sigma_Y(z) \end{bmatrix} \begin{bmatrix} 1 & 0 \\ G^*(z^{-1}) & 1 \end{bmatrix}, \tag{2.43}$$
the identity (2.42) implies that $\bar{\Sigma}^*(z)$ admits the factorization
$$\bar{\Sigma}^*(z) = \begin{bmatrix} 1 & G^*(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \bar{S}_E(z) & 0 \\ 0 & \Sigma_Y(z) \end{bmatrix} \begin{bmatrix} 1 & 0 \\ G^*(z^{-1}) & 1 \end{bmatrix}, \tag{2.44}$$
where the error spectrum $\bar{S}_E(z)$ obeys (2.26). The structure (2.43) and (2.44) of $\Sigma_Z$ and $\bar{\Sigma}^*$ implies
$$(\bar{\Sigma}^{*-1} - \Sigma_Z^{-1})(z) = (\bar{S}_E^{-1} - S_E^{-1})(z) \begin{bmatrix} 1 \\ -G^*(z^{-1}) \end{bmatrix} \begin{bmatrix} 1 & -G^*(z) \end{bmatrix}, \tag{2.45}$$
so that
$$\mathrm{tr}(\Sigma_Z^{-1} \bar{\Sigma}^* - I_2) = S_E^{-1} \bar{S}_E - 1 \tag{2.46}$$
$$\det(\Sigma_Z^{-1} \bar{\Sigma}^*) = S_E^{-1} \bar{S}_E, \tag{2.47}$$
and thus
$$I(\bar{m}^*, \bar{\Sigma}^*; m_Z, \Sigma_Z) = \frac{1}{2\pi} \int_0^{2\pi} \left[ (S_E^{-1} \bar{S}_E)(e^{j\theta}) - 1 - \ln\left( S_E^{-1} \bar{S}_E (e^{j\theta}) \right) \right] d\theta. \tag{2.48}$$
Substituting (2.26), this gives
$$I(\bar{m}^*, \bar{\Sigma}^*; m_Z, \Sigma_Z) = \gamma(\lambda) \tag{2.49}$$
with
$$\gamma(\lambda) = \frac{1}{2\pi} \int_0^{2\pi} \gamma(\lambda, S_E(e^{j\theta})) \, d\theta, \tag{2.50}$$
where
$$\gamma(\lambda, d) = \frac{\lambda}{\lambda - d} - 1 - \ln\left( \frac{\lambda}{\lambda - d} \right). \tag{2.51}$$
Note that the integral on the right-hand side of (2.50) converges only if $\lambda > r(S_E)$.
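The algebra leading from (2.42) to (2.47) can be checked numerically on the grid: building $\bar{\Sigma}^*$ from the rank-one update (2.42) for an arbitrary admissible multiplier (the value of $\lambda$ below is for illustration only), the trace and determinant identities reduce to scalar relations in $S_E$ and $\bar{S}_E$.

```python
lam = 2.0 * S_E.max()                        # any lam > r(S_E) will do here
S_bar_E = least_favorable_error_spectrum(S_E, lam)

# On |z| = 1 with real filter coefficients, G*(z^{-1}) = conj(G*(e^{j theta})),
# so the rank-one term in (2.42) is v v^H with v = [1, -conj(G*)]^T
v = np.stack([np.ones_like(G_star), -np.conj(G_star)], axis=-1)[..., None]
Sigma_bar = np.linalg.inv(np.linalg.inv(Sigma_Z)
                          - (v @ np.conj(np.swapaxes(v, -1, -2))) / lam)

M = np.linalg.solve(Sigma_Z, Sigma_bar)
trace_check = np.trace(M, axis1=-2, axis2=-1).real - 2.0   # left side of (2.46)
logdet_check = np.linalg.slogdet(M)[1]                     # log of (2.47)
assert np.allclose(trace_check, S_bar_E / S_E - 1.0)
assert np.allclose(logdet_check, np.log(S_bar_E / S_E))
```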

For $\lambda > d > 0$, we have
$$\frac{d}{d\lambda} \gamma(\lambda, d) = -\frac{d^2}{\lambda(\lambda - d)^2} < 0 \tag{2.52}$$
and
$$\lim_{\lambda \to \infty} \gamma(\lambda, d) = 0, \qquad \lim_{\lambda \to d} \gamma(\lambda, d) = +\infty, \tag{2.53}$$
so that $\gamma(\lambda, d)$ is a monotone decreasing function of $\lambda$. Since $\gamma(\lambda)$ is expressed as an integral of this function, it is itself monotone decreasing, with
$$\lim_{\lambda \to \infty} \gamma(\lambda) = 0, \qquad \lim_{\lambda \to r(S_E)} \gamma(\lambda) = \infty. \tag{2.54}$$
For any positive $c$, there exists therefore a unique $\lambda^*$ such that $\gamma(\lambda^*) = c$. For this value of the Lagrange multiplier the condition (2.41) is satisfied, so that the saddle-point identity (2.28) holds, with
$$I(\bar{m}^*, \bar{\Sigma}^*; m_Z, \Sigma_Z) = c. \tag{2.55}$$

Thus, the noncausal Wiener filter is robust with respect to a relative entropy mis-modelling criterion. This robustness property should probably not come as a complete surprise, since it is already known that $H_2$ smoothers are also optimal with respect to the $H_\infty$ criterion [11]. However, the performance of the Wiener filter is affected by model errors. Specifically, the MSE corresponding to the nominal model is given by
$$\mathrm{MSE} = \frac{1}{2\pi} \int_0^{2\pi} S_E(e^{j\theta}) \, d\theta, \tag{2.56}$$
whereas the MSE for the least-favorable statistics takes the form
$$\overline{\mathrm{MSE}} = \frac{1}{2\pi} \int_0^{2\pi} \bar{S}_E(e^{j\theta}) \, d\theta. \tag{2.57}$$
Taking into account (2.26), the excess MSE, i.e., the additional mean-square error occasioned by the least-favorable model perturbation, can be expressed as
$$\mathrm{MSE}_{\mathrm{exc}} = \overline{\mathrm{MSE}} - \mathrm{MSE} = \frac{1}{2\pi} \int_0^{2\pi} \frac{S_E^2(e^{j\theta})}{\lambda^* - S_E(e^{j\theta})} \, d\theta, \tag{2.58}$$
where $\lambda^*$ satisfies $\gamma(\lambda^*) = c$. Then, using the inequality
$$\frac{x^2}{1 - x} > \frac{x}{1 - x} + \ln(1 - x) \tag{2.59}$$
for $0 < x < 1$ with
$$x = S_E(e^{j\theta}) / \lambda^*, \tag{2.60}$$
and taking into account the expression (2.48) for $I$, one finds (see [12] for details)
$$\mathrm{MSE}_{\mathrm{exc}} > \lambda^* c. \tag{2.61}$$
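Since $\gamma(\lambda)$ is monotone decreasing on $(r(S_E), \infty)$, the multiplier $\lambda^*$ solving $\gamma(\lambda^*) = c$ can be found by bisection. The sketch below (illustrative tolerance and entropy budget $c$) also checks the excess-MSE bound (2.61) on the hypothetical model.

```python
def gamma_bar(lam, S_E):
    """gamma(lambda) of (2.50)-(2.51), averaged over the uniform grid."""
    r = lam / (lam - S_E)
    return np.mean(r - 1.0 - np.log(r))

def solve_lambda(S_E, c, tol=1e-12):
    """Solve gamma_bar(lam) = c by bisection, justified by (2.52)-(2.54)."""
    lo = S_E.max() * (1.0 + 1e-9)            # just above r(S_E): gamma huge
    hi = 2.0 * S_E.max()
    while gamma_bar(hi, S_E) > c:            # bracket the root from above
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if gamma_bar(mid, S_E) > c:          # gamma decreasing: root is above
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = 0.1                                          # illustrative entropy budget
lam_star = solve_lambda(S_E, c)
mse_exc = np.mean(S_E ** 2 / (lam_star - S_E))   # excess MSE (2.58)
assert mse_exc > lam_star * c                    # lower bound (2.61)
```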

3 Robust Causal Estimation

Up to this point, we have not imposed any causality constraint on the estimators under consideration. The class $\mathcal{C}$ of causal estimators is formed by the pairs $(G(z), h)$ such that $G(z)$ is stable and
$$G_k = 0 \quad \text{for } k < 0, \tag{3.1}$$
so that $G(z)$ is analytic on and outside the unit circle. Then, consider the minimax problem (2.20)-(2.21) with $\mathcal{S}$ replaced by $\mathcal{C}$. Applying an analysis similar to the one of Section 2, it is not difficult to find that a saddle point $((\bar{m}^*, \bar{\Sigma}^*), (G^*(z), h^*))$ will have the following structure. The least-favorable statistics take the form
$$\bar{m}^* = m_Z, \qquad \bar{\Sigma}^*(z) = \begin{bmatrix} \bar{\Sigma}_X(z) & \bar{\Sigma}_{XY}(z) \\ \bar{\Sigma}_{YX}(z) & \bar{\Sigma}_Y(z) \end{bmatrix}. \tag{3.2}$$
Let
$$\bar{\Sigma}_Y(z) = \bar{F}(z^{-1}) \bar{F}(z) \tag{3.3}$$
be a spectral factorization of $\bar{\Sigma}_Y(z)$, where the filter $\bar{F}(z)$ is minimum phase. Let also $\{K(z)\}_+$ represent the causal part of an arbitrary filter $K(z)$. Then, the estimator
$$G^*(z) = \left\{ \bar{\Sigma}_{XY}(z) \bar{F}^{-1}(z^{-1}) \right\}_+ \bar{F}^{-1}(z), \qquad h^* = m_X - G^*(1) m_Y \tag{3.4}$$
is the optimal causal Wiener filter for the statistics $(\bar{m}^*, \bar{\Sigma}^*(z))$, and $\bar{\Sigma}^*(z)$ satisfies the identity
$$\bar{\Sigma}^{*-1}(z) = \Sigma_Z^{-1}(z) - \frac{1}{\lambda} \begin{bmatrix} 1 \\ -G^*(z^{-1}) \end{bmatrix} \begin{bmatrix} 1 & -G^*(z) \end{bmatrix}, \tag{3.5}$$
where the Lagrange multiplier is selected such that (2.55) holds.

To see how the above conditions were derived, note that given the least-favorable model $(\bar{m}^*, \bar{\Sigma}^*)$, the second inequality in (2.28) implies that $(G^*, h^*)$ must be the optimal causal Wiener filter for this model. Given the estimator $(G^*, h^*)$, the maximization of $J(\bar{m}_Z, \bar{\Sigma}_Z; G^*, h^*)$ can be performed by using the approach of Theorem 2.1. This yields $\bar{m}^* = m_Z$ and the identity (3.5) for $\bar{\Sigma}^*(z)$. To ensure that the filter $(G^*, h^*)$ and statistics $(\bar{m}^*, \bar{\Sigma}^*)$ represent an actual saddle point of $J(\bar{m}_Z, \bar{\Sigma}_Z; G, h)$, we only need to prove that after substitution of (3.4) inside (3.5), the resulting equation admits a solution $\bar{\Sigma}^*(z)$ which is positive definite on the unit circle for a value of $\lambda$ such that $I(\bar{m}^*, \bar{\Sigma}^*; m_Z, \Sigma_Z) = c$. Unfortunately, unlike in the noncausal case, equation (3.5) does not appear to admit a closed-form solution. If $\bar{S}_E(z)$ denotes the Schur complement of $\bar{\Sigma}_Y(z)$ inside $\bar{\Sigma}^*(z)$ and if $S_E(z)$ is the Schur complement of $\Sigma_Y(z)$ inside $\Sigma_Z(z)$, then by matching the (1,1) blocks on both sides of (3.5), it is easy to verify that (2.26) still holds. However, the other blocks of $\bar{\Sigma}^*(z)$ are harder to evaluate; a heuristic numerical sketch for the coupled equations (3.4)-(3.5) is given below.
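Section 4 notes that numerical techniques for the coupled equations (3.4)-(3.5) remain to be developed. One plausible heuristic, sketched below in Python on the same hypothetical model as before, alternates between the causal Wiener filter (3.4), computed by FFT-based cepstral spectral factorization, and the update (3.5) for a fixed $\lambda$. This iteration is our own assumption, not a method from the paper; it carries no convergence guarantee, and an outer loop adjusting $\lambda$ until (2.55) holds would be needed to meet the entropy budget.

```python
def spectral_factor(S):
    """Minimum-phase factor F with F(z^{-1})F(z) = S(z), via the cepstral
    (Kolmogorov) method; S: real, positive, sampled on a uniform grid."""
    N = len(S)
    c = np.fft.ifft(np.log(S))                # cepstrum of the log-spectrum
    c_plus = np.zeros_like(c)
    c_plus[0] = 0.5 * c[0]
    c_plus[N // 2] = 0.5 * c[N // 2]
    c_plus[1:N // 2] = c[1:N // 2]            # keep the causal cepstral part
    return np.exp(np.fft.fft(c_plus))

def causal_part(K):
    """{K}_+ on the grid: drop negative-time impulse-response samples."""
    k = np.fft.ifft(K)
    k[len(k) // 2:] = 0.0                     # upper half of ifft = negative times
    return np.fft.fft(k)

def causal_wiener(Sigma_XY, Sigma_Y):
    """Causal Wiener filter (3.4) evaluated on the frequency grid."""
    F = spectral_factor(Sigma_Y.real)         # assumes Sigma_Y stays positive
    # On |z| = 1 with real coefficients, F(z^{-1}) = conj(F(e^{j theta}))
    return causal_part(Sigma_XY / np.conj(F)) / F

def causal_robust_iteration(Sigma_Z, lam, n_iter=100):
    """Heuristic fixed-point iteration for (3.4)-(3.5) at fixed lambda;
    lam must be large enough to keep Sigma_bar positive definite."""
    Sigma_bar = Sigma_Z.astype(complex)
    for _ in range(n_iter):
        G = causal_wiener(Sigma_bar[:, 0, 1], Sigma_bar[:, 1, 1])
        v = np.stack([np.ones_like(G), -np.conj(G)], axis=-1)[..., None]
        Sigma_bar = np.linalg.inv(np.linalg.inv(Sigma_Z)
                                  - (v @ np.conj(np.swapaxes(v, -1, -2))) / lam)
    return G, Sigma_bar

# Illustrative run with the hypothetical model and a conservative multiplier
G_causal, Sigma_bar_causal = causal_robust_iteration(Sigma_Z, lam=4.0 * S_E.max())
```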

4 Conclusions

In this paper, we have proposed a methodology for robust filtering that employs the relative entropy as a measure of proximity between two statistical models. By examining the resulting minimax problem, it was shown that the standard noncausal Wiener filter is optimal. However, the causal Wiener filter is not optimal, and a characterization was given for the structure of an optimal filter and the matching least-favorable statistical model. The higher level of difficulty of robust causal filtering problems should not come as a surprise, since previous work on robust Wiener filtering [4, 5, 6] also had this feature.

The report [12] extends the results described here to least-squares filtering problems defined over a finite interval. In this case, the relative entropy is expressed as the Kullback-Leibler (KL) divergence [13]. The formulation outlined here and in [12] provides only a starting point for the investigation of robust filtering from a KL divergence viewpoint. Several topics appear to deserve further investigation. First, it would be of interest to develop numerical techniques, possibly iterative, to solve the coupled spectral equations (3.4)-(3.5) for the optimal causal filter and the associated least-favorable model. Also, up to this point, we have considered unstructured perturbations for the joint statistics of the process to be estimated and the observations. However, there exist situations where it makes sense to consider structured perturbations. For example, given observations of the form
$$Y(t) = X(t) + N(t), \tag{4.1}$$
if the noise process $N(t)$ is independent of $X(t)$, it would be of interest to restrict the class of model perturbations to those that maintain this property. Finally, we have limited our attention to external statistical descriptions of the observations and the process to be estimated. It would be of interest to extend our results to situations where an internal state-space model of the system is available.

References

[1] B. Hassibi, A. H. Sayed, and T. Kailath, Indefinite-Quadratic Estimation and Control. Philadelphia, PA: Soc. for Indust. Applied Math., 1999.

[2] I. R. Petersen and A. V. Savkin, Robust Kalman Filtering for Signals and Systems with Large Uncertainties. Boston, MA: Birkhäuser, 1999.

[3] S. A. Kassam and T. L. Lim, "Robust Wiener filters," J. Franklin Institute, vol. 304, pp. 171-185, 1977.

[4] H. V. Poor, "On robust Wiener filtering," IEEE Trans. Automat. Control, vol. 25, pp. 531-536, June 1980.

[5] K. S. Vastola and H. V. Poor, "Robust Wiener-Kolmogorov theory," IEEE Trans. Informat. Theory, vol. 30, pp. 316-327, Mar. 1984.

[6] S. A. Kassam and H. V. Poor, "Robust techniques for signal processing: a survey," Proc. IEEE, vol. 73, pp. 433-481, Mar. 1985.

[7] P. J. Huber, Robust Statistics. New York, NY: J. Wiley, 1981.

[8] D. Kazakos and P. Papantoni-Kazakos, "Spectral distance measures between Gaussian processes," IEEE Trans. Automat. Control, vol. 25, pp. 950-959, Oct. 1980.

[9] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY: J. Wiley, 1991.

[10] I. Ekeland and R. Temam, Convex Analysis and Variational Problems. Philadelphia, PA: Soc. for Indust. Applied Math., 1999.

[11] K. M. Nagpal and P. P. Khargonekar, "Filtering and smoothing in an $H_\infty$ setting," IEEE Trans. Automat. Control, vol. 36, pp. 152-166, Feb. 1991.

[12] B. C. Levy and R. Nikoukhah, "Robust least-squares estimation with a relative entropy constraint," Tech. Rep., Institut National de Recherche en Informatique et en Automatique (INRIA), Rocquencourt, France, May 2002.

[13] S. Kullback, Information Theory and Statistics. New York, NY: J. Wiley, 1959.