Robust RLS in the Presence of Correlated Noise Using Outlier Sparsity


IEEE TRANSACTIONS ON SIGNAL PROCESSING (TO APPEAR)

Robust RLS in the Presence of Correlated Noise Using Outlier Sparsity

Shahrokh Farahmand and Georgios B. Giannakis

Abstract: Relative to batch alternatives, the recursive least-squares (RLS) algorithm has well-appreciated merits of reduced complexity and storage requirements for online processing of stationary signals, and also for tracking slowly-varying nonstationary signals. However, RLS is challenged when, in addition to noise, outliers are also present in the data due to, e.g., impulsive disturbances. Existing robust RLS approaches are resilient to outliers, but require the nominal noise to be white, an assumption that may not hold in, e.g., sensor networks where neighboring sensors are affected by correlated ambient noise. Pre-whitening with the known noise covariance is not a viable option because it spreads the outliers to non-contaminated measurements, which leads to inefficient utilization of the available data and unsatisfactory performance. In this correspondence, a robust RLS algorithm is developed that is capable of handling outliers and correlated noise simultaneously. In the proposed method, outliers are treated as nuisance variables and estimated jointly with the wanted parameters. Identifiability is ensured by exploiting the sparsity of outliers, which is effected via regularizing the least-squares (LS) criterion with the l1-norm of the outlier vectors. This leads to an optimization problem whose solution yields the robust estimates. For low-complexity real-time operation, a sub-optimal online algorithm is proposed, which entails closed-form updates per time step in the spirit of RLS. Simulations demonstrate the effectiveness and improved performance of the proposed method in comparison with the non-robust RLS and its state-of-the-art robust renditions.

The authors are with the Dept. of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA (e-mails: {shahrokh, georgios}@umn.edu). This work was supported by the AFOSR MURI Grant FA9550-10-1-0567.

I. INTRODUCTION

Online processing with reduced complexity and storage requirements relative to batch alternatives, as well as adaptability to slow parameter variations, are the main reasons behind the popularity of the recursive least-squares (RLS) algorithm for adaptive estimation of linear regression models. Despite its success in various applications, RLS is challenged by the presence of outliers, that is, measurement or noise samples not adhering to a pre-specified nominal model. One major source of outliers is impulsive noise, which appears for example as double-talk in an echo cancellation setting [3]. In another application, which motivates the present contribution, outliers may arise with low-cost unreliable sensors deployed for estimating the coefficients of a linear regression model involving also (possibly correlated) ambient noise. Outlier-resilient approaches are imperative in such applications; see e.g., [10].

Robust RLS schemes have been reported in [3], [5], [12], [13], and [16]. Huber's M-estimation criterion was considered in [12], Hampel's three-part redescending M-estimate was advocated in [16], and a modified Huber cost was put forth in [5].
These approaches improve upon the non-robust RLS when outliers are present, but they also have limitations: i) some entail costs that are not convex [5], thus providing no guarantee of attaining the global optimum; ii) the optimization problem to be solved per time step has dimensionality (and hence computational and storage requirements) growing linearly with time; and iii) correlated nominal noise of known covariance cannot be handled, which is critical for estimation using a network of distributed sensors because ambient noise is typically dependent across neighboring sensors. To mitigate the curse of dimensionality, [5] provides suboptimal online recursions in the spirit of RLS, while [3] and [13] limit the effect of each new datum on the estimate, thus also ensuring low-complexity RLS-like iterations. However, all existing approaches cannot handle correlated ambient noise of known covariance. It is worth stressing that pre-whitening is not recommended because it spreads the outliers to non-contaminated measurements, rendering outlier-resilient estimation even more challenging.

In this correspondence, outliers are treated as nuisance variables that are jointly estimated with the wanted regression coefficients. The identifiability challenge arising due to the extra unknowns is addressed by exploiting the outlier sparsity. The latter is effected by regularizing the LS cost with the l1-norm of the outlier vectors. Consequently, one arrives at an optimization problem which can identify outliers while at the same time accommodating correlated ambient noise. Since the dimensionality of this optimization problem also grows with time, its solution is utilized at the initialization stage, and provides an offline benchmark. To cope with dimensionality, an online variant is also developed, which enjoys low complexity and closed-form updates in the spirit of RLS. Initializing the real-time updates with the offline benchmark, the novel online robust RLS (OR-RLS) algorithm is compared against RLS and the robust schemes in [3] and [13] via simulations, which demonstrate its efficiency and improved performance.

II. PROBLEM STATEMENT AND PRELIMINARIES

Consider the outlier-aware linear regression model considered originally in [7], where (generally vector) measurements y_k in R^{D_y} involving the unknown vector x_k in R^{D_x} become available sequentially in time (indexed by k), in the presence of (possibly correlated) nominal noise v_k; that is,

y_k = H_k x_k + v_k + o_k    (1)

where H_k denotes a known regression matrix, o_k models the outliers arising from, e.g., impulsive noise, and v_k is zero-mean with known covariance matrix R_k, uncorrelated with v_{1:k-1} := [v_1^T, v_2^T, ..., v_{k-1}^T]^T, but possibly correlated across its own entries. The wanted vector x_k is either constant or varies slowly with k. The goal is to estimate x_k and o_k, given the measurements y_{1:k} := [y_1^T, y_2^T, ..., y_k^T]^T.

To motivate the vector model in (1), consider a sensor network with each sensor measuring distances (in rectangular coordinates) between itself and the target one wishes to localize and track. The sensor measurements are synchronized by the clock of an access point (AP), and are corrupted by outliers as well as nominal noise that is correlated across neighboring sensors.

Had o_k been equal to zero, or known for that matter, one could apply the RLS algorithm to the outlier-compensated measurements y_k - o_k. Therefore, at time step k, one would solve [11]

\hat{x}_k := \arg\min_{x_k} J_k(x_k) := \arg\min_{x_k} \sum_{i=1}^{k} \beta^{k-i} \| y_i - H_i x_k - o_i \|_{R_i^{-1}}^2    (2)

where 0 < \beta \le 1 is a pre-selected forgetting factor, and \| x \|_R^2 := x^T R x. Equating to zero the gradient of J_k(x_k) in (2) yields

\Big( \sum_{i=1}^{k} \beta^{k-i} H_i^T R_i^{-1} H_i \Big) \hat{x}_k = \sum_{i=1}^{k} \beta^{k-i} H_i^T R_i^{-1} ( y_i - o_i )

or, in compact matrix-vector notation, \Phi_k \hat{x}_k = \varphi_k, with obvious definitions for \varphi_k and \Phi_k, the latter assumed invertible. The latter can be solved online using the recursions (see e.g., [11])

\hat{x}_k = \hat{x}_{k-1} + W_k ( y_k - o_k - H_k \hat{x}_{k-1} )    (3)
W_k = \beta^{-1} \Phi_{k-1}^{-1} H_k^T ( H_k \beta^{-1} \Phi_{k-1}^{-1} H_k^T + R_k )^{-1}
\Phi_k^{-1} = \beta^{-1} \Phi_{k-1}^{-1} - W_k H_k \beta^{-1} \Phi_{k-1}^{-1} .

The challenge with o_{1:k} being unknown in addition to x_k is that (2) is under-determined, which gives rise to identifiability issues if J_k in (2) is to be jointly optimized over x_k and o_{1:k} := [o_1^T, ..., o_k^T]^T. Indeed, o_i = y_i for all i together with x_k = 0 is one choice of the unknown vectors that always minimizes J_k in (2). The ensuing section develops algorithms allowing joint estimation of x_k and o_{1:k} (or o_k only), by exploiting the attribute of sparsity present in the outlier vectors.

III. OUTLIER-SPARSITY-AWARE ROBUST RLS

To ensure uniqueness in the joint optimization over o_{1:k} and x_k, consider regularizing the cost in (2). Among candidate regularizers, popular ones include the l_p-norms of the outlier vectors with p = 0, 1, 2. A key observation guiding the selection of p is that outliers arising from sources such as impulsive noise are sparse. The degree of sparsity is quantified by the l_0-(pseudo)norm of o_i, which equals the number of its nonzero entries. Unfortunately, the l_0-norm makes the optimization problem non-convex, and in fact NP-hard to solve. This motivates adopting the closest convex approximation of l_0, namely the l_1-norm, which is known to approach (and under certain conditions on the regressors coincide with) the solution obtained with the l_0-norm [4]. Different from [1], [4] that rely on the l_1-norm of sparse vectors x_k, the novelty here is the adoption of the l_1-norm of sparse outlier vectors for robust processing. Adopting this regularization of J_k in (2) yields

[\hat{x}_k, \hat{o}_{1:k}] = \arg\min_{x_k, o_{1:k}} \sum_{i=1}^{k} \beta^{k-i} \big[ \| y_i - H_i x_k - o_i \|_{R_i^{-1}}^2 + \lambda_i \| o_i \|_1 \big] .    (4)

Besides being convex, (4) is nicely linked with Huber's M-estimates if the nominal noise v_i is white [8]. Indeed, with R_i = I for all i, solving (4) amounts to solving [8]

\hat{x}_k = \arg\min_{x_k} \sum_{i=1}^{k} \sum_{d=1}^{D_y} \beta^{k-i} \rho_i ( y_{i,d} - h_{i,d}^T x_k )    (5)

where y_{i,d} (h_{i,d}^T) denotes the dth entry (respectively, row) of y_i (H_i), and \rho_i is Huber's function given by \rho_i(x) := x^2 for |x| \le \lambda_i, and \rho_i(x) := 2 \lambda_i |x| - \lambda_i^2 for |x| > \lambda_i.
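To make the offline criterion (4) concrete, the following is a minimal sketch of how it can be posed with a general-purpose convex modeling tool. CVXPY is used purely for illustration (the paper itself relies on SeDuMi), and the dimensions, covariance, forgetting factor, and l1 weight below are assumed values, not the authors' settings.

import numpy as np
import cvxpy as cp

# Synthetic data from model (1); all sizes and constants here are illustrative assumptions.
rng = np.random.default_rng(0)
k, Dx, Dy = 50, 4, 3
R = 0.9 ** np.abs(np.subtract.outer(np.arange(Dy), np.arange(Dy)))  # correlated noise covariance
Rinv = np.linalg.inv(R)
x_true = np.ones(Dx)
H, y = [], []
for i in range(k):
    H.append(rng.standard_normal((Dy, Dx)))
    o_i = np.zeros(Dy)
    if rng.random() < 0.1:                       # sparse impulsive outlier (assumed rate)
        o_i[rng.integers(Dy)] = 30.0
    y.append(H[i] @ x_true + rng.multivariate_normal(np.zeros(Dy), R) + o_i)

beta, lam = 0.99, 2.0                            # forgetting factor and l1 weight (assumed)
x = cp.Variable(Dx)                              # wanted regression vector
o = cp.Variable((k, Dy))                         # one outlier vector per time instant
cost = sum(beta ** (k - 1 - i) *
           (cp.quad_form(y[i] - H[i] @ x - o[i], Rinv) + lam * cp.norm1(o[i]))
           for i in range(k))
cp.Problem(cp.Minimize(cost)).solve()
print("offline benchmark estimate:", np.round(x.value, 2))

Even this small instance makes the scalability issue plain: the number of outlier variables grows linearly with k, which is exactly the limitation discussed next.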
In addition, (4) can not be solved recursively, and at each time instant the entire optimization should be carried out anew. This shortcoming affects Huber M-estimates too, and similar limitations have been identified also by [5 and [6. For this reason, the offline benchmar solver of (4) will be used as a warm-start to initialize online methods developed in the next subsection. A. Online Robust (OR-) Since it is impractical to re-estimate past and current outliers per time instant, the main idea in this section is to estimate only the most recent outliers. Consequently, the proposed OR- proceeds in two steps per. Given ˆx from, the current o is estimated in the first step via ô = arg min o [ y H ˆx o + λ R o. (6) With ô available, ordinary is applied to the outlier-free measurements y ô [cf. (3) in the second step, which incurs complexity comparable to ordinary. Clearly, (6) solves a problem of fixed dimension D y per time step; hence, it offers a practical alternative to the offline benchmar in (4). If R is diagonal, denote it as diag(r,, r,,..., r,dyd y ), then (6) decouples into scalar sub-problems across entries of o as ô,d = arg min o,d [ (y,d h T,dˆx o,d ) r,dd + λ o,d d =,..., D y.,

However, when v_k is colored, (6) can no longer be solved in closed form. If R_k is non-diagonal, general-purpose convex solvers such as SeDuMi can be utilized to solve (6) with complexity O(D_y^{3.5}) [14]. In the next subsections, two methods based on coordinate descent (CD) and the alternating direction method of multipliers (AD-MoM) are developed to solve (6) iteratively. While both are asymptotically convergent, it suffices to run only a few iterations in practice, which reduces the complexity of general solvers and parallels the low complexity of OR-RLS when v_k is white.

B. CD-based OR-RLS

To iteratively solve (6) with respect to o_k, consider initializing with o_k^{(0)} = 0. Then, for j = 1, ..., J, the vector o_k^{(j)} is updated one entry at a time using CD. Specifically, the dth entry at iteration j is updated as

o_{k,d}^{(j)} := \arg\min_{o_{k,d}} \| y_k - H_k \hat{x}_{k-1} - [ o_{k,1:d-1}^{(j)T} , o_{k,d} , o_{k,d+1:D_y}^{(j-1)T} ]^T \|_{R_k^{-1}}^2 + \lambda_k | o_{k,d} | .    (8)

Problem (8) can alternatively be written as a scalar Lasso problem, that is,

o_{k,d}^{(j)} := \arg\min_{o_{k,d}} ( o_{k,d} - \gamma_{k,d}^{(j)} )^2 + \lambda_{k,d}' | o_{k,d} |    (9)

where

\gamma_{k,d}^{(j)} := \frac{1}{r_{k,dd}^{-1}} \Big[ \alpha_{k,d} - \sum_{i=1}^{d-1} r_{k,id}^{-1} o_{k,i}^{(j)} - \sum_{i=d+1}^{D_y} r_{k,id}^{-1} o_{k,i}^{(j-1)} \Big] ,
\alpha_k := R_k^{-1} ( y_k - H_k \hat{x}_{k-1} ) ,   \lambda_{k,d}' := \lambda_k / r_{k,dd}^{-1}

with r_{k,dd'}^{-1} := [ R_k^{-1} ]_{d,d'} denoting the (d, d') entry of R_k^{-1}. The solution to (9) is given by [cf. (7)]

o_{k,d}^{(j)} = \max\{ | \gamma_{k,d}^{(j)} | - \lambda_{k,d}' / 2 , 0 \} \, \mathrm{sign}( \gamma_{k,d}^{(j)} ) .

After J iterations, one sets \hat{o}_k = o_k^{(J)}. Global convergence of the iterates o_k^{(J)} to the solution of (6) is guaranteed as J goes to infinity from the results in [15]. In practice, however, a fixed small J suffices.

C. AD-MoM based OR-RLS

To decouple the non-differentiable term of the cost in (6) from the differentiable one, consider introducing the auxiliary variable c_k, and re-writing (6) in constrained form as

[\hat{o}_k, \hat{c}_k] = \arg\min_{o_k, c_k} \big[ \| y_k - H_k \hat{x}_{k-1} - o_k \|_{R_k^{-1}}^2 + \lambda_k \| c_k \|_1 \big]   subject to   o_k = c_k .

The augmented Lagrangian thus becomes

L(o_k, c_k, \mu_k) = \| y_k - H_k \hat{x}_{k-1} - o_k \|_{R_k^{-1}}^2 + \lambda_k \| c_k \|_1 + \mu_k^T ( o_k - c_k ) + \frac{\kappa}{2} \| o_k - c_k \|^2

where \mu_k is the Lagrange multiplier, and \kappa is a positive constant controlling the effect of the quadratic term introduced to ensure strict convexity of the cost. After initializing with c_k^{(0)} = \mu_k^{(0)} = 0, the AD-MoM recursions per iteration j (see e.g., [2]) in the present context amount to

o_k^{(j)} = ( 2 R_k^{-1} + \kappa I )^{-1} \big[ 2 R_k^{-1} ( y_k - H_k \hat{x}_{k-1} ) + \kappa c_k^{(j-1)} - \mu_k^{(j-1)} \big]    (10a)
c_{k,d}^{(j)} = \max\{ | o_{k,d}^{(j)} + \mu_{k,d}^{(j-1)} / \kappa | - \lambda_k / \kappa , 0 \} \, \mathrm{sign}( o_{k,d}^{(j)} + \mu_{k,d}^{(j-1)} / \kappa ) ,   d = 1, ..., D_y    (10b)
\mu_k^{(j)} = \mu_k^{(j-1)} + \kappa ( o_k^{(j)} - c_k^{(j)} ) .    (10c)

After J iterations, set \hat{o}_k = o_k^{(J)}. Global convergence of the iterates o_k^{(J)} to the solution of (6) is guaranteed as J goes to infinity from the results in [2]. As with CD, a fixed small J suffices in practice. Note also that the matrix inverse in (10a) needs to be computed only once, and can be reused throughout the iterations.
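Written out, the AD-MoM recursions (10a)-(10c) are only a few lines; the sketch below is illustrative (kappa, J, and the test covariance are assumed values), and the CD iterations (8)-(9) can be coded analogously.

import numpy as np

def admm_outliers(y_k, H_k, x_prev, R_k, lam, kappa=1.0, J=50):
    """AD-MoM iterations (10a)-(10c) for the outlier sub-problem (6), sketch.

    kappa and J are illustrative choices; the matrix inverse in (10a) is
    formed once and reused across iterations."""
    Dy = y_k.size
    Rinv = np.linalg.inv(R_k)
    e = y_k - H_k @ x_prev                                # residual y_k - H_k x_{k-1}
    A = np.linalg.inv(2.0 * Rinv + kappa * np.eye(Dy))
    c = np.zeros(Dy)                                      # auxiliary copy of o_k
    mu = np.zeros(Dy)                                     # Lagrange multiplier
    for _ in range(J):
        o = A @ (2.0 * Rinv @ e + kappa * c - mu)                   # (10a)
        z = o + mu / kappa
        c = np.sign(z) * np.maximum(np.abs(z) - lam / kappa, 0.0)   # (10b)
        mu = mu + kappa * (o - c)                                   # (10c)
    return o                                              # the text sets o_hat_k = o_k^(J)

# Illustrative usage with an AR(1)-correlated covariance (assumed values):
rng = np.random.default_rng(2)
Dy, Dx = 3, 4
R_k = 0.9 ** np.abs(np.subtract.outer(np.arange(Dy), np.arange(Dy)))
o_hat = admm_outliers(rng.standard_normal(Dy), rng.standard_normal((Dy, Dx)),
                      np.ones(Dx), R_k, lam=2.0, kappa=1.0, J=50)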
D. Parameter Selection

Successful performance of OR-RLS hinges on proper selection of the sparsity-enforcing coefficients \lambda_k in (8) and (10). Too large a \lambda_k brings OR-RLS back to ordinary RLS, while too small a \lambda_k creates many spurious outliers and degrades OR-RLS performance. Suppose that \lambda_k is constant with respect to k; that is, let \lambda_k = \lambda_y for all k. Judicious tuning of \lambda_y has been investigated for robust batch linear regression estimators, and two criteria to optimally select \lambda_y can be found in [6] and [8]. However, none of these algorithms can be run online.

As a remedy, it is recommended to select \lambda_y during the initialization (batch) phase of OR-RLS, and retain its value for all future time instants. OR-RLS initialization is carried out by solving the offline benchmark for k = 1 up to k = k_0 using SeDuMi; that is,

[\hat{x}_{k_0}, \hat{o}_{1:k_0}] = \arg\min_{x_{k_0}, o_{1:k_0}} \sum_{i=1}^{k_0} \beta^{k_0 - i} \big[ \| y_i - H_i x_{k_0} - o_i \|_{R_i^{-1}}^2 + \lambda_y \| o_i \|_1 \big] .    (11)

Table I. The OR-RLS Algorithm

Initialization:
  Collect measurements y_k for k = 1, ..., k_0.
  Solve (11) to obtain \hat{x}_{k_0}.
  Use either of the two methods in [6] to determine the best \lambda_k = \lambda_y.
Repeat for k >= k_0 + 1:
  Solve (6) to obtain \hat{o}_k, using the CD iterations in (9) or the AD-MoM iterations in (10).
  Run RLS in (3) on the outlier-compensated measurements y_k - \hat{o}_k to obtain \hat{x}_k.
End for

At time k_0, either one of the two criteria in [6] is applied to (11) to find the best \lambda_y. Using this \lambda_y, (11) is solved to obtain the initial \hat{x}_{k_0}. Then, one proceeds to time k_0 + 1 and runs the OR-RLS algorithm as detailed in the previous subsections. Table I summarizes the OR-RLS algorithm.

Remark 1. (Complexity of OR-RLS) This remark quantifies the complexity order of OR-RLS per time index k. Starting from (3), the second sub-equation requires inversion of a D_y x D_y matrix, which incurs complexity O(D_y^3) once per time index k, while the third sub-equation involves matrix multiplications at complexity O(D_x^2 D_y). Hence, the complexity of the non-robust RLS for the vector-matrix model (1) is O(D_y^3 + D_x^2 D_y). For OR-RLS, one needs to solve (6) per time step as well. Utilizing SeDuMi to solve (6) incurs complexity O(D_y^{3.5} + D_x D_y), where the second term corresponds to evaluating H_k \hat{x}_{k-1}. Consider now the CD or AD-MoM iterations. Per time index k, the CD iterations must compute \alpha_k, which incurs complexity O(D_y^3) for the matrix inversion and O(D_x D_y) for the matrix-vector multiplication. In each of the J CD iterations, one needs to evaluate \gamma_{k,d}^{(j)} for d = 1, ..., D_y, which incurs overall complexity O(J D_y^2). Thus, the overall complexity of optimizing (6) via CD iterations is O(D_y^3 + D_x D_y + J D_y^2). Inasmuch as J is selected on the order of O(D_y) or smaller, the CD iterations do not increase the complexity order of the non-robust RLS in (3). Furthermore, their complexity order is smaller than that of SeDuMi. With regard to AD-MoM, two D_y x D_y matrices must be inverted once per time step at complexity O(D_y^3). The matrix-vector multiplication H_k \hat{x}_{k-1} is also performed once per time index at complexity O(D_x D_y). The matrix-vector multiplications in (10a) should be carried out J times, which incurs complexity O(J D_y^2). Thus, the overall complexity of optimizing (6) via AD-MoM iterations is O(D_y^3 + D_x D_y + J D_y^2). Again, if J is on the order of O(D_y) or smaller, the AD-MoM iterations do not incur higher complexity than the non-robust RLS in (3).

Remark 2. (Complexity of alternative robust methods) For comparison purposes with OR-RLS, this remark discusses the complexity order of the schemes in [3], [5], [13], and [16]. Focusing for simplicity on [3], which entails non-robust RLS recursions, consider correlated measurements arriving in vectors of dimension D_y. To de-correlate, pre-whitening with the square-root matrix incurs complexity O(D_y^3). For each new datum extracted after pre-whitening, the vector update in [3, Eq. (5)] involves vector-matrix products at complexity O(D_x^2). With D_y such data per time step, the overall complexity grows to O(D_x^2 D_y). Note that the complexity of the non-robust RLS in (3) is also O(D_y^3 + D_x^2 D_y). It was further argued in the previous remark that OR-RLS does not increase this complexity order. Since the robust schemes in [3], [5], [13], and [16] incur complexity at least as high as that of the non-robust RLS, it follows that they will incur at best the same order of complexity as the proposed OR-RLS.
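For concreteness, one pass of the loop in Table I can be sketched as follows. The outlier step uses the diagonal-R_k soft threshold (7) for brevity, Phi_inv stands for the inverse matrix Phi_k^{-1} in (3), and all dimensions, the initialization, and the outlier pattern are illustrative assumptions rather than the authors' settings.

import numpy as np

def or_rls_step(x_hat, Phi_inv, y_k, H_k, R_k, lam, beta=0.99):
    """One OR-RLS time step: estimate o_k via (7) (diagonal R_k assumed),
    then run the RLS recursions (3) on the compensated data y_k - o_k."""
    r_diag = np.diag(R_k)
    resid = y_k - H_k @ x_hat
    o_hat = np.sign(resid) * np.maximum(np.abs(resid) - 0.5 * lam * r_diag, 0.0)

    # RLS recursions (3) on y_k - o_hat.
    P = Phi_inv / beta                                      # beta^{-1} Phi_{k-1}^{-1}
    W = P @ H_k.T @ np.linalg.inv(H_k @ P @ H_k.T + R_k)    # gain W_k
    x_new = x_hat + W @ (y_k - o_hat - H_k @ x_hat)
    Phi_inv_new = P - W @ H_k @ P
    return x_new, Phi_inv_new, o_hat

# Illustrative run (dimensions, noise level, and outlier pattern are assumptions):
rng = np.random.default_rng(3)
Dx, Dy, T = 4, 3, 300
x_true = np.ones(Dx)
x_hat, Phi_inv = np.zeros(Dx), 1e3 * np.eye(Dx)             # crude warm start
for k in range(T):
    H_k = rng.standard_normal((Dy, Dx))
    R_k = np.eye(Dy)
    o_k = np.zeros(Dy)
    if rng.random() < 0.1:                                  # sparse impulsive outlier
        o_k[rng.integers(Dy)] = 50.0
    y_k = H_k @ x_true + rng.multivariate_normal(np.zeros(Dy), R_k) + o_k
    x_hat, Phi_inv, _ = or_rls_step(x_hat, Phi_inv, y_k, H_k, R_k, lam=3.0)
print("final error:", np.linalg.norm(x_hat - x_true))

For colored noise, the soft-threshold line would simply be replaced by a call to the CD or AD-MoM solver for (6).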
IV. CONVERGENCE ANALYSIS

For the stationary setup, almost sure convergence of OR-RLS to the true x_o is established in this section under ergodicity conditions on the regressors, nominal noise, and outlier vectors. The following assumptions are needed to this end:

(as1) The true parameter vector, denoted as x_o, is time invariant.

(as2) Three ergodicity conditions hold, with probability one (w.p.1):

\lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} H_i^T R_i^{-1} H_i = \bar{R} > 0   (w.p.1)    (12a)
\lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} H_i^T R_i^{-1} v_i = r_{Hv}   (w.p.1)    (12b)
\lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} H_i^T R_i^{-1} o_i = r_{Ho}   (w.p.1) .    (12c)

If the random vectors v_i, o_i, and h_i are mixing, which is the case for most stationary processes with vanishing memory in practice, then (12) is satisfied. Since H_i is uncorrelated with v_i, which is zero-mean, it follows readily that r_{Hv} = 0.

(as3) Regressors {h_i} are zero-mean and uncorrelated with the outliers {o_i}, which implies that r_{Ho} = 0.

Since the true x_o is constant, the forgetting factor is set to \beta = 1. The OR-RLS algorithm is also slightly modified to facilitate the convergence analysis. Specifically, for k exceeding a finite value \bar{k}, the estimated outlier vector \hat{o}_k is deterministically set to zero. This can be achieved by selecting (cf. [6])

\lambda_k = \lambda_y ,   k = k_0 + 1, ..., \bar{k} ;   \lambda_k = \max\big( \lambda_{k-1} , 2 \| R_k^{-1} ( y_k - H_k \hat{x}_{k-1} ) \|_\infty \big) ,   k > \bar{k} .    (13)

Note that the OR-RLS estimate is the minimizer of J_k(x_k, \hat{o}_{1:k}) w.r.t. x_k, with J_k(.) defined by (2), and \hat{o}_{1:k} obtained from (6) with \lambda_k as in (13). It then follows from (as1)-(as3) that, up to terms not depending on x,

\lim_{k \to \infty} \frac{1}{k} J_k(x, \hat{o}_{1:k}) = x^T \bar{R} x - 2 x^T ( \bar{R} x_o + r_{Hv} + r_{Ho} ) + \lim_{k \to \infty} \frac{2}{k} \sum_{i=1}^{k} \hat{o}_i^T R_i^{-1} H_i x   (w.p.1)    (14a)
= x^T \bar{R} x - 2 x^T \bar{R} x_o   (w.p.1) .    (14b)

Fig. 1. Learning curves of OR-RLS and its CD-based implementation (SeDuMi solution and 10, 20, 50, 100 CD iterations).

Fig. 2. Learning curves of OR-RLS and its AD-MoM based implementation (SeDuMi solution and 10, 20, 50, 100 AD-MoM iterations).

The last term in (14a) becomes zero because \hat{o}_i = 0 for i > \bar{k}, and hence the summation assumes a finite value. Setting the gradient of (14b) equal to zero yields x = x_o, which means that the true x_o is recovered asymptotically. These arguments establish the following result.

Proposition 1. Under (as1)-(as3), the modified OR-RLS iterates converge almost surely to the true x_o.

The key observation in the convergence analysis is that even ordinary RLS asymptotically converges to the true x_o. Therefore, one can ensure convergence by making OR-RLS behave like ordinary RLS asymptotically. The intuition behind the asymptotic convergence of ordinary RLS lies with the observation that outliers can be seen as additional nominal noise contaminating the measurements. Therefore, outlier-contaminated measurements can be seen as non-contaminated ones, but with larger nominal noise variance. Furthermore, (as3) ensures that no bias appears asymptotically. Hence, when infinitely many measurements are collected, the true x_o is recovered. Both RLS and OR-RLS converge almost surely to the true x_o. However, the finite-sample performance of the two algorithms is drastically different. Unlike RLS, OR-RLS maintains satisfactory performance even for a small finite k. The simulations will demonstrate the superior performance of OR-RLS compared to RLS and two robust alternatives.

V. SIMULATIONS

A setup with fixed dimensions D_x and D_y is tested. In the stationary case, x_k = x_o is selected as the all-one vector with zero-mean, unit-variance Gaussian noise added. Entries of the regressor matrices H_k are chosen as zero-mean, white, Gaussian random variables with variance \sigma_h^2 scaled to achieve the desired SNR = D_x \sigma_h^2 in normal (linear) scale. Per time step, the ambient zero-mean, colored, Gaussian noise v_k is simulated as the output of a first-order auto-regressive (AR) filter with pole at 0.95 driven by a unit-variance white Gaussian input; the vectors v_k are uncorrelated across k. Measurements are also contaminated by impulsive noise, which generates outliers. When an outlier occurs, which happens with a small fixed probability, the corresponding observation is drawn from a uniform distribution with large variance, independent of all other variables. The way outliers are generated differs from the model assumed in (1). This serves to signify the universality of the proposed algorithm, as it demonstrates its operability even when the outliers are generated differently than modeled. The root mean-square error RMSE_k := \| \hat{x}_k - x_k \| is used as the performance metric, and k ranges from 1 to 1,000. It is assumed that the percentage of outliers is known to all estimators. OR-RLS utilizes this knowledge by using the corresponding criterion from [6] to tune the parameter \lambda_y.

Table II. OR-RLS run-times

Algorithm / Sub-algorithm        Run-time (seconds)
RLS                              .0549
OR-RLS w/ SeDuMi                 56.8047
OR-RLS w/ AD-MoM, 10 it.         35.38
OR-RLS w/ AD-MoM, 20 it.         35.459
OR-RLS w/ AD-MoM, 50 it.         35.54
OR-RLS w/ AD-MoM, 100 it.        35.9595
Offline initialization           33.8649
OR-RLS w/ CD, 10 it.             37.7787
OR-RLS w/ CD, 20 it.             40.845
OR-RLS w/ CD, 50 it.             48.074
OR-RLS w/ CD, 100 it.            58.86
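A data-generation sketch in the spirit of this setup is given below; the dimensions, outlier probability, and outlier spread are illustrative assumptions (not the authors' values), while the AR(1) coloring with pole 0.95 follows the description above.

import numpy as np

def make_measurement(x_k, Dy, Dx, rng, p_out=0.1, out_spread=100.0, pole=0.95):
    """Generate one measurement from model (1): correlated Gaussian noise
    (AR(1) coloring across the entries of v_k, pole 0.95) plus sparse uniform
    outliers. Dimensions, outlier probability, and spread are assumptions."""
    H_k = rng.standard_normal((Dy, Dx))
    # AR(1)-colored noise across the Dy entries of v_k, unit-variance white input.
    v_k = np.empty(Dy)
    v_k[0] = rng.standard_normal()
    for d in range(1, Dy):
        v_k[d] = pole * v_k[d - 1] + rng.standard_normal()
    # Sparse impulsive outliers: each entry is hit independently with prob p_out.
    hits = rng.random(Dy) < p_out
    o_k = hits * rng.uniform(-out_spread, out_spread, size=Dy)
    return H_k, H_k @ x_k + v_k + o_k, o_k

rng = np.random.default_rng(4)
H_k, y_k, o_k = make_measurement(np.ones(4), Dy=3, Dx=4, rng=rng)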
A. Stationary Scenario

Fig. 1 depicts the RMSE versus time k for a sample realization. RLS is initialized with the batch LS solution. It can be seen that RLS performs poorly, while OR-RLS performs satisfactorily. Initialization with the batch offline estimator in (4) clearly improves the accuracy of OR-RLS. Different renditions of OR-RLS, which result from the way (6) is solved, are also compared. The best performance is obtained when (6) is solved exactly using SeDuMi. To assess the performance of the low-complexity CD-based solver, the RMSE is depicted for 10, 20, 50, and 100 iterations. With 100 CD iterations, the performance of OR-RLS comes very close to the SeDuMi-based one. Likewise, with 100 AD-MoM iterations, the estimation performance comes close to that of SeDuMi, as can be verified from Fig. 2.

The regularization scale \kappa of the AD-MoM augmented Lagrangian is selected as a fixed positive constant, which ensures convergence due to the convexity of (6); the convergence rate depends on the value of \kappa, and the chosen value was observed to maintain a satisfactory rate.

To compare the complexities of the various OR-RLS renditions, Table II lists the run-time of each algorithm depicted in Figs. 1 and 2. Caution should be exercised, though, since run-time comparisons depend on how efficiently each algorithm is programmed in MATLAB. Nonetheless, the table provides an idea of each algorithm's relative complexity. Clearly, the AD-MoM and CD based renditions incur lower complexity than the SeDuMi-based OR-RLS solver, even with 100 AD-MoM or CD iterations. Given the comparable performance of the AD-MoM and CD based solvers to that of SeDuMi, their use is well motivated. Note also that the bulk of the complexity for the AD-MoM and CD based OR-RLS solvers comes from the offline initialization stage, which entails multiple SeDuMi runs; one SeDuMi run is needed for each candidate value of the sparsity-promoting coefficient \lambda_y. Focusing on OR-RLS with 100 AD-MoM iterations, for example, the run-time excluding the offline initialization takes about 2 seconds. This comes close to that of the non-robust RLS, which takes about 1 second.

Fig. 3. Comparison between the robust RLS scheme of [3] and OR-RLS in a stationary setup (Huber and zero initializations).

Fig. 4. Comparison between the robust RLS scheme of [13] and OR-RLS in a stationary setup (Huber and zero initializations).

OR-RLS with 100 CD iterations is compared against the robust RLS algorithm of [3] and that of [13] in Figs. 3 and 4, respectively. Both of these approaches were proposed for white ambient noise, and do not take noise correlations into account. While one can pre-whiten the data by multiplying with the inverse square root of R_k and then apply either scheme, this spreads the outliers to all non-contaminated measurements and the resulting performance is as bad as that of RLS. Simulations suggested that the preferred option is to ignore the noise correlations, which prevents the outliers from spreading; this is the case considered in the simulations. To initialize the schemes of [3] and [13] from k = 1 up to k = k_0, two different approaches are tested: i) solving the batch Huber cost via iteratively re-weighted LS (IRLS); and ii) starting from the all-zero initialization and then running the respective scheme. The performance of the scheme in [3] is depicted in Fig. 3. While better than RLS, it is outperformed by OR-RLS because it does not account for the noise correlations. The two initializations perform similarly. Fig. 4 depicts the performance of the scheme in [13] versus OR-RLS. The Huber initialization provides a considerable improvement in this case, but OR-RLS still exhibits a noticeable improvement over it.

B. Non-Stationary Scenario

For the non-stationary case, all parameters, including x_1, are selected as in the stationary case. To obtain x_{k+1}, each entry of x_k is passed through a first-order AR filter driven by Gaussian noise. Specifically, letting x_{k,d} denote the dth entry of x_k, the corresponding entry of x_{k+1} is generated according to

x_{k+1,d} = 0.95 x_{k,d} + 0.05 e_{k,d}

where e_{k,d} is zero-mean, unit-variance Gaussian noise assumed independent of all other noise terms. This procedure is repeated for every k to obtain a slowly-varying x_k. Figs. 5 and 6 depict the RMSE of OR-RLS with \beta = 0.95 versus RLS, the robust scheme of [3], and that of [13]. It can be seen that OR-RLS markedly outperforms all three alternatives.
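The slowly varying parameter trajectory above is a per-entry AR(1) recursion; a minimal sketch follows, where the horizon and dimension are assumed values.

import numpy as np

rng = np.random.default_rng(5)
T, Dx = 1000, 4                          # horizon and dimension are assumptions
x = np.ones((T, Dx))                     # x_1 chosen as in the stationary case
for k in range(T - 1):
    # x_{k+1,d} = 0.95 x_{k,d} + 0.05 e_{k,d}, with e_{k,d} ~ N(0, 1)
    x[k + 1] = 0.95 * x[k] + 0.05 * rng.standard_normal(Dx)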

Fig. 5. Comparison between the robust RLS scheme of [3] and OR-RLS in a non-stationary setting.

Fig. 6. Comparison between the robust RLS scheme of [13] and OR-RLS in a non-stationary setting.

VI. CONCLUSIONS AND CURRENT RESEARCH

A robust RLS algorithm was developed, capable of simultaneously handling measurement outliers and accounting for possibly correlated nominal noise. To achieve these desirable properties, outliers were treated as nuisance variables and were estimated jointly with the parameter vector of interest. Identifiability was ensured by regularizing the LS cost with the l1-norm of the outlier variables. The resulting optimization problem, dubbed the offline benchmark, was too complex to be solved per time step. For a low-complexity implementation, a sub-optimal online algorithm was developed which involves only simple closed-form updates per time step. Initialization with the offline benchmark followed by online updates was advocated as the OR-RLS algorithm of choice in practice. Numerical tests demonstrated the improved performance of OR-RLS compared to RLS and its state-of-the-art robust renditions. Current research focuses on robust online estimation of dynamical processes adhering to a Gauss-Markov model, which, in addition to the slow parameter changes that can be followed by OR-RLS, enables tracking of fast-varying parameters too. Preliminary results in this direction can be found in [6].

Acknowledgment. The authors wish to thank Dr. Daniele Angelosante of ABB, Switzerland, for helpful discussions and suggestions during the early stages of this work.

REFERENCES

[1] D. Angelosante, J.-A. Bazerque, and G. B. Giannakis, "Online adaptive estimation of sparse signals: Where RLS meets the l1-norm," IEEE Trans. on Signal Processing, vol. 58, no. 7, pp. 3436-3447, July 2010.
[2] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, 2nd ed., Athena Scientific, 1999.
[3] M. Z. A. Bhotto and A. Antoniou, "Robust recursive least-squares adaptive-filtering algorithms for impulsive-noise environments," IEEE Signal Processing Letters, vol. 18, no. 3, pp. 185-188, March 2011.
[4] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. on Information Theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005.
[5] S. C. Chan and Y. X. Zou, "A recursive least M-estimate algorithm for robust adaptive filtering in impulsive noise: Fast algorithm and convergence performance analysis," IEEE Trans. on Signal Processing, vol. 52, no. 4, pp. 975-991, Apr. 2004.
[6] S. Farahmand, D. Angelosante, and G. B. Giannakis, "Doubly robust Kalman smoothing by exploiting outlier sparsity," Proc. of Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2010.
[7] J.-J. Fuchs, "An inverse problem approach to robust regression," in Proc. of Intl. Conf. on Acoustics, Speech, and Signal Processing, Phoenix, AZ, 1999, pp. 1809-1812.
[8] G. B. Giannakis, G. Mateos, S. Farahmand, V. Kekatos, and H. Zhu, "USPACOR: Universal sparsity-controlling outlier rejection," Proc. of Intl. Conf. on Acoustics, Speech, and Signal Processing, Prague, Czech Republic, May 2011.
[9] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed., New York, NY: Springer, 2009.
[10] P. J. Huber and E. M. Ronchetti, Robust Statistics, New York, NY, USA: Wiley, 2009.
[11] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, 1993.
[12] P. Petrus, "Robust Huber adaptive filter," IEEE Trans. on Signal Processing, vol. 47, no. 4, pp. 1129-1133, April 1999.
[13] L. Rey Vega, H. Rey, J. Benesty, and S. Tressens, "A fast robust recursive least-squares algorithm," IEEE Trans. on Signal Processing, vol. 57, no. 3, pp. 1209-1216, March 2009.
[14] SeDuMi. [Online]. Available: http://sedumi.ie.lehigh.edu/
[15] P. Tseng, "Convergence of a block coordinate descent method for nondifferentiable minimization," Journal of Optimization Theory and Applications, vol. 109, no. 3, pp. 475-494, June 2001.
[16] Y. Zou, S. C. Chan, and T. S. Ng, "A recursive least M-estimate (RLM) adaptive filter for robust filtering in impulsive noise," IEEE Signal Processing Letters, vol. 7, no. 11, pp. 324-326, Nov. 2000.