Robust RLS in the Presence of Correlated Noise Using Outlier Sparsity


IEEE TRANSACTIONS ON SIGNAL PROCESSING (TO APPEAR)

Robust RLS in the Presence of Correlated Noise Using Outlier Sparsity

Shahrokh Farahmand and Georgios B. Giannakis

Abstract: Relative to batch alternatives, the recursive least-squares (RLS) algorithm has well-appreciated merits of reduced complexity and storage requirements for online processing of stationary signals, and also for tracking slowly-varying nonstationary signals. However, RLS is challenged when, in addition to noise, outliers are also present in the data due to, e.g., impulsive disturbances. Existing robust RLS approaches are resilient to outliers, but require the nominal noise to be white, an assumption that may not hold in, e.g., sensor networks where neighboring sensors are affected by correlated ambient noise. Pre-whitening with the known noise covariance is not a viable option because it spreads the outliers to non-contaminated measurements, which leads to inefficient utilization of the available data and unsatisfactory performance. In this correspondence, a robust RLS algorithm is developed that is capable of handling outliers and correlated noise simultaneously. In the proposed method, outliers are treated as nuisance variables and estimated jointly with the wanted parameters. Identifiability is ensured by exploiting the sparsity of outliers, which is effected via regularizing the least-squares (LS) criterion with the l1-norm of the outlier vectors. This leads to an optimization problem whose solution yields the robust estimates. For low-complexity real-time operation, a sub-optimal online algorithm is proposed, which entails closed-form updates per time step in the spirit of RLS. Simulations demonstrate the effectiveness and improved performance of the proposed method in comparison with the non-robust RLS and its state-of-the-art robust renditions.

The authors are with the Dept. of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA (e-mails: {shahrokh, georgios}@umn.edu). This work was supported by the AFOSR MURI Grant FA9550-10-1-0567.

I. INTRODUCTION

Online processing with reduced complexity and storage requirements relative to batch alternatives, as well as adaptability to slow parameter variations, are the main reasons behind the popularity of the recursive least-squares (RLS) algorithm for adaptive estimation of linear regression models. Despite its success in various applications, RLS is challenged by the presence of outliers, that is, measurement or noise samples not adhering to a pre-specified nominal model. One major source of outliers is impulsive noise, which appears for example as double-talk in an echo cancellation setting [3]. In another application, which motivates the present contribution, outliers may arise with low-cost unreliable sensors deployed for estimating the coefficients of a linear regression model involving also (possibly correlated) ambient noise. Outlier-resilient approaches are imperative in such applications; see e.g., [10].

Robust RLS schemes have been reported in [3], [5], [12], [13], and [16]. Huber's M-estimation criterion was considered in [12], Hampel's three-part redescending M-estimate was advocated in [16], and a modified Huber cost was put forth in [5].
These approaches improve upon the non-robust RLS when outliers are present, but they also have limitations: i) some entail costs that are not convex [5], thus providing no guarantee of attaining the global optimum; ii) the optimization problem to be solved per time step has dimensionality (and hence computational and storage requirements) growing linearly with time; and iii) correlated nominal noise of known covariance cannot be handled, which is critical for estimation using a network of distributed sensors because ambient noise is typically dependent across neighboring sensors. To mitigate the curse of dimensionality, [5] provides suboptimal online recursions in the spirit of RLS, while [3] and [13] limit the effect of each new datum on the estimate, thus also ensuring low-complexity RLS-like iterations. However, all existing approaches cannot handle correlated ambient noise of known covariance. It is worth stressing that pre-whitening is not recommended because it spreads the outliers to non-contaminated measurements, rendering outlier-resilient estimation even more challenging.

In this correspondence, outliers are treated as nuisance variables that are jointly estimated with the wanted regression coefficients. The identifiability challenge arising due to the extra unknowns is addressed by exploiting the outlier sparsity. The latter is effected by regularizing the LS cost with the l1-norm of the outlier vectors. Consequently, one arrives at an optimization problem which can identify outliers while at the same time accommodating correlated ambient noise. Since the dimensionality of this optimization problem also grows with time, its solution is utilized at the initialization stage, and provides an offline benchmark. To cope with dimensionality, an online variant is also developed, which enjoys low complexity and closed-form updates in the spirit of RLS. Initializing the real-time updates with the offline benchmark, the novel online robust RLS (OR-RLS) algorithm is compared against RLS and the robust schemes in [3] and [13] via simulations, which demonstrate its efficiency and improved performance.

II. PROBLEM STATEMENT AND PRELIMINARIES

Consider the outlier-aware linear regression model considered originally in [7], where (generally vector) measurements y_k in R^{D_y} involving the unknown vector x_k in R^{D_x} become available sequentially in time (indexed by k), in the presence of (possibly correlated) nominal noise v_k; that is,

y_k = H_k x_k + v_k + o_k    (1)

where H_k denotes a known regression matrix, o_k models the outliers arising from, e.g., impulsive noise, and v_k is zero-mean with known covariance matrix R_k, uncorrelated with v_{1:k-1} := [v_1^T, v_2^T, ..., v_{k-1}^T]^T, but possibly correlated across its own entries. The wanted vector x_k is either constant or varies slowly with k. The goal is to estimate x_k and o_k, given the measurements y_{1:k} := [y_1^T, y_2^T, ..., y_k^T]^T.

To motivate the vector model in (1), consider a sensor network with each sensor measuring distances (in rectangular coordinates) between itself and the target one wishes to localize and track. The sensor measurements are synchronized by the clock of an access point (AP), and are corrupted by outliers as well as nominal noise that is correlated across neighboring sensors.

Had o_k been equal to zero, or known for that matter, one could apply the RLS algorithm to the outlier-compensated measurements y_k - o_k. Therefore, at time step k, one would solve [11]

\hat{x}_k := \arg\min_{x_k} J_k(x_k) := \arg\min_{x_k} \sum_{i=1}^{k} \beta^{k-i} \| y_i - H_i x_k - o_i \|_{R_i^{-1}}^2    (2)

where 0 < \beta \le 1 is a pre-selected forgetting factor, and \| x \|_R^2 := x^T R x. Equating to zero the gradient of J_k(x_k) in (2) yields

\Big( \sum_{i=1}^{k} \beta^{k-i} H_i^T R_i^{-1} H_i \Big) \hat{x}_k = \sum_{i=1}^{k} \beta^{k-i} H_i^T R_i^{-1} ( y_i - o_i )

or, in compact matrix-vector notation, \Phi_k \hat{x}_k = \varphi_k, with obvious definitions for \varphi_k and \Phi_k, the latter assumed invertible. The latter can be solved online using the recursions (see e.g., [11])

\hat{x}_k = \hat{x}_{k-1} + W_k ( y_k - o_k - H_k \hat{x}_{k-1} )    (3)
W_k = \beta^{-1} \Phi_{k-1}^{-1} H_k^T ( H_k \beta^{-1} \Phi_{k-1}^{-1} H_k^T + R_k )^{-1}
\Phi_k^{-1} = \beta^{-1} \Phi_{k-1}^{-1} - W_k H_k \beta^{-1} \Phi_{k-1}^{-1} .

The challenge with o_{1:k} being unknown in addition to x_k is that (2) is under-determined, which gives rise to identifiability issues if J_k in (2) is to be jointly optimized over x_k and o_{1:k} := [o_1^T, ..., o_k^T]^T. Indeed, o_i = y_i for all i together with x_k = 0 is one choice of the unknown vectors that always minimizes J_k in (2). The ensuing section develops algorithms allowing joint estimation of x_k and o_{1:k} (or o_k only), by exploiting the attribute of sparsity present in the outlier vectors.

III. OUTLIER-SPARSITY-AWARE ROBUST RLS

To ensure uniqueness in the joint optimization over o_{1:k} and x_k, consider regularizing the cost in (2). Among candidate regularizers, popular ones include the l_p-norms of the outlier vectors with p = 0, 1, 2. A key observation guiding the selection of p is that outliers arising from sources such as impulsive noise are sparse. The degree of sparsity is quantified by the l_0-(pseudo)norm of o_i, which equals the number of its nonzero entries. Unfortunately, the l_0-norm makes the optimization problem non-convex, and in fact NP-hard to solve. This motivates adopting the closest convex approximation of l_0, namely the l_1-norm, which is known to approach (and under certain conditions on the regressors coincide with) the solution obtained with the l_0-norm [4]. Different from [1], [4] that rely on the l_1-norm of sparse vectors x_k, the novelty here is the adoption of the l_1-norm of sparse outlier vectors for robust processing. Adopting this regularization of J_k in (2) yields

[\hat{x}_k, \hat{o}_{1:k}] = \arg\min_{x_k, o_{1:k}} \sum_{i=1}^{k} \beta^{k-i} \big[ \| y_i - H_i x_k - o_i \|_{R_i^{-1}}^2 + \lambda_i \| o_i \|_1 \big] .    (4)

Besides being convex, (4) is nicely linked with Huber's M-estimates if the nominal noise v_i is white [8]. Indeed, with R_i = I for all i, solving (4) amounts to solving [8]

\hat{x}_k = \arg\min_{x_k} \sum_{i=1}^{k} \sum_{d=1}^{D_y} \beta^{k-i} \rho_i ( y_{i,d} - h_{i,d}^T x_k )    (5)

where y_{i,d} (h_{i,d}^T) denotes the dth entry (respectively, row) of y_i (H_i), and \rho_i is Huber's function given by \rho_i(x) := x^2 for |x| \le \lambda_i, and \rho_i(x) := 2 \lambda_i |x| - \lambda_i^2 for |x| > \lambda_i.
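To make the offline criterion (4) concrete, the following is a minimal sketch of how it can be posed with a general-purpose convex modeling tool. CVXPY is used purely for illustration (the paper itself relies on SeDuMi), and the dimensions, covariance, forgetting factor, and l1 weight below are assumed values, not the authors' settings.

import numpy as np
import cvxpy as cp

# Synthetic data from model (1); all sizes and constants here are illustrative assumptions.
rng = np.random.default_rng(0)
k, Dx, Dy = 50, 4, 3
R = 0.9 ** np.abs(np.subtract.outer(np.arange(Dy), np.arange(Dy)))  # correlated noise covariance
Rinv = np.linalg.inv(R)
x_true = np.ones(Dx)
H, y = [], []
for i in range(k):
    H.append(rng.standard_normal((Dy, Dx)))
    o_i = np.zeros(Dy)
    if rng.random() < 0.1:                       # sparse impulsive outlier (assumed rate)
        o_i[rng.integers(Dy)] = 30.0
    y.append(H[i] @ x_true + rng.multivariate_normal(np.zeros(Dy), R) + o_i)

beta, lam = 0.99, 2.0                            # forgetting factor and l1 weight (assumed)
x = cp.Variable(Dx)                              # wanted regression vector
o = cp.Variable((k, Dy))                         # one outlier vector per time instant
cost = sum(beta ** (k - 1 - i) *
           (cp.quad_form(y[i] - H[i] @ x - o[i], Rinv) + lam * cp.norm1(o[i]))
           for i in range(k))
cp.Problem(cp.Minimize(cost)).solve()
print("offline benchmark estimate:", np.round(x.value, 2))

Even this small instance makes the scalability issue plain: the number of outlier variables grows linearly with k, which is exactly the limitation discussed next.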
In addition, (4) can not be solved recursively, and at each time instant the entire optimization should be carried out anew. This shortcoming affects Huber M-estimates too, and similar limitations have been identified also by [5 and [6. For this reason, the offline benchmar solver of (4) will be used as a warm-start to initialize online methods developed in the next subsection. A. Online Robust (OR-) Since it is impractical to re-estimate past and current outliers per time instant, the main idea in this section is to estimate only the most recent outliers. Consequently, the proposed OR- proceeds in two steps per. Given ˆx from, the current o is estimated in the first step via ô = arg min o [ y H ˆx o + λ R o. (6) With ô available, ordinary is applied to the outlier-free measurements y ô [cf. (3) in the second step, which incurs complexity comparable to ordinary. Clearly, (6) solves a problem of fixed dimension D y per time step; hence, it offers a practical alternative to the offline benchmar in (4). If R is diagonal, denote it as diag(r,, r,,..., r,dyd y ), then (6) decouples into scalar sub-problems across entries of o as ô,d = arg min o,d [ (y,d h T,dˆx o,d ) r,dd + λ o,d d =,..., D y.,

However, when v_k is colored, (6) can no longer be solved in closed form. If R_k is non-diagonal, general-purpose convex solvers such as SeDuMi can be utilized to solve (6) with complexity O(D_y^{3.5}) [14]. In the next subsections, two methods based on coordinate descent (CD) and the alternating direction method of multipliers (AD-MoM) are developed to solve (6) iteratively. While both are asymptotically convergent, it suffices to run only a few iterations in practice, which reduces the complexity of general solvers and parallels the low complexity of OR-RLS when v_k is white.

B. CD-based OR-RLS

To iteratively solve (6) with respect to o_k, consider initializing with o_k^{(0)} = 0. Then, for j = 1, ..., J, the vector o_k^{(j)} is updated one entry at a time using CD. Specifically, the dth entry at iteration j is updated as

o_{k,d}^{(j)} := \arg\min_{o_{k,d}} \| y_k - H_k \hat{x}_{k-1} - [ o_{k,1:d-1}^{(j)T} , o_{k,d} , o_{k,d+1:D_y}^{(j-1)T} ]^T \|_{R_k^{-1}}^2 + \lambda_k | o_{k,d} | .    (8)

Problem (8) can alternatively be written as a scalar Lasso problem, that is,

o_{k,d}^{(j)} := \arg\min_{o_{k,d}} ( o_{k,d} - \gamma_{k,d}^{(j)} )^2 + \lambda_{k,d}' | o_{k,d} |    (9)

where

\gamma_{k,d}^{(j)} := \frac{1}{r_{k,dd}^{-1}} \Big[ \alpha_{k,d} - \sum_{i=1}^{d-1} r_{k,id}^{-1} o_{k,i}^{(j)} - \sum_{i=d+1}^{D_y} r_{k,id}^{-1} o_{k,i}^{(j-1)} \Big] ,
\alpha_k := R_k^{-1} ( y_k - H_k \hat{x}_{k-1} ) ,   \lambda_{k,d}' := \lambda_k / r_{k,dd}^{-1}

with r_{k,dd'}^{-1} := [ R_k^{-1} ]_{d,d'} denoting the (d, d') entry of R_k^{-1}. The solution to (9) is given by [cf. (7)]

o_{k,d}^{(j)} = \max\{ | \gamma_{k,d}^{(j)} | - \lambda_{k,d}' / 2 , 0 \} \, \mathrm{sign}( \gamma_{k,d}^{(j)} ) .

After J iterations, one sets \hat{o}_k = o_k^{(J)}. Global convergence of the iterates o_k^{(J)} to the solution of (6) is guaranteed as J goes to infinity from the results in [15]. In practice, however, a fixed small J suffices.

C. AD-MoM based OR-RLS

To decouple the non-differentiable term of the cost in (6) from the differentiable one, consider introducing the auxiliary variable c_k, and re-writing (6) in constrained form as

[\hat{o}_k, \hat{c}_k] = \arg\min_{o_k, c_k} \big[ \| y_k - H_k \hat{x}_{k-1} - o_k \|_{R_k^{-1}}^2 + \lambda_k \| c_k \|_1 \big]   subject to   o_k = c_k .

The augmented Lagrangian thus becomes

L(o_k, c_k, \mu_k) = \| y_k - H_k \hat{x}_{k-1} - o_k \|_{R_k^{-1}}^2 + \lambda_k \| c_k \|_1 + \mu_k^T ( o_k - c_k ) + \frac{\kappa}{2} \| o_k - c_k \|^2

where \mu_k is the Lagrange multiplier, and \kappa is a positive constant controlling the effect of the quadratic term introduced to ensure strict convexity of the cost. After initializing with c_k^{(0)} = \mu_k^{(0)} = 0, the AD-MoM recursions per iteration j (see e.g., [2]) in the present context amount to

o_k^{(j)} = ( 2 R_k^{-1} + \kappa I )^{-1} \big[ 2 R_k^{-1} ( y_k - H_k \hat{x}_{k-1} ) + \kappa c_k^{(j-1)} - \mu_k^{(j-1)} \big]    (10a)
c_{k,d}^{(j)} = \max\{ | o_{k,d}^{(j)} + \mu_{k,d}^{(j-1)} / \kappa | - \lambda_k / \kappa , 0 \} \, \mathrm{sign}( o_{k,d}^{(j)} + \mu_{k,d}^{(j-1)} / \kappa ) ,   d = 1, ..., D_y    (10b)
\mu_k^{(j)} = \mu_k^{(j-1)} + \kappa ( o_k^{(j)} - c_k^{(j)} ) .    (10c)

After J iterations, set \hat{o}_k = o_k^{(J)}. Global convergence of the iterates o_k^{(J)} to the solution of (6) is guaranteed as J goes to infinity from the results in [2]. As with CD, a fixed small J suffices in practice. Note also that the matrix inverse in (10a) needs to be computed only once, and can be reused throughout the iterations.
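Written out, the AD-MoM recursions (10a)-(10c) are only a few lines; the sketch below is illustrative (kappa, J, and the test covariance are assumed values), and the CD iterations (8)-(9) can be coded analogously.

import numpy as np

def admm_outliers(y_k, H_k, x_prev, R_k, lam, kappa=1.0, J=50):
    """AD-MoM iterations (10a)-(10c) for the outlier sub-problem (6), sketch.

    kappa and J are illustrative choices; the matrix inverse in (10a) is
    formed once and reused across iterations."""
    Dy = y_k.size
    Rinv = np.linalg.inv(R_k)
    e = y_k - H_k @ x_prev                                # residual y_k - H_k x_{k-1}
    A = np.linalg.inv(2.0 * Rinv + kappa * np.eye(Dy))
    c = np.zeros(Dy)                                      # auxiliary copy of o_k
    mu = np.zeros(Dy)                                     # Lagrange multiplier
    for _ in range(J):
        o = A @ (2.0 * Rinv @ e + kappa * c - mu)                   # (10a)
        z = o + mu / kappa
        c = np.sign(z) * np.maximum(np.abs(z) - lam / kappa, 0.0)   # (10b)
        mu = mu + kappa * (o - c)                                   # (10c)
    return o                                              # the text sets o_hat_k = o_k^(J)

# Illustrative usage with an AR(1)-correlated covariance (assumed values):
rng = np.random.default_rng(2)
Dy, Dx = 3, 4
R_k = 0.9 ** np.abs(np.subtract.outer(np.arange(Dy), np.arange(Dy)))
o_hat = admm_outliers(rng.standard_normal(Dy), rng.standard_normal((Dy, Dx)),
                      np.ones(Dx), R_k, lam=2.0, kappa=1.0, J=50)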
D. Parameter Selection

Successful performance of OR-RLS hinges on proper selection of the sparsity-enforcing coefficients \lambda_k in (8) and (10). Too large a \lambda_k brings OR-RLS back to ordinary RLS, while too small a \lambda_k creates many spurious outliers and degrades OR-RLS performance. Suppose that \lambda_k is constant with respect to k; that is, let \lambda_k = \lambda_y for all k. Judicious tuning of \lambda_y has been investigated for robust batch linear regression estimators, and two criteria to optimally select \lambda_y can be found in [6] and [8]. However, none of these algorithms can be run online.

As a remedy, it is recommended to select \lambda_y during the initialization (batch) phase of OR-RLS, and retain its value for all future time instants. OR-RLS initialization is carried out by solving the offline benchmark for k = 1 up to k = k_0 using SeDuMi; that is,

[\hat{x}_{k_0}, \hat{o}_{1:k_0}] = \arg\min_{x_{k_0}, o_{1:k_0}} \sum_{i=1}^{k_0} \beta^{k_0 - i} \big[ \| y_i - H_i x_{k_0} - o_i \|_{R_i^{-1}}^2 + \lambda_y \| o_i \|_1 \big] .    (11)

Table I. The OR-RLS Algorithm

Initialization:
  Collect measurements y_k for k = 1, ..., k_0.
  Solve (11) to obtain \hat{x}_{k_0}.
  Use either of the two methods in [6] to determine the best \lambda_k = \lambda_y.
Repeat for k >= k_0 + 1:
  Solve (6) to obtain \hat{o}_k, using the CD iterations in (9) or the AD-MoM iterations in (10).
  Run RLS in (3) on the outlier-compensated measurements y_k - \hat{o}_k to obtain \hat{x}_k.
End for

At time k_0, either one of the two criteria in [6] is applied to (11) to find the best \lambda_y. Using this \lambda_y, (11) is solved to obtain the initial \hat{x}_{k_0}. Then, one proceeds to time k_0 + 1 and runs the OR-RLS algorithm as detailed in the previous subsections. Table I summarizes the OR-RLS algorithm.

Remark 1. (Complexity of OR-RLS) This remark quantifies the complexity order of OR-RLS per time index k. Starting from (3), the second sub-equation requires inversion of a D_y x D_y matrix, which incurs complexity O(D_y^3) once per time index k, while the third sub-equation involves matrix multiplications at complexity O(D_x^2 D_y). Hence, the complexity of the non-robust RLS for the vector-matrix model (1) is O(D_y^3 + D_x^2 D_y). For OR-RLS, one needs to solve (6) per time step as well. Utilizing SeDuMi to solve (6) incurs complexity O(D_y^{3.5} + D_x D_y), where the second term corresponds to evaluating H_k \hat{x}_{k-1}. Consider now the CD or AD-MoM iterations. Per time index k, the CD iterations must compute \alpha_k, which incurs complexity O(D_y^3) for the matrix inversion and O(D_x D_y) for the matrix-vector multiplication. In each of the J CD iterations, one needs to evaluate \gamma_{k,d}^{(j)} for d = 1, ..., D_y, which incurs overall complexity O(J D_y^2). Thus, the overall complexity of optimizing (6) via CD iterations is O(D_y^3 + D_x D_y + J D_y^2). Inasmuch as J is selected on the order of O(D_y) or smaller, the CD iterations do not increase the complexity order of the non-robust RLS in (3). Furthermore, their complexity order is smaller than that of SeDuMi. With regard to AD-MoM, two D_y x D_y matrices must be inverted once per time step at complexity O(D_y^3). The matrix-vector multiplication H_k \hat{x}_{k-1} is also performed once per time index at complexity O(D_x D_y). The matrix-vector multiplications in (10a) should be carried out J times, which incurs complexity O(J D_y^2). Thus, the overall complexity of optimizing (6) via AD-MoM iterations is O(D_y^3 + D_x D_y + J D_y^2). Again, if J is on the order of O(D_y) or smaller, the AD-MoM iterations do not incur higher complexity than the non-robust RLS in (3).

Remark 2. (Complexity of alternative robust methods) For comparison purposes with OR-RLS, this remark discusses the complexity order of the schemes in [3], [5], [13], and [16]. Focusing for simplicity on [3], which entails non-robust RLS recursions, consider correlated measurements arriving in vectors of dimension D_y. To de-correlate, pre-whitening with the square-root matrix incurs complexity O(D_y^3). For each new datum extracted after pre-whitening, the vector update in [3, Eq. (5)] involves vector-matrix products at complexity O(D_x^2). With D_y such data per time step, the overall complexity grows to O(D_x^2 D_y). Note that the complexity of the non-robust RLS in (3) is also O(D_y^3 + D_x^2 D_y). It was further argued in the previous remark that OR-RLS does not increase this complexity order. Since the robust schemes in [3], [5], [13], and [16] incur complexity at least as high as that of the non-robust RLS, it follows that they will incur at best the same order of complexity as the proposed OR-RLS.
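For concreteness, one pass of the loop in Table I can be sketched as follows. The outlier step uses the diagonal-R_k soft threshold (7) for brevity, Phi_inv stands for the inverse matrix Phi_k^{-1} in (3), and all dimensions, the initialization, and the outlier pattern are illustrative assumptions rather than the authors' settings.

import numpy as np

def or_rls_step(x_hat, Phi_inv, y_k, H_k, R_k, lam, beta=0.99):
    """One OR-RLS time step: estimate o_k via (7) (diagonal R_k assumed),
    then run the RLS recursions (3) on the compensated data y_k - o_k."""
    r_diag = np.diag(R_k)
    resid = y_k - H_k @ x_hat
    o_hat = np.sign(resid) * np.maximum(np.abs(resid) - 0.5 * lam * r_diag, 0.0)

    # RLS recursions (3) on y_k - o_hat.
    P = Phi_inv / beta                                      # beta^{-1} Phi_{k-1}^{-1}
    W = P @ H_k.T @ np.linalg.inv(H_k @ P @ H_k.T + R_k)    # gain W_k
    x_new = x_hat + W @ (y_k - o_hat - H_k @ x_hat)
    Phi_inv_new = P - W @ H_k @ P
    return x_new, Phi_inv_new, o_hat

# Illustrative run (dimensions, noise level, and outlier pattern are assumptions):
rng = np.random.default_rng(3)
Dx, Dy, T = 4, 3, 300
x_true = np.ones(Dx)
x_hat, Phi_inv = np.zeros(Dx), 1e3 * np.eye(Dx)             # crude warm start
for k in range(T):
    H_k = rng.standard_normal((Dy, Dx))
    R_k = np.eye(Dy)
    o_k = np.zeros(Dy)
    if rng.random() < 0.1:                                  # sparse impulsive outlier
        o_k[rng.integers(Dy)] = 50.0
    y_k = H_k @ x_true + rng.multivariate_normal(np.zeros(Dy), R_k) + o_k
    x_hat, Phi_inv, _ = or_rls_step(x_hat, Phi_inv, y_k, H_k, R_k, lam=3.0)
print("final error:", np.linalg.norm(x_hat - x_true))

For colored noise, the soft-threshold line would simply be replaced by a call to the CD or AD-MoM solver for (6).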
IV. CONVERGENCE ANALYSIS

For the stationary setup, almost sure convergence of OR-RLS to the true x_o is established in this section under ergodicity conditions on the regressors, nominal noise, and outlier vectors. The following assumptions are needed to this end:

(as1) The true parameter vector, denoted as x_o, is time invariant.

(as2) Three ergodicity conditions hold, with probability one (w.p.1):

\lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} H_i^T R_i^{-1} H_i = \bar{R} > 0   (w.p.1)    (12a)
\lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} H_i^T R_i^{-1} v_i = r_{Hv}   (w.p.1)    (12b)
\lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} H_i^T R_i^{-1} o_i = r_{Ho}   (w.p.1) .    (12c)

If the random vectors v_i, o_i, and h_i are mixing, which is the case for most stationary processes with vanishing memory in practice, then (12) is satisfied. Since H_i is uncorrelated with v_i, which is zero-mean, it follows readily that r_{Hv} = 0.

(as3) Regressors {h_i} are zero-mean and uncorrelated with the outliers {o_i}, which implies that r_{Ho} = 0.

Since the true x_o is constant, the forgetting factor is set to \beta = 1. The OR-RLS algorithm is also slightly modified to facilitate the convergence analysis. Specifically, for k exceeding a finite value \bar{k}, the estimated outlier vector \hat{o}_k is deterministically set to zero. This can be achieved by selecting (cf. [6])

\lambda_k = \lambda_y ,   k = k_0 + 1, ..., \bar{k} ;   \lambda_k = \max\big( \lambda_{k-1} , 2 \| R_k^{-1} ( y_k - H_k \hat{x}_{k-1} ) \|_\infty \big) ,   k > \bar{k} .    (13)

Note that the OR-RLS estimate is the minimizer of J_k(x_k, \hat{o}_{1:k}) w.r.t. x_k, with J_k(.) defined by (2), and \hat{o}_{1:k} obtained from (6) with \lambda_k as in (13). It then follows from (as1)-(as3) that, up to terms not depending on x,

\lim_{k \to \infty} \frac{1}{k} J_k(x, \hat{o}_{1:k}) = x^T \bar{R} x - 2 x^T ( \bar{R} x_o + r_{Hv} + r_{Ho} ) + \lim_{k \to \infty} \frac{2}{k} \sum_{i=1}^{k} \hat{o}_i^T R_i^{-1} H_i x   (w.p.1)    (14a)
= x^T \bar{R} x - 2 x^T \bar{R} x_o   (w.p.1) .    (14b)

Fig. 1. Learning curves of OR-RLS and its CD-based implementation (SeDuMi solution and 10, 20, 50, 100 CD iterations).

Fig. 2. Learning curves of OR-RLS and its AD-MoM based implementation (SeDuMi solution and 10, 20, 50, 100 AD-MoM iterations).

The last term in (14a) becomes zero because \hat{o}_i = 0 for i > \bar{k}, and hence the summation assumes a finite value. Setting the gradient of (14b) equal to zero yields x = x_o, which means that the true x_o is recovered asymptotically. These arguments establish the following result.

Proposition 1. Under (as1)-(as3), the modified OR-RLS iterates converge almost surely to the true x_o.

The key observation in the convergence analysis is that even ordinary RLS asymptotically converges to the true x_o. Therefore, one can ensure convergence by making OR-RLS behave like ordinary RLS asymptotically. The intuition behind the asymptotic convergence of ordinary RLS lies with the observation that outliers can be seen as additional nominal noise contaminating the measurements. Therefore, outlier-contaminated measurements can be seen as non-contaminated ones, but with larger nominal noise variance. Furthermore, (as3) ensures that no bias appears asymptotically. Hence, when infinitely many measurements are collected, the true x_o is recovered. Both RLS and OR-RLS converge almost surely to the true x_o. However, the finite-sample performance of the two algorithms is drastically different. Unlike RLS, OR-RLS maintains satisfactory performance even for a small finite k. The simulations will demonstrate the superior performance of OR-RLS compared to RLS and two robust alternatives.

V. SIMULATIONS

A setup with fixed dimensions D_x and D_y is tested. In the stationary case, x_k = x_o is selected as the all-one vector with zero-mean, unit-variance Gaussian noise added. Entries of the regressor matrices H_k are chosen as zero-mean, white, Gaussian random variables with variance \sigma_h^2 scaled to achieve the desired SNR = D_x \sigma_h^2 in normal (linear) scale. Per time step, the ambient zero-mean, colored, Gaussian noise v_k is simulated as the output of a first-order auto-regressive (AR) filter with pole at 0.95 driven by a unit-variance white Gaussian input; the vectors v_k are uncorrelated across k. Measurements are also contaminated by impulsive noise, which generates outliers. When an outlier occurs, which happens with a small fixed probability, the corresponding observation is drawn from a uniform distribution with large variance, independent of all other variables. The way outliers are generated differs from the model assumed in (1). This serves to signify the universality of the proposed algorithm, as it demonstrates its operability even when the outliers are generated differently than modeled. The root mean-square error RMSE_k := \| \hat{x}_k - x_k \| is used as the performance metric, and k ranges from 1 to 1,000. It is assumed that the percentage of outliers is known to all estimators. OR-RLS utilizes this knowledge by using the corresponding criterion from [6] to tune the parameter \lambda_y.

Table II. OR-RLS run-times

Algorithm / Sub-algorithm        Run-time (seconds)
RLS                              .0549
OR-RLS w/ SeDuMi                 56.8047
OR-RLS w/ AD-MoM, 10 it.         35.38
OR-RLS w/ AD-MoM, 20 it.         35.459
OR-RLS w/ AD-MoM, 50 it.         35.54
OR-RLS w/ AD-MoM, 100 it.        35.9595
Offline initialization           33.8649
OR-RLS w/ CD, 10 it.             37.7787
OR-RLS w/ CD, 20 it.             40.845
OR-RLS w/ CD, 50 it.             48.074
OR-RLS w/ CD, 100 it.            58.86
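A data-generation sketch in the spirit of this setup is given below; the dimensions, outlier probability, and outlier spread are illustrative assumptions (not the authors' values), while the AR(1) coloring with pole 0.95 follows the description above.

import numpy as np

def make_measurement(x_k, Dy, Dx, rng, p_out=0.1, out_spread=100.0, pole=0.95):
    """Generate one measurement from model (1): correlated Gaussian noise
    (AR(1) coloring across the entries of v_k, pole 0.95) plus sparse uniform
    outliers. Dimensions, outlier probability, and spread are assumptions."""
    H_k = rng.standard_normal((Dy, Dx))
    # AR(1)-colored noise across the Dy entries of v_k, unit-variance white input.
    v_k = np.empty(Dy)
    v_k[0] = rng.standard_normal()
    for d in range(1, Dy):
        v_k[d] = pole * v_k[d - 1] + rng.standard_normal()
    # Sparse impulsive outliers: each entry is hit independently with prob p_out.
    hits = rng.random(Dy) < p_out
    o_k = hits * rng.uniform(-out_spread, out_spread, size=Dy)
    return H_k, H_k @ x_k + v_k + o_k, o_k

rng = np.random.default_rng(4)
H_k, y_k, o_k = make_measurement(np.ones(4), Dy=3, Dx=4, rng=rng)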
A. Stationary Scenario

Fig. 1 depicts the RMSE versus time k for a sample realization. RLS is initialized with the batch LS solution. It can be seen that RLS performs poorly, while OR-RLS performs satisfactorily. Initialization with the batch offline estimator in (4) clearly improves the accuracy of OR-RLS. Different renditions of OR-RLS, which result from the way (6) is solved, are also compared. The best performance is obtained when (6) is solved exactly using SeDuMi. To assess the performance of the low-complexity CD-based solver, the RMSE is depicted for 10, 20, 50, and 100 iterations. With 100 CD iterations, the performance of OR-RLS comes very close to the SeDuMi-based one. Likewise, with 100 AD-MoM iterations, the estimation performance comes close to that of SeDuMi, as can be verified from Fig. 2.

The regularization scale \kappa of the AD-MoM augmented Lagrangian is selected as a fixed positive constant, which ensures convergence due to the convexity of (6); the convergence rate depends on the value of \kappa, and the chosen value was observed to maintain a satisfactory rate.

To compare the complexities of the various OR-RLS renditions, Table II lists the run-time of each algorithm depicted in Figs. 1 and 2. Caution should be exercised, though, since run-time comparisons depend on how efficiently each algorithm is programmed in MATLAB. Nonetheless, the table provides an idea of each algorithm's relative complexity. Clearly, the AD-MoM and CD based renditions incur lower complexity than the SeDuMi-based OR-RLS solver, even with 100 AD-MoM or CD iterations. Given the comparable performance of the AD-MoM and CD based solvers to that of SeDuMi, their use is well motivated. Note also that the bulk of the complexity for the AD-MoM and CD based OR-RLS solvers comes from the offline initialization stage, which entails multiple SeDuMi runs; one SeDuMi run is needed for each candidate value of the sparsity-promoting coefficient \lambda_y. Focusing on OR-RLS with 100 AD-MoM iterations, for example, the run-time excluding the offline initialization takes about 2 seconds. This comes close to that of the non-robust RLS, which takes about 1 second.

Fig. 3. Comparison between the robust RLS scheme of [3] and OR-RLS in a stationary setup (Huber and zero initializations).

Fig. 4. Comparison between the robust RLS scheme of [13] and OR-RLS in a stationary setup (Huber and zero initializations).

OR-RLS with 100 CD iterations is compared against the robust RLS algorithm of [3] and that of [13] in Figs. 3 and 4, respectively. Both of these approaches were proposed for white ambient noise, and do not take noise correlations into account. While one can pre-whiten the data by multiplying with the inverse square root of R_k and then apply either scheme, this spreads the outliers to all non-contaminated measurements and the resulting performance is as bad as that of RLS. Simulations suggested that the preferred option is to ignore the noise correlations, which prevents the outliers from spreading; this is the case considered in the simulations. To initialize the schemes of [3] and [13] from k = 1 up to k = k_0, two different approaches are tested: i) solving the batch Huber cost via iteratively re-weighted LS (IRLS); and ii) starting from the all-zero initialization and then running the respective scheme. The performance of the scheme in [3] is depicted in Fig. 3. While better than RLS, it is outperformed by OR-RLS because it does not account for the noise correlations. The two initializations perform similarly. Fig. 4 depicts the performance of the scheme in [13] versus OR-RLS. The Huber initialization provides a considerable improvement in this case, but OR-RLS still exhibits a noticeable improvement over it.

B. Non-Stationary Scenario

For the non-stationary case, all parameters, including x_1, are selected as in the stationary case. To obtain x_{k+1}, each entry of x_k is passed through a first-order AR filter driven by Gaussian noise. Specifically, letting x_{k,d} denote the dth entry of x_k, the corresponding entry of x_{k+1} is generated according to

x_{k+1,d} = 0.95 x_{k,d} + 0.05 e_{k,d}

where e_{k,d} is zero-mean, unit-variance Gaussian noise assumed independent of all other noise terms. This procedure is repeated for every k to obtain a slowly-varying x_k. Figs. 5 and 6 depict the RMSE of OR-RLS with \beta = 0.95 versus RLS, the robust scheme of [3], and that of [13]. It can be seen that OR-RLS markedly outperforms all three alternatives.
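The slowly varying parameter trajectory above is a per-entry AR(1) recursion; a minimal sketch follows, where the horizon and dimension are assumed values.

import numpy as np

rng = np.random.default_rng(5)
T, Dx = 1000, 4                          # horizon and dimension are assumptions
x = np.ones((T, Dx))                     # x_1 chosen as in the stationary case
for k in range(T - 1):
    # x_{k+1,d} = 0.95 x_{k,d} + 0.05 e_{k,d}, with e_{k,d} ~ N(0, 1)
    x[k + 1] = 0.95 * x[k] + 0.05 * rng.standard_normal(Dx)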

Fig. 5. Comparison between the robust RLS scheme of [3] and OR-RLS in a non-stationary setting.

Fig. 6. Comparison between the robust RLS scheme of [13] and OR-RLS in a non-stationary setting.

VI. CONCLUSIONS AND CURRENT RESEARCH

A robust RLS algorithm was developed, capable of simultaneously handling measurement outliers and accounting for possibly correlated nominal noise. To achieve these desirable properties, outliers were treated as nuisance variables and were estimated jointly with the parameter vector of interest. Identifiability was ensured by regularizing the LS cost with the l1-norm of the outlier variables. The resulting optimization problem, dubbed the offline benchmark, was too complex to be solved per time step. For a low-complexity implementation, a sub-optimal online algorithm was developed which involves only simple closed-form updates per time step. Initialization with the offline benchmark followed by online updates was advocated as the OR-RLS algorithm of choice in practice. Numerical tests demonstrated the improved performance of OR-RLS compared to RLS and its state-of-the-art robust renditions. Current research focuses on robust online estimation of dynamical processes adhering to a Gauss-Markov model, which, in addition to the slow parameter changes that can be followed by OR-RLS, enables tracking of fast-varying parameters too. Preliminary results in this direction can be found in [6].

Acknowledgment. The authors wish to thank Dr. Daniele Angelosante of ABB, Switzerland, for helpful discussions and suggestions during the early stages of this work.

REFERENCES

[1] D. Angelosante, J.-A. Bazerque, and G. B. Giannakis, "Online adaptive estimation of sparse signals: Where RLS meets the l1-norm," IEEE Trans. on Signal Processing, vol. 58, no. 7, pp. 3436-3447, July 2010.
[2] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, 2nd ed., Athena Scientific, 1999.
[3] M. Z. A. Bhotto and A. Antoniou, "Robust recursive least-squares adaptive-filtering algorithms for impulsive-noise environments," IEEE Signal Processing Letters, vol. 18, no. 3, pp. 185-188, March 2011.
[4] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. on Information Theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005.
[5] S. C. Chan and Y. X. Zou, "A recursive least M-estimate algorithm for robust adaptive filtering in impulsive noise: Fast algorithm and convergence performance analysis," IEEE Trans. on Signal Processing, vol. 52, no. 4, pp. 975-991, Apr. 2004.
[6] S. Farahmand, D. Angelosante, and G. B. Giannakis, "Doubly robust Kalman smoothing by exploiting outlier sparsity," Proc. of Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2010.
[7] J.-J. Fuchs, "An inverse problem approach to robust regression," in Proc. of Intl. Conf. on Acoustics, Speech, and Signal Processing, Phoenix, AZ, 1999, pp. 1809-1812.
[8] G. B. Giannakis, G. Mateos, S. Farahmand, V. Kekatos, and H. Zhu, "USPACOR: Universal sparsity-controlling outlier rejection," Proc. of Intl. Conf. on Acoustics, Speech, and Signal Processing, Prague, Czech Republic, May 2011.
[9] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed., New York, NY: Springer, 2009.
[10] P. J. Huber and E. M. Ronchetti, Robust Statistics, New York, NY, USA: Wiley, 2009.
[11] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, 1993.
[12] P. Petrus, "Robust Huber adaptive filter," IEEE Trans. on Signal Processing, vol. 47, no. 4, pp. 1129-1133, April 1999.
[13] L. Rey Vega, H. Rey, J. Benesty, and S. Tressens, "A fast robust recursive least-squares algorithm," IEEE Trans. on Signal Processing, vol. 57, no. 3, pp. 1209-1216, March 2009.
[14] SeDuMi. [Online]. Available: http://sedumi.ie.lehigh.edu/
[15] P. Tseng, "Convergence of a block coordinate descent method for nondifferentiable minimization," Journal of Optimization Theory and Applications, vol. 109, no. 3, pp. 475-494, June 2001.
[16] Y. Zou, S. C. Chan, and T. S. Ng, "A recursive least M-estimate (RLM) adaptive filter for robust filtering in impulsive noise," IEEE Signal Processing Letters, vol. 7, no. 11, pp. 324-326, Nov. 2000.