2011 American Control Conference
O'Farrell Street, San Francisco, CA, USA, June 29 - July 1, 2011

Sliding Window Recursive Quadratic Optimization with Variable Regularization

Jesse B. Hoagg, Asad A. Ali, Magnus Mossberg, and Dennis S. Bernstein

J. B. Hoagg is with the Department of Mechanical Engineering, The University of Kentucky, Lexington, KY. Email: jhoagg@engr.uky.edu
A. A. Ali is with the Department of Aerospace Engineering, The University of Michigan, Ann Arbor, MI. Email: asadali@umich.edu
M. Mossberg is with the Department of Physics and Electrical Engineering, Karlstad University, Karlstad, Sweden. Email: Magnus.Mossberg@kau.se
D. S. Bernstein is with the Department of Aerospace Engineering, The University of Michigan, Ann Arbor, MI. Email: dsbaero@umich.edu

Abstract: In this paper, we present a sliding-window variable-regularization recursive least squares algorithm. In contrast to standard recursive least squares, the algorithm presented in this paper operates on a finite window of data, where old data are discarded as new data become available. This property can be beneficial for estimating time-varying parameters. Furthermore, standard recursive least squares uses time-invariant regularization. More specifically, the inverse of the initial covariance matrix in standard recursive least squares can be viewed as a regularization term, which weights the difference between the next estimate and the initial estimate. This regularization is fixed for all steps of the recursion. The algorithm derived in this paper allows for time-varying regularization. In particular, the present paper allows for time-varying regularization in the weighting as well as in what is being weighted. Specifically, the regularization term can weight the difference between the next estimate and a time-varying vector of parameters rather than the initial estimate.

I. INTRODUCTION

Within signal processing, identification, estimation, and control, recursive least squares (RLS) and gradient-based optimization techniques are among the most fundamental and widely used algorithms [1]-[8]. The standard RLS algorithm operates on a growing window of data, that is, new data are added to the RLS cost function as they become available, and old data are not directly discarded but rather progressively discounted through the use of a forgetting factor. In contrast, a sliding-window RLS algorithm operates on a finite window of data with fixed length; new data replace old data in the sliding-window RLS cost function. Sliding-window least-squares techniques are available in both batch and recursive formulations [9]-[13].

A sliding-window RLS algorithm with time-varying regularization is developed in the present paper. A growing-window RLS algorithm with time-varying regularization is presented in [14]. In standard RLS, the positive-definite initialization of the covariance matrix can be interpreted as the weighting on a regularization term within the context of a quadratic optimization. Until at least n measurements are available, this regularization term compensates for the lack of persistency in order to obtain a unique solution from the RLS algorithm. Traditionally, the regularization term is fixed for all steps of the recursion. In the present work, we derive a sliding-window variable-regularization RLS (SW-VR-RLS) algorithm, where the weighting in the regularization term may change at each step.
As a special case, the regularization can be decreased in magnitude or rank as the rank of the covariance matrix increases, and can be removed entirely when no longer needed. This ability is not available in standard RLS, where the regularization term is weighted by the inverse of the initial covariance. A second extension presented in this paper also involves the regularization term. Specifically, the regularization term in traditional RLS weights the difference between the next estimate and the initial estimate. In the present paper, the regularization term weights the difference between the next estimate and an arbitrarily chosen time-varying vector. As a special case, the time-varying vector can be the current estimate, and thus the regularization term weights the difference between the next estimate and the current estimate. This formulation allows us to modulate the rate at which the estimate changes from step to step. For these extensions, we derive SW-VR-RLS update equations.

In the next section, we derive the update equations for SW-VR-RLS. In the remaining sections of the paper, we investigate the performance of SW-VR-RLS under various conditions of noise and persistency.

II. SW-VR-RLS ALGORITHM

For all $i$, let $A_i \in \mathbb{R}^{n \times n}$, $b_i, \alpha_i \in \mathbb{R}^{n}$, and $R_i \in \mathbb{R}^{n \times n}$, where $A_i$ and $R_i$ are positive semidefinite; define $A_i \triangleq 0$ and $b_i \triangleq 0$ for all $i \leq 0$; let $r$ be a nonnegative integer; and assume that, for all $k \geq 0$, $\sum_{i=k-r}^{k} A_i + R_k$ and $\sum_{i=k-r+1}^{k} A_i + R_{k+1}$ are positive definite. Define the sliding-window regularized quadratic cost

$$J_k(x) \triangleq \sum_{i=k-r}^{k} \left( x^{\mathrm{T}} A_i x + b_i^{\mathrm{T}} x \right) + (x - \alpha_k)^{\mathrm{T}} R_k (x - \alpha_k), \qquad (1)$$

where $x \in \mathbb{R}^{n}$, and let $x_k$ denote the minimizer of $J_k(x)$. The minimizer $x_k$ of (1) is given by

$$x_k = -\frac{1}{2} \left( \sum_{i=k-r}^{k} A_i + R_k \right)^{-1} \left( \sum_{i=k-r}^{k} b_i - 2 R_k \alpha_k \right). \qquad (2)$$
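As a quick numerical check of (1) and (2), the following sketch evaluates the cost (1) at the closed-form minimizer (2) and at random perturbations of it. This sketch is not part of the original paper; the dimension, window size, and random data are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 4, 3, 10  # dimension, window size, and current step (arbitrary choices)

# Random rank-one positive-semidefinite A_i, random b_i, and a positive-definite R_k.
A = [np.outer(g, g) for g in rng.standard_normal((k + 1, n))]
b = [rng.standard_normal(n) for _ in range(k + 1)]
R_k = np.eye(n)
alpha_k = rng.standard_normal(n)
window = range(k - r, k + 1)

# Cost (1): J_k(x) = sum_{i=k-r}^{k} (x'A_i x + b_i'x) + (x - alpha_k)' R_k (x - alpha_k).
def J(x):
    return (sum(x @ A[i] @ x + b[i] @ x for i in window)
            + (x - alpha_k) @ R_k @ (x - alpha_k))

# Minimizer (2): x_k = -(1/2) (sum A_i + R_k)^{-1} (sum b_i - 2 R_k alpha_k).
x_k = -0.5 * np.linalg.solve(sum(A[i] for i in window) + R_k,
                             sum(b[i] for i in window) - 2 * R_k @ alpha_k)

# x_k should achieve a cost no larger than any nearby perturbation.
assert all(J(x_k) <= J(x_k + 0.1 * rng.standard_normal(n)) for _ in range(100))
```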
We now derive the update equations for the SW-VR-RLS algorithm. To rewrite (2) recursively, define

$$P_k \triangleq \left( \sum_{i=k-r}^{k} A_i + R_k \right)^{-1},$$

which means that

$$P_{k+1}^{-1} = \sum_{i=k+1-r}^{k+1} A_i + R_{k+1} = \sum_{i=k-r}^{k} A_i + A_{k+1} - A_{k-r} + R_{k+1} - R_k + R_k = P_k^{-1} + A_{k+1} - A_{k-r} + R_{k+1} - R_k.$$

Now,

$$x_{k+1} = -\frac{1}{2} P_{k+1} \left( \sum_{i=k+1-r}^{k+1} b_i - 2 R_{k+1} \alpha_{k+1} \right) = -\frac{1}{2} P_{k+1} \left( \sum_{i=k-r}^{k} b_i + b_{k+1} - b_{k-r} - 2 R_{k+1} \alpha_{k+1} \right), \qquad (3)$$

and it follows from (3) that

$$x_{k+1} = -\frac{1}{2} P_{k+1} \left( -2 P_k^{-1} x_k + 2 R_k \alpha_k + b_{k+1} - b_{k-r} - 2 R_{k+1} \alpha_{k+1} \right)$$
$$= -\frac{1}{2} P_{k+1} \left( -2 \left[ P_{k+1}^{-1} - A_{k+1} + A_{k-r} - R_{k+1} + R_k \right] x_k + 2 R_k \alpha_k + b_{k+1} - b_{k-r} - 2 R_{k+1} \alpha_{k+1} \right)$$
$$= x_k - P_{k+1} \left[ (A_{k+1} - A_{k-r}) x_k + (R_{k+1} - R_k) x_k + R_k \alpha_k + \tfrac{1}{2} (b_{k+1} - b_{k-r}) - R_{k+1} \alpha_{k+1} \right].$$

To rewrite $P_{k+1}$ recursively, consider the decomposition

$$A_{k+1} = \psi_{k+1} \psi_{k+1}^{\mathrm{T}}, \qquad (4)$$

where $\psi_{k+1} \in \mathbb{R}^{n \times n_{k+1}}$ and $n_{k+1} \triangleq \operatorname{rank} A_{k+1}$. Consequently,

$$P_{k+1} = \left( P_k^{-1} + A_{k+1} - A_{k-r} + R_{k+1} - R_k \right)^{-1}, \qquad (5)$$

where the inverse exists since $\sum_{i=k+1-r}^{k+1} A_i + R_{k+1}$ is positive definite. Next, define

$$M_{k+1} \triangleq \left( P_k^{-1} + R_{k+1} - R_k - A_{k-r} \right)^{-1}, \qquad (6)$$

where the inverse exists since $P_k^{-1} + R_{k+1} - R_k - A_{k-r} = \sum_{i=k-r+1}^{k} A_i + R_{k+1}$, which is assumed to be positive definite. It follows from (4)-(6) that

$$P_{k+1} = \left( M_{k+1}^{-1} + A_{k+1} \right)^{-1} = \left( M_{k+1}^{-1} + \psi_{k+1} \psi_{k+1}^{\mathrm{T}} \right)^{-1}.$$

Using the matrix inversion lemma

$$(X + U C V)^{-1} = X^{-1} - X^{-1} U \left( C^{-1} + V X^{-1} U \right)^{-1} V X^{-1}, \qquad (7)$$

with $X = M_{k+1}^{-1}$, $U = \psi_{k+1}$, $C = I_{n_{k+1}}$, and $V = \psi_{k+1}^{\mathrm{T}}$, it follows that

$$P_{k+1} = M_{k+1} \left[ I_n - \psi_{k+1} \left( I_{n_{k+1}} + \psi_{k+1}^{\mathrm{T}} M_{k+1} \psi_{k+1} \right)^{-1} \psi_{k+1}^{\mathrm{T}} M_{k+1} \right]. \qquad (8)$$

Next, define

$$Q_{k+1} \triangleq \left( P_k^{-1} + R_{k+1} - R_k \right)^{-1}, \qquad (9)$$

where this inverse exists since $P_k^{-1} + R_{k+1} - R_k \geq M_{k+1}^{-1} > 0$, and thus $0 < Q_{k+1} \leq M_{k+1}$. It follows from (4), (6), and (9) that

$$M_{k+1} = \left( Q_{k+1}^{-1} - A_{k-r} \right)^{-1} = \left( Q_{k+1}^{-1} - \psi_{k-r} \psi_{k-r}^{\mathrm{T}} \right)^{-1}.$$

Using (7) with $X = Q_{k+1}^{-1}$, $U = \psi_{k-r}$, $C = -I_{n_{k-r}}$, and $V = \psi_{k-r}^{\mathrm{T}}$, it follows that

$$M_{k+1} = Q_{k+1} \left[ I_n - \psi_{k-r} \left( -I_{n_{k-r}} + \psi_{k-r}^{\mathrm{T}} Q_{k+1} \psi_{k-r} \right)^{-1} \psi_{k-r}^{\mathrm{T}} Q_{k+1} \right].$$

Next, consider the decomposition

$$R_{k+1} - R_k = \phi_{k+1} S_{k+1} \phi_{k+1}^{\mathrm{T}}, \qquad (10)$$

where $\phi_{k+1} \in \mathbb{R}^{n \times m_{k+1}}$, $m_{k+1} \triangleq \operatorname{rank} (R_{k+1} - R_k)$, and $S_{k+1} \in \mathbb{R}^{m_{k+1} \times m_{k+1}}$ is a diagonal matrix of the form

$$S_{k+1} = \operatorname{diag}(\pm 1, \ldots, \pm 1).$$

It follows from (9) and (10) that

$$Q_{k+1} = \left( P_k^{-1} + \phi_{k+1} S_{k+1} \phi_{k+1}^{\mathrm{T}} \right)^{-1}.$$

Using (7) with $X = P_k^{-1}$, $U = \phi_{k+1}$, $C = S_{k+1}$, and $V = \phi_{k+1}^{\mathrm{T}}$, and noting that $S_{k+1}^{-1} = S_{k+1}$, it follows that

$$Q_{k+1} = P_k \left[ I_n - \phi_{k+1} \left( S_{k+1} + \phi_{k+1}^{\mathrm{T}} P_k \phi_{k+1} \right)^{-1} \phi_{k+1}^{\mathrm{T}} P_k \right].$$

In summary, for $k \geq 0$, the recursive minimizer of (1) is given by

$$Q_{k+1} = P_k \left[ I_n - \phi_{k+1} \left( S_{k+1} + \phi_{k+1}^{\mathrm{T}} P_k \phi_{k+1} \right)^{-1} \phi_{k+1}^{\mathrm{T}} P_k \right], \qquad (11)$$

$$M_{k+1} = Q_{k+1} \left[ I_n - \psi_{k-r} \left( -I_{n_{k-r}} + \psi_{k-r}^{\mathrm{T}} Q_{k+1} \psi_{k-r} \right)^{-1} \psi_{k-r}^{\mathrm{T}} Q_{k+1} \right], \qquad (12)$$

$$P_{k+1} = M_{k+1} \left[ I_n - \psi_{k+1} \left( I_{n_{k+1}} + \psi_{k+1}^{\mathrm{T}} M_{k+1} \psi_{k+1} \right)^{-1} \psi_{k+1}^{\mathrm{T}} M_{k+1} \right], \qquad (13)$$

$$x_{k+1} = x_k - P_{k+1} \left[ (A_{k+1} - A_{k-r}) x_k + (R_{k+1} - R_k) x_k + R_k \alpha_k + \tfrac{1}{2} (b_{k+1} - b_{k-r}) - R_{k+1} \alpha_{k+1} \right], \qquad (14)$$

where $x_0 = \alpha_0$, $P_0 = R_0^{-1}$, $\psi_{k+1}$ is given by (4), and $\phi_{k+1}$ is given by (10). In the case where the regularization weighting is constant, that is, for all $k \geq 0$, $R_k = R > 0$, (11) simplifies to $Q_{k+1} = P_k$, and thus propagation of $Q_k$ is not required.
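The following NumPy sketch shows one way that the update (11)-(14) could be implemented. It is not code from the paper: the function names, the eigendecomposition-based construction of the factors $\psi$ in (4) and $(\phi, S)$ in (10), and the handling of the rank-zero cases are assumptions made for illustration, and the required inverses are assumed to exist.

```python
import numpy as np

def psd_factor(A, tol=1e-10):
    """Return psi with A = psi @ psi.T for a symmetric positive-semidefinite A, as in (4)."""
    w, V = np.linalg.eigh(A)
    keep = w > tol
    return V[:, keep] * np.sqrt(w[keep])

def signed_factor(D, tol=1e-10):
    """Return (phi, S) with D = phi @ S @ phi.T and S = diag(+/-1) for symmetric D, as in (10)."""
    w, V = np.linalg.eigh(D)
    keep = np.abs(w) > tol
    return V[:, keep] * np.sqrt(np.abs(w[keep])), np.diag(np.sign(w[keep]))

def swvrrls_step(x_k, P_k, A_new, b_new, A_old, b_old, R_k, R_new, alpha_k, alpha_new):
    """One SW-VR-RLS update following (11)-(14).

    A_new, b_new are the data entering the window (A_{k+1}, b_{k+1});
    A_old, b_old are the data leaving it (A_{k-r}, b_{k-r});
    (R_k, alpha_k) and (R_new, alpha_new) are the regularizations at steps k and k+1."""
    n = len(x_k)
    I = np.eye(n)

    # (11): fold the change in regularization weighting into Q_{k+1}.
    phi, S = signed_factor(R_new - R_k)
    Q = (P_k if phi.size == 0 else
         P_k @ (I - phi @ np.linalg.solve(S + phi.T @ P_k @ phi, phi.T @ P_k)))

    # (12): remove the data A_{k-r} that leave the window.
    psi_o = psd_factor(A_old)
    M = (Q if psi_o.size == 0 else
         Q @ (I - psi_o @ np.linalg.solve(psi_o.T @ Q @ psi_o - np.eye(psi_o.shape[1]),
                                          psi_o.T @ Q)))

    # (13): add the data A_{k+1} that enter the window.
    psi_n = psd_factor(A_new)
    P_new = (M if psi_n.size == 0 else
             M @ (I - psi_n @ np.linalg.solve(np.eye(psi_n.shape[1]) + psi_n.T @ M @ psi_n,
                                              psi_n.T @ M)))

    # (14): update the estimate.
    x_new = x_k - P_new @ ((A_new - A_old) @ x_k + (R_new - R_k) @ x_k + R_k @ alpha_k
                           + 0.5 * (b_new - b_old) - R_new @ alpha_new)
    return x_new, P_new
```

Consistent with the remark above, when $R_{k+1} = R_k$ the factor $\phi_{k+1}$ is empty and the sketch simply sets $Q_{k+1} = P_k$.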
III. SETUP FOR NUMERICAL SIMULATIONS

For all $k \geq 0$, let $x_{k,\mathrm{opt}} \in \mathbb{R}^{n}$ and $\psi_k \in \mathbb{R}^{n}$, where the $i$th entry $\psi_{k,i}$ of $\psi_k$ is generated from a zero-mean, unit-variance Gaussian distribution, and the entries of $\psi_k$ are independent. Define $\beta_k \triangleq \psi_k^{\mathrm{T}} x_{k,\mathrm{opt}}$. Let $l$ be the number of data points, and define

$$\sigma_{\psi,i} \triangleq \sqrt{\frac{1}{l} \sum_{k=1}^{l} \psi_{k,i}^2}, \qquad \sigma_{\beta} \triangleq \sqrt{\frac{1}{l} \sum_{k=1}^{l} \beta_k^2}.$$

Next, for $i = 1, \ldots, n$, let $N_{k,i} \in \mathbb{R}$ and $M_k \in \mathbb{R}$ be generated from zero-mean Gaussian distributions with variances $\sigma_{N,i}^2$ and $\sigma_M^2$, respectively, where $\sigma_{N,i}$ and $\sigma_M$ are determined from the signal-to-noise ratio (SNR). More specifically, for $i = 1, \ldots, n$,

$$\mathrm{SNR}_{\psi,i} \triangleq \frac{\sigma_{\psi,i}}{\sigma_{N,i}}, \qquad \mathrm{SNR}_{\beta} \triangleq \frac{\sigma_{\beta}}{\sigma_M},$$

where, for $i = 1, \ldots, n$, $\sigma_{N,i} \triangleq \sqrt{\frac{1}{K} \sum_{k=1}^{K} N_{k,i}^2}$ and $\sigma_{M} \triangleq \sqrt{\frac{1}{K} \sum_{k=1}^{K} M_k^2}$.

For $k \geq 0$, define

$$A_k \triangleq (\psi_k + N_k)(\psi_k + N_k)^{\mathrm{T}}, \qquad b_k \triangleq -2 (\beta_k + M_k)(\psi_k + N_k),$$

where $N_k$ is the noise in $\psi_k$ and $M_k$ is the noise in $\beta_k$. Define

$$z_1 \triangleq [\,0.8 \;\; 1.12 \;\; 1.6 \;\; 1.5 \;\; 2.2 \;\; 2.1 \;\; 0.32\,]^{\mathrm{T}}, \qquad z_2 \triangleq [\,1.11 \;\; 0.2 \;\; 1.1 \;\; 0.2 \;\; 0.4 \;\; 0.23 \;\; 2.5\,]^{\mathrm{T}}.$$

Unless otherwise specified, for all $k$, $x_{k,\mathrm{opt}} = z_1$, $\alpha_k = x_{k-1}$, and $x_0 = 0_{7 \times 1}$. Define the performance measure

$$\varepsilon_k \triangleq \frac{\| x_{k,\mathrm{opt}} - x_k \|}{\| x_{k,\mathrm{opt}} \|}.$$
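To make this setup concrete, here is a short sketch of how data of this form could be generated with NumPy. It is illustrative only: the number of data points, the random seed, the SNR values, and the use of a generic random sequence in place of the specific $x_{k,\mathrm{opt}}$ above are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, l = 7, 1000                      # dimension and number of data points (l is an assumed value)
snr_psi, snr_beta = 5.0, 5.0        # assumed signal-to-noise ratios SNR_psi,i and SNR_beta

x_opt = rng.standard_normal((l, n))            # stand-in for x_{k,opt}
psi = rng.standard_normal((l, n))              # psi_k with independent N(0,1) entries
beta = np.einsum('ki,ki->k', psi, x_opt)       # beta_k = psi_k^T x_{k,opt}

# Noise standard deviations determined from the signal-to-noise ratios.
sigma_psi = np.sqrt(np.mean(psi**2, axis=0))   # sigma_psi,i (per-entry RMS)
sigma_beta = np.sqrt(np.mean(beta**2))         # sigma_beta
N = rng.standard_normal((l, n)) * (sigma_psi / snr_psi)   # N_k,i ~ N(0, sigma_N,i^2)
M = rng.standard_normal(l) * (sigma_beta / snr_beta)      # M_k ~ N(0, sigma_M^2)

# Quadratic-cost data: A_k = (psi_k + N_k)(psi_k + N_k)^T, b_k = -2(beta_k + M_k)(psi_k + N_k).
A = [np.outer(p, p) for p in psi + N]
b = [-2.0 * (bk + mk) * p for bk, mk, p in zip(beta, M, psi + N)]
```

The resulting $A_k$ and $b_k$ can then be passed to a step routine such as the sketch in Section II, with $\alpha_k$ and $R_k$ chosen as in the examples below.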
IV. NUMERICAL SIMULATIONS OF SW-VR-RLS WITH NOISELESS DATA

In this section, we investigate the effect of the window size $r$, the regularization weighting $R_k$, and the regularization vector $\alpha_k$ on SW-VR-RLS. The data contain no noise; specifically, for all $k$, $N_k = 0_{7 \times 1}$ and $M_k = 0$.

A. Effect of Window Size

In the following example, we test SW-VR-RLS for three different values of the window size $r$. In all cases, $\alpha_k = x_{k-1}$, for all $k$, $R_k = I_{7 \times 7}$, $A_k$ and $b_k$ are the same for all cases, and

$$x_{k,\mathrm{opt}} = \begin{cases} z_1, & k \leq 115, \\ z_2, & k > 115. \end{cases}$$

For this example, Figure 1 shows that, for $k \leq 115$, larger values of $r$ yield faster convergence of $\varepsilon_k$ to zero. For $k > 115$, $x_{k,\mathrm{opt}} \neq x_{115,\mathrm{opt}}$, and larger values of $r$ yield faster convergence of $\varepsilon_k$ to zero; however, larger values of $r$ can yield worse transient performance because the larger window retains the data relating to $z_1$ for more time steps.

Fig. 1: Effect of $r$ on convergence of $x_k$ to $x_{k,\mathrm{opt}}$. In this example, for $k \leq 115$, larger values of $r$ yield faster convergence of $\varepsilon_k$ to zero. For $k > 115$, $x_{k,\mathrm{opt}} \neq x_{115,\mathrm{opt}}$, and larger values of $r$ yield faster convergence of $\varepsilon_k$ to zero; however, larger values of $r$ yield worse transient performance because the larger window retains the data relating to $z_1$ for more time steps.

B. Effect of R

In this section, we examine the effect of $R_k$, where, for all $k$, $R_k$ is constant. In the following example, we test SW-VR-RLS for three different values of $R_k$; specifically, for all $k$, $R_k = 10 I_{7 \times 7}$, $R_k = I_{7 \times 7}$, or $R_k = 0.1 I_{7 \times 7}$. In all cases, for all $k$, $A_k$ and $b_k$ are the same, $\alpha_k = x_{k-1}$, $r = 15$, and

$$x_{k,\mathrm{opt}} = \begin{cases} z_1, & k \leq 115, \\ z_2, & k > 115. \end{cases}$$

For this example, Figure 2 shows that, for $k \leq 115$, smaller values of $R_k$ yield faster convergence of $\varepsilon_k$ to zero. Similarly, for $k > 115$, $x_{k,\mathrm{opt}} \neq x_{115,\mathrm{opt}}$, and smaller values of $R_k$ yield faster convergence of $\varepsilon_k$ to zero.

Fig. 2: Effect of $R_k$ on convergence of $x_k$ to $x_{k,\mathrm{opt}}$. For this example, for $k \leq 115$, smaller values of $R_k$ yield faster convergence of $\varepsilon_k$ to zero. Similarly, for $k > 115$, $x_{k,\mathrm{opt}} \neq x_{115,\mathrm{opt}}$, and smaller values of $R_k$ yield faster convergence of $\varepsilon_k$ to zero.

C. Loss of Persistency

In this section, we study the effect of loss of persistency on SW-VR-RLS. More specifically, for all $k \geq 500$, $A_k = A_{500}$ and $b_k = b_{500}$. Moreover, for all $k$, $R_k = 0.1 I_{7 \times 7}$, $r = 15$, and $\alpha_k = x_{k-1}$. For this example, Figure 3 shows that $\varepsilon_k$ approaches zero; however, Figure 4 shows that $P_k$ increases after the data lose persistency, although $P_k$ remains bounded.

Fig. 3: Effect of loss of persistency on convergence of $x_k$ to $x_{k,\mathrm{opt}}$. The data lose persistency at the 500th step. In this example, $\varepsilon_k$ approaches zero.

Fig. 4: Effect of loss of persistency on $P_k$. In this example, $P_k$ increases after the data lose persistency, but $P_k$ remains bounded.

V. NUMERICAL SIMULATIONS WITH NOISY DATA

In this section, we investigate the effect of the window size $r$, $R_k$, and $\alpha_k$ on SW-VR-RLS when the data are noisy. More specifically, for all $k$, $M_k$ and $N_{k,i}$ are generated from zero-mean Gaussian distributions whose variances are determined by $\mathrm{SNR}_{\psi,i}$ and $\mathrm{SNR}_{\beta}$.

A. Effect of alpha

In this section, we first compare SW-VR-RLS for different choices of $\alpha_k$. More specifically, we let $\alpha_k = L_{\nu,k}$, where

$$L_{\nu,k} \triangleq \begin{cases} x_{k-1}, & k \leq \nu, \\ x_{\nu}, & k > \nu, \end{cases}$$

and $\nu$ is a positive integer. In the following example, we test SW-VR-RLS for three different values of $\nu$; specifically, $\nu = 1$, $\nu = 5$, or $\nu = 15$. In all cases, for all $k$, $A_k$ and $b_k$ are the same, $R_k = I_{7 \times 7}$, $\mathrm{SNR}_{\beta} = \mathrm{SNR}_{\psi,i} = 5$, and $r = 1$. For this example, Figure 5 shows that larger values of $\nu$ yield smaller asymptotic values of $\varepsilon_k$.

Fig. 5: Convergence of $x_k$ to $x_{k,\mathrm{opt}}$ for $\alpha_k = L_{\nu,k}$. For this example, larger values of $\nu$ yield smaller asymptotic values of $\varepsilon_k$.

Next, we let $\alpha_k = W_{\rho,k}$, where

$$W_{\rho,k} \triangleq \begin{cases} x_{k-1}, & k \leq 1, \\ \dfrac{1}{k-1} \displaystyle\sum_{i=1}^{k-1} x_i, & 1 < k \leq \rho, \\ \dfrac{1}{\rho} \displaystyle\sum_{i=k-\rho}^{k-1} x_i, & k > \rho, \end{cases}$$

and $\rho$ is a positive integer. In the following example, we test SW-VR-RLS for three different values of $\rho$; specifically, $\rho = 1$, $\rho = 5$, or $\rho = 15$. In all cases, for all $k$, $A_k$ and $b_k$ are the same, $R_k = I_{7 \times 7}$, $\mathrm{SNR}_{\beta} = \mathrm{SNR}_{\psi,i} = 5$, and $r = 1$. For this example, Figure 6 shows that larger values of $\rho$ yield smaller asymptotic values of $\varepsilon_k$.

Fig. 6: Convergence of $x_k$ to $x_{k,\mathrm{opt}}$ for $\alpha_k = W_{\rho,k}$. For this example, larger values of $\rho$ yield smaller asymptotic values of $\varepsilon_k$.
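A minimal sketch of these two schedules for $\alpha_k$, assuming `xs` holds the estimates $x_0, \ldots, x_{k-1}$ computed so far (the indexing is my reading of the definitions above):

```python
import numpy as np

def alpha_L(xs, k, nu):
    """L_{nu,k}: use the previous estimate x_{k-1} while k <= nu, then freeze at x_nu."""
    return xs[k - 1] if k <= nu else xs[nu]

def alpha_W(xs, k, rho):
    """W_{rho,k}: average the past estimates, using at most the last rho of them."""
    if k <= 1:
        return xs[k - 1]
    lo = max(1, k - rho)       # averages x_1..x_{k-1}, or x_{k-rho}..x_{k-1} once k > rho
    return np.mean(xs[lo:k], axis=0)
```

Both schedules build the regularization target from past estimates rather than from the initial estimate.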
B. Effect of Window Size

In the following example, we test SW-VR-RLS for three different window sizes $r$. In all cases, $\alpha_k = x_{k-1}$, $\mathrm{SNR}_{\beta} = \mathrm{SNR}_{\psi,i} = 5$, for all $k$, $R_k = I_{7 \times 7}$, $A_k$ and $b_k$ are the same for all cases, and

$$x_{k,\mathrm{opt}} = \begin{cases} z_1, & k \leq 115, \\ z_2, & k > 115. \end{cases}$$

For this example, Figure 7 shows that the largest window size yields a smaller asymptotic value of $\varepsilon_k$ than the two smaller window sizes. However, this trend is not monotonic, since the intermediate window size yields a larger asymptotic value of $\varepsilon_k$ than the smallest. Numerical tests suggest that the asymptotic value of $\varepsilon_k$ increases as the window size is increased from $r = 1$ until it peaks at a certain value of $r$; after this, the asymptotic value of $\varepsilon_k$ decreases as $r$ increases.

Fig. 7: Effect of $r$ on convergence of $x_k$ to $x_{k,\mathrm{opt}}$. This figure shows that the largest window size yields a smaller asymptotic value of $\varepsilon_k$ than the two smaller window sizes. However, this trend is not monotonic, since the intermediate window size yields a larger asymptotic value of $\varepsilon_k$ than the smallest.

C. Effect of R

First, we examine the effect of $R_k$, where, for all $k$, $R_k$ is constant. In the following example, we test SW-VR-RLS for three different values of $R_k$; specifically, for all $k$, $R_k = I_{7 \times 7}$, $R_k = 0.1 I_{7 \times 7}$, or $R_k = 0.01 I_{7 \times 7}$. In all cases, $\alpha_k = x_{k-1}$, $\mathrm{SNR}_{\beta} = \mathrm{SNR}_{\psi,i} = 5$, $r = 1$, for all $k$, $A_k$ and $b_k$ are the same for all cases, and

$$x_{k,\mathrm{opt}} = \begin{cases} z_1, & k \leq 115, \\ z_2, & k > 115. \end{cases}$$

For this example, Figure 8 shows that a smaller value of $R_k$ results in faster convergence of $\varepsilon_k$ to its asymptotic value but yields a larger asymptotic value. Once $x_k$ becomes close to $x_{k,\mathrm{opt}}$, $x_k$ oscillates about $x_{k,\mathrm{opt}}$. The amplitude of this oscillation depends on $R_k$; specifically, a larger value of $R_k$ allows less change in $x_k$ from step to step, which makes the amplitude of oscillation smaller. Therefore, choosing a larger $R_k$ yields a smaller asymptotic value of $\varepsilon_k$.

Fig. 8: Effect of $R_k$ on convergence of $x_k$ to $x_{k,\mathrm{opt}}$. For this example, a smaller value of $R_k$ results in faster convergence of $\varepsilon_k$ to its asymptotic value, but yields a larger asymptotic value of $\varepsilon_k$.

Next, we let $R_k$ start small and then grow to a specified value as $k$ increases. More specifically,

$$R_k = X - (X - I_{7 \times 7}) e^{-\gamma k}, \qquad (15)$$

where $X \in \mathbb{R}^{7 \times 7}$ and $\gamma > 0$. In this example, we compare SW-VR-RLS with constant $R_k = I_{7 \times 7}$ and with $R_k$ given by (15) with $\gamma = 0.2$. For this example, Figure 9 shows that $R_k$ given by (15) yields a smaller asymptotic value of $\varepsilon_k$ than the constant choice $R_k = I_{7 \times 7}$.

Fig. 9: Effect of $R_k$ on convergence of $x_k$ to $x_{k,\mathrm{opt}}$. For this example, the time-varying $R_k$ given by (15) yields a smaller asymptotic value of $\varepsilon_k$ than the constant $R_k = I_{7 \times 7}$.

D. Loss of Persistency

In this section, we study the effect of loss of persistency on SW-VR-RLS. More specifically, for all $k \geq 500$, $A_k = A_{500}$
and $b_k = b_{500}$. Moreover, for all $k$, $R_k = 0.1 I_{7 \times 7}$, $r = 15$, $\mathrm{SNR}_{\beta} = \mathrm{SNR}_{\psi,i} = 5$, and $\alpha_k = x_{k-1}$. For this example, Figure 10 shows that $\varepsilon_k$ increases after the data lose persistency, but $\varepsilon_k$ remains bounded. Figure 11 shows that $P_k$ increases after the data lose persistency, but $P_k$ remains bounded.

Fig. 10: Effect of loss of persistency on convergence of $x_k$ to $x_{k,\mathrm{opt}}$. The data lose persistency at the 500th step. In this example, $\varepsilon_k$ increases after the data lose persistency, but $\varepsilon_k$ remains bounded.

Fig. 11: Effect of loss of persistency on $P_k$. In this example, $P_k$ increases after the data lose persistency, but $P_k$ remains bounded.

VI. CONCLUSIONS

In this paper, we presented a sliding-window variable-regularization recursive least squares (SW-VR-RLS) algorithm. This algorithm operates on a finite window of data, where old data are discarded as new data become available. Furthermore, this algorithm allows for a time-varying regularization term in the sliding-window RLS cost function. More specifically, SW-VR-RLS allows us to vary both the weighting in the regularization as well as what is being weighted; that is, the regularization term can weight the difference between the next state estimate and a time-varying vector of parameters rather than the initial state estimate.

REFERENCES

[1] A. H. Sayed, Adaptive Filters. Hoboken, NJ: John Wiley and Sons, Inc., 2008.
[2] J. N. Juang, Applied System Identification. Upper Saddle River, NJ: Prentice-Hall, 1993.
[3] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification. Cambridge, MA: The MIT Press, 1983.
[4] L. Ljung, System Identification: Theory for the User, 2nd ed. Upper Saddle River, NJ: Prentice-Hall Information and System Sciences, 1999.
[5] K. J. Åström and B. Wittenmark, Adaptive Control, 2nd ed. Addison-Wesley, 1995.
[6] G. C. Goodwin and K. S. Sin, Adaptive Filtering, Prediction, and Control. Prentice Hall, 1984.
[7] G. Tao, Adaptive Control Design and Analysis. Wiley, 2003.
[8] P. Ioannou and J. Sun, Robust Adaptive Control. Prentice Hall, 1996.
[9] F. Gustafsson, Adaptive Filtering and Change Detection. Wiley, 2000.
[10] J. Jiang and Y. Zhang, "A novel variable-length sliding window blockwise least-squares algorithm for on-line estimation of time-varying parameters," Int. J. Adaptive Contr. Sig. Proc., vol. 18, pp. 505-521, 2004.
[11] M. Belge and E. L. Miller, "A sliding window RLS-like adaptive algorithm for filtering alpha-stable noise," IEEE Sig. Proc. Lett., vol. 7, no. 4, pp. 86-89, 2000.
[12] B. Y. Choi and Z. Bien, "Sliding-windowed weighted recursive least-squares method for parameter estimation," Electronics Lett., vol. 25, no. 20, pp. 1381-1382, 1989.
[13] H. Liu and Z. He, "A sliding-exponential window RLS adaptive algorithm: properties and applications," Sig. Proc., vol. 45, no. 3, pp. 357-368, 1995.
[14] A. A. Ali, J. B. Hoagg, M. Mossberg, and D. S. Bernstein, "Growing window recursive quadratic optimization with variable regularization," in Proc. Conf. Dec. Contr., Atlanta, GA, December 2010, pp. 496-501.