STATISTICALLY LINEARIZED RECURSIVE LEAST SQUARES. Matthieu Geist and Olivier Pietquin. IMS Research Group, Supélec, Metz, France

ABSTRACT

This article proposes a new interpretation of the sigma-point Kalman filter (SPKF) for parameter estimation as a statistically linearized recursive least-squares algorithm. This gives new insight into the SPKF for parameter estimation and, in particular, provides an alternative proof for a result of Van der Merwe. It also legitimates the use of statistical linearization and suggests many ways to use it for parameter estimation, not necessarily in a least-squares sense.

Index Terms: Recursive least-squares, statistical linearization, parameter estimation.

1. INTRODUCTION

The Unscented Kalman Filter (UKF) [1] has recently been introduced as an efficient derivative-free alternative to the Extended Kalman Filter (EKF) [2] for the nonlinear filtering problem. The basic idea behind the UKF is that it is easier to approximate an arbitrary random variable than an arbitrary nonlinear function. It uses an approximation scheme, the so-called unscented transform (UT) [3], to approximate the statistics of interest involved in the Kalman equations (the filter being seen as the optimal linear state estimator minimizing the expected mean-square error conditioned on past observations). More generally, a Kalman filter for which the statistics of interest are computed by approximating the random variable rather than the nonlinear function is called a Sigma-Point Kalman Filter (SPKF) [4]. The UKF, but also the Divided Difference Filter (DDF) [5] or the Central Difference Filter (CDF) [6] for example, belong to the SPKF family.

A special form of SPKF, which is the case of interest of this paper, is the SPKF for parameter estimation [4]. In this setting, the aim is to estimate a set of stationary parameters instead of tracking a hidden state. It is a simpler case than the general SPKF, because the evolution equation of the corresponding state-space model is at most a random walk. However, it is a case of interest, notably for the machine learning community, as it is an efficient derivative-free learning method providing uncertainty information, which can be useful, e.g., for active learning or for the exploration/exploitation dilemma in reinforcement learning. The SPKF for parameter estimation has been used successfully for supervised learning [7] and even for reinforcement learning [8].

In this article, it is shown how the SPKF for parameter estimation can be obtained from a least-squares perspective using statistical linearization. This gives new insights by linking recursive least-squares to the SPKF, and it suggests that statistical linearization can prove useful for optimization problems other than pure L2 minimization. An interpretation of the general UKF as performing a statistical linearization has been proposed before [9]: there, the state-space formulation of the filtering problem is statistically linearized, which is quite different from the least-squares-based approach proposed here.

Sec. 2 introduces some necessary preliminaries about statistical linearization and recursive least-squares. Sec. 3 provides the derivation of the proposed statistically linearized recursive least-squares approach. Sec. 4 shows that the proposed method actually allows obtaining any form of SPKF for parameter estimation. It is also shown that the proposed approach yields an alternative proof for a result of [4], stating that the SPKF for parameter estimation is a maximum a posteriori (MAP) estimator. Sec. 5 proposes some perspectives of this work.

2. PRELIMINARIES

In this section, statistical linearization and linear recursive least-squares are briefly reviewed. For ease of notation, a scalar output is assumed throughout this article; however, the presented results extend easily to the vectorial case.

2.1. Statistical linearization

Let g : x \in \mathbb{R}^n \mapsto y = g(x) \in \mathbb{R} be a nonlinear function. Assume that it is evaluated at r points (x_j, y_j = g(x_j)), 1 \le j \le r. The following statistics are defined:

\bar{x} = \frac{1}{r} \sum_{j=1}^{r} x_j, \qquad \bar{y} = \frac{1}{r} \sum_{j=1}^{r} y_j    (1)

P_{xx} = \frac{1}{r} \sum_{j=1}^{r} (x_j - \bar{x})(x_j - \bar{x})^T    (2)

P_{xy} = \frac{1}{r} \sum_{j=1}^{r} (x_j - \bar{x})(y_j - \bar{y})    (3)

P_{yy} = \frac{1}{r} \sum_{j=1}^{r} (y_j - \bar{y})^2    (4)

Statistical linearization consists in linearizing y = g(x) around \bar{x} by adopting a statistical point of view. It finds a linear model y = Ax + b by minimizing the sum of squared errors between the values of the nonlinear and the linearized functions at the regression points:

\min_{A,b} \sum_{j=1}^{r} e_j^T e_j \quad \text{with} \quad e_j = y_j - (A x_j + b)    (5)

The solution of Eq. (5) is given by [10]:

A = P_{yx} P_{xx}^{-1}, \qquad b = \bar{y} - A \bar{x}    (6)

Moreover, it is easy to check that the covariance matrix of the error is given by:

P_{ee} = \frac{1}{r} \sum_{j=1}^{r} e_j e_j^T    (7)
       = P_{yy} - A P_{xx} A^T    (8)

For now, how to choose the regression points has not been discussed. This is left for Sec. 3.

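As an illustration of Eqs. (1)-(8), the following is a minimal sketch of the statistical linearization step in Python/NumPy. It assumes a scalar output, and the function and variable names (statistical_linearization, X, y) are hypothetical, introduced here only for illustration; this sketch is not part of the original algorithm description.

```python
import numpy as np

def statistical_linearization(X, y):
    """Statistically linearize y = g(x) from r regression points.

    X : (r, n) array of inputs x_j; y : (r,) array of outputs y_j = g(x_j).
    Returns A (length-n array, the row A of Eq. (6)), b (scalar, Eq. (6))
    and the residual variance P_ee (scalar, Eq. (8)).
    """
    r = len(y)
    x_bar = X.mean(axis=0)
    y_bar = y.mean()
    dX = X - x_bar                      # centered inputs
    dy = y - y_bar                      # centered outputs
    P_xx = dX.T @ dX / r                # Eq. (2)
    P_xy = dX.T @ dy / r                # Eq. (3)
    P_yy = dy @ dy / r                  # Eq. (4)
    A = np.linalg.solve(P_xx, P_xy)     # A^T = P_xx^{-1} P_xy, i.e. A = P_yx P_xx^{-1}
    b = y_bar - A @ x_bar               # Eq. (6)
    P_ee = P_yy - A @ P_xx @ A          # Eq. (8), scalar because the output is scalar
    return A, b, P_ee
```

The same computation is reused in Sec. 3, with the regression points taken in parameter space.
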
2.2. Linear recursive least-squares

Assume the following linear observation model:

y_i = x_i^T \theta + v_i    (9)

where x_i \in \mathbb{R}^n, y_i \in \mathbb{R} and v_i is a white observation noise of variance P_{v_i}. The least-squares approach estimates the parameter vector \theta \in \mathbb{R}^n by minimizing the squared error over the observed samples (x_1, y_1), ..., (x_t, y_t):

\theta_t^{LS} = \operatorname*{argmin}_{\theta} J_t(\theta), \qquad J_t(\theta) = \sum_{i=1}^{t} P_{v_i}^{-1} (y_i - x_i^T \theta)^2    (10)

The least-squares solution is classically obtained by zeroing the gradient of the cost function J_t(\theta), which gives the least-squares (LS) estimate:

\theta_t^{LS} = \Big( \sum_{i=1}^{t} P_{v_i}^{-1} x_i x_i^T \Big)^{-1} \sum_{i=1}^{t} P_{v_i}^{-1} x_i y_i    (11)

The parameter vector can be estimated online by updating it for each new observation. For this, the matrix P_t = ( \sum_{i=1}^{t} P_{v_i}^{-1} x_i x_i^T )^{-1} is computed recursively using the Sherman-Morrison formula:

P_t = P_{t-1} - \frac{P_{t-1} x_t x_t^T P_{t-1}}{P_{v_t} + x_t^T P_{t-1} x_t}    (12)

By injecting Eq. (12) into Eq. (11) and by assuming that some priors \theta_0 and P_0 are chosen, the recursive least-squares (RLS) algorithm is obtained:

K_t = \frac{P_{t-1} x_t}{P_{v_t} + x_t^T P_{t-1} x_t}    (13)

\theta_t^{RLS} = \theta_{t-1}^{RLS} + K_t (y_t - x_t^T \theta_{t-1}^{RLS})    (14)

P_t = P_{t-1} - K_t (P_{v_t} + x_t^T P_{t-1} x_t) K_t^T    (15)

This RLS formulation proves useful for the statistical linearization of nonlinear least-squares presented in the next section. Notice that this estimator does not minimize J_t(\theta), but a regularized version of this cost function:

\theta_t^{RLS} = \operatorname*{argmin}_{\theta} \Big( J_t(\theta) + (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big)    (16)

From a Bayesian point of view, the LS estimate can be seen as a maximum likelihood (ML) estimate, whereas the RLS estimate can be seen as a maximum a posteriori (MAP) estimate. From now on, this difference is no longer specified, as it is clear from the context (batch or recursive estimation).

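As an illustration of Eqs. (13)-(15), here is a minimal sketch of one RLS update in Python/NumPy. The name rls_update and its argument names are hypothetical; the sketch assumes the scalar-output model (9).

```python
import numpy as np

def rls_update(theta, P, x, y, P_v):
    """One recursive least-squares update (Eqs. (13)-(15)).

    theta : (n,) current estimate; P : (n, n) associated matrix P_{t-1};
    (x, y) : new sample, x of shape (n,) and y scalar;
    P_v : observation noise variance.
    """
    s = P_v + x @ P @ x                      # innovation variance
    K = P @ x / s                            # gain (Eq. (13))
    theta_new = theta + K * (y - x @ theta)  # correction (Eq. (14))
    P_new = P - np.outer(K, K) * s           # matrix update (Eq. (15))
    return theta_new, P_new
```

Starting from the priors theta_0 and P_0 and applying this update to each sample in turn yields the regularized estimate of Eq. (16).
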

3. STATISTICALLY LINEARIZED RECURSIVE LEAST-SQUARES

Assume the following nonlinear observation model:

y_i = f_\theta(x_i) + v_i    (17)

where x_i \in \mathbb{R}^n, y_i \in \mathbb{R}, v_i is a white observation noise of variance P_{v_i}, and f_\theta is a parametric function approximator of interest (for example an artificial neural network for which \theta specifies the synaptic weights [11]). For a set of t observed samples, the least-squares solution is given by:

\theta_t = \operatorname*{argmin}_{\theta} \sum_{i=1}^{t} P_{v_i}^{-1} (y_i - f_\theta(x_i))^2    (18)

To address this nonlinear least-squares problem, the nonlinear observation model is statistically linearized (see Sec. 2.1):

y_i = A_i \theta + b_i + e_i + v_i    (19)

At this point, it should be noted that a set of points has to be sampled in order to perform the statistical linearization, that is, to compute A_i, b_i and e_i. For now, this is left as an open question; this problem is addressed later. Let u_i = e_i + v_i be the noise associated with observation model (19). The noises v_i and e_i being independent, the variance of u_i is given by P_{u_i} = P_{v_i} + P_{e_i e_i}. Observation models (17) and (19) being equivalent, the least-squares solution can be rewritten as:

\theta_t = \operatorname*{argmin}_{\theta} \sum_{i=1}^{t} P_{u_i}^{-1} \big( y_i - (A_i \theta + b_i) \big)^2    (20)
         = \Big( \sum_{i=1}^{t} P_{u_i}^{-1} A_i^T A_i \Big)^{-1} \sum_{i=1}^{t} P_{u_i}^{-1} A_i^T (y_i - b_i)    (21)

Using the Sherman-Morrison formula, a recursive formulation of this estimate can be obtained (see Sec. 2.2):

K_t = \frac{P_{t-1} A_t^T}{P_{u_t} + A_t P_{t-1} A_t^T}    (22)

\theta_t = \theta_{t-1} + K_t (y_t - b_t - A_t \theta_{t-1})    (23)

P_t = P_{t-1} - K_t (P_{u_t} + A_t P_{t-1} A_t^T) K_t^T    (24)

The problem of choosing a specific statistical linearization is now addressed. With the recursive formulation, \theta_{t-1} and P_{t-1} are known, and the issue is to compute A_t and b_t. A first thing is to choose around what point to linearize and with which magnitude. Recall that the previous estimate \theta_{t-1} is known. Moreover, the matrix P_{t-1} can be interpreted as the variance matrix associated with \theta_{t-1}. It is thus legitimate to sample r points (\theta_t^{(j)}, y_t^{(j)} = f_{\theta_t^{(j)}}(x_t)) such that \bar{\theta}_t = \theta_{t-1} and P_{\theta\theta} = P_{t-1}. The following statistics are thus available (how to sample these r points is discussed in Sec. 4):

\bar{\theta}_t = \frac{1}{r} \sum_{j=1}^{r} \theta_t^{(j)} = \theta_{t-1}, \qquad \bar{y}_t = \frac{1}{r} \sum_{j=1}^{r} y_t^{(j)}    (25)

P_{\theta\theta} = \frac{1}{r} \sum_{j=1}^{r} (\theta_t^{(j)} - \bar{\theta}_t)(\theta_t^{(j)} - \bar{\theta}_t)^T = P_{t-1}    (26)

P_{\theta y} = \frac{1}{r} \sum_{j=1}^{r} (\theta_t^{(j)} - \bar{\theta}_t)(y_t^{(j)} - \bar{y}_t)    (27)

P_{yy} = \frac{1}{r} \sum_{j=1}^{r} (y_t^{(j)} - \bar{y}_t)^2    (28)

The solution to the statistical linearization problem is thus (see Sec. 2.1):

A_t = P_{\theta y}^T P_{\theta\theta}^{-1} = P_{\theta y}^T P_{t-1}^{-1}    (29)

b_t = \bar{y}_t - A_t \bar{\theta}_t = \bar{y}_t - A_t \theta_{t-1}    (30)

The noise variance induced by the statistical linearization is given by (see again Sec. 2.1):

P_{e_t e_t} = P_{yy} - A_t P_{t-1} A_t^T    (31)

Injecting Eqs. (29) and (31) into Eq. (22) gives (recall also that P_{u_t} = P_{v_t} + P_{e_t e_t}):

K_t = \frac{P_{t-1} A_t^T}{P_{u_t} + A_t P_{t-1} A_t^T}    (32)
    = \frac{P_{t-1} P_{t-1}^{-1} P_{\theta y}}{P_{v_t} + P_{yy} - A_t P_{t-1} A_t^T + A_t P_{t-1} A_t^T}    (33)
    = \frac{P_{\theta y}}{P_{v_t} + P_{yy}}    (34)

Injecting Eqs. (29)-(30) into Eq. (23) gives:

\theta_t = \theta_{t-1} + K_t (y_t - b_t - A_t \theta_{t-1})    (35)
         = \theta_{t-1} + K_t (y_t - \bar{y}_t + A_t \theta_{t-1} - A_t \theta_{t-1})    (36)
         = \theta_{t-1} + K_t (y_t - \bar{y}_t)    (37)

Injecting Eq. (31) into Eq. (24) gives (recall again that P_{u_t} = P_{v_t} + P_{e_t e_t}):

P_t = P_{t-1} - K_t (P_{u_t} + A_t P_{t-1} A_t^T) K_t^T    (38)
    = P_{t-1} - K_t (P_{v_t} + P_{yy}) K_t^T    (39)

Eqs. (34), (37) and (39) define the statistically linearized recursive least-squares (SL-RLS) algorithm:

K_t = \frac{P_{\theta y}}{P_{v_t} + P_{yy}}    (40)

\theta_t = \theta_{t-1} + K_t (y_t - \bar{y}_t)    (41)

P_t = P_{t-1} - K_t (P_{v_t} + P_{yy}) K_t^T    (42)

The last question to answer is how to sample the r points such that \bar{\theta}_t = \theta_{t-1} and P_{\theta\theta} = P_{t-1}.

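Before turning to that question, note that once \bar{y}_t, P_{\theta y} and P_{yy} have been computed from some set of sampled points, the SL-RLS update of Eqs. (40)-(42) amounts to a few lines of code. The following Python/NumPy sketch uses hypothetical names (sl_rls_update and its arguments) and deliberately leaves the sampling scheme to the caller, as in the text above.

```python
import numpy as np

def sl_rls_update(theta, P, y, y_bar, P_ty, P_yy, P_v):
    """One SL-RLS update (Eqs. (40)-(42)).

    theta : (n,) previous estimate theta_{t-1}; P : (n, n) matrix P_{t-1};
    y : scalar observation y_t; P_v : observation noise variance P_{v_t};
    y_bar, P_ty, P_yy : the statistics of Eqs. (25), (27) and (28), computed
    from points sampled with mean theta_{t-1} and covariance P_{t-1}.
    """
    K = P_ty / (P_v + P_yy)                    # gain (Eq. (40))
    theta_new = theta + K * (y - y_bar)        # estimate update (Eq. (41))
    P_new = P - np.outer(K, K) * (P_v + P_yy)  # covariance update (Eq. (42))
    return theta_new, P_new
```

Sec. 4 discusses how these statistics can be obtained, in particular with the unscented transform.
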

4. LINKS TO SPKF FOR PARAMETER ESTIMATION

A first natural idea to sample these r points is to assume a Gaussian distribution of mean \theta_{t-1} and of variance matrix P_{t-1} and to compute the statistics of interest using a Monte Carlo approach. However, more efficient methods exist, notably the unscented transform [3]. It consists in deterministically sampling a set of 2n+1 so-called sigma-points as follows:

\theta_t^{(0)} = \theta_{t-1}    (43)

\theta_t^{(j)} = \theta_{t-1} + \big( \sqrt{(n+\kappa) P_{t-1}} \big)_j, \quad 1 \le j \le n    (44)

\theta_t^{(j)} = \theta_{t-1} - \big( \sqrt{(n+\kappa) P_{t-1}} \big)_{j-n}, \quad n+1 \le j \le 2n    (45)

as well as the associated weights:

w_0 = \frac{\kappa}{n+\kappa} \quad \text{and} \quad w_j = \frac{1}{2(n+\kappa)}, \quad j > 0    (46)

where \kappa is a scaling factor which controls the accuracy of the unscented transform [3] and (\sqrt{(n+\kappa) P_{t-1}})_j is the j-th column of the Cholesky decomposition of the matrix (n+\kappa) P_{t-1}. The image of each of these sigma-points is computed:

y_t^{(j)} = f_{\theta_t^{(j)}}(x_t), \quad 0 \le j \le 2n    (47)

and the statistics of interest are computed as follows:

\bar{y}_t = \sum_{j=0}^{2n} w_j y_t^{(j)}    (48)

P_{\theta y} = \sum_{j=0}^{2n} w_j (\theta_t^{(j)} - \theta_{t-1})(y_t^{(j)} - \bar{y}_t)    (49)

P_{yy} = \sum_{j=0}^{2n} w_j (y_t^{(j)} - \bar{y}_t)^2    (50)

As a non-equiweighted sum can be rewritten as an equiweighted sum by counting some of the terms more than once (assuming that the weights are rational numbers, which is not too strong a hypothesis), the unscented transform can be interpreted as a form of statistical linearization. If the unscented transform is used as the statistical linearization process, then the SL-RLS algorithm, that is Eqs. (40)-(42), is exactly the UKF when no evolution model is considered in the state-space model. In other words, SL-RLS is the UKF for parameter estimation. Similarly, if the scaled unscented transform [12] is used to perform the statistical linearization, SL-RLS is the scaled UKF for parameter estimation. If the statistical linearization is performed using Stirling's interpolation, SL-RLS is the DDF or the CDF for parameter estimation. Generally speaking, depending on the scheme chosen to perform the statistical linearization, SL-RLS is an SPKF for parameter estimation.

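Combining Eqs. (43)-(50) with the update of Eqs. (40)-(42), one step of SL-RLS with the unscented transform (that is, of the UKF for parameter estimation) might look as follows. This is again a Python/NumPy sketch under the same assumptions as before: f(theta, x) is a user-supplied scalar model, sl_rls_update is the hypothetical helper sketched at the end of Sec. 3, and the remaining names are chosen for illustration only.

```python
import numpy as np

def unscented_statistics(theta, P, x, f, kappa=1.0):
    """Compute y_bar, P_ty and P_yy (Eqs. (48)-(50)) from the 2n+1
    sigma-points of Eqs. (43)-(46); f(theta, x) is a scalar model."""
    n = theta.size
    L = np.linalg.cholesky((n + kappa) * P)               # (n + kappa) P = L L^T
    sigma = np.vstack([theta, theta + L.T, theta - L.T])  # (2n+1, n) sigma-points
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))     # weights (Eq. (46))
    w[0] = kappa / (n + kappa)
    Y = np.array([f(s, x) for s in sigma])                # images (Eq. (47))
    y_bar = w @ Y                                         # Eq. (48)
    P_ty = (sigma - theta).T @ (w * (Y - y_bar))          # Eq. (49)
    P_yy = w @ (Y - y_bar) ** 2                           # Eq. (50)
    return y_bar, P_ty, P_yy

# One step of the UKF for parameter estimation on a new sample (x_t, y_t):
# y_bar, P_ty, P_yy = unscented_statistics(theta, P, x_t, f)
# theta, P = sl_rls_update(theta, P, y_t, y_bar, P_ty, P_yy, P_v)
```
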

This interpretation of the SPKF for parameter estimation as a statistically linearized recursive least-squares algorithm allows an alternative proof, simpler and requiring fewer assumptions, of a result of Van der Merwe. This result [4, Ch. 4.5] states that the SPKF for parameter estimation is equivalent to a MAP estimate of the underlying parameters under a Gaussian posterior and noise distribution assumption.

Theorem 1 (The SL-RLS estimate is a MAP estimate). Assume that the prior and noise distributions are Gaussian. Then the statistically linearized recursive least-squares estimate is equivalent to the maximum a posteriori estimate.

Proof. By construction, with priors defined by \theta_0 and P_0, the SL-RLS estimate minimizes the following regularized cost function:

\theta_t = \operatorname*{argmin}_{\theta} \Big( \sum_{i=1}^{t} \frac{(y_i - (A_i \theta + b_i))^2}{P_{v_i} + P_{e_i e_i}} + (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big)    (51)
         = \operatorname*{argmin}_{\theta} \Big( \sum_{i=1}^{t} \frac{(y_i - f_\theta(x_i))^2}{P_{v_i}} + (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big)    (52)

On the other hand, the MAP estimator is defined, using Bayes' rule, as:

\theta_t^{MAP} = \operatorname*{argmax}_{\theta} p(\theta \mid y_{1:t}) = \operatorname*{argmax}_{\theta} \frac{p(y_{1:t} \mid \theta) \, p(\theta)}{p(y_{1:t})}    (53)

As the observation noise is white, the joint likelihood is the product of the local likelihoods, and the probability p(y_{1:t}) does not depend on \theta, so:

\theta_t^{MAP} = \operatorname*{argmax}_{\theta} \Big( \prod_{i=1}^{t} p(y_i \mid \theta) \Big) p(\theta)    (54)

The prior and noise distributions are assumed to be Gaussian, thus:

p(\theta) \propto \exp\Big( -\frac{1}{2} (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big)    (55)

p(y_i \mid \theta) \propto \exp\Big( -\frac{(y_i - f_\theta(x_i))^2}{2 P_{v_i}} \Big)    (56)

Finally, maximizing a product of probability distributions is equivalent to minimizing the sum of the negatives of their logarithms, which gives the result:

\theta_t^{MAP} = \operatorname*{argmin}_{\theta} \Big( \sum_{i=1}^{t} \frac{(y_i - f_\theta(x_i))^2}{P_{v_i}} + (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big) = \theta_t    (57)

This alternative proof is shorter than the original one. Above all, it does not assume that the posterior distribution is Gaussian, which is a very strong assumption for a nonlinear observation model.

5. PERSPECTIVES

In this article, a statistically linearized recursive least-squares algorithm has been introduced and shown to be actually the SPKF for parameter estimation algorithm. This gives new insight into sigma-point Kalman filters by showing that they are generalizations of a statistically linearized least-squares approach. This new point of view allowed an alternative proof of a result stating that, without an evolution model, the SPKF estimate is the maximum a posteriori estimate. The proof proposed in Sec. 4 is shorter and, above all, it does not assume a Gaussian posterior, which is a very strong hypothesis in the case of a nonlinear observation model.

The technique of statistical linearization can be applied to much more general problems than the L2 minimization addressed in this paper. The fact that statistically linearized recursive least-squares is indeed a special form of sigma-point Kalman filtering tends to justify this approach. Interesting perspectives include, but are not limited to, the application of this general statistical linearization to L1 minimization (e.g., [13]), L1 regularization (e.g., [14]) or fixed-point approximation (e.g., [15, 16]).

6. REFERENCES

[1] S. J. Julier and J. K. Uhlmann, "A new extension of the Kalman filter to nonlinear systems," in Int. Symp. Aerospace/Defense Sensing, Simulation and Controls, 1997.
[2] D. Simon, Optimal State Estimation: Kalman, H-Infinity, and Nonlinear Approaches, Wiley & Sons, August 2006.
[3] S. J. Julier and J. K. Uhlmann, "Unscented filtering and nonlinear estimation," Proceedings of the IEEE, vol. 92, no. 3, pp. 401-422, 2004.
[4] R. van der Merwe, Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, Ph.D. thesis, OGI School of Science & Engineering, Oregon Health & Science University, Portland, OR, USA, 2004.
[5] M. Nørgaard, N. Poulsen, and O. Ravn, "New Developments in State Estimation for Nonlinear Systems," Automatica, 2000.
[6] K. Ito and K. Xiong, "Gaussian Filters for Nonlinear Filtering Problems," IEEE Transactions on Automatic Control, vol. 45, no. 5, pp. 910-927, 2000.
[7] S. Haykin, Kalman Filtering and Neural Networks, Wiley, 2001.
[8] M. Geist, O. Pietquin, and G. Fricout, "Kalman Temporal Differences: the deterministic case," in IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, TN, USA, April 2009.
[9] T. Lefebvre, H. Bruyninckx, and J. De Schutter, "Comments on 'A New Method for the Nonlinear Transformation of Means and Covariances in Filters and Estimators'," IEEE Transactions on Automatic Control, vol. 47, no. 8, pp. 1406-1409, 2002.
[10] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, 1984.
[11] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, USA, 1995.
[12] S. J. Julier, "The scaled unscented transformation," in American Control Conference, 2002, vol. 6, pp. 4555-4559.
[13] G. O. Wesolowsky, "A New Descent Algorithm for the Least Absolute Value Regression Problem," Communications in Statistics - Simulation and Computation, no. 5, pp. 479-491, 1981.
[14] R. Tibshirani, "Regression Shrinkage and Selection via the LASSO," Journal of the Royal Statistical Society, Series B (Methodological), pp. 267-288, 1996.
[15] S. J. Bradtke and A. G. Barto, "Linear Least-Squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 1-3, pp. 33-57, 1996.
[16] A. Nedić and D. P. Bertsekas, "Least Squares Policy Evaluation Algorithms with Linear Function Approximation," Discrete Event Dynamic Systems: Theory and Applications, vol. 13, pp. 79-110, 2003.