STATISTICALLY LINEARIZED RECURSIVE LEAST SQUARES. Matthieu Geist and Olivier Pietquin. IMS Research Group Supélec, Metz, France

Size: px

Start display at page:

Download "STATISTICALLY LINEARIZED RECURSIVE LEAST SQUARES. Matthieu Geist and Olivier Pietquin. IMS Research Group Supélec, Metz, France"

Alaina Hardy
5 years ago
Views:

1 SAISICALLY LINEARIZED RECURSIVE LEAS SQUARES Mattheu Gest and Olver Petqun IMS Research Group Supélec, Metz, France ABSRAC hs artcle proposes a new nterpretaton of the sgmapont kalman flter SPKF for parameter estmaton as beng a statstcally lnearzed recursve least-squares algorthm. hs gves new nsght on the SPKF for parameter estmaton and partcularly ths provdes an alternatve proof for a result of Van der Merwe. On the other hand, t legtmates the use of statstcal lnearzaton and suggests many ways to use t for parameter estmaton, not necessarly n a least-squares sens. Index erms Recursve least-squares, statstcal lnearzaton, parameter estmaton.. INRODUCION he Unscented Kalman Flter UKF [] has recently been ntroduced as an effcent dervatve-free alternatve to the Extended Kalman Flter EKF [2] for the nonlnear flterng problem. he basc dea behnd UKF s that t s easer to approxmate an arbtrary random varable rather than an arbtrary nonlnear functon. It uses an approxmaton scheme, the so-called unscented transform U [3], to approxmate the statstcs of nterest nvolved by Kalman equatons the flter beng seen as the optmal lnear state estmator mnmzng the expected mean-square error condtoned on past observatons. More generally, a Kalman flter for whch statstcs of nterest are computed by approxmatng the random varable rather than the nonlnear functon s called a Sgma-Pont Kalman Flter SPKF [4]. UKF, but also Dvded Dfference Flter DVF [5] or Central Dfference Flter CDF [6] for example, belong to the SPKF famly. A specal form of SPKF, whch s the case of nterest of ths paper, s SPKF for parameter estmaton [4]. In ths settng, the am s to estmate a set of statonary parameters nstead of trackng a hdden state. It s a smpler case than the general SPKF, because the evoluton equaton of the correspondng state-space model s at most a random walk. However, t s a case of nterest, notably for the Machne Learnng communty, as t s an effcent dervatve-free learnng method provdng an uncertanty nformaton whch can be useful e.g., for actve learnng or the exploraton/explotaton dlemma n renforcement learnng. SPKF for parameter estmaton has been used successfully for supervsed learnng [7] and even for renforcement learnng [8]. In ths artcle, t s shown how the SPKF for parameter estmaton can be obtaned from a least-squares perspectve usng statstcal lnearzaton. hs gves new nsghts by lnkng recursve least-squares to SPKF and suggests that statstcal lnearzaton can provde useful for other optmzaton problems than pure L 2 mnmzaton. An nterpretaton of general UKF as performng a statstcal lnearzaton has been proposed before [9]: the state-space formulaton of the flterng problem s statstcally lnearzed, whch s qute dfferent from the proposed least-squares-based approach. Sec. 2 ntroduce some necessary prelmnares about statstcal lnearzaton and recursve least-squares. Sec. 3 provdes the dervaton of the proposed statstcally lnearzed recursve least-squares approach. Sec. 4 shows that the proposed method actually allows to obtan any form of SPKF for parameter estmaton. It s also shown that the proposed approach can be seen as an alternatve proof for a result of [4], statng that SPKF for parameter estmaton s a maxmum a posteror MAP estmator. Sec. 5 proposes some perspectves of ths work. 2. PRELIMINARIES In ths secton statstcal lnearzaton and lnear recursve least-squares are brefly remnded. For ease of notatons, a scalar output s assumed through ths artcle, however the presented results extend easly to the vectoral case. 2.. Statstcal lnearzaton Let g : x R n y = gx R be a nonlnear functon. Assume that t s evaluated n r ponts x, y =

2 gx. he followng statstcs are defned : x = r P xx = r P xy = r P yy = r x, ȳ = r y x xx x 2 x xy ȳ 3 y ȳ 2 4 Statstcal lnearzaton conssts n lnearzng y = gx around x by adoptng a statstcal pont of vew. It fnds a lnear model y = Ax + b by mnmzng the sum of squared errors between values of nonlnear an lnearzed functons n the regresson ponts: mn A,b e e wth e = y Ax + b 5 he soluton of Eq. 5 s gven by [0]: A = P yx P xx b = ȳ A x 6 Moreover, t s easy to check that the covarance matrx of the error s gven by: P ee = r e e 7 = P yy AP xx A 8 For now, how to choose regresson ponts has not been dscussed. hs s left for Sec Lnear recursve least-squares Assume the followng lnear observaton model: y = x + v 9 where x R n, y R and v s a whte observaton nose of varance P vv. Least-squares approach seeks at estmatng the n parameter vector by mnmzng the squared error among observed samples x, y,..., x, y : LS = argmn J, J = P vv y x 2 0 he least-squares soluton s classcally obtaned by zerong the gradent of the cost functon J, whch gves the least-squares LS estmate: LS = x x P vv P vv x y he parameter vector can be estmated onlne by beng updated for each new observaton. For ths, the matrx P = P vv x x s computed recursvely by usng the Sherman-Morrson formula: P = P P x x P P vv + x P x 2 By nectng Eq. 2 nto Eq. and by assumng that some prors 0 and P 0 are chosen, the recursve leastsquares RLS algorthm s obtaned: K = RLS P x P vv + x P 3 x = RLS + K y RLS x 4 P = P K Pvv + x P x K 5 hs RLS formulaton provdes useful for the statstcal lnearzaton of nonlnear least-squares presented n the next secton. Notce that ths estmator does not mnmze J, but a regularzed verson of ths cost functon: RLS = argmn J + 0 P From a Bayesan pont of vew, the LS estmate can be seen as a maxmum lkelhood ML estmate whereas the RLS estmate can be seen as a maxmum a posteror MAP estmate. From now, ths dfference s no longer specfed, as t s clear from the context batch or recursve estmaton. 3. SAISICALLY LINEARIZED RECURSIVE LEAS-SQUARES Assume the followng nonlnear observaton model: y = f x + v 7 where x R n, y R, v s a whte observaton nose of varance P vv and f s a parametrc functon approxmator of nterest for example an artfcal neural network for whch specfes synaptc weghts []. For a set of observed samples, the least-squares soluton s gven by: = argmn P vv y f x 2 8 o address ths nonlnear least-squares problem, the nonlnear observaton model s statstcally lnearzed see Sec. 2.: y = A + b + e + v 9

3 At ths pont, t should be noted that a set of ponts has to be sampled so as to perform statstcal lnearzaton, that s to compute A, b and e. For now, ths s left as an open queston, ths problem beng addressed later. Let u = e +v be the nose assocated to observaton model 9. Noses v and e beng ndependent, the varance of u s gven by P uu = P vv +P ee. Observaton models 7 and 9 beng equvalent, the least-squares soluton can be rewrtten as: = argmn = P uu y A + b 20 A A P uu P uu A y b 2 Usng the Sherman-Morrson formula, a recursve formulaton of ths estmaton can be obtaned see Sec. 2.2: P A K = P uu + A P A 22 = + K y b A 23 P = P K Puu + A P A K 24 he problem of choosng a specfc statstcal lnearzaton s now addressed. Wth the recursve formulaton, and P are known, and the ssue s to compute A and b. A frst thng s to choose around what pont to lnearze and wth whch magntude. Recall that the prevous estmate s known. Moreover, the matrx P can be nterpreted as the varance matrx assocated to. It s thus legtmate to sample r ponts P, y = f x such that = and = P. he followng statstcs are thus avalable how to sample these r ponts s dscussed n Sec. 4: P P y P yy = = r = P = r = r = r y, ȳ = r y y ȳ 27 ȳ 2 28 he soluton to the statstcal lnearzaton problem s thus see Sec. 2.: A = P y P = P y P 29 b = ȳ A = ȳ A 30 he nose varance nduced by the statstcal lnearzaton s gven by see agan Sec. 2.: P ee = P yy A P A 3 Inectng Eq. 29 and 3 nto Eq. 22 gves recall also that P uu = P vv + P ee : P A K = P uu + A P A 32 P Py P = P vv + P yy A P A + A P A 33 P y = P vv + P yy 34 Inectng Eq nto Eq. 23 gves: = + K y b A 35 = + K y ȳ A A 36 = + K y ȳ 37 Inectng Eq. 3 nto Eq. 24 gves recall agan that P uu = P vv + P ee : P = P K Puu + A P A K 38 = P K P vv + P yy K 39 Eq. 34, 37 and 39 defne the statstcally lnearzed recursve least squares SL-RLS algorthm: K = P y P vv + P yy 40 = + K y ȳ 4 P = P K P vv + P yy K 42 he last queston to answer s how to sample the r ponts such that = and P = P. 4. LINKS O SPKF FOR PARAMEER ESIMAION A frst natural dea to sample these r ponts s to assume a Gaussan dstrbuton of mean and of varance matrx P and to compute statstcs of nterest usng a Monte Carlo approach. However, more effcent methods exst, notably the unscented transform [3]. It conssts n determnstcally samplng a set of 2n + so-called sgma-ponts as follows: = = 0 43 = + n + κp n 44 = n + κp + 2n 45

4 as well as assocated weghts: w 0 = κ n + κ and w = > n + κ where κ s a scalng factor whch controls the accuracy of n the unscented transform [3] and + κp s the th column of the Cholesky decomposton of the matrx n + κp. he mage of each of these sgma-ponts s computed: y = f x, 47 and statstcs of nterest are computed as follows: ȳ = P y = P yy = =0 =0 =0 w y 48 w y ȳ 49 w y ȳ 2 50 As a non-equweghted sum can be rewrtten as an equweghted sum by consderng some of the terms more than one tme by assumng that weghts are ratonal numbers, whch s not a too strong hypothess, the unscented transform can be nterpreted as a form of statstcal lnearzaton. If the unscented transform s consdered as the statstcal lnearzaton process, than the SL-RLS algorthm, that s Eq , s exactly the UKF when no evoluton model s consdered n the state-space model. In other words, SL-RLS s the UKF for parameter estmaton. In a smlar way, f the scaled unscented transform [2] s used to perform statstcal lnearzaton, SL-RLS s the scaled UKF for parameter estmaton. If the statstcal lnearzaton s performed usng a Sterlng s nterpolaton, SL-RLS s DDF or CDF for parameter estmaton. So, generally speakng, dependng on the scheme chosen to perform statstcal lnearzaton, SL-RLS s SPKF for parameter estmaton. hs nterpretaton of the SPKF for parameter estmaton as beng a statstcally lnearzed recursve leastsquares algorthm allows to provde an alternatve, smpler and requrng less assumptons proof of a result of Van der Merwe. hs result [4, Ch. 4.5.] states that the SPKF for parameter estmaton algorthm s equvalent to a MAP estmate of the underlyng parameters under a Gaussan posteror and nose dstrbuton assumpton. heorem SL-RLS estmate s a MAP estmate. Assume that pror and nose dstrbutons are Gaussan. hen the statcally lnearzed recursve leastsquares estmate s equvalent to the maxmum a posteror estmate. Proof. By constructon, wth prors defned by 0 and P 0, the SL-RLS estmate mnmzes the followng regularzed cost functon: = argmn y A + b 2 P =0 vv + P ee + 0 P 0 0 = argmn y f x 2 P =0 vv + 0 P On the other hand, the MAP estmator s defned as and usng the Bayes rule: MAP py : p = argmax p y : = argmax py : 53 As the observaton nose s whte, the ont lkelhood s the product of local lkelhoods, and the probablty py : does not depend on, so: MAP = argmax p pr 54 Pror and nose dstrbutons are assumed to be Gaussan, thus: p exp 2 0 P py exp y f x P vv Fnally, maxmzng a product of probablty dstrbutons s equvalent to mnmzng the sum of the negatves of ther logarthms, whch gves the result: MAP = argmn y f x 2 P =0 vv + 0 P 0 0 = 57 hs alternatve proof s shorter than the orgnal one. But above all t does not assume that the posteror dstrbuton s Gaussan, whch s a very strong assumpton for a nonlnear observaton model. 5. PERSPECIVES In ths artcle a statstcally lnearzed recursve leastsquares algorthm has been ntroduced. It has been

5 shown to be actually the SPKF for parameter estmaton algorthm. hs gves new nsghts to sgma-pont Kalman flters by showng that they are generalzatons of a statstcally lnearzed least-squares approach. hs new pont of vew allowed to provde an alternatve proof of a result statng that wthout evoluton model, SPKF estmate s the maxmum a posteror estmate. he proof proposed n Sec. 4 s shorter and above all t does not assume a Gaussan posteror, whch s a very strong hypothess n the case of a nonlnear evoluton model. he technque of statstcal lnearzaton can be appled n much more general problems than the L 2 mnmzaton addressed n ths paper. he fact that statstcally lnearzed recursve least-squares s ndeed a specal form of sgma-pont Kalman flterng tends to ustfy ths approach. Interestng perspectves can be but are not lmted to the applcaton of ths general statstcal lnearzaton to L mnmzaton e.g., [3], L regularzaton e.g., [4] or fxed-pont approxmaton e.g., [5, 6]. 6. REFERENCES [] Smon J. Juler and Jeffrey K. Uhlmann, A new extenson of the Kalman flter to nonlnear systems, n Int. Symp. Aerospace/Defense Sensng, Smul. and Controls 3, 997. [2] Dan Smon, Optmal State Estmaton: Kalman, H Infnty, and Nonlnear Approaches, Wley & Sons, August [3] S. J. Juler and J. K. Uhlmann, Unscented flterng and nonlnear estmaton, Proceedngs of the IEEE, vol. 92, no. 3, pp , [4] Rudolph van der Merwe, Sgma-Pont Kalman Flters for Probablstc Inference n Dynamc State- Space Models, Ph.D. thess, OGI School of Scence & Engneerng, Oregon Health & Scence Unversty, Portland, OR, USA, [8] Mattheu Gest, Olver Petqun, and Gabrel Frcout, Kalman emporal Dfferences: the determnstc case, n IEEE Internatonal Symposum on Adaptve Dynamc Programmng and Renforcement Learnng ADPRL 2009, Nashvlle, N, USA, Aprl [9] ne Lefebvre, Herman Bruynnckx, and Jors De Shutter, Comments on A New Method for the Nonlnear ransformaton of Means and Covarances n Flters and Estmators, IEEE ransactons on Automatc Control, vol. 47, no. 8, pp , [0]. W. Anderson, An Introducton to Multvarate Statstcal Analyss, Wley, 984. [] Chrstopher M. Bshop, Neural Networks for Pattern Recognton, Oxford Unversty Press, New York, USA, 995. [2] Smon J. Juler, he scaled unscented transformaton, n Amercan Control Conference, 2002, vol. 6, pp [3] G.O. Wesolowsky, New Descent Algorthm for the Least Absolute Value Regresson Problem, Communcatons n Statstcs - Smulaton and Computaton,, no. 5, pp , 98. [4] R. bshran, Regresson Shrnkage and Selecton va the LASSO, Journal of the Royal Statstcal Socety. Seres B Methodologcal, pp , 996. [5] Steven J. Bradtke and Andrew G. Barto, Lnear Least-Squares algorthms for temporal dfference learnng, Machne Learnng, vol. 22, no. -3, pp , 996. [6] A. Nedć and D. P. Bertsekas, Least Squares Polcy Evaluaton Algorthms wth Lnear Functon Approxmaton, Dscrete Event Dynamc Systems: heory and Applcatons, vol. 3, pp. 79 0, [5] M. Noorgaard, N. Poulsen, and O. Ravn, New Developments n State Estmaton for Nonlnear Systems, Automatca, [6] K. Ito and K. Xong, Gaussan Flters for Nonlnear Flterng Problems, IEEE ransactons on Automatc Control, vol. 25, no. 5, pp , [7] Smon Haykn, Kalman Flterng and Neural Networks, Wley, 200.

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set