STATISTICALLY LINEARIZED RECURSIVE LEAST SQUARES. Matthieu Geist and Olivier Pietquin. IMS Research Group, Supélec, Metz, France

ABSTRACT

This article proposes a new interpretation of the sigma-point Kalman filter (SPKF) for parameter estimation as a statistically linearized recursive least-squares algorithm. This gives new insight into the SPKF for parameter estimation and, in particular, provides an alternative proof for a result of Van der Merwe. It also legitimates the use of statistical linearization and suggests many ways to use it for parameter estimation, not necessarily in a least-squares sense.

Index Terms: Recursive least-squares, statistical linearization, parameter estimation.

1. INTRODUCTION

The Unscented Kalman Filter (UKF) [1] has recently been introduced as an efficient derivative-free alternative to the Extended Kalman Filter (EKF) [2] for the nonlinear filtering problem. The basic idea behind the UKF is that it is easier to approximate an arbitrary random variable than an arbitrary nonlinear function. It uses an approximation scheme, the so-called unscented transform (UT) [3], to approximate the statistics of interest involved in the Kalman equations (the filter being seen as the optimal linear state estimator minimizing the expected mean-square error conditioned on past observations). More generally, a Kalman filter for which the statistics of interest are computed by approximating the random variable rather than the nonlinear function is called a Sigma-Point Kalman Filter (SPKF) [4]. The UKF, but also the Divided Difference Filter (DDF) [5] or the Central Difference Filter (CDF) [6] for example, belong to the SPKF family.

A special form of SPKF, which is the case of interest of this paper, is the SPKF for parameter estimation [4]. In this setting, the aim is to estimate a set of stationary parameters instead of tracking a hidden state. It is a simpler case than the general SPKF, because the evolution equation of the corresponding state-space model is at most a random walk. However, it is a case of interest, notably for the machine learning community, as it is an efficient derivative-free learning method providing uncertainty information, which can be useful, e.g., for active learning or for the exploration/exploitation dilemma in reinforcement learning. The SPKF for parameter estimation has been used successfully for supervised learning [7] and even for reinforcement learning [8].

In this article, it is shown how the SPKF for parameter estimation can be obtained from a least-squares perspective using statistical linearization. This gives new insights by linking recursive least-squares to the SPKF, and it suggests that statistical linearization can prove useful for optimization problems other than pure L2 minimization. An interpretation of the general UKF as performing a statistical linearization has been proposed before [9]: there, the state-space formulation of the filtering problem is statistically linearized, which is quite different from the least-squares-based approach proposed here.

Sec. 2 introduces some necessary preliminaries about statistical linearization and recursive least-squares. Sec. 3 provides the derivation of the proposed statistically linearized recursive least-squares approach. Sec. 4 shows that the proposed method actually allows obtaining any form of SPKF for parameter estimation. It is also shown that the proposed approach yields an alternative proof for a result of [4], stating that the SPKF for parameter estimation is a maximum a posteriori (MAP) estimator. Sec. 5 proposes some perspectives of this work.

2. PRELIMINARIES

In this section, statistical linearization and linear recursive least-squares are briefly reviewed. For ease of notation, a scalar output is assumed throughout this article; however, the presented results extend easily to the vectorial case.

2.1. Statistical linearization

Let g : x \in \mathbb{R}^n \mapsto y = g(x) \in \mathbb{R} be a nonlinear function. Assume that it is evaluated at r points (x_j, y_j = g(x_j)), 1 \le j \le r. The following statistics are defined:

\bar{x} = \frac{1}{r} \sum_{j=1}^{r} x_j, \qquad \bar{y} = \frac{1}{r} \sum_{j=1}^{r} y_j    (1)

P_{xx} = \frac{1}{r} \sum_{j=1}^{r} (x_j - \bar{x})(x_j - \bar{x})^T    (2)

P_{xy} = \frac{1}{r} \sum_{j=1}^{r} (x_j - \bar{x})(y_j - \bar{y})    (3)

P_{yy} = \frac{1}{r} \sum_{j=1}^{r} (y_j - \bar{y})^2    (4)

Statistical linearization consists in linearizing y = g(x) around \bar{x} by adopting a statistical point of view. It finds a linear model y = Ax + b by minimizing the sum of squared errors between the values of the nonlinear and the linearized functions at the regression points:

\min_{A,b} \sum_{j=1}^{r} e_j^T e_j \quad \text{with} \quad e_j = y_j - (A x_j + b)    (5)

The solution of Eq. (5) is given by [10]:

A = P_{yx} P_{xx}^{-1}, \qquad b = \bar{y} - A \bar{x}    (6)

Moreover, it is easy to check that the covariance matrix of the error is given by:

P_{ee} = \frac{1}{r} \sum_{j=1}^{r} e_j e_j^T    (7)
       = P_{yy} - A P_{xx} A^T    (8)

For now, how to choose the regression points has not been discussed. This is left for Sec. 3.

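As an illustration of Eqs. (1)-(8), the following is a minimal sketch of the statistical linearization step in Python/NumPy. It assumes a scalar output, and the function and variable names (statistical_linearization, X, y) are hypothetical, introduced here only for illustration; this sketch is not part of the original algorithm description.

```python
import numpy as np

def statistical_linearization(X, y):
    """Statistically linearize y = g(x) from r regression points.

    X : (r, n) array of inputs x_j; y : (r,) array of outputs y_j = g(x_j).
    Returns A (length-n array, the row A of Eq. (6)), b (scalar, Eq. (6))
    and the residual variance P_ee (scalar, Eq. (8)).
    """
    r = len(y)
    x_bar = X.mean(axis=0)
    y_bar = y.mean()
    dX = X - x_bar                      # centered inputs
    dy = y - y_bar                      # centered outputs
    P_xx = dX.T @ dX / r                # Eq. (2)
    P_xy = dX.T @ dy / r                # Eq. (3)
    P_yy = dy @ dy / r                  # Eq. (4)
    A = np.linalg.solve(P_xx, P_xy)     # A^T = P_xx^{-1} P_xy, i.e. A = P_yx P_xx^{-1}
    b = y_bar - A @ x_bar               # Eq. (6)
    P_ee = P_yy - A @ P_xx @ A          # Eq. (8), scalar because the output is scalar
    return A, b, P_ee
```

The same computation is reused in Sec. 3, with the regression points taken in parameter space.
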
2.2. Linear recursive least-squares

Assume the following linear observation model:

y_i = x_i^T \theta + v_i    (9)

where x_i \in \mathbb{R}^n, y_i \in \mathbb{R} and v_i is a white observation noise of variance P_{v_i}. The least-squares approach estimates the parameter vector \theta \in \mathbb{R}^n by minimizing the squared error over the observed samples (x_1, y_1), ..., (x_t, y_t):

\theta_t^{LS} = \operatorname*{argmin}_{\theta} J_t(\theta), \qquad J_t(\theta) = \sum_{i=1}^{t} P_{v_i}^{-1} (y_i - x_i^T \theta)^2    (10)

The least-squares solution is classically obtained by zeroing the gradient of the cost function J_t(\theta), which gives the least-squares (LS) estimate:

\theta_t^{LS} = \Big( \sum_{i=1}^{t} P_{v_i}^{-1} x_i x_i^T \Big)^{-1} \sum_{i=1}^{t} P_{v_i}^{-1} x_i y_i    (11)

The parameter vector can be estimated online by updating it for each new observation. For this, the matrix P_t = ( \sum_{i=1}^{t} P_{v_i}^{-1} x_i x_i^T )^{-1} is computed recursively using the Sherman-Morrison formula:

P_t = P_{t-1} - \frac{P_{t-1} x_t x_t^T P_{t-1}}{P_{v_t} + x_t^T P_{t-1} x_t}    (12)

By injecting Eq. (12) into Eq. (11) and by assuming that some priors \theta_0 and P_0 are chosen, the recursive least-squares (RLS) algorithm is obtained:

K_t = \frac{P_{t-1} x_t}{P_{v_t} + x_t^T P_{t-1} x_t}    (13)

\theta_t^{RLS} = \theta_{t-1}^{RLS} + K_t (y_t - x_t^T \theta_{t-1}^{RLS})    (14)

P_t = P_{t-1} - K_t (P_{v_t} + x_t^T P_{t-1} x_t) K_t^T    (15)

This RLS formulation proves useful for the statistical linearization of nonlinear least-squares presented in the next section. Notice that this estimator does not minimize J_t(\theta), but a regularized version of this cost function:

\theta_t^{RLS} = \operatorname*{argmin}_{\theta} \Big( J_t(\theta) + (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big)    (16)

From a Bayesian point of view, the LS estimate can be seen as a maximum likelihood (ML) estimate, whereas the RLS estimate can be seen as a maximum a posteriori (MAP) estimate. From now on, this difference is no longer specified, as it is clear from the context (batch or recursive estimation).

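As an illustration of Eqs. (13)-(15), here is a minimal sketch of one RLS update in Python/NumPy. The name rls_update and its argument names are hypothetical; the sketch assumes the scalar-output model (9).

```python
import numpy as np

def rls_update(theta, P, x, y, P_v):
    """One recursive least-squares update (Eqs. (13)-(15)).

    theta : (n,) current estimate; P : (n, n) associated matrix P_{t-1};
    (x, y) : new sample, x of shape (n,) and y scalar;
    P_v : observation noise variance.
    """
    s = P_v + x @ P @ x                      # innovation variance
    K = P @ x / s                            # gain (Eq. (13))
    theta_new = theta + K * (y - x @ theta)  # correction (Eq. (14))
    P_new = P - np.outer(K, K) * s           # matrix update (Eq. (15))
    return theta_new, P_new
```

Starting from the priors theta_0 and P_0 and applying this update to each sample in turn yields the regularized estimate of Eq. (16).
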

3. STATISTICALLY LINEARIZED RECURSIVE LEAST-SQUARES

Assume the following nonlinear observation model:

y_i = f_\theta(x_i) + v_i    (17)

where x_i \in \mathbb{R}^n, y_i \in \mathbb{R}, v_i is a white observation noise of variance P_{v_i}, and f_\theta is a parametric function approximator of interest (for example an artificial neural network for which \theta specifies the synaptic weights [11]). For a set of t observed samples, the least-squares solution is given by:

\theta_t = \operatorname*{argmin}_{\theta} \sum_{i=1}^{t} P_{v_i}^{-1} (y_i - f_\theta(x_i))^2    (18)

To address this nonlinear least-squares problem, the nonlinear observation model is statistically linearized (see Sec. 2.1):

y_i = A_i \theta + b_i + e_i + v_i    (19)

At this point, it should be noted that a set of points has to be sampled in order to perform the statistical linearization, that is, to compute A_i, b_i and e_i. For now, this is left as an open question; this problem is addressed later. Let u_i = e_i + v_i be the noise associated with observation model (19). The noises v_i and e_i being independent, the variance of u_i is given by P_{u_i} = P_{v_i} + P_{e_i e_i}. Observation models (17) and (19) being equivalent, the least-squares solution can be rewritten as:

\theta_t = \operatorname*{argmin}_{\theta} \sum_{i=1}^{t} P_{u_i}^{-1} \big( y_i - (A_i \theta + b_i) \big)^2    (20)
         = \Big( \sum_{i=1}^{t} P_{u_i}^{-1} A_i^T A_i \Big)^{-1} \sum_{i=1}^{t} P_{u_i}^{-1} A_i^T (y_i - b_i)    (21)

Using the Sherman-Morrison formula, a recursive formulation of this estimate can be obtained (see Sec. 2.2):

K_t = \frac{P_{t-1} A_t^T}{P_{u_t} + A_t P_{t-1} A_t^T}    (22)

\theta_t = \theta_{t-1} + K_t (y_t - b_t - A_t \theta_{t-1})    (23)

P_t = P_{t-1} - K_t (P_{u_t} + A_t P_{t-1} A_t^T) K_t^T    (24)

The problem of choosing a specific statistical linearization is now addressed. With the recursive formulation, \theta_{t-1} and P_{t-1} are known, and the issue is to compute A_t and b_t. A first thing is to choose around what point to linearize and with which magnitude. Recall that the previous estimate \theta_{t-1} is known. Moreover, the matrix P_{t-1} can be interpreted as the variance matrix associated with \theta_{t-1}. It is thus legitimate to sample r points (\theta_t^{(j)}, y_t^{(j)} = f_{\theta_t^{(j)}}(x_t)) such that \bar{\theta}_t = \theta_{t-1} and P_{\theta\theta} = P_{t-1}. The following statistics are thus available (how to sample these r points is discussed in Sec. 4):

\bar{\theta}_t = \frac{1}{r} \sum_{j=1}^{r} \theta_t^{(j)} = \theta_{t-1}, \qquad \bar{y}_t = \frac{1}{r} \sum_{j=1}^{r} y_t^{(j)}    (25)

P_{\theta\theta} = \frac{1}{r} \sum_{j=1}^{r} (\theta_t^{(j)} - \bar{\theta}_t)(\theta_t^{(j)} - \bar{\theta}_t)^T = P_{t-1}    (26)

P_{\theta y} = \frac{1}{r} \sum_{j=1}^{r} (\theta_t^{(j)} - \bar{\theta}_t)(y_t^{(j)} - \bar{y}_t)    (27)

P_{yy} = \frac{1}{r} \sum_{j=1}^{r} (y_t^{(j)} - \bar{y}_t)^2    (28)

The solution to the statistical linearization problem is thus (see Sec. 2.1):

A_t = P_{\theta y}^T P_{\theta\theta}^{-1} = P_{\theta y}^T P_{t-1}^{-1}    (29)

b_t = \bar{y}_t - A_t \bar{\theta}_t = \bar{y}_t - A_t \theta_{t-1}    (30)

The noise variance induced by the statistical linearization is given by (see again Sec. 2.1):

P_{e_t e_t} = P_{yy} - A_t P_{t-1} A_t^T    (31)

Injecting Eqs. (29) and (31) into Eq. (22) gives (recall also that P_{u_t} = P_{v_t} + P_{e_t e_t}):

K_t = \frac{P_{t-1} A_t^T}{P_{u_t} + A_t P_{t-1} A_t^T}    (32)
    = \frac{P_{t-1} P_{t-1}^{-1} P_{\theta y}}{P_{v_t} + P_{yy} - A_t P_{t-1} A_t^T + A_t P_{t-1} A_t^T}    (33)
    = \frac{P_{\theta y}}{P_{v_t} + P_{yy}}    (34)

Injecting Eqs. (29)-(30) into Eq. (23) gives:

\theta_t = \theta_{t-1} + K_t (y_t - b_t - A_t \theta_{t-1})    (35)
         = \theta_{t-1} + K_t (y_t - \bar{y}_t + A_t \theta_{t-1} - A_t \theta_{t-1})    (36)
         = \theta_{t-1} + K_t (y_t - \bar{y}_t)    (37)

Injecting Eq. (31) into Eq. (24) gives (recall again that P_{u_t} = P_{v_t} + P_{e_t e_t}):

P_t = P_{t-1} - K_t (P_{u_t} + A_t P_{t-1} A_t^T) K_t^T    (38)
    = P_{t-1} - K_t (P_{v_t} + P_{yy}) K_t^T    (39)

Eqs. (34), (37) and (39) define the statistically linearized recursive least-squares (SL-RLS) algorithm:

K_t = \frac{P_{\theta y}}{P_{v_t} + P_{yy}}    (40)

\theta_t = \theta_{t-1} + K_t (y_t - \bar{y}_t)    (41)

P_t = P_{t-1} - K_t (P_{v_t} + P_{yy}) K_t^T    (42)

The last question to answer is how to sample the r points such that \bar{\theta}_t = \theta_{t-1} and P_{\theta\theta} = P_{t-1}.

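Before turning to that question, note that once \bar{y}_t, P_{\theta y} and P_{yy} have been computed from some set of sampled points, the SL-RLS update of Eqs. (40)-(42) amounts to a few lines of code. The following Python/NumPy sketch uses hypothetical names (sl_rls_update and its arguments) and deliberately leaves the sampling scheme to the caller, as in the text above.

```python
import numpy as np

def sl_rls_update(theta, P, y, y_bar, P_ty, P_yy, P_v):
    """One SL-RLS update (Eqs. (40)-(42)).

    theta : (n,) previous estimate theta_{t-1}; P : (n, n) matrix P_{t-1};
    y : scalar observation y_t; P_v : observation noise variance P_{v_t};
    y_bar, P_ty, P_yy : the statistics of Eqs. (25), (27) and (28), computed
    from points sampled with mean theta_{t-1} and covariance P_{t-1}.
    """
    K = P_ty / (P_v + P_yy)                    # gain (Eq. (40))
    theta_new = theta + K * (y - y_bar)        # estimate update (Eq. (41))
    P_new = P - np.outer(K, K) * (P_v + P_yy)  # covariance update (Eq. (42))
    return theta_new, P_new
```

Sec. 4 discusses how these statistics can be obtained, in particular with the unscented transform.
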

4. LINKS TO SPKF FOR PARAMETER ESTIMATION

A first natural idea to sample these r points is to assume a Gaussian distribution of mean \theta_{t-1} and of variance matrix P_{t-1} and to compute the statistics of interest using a Monte Carlo approach. However, more efficient methods exist, notably the unscented transform [3]. It consists in deterministically sampling a set of 2n+1 so-called sigma-points as follows:

\theta_t^{(0)} = \theta_{t-1}    (43)

\theta_t^{(j)} = \theta_{t-1} + \big( \sqrt{(n+\kappa) P_{t-1}} \big)_j, \quad 1 \le j \le n    (44)

\theta_t^{(j)} = \theta_{t-1} - \big( \sqrt{(n+\kappa) P_{t-1}} \big)_{j-n}, \quad n+1 \le j \le 2n    (45)

as well as the associated weights:

w_0 = \frac{\kappa}{n+\kappa} \quad \text{and} \quad w_j = \frac{1}{2(n+\kappa)}, \quad j > 0    (46)

where \kappa is a scaling factor which controls the accuracy of the unscented transform [3] and (\sqrt{(n+\kappa) P_{t-1}})_j is the j-th column of the Cholesky decomposition of the matrix (n+\kappa) P_{t-1}. The image of each of these sigma-points is computed:

y_t^{(j)} = f_{\theta_t^{(j)}}(x_t), \quad 0 \le j \le 2n    (47)

and the statistics of interest are computed as follows:

\bar{y}_t = \sum_{j=0}^{2n} w_j y_t^{(j)}    (48)

P_{\theta y} = \sum_{j=0}^{2n} w_j (\theta_t^{(j)} - \theta_{t-1})(y_t^{(j)} - \bar{y}_t)    (49)

P_{yy} = \sum_{j=0}^{2n} w_j (y_t^{(j)} - \bar{y}_t)^2    (50)

As a non-equiweighted sum can be rewritten as an equiweighted sum by counting some of the terms more than once (assuming that the weights are rational numbers, which is not too strong a hypothesis), the unscented transform can be interpreted as a form of statistical linearization. If the unscented transform is used as the statistical linearization process, then the SL-RLS algorithm, that is Eqs. (40)-(42), is exactly the UKF when no evolution model is considered in the state-space model. In other words, SL-RLS is the UKF for parameter estimation. Similarly, if the scaled unscented transform [12] is used to perform the statistical linearization, SL-RLS is the scaled UKF for parameter estimation. If the statistical linearization is performed using Stirling's interpolation, SL-RLS is the DDF or the CDF for parameter estimation. Generally speaking, depending on the scheme chosen to perform the statistical linearization, SL-RLS is an SPKF for parameter estimation.

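Combining Eqs. (43)-(50) with the update of Eqs. (40)-(42), one step of SL-RLS with the unscented transform (that is, of the UKF for parameter estimation) might look as follows. This is again a Python/NumPy sketch under the same assumptions as before: f(theta, x) is a user-supplied scalar model, sl_rls_update is the hypothetical helper sketched at the end of Sec. 3, and the remaining names are chosen for illustration only.

```python
import numpy as np

def unscented_statistics(theta, P, x, f, kappa=1.0):
    """Compute y_bar, P_ty and P_yy (Eqs. (48)-(50)) from the 2n+1
    sigma-points of Eqs. (43)-(46); f(theta, x) is a scalar model."""
    n = theta.size
    L = np.linalg.cholesky((n + kappa) * P)               # (n + kappa) P = L L^T
    sigma = np.vstack([theta, theta + L.T, theta - L.T])  # (2n+1, n) sigma-points
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))     # weights (Eq. (46))
    w[0] = kappa / (n + kappa)
    Y = np.array([f(s, x) for s in sigma])                # images (Eq. (47))
    y_bar = w @ Y                                         # Eq. (48)
    P_ty = (sigma - theta).T @ (w * (Y - y_bar))          # Eq. (49)
    P_yy = w @ (Y - y_bar) ** 2                           # Eq. (50)
    return y_bar, P_ty, P_yy

# One step of the UKF for parameter estimation on a new sample (x_t, y_t):
# y_bar, P_ty, P_yy = unscented_statistics(theta, P, x_t, f)
# theta, P = sl_rls_update(theta, P, y_t, y_bar, P_ty, P_yy, P_v)
```
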

This interpretation of the SPKF for parameter estimation as a statistically linearized recursive least-squares algorithm allows an alternative proof, simpler and requiring fewer assumptions, of a result of Van der Merwe. This result [4, Ch. 4.5] states that the SPKF for parameter estimation is equivalent to a MAP estimate of the underlying parameters under a Gaussian posterior and noise distribution assumption.

Theorem 1 (The SL-RLS estimate is a MAP estimate). Assume that the prior and noise distributions are Gaussian. Then the statistically linearized recursive least-squares estimate is equivalent to the maximum a posteriori estimate.

Proof. By construction, with priors defined by \theta_0 and P_0, the SL-RLS estimate minimizes the following regularized cost function:

\theta_t = \operatorname*{argmin}_{\theta} \Big( \sum_{i=1}^{t} \frac{(y_i - (A_i \theta + b_i))^2}{P_{v_i} + P_{e_i e_i}} + (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big)    (51)
         = \operatorname*{argmin}_{\theta} \Big( \sum_{i=1}^{t} \frac{(y_i - f_\theta(x_i))^2}{P_{v_i}} + (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big)    (52)

On the other hand, the MAP estimator is defined, using Bayes' rule, as:

\theta_t^{MAP} = \operatorname*{argmax}_{\theta} p(\theta \mid y_{1:t}) = \operatorname*{argmax}_{\theta} \frac{p(y_{1:t} \mid \theta) \, p(\theta)}{p(y_{1:t})}    (53)

As the observation noise is white, the joint likelihood is the product of the local likelihoods, and the probability p(y_{1:t}) does not depend on \theta, so:

\theta_t^{MAP} = \operatorname*{argmax}_{\theta} \Big( \prod_{i=1}^{t} p(y_i \mid \theta) \Big) p(\theta)    (54)

The prior and noise distributions are assumed to be Gaussian, thus:

p(\theta) \propto \exp\Big( -\frac{1}{2} (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big)    (55)

p(y_i \mid \theta) \propto \exp\Big( -\frac{(y_i - f_\theta(x_i))^2}{2 P_{v_i}} \Big)    (56)

Finally, maximizing a product of probability distributions is equivalent to minimizing the sum of the negatives of their logarithms, which gives the result:

\theta_t^{MAP} = \operatorname*{argmin}_{\theta} \Big( \sum_{i=1}^{t} \frac{(y_i - f_\theta(x_i))^2}{P_{v_i}} + (\theta - \theta_0)^T P_0^{-1} (\theta - \theta_0) \Big) = \theta_t    (57)

This alternative proof is shorter than the original one. Above all, it does not assume that the posterior distribution is Gaussian, which is a very strong assumption for a nonlinear observation model.

5. PERSPECTIVES

In this article, a statistically linearized recursive least-squares algorithm has been introduced and shown to be actually the SPKF for parameter estimation algorithm. This gives new insight into sigma-point Kalman filters by showing that they are generalizations of a statistically linearized least-squares approach. This new point of view allowed an alternative proof of a result stating that, without an evolution model, the SPKF estimate is the maximum a posteriori estimate. The proof proposed in Sec. 4 is shorter and, above all, it does not assume a Gaussian posterior, which is a very strong hypothesis in the case of a nonlinear observation model.

The technique of statistical linearization can be applied to much more general problems than the L2 minimization addressed in this paper. The fact that statistically linearized recursive least-squares is indeed a special form of sigma-point Kalman filtering tends to justify this approach. Interesting perspectives include, but are not limited to, the application of this general statistical linearization to L1 minimization (e.g., [13]), L1 regularization (e.g., [14]) or fixed-point approximation (e.g., [15, 16]).

6. REFERENCES

[1] S. J. Julier and J. K. Uhlmann, "A new extension of the Kalman filter to nonlinear systems," in Int. Symp. Aerospace/Defense Sensing, Simulation and Controls, 1997.
[2] D. Simon, Optimal State Estimation: Kalman, H-Infinity, and Nonlinear Approaches, Wiley & Sons, August 2006.
[3] S. J. Julier and J. K. Uhlmann, "Unscented filtering and nonlinear estimation," Proceedings of the IEEE, vol. 92, no. 3, pp. 401-422, 2004.
[4] R. van der Merwe, Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, Ph.D. thesis, OGI School of Science & Engineering, Oregon Health & Science University, Portland, OR, USA, 2004.
[5] M. Nørgaard, N. Poulsen, and O. Ravn, "New Developments in State Estimation for Nonlinear Systems," Automatica, 2000.
[6] K. Ito and K. Xiong, "Gaussian Filters for Nonlinear Filtering Problems," IEEE Transactions on Automatic Control, vol. 45, no. 5, pp. 910-927, 2000.
[7] S. Haykin, Kalman Filtering and Neural Networks, Wiley, 2001.
[8] M. Geist, O. Pietquin, and G. Fricout, "Kalman Temporal Differences: the deterministic case," in IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, TN, USA, April 2009.
[9] T. Lefebvre, H. Bruyninckx, and J. De Schutter, "Comments on 'A New Method for the Nonlinear Transformation of Means and Covariances in Filters and Estimators'," IEEE Transactions on Automatic Control, vol. 47, no. 8, pp. 1406-1409, 2002.
[10] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, 1984.
[11] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, USA, 1995.
[12] S. J. Julier, "The scaled unscented transformation," in American Control Conference, 2002, vol. 6, pp. 4555-4559.
[13] G. O. Wesolowsky, "A New Descent Algorithm for the Least Absolute Value Regression Problem," Communications in Statistics - Simulation and Computation, no. 5, pp. 479-491, 1981.
[14] R. Tibshirani, "Regression Shrinkage and Selection via the LASSO," Journal of the Royal Statistical Society, Series B (Methodological), pp. 267-288, 1996.
[15] S. J. Bradtke and A. G. Barto, "Linear Least-Squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 1-3, pp. 33-57, 1996.
[16] A. Nedić and D. P. Bertsekas, "Least Squares Policy Evaluation Algorithms with Linear Function Approximation," Discrete Event Dynamic Systems: Theory and Applications, vol. 13, pp. 79-110, 2003.