
Kalman Filters

Introduction

We describe Bayesian learning for the sequential estimation of parameters (e.g. means, AR coefficients). The update procedures are known as Kalman filters. We show how Dynamic Linear Models, Recursive Least Squares and Steepest Descent algorithms are all special cases of the Kalman filter.

Sequential Estimation of a Nonstationary Mean

In the lecture on Bayesian methods we described the sequential estimation of a stationary mean. We now extend that analysis to the nonstationary case. A reasonable model of a time-varying mean is that it can drift from sample to sample. If the drift is random (later on we will also consider deterministic drifts) then we have

    \mu_t = \mu_{t-1} + w_t        (1)

where the random drift is Gaussian, $p(w_t) = N(w_t; 0, \sigma_w^2)$, with drift variance $\sigma_w^2$. The data points are then Gaussian about the mean $\mu_t$. If they have a fixed variance $\sigma_x^2$ (later on we will also consider time-varying variance) then

    x_t = \mu_t + e_t        (2)

where $e_t = x_t - \mu_t$. Hence $p(e_t) = N(e_t; 0, \sigma_x^2)$.

At time $t$ our estimate of $\mu$ has a Gaussian distribution with mean $\hat{\mu}_t$ and variance $\hat{\sigma}_t^2$. We stress that this is the variance of our mean estimate and not the variance of the data. The usual standard error estimate for this variance, of order $\sigma_x^2/t$, is no longer valid as we have nonstationary data. We therefore have to estimate it as we go along. This means we keep running estimates of the distribution of the mean. At time $t-1$ this distribution has a mean $\hat{\mu}_{t-1}$ and a variance $\hat{\sigma}_{t-1}^2$. The distribution at time $t$ is

then found from Bayes' rule. Specifically, the prior distribution is given by

    p(\mu_t) = N(\mu_t; \hat{\mu}_{t-1}, r_t)        (3)

where $r_t$ is the prior variance (we add the random drift variance to the variance from the previous time step)

    r_t = \hat{\sigma}_{t-1}^2 + \sigma_w^2        (4)

and the likelihood is

    p(x_t | \mu_t) = N(x_t; \mu_t, \sigma_x^2)        (5)

The posterior is then given by

    p(\mu_t | x_t) = N(\mu_t; \hat{\mu}_t, \hat{\sigma}_t^2)        (6)

where the mean is

    \hat{\mu}_t = \hat{\mu}_{t-1} + \frac{r_t}{\sigma_x^2 + r_t} (x_t - \hat{\mu}_{t-1})        (7)

and the variance is

    \hat{\sigma}_t^2 = \frac{r_t \sigma_x^2}{r_t + \sigma_x^2}        (8)

We now write the above equations in a slightly different form to allow comparison with later estimation procedures

    \hat{\mu}_t = \hat{\mu}_{t-1} + K_t e_t        (9)

    \hat{\sigma}_t^2 = r_t (1 - K_t)        (10)

where

    K_t = \frac{r_t}{\sigma_x^2 + r_t}

and

    e_t = x_t - \hat{\mu}_{t-1}        (11)

In the next section we will see that our update equations are a special case of a Kalman filter, where $e_t$ is the prediction error and $K_t$ is the Kalman gain.

In Figure 1 we give a numerical example in which data points were generated with a mean of 4 for the first part of the series and a different mean for the remainder. The update equations have two parameters which we must set: (i) the data variance $\sigma_x^2$ and (ii) the drift variance $\sigma_w^2$. Together, these parameters determine (a) how responsive the tracking will be and (b) how stable it will be. The two plots are for two different values of $\sigma_w^2$, with $\sigma_x^2$ held fixed. Later we will see how these two parameters can be learnt.

Figure 1: Sequential estimation of a nonstationary mean. The graphs plot the data values $x_t$ (crosses) and the estimated mean values $\hat{\mu}_t$ (circles), along with error bars $\hat{\sigma}_t$ (vertical lines), against iteration number for two different drift noise values $\sigma_w^2$ in panels (a) and (b).
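A minimal numpy sketch of the recursion in equations (3)-(11) follows. The function name, the simulated means and the noise settings are illustrative choices, not the values used in Figure 1.

```python
# Minimal sketch of the nonstationary-mean update, equations (3)-(11).
import numpy as np

def track_mean(x, sig_w2, sig_x2, mu0=0.0, var0=1.0):
    """Sequentially estimate a drifting mean from the data x."""
    mu_hat, var_hat, out = mu0, var0, []
    for x_t in x:
        r_t = var_hat + sig_w2           # prior variance, eq (4)
        K_t = r_t / (sig_x2 + r_t)       # Kalman gain, eq (10)
        e_t = x_t - mu_hat               # prediction error, eq (11)
        mu_hat = mu_hat + K_t * e_t      # posterior mean, eq (9)
        var_hat = r_t * (1 - K_t)        # posterior variance, eq (10)
        out.append((mu_hat, var_hat))
    return np.array(out)

rng = np.random.default_rng(0)
# a mean of 4 for the first half of the series, then a jump to a new mean
x = np.concatenate([4.0 + rng.normal(0, 1, 100), 10.0 + rng.normal(0, 1, 100)])
est = track_mean(x, sig_w2=0.1, sig_x2=1.0)
print(est[[99, 100, 199], 0])            # estimates just before and after the jump
```

Increasing `sig_w2` makes the tracking more responsive but less stable, which is the trade-off illustrated in Figure 1.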

A single state variable

We now look at a general methodology for the sequential estimation of a nonstationary parameter (this can be anything, not necessarily the data mean). The parameter's evolution is modelled as a linear dynamical system. The state-space equations are

    \theta_t = g_t \theta_{t-1} + w_t, \qquad w_t \sim N(w_t; 0, \sigma_w^2)        (12)
    x_t = f_t \theta_t + e_t, \qquad e_t \sim N(e_t; 0, \sigma_x^2)

The value of the parameter at time $t$, $\theta_t$, is referred to as the state of the system. This state can change deterministically, by being multiplied by $g_t$, and stochastically, by the addition of a random drift $w_t$. This drift is referred to as state noise. The observed data (e.g. time series values) are referred to as observations, $x_t$, which are generated from the state according to the second equation. This allows for a linear transformation plus the addition of observation noise.

At time $t-1$ our estimate of $\theta$ has a Gaussian distribution with mean $\hat{\theta}_{t-1}$ and variance $\hat{\sigma}_{t-1}^2$. The prior distribution is therefore given by

    p(\theta_t) = N(\theta_t; g_t \hat{\theta}_{t-1}, r_t)        (13)

where $r_t$ is the prior variance

    r_t = g_t^2 \hat{\sigma}_{t-1}^2 + \sigma_w^2        (14)

and the likelihood is

    p(x_t | \theta_t) = N(x_t; f_t \theta_t, \sigma_x^2)        (15)

The posterior is then given by

    p(\theta_t | x_t) = N(\theta_t; \hat{\theta}_t, \hat{\sigma}_t^2)        (16)

where

    \hat{\theta}_t = g_t \hat{\theta}_{t-1} + K_t e_t        (17)

    \hat{\sigma}_t^2 = r_t (1 - K_t f_t)        (18)

and

    K_t = \frac{r_t f_t}{\sigma_x^2 + f_t^2 r_t}, \qquad e_t = x_t - f_t g_t \hat{\theta}_{t-1}
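One step of this update can be packaged as a small function. The sketch below assumes $g_t$, $f_t$, $\sigma_w^2$ and $\sigma_x^2$ are known constants; the function and argument names are our own.

```python
# One step of the single-state-variable filter, equations (13)-(18).
def kalman_step_1d(theta_hat, var_hat, x_t, g, f, sig_w2, sig_x2):
    r_t = g**2 * var_hat + sig_w2              # prior variance, eq (14)
    e_t = x_t - f * g * theta_hat              # prediction error
    K_t = r_t * f / (sig_x2 + f**2 * r_t)      # Kalman gain, eq (18)
    theta_hat = g * theta_hat + K_t * e_t      # posterior mean, eq (17)
    var_hat = r_t * (1 - K_t * f)              # posterior variance, eq (18)
    return theta_hat, var_hat

# the nonstationary-mean update above is the special case g = f = 1
print(kalman_step_1d(0.0, 1.0, 4.0, g=1.0, f=1.0, sig_w2=0.1, sig_x2=1.0))
```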

Equations (17) and (18) constitute a one-dimensional Kalman filter (the state is one-dimensional because there is only one state variable). Next we consider many state variables.

Multiple state variables

We now consider linear dynamical systems where the data are generated according to the model

    \theta_t = G_t \theta_{t-1} + w_t, \qquad w_t \sim N(w_t; 0, W_t)        (19)
    y_t = F_t \theta_t + v_t, \qquad v_t \sim N(v_t; 0, V_t)

where $\theta_t$ are 'state' or 'latent' variables, $G_t$ is a 'flow' matrix, $w_t$ is 'state noise' distributed according to a normal distribution with zero mean and covariance matrix $W_t$, $y_t$ are the multivariate observations, $F_t$ is a transformation matrix and $v_t$ is 'observation noise' distributed according to a normal distribution with zero mean and covariance matrix $V_t$. The model is parameterised by the matrices $G_t$, $W_t$, $F_t$ and $V_t$. These parameters may depend on $t$ (as indicated by the subscript).

The Kalman filter is a recursive procedure for estimating the latent variables $\theta_t$ [9]. Meinhold and Singpurwalla [4] show how this estimation procedure is derived (see also the lecture on Bayesian methods). The latent variables are normally distributed with a mean and covariance that can be estimated with the following recursive formulae

    \hat{\theta}_t = G_t \hat{\theta}_{t-1} + K_t e_t        (20)
    \Sigma_t = R_t - K_t F_t R_t

where $K_t$ is the 'Kalman gain' matrix, $e_t$ is the prediction error and $R_t$ is the 'prior covariance' of the latent variables (that is, prior to $y_t$ being observed). These quantities are calculated as follows

    K_t = R_t F_t^T (V_t + F_t R_t F_t^T)^{-1}        (21)
    e_t = y_t - F_t G_t \hat{\theta}_{t-1}
    R_t = G_t \Sigma_{t-1} G_t^T + W_t

To apply these equations you need to know the parameters $G_t$, $W_t$, $F_t$ and $V_t$, and to make initial guesses for the state mean and covariance, $\hat{\theta}_0$ and $\Sigma_0$. Equations (20) and (21) can then be applied to estimate the state mean and covariance at the next time step, and the equations are then applied recursively.
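A compact sketch of one filtering step is given below. The function name and the small two-dimensional example are our own choices; $G$, $W$, $F$ and $V$ are assumed known.

```python
# One step of the multivariate recursion, equations (20)-(21).
import numpy as np

def kalman_update(theta_hat, Sigma, y_t, G, W, F, V):
    R = G @ Sigma @ G.T + W                   # prior covariance R_t, eq (21)
    e = y_t - F @ G @ theta_hat               # prediction error e_t, eq (21)
    S = V + F @ R @ F.T                       # innovation covariance
    K = R @ F.T @ np.linalg.inv(S)            # Kalman gain K_t, eq (21)
    theta_hat = G @ theta_hat + K @ e         # posterior mean, eq (20)
    Sigma = R - K @ F @ R                     # posterior covariance, eq (20)
    # predictive log-density of y_t (closely related to the evidence discussed next)
    logp = -0.5 * (len(e) * np.log(2 * np.pi)
                   + np.linalg.slogdet(S)[1] + e @ np.linalg.solve(S, e))
    return theta_hat, Sigma, logp

G, W = np.eye(2), 0.01 * np.eye(2)
F, V = np.array([[1.0, 0.5]]), np.array([[1.0]])
theta_hat, Sigma, logp = kalman_update(np.zeros(2), np.eye(2), np.array([0.7]), G, W, F, V)
print(theta_hat, logp)
```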

A useful quantity is the likelihood of an observation given the model parameters before they are updated

    p(y_t) = N(y_t; F_t G_t \hat{\theta}_{t-1},\ V_t + F_t G_t \Sigma_{t-1} G_t^T F_t^T)        (22)

In Bayesian terminology this likelihood is known as the evidence for the data point [4]. Data points with low evidence correspond to periods when the statistics of the underlying system are changing (nonstationarity) or, less consistently, to data points having large observation noise components.

The state-space equations may be viewed as a dynamic version of factor analysis in which the factor, $\theta_t$, evolves over time according to linear dynamics. Shumway and Stoffer [56] derive an Expectation-Maximisation (EM) algorithm (see the next lecture) in which the parameters of the model, $G$, $W$ and $V$, can all be learnt. Only $F$ is assumed known. Note that these parameters are no longer dependent on $t$. This does not, however, mean that the model is no longer dynamic; the state, $\theta_t$, is still time dependent. Ghahramani and Hinton have recently extended the algorithm to allow $F$ to be learnt as well. These learning algorithms are batch learning algorithms rather than recursive update procedures. They are therefore not suitable for 'on-line' learning (where the learning algorithm has only one 'look' at each observation).

In the engineering and statistical forecasting literature [44] the transformation matrix, $F_t$, is known. It is related to the observed time series (or other observed time series) according to a known deterministic function set by the statistician or 'model builder'. Assumptions are then made about the flow matrix, $G_t$. Assumptions are also made about the state noise covariance, $W_t$, and the observation noise covariance, $V_t$, or they are estimated on-line. We now look at a set of assumptions which reduces the Kalman filter to a 'Dynamic Linear Model'.

Dynamic Linear Models

In this section we consider Dynamic Linear Models (DLMs), which for a univariate time series are

    \theta_t = \theta_{t-1} + w_t, \qquad w_t \sim N(w_t; 0, W_t)        (23)
    y_t = F_t \theta_t + v_t, \qquad v_t \sim N(v_t; 0, \sigma_x^2)

This is a linear regression model with time-varying coefficients. It is identical to the generic Kalman filter model with $G_t = I$. Substituting this into the update equations gives

    \hat{\theta}_t = \hat{\theta}_{t-1} + K_t e_t        (24)
    \Sigma_t = R_t - K_t F_t R_t

where

    K_t = \frac{R_t F_t^T}{\hat{\sigma}_y^2}        (25)

and

    R_t = \Sigma_{t-1} + W_t        (26)
    \hat{\sigma}_y^2 = \sigma_x^2 + \sigma_\theta^2
    \sigma_\theta^2 = F_t R_t F_t^T
    e_t = y_t - \hat{y}_t
    \hat{y}_t = F_t \hat{\theta}_{t-1}

where $\hat{y}_t$ is the prediction and $\hat{\sigma}_y^2$ is the estimated prediction variance. This is composed of two terms: the observation noise, $\sigma_x^2$, and the component of prediction variance due to state uncertainty, $\sigma_\theta^2$. The likelihood of a data point under the old model (or evidence) is

    p(y_t) = N(y_t; \hat{y}_t, \hat{\sigma}_y^2)        (27)

If we make the further assumption that the transformation vector (it is no longer a matrix because we have univariate predictions) is equal to $F_t = [y_{t-1}, y_{t-2}, \ldots, y_{t-p}]$ then we have a Dynamic Autoregressive (DAR) model.

To apply the model we make initial guesses for the state (AR parameter) mean and covariance, $\hat{\theta}_0$ and $\Sigma_0$, and use the above equations. We must also plug in guesses for the state noise covariance, $W_t$, and the observation noise variance, $\sigma_x^2$. In a later section we show how these can be estimated on-line. It is also often assumed that the state noise covariance matrix is the isotropic matrix, $W = qI$.
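The DAR update can be sketched directly from equations (24)-(26), building $F_t$ from the $p$ previous samples. The test signal, the order $p$, the value of $q$ and the observation noise below are illustrative choices, not the chapter's settings.

```python
# A rough sketch of the DAR update, equations (24)-(26), with G = I and W = qI.
import numpy as np

def dar_filter(y, p, q, sig_x2):
    theta, Sigma = np.zeros(p), np.eye(p)
    for t in range(p, len(y)):
        F = y[t-p:t][::-1]                   # F_t = [y_{t-1}, ..., y_{t-p}]
        R = Sigma + q * np.eye(p)            # prior covariance, eq (26)
        sig_y2 = sig_x2 + F @ R @ F          # estimated prediction variance, eq (26)
        e = y[t] - F @ theta                 # prediction error, eq (26)
        K = R @ F / sig_y2                   # Kalman gain, eq (25)
        theta = theta + K * e                # eq (24)
        Sigma = R - np.outer(K, F) @ R       # eq (24)
    return theta

rng = np.random.default_rng(1)
y = np.sin(2 * np.pi * 3 * np.arange(300) / 100) + 0.1 * rng.normal(size=300)
print(dar_filter(y, p=8, q=0.01, sig_x2=0.01))
```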

Next, we look at a set of assumptions that reduce the Kalman filter to Recursive Least Squares.

Recursive least squares

If there is no state noise ($w_t = 0$, $W_t = 0$) and no state flow ($G_t = I$) then the linear dynamical system in equation (19) reduces to a static linear system ($\theta_t = \theta$). If we further assume that our observations are univariate we can rewrite the state-space equations as

    y_t = F_t \theta + v_t, \qquad v_t \sim N(v_t; 0, \sigma_x^2)        (28)

This is a regression model with constant coefficients. We can, however, estimate these coefficients in a recursive manner by substituting our assumptions about $W_t$, $G_t$ and $V_t$ into the Kalman filter update equations. This gives

    \hat{\theta}_t = \hat{\theta}_{t-1} + K_t e_t        (29)

    \Sigma_t = \Sigma_{t-1} - K_t F_t \Sigma_{t-1}        (30)

where

    K_t = \frac{\Sigma_{t-1} F_t^T}{\hat{\sigma}_y^2}        (31)

and

    \hat{\sigma}_y^2 = \sigma_x^2 + \sigma_\theta^2        (32)
    \sigma_\theta^2 = F_t \Sigma_{t-1} F_t^T
    e_t = y_t - \hat{y}_t
    \hat{y}_t = F_t \hat{\theta}_{t-1}

where $\hat{y}_t$ is the prediction and $\hat{\sigma}_y^2$ is the estimated prediction variance. This is composed of two terms: the observation noise, $\sigma_x^2$, and the component of prediction variance due to state uncertainty, $\sigma_\theta^2$.

The above equations are identical to the update equations for recursive least squares (RLS) as defined by Abraham and Ledolter (equation (8.6) in their text). The likelihood of a data point under the old model (or evidence) is

    p(y_t) = N(y_t; \hat{y}_t, \hat{\sigma}_y^2)        (33)

If we make the further assumption that the transformation vector (it is no longer a matrix because we have univariate predictions) is equal to $F_t = [y_{t-1}, y_{t-2}, \ldots, y_{t-p}]$ then we have a recursive least squares estimation procedure for an autoregressive (AR) model.

To apply the model we make initial guesses for the state (AR parameter) mean and covariance, $\hat{\theta}_0$ and $\Sigma_0$, and use the above equations. We must also plug in our guess for the observation noise variance, $\sigma_x^2$. In a later section we show how this can be estimated on-line.
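A corresponding sketch of the RLS recursion, equations (29)-(32), follows; the signal and settings are again illustrative.

```python
# The same recursion under the RLS assumptions (no state noise, G = I).
import numpy as np

def rls_ar(y, p, sig_x2):
    theta, Sigma = np.zeros(p), np.eye(p)
    for t in range(p, len(y)):
        F = y[t-p:t][::-1]                        # F_t = [y_{t-1}, ..., y_{t-p}]
        sig_y2 = sig_x2 + F @ Sigma @ F           # prediction variance, eq (32)
        K = Sigma @ F / sig_y2                    # gain, eq (31)
        e = y[t] - F @ theta                      # prediction error, eq (32)
        theta = theta + K * e                     # eq (29)
        Sigma = Sigma - np.outer(K, F) @ Sigma    # eq (30)
    return theta

rng = np.random.default_rng(2)
y = np.sin(2 * np.pi * 3 * np.arange(300) / 100) + 0.1 * rng.normal(size=300)
print(rls_ar(y, p=8, sig_x2=0.01))
```

The only difference from the DAR sketch above is that the covariance can only shrink, which is why RLS cannot re-open its learning rate when the data change.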

Estimation of noise parameters

To use the DLM update equations it is necessary to make guesses for the state noise covariance, $W_t$, and the observation noise variance, $\sigma_x^2$. In this section we show how these can be estimated on-line. Note that we either estimate the state noise or the observation noise, not both.

Jazwinski's method for estimating state noise

This method, reviewed in [4], is ultimately due to Jazwinski [8], who derives the following equations using the ML-II approach (see the Bayes lecture). We assume that the state noise covariance matrix is the isotropic matrix, $W = qI$. The parameter $q$ can be updated according to

    q_t = h\left( \frac{e_t^2 - \hat{\sigma}_{q=0}^2}{F_t F_t^T} \right)        (34)

where $h(x)$ is the 'ramp' function

    h(x) = x if x \ge 0, and 0 otherwise        (35)

and $\hat{\sigma}_{q=0}^2$ is the estimated prediction variance assuming that $q = 0$

    \hat{\sigma}_{q=0}^2 = \sigma_x^2 + F_t \Sigma_{t-1} F_t^T        (36)

Thus, if our estimate of the prediction error assuming no state noise is smaller than our observed error ($e_t^2$), we should infer that the state noise is non-zero. This will happen when we transit from one stationary regime to another; our estimate of $q$ will increase. This, in turn, will increase the learning rate (see the later section). A smoothed estimate is

    q_t = \gamma q_{t-1} + (1 - \gamma)\, h\left( \frac{e_t^2 - \hat{\sigma}_{q=0}^2}{F_t F_t^T} \right)        (37)

where $\gamma$ is a smoothing parameter. Alternatively, equation (34) can be applied to a window of samples [4].
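Equations (34)-(37) translate directly into a small update routine. In the sketch below `gamma` plays the role of the smoothing parameter $\gamma$, and the example numbers are illustrative; $e_t$, $F_t$ and $\Sigma_{t-1}$ would come from the DLM recursion above.

```python
# Sketch of Jazwinski's state-noise update, equations (34)-(37).
import numpy as np

def ramp(x):
    return x if x >= 0.0 else 0.0                    # eq (35)

def update_q(q_prev, e_t, F, Sigma_prev, sig_x2, gamma):
    sig2_q0 = sig_x2 + F @ Sigma_prev @ F            # prediction variance with q = 0, eq (36)
    q_new = ramp((e_t**2 - sig2_q0) / (F @ F))       # eq (34)
    return gamma * q_prev + (1 - gamma) * q_new      # smoothed estimate, eq (37)

# a one-step error that is large relative to sig2_q0 yields a non-zero q
F, Sigma_prev = np.array([0.5, -0.3]), 0.1 * np.eye(2)
print(update_q(q_prev=0.0, e_t=2.0, F=F, Sigma_prev=Sigma_prev, sig_x2=0.1, gamma=0.5))
```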

Jazwinski's method for estimating observation noise

This method, also reviewed in [4] and ultimately due to Jazwinski [8], derives the following equations by applying the ML-II framework (see the Bayes lecture). Equation (26) shows that the estimated prediction variance is composed of two components: the observation noise and the component due to state uncertainty. Thus, to estimate the observation noise one needs to subtract the second component from the measured squared error

    \sigma_x^2(t) = h\left( e_t^2 - F_t R_t F_t^T \right)        (38)

This estimate can be derived by setting $\sigma_x^2$ so as to maximise the evidence (likelihood) of a new data point (equation (27)). A smoothed estimate is

    \sigma_x^2(t) = \gamma\, \sigma_x^2(t-1) + (1 - \gamma)\, h\left( e_t^2 - F_t R_t F_t^T \right)        (39)

where $\gamma$ is a smoothing parameter. Alternatively, equation (38) can be applied to a window of samples [4].

For RLS these update equations can be used by substituting $R_t = \Sigma_{t-1}$. We stress, however, that this estimate is especially unsuitable for RLS applied to nonstationary data (but then you should only use RLS for stationary data anyway). This is because the learning rate becomes dramatically decreased. We also stress that Jazwinski's methods cannot both be applied at the same time: the 'extra' prediction error is explained either as greater observation noise or as greater state noise.

Skagen's method

Skagen [57] lets $W = c\,\sigma_x^2 I$, i.e. assumes the state noise covariance is isotropic with a variance that is proportional to the observation noise. He observes that if $c$ is kept fixed then varying $\sigma_x^2$ over six orders of magnitude has little or no effect on the Kalman filter updates. He therefore sets $\sigma_x^2$ to an arbitrary value. He then defines a measure $R$ as the relative reduction in prediction error due to adaptation and chooses $c$ to give a value of $R = 0.5$.

Comparison with steepest descent

For a linear predictor, the learning rule for 'on-line' steepest descent is [3]

    \hat{\theta}_t = \hat{\theta}_{t-1} + \alpha F_t^T e_t        (40)

where $\alpha$ is the learning rate, which is fixed and chosen arbitrarily beforehand. This method is otherwise known as Least Mean Squares (LMS). Haykin [7] (page 36) discusses the conditions on $\alpha$ which lead to a convergent learning process. Comparison of the above rule with the DLM learning rule in equation (25) shows that DLM has a learning rate matrix equal to

    \alpha_t = \frac{\Sigma_{t-1} + qI}{\sigma_x^2 + \sigma_\theta^2}        (41)

The average learning rate, averaged over all state variables, is given by

    \alpha_{DLM} = \frac{Tr(\Sigma_{t-1} + qI)}{p\,(\sigma_x^2 + \sigma_\theta^2)}        (42)

where $Tr()$ denotes the trace of the covariance matrix and $p$ is the number of state variables. DLM thus uses a learning rate which is directly proportional to the variance of the state variables and inversely proportional to the estimated prediction variance.

If the prediction variance due to state uncertainty is significantly smaller than the prediction variance due to state noise, as it will be once the filter has reached a steady solution, then increasing the state noise parameter, $q$, will increase the learning rate. This is the mechanism by which DLM increases its learning rate when a new dynamic regime is encountered.

The average learning rate for the RLS filter is

    \alpha_{RLS} = \frac{Tr(\Sigma_{t-1})}{p\,(\sigma_x^2 + \sigma_\theta^2)}        (43)

As there is no state noise ($q = 0$) there is no mechanism by which the learning rate can be increased when a new dynamic regime is encountered. This underlines the fact that RLS is a stationary model.

In fact, RLS behaves particularly poorly when given nonstationary data. When a new dynamic regime is encountered, $\sigma_\theta^2$ will increase (and so may $\sigma_x^2$ if we are updating it on-line). This leads not to the desired increase in learning rate, but to a decrease. For stationary data, however, the RLS model behaves well: as the model encounters more data the parameter covariance matrix decreases, which in turn leads to a decrease in learning rate. In on-line gradient descent learning it is desirable to start with a high learning rate (to achieve faster convergence) but end with a low learning rate (to prevent oscillation). RLS exhibits the desirable property of adapting its learning rate in exactly this manner. DLM also exhibits this property when given stationary data but, when given nonstationary data, has the added property of being able to increase its learning rate when necessary.

We conclude this section by noting that DLM and RLS may be viewed as linear on-line gradient descent estimators with variable learning rates: RLS for stationary data and DLM for nonstationary data.
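The comparison can be made concrete with a few lines of code. The covariance, noise values and the fixed LMS rate below are illustrative.

```python
# Comparing the fixed LMS rate, eq (40), with the average adaptive rates
# of DLM and RLS, equations (42)-(43).
import numpy as np

def lms_step(theta, F, e, alpha):
    return theta + alpha * F * e                    # eq (40): constant learning rate

def avg_rate_dlm(Sigma, q, sig_x2, sig_th2):
    p = Sigma.shape[0]
    return np.trace(Sigma + q * np.eye(p)) / (p * (sig_x2 + sig_th2))   # eq (42)

def avg_rate_rls(Sigma, sig_x2, sig_th2):
    p = Sigma.shape[0]
    return np.trace(Sigma) / (p * (sig_x2 + sig_th2))                   # eq (43)

Sigma = 0.01 * np.eye(8)                            # small state covariance, as after convergence
print(avg_rate_dlm(Sigma, q=0.0, sig_x2=0.1, sig_th2=0.01),   # q = 0 recovers the RLS rate
      avg_rate_dlm(Sigma, q=0.5, sig_x2=0.1, sig_th2=0.01),   # q > 0 raises the rate
      avg_rate_rls(Sigma, sig_x2=0.1, sig_th2=0.01))
```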

Other algorithms

The Least Mean Squares (LMS) algorithm [7] (Chapter 9) is identical to the steepest-descent method (as described in this paper); both methods have constant learning rates.

Our comments on the RLS algorithm are relevant to RLS as defined by Abraham and Ledolter. There are, however, a number of variants of RLS. Haykin [7] (page 564) defines an exponentially weighted RLS algorithm, in which past samples are given exponentially less attention than more recent samples. This gives rise to a limited tracking ability (see chapter 6 in [7]). The tracking ability can be further improved by adding state noise (Extended RLS-1 [7], page 76) or a non-constant state transition matrix (Extended RLS-2 [7], page 77). The Extended RLS-1 algorithm is therefore similar to the DAR model described in this paper.

An example

This example demonstrates the basic functioning of the dynamic AR model and compares it to RLS. A time series was generated consisting of a sine wave at one frequency in the first second, a second frequency in the second second and a third frequency in the third second. All signals contained additive Gaussian noise. One hundred samples were generated per second.

A DAR model with $p = 8$ AR coefficients was trained on the data. The algorithm was given a fixed value of observation noise, $\sigma_x^2$. The state noise was initially set to zero and was adapted using Jazwinski's algorithm described in equation (34), with a fixed smoothing parameter $\gamma$. The model was initialised using linear regression: the first $p$ data points were regressed onto the $(p+1)$th data point using an SVD implementation of least squares, resulting in the linear regression weight vector $w_{LR}$. The state at time step $t = p+1$ was initialised to this weight vector, $\hat{\theta}_{p+1} = w_{LR}$, and the initial state covariance matrix was set to the linear regression covariance matrix. Model parameters before time $p+1$ were set to zero.

An RLS model (also with $p = 8$ AR coefficients) was trained on the same data. The algorithm was given a fixed value of observation noise, $\sigma_x^2$. The model was initialised by setting $\hat{\theta}_{p+1} = w_{LR}$ and $\Sigma_{p+1} = I$ (setting $\Sigma_{p+1}$ to the linear regression covariance matrix resulted in an initial learning rate that wasn't sufficiently large for the model to adapt to the data; see below).
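The following self-contained sketch is in the spirit of this example: a DAR(8) filter with Jazwinski adaptation of $q$, run on a sine wave whose frequency changes each second. The frequencies, noise level and smoothing value are illustrative, not the chapter's exact settings.

```python
# DAR(8) with Jazwinski adaptation of q on a piecewise-stationary sine wave.
import numpy as np

rng = np.random.default_rng(0)
fs, p, sig_x2, gamma = 100, 8, 0.01, 0.5
t = np.arange(fs) / fs
y = np.concatenate([np.sin(2 * np.pi * f * t) for f in (5.0, 10.0, 15.0)])
y += 0.1 * rng.normal(size=y.size)

# initialise by regressing the first p samples onto the (p+1)th (cf. w_LR)
F0 = y[:p][::-1]
theta = np.linalg.lstsq(F0[None, :], y[p:p + 1], rcond=None)[0]
Sigma, q = np.eye(p), 0.0

for n in range(p, y.size):
    F = y[n - p:n][::-1]
    e = y[n] - F @ theta                                  # prediction error
    sig2_q0 = sig_x2 + F @ Sigma @ F                      # eq (36)
    q = gamma * q + (1 - gamma) * max((e**2 - sig2_q0) / (F @ F), 0.0)   # eq (37)
    R = Sigma + q * np.eye(p)
    K = R @ F / (sig_x2 + F @ R @ F)
    theta = theta + K * e
    Sigma = R - np.outer(K, F) @ R

print(q, theta)        # final state-noise estimate and AR weights
```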

Figure 2 shows the original time series and the evidence of each point in the time series under the DAR model. Data points occurring at the transitions between different dynamic regimes have low evidence.

Figure 3 shows that the state noise parameter, $q$, increases by an amount necessary for the estimated prediction error to equal the actual prediction error. The state noise is high at transitions between different dynamic regimes. Within each dynamic regime the state noise is zero.

Figure 4 shows that the variance of the state variables reduces as the model is exposed to more data from the same stationary regime. When a new stationary regime is encountered the state variance increases (because $q$ increases).

Figure 2: (a) Original time series. (b) Log evidence of the data points under the DAR model, $\log p(y_t)$.

Figure 5 shows that the learning rate of the DAR model increases when the system enters a new stationary regime, whereas the learning rate of RLS actually decreases. The RLS learning rate is initially higher because the state covariance matrix was initialised differently (initialising it in the same way gave much poorer RLS spectral estimates). Figure 6 shows the spectral estimates obtained from the DAR and RLS models. The learning rate plots and spectrogram plots show that DAR is suitable for nonstationary data whereas RLS is not.

Discussion

Dynamic Linear Models, Recursive Least Squares and Steepest-Descent Learning are special cases of linear dynamical systems, and their learning rules are special cases of the Kalman filter.

Steepest-Descent Learning is suitable for modelling stationary data. It uses a learning rate parameter which needs to be high at the beginning of learning (to ensure fast learning) but low at the end of learning (to prevent oscillations). The learning rate parameter is usually hand-tuned to fulfil these criteria.

Recursive Least Squares is also suitable for modelling stationary data. It has the advantage of an adaptive learning rate that reduces gradually as learning proceeds; it reduces in response to a reduction in the uncertainty (covariance) of the model parameters.

Dynamic Linear Models are suitable for stationary and nonstationary environments. The models possess state-noise and observation-noise parameters which can be updated on-line so as to maximise the evidence of the observations.

Figure 3: (a) Squared prediction error, $e_t^2$. (b) Estimated prediction error with $q = 0$, $\hat{\sigma}_{q=0}^2$. (c) Estimated prediction error, $\hat{\sigma}_y^2$ (the baseline level is due to the fixed observation noise component, $\sigma_x^2$). (d) Estimate of the state noise variance, $q$. The state noise, $q$, increases by the amount necessary for the estimated prediction error (plot c) to equal the actual prediction error (plot a); see equation (34).

Figure 4: Average prior variance of the state variables, $Tr(R_t)/p$. As the model is exposed to more data from the same stationary regime the estimates of the state variables become more accurate (less variance). When a new stationary regime is encountered the state variance increases (because $q$ increases).

Figure 5: Average learning rates for (a) the DAR model and (b) the RLS model. The learning rate for RLS is set to a higher initial value (indirectly, by setting $\Sigma_{p+1}$ to have larger entries) to give it a better chance of tracking the data. The DAR model responds to a new dynamic regime by increasing its learning rate; RLS responds by decreasing its learning rate and is therefore unable to track the nonstationarity.

Figure 6: Spectrograms for (a) the DAR model and (b) the RLS model.