Multi-Modal Inference in Time Series

Size: px

Start display at page:

Download "Multi-Modal Inference in Time Series"

Lillian Nash
6 years ago
Views:

1 Multi-Modal Iferece i Time Series No-Liear State Space Models Master-Thesis vo Saket Kamthe aus Pue,Idia Jauar 2014 Fachbereich Iformatik/ETiT

2 Multi-Modal Iferece i Time Series No-Liear State Space Models Vorgelegte Master-Thesis vo Saket Kamthe aus Pue,Idia 1. Gutachte: Dr.-Ig. Marc P Deiseroth 2. Gutachte: Prof. Dr. Ja Peters 3. Gutachte: Prof. Dr.-Ig. Heig Puder Tag der Eireichug: Bitte zitiere Sie dieses Dokumet als: URN: ur:b:de:tuda-tuprits URL: Dieses Dokumet wird bereitgestellt vo tuprits, E-Publishig-Service der TU Darmstadt tuprits@ulb.tu-darmstadt.de Die Veröffetlichug steht uter folgeder Creative Commos Lizez: Nameseug Keie kommerzielle Nutzug Keie Bearbeitug 2.0 Deutschlad

3 Erklärug zur Master-Thesis Hiermit versichere ich, die vorliegede Master-Thesis ohe Hilfe Dritter ur mit de agegebee Quelle ud Hilfsmittel agefertigt zu habe. Alle Stelle, die aus Quelle etomme wurde, sid als solche ketlich gemacht. Diese Arbeit hat i gleicher oder ählicher Form och keier Prüfugsbehörde vorgelege. Darmstadt, de Jauary 31, 2014 (Saket Kamthe) 1

4 Abstract Multi-modal desities appear frequetly i time series ad practical applicatios. However, they caot be represeted by commo state estimators, such as the Exteded Kalma Filter (EKF) ad the Usceted Kalma Filter (UKF), which additioally suffer from the fact that ucertaity is ofte ot captured sufficietly well, which ca result i icoheret ad diverget trackig performace. I this thesis, we address these issues by devisig a o-liear filterig algorithm where desities are represeted by Gaussia mixture models, whose parameters are estimated i closed form. The filtered results ca be further improved by a backward pass or smoothig. However, the optimal backward filter does ot offer a closed form solutio ad, hece, approximatios are eeded. We propose a ovel algorithm for smoothig i oliear dyamical systems with multi-modal beliefs. The resultig method exhibits a superior performace o typical bechmarks. i

5 Ackowledgemets First of all, I would thak my supervisors Dr.-Ig. Marc P Deiseroth ad Prof. Ja Peters for acceptig me as a exteral studet, ad Prof. Heig Puder for his support as a iteral supervisor. I have leared a lot from my iteractios with Marc, ad I am grateful for his cotiued support eve after movig to Lodo. I admire him for his hads-o approach to supervisio, especially for ecouragig some of my outladish ideas. I thak him for his suggestios o the approximate iferece algorithms ad his patiece to help me draft my first research paper. Both, the paper ad his suggestios costitute a substatial part of this thesis. I thak Prof. Ja Peters for his support durig my thesis with ot oly techical supervisio ad fiacial assistace but also guidace o host of other questios I keep directig towards him. I ackowledge the partial fiacial support for my thesis by Akademisches Ausladsamt at Techische Uiversität Darmstadt I thak Felix Schmitt, Domiik Notz, Marco Ewerto ad Roberto Caladra for makig my stay i the lab ejoyable ad for idulgig i discussios from abstract mathematics to utter osese. My floor-mates Aishwarya Sharma ad Divya Yea helped me to cope up with absolute lows i my persoal life ad their support allowed me to cocetrate o my thesis, ad, hece, I would like to thak them. This space would be too small to express my gratitude to my parets, as I caot thak them eough for their cotiued support throughout my studies without which I could ot have reached this stage i my career. ii

6 Cotets 1. Itroductio Scope of the Work ad Cotributios Scope Cotributios Bayesia Filter Filterig Smoothig Liear Gaussia Filter The Kalma Filter The Rauch-Tug-Striebel (RTS) Smoother Noliear Filters Exteded Kalma Filter Usceted Kalma Filter Particle Filters Gaussia Sum Filter Summary of State of the Art Multi-Modal Filter ad Smoother Multi-Modal Filter Estimatio of the Gaussia Mixture Parameters Propagatio of Ucertaity Filterig Mixture Reductio Multi-Modal Smoothig Gaussia Sum Smoother for No-liear systems Summary of the Multi-modal filter ad Smoother Numerical Results Uivariate No-liear Growth Model No-statioary Model Statioary Model Lorez System Discussio ad Coclusio Coclusio Future work A. Gaussia Coditioig 47 iii

7 1 Itroductio The Hawk-Eye (Owes et al., 2003) techology has revolutioised the way we watch teis. How do they geerate ball trajectory? A path traced by a dyamical system (ball) is called trajectory of that dyamical system. We express trajectory as a times series, that is our trace is over time, as we are iterested i predictig the trajectory (over time). The dyamical systems are ofte described by differetial equatio i time. I the Figure 1 Hawk-Eye techology geerates the trajectory of the ball. The proprietary system was origially developed for cricket i 2001 ad ow is icluded i Eglish Premier League as well. The Hawk-Eye uses a set of cameras that capture the motio of the ball ad based o the model of dyamical system we estimate the positio of ball i a teis court (Owes et al., 2003). Trackig a movig object based o observatios is a classic example of estimatio or filterig (Bar-Shalom et al., 2004). Filterig is a sigal processig term for estimatio of a true sigal from a oisy data (we filter oise). The filterig algorithms to estimate the true sigal based o oisy time series data have wide applicatios from biomedicie to rocket sciece. I fact, the first (liear) filter for trackig o-statioary systems was proposed by Kalma (Kalma, 1960) for Natioal Aeroautics ad Space Admiistratio (NASA). The Kalma filter, the Hawk-Eye system ad this work share a commo theme, they estimate the latet state of a system (ball) from sequetial oisy observatios. Filterig ad estimatio i a o-liear dyamical system has bee studied extesively ad hece, has resulted i uique termiology. We use the teis ball trackig as a example to establish termiology ad as a visual aid 1. We defie the ball ad the forces actig o it (wid, gravity etc.) as a system. We wish to track the ball, i.e. the positio of the ball Figure 1.1.: Ball trackig: The shaded trace represets estimated trajectory of the i space ad its velocity, which tell us the curret ball. The trajectory ca be represeted state of our system, as our state chages over time we defie state by state variables. Now we caot actually measure these state variables directly without disturbig the ogoig match, so we observe the latet state of ball. as a time series describig time evolutio of a dyamical system.(for defiitios of the terms see text ) For techical reasos, it is easier to measure distace ad the agle the ball makes with respect to observer, we call these measuremets as observed variables. Now, we aim to predict how the ball will move i future based o the measuremets of the ball by the observer. We may be able to approximately model the 1 The ball trackig is used oly as a visual aid to iterpret abstract termiology. The focus of thesis is ot trackig a teis ball, however, the algorithms proposed here ca be used for trackig. 1

8 teis court, e.g. hard court or clay court to predict the velocity of ball oce it hits the surface. However, the challege of accurately modellig the impact forces ad the approximatios i model itroduce ucertaity i our system ad we ca oly estimate the positio of the ball. How we accout for the ucertaity itroduced by the approximate model of our complex real life system is cetral theme of this thesis. The ukow oise makes the task eve more challegig, we model oise a Gaussia with kow variace. I this report, we restrict ourselves to the Gaussia oise model ad o-liear dyamical systems. See Sectio for details o scope of the work. A optimal algorithm for a liear dyamical system with Gaussia oise was proposed by Kalma (Kalma, 1960). I such systems, the Gaussiaity allows us to derive the recursive filterig equatios i closed-form. I cotrast, for a o-liear system Gaussia ucertaities may become o-gaussia due to the o-liear trasform. Hece, we require approximatios, such as liearisig the fuctios, e.g. i the Exteded Kalma Filter (EKF) (see 1.4.1), or determiistic samplig, e.g. i the Usceted Kalma Filter (UKF) (see 1.4.2) to approximate a o-gaussia desity by a Gaussia (Julier ad Uhlma, 2004). Such approximatios make the limitig implicit assumptio that the true desities are ui-modal. Filters based o these approximatios ofte severely uder-perform whe true desities are multi-modal. Hece, multi-modal approaches are frequetly eeded. For represetig multi-modal, o-gaussia desities particle filters are a stadard approach (see 1.4.3). They are computatioally demadig sice they ofte require a large umber of particles for good performace, e.g. due to the curse of dimesioality. A isufficiet umber of particles may fail to capture the tails of the desity ad lead to degeerate solutios. I practice, we have to compromise betwee the determiistic ad fast (UKF/EKF) or the computatioally demadig ad more accurate Mote Carlo methods (Doucet et al., 2001). A ideal filter for a o-liear system should allow for multi-modal approximatios, ad at the same time its approximatios should be cosistet to avoid degeerate solutios. I this report, we propose a filterig method (see 2.1.3) that approximates a o-gaussia desity by a Gaussia mixture model (GMM). A GMM allows modellig multi-modality as well as represetig ay desity with arbitrary accuracy give a sufficietly large umber of Gaussias, see (Aderso ad Moore, 2005), Sectio 8.4, for a proof. Ituitively we ca imagie a extreme case of ifiite Gaussias with variaces tedig to zero formig a trai of Dirac deltas, that i asymptotically ca approximate ay fuctio with arbitrary precisio. The GMM presets a elegat determiistic filterig solutio i the form of the Gaussia Sum filter (Alspach ad Soreso, 1972). The Gaussia Sum filter (GSUM-F) was proposed as a solutio to estimatio problems with o-gaussia oise or prior desities. The GSUM-F relies o liear dyamics ad the assumptio that the parameters of the Gaussia mixture approximatio to the o-gaussia oise or prior desities are kow a priori. This liearity assumptio ca be relaxed, e.g. by liearisatio (EKF GSUM-F) (Aderso ad Moore, 2005) or determiistic samplig (UKF GSUM-F) (Luo et al., 2010), but both solutios still require a priori kowledge of the GMM parameters. If, however, the prior ad oise desities are Gaussia, the UKF GSUM-F ad EKF GSUM-F are reduced to the stadard UKF ad EKF, i.e. they become ui-modal filters. To accout for a possible ui-modal to multi-modal trasitio i a o-liear system, we eed to solve two problems: the propagatio of the ucertaity ad the parameter estimatio of the GMM approximatio. Kotecha ad Duric (Kotecha ad Djuric, 2003), proposed radom samplig for ucertaity propagatio ad Expectatio-Maximizatio (EM) to estimate the GMM parameters. I this report, 2 1. Itroductio

9 we propose to propagate ucertaity determiistically usig the Usceted Trasform, which also allows for a closed-form expressio of the GMM parameters. I our proposed multi-modal filter (Kamthe et al., 2013) we preset a approximate solutio to address the possible ui-modal to multi-modal trasitio i a o-liear system by a Gaussia mixture models ad derive closed form expressios to estimate Gaussia mixture model parameters, for details see Sectio While filterig is a olie estimatio, i.e. our curret state estimate is based o all the observatios up to curret time idex, we ca look back ad correct our previous state estimates based o ew observatios, the filter to correct estimates based o future observatios is called Fixed lag Smoother or just smoother. The smoother also eeds to take ito accout the oliear trasitio fuctio. We propose a ovel smoother based o multi-modal filter i the Sectio 2.2. To the best of our kowledge the smoothig algorithm for o-liear estimatio based o Gaussia is ot available. The other Gaussia mixture based smoothers i literature are the Two-filter smoother based o the backward iformatio filter (Kitagawa, 1994), the Gaussia Mixture smoother (Vo et al., 2012) based o the β recursio ad the Expectatio Correctio smoother (Barber, 2006) based o approximatig a Gaussia itegral by evaluatig the itegral at mea of the Gaussia ucertaity. 3

10 1.1 Scope of the Work ad Cotributios As a testimoial to the importace of the o-liear estimatio ad time series models may stadard texts are available, ad i particular excellet descriptio of filterig ad state estimatio problems are covered i texts (Kailath et al., 2000) ad (Bar-Shalom et al., 2004). We iclude stadard Gaussia filters ad particle filters as state of the art o-liear estimators i itroductory chapter. No-liear filters ca be broadly classified as described i 1.2. We itroduce the oly the stadard algorithms for each theme, e.g. Determiisitic uimodal filters (Gaussia beliefs) also iclude Cubature Kalma filters (Arasaratam ad Hayki, 2009), Gauss Hermite Quadrature filters, etc Scope I this work we restrict ourselves to Gaussia priors, Gaussia Noise desities ad use stadard bechmarkig tools like Uivariate No-liear Growth Model (UNGM) to test ew algorithms proposed i this work. The Gaussia sum filter ad Multi-modal filters ca hadle o-gausia desities ad, hece, will outperform ui-modal o-liear filters i o-gaussia oise desity scearios. However, we avoid o-gaussia priors to demostrate that the proposed algorithm ca predict ad hadle o-gaussia desities geerated due to o-liearity of systems. The proposed mulit-modal filter will outperform ui-modal filters uder stadard testig coditios for o-gaussia priors 2, hece, for fair compariso with ui-modal filters we oly cosider a Gaussia oise i the system. No-liear Filters Gibbs Filter Particle Filter Uimodal / Gaussia Multimodal Radom Determiistic Determiistic UKF Gaussia Sum Filter Radom EKF Figure 1.2.: No-liear Filters: The classificatio of o-liear filters is abstract ad its mai purpose to defie scope of this work i cotext of state of the art literature, e.g. all leafs are o-liear filters ad described i this report. Orage circles represet the stadard algorithms o which we establish our cotributios. 2 Multi-modal filter is based o (Alspach ad Sorese, 1972) 4 1. Itroductio

11 Assumptios ad Scope Uless stated otherwise, the followig assumptios are made throughout the report. No-liearity: The system fuctio f ad measuremet fuctio h are o-liear ad assumed to be kow ad cotiuous. Prior: We assume a Gaussia prior o iitial state variables Noise: Noise desity is assumed to be Gaussia ad we assume stadard ormal distributio through out our discussio ad examples to maitai cosistecy. The followig poits are NOT covered i this report. A thorough compariso with Markov-Chai-Mote-Carlo (MCMC) methods. Compariso of proposed smoother with state of the art Gaussia smoothers. We, however, iclude a compariso with Gibbs smoother (Deiseroth ad Ohlsso, 2011) ad leave thorough comparisos to future work Scope of the Work ad Cotributios 5

12 1.1.2 Cotributios The mai cotributios of this thesis are elisted below. 1. We preset closed-form expressios for estimatig the parameters of the Gaussia mixture model to approximate multi-modal predictive desity of a o-liear dyamical system, see Sectio Multi-Modal-Filter (M-MF) (Kamthe et al., 2013), itroduced i Sectio 2.1 is a multimodal approach to filterig i o-liear dyamical systems, where all desities are represeted by Gaussia mixtures. 3. We propose a ovel multi-modal smoothig pass for o-liear dyamical systems, the result from Multi-Modal Filter is used to obtai a Gaussia Mixture based multi-modal smoother, see Sectio 2.2. A smoothig pass based o Gaussia sum approximatios is ot available i literature to the best of our kowledge, ad this algorithm is the most sigificat cotributio of the thesis. Note: Gaussia Sum Approximatios were first itroduced i (Alspach ad Sorese, 1972) ad form the basis of Gaussia sum filter (Aderso ad Moore, 2005). The Gaussia mixture represetatio provides flexibility to approximate complex predictive desities alog with ability to use closed form expressios for filterig ad smoothig Itroductio

13 1.2 Bayesia Filter f x 1 x x +1 h Filterig Smoothig y 1 y y +1 Figure 1.3.: State Space Model (SSM) of a dyamical system. Circles represet the hidde state variables, squares represet observed variables ad subscripts deote time idices. State trasitio fuctio f ad measuremet fuctio h are assumed kow ad o-liear. Arrows idicate iformatio flow i filterig ad smoothig ad both estimators are recursive, i.e. i a typical SSM model the above sequece repeats itself i both directios (left ad right). We ow describe the geeric framework of iferece i state space models ad we itroduce stadard termiology i the cotext of teis ball trackig. Readers familiar with state-space models ca skip to Bayesia filters i Sectio The mathematics or equatios to predict the future state values ad correct our estimate of the state based o the ew measuremets is called filterig. Whe the system is o-statioary ad oise is Gaussia we ca use Kalma Filter (Kalma, 1960) to obtai optimal estimates (Aderso ad Moore, 2005). The state-space models are powerful tools to represet may complex problems elegatly, e.g. whole ball trackig system ca be eatly ad compactly represeted by the state-space diagram show i Figure 1.3. The state-space represetatio is simple eough to uderstad ituitively ad yet it ca represet almost all the estimatio problems of practical sigificace (Thru et al., 2005). I the diagram the states are represeted by circles ad measuremets by squares. The circles are called the latet variables as we ca ascertai their existece oly through our observatios, the orage squares i the Figure 1.3. The velocity of teis ball for example, is a latet variable ad we ca estimate it based o our observatios oly. To facilitate a mathematical formulatio of system we assume Markov property, i.e. the ext state is based etirely o the curret state ad is idepedet of all the other states. The Markov assumptio allows us to write recursive estimatio algorithms. Throughout this thesis we represet state of the system by variable x ad use subscript otatio x to deote values of the state vector at discrete time step idex. To model sequetial data i time series we use SSM as described above. The state trasitio fuctio f ad measuremet(observatio) fuctio h are assumed to be kow. The variable Y j = {y 1,..., y j } represets all the observatios up to time step j. The model ca be represeted as i Figure Bayesia Filter 7

14 1.2.1 Filterig I a recursive Bayesia filter, the curret belief or the state distributio p(x ) is calculated from the previous state distributio p(x 1 ) at a discrete time idex 1. We iitialise the recursive filter with the state x 0 ad a distributio p(x 0 ). As our state is latet, the curret belief is coditioed o observatios up util the curret observatio, i.e. the curret belief is p(x 1 Y 1 ). Filterig is defied as estimatig p(x Y ) give ew observatio y ad ca be divided i two steps, the time update ad the measuremet update. We typically, alterate betwee these two steps, i.e. we predict the state (time update) ad the correct the state estimate (measuremet update) ad use the ew corrected state to predict ext state ad so o. Time Update I the time update we move oe step horizotally alog the chai (horizotal red arrow) i the Figure 1.3. This ca be represeted as estimatig p(x Y 1 ) from p(x 1 Y 1 ). The time update is defied by the itegral p(x Y 1 ) = p(x x 1 )p(x 1 Y 1 ) dx 1. (1.1) This itegral caot be solved i closed form for o-liear systems. All o-liear filters approximate this itegral. Particle filters replace the itegral by sum ad probabilities by radom samples. Measuremet Update I the measuremet update we icorporate the curret observatio y ad estimate the posterior desity p(x Y ). This ca be iterpreted as movig iformatio from observatio to correct the time update predictio accordig to the curret measuremet (vertical red arrow). We use Baye s theorem to write p(x Y ) = p(y x )p(x Y 1 ), (1.2) p(y Y 1 ) where the margial desity p(y Y 1 ) is obtaied by p(y x )p(x Y 1 ) dx Smoothig I smoothig, we wish to estimate the distributio of the state x give all observatios Y N. We assume that all the state predictive desities p(x Y ) are available. We estimate p(x Y ) recursively as (Kitagawa, 1994) p(x+1 Y N )p(x +1 x ) p(x Y N ) = p(x Y ) dx +1. (1.3) p(x +1 Y ) I the Figure 1.3, this ca be iterpreted as movig from right to left Itroductio

15 Teis Ball Trackig We ow cosider Bayesia filterig i the ball trackig cotext. Time Update: We estimate the ext state of the ball. Will it move i a straight lie? Will it hit the groud (i ext time step)? Was there ay spi? Measuremet Update: We correct our estimate to update our curret state, e.g. it hit the groud, there was a top spi imparted o it, etc. Smoothig Update: Smoothig allows us to correct the whole trajectory based o the fial observatios, i.e. ow we kow that there was top spi, does it chage our estimate of the state of the ball at the very begiig (Ideally, it should, if the spi imparted was our latet variable ad ow we kow more about, the type ad the amout, of spi based o the fial result). So if we kow how the system works ad have high speed cameras why do we eed itegrals ad probabilities? Our observatios are oisy or ucertai (capturig 3D world with 2D sesors), eve with stereoscopic visio we ca be tricked (optical illusios). I the Hawk-Eye trackig system the ball ad its eviromet are ucertai, e.g. humidity ad amout of spi imparted decide how the ball moves i air. So to deal with ucertaities we make assumptios, ad aturally the accuracy of estimatio depeds o how close our assumptios are to the reality. I the followig we establish a stadard approximatio used i Bayesia filterig ad smoothig Bayesia Filter 9

16 1.3 Liear Gaussia Filter The Bayesia filterig as defied by the Equatios (1.1) to (1.3), ca be solved i closed form with Liear Gaussia systems (Thru et al., 2005). The derivatio of the Kalma filter from Bayesia filter is well kow ad we do ot attempt to rewrite it. Readers ca fid Kalma filter derivatio based o Wieer filter i (Kailath et al., 2000), a derivatio based o Bayesia filter approach is give i (Thru et al., 2005). The Kalma filter exploits Gaussia coditioig property, i.e. if we have a joitly Gaussia distributio we ca obtai coditioal desity i a closed form. We assume Gaussia ucertaities ad approximate desities or beliefs as Gaussias. The filterig based o the Gaussia assumptio is called Gaussia filterig The Kalma Filter We cosider a dyamical system described by the followig equatios x = f (x 1 ) + w, w (0,Q), (1.4) y = h(x ) + v, v (0, R), (1.5) where f ad h are the (o-liear) trasitio ad measuremet fuctios, respectively. The oise processes w ad v are i.i.d. zero mea Gaussia with covariaces Q ad R, respectively. We deote the D dimesioal state by x, ad y is the E dimesioal observatio. I Gaussia filterig we assume that all the kow desities, system ad measuremet oise ad prior desity are Gaussia. For a liear system this eables us to use liear Gaussia coditioig ad all the equatios (1.1) to (1.3) ca be derived i closed form. We assume that our state ad measuremet variables are joitly Gaussia to exploit the Gaussia coditioig property. The assumptio reduces Kalma filterig to estimatig the auto-covariace ad cross-covariace matrices (Deiseroth ad Ohlsso, 2011). The joit Gaussia matrix for a latet state x ad measuremet variable y ca be writte as x y x y µ µ y Σ x,x, Γ x, y T Γ x, y Σ y, y (1.6) ad for state variables at time idex ad 1 as, x 1 x x 1 x µ 1 µ, Σ x,x 1 Γ x,x T Γ x,x Σ x,x (1.7) Itroductio

17 Algorithm 1 Kalma Filter: Algorithm illustrates Kalma filterig for Liear Gaussia systems. Subscripts defie time idex, superscripts deote latet x or observed y variable, µ, Σ, ad Γ are mea, covariace ad cross covariace respectively of a Gaussia (Liear systems) or approximated as Gaussia (No-liear systems) beliefs 1: fuctio KALMANFILTER( f, h, µ 1, Σ 1, y,q, R) 2: µ = f (µ 1 ) 3: Estimate Σ x,x usig f ad Σ 1 4: Σ = Σ x,x + Q Time update 5: Estimate Σ y, y ad Γ x, y usig h ad Σ Σ y,y 6: K = Γ x, y 7: µ = µ + K y h(µ ) 8: Σ = Σ 1 (Γ x, y 9: retur µ, Σ 10: ed fuctio + R 1 )(Σy, y )(Γx, y T ) Kalma Gai K Measuremet Update We describe Kalma filter algorithm based o the joit Gaussia assumptio i Algorithm 1. For the liear dyamical system with system gai matrix F ad measuremet matrix H the covariaces are give by Σ x,x = F Σ F T, (1.8) Σ y, y = H Σ H T, Γ x,x = Σ F T, Γ x, y = Σ H T. For a o-liear system with a Gaussia assumptio (implicit liearisatio), we approximate these covariaces matrices usig differet methods such as the Usceted Trasform or Gibbs samplig (Deiseroth ad Ohlsso, 2011) Liear Gaussia Filter 11

18 1.3.2 The Rauch-Tug-Striebel (RTS) Smoother I smoothig we move backwards to update our estimate of trajectory. Below we describe a well-kow backward pass called Rauch-Tug-Striebel (RTS) smoother. The derivatio of the backward pass below is similar to the backward pass preseted i (Särkkä, 2008). We describe below a procedure to evaluate itegral (1.3). 1. We reproduce the Equatio (1.3) p(x Y N ) = p(x Y ) p(x+1 Y N )p(x +1 x ) p(x +1 Y ) dx +1. (1.9a) 2. We push the filterig result p(x Y ) iside the the itegral as it a costat w.r.t. itegratio variable x +1 to obtai p(x Y N ) = p(x +1 Y N ) p(x +1 x )p(x Y ) }{{} p(x +1 Y ) dx +1. joit distributio of x ad x +1 (1.9b) 3. We ca obtai coditioal distributio of x give x +1 ad Y usig the joit distributio p(x, x +1 Y ). This coditioal ca be achieved by mere reformulatio of terms i step above p(x, x +1 Y ) p(x Y N ) = p(x +1 Y ) }{{} Coditioal distributio of x p(x +1 Y N ) dx +1. (1.9c) 4. Exploitig the Markov property, we ca write p(x x +1, Y N ) = p(x x +1, Y ) ad, hece, p(x Y N ) = p(x x +1, Y N )p(x +1 Y N ) dx +1. (1.9d) Itroductio

19 Algorithm 2 RTS Smoother: The algorithm describes RTS smoother for liear Gaussia systems. Subscripts defie time idex, superscripts deote latet x or observed y variable, µ, Σ, ad Γ are mea, covariace ad cross covariace respectively of a Gaussia (Liear systems) or approximated as Gaussia (No-liear systems) beliefs. Superscript s is used to deote fial smoothed result, i.e. we defie smoothed mea as µ s ad filtered mea as µ. 1: fuctio RTSSMOOTHER(µ, Σ, µ +1, Σ +1,Q) 2: µ +1 = f (µ ) 3: Σ +1 = F Σ 1 F (µ ) T + Q 4: 5: D = Σ F Σ 1 +1 µ s = µ + D µ+1 µ +1 Smoother Gai D 6: Σ s = Σ + D Σ+1 Σ +1 D T 7: retur µ s, Σs 8: ed fuctio 5. If we assume p(x x +1, Y N ) (x µ, Σ ) ad p(x +1 Y N ) (x µ s +1, Σs +1 ), the the distributio p(x Y N ) is also Gaussia, i.e. p(x x +1, Y N ) (x µ s, Σs ) Based o Step 5 above, we ca use Gaussia coditioig to write smoothig algorithm as Algorithm 2, 1.3. Liear Gaussia Filter 13

20 25 Noliear Fuctio f(x) p(f(x)) 15 p(f(x)) f(x) p(x) p(x) x Figure 1.4.: Example of Multi-modal predictive desity i UNGM with stadard ormal distributio as prior. The blue curve is stadard ormal prior ad the gree curve represets a o-liear fuctio. The predictive desity for a o-liear mappig of stadard ormal distributio is show by red curve. The red curve was geerated usig kerel smoothig desity estimator usig 10 5 samples. The samples are geerated from the Gaussia distributio show i blue ad the mapped by o-liear fuctio f (i gree). 1.4 Noliear Filters The Kalma filter ad RTS smoother allow us to recursively calculate state estimates ad all relatios are give i closed form, however, they make assumptio that the system is liear. Most applicatios of real life examples are oliear i ature. To uderstad it i our ball trackig system, our observatio fuctio, the video cameras implicitly use o-liear trasformatios to measure positio of the ball, e.g. Cartesia to polar, 3 dimesioal world to 2 dimesioal observatio. We caot use Kalma Filter i its form described i Sectio uless we use approximatios to estimate the covariace matrices i Equatios (1.6) ad (1.7). The approximatios are eeded due to the followig challeges i o-liear dyamical systems. 1. Predictive distributios or the itegrals used i our Bayesia framework do ot yield closed form solutios. Approximatios are eeded. 2. No-liear mappig results i o-gaussia distributios, i.e. eve if we were able to somehow evaluate the itegrals, we caot propagate the results forward due to the lack of parametric form. As a illustratio, cosider a hypothetical system that starts as Gaussia Itroductio

21 ad udergoes a oliear fuctio map as show i Figure 1.4. The red curve o left describes o-gaussia desity hece we caot use Kalma filter or RTS smoother. Most o-liear estimators solve both challeges simultaeously i.e by liearisig a o-liear system. The process is described i the followig sectios. We make the above distictio to illustrate our approach to tackle the problem. We treat above challeges idividually, i.e. 1. Provide a Bayesia framework to estimate predictive desity 2. The desity is parametric hece we ca use recursive filters like Gaussia sum filters ad yet it is ot ecessarily liearisatio as we allow a o-gaussia predictive desity to a Gaussia prior. The Figure 1.4 shows approximatio for the same o-liear fuctio we described i Figure 2.2. If we approximate the predictive desity by a Gaussia we implicitly liearise the system. This implicit liearisatio ca lead to large approximatio errors, e.g. see Figure 1.4. I Figure 1.4, ay Gaussia approximatio to the red curve (predictive desity) would result i large errors as a sigle Gaussia caot capture both modes simultaeously Noliear Filters 15

22 Liear Approximatio As described above i the liear approximatio we approximate all the beliefs or state distributios by a Gaussia desity. The Gaussia assumptio allows us to use property of Gaussia coditioig a property that eables us to write state estimatio i closed form. So, if we assume p(x ) (x µ, Σ ) ad p(x 1 ) (x 1 µ 1, Σ 1 ), the our liear approximatio implies x x 1 x x 1 µ µ 1, Σ Γ T Γ Σ 1. (1.10) With the kow trasitio fuctio f we ca evaluate the mea of the predictive Gaussia as µ = f (µ 1 ). I the followig we describe stadard methods classified based o the way we approximate the predictive Covariaces Σ ad Γ. A uified view o the liear Gaussia systems is give i (Deiseroth ad Ohlsso, 2011). We give brief overview of some o-liear estimators that use the liear Gaussia assumptio. Exteded Kalma Filter: Liearise the fuctio usig Taylor series expasio, see Sectio Usceted Kalma Filter: Use determiistic samplig to estimate the Covariaces, see Sectio Gibbs Filter: Radom samplig to estimate the mea ad the Covariaces, see Sectio The list of algorithms is meat to be illustrative ad ot exhaustive. Each type described above demostrates a priciple or cocept used to approximate to tackle itractability iheret i oliear filterig systems (Deiseroth ad Ohlsso, 2011). Several ad-hoc methods are available i literature Itroductio

23 1.4.1 Exteded Kalma Filter As described i Sectio 1.4, we liearise the system ad measuremet fuctios. We use the same model as described by system of Equatios (1.4). With our liear approximatio model (1.10), the parameters of predictive Gaussia are calculated as µ = f (µ 1 ), (1.11) Σ x,x Σ y, y Γ x, y Γ x,x = F 1 (µ 1 )(Σ 1 )F T 1 (µ 1), = H (µ )(Σ )H T (µ ), = Σ 1 H T 1 (µ 1), = Σ 1 F T 1 (µ 1), where F 1 is the Jacobia matrix of f obtaied by F (µ ) k,k = f k(x) k x. (1.12) x=µ where k ad k deote row ad colum of a matrix, similarly we have H (µ ) k,k = h k(y) k y. (1.13) x=µ The matrices i (1.11) ca be used i Algorithm 1 to obtai Exteded Kalma Filter (EKF) Noliear Filters 17

24 Figure 1.5.: Example of the UT for mea ad covariace propagatio. a) actual, b) first-order liearizatio (EKF), c) UT. The Figure is reproduced from (Wa ad va der Merwe, 2000) Usceted Kalma Filter The usceted Kalma filter is based o the Usceted Trasform (UT) (Julier ad Uhlma, 2004) where we use determiistic samplig to estimate the covariace matrices Σ ad Γ. Similar to EKF we eed oly estimate the covariace matrices (Deiseroth ad Ohlsso, 2011) to obtai Usceted Kalma filter. The matrices i (1.14) ca be used i Algorithm 1 to obtai Usceted Kalma Filter (UKF). To illustrate the UT cosider a variable x 1 (x 1 µ 1, Σ 1 ) ad we wat to estimate the distributio of p(x ) giev that x = f (x 1 ) ad f is ay o-liear fuctio. We ca divide the trasform i the followig steps, 1. Calculate sigma poits X of distributio (x 1 µ 1, Σ 1 ). 2. Propagate the sigma poits through o-liear fuctio f. 3. Estimate trasformed mea ad covariaces based o the trasformed sigma poits. 3 3 The Y i (1.14f) ad (1.14g) is a matrix ad ot a vector of observatios Itroductio

25 χ [0] 1 = µ 1, χ [j] 1 = µ 1 + ( Σ 1 ) j, χ [j+d] 1 = µ 1 ( Σ 1 ) j, (1.14a) (1.14b) (1.14c) where ( Σ 1 ) j represets j th row of the Cholesky factor of Σ 1. The poits χ 1 are called sigma poits as they are spread aroud mea at a distace of oe sigma µ = X w µ, (1.14d) Σ x,x = X W X T, (1.14e) Σ y, y = Y W Y T, (1.14f) Γ y,x = Y W X T, (1.14g) where χ 1 is a sigma poit matrix ad Σ 1 is a lower triagular matrix of the Cholesky factorisatio, fuctio f (.) is applied to each colum of argumet matrix. The costat is obtaied by c = α 2 ( + κ) where α ad κ are costats of UT. The vector w µ ad matrix W are defied as w µ = W 0 m... W 2 T m, (1.14h) W = (I [w µ... w µ ]) diag(w 0 c... W 2 c ) (I [w µ... w µ ]) T. (1.14i) Where the weights are obtaied by (1.14j) W 0 m W 0 c W i m W i c = λ/( + λ), (1.14k) = λ/( + λ) + (1 α2 + β), = 1/ {2( + λ)}, i = 1,..., 2., = 1/ {2( + λ)}, i = 1,..., 2. with λ defied as λ = α 2 ( + κ). (1.14l) The hyperparameters λ, α ad κ cotrol the spread of the sigma poits χ (?) 1.4. Noliear Filters 19

26 Algorithm 3 Particle Filter: The followig algorithm describes a geeric particle filter algorithm, adapted from (Thru et al., 2005). The χ represets samples, subscript deotes time idex. The system fuctio ad the measuremet fuctios are give by f ad h respectively. The algorithm returs particles χ that collectively represet filtered estimate of system at time. 1: fuctio PARTICLE FILTER(χ 1, f, h, y ) 2: χ = χ = 3: for m=1 to M do 4: sample x [m] 5: w [m] 6: χ = χ + x [m] f (x [m] 1 ) p(x x [m] 1 ) = p(y h(x [m] ) ) Calculate weight (importace) of particle, w [m] 7: ed for 8: for m=1 to M do 9: sample i with probability w [i] 10: add x [i] to χ 11: ed for 12: retur χ 13: ed fuctio Importace re-samplig Particle Filters Particle filters are o parametric implemetatio of Bayesia filter. I particle filters we represet our beliefs or probabilities desities by samples draw from these desities. See (Thru et al., 2005) for detailed overview o particle filters i terms of Bayesia filterig. As a illustrative example cosider a system with prior probability, p(x 0 Y 0 ) = (x 0 µ 0, Σ 0 ). Subsequetly, we represet p(x 0 Y 0 ) by set χ of radomly draw samples from (x 0 µ 0, Σ 0 ), i.e. where M represets the umber of particles. χ [M] 0 p(x 0 Y 0 ) = (x 0 µ 0, Σ 0 ) For the time update we ca use the fuctio f o set of particles χ [M] 1, i.e. χ [M] p(x Y 1 ) f (χ [M] 1 ) Geeric particle filter is described i algorithm 3 which is adapted from (Thru et al., 2005) for a thorough overview of particle filters ad Markov Chai Mote Carlo (MCMC) methods refer (Doucet et al., 2001) Itroductio

27 Gibbs Filter Ulike geeric particle filters described i Sectio Gibbs filter is parametric. The state desities p(x Y ) are approximated by a parametric distributio ad we use radom samplig (Gibbs sampler) to estimate parameters of these approximate desity. I (Deiseroth ad Ohlsso, 2011), the authors use the Gaussia assumptio for predictive desities ad the state covariace ad cross covariace matrices are calculated usig Gibbs sampler. I this work we use Gibbs filter described i (Deiseroth ad Ohlsso, 2011) to compare performace of the proposed filter Gaussia Sum Filter The o-liear filters we discussed i precedig sectios all share a commo philosophy, approximate a itractable predictive distributio by a Gaussia distributio. Gaussia approximatio also implicitly imposes a uimodal belief o our state estimate, however, as the ame of the report suggests we are iterested i multi-modal beliefs. If we express our multi-modal beliefs as Gaussia mixtures the we ca use Gaussia Sum Filter proposed by (Alspach ad Soreso, 1972). We would like to stress that we view Gaussia sum filter as a algorithm to Kalma Filter like approximatios whe beliefs are expressed as Gaussia Mixture. The differece is subtle ad we give followig examples to support our view. 1. Gaussia sum filter ca be used for o-liear systems if ad oly if we express approximate beliefs as Gaussia Mixtures. Gaussia sum filter has o mechaism to deal with o-liearities. Various ways to approximate fuctios usig optimisatio are discussed by authors i (Alspach ad Soreso, 1972). 2. I Switchig Liear Dyamical Systems (SLDS) Gaussia mixtures arise aturally, Gaussia sum filter ca be expressed as a Assumed Desity Filter [cite Mika, Barber] where the assumed desity is Gaussia Mixture. A excellet discussio ad alterative view o usage of Gaussia sum for o-liear filterig is give i Sectio 8.4 of (Aderso ad Moore, 2005). The chapter covers several pages ad we recommed it as must read to gai isights i the iterpretatios of Gaussia Sum Filter. However, the argumet Gaussia Sum Filter ca be used as a approximate iferece techique with Gaussia Mixture beliefs is the importat take away poit from this Sectio Noliear Filters 21

28 Estimatio Scheme Mea Variace P(X 0) χ 2 distributio Particle Filter 1e3 [1e6] particles [0.9998] [2.0020] 0 Exteded Kalma Filter 0 N/A N/A Usceted Kalma Filter Multi-modal Filter Table 1.1.: Compariso of various determiistic o-liear approximatios: The table shows approximatios for stadard ormal distributio p(x) = (0, 1) mapped through quadratic o-liearity x 2. True distributio is chi-squared with 1 degree of freedom p(x 2 ) = χ 2 1. The usceted Kalma filter ad multi-modal filter use the same optimal hyper parameters, multi-modal filter i additio uses tuig parameter α = 1.2. The EKF diverges, see Figure Summary of State of the Art Filters ad Motivatio for Multi-Modal Filter The Exteded Kalma filter ad Usceted Trasform filter are stadard state of the art algorithms for determiistic iferece with Gaussia beliefs, a similar method for multi-modal beliefs is ot available to the best of our kowledge. The success of EKF ad UKF is based o the ease of implemetatio ad approximatios may result i severe errors eve for most trivial problems. We motivate eed for a fast determiistic method with a simple example of oe dimesioal quadratic map to demostrate. The Sectio is motivated by discussios i (Gustafsso ad Hedeby, 2012). We use quadratic approximatio to test the methods. Large umber of real life applicatios ofte have quadratic map, e.g. the euclidea distace i trackig problems is a quadratic map. Quadratic map is simple eough that we ca calculate desired values i closed form. The probability distributio is ot a Gaussia ad we ca estimate errors i approximatio. We cosider a map f (x) = x 2 for a stadard Gaussia variable x (x 0, 1). The p(f (x)) is the a χ 2 distributio with degrees of freedom, where is dimesio of the variable x. The plots of approximatios for predictio with UKF, EKF, Particle Filter(samplig) ad true χ 2 1 distributio are plotted i Figure 1.6. We discuss this results alog with the proposed Multi-modal filter i Chapter 4. Here, the mai aim of presetig these results is, both EKF ad UKF may result i severe errors, particle filters eed substatially large umber of particles to get good approximatios, a approximatio may ot match true momets of trasformed variable ad yet result i less approximatio errors, see Figure 1.6 ad Table Itroductio

29 Probability CDF of Multi Modal Filter CDF of χ 2 distributio CDF of Usceted Trasform CDF of Exteded Kalma Filter Variable x (a) Cumulative Distributio Fuctio 1 Probability Multi modal Filter χ 2 distributio Usceted Trasform Variable x (b) Probability Distributio Fuctio Figure 1.6.: Compariso of various determiistic o-liear approximatios: The figure shows predictive desities for a quadratic trasformatio (x 2 ) of the stadard ormal distributio by o-liear estimatio methods. The gree curve shows the ideal result, χ 2 distributio with 1 degree of freedom (χ 2 1 ) (a) The plot shows cumulative distributio of the predicted desities, ideally there should be o mass before zero as show by the gree curve of χ 2 distributio. The EKF cocetrates whole mass at x = 0 (b) The plot shows predictive desities. Note, the χ 2 1 distributio has discotiuity at x = Summary of State of the Art 23

30 2 Multi-Modal Filter ad Smoother We discussed shortcomigs of uimodal approximatios i Sectio 1.5. The ideal filter for a o-liear system should be able to represet multi-modal desities ad make cosistet estimates. The multi-modal beliefs give the filter additioal flexibility to reduce approximatio errors. A Gaussia mixture is a ideal approximatio for multi-modal beliefs; it is parametric, ad we ca use Gaussia Sum filter for estimatio. With a sufficietly large umber of Gaussias i the mixture we ca arbitrarily reduce approximatio error (Aderso ad Moore, 2005); (Alspach ad Sorese, 1972). The challege with the Gaussia mixture approximatios is to estimate its parameters. We eed to fid a method to fit a Gaussia mixture to the predictive desity of Gaussia udergoig a o-liear trasform. Give such a method we ca repeat the procedure to obtai a predictive Gaussia mixture for a o-liear trasform of the iput Gaussia mixture. The first step to estimate Gaussia mixture parameters to approximate the o-gaussia, predictive desity for a o-liear map of a Gaussia, is described i Sectio (Kamthe et al., 2013). Give the filterig results ad all the observatios we ca improve our estimatio result by implemetig a backward filter as described i Sectio For multi-modal beliefs the optimal backward filter, Equatio (1.3), caot be evaluated i closed form due to coditioig o p(x +1 Y ) which is a Gaussia a mixture (see Sectio 2.1.3). We propose a approximate solutio to smoothig with Gaussia mixture beliefs i Sectio Multi-Modal Filter I the followig, we devise a closed-form filterig algorithm with multi-modal represetatios of the state distributios. Our algorithm is ispired by the followig observatio made by Julier ad Uhlma (Julier ad Uhlma, 2004): Give oly the mea ad the variace of the uderlyig distributio, ad, i absece of ay a priori iformatio, ay distributio (with the same mea ad variace) used to calculate the trasformed mea ad variace is trivially optimal. This observatio was the basis to derive the Usceted trasform ad the UKF. However, predictios based o the Usceted Trasform ofte uder-estimate the true predictive ucertaity, which ca result i icoheret state estimatio ad diverget trackig performace. To address this issue, we use a differet (optimal) represetatio of the uderlyig distributio, which still matches the mea ad variace: We propose to represet each sigma poit i the Usceted trasform by a Gaussia cetred at this sigma poit. This approximatio of the origial distributio is effectively a GMM with 2D+1 compoets, where D is the dimesioality of x. I Sectio 2.1.1, we derive a optimal GMM represetatio of a state distributio p(x 1 ) of which oly the mea ad variace are kow. I Sectio 2.1.2, we detail how to map this GMM through a o-liear fuctio to obtai a predictive distributio p(x ), which is represeted by a GMM. We geeralise both ucertaity propagatio ad parameter estimatio to the case where p(x 1 ) is give by a GMM. I Sectio 2.1.4, we propose a method for pruig the umber of 24

31 mixture compoets i a GMM to avoid their expoetial icrease i umber of compoets. I Sectio 2.1.3, we propose the resultig filterig algorithm, which exploits the results from Sectios Estimatio of the Gaussia Mixture Parameters Let the mea ad the variace of the state distributio p(x 1 ) be give by µ, Σ, respectively. The, we ca represet p(x 1 ) by a GMM p(x 1 ) = 2D i=0 δ iϕ i (x 1 ), such that the mea ad the variace of the approximate desity p(x 1 ) equal the mea µ ad variace Σ of p(x 1 ). This represetatio is achieved by the closed-form relatios δ i = 1/(2D + 1), µ 0 = µ, µ j = µ + σ j, µ j+d = µ σ j, Σ i = 1 2α Σ, D+1 (2.1) where i = 0,..., 2D ad j = 1,..., D, where D is the dimesioality of the state variable x 1. The variable σ deotes D rows or colums from the matrix square root ± ασ. From (2.1), we ca see that we eed to calculate Σ oly oce for all 2D + 1 Gaussias ϕ i (x 1 ). To esure that Σ i is positive semi-defiite, the scalig factor α should be chose such that 2α (2D + 1), see (2.1). For 2α = 2D+1 i (2.1), the equatios above reduce to scaled sigma poits (Julier ad Uhlma, 2004). Hece, the GMM represetatio i (2.1) ca be cosidered a geeralisatio of the classical sigma poit represetatio of desities employed by the Usceted Trasform, where each sigma poit becomes a improper probability distributio Propagatio of Ucertaity A key step i filterig is the ucertaity propagatio step, i.e. estimatig the probability distributio of radom variable, which has bee trasformed by meas of the trasitio fuctio f. Give p(x 1 ) ad the system dyamics (1.4), we determie p(x ) by evaluatig p(x x 1 )p(x 1 )dx 1. For o-liear fuctios f, the itegral above ca ot be solved i closed form. Thus, approximate solutios are required. Ucertaity propagatio i o-liear systems ca be achieved by approximate methods, employig liearisatio or determiistic samplig as i the EKF ad UKF. I such approaches, the state distributio p(x 1 ) ad the approximate predictive desity p(x ) are well represeted by Gaussias. If the state distributio p(x 1 ) is a Gaussia mixture as i (2.1), we ca estimate the predictive distributio p(x ) similarly, e.g. by applyig such a approximate update to each mixture compoet i the GMM. I the multi-modal filter, we propagate each mixture compoet ϕ i (x 1 ) of the GMM through f ad approximate p(x ) by p(x ) = 2D 2D p(x x 1 ) δ i=0 iϕ i (x 1 )dx 1 j=0 δ jϕ j (x ), (2.2) 2.1. Multi-Modal Filter 25

32 30 20 p(f(x)) p(f(x)) 0 f(x) 0 f(x) p(x) p(x) p(x) p(x) x Figure 2.1.: Demostratio of predictive desity estimatio with multi-modal filter. 1) We approximate Gaussia (blue solid) by a Gaussia mixture (blue dashed) usig (2.1) 2) Gree solid lie represets a o-liear fuctio 3) The solid red lie represets a Gaussia mixture evaluated usig EM. The dashed red Gaussias represet usceted trasform of dashed blue Gaussias. The dashed straight dashed lies represet sigma poits used by a usceted trasform. where the mea ad covariace of each ϕ j (x ) are computed by meas of the Usceted Trasform. The Figure 2.1, shows a example of ucertaity propagatio usig scheme described i Equatio (2.2). If the prior desity is a Gaussia mixture p(x 1 ) = M 1 j=0 β jϕ j (x 1 ), we repeat the procedure above for each mixture compoet i p(x 1 ), i.e. we split each mixture compoet ϕ j ito 2D + 1 compoets δ i ϕ ji, i = 0,..., 2D, ad propagate them forward usig the Usceted Trasform. For otatioal coveiece, we defie this operatio o a Gaussia mixture as (f, p(x 1 )), such that p(x ) = (f, p(x 1 ))= = M 1 j=0 2D i=0 M(2D+1) 1 l=0 β j δ i p(x x 1 )p(x 1 )dx 1 p(x x 1 )ϕ i j (x 1 )dx 1 γ l ϕ l (x ), (2.3) where γ l = δ i β j. We compute the momets of the mixture compoets ϕ l by meas of the Usceted Trasform. From (2.3), it ca be observed that durig a update step the umber of compoets grows by a factor of M. Whe applied multiple times, it will result i a expoetial explosio of Multi-Modal Filter ad Smoother

33 Helliger Distace from Particle Filter UT : GMM UT : Multi modal filter Particle Filter Usceted Trasform Helliger Distace from Particle Filter UT : GMM UT : Multi modal filter Particle Filter Usceted Trasform Helliger Distace from Particle Filter UT : GMM UT : Multi modal filter Particle Filter Usceted Trasform Helliger Distace from Particle Filter UT : GMM UT : Multi modal filter Particle Filter Usceted Trasform Figure 2.2.: Example of a multi-modal predictive desity for a o-liear statioary fuctio with stadard ormal distributio as prior. The blue curve shows predictive distributio with calculated usig EM (10 5 samples, 3 compoets). The solid red curve shows fit with proposed mulit-modal method ad black solid lie represets the Usceted Trasform approximatio of predictive desity. We use Helliger distace(bera, 1977) as a metric to measure goodess of fit which shows that proposed method outperforms stadard Usceted Trasform based method for this o-liear fuctio. We ca also visually cofirm that multi-modal filter is able to capture both modes of the distributio. compoets. We reduce the umber of mixture compoets at each time step, for details, see Sectio Filterig I the followig, we subsume all derivatios i our multi-modal o-liear state estimator, whose time ad measuremet updates are summarised i the followig. Time Update Assume that the filter distributio p(x 1 Y 1 ) is represeted by a GMM with M compoets. The time update, i.e. the oe-step ahead predictive distributio is give by p(x Y 1 ) = p(x x 1 )p(x 1 Y 1 ) dx 1. (2.4) 2.1. Multi-Modal Filter 27

34 This itegral ca be evaluated as (f, p(x 1 Y 1 )), such that we obtai a GMM represetatio of the time update as detailed i (2.3). p(x Y 1 ) = M(2D+1) 1 j=0 γ j ϕ j (x 1 ) (2.5) Measuremet Update The measuremet update ca be approximated up to a ormalisatio costat by p(x Y ) p(y x )p(x Y 1 ), (2.6) where p(x Y 1 ) is the time update (2.5). We ow apply a similar operatio as i (2.3) with the measuremet fuctio h as a o-liear fuctio ad obtai p(y Y 1 ) = (h, p(x Y 1 )). (2.7) Substitutig (2.7) ad (2.5) i (2.6) yields the measuremet update, i.e. distributio the filtered state p(x ) 2D i=0 M(2D+1) 1 δ i ϕ i (y 1 ) γ j ϕ j (x 1 ), M(2D+1) 2 1 l=0 j=0 β l ϕ l (x ). (2.8) We calculate the measuremet update for each pair ϕ i ad ϕ j. Recallig that ϕ l (x ) = (x µ i j, Σi j ) for i = 0,..., 2D ad j = 0,..., (2D+1)2 1, the measuremet updates (Särkkä, 2008) ad weight updates (Gaussia Sum (Alspach ad Soreso, 1972)) are give by K j = Γj j 1, 1 Σ 1 µ i j = µi 1 + K j j y µ Σ i j = Σi 1 K j j Σ K j β i,j = 1 1, δ i γ j (x = y µ j 1, Σj 1 ) k,l δ l γ k (x = y µ k 1, Σk 1 ), T, (2.9) where Γ j 1 is the cross covariace matrix cov(x 1, x ) determied via the Usceted Trasform (Särkkä, 2008). After the measuremet update, we reduce the M(2D + 1) 2 mixture compoets i the GMM, see (2.8), to M accordig to Sectio Multi-Modal Filter ad Smoother

35 2.1.4 Mixture Reductio Up to this poit, we have cosidered the case where a desity with kow mea ad variace has bee represeted by a GMM, which could subsequetly be used to estimate the predicted state distributio. Icorporatig these steps ito a recursive state estimator for time series, there is a expoetial growth i the umber of mixture compoets i (2.3). Oe way to mitigate this effect is to represet the estimated desities by a mixture model with a fixed umber of compoets (Aderso ad Moore, 2005). To keep the umber of mixture compoets costat we ca reduce them at each time step (Aderso ad Moore, 2005). A straightforward ad fast approach is to drop the Gaussia compoets with the lowest weights. Such omissios, however, ca result i poor performace of the filter (Kitagawa, 1994). Kitagawa (Kitagawa, 1994) suggested to repeatedly merge a pair Gaussia compoets. A pair is selected with lowest distace i terms of some distace metric. We evaluated multiple distace metrics, e.g. the L 2 distace (Williams ad Maybeck, 2003), the KL divergece (Rualls, 2007), ad the Cauchy Schwarz divergece (Kampa et al., 2011). I this report, we used the symmetric KL divergece (Kitagawa, 1994), D(p, q) = (K L (p q) + K L (q p))/2, which outperformed aforemetioed distace measures for mixture reductio i filterig. 2.2 Multi-Modal Smoothig The smoothig update is based o the forward-backward filter. The optimal backward pass for a dyamical system described by Figure 1.3 is give by equatio (1.3) (Kitagawa, 1994), we repeat the equatio (1.3) here for quick referece, p(x+1 Y N )p(x +1 x ) p(x Y N ) = p(x Y ) dx +1. (2.10) p(x +1 Y ) Equatio (2.10) is a recursio iitiated by settig p(x +1 Y N ) = p(x +1 Y +1 ) where the idex + 1 ow poits at last observatio, i.e. N = + 1. The quatities we kow are p(x +1 Y N ) = p(x Y ) = p(x +1 x ) = M α j ϕ j (x +1 Y N ), previous smoothig result (2.11) j P δ i ϕ i (x Y ), previous filterig result (2.12) i L α k ϕ k (x +1 x ). forward predictio or time update (2.13) k The deomiator of (2.10) is a Gaussia mixture that ca be evaluated as am itegral p(x+1 x )p(x Y )dx. A Gaussia mixture i the deomiator makes a closed form exact solutio to the equatio (2.10) impossible ad we eed approximatios for a Gaussia sum smoother. We propose a approximate solutio to the Gaussia sum smoother i Sectio The solutio proposed i Sectio is ot available i the literature to the best of our kowl Multi-Modal Smoothig 29

36 edge. I the followig, we list state of the art determiistic approximatios to the smoothig Equatio (2.10). Kitagawa s two filter smoother (Kitagawa, 1994). I the two filter smoother the right had side itegral is replaced by a backward iformatio filter p(y N x ), i.e. we write the smoothig pass as, p(x Y N ) p(x Y ) p(y N x ). Closed form forward-backward smoothig (Vo et al., 2012). The forward-backward smoother uses a recursio to calculate likelihood to approximate smoother as, p(x Y N ) = p(x Y ) Lik(Y N x ), the Lik( ) deotes likelihood fuctio. The forward-backward replaces the iformatio filter i Kitagawa s two filter smoother by the likelihood ad avoids the iversio of trasformatio matrix (Vo et al., 2012) Expectatio Correctio (Barber, 2006). The expectatios correctio algorithm igores the itegral altogether ad evaluates left had side at the mea values of Gaussias i mixture, i.e. we replace Gaussia by its mea. The crude approximatio gives excellet results, uderliig the importace of a Gaussia sum smoother i estimatio. The Gaussia sum smoothers described above attempt to solve the problem of backward pass with Gaussia mixture beliefs, however, the system is assumed to be liear (Switchig Liear Dyamical Systems (SLDS)). There is o smoothig algorithm i literature for iferece i a o-liear dyamical system with multi-modal beliefs, to the best of our kowledge. The proposed multi-modal smoother is, however, ot restricted to o-liear dyamical systems aloe ad i priciple ca be applied to SLDS as well Gaussia Sum Smoother for No-liear systems Propositio 2.1. If we defie probabilities as i Equatios (2.11) to (2.13), the Gaussia mixture approximatio to smoothig equatio (2.10) is give as p(x Y N ) = LM P z=1 β z (x N µ z N, Σz ), (2.14) N where the mea µ z N ad covariace Σz N are give by 1 D k = Γk +1 Σ+1 k, µ i jk N = µi + Dk µ j +1 N µk +1, Σ i jk N = Σi + Dk Σ j +1 N Σk +1 D k T, β i, j,k = δ i α j γ k (x = µ j +1 N µk 1, Σk +1 + Σ j +1 N ) δ p α q γ r (x = µ q +1 N µr +1, Σr +1 + Σq +1 N ). p,q,r (2.15) Multi-Modal Filter ad Smoother

37 Proof. We ca rewrite (2.10) as p(x Y N ) = Step 1 p(x x +1, Y N ) p(x }{{} +1 Y N ) Step 1 } {{ } Step 2 dx +1. (2.16) The coditioig is defied as p(x x +1, Y N ) = p(x +1 x )p(x Y ). (2.17) p(x +1 Y ) The right had side of (2.17) caot be evaluated i closed form as the deomiator p(x +1 Y ) is a Gaussia mixture give by equatio (2.13). We rewrite the deomiator as, p(x x +1, Y N ) = p(x +1 x )p(x Y ) p(x+1 x )p(x Y ) dx. (2.18) The expressio p(x +1 x )p(x Y ) dx i the deomiator is idepedet of x ad ca be treated as ormalisatio costat for the probability distributio p(x x +1, Y N ). We ca obtai p(x x +1, Y N ) p(x +1 x ) p(x Y ) = β p(x +1 x ) p(x Y ) (2.19) where β is a costat idepedet of x ad defied as β = 1 p(x +1 x )p(x Y ) dx. (2.20) It should be oted that β is a fuctio of x +1 ad, hece, caot be treated as a costat whe margialisig over x +1. Substitutig (2.13) i (2.19) yields p(x x +1, Y N ) = β L γ k ϕ k (x +1 x ) k M α j ϕ j (x +1 Y ). (2.21) j This derivatio completes Step 1 of (2.16). Substitutig the result of Step 1 (2.17) i (2.16) we obtai p(x Y N ) = β L γ k ϕ k (x +1 x ) k P M δ i ϕ i (x Y ) α j ϕ j (x +1 Y N ) dx +1. (2.22) i j 2.2. Multi-Modal Smoothig 31

38 Pushig the Gaussia variables ϕ k ad ϕ i iside the sum yields p(x Y N ) = β L k γ k P i δ i M α j ϕ k (x +1 x ) ϕ i (x Y ) ϕ }{{} j (x +1 Y N ) j ϕ i,k (x,x +1 Y ) } {{ } Step 2 dx +1. (2.23) We defie the idividual compoets of Gaussia mixture ϕ as to evaluate the joit distributio ϕ i (x Y ) = (x µ i, Σi ), (2.24) ϕ k (x +1 Y ) = (x +1 µ k +1, Σk +1 ). (2.25) ϕ i,k (x, x +1 Y ) = (x µi,k, Σi,k ), (2.26) where D k = Γk +1 Σ k +1 1, (2.27) µ i,k = µi + D (x +1 µ k +1 ), Σ i,k = Σi + D (Σ k +1 )DT. The covariace Σ ad cross-covariace Γ ca be obtaied by ay method o-liear Gaussia approximatio methods like Usceted Trasform, exteded Kalma filter etc. By defiitio of β i (2.20) we write Gaussia compoet of coditioal p(x, x +1 Y ) as, Substitutig values from(2.28) i (2.23) yields ϕ i,k (x x +1, Y ) = ϕ i,k (x, x +1 Y ) β(x +1 ) (2.28) p(x Y N ) = β(x +1 ) L k γ k P i δ i M α j ϕ i,k (x x +1, Y ) ϕ }{{} j (x +1 Y N ) j Step 1 } {{ } Step 2 dx +1. (2.29) Step 2 We margialise over x +1 to obtai ϕ i, j,k (x x +1, Y ) = (x µ i, j,k j,k, Σi, ), (2.30) Multi-Modal Filter ad Smoother

39 where D k = Γk +1 i, j,k µ Σ k +1 1, (2.31) = µi + D (µ j +1 N µk +1 ), i, j,k Σ = Σi + D Σ j +1 N Σk +1 DT. To evaluate the costat β(x +1 ), we use Theorem 1 to write β(x +1 ) = (x +1 µ k +1, Σk +1 ) (2.32) ad usig (2.29) we obtai β i, j,k = δ i α j γ k (x +1 µ k +1 ), Σk +1 ) (x +1 µ j +1 N, Σj +1 N ) dx +1. (2.33) We use defiitio of product of two Gaussias, ad the fact that margialisatio is over x +1 we obtai β i, j,k = δ i α j γ k (x = µ j +1 N µk +1, Σk +1 + Σ j +1 N ) (x +1, µ +1, Σ +1 ) dx +1, (2.34) = δ i α j γ k (x = µ j +1 N µk +1, Σk +1 + Σ j +1 N ). To esure the Gaussia mixture approximatio is a valid probability distributio we must esure that all weights sum to oe. Therefore, we ormalise β accordig to β i, j,k = δ i α j γ k (x = µ j +1 N µk 1, Σk +1 + Σ j +1 N ) δ p α q γ r (x = µ q +1 N µr +1, Σr +1 + (2.35) Σq ). +1 N p,q,r The right had side of (2.29) is completed by (2.35) ad (2.31) ad completes the proof of Propositio Summary of the Multi-modal filter ad Smoother I this chapter we proposed a multi-modal filter ad multi-modal smoother based o the Gaussia sum approximatio. The importat poits of this chapter are below. We described a closed form solutio for estimatig the mixture parameters of the desity obtaied, whe mappig a Gaussia mixture through a o-liear mappig. The approximatios of a Gaussia by a Gaussia mixture is based o Cholesky factorisatio ad the approximatio ca be described as a geeralisatio of the Usceted Trasform. The relative error values i likelihood per data poit of Gaussia samples, uder Gaussia mixture as a model shows that our approximatio does ot itroduce large errors (see figure ) Summary of the Multi-modal filter ad Smoother 33

40 We used Gaussia sum filter for filterig with multi-modal beliefs, the importat property of Gaussia sum filter is that it is based o the Gaussia Sum Approximatios. A Gaussia sum approximatio allows a filter to match the optimal Bayesia filter as the umber of mixture compoets icreases (Aderso ad Moore, 2005). Hece, our filter based o Gaussia sum filter will coverge to the optimal Bayesia filter as the umber of compoets reaches ifiity. We use the Gaussia sum approximatios to devise a closed form Gaussia sum smoother. Mixture reductio: We use the symmetric KL divergece as a measure for a distace based Gaussia mixture reductio techique. It was foud to perform better tha stadard mixture reductio techiques. A comprehesive study o the various mixture reductio techiques is marked as a future work. The multi-modal filter ad smoother allow us to use the flexibility of Gaussia mixture models over Gaussia models. The parametric form makes filter determiistic ad, hece, the umerical results we obtai i ext ext chapter are reproducible Multi-Modal Filter ad Smoother

41 P(x) Variable X Time (t) a) We geerate 10 6 samples from prior p(x 1 ) propagate desity forward by usig x = f (x 1 ) to obtai 10 6 samples of p(x ) ad use Expectatio Maximisatio (EM) to approximate p(x ) with M=3 compoets P(x) Variable X Time (t) (b) We calculate desity propagatio p(x x 1 ) usig the proposed Multi-Modal filter with α = 1.3 ad the desity p(x ) p(x x 1 ) is represeted by usig M=3 compoets 2.3. Summary of the Multi-modal filter ad Smoother 35

42 Variable X (c) Usceted Trasform approximatio of p(x ) p(x x 1 ) 20 Time (t) 0.1 P EM (x) P MMF (x) Variable X (d) Absolute error betwee plots a) & b) Figure 2.3.: The Figure demostrates the ability of the proposed multi-modal approximatios for o-statioary o-liear fuctio. The plots a) ad b) show EM ad Multi-modal approximatios, to facilitate compariso both plots use same Y axis ad colour codig. Visual ispectio shows that the proposed filter is able to track predictive desities, we cofirm this by plottig the error betwee EM estimate ad multi-modal estimate. The Y axis is differet for plot d) Multi-Modal Filter ad Smoother

43 3 Numerical Results We evaluated our proposed filterig algorithm o data geerated from stadard oedimesioal o-liear dyamical systems. Both the UKF ad the Multi-Modal Filter (M-MF) use the same parameters for the Usceted Trasform, i.e. α = 1, β = 2 ad κ = 2. The prior p(x 0 ) is a stadard Gaussia p(x 0 ) = (x 0, 1). Desities i the M-MF are represeted by a Gaussia mixture with M = 3 compoets. The mea of the filtered state is estimated by the first momet of this Gaussia mixture. The employed Particle filter (PF) is a stadard particle filter with residual re-samplig scheme. The PF-UKF (Va Der Merwe et al., 2000) o the other had uses the Usceted Trasform as proposal distributio ad the UKF for filterig. Ulike our M-MF the PF-UKF uses radom samplig ad the UKF. We use the root mea square error (RMSE) ad the predictive Negative Log-Likelihood (NLL) per observatio as metrics to compare the performace of the differet filters. Lower values idicate better performace. The results i Table 3.1 were obtaied from 100 idepedet simulatios with T = 100 time steps for each of the followig system models. 3.1 Uivariate No-liear Growth Model The the Uivariate No-statioary Growth Model (UNGM) (Doucet et al., 2001) is a stadard bechmark problem for o-liear estimators No-statioary Model We tested the differet filters o a stadard system, the Uivariate No-statioary Growth Model (UNGM) from (Doucet et al., 2001) x = x x x cos(1.2( 1))+w, w (0, 1), (3.1) 1 y = x 2 + v, v (0, 1). 20 The true state desity of the o-statioary model stated above alterated betwee multi-modal ad ui-modal distributios. The switch from ui-modal to a bi-modal desity occurred whe the mea was close to zero. The quadratic measuremet fuctio makes it difficult to distiguish betwee the two modes as they are symmetric aroud zero. This symmetry posed a substatial challege for several filterig algorithms. The PF lost track of the state as N = 500 particles failed to capture the true desity especially i its tails, which led to degeeracy (Doucet et al., 2001). The particle filters i Table 3.1 are i their stadard form ad performace may improve if advaced techiques are used (Doucet et al., 2001), as ca be see from the Gibbs filter (Deiseroth ad Ohlsso, 2011). The proposed M-MF could track both modes ad, hece, led to more cosistet estimates. The RMSE performace of the multi-modal filter (M-MF) 37

44 (a) Multi-modal filter (Multi-modal smoother). RMSE 2.58 (1.28). NLL 1.79 (1.57) (b) Usceted Kalma filter (Usceted Kalma smoother). RMSE (7.734). NLL (13.44) (c) Exteded Kalma Filter (Exteded Kalma smoother). RMSE (13.42). NLL (118.34) (d) Usceted Kalma filter (Multi-modal Smoother). RMSE (1.33). NLL (1.577) Figure 3.1.: A example trajectory from table 3.1 for o-statioary quadratic measuremet fuctio (UNGM). I Figure d) the forward pass is UKF ad the multi-modal smoother is used for backward pass. The true state is represeted by dashed red lie, the shaded regio represets 95% cofidece area for filterig distributio ad solid gree lies represet 95% cofidece area for smoothig distributio Numerical Results

45 Statioary No-Statioary h(x) = 5 si(x) h(x) = x 2 /20 h(x) = 5 si(x) RMSE NLL RMSE NLL RMSE NLL EKF 7.59 ± ± ± ± ± ± UKF ± ± ± ± ± ± 20.6 M-MF 0.97 ± ± ± ± ± ± 8.04 PF 2.4 ± 0.03 N/A 3.5 ± ± ± 3.6 N/A Gibbs Filter 3.62 ± ± ± ± ± ± 0.08 Table 3.1.: Average performaces of the filters are show alog with stadard deviatio. Lower values are better. The M-MF filter performs better tha the stadard filterig methods. Statioary No-Statioary h(x) = 5 si(x) h(x) = x 2 /20 h(x) = 5 si(x) RMSE NLL RMSE NLL RMSE NLL EKS ± ± ± ± ± ± 98.5 UKS ± ± ± ± ± ± 25.6 M-MS 0.92 ± ± ± ± ± ± 9.06 Table 3.2.: Average performaces of the smoothers are show alog with stadard deviatio. Lower values are better. The M-MS filter performs better tha the stadard smoothig methods. is sigificatly better tha the UKF, with the M-MF outperformig the UKF i terms of a lower mea error ad stadard deviatio. The mai advatage of the M-MF is its ability to capture the ucertaity appropriately. The NLL values of the M-MF were sigificatly better tha the UKF eve whe the same parameters are used to calculate the Usceted Trasform, see Table 3.1. We tested the UNGM with a alterative measuremet fuctio h(x) = 5 si(x). For this fuctio, the performace of the EKF is best i terms of RMSE, sice its estimates are more stable. The proposed M-MF could track multiple modes, which resulted i sigificatly better performace i terms of NLL. Moreover, the filter performace was cosistetly stable, idicated by the small stadard deviatio values for the NLL measure Statioary Model The statioary model is based o the Uiform No-statioary Growth Model (UNGM) described above by (Kitagawa, 1996) ad ca be described by the followig equatios: f (x) = x x + w, 1 + x 2 w (0, 1), (3.2) h(x) = 5 si(x) + v, v (0, 1). (3.3) We see from the results i Table 3.1 that the UKF was outperformed by all other determiistic filters. The failures of the UKF ad the PF-UKF are attributed to their overcofidet predic Uivariate No-liear Growth Model 39

46 tios for siusoidal fuctios, which cofirms the results i (Deiseroth et al., 2012). The EKF approximates these siusoidal fuctios better but both the UKF ad EKF fail to capture the multi-modal ature of system dyamics. Thus, the EKF ad UKF are icosistet for the model ad settigs used i this experimet. The proposed M-MF o the other had performed cosistetly better i terms of RMSE ad NLL values (see table 3.1). Moreover, the small stadard deviatio of the NLL suggests that our proposed M-MF is cosistet ad stable Numerical Results

Figure 3.2.: Lorez time series (reproduced from http://math.gmu.edu/~rsachs/tj/ ) 3.

47 Figure 3.2.: Lorez time series (reproduced from ) 3.2 Lorez System The Lorez system is a o-liear dyamical system proposed by Lorez i (Lorez, 1963) as a solutio to the heat covectio flow i atmosphere. This system is a 3 dimesioal, oliear, determiistic dyamical system defied by coupled deferetial equatios (3.4) (Lorez, 1963) (Hilbor, 2000). The system is ustable for certai values of parameters ad ca be described as determiistic chaos (Hilbor, 2000). The chaotic behaviour is characterised by a ustable system highly sesitive to iitial values, ad a small deviatio may result i a completely differet evolutio of system states (Lorez, 1963). We use the parameter values ρ = 28, σ = 10, ad β = 8/3 as described by Lorez i (Lorez, 1963). The system is bistable ad describes a butterfly shape usually associated with o-liear chaos (Hilbor, 2000). The system is described by the differetial equatios x = σ(y x), (3.4) t y = x(ρ z) y, t z t = x y βz. We defie the system fuctio as a solutio to the differetial equatios defied by coupled equatios f (x, y, z) i (3.4). The solutios are obtaied by a solver i M atlab. We discretise 3.2. Lorez System 41

3/8/2016. Contents in latter part PATTERN RECOGNITION AND MACHINE LEARNING. Dynamical Systems. Dynamical Systems. Linear Dynamical Systems

3/8/2016. Contents in latter part PATTERN RECOGNITION AND MACHINE LEARNING. Dynamical Systems. Dynamical Systems. Linear Dynamical Systems Cotets i latter part PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA Liear Dyamical Systems What is differet from HMM? Kalma filter Its stregth ad limitatio Particle Filter Its simple