Maximum Likelihood Parameter Estimation in General State-Space Models using Particle Methods


Maximum Likelihood Parameter Estimation in General State-Space Models using Particle Methods

George Poyiadjis¹, Arnaud Doucet² and Sumeetpal S. Singh¹
¹ Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK.
² Departments of Computer Science and Statistics, University of British Columbia, Vancouver, BC, Canada.
gp243,sss4@cam.ac.uk - arnaud@stat.ubc.ca

KEY WORDS: particle filter; parameter estimation; general state-space model; filter derivative; Hahn-Jordan decomposition.

Abstract

A large number of time series can be described by nonlinear, non-Gaussian state-space models. While state estimation for these models is now routinely performed using particle filters, maximum likelihood estimation of the model parameters is much more challenging. In this paper, we present new numerical methods to approximate the derivative of the optimal filter. We use this to perform batch and recursive maximum likelihood parameter estimation and tracking by maximizing the likelihood through a gradient ascent method. We generalize the method to include the second derivative of the optimal filter. This provides estimates of the Hessian of the likelihood and can be used to accelerate the gradient ascent method.

1. Introduction

Many time series problems arising in statistics, engineering and the applied sciences are concerned with the estimation of the state of a dynamic model when only inaccurate observations are available. Most real-world problems are nonlinear and non-Gaussian; optimal state estimation in such problems therefore does not admit a closed-form solution. Recently, there has been a surge of interest in Sequential Monte Carlo (SMC) methods, also known as particle filtering methods, to perform sequential state estimation in nonlinear non-Gaussian models [7], [8], [9], [12], [15]. SMC methods are a set of simulation-based techniques that recursively generate and update a set of weighted samples, which provide approximations to the posterior probability distributions of interest.
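As a concrete illustration of the generic SMC idea just described (and not of the algorithm proposed in this paper), the following is a minimal bootstrap particle filter sketch; the function names and the toy scalar linear Gaussian model at the bottom are illustrative assumptions.

```python
import math
import random

def bootstrap_particle_filter(ys, n_particles, f_sample, g_logpdf, x0_sample, seed=0):
    """Generic bootstrap particle filter; returns the filtering means E[X_n | Y_{1:n}]."""
    rng = random.Random(seed)
    xs = [x0_sample(rng) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Propagate every particle through the transition density f_theta(. | x).
        xs = [f_sample(x, rng) for x in xs]
        # Weight each particle by the observation density g_theta(y | x).
        logw = [g_logpdf(y, x) for x in xs]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]   # stabilised exponentiation
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * xi for wi, xi in zip(w, xs)))
        # Multinomial resampling: reset to an equally weighted particle set.
        xs = rng.choices(xs, weights=w, k=n_particles)
    return means

# Illustrative scalar linear Gaussian model: X_{n+1} = 0.8 X_n + V, Y_n = X_n + W.
f_sample = lambda x, rng: 0.8 * x + rng.gauss(0.0, 1.0)
g_logpdf = lambda y, x: -0.5 * (y - x) ** 2 - 0.5 * math.log(2.0 * math.pi)
x0_sample = lambda rng: rng.gauss(0.0, 1.0)

ys = [0.5, -0.2, 1.1, 0.3, -0.7]
means = bootstrap_particle_filter(ys, 500, f_sample, g_logpdf, x0_sample)
```

The resampling step here is plain multinomial resampling; stratified schemes would serve equally well.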
Under the assumption that the model parameters are known, numerous SMC algorithms have been proposed over the last decade; see [8] for a review. In real-world applications, however, the model parameters, denoted θ, are often unknown and also need to be estimated from the data. Maximum likelihood (ML) parameter estimation using SMC methods still remains an open problem, despite various earlier attempts in the literature. The majority of the proposed SMC-based parameter estimation methods rely on augmenting the hidden state to include the unknown parameter and casting the problem as a filtering one [11], [16], [20]. Static parameter estimation with SMC is then implemented by either introducing artificial dynamics for the parameters or by MCMC rejuvenation steps. The latter method is more elegant, since the model of interest is not artificially altered. However, the MCMC steps rely on sufficient statistics that are based on an approximation of the path posterior density p_θ(x_{1:n} | Y_{1:n}) of the hidden process up to time n given the observations up to time n. This density cannot be properly approximated using SMC methods for a fixed number of particles, and the sufficient statistics degrade over time due to error accumulation [1].

In this paper we present an original maximum likelihood method that is based on a direct particle approximation of the derivative of the optimal filter. Previous attempts to approximate the filter derivative using particle methods, e.g. [5], [10] and [6], were based implicitly on the sequence of path densities {p_θ(x_{1:n} | Y_{1:n})}. As in the case of filtering-based parameter estimation, the approximation errors they produce increase with the data length. The methods we propose here to approximate the filter derivative are based on the sequence of marginal distributions {p_θ(x_n | Y_{1:n})} and hence do not suffer from the aforementioned problem. We use the filter derivative approximation to compute the log-likelihood gradient, and we combine it with a gradient ascent algorithm to generate maximum likelihood estimates of the model parameters.
The approach is generalized to compute a particle approximation to the second derivative of the filter. This leads to an estimate of the Hessian of the likelihood that can be used to scale the gradient components and accelerate the convergence of the gradient algorithm. Additionally, it may allow us to compute confidence regions for the estimated parameters. The rest of the paper is organized as follows: in Section 2 the statistical model of interest is presented and the optimal filter and its first and second derivatives are described. In Section 3 we review the particle filter algorithm and derive particle methods for the derivatives of the filter. In

¹ For the rest of this paper we adopt the following notation: for any sequence {z_k} and random process {Z_k}, we define z_{i:j} = (z_i, z_{i+1}, ..., z_j) and Z_{i:j} = (Z_i, Z_{i+1}, ..., Z_j), respectively.

Section 4 we describe how the first and second filter derivative approximations can be used to perform ML parameter estimation in a recursive and a batch manner. Section 5 presents simulation results showing the performance of the proposed algorithm. Finally, in Section 6 we discuss the results and provide some concluding remarks.

2. Optimal Filter and its Derivatives

2.1 State-Space Models

Let {X_n} and {Y_n} be ℝ^{n_x}- and ℝ^{n_y}-valued stochastic processes defined on a measurable space (Ω, F). These stochastic processes depend on a parameter θ ∈ Θ, where Θ is an open subset of ℝ^{n_θ}. The process {X_n} is an unobserved hidden Markov process with initial density µ, i.e. X_1 ∼ µ, and Markov transition density f_θ(x'|x), i.e.

    X_{n+1} | (X_n = x) ∼ f_θ(·|x).    (1)

Although {X_n} is unobserved, it is partially observed through the observation process {Y_n}. It is assumed that the observations, conditioned upon {X_n}, are independent with marginal density g_θ(y|x), i.e.

    Y_n | (X_n = x) ∼ g_θ(·|x).    (2)

This class of models includes many nonlinear and non-Gaussian time series models such as

    X_{n+1} = φ_θ(X_n, V_{n+1}),    Y_n = ψ_θ(X_n, W_n),

where {V_n} and {W_n} are mutually independent sequences of independent random variables and φ_θ, ψ_θ determine the evolution of the state and observation processes.

2.2 Optimal Filter Derivatives

Assume for the time being that θ is known. In this case, sequential inference about the hidden process X_n is typically based on the sequence of joint posterior distributions {p_θ(x_{1:n} | Y_{1:n})}, which summarizes all the relevant information available about X_{1:n} up to time n. Using an importance sampling approach with an arbitrary importance density q_θ(x_n | Y_n, x_{n-1}), whose support includes the support of g_θ(Y_n|x_n) f_θ(x_n|x_{n-1}), it can easily be shown that the joint posterior density satisfies the recursion

    p_θ(x_{1:n} | Y_{1:n}) = [α_θ(x_{n-1:n}, Y_n) / p_θ(Y_n | Y_{1:n-1})] q_θ(x_n | Y_n, x_{n-1}) p_θ(x_{1:n-1} | Y_{1:n-1}),    (3)

where the importance weights are given by

    α_θ(x_{n-1:n}, Y_n) = g_θ(Y_n|x_n) f_θ(x_n|x_{n-1}) / q_θ(x_n | Y_n, x_{n-1}).    (4)

In most problems, we are interested in the marginal p_θ(x_n | Y_{1:n}), which is known as the filtering density.
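The importance weight (4) is straightforward to evaluate pointwise. The sketch below is illustrative (the scalar Gaussian densities are assumptions, not part of the paper); it also checks the familiar special case where the prior ("bootstrap") proposal q = f reduces the weight to g_θ(Y_n|x_n).

```python
import math

def importance_weight(g_pdf, f_pdf, q_pdf, y, x_new, x_old):
    """alpha_theta(x_{n-1:n}, Y_n) = g(Y_n|x_n) f(x_n|x_{n-1}) / q(x_n|Y_n, x_{n-1})."""
    return g_pdf(y, x_new) * f_pdf(x_new, x_old) / q_pdf(x_new, y, x_old)

def normal_pdf(z, mean, var):
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Illustrative scalar model: X_n | x_{n-1} ~ N(0.8 x_{n-1}, 1), Y_n | x_n ~ N(x_n, 0.5).
f_pdf = lambda x_new, x_old: normal_pdf(x_new, 0.8 * x_old, 1.0)
g_pdf = lambda y, x: normal_pdf(y, x, 0.5)

# With the prior proposal q = f, the weight reduces to g(Y_n | x_n).
q_prior = lambda x_new, y, x_old: f_pdf(x_new, x_old)
w = importance_weight(g_pdf, f_pdf, q_prior, y=0.4, x_new=0.2, x_old=0.1)
```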
The filtering density can be expressed as

    p_θ(x_n | Y_{1:n}) ∝ ∫ α_θ(x_{n-1:n}, Y_n) q_θ(x_n | Y_n, x_{n-1}) p_θ(x_{n-1} | Y_{1:n-1}) dx_{n-1}.    (5)

In applications such as parameter estimation and stochastic control, we are often interested in optimizing different performance criteria that require an approximation of the filter derivatives. In the context of parameter estimation, we will consider the first two derivatives of the optimal filter with respect to θ, namely ∇p_θ(x_n | Y_{1:n}) and ∇²p_θ(x_n | Y_{1:n}).² To simplify the notation, let

    p_θ(x_n | Y_{1:n}) = ξ_θ(x_n, Y_{1:n}) / ∫ ξ_θ(x_n, Y_{1:n}) dx_n,    (6)

where

    ξ_θ(x_n, Y_{1:n}) = g_θ(Y_n | x_n) ∫ f_θ(x_n | x_{n-1}) p_θ(x_{n-1} | Y_{1:n-1}) dx_{n-1} = g_θ(Y_n | x_n) p_θ(x_n | Y_{1:n-1}).    (7)

Under regularity assumptions, differentiating (6) once and twice leads to the following recursions:

    ∇p_θ(x_n | Y_{1:n}) = ∇ξ_θ(x_n, Y_{1:n}) / ∫ ξ_θ(x_n, Y_{1:n}) dx_n
                          − [∫ ∇ξ_θ(x_n, Y_{1:n}) dx_n / ∫ ξ_θ(x_n, Y_{1:n}) dx_n] p_θ(x_n | Y_{1:n})    (8)

and

    ∇²p_θ(x_n | Y_{1:n}) = ∇²ξ_θ(x_n, Y_{1:n}) / ∫ ξ_θ(x_n, Y_{1:n}) dx_n
                           − 2 ∇p_θ(x_n | Y_{1:n}) [∫ ∇ξ_θ(x_n, Y_{1:n}) dx_n / ∫ ξ_θ(x_n, Y_{1:n}) dx_n]ᵀ
                           − [∫ ∇²ξ_θ(x_n, Y_{1:n}) dx_n / ∫ ξ_θ(x_n, Y_{1:n}) dx_n] p_θ(x_n | Y_{1:n}),    (9)

where

    ∇ξ_θ(x_n, Y_{1:n}) = g_θ(Y_n|x_n) ∫ f_θ(x_n|x_{n-1}) p_θ(x_{n-1}|Y_{1:n-1}) [∇log g_θ(Y_n|x_n) + ∇log f_θ(x_n|x_{n-1})] dx_{n-1}
                         + g_θ(Y_n|x_n) ∫ f_θ(x_n|x_{n-1}) ∇p_θ(x_{n-1}|Y_{1:n-1}) dx_{n-1}    (10)

² The first derivative ∇p_θ(x_n|Y_{1:n}) is an n_θ × 1 vector whose i-th entry is ∂p_θ(x_n|Y_{1:n})/∂θ_i. The second derivative ∇²p_θ(x_n|Y_{1:n}) is an n_θ × n_θ matrix whose (i,j)-th entry is ∂²p_θ(x_n|Y_{1:n})/∂θ_i∂θ_j.
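Recursion (8) can be sanity-checked on a finite state space, where the integrals become sums. The following sketch uses an arbitrary smooth positive function as a stand-in for ξ_θ (an assumption of this check only) and compares (8) against a direct finite-difference derivative of the normalized density (6).

```python
import math

# Finite-state check of (8): with x in {0, 1}, the normalizing integral is a sum.
def xi(theta, x):
    # Arbitrary smooth positive stand-in for xi_theta(x, Y_{1:n}).
    return math.exp(theta * x - 0.5 * theta ** 2) + 0.1 * (x + 1)

def dxi(theta, x, h=1e-6):
    # Central finite difference of xi with respect to theta.
    return (xi(theta + h, x) - xi(theta - h, x)) / (2 * h)

def p(theta, x):
    z = xi(theta, 0) + xi(theta, 1)           # "integral" of xi over the state
    return xi(theta, x) / z

theta0 = 0.3
z = xi(theta0, 0) + xi(theta0, 1)
dz = dxi(theta0, 0) + dxi(theta0, 1)
# Recursion (8): grad p = grad xi / Z - (grad Z / Z) * p
grad_p0 = dxi(theta0, 0) / z - (dz / z) * p(theta0, 0)
# Direct finite difference of the normalized density itself:
h = 1e-6
fd_p0 = (p(theta0 + h, 0) - p(theta0 - h, 0)) / (2 * h)
```

Both quantities agree to finite-difference accuracy, confirming that (8) is just the quotient-rule derivative of (6).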

and

    ∇²ξ_θ(x_n, Y_{1:n}) = g_θ(Y_n|x_n) ∫ f_θ(x_n|x_{n-1}) { [∇log g_θ(Y_n|x_n) + ∇log f_θ(x_n|x_{n-1})] [∇log g_θ(Y_n|x_n) + ∇log f_θ(x_n|x_{n-1})]ᵀ + ∇²log g_θ(Y_n|x_n) + ∇²log f_θ(x_n|x_{n-1}) } p_θ(x_{n-1}|Y_{1:n-1}) dx_{n-1}
    + 2 g_θ(Y_n|x_n) ∫ f_θ(x_n|x_{n-1}) [∇log g_θ(Y_n|x_n) + ∇log f_θ(x_n|x_{n-1})] ∇p_θ(x_{n-1}|Y_{1:n-1})ᵀ dx_{n-1}
    + g_θ(Y_n|x_n) ∫ f_θ(x_n|x_{n-1}) ∇²p_θ(x_{n-1}|Y_{1:n-1}) dx_{n-1}.    (11)

Except in some simple cases, no closed-form expression can be obtained for either of the above recursions, and one typically resorts to numerical approximations. Our main objective in this paper is to derive particle methods to approximate ∇p_θ(x_n | Y_{1:n}) and ∇²p_θ(x_n | Y_{1:n}).

3. Particle Methods for the Filter Derivatives

3.1 Particle Filters

Particle methods are widely used to numerically approximate the filtering recursion in (3) by means of a weighted empirical distribution of a set of N samples, termed particles. This empirical distribution is propagated sequentially as follows. Assume that at time n−1 a set of particles X_{1:n-1}^{1:N} = (X_{1:n-1}^1, ..., X_{1:n-1}^N) with corresponding weights ã_{n-1}^{1:N} = (ã_{n-1}^1, ..., ã_{n-1}^N) is available, with Σ_j ã_{n-1}^j = 1. We further assume that this weighted particle set is distributed approximately according to the joint density p_θ(x_{1:n-1} | Y_{1:n-1}). A standard way to approximate the joint density at the next time step is to extend each path using

    X_n^i ∼ q_θ(x_n | Y_{1:n}) = Σ_j ã_{n-1}^j q_θ(x_n | Y_n, X_{n-1}^j).    (12)

Sampling from (12) is achieved by first sampling a discrete index using a standard resampling algorithm, such as stratified or multinomial resampling. Then, the new particle X_n^i is generated according to

    X_n^i ∼ q_θ(· | Y_n, X_{n-1}^{φ_i}),    (13)

where φ_i is the discrete index obtained from the resampling mechanism. Note that the new set of equally weighted particles X_{1:n}^{1:N} = (X_{1:n}^1, ..., X_{1:n}^N), with X_{1:n}^i = (X_{1:n-1}^{φ_i}, X_n^i), will be approximately distributed according to the joint density q_θ(x_n | Y_n, x_{n-1}) p_θ(x_{1:n-1} | Y_{1:n-1}). Substitution of this approximation into (3) leads to the updated empirical distribution

    p̂_θ(x_{1:n} | Y_{1:n}) = Σ_i ã_n^i δ(x_{1:n} − X_{1:n}^i),    (14)

where

    a_n^i = α_θ(X_{n-1}^{φ_i}, X_n^i, Y_n)  and  ã_n^i = a_n^i / Σ_j a_n^j.    (15)

In practice, a particle approximation of the filtering density p_θ(x_n | Y_{1:n}) in (5) is obtained by marginalization of (14).

3.2 Filter derivative approximations

The filter derivatives ∇p_θ(x_n|Y_{1:n}) and ∇²p_θ(x_n|Y_{1:n}) are signed measures; i.e.
they can take positive and negative values and integrate to zero. Empirical approximations of these measures using particle methods are still possible, provided that one uses the same set of particles as in the filter approximation, but with different weights. This idea, first introduced in [5], computes the particle approximations of ∇p_θ(x_n|Y_{1:n}) and ∇²p_θ(x_n|Y_{1:n}) by propagating the weighted particles on the path space and marginalizing the expressions

    ∇p̂_θ(x_{1:n} | Y_{1:n}) = Σ_i ã_n^i β_n^i δ(x_{1:n} − X_{1:n}^i),    (16)
    ∇²p̂_θ(x_{1:n} | Y_{1:n}) = Σ_i ã_n^i λ_n^i δ(x_{1:n} − X_{1:n}^i),    (17)

where β_n^i, λ_n^i can be positive or negative. As already mentioned, the drawback of this approach stems from the fact that it relies on the path space, whose dimension grows with time. Consequently, as the length of the path increases, the variance of ∇p̂_θ(x_{1:n}|Y_{1:n}) and ∇²p̂_θ(x_{1:n}|Y_{1:n}) will increase, and the approximations of their marginals will degrade severely.

Another effect that deteriorates the performance of a path-based particle algorithm results from the arrangement of the particle mass on the state space. The derivative of a probability measure is a signed measure ν that can be expressed as ν = c(π_1 − π_2), where π_1, π_2 are two probability measures and c is a nonnegative constant. This approach, known as the weak derivative decomposition, allows an arbitrary number of possible decompositions for a given signed measure. The path-based particle method discussed above decomposes the derivative signed measures by approximating the two probability measures π_1 and π_2 by sets of positively and negatively weighted particles on overlapping regions

of the state space. To see this, consider a point x and a neighborhood B_x of it for which the signed measure satisfies ν(B_x) > 0. If we use the particle representation of (16) to approximate ν, an estimate of ∫ I_{B_x}(x) ν(dx) is Σ_i ã_n^i β_n^i I_{B_x}(X_n^i). While this is a valid approximation, we may have, for two particles k and l belonging to B_x, weights that are not of the same sign, i.e. ã_n^k β_n^k < 0 while ã_n^l β_n^l > 0. In such a case we say that the particles mix, as illustrated in the top plot of Figure 1 for the case where n_x = n_θ = 1. This implies that many particles with opposite signs can end up approximating regions of the state space that have low total mass (see, for example, the low-mass region in the top plot of Figure 1). This effect builds up due to the sequential nature of the algorithm, and the implementation becomes less accurate and inefficient as the data length increases.

We propose here an original method to approximate the optimal filter derivatives that is based on a direct pointwise approximation of (8) and (9) and hence does not suffer from the limitations discussed in the previous paragraphs. This method essentially integrates analytically a discrete approximation of the latent variable and will therefore have a lower variance. From a weak derivative point of view, this is equivalent to a particle implementation of a Hahn-Jordan decomposition, which ensures that the probability measures of the decomposition are concentrated on disjoint regions of the state space. As a result, the algorithm does not suffer from mixing of the positively and negatively weighted particles, as illustrated in the bottom plot of Figure 1.

Figure 1: Top plot: Histogram representation of a path-based particle approximation of ∇p_θ(x_n|Y_{1:n}) w.r.t. a one-dimensional parameter θ (positively and negatively weighted particle approximations shown against the true signed measure).
Bottom plot: Pointwise particle approximation of the same signed measure, which maintains the positive and negative weights on separate regions of the state support (Hahn-Jordan decomposition).

3.3 Particle algorithm

In this section we describe the proposed sequential method to approximate the first two derivatives of the optimal filter. Assume that at time n−1 we have particle approximations of p_θ(x_{n-1}|Y_{1:n-1}), ∇p_θ(x_{n-1}|Y_{1:n-1}) and ∇²p_θ(x_{n-1}|Y_{1:n-1}) of the form

    p̂_θ(x_{n-1}|Y_{1:n-1}) = Σ_i ã_{n-1}^i δ(x_{n-1} − X_{n-1}^i),    (18)
    ∇p̂_θ(x_{n-1}|Y_{1:n-1}) = Σ_i ã_{n-1}^i β_{n-1}^i δ(x_{n-1} − X_{n-1}^i),    (19)
    ∇²p̂_θ(x_{n-1}|Y_{1:n-1}) = Σ_i ã_{n-1}^i λ_{n-1}^i δ(x_{n-1} − X_{n-1}^i).    (20)

Substitution of these into (7), (10) and (11) leads to the following pointwise approximations:

    ξ̂_θ(x_n, Y_{1:n}) = Σ_k ã_{n-1}^k g_θ(Y_n|x_n) f_θ(x_n|X_{n-1}^k),    (21)

    ∇ξ̂_θ(x_n, Y_{1:n}) = Σ_k ã_{n-1}^k g_θ(Y_n|x_n) f_θ(x_n|X_{n-1}^k) [∇log g_θ(Y_n|x_n) + ∇log f_θ(x_n|X_{n-1}^k) + β_{n-1}^k],    (22)

and

    ∇²ξ̂_θ(x_n, Y_{1:n}) = Σ_k ã_{n-1}^k g_θ(Y_n|x_n) f_θ(x_n|X_{n-1}^k) { [∇log g_θ(Y_n|x_n) + ∇log f_θ(x_n|X_{n-1}^k)] [∇log g_θ(Y_n|x_n) + ∇log f_θ(x_n|X_{n-1}^k)]ᵀ + ∇²log g_θ(Y_n|x_n) + ∇²log f_θ(x_n|X_{n-1}^k) + 2 β_{n-1}^k [∇log g_θ(Y_n|x_n) + ∇log f_θ(x_n|X_{n-1}^k)] + λ_{n-1}^k }.    (23)

As in the standard particle filter, we generate a set of particles X_n^i, for i = 1, ..., N, using (12). Evaluating the pointwise approximations (21), (22) and (23) at the points X_n^i yields the following particle approximations:

    ξ̂_θ(x_n, Y_{1:n}) = (1/N) Σ_i a_n^i δ(x_n − X_n^i)    (24)

and

    ∇ξ̂_θ(x_n, Y_{1:n}) = (1/N) Σ_i ρ_n^i δ(x_n − X_n^i),    (25)
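The first-derivative part of this recursion, equations (21), (22), (24) and (25) together with the subsequent normalization, can be sketched as follows for a scalar state and a scalar parameter. The bootstrap proposal, the toy Gaussian model with θ = φ, and the omission of the 1/N factors (which cancel after normalization) are simplifying assumptions of this sketch, not of the paper.

```python
import math
import random

def normal_pdf(z, mean, var):
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Toy scalar model: f(x'|x) = N(phi*x, 1), g(y|x) = N(x, 1); theta = phi.
phi = 0.8
f = lambda x_new, x_old: normal_pdf(x_new, phi * x_old, 1.0)
g = lambda y, x: normal_pdf(y, x, 1.0)
dlog_f = lambda x_new, x_old: (x_new - phi * x_old) * x_old   # d/dphi log f
dlog_g = lambda y, x: 0.0                                     # g does not depend on phi

def derivative_filter_step(x_prev, w_prev, beta_prev, x_new, y):
    """One step of the marginal filter-derivative recursion (first derivative only),
    assuming the bootstrap mixture proposal q(x | Y_{1:n}) = sum_j w_j f(x | X_j)."""
    a, rho = [], []
    for x in x_new:
        q = sum(wj * f(x, xj) for wj, xj in zip(w_prev, x_prev))
        xi = sum(wj * g(y, x) * f(x, xj) for wj, xj in zip(w_prev, x_prev))
        dxi = sum(wj * g(y, x) * f(x, xj) * (dlog_g(y, x) + dlog_f(x, xj) + bj)
                  for wj, xj, bj in zip(w_prev, x_prev, beta_prev))
        a.append(xi / q)        # unnormalized weight a^i  (eq. 24)
        rho.append(dxi / q)     # unnormalized derivative weight rho^i  (eq. 25)
    sa, srho = sum(a), sum(rho)
    w = [ai / sa for ai in a]                                # filter weights
    beta = [ri / ai - srho / sa for ri, ai in zip(rho, a)]   # derivative weights
    grad_pred_loglik = srho / sa   # gradient of log p(Y_n | Y_{1:n-1})
    return w, beta, grad_pred_loglik

rng = random.Random(0)
N = 50
x_prev = [rng.gauss(0.0, 1.0) for _ in range(N)]
w_prev = [1.0 / N] * N
beta_prev = [0.0] * N
x_new = [phi * x + rng.gauss(0.0, 1.0) for x in x_prev]  # bootstrap proposal draws
w, beta, grad = derivative_filter_step(x_prev, w_prev, beta_prev, x_new, y=0.7)
```

By construction Σ_i w_i β_i = 0, i.e. the signed-measure approximation integrates to zero, which is the property the algorithm is designed to preserve.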

and

    ∇²ξ̂_θ(x_n, Y_{1:n}) = (1/N) Σ_i π_n^i δ(x_n − X_n^i),    (26)

where

    a_n^i = ξ̂_θ(X_n^i, Y_{1:n}) / q_θ(X_n^i | Y_{1:n}),
    ρ_n^i = ∇ξ̂_θ(X_n^i, Y_{1:n}) / q_θ(X_n^i | Y_{1:n})  and
    π_n^i = ∇²ξ̂_θ(X_n^i, Y_{1:n}) / q_θ(X_n^i | Y_{1:n}).

Substitution of the last three approximations into (6), (8) and (9) gives

    p̂_θ(x_n | Y_{1:n}) = Σ_i ã_n^i δ(x_n − X_n^i),    (27)
    ∇p̂_θ(x_n | Y_{1:n}) = Σ_i ã_n^i β_n^i δ(x_n − X_n^i),    (28)
    ∇²p̂_θ(x_n | Y_{1:n}) = Σ_i ã_n^i λ_n^i δ(x_n − X_n^i),    (29)

where

    ã_n^i = a_n^i / Σ_j a_n^j,
    ã_n^i β_n^i = ρ_n^i / Σ_j a_n^j − ã_n^i (Σ_j ρ_n^j / Σ_j a_n^j),
    ã_n^i λ_n^i = π_n^i / Σ_j a_n^j − ã_n^i (Σ_j π_n^j / Σ_j a_n^j) − 2 ã_n^i β_n^i (Σ_j ρ_n^j / Σ_j a_n^j).

Note that (27) gives filtering as a byproduct of the algorithm. Compared to standard path-based particle filters, the pointwise particle filter p̂_θ(x_n|Y_{1:n}) requires O(N²) operations instead of O(N). However, for a fixed number of particles N, it will outperform path-based methods due to the analytical integration involved.

4. ML Parameter Estimation

Let us now consider the general state-space model of Section 2.1, where the model parameter θ is unknown. We will assume that the model generating the observation sequence {Y_n} evolves according to a true but unknown static parameter θ*, i.e.

    X_{n+1} | (X_n = x) ∼ f_{θ*}(·|x),    (30)
    Y_n | (X_n = x) ∼ g_{θ*}(·|x).    (31)

Our objective is to identify θ* based on {Y_n}. We propose two gradient algorithms to perform maximum likelihood estimation. These are based on a gradient ascent method that utilizes the estimates of the filter derivatives presented in the previous section. The first method is a recursive algorithm that updates the parameter estimate as soon as a new observation is received. It is based on the maximization of an average log-likelihood criterion and requires a large number of observations to be available. A batch version of the algorithm is also presented; this directly maximizes the log-likelihood of an available set of observations Y_{1:T}.

4.1 Recursive ML

A standard approach to Recursive ML (RML) estimation considers a series of log-likelihood functions {log p_θ(Y_{1:k})}_{k≥1}, where log p_θ(Y_{1:k}) = Σ_{n=1}^k log p_θ(Y_n | Y_{1:n-1}) [17].
The expression p_θ(Y_n | Y_{1:n-1}) is known as the predictive likelihood and can be written as

    p_θ(Y_n | Y_{1:n-1}) = ∫ g_θ(Y_n | x_n) f_θ(x_n | x_{n-1}) p_θ(x_{n-1} | Y_{1:n-1}) dx_{n-1:n}.    (32)

Under suitable regularity conditions, described in [21], it can be shown that the average log-likelihood converges to the following limit:

    lim_{k→∞} (1/k) Σ_{n=1}^k log p_θ(Y_n | Y_{1:n-1}) = l(θ),    (33)

where l(θ) is given by

    l(θ) = ∫_{ℝ^{n_y} × P(ℝ^{n_x})} log [∫ g_θ(y|x) µ(dx)] λ_{θ,θ*}(dy, dµ).

Here P(ℝ^{n_x}) is the space of probability distributions on ℝ^{n_x}, and λ_{θ,θ*}(dy, dµ) is the joint invariant distribution of the couple (Y_n, p_θ(x_n | Y_{1:n-1})). Note that λ_{θ,θ*} is a function of both θ and θ*, since the observation component evolves according to the true parameter θ*, while the prediction filter component evolves according to θ. Following the approach used in [14] for finite state-space models, it can be shown that l(θ) admits θ* as a global maximum.

The function l(θ) does not have an analytical expression, and we do not have access to it. Nevertheless, identification of θ* can still be achieved via the ergodicity property in (33), which provides us with a set of accessible functions log p_θ(Y_n | Y_{1:n-1}) whose average converges to l(θ). One way to exploit this in order to maximize l(θ) is to use a Stochastic Approximation (SA) algorithm that updates the parameter estimate at time n through the recursion

    θ_{n+1} = θ_n + γ_{n+1} ∇log p_{θ_{1:n}}(Y_n | Y_{1:n-1}),    (34)

where θ_n is the parameter estimate at time n and ∇log p_θ(Y_n | Y_{1:n-1}) denotes the gradient of log p_θ(Y_n | Y_{1:n-1}).³ Provided that the step-size sequence {γ_n} is positive and non-increasing, with Σ_n γ_n = ∞ and Σ_n γ_n² < ∞, it can be shown that θ_n converges to the set of global or local maxima of l(θ).

³ The SA requires an estimate of ∇log p_θ(Y_n | Y_{1:n-1}) with θ held fixed. In our problem, θ cannot be held fixed, since we are estimating it recursively. However, since θ_n changes slowly, a standard approach [18] is to reuse the previous particle calculations that were based on θ_{1:n-1} and use the parameter estimate θ_n at time n.
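Detached from the particle machinery, the SA recursion (34) can be illustrated on a toy model where the score is available in closed form. This is a sketch of the Robbins-Monro update only, not of the paper's RML algorithm; the model Y_n ∼ N(θ*, 1) and the step-size schedule are illustrative assumptions.

```python
import random

def rml_gaussian_mean(ys, theta0=0.0, gamma0=0.5):
    """Robbins-Monro recursion theta_{n+1} = theta_n + gamma_{n+1} * score(Y_n)
    on the toy model Y_n ~ N(theta*, 1), whose score is simply (Y_n - theta)."""
    theta = theta0
    for n, y in enumerate(ys, start=1):
        gamma = gamma0 / n ** 0.6    # sum(gamma_n) = inf, sum(gamma_n^2) < inf
        theta += gamma * (y - theta) # SA update (34) with the exact score
    return theta

rng = random.Random(1)
ys = [rng.gauss(2.0, 1.0) for _ in range(5000)]
theta_hat = rml_gaussian_mean(ys)
```

With this schedule the estimate settles near the true value θ* = 2; the exponent 0.6 is one common choice in (0.5, 1].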

The remaining step in the development of the algorithm is to obtain a numerical approximation to ∇log p_θ(Y_n | Y_{1:n-1}). This follows directly from the expressions for p̂_θ(x_n|Y_{1:n}) and ∇p̂_θ(x_n|Y_{1:n}) in Section 3.3, since comparison of (7) and (32) gives p_θ(Y_n | Y_{1:n-1}) = ∫ ξ_θ(x_n, Y_{1:n}) dx_n. Using the particle approximations of ξ_θ and ∇ξ_θ in (24) and (25), we obtain

    ∇log p̂_θ(Y_n | Y_{1:n-1}) = ∇p̂_θ(Y_n | Y_{1:n-1}) / p̂_θ(Y_n | Y_{1:n-1}) = ∫ ∇ξ̂_θ(x_n, Y_{1:n}) dx_n / ∫ ξ̂_θ(x_n, Y_{1:n}) dx_n = Σ_j ρ_n^j / Σ_j a_n^j.

4.1.1 Adaptive SA steps

The SA recursion (34) can be thought of as a stochastic generalization of the steepest descent method. Faster convergence can be achieved by employing a Newton-type method that is based on an estimate of the Hessian of the objective function and leads to an asymptotically optimal search direction [3]. In general, estimation of the Hessian is non-trivial, and finite-difference approximations are typically used [19]. In our framework, the Hessian of the log-likelihood can be straightforwardly estimated using the particle approximations of the optimal filter and its first and second derivatives in (27), (28) and (29). More specifically, the n_θ × n_θ Hessian matrix estimate at time n is given by

    ∇²log p̂_θ(Y_n | Y_{1:n-1}) = ∇²p̂_θ(Y_n | Y_{1:n-1}) / p̂_θ(Y_n | Y_{1:n-1}) − [∇p̂_θ(Y_n | Y_{1:n-1}) / p̂_θ(Y_n | Y_{1:n-1})]² = Σ_j π_n^j / Σ_j a_n^j − (Σ_j ρ_n^j / Σ_j a_n^j)².

This allows one to compute the asymptotic value of the Hessian using, for example, a recursion of the form

    H̄_n = H̄_{n-1} + (1/(n+1)) (Ĥ_n − H̄_{n-1}),    (35)

where Ĥ_n = ∇²log p̂_θ(Y_n | Y_{1:n-1}). This is simply a recursive calculation of the sample mean of the Hessian estimates up to time n. By construction, the true Hessian is a negative definite, symmetric matrix whose inverse can provide an adaptive step in (34). Direct inversion of the estimated value H̄_n is possible only if this matrix is negative definite; in practice this is usually ensured by projecting H̄_n onto the set of negative definite matrices - see [4] and [19] for details. The Newton-type SA recursion that can replace (34) takes the form

    θ_{n+1} = θ_n − γ_{n+1} H̄_n⁻¹ ∇log p_{θ_{1:n}}(Y_n | Y_{1:n-1}).    (36)

This adaptive SA is particularly attractive, in terms of convergence acceleration, in the terminal phase of the algorithm, where the steepest descent-type method slows down.
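A scalar sketch of this Newton-type update combines the running-mean Hessian recursion (35), a projection Ψ onto the negative half-line, and the scaled step (36). The toy model Y_n ∼ N(θ*, 1), whose per-observation Hessian is exactly −1, is an illustrative assumption standing in for the particle Hessian estimate.

```python
import random

def newton_rml_gaussian(ys, theta0=0.0, gamma0=0.5, eps=1e-3):
    """Newton-type SA on the toy model Y_n ~ N(theta*, 1): running-mean Hessian,
    projection Psi onto the negative definite (here: negative scalar) set,
    and the scaled parameter update."""
    theta, h_bar = theta0, 0.0
    for n, y in enumerate(ys, start=1):
        grad = y - theta                    # exact score of N(theta, 1)
        h_hat = -1.0                        # exact per-observation Hessian here
        h_bar += (h_hat - h_bar) / (n + 1)  # recursion (35): running mean
        h_proj = min(h_bar, -eps)           # Psi: force the estimate negative
        gamma = gamma0 / n ** 0.6
        theta -= gamma * grad / h_proj      # Newton-type step: -gamma * H^{-1} grad
    return theta

rng = random.Random(2)
ys = [rng.gauss(-1.5, 1.0) for _ in range(5000)]
theta_hat = newton_rml_gaussian(ys)
```

Because h_proj is negative, the step −γ grad / h_proj moves in the ascent direction, scaled by the inverse Hessian magnitude.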
4.1.2 Confidence Regions

From a practical point of view, it is often desirable to assess the accuracy of the parameter estimate by means of confidence intervals or, more generally, confidence regions. In principle, central limit theorems that have been established for a number of standard SA algorithms allow the computation of confidence regions for the estimates - see [3] for detailed results. One of the difficulties is that the covariance matrix of the limiting multivariate normal distribution depends on the inverse of the Hessian of the objective function. The proposed method provides an estimate of this quantity through (35).

4.1.3 RML Parameter Estimation Algorithm

The recursive ML estimation is summarized as follows:

1. Sampling Step
For i = 1, ..., N, sample⁴

    X_n^i ∼ q_{θ_{1:n}}(x_n | Y_{1:n}) = Σ_j ã_{n-1}^j q_{θ_n}(x_n | Y_n, X_{n-1}^j).

2. Weight Calculation
Compute

    a_n^i = ξ̂_{θ_{1:n}}(X_n^i, Y_{1:n}) / q_{θ_{1:n}}(X_n^i | Y_{1:n}),
    ρ_n^i = ∇ξ̂_{θ_{1:n}}(X_n^i, Y_{1:n}) / q_{θ_{1:n}}(X_n^i | Y_{1:n})  and
    π_n^i = ∇²ξ̂_{θ_{1:n}}(X_n^i, Y_{1:n}) / q_{θ_{1:n}}(X_n^i | Y_{1:n})

using (21), (22) and (23). Compute the weights

    ã_n^i = a_n^i / Σ_j a_n^j,
    ã_n^i β_n^i = ρ_n^i / Σ_j a_n^j − ã_n^i (Σ_j ρ_n^j / Σ_j a_n^j),
    ã_n^i λ_n^i = π_n^i / Σ_j a_n^j − ã_n^i (Σ_j π_n^j / Σ_j a_n^j) − 2 ã_n^i β_n^i (Σ_j ρ_n^j / Σ_j a_n^j).

3. Parameter Update Step

    Ĥ_n = Σ_j π_n^j / Σ_j a_n^j − (Σ_j ρ_n^j / Σ_j a_n^j)²,
    H̄_n = H̄_{n-1} + (1/(n+1)) (Ĥ_n − H̄_{n-1}),    H̃_n = Ψ(H̄_n),
    θ_{n+1} = θ_n − γ_{n+1} H̃_n⁻¹ (Σ_j ρ_n^j / Σ_j a_n^j).

Remark 1: The function Ψ(H) is a mapping onto the set of negative definite matrices, based on diagonal modifications of the Hessian.
Remark 2: This algorithm guarantees that Σ_i ã_n^i β_n^i = Σ_i ã_n^i λ_n^i = 0.
Remark 3: Even if θ_n ∈ Θ, it is possible that the updated value θ_{n+1} ∉ Θ. A standard approach to

⁴ Note that in this approach the resampling step is included when we sample from q_θ(x_n | Y_{1:n}).

prevent such divergence is to reproject the parameter value inside Θ = Π_{µ=1}^{n_θ} [θ_µ^min, θ_µ^max].

4.2 Batch ML

In cases where a fixed set of observations Y_{1:T} is available, we describe here a batch, off-line version (BML) of the previous algorithm. This algorithm maximizes the log-likelihood log p_θ(Y_{1:T}) using an SA recursion given at iteration m by

    θ_m = θ_{m-1} + γ_m ∇log p̂_{θ_{m-1}}(Y_{1:T}),    (37)

where ∇log p̂_{θ_{m-1}}(Y_{1:T}) is an estimate of the derivative of the log-likelihood evaluated at the point θ_{m-1}. This estimate can be obtained using a modified version of the RML method as follows: at iteration m, we run the RML algorithm from time 1 to time T, omitting the parameter update step and keeping the parameter value fixed at the current estimate θ_{m-1}. At the end of the run, a Monte Carlo estimate of the derivative of the log-likelihood can be computed as

    ∇log p̂_{θ_{m-1}}(Y_{1:T}) = Σ_{k=1}^T ∇p̂_{θ_{m-1}}(Y_k | Y_{1:k-1}) / p̂_{θ_{m-1}}(Y_k | Y_{1:k-1}) = Σ_{k=1}^T (Σ_j ρ_k^j / Σ_j a_k^j).

This is used to update the parameter to θ_m, as given by (37).

5. Numerical Study

The RML and BML algorithms were tested on artificial and real observations.

5.1 Linear Gaussian State-Space Model

We first consider the following scalar linear Gaussian state-space model:

    X_{n+1} = φ X_n + σ_V V_{n+1},    X_1 ∼ N(0, σ_V² / (1 − φ²)),
    Y_n = X_n + σ_W W_n,

where V_n and W_n are i.i.d. N(0,1). We are interested in estimating the parameter θ = (φ, σ_V, σ_W). In such a model, the optimal filter is given by the Kalman filter, and exact expressions for the first and second derivatives of the filter can be obtained. This allows us to compare our numerical methods with the ground truth. The RML algorithm was implemented using the optimal importance density q_θ(x_n | Y_n, x_{n-1}) ∝ g_θ(Y_n|x_n) f_θ(x_n|x_{n-1}) and N particles. Figure 2 displays the analytical posterior density and its derivatives with respect to φ and compares them with the particle approximations we obtained.

Figure 2: Linear Gaussian state-space example: analytical optimal filter and its first and second derivatives w.r.t.
φ, together with the particle approximations obtained using the proposed method (positive and negative particle approximations shown against the true measure).

The analytical and numerical values of the score vector ∇log p_θ(Y_n | Y_{1:n-1}) and the Hessian matrix ∇²log p_θ(Y_n | Y_{1:n-1}) were compared over the whole run and were almost indistinguishable. An example of the comparison results obtained for the component ∂²log p_θ(Y_n | Y_{1:n-1}) / ∂φ² is shown in Figure 3.

Figure 3: Analytical and numerical results for ∂²log p_θ(Y_n | Y_{1:n-1}) / ∂φ² for the linear Gaussian state-space model.
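The Kalman-filter ground truth used in such a comparison can be sketched as follows: the prediction-error decomposition gives the exact log-likelihood, and a central finite difference (an assumption of this sketch, not the paper's method, which differentiates the filter analytically) approximates the score with respect to φ.

```python
import math
import random

def kalman_loglik(ys, phi, sv, sw):
    """Exact log-likelihood of X_{n+1} = phi X_n + sv V, Y_n = X_n + sw W via the
    prediction-error decomposition of the Kalman filter (stationary initial law)."""
    m, P = 0.0, sv ** 2 / (1.0 - phi ** 2)
    ll = 0.0
    for y in ys:
        S = P + sw ** 2                              # Var(Y_n | Y_{1:n-1})
        ll += -0.5 * (math.log(2.0 * math.pi * S) + (y - m) ** 2 / S)
        K = P / S                                    # Kalman gain
        m, P = m + K * (y - m), (1.0 - K) * P        # measurement update
        m, P = phi * m, phi ** 2 * P + sv ** 2       # time update
    return ll

def fd_score_phi(ys, phi, sv, sw, h=1e-5):
    # Central finite-difference approximation of d/dphi log p_theta(Y_{1:n}).
    return (kalman_loglik(ys, phi + h, sv, sw)
            - kalman_loglik(ys, phi - h, sv, sw)) / (2 * h)

# Simulate from the model with phi = 0.8, sigma_V = sigma_W = 1.
rng = random.Random(3)
x, ys = rng.gauss(0.0, 1.0 / math.sqrt(1 - 0.8 ** 2)), []
for _ in range(500):
    ys.append(x + rng.gauss(0.0, 1.0))
    x = 0.8 * x + rng.gauss(0.0, 1.0)

ll_true = kalman_loglik(ys, 0.8, 1.0, 1.0)
ll_far = kalman_loglik(ys, 0.0, 1.0, 1.0)
score = fd_score_phi(ys, 0.8, 1.0, 1.0)
```

With 500 observations generated at φ = 0.8, the exact log-likelihood at the true parameter clearly dominates the value at a distant parameter, as expected.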

5.2 Stochastic Volatility Model

The RML algorithm was implemented for the following stochastic volatility model:

    X_{n+1} = φ X_n + σ V_{n+1},    X_1 ∼ N(0, σ² / (1 − φ²)),
    Y_n = β exp(X_n / 2) W_n,

where V_n and W_n are i.i.d. N(0,1). We are interested in estimating the true parameter θ* = (σ, φ, β) = (0.35, 0.85, 0.65) from simulated data, with θ constrained to a compact set Θ. We use q_θ(x_n | Y_n, x_{n-1}) = f_θ(x_n | x_{n-1}) and N particles. As can be seen from the results in Figure 4, the estimate converged to a value in the neighborhood of the true parameter.

Figure 4: Sequence of RML parameter estimates for θ_n = (σ_n, φ_n, β_n). From top to bottom: φ_n, β_n and σ_n. The true values were θ* = (0.35, 0.85, 0.65).

We then applied our BML method to the pound/dollar daily exchange rates; see [9]. This time series consists of 945 data points. The parameter estimates over M iterations using N particles are shown in Figure 5. Our results are consistent with those obtained in [9].

Figure 5: Sequence of BML parameter estimates for θ_m = (σ_m, φ_m, β_m). From top to bottom: φ_m, β_m and σ_m.

5.3 Parameter tracking

A unique advantage of the RML algorithm of Section 4.1 is its ability to track variations in θ. A standard approach to tracking a time-varying parameter is to set the step-size to a small positive constant γ, instead of a decreasing sequence γ_n [17]. The choice of the value of γ is a trade-off between tracking capability (large γ) and low estimation noise around the parameter (small γ). An example of the tracking performance of the RML algorithm, based on the linear Gaussian state-space model with a time-varying drift parameter φ, is shown in Figure 6.

Figure 6: RML algorithm tracking performance for a time-varying φ.

6. Discussion

This paper has presented original particle methods to estimate the first and second derivatives of the optimal filter in general state-space models. The methods use non-standard particle methods to approximate the Hahn-Jordan decomposition of the resulting signed measures.
This allows the calculation of accurate approximations to the score vector and the Hessian matrix of the log-likelihood with respect to the model parameters. Based on this, we proposed a recursive and a batch algorithm to perform ML parameter estimation using a gradient ascent method. The Hessian estimate can be used as an adaptive step-size in the gradient ascent recursion to provide faster convergence of the algorithm.

The computational cost of the proposed particle methods for the filter derivatives is quadratic in the number of particles. Fast computation methods can, however, be employed to address this issue [13]. The proposed methods can also be extended to the case where it is possible to integrate analytically a subset of the state variables, such as the class of partially observed linear Gaussian state-space models and conditionally linear Gaussian state-space models [2], [8]. Such extensions can provide efficient particle methods that reduce the variance of the Monte Carlo estimates.

References

[1] Andrieu, C., Doucet, A. and Tadic, V.B. (2005) Online parameter estimation in general state space models. Proc. IEEE CDC/ECC.
[2] Andrieu, C. and Doucet, A. (2002) Particle filtering for partially observed Gaussian state space models. J. Royal Statist. Soc. B, 64.
[3] Benveniste, A., Métivier, M. and Priouret, P. (1990) Adaptive Algorithms and Stochastic Approximation. New York: Springer-Verlag.
[4] Bertsekas, D. (1999) Nonlinear Programming, 2nd Edition. Athena Scientific.
[5] Cérou, F., LeGland, F. and Newton, N.J. (2001) Stochastic particle methods for linear tangent equations. In Optimal Control and PDE's - Innovations and Applications (eds J. Menaldi, E. Rofman and A. Sulem), IOS Press, Amsterdam.
[6] Doucet, A. and Tadić, V.B. (2003) Parameter estimation in general state-space models using particle methods. Ann. Inst. Stat. Math., 55.
[7] Doucet, A., Godsill, S.J. and Andrieu, C. (2000) On sequential Monte Carlo sampling methods for Bayesian filtering. Statist. Comput., vol. 10.
[8] Doucet, A., de Freitas, J.F.G. and Gordon, N.J. (eds) (2001) Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag.
[10] Guyader, A., LeGland, F. and Oudjane, N. (2003) A particle implementation of the recursive MLE for partially observed diffusions. Proceedings of the 13th IFAC Symposium on System Identification.
[12] Kitagawa, G. (1996) Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Statist., 5, 1-25.
[13] Klaas, M., Lang, D., Hamze, F. and de Freitas, N.
(2004) Fast probability propagation: Beyond beliefs. Technical report, CS Department, University of British Columbia.
[14] LeGland, F. and Mevel, L. (1997) Recursive identification in hidden Markov models. Proc. 36th IEEE Conf. Decision and Control.
[15] Liu, J.S. and Chen, R. (1998) Sequential Monte Carlo methods for dynamic systems. J. Am. Statist. Ass., vol. 93.
[16] Liu, J. and West, M. (2001) Combined parameter and state estimation in simulation-based filtering. In Sequential Monte Carlo Methods in Practice (eds Doucet, A., de Freitas, J.F.G. and Gordon, N.J.). New York: Springer-Verlag.
[17] Ljung, L. and Söderström, T. (1983) Theory and Practice of Recursive Identification. MIT Press, Cambridge.
[18] Pflug, G.C. (1996) Optimization of Stochastic Models. Kluwer.
[19] Spall, J.C. (2000) Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Autom. Contr., vol. 45.
[20] Storvik, G. (2002) Particle filters in state space models with the presence of unknown static parameters. IEEE Trans. Signal Processing, vol. 50.
[21] Tadić, V.B. and Doucet, A. (2005) Exponential forgetting and geometric ergodicity for optimal filtering in general state-space models. Stochastic Processes and Their Applications, vol. 115.
[9] Durbin, J. and Koopman, S.J. (2000) Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives (with discussion). J. R. Statist. Soc. B, 62.
[11] Fearnhead, P. (2002) MCMC, sufficient statistics and particle filters. J. Comp. Graph. Stat., vol. 11.


More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

THE KALMAN FILTER RAUL ROJAS

THE KALMAN FILTER RAUL ROJAS THE KALMAN FILTER RAUL ROJAS Abstract. This paper provides a getle itroductio to the Kalma filter, a umerical method that ca be used for sesor fusio or for calculatio of trajectories. First, we cosider

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

Chandrasekhar Type Algorithms. for the Riccati Equation of Lainiotis Filter

Chandrasekhar Type Algorithms. for the Riccati Equation of Lainiotis Filter Cotemporary Egieerig Scieces, Vol. 3, 00, o. 4, 9-00 Chadrasekhar ype Algorithms for the Riccati Equatio of Laiiotis Filter Nicholas Assimakis Departmet of Electroics echological Educatioal Istitute of

More information

Advanced Sequential Monte Carlo Methods

Advanced Sequential Monte Carlo Methods Advaced Sequetial Mote Carlo Methods Araud Doucet Departmets of Statistics & Computer Sciece Uiversity of British Columbia A.D. () / 35 Geeric Sequetial Mote Carlo Scheme At time =, sample q () ad set

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ. 2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For

More information

R. van Zyl 1, A.J. van der Merwe 2. Quintiles International, University of the Free State

R. van Zyl 1, A.J. van der Merwe 2. Quintiles International, University of the Free State Bayesia Cotrol Charts for the Two-parameter Expoetial Distributio if the Locatio Parameter Ca Take o Ay Value Betwee Mius Iity ad Plus Iity R. va Zyl, A.J. va der Merwe 2 Quitiles Iteratioal, ruaavz@gmail.com

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Introduction to Optimization Techniques. How to Solve Equations

Introduction to Optimization Techniques. How to Solve Equations Itroductio to Optimizatio Techiques How to Solve Equatios Iterative Methods of Optimizatio Iterative methods of optimizatio Solutio of the oliear equatios resultig form a optimizatio problem is usually

More information

Lecture 33: Bootstrap

Lecture 33: Bootstrap Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

Efficient Block Sampling Strategies for Sequential Monte Carlo Methods

Efficient Block Sampling Strategies for Sequential Monte Carlo Methods Efficiet Block Samplig Strategies for Sequetial Mote Carlo Methods Araud DOUCET, Mark BRIERS, ad Stéphae SÉNÉCA Sequetial Mote Carlo SMC) methods are a powerful set of simulatio-based techiques for samplig

More information

The Choquet Integral with Respect to Fuzzy-Valued Set Functions

The Choquet Integral with Respect to Fuzzy-Valued Set Functions The Choquet Itegral with Respect to Fuzzy-Valued Set Fuctios Weiwei Zhag Abstract The Choquet itegral with respect to real-valued oadditive set fuctios, such as siged efficiecy measures, has bee used i

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

NEW FAST CONVERGENT SEQUENCES OF EULER-MASCHERONI TYPE

NEW FAST CONVERGENT SEQUENCES OF EULER-MASCHERONI TYPE UPB Sci Bull, Series A, Vol 79, Iss, 207 ISSN 22-7027 NEW FAST CONVERGENT SEQUENCES OF EULER-MASCHERONI TYPE Gabriel Bercu We itroduce two ew sequeces of Euler-Mascheroi type which have fast covergece

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Monte Carlo Optimization to Solve a Two-Dimensional Inverse Heat Conduction Problem

Monte Carlo Optimization to Solve a Two-Dimensional Inverse Heat Conduction Problem Australia Joural of Basic Applied Scieces, 5(): 097-05, 0 ISSN 99-878 Mote Carlo Optimizatio to Solve a Two-Dimesioal Iverse Heat Coductio Problem M Ebrahimi Departmet of Mathematics, Karaj Brach, Islamic

More information

Entropy Rates and Asymptotic Equipartition

Entropy Rates and Asymptotic Equipartition Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15 17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig

More information

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar.

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar. Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Regression with an Evaporating Logarithmic Trend

Regression with an Evaporating Logarithmic Trend Regressio with a Evaporatig Logarithmic Tred Peter C. B. Phillips Cowles Foudatio, Yale Uiversity, Uiversity of Aucklad & Uiversity of York ad Yixiao Su Departmet of Ecoomics Yale Uiversity October 5,

More information

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense, 3. Z Trasform Referece: Etire Chapter 3 of text. Recall that the Fourier trasform (FT) of a DT sigal x [ ] is ω ( ) [ ] X e = j jω k = xe I order for the FT to exist i the fiite magitude sese, S = x [

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Goodness-Of-Fit For The Generalized Exponential Distribution. Abstract

Goodness-Of-Fit For The Generalized Exponential Distribution. Abstract Goodess-Of-Fit For The Geeralized Expoetial Distributio By Amal S. Hassa stitute of Statistical Studies & Research Cairo Uiversity Abstract Recetly a ew distributio called geeralized expoetial or expoetiated

More information

( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2

( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2 82 CHAPTER 4. MAXIMUM IKEIHOOD ESTIMATION Defiitio: et X be a radom sample with joit p.m/d.f. f X x θ. The geeralised likelihood ratio test g.l.r.t. of the NH : θ H 0 agaist the alterative AH : θ H 1,

More information

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

5 : Exponential Family and Generalized Linear Models

5 : Exponential Family and Generalized Linear Models 0-708: Probabilistic Graphical Models 0-708, Sprig 206 5 : Expoetial Family ad Geeralized Liear Models Lecturer: Matthew Gormley Scribes: Yua Li, Yichog Xu, Silu Wag Expoetial Family Probability desity

More information

RAINFALL PREDICTION BY WAVELET DECOMPOSITION

RAINFALL PREDICTION BY WAVELET DECOMPOSITION RAIFALL PREDICTIO BY WAVELET DECOMPOSITIO A. W. JAYAWARDEA Departmet of Civil Egieerig, The Uiversit of Hog Kog, Hog Kog, Chia P. C. XU Academ of Mathematics ad Sstem Scieces, Chiese Academ of Scieces,

More information

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates

Investigating the Significance of a Correlation Coefficient using Jackknife Estimates Iteratioal Joural of Scieces: Basic ad Applied Research (IJSBAR) ISSN 2307-4531 (Prit & Olie) http://gssrr.org/idex.php?joural=jouralofbasicadapplied ---------------------------------------------------------------------------------------------------------------------------

More information

Notes On Median and Quantile Regression. James L. Powell Department of Economics University of California, Berkeley

Notes On Median and Quantile Regression. James L. Powell Department of Economics University of California, Berkeley Notes O Media ad Quatile Regressio James L. Powell Departmet of Ecoomics Uiversity of Califoria, Berkeley Coditioal Media Restrictios ad Least Absolute Deviatios It is well-kow that the expected value

More information

Lecture 9: September 19

Lecture 9: September 19 36-700: Probability ad Mathematical Statistics I Fall 206 Lecturer: Siva Balakrisha Lecture 9: September 9 9. Review ad Outlie Last class we discussed: Statistical estimatio broadly Pot estimatio Bias-Variace

More information

The Expectation-Maximization (EM) Algorithm

The Expectation-Maximization (EM) Algorithm The Expectatio-Maximizatio (EM) Algorithm Readig Assigmets T. Mitchell, Machie Learig, McGraw-Hill, 997 (sectio 6.2, hard copy). S. Gog et al. Dyamic Visio: From Images to Face Recogitio, Imperial College

More information

THE SOLUTION OF NONLINEAR EQUATIONS f( x ) = 0.

THE SOLUTION OF NONLINEAR EQUATIONS f( x ) = 0. THE SOLUTION OF NONLINEAR EQUATIONS f( ) = 0. Noliear Equatio Solvers Bracketig. Graphical. Aalytical Ope Methods Bisectio False Positio (Regula-Falsi) Fied poit iteratio Newto Raphso Secat The root of

More information

Kolmogorov-Smirnov type Tests for Local Gaussianity in High-Frequency Data

Kolmogorov-Smirnov type Tests for Local Gaussianity in High-Frequency Data Proceedigs 59th ISI World Statistics Cogress, 5-30 August 013, Hog Kog (Sessio STS046) p.09 Kolmogorov-Smirov type Tests for Local Gaussiaity i High-Frequecy Data George Tauche, Duke Uiversity Viktor Todorov,

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

MAT1026 Calculus II Basic Convergence Tests for Series

MAT1026 Calculus II Basic Convergence Tests for Series MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Modeling and Estimation of a Bivariate Pareto Distribution using the Principle of Maximum Entropy

Modeling and Estimation of a Bivariate Pareto Distribution using the Principle of Maximum Entropy Sri Laka Joural of Applied Statistics, Vol (5-3) Modelig ad Estimatio of a Bivariate Pareto Distributio usig the Priciple of Maximum Etropy Jagathath Krisha K.M. * Ecoomics Research Divisio, CSIR-Cetral

More information

Bayesian Control Charts for the Two-parameter Exponential Distribution

Bayesian Control Charts for the Two-parameter Exponential Distribution Bayesia Cotrol Charts for the Two-parameter Expoetial Distributio R. va Zyl, A.J. va der Merwe 2 Quitiles Iteratioal, ruaavz@gmail.com 2 Uiversity of the Free State Abstract By usig data that are the mileages

More information

Precise Rates in Complete Moment Convergence for Negatively Associated Sequences

Precise Rates in Complete Moment Convergence for Negatively Associated Sequences Commuicatios of the Korea Statistical Society 29, Vol. 16, No. 5, 841 849 Precise Rates i Complete Momet Covergece for Negatively Associated Sequeces Dae-Hee Ryu 1,a a Departmet of Computer Sciece, ChugWoo

More information

Quantile regression with multilayer perceptrons.

Quantile regression with multilayer perceptrons. Quatile regressio with multilayer perceptros. S.-F. Dimby ad J. Rykiewicz Uiversite Paris 1 - SAMM 90 Rue de Tolbiac, 75013 Paris - Frace Abstract. We cosider oliear quatile regressio ivolvig multilayer

More information

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector Dimesio-free PAC-Bayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et

More information

Lecture 8: Solving the Heat, Laplace and Wave equations using finite difference methods

Lecture 8: Solving the Heat, Laplace and Wave equations using finite difference methods Itroductory lecture otes o Partial Differetial Equatios - c Athoy Peirce. Not to be copied, used, or revised without explicit writte permissio from the copyright ower. 1 Lecture 8: Solvig the Heat, Laplace

More information

Similarity Solutions to Unsteady Pseudoplastic. Flow Near a Moving Wall

Similarity Solutions to Unsteady Pseudoplastic. Flow Near a Moving Wall Iteratioal Mathematical Forum, Vol. 9, 04, o. 3, 465-475 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/0.988/imf.04.48 Similarity Solutios to Usteady Pseudoplastic Flow Near a Movig Wall W. Robi Egieerig

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber

More information

Ω ). Then the following inequality takes place:

Ω ). Then the following inequality takes place: Lecture 8 Lemma 5. Let f : R R be a cotiuously differetiable covex fuctio. Choose a costat δ > ad cosider the subset Ωδ = { R f δ } R. Let Ωδ ad assume that f < δ, i.e., is ot o the boudary of f = δ, i.e.,

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Math 113 Exam 3 Practice

Math 113 Exam 3 Practice Math Exam Practice Exam will cover.-.9. This sheet has three sectios. The first sectio will remid you about techiques ad formulas that you should kow. The secod gives a umber of practice questios for you

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable.

It should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable. Chapter 10 Variace Estimatio 10.1 Itroductio Variace estimatio is a importat practical problem i survey samplig. Variace estimates are used i two purposes. Oe is the aalytic purpose such as costructig

More information

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled 1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how

More information

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01 ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly

More information

Research Article A New Second-Order Iteration Method for Solving Nonlinear Equations

Research Article A New Second-Order Iteration Method for Solving Nonlinear Equations Abstract ad Applied Aalysis Volume 2013, Article ID 487062, 4 pages http://dx.doi.org/10.1155/2013/487062 Research Article A New Secod-Order Iteratio Method for Solvig Noliear Equatios Shi Mi Kag, 1 Arif

More information

Berry-Esseen bounds for self-normalized martingales

Berry-Esseen bounds for self-normalized martingales Berry-Essee bouds for self-ormalized martigales Xiequa Fa a, Qi-Ma Shao b a Ceter for Applied Mathematics, Tiaji Uiversity, Tiaji 30007, Chia b Departmet of Statistics, The Chiese Uiversity of Hog Kog,

More information

Orthogonal Gaussian Filters for Signal Processing

Orthogonal Gaussian Filters for Signal Processing Orthogoal Gaussia Filters for Sigal Processig Mark Mackezie ad Kiet Tieu Mechaical Egieerig Uiversity of Wollogog.S.W. Australia Abstract A Gaussia filter usig the Hermite orthoormal series of fuctios

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

On an Application of Bayesian Estimation

On an Application of Bayesian Estimation O a Applicatio of ayesia Estimatio KIYOHARU TANAKA School of Sciece ad Egieerig, Kiki Uiversity, Kowakae, Higashi-Osaka, JAPAN Email: ktaaka@ifokidaiacjp EVGENIY GRECHNIKOV Departmet of Mathematics, auma

More information

Surveying the Variance Reduction Methods

Surveying the Variance Reduction Methods Iteratioal Research Joural of Applied ad Basic Scieces 2013 Available olie at www.irjabs.com ISSN 2251-838X / Vol, 7 (7): 427-432 Sciece Explorer Publicatios Surveyig the Variace Reductio Methods Arash

More information

Chapter 12 EM algorithms The Expectation-Maximization (EM) algorithm is a maximum likelihood method for models that have hidden variables eg. Gaussian

Chapter 12 EM algorithms The Expectation-Maximization (EM) algorithm is a maximum likelihood method for models that have hidden variables eg. Gaussian Chapter 2 EM algorithms The Expectatio-Maximizatio (EM) algorithm is a maximum likelihood method for models that have hidde variables eg. Gaussia Mixture Models (GMMs), Liear Dyamic Systems (LDSs) ad Hidde

More information

Mathematical Modeling of Optimum 3 Step Stress Accelerated Life Testing for Generalized Pareto Distribution

Mathematical Modeling of Optimum 3 Step Stress Accelerated Life Testing for Generalized Pareto Distribution America Joural of Theoretical ad Applied Statistics 05; 4(: 6-69 Published olie May 8, 05 (http://www.sciecepublishiggroup.com/j/ajtas doi: 0.648/j.ajtas.05040. ISSN: 6-8999 (Prit; ISSN: 6-9006 (Olie Mathematical

More information

VARIABLE RATE PARTICLE FILTERS FOR TRACKING APPLICATIONS. Simon Godsill and Jaco Vermaak

VARIABLE RATE PARTICLE FILTERS FOR TRACKING APPLICATIONS. Simon Godsill and Jaco Vermaak VARIABLE RATE PARTICLE FILTERS FOR TRACKING APPLICATIONS Simo Godsill ad Jaco Vermaak Sigal Processig Group Departmet of Egieerig, Uiversity of Cambridge, U.K. sjg@eg.cam.ac.uk, jv211@eg.cam.ac.uk ABSTRACT

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

Lecture 3 The Lebesgue Integral

Lecture 3 The Lebesgue Integral Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified

More information