SMC²: an efficient algorithm for sequential analysis of state-space models


SMC²: an efficient algorithm for sequential analysis of state-space models
N. Chopin (1), P.E. Jacob (2), & O. Papaspiliopoulos (3)
(1) ENSAE-CREST; (2) CREST & Université Paris Dauphine; (3) Universitat Pompeu Fabra & Ramón y Cajal
N. CHOPIN, P.E. JACOB, & O. PAPASPILIOPOULOS, SMC², 1/48

Statement of the problem

State Space Models

A system of equations. Hidden states (Markov): $p(x_1 \mid \theta) = \mu_\theta(x_1)$ and, for $t \geq 1$,
$$p(x_{t+1} \mid x_{1:t}, \theta) = p(x_{t+1} \mid x_t, \theta) = f_\theta(x_{t+1} \mid x_t).$$
Observations:
$$p(y_t \mid y_{1:t-1}, x_{1:t}, \theta) = p(y_t \mid x_t, \theta) = g_\theta(y_t \mid x_t).$$
Parameter: $\theta \in \Theta$, prior $p(\theta)$. We observe $y_{1:T} = (y_1, \ldots, y_T)$; $T$ might be large ($\approx 10^4$). Both $x$ and $\theta$ may have several dimensions. In many models $f_\theta$ or $g_\theta$ cannot be written in closed form.

State Space Models: some interesting distributions

Bayesian inference focuses on: static, $p(\theta \mid y_{1:T})$; dynamic, $p(\theta \mid y_{1:t})$ for $t = 1:T$. Filtering (traditionally) focuses on $p_\theta(x_t \mid y_{1:t})$, $t \in [1, T]$; smoothing (traditionally) on $p_\theta(x_t \mid y_{1:T})$, $t \in [1, T]$. Prediction:
$$p(y_{t+1} \mid y_{1:t}) = \int\!\!\int g_\theta(y_{t+1} \mid x_{t+1})\, f_\theta(x_{t+1} \mid x_t)\, p_\theta(x_t \mid y_{1:t})\, p(\theta \mid y_{1:t})\, \mathrm{d}x_{t:t+1}\, \mathrm{d}\theta$$

An example: Stochastic Volatility (Lévy-driven models)

Observations ("log returns"): $y_t = \mu + \beta v_t + v_t^{1/2} \epsilon_t$, $t \geq 1$.
Hidden states ("actual volatility", an integrated process):
$$v_{t+1} = \frac{1}{\lambda}\left(z_t - z_{t+1} + \sum_{j=1}^{k} e_j\right)$$

An example (continued)

... where the process $z_t$ is the "spot volatility":
$$z_{t+1} = e^{-\lambda} z_t + \sum_{j=1}^{k} e^{-\lambda(t+1-c_j)} e_j, \qquad k \sim \mathrm{Poi}\left(\lambda\xi^2/\omega^2\right), \quad c_{1:k} \overset{\text{iid}}{\sim} \mathcal{U}(t, t+1), \quad e_{1:k} \overset{\text{iid}}{\sim} \mathrm{Exp}\left(\xi/\omega^2\right)$$
The parameter is $\theta = (\mu, \beta, \xi, \omega^2, \lambda)$, and $x_t = (v_t, z_t)$.

Why are those models challenging? It is effectively impossible to compute the likelihood
$$p(y_{1:T} \mid \theta) = \int_{\mathcal{X}^T} p(y_{1:T} \mid x_{1:T}, \theta)\, p(x_{1:T} \mid \theta)\, \mathrm{d}x_{1:T}.$$
Similarly, all other inferential quantities are impossible to compute.

Problems with MCMC and IS approaches

We cannot compute the likelihood or sample from $p(\theta \mid y_{1:T})$ directly. Importance Sampling (IS) results in a variance that grows polynomially or exponentially in $t$; see e.g. Section 4 in Kong, Liu and Wong (1994, JASA); Bengtsson, Bickel and Li (2008); Theorem 4 of Chopin (2004, AoS).

An MCMC approach would be to sample the parameters and states jointly. But: the high-dimensional autocorrelated state process is difficult to simulate efficiently conditionally on the observations, and high dependence between parameters and state variables causes poor performance of the Gibbs sampler. Furthermore, MCMC methods are not designed to recover the whole sequence $\pi(x_{1:t}, \theta \mid y_{1:t})$ computationally efficiently; instead they sample from the static distribution $\pi(x_{1:T}, \theta \mid y_{1:T})$.

Methodology Pt I

Monte Carlo in Bayesian statistics: the new generation

This problem is an instance of a situation frequently encountered in modern applications: statistical inference with intractable densities, which can however be estimated using Monte Carlo. This is a harder version of the main problem in Bayesian statistics (intractable integrals), which has traditionally been addressed by data augmentation, simulated likelihood, EM, etc. A new generation of methods has emerged, a common feature of which is the interplay between unbiased estimation and auxiliary variables, with the aim of constructing exact Monte Carlo methods.

Some characteristic references in that respect include: ABC methods, e.g. Pritchard et al. (1999, Molec. Biol. Evol.); MCMC for models with intractable normalizing constants, e.g. Møller et al. (2006, Biometrika); pseudo-marginal MCMC, Andrieu and Roberts (2009, Ann. Stat.); random weight particle filters, e.g. Fearnhead et al. (2010, J. Roy. Stat. Soc. B); particle MCMC methods, e.g. Andrieu, Doucet and Holenstein (2010, J. Roy. Stat. Soc. B). Our approach fits in this framework, allowing fully Bayesian sequential inference (whereas the aforementioned approaches deal with static MCMC, or inference for dynamic models with fixed parameters).

Main tools of our approach: particle filter algorithms for state-space models, and Iterated Batch Importance Sampling for sequential Bayesian inference for parameters. Both are sequential Monte Carlo (SMC) methods.

1. Particle filters

Consider the simplified problem of targeting $p_\theta(x_{t+1} \mid y_{1:t+1})$ for a given value of $\theta$. This sequence of distributions is approximated by a sequence of weighted particles, which are properly weighted using importance sampling, mutated/propagated according to the system dynamics, and resampled to control the variance. Below we give a pseudo-code version. Any operation involving the superscript $n$ must be understood as performed for $n = 1:N_x$, where $N_x$ is the total number of particles.

Step 1: At iteration $t = 1$,
(a) Sample $x_1^n \sim q_{1,\theta}(\cdot)$.
(b) Compute and normalise the weights
$$w_{1,\theta}(x_1^n) = \frac{\mu_\theta(x_1^n)\, g_\theta(y_1 \mid x_1^n)}{q_{1,\theta}(x_1^n)}, \qquad W_{1,\theta}^n = \frac{w_{1,\theta}(x_1^n)}{\sum_{i=1}^{N_x} w_{1,\theta}(x_1^i)}.$$
Step 2: At iterations $t = 2:T$,
(a) Sample the index $a_{t-1}^n \sim \mathcal{M}(W_{t-1,\theta}^{1:N_x})$ of the ancestor.
(b) Sample $x_t^n \sim q_{t,\theta}(\cdot \mid x_{t-1}^{a_{t-1}^n})$.
(c) Compute and normalise the weights
$$w_{t,\theta}(x_{t-1}^{a_{t-1}^n}, x_t^n) = \frac{f_\theta(x_t^n \mid x_{t-1}^{a_{t-1}^n})\, g_\theta(y_t \mid x_t^n)}{q_{t,\theta}(x_t^n \mid x_{t-1}^{a_{t-1}^n})}, \qquad W_{t,\theta}^n = \frac{w_{t,\theta}(x_{t-1}^{a_{t-1}^n}, x_t^n)}{\sum_{i=1}^{N_x} w_{t,\theta}(x_{t-1}^{a_{t-1}^i}, x_t^i)}.$$

Observations

At each $t$, $(w_t^{(i)}, x_{1:t}^{(i)})_{i=1}^{N_x}$ is a particle approximation of $p_\theta(x_t \mid y_{1:t})$; resampling avoids degeneracy. In principle $(w_t^{(i)}, x_{1:t}^{(i)})_{i=1}^{N_x}$ is also a particle approximation of $p_\theta(x_{1:t} \mid y_{1:t})$ (bad notation! careful with the genealogy). Resampling makes the latter a very poor approximation for large $t$, known as the path degeneracy problem. Taking $q_\theta = f_\theta$ simplifies the weights, and crucially yields a feasible algorithm even when $f_\theta$ can only be simulated. Notation: $\psi_{t,\theta}$ is the distribution of all the variables drawn up to time $t$ (particles and ancestors).

Unbiased likelihood estimator

A by-product of the PF output is that
$$\hat{Z}_t = \prod_{s=1}^{t} \left( \frac{1}{N_x} \sum_{i=1}^{N_x} w_{s,\theta}^{(i)} \right)$$
is an unbiased estimator of the likelihood $Z_t = p(y_{1:t} \mid \theta)$, for all $t$. Whereas consistency of the estimator is immediate to check, unbiasedness is subtle; see e.g. Proposition 7.4.1 in Del Moral (2004). The variance of this estimator typically grows linearly with $T$ (and not exponentially), because of the dependence of the factors in the product.
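As a concrete illustration of the two slides above, here is a minimal bootstrap particle filter ($q_\theta = f_\theta$, so the weight reduces to $g_\theta$) returning the log of the unbiased likelihood estimator $\hat Z_T$. The linear-Gaussian toy model and all tuning constants are our own illustrative choices, not taken from the talk.

```python
import numpy as np

def bootstrap_pf(y, theta, N, rng):
    """Bootstrap PF (q = f) for x_1 ~ N(0, s_x^2/(1-rho^2)),
    x_t = rho*x_{t-1} + s_x*v_t,  y_t = x_t + s_y*e_t.
    Returns log Z-hat_T = sum_t log( N^{-1} sum_n w_t^n )."""
    rho, s_x, s_y = theta
    x = rng.normal(0.0, s_x / np.sqrt(1.0 - rho**2), size=N)
    W, logZ = None, 0.0
    for t, y_t in enumerate(y):
        if t > 0:
            x = x[rng.choice(N, size=N, p=W)]       # ancestors a^n ~ M(W_{t-1})
            x = rho * x + s_x * rng.normal(size=N)  # propagate with f_theta
        # with q = f the incremental weight is g_theta(y_t | x_t)
        logw = -0.5 * np.log(2 * np.pi * s_y**2) - 0.5 * ((y_t - x) / s_y) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        logZ += m + np.log(w.mean())                # log of N^{-1} sum_n w_t^n
        W = w / w.sum()                             # normalised weights W_t^n
    return logZ

rng = np.random.default_rng(1)
rho, s_x, s_y = 0.9, 0.5, 1.0
x_t, y = rng.normal(0.0, s_x / np.sqrt(1 - rho**2)), []
for _ in range(50):                                 # simulate y_{1:50} from the model
    y.append(x_t + s_y * rng.normal())
    x_t = rho * x_t + s_x * rng.normal()
loglik = bootstrap_pf(np.array(y), (rho, s_x, s_y), N=1000, rng=rng)
```

Note that unbiasedness holds for $\hat Z_T$ itself, not for $\log \hat Z_T$; this is the property that IBIS-with-estimated-weights and PMCMC exploit below.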

2. IBIS

An SMC method for particle approximation of the sequence $p(\theta \mid y_{1:t})$, $t = 1:T$, for models where the likelihood is tractable; see e.g. Chopin (2002, Biometrika) for details. Again, the sequence of parameter posterior distributions is approximated by $N_\theta$ weighted particles, $(\theta^m, \omega^m)_{m=1}^{N_\theta}$. By-product: consistent estimators of the predictive densities
$$L_t = \int p(y_t \mid y_{1:t-1}, \theta)\, p(\theta \mid y_{1:t-1})\, \mathrm{d}\theta,$$
hence of the model evidence. In the next slide we give the pseudo-code of the IBIS algorithm. Operations with superscript $m$ must be understood as performed for all $m \in 1:N_\theta$.

Sample $\theta^m$ from $p(\theta)$ and set $\omega^m \leftarrow 1$. Then, at times $t = 1, \ldots, T$:
(a) Compute the incremental weights and their weighted average,
$$u_t(\theta^m) = p(y_t \mid y_{1:t-1}, \theta^m), \qquad \hat{L}_t = \frac{1}{\sum_{m=1}^{N_\theta} \omega^m} \sum_{m=1}^{N_\theta} \omega^m u_t(\theta^m).$$
(b) Update the importance weights,
$$\omega^m \leftarrow \omega^m u_t(\theta^m). \tag{1}$$
(c) If some degeneracy criterion is fulfilled, sample $\tilde{\theta}^m$ independently from the mixture distribution
$$\frac{1}{\sum_{m=1}^{N_\theta} \omega^m} \sum_{m=1}^{N_\theta} \omega^m K_t(\theta^m, \cdot).$$
Finally, replace the current weighted particle system: $(\theta^m, \omega^m) \leftarrow (\tilde{\theta}^m, 1)$.

Observations

Cost of the lack of ergodicity in $\theta$: the occasional MCMC move. Still, in regular problems resampling happens at a diminishing (logarithmic) frequency. $K_t$ is an MCMC kernel invariant w.r.t. $\pi(\theta \mid y_{1:t})$; its parameters can be chosen using information from the current population of $\theta$-particles. $\hat{L}_t$ can be used to obtain an estimator of the model evidence. Infeasible to implement for state-space models: intractable incremental weights, and intractable MCMC kernel.
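To make the IBIS pseudo-code concrete, here is a sketch for a toy conjugate model $y_t \sim N(\theta, 1)$, $\theta \sim N(0,1)$, where the incremental weights $u_t(\theta^m)$ are available exactly. The ESS-based degeneracy criterion and the random-walk Metropolis choice for $K_t$ are illustrative assumptions, not prescriptions from the talk.

```python
import numpy as np

def ibis(y, N_theta=1000, seed=3):
    """IBIS for y_t ~ N(theta, 1) with prior theta ~ N(0, 1).
    The resample-move step triggers when ESS < N_theta / 2."""
    rng = np.random.default_rng(seed)
    th = rng.normal(0.0, 1.0, N_theta)              # theta^m ~ p(theta)
    logom = np.zeros(N_theta)                       # log omega^m

    def log_post(th_, t):                           # log p(theta) p(y_{1:t} | theta) + const
        return -0.5 * th_**2 - 0.5 * ((y[: t + 1, None] - th_[None, :]) ** 2).sum(axis=0)

    for t in range(len(y)):
        logom += -0.5 * np.log(2 * np.pi) - 0.5 * (y[t] - th) ** 2  # u_t(theta^m)
        w = np.exp(logom - logom.max())
        W = w / w.sum()
        if 1.0 / np.sum(W**2) < 0.5 * N_theta:      # degeneracy criterion (ESS)
            th = th[rng.choice(N_theta, N_theta, p=W)]         # resample
            logom[:] = 0.0                                     # weights reset to 1
            prop = th + 0.5 * rng.normal(size=N_theta)         # K_t: RW Metropolis
            accept = np.log(rng.uniform(size=N_theta)) < log_post(prop, t) - log_post(th, t)
            th = np.where(accept, prop, th)
    w = np.exp(logom - logom.max())
    W = w / w.sum()
    return float(np.sum(W * th))                    # posterior mean estimate

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, 100)
post_mean = ibis(y)
exact_mean = y.sum() / (1.0 + len(y))               # conjugate posterior mean
```

Because the model is conjugate, the IBIS estimate of the posterior mean can be checked against the closed-form answer.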

Our algorithm: SMC²

We provide a generic (black box) algorithm for recovering the sequence of parameter posterior distributions, but also the filtering, smoothing and predictive distributions. We give next a pseudo-code; the code seems to track only the parameter posteriors, but it actually does all the other jobs as well. Superficially, it looks like an approximation of IBIS, but in fact it does not produce any systematic errors (unbiased MC).

Sample $\theta^m$ from $p(\theta)$ and set $\omega^m \leftarrow 1$. Then, at times $t = 1, \ldots, T$,
(a) For each particle $\theta^m$, perform iteration $t$ of the PF:
If $t = 1$, sample $x_1^{1:N_x,m}$ independently from $\psi_{1,\theta^m}$, and compute
$$\hat{p}(y_1 \mid \theta^m) = \frac{1}{N_x} \sum_{n=1}^{N_x} w_{1,\theta}(x_1^{n,m});$$
If $t > 1$, sample $(x_t^{1:N_x,m}, a_{t-1}^{1:N_x,m})$ from $\psi_{t,\theta^m}$ conditional on $(x_{1:t-1}^{1:N_x,m}, a_{1:t-2}^{1:N_x,m})$, and compute
$$\hat{p}(y_t \mid y_{1:t-1}, \theta^m) = \frac{1}{N_x} \sum_{n=1}^{N_x} w_{t,\theta}(x_{t-1}^{a_{t-1}^{n,m},m}, x_t^{n,m}).$$

(b) Update the importance weights,
$$\omega^m \leftarrow \omega^m\, \hat{p}(y_t \mid y_{1:t-1}, \theta^m).$$
(c) If some degeneracy criterion is fulfilled, sample $(\tilde{\theta}^m, \tilde{x}_{1:t}^{1:N_x,m}, \tilde{a}_{1:t-1}^{1:N_x,m})$ independently from the mixture distribution
$$\frac{1}{\sum_{m=1}^{N_\theta} \omega^m} \sum_{m=1}^{N_\theta} \omega^m K_t\left( \left(\theta^m, x_{1:t}^{1:N_x,m}, a_{1:t-1}^{1:N_x,m}\right), \cdot \right).$$
Finally, replace the current weighted particle system:
$$\left(\theta^m, x_{1:t}^{1:N_x,m}, a_{1:t-1}^{1:N_x,m}, \omega^m\right) \leftarrow \left(\tilde{\theta}^m, \tilde{x}_{1:t}^{1:N_x,m}, \tilde{a}_{1:t-1}^{1:N_x,m}, 1\right).$$

Observations

It appears to be an approximation to IBIS; for $N_x = \infty$ it is IBIS. However, no approximation is made whatsoever: this algorithm really samples from $p(\theta \mid y_{1:t})$ and all the other distributions of interest. One would expect an increase of MC variance over IBIS. The validity of the algorithm rests essentially on two results: (i) the particles are properly weighted, due to the unbiasedness of the PF estimator of the likelihood; (ii) the MCMC kernel is appropriately constructed to maintain invariance w.r.t. an expanded distribution which admits those of interest as marginals; it is a particle MCMC kernel. The algorithm does not suffer from the path degeneracy problem, thanks to the MCMC updates.
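A compact sketch of the loop above, for a one-parameter AR(1)-plus-noise model ($x_t = \rho x_{t-1} + \sigma_x v_t$, $y_t = x_t + \sigma_y e_t$, flat prior on $\rho$): each $\theta$-particle carries its own particle filter, $\omega^m$ accumulates $\hat p(y_t \mid y_{1:t-1}, \theta^m)$, and degeneracy triggers a resample plus a PMMH rejuvenation move in which a fresh PF is run for each proposed $\tilde\theta$. The model, prior support, ESS trigger and random-walk scale are all our own hypothetical choices.

```python
import numpy as np

S_X, S_Y = 0.5, 0.5                                  # fixed model scales

def pf_init(rho, N, rng):
    return rng.normal(0.0, S_X / np.sqrt(1.0 - rho**2), N)

def pf_step(x, W, rho, y_t, rng):
    """One bootstrap-PF iteration; returns (x_t, W_t, log p-hat(y_t | past))."""
    N = len(x)
    if W is not None:
        x = x[rng.choice(N, N, p=W)]                 # ancestors ~ M(W_{t-1})
        x = rho * x + S_X * rng.normal(size=N)       # propagate with f_theta
    logw = -0.5 * np.log(2 * np.pi * S_Y**2) - 0.5 * ((y_t - x) / S_Y) ** 2
    m = logw.max(); w = np.exp(logw - m)
    return x, w / w.sum(), m + np.log(w.mean())

def run_pf(rho, y, N, rng):                          # fresh PF, used in rejuvenation
    x, W, logZ = pf_init(rho, N, rng), None, 0.0
    for y_t in y:
        x, W, inc = pf_step(x, W, rho, y_t, rng)
        logZ += inc
    return x, W, logZ

def smc2(y, N_theta=100, N_x=100, seed=4):
    rng = np.random.default_rng(seed)
    th = rng.uniform(-0.95, 0.95, N_theta)           # theta^m ~ p(theta), flat prior
    logom = np.zeros(N_theta)                        # log omega^m
    logZ = np.zeros(N_theta)                         # running log Z-hat_t(theta^m)
    state = [(pf_init(th[m], N_x, rng), None) for m in range(N_theta)]
    for t in range(len(y)):
        for m in range(N_theta):                     # (a) advance each theta's PF
            x, W, inc = pf_step(*state[m], th[m], y[t], rng)
            state[m] = (x, W)
            logom[m] += inc                          # (b) omega^m *= p-hat(y_t | ...)
            logZ[m] += inc
        w = np.exp(logom - logom.max()); Wth = w / w.sum()
        if 1.0 / np.sum(Wth**2) < 0.5 * N_theta:     # (c) degeneracy: resample-move
            idx = rng.choice(N_theta, N_theta, p=Wth)
            th, logZ = th[idx], logZ[idx]
            state = [state[i] for i in idx]
            logom[:] = 0.0                           # weights reset to 1
            for m in range(N_theta):                 # PMMH rejuvenation move
                prop = th[m] + 0.2 * rng.normal()
                if abs(prop) >= 0.95:                # outside prior support
                    continue
                xp, Wp, logZp = run_pf(prop, y[: t + 1], N_x, rng)
                if np.log(rng.uniform()) < logZp - logZ[m]:   # flat prior, symmetric T
                    th[m], logZ[m], state[m] = prop, logZp, (xp, Wp)
    w = np.exp(logom - logom.max()); Wth = w / w.sum()
    return float(np.sum(Wth * th))

rng = np.random.default_rng(0)
rho_true, x_t, y = 0.8, 0.0, []
for _ in range(50):                                  # simulate data from the model
    x_t = rho_true * x_t + S_X * rng.normal()
    y.append(x_t + S_Y * rng.normal())
post_mean = smc2(np.array(y))
```

The rejuvenation loop is what drives the $O(N_\theta t^2)$ cost discussed in the complexity slide: each triggered move re-runs a PF over all data seen so far.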

Theory Pt I

Proposition

The probability density $\pi_t$ may be written as:
$$\pi_t(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x}) = \frac{p(\theta \mid y_{1:t})}{N_x^t} \sum_{n=1}^{N_x} p(x_{1:t}^n \mid \theta, y_{1:t}) \left\{ \prod_{\substack{i=1 \\ i \neq h_t^n(1)}}^{N_x} q_{1,\theta}(x_1^i) \right\} \prod_{s=2}^{t} \left\{ \prod_{\substack{i=1 \\ i \neq h_t^n(s)}}^{N_x} W_{s-1,\theta}^{a_{s-1}^i}\, q_{s,\theta}(x_s^i \mid x_{s-1}^{a_{s-1}^i}) \right\},$$
where $h_t^n(s)$ denotes the index of the time-$s$ ancestor of particle $x_t^n$, and $x_{1:t}^n$ the corresponding ancestral path. Intuition for the case $t = 1$ is given in the Appendix.

Methodology Pt II

Principled framework for algorithmic improvements

Elaborating on Proposition 1, we propose several formal ways of carrying out the following algorithmic operations: the MCMC rejuvenation step within PMCMC; sampling from the smoothing distributions; automatic calibration of $N_x$; dealing with an intractable $g_\theta$. SMC² can also be used for more general sequences of distributions, e.g. obtained by tempering.

The MCMC rejuvenation step

(a) Sample $\tilde{\theta}$ from the proposal kernel, $\tilde{\theta} \sim T(\theta, \mathrm{d}\tilde{\theta})$.
(b) Run a new PF for $\tilde{\theta}$: sample $(\tilde{x}_{1:t}^{1:N_x}, \tilde{a}_{1:t-1}^{1:N_x})$ independently from $\psi_{t,\tilde{\theta}}$, and compute $\hat{Z}_t(\tilde{\theta}, \tilde{x}_{1:t}^{1:N_x}, \tilde{a}_{1:t-1}^{1:N_x})$.
(c) Accept the move with probability
$$1 \wedge \frac{p(\tilde{\theta})\, \hat{Z}_t(\tilde{\theta}, \tilde{x}_{1:t}^{1:N_x}, \tilde{a}_{1:t-1}^{1:N_x})\, T(\tilde{\theta}, \theta)}{p(\theta)\, \hat{Z}_t(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x})\, T(\theta, \tilde{\theta})}.$$

It follows directly from the Proposition that this algorithm defines a standard Metropolis-Hastings kernel with proposal distribution
$$q\left( (\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x});\, (\tilde{\theta}, \tilde{x}_{1:t}^{1:N_x}, \tilde{a}_{1:t-1}^{1:N_x}) \right) = T(\theta, \tilde{\theta})\, \psi_{t,\tilde{\theta}}(\tilde{x}_{1:t}^{1:N_x}, \tilde{a}_{1:t-1}^{1:N_x}),$$
which admits the extended distribution $\pi_t(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x})$ as invariant distribution. This is precisely a particle MCMC step, as in Andrieu et al. (2010, JRSSB).
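The exact-approximation property of such a kernel can be illustrated on a toy latent-variable model where the marginal likelihood is known in closed form: below, a pseudo-marginal random-walk Metropolis chain uses an unbiased importance-sampling estimate $\hat Z(\theta)$ in place of $p(y \mid \theta)$, yet still recovers the exact (conjugate, checkable) posterior. The model and tuning constants are our own illustrative choices, not from the talk.

```python
import numpy as np

def zhat(theta, y, M, rng):
    """Unbiased IS estimate of p(y | theta) for the toy model
    y_i | theta, z_i ~ N(theta + z_i, 1), z_i ~ N(0, 1):
    average the conditional likelihood over M fresh draws of z."""
    z = rng.normal(size=(len(y), M))
    dens = np.exp(-0.5 * (y[:, None] - theta - z) ** 2) / np.sqrt(2 * np.pi)
    return np.sum(np.log(dens.mean(axis=1)))          # log Z-hat(theta)

def pmmh(y, n_iter=20000, step=0.3, M=10, seed=5):
    """Pseudo-marginal RW Metropolis targeting p(theta | y), prior N(0,1)."""
    rng = np.random.default_rng(seed)
    theta, lz = 0.0, zhat(0.0, y, M, rng)
    out = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()            # symmetric proposal T
        lz_prop = zhat(prop, y, M, rng)               # fresh estimate for proposal
        # accept with 1 ^ [p(prop) Zhat(prop)] / [p(theta) Zhat(theta)]
        log_ratio = (-0.5 * prop**2 + lz_prop) - (-0.5 * theta**2 + lz)
        if np.log(rng.uniform()) < log_ratio:
            theta, lz = prop, lz_prop                 # Zhat of current state is kept
        out[i] = theta
    return out

rng = np.random.default_rng(0)
y = rng.normal(1.0, np.sqrt(2.0), 20)                 # marginally y_i ~ N(theta, 2)
chain = pmmh(y)
exact_mean = y.sum() / (2.0 + len(y))                 # conjugate posterior mean
pm_mean = chain[5000:].mean()
```

Keeping the stored $\hat Z(\theta)$ for the current state (rather than re-estimating it) is exactly what makes the extended chain invariant w.r.t. a distribution admitting the true posterior as a marginal.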

Dynamic increase of $N_x$

Why is increasing $N_x$ necessary? Our framework allows the dynamic and automatic increase of $N_x$, using the generalized importance sampling strategy of Del Moral et al. (2006). We propose two approaches: a particle exchange and a conditional SMC step, the latter being more efficient (in terms of minimizing the variance of the weights) but memory intensive. They get triggered when a degeneracy criterion is fulfilled.

Particle exchange

Exchange importance sampling step: launch a new SMC for each $\theta$-particle, with $\tilde{N}_x$ $x$-particles. Joint distribution:
$$\pi_t(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x})\, \psi_{t,\theta}(\tilde{x}_{1:t}^{1:\tilde{N}_x}, \tilde{a}_{1:t-1}^{1:\tilde{N}_x})$$
Retain the new $x$-particles and drop the old ones, updating the $\theta$-weights with:
$$u_t^{\mathrm{exch}}\left(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x}, \tilde{x}_{1:t}^{1:\tilde{N}_x}, \tilde{a}_{1:t-1}^{1:\tilde{N}_x}\right) = \frac{\hat{Z}_t(\theta, \tilde{x}_{1:t}^{1:\tilde{N}_x}, \tilde{a}_{1:t-1}^{1:\tilde{N}_x})}{\hat{Z}_t(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x})}$$

Numerics

Extensive study of numerical performance and comparisons, both in the paper and in its Supplement, both available on arXiv.

Numerical illustrations: SV

[Figure: squared observations (synthetic data set), acceptance rates, and an illustration of the automatic increase of $N_x$, over 1000 iterations.]

Numerical illustrations: SV

[Figure: posterior densities of $\mu$, $\beta$ and $\xi$ at times $t$ = 250, 500 and 1000, overlaid for 5 independent runs.]

Numerical illustrations: SV

[Figure: comparison of SMC² with the Liu & West (L&W) filter and PMCMC: computing time against $t$, posterior density of $\xi$, and log evidence relative to a reference, for 5 independent runs.]

Numerical illustrations: SV

[Figure: squared observations, and evidence of the multi-factor models (with and without leverage) compared to the one-factor model.]

Theory Pt II

Stability of the algorithm and computational complexity

The algorithm refreshes at each MCMC step, but at the expense of re-visiting all the data available so far. The computational complexity of the algorithm is determined by the frequency at which it needs to resort to this step. If after each resampling the particles are simulated from $\pi_t$, then for a time $t+p$ such that no resampling has happened since $t$, the inverse of the second moment of the normalized weights in SMC² and in IBIS is given, respectively, by
$$E_{t,t+p}^{N_x} = \left\{ \mathbb{E}_{\pi_{t,t+p}}\left[ \left( \frac{\hat{Z}_t^{t+p}(\theta, x_{1:t+p}^{1:N_x}, a_{1:t+p-1}^{1:N_x})}{p(y_{t+1:t+p} \mid y_{1:t})} \right)^2 \right] \right\}^{-1}, \qquad E_{t,t+p} = \left\{ \mathbb{E}_{p(\theta \mid y_{1:t})}\left[ \left( \frac{p(\theta \mid y_{1:t+p})}{p(\theta \mid y_{1:t})} \right)^2 \right] \right\}^{-1},$$
where $\hat{Z}_t^{t+p}$ denotes the incremental likelihood estimate over $y_{t+1:t+p}$.

Proposition

1. Under Assumptions (H1a) and (H1b) in the Appendix, there exists a constant $\eta > 0$ such that for any $p$, if $N_x > \eta p$,
$$E_{t,t+p}^{N_x} \geq \frac{1}{2} E_{t,t+p}. \tag{2}$$
2. Under Assumptions (H2a)-(H2d) in the Appendix, for any $\gamma > 0$ there exist $\tau, \eta > 0$ and $t_0 < \infty$ such that for $t \geq t_0$,
$$E_{t,t+p}^{N_x} \geq \gamma, \quad \text{for } p = \tau t,\ N_x = \eta t.$$
This suggests that the computational cost of the algorithm up to time $t$ is $O(N_\theta t^2)$, which should be contrasted with the $O(N_\theta t)$ cost of IBIS under Assumptions (H2a)-(H2d).

Final Remarks

A powerful framework. The article is available on arXiv and on our web pages. A package is available: http://code.google.com/p/py-smc2/. Parallel computing implementation using GPUs by Fulop and Li (2011).

Appendix

Some challenges for the short-term future

Theoretical: justify the orange claim. Numerical: combined with the GPU implementation, try the algorithm on extremely hard problems. Algorithmic: find a better diagnostic for increasing $N_x$.

Why does it work? Intuition for $t = 1$

At time $t = 1$, the algorithm generates variables $\theta^m$ from the prior $p(\theta)$ and, for each $\theta^m$, generates vectors $x_1^{1:N_x,m}$ of particles from $\psi_{1,\theta^m}(x_1^{1:N_x})$.

Thus, the sampling space is $\Theta \times \mathcal{X}^{N_x}$, and the actual particles of the algorithm are $N_\theta$ independent and identically distributed copies of the random variable $(\theta, x_1^{1:N_x})$, with density:
$$p(\theta)\, \psi_{1,\theta}(x_1^{1:N_x}) = p(\theta) \prod_{n=1}^{N_x} q_{1,\theta}(x_1^n).$$

Then, these particles are assigned importance weights corresponding to the incremental weight function $\hat{Z}_1(\theta, x_1^{1:N_x}) = N_x^{-1} \sum_{n=1}^{N_x} w_{1,\theta}(x_1^n)$. This means that, at iteration 1, the target distribution of the algorithm should be defined as:
$$\pi_1(\theta, x_1^{1:N_x}) = p(\theta)\, \psi_{1,\theta}(x_1^{1:N_x})\, \frac{\hat{Z}_1(\theta, x_1^{1:N_x})}{p(y_1)},$$
where the normalising constant $p(y_1)$ is easily deduced from the property that $\hat{Z}_1(\theta, x_1^{1:N_x})$ is an unbiased estimator of $p(y_1 \mid \theta)$.

Direct substitutions yield
$$\pi_1(\theta, x_1^{1:N_x}) = \frac{p(\theta)}{p(y_1)} \left\{ \prod_{i=1}^{N_x} q_{1,\theta}(x_1^i) \right\} \frac{1}{N_x} \sum_{n=1}^{N_x} \frac{\mu_\theta(x_1^n)\, g_\theta(y_1 \mid x_1^n)}{q_{1,\theta}(x_1^n)} = \frac{1}{N_x} \frac{p(\theta)}{p(y_1)} \sum_{n=1}^{N_x} \mu_\theta(x_1^n)\, g_\theta(y_1 \mid x_1^n) \prod_{i=1, i \neq n}^{N_x} q_{1,\theta}(x_1^i)$$
and noting that, for the triplet $(\theta, x_1, y_1)$ of random variables,
$$p(\theta)\, \mu_\theta(x_1)\, g_\theta(y_1 \mid x_1) = p(\theta, x_1, y_1) = p(y_1)\, p(\theta \mid y_1)\, p(x_1 \mid y_1, \theta),$$
one finally gets that:
$$\pi_1(\theta, x_1^{1:N_x}) = \frac{p(\theta \mid y_1)}{N_x} \sum_{n=1}^{N_x} p(x_1^n \mid y_1, \theta) \prod_{i=1, i \neq n}^{N_x} q_{1,\theta}(x_1^i).$$

By a simple induction, one sees that the target density $\pi_t$ at iteration $t \geq 2$ should be defined as:
$$\pi_t(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x}) = p(\theta)\, \psi_{t,\theta}(x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x})\, \frac{\hat{Z}_t(\theta, x_{1:t}^{1:N_x}, a_{1:t-1}^{1:N_x})}{p(y_{1:t})},$$
and similarly we obtain Proposition 1.

Assumptions

(H1a) For all $\theta \in \Theta$ and $x, x', x'' \in \mathcal{X}$,
$$\frac{f_\theta(x \mid x')}{f_\theta(x \mid x'')} \leq \beta.$$
(H1b) For all $\theta \in \Theta$, $x, x' \in \mathcal{X}$, $y \in \mathcal{Y}$,
$$\frac{g_\theta(y \mid x)}{g_\theta(y \mid x')} \leq \delta.$$

(H2a) The MLE $\hat{\theta}_t$ (the mode of the function $l_t(\theta)$) exists and converges to $\theta_0$ as $t \to +\infty$.
(H2b) The observed information matrix, defined as
$$\Sigma_t = -\frac{1}{t} \frac{\partial^2 l_t(\hat{\theta}_t)}{\partial\theta\, \partial\theta'},$$
is positive definite and converges to $I(\theta_0)$, the Fisher information matrix.
(H2c) There exists $\Delta > 0$ such that, for $\delta \in (0, \Delta)$,
$$\limsup_{t \to +\infty} \frac{1}{t} \sup_{\|\theta - \hat{\theta}_t\| > \delta} \left\{ l_t(\theta) - l_t(\hat{\theta}_t) \right\} < 0.$$
(H2d) The function $l_t/t$ is six times continuously differentiable, and its derivatives of order six are bounded relative to $t$ over any compact set $\Theta' \subset \Theta$.