Sequential Monte Carlo Methods for Bayesian Computation

A. Doucet (MLSS, Kyoto, Sept. 2012)

Motivating Example 1: Generic Bayesian Model

Let X be a vector parameter of interest with an associated prior μ; i.e. X ~ μ(·).

We observe a realization y of Y, which is assumed to satisfy Y | (X = x) ~ g(· | x); i.e. the likelihood function is g(y | x).

Bayesian inference on X relies on the posterior of X given Y = y:

  p(x | y) = μ(x) g(y | x) / p(y),

where the marginal likelihood/evidence satisfies

  p(y) = ∫ μ(x) g(y | x) dx.

Machine learning examples: Latent Dirichlet Allocation, (Hierarchical) Dirichlet processes, ...
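As a sanity check, the posterior and the evidence can be computed numerically in one dimension. The sketch below (all numerical settings are illustrative choices, not from the slides) uses a Gaussian prior μ = N(0, 1) and Gaussian likelihood g(y | x) = N(y; x, 1), for which conjugacy gives the exact answers p(y) = N(y; 0, 2) and posterior mean y/2:

```python
import numpy as np

# Illustrative 1-D Bayesian model: prior mu = N(0, 1), likelihood g(y|x) = N(y; x, 1).
def normal_pdf(z, mean, var):
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

y = 1.5                              # observed realization of Y
x = np.linspace(-10, 10, 20001)      # grid over the parameter space
dx = x[1] - x[0]

prior = normal_pdf(x, 0.0, 1.0)
lik = normal_pdf(y, x, 1.0)

evidence = np.sum(prior * lik) * dx  # p(y) = ∫ mu(x) g(y|x) dx  (Riemann sum)
posterior = prior * lik / evidence   # p(x|y) = mu(x) g(y|x) / p(y)
post_mean = np.sum(x * posterior) * dx

# Conjugate exact values for comparison: p(y) = N(y; 0, 2), E[X|y] = y/2.
print(evidence, post_mean)
```

Such grid approximations are only feasible in very low dimension, which is exactly why the Monte Carlo methods discussed in these lectures are needed.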

Motivating Example 2: State-Space Models

Let {X_t}_{t≥1} be a latent/hidden Markov process with X_1 ~ μ(·) and X_t | (X_{t-1} = x) ~ f(· | x).

Let {Y_t}_{t≥1} be an observation process such that the observations are conditionally independent given {X_t}_{t≥1} and Y_t | (X_t = x) ~ g(· | x).

Let z_{i:j} := (z_i, z_{i+1}, ..., z_j); then Bayesian inference on X_{1:t} relies on the posterior of X_{1:t} given Y_{1:t} = y_{1:t}:

  p(x_{1:t} | y_{1:t}) = p(x_{1:t}, y_{1:t}) / p(y_{1:t}),

where the marginal likelihood/evidence satisfies

  p(y_{1:t}) = ∫ p(x_{1:t}, y_{1:t}) dx_{1:t}.

Machine learning examples: biochemical network models, dynamic topic models, neuroscience models, etc.

Bayesian Inference and Machine Learning

Bayesian approaches have been adopted by a large part of the ML community.

Bayesian inference offers a number of attractive advantages over conventional approaches:
- flexibility in constructing complex models from simple parts;
- the incorporation of prior knowledge is very natural;
- all modelling assumptions are made explicit;
- uncertainties over model order, model parameters and predictions are technically straightforward to compute.

The price to pay is that approximate inference techniques are necessary to approximate the resulting posterior distributions for all but trivial models.

Approximate Inference Methods

- Gaussian/Laplace approximation, local linearization, extended Kalman filters.
- Variational methods, assumed density filters.
- Expectation Propagation.
- Markov chain Monte Carlo (MCMC) methods.
- Sequential Monte Carlo (SMC) methods.

Monte Carlo Methods

Variational and EP methods are computationally cheap but perform functional approximations of the posteriors of interest.

Both MCMC and SMC are asymptotically (as you increase the computational effort) bias-free but computationally expensive.

MCMC has been the tool of choice in Bayesian computation for over 20 years, whereas SMC has been widely used for 15 years in vision and robotics.

The development of new methodology, combined with the emergence of cheap multicore architectures, now makes SMC a powerful alternative/complementary approach to MCMC for addressing general Bayesian computational problems.

The aim of these lectures is to provide an introduction to this active research field and to discuss some open research problems.

Some References and Resources

A.D., J.F.G. De Freitas & N.J. Gordon (editors), Sequential Monte Carlo Methods in Practice, Springer-Verlag: New York.

P. Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer-Verlag: New York.

O. Cappé, E. Moulines & T. Ryden, Hidden Markov Models, Springer-Verlag: New York.

Webpage with links to papers and codes:

Thousands of papers on the subject appear every year.

Organization of Lectures

State-Space Models (approx. 4 hours):
- SMC filtering and smoothing
- Maximum likelihood parameter inference
- Bayesian parameter inference

Beyond State-Space Models (approx. 2 hours):
- SMC methods for generic sequences of target distributions
- SMC samplers
- Approximate Bayesian Computation
- Optimal design, optimal control

State-Space Models

Let {X_t}_{t≥1} be a latent/hidden X-valued Markov process with X_1 ~ μ(·) and X_t | (X_{t-1} = x) ~ f(· | x).

Let {Y_t}_{t≥1} be a Y-valued observation process such that the observations are conditionally independent given {X_t}_{t≥1} and Y_t | (X_t = x) ~ g(· | x).

This is a general class of time series models, aka Hidden Markov Models (HMM), including

  X_t = Ψ(X_{t-1}, V_t),   Y_t = Φ(X_t, W_t),

where {V_t} and {W_t} are two sequences of i.i.d. random variables.

Aim: infer {X_t} given the observations {Y_t}, on-line or off-line.
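The Ψ/Φ formulation maps directly onto code: simulating a state-space model only requires the two functions and i.i.d. noise. A minimal NumPy sketch, where the linear instance Ψ(x, v) = 0.9x + v, Φ(x, w) = x + w and all parameter values are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance: Psi(x, v) = 0.9 x + v, Phi(x, w) = x + w, Gaussian noise.
def Psi(x, v):
    return 0.9 * x + v

def Phi(x, w):
    return x + w

def simulate(T, rng):
    """Draw (X_1:T, Y_1:T) from the state-space model, with X_1 ~ mu = N(0, 1)."""
    x = rng.normal()
    xs, ys = np.empty(T), np.empty(T)
    for t in range(T):
        xs[t] = x
        ys[t] = Phi(x, rng.normal())  # Y_t | (X_t = x) ~ g(.|x)
        x = Psi(x, rng.normal())      # X_{t+1} | (X_t = x) ~ f(.|x)
    return xs, ys

xs, ys = simulate(200, rng)
```

Any choice of Ψ, Φ and noise distributions gives another member of the class; only these two lines of the loop change.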

State-Space Models

State-space models are ubiquitous in control, data mining, econometrics, geosciences, systems biology, etc. Since Jan. 2012, more than 13,500 papers have already appeared (source: Google Scholar).

Finite state-space HMM: X is a finite space, i.e. {X_t} is a finite Markov chain, and Y_t | (X_t = x) ~ g(· | x).

Linear Gaussian state-space model:

  X_t = A X_{t-1} + B V_t,   V_t i.i.d. N(0, I),
  Y_t = C X_t + D W_t,   W_t i.i.d. N(0, I).

Switching linear Gaussian state-space model: X_t = (X_t^1, X_t^2), where {X_t^1} is a finite Markov chain and

  X_t^2 = A(X_t^1) X_{t-1}^2 + B(X_t^1) V_t,   V_t i.i.d. N(0, I),
  Y_t = C(X_t^1) X_t^2 + D(X_t^1) W_t,   W_t i.i.d. N(0, I).

State-Space Models

Stochastic volatility model:

  X_t = φ X_{t-1} + σ V_t,   V_t i.i.d. N(0, 1),
  Y_t = β exp(X_t / 2) W_t,   W_t i.i.d. N(0, 1).

Biochemical network model:

  Pr(X_{t+dt}^1 = x_t^1 + 1, X_{t+dt}^2 = x_t^2 | x_t^1, x_t^2) = α x_t^1 dt + o(dt),
  Pr(X_{t+dt}^1 = x_t^1 - 1, X_{t+dt}^2 = x_t^2 + 1 | x_t^1, x_t^2) = β x_t^1 x_t^2 dt + o(dt),
  Pr(X_{t+dt}^1 = x_t^1, X_{t+dt}^2 = x_t^2 - 1 | x_t^1, x_t^2) = γ x_t^2 dt + o(dt),

with Y_k = X_{kT}^1 + W_k, where W_k i.i.d. N(0, σ²).

Nonlinear diffusion model:

  dX_t = α(X_t) dt + β(X_t) dV_t,   V_t Brownian motion,
  Y_k = γ(X_{kT}) + W_k,   W_k i.i.d. N(0, σ²).
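The stochastic volatility model is easy to simulate forward, which is all that SMC will later require of the transition. A sketch with illustrative parameter values (φ = 0.95, σ = 0.3, β = 0.6 are not taken from the slides):

```python
import numpy as np

def simulate_sv(T, phi=0.95, sigma=0.3, beta=0.6, seed=1):
    """Simulate the stochastic volatility model
       X_t = phi X_{t-1} + sigma V_t,  Y_t = beta exp(X_t / 2) W_t."""
    rng = np.random.default_rng(seed)
    # Start X_1 from the stationary distribution N(0, sigma^2 / (1 - phi^2)).
    x = rng.normal(scale=sigma / np.sqrt(1 - phi ** 2))
    xs, ys = np.empty(T), np.empty(T)
    for t in range(T):
        xs[t] = x
        ys[t] = beta * np.exp(x / 2) * rng.normal()  # observation: scale depends on state
        x = phi * x + sigma * rng.normal()           # AR(1) log-volatility transition
    return xs, ys

xs, ys = simulate_sv(1000)
```

The observations are zero-mean but heteroscedastic: large |Y_t| values cluster when the latent log-volatility X_t is high, which is the feature of financial returns the model is built to capture.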

Inference in State-Space Models

Given observations y_{1:t} := (y_1, y_2, ..., y_t), inference about X_{1:t} := (X_1, ..., X_t) relies on the posterior

  p(x_{1:t} | y_{1:t}) = p(x_{1:t}, y_{1:t}) / p(y_{1:t}),

where

  p(x_{1:t}, y_{1:t}) = μ(x_1) ∏_{k=2}^t f(x_k | x_{k-1}) × ∏_{k=1}^t g(y_k | x_k),

the first two factors forming the prior p(x_{1:t}) and the last the likelihood p(y_{1:t} | x_{1:t}), and

  p(y_{1:t}) = ∫ p(x_{1:t}, y_{1:t}) dx_{1:t}.

When X is finite, and for linear Gaussian models, {p(x_t | y_{1:t})}_{t≥1} can be computed exactly. For non-linear models, approximations are required: EKF, UKF, Gaussian sum filters, etc.

Approximations of {p(x_t | y_{1:t})}_{t=1}^T provide an approximation of p(x_{1:T} | y_{1:T}).

Monte Carlo Methods Basics

Assume you can generate X_{1:t}^(i) ~ p(x_{1:t} | y_{1:t}), i = 1, ..., N; then the MC approximation is

  p̂(x_{1:t} | y_{1:t}) = (1/N) Σ_{i=1}^N δ_{X_{1:t}^(i)}(x_{1:t}).

Integration is straightforward:

  ∫ φ_t(x_{1:t}) p(x_{1:t} | y_{1:t}) dx_{1:t} ≈ ∫ φ_t(x_{1:t}) p̂(x_{1:t} | y_{1:t}) dx_{1:t} = (1/N) Σ_{i=1}^N φ_t(X_{1:t}^(i)).

Marginalization is straightforward:

  p̂(x_k | y_{1:t}) = ∫ p̂(x_{1:t} | y_{1:t}) dx_{1:k-1} dx_{k+1:t} = (1/N) Σ_{i=1}^N δ_{X_k^(i)}(x_k).

Basic and key property:

  V[ (1/N) Σ_{i=1}^N φ_t(X_{1:t}^(i)) ] = C(t, dim(X)) / N,

i.e. the 1/N rate of convergence to zero is independent of dim(X) and t.
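The 1/N variance decay is easy to verify numerically. In the illustrative sketch below (the target and test function are my choices, not from the slides), X ~ N(0, I_d) in d = 10 dimensions and φ(x) = Σ_k x_k², so E[φ(X)] = d exactly; quadrupling N should roughly halve the root-mean-square error:

```python
import numpy as np

rng = np.random.default_rng(42)
d = 10                                    # dimension; the O(1/N) rate does not depend on it
phi = lambda x: np.sum(x ** 2, axis=-1)   # test function with E[phi(X)] = d for X ~ N(0, I_d)

def mc_error(N, reps=200):
    """Root-mean-square error of the N-sample MC estimate of E[phi(X)]."""
    estimates = np.array([phi(rng.normal(size=(N, d))).mean() for _ in range(reps)])
    return np.sqrt(np.mean((estimates - d) ** 2))

# Variance ~ C / N, so RMSE ~ sqrt(C) / sqrt(N): the ratio below should be near 2.
e1, e2 = mc_error(250), mc_error(1000)
print(e1 / e2)
```

The constant C depends on the test function (and here on d through Var[φ(X)] = 2d), but the 1/N decay itself does not.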

Monte Carlo Methods

Problem 1: we cannot typically generate exact samples from p(x_{1:t} | y_{1:t}) for non-linear non-Gaussian models.

Problem 2: even if we could, algorithms generating samples from p(x_{1:t} | y_{1:t}) would have complexity at least O(t).

The typical solution to Problem 1 is to generate approximate samples using MCMC methods, but these methods are not recursive.

SMC methods partially solve Problems 1 and 2 by breaking the problem of sampling from p(x_{1:t} | y_{1:t}) into a collection of simpler subproblems: first approximate p(x_1 | y_1) and p(y_1) at time 1, then p(x_{1:2} | y_{1:2}) and p(y_{1:2}) at time 2, and so on.

Each target distribution is approximated by a cloud of random samples, termed particles, evolving through importance sampling and resampling steps.

Standard Bayesian Recursion

In most textbooks, you will find the following recursion for {p(x_t | y_{1:t})}_{t≥1}.

Prediction step:

  p(x_t | y_{1:t-1}) = ∫ p(x_{t-1}, x_t | y_{1:t-1}) dx_{t-1}
                     = ∫ p(x_t | y_{1:t-1}, x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}
                     = ∫ f(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}.

Bayes updating step:

  p(x_t | y_{1:t}) = g(y_t | x_t) p(x_t | y_{1:t-1}) / p(y_t | y_{1:t-1}),

where

  p(y_t | y_{1:t-1}) = ∫ g(y_t | x_t) p(x_t | y_{1:t-1}) dx_t.

This is the recursion implemented by the Wonham and Kalman filters...
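When X is a finite space, the prediction and updating steps are exact matrix-vector operations. A sketch for an illustrative 2-state chain with Gaussian emissions (all numbers are my choices, not from the slides):

```python
import numpy as np

# Illustrative 2-state HMM: transition matrix F, emissions g(y | x) = N(y; means[x], 1).
F = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # F[i, j] = f(x_t = j | x_{t-1} = i)
means = np.array([-1.0, 2.0])    # emission mean for each state
mu = np.array([0.5, 0.5])        # initial distribution of X_1

def g(y):
    """Emission likelihoods g(y | x) evaluated for every state x."""
    return np.exp(-0.5 * (y - means) ** 2) / np.sqrt(2 * np.pi)

def filter_step(p_prev, y):
    """One prediction + Bayes updating step.
    Returns p(x_t | y_1:t) and p(y_t | y_1:t-1)."""
    pred = p_prev @ F            # prediction: sum over x_{t-1} of f(x_t|x_{t-1}) p(x_{t-1}|...)
    lik = np.dot(g(y), pred)     # p(y_t | y_1:t-1)
    return g(y) * pred / lik, lik

ys = [1.8, 2.2, -0.9]
p = mu
for t, y in enumerate(ys):
    if t == 0:                   # at t = 1 the "predictive" is just the prior mu
        lik = np.dot(g(y), p)
        p = g(y) * p / lik
    else:
        p, lik = filter_step(p, y)
print(p)                         # p(x_3 | y_1:3)
```

The last observation, -0.9, is close to the state-0 emission mean, so the filter ends up favouring state 0 despite the earlier observations favouring state 1.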

Bayesian Recursion on Path Space

SMC directly approximates {p(x_{1:t} | y_{1:t})}_{t≥1}, not {p(x_t | y_{1:t})}_{t≥1}, and relies on

  p(x_{1:t} | y_{1:t}) = p(x_{1:t}, y_{1:t}) / p(y_{1:t})
                       = g(y_t | x_t) f(x_t | x_{t-1}) p(x_{1:t-1}, y_{1:t-1}) / [p(y_t | y_{1:t-1}) p(y_{1:t-1})]
                       = g(y_t | x_t) [f(x_t | x_{t-1}) p(x_{1:t-1} | y_{1:t-1})] / p(y_t | y_{1:t-1}),

where the bracketed term is the predictive p(x_{1:t} | y_{1:t-1}) and

  p(y_t | y_{1:t-1}) = ∫ g(y_t | x_t) p(x_{1:t} | y_{1:t-1}) dx_{1:t}.

This can alternatively be written as

  Prediction: p(x_{1:t} | y_{1:t-1}) = f(x_t | x_{t-1}) p(x_{1:t-1} | y_{1:t-1}),
  Update: p(x_{1:t} | y_{1:t}) = g(y_t | x_t) p(x_{1:t} | y_{1:t-1}) / p(y_t | y_{1:t-1}).

SMC is a simple and natural simulation-based implementation of this recursion.

Monte Carlo Implementation of Prediction Step

Assume that at time t-1 you have

  p̂(x_{1:t-1} | y_{1:t-1}) = (1/N) Σ_{i=1}^N δ_{X_{1:t-1}^(i)}(x_{1:t-1}).

By sampling X_t^(i) ~ f(x_t | X_{t-1}^(i)) and setting X_{1:t}^(i) = (X_{1:t-1}^(i), X_t^(i)), we obtain

  p̂(x_{1:t} | y_{1:t-1}) = (1/N) Σ_{i=1}^N δ_{X_{1:t}^(i)}(x_{1:t}).

Sampling from f(x_t | x_{t-1}) is usually straightforward, and it can be done even if f(x_t | x_{t-1}) does not admit any analytical expression, e.g. in biochemical network models.

Importance Sampling Implementation of Updating Step

Our target at time t is

  p(x_{1:t} | y_{1:t}) = g(y_t | x_t) p(x_{1:t} | y_{1:t-1}) / p(y_t | y_{1:t-1}),

so by substituting p̂(x_{1:t} | y_{1:t-1}) for p(x_{1:t} | y_{1:t-1}) we obtain

  p̂(y_t | y_{1:t-1}) = ∫ g(y_t | x_t) p̂(x_{1:t} | y_{1:t-1}) dx_{1:t} = (1/N) Σ_{i=1}^N g(y_t | X_t^(i)).

We now have

  p̄(x_{1:t} | y_{1:t}) = g(y_t | x_t) p̂(x_{1:t} | y_{1:t-1}) / p̂(y_t | y_{1:t-1}) = Σ_{i=1}^N W_t^(i) δ_{X_{1:t}^(i)}(x_{1:t}),

with W_t^(i) ∝ g(y_t | X_t^(i)) and Σ_{i=1}^N W_t^(i) = 1.

Multinomial Resampling

We have a weighted approximation p̄(x_{1:t} | y_{1:t}) of p(x_{1:t} | y_{1:t}):

  p̄(x_{1:t} | y_{1:t}) = Σ_{i=1}^N W_t^(i) δ_{X_{1:t}^(i)}(x_{1:t}).

To obtain N samples X̄_{1:t}^(i) approximately distributed according to p(x_{1:t} | y_{1:t}), resample N times with replacement:

  X̄_{1:t}^(i) ~ p̄(x_{1:t} | y_{1:t}),

giving

  p̂(x_{1:t} | y_{1:t}) = (1/N) Σ_{i=1}^N δ_{X̄_{1:t}^(i)}(x_{1:t}) = Σ_{i=1}^N (N_t^(i)/N) δ_{X_{1:t}^(i)}(x_{1:t}),

where {N_t^(i)} follows a multinomial distribution with

  E[N_t^(i)] = N W_t^(i),   V[N_t^(i)] = N W_t^(i) (1 - W_t^(i)).

This can be achieved in O(N) operations.
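Multinomial resampling amounts to one draw of counts (N_t^(1), ..., N_t^(N)) and a repeat; the sketch below (with illustrative particles and weights) checks the E[N_t^(i)] = N W_t^(i) property empirically rather than showing an O(N) sorted-uniforms implementation:

```python
import numpy as np

rng = np.random.default_rng(7)

def multinomial_resample(particles, weights, rng):
    """Resample N particles with replacement: particle i is duplicated N_t^(i) times,
    where (N_t^(1), ..., N_t^(N)) is multinomial with E[N_t^(i)] = N W^(i)."""
    N = len(particles)
    counts = rng.multinomial(N, weights)
    return np.repeat(particles, counts), counts

particles = np.array([-1.0, 0.0, 2.0, 5.0])
weights = np.array([0.1, 0.2, 0.3, 0.4])

# Average the counts over many independent resampling rounds.
reps = 20000
total = np.zeros(len(particles))
for _ in range(reps):
    resampled, counts = multinomial_resample(particles, weights, rng)
    total += counts
freq = total / (reps * len(particles))
print(freq)   # empirical E[N^(i)] / N, close to the weights
```

In practice the O(N) guarantee quoted on the slide is obtained with sorted uniforms or inverse-CDF tricks; `rng.multinomial` plus `np.repeat` is the shortest correct version for exposition.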

Vanilla SMC: Bootstrap Filter (Gordon et al., 1993)

At time $t = 1$:
- Sample $X_1^{(i)} \sim \mu(x_1)$, then
$$\hat p(x_1|y_1) = \sum_{i=1}^N W_1^{(i)}\, \delta_{X_1^{(i)}}(x_1), \qquad W_1^{(i)} \propto g\big(y_1|X_1^{(i)}\big).$$
- Resample $\overline X_1^{(i)} \sim \hat p(x_1|y_1)$ to obtain $\overline p(x_1|y_1) = \frac{1}{N}\sum_{i=1}^N \delta_{\overline X_1^{(i)}}(x_1)$.

At time $t \geq 2$:
- Sample $X_t^{(i)} \sim f\big(x_t|\overline X_{t-1}^{(i)}\big)$, set $X_{1:t}^{(i)} = \big(\overline X_{1:t-1}^{(i)}, X_t^{(i)}\big)$ and
$$\hat p(x_{1:t}|y_{1:t}) = \sum_{i=1}^N W_t^{(i)}\, \delta_{X_{1:t}^{(i)}}(x_{1:t}), \qquad W_t^{(i)} \propto g\big(y_t|X_t^{(i)}\big).$$
- Resample $\overline X_{1:t}^{(i)} \sim \hat p(x_{1:t}|y_{1:t})$ to obtain $\overline p(x_{1:t}|y_{1:t}) = \frac{1}{N}\sum_{i=1}^N \delta_{\overline X_{1:t}^{(i)}}(x_{1:t})$.
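The two steps above can be sketched for a scalar linear Gaussian model (an illustration of mine, not the lecture's code; the parameters `phi`, `sv`, `sw` are arbitrary choices), which lets the filtering means be checked against the exact Kalman filter:

```python
import numpy as np

def bootstrap_filter(y, N, rng, phi=0.9, sv=1.0, sw=1.0):
    """Bootstrap filter for the illustrative model
        X_1 ~ N(0, sv^2/(1-phi^2)),  X_t = phi*X_{t-1} + V_t,  Y_t = X_t + W_t,
    with V_t ~ N(0, sv^2), W_t ~ N(0, sw^2).  Returns the filtering means
    E[X_t | y_{1:t}] and the log marginal likelihood estimate."""
    T = len(y)
    x = rng.normal(0.0, sv / np.sqrt(1.0 - phi**2), size=N)   # X_1^(i) ~ mu
    means, logZ = np.empty(T), 0.0
    for t in range(T):
        if t > 0:
            x = phi * x + rng.normal(0.0, sv, size=N)          # propagate via f
        logw = -0.5 * ((y[t] - x) / sw) ** 2                   # log g up to a constant
        m = logw.max()
        w = np.exp(logw - m)
        logZ += m + np.log(w.mean()) - 0.5 * np.log(2.0 * np.pi * sw**2)
        w /= w.sum()
        means[t] = np.sum(w * x)                               # E[X_t | y_{1:t}]
        x = x[rng.choice(N, size=N, p=w)]                      # multinomial resampling
    return means, logZ

rng = np.random.default_rng(1)
y_obs = np.array([0.5, -0.2, 1.0, 0.3, -0.6])
means, logZ = bootstrap_filter(y_obs, N=5000, rng=rng)
```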

SMC Output

At time $t$, we get
$$\hat p(x_{1:t}|y_{1:t}) = \sum_{i=1}^N W_t^{(i)}\, \delta_{X_{1:t}^{(i)}}(x_{1:t}), \qquad \overline p(x_{1:t}|y_{1:t}) = \frac{1}{N}\sum_{i=1}^N \delta_{\overline X_{1:t}^{(i)}}(x_{1:t}).$$
The marginal likelihood estimate is given by
$$\hat p(y_{1:t}) = \prod_{k=1}^t \hat p(y_k|y_{1:k-1}) = \prod_{k=1}^t \left(\frac{1}{N}\sum_{i=1}^N g\big(y_k|X_k^{(i)}\big)\right).$$
Computational complexity is $O(N)$ at each time step and memory requirements are $O(tN)$.

If we are only interested in $p(x_t|y_{1:t})$, or in $p(s_t(x_{1:t})|y_{1:t})$ where $s_t(x_{1:t}) = \Psi_t\big(x_t, s_{t-1}(x_{1:t-1})\big)$ is fixed-dimensional (e.g. $s_t(x_{1:t}) = \sum_{k=1}^t x_k^2$), then memory requirements are only $O(N)$.
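Such a fixed-dimensional statistic can be carried along with the particles in $O(N)$ memory, updated and resampled together with them. A small illustrative sketch of mine (arbitrary AR(1) dynamics, observations fixed at $y_t = 0$, resampling at every step; not the lecture's experiment):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 1000, 30
x = rng.normal(size=N)          # X_1^(i) from an illustrative N(0,1) prior
s = np.zeros(N)                 # s_t^(i) = sum_{k<=t} (x_k^(i))^2, O(N) memory
for t in range(T):
    if t > 0:
        x = 0.9 * x + rng.normal(size=N)    # propagate (illustrative AR(1))
    s = s + x ** 2                          # recursive update s_t = s_{t-1} + x_t^2
    w = np.exp(-0.5 * x ** 2)               # g(y_t | x_t) with y_t = 0 here
    w /= w.sum()
    idx = rng.choice(N, size=N, p=w)
    x, s = x[idx], s[idx]                   # resample statistic with the particles
est = s.mean() / T                          # SMC estimate of S_t / t
```

No full path is ever stored; the statistic travels with each particle.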

SMC on Path-Space (figures by Olivier Cappé)

[Figure sequence: filtering densities $\hat p(x_t|y_{1:t})$ with estimates $\hat E[X_t|y_{1:t}]$ (top panels) and the particle approximation of the path posterior $p(x_{1:t}|y_{1:t})$ (bottom panels), shown successively for $t = 1$; $t = 1, 2$; $t = 1, 2, 3$; $t = 1, \dots, 10$; and $t = 1, \dots, 24$.]

Remarks

Empirically, this SMC strategy performs well at estimating the marginals $\{p(x_t|y_{1:t})\}_{t\geq 1}$; thankfully, in many applications this is all that is needed.

However, the joint distribution $p(x_{1:t}|y_{1:t})$ is poorly estimated when $t$ is large; in the previous example we have $\overline p(x_{1:11}|y_{1:24}) = \delta_{\overline X_{1:11}}(x_{1:11})$.

Degeneracy problem: for any $N$ and any $k$, there exists $t(k,N)$ such that for any $t \geq t(k,N)$,
$$\hat p(x_{1:k}|y_{1:t}) = \delta_{\overline X_{1:k}}(x_{1:k});$$
$\hat p(x_{1:t}|y_{1:t})$ is an unreliable approximation of $p(x_{1:t}|y_{1:t})$ as $t$ increases.

Another Illustration of the Degeneracy Phenomenon

For the linear Gaussian state-space model described before, we can compute $S_t/t$ exactly, where
$$S_t = \int \left(\sum_{k=1}^t x_k^2\right) p(x_{1:t}|y_{1:t})\, dx_{1:t},$$
using Kalman techniques. We compare it to the SMC estimate $\hat S_t/t$, where
$$\hat S_t = \int \left(\sum_{k=1}^t x_k^2\right) \hat p(x_{1:t}|y_{1:t})\, dx_{1:t}$$
can be computed sequentially.

[Figure: $S_t/t$ obtained through the Kalman smoother (blue) and its SMC estimate $\hat S_t/t$ (red).]
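The degeneracy can also be made visible directly (a toy run of mine with an arbitrary random-walk model, not the lecture's exact experiment): with resampling at every step, the number of distinct time-1 values still represented in the surviving paths collapses as $t$ grows:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 500, 60
paths = rng.normal(size=(N, 1))              # X_1^(i) ~ N(0, 1)
y = rng.normal(size=T)                       # arbitrary synthetic observations
unique_ancestors = []
for t in range(T):
    if t > 0:
        new = paths[:, -1:] + rng.normal(size=(N, 1))   # random-walk dynamics
        paths = np.hstack([paths, new])
    w = np.exp(-0.5 * (y[t] - paths[:, -1]) ** 2)       # g(y_t | x_t)
    w /= w.sum()
    paths = paths[rng.choice(N, size=N, p=w)]           # resample whole paths
    unique_ancestors.append(np.unique(paths[:, 0]).size)
# The count of surviving distinct X_1 values shrinks towards 1 as t grows,
# i.e. p_hat(x_1 | y_{1:t}) degenerates to a single atom.
```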

Some Convergence Results for SMC

Numerous convergence results for SMC are available; see (Del Moral, 2004).

Let $\varphi_t : \mathcal X^t \to \mathbb R$ and consider
$$\overline\varphi_t = \int \varphi_t(x_{1:t})\, p(x_{1:t}|y_{1:t})\, dx_{1:t}, \qquad \hat\varphi_t = \int \varphi_t(x_{1:t})\, \overline p(x_{1:t}|y_{1:t})\, dx_{1:t} = \frac{1}{N}\sum_{i=1}^N \varphi_t\big(\overline X_{1:t}^{(i)}\big).$$
We can prove that for any bounded function $\varphi_t$ and any $p \geq 1$,
$$E\big[\,|\hat\varphi_t - \overline\varphi_t|^p\,\big]^{1/p} \leq \frac{B(t)\, c(p)\, \|\varphi_t\|_\infty}{\sqrt N}, \qquad \lim_{N\to\infty} \sqrt N\,\big(\hat\varphi_t - \overline\varphi_t\big) \Rightarrow \mathcal N\big(0, \sigma_t^2\big).$$
These are very weak results: $B(t)$ and $\sigma_t^2$ can increase with $t$, and will do so for a path-dependent $\varphi_t(x_{1:t})$, as the degeneracy problem suggests.

Stronger Convergence Results

Assume the following exponential stability assumption: for any $x_1, x_1'$,
$$\frac{1}{2}\int \big| p(x_t|y_{2:t}, X_1 = x_1) - p(x_t|y_{2:t}, X_1 = x_1') \big|\, dx_t \leq \alpha^t \quad \text{for some } 0 \leq \alpha < 1.$$

Marginal distributions. For $\varphi_t(x_{1:t}) = \varphi(x_{t-L:t})$, there exist $B_1, B_2 < \infty$ such that
$$E\big[\,|\hat\varphi_t - \overline\varphi_t|^p\,\big]^{1/p} \leq \frac{B_1\, c(p)\, \|\varphi\|_\infty}{\sqrt N}, \qquad \lim_{N\to\infty} \sqrt N\,\big(\hat\varphi_t - \overline\varphi_t\big) \Rightarrow \mathcal N\big(0, \sigma_t^2\big)$$
where $\sigma_t^2 \leq B_2$; i.e. there is no accumulation of numerical errors over time.

L1 distance. If $\tilde p(x_{1:t}|y_{1:t}) = E\big(\hat p(x_{1:t}|y_{1:t})\big)$, there exists $B_3 < \infty$ such that
$$\int \big| \tilde p(x_{1:t}|y_{1:t}) - p(x_{1:t}|y_{1:t}) \big|\, dx_{1:t} \leq \frac{B_3\, t}{N};$$
i.e. the bias only increases linearly in $t$.

Stronger Convergence Results

Unbiasedness. The marginal likelihood estimate is unbiased: $E\big(\hat p(y_{1:t})\big) = p(y_{1:t})$.

Relative variance bound. There exists $B_4 < \infty$ such that
$$E\left[\left(\frac{\hat p(y_{1:t})}{p(y_{1:t})} - 1\right)^2\right] \leq \frac{B_4\, t}{N}.$$

Central limit theorem. There exists $B_5 < \infty$ such that
$$\lim_{N\to\infty} \sqrt N\,\big(\log \hat p(y_{1:t}) - \log p(y_{1:t})\big) \Rightarrow \mathcal N\big(0, \sigma_t^2\big) \quad \text{with } \sigma_t^2 \leq B_5\, t.$$
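The unbiasedness property can be checked numerically on a model where the evidence is computable exactly, e.g. a small two-state HMM (an illustrative example of mine with arbitrary transition and emission parameters, not from the lecture): averaging the bootstrap-filter evidence estimate over many runs should recover the forward-algorithm value.

```python
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition probabilities (illustrative)
E = np.array([[0.8, 0.2], [0.3, 0.7]])   # emission probabilities g(y | x)
mu = np.array([0.5, 0.5])
y = np.array([0, 1, 0, 0, 1])

def exact_likelihood(y):
    """p(y_{1:t}) by the forward algorithm (exact for a discrete HMM)."""
    alpha = mu * E[:, y[0]]
    for yt in y[1:]:
        alpha = (alpha @ P) * E[:, yt]
    return alpha.sum()

def smc_likelihood(y, N, rng):
    """Bootstrap-filter evidence estimate prod_t (1/N) sum_i g(y_t | X_t^(i))."""
    x = rng.choice(2, size=N, p=mu)
    Z = 1.0
    for t, yt in enumerate(y):
        if t > 0:
            x = (rng.random(N) < P[x, 1]).astype(int)   # propagate via f
        w = E[x, yt]
        Z *= w.mean()
        x = x[rng.choice(N, size=N, p=w / w.sum())]
    return Z

Z_true = exact_likelihood(y)
Z_hat = np.mean([smc_likelihood(y, N=200, rng=rng) for _ in range(400)])
# Unbiasedness: E[Z_hat] = Z_true, so Z_hat should be close to Z_true.
```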

Basic Idea Used to Establish Uniform Lp Bounds

We denote $\eta_k(x_k) = p(x_k|y_{1:k-1})$ and by $\hat\eta_k(x_k) = \hat p(x_k|y_{1:k-1})$ its particle approximation.

Let $\Phi_{k,t}$ be the measure-valued mapping such that $\eta_t = \Phi_{k,t}(\eta_k)$, which satisfies
$$\Phi_{k,t}(\eta_k)(x_t) = \int \underbrace{\frac{\eta_k(x_k)\, p(y_{k:t-1}|x_k)}{\int \eta_k(x_k')\, p(y_{k:t-1}|x_k')\, dx_k'}}_{p(x_k|y_{1:t-1})}\, p(x_t|x_k, y_{k+1:t-1})\, dx_k.$$

Key Decomposition Formula

The exact flow satisfies $\eta_1 \to \eta_2 = \Phi_{1,2}(\eta_1) \to \cdots \to \eta_t = \Phi_{1,t}(\eta_1)$, while the particle system produces $\hat\eta_1 \to \hat\eta_2 \to \cdots \to \hat\eta_t$, where each $\hat\eta_k$ is a perturbation of $\Phi_{k-1,k}(\hat\eta_{k-1})$.

Decomposition of the error:
$$\hat\eta_t - \eta_t = \sum_{k=1}^t \Big[ \Phi_{k,t}(\hat\eta_k) - \Phi_{k,t}\big(\Phi_{k-1,k}(\hat\eta_{k-1})\big) \Big].$$

Stability Properties

We have
$$p(x_t|x_k, y_{k+1:t-1}) = \int p(x_{k+1:t}|x_k, y_{k+1:t-1})\, dx_{k+1:t-1}$$
where
$$p(x_{k+1:t}|x_k, y_{k+1:t-1}) = \prod_{m=k+1}^t p(x_m|x_{m-1}, y_{m:t-1}).$$
To summarize, we have
$$\Phi_{k,t}(\eta_k)(x_t) = \int \underbrace{\frac{\eta_k(x_k)\, p(y_{k:t-1}|x_k)}{\int \eta_k(x_k')\, p(y_{k:t-1}|x_k')\, dx_k'}}_{p(x_k|y_{1:t-1})} \prod_{m=k+1}^t p(x_m|x_{m-1}, y_{m:t-1})\, dx_{k:t-1}.$$

Stability Properties

Assume there exists $\epsilon > 0$ such that for any $x, x'$ and any $y$,
$$\epsilon^{-1}\, \nu(x') \leq f(x'|x) \leq \epsilon\, \nu(x'), \qquad 0 < \underline g \leq g(y|x) \leq \overline g < \infty.$$
Then there exists $0 \leq \lambda < 1$ such that
$$\frac{1}{2}\int \big| \Phi_{k,k+t}(\eta)(x) - \Phi_{k,k+t}(\eta')(x) \big|\, dx \leq \lambda^t.$$
Hence we have
$$\Phi_{k,t}(\eta_k)(x_t) \approx \Phi_{k,t}(\hat\eta_k)(x_t)$$
as $(t-k) \to \infty$: the mapping forgets its initial condition.

Putting Everything Together

Under such strong mixing assumptions,
$$\hat\eta_t - \eta_t = \sum_{k=1}^t \underbrace{\Big[ \Phi_{k,t}(\hat\eta_k) - \Phi_{k,t}\big(\Phi_{k-1,k}(\hat\eta_{k-1})\big) \Big]}_{\text{of order } \lambda^{t-k+1}/\sqrt N} \quad \text{for some } 0 \leq \lambda < 1.$$
We can then obtain results such as: there exists $B_1 < \infty$ such that
$$E\big[\,|\hat\varphi_t - \overline\varphi_t|^p\,\big]^{1/p} \leq \frac{B_1\, c(p)\, \|\varphi\|_\infty}{\sqrt N}.$$
Much work has been done recently on removing such strong mixing assumptions; see e.g. Whiteley (2012) for much weaker and more realistic assumptions.

Summary

- SMC provides consistent estimates under weak assumptions.
- Under stability assumptions, we have uniform-in-time stability of the SMC estimates of $\{p(x_t|y_{1:t})\}_{t\geq 1}$.
- Under stability assumptions, the relative variance of the SMC estimate of $\{p(y_{1:t})\}_{t\geq 1}$ only increases linearly with $t$.
- Even under stability assumptions, one cannot expect uniform-in-time stability for SMC estimates of $\{p(x_{1:t}|y_{1:t})\}_{t\geq 1}$; this is due to the degeneracy problem.
- Is it possible to (Q1) eliminate or (Q2) mitigate the degeneracy problem? Answer: Q1: no, Q2: yes.

Is Resampling Really Necessary?

Resampling is the source of the degeneracy problem and might appear wasteful.

The resampling step is an unbiased operation,
$$E\big[\, \overline p(x_{1:t}|y_{1:t}) \,\big|\, \hat p(x_{1:t}|y_{1:t}) \,\big] = \hat p(x_{1:t}|y_{1:t}),$$
but it clearly introduces some errors locally in time: for any test function $\varphi$,
$$V\left[\int \varphi(x_{1:t})\, \overline p(x_{1:t}|y_{1:t})\, dx_{1:t}\right] \geq V\left[\int \varphi(x_{1:t})\, \hat p(x_{1:t}|y_{1:t})\, dx_{1:t}\right].$$
What about eliminating the resampling step?

Sequential Importance Sampling: SMC Without Resampling

In this case, the estimate of the posterior is
$$\hat p_{\mathrm{SIS}}(x_{1:t}|y_{1:t}) = \sum_{i=1}^N W_t^{(i)}\, \delta_{X_{1:t}^{(i)}}(x_{1:t}),$$
where $X_{1:t}^{(i)} \sim p(x_{1:t})$ and
$$W_t^{(i)} \propto p\big(y_{1:t}|X_{1:t}^{(i)}\big) = \prod_{k=1}^t g\big(y_k|X_k^{(i)}\big).$$
The marginal likelihood estimate is
$$\hat p_{\mathrm{SIS}}(y_{1:t}) = \frac{1}{N}\sum_{i=1}^N p\big(y_{1:t}|X_{1:t}^{(i)}\big).$$
The relative variance of $p\big(y_{1:t}|X_{1:t}^{(i)}\big) = \prod_{k=1}^t g\big(y_k|X_k^{(i)}\big)$ increases exponentially fast with $t$...

SIS for a Stochastic Volatility Model

[Figure: histograms of the importance weights $\log_{10}\big(W_t^{(i)}\big)$ for $t = 1$ (top), $t = 50$ (middle) and $t = 100$ (bottom).]

As expected, the performance of the algorithm collapses as $t$ increases.

Central Limit Theorems

For both SIS and SMC, we have a CLT for the estimates of the marginal likelihood:
$$\sqrt N \left(\frac{\hat p_{\mathrm{SIS}}(y_{1:t})}{p(y_{1:t})} - 1\right) \Rightarrow \mathcal N\big(0, \sigma^2_{t,\mathrm{SIS}}\big), \qquad \sqrt N \left(\frac{\hat p_{\mathrm{SMC}}(y_{1:t})}{p(y_{1:t})} - 1\right) \Rightarrow \mathcal N\big(0, \sigma^2_{t,\mathrm{SMC}}\big).$$
The variance expressions are
$$\sigma^2_{t,\mathrm{SIS}} = \int \frac{p^2(x_{1:t}|y_{1:t})}{p(x_{1:t})}\, dx_{1:t} - 1 = \frac{\int p^2(y_{1:t}|x_{1:t})\, p(x_{1:t})\, dx_{1:t}}{p^2(y_{1:t})} - 1,$$
$$\sigma^2_{t,\mathrm{SMC}} = \int \frac{p^2(x_1|y_{1:t})}{\mu(x_1)}\, dx_1 - 1 + \sum_{k=2}^t \left(\int \frac{p^2(x_{1:k}|y_{1:t})}{p(x_{1:k-1}|y_{1:k-1})\, f(x_k|x_{k-1})}\, dx_{1:k} - 1\right) = \sum_{k=1}^t \left(\frac{\int p^2(y_{k:t}|x_k)\, p(x_k|y_{1:k-1})\, dx_k}{p^2(y_{k:t}|y_{1:k-1})} - 1\right),$$
with the convention $p(x_1|y_{1:0}) = \mu(x_1)$.

SMC breaks the single importance sampling integral over $\mathcal X^t$ into $t$ integrals over $\mathcal X$.

A Toy Example

Consider the case where $f(x'|x) = \mu(x') = \mathcal N(x'; 0, \sigma^2)$ and $g(y|x) = \mathcal N\big(y; x, (1 - \sigma^{-2})^{-1}\big)$ where $\sigma^2 > 1$.

Assume we observe $y_1 = \cdots = y_t = 0$; then we have
$$V\left[\frac{\hat p_{\mathrm{SIS}}(y_{1:t})}{p(y_{1:t})}\right] = \frac{\sigma^2_{t,\mathrm{SIS}}}{N} = \frac{1}{N}\left[\left(\frac{\sigma^4}{2\sigma^2 - 1}\right)^{t/2} - 1\right],$$
$$V\left[\frac{\hat p_{\mathrm{SMC}}(y_{1:t})}{p(y_{1:t})}\right] \approx \frac{\sigma^2_{t,\mathrm{SMC}}}{N} = \frac{t}{N}\left[\left(\frac{\sigma^4}{2\sigma^2 - 1}\right)^{1/2} - 1\right].$$
If we select $\sigma^2 = 1.2$, SIS needs an astronomically large number of particles to obtain $\sigma^2_{t,\mathrm{SIS}}/N = 10^{-2}$ once $t$ is large, whereas SMC requires only $N \approx 10^4$ particles to obtain $\sigma^2_{t,\mathrm{SMC}}/N = 10^{-2}$: an improvement of 19 orders of magnitude!
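The contrast between the exponential and linear growth of the two variances can be checked directly by plugging numbers into the formulas (a quick arithmetic sketch, using the $\sigma^2 = 1.2$ value from the slide):

```python
import numpy as np

# sigma2_SIS(t) = r^(t/2) - 1 and sigma2_SMC(t) = t (r^(1/2) - 1),
# with r = sigma^4 / (2 sigma^2 - 1) the per-step variance growth factor.
s2 = 1.2                       # sigma^2 > 1
r = s2**2 / (2.0 * s2 - 1.0)   # r > 1 whenever sigma^2 != 1

def sis_var(t):
    return r ** (t / 2.0) - 1.0

def smc_var(t):
    return t * (np.sqrt(r) - 1.0)

ratio = sis_var(1000) / smc_var(1000)   # exponential vs linear growth in t
```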

Better Resampling Schemes

Better resampling steps can be designed such that $E\big[N_t^{(i)}\big] = N W_t^{(i)}$ but $V\big[N_t^{(i)}\big] < N W_t^{(i)}\big(1 - W_t^{(i)}\big)$: residual resampling, minimal entropy resampling, etc. (Cappé et al., 2005).

Residual Resampling. Set $\tilde N_t^{(i)} = \big\lfloor N W_t^{(i)} \big\rfloor$, then sample $\overline N_t^{1:N}$ from a multinomial of parameters $\big(N - \sum_i \tilde N_t^{(i)},\, \overline W_t^{(1:N)}\big)$ where $\overline W_t^{(i)} \propto W_t^{(i)} - N^{-1} \tilde N_t^{(i)}$, and set $N_t^{(i)} = \tilde N_t^{(i)} + \overline N_t^{(i)}$.

Systematic Resampling. Sample $U_1 \sim \mathcal U\big[0, \tfrac{1}{N}\big]$ and define $U_i = U_1 + \tfrac{i-1}{N}$ for $i = 2, \dots, N$; then set
$$N_t^{(i)} = \left|\left\{ U_j : \sum_{k=1}^{i-1} W_t^{(k)} \leq U_j < \sum_{k=1}^{i} W_t^{(k)} \right\}\right|$$
with the convention $\sum_{k=1}^{0} := 0$.
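A minimal implementation of systematic resampling (my own sketch): a single uniform generates the whole comb of $N$ points, and matching it against the cumulative weights gives the counts.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: U_1 ~ U[0, 1/N], offsets U_i = U_1 + (i-1)/N,
    and N^(i) = #{U_j in [cumsum_{i-1}, cumsum_i)}.  O(N), with E[N^(i)] = N W^(i)
    and each count within one unit of N W^(i)."""
    N = len(weights)
    u = (rng.random() + np.arange(N)) / N
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0                                   # guard against rounding
    return np.searchsorted(cdf, u, side='right')

rng = np.random.default_rng(4)
W = np.array([0.05, 0.05, 0.2, 0.3, 0.4])
idx = systematic_resample(W, rng)
counts = np.bincount(idx, minlength=len(W))
```

Compared to multinomial resampling, the counts are far less variable: each $N^{(i)}$ is either $\lfloor N W^{(i)} \rfloor$ or $\lceil N W^{(i)} \rceil$.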

Measuring Variability of the Weights

To measure the variation of the weights, we can use the Effective Sample Size (ESS):
$$\mathrm{ESS} = \left(\sum_{i=1}^N \big(W_t^{(i)}\big)^2\right)^{-1}.$$
We have $\mathrm{ESS} = N$ if $W_t^{(i)} = 1/N$ for all $i$, and $\mathrm{ESS} = 1$ if $W_t^{(i)} = 1$ and $W_t^{(j)} = 0$ for $j \neq i$.

Liu (1996) showed that, for simple importance sampling and $\varphi$ regular enough,
$$V\left[\sum_{i=1}^N W_t^{(i)}\, \varphi\big(X_t^{(i)}\big)\right] \approx V_{p(x_{1:t}|y_{1:t})}\left[\frac{1}{\mathrm{ESS}}\sum_{i=1}^{\mathrm{ESS}} \varphi\big(X_t^{(i)}\big)\right];$$
i.e. the estimate is roughly as accurate as one computed from an i.i.d. sample of size ESS from $p(x_{1:t}|y_{1:t})$.

Dynamic Resampling

Resampling at each time step can be harmful: only resample when necessary.

Dynamic Resampling: if the variation of the weights as measured by ESS is too high, e.g. $\mathrm{ESS} < N/2$, then resample the particles.

We can also use the entropy
$$\mathrm{Ent} = -\sum_{i=1}^N W_t^{(i)} \log_2\big(W_t^{(i)}\big).$$
We have $\mathrm{Ent} = \log_2(N)$ if $W_t^{(i)} = 1/N$ for all $i$, and $\mathrm{Ent} = 0$ if $W_t^{(i)} = 1$ and $W_t^{(j)} = 0$ for $j \neq i$.
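Both criteria are one-liners; a small sketch (the `ESS < N/2` threshold follows the slide, the example weight vectors are my own):

```python
import numpy as np

def ess(W):
    """Effective sample size (sum_i W_i^2)^(-1): N for uniform weights,
    1 when a single particle carries all the mass."""
    return 1.0 / np.sum(W ** 2)

def entropy(W):
    """Weight entropy -sum_i W_i log2 W_i: log2(N) for uniform weights,
    0 when a single particle carries all the mass."""
    Wp = W[W > 0]
    return -np.sum(Wp * np.log2(Wp))

N = 8
uniform = np.full(N, 1.0 / N)
degenerate = np.zeros(N); degenerate[0] = 1.0
skewed = np.array([0.6, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01, 0.01])

resample_now = ess(skewed) < N / 2     # dynamic resampling trigger
```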

Improving the Sampling Step

Bootstrap filter: particles are sampled blindly according to the prior, without taking the observation into account. This is very inefficient for a vague prior and/or a peaky likelihood.

Optimal proposal / perfect adaptation: implement the following alternative update-propagate Bayesian recursion:
$$\text{Update:}\quad p(x_{1:t-1}|y_{1:t}) = \frac{p(y_t|x_{t-1})\, p(x_{1:t-1}|y_{1:t-1})}{p(y_t|y_{1:t-1})},$$
$$\text{Propagate:}\quad p(x_{1:t}|y_{1:t}) = p(x_{1:t-1}|y_{1:t})\, p(x_t|y_t, x_{t-1}),$$
where
$$p(x_t|y_t, x_{t-1}) = \frac{f(x_t|x_{t-1})\, g(y_t|x_t)}{p(y_t|x_{t-1})}.$$
This is much more efficient when applicable, e.g. when $f(x_t|x_{t-1}) = \mathcal N(x_t; \varphi(x_{t-1}), \Sigma_v)$ and $g(y_t|x_t) = \mathcal N(y_t; x_t, \Sigma_w)$.
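For the Gaussian example just mentioned, both $p(x_t|y_t, x_{t-1})$ and the incremental weight $p(y_t|x_{t-1})$ are available in closed form by conjugacy. A scalar sketch of mine (the mean function `phi` is an illustrative choice):

```python
import numpy as np

def optimal_proposal_params(x_prev, y, phi, sv2, sw2):
    """For f = N(x_t; phi(x_{t-1}), sv2) and g = N(y_t; x_t, sw2), the optimal
    proposal p(x_t | y_t, x_{t-1}) is N(m, s2) with
        s2 = (1/sv2 + 1/sw2)^(-1),  m = s2 * (phi(x_prev)/sv2 + y/sw2),
    and the incremental weight is p(y_t | x_{t-1}) = N(y_t; phi(x_prev), sv2 + sw2)."""
    pred = phi(x_prev)
    s2 = 1.0 / (1.0 / sv2 + 1.0 / sw2)
    m = s2 * (pred / sv2 + y / sw2)
    log_incr_w = (-0.5 * np.log(2.0 * np.pi * (sv2 + sw2))
                  - 0.5 * (y - pred) ** 2 / (sv2 + sw2))
    return m, s2, log_incr_w

phi = lambda x: 0.9 * x          # illustrative mean function
m, s2, lw = optimal_proposal_params(np.array([0.0, 1.0]), y=0.5, phi=phi,
                                    sv2=1.0, sw2=0.5)
```

Note the weight depends only on $x_{t-1}$, not on the sampled $x_t$, which is exactly why this proposal is optimal.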

A General Bayesian Recursion

Introduce an arbitrary proposal distribution $q(x_t|y_t, x_{t-1})$, i.e. an approximation to $p(x_t|y_t, x_{t-1})$. We have seen that
$$p(x_{1:t}|y_{1:t}) = \frac{g(y_t|x_t)\, f(x_t|x_{t-1})}{p(y_t|y_{1:t-1})}\, p(x_{1:t-1}|y_{1:t-1}),$$
so clearly
$$p(x_{1:t}|y_{1:t}) = \frac{w(x_{t-1}, x_t, y_t)\, q(x_t|y_t, x_{t-1})}{p(y_t|y_{1:t-1})}\, p(x_{1:t-1}|y_{1:t-1})$$
where
$$w(x_{t-1}, x_t, y_t) = \frac{g(y_t|x_t)\, f(x_t|x_{t-1})}{q(x_t|y_t, x_{t-1})}.$$
This suggests a more general SMC algorithm.

A General SMC Algorithm

Assume we have $N$ weighted particles $\big\{W_{t-1}^{(i)}, X_{1:t-1}^{(i)}\big\}$ approximating $p(x_{1:t-1}|y_{1:t-1})$; then at time $t$:
- Sample $X_t^{(i)} \sim q\big(x_t|y_t, X_{t-1}^{(i)}\big)$, set $X_{1:t}^{(i)} = \big(X_{1:t-1}^{(i)}, X_t^{(i)}\big)$ and
$$\hat p(x_{1:t}|y_{1:t}) = \sum_{i=1}^N W_t^{(i)}\, \delta_{X_{1:t}^{(i)}}(x_{1:t}), \qquad W_t^{(i)} \propto W_{t-1}^{(i)}\, \frac{f\big(X_t^{(i)}|X_{t-1}^{(i)}\big)\, g\big(y_t|X_t^{(i)}\big)}{q\big(X_t^{(i)}|y_t, X_{t-1}^{(i)}\big)}.$$
- If $\mathrm{ESS} < N/2$, resample $\overline X_{1:t}^{(i)} \sim \hat p(x_{1:t}|y_{1:t})$ and set $W_t^{(i)} = \frac{1}{N}$ to obtain
$$\overline p(x_{1:t}|y_{1:t}) = \frac{1}{N}\sum_{i=1}^N \delta_{\overline X_{1:t}^{(i)}}(x_{1:t}).$$
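One step of this general algorithm, with ESS-triggered resampling, can be sketched as follows (my own illustration, with a bootstrap-style proposal $q = f$ plugged in; all densities and parameters here are arbitrary choices, written up to constants):

```python
import numpy as np

def smc_step(x, logW, y, sample_q, logf, logg, logq, rng, ess_frac=0.5):
    """One step of the general SMC algorithm: propose from q, multiply the
    weights by f*g/q, and resample only if ESS < ess_frac * N."""
    N = len(x)
    x_new = sample_q(y, x, rng)
    logW = logW + logf(x_new, x) + logg(y, x_new) - logq(x_new, y, x)
    W = np.exp(logW - logW.max())
    W /= W.sum()
    if 1.0 / np.sum(W ** 2) < ess_frac * N:          # ESS < N/2 trigger
        x_new = x_new[rng.choice(N, size=N, p=W)]
        logW = np.full(N, -np.log(N))                # reset to uniform weights
    else:
        logW = np.log(W)
    return x_new, logW

# Bootstrap choice q = f, so the incremental weight reduces to g.
logf = lambda xn, xp: -0.5 * (xn - 0.9 * xp) ** 2
logg = lambda y, xn: -0.5 * (y - xn) ** 2
logq = lambda xn, y, xp: logf(xn, xp)
sample_q = lambda y, xp, rng: 0.9 * xp + rng.normal(size=len(xp))

rng = np.random.default_rng(5)
N = 1000
x, logW = rng.normal(size=N), np.full(N, -np.log(N))
for yt in [0.3, -0.1, 0.8]:
    x, logW = smc_step(x, logW, yt, sample_q, logf, logg, logq, rng)
```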

Building Proposals

Our aim is to select $q(x_t|y_t, x_{t-1})$ as close as possible to $p(x_t|y_t, x_{t-1})$, as this minimizes the variance of
$$w(x_{t-1}, x_t, y_t) = \frac{g(y_t|x_t)\, f(x_t|x_{t-1})}{q(x_t|y_t, x_{t-1})}.$$
Example (EKF proposal): let $X_t = \varphi(X_{t-1}) + V_t$ and $Y_t = \Psi(X_t) + W_t$, with $V_t \sim \mathcal N(0, \Sigma_v)$, $W_t \sim \mathcal N(0, \Sigma_w)$. We perform the local linearization
$$Y_t \approx \Psi\big(\varphi(X_{t-1})\big) + \left.\frac{\partial \Psi(x)}{\partial x}\right|_{\varphi(X_{t-1})} \big(X_t - \varphi(X_{t-1})\big) + W_t,$$
which yields a Gaussian approximation $\hat g(y_t|x_t)$ of the likelihood, and use
$$q(x_t|y_t, x_{t-1}) \propto \hat g(y_t|x_t)\, f(x_t|x_{t-1})$$
as a proposal.

Any standard suboptimal filtering method can be used: unscented particle filter, Gaussian quadrature particle filter, etc.

Implicit Proposals

Proposed recently by Chorin (2012). Let
$$F(x_{t-1}, x_t) = \log g(y_t|x_t) + \log f(x_t|x_{t-1})$$
and
$$x_t^{\ast} = \arg\max_{x_t} F(x_{t-1}, x_t) = \arg\max_{x_t} p(x_t|y_t, x_{t-1}).$$
We sample $Z \sim \mathcal N(0, I_{n_x})$, then solve in $X_t$
$$F(x_{t-1}, x_t^{\ast}) - F(x_{t-1}, X_t) = \tfrac{1}{2} Z^{\mathsf T} Z.$$
If there is a unique solution, the induced proposal is
$$q(x_t|y_t, x_{t-1}) = p_Z(z)\, \big|\det \partial z / \partial x_t\big| \propto \frac{\exp\big(-F(x_{t-1}, x_t^{\ast})\big)\, g(y_t|x_t)\, f(x_t|x_{t-1})}{\big|\det \partial x_t / \partial z\big|}.$$
The incremental weight is then
$$\frac{g(y_t|x_t)\, f(x_t|x_{t-1})}{q(x_t|y_t, x_{t-1})} \propto \big|\det \partial x_t / \partial z\big|\, \exp\big(F(x_{t-1}, x_t^{\ast})\big).$$

Auxiliary Particle Filters

A popular variation introduced by Pitt & Shephard (1999).

This corresponds to a standard SMC algorithm (Johansen & D., 2008) where we target
$$\hat p(x_{1:t}|y_{1:t+1}) \propto p(x_{1:t}|y_{1:t})\, \hat p(y_{t+1}|x_t),$$
where $\hat p(y_{t+1}|x_t) \approx p(y_{t+1}|x_t)$, using a proposal $\hat p(x_t|y_t, x_{t-1})$.

When $\hat p(y_{t+1}|x_t) = p(y_{t+1}|x_t)$ and $\hat p(x_{t+1}|y_{t+1}, x_t) = p(x_{t+1}|y_{t+1}, x_t)$, we are back to perfect adaptation.

Block Sampling Proposals

Problem: we only sample $X_t$ at time $t$, so even if we use $p(x_t|y_t, x_{t-1})$ the SMC estimates can have high variance if $V_{p(x_{t-1}|y_{1:t-1})}\big[p(y_t|x_{t-1})\big]$ is high.

Block sampling idea: allow yourself to sample again $X_{t-L+1:t-1}$, as well as $X_t$, in light of $y_t$. Optimally, at time $t$ we would like to sample
$$X_{t-L+1:t}^{(i)} \sim p\big(x_{t-L+1:t}|y_{t-L+1:t}, X_{t-L}^{(i)}\big)$$
and weight
$$W_t^{(i)} \propto W_{t-1}^{(i)}\, p\big(y_t|y_{t-L+1:t-1}, X_{t-L}^{(i)}\big).$$
When $p(x_{t-L+1:t}|y_{t-L+1:t}, x_{t-L})$ and $p(y_t|y_{t-L+1:t-1}, x_{t-L})$ are not available, we can use analytical approximations of them and still obtain consistent estimates (D., Briers & Senecal, 2006).

169 Block Sampling Proposals Computational cost is increased from O (N) to O (LN) so is it worth it? A. Doucet (MLSS Sept. 2012) Sept / 136

170 Block Sampling Proposals Computational cost is increased from O (N) to O (LN) so is it worth it? Consider the ideal scenario where X t = X t 1 + V t Y t = X t + W t where X 1 N (0, 1) and V t, W t i.i.d. N (0, 1). A. Doucet (MLSS Sept. 2012) Sept / 136

171 Block Sampling Proposals Computational cost is increased from O (N) to O (LN) so is it worth it? Consider the ideal scenario where X t = X t 1 + V t Y t = X t + W t where X 1 N (0, 1) and V t, W t i.i.d. N (0, 1). In this case, we have p(y t y t L+1:t 1, x t L ) p(y t y t L+1:t 1, x t L) < c x t L x t L /2 L where the rate of exponential convergence depends upon the signal-to-noise ratio if more general Gaussian AR are considered. A. Doucet (MLSS Sept. 2012) Sept / 136

172 Block Sampling Proposals

Computational cost is increased from O(N) to O(LN), so is it worth it?

Consider the ideal scenario where

X_t = X_{t-1} + V_t,
Y_t = X_t + W_t,

where X_1 ~ N(0, 1) and V_t, W_t are i.i.d. N(0, 1).

In this case, we have

|p(y_t | y_{t-L+1:t-1}, x_{t-L}) − p(y_t | y_{t-L+1:t-1}, x'_{t-L})| ≤ c |x_{t-L} − x'_{t-L}| / 2^L,

where the rate of exponential convergence depends upon the signal-to-noise ratio if more general Gaussian AR models are considered.

We can obtain an analytic expression for the variance of the (normalized) weight.
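The variance reduction can also be checked empirically: fix an observation record, draw x_{t-L} from a spread-out distribution (a standard normal here, as a crude stand-in for p(x_{t-L} | y_{1:t-1}); this choice is ours, not from the slides), and compare the variance of the incremental weight p(y_t | y_{t-L+1:t-1}, x_{t-L}) for L = 1 and L = 8:

```python
import numpy as np

def incr_weight(x0, ys_mid, y_last, q=1.0, r=1.0):
    """p(y_t | y_{t-L+1:t-1}, x_{t-L} = x0) for X_k = X_{k-1} + V_k,
    Y_k = X_k + W_k, via a Kalman filter started deterministically at x0."""
    m, P = x0, 0.0
    for y in ys_mid:                   # assimilate the intermediate block
        P_pred = P + q
        K = P_pred / (P_pred + r)
        m, P = m + K * (y - m), (1.0 - K) * P_pred
    S = P + q + r                      # predictive variance of y_t
    return np.exp(-0.5 * (y_last - m) ** 2 / S) / np.sqrt(2 * np.pi * S)

rng = np.random.default_rng(0)
ys = rng.normal(size=7)                # intermediate observations for L = 8
y_t = 0.3                              # a fixed final observation
x0s = rng.normal(size=5000)            # draws standing in for p(x_{t-L} | y_{1:t-1})
var_L1 = np.var([incr_weight(x, [], y_t) for x in x0s])
var_L8 = np.var([incr_weight(x, ys, y_t) for x in x0s])
print(var_L1, var_L8)                  # the weight variance collapses as L grows
```

As L grows the weight barely depends on x_{t-L} any more, so its variance over the particle cloud, and hence the benefit of block sampling, follows the forgetting rate of the model.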

173 Block Sampling Proposals

[Figure: variance of the incremental weight w.r.t. p(x_{1:t-L} | y_{1:t-1}).]


Pseudo-marginal MCMC methods for inference in latent variable models Pseudo-marginal MCMC methods for inference in latent variable models Arnaud Doucet Department of Statistics, Oxford University Joint work with George Deligiannidis (Oxford) & Mike Pitt (Kings) MCQMC, 19/08/2016

More information

Variational Scoring of Graphical Model Structures

Variational Scoring of Graphical Model Structures Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational

More information