Sequential Monte Carlo Methods for Bayesian Computation
A. Doucet, MLSS, Kyoto, Sept. 2012 (136 slides)
Motivating Example 1: Generic Bayesian Model

- Let X be a vector parameter of interest with an associated prior µ, i.e. X ~ µ(·).
- We observe a realization y of Y, which is assumed to satisfy Y | (X = x) ~ g(· | x); i.e. the likelihood function is g(y | x).
- Bayesian inference on X relies on the posterior of X given Y = y,

    p(x | y) = µ(x) g(y | x) / p(y),

  where the marginal likelihood/evidence satisfies p(y) = ∫ µ(x) g(y | x) dx.
- Machine learning examples: Latent Dirichlet Allocation, (Hierarchical) Dirichlet processes, ...
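For a scalar parameter these quantities can be approximated on a grid: evaluate µ(x) g(y | x) pointwise and normalize by a quadrature estimate of p(y). The Gaussian prior and likelihood below are illustrative assumptions, chosen so the exact conjugate answer is known.

```python
import numpy as np

def grid_posterior(y, x_grid, prior_pdf, lik_pdf):
    """Approximate p(x | y) = mu(x) g(y | x) / p(y) on a uniform grid,
    with the evidence p(y) computed by a Riemann sum."""
    dx = x_grid[1] - x_grid[0]                      # uniform spacing assumed
    unnorm = prior_pdf(x_grid) * lik_pdf(y, x_grid) # mu(x) g(y | x)
    evidence = unnorm.sum() * dx                    # p(y) = int mu(x) g(y|x) dx
    return unnorm / evidence, evidence

# Illustrative (not from the lecture): X ~ N(0, 1), Y | X=x ~ N(x, 0.5^2)
mu = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
g = lambda y, x: np.exp(-0.5 * ((y - x) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))

xs = np.linspace(-5.0, 5.0, 2001)
post, p_y = grid_posterior(1.0, xs, mu, g)
```

For this conjugate setup with y = 1 the exact posterior is N(0.8, 0.2), which the grid approximation reproduces to within discretization error.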
Motivating Example 2: State-Space Models

- Let {X_t}_{t≥1} be a latent/hidden Markov process with X_1 ~ µ(·) and X_t | (X_{t-1} = x) ~ f(· | x).
- Let {Y_t}_{t≥1} be an observation process such that the observations are conditionally independent given {X_t}_{t≥1}, with Y_t | (X_t = x) ~ g(· | x).
- Let z_{i:j} := (z_i, z_{i+1}, ..., z_j). Bayesian inference on X_{1:t} given Y_{1:t} = y_{1:t} relies on the posterior

    p(x_{1:t} | y_{1:t}) = p(x_{1:t}, y_{1:t}) / p(y_{1:t}),

  where the marginal likelihood/evidence satisfies p(y_{1:t}) = ∫ p(x_{1:t}, y_{1:t}) dx_{1:t}.
- Machine learning examples: biochemical network models, dynamic topic models, neuroscience models, etc.
Bayesian Inference and Machine Learning

- Bayesian approaches have been adopted by a large part of the ML community.
- Bayesian inference offers a number of attractive advantages over conventional approaches:
  - flexibility in constructing complex models from simple parts;
  - the incorporation of prior knowledge is very natural;
  - all modelling assumptions are made explicit;
  - uncertainties over model order, model parameters and predictions are technically straightforward to compute.
- The price to pay is that approximate inference techniques are necessary to approximate the resulting posterior distributions for all but trivial models.
Approximate Inference Methods

- Gaussian/Laplace approximation, local linearization, extended Kalman filters.
- Variational methods, assumed density filters.
- Expectation Propagation.
- Markov chain Monte Carlo (MCMC) methods.
- Sequential Monte Carlo (SMC) methods.
Monte Carlo Methods

- Variational and EP methods are computationally cheap but rely on functional approximations of the posteriors of interest.
- Both MCMC and SMC are asymptotically (as computational effort increases) bias-free but computationally expensive.
- MCMC methods have been the tools of choice in Bayesian computation for over 20 years, whereas SMC methods have been widely used for 15 years in vision and robotics.
- The development of new methodology, combined with the emergence of cheap multicore architectures, now makes SMC a powerful alternative/complementary approach to MCMC for addressing general Bayesian computational problems.
- The aim of these lectures is to provide an introduction to this active research field and to discuss some open research problems.
Some References and Resources

- A. Doucet, J.F.G. de Freitas & N.J. Gordon (editors), Sequential Monte Carlo Methods in Practice, Springer-Verlag: New York, 2001.
- P. Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer-Verlag: New York, 2004.
- O. Cappé, E. Moulines & T. Rydén, Inference in Hidden Markov Models, Springer-Verlag: New York, 2005.
- Webpage with links to papers and code:
- Thousands of papers on the subject appear every year.
Organization of Lectures

State-Space Models (approx. 4 hours):
- SMC filtering and smoothing
- Maximum likelihood parameter inference
- Bayesian parameter inference

Beyond State-Space Models (approx. 2 hours):
- SMC methods for generic sequences of target distributions
- SMC samplers
- Approximate Bayesian Computation
- Optimal design, optimal control
State-Space Models

- Let {X_t}_{t≥1} be a latent/hidden X-valued Markov process with X_1 ~ µ(·) and X_t | (X_{t-1} = x) ~ f(· | x).
- Let {Y_t}_{t≥1} be a Y-valued observation process such that the observations are conditionally independent given {X_t}_{t≥1}, with Y_t | (X_t = x) ~ g(· | x).
- This is a general class of time series models, aka hidden Markov models (HMMs), including

    X_t = Ψ(X_{t-1}, V_t),   Y_t = Φ(X_t, W_t),

  where {V_t} and {W_t} are two sequences of i.i.d. random variables.
- Aim: infer {X_t} given the observations {Y_t}, on-line or off-line.
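A generic simulator for this functional form, assuming scalar states and i.i.d. N(0, 1) noises V_t, W_t for concreteness (Ψ, Φ and the parameter values below are placeholders, not from the lecture):

```python
import numpy as np

def simulate_ssm(psi, phi, sample_x1, T, seed=0):
    """Draw (X_{1:T}, Y_{1:T}) from X_t = Psi(X_{t-1}, V_t), Y_t = Phi(X_t, W_t),
    here with V_t, W_t i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    x[0] = sample_x1(rng)                                  # X_1 ~ mu
    for t in range(1, T):
        x[t] = psi(x[t - 1], rng.standard_normal())        # Markov transition
    y = np.array([phi(xt, rng.standard_normal()) for xt in x])
    return x, y

# Placeholder model: a noisy nonlinear autoregression
x, y = simulate_ssm(
    psi=lambda xp, v: 0.7 * xp + np.sin(xp) + 0.5 * v,
    phi=lambda xt, w: xt + 0.2 * w,
    sample_x1=lambda rng: rng.standard_normal(),
    T=100,
)
```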
State-Space Models

- State-space models are ubiquitous in control, data mining, econometrics, geosciences, systems biology, etc. Since Jan. 2012, more than 13,500 papers have appeared (source: Google Scholar).
- Finite state-space HMM: X is a finite space, i.e. {X_t} is a finite Markov chain, with Y_t | (X_t = x) ~ g(· | x).
- Linear Gaussian state-space model:

    X_t = A X_{t-1} + B V_t,   V_t i.i.d. N(0, I),
    Y_t = C X_t + D W_t,       W_t i.i.d. N(0, I).

- Switching linear Gaussian state-space model: X_t = (X_t^1, X_t^2), where {X_t^1} is a finite Markov chain and

    X_t^2 = A(X_t^1) X_{t-1}^2 + B(X_t^1) V_t,   V_t i.i.d. N(0, I),
    Y_t = C(X_t^1) X_t^2 + D(X_t^1) W_t,         W_t i.i.d. N(0, I).
State-Space Models

- Stochastic volatility model:

    X_t = φ X_{t-1} + σ V_t,        V_t i.i.d. N(0, 1),
    Y_t = β exp(X_t / 2) W_t,       W_t i.i.d. N(0, 1).

- Biochemical network model:

    Pr(X_{t+dt}^1 = x_t^1 + 1, X_{t+dt}^2 = x_t^2 | x_t^1, x_t^2) = α x_t^1 dt + o(dt),
    Pr(X_{t+dt}^1 = x_t^1 - 1, X_{t+dt}^2 = x_t^2 + 1 | x_t^1, x_t^2) = β x_t^1 x_t^2 dt + o(dt),
    Pr(X_{t+dt}^1 = x_t^1, X_{t+dt}^2 = x_t^2 - 1 | x_t^1, x_t^2) = γ x_t^2 dt + o(dt),

  with Y_k = X_{kT}^1 + W_k, W_k i.i.d. N(0, σ²).

- Nonlinear diffusion model:

    dX_t = α(X_t) dt + β(X_t) dV_t,   V_t Brownian motion,
    Y_k = γ(X_{kT}) + W_k,            W_k i.i.d. N(0, σ²).
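The biochemical network above defines a continuous-time Markov jump process, and its infinitesimal transition probabilities translate directly into Gillespie's exact simulation algorithm. A sketch, with assumed rate constants and initial counts:

```python
import numpy as np

def gillespie(x1, x2, alpha, beta, gamma, t_end, seed=0):
    """Exact simulation of the jump process with rates
    alpha*x1 (x1 -> x1+1), beta*x1*x2 (x1 -> x1-1, x2 -> x2+1),
    gamma*x2 (x2 -> x2-1)."""
    rng = np.random.default_rng(seed)
    t, path = 0.0, [(0.0, x1, x2)]
    while t < t_end:
        rates = np.array([alpha * x1, beta * x1 * x2, gamma * x2], dtype=float)
        total = rates.sum()
        if total == 0.0:                        # absorbing state: nothing can fire
            break
        t += rng.exponential(1.0 / total)       # waiting time to the next reaction
        j = rng.choice(3, p=rates / total)      # which reaction fires
        if j == 0:
            x1 += 1
        elif j == 1:
            x1 -= 1
            x2 += 1
        else:
            x2 -= 1
        path.append((t, x1, x2))
    return path

# Illustrative constants, not from the lecture
path = gillespie(x1=20, x2=5, alpha=1.0, beta=0.05, gamma=0.6, t_end=5.0)
```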
Inference in State-Space Models

- Given observations y_{1:t} := (y_1, y_2, ..., y_t), inference about X_{1:t} := (X_1, ..., X_t) relies on the posterior

    p(x_{1:t} | y_{1:t}) = p(x_{1:t}, y_{1:t}) / p(y_{1:t}),

  where

    p(x_{1:t}, y_{1:t}) = µ(x_1) ∏_{k=2}^{t} f(x_k | x_{k-1}) ∏_{k=1}^{t} g(y_k | x_k),

  the first two factors being the prior p(x_{1:t}) and the last the likelihood p(y_{1:t} | x_{1:t}), and

    p(y_{1:t}) = ∫ p(x_{1:t}, y_{1:t}) dx_{1:t}.

- When X is finite, and for linear Gaussian models, {p(x_t | y_{1:t})}_{t≥1} can be computed exactly. For nonlinear models, approximations are required: EKF, UKF, Gaussian sum filters, etc.
- Approximations of {p(x_t | y_{1:t})}_{t=1}^{T} provide an approximation of p(x_{1:T} | y_{1:T}).
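The factorization of p(x_{1:t}, y_{1:t}) above becomes a sum of log-terms. The sketch below evaluates it for an assumed Gaussian random-walk transition f and Gaussian observation density g (placeholder choices, not a model from the lectures):

```python
import numpy as np

def log_norm_pdf(z, mean, sd):
    """Elementwise log N(z; mean, sd^2)."""
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((z - mean) / sd) ** 2

def log_joint(x, y, sd_x=1.0, sd_y=0.5):
    """log p(x_{1:t}, y_{1:t}) = log mu(x_1) + sum_k log f(x_k | x_{k-1})
    + sum_k log g(y_k | x_k), for mu = N(0, sd_x^2), f a Gaussian random walk
    and g(y | x) = N(y; x, sd_y^2)."""
    lp = log_norm_pdf(x[0], 0.0, sd_x)             # log mu(x_1)
    lp += log_norm_pdf(x[1:], x[:-1], sd_x).sum()  # transition terms f
    lp += log_norm_pdf(y, x, sd_y).sum()           # observation terms g
    return lp

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(10))             # a random-walk path
y = x + 0.5 * rng.standard_normal(10)
lp = log_joint(x, y)
```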
Monte Carlo Methods Basics

- Assume you can generate X_{1:t}^(i) ~ p(x_{1:t} | y_{1:t}) for i = 1, ..., N; then the MC approximation is

    p̂(x_{1:t} | y_{1:t}) = (1/N) ∑_{i=1}^{N} δ_{X_{1:t}^(i)}(x_{1:t}).

- Integration is straightforward:

    ∫ φ_t(x_{1:t}) p(x_{1:t} | y_{1:t}) dx_{1:t} ≈ ∫ φ_t(x_{1:t}) p̂(x_{1:t} | y_{1:t}) dx_{1:t} = (1/N) ∑_{i=1}^{N} φ_t(X_{1:t}^(i)).

- Marginalization is straightforward:

    p̂(x_k | y_{1:t}) = ∫ p̂(x_{1:t} | y_{1:t}) dx_{1:k-1} dx_{k+1:t} = (1/N) ∑_{i=1}^{N} δ_{X_k^(i)}(x_k).

- Basic and key property:

    V[(1/N) ∑_{i=1}^{N} φ_t(X_{1:t}^(i))] = C / N,

  i.e. the rate of convergence to zero is independent of dim(X) and t.
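The 1/N variance rate can be checked numerically. In the sketch below a standard Gaussian stands in for the posterior (an illustrative assumption), and the estimator's empirical standard deviation drops by roughly a factor of 10 when N grows by a factor of 100:

```python
import numpy as np

def mc_estimate(phi, sampler, N, seed):
    """(1/N) sum_i phi(X^(i)) with X^(i) drawn i.i.d. from the target."""
    rng = np.random.default_rng(seed)
    return phi(sampler(rng, N)).mean()

# Stand-in target: N(0, 1); phi(x) = x^2, so the true value is E[X^2] = 1.
sampler = lambda rng, N: rng.standard_normal(N)
phi = lambda x: x**2

# Empirical std of the estimator over 200 independent replications.
errs = {N: np.std([mc_estimate(phi, sampler, N, s) - 1.0 for s in range(200)])
        for N in (100, 10_000)}
```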
Monte Carlo Methods

- Problem 1: We cannot typically generate exact samples from p(x_{1:t} | y_{1:t}) for nonlinear non-Gaussian models.
- Problem 2: Even if we could, algorithms that generate samples from p(x_{1:t} | y_{1:t}) have complexity at least O(t).
- The typical solution to Problem 1 is to generate approximate samples using MCMC methods, but these methods are not recursive.
- SMC methods partially solve Problems 1 and 2 by breaking the problem of sampling from p(x_{1:t} | y_{1:t}) into a collection of simpler subproblems: first approximate p(x_1 | y_1) and p(y_1) at time 1, then p(x_{1:2} | y_{1:2}) and p(y_{1:2}) at time 2, and so on.
- Each target distribution is approximated by a cloud of random samples, termed particles, evolving according to importance sampling and resampling steps.
Standard Bayesian Recursion

- In most textbooks, you will find the following recursion for {p(x_t | y_{1:t})}_{t≥1}.
- Prediction step:

    p(x_t | y_{1:t-1}) = ∫ p(x_{t-1}, x_t | y_{1:t-1}) dx_{t-1}
                       = ∫ p(x_t | y_{1:t-1}, x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}
                       = ∫ f(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}.

- Bayes updating step:

    p(x_t | y_{1:t}) = g(y_t | x_t) p(x_t | y_{1:t-1}) / p(y_t | y_{1:t-1}),

  where

    p(y_t | y_{1:t-1}) = ∫ g(y_t | x_t) p(x_t | y_{1:t-1}) dx_t.

- This is the recursion implemented by the Wonham and Kalman filters.
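For a scalar linear Gaussian model this recursion can be carried out in closed form, which is exactly the Kalman filter. A minimal sketch with illustrative parameters (a = 0.9, c = 1, and the noise variances below are assumptions, not values from the lecture):

```python
import numpy as np

def kalman_filter(y, a, c, s2v, s2w, m0, p0):
    """Scalar Kalman filter: prediction then Bayes update at each step,
    for X_t = a X_{t-1} + V_t, V_t ~ N(0, s2v), Y_t = c X_t + W_t, W_t ~ N(0, s2w).
    (m0, p0) parameterize a N(m0, p0) prior on a notional time-0 state."""
    m, p = m0, p0
    means, variances = [], []
    for yt in y:
        # Prediction step: p(x_t | y_{1:t-1}) = N(m_pred, p_pred)
        m_pred, p_pred = a * m, a * a * p + s2v
        # Bayes updating step with g(y_t | x_t) = N(y_t; c x_t, s2w)
        k = p_pred * c / (c * c * p_pred + s2w)    # Kalman gain
        m = m_pred + k * (yt - c * m_pred)
        p = (1.0 - k * c) * p_pred
        means.append(m)
        variances.append(p)
    return np.array(means), np.array(variances)

# Simulate from X_t = 0.9 X_{t-1} + 0.5 V_t, Y_t = X_t + 0.3 W_t, then filter.
rng = np.random.default_rng(1)
T = 50
x = np.empty(T)
x[0] = rng.standard_normal()
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + 0.5 * rng.standard_normal()
y = x + 0.3 * rng.standard_normal(T)

means, variances = kalman_filter(y, a=0.9, c=1.0, s2v=0.25, s2w=0.09, m0=0.0, p0=1.0)
```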
Bayesian Recursion on Path Space

- SMC approximates directly {p(x_{1:t} | y_{1:t})}_{t≥1}, not {p(x_t | y_{1:t})}_{t≥1}, and relies on

    p(x_{1:t} | y_{1:t}) = p(x_{1:t}, y_{1:t}) / p(y_{1:t})
                         = g(y_t | x_t) f(x_t | x_{t-1}) p(x_{1:t-1}, y_{1:t-1}) / [p(y_t | y_{1:t-1}) p(y_{1:t-1})]
                         = g(y_t | x_t) [f(x_t | x_{t-1}) p(x_{1:t-1} | y_{1:t-1})] / p(y_t | y_{1:t-1}),

  where the bracketed term is the predictive p(x_{1:t} | y_{1:t-1}) and

    p(y_t | y_{1:t-1}) = ∫ g(y_t | x_t) p(x_{1:t} | y_{1:t-1}) dx_{1:t}.

- This can alternatively be written as

    Prediction: p(x_{1:t} | y_{1:t-1}) = f(x_t | x_{t-1}) p(x_{1:t-1} | y_{1:t-1}),
    Update:     p(x_{1:t} | y_{1:t}) = g(y_t | x_t) p(x_{1:t} | y_{1:t-1}) / p(y_t | y_{1:t-1}).

- SMC is a simple and natural simulation-based implementation of this recursion.
Monte Carlo Implementation of Prediction Step

- Assume that at time t-1 you have

    p̂(x_{1:t-1} | y_{1:t-1}) = (1/N) ∑_{i=1}^{N} δ_{X_{1:t-1}^(i)}(x_{1:t-1}).

- By sampling X_t^(i) ~ f(x_t | X_{t-1}^(i)) and setting X_{1:t}^(i) = (X_{1:t-1}^(i), X_t^(i)), we obtain

    p̂(x_{1:t} | y_{1:t-1}) = (1/N) ∑_{i=1}^{N} δ_{X_{1:t}^(i)}(x_{1:t}).

- Sampling from f(x_t | x_{t-1}) is usually straightforward, and can be done even if f(x_t | x_{t-1}) does not admit any analytical expression, e.g. for biochemical network models.
Importance Sampling Implementation of Updating Step

- Our target at time t is

    p(x_{1:t} | y_{1:t}) = g(y_t | x_t) p(x_{1:t} | y_{1:t-1}) / p(y_t | y_{1:t-1}),

  so by substituting p̂(x_{1:t} | y_{1:t-1}) for p(x_{1:t} | y_{1:t-1}) we obtain

    p̂(y_t | y_{1:t-1}) = ∫ g(y_t | x_t) p̂(x_{1:t} | y_{1:t-1}) dx_{1:t} = (1/N) ∑_{i=1}^{N} g(y_t | X_t^(i)).

- We now have

    p̄(x_{1:t} | y_{1:t}) = g(y_t | x_t) p̂(x_{1:t} | y_{1:t-1}) / p̂(y_t | y_{1:t-1}) = ∑_{i=1}^{N} W_t^(i) δ_{X_{1:t}^(i)}(x_{1:t}),

  with W_t^(i) ∝ g(y_t | X_t^(i)) and ∑_{i=1}^{N} W_t^(i) = 1.
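In practice the updating step is just a few vector operations: evaluate g at each particle, average to get the evidence increment, normalize to get the weights. A sketch with an assumed Gaussian likelihood (an illustrative choice):

```python
import numpy as np

def update_step(particles_t, y_t, g_pdf):
    """Importance-sampling update: returns normalized weights W_t^(i)
    and the evidence increment p_hat(y_t | y_{1:t-1})."""
    gvals = g_pdf(y_t, particles_t)        # g(y_t | X_t^(i)) for each particle
    evidence_inc = gvals.mean()            # (1/N) sum_i g(y_t | X_t^(i))
    W = gvals / gvals.sum()                # weights normalized to sum to 1
    return W, evidence_inc

# Illustration with g(y | x) = N(y; x, 0.5^2)
g = lambda y, x: np.exp(-0.5 * ((y - x) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))
rng = np.random.default_rng(0)
W, inc = update_step(rng.standard_normal(1000), y_t=0.3, g_pdf=g)
```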
Multinomial Resampling

- We have a weighted approximation p̄(x_{1:t} | y_{1:t}) of p(x_{1:t} | y_{1:t}):

    p̄(x_{1:t} | y_{1:t}) = ∑_{i=1}^{N} W_t^(i) δ_{X_{1:t}^(i)}(x_{1:t}).

- To obtain N samples X̄_{1:t}^(i) approximately distributed according to p(x_{1:t} | y_{1:t}), resample N times with replacement:

    p̂(x_{1:t} | y_{1:t}) = (1/N) ∑_{i=1}^{N} N_t^(i) δ_{X_{1:t}^(i)}(x_{1:t}),

  where {N_t^(i)} follow a multinomial distribution with E[N_t^(i)] = N W_t^(i) and V[N_t^(i)] = N W_t^(i) (1 - W_t^(i)).
- This can be achieved in O(N) operations.
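A minimal sketch of the scheme: draw the counts (N_t^(1), ..., N_t^(N)) in one shot and repeat each particle index accordingly (the weights below are arbitrary illustrative values):

```python
import numpy as np

def multinomial_resample(weights, seed=0):
    """Return N ancestor indices, index i appearing N_t^(i) times,
    with (N_t^(1), ..., N_t^(N)) multinomial and E[N_t^(i)] = N * W_t^(i)."""
    rng = np.random.default_rng(seed)
    N = len(weights)
    counts = rng.multinomial(N, weights)      # the counts N_t^(i)
    return np.repeat(np.arange(N), counts)    # expand counts into indices

W = np.array([0.5, 0.25, 0.125, 0.125])       # weights summing exactly to 1
idx = multinomial_resample(W)
```

The particles for the next step are then `particles[idx]`; higher-weight particles are duplicated and low-weight ones tend to die out.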
Vanilla SMC: Bootstrap Filter (Gordon et al., 1993)

At time t = 1:

Sample X_1^{(i)} \sim \mu(x_1), then set

  \hat{p}(x_1 | y_1) = \sum_{i=1}^{N} W_1^{(i)} \delta_{X_1^{(i)}}(x_1),  W_1^{(i)} \propto g(y_1 | X_1^{(i)}).

Resample \bar{X}_1^{(i)} \sim \hat{p}(x_1 | y_1) to obtain

  \bar{p}(x_1 | y_1) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\bar{X}_1^{(i)}}(x_1).

At time t >= 2:

Sample X_t^{(i)} \sim f(x_t | \bar{X}_{t-1}^{(i)}), set X_{1:t}^{(i)} = (\bar{X}_{1:t-1}^{(i)}, X_t^{(i)}) and

  \hat{p}(x_{1:t} | y_{1:t}) = \sum_{i=1}^{N} W_t^{(i)} \delta_{X_{1:t}^{(i)}}(x_{1:t}),  W_t^{(i)} \propto g(y_t | X_t^{(i)}).

Resample \bar{X}_{1:t}^{(i)} \sim \hat{p}(x_{1:t} | y_{1:t}) to obtain

  \bar{p}(x_{1:t} | y_{1:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\bar{X}_{1:t}^{(i)}}(x_{1:t}).
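A minimal sketch of the bootstrap filter in Python (my own illustration, not code from the slides; the model is passed in as sampling and log-density functions, and log-sum-exp is used for numerical stability):

```python
import numpy as np

def bootstrap_filter(y, N, mu_sample, f_sample, g_logpdf, rng):
    """Bootstrap particle filter: propagate with f, weight by g, resample.

    Returns the filtering means and the log marginal likelihood estimate
    log p_hat(y_{1:T}) = sum_t log((1/N) sum_i g(y_t | X_t^(i))).
    """
    T = len(y)
    means = np.empty(T)
    loglik = 0.0
    x = mu_sample(N, rng)                       # X_1^(i) ~ mu
    for t in range(T):
        if t > 0:
            x = f_sample(x, rng)                # X_t^(i) ~ f(. | Xbar_{t-1}^(i))
        logw = g_logpdf(y[t], x)                # unnormalized log-weights
        m = logw.max()
        w = np.exp(logw - m)                    # log-sum-exp for stability
        loglik += m + np.log(w.mean())
        W = w / w.sum()
        means[t] = W @ x
        x = x[rng.choice(N, size=N, p=W)]       # multinomial resampling
    return means, loglik

# Demo on the linear Gaussian model X_t = 0.8 X_{t-1} + V_t, Y_t = X_t + W_t.
rng = np.random.default_rng(0)
y = np.array([0.5, -0.2, 1.1, 0.3, -0.7])
means, loglik = bootstrap_filter(
    y, 5000,
    mu_sample=lambda N, r: r.normal(size=N),
    f_sample=lambda x, r: 0.8 * x + r.normal(size=len(x)),
    g_logpdf=lambda yt, x: -0.5 * (yt - x) ** 2 - 0.5 * np.log(2 * np.pi),
    rng=rng,
)
assert np.all(np.isfinite(means)) and np.isfinite(loglik)
```

For this linear Gaussian model the exact filtering means are available via the Kalman filter, so the particle estimates can be checked directly.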
SMC Output

At time t, we get

  \hat{p}(x_{1:t} | y_{1:t}) = \sum_{i=1}^{N} W_t^{(i)} \delta_{X_{1:t}^{(i)}}(x_{1:t}),
  \bar{p}(x_{1:t} | y_{1:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\bar{X}_{1:t}^{(i)}}(x_{1:t}).

The marginal likelihood estimate is given by

  \hat{p}(y_{1:t}) = \prod_{k=1}^{t} \hat{p}(y_k | y_{1:k-1}) = \prod_{k=1}^{t} \left( \frac{1}{N} \sum_{i=1}^{N} g(y_k | X_k^{(i)}) \right).

Computational complexity is O(N) at each time step and memory requirements are O(tN).

If we are only interested in p(x_t | y_{1:t}), or in p(s_t(x_{1:t}) | y_{1:t}) where s_t(x_{1:t}) = \Psi_t(x_t, s_{t-1}(x_{1:t-1})) is fixed-dimensional (e.g. s_t(x_{1:t}) = \sum_{k=1}^{t} x_k^2), then the memory requirements are only O(N).
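To illustrate the O(N)-memory point (my own sketch, not from the slides): a fixed-dimensional additive statistic s_t can be carried alongside the particles and simply resampled with them, giving the same result as storing and summing whole paths:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 30
paths = rng.normal(size=(N, 1))          # full stored paths: O(tN) memory
stats = paths[:, 0] ** 2                 # s_t^(i) = sum_k (x_k^(i))^2: O(N) memory
for t in range(1, T):
    xt = 0.9 * paths[:, -1] + rng.normal(size=N)        # propagate
    paths = np.column_stack([paths, xt])
    stats = stats + xt ** 2                              # s_t = Psi(x_t, s_{t-1})
    w = np.exp(-0.5 * (1.0 - xt) ** 2)                   # example weights g(y_t | x_t)
    W = w / w.sum()
    idx = rng.choice(N, size=N, p=W)                     # resampling step
    paths = paths[idx]                                   # O(tN): copy whole paths
    stats = stats[idx]                                   # O(N): copy summaries only
assert np.allclose(stats, (paths ** 2).sum(axis=1))      # both agree at every step
```

The per-particle summary survives resampling because it is indexed exactly like the particles themselves.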
SMC on Path-Space (figures by Olivier Cappé)

[Plots omitted: each figure shows the state against the time index.]
Figure: p(x_1 | y_1) and \hat{E}[X_1 | y_1] (top) and particle approximation of p(x_1 | y_1) (bottom).
Figure: p(x_t | y_{1:t}) and \hat{E}[X_t | y_{1:t}] for t = 1, 2 (top) and particle approximation of p(x_{1:2} | y_{1:2}) (bottom).
Figure: p(x_t | y_{1:t}) and \hat{E}[X_t | y_{1:t}] for t = 1, 2, 3 (top) and particle approximation of p(x_{1:3} | y_{1:3}) (bottom).
Figure: p(x_t | y_{1:t}) and \hat{E}[X_t | y_{1:t}] for t = 1, ..., 10 (top) and particle approximation of p(x_{1:10} | y_{1:10}) (bottom).
Figure: p(x_t | y_{1:t}) and \hat{E}[X_t | y_{1:t}] for t = 1, ..., 24 (top) and particle approximation of p(x_{1:24} | y_{1:24}) (bottom).
Remarks

Empirically, this SMC strategy performs well for estimating the marginals {p(x_t | y_{1:t})}_{t>=1}; thankfully, this is all that is needed in many applications.

However, the joint distribution p(x_{1:t} | y_{1:t}) is poorly estimated when t is large; in the previous example we have

  \bar{p}(x_{1:11} | y_{1:24}) = \delta_{\bar{X}_{1:11}}(x_{1:11}).

Degeneracy problem: for any N and any k, there exists t(k, N) such that for all t >= t(k, N),

  \bar{p}(x_{1:k} | y_{1:t}) = \delta_{\bar{X}_{1:k}}(x_{1:k});

\bar{p}(x_{1:t} | y_{1:t}) becomes an unreliable approximation of p(x_{1:t} | y_{1:t}) as t grows.
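A quick numerical illustration of the degeneracy (my own sketch; uniform weights are used so that only resampling acts): the number of distinct time-1 ancestors can never increase under repeated resampling, and it eventually collapses to a single value:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 1000
ancestor = np.arange(N)                 # time-1 ancestor index of each particle
unique_counts = [N]
for t in range(T):
    idx = rng.choice(N, size=N)         # multinomial resampling (uniform weights)
    ancestor = ancestor[idx]
    unique_counts.append(len(np.unique(ancestor)))
# the set of surviving time-1 ancestors can only shrink at each resampling step
assert all(a >= b for a, b in zip(unique_counts, unique_counts[1:]))
```

After enough steps the whole particle system traces back to essentially one time-1 sample, which is exactly the delta-measure collapse described above.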
Another Illustration of the Degeneracy Phenomenon

For the linear Gaussian state-space model described before, we can compute exactly S_t / t, where

  S_t = \int \left( \sum_{k=1}^{t} x_k^2 \right) p(x_{1:t} | y_{1:t}) dx_{1:t},

using Kalman techniques. We compute the SMC estimate of this quantity, \hat{S}_t / t, where

  \hat{S}_t = \int \left( \sum_{k=1}^{t} x_k^2 \right) \hat{p}(x_{1:t} | y_{1:t}) dx_{1:t}

can be computed sequentially.

Figure: S_t / t obtained through the Kalman smoother (blue) and its SMC estimate \hat{S}_t / t (red). [Plot omitted.]
Some Convergence Results for SMC

Numerous convergence results for SMC are available; see (Del Moral, 2004).

Let \varphi_t : X^t -> R and consider

  \bar{\varphi}_t = \int \varphi_t(x_{1:t}) p(x_{1:t} | y_{1:t}) dx_{1:t},
  \hat{\varphi}_t = \int \varphi_t(x_{1:t}) \hat{p}(x_{1:t} | y_{1:t}) dx_{1:t} = \frac{1}{N} \sum_{i=1}^{N} \varphi_t(X_{1:t}^{(i)}).

We can prove that, for any bounded function \varphi_t and any p >= 1,

  E[ |\hat{\varphi}_t - \bar{\varphi}_t|^p ]^{1/p} <= \frac{B(t) c(p) \|\varphi_t\|_\infty}{\sqrt{N}},
  \lim_{N \to \infty} \sqrt{N} ( \hat{\varphi}_t - \bar{\varphi}_t ) \to N(0, \sigma_t^2) in distribution.

These are very weak results: B(t) and \sigma_t^2 can increase with t, and will do so for a path-dependent \varphi_t(x_{1:t}), as the degeneracy problem suggests.
Stronger Convergence Results

Assume the following exponential stability assumption: for any x_1, x_1',

  \frac{1}{2} \int | p(x_t | y_{2:t}, X_1 = x_1) - p(x_t | y_{2:t}, X_1 = x_1') | dx_t <= \alpha^t  for some 0 <= \alpha < 1.

Marginal distributions. For \varphi_t(x_{1:t}) = \varphi(x_{t-L:t}), there exist B_1, B_2 < \infty s.t.

  E[ |\hat{\varphi}_t - \bar{\varphi}_t|^p ]^{1/p} <= \frac{B_1 c(p) \|\varphi\|_\infty}{\sqrt{N}},
  \lim_{N \to \infty} \sqrt{N} ( \hat{\varphi}_t - \bar{\varphi}_t ) \to N(0, \sigma_t^2)

where \sigma_t^2 <= B_2; i.e. there is no accumulation of numerical errors over time.

L1 distance. If \tilde{p}(x_{1:t} | y_{1:t}) = E[ \hat{p}(x_{1:t} | y_{1:t}) ], there exists B_3 < \infty s.t.

  \int | \tilde{p}(x_{1:t} | y_{1:t}) - p(x_{1:t} | y_{1:t}) | dx_{1:t} <= \frac{B_3 t}{N};

i.e. the bias only increases linearly in t.
Stronger Convergence Results

Unbiasedness. The marginal likelihood estimate is unbiased: E[ \hat{p}(y_{1:t}) ] = p(y_{1:t}).

Relative variance bound. There exists B_4 < \infty s.t.

  E\left[ \left( \frac{\hat{p}(y_{1:t})}{p(y_{1:t})} - 1 \right)^2 \right] <= \frac{B_4 t}{N}.

Central limit theorem. There exists B_5 < \infty s.t.

  \lim_{N \to \infty} \sqrt{N} ( \log \hat{p}(y_{1:t}) - \log p(y_{1:t}) ) \to N(0, \sigma_t^2)

with \sigma_t^2 <= B_5 t.
Basic Idea Used to Establish Uniform Lp Bounds

We denote by \eta_k(x_k) = p(x_k | y_{1:k-1}) the predictive distribution and by \hat{\eta}_k(x_k) = \hat{p}(x_k | y_{1:k-1}) its particle approximation.

Let \Phi_{k,t} be the measure-valued mapping such that \eta_t = \Phi_{k,t}(\eta_k); it satisfies

  \Phi_{k,t}(\eta_k)(x_t) = \int \underbrace{\frac{\eta_k(x_k) \, p(y_{k:t-1} | x_k)}{\int \eta_k(x_k') \, p(y_{k:t-1} | x_k') dx_k'}}_{p(x_k | y_{1:t-1})} \, p(x_t | x_k, y_{k+1:t-1}) \, dx_k.
Key Decomposition Formula

  \eta_1 -> \eta_2 = \Phi_{1,2}(\eta_1) -> ... -> \eta_t = \Phi_{1,t}(\eta_1)
  \hat{\eta}_1 -> \Phi_{1,2}(\hat{\eta}_1) -> ... -> \Phi_{1,t}(\hat{\eta}_1)
  \hat{\eta}_2 -> ... -> \Phi_{2,t}(\hat{\eta}_2)
  ...
  \hat{\eta}_{t-1} -> \Phi_{t-1,t}(\hat{\eta}_{t-1})

Decomposition of the error:

  \hat{\eta}_t - \eta_t = \sum_{k=1}^{t} \left[ \Phi_{k,t}(\hat{\eta}_k) - \Phi_{k,t}\left( \Phi_{k-1,k}(\hat{\eta}_{k-1}) \right) \right]

(with the conventions \Phi_{t,t} = Id and \Phi_{0,1}(\hat{\eta}_0) = \eta_1).
Stability Properties

We have

  p(x_t | x_k, y_{k+1:t-1}) = \int p(x_{k+1:t} | x_k, y_{k+1:t-1}) dx_{k+1:t-1},

where

  p(x_{k+1:t} | x_k, y_{k+1:t-1}) = \prod_{m=k+1}^{t} p(x_m | x_{m-1}, y_{m:t-1}).

To summarize, we have

  \Phi_{k,t}(\eta_k)(x_t) = \int \underbrace{\frac{\eta_k(x_k) \, p(y_{k:t-1} | x_k)}{\int \eta_k(x_k') \, p(y_{k:t-1} | x_k') dx_k'}}_{p(x_k | y_{1:t-1})} \prod_{m=k+1}^{t} p(x_m | x_{m-1}, y_{m:t-1}) \, dx_{k:t-1}.
Stability Properties

Assume there exists \epsilon > 0 s.t. for any x, x' and for any y,

  \epsilon \, \nu(x') <= f(x' | x) <= \epsilon^{-1} \nu(x'),
  0 < g_- <= g(y | x) <= g_+ < \infty.

Then there exists 0 <= \lambda < 1 s.t.

  \frac{1}{2} \int | \Phi_{k,k+\tau}(\eta)(x) - \Phi_{k,k+\tau}(\eta')(x) | dx <= \lambda^{\tau}.

Hence we have

  \Phi_{k,t}(\eta_k)(x_t) \approx \Phi_{k,t}(\hat{\eta}_k)(x_t)

as (t - k) -> \infty: the mapping forgets its initial condition.
Putting Everything Together

Under such strong mixing assumptions,

  \hat{\eta}_t - \eta_t = \sum_{k=1}^{t} \underbrace{\left[ \Phi_{k,t}(\hat{\eta}_k) - \Phi_{k,t}\left( \Phi_{k-1,k}(\hat{\eta}_{k-1}) \right) \right]}_{\text{of order } \lambda^{t-k+1} / \sqrt{N}}  for 0 <= \lambda < 1,

so the local errors are geometrically damped and their sum is bounded uniformly in t. We can then obtain results such as: there exists B_1 < \infty s.t.

  E[ |\hat{\varphi}_t - \bar{\varphi}_t|^p ]^{1/p} <= \frac{B_1 c(p) \|\varphi\|_\infty}{\sqrt{N}}.

Much work has been done recently on removing such strong mixing assumptions; see e.g. Whiteley (2012) for much weaker and more realistic assumptions.
Summary

SMC methods provide consistent estimates under weak assumptions.

Under stability assumptions, we have uniform-in-time stability of the SMC estimates of {p(x_t | y_{1:t})}_{t>=1}.

Under stability assumptions, the relative variance of the SMC estimate of {p(y_{1:t})}_{t>=1} only increases linearly with t.

Even under stability assumptions, one cannot expect uniform-in-time stability for SMC estimates of {p(x_{1:t} | y_{1:t})}_{t>=1}; this is due to the degeneracy problem.

Is it possible to (Q1) eliminate, or (Q2) mitigate, the degeneracy problem? Answer: Q1: no, Q2: yes.
Is Resampling Really Necessary?

Resampling is the source of the degeneracy problem and might appear wasteful.

The resampling step is an unbiased operation,

  E[ \bar{p}(x_{1:t} | y_{1:t}) | \hat{p}(x_{1:t} | y_{1:t}) ] = \hat{p}(x_{1:t} | y_{1:t}),

but it clearly introduces some errors locally in time: for any test function \varphi,

  V\left[ \int \varphi(x_{1:t}) \, \bar{p}(x_{1:t} | y_{1:t}) dx_{1:t} \right] >= V\left[ \int \varphi(x_{1:t}) \, \hat{p}(x_{1:t} | y_{1:t}) dx_{1:t} \right].

What about eliminating the resampling step?
Sequential Importance Sampling: SMC Without Resampling

In this case, the estimate of the posterior is

  \hat{p}_{SIS}(x_{1:t} | y_{1:t}) = \sum_{i=1}^{N} W_t^{(i)} \delta_{X_{1:t}^{(i)}}(x_{1:t}),

where X_{1:t}^{(i)} \sim p(x_{1:t}) and

  W_t^{(i)} \propto p(y_{1:t} | X_{1:t}^{(i)}) = \prod_{k=1}^{t} g(y_k | X_k^{(i)}).

The marginal likelihood estimate is

  \hat{p}_{SIS}(y_{1:t}) = \frac{1}{N} \sum_{i=1}^{N} p(y_{1:t} | X_{1:t}^{(i)}).

The relative variance of p(y_{1:t} | X_{1:t}) = \prod_{k=1}^{t} g(y_k | X_k) increases exponentially fast with t...
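The weight degeneracy of SIS is easy to reproduce numerically (my own sketch on a toy random-walk model, not from the slides): without resampling, the effective sample size of the accumulated weights collapses quickly.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 200, 60
y = np.zeros(T)                          # observations, all zero for simplicity
x = rng.normal(size=N)                   # X_1^(i) ~ N(0, 1)
logw = np.zeros(N)
ess = []
for t in range(T):
    if t > 0:
        x = x + rng.normal(size=N)       # random-walk prior X_t = X_{t-1} + V_t
    logw = logw - 0.5 * (y[t] - x) ** 2  # accumulate log g(y_t | x_t); never resample
    W = np.exp(logw - logw.max())
    W = W / W.sum()
    ess.append(1.0 / np.sum(W ** 2))
assert ess[-1] < ess[0]                  # weight degeneracy: ESS collapses over time
```

By t = 60 essentially one particle carries all the weight, mirroring the exponential variance growth stated above.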
SIS for Stochastic Volatility Model

Figure: Histograms of log10(W_t^{(i)}) (importance weights, base-10 logarithm) for t = 1 (top), t = 50 (middle) and t = 100 (bottom). [Plot omitted.]

The algorithm's performance collapses as t increases, as expected.
Central Limit Theorems

For both SIS and SMC, we have a CLT for the estimates of the marginal likelihood:

  \sqrt{N} \left( \frac{\hat{p}_{SIS}(y_{1:t})}{p(y_{1:t})} - 1 \right) \to N(0, \sigma_{t,SIS}^2),
  \sqrt{N} \left( \frac{\hat{p}_{SMC}(y_{1:t})}{p(y_{1:t})} - 1 \right) \to N(0, \sigma_{t,SMC}^2).

The asymptotic variance expressions are

  \sigma_{t,SIS}^2 = \int \frac{p^2(x_{1:t} | y_{1:t})}{p(x_{1:t})} dx_{1:t} - 1 = \frac{\int p^2(y_{1:t} | x_{1:t}) \, p(x_{1:t}) dx_{1:t}}{p^2(y_{1:t})} - 1,

  \sigma_{t,SMC}^2 = \int \frac{p^2(x_1 | y_{1:t})}{\mu(x_1)} dx_1 + \sum_{k=2}^{t} \int \frac{p^2(x_{1:k} | y_{1:t})}{p(x_{1:k-1} | y_{1:k-1}) \, f(x_k | x_{k-1})} dx_{1:k} - t
             = \frac{\int p^2(y_{1:t} | x_1) \, \mu(x_1) dx_1}{p^2(y_{1:t})} + \sum_{k=2}^{t} \frac{\int p^2(y_{k:t} | x_k) \, p(x_k | y_{1:k-1}) dx_k}{p^2(y_{k:t} | y_{1:k-1})} - t.

SMC breaks the integral over X^t into t integrals over X.
A Toy Example

Consider the case where f(x' | x) = \mu(x') = N(x'; 0, \sigma^2) and g(y | x) = N(y; x, (1 - \sigma^{-2})^{-1}), where \sigma^2 > 1. Assume we observe y_1 = ... = y_t = 0; then we have

  V\left[ \frac{\hat{p}_{SIS}(y_{1:t})}{p(y_{1:t})} \right] = \frac{\sigma_{t,SIS}^2}{N} = \frac{1}{N} \left[ \left( \frac{\sigma^4}{2\sigma^2 - 1} \right)^{t/2} - 1 \right],

  V\left[ \frac{\hat{p}_{SMC}(y_{1:t})}{p(y_{1:t})} \right] \approx \frac{\sigma_{t,SMC}^2}{N} = \frac{t}{N} \left[ \left( \frac{\sigma^4}{2\sigma^2 - 1} \right)^{1/2} - 1 \right].

If we select \sigma^2 = 1.2, then over a few thousand time steps SIS requires an astronomical N \approx 10^{23} particles to obtain \sigma_{t,SIS}^2 / N = 10^{-2}. To obtain \sigma_{t,SMC}^2 / N = 10^{-2}, SMC requires only N \approx 10^4 particles: an improvement by 19 orders of magnitude!
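The contrast between the two growth rates can be checked numerically from the two variance formulas (my own sketch): doubling t doubles the SMC variance exactly, while it essentially squares the SIS variance.

```python
import math

def r(sigma2):
    """The per-step factor sigma^4 / (2 sigma^2 - 1) appearing in both formulas."""
    return sigma2 ** 2 / (2.0 * sigma2 - 1.0)

def var_sis(sigma2, t):          # (sigma^4 / (2 sigma^2 - 1))^(t/2) - 1
    return r(sigma2) ** (t / 2.0) - 1.0

def var_smc(sigma2, t):          # t [ (sigma^4 / (2 sigma^2 - 1))^(1/2) - 1 ]
    return t * (math.sqrt(r(sigma2)) - 1.0)

# SMC variance is exactly linear in t; SIS variance is exponential in t:
assert math.isclose(var_smc(1.2, 2000), 2.0 * var_smc(1.2, 1000), rel_tol=1e-12)
assert math.isclose(var_sis(1.2, 2000) + 1.0, (var_sis(1.2, 1000) + 1.0) ** 2,
                    rel_tol=1e-9)
```

Even for the small discrepancy sigma^2 = 1.2, var_sis already reaches the millions by t = 1000 while var_smc stays below 20, which is where the huge gap in required particle numbers comes from.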
Better Resampling Schemes

Better resampling steps can be designed such that E[N_t^{(i)}] = N W_t^{(i)} still holds but V[N_t^{(i)}] < N W_t^{(i)} (1 - W_t^{(i)}): residual resampling, minimal entropy resampling, etc. (Cappé et al., 2005).

Residual resampling. Set \tilde{N}_t^{(i)} = \lfloor N W_t^{(i)} \rfloor, then sample \bar{N}_t^{1:N} from a multinomial of parameters (N - \sum_{i} \tilde{N}_t^{(i)}, \bar{W}_t^{(1:N)}), where \bar{W}_t^{(i)} \propto W_t^{(i)} - N^{-1} \tilde{N}_t^{(i)}; finally set N_t^{(i)} = \tilde{N}_t^{(i)} + \bar{N}_t^{(i)}.

Systematic resampling. Sample U_1 \sim U[0, 1/N] and define U_i = U_1 + (i - 1)/N for i = 2, ..., N; then set

  N_t^{(i)} = \#\left\{ U_j : \sum_{k=1}^{i-1} W_t^{(k)} <= U_j < \sum_{k=1}^{i} W_t^{(k)} \right\},

with the convention \sum_{k=1}^{0} := 0.
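Systematic resampling is particularly simple to implement (my own sketch, not code from the slides): one uniform draw, a cumulative sum, and a sorted search give all offspring indices in essentially O(N).

```python
import numpy as np

def systematic_resample(W, rng):
    """Systematic resampling: U_i = U_1 + (i-1)/N with U_1 ~ U[0, 1/N]."""
    N = len(W)
    u = (rng.uniform() + np.arange(N)) / N
    C = np.cumsum(W)
    C[-1] = 1.0                       # guard against floating-point round-off
    return np.searchsorted(C, u, side="right")

rng = np.random.default_rng(3)
W = rng.dirichlet(np.ones(64))
counts = np.bincount(systematic_resample(W, rng), minlength=64)
assert counts.sum() == 64
assert np.all(np.abs(counts - 64 * W) <= 1)   # each N_t^(i) within 1 of N W_t^(i)
```

Because the points U_i are evenly spaced, each offspring count deviates from its expectation N W_t^{(i)} by at most one, which is the low-variance property motivating this scheme.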
Measuring Variability of the Weights

To measure the variation of the weights, we can use the Effective Sample Size (ESS):

  ESS = \left( \sum_{i=1}^{N} ( W_t^{(i)} )^2 \right)^{-1}.

We have ESS = N if W_t^{(i)} = 1/N for all i, and ESS = 1 if W_t^{(i)} = 1 for some i and W_t^{(j)} = 0 for all j != i.

Liu (1996) showed that, for simple importance sampling and \varphi regular enough,

  V\left[ \sum_{i=1}^{N} W_t^{(i)} \varphi(X_t^{(i)}) \right] \approx V_{p(x_{1:t} | y_{1:t})}\left[ \frac{1}{ESS} \sum_{i=1}^{ESS} \varphi(X_t^{(i)}) \right];

i.e. the estimate is roughly as accurate as one computed from an i.i.d. sample of size ESS from p(x_{1:t} | y_{1:t}).
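The ESS is a one-liner; a quick sketch (my own, plain Python) confirms the two extreme cases stated above:

```python
def ess(W):
    """Effective sample size of a set of normalized weights."""
    return 1.0 / sum(w * w for w in W)

assert ess([0.25, 0.25, 0.25, 0.25]) == 4.0   # uniform weights: ESS = N
assert ess([1.0, 0.0, 0.0, 0.0]) == 1.0       # degenerate weights: ESS = 1
```

Intermediate weight profiles give values strictly between 1 and N, which is what makes the ESS usable as a resampling trigger.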
Dynamic Resampling

Resampling at each time step can be harmful: only resample when necessary.

Dynamic resampling: if the variation of the weights as measured by the ESS is too high, e.g. ESS < N/2, then resample the particles.

We can also use the entropy

  Ent = - \sum_{i=1}^{N} W_t^{(i)} \log_2 ( W_t^{(i)} ).

We have Ent = \log_2(N) if W_t^{(i)} = 1/N for all i, and Ent = 0 if W_t^{(i)} = 1 for some i and W_t^{(j)} = 0 for all j != i.
Improving the Sampling Step

Bootstrap filter: particles are sampled blindly from the prior, without taking the current observation into account. This is very inefficient for a vague prior and/or a peaky likelihood.

Optimal proposal / perfect adaptation: implement the following alternative update-propagate Bayesian recursion:

  Update:    p(x_{1:t-1} | y_{1:t}) = \frac{p(y_t | x_{t-1}) \, p(x_{1:t-1} | y_{1:t-1})}{p(y_t | y_{1:t-1})},
  Propagate: p(x_{1:t} | y_{1:t}) = p(x_{1:t-1} | y_{1:t}) \, p(x_t | y_t, x_{t-1}),

where

  p(x_t | y_t, x_{t-1}) = \frac{f(x_t | x_{t-1}) \, g(y_t | x_t)}{p(y_t | x_{t-1})}.

This is much more efficient when applicable, e.g. for f(x_t | x_{t-1}) = N(x_t; \varphi(x_{t-1}), \Sigma_v) and g(y_t | x_t) = N(y_t; x_t, \Sigma_w).
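In the scalar version of that Gaussian example, the optimal proposal and the predictive p(y_t | x_{t-1}) are available in closed form. A sketch (my own, not from the slides) that verifies the factorization g(y_t | x_t) f(x_t | x_{t-1}) = p(x_t | y_t, x_{t-1}) p(y_t | x_{t-1}) pointwise:

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def optimal_proposal(phi_x, y, var_v, var_w):
    """Posterior N(m, s2) of x_t given y_t and x_{t-1} for linear Gaussian g, f."""
    s2 = 1.0 / (1.0 / var_v + 1.0 / var_w)       # precisions add
    m = s2 * (phi_x / var_v + y / var_w)
    return m, s2

phi_x, y, var_v, var_w = 0.7, 1.3, 0.5, 2.0
m, s2 = optimal_proposal(phi_x, y, var_v, var_w)
for x_t in (-1.0, 0.2, 2.5):
    lhs = normal_pdf(y, x_t, var_w) * normal_pdf(x_t, phi_x, var_v)     # g * f
    rhs = normal_pdf(x_t, m, s2) * normal_pdf(y, phi_x, var_v + var_w)  # q * predictive
    assert math.isclose(lhs, rhs, rel_tol=1e-9)
```

With this proposal the incremental weight equals p(y_t | x_{t-1}) = N(y_t; phi(x_{t-1}), var_v + var_w), which does not depend on the sampled x_t; this is the sense in which the proposal is optimal.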
A General Bayesian Recursion

Introduce an arbitrary proposal distribution q(x_t | y_t, x_{t-1}), i.e. an approximation to p(x_t | y_t, x_{t-1}). We have seen that

  p(x_{1:t} | y_{1:t}) = \frac{g(y_t | x_t) \, f(x_t | x_{t-1})}{p(y_t | y_{1:t-1})} \, p(x_{1:t-1} | y_{1:t-1}),

so clearly

  p(x_{1:t} | y_{1:t}) = \frac{w(x_{t-1}, x_t, y_t) \, q(x_t | y_t, x_{t-1})}{p(y_t | y_{1:t-1})} \, p(x_{1:t-1} | y_{1:t-1}),

where

  w(x_{t-1}, x_t, y_t) = \frac{g(y_t | x_t) \, f(x_t | x_{t-1})}{q(x_t | y_t, x_{t-1})}.

This suggests a more general SMC algorithm.
A General SMC Algorithm

Assume we have N weighted particles { W_{t-1}^{(i)}, X_{1:t-1}^{(i)} } approximating p(x_{1:t-1} | y_{1:t-1}). At time t:

Sample X_t^{(i)} \sim q(x_t | y_t, X_{t-1}^{(i)}), set X_{1:t}^{(i)} = (X_{1:t-1}^{(i)}, X_t^{(i)}) and

  \hat{p}(x_{1:t} | y_{1:t}) = \sum_{i=1}^{N} W_t^{(i)} \delta_{X_{1:t}^{(i)}}(x_{1:t}),
  W_t^{(i)} \propto W_{t-1}^{(i)} \frac{f(X_t^{(i)} | X_{t-1}^{(i)}) \, g(y_t | X_t^{(i)})}{q(X_t^{(i)} | y_t, X_{t-1}^{(i)})}.

If ESS < N/2, resample \bar{X}_{1:t}^{(i)} \sim \hat{p}(x_{1:t} | y_{1:t}) and set W_t^{(i)} = 1/N to obtain

  \bar{p}(x_{1:t} | y_{1:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\bar{X}_{1:t}^{(i)}}(x_{1:t}).
Building Proposals

Our aim is to select q(x_t | y_t, x_{t-1}) as close as possible to p(x_t | y_t, x_{t-1}), as this minimizes the variance of

  w(x_{t-1}, x_t, y_t) = \frac{g(y_t | x_t) \, f(x_t | x_{t-1})}{q(x_t | y_t, x_{t-1})}.

Example (EKF proposal): let

  X_t = \varphi(X_{t-1}) + V_t,  Y_t = \Psi(X_t) + W_t,

with V_t \sim N(0, \Sigma_v), W_t \sim N(0, \Sigma_w). We perform the local linearization

  Y_t \approx \Psi(\varphi(X_{t-1})) + \left. \frac{\partial \Psi(x)}{\partial x} \right|_{\varphi(X_{t-1})} (X_t - \varphi(X_{t-1})) + W_t

and use

  q(x_t | y_t, x_{t-1}) \propto \hat{g}(y_t | x_t) \, f(x_t | x_{t-1})

as a proposal, where \hat{g} denotes the likelihood of the linearized model. Any standard suboptimal filtering method can be used: unscented particle filter, Gaussian quadrature particle filter, etc.
Implicit Proposals

Proposed recently by Chorin (2012). Let

  F(x_{t-1}, x_t) = \log g(y_t | x_t) + \log f(x_t | x_{t-1})

and

  x_t^* = \arg\max_{x_t} F(x_{t-1}, x_t) = \arg\max_{x_t} p(x_t | y_t, x_{t-1}).

We sample Z \sim N(0, I_{n_x}), then solve in X_t

  F(x_{t-1}, x_t^*) - F(x_{t-1}, X_t) = \frac{1}{2} Z^T Z.

If there is a unique solution, the change of variables gives

  q(x_t | y_t, x_{t-1}) = p_Z(z) \, | \det(\partial z / \partial x_t) | \propto \frac{\exp( - F(x_{t-1}, x_t^*) ) \, g(y_t | x_t) \, f(x_t | x_{t-1})}{| \det(\partial x_t / \partial z) |}.

The incremental weight is

  \frac{g(y_t | x_t) \, f(x_t | x_{t-1})}{q(x_t | y_t, x_{t-1})} \propto | \det(\partial x_t / \partial z) | \, \exp( F(x_{t-1}, x_t^*) ).
Auxiliary Particle Filters

A popular variation introduced by (Pitt & Shephard, 1999).

It corresponds to a standard SMC algorithm (Johansen & D., 2008) in which we target

  \hat{p}(x_{1:t} | y_{1:t+1}) \propto p(x_{1:t} | y_{1:t}) \, \hat{p}(y_{t+1} | x_t),

where \hat{p}(y_{t+1} | x_t) \approx p(y_{t+1} | x_t), using a proposal \hat{p}(x_t | y_t, x_{t-1}).

When \hat{p}(y_{t+1} | x_t) = p(y_{t+1} | x_t) and \hat{p}(x_{t+1} | y_{t+1}, x_t) = p(x_{t+1} | y_{t+1}, x_t), we are back to perfect adaptation.
Block Sampling Proposals

Problem: we only sample X_t at time t, so even if we use p(x_t | y_t, x_{t-1}), the SMC estimates can have high variance if V_{p(x_{t-1} | y_{1:t-1})}[ p(y_t | x_{t-1}) ] is high.

Block sampling idea: allow yourself to sample again X_{t-L+1:t-1}, as well as X_t, in light of y_t. Optimally, at time t we would like to sample

  X_{t-L+1:t}^{(i)} \sim p(x_{t-L+1:t} | y_{t-L+1:t}, X_{t-L}^{(i)})

and use the weights

  W_t^{(i)} \propto W_{t-1}^{(i)} \, p(y_t | y_{t-L+1:t-1}, X_{t-L}^{(i)}).

When p(x_{t-L+1:t} | y_{t-L+1:t}, x_{t-L}) and p(y_t | y_{t-L+1:t-1}, x_{t-L}) are not available, we can use analytical approximations of them and still obtain consistent estimates (D., Briers & Senecal, 2006).
169-172 Block Sampling Proposals

Computational cost is increased from O(N) to O(LN), so is it worth it?

Consider the ideal scenario where

X_t = X_{t-1} + V_t,
Y_t = X_t + W_t,

where X_1 ~ N(0, 1) and V_t, W_t are i.i.d. N(0, 1).

In this case, we have

|p(y_t | y_{t-L+1:t-1}, x_{t-L}) - p(y_t | y_{t-L+1:t-1}, x'_{t-L})| < c |x_{t-L} - x'_{t-L}| / 2^L,

and for more general Gaussian autoregressions the rate of exponential convergence depends on the signal-to-noise ratio.

We can obtain an analytic expression for the variance of the (normalized) weight.

A. Doucet (MLSS Sept. 2012) Sept / 136
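The forgetting property above can be checked numerically. The sketch below (my own illustration, not taken from the slides; predictive_lik and gap are hypothetical names) evaluates p(y_t | y_{t-L+1:t-1}, x_{t-L}) exactly for this linear-Gaussian model with a Kalman filter initialized at x_{t-L}, and measures how quickly two different starting points x_{t-L} are forgotten as the block length L grows.

```python
import math

def predictive_lik(x_tmL, y_between, y_t):
    """p(y_t | y_{t-L+1:t-1}, x_{t-L}) for X_t = X_{t-1} + V_t, Y_t = X_t + W_t,
    V_t, W_t ~ N(0, 1), computed exactly by a Kalman filter started at x_{t-L}."""
    m, P = x_tmL, 0.0              # filtering mean/variance given x_{t-L}
    for y in y_between:            # assimilate y_{t-L+1}, ..., y_{t-1}
        P += 1.0                   # predict one step of the random walk
        K = P / (P + 1.0)          # Kalman gain (unit observation noise)
        m, P = m + K * (y - m), (1.0 - K) * P
    P += 1.0                       # final predict step to time t
    s2 = P + 1.0                   # predictive variance of y_t
    return math.exp(-0.5 * (y_t - m) ** 2 / s2) / math.sqrt(2 * math.pi * s2)

def gap(L, a=0.0, b=2.0):
    """|p(y_t | ..., x_{t-L} = a) - p(y_t | ..., x_{t-L} = b)|, all y's set to 0."""
    ys = [0.0] * (L - 1)
    return abs(predictive_lik(a, ys, 0.0) - predictive_lik(b, ys, 0.0))

for L in (1, 2, 4, 6):
    print(L, gap(L))               # the gap shrinks geometrically in L
```

For this unit-noise model the per-step forgetting factor works out to roughly 0.38, consistent with the 1/2^L bound; a higher signal-to-noise ratio makes the forgetting faster, as the slide notes for general Gaussian autoregressions.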
173 Block Sampling Proposals

[Figure: variance of the incremental weight w.r.t. p(x_{1:t-L} | y_{1:t-1}).]

A. Doucet (MLSS Sept. 2012) Sept / 136