An introduction to Sequential Monte Carlo


Thang Bui, Jes Frellsen
Department of Engineering, University of Cambridge
Research and Communication Club, 6 February 2014

Sequential Monte Carlo (SMC) methods

- Initially designed for online inference in dynamical systems: observations arrive sequentially and one needs to update the posterior distribution of the hidden variables
- Analytically tractable solutions are available for linear Gaussian models, but not for complex models
- Examples: target tracking, time series analysis, computer vision
- Increasingly used to perform inference for a wide range of applications beyond dynamical systems, e.g. graphical models, population genetics, ...
- SMC methods are scalable, easy to implement and flexible!

Outline

- Motivation
- References
- Introduction: MCMC and importance sampling
- Sequential importance sampling and resampling
- Example: a dynamical system
- Proposal
- Smoothing
- MAP estimation
- Parameter estimation
- A generic SMC algorithm
- Particle MCMC
- Particle learning for GP regression
- Summary

Bibliography I

Andrieu, Christophe, Doucet, Arnaud, & Holenstein, Roman. 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3), 269–342.

Cappé, Olivier, Godsill, Simon J., & Moulines, Eric. 2007. An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE, 95(5), 899–924.

Doucet, Arnaud, De Freitas, Nando, & Gordon, Neil. 2001. An introduction to sequential Monte Carlo methods. Pages 3–14 of: Sequential Monte Carlo Methods in Practice. Springer.

Gramacy, Robert B., & Polson, Nicholas G. 2011. Particle learning of Gaussian process models for sequential design and optimization. Journal of Computational and Graphical Statistics, 20(1), 102–118.

Holenstein, Roman. 2009. Particle Markov Chain Monte Carlo. Ph.D. thesis, The University of British Columbia.

Bibliography II

Holenstein, Roman, & Doucet, Arnaud. 2007. Particle Markov chain Monte Carlo. Workshop: New Directions in Monte Carlo Methods, Fleurance, France.

Wilkinson, Richard. 2014. GPs huh? What are they good for? Gaussian Process Winter School, Sheffield.

State Space Models (Doucet et al., 2001; Cappé et al., 2007)

The Markovian, nonlinear, non-Gaussian state space model:
- Unobserved signal or states $\{x_t,\ t \in \mathbb{N}\}$
- Observations or outputs $\{y_t,\ t \in \mathbb{N}^+\}$ or $\{y_t,\ t \in \mathbb{N}\}$
- $P(x_0)$ and $P(x_t \mid x_{t-1})$ for $t \geq 1$ (transition probability)
- $P(y_t \mid x_t)$ for $t \geq 0$ (emission/observation probability)

[Figure: graphical model of the chain $x_{t-1} \to x_t \to x_{t+1}$ with emissions $x_t \to y_t$]

Inference for State Space Models (Doucet et al., 2001; Cappé et al., 2007)

We are interested in the posterior distributions of the unobserved signal:
- $P(x_{0:t} \mid y_{0:t})$ — fixed-interval smoothing distribution
- $P(x_{t-L} \mid y_{0:t})$ — fixed-lag smoothing distribution
- $P(x_t \mid y_{0:t})$ — filtering distribution

and expectations under these posteriors, e.g.
$$\mathbb{E}_{P(x_{0:t} \mid y_{0:t})}(h_t) = \int h_t(x_{0:t})\, P(x_{0:t} \mid y_{0:t})\, \mathrm{d}x_{0:t}$$
for some function $h_t : \mathcal{X}^{t+1} \to \mathbb{R}^{n_{h_t}}$.

Couldn't we use MCMC? (Doucet et al., 2001; Holenstein, 2009)

Sure, generate $N$ samples from $P(x_{0:t} \mid y_{0:t})$ using Metropolis–Hastings (MH):
- Sample a candidate $x'_{0:t}$ from a proposal distribution, $x'_{0:t} \sim q(x'_{0:t} \mid x_{0:t})$
- Accept the candidate $x'_{0:t}$ with probability
$$\alpha(x_{0:t}, x'_{0:t}) = \min\left[1,\ \frac{P(x'_{0:t} \mid y_{0:t})\, q(x_{0:t} \mid x'_{0:t})}{P(x_{0:t} \mid y_{0:t})\, q(x'_{0:t} \mid x_{0:t})}\right]$$
- Obtain a set of samples $\{x^{(i)}_{0:t}\}_{i=1}^N$
- Calculate empirical estimates of the posterior and the expectation:
$$\hat{P}(x_{0:t} \mid y_{0:t}) = \frac{1}{N} \sum_i \delta_{x^{(i)}_{0:t}}(x_{0:t}), \qquad \hat{\mathbb{E}}_{\hat{P}(x_{0:t} \mid y_{0:t})}(h_t) = \int h_t(x_{0:t})\, \hat{P}(x_{0:t} \mid y_{0:t})\, \mathrm{d}x_{0:t} = \frac{1}{N} \sum_{i=1}^N h_t\big(x^{(i)}_{0:t}\big)$$

Couldn't we use MCMC? (Doucet et al., 2001; Holenstein, 2009)

Unbiased estimates and, in most cases, nice convergence:
$$\hat{\mathbb{E}}_{\hat{P}(x_{0:t} \mid y_{0:t})}(h_t) \xrightarrow{\text{a.s.}} \mathbb{E}_{P(x_{0:t} \mid y_{0:t})}(h_t) \quad \text{as } N \to \infty$$
Problem solved!?
- It can be hard to design a good proposal $q$
- Single-site updates $q(x'_j \mid x_{0:t})$ can lead to slow mixing
- What happens if we get a new data point $y_{t+1}$?
  - We cannot (directly) reuse the samples $\{x^{(i)}_{0:t}\}$
  - We have to run a new MCMC simulation for $P(x_{0:t+1} \mid y_{0:t+1})$

MCMC is not well suited to recursive estimation problems.

What about importance sampling? (Doucet et al., 2001)

Generate $N$ i.i.d. samples $\{x^{(i)}_{0:t}\}_{i=1}^N$ from an arbitrary importance sampling distribution $\pi(x_{0:t} \mid y_{0:t})$. The empirical estimates are
$$\hat{P}(x_{0:t} \mid y_{0:t}) = \sum_{i=1}^N w^{(i)}_t\, \delta_{x^{(i)}_{0:t}}(x_{0:t}), \qquad \hat{\mathbb{E}}_{\hat{P}(x_{0:t} \mid y_{0:t})}(h_t) = \sum_{i=1}^N w^{(i)}_t\, h_t\big(x^{(i)}_{0:t}\big)$$
where the importance weights are
$$w(x_{0:t}) = \frac{P(x_{0:t} \mid y_{0:t})}{\pi(x_{0:t} \mid y_{0:t})} \qquad \text{and} \qquad w^{(i)}_t = \frac{w\big(x^{(i)}_{0:t}\big)}{\sum_j w\big(x^{(j)}_{0:t}\big)}$$
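To make the weighting concrete, here is a minimal self-normalized importance sampling sketch; the bimodal unnormalized target `gamma` and the Gaussian importance distribution are hypothetical, chosen purely for illustration:

```python
# Self-normalized importance sampling: estimate E[x] under an unnormalized
# 1-D target gamma(x) using draws from a wide Gaussian importance density.
import numpy as np

rng = np.random.default_rng(0)

def gamma(x):                        # unnormalized target density (illustrative)
    return np.exp(-0.5 * (x - 1.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 2.0) ** 2)

N = 10_000
x = rng.normal(0.0, 3.0, size=N)     # samples from pi = N(0, 3^2)
pi = np.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * np.sqrt(2 * np.pi))
w = gamma(x) / pi                    # unnormalized importance weights w(x)
w /= w.sum()                         # self-normalize
print("E[x] estimate:", np.sum(w * x))
```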

What about importance sampling? (Doucet et al., 2001)

$\hat{\mathbb{E}}_{\hat{P}(x_{0:t} \mid y_{0:t})}(h_t)$ is biased, but converges to $\mathbb{E}_{P(x_{0:t} \mid y_{0:t})}(h_t)$. Problem solved!?
- Designing a good importance distribution can be hard!
- Still not adequate for recursive estimation: when seeing new data $y_{t+1}$, we cannot reuse the samples and weights $\{x^{(i)}_{0:t}, w^{(i)}_t\}_{i=1}^N$ from time $t$ to sample from $P(x_{0:t+1} \mid y_{0:t+1})$

Sequential importance sampling (Doucet et al., 2001; Cappé et al., 2007)

Assume that the importance distribution can be factored as
$$\pi(x_{0:t} \mid y_{0:t}) = \underbrace{\pi(x_{0:t-1} \mid y_{0:t-1})}_{\text{importance distribution at time } t-1}\ \underbrace{\pi(x_t \mid x_{0:t-1}, y_{0:t})}_{\text{extension to time } t} = \pi(x_0 \mid y_0) \prod_{k=1}^{t} \pi(x_k \mid x_{0:k-1}, y_{0:k})$$
The importance weight can then be evaluated recursively:
$$w^{(i)}_t \propto w^{(i)}_{t-1}\, \frac{P(y_t \mid x^{(i)}_t)\, P(x^{(i)}_t \mid x^{(i)}_{t-1})}{\pi(x^{(i)}_t \mid x^{(i)}_{0:t-1}, y_{0:t})} \qquad (1)$$
Given past i.i.d. trajectories $\{x^{(i)}_{0:t-1},\ i = 1, \ldots, N\}$ we can
1. simulate $x^{(i)}_t \sim \pi(x_t \mid x^{(i)}_{0:t-1}, y_{0:t})$
2. update the weight $w^{(i)}_t$ for $x^{(i)}_{0:t}$ based on $w^{(i)}_{t-1}$ using eq. (1)

Note that the extended trajectories $\{x^{(i)}_{0:t}\}$ remain i.i.d.
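As a sketch, one SIS step might look as follows; the proposal sampler `propose` and the log-densities `log_q`, `log_f`, `log_g` are hypothetical callables assumed to be vectorized over the $N$ particles:

```python
# One sequential importance sampling step implementing the weight
# recursion (1), with log_g for P(y_t | x_t), log_f for P(x_t | x_{t-1})
# and log_q for the proposal density pi(x_t | x_{t-1}, y_t).
import numpy as np

def sis_step(x_prev, log_w_prev, y, propose, log_q, log_f, log_g, rng):
    """Extend N particles by one step and return normalized log-weights."""
    x = propose(x_prev, y, rng)      # x_t^(i) ~ pi(. | x_{t-1}^(i), y_t)
    log_w = log_w_prev + log_g(y, x) + log_f(x, x_prev) - log_q(x, x_prev, y)
    log_w -= np.max(log_w)           # stabilize before exponentiating
    w = np.exp(log_w)
    return x, np.log(w / w.sum())
```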

Sequential importance sampling

[Figure adapted from (Doucet et al., 2001): particles $\{x^{(i)}_{t-1}, w^{(i)}_{t-1}\}$ are propagated and reweighted to $\{x^{(i)}_t, w^{(i)}_t\}$]

Problem solved!?

Sequential importance sampling

[Figure adapted from (Doucet et al., 2001): $\{x^{(i)}_{t-1}, w^{(i)}_{t-1}\} \to \{x^{(i)}_t, w^{(i)}_t\} \to \{x^{(i)}_{t+1}, w^{(i)}_{t+1}\}$]

The weights become highly degenerate after a few steps.

Sequential importance resampling (Doucet et al., 2001; Cappé et al., 2007)

Key idea to eliminate weight degeneracy:
1. Eliminate particles with low importance weights
2. Multiply particles with high importance weights

Introduce a resampling step at each time step (or occasionally):
- Resample new trajectories $\{x^{(i)}_{0:t},\ i = 1, \ldots, N\}$ by drawing $N$ samples from
$$\hat{P}(x_{0:t} \mid y_{0:t}) = \sum_{i=1}^N w^{(i)}_t\, \delta_{x^{(i)}_{0:t}}(x_{0:t})$$
- The weights of the new samples are $w^{(i)}_t = 1/N$
- The new empirical (unweighted) distribution at time step $t$ is
$$\hat{P}'(x_{0:t} \mid y_{0:t}) = \frac{1}{N} \sum_{i=1}^N N^{(i)}_t\, \delta_{x^{(i)}_{0:t}}(x_{0:t})$$
where $N^{(i)}_t$, the number of copies of $x^{(i)}_{0:t}$, is sampled from a multinomial with parameters $w^{(i)}_t$.

Sequential importance resampling (Doucet et al., 2001; Cappé et al., 2007)

1: for i = 1, ..., N do
2:     Sample $x_0^{(i)} \sim \pi(x_0 \mid y_0)$
3:     $w_0^{(i)} \propto P(y_0 \mid x_0^{(i)})\, P(x_0^{(i)}) / \pi(x_0^{(i)} \mid y_0)$
4: for t = 1, ..., T do
       (Importance sampling step)
5:     for i = 1, ..., N do
6:         Sample $\tilde{x}_t^{(i)} \sim \pi(x_t \mid x_{0:t-1}^{(i)}, y_{0:t})$
7:         $\tilde{x}_{0:t}^{(i)} \leftarrow (x_{0:t-1}^{(i)}, \tilde{x}_t^{(i)})$
8:         $w_t^{(i)} \propto w_{t-1}^{(i)}\, P(y_t \mid \tilde{x}_t^{(i)})\, P(\tilde{x}_t^{(i)} \mid x_{t-1}^{(i)}) / \pi(\tilde{x}_t^{(i)} \mid x_{0:t-1}^{(i)}, y_{0:t})$
       (Resampling/selection step)
9:     Sample N particles $\{x_{0:t}^{(i)}\}$ from $\{\tilde{x}_{0:t}^{(i)}\}$ according to $\{w_t^{(i)}\}$, for i = 1, ..., N
10:    $w_t^{(i)} \leftarrow 1/N$
11: return $\{x_{0:t}^{(i)}\}_{i=1}^N$
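Below is a minimal, self-contained sketch of this algorithm with the bootstrap proposal $\pi(x_t \mid x_{0:t-1}, y_{0:t}) = P(x_t \mid x_{t-1})$ and multinomial resampling at every step, so line 8's update reduces to $w_t \propto P(y_t \mid x_t)$. The model enters only through the hypothetical callables `sample_x0`, `sample_f` and `log_g`, and scalar states are assumed for brevity:

```python
# Bootstrap sequential importance resampling (a special case of the
# algorithm above). Resampling every step means w_{t-1} = 1/N, so the
# reweighting uses only the likelihood of the new observation.
import numpy as np

def normalize(log_w):
    w = np.exp(log_w - np.max(log_w))          # avoid overflow
    return w / w.sum()

def bootstrap_sir(ys, sample_x0, sample_f, log_g, N, rng):
    """Return particles xs (T+1, N) and normalized weights ws (T+1, N)."""
    T = len(ys) - 1
    xs = np.empty((T + 1, N))
    ws = np.empty((T + 1, N))
    xs[0] = sample_x0(N, rng)                  # x_0^(i) ~ P(x_0)
    ws[0] = normalize(log_g(ys[0], xs[0]))     # w_0 ∝ P(y_0 | x_0)
    for t in range(1, T + 1):
        idx = rng.choice(N, size=N, p=ws[t - 1])   # multinomial resampling
        xs[:t] = xs[:t, idx]                   # keep trajectories consistent
        xs[t] = sample_f(xs[t - 1], rng)       # propagate: x_t ~ P(x_t | x_{t-1})
        ws[t] = normalize(log_g(ys[t], xs[t])) # reweight with the likelihood
    return xs, ws
```

In practice one resamples only when the effective sample size drops below a threshold, per the "or occasionally" remark above.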

Sequential importance resampling

[Figure modified from (Doucet et al., 2001), with $i = 1, \ldots, N$ and $N = 10$: the resampling step maps weighted particles $\{\tilde{x}^{(i)}_{t-1}, w^{(i)}_{t-1}\}$ to unweighted particles $\{x^{(i)}_{t-1}, 1/N\}$, and the importance sampling step then produces $\{\tilde{x}^{(i)}_t, w^{(i)}_t\}$]

Example - A dynamical system

$$x_t = \frac{1}{2} x_{t-1} + 25\, \frac{x_{t-1}}{1 + x_{t-1}^2} + 8 \cos(1.2t) + u_t, \qquad y_t = \frac{x_t^2}{20} + v_t$$
where $x_0 \sim \mathcal{N}(0, \sigma_0^2)$, $u_t \sim \mathcal{N}(0, \sigma_u^2)$, $v_t \sim \mathcal{N}(0, \sigma_v^2)$, with $\sigma_0^2 = \sigma_u^2 = 10$ and $\sigma_v^2 = 1$.

[Figure: simulated states $x_t$ and observations $y_t$ for $t = 0, \ldots, 100$]
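A sketch of simulating this benchmark model (the variances follow the slide); the resulting `ys` could be fed straight into the `bootstrap_sir` sketch above:

```python
# Simulate the nonlinear benchmark state space model above.
import numpy as np

def simulate(T=100, s0=10.0, su=10.0, sv=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.empty(T + 1)
    y = np.empty(T + 1)
    x[0] = rng.normal(0.0, np.sqrt(s0))
    y[0] = x[0] ** 2 / 20 + rng.normal(0.0, np.sqrt(sv))
    for t in range(1, T + 1):
        x[t] = (0.5 * x[t - 1] + 25 * x[t - 1] / (1 + x[t - 1] ** 2)
                + 8 * np.cos(1.2 * t) + rng.normal(0.0, np.sqrt(su)))
        y[t] = x[t] ** 2 / 20 + rng.normal(0.0, np.sqrt(sv))
    return x, y
```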

Posterior distribution of states

[Figures: filtering distributions $p(x_t \mid y_{1:t})$ for $t = 1, \ldots, 100$, shown as per-time marginals over two slides and then as a 3-D density surface]

Proposal

The bootstrap filter uses $\pi_t(x_t \mid x_{t-1}, y_t) = P(x_t \mid x_{t-1})$, which leads to a simple form for the importance weight update:
$$w^{(i)}_t \propto w^{(i)}_{t-1}\, P(y_t \mid x^{(i)}_t)$$
- The weight update depends on the newly proposed state and the observation!
- Because the proposal is uninformed by the observation, it can lead to poor performance

The optimal proposal is $\pi_t(x_t \mid x_{t-1}, y_t) = P(x_t \mid x_{t-1}, y_t)$, and therefore:
$$w^{(i)}_t \propto w^{(i)}_{t-1}\, P(y_t \mid x_{t-1}) = w^{(i)}_{t-1} \int P(y_t \mid x_t)\, P(x_t \mid x_{t-1})\, \mathrm{d}x_t$$
- The weight update depends on the previous state and the observation
- The integral is analytically intractable in general; one needs to resort to approximation techniques.

Smoothing

For a complex SSM, the posterior distribution of the state variables can be smoothed by including future observations. The joint smoothing distribution can be factorized as
$$P(x_{0:T} \mid y_{0:T}) = P(x_T \mid y_{0:T}) \prod_{t=0}^{T-1} P(x_t \mid x_{t+1}, y_{0:t}), \qquad P(x_t \mid x_{t+1}, y_{0:t}) \propto \underbrace{P(x_t \mid y_{0:t})}_{\text{filtering distribution}}\ \underbrace{P(x_{t+1} \mid x_t)}_{\text{likelihood of future state}}$$
Hence the weight update:
$$\hat{w}^{(i)}_t(x_{t+1}) \propto w^{(i)}_t\, P(x_{t+1} \mid x^{(i)}_t)$$

Particle smoother

Algorithm:
- Run forward simulation to obtain particle paths $\{x^{(i)}_t, w^{(i)}_t\}_{i=1,\ldots,N;\ t=1,\ldots,T}$
- Draw $\tilde{x}_T$ from $\hat{P}(x_T \mid y_{0:T})$
- Repeat, going backwards in time:
  - Adjust and normalize the filtering weights: $\hat{w}^{(i)}_t \propto w^{(i)}_t\, P(\tilde{x}_{t+1} \mid x^{(i)}_t)$
  - Draw a random sample $\tilde{x}_t$ from $\hat{P}(x_t \mid \tilde{x}_{t+1:T}, y_{0:T})$

The sequence $(\tilde{x}_0, \tilde{x}_1, \ldots, \tilde{x}_T)$ is a random draw from the approximate distribution $\hat{P}(x_{0:T} \mid y_{0:T})$. Cost: $O(NT)$.
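A minimal backward-simulation sketch of this smoother, assuming the forward pass stored the full arrays of particles and normalized weights (as the `bootstrap_sir` sketch above does) and a transition log-density `log_f` vectorized over particle arrays:

```python
# Backward-simulation particle smoother: draw one trajectory from the
# approximate joint smoothing distribution using filtered particles.
import numpy as np

def backward_sample(xs, ws, log_f, rng):
    """xs, ws: (T+1, N) particles/normalized weights from the forward pass."""
    T, N = xs.shape[0] - 1, xs.shape[1]
    traj = np.empty(T + 1)
    i = rng.choice(N, p=ws[T])                # draw x_T from the filtering dist.
    traj[T] = xs[T, i]
    for t in range(T - 1, -1, -1):
        # adjusted weights: w_t^(i) * P(x_{t+1} | x_t^(i)), in log space
        log_w = np.log(ws[t]) + log_f(traj[t + 1], xs[t])
        w = np.exp(log_w - np.max(log_w))
        i = rng.choice(N, p=w / w.sum())
        traj[t] = xs[t, i]
    return traj
```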

MAP estimation

Maximum a posteriori (MAP) estimate:
$$\operatorname*{argmax}_{x_{0:T}} P(x_{0:T} \mid y_{0:T}) = \operatorname*{argmax}_{x_{0:T}}\ P(x_0) \prod_{t=1}^{T} P(x_t \mid x_{t-1}) \prod_{t=0}^{T} P(y_t \mid x_t)$$
Question: can we just choose the particle trajectory with the largest weight?
Answer: NO!
Instead, restrict the states to the discrete particle grid, $x_t \in \{x^{(i)}_t,\ 1 \leq i \leq N\}$. The approximation can then be interpreted as a hidden Markov model with $N$ states, and the MAP estimate can be found using the Viterbi algorithm:
- Keep track of the probability of the most likely path so far
- Keep track of the last state index of the most likely path so far

Viterbi algorithm for MAP estimation

[Figure: trellis over observations and particles with example path probabilities]

Path probability update:
$$\alpha^{(j)}_t = \max_i\ \alpha^{(i)}_{t-1}\, P(x^{(j)}_t \mid x^{(i)}_{t-1})\, P(y_t \mid x^{(j)}_t)$$
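A sketch of the Viterbi recursion over the particle grid; `log_f`, `log_g` and the prior log-density `log_p0` are hypothetical callables assumed to broadcast over numpy arrays:

```python
# Viterbi over the N-state HMM induced by the particle grid: forward max
# recursion with back-pointers, then backtracking of the best path.
import numpy as np

def particle_viterbi(xs, ys, log_f, log_g, log_p0):
    """xs: (T+1, N) particle grid; returns the MAP path on the grid."""
    T, N = xs.shape[0] - 1, xs.shape[1]
    alpha = log_p0(xs[0]) + log_g(ys[0], xs[0])   # log prob of best path to each particle
    back = np.zeros((T + 1, N), dtype=int)        # argmax back-pointers
    for t in range(1, T + 1):
        # scores[i, j] = alpha[i] + log P(x_t^(j) | x_{t-1}^(i))
        scores = alpha[:, None] + log_f(xs[t][None, :], xs[t - 1][:, None])
        back[t] = np.argmax(scores, axis=0)
        alpha = np.max(scores, axis=0) + log_g(ys[t], xs[t])
    path = np.empty(T + 1)
    j = int(np.argmax(alpha))
    for t in range(T, -1, -1):                    # backtrack
        path[t] = xs[t, j]
        j = back[t, j]
    return path
```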

Parameter estimation

Consider SSMs with $P(x_t \mid x_{t-1}, \theta)$ and $P(y_t \mid x_t, \theta)$, where $\theta$ is a static parameter vector that one wishes to estimate. Marginal likelihood:
$$l(y_{0:T} \mid \theta) = \int p(y_{0:T}, x_{0:T} \mid \theta)\, \mathrm{d}x_{0:T}$$
Optimize $l(y_{0:T} \mid \theta)$ using the EM algorithm. E-step:
$$\hat{\tau}(\theta, \theta_k) = \sum_{i=1}^N w^{(i,\theta_k)}_T \sum_{t=0}^{T-1} s_{t,\theta}\big(x^{(i,\theta_k)}_t, x^{(i,\theta_k)}_{t+1}\big)$$
where $s_{t,\theta}(x_t, x_{t+1}) = \log P(x_{t+1} \mid x_t, \theta) + \log P(y_{t+1} \mid x_{t+1}, \theta)$. M-step: optimize $\hat{\tau}(\theta, \theta_k)$ over $\theta$ to update $\theta_k$.
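As a sketch, the E-step quantity can be assembled from weighted particle trajectories as follows; the closure returned here would be handed to a generic optimizer for the M-step (the callables and array layout are assumptions matching the earlier sketches):

```python
# Build tau-hat(theta, theta_k) from trajectories xs (T+1, N) and final
# normalized weights wT produced under theta_k; log_f and log_g take a
# theta argument and are vectorized over the N particles.
import numpy as np

def make_tau_hat(xs, wT, ys, log_f, log_g):
    def tau_hat(theta):
        # per-particle sum over time of s_{t,theta}, then weight by w_T
        s = sum(log_f(xs[t + 1], xs[t], theta) + log_g(ys[t + 1], xs[t + 1], theta)
                for t in range(xs.shape[0] - 1))
        return float(np.sum(wT * s))
    return tau_hat
```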

A generic SMC algorithm (Holenstein, 2009)

We want to sample from a target distribution $\pi(x)$, $x \in \mathcal{X}^p$. Assume we have:
- A sequence of bridging distributions of increasing dimension
$$\{\pi_n(x_{1:n})\}_{n=1}^p = \{\pi_1(x_1), \pi_2(x_1, x_2), \ldots, \pi_p(x_1, \ldots, x_p)\}$$
where $\pi_n(x_{1:n}) = Z_n^{-1}\, \gamma_n(x_{1:n})$ and $\pi_p(x_{1:p}) = \pi(x)$
- A sequence of importance densities on $\mathcal{X}$: $M_1(x_1)$ for the initial sample, and $\{M_n(x_n \mid x_{1:n-1})\}_{n=2}^p$ for extending $x_{1:n-1} \in \mathcal{X}^{n-1}$ by sampling $x_n \in \mathcal{X}$
- A resampling distribution $r(A_n \mid w_n)$, with $A_n \in \{1, \ldots, N\}^N$ and $w_n \in [0,1]^N$, where $A^i_{n-1}$ is the parent at time $n-1$ of particle $X^i_n$

A generic SMC algorithm

[Algorithm listing reproduced from (Holenstein, 2009), spanning two slides]

A generic SMC algorithm (Holenstein, 2009)

Again we can calculate empirical estimates of the target and of the normalization constant (with $\pi(x) = Z^{-1}\gamma(x)$):
$$\hat{\pi}^N(x) = \sum_{i=1}^N w^{(i)}_p\, \delta_{X^{(i)}_p}(x), \qquad \hat{Z}^N = \prod_{n=1}^p \left( \frac{1}{N} \sum_{i=1}^N w_n\big(X^{(i)}_n\big) \right)$$
Convergence can be shown under weak assumptions:
$$\hat{\pi}^N(x) \xrightarrow{\text{a.s.}} \pi(x) \quad \text{and} \quad \hat{Z}^N \xrightarrow{\text{a.s.}} Z \qquad \text{as } N \to \infty$$
The SIR algorithm for state space models is a special case of this generic SMC algorithm.
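Given the unnormalized incremental weights of each SMC step, the $\hat{Z}^N$ estimator is a one-liner; a log-space sketch for numerical stability:

```python
# log Z-hat = sum_n log( (1/N) sum_i w_n^(i) ), with each step's
# unnormalized weights supplied in log space.
import numpy as np

def log_Z_hat(log_w_steps):
    """log_w_steps: iterable of length-N arrays, one per bridging step."""
    total = 0.0
    for lw in log_w_steps:
        m = np.max(lw)                               # stabilize the log-sum-exp
        total += m + np.log(np.mean(np.exp(lw - m)))
    return total
```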

Motivation for Particle MCMC (Holenstein & Doucet, 2007; Holenstein, 2009)

Let's return to the problem of sampling from a target $\pi(x)$, where $x = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$, using MCMC:
- Single-site proposals $q(x'_j \mid x)$ are easy to design, but often lead to slow mixing
- It would be more efficient if we could update larger blocks, but such proposals are harder to construct
- We could use SMC as a proposal distribution!

Particle Metropolis–Hastings Sampler

[Algorithm listing reproduced from (Holenstein, 2009)]

Particle Metropolis–Hastings (PMH) Sampler (Holenstein, 2009)

A standard independent MH algorithm ($q(x' \mid x) = q(x')$) with target $\pi^N$ and proposal $q^N$ defined on an extended space, such that
$$\frac{\pi^N(\cdot)}{q^N(\cdot)} = \frac{\hat{Z}^N}{Z}$$
which leads to the acceptance ratio
$$\alpha = \min\left[1,\ \frac{\hat{Z}^N}{\hat{Z}^N(i-1)}\right]$$
where $\hat{Z}^N$ is the estimate attached to the proposed trajectory and $\hat{Z}^N(i-1)$ the one from the previous iteration. Note that $\alpha \to 1$ as $N \to \infty$, since $\hat{Z}^N \to Z$ as $N \to \infty$.
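As a sketch, here is the closely related particle marginal MH of Andrieu et al. (2010), where an SMC sweep supplies both the proposed trajectory and the $\hat{Z}^N$ entering the acceptance ratio; `log_prior`, `propose` (assumed symmetric) and `run_smc` are hypothetical callables:

```python
# Particle marginal Metropolis-Hastings for static parameters: run_smc(theta)
# returns a sampled trajectory and the SMC estimate log Z-hat(theta).
import numpy as np

def pmh(n_iters, theta0, log_prior, propose, run_smc, rng):
    theta = theta0
    x, logZ = run_smc(theta, rng)
    samples = []
    for _ in range(n_iters):
        theta_p = propose(theta, rng)                # symmetric random walk
        x_p, logZ_p = run_smc(theta_p, rng)          # SMC as proposal over x
        # accept with prob min(1, Zhat' p(theta') / (Zhat p(theta)))
        log_a = logZ_p + log_prior(theta_p) - logZ - log_prior(theta)
        if np.log(rng.uniform()) < log_a:
            theta, x, logZ = theta_p, x_p, logZ_p
        samples.append((theta, x))
    return samples
```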

Particle Gibbs (PG) Sampler (Holenstein, 2009)

Assume that we are interested in sampling from
$$\pi(\theta, x) = \frac{\gamma(\theta, x)}{Z}$$
where sampling from $\pi(\theta \mid x)$ is easy, but sampling from $\pi(x \mid \theta)$ is hard. The PG sampler uses SMC to sample from $\pi(x \mid \theta)$:
1. Sample $\theta(i) \sim \pi(\theta \mid x(i-1))$
2. Sample $x(i) \sim \hat{\pi}^N(x \mid \theta(i))$

What if sampling from $\pi(\theta \mid x)$ is not easy? We can use an MH update for $\theta$.
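A sketch of the PG loop; note that step 2 must be a conditional SMC sweep that retains the previous trajectory as one of its $N$ particles for the sampler to be valid (Andrieu et al., 2010). `sample_theta` and `csmc` are hypothetical callables:

```python
# Particle Gibbs: alternate an exact conditional draw of theta with a
# conditional SMC draw of the trajectory.
def particle_gibbs(n_iters, theta0, x0, sample_theta, csmc, rng):
    theta, x = theta0, x0
    out = []
    for _ in range(n_iters):
        theta = sample_theta(x, rng)   # step 1: theta(i) ~ pi(theta | x(i-1))
        x = csmc(theta, x, rng)        # step 2: x(i) ~ pi-hat^N(x | theta(i)),
                                       # conditioned on keeping x as a particle
        out.append((theta, x))
    return out
```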

Parameter estimation in a state space model using PG (Andrieu et al., 2010)

(Re)consider the nonlinear state space model
$$x_t = \frac{1}{2} x_{t-1} + 25\, \frac{x_{t-1}}{1 + x_{t-1}^2} + 8 \cos(1.2t) + V_t, \qquad y_t = \frac{x_t^2}{20} + W_t$$
where $x_0 \sim \mathcal{N}(0, \sigma_0^2)$, $V_t \sim \mathcal{N}(0, \sigma_V^2)$ and $W_t \sim \mathcal{N}(0, \sigma_W^2)$. Assume that $\theta = (\sigma_V^2, \sigma_W^2)$ is unknown:
- Simulate $y_{1:T}$ for $T = 500$, $\sigma_0^2 = 5$, $\sigma_V^2 = 10$ and $\sigma_W^2 = 1$
- Sample from $P(\theta, x_{1:T} \mid y_{1:T})$ using:
  - a Particle Gibbs sampler, with importance distribution $f_\theta(x_n \mid x_{n-1})$
  - a one-at-a-time MH sampler, with proposal $f_\theta(x_n \mid x_{n-1})$
- The algorithms ran for 50,000 iterations (burn-in of 10,000)
- Vague inverse-gamma priors for $\theta = (\sigma_V^2, \sigma_W^2)$

Parameter estimation in a state space model using PG (Andrieu et al., 2010)

[Figure: posterior traces and histograms for the one-at-a-time Metropolis–Hastings sampler and the Particle Gibbs sampler]

Particle learning for GP regression: motivation

Training a GP on data $\mathcal{D}_{1:n} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ and making predictions:
$$P(y^* \mid \hat{\theta}, \mathcal{D}, x^*) \quad (2) \qquad \text{or} \qquad P(y^* \mid \mathcal{D}, x^*) = \int P(y^* \mid \theta, \mathcal{D}, x^*)\, P(\theta \mid \mathcal{D})\, \mathrm{d}\theta \quad (3)$$
- Estimate the model hyperparameters $\theta_n$ by maximum likelihood (2), or use sampling to find the posterior distribution (3)
- Either way we must find the inverse of the covariance matrix, $K_n^{-1}$, at a computational cost of $O(n^3)$.

Sequential update

Given a new observation pair $(x_{n+1}, y_{n+1})$ that we want to add to the training set, we need to find $K_{n+1}^{-1}$ and re-estimate the hyperparameters $\theta_{n+1}$:
- a naive implementation costs $O(n^3)$
- we need an efficient approach that makes use of the sequential nature of the data.

Particle learning for GP regression (Gramacy & Polson, 2011; Wilkinson, 2014)

Sufficient information for each particle: $S^{(i)}_n = \{K^{(i)}_n, \theta^{(i)}_n\}$. Two-step update based on:
$$P(S_{n+1} \mid \mathcal{D}_{1:n+1}) = \int P(S_{n+1} \mid S_n, \mathcal{D}_{n+1})\, P(S_n \mid \mathcal{D}_{1:n+1})\, \mathrm{d}S_n \propto \int P(S_{n+1} \mid S_n, \mathcal{D}_{n+1})\, P(\mathcal{D}_{n+1} \mid S_n)\, P(S_n \mid \mathcal{D}_{1:n})\, \mathrm{d}S_n$$
1. Resample indices $\{i\}_{i=1}^N$ with replacement to obtain new indices $\{\zeta(i)\}_{i=1}^N$ according to weights
$$w_i \propto P(\mathcal{D}_{n+1} \mid S^{(i)}_n) = P(y_{n+1} \mid x_{n+1}, \mathcal{D}_n, \theta^{(i)}_n)$$
2. Propagate the sufficient information from $S_n$ to $S_{n+1}$:
$$S^{(i)}_{n+1} \sim P(S_{n+1} \mid S^{\zeta(i)}_n, \mathcal{D}_{1:n+1})$$

Propagation

The parameters $\theta_n$ are static and can be deterministically copied from $S^{\zeta(i)}_n$ to $S^{(i)}_{n+1}$. For the covariance matrix, a rank-one update builds $K_{n+1}^{-1}$ from $K_n^{-1}$: with
$$K_{n+1} = \begin{bmatrix} K_n & k(x_{n+1}) \\ k^\top(x_{n+1}) & k(x_{n+1}, x_{n+1}) \end{bmatrix}$$
we have
$$K_{n+1}^{-1} = \begin{bmatrix} K_n^{-1} + g_n(x_{n+1})\, g_n^\top(x_{n+1}) / \mu_n(x_{n+1}) & g_n(x_{n+1}) \\ g_n^\top(x_{n+1}) & \mu_n(x_{n+1}) \end{bmatrix}$$
where
$$g_n(x) = -\mu_n(x)\, K_n^{-1} k(x), \qquad \mu_n(x) = \left[k(x, x) - k^\top(x)\, K_n^{-1} k(x)\right]^{-1}$$
- Use a Cholesky update for stability
- Cost: $O(n^2)$
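A numpy sketch of the rank-one update, checked against a direct inverse; the squared-exponential kernel and the jitter term are illustrative assumptions, while the update itself is kernel-agnostic:

```python
# O(n^2) block update of the inverse covariance when one point is added.
import numpy as np

def kern(a, b, ell=1.0):
    """Squared-exponential kernel on 1-D inputs (illustrative choice)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def rank_one_update(K_inv, k_new, k_self):
    """Build K_{n+1}^{-1} from K_n^{-1}, k = k(X, x_new) and k(x_new, x_new)."""
    mu = 1.0 / (k_self - k_new @ K_inv @ k_new)   # mu_n(x_{n+1})
    g = -mu * (K_inv @ k_new)                     # g_n(x_{n+1})
    n = K_inv.shape[0]
    out = np.empty((n + 1, n + 1))
    out[:n, :n] = K_inv + np.outer(g, g) / mu
    out[:n, n] = out[n, :n] = g
    out[n, n] = mu
    return out

X = np.array([0.1, 0.5, 0.9])
K_inv = np.linalg.inv(kern(X, X) + 1e-8 * np.eye(3))   # jitter for conditioning
x_new = np.array([1.3])
K1 = rank_one_update(K_inv, kern(X, x_new)[:, 0], 1.0 + 1e-8)
X1 = np.append(X, x_new)
assert np.allclose(K1, np.linalg.inv(kern(X1, X1) + 1e-8 * np.eye(4)), atol=1e-5)
```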

Illustrative result 1 - Prediction

[Figure reproduced from (Gramacy & Polson, 2011)]

Illustrative result 2 - Particle locations

[Figure reproduced from (Gramacy & Polson, 2011)]

SMC for learning GP models

Advantages:
- Fast for sequential learning problems

Disadvantages:
- Particle degeneracy/depletion: mitigate by using an MCMC sampler to augment the propagate step and rejuvenate the particles after every m sequential updates
- The predictive distribution given the model hyperparameters needs to be analytically tractable [see the resample step]

A similar treatment for classification can be found in (Gramacy & Polson, 2011).

Summary

SMC is a powerful method for sampling from distributions with a sequential nature:
- Online learning in state space models
- Sampling from high-dimensional distributions
- As a proposal distribution in MCMC

We presented two concrete examples of using SMC:
- Particle Gibbs for sampling from the posterior distributions of the parameters in a nonlinear state space model
- Particle learning of the hyperparameters in a GP model

Thank you for your attention!