LTCC: Advanced Computational Methods in Statistics


1 LTCC: Advanced Computational Methods in Statistics
Advanced Particle Methods & Parameter Estimation for HMMs
N. Kantas

2 Introduction
Particle methods as presented so far can be challenged by:
- weight degeneracy: low observation noise, high dimensions
- path degeneracy: a crucial issue when parameters are unknown
More elaborate (advanced) methods can be effective.
We also need to address parameter estimation, using approaches that are:
- Bayesian or maximum likelihood
- on-line or off-line (batch)

3 Outline
Advanced methods:
- adaptive resampling
- the resample move PF
- the auxiliary particle filter
- SMC for fixed state spaces
Parameter estimation:
- Bayesian or maximum likelihood
- on-line or off-line

4 Recipes to improve performance
There are more elaborate particle filtering algorithms:
- they can work better than the vanilla version in terms of variance of estimators, ESS, accuracy, etc.
- but they do not address path degeneracy due to resampling; they often just mask it or postpone it.
We will look at:
- adaptive resampling,
- the resample move PF,
- the auxiliary particle filter.
Note that one can combine all of the above.

5 Adaptive resampling
While resampling is a key component of a good approximation, it tends to leave early states represented by few particles.
Key idea of adaptive resampling: resample only when you need to.
- Resample only when $\mathrm{ESS}_n \le \alpha N$, e.g. $\alpha = 1/2$.
- When you do not resample, continue with SIS.

6 SIR filter with adaptive resampling
At time $n \ge 1$:
- Sample $X_n^i \sim q(x_n \mid y_n, X_{n-1}^i)$ and set $X_{0:n}^i \leftarrow (X_{0:n-1}^i, X_n^i)$.
- Compute the weights $w_n(X_{n-1:n}^i)$ and set $W_n^i \propto W_{n-1}^i\, w_n(X_{n-1:n}^i)$, $\sum_{i=1}^N W_n^i = 1$.
- If $\mathrm{ESS}_n \le \alpha N$, resample $\{W_n^i, X_{0:n}^i\}$ to obtain $N$ new equally weighted particles $\{\tfrac{1}{N}, \bar X_{0:n}^i\}$, and set $X_{0:n}^i \leftarrow \bar X_{0:n}^i$, $W_n^i \leftarrow \tfrac{1}{N}$.
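The steps above can be sketched in Python for a scalar linear Gaussian model. This is a minimal illustration, not the course's reference implementation: the function names, the model $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$ with $X_0 \sim N(0,1)$, the bootstrap proposal $q = f$ (so that $w_n = g$) and multinomial resampling are all assumptions made for the example. Only the current state is stored, not the full path.

```python
import numpy as np

def ess(W):
    """Effective sample size of normalised weights W."""
    return 1.0 / np.sum(W ** 2)

def adaptive_bootstrap_pf(y, N=500, rho=0.8, tau=1.0, sigma=1.0, alpha=0.5, seed=0):
    """Bootstrap PF that resamples only when ESS_n <= alpha * N.
    Returns the filter means and the ESS at every time step."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)                    # X_0^i ~ N(0, 1) (assumed prior)
    logW = np.full(N, -np.log(N))             # log of normalised weights
    means, ess_trace = [], []
    for n, yn in enumerate(y):
        if n > 0:
            x = rho * x + tau * rng.normal(size=N)    # propose from f
        # W_n ∝ W_{n-1} * g(y_n | x_n); constants cancel on normalisation
        logW = logW - 0.5 * ((yn - x) / sigma) ** 2
        logW -= logW.max()
        logW -= np.log(np.sum(np.exp(logW)))
        W = np.exp(logW)
        means.append(np.sum(W * x))
        ess_trace.append(ess(W))
        if ess_trace[-1] <= alpha * N:                # adaptive resampling rule
            x = x[rng.choice(N, size=N, p=W)]         # multinomial resampling
            logW = np.full(N, -np.log(N))
    return np.array(means), np.array(ess_trace)
```

Setting `alpha=1.0` resamples at essentially every step, which recovers the vanilla bootstrap filter, while `alpha=0.0` gives plain SIS.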

7 The resample move particle filter (Berzuini & Gilks 2001, JRSSB)
Fight path degeneracy by re-inserting lost diversity in the particles, using appropriate MCMC moves on the path space.
At time $n \ge 1$:
- Sample $X_n^i \sim q(x_n \mid y_n, X_{n-1}^i)$ and set $X_{0:n}^i \leftarrow (X_{0:n-1}^i, X_n^i)$.
- Compute the weights $w_n(X_{n-1:n}^i)$ and set $W_n^i \propto w_n(X_{n-1:n}^i)$, $\sum_{i=1}^N W_n^i = 1$.
- Resample $\{W_n^i, X_{0:n}^i\}$ to obtain $N$ new equally weighted particles $\{\tfrac{1}{N}, \bar X_{0:n}^i\}$.
- Move the particles by independently (for each $i$) sampling $X_{0:n}^i \sim K_{MCMC}(\cdot \mid \bar X_{0:n}^i)$.

8 The resample move particle filter
The target density for the MCMC move is
$$p(x_{0:n} \mid y_{0:n}) \propto \mu(x_0) \prod_{k=1}^{n} f(x_k \mid x_{k-1}) \prod_{k=0}^{n} g(y_k \mid x_k),$$
where $\mu$ denotes the initial density.
The MCMC proposal in this context:
- just provides a jitter or shake in the particle population;
- does not need to move the whole trajectory: moving only $X_{n-L+1:n}$ still leads to a correct algorithm;
- a Gibbs update would be very useful if available.
Note: we are not relying on ergodic properties of the MCMC kernel, just invariance; we want to preserve the statistical properties of the sample.

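9 The resample move particle filter using RW-MH
Random walk algorithm for $X_{0:n}^i \sim K_{MCMC}(\cdot \mid \bar X_{0:n}^i)$:
- Set $\xi_{0:n} = \bar X_{0:n}^i$.
- For $m = 1, \dots, M$:
  - Sample $U \sim N(0, S)$, with $S$ of appropriate dimension.
  - Propose $Z_{n-L+1:n} = \xi_{n-L+1:n} + cU$ (with $Z_{n-L} = \xi_{n-L}$ kept fixed).
  - Compute the acceptance ratio
$$\alpha = 1 \wedge \frac{\prod_{k=n-L+1}^{n} f(Z_k \mid Z_{k-1})\, g(y_k \mid Z_k)}{\prod_{k=n-L+1}^{n} f(\xi_k \mid \xi_{k-1})\, g(y_k \mid \xi_k)}.$$
  - With probability $\alpha$ accept and set $\xi_{0:n} \leftarrow (\xi_{0:n-L}, Z_{n-L+1:n})$; otherwise reject the proposal and $\xi_{0:n}$ remains the same.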
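As a rough sketch of the random walk move above, the following Python function applies $M$ Metropolis-Hastings steps to the last $L$ states of a single path. The function names, the scalar linear Gaussian densities `log_f` and `log_g`, the default parameter values and the choice $S = I$ are assumptions for illustration; constants that cancel in the acceptance ratio are dropped.

```python
import numpy as np

def log_f(x_new, x_old, rho=0.8, tau=1.0):
    """log f(x_new | x_old) up to a constant (assumed AR(1) dynamics)."""
    return -0.5 * ((x_new - rho * x_old) / tau) ** 2

def log_g(y, x, sigma=1.0):
    """log g(y | x) up to a constant (assumed Gaussian observation noise)."""
    return -0.5 * ((y - x) / sigma) ** 2

def rw_mh_move(path, y, L=1, M=3, c=0.5, rng=None):
    """Apply M random-walk MH steps to the last L states of one path x_{0:n}.
    Requires n >= L so that x_{n-L} exists and stays fixed."""
    rng = np.random.default_rng() if rng is None else rng
    xi = np.array(path, dtype=float)
    n = len(xi) - 1
    start = n - L + 1

    def log_target(block):
        # block plays the role of x_{n-L+1:n}; its predecessor x_{n-L} is fixed
        prev = np.concatenate(([xi[start - 1]], block[:-1]))
        return np.sum(log_f(block, prev) + log_g(y[start:n + 1], block))

    accepted = 0
    for _ in range(M):
        current = xi[start:n + 1].copy()
        z = current + c * rng.normal(size=L)          # U ~ N(0, I), i.e. S = I
        if np.log(rng.uniform()) < log_target(z) - log_target(current):
            xi[start:n + 1] = z
            accepted += 1
    return xi, accepted
```

In a resample move filter this would be called once per particle right after resampling; tracking `accepted` across particles gives the average acceptance rate used to tune `c`.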

10 The resample move particle filter
- $M$ can be quite small, e.g. 1 to 5.
Tuning:
- Can use the particles to design $S$, e.g. look at the empirical covariance of the particles after resampling.
- $c$ can be tuned to reach a target average acceptance ratio.
- Other MCMC moves are possible: Gibbs, Hybrid Monte Carlo, ...
The method will increase diversity a bit, but notice that it does not affect the weights; it might be more effective to use likelihood-informed proposals and weights.
The last point is related to the auxiliary particle filter of Pitt & Shephard (1999, JASA).

11 The auxiliary particle filter
Resample move and adaptive resampling are meant to mitigate path degeneracy. What if weight degeneracy due to IS is still present?
Consider the Bayesian recursion
$$p(x_{0:n} \mid y_{0:n}) = \frac{1}{Z_n}\, p(x_{0:n-1} \mid y_{0:n-1})\, f(x_n \mid x_{n-1})\, g(y_n \mid x_n), \qquad Z_n = p(y_n \mid y_{0:n-1}).$$
- Bootstrap filter: move with $f(x_n \mid x_{n-1})$ and weight with $g(y_n \mid x_n)$.
- Alternative route: weight with $p(y_n \mid x_{n-1})$ and then move with $p(x_n \mid x_{n-1}, y_n)$.

12 The auxiliary particle filter
Alternative route: weight with $p(y_n \mid x_{n-1})$ and then move with $p(x_n \mid x_{n-1}, y_n)$. Recall
$$p(x_n \mid x_{n-1}, y_n) = \frac{f(x_n \mid x_{n-1})\, g(y_n \mid x_n)}{p(y_n \mid x_{n-1})}.$$
(Pitt & Shephard 1999, JASA)
- Can reverse the steps: move with $p(x_n \mid x_{n-1}, y_n)$ and weight with $p(y_{n+1} \mid x_n)$.
- The optimal $p(x_n \mid x_{n-1}, y_n)$ is not available in practice!
- Can use approximations: move with $q(x_n \mid x_{n-1}, y_n)$ and weight with $q(y_{n+1}, x_n)$.

13 The auxiliary particle filter
On approximations:
- here $q(y_{n+1}, x_n)$ is not required to be a pdf, just an easy-to-evaluate non-negative function of $(x_n, y_{n+1})$;
- it is often called a score function (the name is misleading, as it is also used for the gradient term in parameter estimation);
- $q(x_n \mid x_{n-1}, y_n)$ can be a good importance distribution that takes into account the current observation.

14 The auxiliary particle filter
Instead of the original problem consider the target
$$\pi_n(x_{0:n} \mid y_{0:n}) \propto \mu(x_0)\, g(y_0 \mid x_0)\, q(y_1, x_0) \prod_{k=1}^{n} f(x_k \mid x_{k-1})\, g(y_k \mid x_k)\, \frac{q(y_{k+1}, x_k)}{q(y_k, x_{k-1})}.$$
Note that the product telescopes:
$$q(y_1, x_0) \prod_{k=1}^{n} \frac{q(y_{k+1}, x_k)}{q(y_k, x_{k-1})} = q(y_{n+1}, x_n).$$
This means we are targeting a density twisted with a lookahead:
$$\pi_n(x_{0:n} \mid y_{0:n}) \propto p(x_{0:n} \mid y_{0:n})\, q(y_{n+1}, x_n).$$

15 The auxiliary particle filter
What is the auxiliary PF? It is a PF targeting the twisted density $\pi_n$ using the proposal $q(x_n \mid y_n, x_{n-1})$.
We will implement a PF targeting $\pi_n$ with proposal $q(x_n \mid y_n, x_{n-1})$ and then reweight to get approximations of the original target $p(x_{0:n} \mid y_{0:n})$, which is actually of interest.
Why do we do this?
- the PF for $\pi_n$ is more stable numerically;
- the new likelihood $g(y_n \mid x_n)\, \frac{q(y_{n+1}, x_n)}{q(y_n, x_{n-1})}$ might be less peaky or informative;
- $\pi_n$ is closer to $\pi_{n-1}$.

16 The auxiliary particle filter
So in path space the target is
$$\pi_n(x_{0:n} \mid y_{0:n}) \propto \pi_{n-1}(x_{0:n-1} \mid y_{0:n-1})\, f(x_n \mid x_{n-1})\, g(y_n \mid x_n)\, \frac{q(y_{n+1}, x_n)}{q(y_n, x_{n-1})}$$
and the proposal is
$$q(x_{0:n}) \propto \pi_{n-1}(x_{0:n-1} \mid y_{0:n-1})\, q(x_n \mid y_n, x_{n-1}).$$
This leads to the following weights to propagate the particles:
$$\tilde w_n(x_{n-1:n}) = \frac{f(x_n \mid x_{n-1})\, g(y_n \mid x_n)\, q(y_{n+1}, x_n)}{q(y_n, x_{n-1})\, q(x_n \mid y_n, x_{n-1})} = w_n(x_{n-1:n})\, q(y_{n+1}, x_n).$$

17 The auxiliary particle filter
For convenience we split the evaluation of the propagation weight over two time steps: the part depending on $y_{n+1}$ is evaluated at time $n+1$.
Here we use the notation
$$w_0(x_0) = \frac{g(y_0 \mid x_0)\, \mu(x_0)}{q(x_0 \mid y_0)}, \qquad w_n(x_{n-1:n}) = \frac{g(y_n \mid x_n)\, f(x_n \mid x_{n-1})}{q(x_n, y_n \mid x_{n-1})} \quad \text{for } n \ge 1,$$
where for $n \ge 1$ we denote
$$q(x_n, y_n \mid x_{n-1}) = q(x_n \mid y_n, x_{n-1})\, q(y_n, x_{n-1}).$$
Pitt & Shephard (1999, JASA) recommend using, if available, $q(x_n \mid y_n, x_{n-1}) = p(x_n \mid y_n, x_{n-1})$ and $q(y_n, x_{n-1}) = p(y_n \mid x_{n-1})$, or approximations of them.

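18 The auxiliary particle filter
At time $n = 0$, for all $i \in \{1, \dots, N\}$:
1. Sample $X_0^i \sim q(x_0 \mid y_0)$.
2. Compute $\bar W_1^i \propto w_0(X_0^i)\, q(y_1, X_0^i)$, $\sum_{i=1}^N \bar W_1^i = 1$.
3. Resample $\bar X_0^i \sim \sum_{i=1}^N \bar W_1^i\, \delta_{X_0^i}(dx_0)$.
At time $n \ge 1$, for all $i \in \{1, \dots, N\}$:
1. Sample $X_n^i \sim q(x_n \mid y_n, \bar X_{n-1}^i)$ and set $X_{0:n}^i \leftarrow (\bar X_{0:n-1}^i, X_n^i)$.
2. Compute $\bar W_{n+1}^i \propto w_n(X_{n-1:n}^i)\, q(y_{n+1}, X_n^i)$, $\sum_{i=1}^N \bar W_{n+1}^i = 1$.
3. Resample $\bar X_{0:n}^i \sim \sum_{i=1}^N \bar W_{n+1}^i\, \delta_{X_{0:n}^i}(dx_{0:n})$.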
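A minimal Python sketch of the algorithm above, for the special case where the recommended exact choices are available. It assumes a scalar linear Gaussian model $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$ with $X_0 \sim N(0,1)$, for which $p(y_n \mid x_{n-1}) = N(y_n; \rho x_{n-1}, \tau^2 + \sigma^2)$ and $p(x_n \mid x_{n-1}, y_n)$ is Gaussian in closed form. With these choices $w_n \equiv 1$ for $n \ge 1$, so the reweighted particles are equally weighted. Function names and defaults are hypothetical.

```python
import numpy as np

def norm_logpdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def fully_adapted_apf(y, N=1000, rho=0.8, tau=1.0, sigma=1.0, seed=0):
    """APF with q(x_n | y_n, x_{n-1}) = p(x_n | y_n, x_{n-1}) and
    q(y_n, x_{n-1}) = p(y_n | x_{n-1}). Returns filter means and the
    estimate of log p(y_{0:T})."""
    rng = np.random.default_rng(seed)
    # time 0: sample from p(x_0 | y_0); then w_0 = p(y_0) is constant
    s0 = 1.0 / (1.0 + 1.0 / sigma ** 2)
    x = s0 * y[0] / sigma ** 2 + np.sqrt(s0) * rng.normal(size=N)
    loglik = norm_logpdf(y[0], 0.0, 1.0 + sigma ** 2)
    means = [x.mean()]
    s2 = 1.0 / (1.0 / tau ** 2 + 1.0 / sigma ** 2)    # var of p(x_n | x_{n-1}, y_n)
    for yn in y[1:]:
        # lookahead weights: proportional to q(y_n, X_{n-1}^i) = p(y_n | X_{n-1}^i)
        logq = norm_logpdf(yn, rho * x, tau ** 2 + sigma ** 2)
        m = logq.max()
        w = np.exp(logq - m)
        loglik += m + np.log(w.mean())                # estimate of p(y_n | y_{0:n-1})
        xbar = x[rng.choice(N, size=N, p=w / w.sum())]
        # move with the optimal proposal; w_n = 1, so W_n^i = 1/N
        mean = s2 * (rho * xbar / tau ** 2 + yn / sigma ** 2)
        x = mean + np.sqrt(s2) * rng.normal(size=N)
        means.append(x.mean())
    return np.array(means), loglik
```

For this model the Kalman filter gives the exact filter mean and log-likelihood, which makes it a convenient check; with approximate choices of $q$ the weights $w_n$ are no longer constant and must be carried through the reweighting step.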

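19 The auxiliary particle filter
But note that we want approximations of $p(x_{0:n} \mid y_{0:n})$ and $p(y_n \mid y_{0:n-1})$. These are given by
$$\hat p(dx_{0:n} \mid y_{0:n}) = \sum_{i=1}^{N} W_n^i\, \delta_{X_{0:n}^i}(dx_{0:n}), \qquad (1)$$
$$\hat p(y_n \mid y_{0:n-1}) = \left( \frac{1}{N} \sum_{i=1}^{N} w_n(X_{n-1:n}^i) \right) \left( \sum_{i=1}^{N} W_{n-1}^i\, q(y_n, X_{n-1}^i) \right), \qquad (2)$$
where
$$W_n^i \propto w_n(X_{n-1:n}^i), \qquad \sum_{i=1}^{N} W_n^i = 1, \qquad \text{and} \qquad \hat p(y_0) = \frac{1}{N} \sum_{i=1}^{N} w_0(X_0^i).$$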

20 Discussion
The choice of $w_n$ is convenient for reweighting:
- $w_n(\cdot)$ is used to approximate $\hat p(dx_{0:n} \mid y_{0:n})$;
- $\tilde w_n(\cdot) = w_n(\cdot)\, q(y_{n+1}, x_n)$ is used to weight (propagate) the particles;
- the connection between the two is simply IS.
What are we doing? We are carefully changing the weight so that the algorithm is well behaved, by multiplying with something and dividing by it at the next step.
This can be effective when $X_t$ is high dimensional or $g$ is too informative.

21 Discussion
Neat extension: let $X_k$ be obtained from a discretisation of a continuous process, e.g. via an Euler scheme. Set
$$\frac{q(y_{k+1}, x_k)}{q(y_k, x_{k-1})} = \prod_{m=1}^{M} \frac{r_{k,m}(y_{k+1}, y_k, x_{k,m})}{r_{k,m-1}(y_{k+1}, y_k, x_{k,m-1})}$$
with $X_{k,0} = X_{k-1}$ and $X_{k,M} = X_k$.
Doing the same thing as above means that you perform $M$ intermediate weight-resample steps to process the observation $Y_{k+1}$.
Detailed exposition in (Del Moral, Murray SIAM/ASA UQ).

22 Tempering based approach
Another example that fits this framework is tempering. Consider
$$r_{k,m} = g(y_{k+1} \mid x_{k,m})^{\phi_m}, \qquad r_{k,0} = g(y_k \mid x_{k,0}),$$
with $\phi_M = 1$ and $0 < \phi_1 < \phi_2 < \dots < \phi_M$.
In the presence of dynamics for $x_{k,m}$ (e.g. the discretisation of an SDE) the implementation is as above.

23 Tempering based approach
Some notes:
- can tune $\phi_m$ according to the ESS (adaptive tempering);
- in the absence of dynamics for $x_{k,m}$, use MCMC steps that are invariant to
$$\frac{1}{Z_{k,m}}\, p(x_{0:k-1} \mid y_{0:k-1})\, f(x_k \mid x_{k-1})\, g(y_k \mid x_k)^{\phi_m},$$
  otherwise the method is prone to resampling degeneracy;
- the method can be very effective in high dimensions.
Some references:
- original PF with tempering in Godsill & Clapp 01, based on Neal 01, Jarzynski 97;
- for more recent papers see Jasra, Stephens, Doucet & Tsagaris 01, and K., Beskos & Jasra 14.

24 Discussion: summary
Path degeneracy can be addressed partially by:
- adaptive resampling: applying resampling only when necessary;
- using MCMC moves to jitter the particles and reintroduce lost diversity in the particle approximations.
Note that path degeneracy will still be present!
Weight degeneracy can be addressed by:
- a good selection of importance proposals;
- changing the target sequence to an easier problem, as in the APF;
- introducing an intermediate artificial weighting-resampling sequence, e.g. tempering.
All of the ideas above can be used together to get a very powerful, but also somewhat complicated, algorithm.

25 Homework 4
For the following scalar model
$$X_n = \rho X_{n-1} + \tau V_n, \qquad Y_n = X_n + \sigma W_n, \qquad (3)$$
where $W_n, V_n \overset{iid}{\sim} N(0, 1)$ and $X_0 \sim N(0, 1)$:
1. Synthesise data sets $y_{0:T}$ for $T = 5000$, $\rho = 0.8$, $\tau = 1$ with varying $\sigma = 0.001, 0.01, 0.1, 1, 10$. Store the real state trajectory $x_{0:T}$ for future comparisons in each case.
1.1 Implement the auxiliary PF (APF) for bootstrap or optimal importance proposals.
1.2 Compare with the bootstrap PF and with SIR with the optimal proposal in terms of accuracy for the filter mean and variance, as well as the Monte Carlo variance of the marginal likelihood.
1.3 How small does $\sigma$ need to get so that the APF shows superior performance?
2. For some of these cases, implement the resample move PF for $L = 1$, $M = 3$.
2.1 Plot the ESS for the resample move PF and compare with the APF, the bootstrap PF and the optimal proposal PF.
2.2 Repeat the above using the adaptive resampling PF.

26 SMC for static state spaces
Tempering in the absence of dynamics raises the question of how SMC can be used when the state space is fixed, in contrast to dynamically increasing as in HMMs, e.g. simply $X$ instead of $X_n$.
Example from Bayesian inference:
$$p(x \mid y_{0:n}) \propto p(x) \prod_{k=0}^{n} p(y_k \mid x),$$
or, as a simpler example, $p(x \mid y) \propto p(y \mid x)\, p(x)$ written with tempering, $\phi_0 = 0$, $\phi_n = 1$:
$$p(x \mid y) \propto p(x) \prod_{k=1}^{n} p(y \mid x)^{\phi_k - \phi_{k-1}}.$$

27 SMC for static state spaces
The method is often referred to as SMC samplers or simply SMC.
The answer is:
- at each time $k$ replace the dynamics (in the earlier algorithms, from $q$ or $f$) with MCMC steps invariant to $p(x) \prod_{p=0}^{k} p(y_p \mid x)$;
- can use the particles to tune the MCMC steps, i.e. use an independence sampler or random walks with covariances from the particle approximation;
- there is an interpretation: construct a time-varying target on an artificial state space model with marginal at time $n$ being $p(x \mid y_{0:n})$.
Some references: Chopin 01 Biometrika; Del Moral, Doucet & Jasra 06 JRSSB.
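The tempering construction can be illustrated with a short Python sketch of an SMC sampler. Everything specific here is an assumption for the example: a scalar parameter $x$ with prior $N(0, \texttt{prior\_sd}^2)$, a single observation with likelihood $N(y; x, 1)$, a linear temperature grid, resampling at every step and one random-walk Metropolis-Hastings move per temperature.

```python
import numpy as np

def smc_sampler(y, N=2000, n_temps=20, prior_sd=3.0, step=1.0, seed=0):
    """Tempered SMC sampler for p(x | y) ∝ p(x) p(y | x), going from the
    prior (phi = 0) to the posterior (phi = 1). Returns N particles."""
    rng = np.random.default_rng(seed)
    phis = np.linspace(0.0, 1.0, n_temps + 1)

    def log_prior(x):
        return -0.5 * (x / prior_sd) ** 2

    def log_lik(x):
        return -0.5 * (y - x) ** 2

    x = rng.normal(0.0, prior_sd, size=N)             # particles from the prior
    for k in range(1, n_temps + 1):
        # incremental weight p(y | x)^(phi_k - phi_{k-1})
        logw = (phis[k] - phis[k - 1]) * log_lik(x)
        W = np.exp(logw - logw.max())
        W /= W.sum()
        x = x[rng.choice(N, size=N, p=W)]             # resample
        # one RW-MH step invariant to p(x) p(y | x)^phi_k replaces the dynamics
        z = x + step * rng.normal(size=N)
        log_alpha = (log_prior(z) + phis[k] * log_lik(z)) \
            - (log_prior(x) + phis[k] * log_lik(x))
        accept = np.log(rng.uniform(size=N)) < log_alpha
        x = np.where(accept, z, x)
    return x
```

For this conjugate toy problem the posterior is Gaussian with variance $1/(1/\texttt{prior\_sd}^2 + 1)$ and mean equal to that variance times $y$, so the particle mean and variance can be compared with the exact values.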

28 Introduction to parameter estimation
So far:
- we have managed to get a very good approximation of $p_\theta(x_n \mid y_{0:n})$;
- in this case path degeneracy does not matter.
Is this useful? Yes, we can track the unknown ship in the sea, but only when $\theta$ is known.
So how do we estimate $\theta$?
- this problem is known as parameter inference for HMMs, model calibration, or system identification;
- it is crucial in practice: you cannot do filtering/prediction/smoothing without $\theta$;
- often ad-hoc calibration methods are used.

29 Introduction to parameter estimation
We are interested in principled inferential methods or procedures:
- Bayesian
- maximum likelihood
Inference can be performed either:
- on-line
- batch (or offline)
We need to use PFs within algorithms that are meant to perform inference for $\theta$.

30 Introduction to parameter estimation
Some algorithms:
Likelihood methods
- optimisation based
- gradient based
- expectation maximisation
Bayesian methods
- naive approach: augment the state $x_{0:n}$ with $\theta$ and do filtering
- pseudo-marginal MCMC methods: Particle MCMC, Particle Gibbs
- nested SMC approach: SMC$^2$

31 Reading List Read introductory Particle MCMC book chapter by Andrieu, Doucet and Holenstein holenstein_pmcmc_mcqmc.pdf Have a look at a review on parameter estimation: singh_maciejowski_tutorialparameterestimation.pdf

32 Bayesian Inference
The parameter $\theta$ is a random variable and $Y$ is some dataset.
Bayes rule: posterior $\propto$ likelihood $\times$ prior,
$$p(\theta \mid Y) \propto p(Y \mid \theta)\, p(\theta).$$
Markov chain Monte Carlo (MCMC): obtain samples of $\theta$ using an appropriate ergodic Markov chain $\{\theta(k)\}_{k \ge 0}$ with stationary distribution $p(\theta \mid Y)$.

33 Bayesian inference for HMMs
Choose a suitable prior density $p(\theta)$ for $\theta$.
Approximate $p(\theta \mid y_{0:n})$, which is given by
$$p(\theta \mid y_{0:n}) \propto p_\theta(y_{0:n})\, p(\theta). \qquad (4)$$
Off-line case: compute the joint posterior density $p(x_{0:T}, \theta \mid y_{0:T})$.
On-line or sequential case: compute the sequence of posterior densities $\{p(x_{0:n}, \theta \mid y_{0:n})\}_{n \ge 0}$; on-line also means the same quality at every time with fixed computational/memory cost.

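34 Generic Metropolis Hastings for sampling $p(\theta \mid Y)$
Sample $\theta(0) \sim p(\theta)$. At iteration $k \ge 1$:
- Sample a proposal $\theta' \sim q(\cdot \mid \theta(k-1))$.
- Compute the acceptance ratio
$$\alpha(\theta, \theta') = 1 \wedge \frac{p(Y \mid \theta')\, p(\theta')\, q(\theta(k-1) \mid \theta')}{p(Y \mid \theta(k-1))\, p(\theta(k-1))\, q(\theta' \mid \theta(k-1))}.$$
- With probability $\alpha(\theta, \theta')$ accept the proposal, setting $\theta(k) = \theta'$; otherwise reject it and set $\theta(k) = \theta(k-1)$.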
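A minimal Python sketch of the iteration above, assuming a scalar parameter and a symmetric Gaussian random walk proposal, so that the ratio of proposal densities $q$ cancels in the acceptance ratio. The function name, the step size and the user-supplied `log_post` (the log of $p(Y \mid \theta)\, p(\theta)$ up to a constant) are hypothetical choices for illustration.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings for a scalar parameter.
    log_post(theta) returns log p(Y | theta) + log p(theta) up to a constant."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter + 1)
    chain[0] = theta0
    lp = log_post(theta0)
    for k in range(1, n_iter + 1):
        prop = chain[k - 1] + step * rng.normal()     # symmetric proposal
        lp_prop = log_post(prop)
        # accept with probability 1 ∧ exp(lp_prop - lp)
        if np.log(rng.uniform()) < lp_prop - lp:
            chain[k], lp = prop, lp_prop
        else:
            chain[k] = chain[k - 1]
    return chain
```

As a usage example, for data $Y_j \sim N(\theta, 1)$ with a $N(0, 10^2)$ prior one could pass `lambda th: -0.5 * np.sum((data - th) ** 2) - 0.5 * (th / 10.0) ** 2`, and compare the chain average after burn-in with the exact conjugate posterior mean.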

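35 Metropolis Hastings for HMMs
Sample $\theta(0) \sim p(\theta)$. At iteration $k \ge 1$:
- Sample a proposal $\theta' \sim q(\cdot \mid \theta)$, where $\theta = \theta(k-1)$.
- Compute the acceptance ratio
$$\alpha(\theta, \theta') = 1 \wedge \frac{p_{\theta'}(y_{0:T})\, p(\theta')\, q(\theta \mid \theta')}{p_\theta(y_{0:T})\, p(\theta)\, q(\theta' \mid \theta)}.$$
- With probability $\alpha(\theta, \theta')$ accept the proposal, setting $\theta(k) = \theta'$; otherwise reject it and set $\theta(k) = \theta(k-1)$.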

36 Metropolis Hastings for HMMs
This is hard to implement directly, as $p_{\theta'}(y_{0:T})$ is intractable.
Could use
$$p_\theta(x_{0:T}, y_{0:T}) = \mu_\theta(x_0) \prod_{k=1}^{T} f_\theta(x_k \mid x_{k-1}) \prod_{k=0}^{T} g_\theta(y_k \mid x_k)$$
to design a sampler targeting $p(x_{0:T}, \theta \mid y_{0:T})$.
This approach is usually inefficient:
- mixing could deteriorate rapidly with $T$;
- the path $x_{0:T}$ is strongly correlated;
- it is difficult to find a useful hierarchical structure or conditional independencies.

37 Metropolis Hastings for HMMs
Take a pseudo-marginal approach (Andrieu & Roberts 2009): choose appropriate auxiliary variables.
Consider instead sampling from $p(x_{0:T}, \theta \mid y_{0:T})$ and then integrating out $x_{0:T}$: the ideal marginal Metropolis sampler.
Marginalising $x_{0:T}$ means running an MCMC chain targeting $p(x_{0:T}, \theta \mid y_{0:T})$ and using only the generated $\theta$-s for Monte Carlo approximations.

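38 Ideal Marginal Metropolis-Hastings sampler
The ideal MMH sampler would utilize the following proposal density:
$$q\big((x_{0:T}', \theta') \mid (x_{0:T}, \theta)\big) = q(\theta' \mid \theta)\, p_{\theta'}(x_{0:T}' \mid y_{0:T}). \qquad (5)$$
The acceptance probability is
$$1 \wedge \frac{p(x_{0:T}', \theta' \mid y_{0:T})\, q\big((x_{0:T}, \theta) \mid (x_{0:T}', \theta')\big)}{p(x_{0:T}, \theta \mid y_{0:T})\, q\big((x_{0:T}', \theta') \mid (x_{0:T}, \theta)\big)} = 1 \wedge \frac{p_{\theta'}(y_{0:T})\, p(\theta')\, q(\theta \mid \theta')}{p_\theta(y_{0:T})\, p(\theta)\, q(\theta' \mid \theta)} = 1 \wedge \frac{Z_T'\, p(\theta')\, q(\theta \mid \theta')}{Z_T\, p(\theta)\, q(\theta' \mid \theta)},$$
where $Z_T = p_\theta(y_{0:T})$ and $Z_T' = p_{\theta'}(y_{0:T})$.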

39 Marginal Metropolis-Hastings sampler
We cannot sample exactly from $p_\theta(x_{0:T} \mid y_{0:T})$ and we cannot compute the terms $Z_T$ and $Z_T'$.
A sampler with particle approximations for $Z_T$ and $Z_T'$ has the same marginal as the ideal MMH sampler:
- it is a pseudo-marginal sampler targeting
$$p\big(x_0^1, \dots, x_0^N, x_1^1, \dots, x_T^N, O_1(1), \dots, O_1(N), \dots, O_T(N), x_{0:T}, \theta \mid y_{0:T}\big);$$
- all the variables used to construct the SMC algorithm, $\{X_n^i, O_n(i)\}_{i=1,\dots,N;\ n=1,\dots,T}$, can be included together with $X_{0:T}$ as auxiliary variables and then integrated out;
- validity of the algorithm is based on unbiasedness of the likelihood estimate, $E_N[\hat p_{\theta'}(y_{0:T})] = p_{\theta'}(y_{0:T})$.
See the particle MCMC paper of Andrieu, Doucet and Holenstein (2010).

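40 Particle Marginal Metropolis-Hastings (PMMH) sampler
At iteration $k = 0$:
- Sample $\theta(0) \sim p(\theta)$.
- Run an SMC algorithm targeting $p_{\theta(0)}(x_{0:T} \mid y_{0:T})$, sample $X_{0:T}(0) \sim \hat p(dx_{0:T} \mid y_{0:T}, \theta(0))$, and compute the estimate $\hat Z_T(\theta(0))$.
At iteration $k \ge 1$:
- Sample a proposal $\theta' \sim q(\cdot \mid \theta(k-1))$.
- Run an SMC algorithm targeting $p_{\theta'}(x_{0:T} \mid y_{0:T})$, sample $X_{0:T}' \sim \hat p(dx_{0:T} \mid y_{0:T}, \theta')$, and compute the estimate $\hat Z_T(\theta')$.
- Set $\theta(k) = \theta'$, $X_{0:T}(k) = X_{0:T}'$ with probability
$$1 \wedge \frac{\hat Z_T(\theta')\, p(\theta')\, q(\theta(k-1) \mid \theta')}{\hat Z_T(\theta(k-1))\, p(\theta(k-1))\, q(\theta' \mid \theta(k-1))},$$
  otherwise set $\theta(k) = \theta(k-1)$, $X_{0:T}(k) = X_{0:T}(k-1)$.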
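The following Python sketch shows the structure of PMMH for one unknown parameter. It is an illustration under several assumptions that are not part of the slides: a scalar linear Gaussian model $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$ with $\tau$, $\sigma$ known and $\theta = \rho$; a uniform prior on $(-1, 1)$; a bootstrap PF with resampling at every step to estimate the likelihood; and a symmetric random walk proposal. For brevity it tracks only the $\theta$-chain and does not store the sampled path $X_{0:T}(k)$.

```python
import numpy as np

def pf_loglik(y, rho, tau, sigma, N, rng):
    """Bootstrap PF estimate of log p_theta(y_{0:T}); the estimate of the
    likelihood itself (not its log) is unbiased."""
    x = rng.normal(size=N)                            # X_0 ~ N(0, 1) (assumed)
    ll = 0.0
    for n, yn in enumerate(y):
        if n > 0:
            x = rho * x + tau * rng.normal(size=N)
        logw = -0.5 * np.log(2 * np.pi * sigma ** 2) - 0.5 * ((yn - x) / sigma) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                    # log of estimated p(y_n | y_{0:n-1})
        x = x[rng.choice(N, size=N, p=w / w.sum())]
    return ll

def pmmh(y, n_iter=300, N=200, theta0=0.5, step=0.1, tau=1.0, sigma=1.0, seed=0):
    """PMMH chain for theta = rho with a uniform prior on (-1, 1)."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter + 1)
    chain[0] = theta0
    ll = pf_loglik(y, theta0, tau, sigma, N, rng)     # estimate kept for current state
    for k in range(1, n_iter + 1):
        prop = chain[k - 1] + step * rng.normal()     # symmetric proposal, q cancels
        chain[k] = chain[k - 1]
        if abs(prop) < 1.0:                           # zero prior density outside (-1, 1)
            ll_prop = pf_loglik(y, prop, tau, sigma, N, rng)
            if np.log(rng.uniform()) < ll_prop - ll:
                chain[k], ll = prop, ll_prop
    return chain
```

Note that the likelihood estimate of the current state is stored and reused, never recomputed; refreshing it at each iteration would break the pseudo-marginal argument.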

41 Particle Marginal Metropolis-Hastings (PMMH) sampler
The remarkable feature of this algorithm is that the invariant distribution of the Markov chain $\{X_{0:T}(k), \theta(k)\}$ is $p(x_{0:T}, \theta \mid y_{0:T})$ whatever the value of $N$.
- SMC approximations do not introduce any bias.
- Minimal tuning is required compared to usual MCMC.
- The higher $N$, the better the mixing properties of the algorithm; the tradeoff with the added computational cost could be balanced.
- Under favorable mixing assumptions, the variance of the acceptance rate of the PMMH sampler is proportional to $T/N$, so $N$ should roughly increase linearly with $T$, giving a computational cost of $O(T^2)$; this can potentially be relaxed.

42 Online Bayesian estimation
Introduce the extended state $\tilde X_n = (X_n, \theta_n)$ with initial density $p(\theta_0)\, \mu_{\theta_0}(x_0)$. The transition density is
$$f_{\theta_n}(x_n \mid x_{n-1})\, \delta_{\theta_{n-1}}(\theta_n),$$
i.e. $\theta_n = \theta_{n-1}$.
Applying a standard SMC algorithm to the Markov process $\{\tilde X_n\}_{n \ge 0}$:
- the parameter space would only be explored at the initialisation of the algorithm;
- because of the successive resampling steps, after a certain time $n$ the approximation $\hat p(d\theta \mid y_{0:n})$ will only contain a single unique value for $\theta$;
- it implicitly requires approximating $p_{\theta^{(i)}}(y_{0:n})$ for all the particles $\theta^{(i)}$ approximating $p(\theta \mid y_{0:n})$, hence we expect estimates whose variance will increase at least linearly with $n$.

43 Online Bayesian estimation
Pragmatic solutions:
- use artificial dynamics (Liu and West 2001, Hurzeler and Kunsch 2001); a simple example is $\theta_n = \theta_{n-1} + \varepsilon_n$, with $\varepsilon_n$ being zero mean noise with small variance; the variance can be tuned from the particles;
- can also use fixed lag approximations (Polson et al 2008): stop resampling before $n - L$.

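44 Online Bayesian estimation
Resample Move (Gilks and Berzuini 2001): use an MCMC kernel with invariant density $p(x_{0:n}, \theta \mid y_{0:n})$, i.e.
$$\big(X_{0:n}^{(i)}, \theta_n^{(i)}\big) \sim K_n\big(\cdot \mid \bar X_{0:n}^i, \bar\theta_n^i\big),$$
where by construction $K_n$ satisfies
$$p(x_{0:n}', \theta' \mid y_{0:n}) = \int p(x_{0:n}, \theta \mid y_{0:n})\, K_n(x_{0:n}', \theta' \mid x_{0:n}, \theta)\, d(x_{0:n}, \theta).$$
In practice set $X_{0:n-L}^{(i)} = \bar X_{0:n-L}^i$ for some integer $L \ge 1$ and only sample $\theta_n^{(i)}$ and possibly $X_{n-L+1:n}^{(i)}$.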

45 Resample Move
In some cases we can use a Gibbs step to update the parameter values:
$$K_n(x_{0:n}', \theta' \mid x_{0:n}, \theta) = \delta_{x_{0:n}}(x_{0:n}')\, p(\theta' \mid x_{0:n}, y_{0:n}),$$
where $p(\theta \mid y_{0:n}, x_{0:n}) = p(\theta \mid s_n(x_{0:n}, y_{0:n}))$ with $s_n(x_{0:n}, y_{0:n})$ a fixed-dimension sufficient statistic.
With some variation this has appeared many times: Andrieu et al 1999, Fearnhead 2002, Storvik 2002, Johannes and Polson.
- Elegant, but still not robust, since it relies on SMC approximations of $p(s_n(x_{0:n}, y_{0:n}) \mid y_{0:n})$, and for fixed $N$ the error increases with $n$; the issue is path degeneracy.
- Unsuitable for high dimensions (> 5-10).

46 Numerical example
We will again use
$$X_n = \rho X_{n-1} + \tau W_n, \qquad Y_n = X_n + \sigma V_n, \qquad (6)$$
where $W_n, V_n \overset{iid}{\sim} N(0, 1)$.
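For experimenting with this model, a short Python sketch for simulating a state trajectory and observations is given below. The function name and defaults are hypothetical, and the initial condition $X_0 \sim N(0, 1)$ is taken from the homework slides rather than from this one.

```python
import numpy as np

def simulate_lgssm(T, rho=0.8, tau=1.0, sigma=1.0, seed=0):
    """Simulate x_{0:T} and y_{0:T} from
    X_n = rho X_{n-1} + tau W_n,  Y_n = X_n + sigma V_n,  X_0 ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(T + 1)
    x[0] = rng.normal()
    for n in range(1, T + 1):
        x[n] = rho * x[n - 1] + tau * rng.normal()
    y = x + sigma * rng.normal(size=T + 1)
    return x, y
```

Keeping the returned `x` alongside `y` makes it possible to compare filter and smoother estimates with the true states later.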

47 Numerical example: on-line inference
[Figure: Particle method with MCMC, $\theta = (\rho, \sigma^2)$. Panels show the estimated posterior pdf at successive times up to $n = 5000$; axis labels: $\sigma$, $\rho$.]

48 Numerical example: on-line inference
[Figure: Estimated marginal posterior densities for $\theta = (\rho, \sigma^2)$ with $T = 10^3$ over 50 runs (black-dotted) versus ground truth (green). Top: particle method with MCMC. Bottom: Particle Gibbs with 3000 iterations and $N = 50$.]

49 Likelihood estimation methods with particle filtering
Some algorithms:
Likelihood methods
- optimisation based
- gradient based
- expectation maximisation
These can be offline or online:
- we will focus on offline methods;
- we only sketch on-line ones to give the very basic idea.

50 Maximum Likelihood based methods
Off-line case: estimate $\theta$ as the maximizing argument of the marginal likelihood of the observed data,
$$\hat\theta = \arg\max_{\theta \in \Theta} \ell_T(\theta), \qquad (7)$$
where
$$\ell_T(\theta) = \log p_\theta(y_{0:T}). \qquad (8)$$
Online case:
- use a recursive method;
- let $\theta_n$ be the estimate of the model parameter after $n - 1$ observations;
- update the estimate to $\theta_{n+1}$ after receiving the new data $y_n$.

51 Offline Maximum Likelihood based methods
Off-line case: estimate $\theta$ as
$$\hat\theta = \arg\max_{\theta \in \Theta} \hat\ell_T(\theta), \qquad (9)$$
where $\hat\ell_T(\theta) = \widehat{\log p_\theta}(y_{0:T})$.
- Can use direct optimisation: a grid on $\theta$, BFGS, or other popular optimisation methods.
- This is difficult due to the variance of $\hat p_\theta(y_{0:T})$.

52 On the Monte Carlo variance of $\hat p_\theta(y_{0:T})$
Recall that SMC results in unbiased estimation of the marginal likelihood:
$$E_N[\hat p_\theta(y_{0:T})] = p_\theta(y_{0:T}).$$
Loosely speaking, $\hat p_\theta(y_{0:T}) = p_\theta(y_{0:T}) + V$, with $V$ some non-trivial zero mean noise depending on $T$, $N$ and the model.
- Recall that $\hat p_\theta(y_{0:n})$ has a relative (non-asymptotic) variance that increases linearly with $n$.
- The Monte Carlo variability is quite an issue for finding the maximum over $\theta$.

53 Approximating $\log p_\theta(y_{0:T})$
Note that $E_N[\hat p_\theta(y_{0:T})] = p_\theta(y_{0:T})$ implies that
$$E_N[\log \hat p_\theta(y_{0:T})] \ne \log p_\theta(y_{0:T}).$$
So $\log \hat p_\theta(y_{0:T})$ is a biased estimator. Can we correct for the bias?

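54 Approximating $\log p_\theta(y_{0:T})$
Can use a bias correction based on a Taylor series:
$$\log(Z) = \log Z_0 + \frac{1}{Z_0}(Z - Z_0) - \frac{1}{2 Z_0^2}(Z - Z_0)^2 + O\big((Z - Z_0)^3\big).$$
Let $Z_0 = E[Z]$; then, ignoring higher order terms,
$$E[\log(Z)] \approx \log E[Z] - \frac{1}{2 E[Z]^2} \mathrm{Var}[Z].$$
What we have is $Z = \hat Z = \hat p_\theta(y_{0:T})$ and $Z_0 = p_\theta(y_{0:T})$, so
$$E[\log \hat p_\theta(y_{0:T})] \approx \log p_\theta(y_{0:T}) - \frac{\mathrm{Var}[\hat p_\theta(y_{0:T})]}{2\, p_\theta(y_{0:T})^2}.$$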

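55 Approximating $\log p_\theta(y_{0:T})$
Note from slides 1 or 3:
$$\frac{\mathrm{Var}[\hat p_\theta(y_{0:T})]}{p_\theta(y_{0:T})^2} \approx \frac{1}{N} \left( \int \frac{p_\theta(x_{0:T} \mid y_{0:T})}{q(x_{0:T})}\, p_\theta(x_{0:T} \mid y_{0:T})\, dx_{0:T} - 1 \right)$$
$$= \frac{1}{N} \left( \int w(x_{0:T})\, p_\theta(x_{0:T} \mid y_{0:T})\, dx_{0:T} - 1 \right)$$
$$= \frac{1}{N} \left( \int \Big( \prod_{n=0}^{T} w_n(x_{n-1:n}) \Big)\, p_\theta(x_{0:T} \mid y_{0:T})\, dx_{0:T} - 1 \right).$$
Let $\hat W$ be the particle approximation of $\int \big( \prod_{n=0}^{T} w_n(x_{n-1:n}) \big)\, p_\theta(x_{0:T} \mid y_{0:T})\, dx_{0:T}$.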

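56 Approximating $\log p_\theta(y_{0:T})$
We then get
$$E[\log \hat p_\theta(y_{0:T})] \approx \log p_\theta(y_{0:T}) - \frac{\hat W - 1}{2N}.$$
So we can use
$$\widehat{\log p_\theta}(y_{0:T}) = \log \hat p_\theta(y_{0:T}) + \frac{\hat W - 1}{2N}$$
as a bias-reduced estimator for $\ell_T(\theta)$.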

57 Optimising $\log p_\theta(y_{0:T})$ w.r.t. $\theta$
Still, $\hat\ell_T(\theta) = \widehat{\log p_\theta}(y_{0:T})$ will exhibit quite a bit of variance. This can make finding the maximum difficult.
Potential remedies:
- smooth the approximation as a function of $\theta$;
- use a different resampling scheme (Pitt 02, Lee 10);
- try to reduce the variance with multiple runs.

58 Expectation Maximisation
The Expectation Maximization (EM) algorithm is a very popular alternative procedure for maximizing $\ell_T(\theta)$.
At iteration $k + 1$, we set
$$\theta_{k+1} = \arg\max_{\theta} Q(\theta_k, \theta), \qquad (10)$$
where
$$Q(\theta_k, \theta) = \int \log p_\theta(x_{0:T}, y_{0:T})\, p_{\theta_k}(x_{0:T} \mid y_{0:T})\, dx_{0:T}. \qquad (11)$$
The sequence $\{\ell_T(\theta_k)\}_{k \ge 0}$ generated by this algorithm is non-decreasing.

59 Expectation Maximisation
In particular, if $p_\theta(x_{0:T}, y_{0:T})$ belongs to the exponential family, then:
- the EM consists of computing an $n_s$-dimensional summary statistic like $S_n^\theta$;
- the maximizing argument of $Q(\theta_k, \theta)$ can be characterized explicitly through a suitable function $\Lambda : \mathbb{R}^{n_s} \to \Theta$, i.e.
$$\theta_{k+1} = \Lambda\big(S_T^{\theta_k}\big). \qquad (12)$$
The particle implementation consists of computing $S_n^{\theta_k}$.

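60 Additive functionals $S_n^\theta$
$S_n^\theta$ is an additive functional:
$$S_n^\theta = \int \left[ \sum_{k=0}^{n} s_k(x_{k-1:k}) \right] p_\theta(x_{0:n} \mid y_{0:n})\, dx_{0:n}. \qquad (13)$$
Theory tells us that the asymptotic variance of the SMC estimate
$$\hat S_n^\theta = \int \left[ \sum_{k=0}^{n} s_k(x_{k-1:k}) \right] \hat p_\theta(dx_{0:n} \mid y_{0:n}) \qquad (14)$$
satisfies
$$V\big[\hat S_n^\theta\big] \ge D_\theta \frac{n^2}{N}, \qquad (15)$$
even with exponential filter stability.
This motivates the use of dedicated smoothing algorithms.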

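61 Gradient ascent
The log-likelihood may be maximized with the following steepest ascent algorithm: at iteration $k + 1$,
$$\theta_{k+1} = \theta_k + \gamma_{k+1}\, \nabla_\theta \ell_T(\theta) \big|_{\theta = \theta_k}. \qquad (16)$$
- $\{\gamma_k\}_{k \ge 1}$ needs to satisfy $\sum_k \gamma_k = \infty$ and $\sum_k \gamma_k^2 < \infty$.
- One could also use the Hessian, but this is omitted for simplicity.
To obtain the score vector $\nabla_\theta \ell_T(\theta)$ we can use Fisher's identity:
$$\nabla_\theta \log p_\theta(y_{0:n}) = \int \nabla_\theta \log p_\theta(x_{0:n}, y_{0:n})\, p_\theta(x_{0:n} \mid y_{0:n})\, dx_{0:n}.$$
The latter is of the form of $S_n^\theta$ again.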

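62 Gradient ascent
We have
$$\nabla_\theta \log p_\theta(x_{0:n}, y_{0:n}) = \nabla_\theta \log \prod_{p=0}^{n} f_\theta(x_p \mid x_{p-1})\, g_\theta(y_p \mid x_p) = \sum_{p=0}^{n} \big( \nabla_\theta \log f_\theta(x_p \mid x_{p-1}) + \nabla_\theta \log g_\theta(y_p \mid x_p) \big),$$
with the convention $f_\theta(x_0 \mid x_{-1}) = \mu_\theta(x_0)$.
Define
$$s_p(x_{p-1:p}) = \nabla_\theta \log f_\theta(x_p \mid x_{p-1}) + \nabla_\theta \log g_\theta(y_p \mid x_p).$$
Then $\nabla_\theta \log p_\theta(y_{0:n})$ is of the form of $S_n^\theta$ again.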

63 Smoothing algorithms
We are essentially interested in designing better particle approximations for $\{p_\theta(x_n \mid y_{0:T})\}_{n=0}^{T}$.
Some popular approaches:
- fixed lag smoothing
- forward filtering backward sampling
- forward filtering backward smoothing

64 Fixed lag smoothing
For state-space models with good forgetting properties, if $L$ is large enough then
$$p_\theta(x_{0:n} \mid y_{0:T}) \approx p_\theta\big(x_{0:n} \mid y_{0:(n+L) \wedge T}\big):$$
observations collected at times $k > n + L$ do not bring any significant additional information about $X_{0:n}$.
Fixed lag approximation (Kitagawa & Sato 2001): do not resample the components $X_{0:n}^i$ of the particles $X_{0:k}^i$ obtained by particle filtering at times $k > n + L$.
This could work in practice, but the method is asymptotically biased and it might be hard to tune $L$.

65 Forward-Backward Smoothing using sampling
Backward interpretation: the joint smoothing distribution $p_\theta(x_{0:T} \mid y_{0:T})$ can be expressed as a function of the filtering distributions $\{p_\theta(x_n \mid y_{0:n})\}_{n=0}^{T}$ as follows:
$$p_\theta(x_{0:T} \mid y_{0:T}) = p_\theta(x_T \mid y_{0:T}) \prod_{n=0}^{T-1} p_\theta(x_n \mid y_{0:n}, x_{n+1}), \qquad (17)$$
where
$$p_\theta(x_n \mid y_{0:n}, x_{n+1}) = \frac{f_\theta(x_{n+1} \mid x_n)\, p_\theta(x_n \mid y_{0:n})}{p_\theta(x_{n+1} \mid y_{0:n})}. \qquad (18)$$

66 Particle Implementation
Forward Filtering Backward Sampling (FFBSa):
- Run a particle filter from time $n = 0$ to $T$, storing the approximate filtering distributions $\{\hat p_\theta(dx_n \mid y_{0:n})\}_{n=0}^{T}$.
- Sample $X_T \sim \hat p_\theta(dx_T \mid y_{0:T})$ and, for $n = T-1, T-2, \dots, 0$, sample $X_n \sim \hat p_\theta(dx_n \mid y_{0:n}, X_{n+1})$, where this distribution is obtained by substituting $\hat p_\theta(dx_n \mid y_{0:n})$ for $p_\theta(dx_n \mid y_{0:n})$ in (18):
$$\hat p_\theta(dx_n \mid y_{0:n}, X_{n+1}) = \frac{\sum_{i=1}^{N} W_n^i\, f_\theta(X_{n+1} \mid X_n^i)\, \delta_{X_n^i}(dx_n)}{\sum_{i=1}^{N} W_n^i\, f_\theta(X_{n+1} \mid X_n^i)}. \qquad (19)$$
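The backward pass of FFBSa can be sketched in Python as follows. The sketch assumes that a forward particle filter has already stored the particles `X[n, i]` and normalised filter weights `W[n, i]` as arrays of shape `(T + 1, N)`, and that the transition density is the scalar Gaussian AR(1) density $f(x_{n+1} \mid x_n) = N(x_{n+1}; \rho x_n, \tau^2)$; these names and the model are assumptions for illustration.

```python
import numpy as np

def backward_sample(X, W, rho=0.8, tau=1.0, rng=None):
    """Draw one trajectory from the FFBSa approximation of p(x_{0:T} | y_{0:T}).
    X, W: arrays of shape (T + 1, N) with filter particles and normalised weights."""
    rng = np.random.default_rng() if rng is None else rng
    T1, N = X.shape
    path = np.empty(T1)
    path[-1] = X[-1, rng.choice(N, p=W[-1])]          # X_T from the final filter
    for n in range(T1 - 2, -1, -1):
        # backward weights ∝ W_n^i f(X_{n+1} | X_n^i), as in (19)
        logf = -0.5 * ((path[n + 1] - rho * X[n]) / tau) ** 2
        w = W[n] * np.exp(logf - logf.max())
        w /= w.sum()
        path[n] = X[n, rng.choice(N, p=w)]
    return path
```

Each call costs $O(NT)$ operations, so drawing $N$ backward trajectories costs $O(N^2 T)$ in total.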

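67 Forward-Backward Smoothing
A backward in time recursion for $\{p_\theta(x_n \mid y_{0:T})\}_{n=0}^{T}$ follows by integrating out $x_{0:n-1}$ and $x_{n+1:T}$ in (17) while applying (18):
$$p_\theta(x_n \mid y_{0:T}) = \int p_\theta(x_n, x_{n+1} \mid y_{0:T})\, dx_{n+1}$$
$$= \int p_\theta(x_n \mid y_{0:n}, x_{n+1})\, p_\theta(x_{n+1} \mid y_{0:T})\, dx_{n+1}$$
$$= \int \frac{f_\theta(x_{n+1} \mid x_n)\, p_\theta(x_n \mid y_{0:n})}{p_\theta(x_{n+1} \mid y_{0:n})}\, p_\theta(x_{n+1} \mid y_{0:T})\, dx_{n+1}.$$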

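68 Forward-Backward Smoothing
So the backward in time recursion for $\{p_\theta(x_n \mid y_{0:T})\}_{n=0}^{T}$ is
$$p_\theta(x_n \mid y_{0:T}) = p_\theta(x_n \mid y_{0:n}) \int \frac{f_\theta(x_{n+1} \mid x_n)\, p_\theta(x_{n+1} \mid y_{0:T})}{p_\theta(x_{n+1} \mid y_{0:n})}\, dx_{n+1}. \qquad (20)$$
So $\{p_\theta(x_n \mid y_{0:n})\}_{n=0}^{T}$ can be used in a backward pass to obtain $\{p_\theta(x_n \mid y_{0:T})\}_{n=0}^{T}$ and $\{p_\theta(x_n \mid y_{0:n}, x_{n+1})\}_{n=0}^{T-1}$.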

69 Particle Implementation
Forward Filtering Backward Smoothing (FFBSm): assume we have an approximation
$$\hat p_\theta(dx_{n+1} \mid y_{0:T}) = \sum_{i=1}^{N} W_{n+1|T}^i\, \delta_{X_{n+1}^i}(dx_{n+1}),$$
where $W_{T|T}^i = W_T^i$. Then, by using (20) and (19), we obtain the approximation
$$\hat p_\theta(dx_n \mid y_{0:T}) = \sum_{i=1}^{N} W_{n|T}^i\, \delta_{X_n^i}(dx_n)$$
with
$$W_{n|T}^i = W_n^i \sum_{j=1}^{N} W_{n+1|T}^j\, \frac{f_\theta(X_{n+1}^j \mid X_n^i)}{\sum_{l=1}^{N} W_n^l\, f_\theta(X_{n+1}^j \mid X_n^l)}. \qquad (21)$$

70 Particle Implementation
Forward Filtering Backward Smoothing (FFBSm):
- Run a particle filter from time $n = 0$ to $T$, storing the approximate filtering distributions $\{\hat p_\theta(dx_n \mid y_{0:n})\}_{n=0}^{T}$.
- Initialise the backward pass: $W_{T|T}^i = W_T^i$.
- For $n = T-1, T-2, \dots, 0$ compute the weights
$$W_{n|T}^i = W_n^i \sum_{j=1}^{N} W_{n+1|T}^j\, \frac{f_\theta(X_{n+1}^j \mid X_n^i)}{\sum_{l=1}^{N} W_n^l\, f_\theta(X_{n+1}^j \mid X_n^l)} \qquad (22)$$
  and obtain the approximation
$$\hat p_\theta(dx_n \mid y_{0:T}) = \sum_{i=1}^{N} W_{n|T}^i\, \delta_{X_n^i}(dx_n).$$
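The backward weight recursion (22) can be sketched in Python with matrix operations. As before, the stored arrays `X` and `W` of shape `(T + 1, N)` from a forward filter and the Gaussian AR(1) transition density are assumptions for the example; normalising constants of $f$ cancel between numerator and denominator and are omitted.

```python
import numpy as np

def ffbsm_weights(X, W, rho=0.8, tau=1.0):
    """Compute the smoothing weights W_{n|T}^i from filter particles X and
    normalised filter weights W, both of shape (T + 1, N)."""
    T1, N = X.shape
    Ws = np.empty_like(W)
    Ws[-1] = W[-1]                                    # W_{T|T} = W_T
    for n in range(T1 - 2, -1, -1):
        # F[i, j] = f(X_{n+1}^j | X_n^i) up to a constant
        F = np.exp(-0.5 * ((X[n + 1][None, :] - rho * X[n][:, None]) / tau) ** 2)
        denom = W[n] @ F                              # sum_l W_n^l f(X_{n+1}^j | X_n^l)
        Ws[n] = W[n] * (F @ (Ws[n + 1] / denom))      # recursion (22)
    return Ws
```

The smoothed mean at time $n$ is then `np.sum(Ws[n] * X[n])`. Building the $N \times N$ matrix `F` at each time step is what gives the $O(N^2 T)$ cost.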

71 Particle Implementation
Let us say we have performed Forward Filtering Backward Smoothing (FFBSm). Assume we have an approximation
$$\hat p_\theta(dx_{n+1} \mid y_{0:T}) = \sum_{i=1}^{N} W_{n+1|T}^i\, \delta_{X_{n+1}^i}(dx_{n+1})$$
and are interested in obtaining the approximation
$$\hat p_\theta(dx_n, dx_{n+1} \mid y_{0:T}) = \sum_{i=1}^{N} W_{n,n+1|T}^i\, \delta_{(X_n^{a(i)}, X_{n+1}^i)}(dx_n, dx_{n+1}),$$
with $X_n^{a(i)}$ being the ancestor of $X_{n+1}^i$. Then we can weight the pair $(X_n^{a(i)}, X_{n+1}^i)$ by
$$W_{n,n+1|T}^i = W_n^{a(i)}\, W_{n+1|T}^i\, \frac{f_\theta(X_{n+1}^i \mid X_n^{a(i)})}{\sum_{l=1}^{N} W_n^l\, f_\theta(X_{n+1}^i \mid X_n^l)}. \qquad (23)$$

72 Discussion
In both previous slides the computational cost is proportional to $N^2 T$ operations in total.
Assuming exponential forgetting:
- $\hat S_n^\theta$ based on the fixed-lag approximation has an asymptotic variance with rate $n/N$, with a non-vanishing (as $N \to \infty$) bias proportional to $n$ and a constant decreasing exponentially fast with $L$.
- The asymptotic bias and variance of the particle estimate of $S_n^\theta$ computed using the forward-backward procedures satisfy
$$\left| E\big[\hat S_n^\theta\big] - S_n^\theta \right| \le F_\theta \frac{n}{N}, \qquad V\big[\hat S_n^\theta\big] \le H_\theta \frac{n}{N}, \qquad (24)$$
  but note this is for algorithms with a cost of $N^2 T$ operations.

73 Discussion
To compute $\hat S_n^\theta$ with cost $N^2 T$ one can implement either:
1. a simple particle filter with $N^2$ particles, or
2. an FFBS particle filter with $N$ particles.
Then:
- Case 1: suffers from path degeneracy; bias of order $T/N^2$; variance at least of order $T^2/N^2$.
- Case 2: more expensive; bias of order $T/N$; variance of order $T/N$.

74 On-line methods
On-line (forward-only) extensions for EM and gradient methods do exist:
- Poyiadjis, Doucet, Singh 11
- Cappe 09
- Del Moral, Doucet, Singh 09
Understanding them is beyond this course. The next couple of slides are for general information and interest.

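75 On-line methods
On-line extensions for EM and gradient methods do exist. For the gradient method:
$$\theta_{n+1} = \theta_n + \gamma_{n+1}\, \nabla \log p_{\theta_{0:n}}(y_n \mid y_{0:n-1}), \qquad (25)$$
where $\nabla \log p_{\theta_{0:n}}(y_n \mid y_{0:n-1})$ is defined as
$$\nabla \log p_{\theta_{0:n}}(y_n \mid y_{0:n-1}) = \nabla \log p_{\theta_{0:n}}(y_{0:n}) - \nabla \log p_{\theta_{0:n-1}}(y_{0:n-1}). \qquad (26)$$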

76 On-line methods
The notation $\nabla \log p_{\theta_{0:n}}(y_{0:n})$ corresponds to a time-varying score which is computed with a filter using the parameter $\theta_p$ at time $p$.
Using Fisher's identity to compute this time-varying score, we have for $1 \le p \le n$
$$s_p(x_{p-1:p}) = \nabla \log f_\theta(x_p \mid x_{p-1}) \big|_{\theta = \theta_p} + \nabla \log g_\theta(y_p \mid x_p) \big|_{\theta = \theta_p}. \qquad (27)$$

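77 On-line methods
In offline EM the maximisation step can be rewritten as
$$\theta_{k+1} = \Lambda\big(T^{-1} S_T^{\theta_k}\big). \qquad (28)$$
So for on-line EM we can use Robbins-Monro averaging:
$$S_{\theta_{0:n}} = \gamma_{n+1} \int s_n(x_{n-1:n})\, p_{\theta_{0:n}}(x_{n-1}, x_n \mid y_{0:n})\, dx_{n-1:n} + (1 - \gamma_{n+1}) \sum_{k=0}^{n} \left( \prod_{i=k+2}^{n} (1 - \gamma_i) \right) \gamma_{k+1} \int s_k(x_{k-1:k})\, p_{\theta_{0:k}}(x_{k-1:k} \mid y_{0:k})\, dx_{k-1:k}. \qquad (29)$$
Then a standard maximization step is used as in the batch version: $\theta_{n+1} = \Lambda(S_{\theta_{0:n}})$.
There is also a forward-only implementation of FFBSm (Del Moral et al. 2009).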

78 Discussion
- On-line and offline parameter estimation boils down to computing smoothed integrals of additive functionals.
- Can either use the standard algorithm (with $O(N)$ cost) or dedicated smoothing algorithms (with $O(N^2)$ cost).
- With the exception of on-line gradient methods, when the same computational cost is used: the first choice suffers from the variance, the second suffers from the bias, and both give similar MSE.

79 Numerical example
[Figure: Estimating smoothed additive functionals with the $O(N)$ method and the $O(N^2)$ method, plotted against time $n$: empirical bias of the estimate of $S_n^\theta$ (top panel), empirical variance (middle panel) and mean squared error (bottom panel) for the estimate of $S_n^\theta / \sqrt{n}$.]

80 Numerical example
[Figure: EM: boxplots of $\hat\theta_n$ for the $O(N)$ method and the $O(N^2)$ method, using 100 realizations; panels for $\rho$ and $\tau$.]

81 Homework 5
For the following scalar model
$$X_n = \rho X_{n-1} + \tau V_n, \qquad Y_n = X_n + \sigma W_n, \qquad (30)$$
where $W_n, V_n \overset{iid}{\sim} N(0, 1)$ and $X_0 \sim N(0, 1)$:
- Synthesise data sets $y_{0:T}$ for $T = 1000$, $\rho = 0.8$, $\tau = 1$ with varying $\sigma = 0.01, 0.1, 1$. Store the real state trajectory $x_{0:T}$ for future comparisons in each case.
- Implement a particle filter of your choice.
- Using appropriate plots, compare the approximations of the mean and variance of $p(x_n \mid y_{0:T})$ using:
  - a standard particle filter,
  - a particle filter with fixed lag smoothing,
  - a particle filter with backward smoothing.
- Comment on the computational cost in each case.
- (*) In each case compare the approximation of $p(x_{0:n} \mid y_{0:n})$ using:
  - plots of the number of unique particles at certain lags, or illustrations of sampled paths at different times $n$,
  - results showing Monte Carlo bias and variance for smoothed additive functionals.

82 Coursework instructions
Coursework option 1: particle methods
- Pick an HMM of your choice such that the state and observation can be multidimensional, with dimensions $d_x$ and $d_y$ respectively.
- Using some known values for the static parameters:
  - implement a bootstrap particle filter and a more advanced PF of your choice;
  - generate plots and tables to compare the two methods for varying $N$, $d_x$ and $d_y$;
  - assess the methods based on accuracy and variance of the normalising constant and of integrals like the posterior (filter) mean, variance, etc.
- Consider a parameter estimation method of your choice (particle MCMC, gradients, EM): implement it and describe the results for varying $N$, $d_x$ and $d_y$ using plots and tables.
- In your answers also provide short comments.

83 Coursework instructions
Coursework option 2: If your research is related to computational statistics, or uses MCMC:
1. present your model of interest and the problem at hand,
2. the inferential method for the problem (e.g. Bayesian inference, optimisation etc.) and the challenges involved,
3. the simulation method (e.g. MCMC, IS, SMC),
4. numerical results,
5. a discussion on how material in this course can be used for extensions.
Page limit: pages, recommended length around 8 pages; use appendices if you need to go beyond the page limits.
Submit by email to n.kantas at imperial.ac.uk using subject: LTCC coursework submission
Deadline: 5 Dec 18 (a month)


More information

Adaptive Population Monte Carlo

Adaptive Population Monte Carlo Adaptive Population Monte Carlo Olivier Cappé Centre Nat. de la Recherche Scientifique & Télécom Paris 46 rue Barrault, 75634 Paris cedex 13, France http://www.tsi.enst.fr/~cappe/ Recent Advances in Monte

More information