LTCC: Advanced Computational Methods in Statistics
1 LTCC: Advanced Computational Methods in Statistics. Advanced Particle Methods & Parameter Estimation for HMMs. N. Kantas. Notes at Slides at
2 Introduction: Particle methods as presented so far can be challenged by: weight degeneracy (low observation noise, high dimensions); path degeneracy (a crucial issue when parameters are unknown). More elaborate/advanced methods can be effective. We also need to address parameter estimation, using approaches that are Bayesian or maximum likelihood, and on-line or off-line (batch).
3 Outline: Advanced methods: adaptive resampling; the resample-move PF; the auxiliary particle filter; SMC for fixed state spaces. Parameter estimation: Bayesian or maximum likelihood; on-line or off-line.
4 Recipes to improve performance: There are more elaborate particle filtering algorithms. They can work better than the vanilla version in terms of variance of estimators, ESS, accuracy etc., but they do not address path degeneracy due to resampling; often they just mask it or postpone it. We will look at: adaptive resampling, the resample-move PF, and the auxiliary particle filter. Note that one can combine all of the above.
5 Adaptive resampling: While resampling is a key component for obtaining a good approximation, it tends to leave early states represented by few particles. Adaptive resampling: the key idea is to use resampling only when you need to, i.e. resample only when $\mathrm{ESS}_n \le \gamma N$, e.g. $\gamma = 1/2$, where $\mathrm{ESS}_n = 1/\sum_{i=1}^N (W^i_n)^2$. When you don't resample, continue with SIS.
6 SIR filter with adaptive resampling: At time $n \ge 1$: 1. Sample $X^i_n \sim q_\theta(x_n|y_n, X^i_{n-1})$ and set $X^i_{0:n} \leftarrow (X^i_{0:n-1}, X^i_n)$. 2. Compute the weights $w_n(X^i_{n-1:n})$ and set $W^i_n \propto W^i_{n-1}\, w_n(X^i_{n-1:n})$, $\sum_{i=1}^N W^i_n = 1$. 3. If $\mathrm{ESS}_n \le \gamma N$: resample $\{W^i_n, X^i_{0:n}\}$ to obtain $N$ new equally-weighted particles $\{1/N, \bar X^i_{0:n}\}$, and set $X^i_{0:n} \leftarrow \bar X^i_{0:n}$, $W^i_n \leftarrow 1/N$.
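A minimal sketch of the SIR filter with adaptive resampling, in Python with the standard library only. It assumes the bootstrap proposal $q_\theta = f_\theta$ (so $w_n = g_\theta(y_n|x_n)$), systematic resampling, and the scalar linear-Gaussian model $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$, $X_0 \sim \mathcal N(0,1)$; the parameter names $\rho, \tau, \sigma$ and all helper functions are illustrative assumptions, not taken from the slides. A Kalman filter log-likelihood is included only as an exact reference for this particular model, and only the current states (not whole paths) are stored.

```python
import math
import random


def log_norm(z, mean, sd):
    """Log density of N(mean, sd^2) at z."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((z - mean) / sd) ** 2


def normalise(logw):
    """Normalised weights and log(sum(exp(logw))), computed stably."""
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return [wi / s for wi in w], m + math.log(s)


def ess_of(W):
    """Effective sample size 1 / sum_i (W^i)^2 of normalised weights."""
    return 1.0 / sum(wi * wi for wi in W)


def systematic_resample(W, rng):
    """Systematic resampling: N ancestor indices drawn from normalised weights W."""
    N = len(W)
    u, idx, cum, i = rng.random() / N, [], W[0], 0
    for k in range(N):
        while u + k / N > cum and i < N - 1:
            i += 1
            cum += W[i]
        idx.append(i)
    return idx


def simulate(T, rho, tau, sigma, rng):
    """x_{0:T}, y_{0:T} from X_n = rho X_{n-1} + tau V_n, Y_n = X_n + sigma W_n, X_0 ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(T):
        x.append(rho * x[-1] + tau * rng.gauss(0.0, 1.0))
    return x, [xn + sigma * rng.gauss(0.0, 1.0) for xn in x]


def kalman_loglik(y, rho, tau, sigma):
    """Exact log p(y_{0:T}) for this linear-Gaussian model (reference only)."""
    m, P, ll = 0.0, 1.0, 0.0
    for n, yn in enumerate(y):
        if n > 0:
            m, P = rho * m, rho * rho * P + tau * tau
        S = P + sigma * sigma
        ll += log_norm(yn, m, math.sqrt(S))
        m, P = m + P / S * (yn - m), P * sigma * sigma / S
    return ll


def adaptive_bootstrap_pf(y, rho, tau, sigma, N, gamma, rng):
    """Bootstrap PF (q = f, so w_n = g) that resamples only when ESS_n <= gamma * N."""
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]   # X_0^i ~ mu = N(0, 1)
    logW = [-math.log(N)] * N                      # normalised log-weights W_{n-1}^i
    loglik, n_resample, ess_trace = 0.0, 0, []
    for n, yn in enumerate(y):
        if n > 0:
            x = [rho * xi + tau * rng.gauss(0.0, 1.0) for xi in x]
        # W_n^i propto W_{n-1}^i * g(y_n | X_n^i)
        logw = [lw + log_norm(yn, xi, sigma) for lw, xi in zip(logW, x)]
        W, lse = normalise(logw)
        loglik += lse   # log sum_i W_{n-1}^i w_n(X^i), estimates log p(y_n | y_{0:n-1})
        ess_trace.append(ess_of(W))
        if ess_trace[-1] <= gamma * N:
            x = [x[i] for i in systematic_resample(W, rng)]
            logW = [-math.log(N)] * N
            n_resample += 1
        else:
            logW = [l - lse for l in logw]         # no resampling: plain SIS step
    return loglik, n_resample, ess_trace


rng = random.Random(1)
x_true, y = simulate(100, 0.8, 1.0, 1.0, rng)
ll, n_res, ess = adaptive_bootstrap_pf(y, 0.8, 1.0, 1.0, N=1000, gamma=0.5, rng=rng)
print(f"PF {ll:.2f} vs Kalman {kalman_loglik(y, 0.8, 1.0, 1.0):.2f}; resampled {n_res} of {len(y)} steps")
```

Lowering `gamma` makes resampling rarer; `gamma = 1` recovers resampling at every step.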
7 The resample-move particle filter (Berzuini & Gilks 2001, JRSSB): Fight path degeneracy by re-inserting lost diversity into the particles using appropriate MCMC moves on the path space. At time $n \ge 1$: 1. Sample $X^i_n \sim q_\theta(x_n|y_n, X^i_{n-1})$ and set $X^i_{0:n} \leftarrow (X^i_{0:n-1}, X^i_n)$. 2. Compute the weights $w_n(X^i_{n-1:n})$ and set $W^i_n \propto w_n(X^i_{n-1:n})$, $\sum_{i=1}^N W^i_n = 1$. 3. Resample $\{W^i_n, X^i_{0:n}\}$ to obtain $N$ new equally-weighted particles $\{1/N, \bar X^i_{0:n}\}$. 4. Move the particles by independently (for each $i$) sampling $X^i_{0:n} \sim K_{\mathrm{MCMC}}(\cdot|\bar X^i_{0:n})$.
8 The resample-move particle filter: The target density for the MCMC move is $p_\theta(x_{0:n}|y_{0:n}) \propto \mu_\theta(x_0)\prod_{k=1}^n f_\theta(x_k|x_{k-1})\prod_{k=0}^n g_\theta(y_k|x_k)$. The MCMC proposal in this context just provides a jitter or shake to the particle population. It does not need to move the whole trajectory: moving only $X_{n-L+1:n}$ can still lead to a correct algorithm, and a Gibbs update would be very useful if available. Note: we are not relying on the ergodic properties of MCMC, just on invariance; we want to preserve the statistical properties of the sample.
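9 The resample-move particle filter using RW-MH: Random walk algorithm for $X^i_{0:n} \sim K_{\mathrm{MCMC}}(\cdot|\bar X^i_{0:n})$: Set $\xi_{0:n} = \bar X^i_{0:n}$. For $m = 1, \dots, M$: sample $U \sim \mathcal N(0, S)$, with $S$ of appropriate dimension; propose $Z_{n-L+1:n} = \xi_{n-L+1:n} + cU$; compute the acceptance ratio $\alpha = 1 \wedge \frac{\prod_{k=n-L+1}^{n} f_\theta(Z_k|Z_{k-1})\, g_\theta(y_k|Z_k)}{\prod_{k=n-L+1}^{n} f_\theta(\xi_k|\xi_{k-1})\, g_\theta(y_k|\xi_k)}$, with the convention $Z_{n-L} = \xi_{n-L}$; with probability $\alpha$ accept, setting $\xi_{0:n} \leftarrow (\xi_{0:n-L}, Z_{n-L+1:n})$, otherwise reject the proposal and $\xi_{0:n}$ remains the same.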
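The random-walk move can be sketched as follows, again assuming the scalar model $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$, and assuming $S = I$ so that the hypothetical argument `step` plays the role of $c$. The usage example does not run a full resample-move filter; it only checks the invariance property the slides rely on, for $L = 1$: with $x_0 = 0$, $y_1 = 1$, $\rho = 0.8$, $\tau = \sigma = 1$, the exact conditional $p_\theta(x_1|x_0, y_1)$ is $\mathcal N(0.5, 0.5)$, and repeated moves should reproduce it. In a resample-move PF one would instead call `rw_mh_move` once per resampled path, with small `M`.

```python
import math
import random


def log_norm(z, mean, sd):
    """Log density of N(mean, sd^2) at z."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((z - mean) / sd) ** 2


def rw_mh_move(path, y, n, L, M, step, rho, tau, sigma, rng):
    """M random-walk MH steps on xi_{n-L+1:n} of path = xi_{0:n} (needs n >= L).

    The proposal is Z_k = xi_k + step * U_k with U_k ~ N(0, 1) i.i.d., i.e. c*U with S = I.
    The first n-L+1 states are left untouched; returns (new path, number of acceptances).
    """
    start = n - L + 1
    anchor = path[start - 1]                     # xi_{n-L}, plays the role of Z_{n-L}

    def log_target(seg):
        # sum_{k=n-L+1}^{n} [log f(z_k | z_{k-1}) + log g(y_k | z_k)]
        prev, total = anchor, 0.0
        for k, zk in zip(range(start, n + 1), seg):
            total += log_norm(zk, rho * prev, tau) + log_norm(y[k], zk, sigma)
            prev = zk
        return total

    cur = path[start:n + 1]
    cur_lp = log_target(cur)
    accepted = 0
    for _ in range(M):
        prop = [z + step * rng.gauss(0.0, 1.0) for z in cur]
        prop_lp = log_target(prop)
        if rng.random() < math.exp(min(0.0, prop_lp - cur_lp)):   # accept w.p. 1 ^ ratio
            cur, cur_lp = prop, prop_lp
            accepted += 1
    return path[:start] + cur, accepted


# Invariance check for L = 1: the exact p(x_1 | x_0 = 0, y_1 = 1) is N(0.5, 0.5) here.
rng = random.Random(0)
path, y = [0.0, 3.0], [0.0, 1.0]
samples, acc = [], 0
for _ in range(20000):
    path, a = rw_mh_move(path, y, n=1, L=1, M=1, step=1.0, rho=0.8, tau=1.0, sigma=1.0, rng=rng)
    samples.append(path[1])
    acc += a
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean {mean:.3f}, var {var:.3f}, acceptance {acc / len(samples):.2f}")
```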
10 The resample-move particle filter: $M$ can be quite small (1-5). Tuning: one can use the particles to design $S$, e.g. look at the empirical covariance of the particles after resampling; $c$ can be tuned for a target average acceptance ratio. Other MCMC moves are possible: Gibbs, Hybrid Monte Carlo, ... The method will increase diversity a bit, but notice that it does not affect the weights; it might be more effective to use likelihood-informed proposals and weights. The last point is related to the auxiliary particle filter of Pitt & Shephard (99, JASA).
11 The auxiliary particle filter: Resample-move and adaptive resampling are meant to improve path degeneracy. What if weight degeneracy due to IS is still present? Consider the Bayesian recursion $p_\theta(x_{0:n}|y_{0:n}) = \frac{1}{Z_n}\, p_\theta(x_{0:n-1}|y_{0:n-1})\, f_\theta(x_n|x_{n-1})\, g_\theta(y_n|x_n)$, with $Z_n = p_\theta(y_n|y_{0:n-1})$. Bootstrap filter: move with $f_\theta(x_n|x_{n-1})$ and weight with $g_\theta(y_n|x_n)$. Alternative route: weight with $p_\theta(y_n|x_{n-1})$ and then move with $p_\theta(x_n|x_{n-1}, y_n)$.
12 The auxiliary particle filter: Alternative route: weight with $p_\theta(y_n|x_{n-1})$ and then move with $p_\theta(x_n|x_{n-1}, y_n)$. Recall $p_\theta(x_n|x_{n-1}, y_n) = \frac{f_\theta(x_n|x_{n-1})\, g_\theta(y_n|x_n)}{p_\theta(y_n|x_{n-1})}$. (Pitt & Shephard 99, JASA): we can reverse the steps: move with $p_\theta(x_n|x_{n-1}, y_n)$ and weight with $p_\theta(y_{n+1}|x_n)$. The optimal $p_\theta(x_n|x_{n-1}, y_n)$ is not available in practice! We can use approximations: move with $q_\theta(x_n|x_{n-1}, y_n)$ and weight with $q_\theta(y_{n+1}, x_n)$.
13 The auxiliary particle filter: On approximations: here $q_\theta(y_{n+1}, x_n)$ is not necessarily required to be a pdf, just an easy-to-evaluate non-negative function of $(x_n, y_{n+1})$. It is often called a score function (the name is misleading, as it is also used to denote the gradient term in parameter estimation). $q_\theta(x_n|x_{n-1}, y_n)$ can be a good importance distribution that takes into account the current observation.
14 The auxiliary particle filter: Instead of the original problem, consider the target $\hat\pi_n(x_{0:n}|y_{0:n}) \propto \mu_\theta(x_0)\, g_\theta(y_0|x_0)\, q_\theta(y_1, x_0) \prod_{k=1}^n f_\theta(x_k|x_{k-1})\, g_\theta(y_k|x_k)\, \frac{q_\theta(y_{k+1}, x_k)}{q_\theta(y_k, x_{k-1})}$. Note that $q_\theta(y_1, x_0)\prod_{k=1}^n \frac{q_\theta(y_{k+1}, x_k)}{q_\theta(y_k, x_{k-1})} = q_\theta(y_{n+1}, x_n)$. This means we are targeting a density twisted with a lookahead: $\hat\pi_n(x_{0:n}|y_{0:n}) \propto p_\theta(x_{0:n}|y_{0:n})\, q_\theta(y_{n+1}, x_n)$.
15 The auxiliary particle filter: What is the auxiliary PF? It is a PF targeting $\hat\pi_n$ using the proposal $q_\theta(x_n|y_n, x_{n-1})$. We will implement a PF targeting $\hat\pi_n$ with proposal $q_\theta(x_n|y_n, x_{n-1})$, and then reweight to get approximations for the original target $\pi_n(x_{0:n}|y_{0:n}) = p_\theta(x_{0:n}|y_{0:n})$ that is actually of interest. Why do we do this? The PF for $\hat\pi_n$ is more stable numerically: the new likelihood $g_\theta(y_n|x_n)\frac{q_\theta(y_{n+1}, x_n)}{q_\theta(y_n, x_{n-1})}$ might be less peaky or informative, so $\hat\pi_n$ is closer to $\hat\pi_{n-1}$.
16 The auxiliary particle filter: So in path space the target is $\hat\pi_n(x_{0:n}|y_{0:n}) \propto \hat\pi_{n-1}(x_{0:n-1}|y_{0:n-1})\, f_\theta(x_n|x_{n-1})\, g_\theta(y_n|x_n)\, \frac{q_\theta(y_{n+1}, x_n)}{q_\theta(y_n, x_{n-1})}$ and the proposal is $q(x_{0:n}) \propto \hat\pi_{n-1}(x_{0:n-1}|y_{0:n-1})\, q_\theta(x_n|y_n, x_{n-1})$. This leads to the following weights to propagate the particles: $\hat w_n(x_{n-1:n}) = \frac{f_\theta(x_n|x_{n-1})\, g_\theta(y_n|x_n)\, q_\theta(y_{n+1}, x_n)}{q_\theta(y_n, x_{n-1})\, q_\theta(x_n|y_n, x_{n-1})} = w_n(x_{n-1:n})\, q_\theta(y_{n+1}, x_n)$.
17 The auxiliary particle filter: For convenience we will split the evaluation of the weight $\hat w_n$ over two time steps: the part depending on $y_{n+1}$ is evaluated at time $n+1$. Here we use the notation $w_0(x_0) = \frac{g_\theta(y_0|x_0)\, \mu_\theta(x_0)}{q_\theta(x_0|y_0)}$ and, for $n \ge 1$, $w_n(x_{n-1:n}) = \frac{g_\theta(y_n|x_n)\, f_\theta(x_n|x_{n-1})}{q_\theta(x_n, y_n|x_{n-1})}$, where for $n \ge 1$ we denote $q_\theta(x_n, y_n|x_{n-1}) = q_\theta(x_n|y_n, x_{n-1})\, q_\theta(y_n, x_{n-1})$. Pitt & Shephard (99, JASA) recommend using, if available, $q_\theta(x_n|y_n, x_{n-1}) = p_\theta(x_n|y_n, x_{n-1})$ and $q_\theta(y_n, x_{n-1}) = p_\theta(y_n|x_{n-1})$, or approximations of them.
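18 The auxiliary particle filter: At time $n = 0$, for all $i \in \{1, \dots, N\}$: 1. Sample $X^i_0 \sim q_\theta(x_0|y_0)$. 2. Compute $W^i_1 \propto w_0(X^i_0)\, q_\theta(y_1, X^i_0)$, $\sum_{i=1}^N W^i_1 = 1$. 3. Resample $\bar X^i_0 \sim \sum_{i=1}^N W^i_1\, \delta_{X^i_0}(dx_0)$. At time $n \ge 1$, for all $i \in \{1, \dots, N\}$: 1. Sample $X^i_n \sim q_\theta(x_n|y_n, \bar X^i_{n-1})$ and set $X^i_{0:n} \leftarrow (\bar X^i_{0:n-1}, X^i_n)$. 2. Compute $W^i_{n+1} \propto w_n(X^i_{n-1:n})\, q_\theta(y_{n+1}, X^i_n)$, $\sum_{i=1}^N W^i_{n+1} = 1$. 3. Resample $\bar X^i_{0:n} \sim \sum_{i=1}^N W^i_{n+1}\, \delta_{X^i_{0:n}}(dx_{0:n})$.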
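19 The auxiliary particle filter: BUT note that we want approximations of $p_\theta(x_{0:n}|y_{0:n})$ and $p_\theta(y_n|y_{0:n-1})$. These are given by $\hat p_\theta(dx_{0:n}|y_{0:n}) = \sum_{i=1}^N W^i_n\, \delta_{X^i_{0:n}}(dx_{0:n})$ (1) and $\hat p_\theta(y_n|y_{0:n-1}) = \left(\frac{1}{N}\sum_{i=1}^N w_n(X^i_{n-1:n})\right)\left(\sum_{i=1}^N W^i_{n-1}\, q_\theta(y_n, X^i_{n-1})\right)$, $\hat p_\theta(y_0) = \frac{1}{N}\sum_{i=1}^N w_0(X^i_0)$ (2), where $W^i_n \propto w_n(X^i_{n-1:n})$ and $\sum_{i=1}^N W^i_n = 1$.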
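A minimal APF sketch with the choices recommended by Pitt & Shephard, for the assumed scalar model $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$, $X_0 \sim \mathcal N(0,1)$ (parameter names and helpers are our assumptions). For this model $p_\theta(x_n|x_{n-1}, y_n)$ and $p_\theta(y_n|x_{n-1}) = \mathcal N(y_n; \rho x_{n-1}, \tau^2 + \sigma^2)$ are available in closed form, so $w_n \equiv 1$ up to rounding (a fully adapted filter), and the likelihood estimate (2) is driven by $\sum_i W^i_{n-1} q_\theta(y_n, X^i_{n-1})$. The sketch resamples with the first-stage weights at the start of step $n$ rather than at the end of step $n-1$, which is the same operation; systematic resampling is an assumption.

```python
import math
import random


def log_norm(z, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((z - mean) / sd) ** 2


def normalise(logw):
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return [wi / s for wi in w], m + math.log(s)


def systematic_resample(W, rng):
    N = len(W)
    u, idx, cum, i = rng.random() / N, [], W[0], 0
    for k in range(N):
        while u + k / N > cum and i < N - 1:
            i += 1
            cum += W[i]
        idx.append(i)
    return idx


def simulate(T, rho, tau, sigma, rng):
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(T):
        x.append(rho * x[-1] + tau * rng.gauss(0.0, 1.0))
    return x, [xn + sigma * rng.gauss(0.0, 1.0) for xn in x]


def kalman_loglik(y, rho, tau, sigma):
    """Exact log p(y_{0:T}) for the linear-Gaussian model (reference only)."""
    m, P, ll = 0.0, 1.0, 0.0
    for n, yn in enumerate(y):
        if n > 0:
            m, P = rho * m, rho * rho * P + tau * tau
        S = P + sigma * sigma
        ll += log_norm(yn, m, math.sqrt(S))
        m, P = m + P / S * (yn - m), P * sigma * sigma / S
    return ll


def apf_fully_adapted(y, rho, tau, sigma, N, rng):
    """APF with q(x_n|y_n,x_{n-1}) = p(x_n|x_{n-1},y_n) and q(y_n, x_{n-1}) = p(y_n|x_{n-1}).
    Returns (estimate of log p(y_{0:T}), largest |log w_n| seen for n >= 1)."""
    v = 1.0 / (1.0 / tau**2 + 1.0 / sigma**2)          # variance of p(x_n | x_{n-1}, y_n)
    sv, pred_sd = math.sqrt(v), math.sqrt(tau**2 + sigma**2)
    v0 = 1.0 / (1.0 + 1.0 / sigma**2)                    # q(x_0|y_0) = p(x_0|y_0), mu = N(0, 1)
    m0 = v0 * y[0] / sigma**2
    x = [rng.gauss(m0, math.sqrt(v0)) for _ in range(N)]
    logw = [log_norm(y[0], xi, sigma) + log_norm(xi, 0.0, 1.0) - log_norm(xi, m0, math.sqrt(v0))
            for xi in x]                                  # log w_0
    loglik = normalise(logw)[1] - math.log(N)             # (2): log p_hat(y_0)
    max_abs_logw = 0.0
    for n in range(1, len(y)):
        log_qy = [log_norm(y[n], rho * xi, pred_sd) for xi in x]      # q(y_n, X_{n-1}^i)
        lse_prev = normalise(logw)[1]
        first, lse_first = normalise([a + b for a, b in zip(logw, log_qy)])
        log_sum_Wq = lse_first - lse_prev                 # log sum_i W_{n-1}^i q(y_n, X_{n-1}^i)
        idx = systematic_resample(first, rng)             # resample with W propto w_{n-1} q(y_n, .)
        xbar = [x[i] for i in idx]
        means = [v * (rho * xb / tau**2 + y[n] / sigma**2) for xb in xbar]
        x = [rng.gauss(m, sv) for m in means]             # X_n ~ q(x_n | y_n, Xbar_{n-1})
        logw = [log_norm(y[n], xn, sigma) + log_norm(xn, rho * xb, tau)
                - log_norm(xn, m, sv) - log_qy[i]
                for xn, xb, m, i in zip(x, xbar, means, idx)]         # log w_n
        loglik += normalise(logw)[1] - math.log(N) + log_sum_Wq      # (2)
        max_abs_logw = max(max_abs_logw, max(abs(l) for l in logw))
    return loglik, max_abs_logw


rng = random.Random(5)
x_true, y = simulate(100, 0.8, 1.0, 0.1, rng)
ll, max_abs_logw = apf_fully_adapted(y, 0.8, 1.0, 0.1, N=500, rng=rng)
print(f"APF {ll:.3f} vs Kalman {kalman_loglik(y, 0.8, 1.0, 0.1):.3f}; max |log w_n| = {max_abs_logw:.1e}")
```

With cruder choices for $q_\theta$ the same code structure applies, but $w_n$ is no longer constant and its spread drives the weight degeneracy.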
20 Discussion: The choice of $w_n$ is convenient for reweighting: $w_n(\cdot)$ is used to approximate $\hat p_\theta(dx_{0:n}|y_{0:n})$, while $\hat w_n(\cdot) = w_n(\cdot)\, q_\theta(y_{n+1}, x_n)$ is used to weight the particles; the connection between the two is simply an IS reweighting. What are we doing? We are carefully changing the weights so that the algorithm is well behaved, by multiplying with something and dividing by it at the next step. This can be effective when $X_t$ is high dimensional or $g$ is too informative.
21 Discussion: Neat extension: let $X_k$ be obtained from a discretisation of a continuous process, e.g. via an Euler scheme. Set $\frac{q_\theta(y_{k+1}, x_k)}{q_\theta(y_k, x_{k-1})} = \prod_{m=1}^M \frac{r_{k,m}(y_{k+1}, y_k, x_{k,m})}{r_{k,m-1}(y_{k+1}, y_k, x_{k,m-1})}$, with $X_{k,0} = X_{k-1}$ and $X_{k,M} = X_k$. Doing the same thing as above means that you perform $M$ intermediate weight-resample steps to process observation $Y_{k+1}$. Detailed exposition in (Del Moral & Murray, SIAM/ASA JUQ).
22 Tempering based approach: Another example that fits this framework is tempering. Consider $r_{k,m} = g(y_{k+1}|x_{k,m})^{\gamma_m}$, $r_{k,0} = g(y_k|x_{k,0})$, with $\gamma_M = 1$ and $0 < \gamma_1 < \gamma_2 < \dots < \gamma_M$. In the presence of dynamics for $x_{k,m}$ (e.g. the discretisation of an SDE) the implementation is as above.
23 Tempering based approach: Some notes: one can tune $\gamma_m$ according to the ESS (adaptive tempering). In the absence of dynamics for $x_{k,m}$, use MCMC steps that are invariant to $\frac{1}{Z_{k,m}}\, p_\theta(x_{0:k-1}|y_{0:k-1})\, f_\theta(x_k|x_{k-1})\, g_\theta(y_k|x_k)^{\gamma_m}$; otherwise the method is prone to resampling degeneracy. The method can be very effective in high dimensions. Some references: the original PF with tempering in Godsill & Clapp 01, based on Neal 01 and Jarzynski 97. More recent papers: Jasra, Stephens, Doucet & Tsagaris 01; K., Beskos & Jasra 14.
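Adaptive tempering needs a rule for choosing the next exponent. One common rule, sketched here as an assumption rather than the slides' prescription, is bisection: for equally weighted particles (e.g. just after resampling) with log-likelihood values $l^i = \log g(y|x^i)$, pick the next $\gamma$ so that the ESS of the incremental weights $g^{\gamma - \gamma_{\mathrm{cur}}}$ equals a fixed fraction of $N$. With equal starting weights the ESS is non-increasing in the increment, so the bisection is well defined. The function name `next_temperature` and the toy likelihood below are hypothetical.

```python
import math
import random


def log_norm(z, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((z - mean) / sd) ** 2


def normalise(logw):
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return [wi / s for wi in w], m + math.log(s)


def ess_of(W):
    return 1.0 / sum(wi * wi for wi in W)


def next_temperature(loglik_vals, gamma_cur, target_frac=0.5, iters=60):
    """Bisection for the next tempering exponent in (gamma_cur, 1].

    Assumes equally weighted particles with log-likelihoods loglik_vals = log g(y | x^i).
    Returns gamma such that the ESS of the incremental weights g^(gamma - gamma_cur) is
    target_frac * N up to bisection tolerance, or 1.0 if the full step already keeps the
    ESS above that level."""
    N = len(loglik_vals)

    def ess_at(g):
        return ess_of(normalise([(g - gamma_cur) * l for l in loglik_vals])[0])

    if ess_at(1.0) >= target_frac * N:
        return 1.0
    lo, hi = gamma_cur, 1.0          # invariant: ESS(lo) >= target > ESS(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ess_at(mid) >= target_frac * N:
            lo = mid
        else:
            hi = mid
    return lo


rng = random.Random(11)
N = 1000
xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
ll = [log_norm(2.0, x, 0.1) for x in xs]    # informative likelihood: y = 2, sd 0.1
gamma1 = next_temperature(ll, 0.0, target_frac=0.5)
W1, _ = normalise([gamma1 * l for l in ll])
print(f"gamma_1 = {gamma1:.4g}, ESS = {ess_of(W1):.1f} (full step: {ess_of(normalise(ll)[0]):.1f})")
```

In a full tempered scheme one would then resample, apply MCMC moves invariant to the current tempered target, and repeat until the exponent reaches 1.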
24 Discussion: summary. Path degeneracy can be addressed partially by adaptive resampling (applying resampling only when necessary) and by using MCMC moves to jitter the particles and reintroduce lost diversity in the particle approximations; note that path degeneracy will still be present! Weight degeneracy can be addressed by a good selection of importance proposals, by changing the target sequence to an easier problem as in the APF, or by introducing an intermediate artificial weighting-resampling sequence, e.g. tempering. One can use all the ideas above together to get a very powerful, but also somewhat complicated, algorithm.
25 Homework 4: For the following scalar model, $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$ (3), where $W_n, V_n \overset{\text{iid}}{\sim} \mathcal N(0, 1)$ and $X_0 \sim \mathcal N(0, 1)$: 1. Synthesise data sets $y_{0:T}$ for $T = 5000$, $\rho = 0.8$, $\tau = 1$ with varying $\sigma = 0.001, 0.01, 0.1, 1, 10$. Store the real state trajectory $x_{0:T}$ for future comparisons in each case. 1.1 Implement the auxiliary PF (APF) with bootstrap or optimal importance proposals. 1.2 Compare with the bootstrap PF and with SIR with the optimal proposal, in terms of accuracy for the filter mean and variance, as well as the Monte Carlo variance of the marginal likelihood. 1.3 How small does $\sigma$ need to get for the APF to show superior performance? 2. For some of these cases, implement the resample-move PF for $L = 1$, $M = 3$. Plot the ESS for the resample-move PF and compare with the APF, the bootstrap PF and the optimal-proposal PF. 2.2 Repeat the above using the adaptive resampling PF.
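Because model (3), read as $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$ with $X_0 \sim \mathcal N(0,1)$ (the Greek letters are our reconstruction), is linear and Gaussian, the Kalman filter gives the exact filter means, variances and log marginal likelihood, which is a natural ground truth for the comparisons in items 1.2 to 2.2. A sketch of the data synthesis plus this reference, with hypothetical function names and an arbitrary seed:

```python
import math
import random


def simulate(T, rho, tau, sigma, rng):
    """x_{0:T}, y_{0:T} from X_n = rho X_{n-1} + tau V_n, Y_n = X_n + sigma W_n, X_0 ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(T):
        x.append(rho * x[-1] + tau * rng.gauss(0.0, 1.0))
    return x, [xn + sigma * rng.gauss(0.0, 1.0) for xn in x]


def kalman_filter(y, rho, tau, sigma):
    """Exact means/variances of p(x_n | y_{0:n}) and log p(y_{0:T}) for this model."""
    m, P, ll = 0.0, 1.0, 0.0
    means, variances = [], []
    for n, yn in enumerate(y):
        if n > 0:
            m, P = rho * m, rho * rho * P + tau * tau          # predict
        S = P + sigma * sigma                                   # var of p(y_n | y_{0:n-1})
        ll += -0.5 * (math.log(2.0 * math.pi * S) + (yn - m) ** 2 / S)
        m, P = m + P / S * (yn - m), P * sigma * sigma / S      # update with gain P / S
        means.append(m)
        variances.append(P)
    return means, variances, ll


rng = random.Random(2024)
T, rho, tau = 5000, 0.8, 1.0
data = {}
for sigma in [0.001, 0.01, 0.1, 1.0, 10.0]:
    x, y = simulate(T, rho, tau, sigma, rng)
    data[sigma] = (x, y) + kalman_filter(y, rho, tau, sigma)   # (x, y, means, variances, loglik)
```

For very small $\sigma$ the filter mean essentially equals $y_n$, which is exactly the regime where the bootstrap PF's weights degenerate.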
26 SMC for static state spaces: Tempering in the absence of dynamics can be used to introduce the question of how SMC can be used when the state space is fixed, in contrast to dynamically increasing as in HMMs, e.g. simply $X$ instead of $X_n$. Example from Bayesian inference: $p(x|y_{0:n}) \propto \prod_{k=0}^n p(y_k|x)\, p(x)$, or the simpler example $p(x|y) \propto p(y|x)\, p(x)$, written with tempering ($\gamma_0 = 0$, $\gamma_n = 1$) as $p(x|y) \propto \prod_{k=1}^n p(y|x)^{\gamma_k - \gamma_{k-1}}\, p(x)$.
27 SMC for static state spaces: The method is often referred to as SMC samplers or simply SMC. The answer is: at each time $k$, replace the dynamics (in the earlier algorithm, from $q$ or $f$) with MCMC steps invariant to $\prod_{p=0}^k p(y_p|x)\, p(x)$. One can use the particles to tune the MCMC steps, i.e. use independence samplers or random walks with covariances from the particle approximation. There is an interpretation: construct a time-varying target on an artificial state space model with marginal at time $n$ being $p(x|y_{0:n})$. Some references: Chopin 01, Biometrika; Del Moral, Doucet & Jasra 06, JRSSB.
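A minimal SMC sampler sketch for a static target, under assumptions of our own chosen so the answer is checkable: scalar $x \sim \mathcal N(0,1)$ and $y_k|x \sim \mathcal N(x,1)$, giving a Gaussian posterior. At each time $k$ it reweights with $p(y_k|x)$, resamples (every step, as a simplification), and moves each particle with $M$ random-walk MH steps invariant to $\prod_{p=0}^k p(y_p|x)\,p(x)$, with the step size tuned from the particle spread; the factor 2.38 is a conventional random-walk scaling used here as an assumption.

```python
import math
import random


def log_norm(z, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((z - mean) / sd) ** 2


def normalise(logw):
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return [wi / s for wi in w], m + math.log(s)


def systematic_resample(W, rng):
    N = len(W)
    u, idx, cum, i = rng.random() / N, [], W[0], 0
    for k in range(N):
        while u + k / N > cum and i < N - 1:
            i += 1
            cum += W[i]
        idx.append(i)
    return idx


def smc_sampler(ys, N, M, rng):
    """SMC sampler for p(x | y_{0:k}) with prior N(0, 1) and y_k | x ~ N(x, 1)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]            # particles from the prior p(x)
    for k in range(len(ys)):
        # incremental weight p(y_k | x); weights are equal beforehand since we resample every step
        W, _ = normalise([log_norm(ys[k], xi, 1.0) for xi in x])
        x = [x[i] for i in systematic_resample(W, rng)]
        mean = sum(x) / N
        sd = max(math.sqrt(sum((xi - mean) ** 2 for xi in x) / N), 1e-3)
        step = 2.38 * sd                                    # RW scale tuned from the particles
        seen = ys[:k + 1]

        def log_target(z):
            # log[p(z) prod_{p <= k} p(y_p | z)] up to a constant: the MCMC invariant density
            return log_norm(z, 0.0, 1.0) + sum(log_norm(yp, z, 1.0) for yp in seen)

        moved = []
        for xi in x:
            lp = log_target(xi)
            for _ in range(M):
                prop = xi + step * rng.gauss(0.0, 1.0)
                lp_prop = log_target(prop)
                if rng.random() < math.exp(min(0.0, lp_prop - lp)):
                    xi, lp = prop, lp_prop
            moved.append(xi)
        x = moved
    return x


rng = random.Random(4)
ys = [0.7 + rng.gauss(0.0, 1.0) for _ in range(11)]
particles = smc_sampler(ys, N=1000, M=3, rng=rng)
post_mean, post_var = sum(ys) / (len(ys) + 1), 1.0 / (len(ys) + 1)   # exact conjugate posterior
est_mean = sum(particles) / len(particles)
est_var = sum((p - est_mean) ** 2 for p in particles) / len(particles)
print(f"mean {est_mean:.3f} (exact {post_mean:.3f}), var {est_var:.4f} (exact {post_var:.4f})")
```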
28 Introduction to parameter estimation: So far we have managed to get a very good approximation of $p_\theta(x_n|y_{0:n})$; in this case path degeneracy does not matter. Is this useful? Yes, we can track the unknown ship in the sea, but only when $\theta$ is known. So how do we estimate $\theta$? This problem is known as parameter inference for HMMs, model calibration or system identification. It is very crucial in practice: you cannot do filtering/prediction/smoothing without it, and often ad-hoc calibration methods are used.
29 Introduction to parameter estimation: We are interested in principled inferential methods or procedures: Bayesian or maximum likelihood. Inference can be performed either on-line or in batch (off-line). We need to use PFs within algorithms that are meant to perform inference for $\theta$.
30 Introduction to parameter estimation: Some algorithms. Likelihood methods: optimisation based, gradient based, expectation maximisation. Bayesian methods: the naive approach (augment the state $x_{0:n}$ with $\theta$ and do filtering); pseudo-marginal MCMC methods (particle MCMC, particle Gibbs); the nested SMC approach (SMC$^2$).
31 Reading list: Read the introductory particle MCMC book chapter by Andrieu, Doucet and Holenstein: holenstein_pmcmc_mcqmc.pdf. Have a look at a review on parameter estimation: singh_maciejowski_tutorialparameterestimation.pdf
32 Bayesian inference: The parameter $\theta$ is a random variable and $Y$ is some dataset. Bayes' rule: posterior $\propto$ likelihood $\times$ prior, $p(\theta|Y) \propto p(Y|\theta)\, p(\theta)$. Markov chain Monte Carlo (MCMC): obtain samples of $\theta$ using an appropriate ergodic Markov chain $\{\theta(k)\}_{k \ge 0}$ with stationary distribution $p(\theta|Y)$.
33 Bayesian inference for HMMs: Choose a suitable prior density $p(\theta)$ for $\theta$. Approximate $p(\theta|y_{0:n})$, which is given by $p(\theta|y_{0:n}) \propto p_\theta(y_{0:n})\, p(\theta)$ (4). Off-line case: compute the joint posterior density $p(x_{0:T}, \theta|y_{0:T})$. On-line or sequential case: compute the sequence of posterior densities $\{p(x_{0:n}, \theta|y_{0:n})\}_{n \ge 0}$; on-line also means the same quality at every time with a fixed computational/memory cost.
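34 Generic Metropolis-Hastings for sampling $p(\theta|Y)$: Sample $\theta(0) \sim p(\theta)$. At iteration $k \ge 1$: sample a proposal $\theta' \sim q(\cdot|\theta(k-1))$; compute the acceptance ratio $\alpha(\theta(k-1), \theta') = 1 \wedge \frac{p(Y|\theta')\, p(\theta')\, q(\theta(k-1)|\theta')}{p(Y|\theta(k-1))\, p(\theta(k-1))\, q(\theta'|\theta(k-1))}$; with probability $\alpha(\theta(k-1), \theta')$ accept the proposal, setting $\theta(k) = \theta'$, otherwise reject the sample and set $\theta(k) = \theta(k-1)$.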
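35 Metropolis-Hastings for HMMs: Sample $\theta(0) \sim p(\theta)$. At iteration $k \ge 1$: sample a proposal $\theta' \sim q(\cdot|\theta)$, where $\theta = \theta(k-1)$; compute the acceptance ratio $\alpha(\theta, \theta') = 1 \wedge \frac{p_{\theta'}(y_{0:T})\, p(\theta')\, q(\theta|\theta')}{p_\theta(y_{0:T})\, p(\theta)\, q(\theta'|\theta)}$; with probability $\alpha(\theta, \theta')$ accept the proposal, setting $\theta(k) = \theta'$, otherwise reject the sample and set $\theta(k) = \theta(k-1)$.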
36 Metropolis-Hastings for HMMs: This is hard to implement directly, as $p_{\theta'}(y_{0:T})$ is intractable. One could use $p_\theta(x_{0:T}, y_{0:T}) = \mu_\theta(x_0)\prod_{k=1}^T f_\theta(x_k|x_{k-1})\prod_{k=0}^T g_\theta(y_k|x_k)$ to design a sampler targeting $p(x_{0:T}, \theta|y_{0:T})$. This approach is usually inefficient: mixing could deteriorate rapidly with $T$, the path $x_{0:T}$ is strongly correlated, and it is difficult to find useful hierarchical structure or conditional independencies.
37 Metropolis-Hastings for HMMs: Take a pseudo-marginal approach (Andrieu & Roberts 2009) and choose appropriate auxiliary variables. Consider instead sampling from $p(x_{0:T}, \theta|y_{0:T})$ and then integrating out $x_{0:T}$: the ideal marginal Metropolis sampler. Marginalising $x_{0:T}$ means running an MCMC chain targeting $p(x_{0:T}, \theta|y_{0:T})$ and using only the generated $\theta$'s for Monte Carlo approximations.
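38 Ideal Marginal Metropolis-Hastings sampler: The ideal MMH sampler would use the proposal density $q((x'_{0:T}, \theta')|(x_{0:T}, \theta)) = q(\theta'|\theta)\, p_{\theta'}(x'_{0:T}|y_{0:T})$ (5). The acceptance probability is $1 \wedge \frac{p(x'_{0:T}, \theta'|y_{0:T})\, q((x_{0:T}, \theta)|(x'_{0:T}, \theta'))}{p(x_{0:T}, \theta|y_{0:T})\, q((x'_{0:T}, \theta')|(x_{0:T}, \theta))} = 1 \wedge \frac{p_{\theta'}(y_{0:T})\, p(\theta')\, q(\theta|\theta')}{p_\theta(y_{0:T})\, p(\theta)\, q(\theta'|\theta)} = 1 \wedge \frac{Z'_T\, p(\theta')\, q(\theta|\theta')}{Z_T\, p(\theta)\, q(\theta'|\theta)}$.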
39 Marginal Metropolis-Hastings sampler: We cannot sample exactly from $p_\theta(x_{0:T}|y_{0:T})$ and we cannot compute the terms $Z_T$ and $Z'_T$. A sampler with particle approximations for $Z_T$ and $Z'_T$ has the same marginal as the ideal MMH sampler. It is a pseudo-marginal sampler targeting $\bar p(x^1_0, \dots, x^N_0, x^1_1, \dots, x^N_T, O_1(1), \dots, O_1(N), \dots, O_T(N), x_{0:T}, \theta|y_{0:T})$: all the variables used to construct the SMC algorithm, $\{X^i_n, O_n(i)\}_{i=1}^N$ for $n = 1, \dots, T$, can be included together with $X_{0:T}$ as auxiliary variables and then integrated out. The validity of the algorithm is based on the unbiasedness of the likelihood estimate, $\mathbb E_N[\hat p_{\theta'}(y_{0:T})] = p_{\theta'}(y_{0:T})$ (Andrieu, Doucet and Holenstein 2010, the particle MCMC paper).
40 Particle Marginal Metropolis-Hastings (PMMH) sampler: At iteration $k = 0$: set $\theta(0) \sim p(\theta)$; run an SMC algorithm targeting $p_{\theta(0)}(x_{0:T}|y_{0:T})$, sample $X_{0:T}(0) \sim \hat p(dx_{0:T}|y_{0:T}, \theta(0))$, and compute the estimate $\hat Z_T(\theta(0))$. At iteration $k \ge 1$: sample a proposal $\theta' \sim q(\cdot|\theta(k-1))$; run an SMC algorithm targeting $p_{\theta'}(x_{0:T}|y_{0:T})$, sample $X'_{0:T} \sim \hat p(dx_{0:T}|y_{0:T}, \theta')$, and compute the estimate $\hat Z_T(\theta')$; set $\theta(k) = \theta'$, $X_{0:T}(k) = X'_{0:T}$ with probability $1 \wedge \frac{\hat Z_T(\theta')\, p(\theta')\, q(\theta(k-1)|\theta')}{\hat Z_T(\theta(k-1))\, p(\theta(k-1))\, q(\theta'|\theta(k-1))}$, otherwise set $\theta(k) = \theta(k-1)$, $X_{0:T}(k) = X_{0:T}(k-1)$.
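A compact PMMH sketch, estimating only $\rho$ in the assumed scalar model $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$ with $\tau = \sigma = 1$ known, a $\mathcal U(-1, 1)$ prior and a Gaussian random-walk proposal (so the $q$ terms cancel); $\hat Z_T$ comes from a bootstrap PF. All of these are illustrative choices. To keep it short, the sketch drops the $X_{0:T}(k)$ draws and stores only $\theta(k)$. The detail that matters for validity: the likelihood estimate attached to the current state is kept and never recomputed.

```python
import math
import random


def log_norm(z, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((z - mean) / sd) ** 2


def normalise(logw):
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return [wi / s for wi in w], m + math.log(s)


def systematic_resample(W, rng):
    N = len(W)
    u, idx, cum, i = rng.random() / N, [], W[0], 0
    for k in range(N):
        while u + k / N > cum and i < N - 1:
            i += 1
            cum += W[i]
        idx.append(i)
    return idx


def simulate(T, rho, tau, sigma, rng):
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(T):
        x.append(rho * x[-1] + tau * rng.gauss(0.0, 1.0))
    return x, [xn + sigma * rng.gauss(0.0, 1.0) for xn in x]


def pf_loglik(y, rho, tau, sigma, N, rng):
    """Bootstrap PF estimate of log p_theta(y_{0:T}); its exponential is unbiased."""
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    ll = 0.0
    for n, yn in enumerate(y):
        if n > 0:
            x = [rho * x[i] + tau * rng.gauss(0.0, 1.0) for i in idx]
        W, lse = normalise([log_norm(yn, xi, sigma) for xi in x])
        ll += lse - math.log(N)
        idx = systematic_resample(W, rng)
    return ll


def pmmh_rho(y, n_iter, N, step, rng, tau=1.0, sigma=1.0):
    rho = rng.uniform(-1.0, 1.0)                        # theta(0) ~ prior U(-1, 1)
    ll = pf_loglik(y, rho, tau, sigma, N, rng)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = rho + step * rng.gauss(0.0, 1.0)          # symmetric proposal: q terms cancel
        if -1.0 < prop < 1.0:                             # prior density is zero outside (-1, 1)
            ll_prop = pf_loglik(y, prop, tau, sigma, N, rng)
            if rng.random() < math.exp(min(0.0, ll_prop - ll)):
                rho, ll = prop, ll_prop                   # keep the estimate with the accepted state
                accepted += 1
        chain.append(rho)
    return chain, accepted / n_iter


rng = random.Random(3)
x_true, y = simulate(50, 0.8, 1.0, 1.0, rng)
chain, acc_rate = pmmh_rho(y, n_iter=300, N=64, step=0.1, rng=rng)
post = chain[100:]                                        # discard burn-in
print(f"acceptance {acc_rate:.2f}, posterior mean of rho {sum(post) / len(post):.3f}")
```

Increasing `N` lowers the variance of `ll` and typically raises the acceptance rate, at a proportional cost per iteration.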
41 Particle Marginal Metropolis-Hastings (PMMH) sampler: The remarkable feature of this algorithm is that the invariant distribution of the Markov chain $\{X_{0:T}(k), \theta(k)\}$ is $p(x_{0:T}, \theta|y_{0:T})$ whatever the value of $N \ge 1$: the SMC approximations do not introduce any bias. Minimal tuning is required compared to usual MCMC. The higher $N$, the better the mixing properties of the algorithm; the trade-off with the added computational cost can be balanced. Under favourable mixing assumptions, the variance of the acceptance rate of the PMMH sampler is proportional to $T/N$, so $N$ should roughly increase linearly with $T$ and the computational cost is $O(T^2)$; this can potentially be relaxed.
42 Online Bayesian estimation: Introduce the extended state $\mathcal X_n = (X_n, \theta_n)$ with initial density $p(\theta_0)\, \mu_{\theta_0}(x_0)$ and transition density $f_{\theta_n}(x_n|x_{n-1})\, \delta_{\theta_{n-1}}(\theta_n)$, i.e. $\theta_n = \theta_{n-1}$. Applying a standard SMC algorithm to the Markov process $\{\mathcal X_n\}_{n \ge 0}$: the parameter space would only be explored at the initialisation of the algorithm; due to successive resampling steps, after a certain time $n$ the approximation $\hat p(d\theta|y_{0:n})$ will only contain a single unique value for $\theta$; and it implicitly requires approximating $p_{\theta^{(i)}}(y_{0:n})$ for all the particles $\theta^{(i)}$ approximating $p(\theta|y_{0:n})$, hence we expect estimates whose variance will increase at least linearly with $n$.
43 Online Bayesian estimation: Pragmatic solutions: use artificial dynamics (Liu and West 2001; Hürzeler and Künsch 2001), a simple example being $\theta_n = \theta_{n-1} + \epsilon_n$, with $\epsilon_n$ zero-mean noise with small variance; the variance can be tuned from the particles. One can also use fixed-lag approximations (Polson et al. 2008): stop resampling before $n - L$.
44 Online Bayesian estimation: Resample-move (Gilks and Berzuini 2001): use an MCMC kernel with invariant density $p(x_{0:n}, \theta|y_{0:n})$, i.e. $(X^{(i)}_{0:n}, \theta^{(i)}_n) \sim K_n(\cdot|\bar X^i_{0:n}, \bar\theta^i_n)$, where by construction $K_n$ satisfies $p(x'_{0:n}, \theta'|y_{0:n}) = \int p(x_{0:n}, \theta|y_{0:n})\, K_n((x'_{0:n}, \theta')|(x_{0:n}, \theta))\, d(x_{0:n}, \theta)$. In practice, set $X^{(i)}_{0:n-L} = \bar X^i_{0:n-L}$ for some integer $L \ge 1$ and only sample $\theta^{(i)}_n$ and possibly $X^{(i)}_{n-L+1:n}$.
45 Resample-move: In some cases we can use a Gibbs step to update the parameter values, $K_n((x'_{0:n}, \theta')|(x_{0:n}, \theta)) = \delta_{x_{0:n}}(x'_{0:n})\, p(\theta'|x_{0:n}, y_{0:n})$, where $p(\theta|y_{0:n}, x_{0:n}) = p(\theta|s_n(x_{0:n}, y_{0:n}))$ with $s_n(x_{0:n}, y_{0:n})$ a fixed-dimension sufficient statistic. With some variations this has appeared many times: Andrieu et al. 1999, Fearnhead 2002, Storvik 2002, Johannes and Polson. Elegant, but still not robust, since it relies on SMC approximations of $p(s_n(x_{0:n}, y_{0:n})|y_{0:n})$, and for fixed $N$ the error increases with $n$: the issue is path degeneracy. Unsuitable for high dimensions (> 5-10).
46 Numerical example: We will use again $X_n = \rho X_{n-1} + \tau W_n$, $Y_n = X_n + \sigma V_n$ (6), where $W_n, V_n \overset{\text{iid}}{\sim} \mathcal N(0, 1)$.
47 Numerical example: on-line inference. Figure: Particle method with MCMC, $\theta = (\rho, \sigma^2)$; estimated posterior pdfs of $\sigma$ and $\rho$ at increasing times $n$, up to $n = 5000$.
48 Numerical example: on-line inference. Figure: Estimated marginal posterior densities for $\theta = (\rho, \sigma^2)$ with $T = 10^3$ over 50 runs (black dotted) versus ground truth (green). Top: particle method with MCMC. Bottom: particle Gibbs with 3000 iterations and $N = 50$.
49 Likelihood estimation methods with particle filtering: Some algorithms. Likelihood methods: optimisation based, gradient based, expectation maximisation; off-line or on-line. We will focus on off-line methods and only sketch on-line ones to give a very basic idea.
50 Maximum Likelihood based methods: Off-line case: estimate $\theta$ as the maximising argument of the marginal likelihood of the observed data, $\hat\theta = \arg\max_{\theta \in \Theta} \ell_T(\theta)$ (7), where $\ell_T(\theta) = \log p_\theta(y_{0:T})$ (8). On-line case: use a recursive method: let $\theta_n$ be the estimate of the model parameter after $n - 1$ observations, and update the estimate to $\theta_{n+1}$ after receiving the new data $y_n$.
51 Off-line Maximum Likelihood based methods: Off-line case: estimate $\theta$ as $\hat\theta = \arg\max_{\theta \in \Theta} \hat\ell_T(\theta)$ (9), where $\hat\ell_T(\theta) = \widehat{\log p_\theta}(y_{0:T})$. One can use direct optimisation: a grid on $\Theta$, BFGS, or other popular optimisation methods. This is difficult due to the variance of $\hat p_\theta(y_{0:T})$.
52 On the Monte Carlo variance of $\hat p_\theta(y_{0:T})$: Recall that SMC gives an unbiased estimate of the marginal likelihood, $\mathbb E_N[\hat p_\theta(y_{0:T})] = p_\theta(y_{0:T})$. Loosely speaking, $\hat p_\theta(y_{0:T}) = p_\theta(y_{0:T}) + V$, with $V$ some non-trivial zero-mean noise depending on $T$, $N$ and the model. Recall that $\hat p_\theta(y_{0:n})$ has a relative (non-asymptotic) variance that increases linearly with $n$. The Monte Carlo variability is quite an issue when finding the maximum over $\theta$.
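53 Approximating $\log p_\theta(y_{0:T})$: Note that $\mathbb E_N[\hat p_\theta(y_{0:T})] = p_\theta(y_{0:T})$ implies that $\mathbb E_N[\log \hat p_\theta(y_{0:T})] \ne \log p_\theta(y_{0:T})$ (by Jensen's inequality, $\mathbb E_N[\log \hat p_\theta(y_{0:T})] \le \log p_\theta(y_{0:T})$). So $\log \hat p_\theta(y_{0:T})$ is a biased estimator. Can we correct for the bias?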
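54 Approximating $\log p_\theta(y_{0:T})$: We can use a bias correction based on a Taylor series: $\log Z = \log Z_0 + \frac{Z - Z_0}{Z_0} - \frac{(Z - Z_0)^2}{2 Z_0^2} + O((Z - Z_0)^3)$. Let $Z_0 = \mathbb E[Z]$, so that the first-order term has zero mean; then, ignoring higher-order terms, $\mathbb E[\log Z] \approx \log \mathbb E[Z] - \frac{\mathrm{Var}[Z]}{2\, \mathbb E[Z]^2}$. What we have is $Z = \hat Z = \hat p_\theta(y_{0:T})$ and $Z_0 = p_\theta(y_{0:T})$, so $\mathbb E[\log \hat p_\theta(y_{0:T})] \approx \log p_\theta(y_{0:T}) - \frac{\mathrm{Var}[\hat p_\theta(y_{0:T})]}{2\, p_\theta(y_{0:T})^2}$.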
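55 Approximating $\log p_\theta(y_{0:T})$: Note from slides 1 or 3, with $w(x_{0:T}) = p_\theta(x_{0:T}, y_{0:T})/q(x_{0:T}) = \prod_{n=0}^T w_n(x_{n-1:n})$: $\frac{\mathrm{Var}[\hat p_\theta(y_{0:T})]}{p_\theta(y_{0:T})^2} \approx \frac{1}{N}\left(\int \frac{p_\theta(x_{0:T}|y_{0:T})}{q(x_{0:T})}\, p_\theta(x_{0:T}|y_{0:T})\, dx_{0:T} - 1\right) = \frac{1}{N}\left(\int \frac{w(x_{0:T})}{p_\theta(y_{0:T})}\, p_\theta(x_{0:T}|y_{0:T})\, dx_{0:T} - 1\right) = \frac{1}{N}\left(\int \frac{\prod_{n=0}^T w_n(x_{n-1:n})}{p_\theta(y_{0:T})}\, p_\theta(x_{0:T}|y_{0:T})\, dx_{0:T} - 1\right)$. Let $\hat W$ be the particle approximation of $\int \frac{\prod_{n=0}^T w_n(x_{n-1:n})}{p_\theta(y_{0:T})}\, p_\theta(x_{0:T}|y_{0:T})\, dx_{0:T}$.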
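56 Approximating $\log p_\theta(y_{0:T})$: We then get $\mathbb E[\log \hat p_\theta(y_{0:T})] \approx \log p_\theta(y_{0:T}) - \frac{\hat W - 1}{2N}$, so we can use $\widehat{\log p_\theta}(y_{0:T}) = \log \hat p_\theta(y_{0:T}) + \frac{\hat W - 1}{2N}$ as a bias-reduced estimator for $\ell_T(\theta)$.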
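The correction can be checked on the simplest case, plain importance sampling (a single time step, so $\prod_n w_n$ is just $w$). Under our reading of the previous slide, the self-normalised particle estimate of $\int (w/Z)\, \pi\, dx$ reduces to $\hat W = N\sum_i w_i^2 / (\sum_i w_i)^2 = N/\mathrm{ESS}$. The toy target $\gamma(x) = e^{-x^2/2}$ (so $Z = \sqrt{2\pi}$) and the proposal $\mathcal N(0, 2^2)$ are our own choices. Averaged over many replications, the corrected estimator should sit closer to $\log Z$ than the naive one.

```python
import math
import random


def is_log_evidence(N, rng):
    """Naive and bias-corrected IS estimates of log Z for gamma(x) = exp(-x^2 / 2).

    Proposal q = N(0, 2^2), so w(x) = gamma(x) / q(x) = 2 sqrt(2 pi) exp(-3 x^2 / 8)."""
    xs = [rng.gauss(0.0, 2.0) for _ in range(N)]
    w = [2.0 * math.sqrt(2.0 * math.pi) * math.exp(-3.0 * x * x / 8.0) for x in xs]
    naive = math.log(sum(w) / N)                          # log Z_hat, biased downwards
    W_hat = N * sum(wi * wi for wi in w) / sum(w) ** 2    # = N / ESS, estimates int (w/Z) pi dx
    return naive, naive + (W_hat - 1.0) / (2.0 * N)       # bias-reduced version


rng = random.Random(7)
N, reps = 20, 20000
log_Z = 0.5 * math.log(2.0 * math.pi)
results = [is_log_evidence(N, rng) for _ in range(reps)]
bias_naive = sum(r[0] for r in results) / reps - log_Z
bias_corrected = sum(r[1] for r in results) / reps - log_Z
print(f"bias naive {bias_naive:.4f}, bias corrected {bias_corrected:.4f}")
```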
57 Optimising $\hat\ell_T(\theta)$ w.r.t. $\theta$: Still, $\hat\ell_T(\theta) = \widehat{\log p_\theta}(y_{0:T})$ will exhibit quite a bit of variance, which can make finding the maximum difficult. Potential remedies: smooth the approximation as a function of $\theta$; use a different resampling scheme (Pitt 02, Lee 10); try to reduce the variance with multiple runs.
58 Expectation Maximisation: The Expectation Maximisation (EM) algorithm is a very popular alternative procedure for maximising $\ell_T(\theta)$. At iteration $k + 1$ we set $\theta_{k+1} = \arg\max_\theta Q(\theta_k, \theta)$ (10), where $Q(\theta_k, \theta) = \int \log p_\theta(x_{0:T}, y_{0:T})\, p_{\theta_k}(x_{0:T}|y_{0:T})\, dx_{0:T}$ (11). The sequence $\{\ell_T(\theta_k)\}_{k \ge 0}$ generated by this algorithm is non-decreasing.
59 Expectation Maximisation: In particular, if $p_\theta(x_{0:T}, y_{0:T})$ belongs to the exponential family, then EM consists of computing an $n_s$-dimensional summary statistic like $S^{\theta_k}_T$, and the maximising argument of $Q(\theta_k, \theta)$ can be characterised explicitly through a suitable function $\Lambda: \mathbb R^{n_s} \to \Theta$, i.e. $\theta_{k+1} = \Lambda(S^{\theta_k}_T)$ (12). The particle implementation consists of computing $S^{\theta_k}_n$.
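For the assumed scalar model $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$ with $X_0 \sim \mathcal N(0,1)$ known and $\theta = (\rho, \tau^2, \sigma^2)$ (our parameterisation), the complete-data log-likelihood is in the exponential family with a 4-dimensional additive statistic, and $\Lambda$ in (12) is available in closed form. The sketch below applies a hypothetical `Lambda` to the statistic of a simulated, fully observed trajectory, which gives the complete-data MLE; inside EM one would instead plug in the smoothed expectation $S^{\theta_k}_T$ computed by a particle smoother.

```python
import random


def simulate(T, rho, tau, sigma, rng):
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(T):
        x.append(rho * x[-1] + tau * rng.gauss(0.0, 1.0))
    return x, [xn + sigma * rng.gauss(0.0, 1.0) for xn in x]


def suff_stats(x, y):
    """Additive statistic S = sum_n s_n(x_{n-1}, x_n) for a single trajectory.
    Components: sum x_n x_{n-1}, sum x_{n-1}^2, sum x_n^2 (n >= 1), sum (y_n - x_n)^2 (n >= 0)."""
    S = [0.0, 0.0, 0.0, (y[0] - x[0]) ** 2]
    for n in range(1, len(x)):
        S[0] += x[n] * x[n - 1]
        S[1] += x[n - 1] ** 2
        S[2] += x[n] ** 2
        S[3] += (y[n] - x[n]) ** 2
    return S


def Lambda(S, T):
    """Closed-form maximiser of the complete-data log-likelihood: (rho, tau^2, sigma^2)."""
    rho = S[0] / S[1]
    tau2 = (S[2] - 2.0 * rho * S[0] + rho ** 2 * S[1]) / T
    sigma2 = S[3] / (T + 1)
    return rho, tau2, sigma2


rng = random.Random(6)
T = 20000
x, y = simulate(T, 0.8, 1.0, 1.0, rng)
rho_hat, tau2_hat, sigma2_hat = Lambda(suff_stats(x, y), T)
print(f"rho {rho_hat:.3f}, tau^2 {tau2_hat:.3f}, sigma^2 {sigma2_hat:.3f}")
```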
60 Additive functionals $S^\theta_n$: $S^\theta_n$ is an additive functional, $S^\theta_n = \int \left[\sum_{k=0}^n s_k(x_k, x_{k-1})\right] p_\theta(x_{0:n}|y_{0:n})\, dx_{0:n}$ (13). Theory tells us that the asymptotic variance of the SMC estimate $\hat S^\theta_n = \int \left[\sum_{k=0}^n s_k(x_k, x_{k-1})\right] \hat p_\theta(dx_{0:n}|y_{0:n})$ (14) satisfies $\mathbb V[\hat S^\theta_n] \ge D\, \frac{n^2}{N}$ (15), even with exponential filter stability. This motivates the use of dedicated smoothing algorithms.
61 Gradient ascent: The log-likelihood may be maximised with the following steepest ascent algorithm: at iteration $k + 1$, $\theta_{k+1} = \theta_k + \gamma_{k+1}\, \nabla \ell_T(\theta)|_{\theta = \theta_k}$ (16), where the step sizes $\{\gamma_k\}_{k \ge 1}$ need to satisfy $\sum_k \gamma_k = \infty$ and $\sum_k \gamma_k^2 < \infty$. One could also use the Hessian, but this is omitted for simplicity. To obtain the score vector $\nabla \ell_T(\theta)$ we can use Fisher's identity: $\nabla \log p_\theta(y_{0:n}) = \int \nabla \log p_\theta(x_{0:n}, y_{0:n})\, p_\theta(x_{0:n}|y_{0:n})\, dx_{0:n}$. The latter is of the form of $S^\theta_n$ again.
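62 Gradient ascent: We have $\nabla \log p_\theta(x_{0:n}, y_{0:n}) = \nabla \log \prod_{p=0}^n f_\theta(x_p|x_{p-1})\, g_\theta(y_p|x_p) = \sum_{p=0}^n \left(\nabla \log f_\theta(x_p|x_{p-1}) + \nabla \log g_\theta(y_p|x_p)\right)$, with the convention $f_\theta(x_0|x_{-1}) = \mu_\theta(x_0)$. Define $s_p(x_{p-1:p}) = \nabla \log f_\theta(x_p|x_{p-1}) + \nabla \log g_\theta(y_p|x_p)$. Then $\nabla \log p_\theta(y_{0:n})$ is of the form of $S^\theta_n$ again.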
63 Smoothing algorithms: We are essentially interested in designing better particle approximations for $\{p_\theta(x_n|y_{0:T})\}_{n=0}^T$. Some popular approaches: fixed-lag smoothing; forward filtering backward sampling; forward filtering backward smoothing.
64 Fixed-lag smoothing: For state-space models with good forgetting properties, if $L$ is large enough then $p_\theta(x_{0:n}|y_{0:T}) \approx p_\theta(x_{0:n}|y_{0:(n+L) \wedge T})$: observations collected at times $k > n + L$ do not bring any significant additional information about $X_{0:n}$. Fixed-lag approximation (Kitagawa & Sato 2001): do not resample the components $X^i_{0:n}$ of the particles $X^i_{0:k}$ obtained by particle filtering at times $k > n + L$. This could work in practice, but the method is asymptotically biased and it might be hard to tune $L$.
65 Forward-Backward Smoothing using sampling: Backward interpretation: the joint smoothing distribution $p_\theta(x_{0:T}|y_{0:T})$ can be expressed as a function of the filtering distributions $\{p_\theta(x_n|y_{0:n})\}_{n=0}^T$ as follows: $p_\theta(x_{0:T}|y_{0:T}) = p_\theta(x_T|y_{0:T}) \prod_{n=0}^{T-1} p_\theta(x_n|y_{0:n}, x_{n+1})$ (17), where $p_\theta(x_n|y_{0:n}, x_{n+1}) = \frac{f_\theta(x_{n+1}|x_n)\, p_\theta(x_n|y_{0:n})}{p_\theta(x_{n+1}|y_{0:n})}$ (18).
66 Particle implementation: Forward Filtering Backward Sampling (FFBSa): run a particle filter from time $n = 0$ to $T$, storing the approximate filtering distributions $\{\hat p_\theta(dx_n|y_{0:n})\}_{n=0}^T$. Sample $X_T \sim \hat p_\theta(dx_T|y_{0:T})$ and, for $n = T-1, T-2, \dots, 0$, sample $X_n \sim \hat p_\theta(dx_n|y_{0:n}, X_{n+1})$, where this distribution is obtained by substituting $\hat p_\theta(dx_n|y_{0:n})$ for $p_\theta(dx_n|y_{0:n})$ in (18): $\hat p_\theta(dx_n|y_{0:n}, X_{n+1}) = \frac{\sum_{i=1}^N W^i_n\, f_\theta(X_{n+1}|X^i_n)\, \delta_{X^i_n}(dx_n)}{\sum_{i=1}^N W^i_n\, f_\theta(X_{n+1}|X^i_n)}$ (19).
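A sketch of FFBSa for the assumed scalar model $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$ (names and parameterisation are our assumptions): a bootstrap filter stores $(X^i_n, W^i_n)$ for every $n$ (O(NT) memory), then each backward draw applies (19) once per time step, so every sampled path costs O(NT).

```python
import math
import random


def log_norm(z, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((z - mean) / sd) ** 2


def normalise(logw):
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return [wi / s for wi in w], m + math.log(s)


def systematic_resample(W, rng):
    N = len(W)
    u, idx, cum, i = rng.random() / N, [], W[0], 0
    for k in range(N):
        while u + k / N > cum and i < N - 1:
            i += 1
            cum += W[i]
        idx.append(i)
    return idx


def simulate(T, rho, tau, sigma, rng):
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(T):
        x.append(rho * x[-1] + tau * rng.gauss(0.0, 1.0))
    return x, [xn + sigma * rng.gauss(0.0, 1.0) for xn in x]


def forward_filter(y, rho, tau, sigma, N, rng):
    """Bootstrap PF storing particles X_n^i and normalised weights W_n^i for every n."""
    xs, Ws = [], []
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    for n, yn in enumerate(y):
        if n > 0:
            idx = systematic_resample(Ws[-1], rng)
            x = [rho * xs[-1][i] + tau * rng.gauss(0.0, 1.0) for i in idx]
        W, _ = normalise([log_norm(yn, xi, sigma) for xi in x])
        xs.append(x)
        Ws.append(W)
    return xs, Ws


def backward_sample(xs, Ws, rho, tau, rng):
    """One FFBSa draw: X_T ~ p_hat(dx_T | y_{0:T}), then X_n from (19) for n = T-1, ..., 0."""
    T = len(xs) - 1
    j = rng.choices(range(len(xs[T])), weights=Ws[T])[0]
    path = [xs[T][j]]
    for n in range(T - 1, -1, -1):
        # P(X_n = X_n^i) propto W_n^i f(X_{n+1} | X_n^i)
        b = [w * math.exp(log_norm(path[-1], rho * xi, tau)) for w, xi in zip(Ws[n], xs[n])]
        j = rng.choices(range(len(b)), weights=b)[0]
        path.append(xs[n][j])
    return path[::-1]


rng = random.Random(8)
x_true, y = simulate(30, 0.8, 1.0, 1.0, rng)
xs, Ws = forward_filter(y, 0.8, 1.0, 1.0, N=200, rng=rng)
paths = [backward_sample(xs, Ws, 0.8, 1.0, rng) for _ in range(10)]
```

Unlike the genealogy of the forward filter, independent backward draws can combine particles from different ancestral lines, which is what counters path degeneracy.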
67 Forward-Backward Smoothing: A backward-in-time recursion for $\{p_\theta(x_n|y_{0:T})\}_{n=0}^T$ follows by integrating out $x_{0:n-1}$ and $x_{n+1:T}$ in (17) while applying (18): $p_\theta(x_n|y_{0:T}) = \int p_\theta(x_n, x_{n+1}|y_{0:T})\, dx_{n+1} = \int p_\theta(x_n|y_{0:n}, x_{n+1})\, p_\theta(x_{n+1}|y_{0:T})\, dx_{n+1} = \int \frac{f_\theta(x_{n+1}|x_n)\, p_\theta(x_n|y_{0:n})}{p_\theta(x_{n+1}|y_{0:n})}\, p_\theta(x_{n+1}|y_{0:T})\, dx_{n+1}$.
68 Forward-Backward Smoothing: So the backward-in-time recursion for $\{p_\theta(x_n|y_{0:T})\}_{n=0}^T$ is $p_\theta(x_n|y_{0:T}) = p_\theta(x_n|y_{0:n}) \int \frac{f_\theta(x_{n+1}|x_n)\, p_\theta(x_{n+1}|y_{0:T})}{p_\theta(x_{n+1}|y_{0:n})}\, dx_{n+1}$ (20). So $\{p_\theta(x_n|y_{0:n})\}_{n=0}^T$ can be used in a backward pass to obtain $\{p_\theta(x_n|y_{0:T})\}_{n=0}^T$ and $\{p_\theta(x_n|y_{0:n}, x_{n+1})\}_{n=0}^{T-1}$.
69 Particle implementation: Forward Filtering Backward Smoothing (FFBSm): Assume we have an approximation $\hat p_\theta(dx_{n+1}|y_{0:T}) = \sum_{i=1}^N W^i_{n+1|T}\, \delta_{X^i_{n+1}}(dx_{n+1})$, where $W^i_{T|T} = W^i_T$. Then, by using (20) and (19), we obtain the approximation $\hat p_\theta(dx_n|y_{0:T}) = \sum_{i=1}^N W^i_{n|T}\, \delta_{X^i_n}(dx_n)$ with $W^i_{n|T} = W^i_n \sum_{j=1}^N W^j_{n+1|T}\, \frac{f_\theta(X^j_{n+1}|X^i_n)}{\sum_{l=1}^N W^l_n\, f_\theta(X^j_{n+1}|X^l_n)}$ (21).
70 Particle implementation: Forward Filtering Backward Smoothing (FFBSm): Run a particle filter from time $n = 0$ to $T$, storing the approximate filtering distributions $\{\hat p_\theta(dx_n|y_{0:n})\}_{n=0}^T$. Initialise the backward pass with $W^i_{T|T} = W^i_T$. For $n = T-1, T-2, \dots, 0$ compute the weights $W^i_{n|T} = W^i_n \sum_{j=1}^N W^j_{n+1|T}\, \frac{f_\theta(X^j_{n+1}|X^i_n)}{\sum_{l=1}^N W^l_n\, f_\theta(X^j_{n+1}|X^l_n)}$ (22) and obtain the approximation $\hat p_\theta(dx_n|y_{0:T}) = \sum_{i=1}^N W^i_{n|T}\, \delta_{X^i_n}(dx_n)$.
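A sketch of the FFBSm backward pass (22), returning smoothed means $\mathbb E[X_n|y_{0:T}]$; the $N \times N$ table of transition densities per time step makes the $O(N^2 T)$ cost visible. It assumes the same scalar model and bootstrap forward filter as the FFBSa sketch (repeated here so the block stands alone). As our own addition for checking, a Rauch-Tung-Striebel smoother supplies the exact smoothed means for this linear-Gaussian model.

```python
import math
import random


def log_norm(z, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((z - mean) / sd) ** 2


def normalise(logw):
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return [wi / s for wi in w], m + math.log(s)


def systematic_resample(W, rng):
    N = len(W)
    u, idx, cum, i = rng.random() / N, [], W[0], 0
    for k in range(N):
        while u + k / N > cum and i < N - 1:
            i += 1
            cum += W[i]
        idx.append(i)
    return idx


def simulate(T, rho, tau, sigma, rng):
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(T):
        x.append(rho * x[-1] + tau * rng.gauss(0.0, 1.0))
    return x, [xn + sigma * rng.gauss(0.0, 1.0) for xn in x]


def forward_filter(y, rho, tau, sigma, N, rng):
    """Bootstrap PF storing particles X_n^i and normalised weights W_n^i for every n."""
    xs, Ws = [], []
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    for n, yn in enumerate(y):
        if n > 0:
            idx = systematic_resample(Ws[-1], rng)
            x = [rho * xs[-1][i] + tau * rng.gauss(0.0, 1.0) for i in idx]
        W, _ = normalise([log_norm(yn, xi, sigma) for xi in x])
        xs.append(x)
        Ws.append(W)
    return xs, Ws


def ffbsm_means(xs, Ws, rho, tau):
    """Backward pass (22): smoothing weights W_{n|T}^i and smoothed means E[X_n | y_{0:T}]."""
    T = len(xs) - 1
    Ws_T = list(Ws[T])                                   # W_{T|T}^i = W_T^i
    means = [sum(w * x for w, x in zip(Ws_T, xs[T]))]
    for n in range(T - 1, -1, -1):
        # f[j][i] = f(X_{n+1}^j | X_n^i): N^2 evaluations per step, hence O(N^2 T) overall
        f = [[math.exp(log_norm(xj, rho * xi, tau)) for xi in xs[n]] for xj in xs[n + 1]]
        denom = [sum(wl * fjl for wl, fjl in zip(Ws[n], fj)) for fj in f]
        ratio = [wj / d if d > 0.0 else 0.0 for wj, d in zip(Ws_T, denom)]
        Ws_T = [Ws[n][i] * sum(r * fj[i] for r, fj in zip(ratio, f)) for i in range(len(xs[n]))]
        s = sum(Ws_T)
        Ws_T = [w / s for w in Ws_T]                     # already sums to 1 up to rounding
        means.append(sum(w * x for w, x in zip(Ws_T, xs[n])))
    return means[::-1]


def rts_means(y, rho, tau, sigma):
    """Exact smoothed means (Kalman filter + Rauch-Tung-Striebel), for reference only."""
    mf, Pf, mp, Pp = [], [], [], []
    m, P = 0.0, 1.0
    for n, yn in enumerate(y):
        if n > 0:
            m, P = rho * m, rho * rho * P + tau * tau
        mp.append(m)
        Pp.append(P)
        S = P + sigma * sigma
        m, P = m + P / S * (yn - m), P * sigma * sigma / S
        mf.append(m)
        Pf.append(P)
    ms = mf[:]
    for n in range(len(y) - 2, -1, -1):
        G = Pf[n] * rho / Pp[n + 1]                      # smoother gain
        ms[n] = mf[n] + G * (ms[n + 1] - mp[n + 1])
    return ms


rng = random.Random(9)
x_true, y = simulate(30, 0.8, 1.0, 1.0, rng)
xs, Ws = forward_filter(y, 0.8, 1.0, 1.0, N=150, rng=rng)
smoothed = ffbsm_means(xs, Ws, 0.8, 1.0)
exact = rts_means(y, 0.8, 1.0, 1.0)
errors = [abs(a - b) for a, b in zip(smoothed, exact)]
print(f"mean abs error {sum(errors) / len(errors):.3f}, max {max(errors):.3f}")
```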
71 Particle implementation: Let's say we have performed Forward Filtering Backward Smoothing (FFBSm), so we have an approximation $\hat p_\theta(dx_{n+1}|y_{0:T}) = \sum_{i=1}^N W^i_{n+1|T}\, \delta_{X^i_{n+1}}(dx_{n+1})$, and we are interested in obtaining the approximation $\hat p_\theta(dx_n, dx_{n+1}|y_{0:T}) = \sum_{i=1}^N W^i_{n,n+1|T}\, \delta_{(X^{a(i)}_n, X^i_{n+1})}(dx_n, dx_{n+1})$, with $X^{a(i)}_n$ being the ancestor of $X^i_{n+1}$. Then we can weight the pair $(X^{a(i)}_n, X^i_{n+1})$ by $W^i_{n,n+1|T} = \frac{W^{a(i)}_n\, W^i_{n+1|T}\, f_\theta(X^i_{n+1}|X^{a(i)}_n)}{\sum_{l=1}^N W^l_n\, f_\theta(X^i_{n+1}|X^l_n)}$ (23).
72 Discussion: In both previous slides the computational cost is proportional to $N^2 T$ operations in total. Assuming exponential forgetting: $\hat S^\theta_n$ based on the fixed-lag approximation has an asymptotic variance with rate $n/N$, with a non-vanishing (as $N \to \infty$) bias proportional to $n$ and a constant decreasing exponentially fast with $L$. The asymptotic bias and variance of the particle estimate of $S^\theta_n$ computed using the forward-backward procedures satisfy $\left|\mathbb E[\hat S^\theta_n] - S^\theta_n\right| \le F\, \frac{n}{N}$ and $\mathbb V[\hat S^\theta_n] \le H\, \frac{n}{N}$ (24), but note that this uses algorithms with a cost of $N^2 T$ operations.
73 Discussion: To compute $\hat S^\theta_n$ at a cost of $N^2 T$, one can implement either 1. a simple particle filter with $N^2$ particles, or 2. an FFBS particle filter with $N$ particles. Case 1 suffers from path degeneracy: bias of order $T/N^2$, variance at least of order $T^2/N^2$. Case 2 (more expensive per particle): bias of order $T/N$, variance of order $T/N$.
74 On-line methods: On-line/forward-only extensions for EM and gradient methods do exist: Poyiadjis, Doucet & Singh 11; Cappé 09; Del Moral, Doucet & Singh 09. Understanding them is beyond this course; the next couple of slides are for general information and interest.
75 On-line methods: On-line extensions for EM and gradient methods do exist. For the gradient method: $\theta_{n+1} = \theta_n + \gamma_{n+1}\, \nabla \log p_{\theta_{0:n}}(y_n|y_{0:n-1})$ (25), where $\nabla \log p_{\theta_{0:n}}(y_n|y_{0:n-1})$ is defined as $\nabla \log p_{\theta_{0:n}}(y_n|y_{0:n-1}) = \nabla \log p_{\theta_{0:n}}(y_{0:n}) - \nabla \log p_{\theta_{0:n-1}}(y_{0:n-1})$ (26).
76 On-line methods: The notation $\nabla \log p_{\theta_{0:n}}(y_{0:n})$ corresponds to a time-varying score, which is computed with a filter using the parameter $\theta_p$ at time $p$. Using Fisher's identity to compute this time-varying score, we have for $1 \le p \le n$: $s_p(x_{p-1:p}) = \nabla \log f_\theta(x_p|x_{p-1})|_{\theta = \theta_p} + \nabla \log g_\theta(y_p|x_p)|_{\theta = \theta_p}$ (27).
77 On-line methods: In off-line EM, the maximisation can be rewritten as $\theta_{k+1} = \Lambda(T^{-1} S^{\theta_k}_T)$ (28). So for on-line EM one can use Robbins-Monro averaging: $S_{0:n} = \gamma_{n+1} \int s_n(x_{n-1:n})\, p_{\theta_{0:n}}(x_{n-1}, x_n|y_{0:n})\, dx_{n-1:n} + (1 - \gamma_{n+1}) \sum_{k=0}^{n-1} \left(\prod_{i=k+2}^{n} (1 - \gamma_i)\right) \gamma_{k+1} \int s_k(x_{k-1:k})\, p_{\theta_{0:k}}(x_{k-1:k}|y_{0:k})\, dx_{k-1:k}$ (29). Then the standard maximisation step is used as in the batch version: $\theta_{n+1} = \Lambda(S_{0:n})$. There is also a forward-only implementation of FFBSm (Del Moral et al. 2009).
78 Discussion: On-line and off-line parameter estimation boil down to computing smoothed integrals of additive functionals. One can either use the standard algorithm (with $O(N)$ cost) or dedicated smoothing algorithms (with $O(N^2)$ cost). With the exception of on-line gradient methods, when the same computational cost is used, the first choice suffers from the variance and the second from the bias; both give a similar MSE.
79 Numerical example: Figure: Estimating smoothed additive functionals with the $O(N)$ method and the $O(N^2)$ method, against time $n$: empirical bias of the estimate of $S^\theta_n$ (top panel), empirical variance (middle panel) and mean squared error (bottom panel) for the estimate of $S^\theta_n/\sqrt{n}$.
80 Numerical example: Figure: EM: boxplots of $\hat\theta_n$ (panels for $\rho$ and $\tau$) for the $O(N)$ and $O(N^2)$ methods using 100 realizations of the
81 Homework 5: For the following scalar model, $X_n = \rho X_{n-1} + \tau V_n$, $Y_n = X_n + \sigma W_n$ (30), where $W_n, V_n \overset{\text{iid}}{\sim} \mathcal N(0, 1)$ and $X_0 \sim \mathcal N(0, 1)$: synthesise data sets $y_{0:T}$ for $T = 1000$, $\rho = 0.8$, $\tau = 1$ with varying $\sigma = 0.01, 0.1, 1$. Store the real state trajectory $x_{0:T}$ for future comparisons in each case. Implement a particle filter of your choice. Using appropriate plots, compare the approximations of the mean and variance of $p(x_n|y_{0:T})$ using: a standard particle filter; a particle filter with fixed-lag smoothing; a particle filter with backward smoothing. Comment on the computational cost in each case. (*) In each case compare the approximation of $p(x_{0:n}|y_{0:n})$ using plots: the number of unique particles at certain lags, or an illustration of sampled paths at different times $n$; results showing the Monte Carlo bias and variance for smoothed additive functionals.
82 Coursework instructions: Coursework option 1: particle methods. Pick an HMM of your choice such that the state and observation can be multidimensional, with dimensions $d_x$ and $d_y$ respectively. Using some known values for the static parameters: implement a bootstrap particle filter and a more advanced PF of your choice; generate plots and tables to compare the two methods for varying $N$, $d_x$ and $d_y$; assess the methods based on the accuracy and variance of the normalising constant and of integrals like the posterior (filter) mean, variance, etc. Consider a parameter estimation method of your choice (particle MCMC, gradients, EM): implement it and describe the results for varying $N$, $d_x$ and $d_y$ using plots and tables. In your answers also provide short comments.
83 Coursework instructions
Coursework option 2: if your research is related to computational statistics or uses MCMC, present:
1. your model of interest and the problem at hand,
2. the inferential method for the problem (e.g. Bayesian inference, optimisation, etc.) and the challenges involved,
3. the simulation method (e.g. MCMC, IS, SMC),
4. numerical results,
5. a discussion of how material from this course can be used for extensions.
Page limit: pages; recommended length around 8 pages; use appendices if you need to go beyond the page limit.
Submit by to n.kantas at imperial.ac.uk using the subject: LTCC coursework submission
Deadline: 5 Dec 18 (a month)