
1. Particle Methods for Hidden Markov Models - EPFL, 7 Dec. 2004

Olivier Cappé, CNRS Lab. Trait. Commun. Inform. & ENST département Trait. Signal Image, 46 rue Barrault, Paris cedex 13, France. cappe@tsi.enst.fr

These lectures are based on the book Inference in Hidden Markov Models, written with E. Moulines and T. Rydén (Springer-Verlag, to appear in 2005).

2. Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation

3. What is a Hidden Markov Model?

A hidden Markov model (abbreviated HMM) is a bivariate discrete-time process $\{X_k, Y_k\}_{k\ge 0}$, where $\{X_k\}_{k\ge 0}$ is a homogeneous Markov chain and, conditional on $\{X_k\}_{k\ge 0}$, $\{Y_k\}_{k\ge 0}$ is a sequence of independent random variables such that the conditional distribution of $Y_k$ only depends on $X_k$.

The underlying Markov chain $\{X_k\}_{k\ge 0}$ is called the regime, or state. We denote the state space of the Markov chain $\{X_k\}_{k\ge 0}$ by $\mathsf{X}$ and the set in which $\{Y_k\}_{k\ge 0}$ takes its values by $\mathsf{Y}$.
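The generative definition above can be sketched in code. A minimal sketch, assuming for illustration a two-state chain with Gaussian observations (the function name and parameters are hypothetical, not from the slides):

```python
import numpy as np

def simulate_hmm(n, P, means, sd, nu, rng):
    """Simulate n steps of a finite-state HMM with Gaussian observations:
    {X_k} is a Markov chain with transition matrix P and initial law nu,
    and, given the states, Y_k ~ N(means[X_k], sd^2) independently."""
    x = np.empty(n, dtype=int)
    x[0] = rng.choice(len(nu), p=nu)
    for k in range(1, n):
        x[k] = rng.choice(P.shape[0], p=P[x[k - 1]])  # X_k depends on X_{k-1} only
    y = rng.normal(means[x], sd)                      # Y_k depends on X_k only
    return x, y
```

Given the states, the observations are drawn independently, which is exactly the conditional-independence structure of the definition.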

4. What is a Hidden Markov Model?

The dependence structure of an HMM can be represented by a graphical model.

[Figure: graphical representation of the dependence structure of a hidden Markov model, with arrows $X_k \to X_{k+1}$, $X_k \to Y_k$ and $X_{k+1} \to Y_{k+1}$, where $\{Y_k\}_{k\ge 0}$ is the observable process and $\{X_k\}_{k\ge 0}$ is the hidden chain.]

5. What is a Hidden Markov Model?

Of the two processes $\{X_k\}_{k\ge 0}$ and $\{Y_k\}_{k\ge 0}$, only $\{Y_k\}_{k\ge 0}$ is actually observed; the Markov chain $\{X_k\}_{k\ge 0}$ is unobserved, or hidden. Hence, inference on the parameters of the model must be achieved using $\{Y_k\}_{k\ge 0}$ only. The other topic of interest is of course inference on the unobserved $\{X_k\}_{k\ge 0}$: given a model and some observations, can we estimate the value of the unobservable sequence of states? These two major statistical objectives are indeed strongly connected!

6. What is a Hidden Markov Model?

The $Y$-variables are conditionally independent given $\{X_k\}_{k\ge 0}$, but $\{Y_k\}_{k\ge 0}$ is not an independent sequence, because of the dependence in $\{X_k\}_{k\ge 0}$.

$\{Y_k\}_{k\ge 0}$ is not a Markov chain either: the joint process $\{X_k, Y_k\}_{k\ge 0}$ is a Markov chain, but $\{Y_k\}_{k\ge 0}$ does not have the loss of memory property: the conditional distribution of $Y_k$ given $Y_0, \dots, Y_{k-1}$ does depend on all the conditioning variables.

7. What is a Hidden Markov Model?

There are numerous examples:
- where both $\mathsf{X}$ and $\mathsf{Y}$ are finite: coding, digital communications, bioinformatics;
- where $\mathsf{X}$ is finite but $\mathsf{Y}$ is not: speech recognition, ion channel modelling (Gaussian HMMs);
- where both $\mathsf{X}$ and $\mathsf{Y}$ are continuous: linear state-space models, non-linear state-space models (e.g. stochastic volatility model, bearings-only tracking);
- where $\mathsf{Y}$ is continuous and $\mathsf{X} = \mathsf{C} \times \mathsf{W}$ with $\mathsf{C}$ finite and $\mathsf{W}$ continuous: conditionally Gaussian linear state-space models (AKA jump Markov models);
- non-HMMs that behave similarly: switching autoregressions, Markov switching models.

Except for stability properties and the theory of the MLE, which we don't consider today...

8. Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation

9. Hidden Markov Model

Notations for HMMs:

1. $\{X_k\}_{k\ge 0}$ is a Markov chain on $\mathsf{X}$ with initial distribution $\nu$ and transition kernel $Q$.

2. $\{Y_k\}_{k\ge 0}$ is such that for $f_0, \dots, f_n \in \mathcal{F}_b(\mathsf{Y})$,
$$ \mathrm{E}\left[ \prod_{k=0}^{n} f_k(Y_k) \,\Big|\, X_{0:n} \right] = \prod_{k=0}^{n} \int_{\mathsf{Y}} f_k(y)\, g(X_k, y)\, \mu(dy), $$
where $X_{0:n}$ denotes the collection $X_0, \dots, X_n$ and $g$ is a transition density function (with respect to $\mu$), sometimes referred to as the conditional likelihood function.

We will also use the simplified notation $g_k(x) \stackrel{\mathrm{def}}{=} g(x, Y_k)$.

10. Some More Notations: Usual Kernel Operations

$Q(x, A) = \int_A Q(x, dx')$, so that $\mathrm{P}[X_{k+1} \in A \mid X_k] = Q(X_k, A)$.

$Q(x, f) = \int Q(x, dx') f(x')$ (also denoted $(Qf)(x)$), so that $\mathrm{E}[f(X_{k+1}) \mid X_k] = Q(X_k, f)$.

$\nu Q(f) = \iint \nu(dx)\, Q(x, dx')\, f(x')$: expectation after one step, starting under $\nu$.

$Q^{n+1}(x_0, f) = Q^n(x_0, Qf)$: expectation after $n+1$ steps, starting under $\delta_{x_0}$.

Markov transition kernels are such that $Q(x, \mathsf{X}) = 1$. Sometimes unnormalized transition kernels, such that $Q(x, A) \ge 0$ for all $A \in \mathcal{X}$ and $0 < Q(x, \mathsf{X}) < \infty$, are also used.

11. Filtering and Smoothing Recursions

To be answered: given an HMM, how to evaluate the conditional distribution of the states $X_k$, given the observations $Y_0, \dots, Y_n$?

We introduce the generic notation $\phi_{\nu,k:l|n}$ to denote the conditional distribution of $X_{k:l}$ given $Y_{0:n}$, where $\nu$ recalls the dependence with respect to the initial distribution (which will sometimes be omitted).

The joint probability of the unobservable states and observations up to index $n$ is such that, for any function $f \in \mathcal{F}_b\!\left( (\mathsf{X} \times \mathsf{Y})^{n+1} \right)$,
$$ \mathrm{E}_\nu[f(X_0, Y_0, \dots, X_n, Y_n)] = \int \cdots \int f(x_0, y_0, \dots, x_n, y_n)\, \nu(dx_0)\, g(x_0, y_0) \prod_{k=1}^{n} \{ Q(x_{k-1}, dx_k)\, g(x_k, y_k) \}\ \mu^{\otimes(n+1)}(dy_0, \dots, dy_n). $$

12. The Likelihood

Marginalizing with respect to the unobservable variables $X_0, \dots, X_n$ yields
$$ \mathrm{E}_\nu[f(Y_0, \dots, Y_n)] = \int \cdots \int f(y_0, \dots, y_n)\, \mathrm{L}_{\nu,n}(y_0, \dots, y_n)\ \mu^{\otimes(n+1)}(dy_0, \dots, dy_n) $$
for $f \in \mathcal{F}_b(\mathsf{Y}^{n+1})$, where
$$ \mathrm{L}_{\nu,n}(y_0, \dots, y_n) = \int \cdots \int \nu(dx_0)\, g(x_0, y_0)\, Q(x_0, dx_1)\, g(x_1, y_1) \cdots Q(x_{n-1}, dx_n)\, g(x_n, y_n) $$
is the likelihood of the observations.

13. Joint Smoothing Distribution

By Bayes' rule,
$$ \phi_{\nu,0:n|n}(y_{0:n}, f) = \mathrm{L}_{\nu,n}^{-1}(y_{0:n}) \int \cdots \int f(x_{0:n})\, \nu(dx_0)\, g(x_0, y_0) \prod_{k=1}^{n} Q(x_{k-1}, dx_k)\, g(x_k, y_k) $$
for all functions $f \in \mathcal{F}_b(\mathsf{X}^{n+1})$.

In the following, we always use the implicit conditioning convention, writing
$$ \phi_{\nu,0:n|n}(f) = \mathrm{L}_{\nu,n}^{-1} \int \cdots \int f(x_{0:n})\, \nu(dx_0)\, g_0(x_0) \prod_{k=1}^{n} Q(x_{k-1}, dx_k)\, g_k(x_k), $$
where
$$ \mathrm{L}_{\nu,n} = \int \cdots \int \nu(dx_0)\, g_0(x_0) \prod_{k=1}^{n} Q(x_{k-1}, dx_k)\, g_k(x_k). $$

14. Recursive Smoothing Formula

Comparing the expressions corresponding to $n$ and $n+1$ gives the following update equation for the joint smoothing distribution:
$$ \phi_{\nu,0:n+1|n+1}(f_{n+1}) = \left( \frac{\mathrm{L}_{n+1}}{\mathrm{L}_n} \right)^{-1} \int \cdots \int f_{n+1}(x_{0:n+1})\ \phi_{\nu,0:n|n}(dx_0, \dots, dx_{n-1}, dx_n)\, Q(x_n, dx_{n+1})\, g_{n+1}(x_{n+1}) $$
for functions $f_{n+1} \in \mathcal{F}_b(\mathsf{X}^{n+2})$.

⇒ Very simple structure, but it involves the normalization factor $c_{n+1} \stackrel{\mathrm{def}}{=} \mathrm{L}_{n+1}/\mathrm{L}_n$, which is not computable except in simple cases such as when $\mathsf{X}$ is finite. This claim is not obvious (see next slides...).

15. Filtering Recursion

Marginalizing with respect to all variables but $x_n$ and $x_{n+1}$ gives the (marginal) filtering recursion:
$$ c_{\nu,n+1} = \iint \phi_{\nu,n|n}(dx)\, Q(x, dx')\, g_{n+1}(x'), $$
$$ \phi_{\nu,n+1|n+1}(f) = c_{\nu,n+1}^{-1} \iint f(x')\ \phi_{\nu,n|n}(dx)\, Q(x, dx')\, g_{n+1}(x'), $$
with initial condition
$$ c_{\nu,0} = \nu(g_0), \qquad \phi_{\nu,0|0}(f) = c_{\nu,0}^{-1} \int f(x)\, g_0(x)\, \nu(dx). $$

Remark: when $\mathsf{X}$ is finite (speech recognition, bioinformatics) the above is known as the normalized forward recursion (of forward-backward); the specialization of this relation to Gaussian linear state-space models is known as Kalman filtering.

16. Prediction and Filtering Updates

It is sometimes convenient to break the previous recursion in two steps:
$$ \phi_{\nu,n+1|n} = \phi_{\nu,n|n} Q \quad \text{(prediction)}, $$
$$ c_{\nu,n+1} = \phi_{\nu,n+1|n}(g_{n+1}), \qquad \phi_{\nu,n+1|n+1}(f) = c_{\nu,n+1}^{-1} \int f(x)\, g_{n+1}(x)\ \phi_{\nu,n+1|n}(dx) \quad \text{(filtering)}. $$

Computation of the log-likelihood:
$$ \ell_{\nu,n} \stackrel{\mathrm{def}}{=} \log \mathrm{L}_{\nu,n} = \sum_{k=0}^{n} \log \phi_{\nu,k|k-1}(g_k). $$
This is non-trivial: we have replaced an $(n+1)$-dimensional integral by a product of $n+1$ integrals on $\mathsf{X}$! In finite state space HMMs, the filtering recursion makes it possible to evaluate the (log-)likelihood in $O\{(n+1)\,\mathrm{Card}^2(\mathsf{X})\}$ operations.
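For a finite state space, the prediction/filtering updates and the log-likelihood accumulation can be sketched as follows (a toy implementation; the array layout, with g[k, i] standing for $g_k(i)$, is an assumption for illustration):

```python
import numpy as np

def forward_filter(nu, Q, g):
    """Normalized forward recursion for a finite-state HMM.
    nu: (m,) initial distribution; Q: (m, m) transition matrix;
    g: (n+1, m) conditional likelihoods g[k, i] = g(i, Y_k).
    Returns the filtering probabilities phi[k] = phi_{k|k} and
    the log-likelihood sum_k log c_k."""
    n1, m = g.shape
    phi = np.empty((n1, m))
    loglik = 0.0
    pred = nu                        # predictive distribution, phi_{0|-1} = nu
    for k in range(n1):
        c = pred @ g[k]              # c_k = phi_{k|k-1}(g_k)
        phi[k] = pred * g[k] / c     # filtering update
        loglik += np.log(c)
        pred = phi[k] @ Q            # prediction: phi_{k+1|k} = phi_{k|k} Q
    return phi, loglik
```

Each sweep costs $O(\mathrm{Card}^2(\mathsf{X}))$ per observation because of the matrix-vector product, matching the complexity claim above.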

17. Recap: Filtering and Smoothing

The recursion
$$ \phi_{\nu,n+1|n} = \phi_{\nu,n|n} Q, \qquad c_{\nu,n+1} = \phi_{\nu,n+1|n}(g_{n+1}), \qquad \phi_{\nu,n+1|n+1}(f) = c_{\nu,n+1}^{-1} \int f(x)\, g_{n+1}(x)\ \phi_{\nu,n+1|n}(dx), $$
with $\phi_{\nu,0|-1} \stackrel{\mathrm{def}}{=} \nu$, computes the filtering and predictive distributions recursively, making it possible (i) to compute the likelihood $\mathrm{L}_{\nu,n+1}$ and, potentially, (ii) the joint smoothing distribution, since
$$ \phi_{0:n+1|n+1}(f_{n+1}) = c_{\nu,n+1}^{-1} \int \cdots \int f_{n+1}(x_{0:n+1})\ \phi_{0:n|n}(dx_0, \dots, dx_{n-1}, dx_n)\, Q(x_n, dx_{n+1})\, g_{n+1}(x_{n+1}). $$

18. Appendix: Finite-Dimensional Recursive Smoothing for a Sum

In particular, if $f_n(x_{0:n}) = \sum_{k=0}^{n} s(x_k)$, define the signed measure $\tau_{\nu,n}$ by
$$ \tau_{\nu,n}(f) = \int f(x_n) \left( \sum_{k=0}^{n} s(x_k) \right) \phi_{\nu,0:n|n}(dx_0, \dots, dx_n), $$
such that $\tau_{\nu,n}(\mathsf{X}) = \mathrm{E}_\nu\!\left[ \sum_{k=0}^{n} s(X_k) \,\big|\, Y_{0:n} \right]$. Then
$$ \tau_{\nu,n+1}(f) = c_{\nu,n+1}^{-1} \int \cdots \int f(x_{n+1}) \left( \sum_{k=0}^{n+1} s(x_k) \right) \phi_{\nu,0:n|n}(dx_0, \dots, dx_{n-1}, dx_n)\, Q(x_n, dx_{n+1})\, g_{n+1}(x_{n+1}) $$
$$ = \int f(x_{n+1}) \left[ s(x_{n+1})\ \phi_{\nu,n+1|n+1}(dx_{n+1}) + c_{\nu,n+1}^{-1} \left( \int \tau_{\nu,n}(dx_n)\, Q(x_n, dx_{n+1})\, g_{n+1}(x_{n+1}) \right) \right]. $$

19. Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation

20. Monte Carlo Integration

Objective: given a probability measure $\mu$, how to evaluate numerically $\mu(f) = \int_{\mathsf{X}} \mu(dx) f(x)$ for arbitrary $\mu$-integrable functions $f$?

The Monte Carlo answer:
1. Draw an independent sample $\xi^1, \dots, \xi^N$ from the probability measure $\mu$.
2. Compute the sample average $N^{-1} \sum_{i=1}^{N} f(\xi^i)$.

This technique is applicable only when direct sampling from the distribution $\mu$ is feasible.
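A concrete sketch of the two steps, using for illustration $\mu = \mathrm{N}(0,1)$ and $f(x) = x^2$, for which $\mu(f) = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
# Step 1: draw an independent sample xi^1, ..., xi^N from mu = N(0, 1).
xi = rng.standard_normal(100_000)
# Step 2: the sample average of f(xi^i) approximates mu(f) = E[X^2] = 1.
mc_estimate = np.mean(xi ** 2)
```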

21. (Unnormalized) Importance Sampling: General Principle

It is also possible to sample from an instrumental (or importance) distribution $\nu$, applying a change-of-measure formula to account for the fact that the instrumental distribution differs from the target distribution $\mu$.

More formally, assume that the target probability measure $\mu$ is absolutely continuous with respect to the instrumental probability measure $\nu$, $\mu \ll \nu$. For any $\mu$-integrable function $f$,
$$ \mu(f) = \int f(x)\, \mu(dx) = \int f(x)\, \frac{d\mu}{d\nu}(x)\, \nu(dx), $$
where $\frac{d\mu}{d\nu}$ is the Radon-Nikodym derivative of $\mu$ with respect to $\nu$, called the importance function (or importance ratio) in the context of importance sampling.

22. (Unnormalized) Importance Sampling: the Algorithm

Sampling: draw an independent sample $\xi^1, \dots, \xi^N$ from the distribution $\nu$.

Weighting: compute the importance weights $\omega^i = \frac{d\mu}{d\nu}(\xi^i)$ for $i = 1, \dots, N$.

Weighted Monte Carlo approximation:
$$ \hat{\mu}^{\mathrm{IS}}_{\nu,N}(f) = N^{-1} \sum_{i=1}^{N} \omega^i f(\xi^i). $$
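A minimal sketch of the algorithm, taking for illustration the target $\mu = \mathrm{N}(2, 1)$, the instrumental $\nu = \mathrm{N}(0, 3^2)$ (wide enough to cover the target), and $f(x) = x$, so that $\mu(f) = 2$:

```python
import numpy as np

def norm_pdf(x, m, s):
    """Density of N(m, s^2)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
N = 200_000
xi = rng.normal(0.0, 3.0, N)                          # sampling: xi^i ~ nu
w = norm_pdf(xi, 2.0, 1.0) / norm_pdf(xi, 0.0, 3.0)   # weighting: dmu/dnu(xi^i)
is_estimate = np.mean(w * xi)                         # weighted MC approximation
```

Since $\nu(d\mu/d\nu) = 1$, the average of the raw weights should itself be close to one, which is a useful sanity check.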

23. (Unnormalized) Importance Sampling: Large Sample Performance

Strong law of large numbers: the sequence $\hat{\mu}^{\mathrm{IS}}_{\nu,N}(f)$ converges to $\mu(f)$, almost surely, as $N \to \infty$.

Central limit theorem: if $f$ is a real-valued measurable function satisfying
$$ \nu\!\left( (1 + f^2) \left( \frac{d\mu}{d\nu} \right)^2 \right) = \mu\!\left( (1 + f^2)\, \frac{d\mu}{d\nu} \right) < \infty, $$
then $\hat{\mu}^{\mathrm{IS}}_{\nu,N}(f)$ is asymptotically normal:
$$ \sqrt{N} \left( \hat{\mu}^{\mathrm{IS}}_{\nu,N}(f) - \mu(f) \right) \xrightarrow{\mathcal{D}} \mathrm{N}\!\left( 0,\ \mathrm{Var}_\nu\!\left( \frac{d\mu}{d\nu} f \right) \right), \quad \text{where} \quad \mathrm{Var}_\nu\!\left( \frac{d\mu}{d\nu} f \right) = \nu\!\left( \left\{ f\, \frac{d\mu}{d\nu} \right\}^2 \right) - \mu^2(f). $$

Deviations inequalities (exponential, $L^p$) or more sophisticated empirical process results are also available.

⇒ Choosing $\nu$ such that $d\mu/d\nu$ stays as small as possible is very important in practice.

24. Importance Sampling

In situations where $\frac{d\mu}{d\nu}$ is known only up to a scaling factor, we can still use the importance sampling estimator, just changing the normalization factor:
$$ \tilde{\mu}^{\mathrm{IS}}_{\nu,N}(f) = \frac{ \sum_{i=1}^{N} f(\xi^i)\, \frac{d\mu}{d\nu}(\xi^i) }{ \sum_{i=1}^{N} \frac{d\mu}{d\nu}(\xi^i) }. $$
The (self-normalized) importance sampling estimator (sometimes also called Bayesian sampling estimator) is defined as a ratio of unnormalized importance sampling estimators:
$$ \tilde{\mu}^{\mathrm{IS}}_{\nu,N}(f) = \frac{ \hat{\mu}^{\mathrm{IS}}_{\nu,N}(f) }{ \hat{\mu}^{\mathrm{IS}}_{\nu,N}(1) }. $$
By the strong law of large numbers,
$$ \hat{\mu}^{\mathrm{IS}}_{\nu,N}(f) \xrightarrow{\text{a.s.}} \mu(f), \qquad \hat{\mu}^{\mathrm{IS}}_{\nu,N}(1) \xrightarrow{\text{a.s.}} 1, $$
showing that $\tilde{\mu}^{\mathrm{IS}}_{\nu,N}(f)$ is a strongly consistent estimator of $\mu(f)$.

25. Importance Sampling (contd.)

Assuming in addition that $f$ is real-valued and satisfies
$$ \nu\!\left( (1 + f^2) \left( \frac{d\mu}{d\nu} \right)^2 \right) = \mu\!\left( (1 + f^2)\, \frac{d\mu}{d\nu} \right) < \infty, $$
$$ \sqrt{N} \left( \tilde{\mu}^{\mathrm{IS}}_{\nu,N}(f) - \mu(f) \right) \xrightarrow{\mathcal{D}} \mathrm{N}\!\left( 0, \sigma^2(\nu, f) \right), $$
$$ \sigma^2(\nu, f) = \mathrm{Var}_\nu\!\left( \frac{d\mu}{d\nu} \{ f - \mu(f) \} \right) = \nu\!\left( (f - \mu(f))^2 \left( \frac{d\mu}{d\nu} \right)^2 \right). $$

The estimator is errorless for constant functions, and its performance is clearly dependent on the fact that $d\mu/d\nu$ stays small.

26. Sampling Importance Resampling (SIR)

While importance sampling was originally designed to overcome difficulties with direct sampling from $\mu$ when approximating integrals like $\mu(f)$, it can also be used for approximate sampling from the distribution $\mu$. The sampling importance resampling (SIR) method is a two-stage method:

Sampling: draw an i.i.d. sample $\tilde{\xi}^1, \dots, \tilde{\xi}^M$ from the instrumental distribution $\nu$.

Weighting: compute the (normalized) importance weights
$$ \omega^i = \frac{d\mu}{d\nu}(\tilde{\xi}^i) \Big/ \sum_{j=1}^{M} \frac{d\mu}{d\nu}(\tilde{\xi}^j) \quad \text{for } i = 1, \dots, M. $$

Resampling: draw, conditionally independently given $(\tilde{\xi}^1, \dots, \tilde{\xi}^M)$, $N$ discrete random variables $(I^1, \dots, I^N)$ taking values in the set $\{1, \dots, M\}$ with probabilities $(\omega^1, \dots, \omega^M)$. Set $\xi^i = \tilde{\xi}^{I^i}$ for $i = 1, \dots, N$.

The set $(I^1, \dots, I^N)$ is thus a multinomial trial process. This resampling method is known as multinomial resampling.
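A sketch of the procedure with multinomial resampling, reusing the illustrative choice $\mu = \mathrm{N}(2, 1)$ and $\nu = \mathrm{N}(0, 3^2)$ as target and instrumental distributions:

```python
import numpy as np

def norm_pdf(x, m, s):
    """Density of N(m, s^2)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
M, N = 100_000, 50_000
xi_tilde = rng.normal(0.0, 3.0, M)        # sampling stage: first-stage sample ~ nu
w = norm_pdf(xi_tilde, 2.0, 1.0) / norm_pdf(xi_tilde, 0.0, 3.0)
w /= w.sum()                              # weighting stage: normalized weights
idx = rng.choice(M, size=N, p=w)          # multinomial resampling of the indices
xi = xi_tilde[idx]                        # approximate sample from mu = N(2, 1)
```

The resampled points should then have approximately the target's mean and standard deviation.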

27. Sampling Importance Resampling (contd.)

The first-stage sample $\tilde{\xi}^1, \dots, \tilde{\xi}^M$ is really distributed under $\nu$. In the resampling operation, the bad points, as measured by $d\mu/d\nu$, are discarded, whereas the good points are selected (and perhaps duplicated) with high probability.

[Figure: two panels labelled TARGET, illustrating the first-stage sample under $\nu$ and the resampled points concentrating on the target.]

28. SIR: Large Sample Behavior

It is not obvious in which sense $(\xi^1, \dots, \xi^N)$ is (approximately) a sample from the target distribution $\mu$. Rewriting
$$ \hat{\mu}^{\mathrm{SIR}}_{\nu,M,N}(f) = N^{-1} \sum_{i=1}^{N} f(\xi^i) = \sum_{i=1}^{M} \frac{N^i}{N}\, f(\tilde{\xi}^i), $$
it is easily seen that the sample mean $\hat{\mu}^{\mathrm{SIR}}_{\nu,M,N}(f)$ of the SIR sample has, conditionally on the first-stage sample $(\tilde{\xi}^1, \dots, \tilde{\xi}^M)$, the same expectation as the importance sampling estimator $\tilde{\mu}^{\mathrm{IS}}_{\nu,M}(f)$:
$$ \mathrm{E}\!\left[ \hat{\mu}^{\mathrm{SIR}}_{\nu,M,N}(f) \,\big|\, \tilde{\xi}^1, \dots, \tilde{\xi}^M \right] = \tilde{\mu}^{\mathrm{IS}}_{\nu,M}(f). $$
As a consequence, the SIR estimator $\hat{\mu}^{\mathrm{SIR}}_{\nu,M,N}(f)$ has the same expectation as the importance sampling estimator, but its mean squared error is always larger, due to the well-known variance decomposition
$$ \mathrm{E}\!\left[ \left( \hat{\mu}^{\mathrm{SIR}}_{\nu,M,N}(f) - \mu(f) \right)^2 \right] = \mathrm{E}\!\left[ \left( \hat{\mu}^{\mathrm{SIR}}_{\nu,M,N}(f) - \tilde{\mu}^{\mathrm{IS}}_{\nu,M}(f) \right)^2 \right] + \mathrm{E}\!\left[ \left( \tilde{\mu}^{\mathrm{IS}}_{\nu,M}(f) - \mu(f) \right)^2 \right]. $$

29. SIR: Large Sample Behavior (contd.)

Going beyond this elementary result is not trivial, because the second-stage sample $\xi^1, \dots, \xi^N$ is no longer i.i.d. after resampling, due to the normalization of the importance weights.

Theorem. Assume that $\mu \ll \nu$. Let $\{\tilde{\xi}^i\}_{1 \le i \le M}$ be i.i.d. random variables with distribution $\nu$. Then $\hat{\mu}^{\mathrm{SIR}}_{\nu,M,N}(f)$ is a (weakly) consistent estimate of $\mu(f)$ for $\mu$-integrable functions $f$ as $M, N \to \infty$.

Assume in addition that $\lim_{M,N\to\infty} M/N = \alpha$ for some $\alpha \ge 1$ and that $d\mu/d\nu$ and $f\, d\mu/d\nu$ are in $L^2(\mathsf{X}, \nu)$. Then $\hat{\mu}^{\mathrm{SIR}}_{\nu,M,N}(f)$ is asymptotically normal:
$$ \sqrt{N} \left( \hat{\mu}^{\mathrm{SIR}}_{\nu,M,N}(f) - \mu(f) \right) \xrightarrow{\mathcal{D}} \mathrm{N}\!\left( 0, \sigma^2(f) \right) $$
with
$$ \sigma^2(f) = \underbrace{\mathrm{Var}_\mu(f)}_{\text{variance of resampling}} + \underbrace{\alpha^{-1}\, \mathrm{Var}_\nu\!\left( \frac{d\mu}{d\nu} \{ f - \mu(f) \} \right)}_{\text{variance of IS}}. $$

Analysis of the opposite case ($\alpha < 1$) is possible but less interesting in practice.

30. Alternative Resampling Schemes

There are other resampling schemes that guarantee that $\mathrm{E}[N^i \mid \tilde{\xi}^1, \dots, \tilde{\xi}^M] = N \omega^i$ for $i = 1, \dots, M$ and that have lower conditional variance.

[Figure: principle of stratified sampling (left) and systematic sampling (right), illustrated on the cumulated weights $\omega^1$, $\omega^1 + \omega^2$, $\omega^1 + \omega^2 + \omega^3$, ...]

Note: the latter does not always reduce the conditional variance. Studying their large sample behavior is harder, however.
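Both schemes can be sketched via an inverse-CDF lookup on the cumulative weights (a toy implementation, with hypothetical function names):

```python
import numpy as np

def stratified_resample(w, N, rng):
    """Stratified resampling: one uniform draw in each stratum
    [i/N, (i+1)/N), mapped through the cumulative weight function."""
    u = (np.arange(N) + rng.uniform(size=N)) / N
    return np.searchsorted(np.cumsum(w), u)

def systematic_resample(w, N, rng):
    """Systematic resampling: a single uniform offset shared by all
    N evenly spaced points."""
    u = (np.arange(N) + rng.uniform()) / N
    return np.searchsorted(np.cumsum(w), u)
```

Both satisfy $\mathrm{E}[N^i \mid \text{weights}] = N\omega^i$; systematic resampling additionally forces each count $N^i$ to lie within one unit of $N\omega^i$.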

31. Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation

32. Sequential Importance Sampling

The principle of sequential Monte Carlo methods is to use Monte Carlo integration to approximate the filtering recursion in general HMMs (not finite HMMs or GLSSMs). The key remark, which can be traced back to (Handschin & Mayne, 1969) and (Handschin, 1970), is that the importance sampling method targeting the joint smoothing distribution $\phi_{0:n|n}$ can be implemented sequentially, due to the particular structure of $\phi_{0:n|n}$. The corresponding algorithm is known as sequential importance sampling (SIS). The SIS algorithm does reasonably well but is bound to become unreliable for larger values of $n$ (this limitation will be taken care of later...).

33. HMM Notations (Repeated)

Recall that a hidden Markov model is such that
$$ X_{k+1} \sim Q(X_k, \cdot) \quad \text{(state equation)}, \qquad Y_k \sim G(X_k, \cdot) \quad \text{(measurement equation)}, $$
where $\{X_k\}_{k\ge 0}$ is a Markov chain with transition kernel $Q$ and initial distribution $\nu$, and $G$ is a transition kernel from $(\mathsf{X}, \mathcal{X})$ to $(\mathsf{Y}, \mathcal{Y})$ such that there exists a measure $\mu$ with, for all $x \in \mathsf{X}$ and $A \in \mathcal{Y}$, $G(x, A) = \int_A g(x, y)\, \mu(dy)$.

To simplify the mathematical expressions, we use the notation $g_k$ to denote the function $g(\cdot, Y_k)$, considered as a function of its first argument.

34. Smoothing (Repeated)

The posterior distribution $\phi_{0:n|n}$ of the states $X_{0:n}$ given the observations $Y_{0:n}$ may be computed recursively (in $n$) according to
$$ \phi_{0|0}(f) = \frac{ \int g_0(x_0)\, \nu(dx_0)\, f(x_0) }{ \int g_0(x_0)\, \nu(dx_0) }, \qquad \phi_{0:n+1|n+1}(f_{n+1}) = \int f_{n+1}(x_{0:n+1})\ \phi_{0:n|n}(dx_{0:n})\, T^u_n(x_n, dx_{n+1}), $$
where, for $k \ge 0$, $T^u_k$ is the unnormalized transition kernel on $(\mathsf{X}, \mathcal{X})$ given by
$$ T^u_k(x, A) = \left( \frac{\mathrm{L}_{k+1}}{\mathrm{L}_k} \right)^{-1} \int_A Q(x, dx')\, g_{k+1}(x'), \quad x \in \mathsf{X},\ A \in \mathcal{X}. $$

In this part we omit to indicate the dependence with respect to $\nu$, which is not essential.

35. Choice of the Instrumental Distribution

Key remark: both the simulation from the instrumental distribution and the computation of the importance weights can be carried out sequentially if a, possibly non-homogeneous, Markov chain is used as instrumental distribution.

More precisely, let $\{R_k\}_{k\ge 0}$ denote a family of Markov transition kernels on $(\mathsf{X}, \mathcal{X})$ and $\rho_0$ a probability measure on $(\mathsf{X}, \mathcal{X})$. Assume that $\phi_{0|0} \ll \rho_0$ and, for all $k \ge 0$ and all $x \in \mathsf{X}$, $T^u_k(x, \cdot) \ll R_k(x, \cdot)$. The inhomogeneous Markov chain with initial distribution $\rho_0$ and transition kernels $\{R_k\}_{k\ge 0}$ defines the following distributions:
$$ \rho_{0:k}(f_k) = \int f_k(x_{0:k})\ \rho_0(dx_0) \prod_{l=0}^{k-1} R_l(x_l, dx_{l+1}). $$

36. Sequential Computation of the Importance Function

The importance function is then defined as
$$ \frac{d\phi_{0:n|n}}{d\rho_{0:n}}(x_{0:n}) = \frac{d\phi_{0|0}}{d\rho_0}(x_0) \prod_{k=0}^{n-1} \frac{dT^u_k(x_k, \cdot)}{dR_k(x_k, \cdot)}(x_{k+1}), $$
which can be computed sequentially, in the sense that
$$ \frac{d\phi_{0:k+1|k+1}}{d\rho_{0:k+1}}(x_{0:k+1}) = \frac{d\phi_{0:k|k}}{d\rho_{0:k}}(x_{0:k})\ \frac{dT^u_k(x_k, \cdot)}{dR_k(x_k, \cdot)}(x_{k+1}) $$
for $k \ge 0$.

37. Sequential Importance Sampling Algorithm

Initialization: draw $\xi^1_0, \dots, \xi^N_0$ independently from $\rho_0$ and compute the weights
$$ \omega^i_0 = \frac{d\phi_{0|0}}{d\rho_0}(\xi^i_0), \quad i = 1, \dots, N. $$

Recursion: for $k = 0, 1, \dots$ and for $i = 1, \dots, N$:
- Draw $\xi^i_{k+1}$, conditionally independently of $\{\xi^j_l, \xi^m_{k+1}\}_{l \le k,\ 1 \le j \le N,\ m < i}$, under the distribution $R_k(\xi^i_k, \cdot)$.
- Update the importance weight according to
$$ \omega^i_{k+1} = \omega^i_k\, \frac{dT^u_k(\xi^i_k, \cdot)}{dR_k(\xi^i_k, \cdot)}(\xi^i_{k+1}). $$

The ratio $\omega^i_{k+1}/\omega^i_k$ is often referred to as the incremental weight; the points $\xi^i_k$ are called particles; the trajectories $\xi^i_{0:k}$, path particles.

38. [Figure: one step of the SIS algorithm with just seven particles, showing the current filtering approximation (FILT.), the instrumental distribution (INSTR.), and the updated filtering approximation (FILT. +1).]

39. Sequential Importance Sampling Approximation

At any time index $n$, the sequential importance sampling estimator of $\phi_{0:n|n}(f_n)$ is available as
$$ \hat{\phi}^{\mathrm{IS}}_{0:n|n}(f_n) = \frac{ \sum_{i=1}^{N} f_n(\xi^i_{0:n})\, \omega^i_n }{ \sum_{i=1}^{N} \omega^i_n }. $$

Remark: if we are just interested in functions $f_n(x_{0:n}) = f(x_n)$, storing the full trajectories of the particles is not required; each step of the algorithm involves $O(N)$ operations and requires just that $N + N \dim(\mathsf{X})$ real numbers be stored. Likewise, for functions of the form $f_n(x_{0:n}) = f_k(x_{n-k:n})$, only the last $k+1$ elements of each path particle $\xi^i_{0:n}$ need to be stored. We will see later that one may indeed consider more general functions $f_n$, as long as they have a specific structure...

40. Choosing the Importance Kernel: (1) the Prior Kernel

As for non-sequential importance sampling, the performance of SIS depends crucially on the choice of the importance kernel $R_k$ (and, to a lesser extent, on that of $\rho_0$). The most obvious solution is to use the prior kernel $R_k = Q$:

- The instrumental kernel at each iteration mimics the state dynamic, which is usually simple to sample from.
- The incremental weight
$$ \frac{dT^u_k(x, \cdot)}{dR_k(x, \cdot)}(x') = \frac{\mathrm{L}_k}{\mathrm{L}_{k+1}}\, g_{k+1}(x'), \quad (x, x') \in \mathsf{X} \times \mathsf{X}, $$
does not depend on $x \in \mathsf{X}$; hence computing the incremental weight simply amounts to evaluating the conditional likelihood function at the new particle positions.
- Recall that the importance weights need to be evaluated up to a constant only; hence the non-computable factor $\mathrm{L}_k/\mathrm{L}_{k+1}$ may be omitted.
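SIS with the prior kernel can be sketched on an illustrative linear Gaussian model $X_{k+1} = a X_k + \sigma U_k$, $Y_k = X_k + \tau V_k$ (the model, function name, and parameter values are assumptions for illustration, not from the slides):

```python
import numpy as np

def sis_prior(y, a, sigma, tau, N, rng):
    """SIS with R_k = Q: propagate particles under the state dynamics and
    multiply the weights by the conditional likelihood g_{k+1} (up to a
    constant, since the factor L_k / L_{k+1} may be omitted)."""
    xi = rng.normal(0.0, sigma / np.sqrt(1.0 - a ** 2), N)  # xi_0 ~ stationary nu
    w = np.exp(-0.5 * ((y[0] - xi) / tau) ** 2)             # omega_0 propto g_0(xi_0)
    for yk in y[1:]:
        xi = a * xi + sigma * rng.standard_normal(N)        # xi_{k+1} ~ Q(xi_k, .)
        w *= np.exp(-0.5 * ((yk - xi) / tau) ** 2)          # incremental weight g_{k+1}
    return xi, w / w.sum()                                  # self-normalized weights
```

The filtering estimate of $\mathrm{E}[X_n \mid Y_{0:n}]$ is then the weighted particle mean `np.sum(w * xi)`.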

41. Lack of Robustness of the Prior Kernel

The prior kernel is a reasonable option, which is computationally very simple and thus often hard to beat, especially in models where the state is not precisely identified by the observations. It is, however, very sensitive to the presence of outliers:

[Figure: panels FILT. k, FILT. k+1, FILT. k+2, illustrating a conflict between the prior and the posterior: at time k+1, the observation does not agree with the particle approximation of the predictive distribution; after the reweighting step, the mass becomes concentrated on a single particle.]

Due to the multiplicative structure of the importance weights, recovering from this situation is almost always impossible.

42. Choosing the Importance Kernel: (2) the Optimal Kernel

To circumvent the problem, one needs to incorporate information both on the state dynamic and on the new observation. Among all possible options, there is only one kernel such that the new weight $\omega^i_{k+1}$ is a deterministic function of the current particle $\xi^i_k$; this is the only choice for which the conditional variance of the new weights is equal to zero. Let $R_k = T_k$, where
$$ T_k(x, f) \stackrel{\mathrm{def}}{=} \gamma_k^{-1}(x) \int f(x')\, Q(x, dx')\, g_{k+1}(x'), \qquad \gamma_k(x) \stackrel{\mathrm{def}}{=} \int_{\mathsf{X}} Q(x, dx')\, g_{k+1}(x'). $$
Then
$$ \frac{dT^u_k(x, \cdot)}{dT_k(x, \cdot)}(x') = \frac{\mathrm{L}_k}{\mathrm{L}_{k+1}}\, \gamma_k(x), \quad (x, x') \in \mathsf{X} \times \mathsf{X}. $$

Unfortunately, computing $\gamma_k$ is usually not feasible in models where implementing the filtering recursion is problematic!

43. The Optimal Kernel is More Robust to Outliers

[Figure: panels FILT. k, FILT. k+1, FILT. k+2 for the prior kernel (top) and for the optimal kernel (bottom).]

The optimal kernel proposes particles in the regions where the filtering density has most of its mass.

44. Local Approximation of the Optimal Importance Kernel

The aim is to find a distribution which resembles the optimal kernel but for which the incremental weight is computable. Ideally, this distribution should be overdispersed (recall the $d\mu/d\nu$ factor!) but not wildly inaccurate. We can find such a distribution in two steps:

1. locate the high-density region of the (multivariate) optimal distribution, to ensure that our proposal does not entirely miss important regions;
2. create an overdispersed approximation, so that the instrumental distribution dominates the optimal importance distribution.

Of course, because we have to repeat the process for each particle, the overall procedure should be reasonably simple.

45. Application to the Stochastic Volatility Model

Consider the (discrete-time) stochastic volatility model
$$ X_{k+1} = \phi X_k + \sigma U_k, \quad |\phi| < 1, \qquad Y_k = \beta \exp(X_k/2)\, V_k, $$
where
1. $\{U_k\}_{k\ge 0}$ and $\{V_k\}_{k\ge 0}$ are independent standard Gaussian white noise processes;
2. $X_0 \sim \mathrm{N}(0, \sigma^2/(1 - \phi^2))$.

In this model,
$$ q(x, x') = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x' - \phi x)^2}{2\sigma^2} \right), \qquad g_{k+1}(x') = \frac{1}{\sqrt{2\pi\beta^2}} \exp\!\left( -\frac{Y_{k+1}^2}{2\beta^2} \exp(-x') - \frac{x'}{2} \right), $$
and the incremental weight $\gamma_k(x)$ is not available in closed form.

46. Application to the Stochastic Volatility Model (contd.)

The function $x' \mapsto q(x, x')\, g_{k+1}(x')$ is (strictly) log-concave and thus unimodal. The mode $m_k(x)$ of the optimal transition density is the unique solution of the non-linear equation
$$ -\sigma^{-2}(x' - \phi x) + \frac{Y_{k+1}^2}{2\beta^2} \exp(-x') - \frac{1}{2} = 0. $$
The solution of this equation can be computed numerically. We use, for instance, as instrumental kernel a t-distribution with $\eta = 5$ degrees of freedom, the scale of which is set as the inverse of the negated second-order derivative of $x' \mapsto \log q(x, x')\, g_{k+1}(x')$ evaluated at the mode $m_k(x)$, which is given by
$$ \sigma^2_k(x) = \left( \sigma^{-2} + \frac{Y_{k+1}^2}{2\beta^2} \exp[-m_k(x)] \right)^{-1}. $$
The incremental weight may easily be evaluated once $m_k(x)$ and $\sigma^2_k(x)$ have been computed (note that it now depends both on $x$ and $x'$). Recall also that we need to repeat these steps independently for each current particle position $x = \xi^i_k$.
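The mode-finding step can be sketched with a few Newton iterations on the strictly concave log-density (illustrative parameter values; the function name is hypothetical):

```python
import numpy as np

def optimal_mode(x, y_next, phi, sigma, beta, iters=25):
    """Newton iterations for the mode m_k(x) of x' -> log(q(x, x') g_{k+1}(x'))
    in the stochastic volatility model, i.e. the root of
    -(x' - phi*x)/sigma^2 + (y^2 / (2 beta^2)) exp(-x') - 1/2 = 0.
    Also returns sigma_k^2(x), the inverse negated curvature at the mode."""
    c = y_next ** 2 / (2.0 * beta ** 2)
    m = phi * x                                       # start from the prior mean
    for _ in range(iters):
        grad = -(m - phi * x) / sigma ** 2 + c * np.exp(-m) - 0.5
        hess = -1.0 / sigma ** 2 - c * np.exp(-m)     # always < 0: strict concavity
        m -= grad / hess                              # Newton step
    return m, -1.0 / hess
```

Because the gradient is convex and strictly decreasing in $x'$, the Newton iterates converge rapidly to the unique root from the prior mean.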

47. Application to the Stochastic Volatility Model (contd.)

[Figure: waterfall representation of the sequence of estimated filtering distributions together with the actual state (1,000 particles); axes: time index, state, density.]

48. Weight Degeneracy

The normalized importance weights measure the pertinence of each particle: a relatively small importance weight implies that the associated particle is far from the main body of the posterior distribution and contributes poorly to the sequential importance sampling approximation. If there are too many such ineffective particles, the Monte Carlo approximation becomes highly unreliable.

49. Weight Degeneracy (contd.)

Empirically, this phenomenon always happens when $n$ gets larger ($N$ being fixed). In simplistic models, it is possible to show that the asymptotic variance of the approximation $\hat{\phi}^{\mathrm{IS}}_n(f)$ increases exponentially as $n$ increases (see text).

50. Application to the Stochastic Volatility Model (contd.)

[Figure: histograms of the base 10 logarithm of the normalized importance weights after 1, 10, and 100 iterations (from top to bottom) for the stochastic volatility model.]

51. Numerical Indicator: (1) Coefficient of Variation

A simple criterion is the coefficient of variation of the normalized weights,
$$ \mathrm{CV}_N(\omega) = \left[ \frac{1}{N} \sum_{i=1}^{N} \left( \frac{N \omega^i}{\sum_{j=1}^{N} \omega^j} - 1 \right)^2 \right]^{1/2}, \quad \omega = (\omega^1, \dots, \omega^N) \in (\mathbb{R}^+)^N. $$
When the weights are all equal to $1/N$, the coefficient of variation is equal to 0. At the other extreme, when one normalized weight is equal to 1 and all the others to 0, the coefficient of variation equals $\sqrt{N-1}$. Therefore, a large $\mathrm{CV}_N(\omega_k)$ indicates that there are many ineffective particles and that memory and computation are being wasted.

52. Numerical Indicator: (2) Entropy

Another possible measure of the weight imbalance is the Shannon entropy of the importance weights, defined as
$$ \mathrm{Ent}(\omega) = - \sum_{i=1}^{N} \frac{\omega^i}{\sum_{j=1}^{N} \omega^j}\ \log_2\!\left( \frac{\omega^i}{\sum_{j=1}^{N} \omega^j} \right), \quad \omega = (\omega^1, \dots, \omega^N) \in (\mathbb{R}^+)^N. $$
When all the importance weights are 0 except one, the entropy is null. On the contrary, if all the weights are equal to $1/N$, the entropy is maximal and equal to $\log_2(N)$.
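Both indicators can be sketched as follows (hypothetical function names):

```python
import numpy as np

def coeff_variation(w):
    """Coefficient of variation of the normalized weights: 0 for uniform
    weights, sqrt(N - 1) when a single weight carries all the mass."""
    wn = np.asarray(w, dtype=float) / np.sum(w)
    return np.sqrt(np.mean((wn.size * wn - 1.0) ** 2))

def weight_entropy(w):
    """Shannon entropy (base 2) of the normalized weights: log2(N) for
    uniform weights, 0 for a degenerate weight vector."""
    wn = np.asarray(w, dtype=float) / np.sum(w)
    wn = wn[wn > 0.0]                  # convention: 0 * log 0 = 0
    return -np.sum(wn * np.log2(wn))
```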

53. Application to the Stochastic Volatility Model (contd.)

[Figure: left, coefficient of variation of the weights; right, weight entropy, both as a function of $n$.]

54. Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation

55. Resampling

The solution, proposed by (Gordon, Salmond & Smith, 1993), to avoid the degeneracy of the importance weights is to regularly resample the particles according to their importance weights (thus equating all importance weights).

56-57. Resampling (contd.)

The basic idea of resampling is to (i) eliminate particles which have small importance weights and (ii) replicate particles which have large importance weights, in proportion to their relevance. Resampling concentrates the particles in regions of the state space which are pertinent and avoids the exploration of highly improbable areas.

58 Particle Methods for Hidden Markov Models - EPFL, 7 Dec Resampling This idea is clearly rooted in the sampling importance resampling (SIR) technique.

59 Particle Methods for Hidden Markov Models - EPFL, 7 Dec Resampling This idea is clearly rooted in the sampling importance resampling (SIR) technique. However, contrary to standard (non-sequential) SIR, the main aim of the resampling step is not to draw (asymptotically correctly) an i.i.d. sample from a distribution but rather to avoid weight degeneracy.

60 Particle Methods for Hidden Markov Models - EPFL, 7 Dec Resampling This idea is clearly rooted in the sampling importance resampling (SIR) technique. However, contrary to standard (non-sequential) SIR, the main aim of the resampling step is not to draw (asymptotically correctly) an i.i.d. sample from a distribution but rather to avoid weight degeneracy. The resampling step, while useful in fighting degeneracy, has a drawback: resampling introduces unnecessary noise into the algorithm, and this extra noise might be far from negligible.

61 Particle Methods for Hidden Markov Models - EPFL, 7 Dec Resampling This idea is clearly rooted in the sampling importance resampling (SIR) technique. However, contrary to standard (non-sequential) SIR, the main aim of the resampling step is not to draw (asymptotically correctly) an i.i.d. sample from a distribution but rather to avoid weight degeneracy. The resampling step, while useful in fighting degeneracy, has a drawback: resampling introduces unnecessary noise into the algorithm, and this extra noise might be far from negligible. Intuitively, when the importance weights are nearly constant, resampling only reduce the number of distinct particles thus introducing an extra noise without much benefit on the weight degeneracy.

62 Particle Methods for Hidden Markov Models - EPFL, 7 Dec Resampling This idea is clearly rooted in the sampling importance resampling (SIR) technique. However, contrary to standard (non-sequential) SIR, the main aim of the resampling step is not to draw (asymptotically correctly) an i.i.d. sample from a distribution but rather to avoid weight degeneracy. The resampling step, while useful in fighting degeneracy, has a drawback: resampling introduces unnecessary noise into the algorithm, and this extra noise might be far from negligible. Intuitively, when the importance weights are nearly constant, resampling only reduce the number of distinct particles thus introducing an extra noise without much benefit on the weight degeneracy. The one-step effect of resampling is thus negative but, on the long-term, resampling is required to guarantee a correct behavior of the algorithm.
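As an illustration (not part of the lectures), a minimal multinomial resampling step can be sketched as follows; the function name and the toy setup are ours.

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Eliminate low-weight particles and replicate high-weight ones by
    drawing N indices with probabilities proportional to the weights."""
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()            # normalized importance weights
    n = len(particles)
    idx = rng.choice(n, size=n, p=probs)       # multinomial trial
    # After resampling, all importance weights are reset to a constant.
    return np.asarray(particles)[idx], np.full(n, 1.0 / n)

rng = np.random.default_rng(0)
parts = np.array([-1.0, 0.0, 2.0, 5.0])
w = np.array([0.01, 0.01, 0.01, 0.97])         # one particle dominates
new_parts, new_w = multinomial_resample(parts, w, rng)
```

With weights as uneven as these, the dominant particle typically gets replicated several times while the others tend to disappear, which is exactly the eliminate/replicate behavior described above.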

63 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Sequential Importance Sampling with Resampling (SISR)

For time indices $k \ge 0$, do the following.

Sampling: Draw $(\tilde\xi_{k+1}^1, \ldots, \tilde\xi_{k+1}^N)$ conditionally independently given $\{\xi_{0:k}^j,\ j = 1, \ldots, N\}$ from the instrumental kernel: $\tilde\xi_{k+1}^i \sim R_k(\xi_k^i, \cdot)$, $i = 1, \ldots, N$. Compute the updated importance weights
$$\tilde\omega_{k+1}^i = \omega_k^i \, g_{k+1}(\tilde\xi_{k+1}^i) \, \frac{dQ(\xi_k^i, \cdot)}{dR_k(\xi_k^i, \cdot)}(\tilde\xi_{k+1}^i), \quad i = 1, \ldots, N.$$

Resampling (optional): Draw, conditionally independently given $\{(\xi_{0:k}^i, \tilde\xi_{k+1}^j),\ i, j = 1, \ldots, N\}$, the multinomial trial $(I_{k+1}^1, \ldots, I_{k+1}^N)$ with probabilities of success $\tilde\omega_{k+1}^1 / \sum_j \tilde\omega_{k+1}^j, \ldots, \tilde\omega_{k+1}^N / \sum_j \tilde\omega_{k+1}^j$. Reset the importance weights $\omega_{k+1}^i$ to a constant value for $i = 1, \ldots, N$.

64 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 SISR (contd.)

If resampling is not applied, set $I_{k+1}^i = i$ for $i = 1, \ldots, N$.

Trajectory update: for $i = 1, \ldots, N$, $\xi_{0:k+1}^i = (\xi_{0:k}^{I_{k+1}^i}, \tilde\xi_{k+1}^{I_{k+1}^i})$. Recall that storing the full particle path is usually not needed.

The SISR algorithm with systematic resampling and $R_k = Q$ (the prior kernel) is known as the bootstrap filter.
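One SISR update can be sketched in a few lines (our own code, not the book's): the caller supplies the instrumental sampler and the incremental-weight function $g_{k+1}\,dQ/dR_k$, so the same function covers both the bootstrap choice $R_k = Q$ and more general kernels.

```python
import numpy as np

def sisr_step(xi, w, sample_R, incr_weight, rng, resample=True):
    """One SISR update: mutate through the instrumental kernel R_k,
    multiply the weights by g_{k+1} * dQ/dR_k, then optionally resample."""
    n = len(xi)
    xi_new = sample_R(xi, rng)                 # draw from R_k(xi_k^i, .)
    w_new = w * incr_weight(xi, xi_new)        # updated importance weights
    if resample:
        probs = w_new / w_new.sum()
        idx = rng.choice(n, size=n, p=probs)   # multinomial selection
        xi_new, w_new = xi_new[idx], np.full(n, 1.0 / n)
    return xi_new, w_new

# Toy usage: random-walk prior kernel, Gaussian likelihood around y = 0.5,
# and R_k = Q so that dQ/dR_k = 1 (the bootstrap choice).
rng = np.random.default_rng(1)
xi0 = rng.normal(size=100)
w0 = np.full(100, 1.0 / 100)
xi1, w1 = sisr_step(
    xi0, w0,
    sample_R=lambda x, r: x + 0.1 * r.normal(size=x.shape),
    incr_weight=lambda x_old, x_new: np.exp(-0.5 * (0.5 - x_new) ** 2),
    rng=rng,
)
```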

65 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Illustration of the Bootstrap Filter on a Toy Example

Noisy AR(1) model:
$$X_{k+1} - \mu = \phi(X_k - \mu) + \sigma U_k, \qquad Y_k = X_k + \eta V_k,$$
with $\mu = 0.9$, $\phi = 0.95$, $\sigma^2 = 0.01$, $\eta^2 = 0.02 = (\sigma^2/(1-\phi^2))/5$.

To approximate the predictive distribution $\phi_{k+1|k}$, we use the bootstrap filter with $N = 50$ particles, plotting the full particle paths $\{\xi_{0:k}^i, \tilde\xi_{k+1}^i\}_{1 \le i \le N}$ for each time index. This example is used since we may also compute the actual filtering densities using Kalman filtering.
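This toy model can be filtered in a few lines; what follows is our own sketch of the bootstrap filter (prior kernel, multinomial resampling at every step), not the code used to produce the slides, with the parameter values quoted above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy AR(1) model parameters from the slide
mu, phi, sigma2, eta2 = 0.9, 0.95, 0.01, 0.02
sigma, eta = np.sqrt(sigma2), np.sqrt(eta2)

def simulate(n):
    """Simulate n steps of the noisy AR(1) model, started at stationarity."""
    x = np.empty(n)
    y = np.empty(n)
    x[0] = mu + rng.normal(0, sigma / np.sqrt(1 - phi ** 2))
    for k in range(n):
        if k > 0:
            x[k] = mu + phi * (x[k - 1] - mu) + sigma * rng.normal()
        y[k] = x[k] + eta * rng.normal()
    return x, y

def bootstrap_filter(y, n_particles=50):
    """SISR with R_k = Q (prior kernel) and resampling at every step;
    returns the sequence of estimated filter means."""
    xi = mu + (sigma / np.sqrt(1 - phi ** 2)) * rng.normal(size=n_particles)
    means = []
    for obs in y:
        xi = mu + phi * (xi - mu) + sigma * rng.normal(size=n_particles)
        logw = -0.5 * (obs - xi) ** 2 / eta2        # Gaussian likelihood g_{k+1}
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * xi))                # weighted filter-mean estimate
        idx = rng.choice(n_particles, size=n_particles, p=w)
        xi = xi[idx]                                # multinomial resampling
    return np.array(means)

x, y = simulate(100)
est = bootstrap_filter(y)
```

For this linear-Gaussian model the exact filter means are given by the Kalman filter, so the particle estimates can be checked against the truth, which is precisely why this example is used on the slides.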

66 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004

[Figure sequence, slides 66 to 89: predictive densities and evolution of the particle paths, plotted as state versus time index over successive time indices.]

90 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Application to the Stochastic Volatility Model (contd.)

[Figure: coefficient of variation (left) and entropy (right) of the normalized importance weights as a function of the number of iterations, when using resampling triggered by $\mathrm{CV}_N(\omega) > 1$.]
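The two diagnostics plotted on this slide are simple to compute; here is a sketch with our own normalization conventions (the coefficient of variation of the normalized weights around the uniform value $1/N$, and their Shannon entropy).

```python
import numpy as np

def cv_and_entropy(w):
    """Coefficient of variation and Shannon entropy of normalized weights.
    Uniform weights give CV = 0 and entropy log N; a fully degenerate
    weight vector gives CV close to sqrt(N - 1) and entropy close to 0."""
    wbar = np.asarray(w, dtype=float)
    wbar = wbar / wbar.sum()
    n = len(wbar)
    cv = np.sqrt(np.mean((n * wbar - 1.0) ** 2))
    ent = -np.sum(wbar * np.log(wbar))
    return cv, ent

cv_u, ent_u = cv_and_entropy(np.ones(100))        # uniform weights
w_bad = np.full(100, 1e-12)
w_bad[0] = 1.0                                    # nearly degenerate weights
cv_d, ent_d = cv_and_entropy(w_bad)
# Resampling is triggered whenever cv exceeds the chosen threshold.
```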

91 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Application to the Stochastic Volatility Model (contd.)

[Figure: histograms of the base 10 logarithm of the normalized importance weights after (from top to bottom) 1, 10 and 100 iterations, when using resampling triggered by $\mathrm{CV}_N(\omega) > 1$.]

92 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation

93 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Alternatives to SISR

The resampling step in the SISR algorithm can be seen as a method to sample approximately from $\phi_{0:k+1|k+1}$ given the current particle approximation $\hat\phi_{0:k|k}$. This alternative way of thinking about resampling suggests several sequential Monte Carlo variants.

94 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Sequential Monte Carlo Reinterpreted

Recall that each update consists of two steps.

Prediction step: compute the one-step-ahead predictive distribution from the filtering distribution:
$$\phi_{0:k+1|k} = \phi_{0:k|k} \otimes Q.$$

Correction step (Bayes): compute the filtering distribution from the predictive distribution by taking into account the new observation $Y_{k+1}$:
$$\phi_{0:k+1|k+1}(f_{k+1}) = \frac{\int f_{k+1}(x_{0:k+1}) \, g_{k+1}(x_{k+1}) \, \phi_{0:k+1|k}(dx_{0:k+1})}{\int g_{k+1}(x_{k+1}) \, \phi_{0:k+1|k}(dx_{0:k+1})}.$$

$$\phi_{0:k|k} \xrightarrow{\text{prediction}} \phi_{0:k+1|k} \xrightarrow{\text{correction}} \phi_{0:k+1|k+1}$$

95 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Sequential Monte Carlo Reinterpreted

Replace $\phi_{0:k|k}$ by the empirical filtering distribution
$$\hat\phi_{0:k|k} = \sum_{i=1}^N \frac{\omega_k^i}{\sum_{j=1}^N \omega_k^j} \, \delta_{\xi_{0:k}^i}.$$
Applying the prediction and then the correction step to this approximation yields
$$\hat\phi_{0:k|k} \xrightarrow{\text{prediction}} \bar\phi_{0:k+1|k} = \sum_{i=1}^N \frac{\omega_k^i}{\sum_{j=1}^N \omega_k^j} \, \delta_{\xi_{0:k}^i} \otimes Q(\xi_k^i, \cdot)$$
$$\xrightarrow{\text{correction}} \bar\phi_{0:k+1|k+1}(f_{k+1}) = \frac{\sum_{i=1}^N \omega_k^i \int f_{k+1}(\xi_{0:k}^i, x) \, g_{k+1}(x) \, Q(\xi_k^i, dx)}{\sum_{i=1}^N \omega_k^i \int g_{k+1}(x) \, Q(\xi_k^i, dx)}.$$
The distribution $\bar\phi_{0:k+1|k+1}$ is sometimes called the empirical filtering distribution. It is in some sense the best approximation to $\phi_{0:k+1|k+1}$ based on the knowledge of $\hat\phi_{0:k|k}$. It is obviously not, in general, a distribution supported by a finite set of points!

96 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Sequential Monte Carlo Reinterpreted

The empirical filtering distribution is a mixture distribution:
$$\bar\phi_{0:k+1|k+1} = \sum_{i=1}^N \frac{\omega_k^i \, \gamma_k(\xi_k^i)}{\sum_{j=1}^N \omega_k^j \, \gamma_k(\xi_k^j)} \, \delta_{\xi_{0:k}^i} \otimes T_k(\xi_k^i, \cdot),$$
where
$$\gamma_k(x) = \int Q(x, dx') \, g_{k+1}(x'), \qquad T_k(x, A) = \frac{\int_A Q(x, dx') \, g_{k+1}(x')}{\gamma_k(x)}.$$
Direct sampling from this distribution is usually not possible (because sampling from $T_k$ and evaluating $\gamma_k$ aren't either).

97 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Auxiliary Sampling

But we may in general use importance sampling or SIR, proposing new points $\tilde\xi_{k+1}^1, \ldots, \tilde\xi_{k+1}^N$ under the mixture
$$\rho_{0:k+1}(f_{k+1}) = \sum_{i=1}^N \frac{\omega_k^i \, \tau_k^i}{\sum_{j=1}^N \omega_k^j \, \tau_k^j} \int f(\xi_{0:k}^i, x) \, R_k(\xi_k^i, dx),$$
where $\tau_k^1, \ldots, \tau_k^N$ are user-selected adjustment weights and $R_k$ is a kernel which is easy to sample from. In doing so, we first need to draw mixture component indicators $I_k^1, \ldots, I_k^N$.

It is easily checked that the importance weights are then given by
$$\omega_{k+1}^i = \frac{g_{k+1}(\tilde\xi_{k+1}^i)}{\tau_k^{I_k^i}} \, \frac{dQ(\xi_k^{I_k^i}, \cdot)}{dR_k(\xi_k^{I_k^i}, \cdot)}(\tilde\xi_{k+1}^i).$$

This strategy, named auxiliary sampling and proposed by Pitt & Shephard (1999), is often useful in practice when combined with clever ways of setting $\{\tau_k^i\}_{i=1,\ldots,N}$ and $R_k$.
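One auxiliary-sampling step can be sketched as follows (our own code; `tau`, `sample_R` and `g` are hypothetical user-supplied callables, and we take $R_k = Q$ so that the Radon-Nikodym term equals 1).

```python
import numpy as np

def auxiliary_step(xi, w, y_next, tau, sample_R, g, rng):
    """First-stage selection with adjustment weights tau_k^i, then
    propagation and second-stage reweighting g_{k+1} / tau (R_k = Q)."""
    n = len(xi)
    t = tau(xi, y_next)                     # adjustment weights tau_k^i
    probs = w * t
    probs = probs / probs.sum()
    idx = rng.choice(n, size=n, p=probs)    # draw mixture indicators I_k^i
    xi_new = sample_R(xi[idx], rng)         # propagate selected particles
    w_new = g(xi_new, y_next) / t[idx]      # second-stage importance weights
    return xi_new, w_new / w_new.sum()

# Toy usage: Gaussian random-walk dynamics and Gaussian observation; the
# adjustment weight is the likelihood at the predicted mean, which for a
# random walk is simply the current particle position.
rng = np.random.default_rng(3)
xi = rng.normal(size=200)
w = np.full(200, 1.0 / 200)
g = lambda x, y: np.exp(-0.5 * (y - x) ** 2 / 0.1)
xi1, w1 = auxiliary_step(
    xi, w, y_next=0.3,
    tau=lambda x, y: g(x, y),
    sample_R=lambda x, r: x + 0.2 * r.normal(size=x.shape),
    g=g, rng=rng,
)
```

This choice of adjustment weights pre-selects particles likely to produce offspring compatible with the next observation, which is the "clever setting" alluded to above.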

98 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 IID Sampling

It is interesting to consider what happens in cases where sampling from $T_k$ and evaluating $\gamma_k$ is feasible (i.e., when $\tau_k^i = \gamma_k(\xi_k^i)$ and $R_k = T_k$):

Weight computation: For $i = 1, \ldots, N$, compute the (unnormalized) importance weights $\alpha_k^i = \gamma_k(\xi_k^i)$.

Selection: Draw $I_{k+1}^1, \ldots, I_{k+1}^N$ conditionally i.i.d. given $\{\xi_{0:k}^i\}_{1 \le i \le N}$, with probabilities $\mathrm{P}(I_{k+1}^i = j)$ proportional to $\alpha_k^j$, $j = 1, \ldots, N$.

Sampling: Draw $\xi_{k+1}^1, \ldots, \xi_{k+1}^N$ conditionally independently given $\{\xi_{0:k}^i\}_{1 \le i \le N}$ and $\{I_{k+1}^i\}_{1 \le i \le N}$, with distribution $\xi_{k+1}^i \sim T_k(\xi_k^{I_{k+1}^i}, \cdot)$. Set $\xi_{0:k+1}^i = (\xi_{0:k}^{I_{k+1}^i}, \xi_{k+1}^i)$ and $\omega_{k+1}^i = 1$ for $i = 1, \ldots, N$.
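For a linear-Gaussian model such as the noisy AR(1) example used earlier, $\gamma_k$ and $T_k$ are available in closed form, so the selection-then-sampling step can be written out directly. This is our own sketch (parameter values borrowed from the toy example), not code from the book.

```python
import numpy as np

mu, phi, sigma2, eta2 = 0.9, 0.95, 0.01, 0.02   # toy-model parameters

def iid_sampling_step(xi, y_next, rng):
    """Selection first (weights alpha_k^i = gamma_k(xi_k^i)), then
    conditionally independent sampling from the optimal kernel T_k."""
    n = len(xi)
    m = mu + phi * (xi - mu)                     # one-step predictive means
    s2 = sigma2 + eta2
    # gamma_k(x) = integral Q(x, dx') g_{k+1}(x') = N(y; m(x), sigma2 + eta2)
    alpha = np.exp(-0.5 * (y_next - m) ** 2 / s2)
    idx = rng.choice(n, size=n, p=alpha / alpha.sum())   # selection
    # T_k(x, .) is the Gaussian posterior of X_{k+1} given X_k = x and y
    post_mean = m[idx] + (sigma2 / s2) * (y_next - m[idx])
    post_std = np.sqrt(sigma2 * eta2 / s2)
    return post_mean + post_std * rng.normal(size=n)     # sampling

rng = np.random.default_rng(4)
xi = mu + 0.3 * rng.normal(size=500)
xi_next = iid_sampling_step(xi, y_next=1.0, rng=rng)
```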

99 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 IID Sampling (contd.)

Compared with the SISR algorithm for the particular choice $R_k = T_k$, the IID sampling algorithm differs only by the order in which the sampling (or mutation) and selection operations are performed.

The SISR algorithm prescribes that each trajectory be first extended by setting $\xi_{0:k+1}^i = (\xi_{0:k}^i, \tilde\xi_{k+1}^i)$, where $\tilde\xi_{k+1}^i$ is drawn from $T_k(\xi_k^i, \cdot)$. Then resampling is performed in the population of extended trajectories according to their importance weights.

In contrast, the IID sampling algorithm first selects the trajectories based on the weights $\alpha_k^i$ and then simulates an independent extension for each selected trajectory. The new particles $\xi_{k+1}^1, \ldots, \xi_{k+1}^N$ are conditionally independent given the current generation of particles $\{\xi_k^i\}_{i=1,\ldots,N}$.

This is of course only possible because the optimal importance kernel $T_k$ is used as instrumental kernel, which renders the incremental weights independent of the position of the particle at index $k+1$ and thus allows for early selection. This way of proceeding is provably better than SISR with $R_k = T_k$ (see text).

103 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation

104 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004 EM and Friends in HMMs: A Long Story Made Short

If the HMM has some unknown parameters $\theta$, likelihood-based parameter inference, be it through Expectation-Maximization (EM) or gradient-based approaches, (only) requires the ability to compute quantities of the form
$$\mathrm{E}\left[\left.\sum_{k=0}^{n-1} s_i(X_k, X_{k+1}) \,\right|\, Y_{0:n};\ \theta\right]$$
for some model-dependent functions $s_i$.

If exact computation is not feasible, we may use approximate Monte Carlo evaluation in combination with variants of the former methods (MCEM, SAME, SAEM, stochastic gradient, etc.).

Are sequential Monte Carlo methods appropriate for this task?
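A naive particle approximation of such a quantity sums the functional along each stored particle path and averages with the normalized weights. This is our own sketch; the path-degeneracy issues that motivate the more refined approximations in the book apply to it.

```python
import numpy as np

def smoothed_sum_estimate(paths, weights, s):
    """Estimate E[ sum_{k=0}^{n-1} s(X_k, X_{k+1}) | Y_{0:n} ] from N
    weighted particle paths; `paths` has shape (N, n+1) and `s` must be
    vectorized over its two arguments."""
    paths = np.asarray(paths, dtype=float)
    wbar = np.asarray(weights, dtype=float)
    wbar = wbar / wbar.sum()                          # normalize the weights
    path_sums = np.sum(s(paths[:, :-1], paths[:, 1:]), axis=1)
    return float(np.sum(wbar * path_sums))

# Deterministic check: s(x, x') = x' - x telescopes to x_n - x_0.
paths = np.array([[0.0, 1.0, 2.0], [1.0, 2.0, 4.0]])
est = smoothed_sum_estimate(paths, [0.5, 0.5], lambda a, b: b - a)
# path sums are 2.0 and 3.0, so the weighted estimate is 2.5
```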


More information

2D Image Processing (Extended) Kalman and particle filter

2D Image Processing (Extended) Kalman and particle filter 2D Image Processing (Extended) Kalman and particle filter Prof. Didier Stricker Dr. Gabriele Bleser Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Monte Carlo Approximation of Monte Carlo Filters

Monte Carlo Approximation of Monte Carlo Filters Monte Carlo Approximation of Monte Carlo Filters Adam M. Johansen et al. Collaborators Include: Arnaud Doucet, Axel Finke, Anthony Lee, Nick Whiteley 7th January 2014 Context & Outline Filtering in State-Space

More information

Particle Filtering a brief introductory tutorial. Frank Wood Gatsby, August 2007

Particle Filtering a brief introductory tutorial. Frank Wood Gatsby, August 2007 Particle Filtering a brief introductory tutorial Frank Wood Gatsby, August 2007 Problem: Target Tracking A ballistic projectile has been launched in our direction and may or may not land near enough to

More information

Note Set 5: Hidden Markov Models

Note Set 5: Hidden Markov Models Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional

More information

LIMIT THEOREMS FOR WEIGHTED SAMPLES WITH APPLICATIONS TO SEQUENTIAL MONTE CARLO METHODS

LIMIT THEOREMS FOR WEIGHTED SAMPLES WITH APPLICATIONS TO SEQUENTIAL MONTE CARLO METHODS ESAIM: ROCEEDIGS, September 2007, Vol.19, 101-107 Christophe Andrieu & Dan Crisan, Editors DOI: 10.1051/proc:071913. LIMIT THEOREMS FOR WEIGHTED SAMLES WITH ALICATIOS TO SEQUETIAL MOTE CARLO METHODS R.

More information

Lecture 13 : Variational Inference: Mean Field Approximation

Lecture 13 : Variational Inference: Mean Field Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning

More information

Appendices: Stochastic Backpropagation and Approximate Inference in Deep Generative Models

Appendices: Stochastic Backpropagation and Approximate Inference in Deep Generative Models Appendices: Stochastic Backpropagation and Approximate Inference in Deep Generative Models Danilo Jimenez Rezende Shakir Mohamed Daan Wierstra Google DeepMind, London, United Kingdom DANILOR@GOOGLE.COM

More information

Variational Inference (11/04/13)

Variational Inference (11/04/13) STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Robert Collins CSE586, PSU Intro to Sampling Methods

Robert Collins CSE586, PSU Intro to Sampling Methods Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Topics to be Covered Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling (CDF) Ancestral Sampling Rejection

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Approximate Bayesian Computation

Approximate Bayesian Computation Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate

More information

Dimension Reduction. David M. Blei. April 23, 2012

Dimension Reduction. David M. Blei. April 23, 2012 Dimension Reduction David M. Blei April 23, 2012 1 Basic idea Goal: Compute a reduced representation of data from p -dimensional to q-dimensional, where q < p. x 1,...,x p z 1,...,z q (1) We want to do

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Bayesian Machine Learning - Lecture 7

Bayesian Machine Learning - Lecture 7 Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010 Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data

More information

13 Notes on Markov Chain Monte Carlo

13 Notes on Markov Chain Monte Carlo 13 Notes on Markov Chain Monte Carlo Markov Chain Monte Carlo is a big, and currently very rapidly developing, subject in statistical computation. Many complex and multivariate types of random data, useful

More information

Introduction to Particle Filters for Data Assimilation

Introduction to Particle Filters for Data Assimilation Introduction to Particle Filters for Data Assimilation Mike Dowd Dept of Mathematics & Statistics (and Dept of Oceanography Dalhousie University, Halifax, Canada STATMOS Summer School in Data Assimila5on,

More information

ECE521 Lecture 19 HMM cont. Inference in HMM

ECE521 Lecture 19 HMM cont. Inference in HMM ECE521 Lecture 19 HMM cont. Inference in HMM Outline Hidden Markov models Model definitions and notations Inference in HMMs Learning in HMMs 2 Formally, a hidden Markov model defines a generative process

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

TSRT14: Sensor Fusion Lecture 8

TSRT14: Sensor Fusion Lecture 8 TSRT14: Sensor Fusion Lecture 8 Particle filter theory Marginalized particle filter Gustaf Hendeby gustaf.hendeby@liu.se TSRT14 Lecture 8 Gustaf Hendeby Spring 2018 1 / 25 Le 8: particle filter theory,

More information

AUTOMOTIVE ENVIRONMENT SENSORS

AUTOMOTIVE ENVIRONMENT SENSORS AUTOMOTIVE ENVIRONMENT SENSORS Lecture 5. Localization BME KÖZLEKEDÉSMÉRNÖKI ÉS JÁRMŰMÉRNÖKI KAR 32708-2/2017/INTFIN SZÁMÚ EMMI ÁLTAL TÁMOGATOTT TANANYAG Related concepts Concepts related to vehicles moving

More information

The Unscented Particle Filter

The Unscented Particle Filter The Unscented Particle Filter Rudolph van der Merwe (OGI) Nando de Freitas (UC Bereley) Arnaud Doucet (Cambridge University) Eric Wan (OGI) Outline Optimal Estimation & Filtering Optimal Recursive Bayesian

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

The Hierarchical Particle Filter

The Hierarchical Particle Filter and Arnaud Doucet http://go.warwick.ac.uk/amjohansen/talks MCMSki V Lenzerheide 7th January 2016 Context & Outline Filtering in State-Space Models: SIR Particle Filters [GSS93] Block-Sampling Particle

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Sampling Methods (11/30/04)

Sampling Methods (11/30/04) CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with

More information

Hidden Markov Models. Vibhav Gogate The University of Texas at Dallas

Hidden Markov Models. Vibhav Gogate The University of Texas at Dallas Hidden Markov Models Vibhav Gogate The University of Texas at Dallas Intro to AI (CS 4365) Many slides over the course adapted from either Dan Klein, Luke Zettlemoyer, Stuart Russell or Andrew Moore 1

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

Lecture 6: Markov Chain Monte Carlo

Lecture 6: Markov Chain Monte Carlo Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline

More information