1 Particle Methods for Hidden Markov Models - EPFL, 7 Dec 2004

Olivier Cappé
CNRS Lab. Trait. Commun. Inform. & ENST département Trait. Signal Image
46 rue Barrault, Paris cedex 13, France
mailto:cappe@tsi.enst.fr

These lectures are based on the book Inference in Hidden Markov Models, written with E. Moulines and T. Rydén (Springer-Verlag, to appear in 2005).
2 Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation
3 What is a Hidden Markov Model?

A hidden Markov model (abbreviated HMM) is a bivariate discrete-time process {X_k, Y_k}_{k≥0}, where {X_k}_{k≥0} is a homogeneous Markov chain and, conditionally on {X_k}_{k≥0}, {Y_k}_{k≥0} is a sequence of independent random variables such that the conditional distribution of Y_k only depends on X_k.

The underlying Markov chain {X_k}_{k≥0} is called the regime, or state. We denote the state space of the Markov chain {X_k}_{k≥0} by X and the set in which {Y_k}_{k≥0} takes its values by Y.
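This two-layer sampling mechanism can be sketched as follows: a minimal simulation of a finite-state HMM with Gaussian emissions (all names and parameter values are illustrative, not from the lectures).

```python
import numpy as np

def simulate_hmm(Q, means, sigma, nu, n, rng):
    """Simulate n+1 steps of a finite-state HMM with Gaussian emissions.

    Q     : (m, m) transition matrix, Q[i, j] = P(X_{k+1} = j | X_k = i)
    means : emission mean per state; given X_k = i, Y_k ~ N(means[i], sigma^2)
    nu    : initial distribution of X_0
    """
    m = len(nu)
    x = np.empty(n + 1, dtype=int)
    x[0] = rng.choice(m, p=nu)
    for k in range(n):                       # hidden chain: Markov dynamics
        x[k + 1] = rng.choice(m, p=Q[x[k]])
    # given the whole state sequence, the Y_k are independent,
    # Y_k depending on X_k only
    y = means[x] + sigma * rng.normal(size=n + 1)
    return x, y

rng = np.random.default_rng(0)
Q = np.array([[0.95, 0.05], [0.10, 0.90]])
x, y = simulate_hmm(Q, np.array([-1.0, 1.0]), 0.5, np.array([0.5, 0.5]), 200, rng)
```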
4 What is a Hidden Markov Model?

The dependence structure of an HMM can be represented by a graphical model:

    ... → X_k → X_{k+1} → ...
           |        |
           v        v
          Y_k    Y_{k+1}

Graphical representation of the dependence structure of a hidden Markov model, where {Y_k}_{k≥0} is the observable process and {X_k}_{k≥0} is the hidden chain.
5 What is a Hidden Markov Model?

Of the two processes {X_k}_{k≥0} and {Y_k}_{k≥0}, only {Y_k}_{k≥0} is actually observed; the Markov chain {X_k}_{k≥0} is unobserved, or hidden. Hence, inference on the parameters of the model must be achieved using {Y_k}_{k≥0} only.

The other topic of interest is of course inference on the unobserved {X_k}_{k≥0}: given a model and some observations, can we estimate the value of the unobservable sequence of states?

These two major statistical objectives are indeed strongly connected!
6 What is a Hidden Markov Model?

The Y-variables are conditionally independent given {X_k}_{k≥0}, but {Y_k}_{k≥0} is not an independent sequence because of the dependence in {X_k}_{k≥0}.

{Y_k}_{k≥0} is not a Markov chain either: the joint process {X_k, Y_k}_{k≥0} is a Markov chain, but {Y_k}_{k≥0} does not have the loss of memory property: the conditional distribution of Y_k given Y_0, ..., Y_{k-1} depends, in general, on all the conditioning variables.
7 What is a Hidden Markov Model?

There are numerous examples:
- where both X and Y are finite: coding, digital communications, bioinformatics;
- where X is finite but Y is not: speech recognition, ion channel modelling (Gaussian HMMs);
- where both X and Y are continuous: linear state space models, non-linear state space models (e.g. stochastic volatility model, bearings-only tracking);
- where Y is continuous and X = C × W with C finite and W continuous: conditionally Gaussian linear state space models (AKA jump Markov models);
- non-HMMs that behave similarly: switching autoregressions, Markov switching models.

Except for stability properties and the theory of the MLE, which we don't consider today...
8 Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation
9 Hidden Markov Model

Notations for HMMs:
1. {X_k}_{k≥0} is a Markov chain on X with initial distribution ν and transition kernel Q.
2. {Y_k}_{k≥0} is such that, for f_0, ..., f_n ∈ F_b(Y),

E[ ∏_{k=0}^n f_k(Y_k) | X_{0:n} ] = ∏_{k=0}^n ∫_Y f_k(y) g(X_k, y) µ(dy),

where X_{0:n} denotes the collection X_0, ..., X_n and g is a transition density function (with respect to µ), sometimes referred to as the conditional likelihood function. We will also use the simplified notation g_k(x) := g(x, Y_k).
10 Some More Notations: Usual Kernel Operations

Q(x, A) = ∫_A Q(x, dx')                P[X_{k+1} ∈ A | X_k] = Q(X_k, A)
Q(x, f) = ∫ Q(x, dx') f(x')            E[f(X_{k+1}) | X_k] = Q(X_k, f)  (also denoted (Qf)(x))
νQ(f) = ∫∫ ν(dx) Q(x, dx') f(x')       Expectation after one step, starting under ν
Q^{n+1}(x_0, f) = Q^n(x_0, Qf)         Expectation after n+1 steps, starting under δ_{x_0}

Markov transition kernels are such that Q(x, X) = 1. Sometimes unnormalized transition kernels, such that Q(x, A) ≥ 0 for all A ∈ X and 0 < Q(x, X) < ∞, are also used.
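For a finite state space, these kernel operations reduce to matrix-vector products; a minimal sketch (the matrix and values are illustrative, not from the lectures):

```python
import numpy as np

# For a finite state space, a transition kernel Q is a stochastic matrix,
# a measure nu is a row vector, and a function f is a column vector.
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])                 # each row Q(x, .) sums to 1
nu = np.array([0.5, 0.5])                  # initial distribution
f = np.array([0.0, 1.0])                   # f(x) = 1{x = 1}

Qf = Q @ f                                 # x -> E[f(X_{k+1}) | X_k = x]
nuQf = nu @ Q @ f                          # expectation after one step under nu
Q3f = np.linalg.matrix_power(Q, 3) @ f     # Q^3(x, f): three steps from x
```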
11 Filtering and Smoothing Recursions

To be answered: given an HMM, how to evaluate the conditional distribution of the states X_{k:l}, given the observations Y_0, ..., Y_n?

We introduce the generic notation φ_{ν,k:l|n} to denote the conditional distribution of X_{k:l} given Y_{0:n}, where ν recalls the dependence with respect to the initial distribution (which will sometimes be omitted).

The joint probability of the unobservable states and observations up to index n is such that, for any function f ∈ F_b({X × Y}^{n+1}),

E_ν[f(X_0, Y_0, ..., X_n, Y_n)] = ∫ f(x_0, y_0, ..., x_n, y_n) ν(dx_0) g(x_0, y_0) ∏_{k=1}^n {Q(x_{k-1}, dx_k) g(x_k, y_k)} µ^{⊗(n+1)}(dy_0, ..., dy_n).
12 The Likelihood

Marginalizing with respect to the unobservable variables X_0, ..., X_n yields

E_ν[f(Y_0, ..., Y_n)] = ∫ f(y_0, ..., y_n) L_{ν,n}(y_0, ..., y_n) µ^{⊗(n+1)}(dy_0, ..., dy_n),

for f ∈ F_b(Y^{n+1}), where

L_{ν,n}(y_0, ..., y_n) = ∫···∫ ν(dx_0) g(x_0, y_0) Q(x_0, dx_1) g(x_1, y_1) ··· Q(x_{n-1}, dx_n) g(x_n, y_n)

is the likelihood of the observations.
13 Joint Smoothing Distribution

By Bayes' rule,

φ_{ν,0:n|n}(y_{0:n}, f) = L_{ν,n}^{-1}(y_{0:n}) ∫···∫ f(x_{0:n}) ν(dx_0) g(x_0, y_0) ∏_{k=1}^n Q(x_{k-1}, dx_k) g(x_k, y_k),

for all functions f ∈ F_b(X^{n+1}). In the following, we always use the implicit conditioning convention, writing

φ_{ν,0:n|n}(f) = L_{ν,n}^{-1} ∫···∫ f(x_{0:n}) ν(dx_0) g_0(x_0) ∏_{k=1}^n Q(x_{k-1}, dx_k) g_k(x_k),

where

L_{ν,n} = ∫···∫ ν(dx_0) g_0(x_0) ∏_{k=1}^n Q(x_{k-1}, dx_k) g_k(x_k).
14 Recursive Smoothing Formula

Comparing the expressions corresponding to n and n+1 gives the following update equation for the joint smoothing distribution:

φ_{ν,0:n+1|n+1}(f_{n+1}) = (L_{n+1}/L_n)^{-1} ∫ f_{n+1}(x_{0:n+1}) φ_{ν,0:n|n}(dx_0, ..., dx_n) Q(x_n, dx_{n+1}) g_{n+1}(x_{n+1}),

for functions f_{n+1} ∈ F_b(X^{n+2}).

⇒ Very simple structure, but it involves the normalization factor c_{n+1} := L_{n+1}/L_n, which is not computable except in simple cases, such as when X is finite. This claim is not obvious (see next slides...)
15 Filtering Recursion

Marginalizing with respect to all variables but x_n and x_{n+1} gives the (marginal) filtering recursion:

c_{ν,n+1} = ∫∫ φ_{ν,n|n}(dx) Q(x, dx') g_{n+1}(x'),
φ_{ν,n+1|n+1}(f) = c_{ν,n+1}^{-1} ∫∫ f(x') φ_{ν,n|n}(dx) Q(x, dx') g_{n+1}(x'),

with initial condition

c_{ν,0} = ν(g_0),  φ_{ν,0|0}(f) = c_{ν,0}^{-1} ∫ f(x) g_0(x) ν(dx).

Remark: when X is finite (speech recognition, bioinformatics), the above is known as the normalized forward recursion (of forward-backward); the specialization of this relation to Gaussian linear state-space models is known as Kalman filtering.
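For finite X, the normalized forward recursion above can be sketched as follows (function and variable names are ours; g[k, x] stands for g_k(x)):

```python
import numpy as np

def forward_filter(Q, g, nu):
    """Normalized forward recursion for a finite state space.

    Q  : (m, m) transition matrix
    g  : (n+1, m) conditional likelihoods, g[k, x] = g(x, Y_k)
    nu : initial distribution of X_0
    Returns the filtering distributions phi[k] = P(X_k = . | Y_{0:k})
    and the normalizing constants c[k].
    """
    n1, m = g.shape
    phi = np.empty((n1, m))
    c = np.empty(n1)
    pred = nu                           # convention: phi_{0|-1} = nu
    for k in range(n1):
        unnorm = pred * g[k]            # predictive distribution times likelihood
        c[k] = unnorm.sum()             # c_k = phi_{k|k-1}(g_k)
        phi[k] = unnorm / c[k]          # filtering distribution at time k
        pred = phi[k] @ Q               # prediction step: phi_{k|k} Q
    return phi, c
```

The log-likelihood is then recovered as `np.sum(np.log(c))`, in O((n+1) m^2) operations as stated above.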
16 Prediction and Filtering Updates

It is sometimes convenient to break the previous recursion in two steps:

φ_{ν,n+1|n} = φ_{ν,n|n} Q,   (prediction)
c_{ν,n+1} = φ_{ν,n+1|n}(g_{n+1}),
φ_{ν,n+1|n+1}(f) = c_{ν,n+1}^{-1} ∫ f(x) g_{n+1}(x) φ_{ν,n+1|n}(dx).   (filtering)

Computation of the log-likelihood:

ℓ_{ν,n} := log L_{ν,n} = Σ_{k=0}^n log φ_{ν,k|k-1}(g_k),

with the convention φ_{ν,0|-1} = ν. This is non-trivial: we have replaced an (n+1)-dimensional integral by a product of n+1 integrals on X! In finite state space HMMs, the filtering recursion makes it possible to evaluate the (log-)likelihood in O{(n+1) Card²(X)} operations.
17 Recap: Filtering and Smoothing

The recursion

φ_{ν,n+1|n} = φ_{ν,n|n} Q,
c_{ν,n+1} = φ_{ν,n+1|n}(g_{n+1}),
φ_{ν,n+1|n+1}(f) = c_{ν,n+1}^{-1} ∫ f(x) g_{n+1}(x) φ_{ν,n+1|n}(dx),

with φ_{ν,0|-1} := ν, computes the filtering and predictive distributions recursively, making it possible (i) to compute the likelihood L_{ν,n+1} and, potentially, (ii) the joint smoothing distribution, since

φ_{0:n+1|n+1}(f_{n+1}) = c_{ν,n+1}^{-1} ∫ f_{n+1}(x_{0:n+1}) φ_{0:n|n}(dx_0, ..., dx_n) Q(x_n, dx_{n+1}) g_{n+1}(x_{n+1}).
18 Appendix: Finite-Dimensional Recursive Smoothing for a Sum

In particular, if f_n(x_{0:n}) = Σ_{k=0}^n s(x_k), define the signed measure τ_{ν,n} by

τ_{ν,n}(f) = ∫ f(x_n) ( Σ_{k=0}^n s(x_k) ) φ_{ν,0:n|n}(dx_0, ..., dx_n),

such that τ_{ν,n}(X) = E_ν[ Σ_{k=0}^n s(X_k) | Y_{0:n} ]. Then

τ_{ν,n+1}(f) = c_{ν,n+1}^{-1} ∫ f(x_{n+1}) ( Σ_{k=0}^{n+1} s(x_k) ) φ_{ν,0:n|n}(dx_0, ..., dx_n) Q(x_n, dx_{n+1}) g_{n+1}(x_{n+1})
            = ∫ f(x_{n+1}) [ φ_{ν,n+1|n+1}(dx_{n+1}) s(x_{n+1}) + c_{ν,n+1}^{-1} ∫ τ_{ν,n}(dx_n) Q(x_n, dx_{n+1}) g_{n+1}(x_{n+1}) ].
19 Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation
20 Monte Carlo Integration

Objective: given a probability measure µ, how to evaluate numerically µ(f) = ∫_X µ(dx) f(x) for arbitrary µ-integrable functions f?

The Monte Carlo answer:
1. Draw an independent sample ξ^1, ..., ξ^N from the probability measure µ.
2. Compute the sample average N^{-1} Σ_{i=1}^N f(ξ^i).

This technique is applicable only when direct sampling from the distribution µ is feasible.
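The two steps above can be sketched in a couple of lines (the choice of µ and f is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Estimate mu(f) = E[f(X)] for X ~ N(0, 1) and f(x) = x**2 (true value: 1).
xi = rng.normal(size=100_000)        # step 1: independent sample from mu
mc_estimate = np.mean(xi ** 2)       # step 2: sample average of f(xi)
```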
21 (Unnormalized) Importance Sampling: General Principle

It is also possible to sample from an instrumental (or importance) distribution ν, applying a change-of-measure formula to account for the fact that the instrumental distribution differs from the target distribution µ.

More formally, assume that the target probability measure µ is absolutely continuous with respect to the instrumental probability measure ν, µ ≪ ν. For any µ-integrable function f,

µ(f) = ∫ f(x) µ(dx) = ∫ f(x) (dµ/dν)(x) ν(dx),

where dµ/dν is the Radon-Nikodym derivative of µ with respect to ν, called the importance function (or importance ratio) in the context of importance sampling.
22 (Unnormalized) Importance Sampling: the Algorithm

Sampling: Draw an independent sample ξ^1, ..., ξ^N from the distribution ν.

Weighting: Compute the importance weights

ω^i = (dµ/dν)(ξ^i),  for i = 1, ..., N.

Weighted Monte Carlo approximation:

ˆµ^{IS}_{ν,N}(f) = N^{-1} Σ_{i=1}^N ω^i f(ξ^i).
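A sketch of the algorithm for a case where dµ/dν is available in closed form: target µ = N(0, 1), instrumental ν = N(0, s²) with s > 1 (an overdispersed choice; the setting is illustrative, not from the lectures).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
s = 2.0                                    # instrumental std (overdispersed)
xi = rng.normal(scale=s, size=N)           # sampling: draw from nu = N(0, s^2)

def dmu_dnu(x):
    # Radon-Nikodym derivative of mu = N(0, 1) with respect to nu = N(0, s^2):
    # ratio of the two Gaussian densities.
    return s * np.exp(-0.5 * x**2 + 0.5 * (x / s) ** 2)

w = dmu_dnu(xi)                            # weighting
is_estimate = np.mean(w * xi ** 2)         # IS estimate of mu(f) for f(x) = x^2
```

Note that the weights average to 1 (up to Monte Carlo error), since ν(dµ/dν) = 1.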
23 (Unnormalized) Importance Sampling: Large Sample Performance

Strong law of large numbers: the sequence ˆµ^{IS}_{ν,N}(f) converges to µ(f) almost surely as N → ∞.

Central limit theorem: if f is a real-valued measurable function satisfying

ν( (1 + f²) (dµ/dν)² ) = µ( (1 + f²) (dµ/dν) ) < ∞,

then ˆµ^{IS}_{ν,N}(f) is asymptotically normal:

√N ( ˆµ^{IS}_{ν,N}(f) - µ(f) ) →_D N( 0, Var_ν[(dµ/dν) f] ),

where Var_ν[(dµ/dν) f] = ν( {f (dµ/dν)}² ) - µ(f)².

Deviation inequalities (exponential, L^p) or more sophisticated empirical process results are also available.

⇒ Choosing ν such that dµ/dν stays as small as possible is very important in practice.
24 Importance Sampling

In situations where dµ/dν is known only up to a scaling factor, we can still use the importance sampling estimator, just changing the normalization factor:

˜µ^{IS}_{ν,N}(f) = Σ_{i=1}^N f(ξ^i) (dµ/dν)(ξ^i) / Σ_{i=1}^N (dµ/dν)(ξ^i).

The (self-normalized) importance sampling estimator (sometimes also called Bayesian sampling estimator) is defined as a ratio of unnormalized importance sampling estimators:

˜µ^{IS}_{ν,N}(f) = ˆµ^{IS}_{ν,N}(f) / ˆµ^{IS}_{ν,N}(1).

By the strong law of large numbers,

ˆµ^{IS}_{ν,N}(f) → µ(f) a.s.  and  ˆµ^{IS}_{ν,N}(1) → 1 a.s.,

showing that ˜µ^{IS}_{ν,N}(f) is a strongly consistent estimator of µ(f).
25 Importance Sampling (contd.)

Assuming in addition that f is real-valued and satisfies

ν( (1 + f²) (dµ/dν)² ) = µ( (1 + f²) (dµ/dν) ) < ∞,

√N ( ˜µ^{IS}_{ν,N}(f) - µ(f) ) →_D N( 0, σ²(ν, f) ),

σ²(ν, f) = Var_ν[ {f - µ(f)} (dµ/dν) ] = ν( (f - µ(f))² (dµ/dν)² ).

The estimator is errorless for constant functions, and its performance is clearly dependent on the fact that dµ/dν stays small.
26 Sampling Importance Resampling (SIR)

While importance sampling was originally designed to overcome difficulties with direct sampling from µ when approximating integrals like µ(f), it can also be used for approximate sampling from the distribution µ. The sampling importance resampling (SIR) method is a two-stage method:

Sampling: Draw an i.i.d. sample ˜ξ^1, ..., ˜ξ^M from the instrumental distribution ν.

Weighting: Compute the (normalized) importance weights

ω^i = (dµ/dν)(˜ξ^i) / Σ_{j=1}^M (dµ/dν)(˜ξ^j),  for i = 1, ..., M.

Resampling: Draw, conditionally independently given (˜ξ^1, ..., ˜ξ^M), N discrete random variables (I^1, ..., I^N) taking values in the set {1, ..., M} with probabilities (ω^1, ..., ω^M). Set, for i = 1, ..., N, ξ^i = ˜ξ^{I^i}.

The set (I^1, ..., I^N) is thus a multinomial trial process. This resampling method is known as multinomial resampling.
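The two stages can be sketched as follows, with the illustrative Gaussian pair target µ = N(0, 1) and instrumental ν = N(0, 2²) (not a setting from the lectures):

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 50_000, 10_000
s = 2.0
xi_tilde = rng.normal(scale=s, size=M)      # first stage: i.i.d. draws from nu

# normalized importance weights dmu/dnu (Gaussian density ratio)
w = s * np.exp(-0.5 * xi_tilde**2 + 0.5 * (xi_tilde / s) ** 2)
w /= w.sum()

idx = rng.choice(M, size=N, p=w)            # multinomial resampling
xi = xi_tilde[idx]                          # approximate sample from mu
```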
27 Sampling Importance Resampling (contd.)

The first-stage sample ˜ξ^1, ..., ˜ξ^M is really distributed under ν. In the resampling operation, the bad points, as measured by dµ/dν, are discarded, whereas the good points are selected (and perhaps duplicated) with high probability.

[Figure: instrumental sample reweighted and resampled toward the TARGET density.]
28 SIR: Large Sample Behavior

It is not obvious in which sense (ξ^1, ..., ξ^N) is (approximately) a sample from the target distribution µ. Rewriting

ˆµ^{SIR}_{ν,M,N}(f) = N^{-1} Σ_{i=1}^N f(ξ^i) = Σ_{i=1}^M (N^i/N) f(˜ξ^i),

where N^i denotes the number of times ˜ξ^i has been selected, it is easily seen that the sample mean ˆµ^{SIR}_{ν,M,N}(f) of the SIR sample is, conditionally on the first-stage sample (˜ξ^1, ..., ˜ξ^M), equal to the importance sampling estimator ˜µ^{IS}_{ν,M}(f):

E[ ˆµ^{SIR}_{ν,M,N}(f) | ˜ξ^1, ..., ˜ξ^M ] = ˜µ^{IS}_{ν,M}(f).

As a consequence, the SIR estimator ˆµ^{SIR}_{ν,M,N}(f) has the same bias as the importance sampling estimator, but its mean squared error is always larger, due to the well-known variance decomposition

E[ (ˆµ^{SIR}_{ν,M,N}(f) - µ(f))² ] = E[ (ˆµ^{SIR}_{ν,M,N}(f) - ˜µ^{IS}_{ν,M}(f))² ] + E[ (˜µ^{IS}_{ν,M}(f) - µ(f))² ].
29 SIR: Large Sample Behavior (contd.)

Going beyond this elementary result is not trivial, because the second-stage sample ξ^1, ..., ξ^N is no longer i.i.d. after resampling, due to the normalization of the importance weights.

Theorem. Assume that µ ≪ ν. Let {˜ξ^i}_{1≤i≤M} be i.i.d. random variables with distribution ν. Then ˆµ^{SIR}_{ν,M,N}(f) is a (weakly) consistent estimate of µ(f) for µ-integrable functions f as M, N → ∞.

Assume in addition that lim_{M,N→∞} M/N = α for some α > 0 and that dµ/dν and f·dµ/dν are in L²(X, ν). Then ˆµ^{SIR}_{ν,M,N}(f) is asymptotically normal:

√N ( ˆµ^{SIR}_{ν,M,N}(f) - µ(f) ) →_D N( 0, σ²(f) ),

with

σ²(f) = Var_µ(f) [variance of resampling] + α^{-1} Var_ν[ {f - µ(f)} (dµ/dν) ] [variance of IS].

Analysis of the opposite case is possible but less interesting in practice.
30 Alternative Resampling Schemes

There are other resampling schemes that guarantee that E[N^i | ˜ξ^1, ..., ˜ξ^M] = Nω^i for i = 1, ..., M and that have lower conditional variance.

[Figure: principle of stratified sampling (left) and systematic sampling (right), illustrated on the cumulative weights ω^1, ω^1 + ω^2, ω^1 + ω^2 + ω^3, ...]

Note: the latter does not always reduce the conditional variance. Studying their large sample behavior is harder, however.
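Multinomial and systematic resampling can be sketched as follows (function names are ours; the systematic scheme inverts the cumulative weights at a single uniform shifted along a regular grid):

```python
import numpy as np

def multinomial_resampling(weights, N, rng):
    """Draw N indices i.i.d. from the normalized weights."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    return rng.choice(len(p), size=N, p=p)

def systematic_resampling(weights, N, rng):
    """Systematic resampling: one uniform shifted by a regular grid.

    Still satisfies E[N^i | weights] = N * omega^i, with low conditional
    variance (although not always lower than stratified sampling).
    """
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    u = (rng.uniform() + np.arange(N)) / N      # grid of shifted uniforms
    return np.searchsorted(np.cumsum(p), u)     # invert the cumulative weights

rng = np.random.default_rng(4)
w = np.array([0.1, 0.2, 0.3, 0.4])
idx = systematic_resampling(w, 1000, rng)
```

With these weights, systematic resampling reproduces the expected counts (100, 200, 300, 400) almost deterministically, while multinomial resampling only matches them on average.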
31 Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation
32 Sequential Importance Sampling

The principle of sequential Monte Carlo methods is to use Monte Carlo integration to approximate the filtering recursion in general HMMs (not finite HMMs or GLSSMs).

The key remark, which can be traced back to Handschin & Mayne (1969) and Handschin (1970), is that the importance sampling method targeting the joint smoothing distribution φ_{0:n|n} can be implemented sequentially, due to the particular structure of φ_{0:n|n}. The corresponding algorithm is known as sequential importance sampling (SIS).

The SIS algorithm does reasonably well but is bound to become unreliable for larger values of n (this limitation will be taken care of later...)
33 HMM Notations (Repeated)

Recall that a hidden Markov model is such that

X_{k+1} ~ Q(X_k, ·)   (state equation)
Y_k ~ G(X_k, ·)   (measurement equation)

where
- {X_k}_{k≥0} is a Markov chain with transition kernel Q and initial distribution ν;
- G is a transition kernel from (X, X) to (Y, Y), and there exists a measure µ such that, for all x ∈ X and A ∈ Y, G(x, A) = ∫_A g(x, y) µ(dy).

To simplify the mathematical expressions, we use the notation g_k to denote the function g(·, Y_k), considered as a function of its first argument.
34 Smoothing (Repeated)

The posterior distribution φ_{0:n|n} of the states X_{0:n} given the observations Y_{0:n} may be computed recursively (in n) according to

φ_{0|0}(f) = ∫ g_0(x_0) ν(dx_0) f(x_0) / ∫ g_0(x_0) ν(dx_0),

φ_{0:n+1|n+1}(f_{n+1}) = ∫ f_{n+1}(x_{0:n+1}) φ_{0:n|n}(dx_{0:n}) T^u_n(x_n, dx_{n+1}),

where, for k ≥ 0, T^u_k is the unnormalized transition kernel on (X, X) given by

T^u_k(x, A) = (L_k/L_{k+1}) ∫_A Q(x, dx') g_{k+1}(x'),  x ∈ X, A ∈ X.

In this part we omit the dependence with respect to ν, which is not essential.
35 Choice of the Instrumental Distribution

Key remark: both the simulation from the instrumental distribution and the computation of the importance weights can be carried out sequentially if a, possibly non-homogeneous, Markov chain is used as instrumental distribution.

More precisely, let {R_k}_{k≥0} denote a family of Markov transition kernels on (X, X) and ρ_0 a probability measure on (X, X). Assume that φ_{0|0} ≪ ρ_0 and, for all k ≥ 0 and all x ∈ X, T^u_k(x, ·) ≪ R_k(x, ·). The inhomogeneous Markov chain with initial distribution ρ_0 and transition kernels {R_k}_{k≥0} defines the following distributions:

ρ_{0:k}(f_k) = ∫ f_k(x_{0:k}) ρ_0(dx_0) ∏_{l=0}^{k-1} R_l(x_l, dx_{l+1}).
36 Sequential Computation of the Importance Function

The importance function is then defined as

dφ_{0:n|n}/dρ_{0:n}(x_{0:n}) = dφ_{0|0}/dρ_0(x_0) ∏_{k=0}^{n-1} [dT^u_k(x_k, ·)/dR_k(x_k, ·)](x_{k+1}),

which can be computed sequentially, in the sense that

dφ_{0:k+1|k+1}/dρ_{0:k+1}(x_{0:k+1}) = [dφ_{0:k|k}/dρ_{0:k}(x_{0:k})] · [dT^u_k(x_k, ·)/dR_k(x_k, ·)](x_{k+1}),

for k ≥ 0.
37 Sequential Importance Sampling Algorithm

Initialization: Draw ξ^1_0, ..., ξ^N_0 independently from ρ_0 and compute the weights

ω^i_0 = (dφ_{0|0}/dρ_0)(ξ^i_0),  for i = 1, ..., N.

Recursion: For k = 0, 1, ...; for i = 1, ..., N:
- Draw ξ^i_{k+1}, conditionally independently of {ξ^j_l, ξ^m_{k+1} : l ≤ k, 1 ≤ j ≤ N, m < i}, from the distribution R_k(ξ^i_k, ·).
- Update the importance weight according to

ω^i_{k+1} = ω^i_k · [dT^u_k(ξ^i_k, ·)/dR_k(ξ^i_k, ·)](ξ^i_{k+1}).

The ratio ω^i_{k+1}/ω^i_k is often referred to as the incremental weight; the points ξ^i_k are called particles; the trajectories ξ^i_{0:k}, path particles.
38 [Figure: one step of the SIS algorithm with just seven particles (filtering distribution at time k, instrumental distribution, filtering distribution at time k+1).]
39 Sequential Importance Sampling Approximation

At any time index n, the sequential importance sampling estimator of φ_{0:n|n}(f_n) is available as

ˆφ^{IS}_{0:n|n}(f_n) = Σ_{i=1}^N f_n(ξ^i_{0:n}) ω^i_n / Σ_{i=1}^N ω^i_n.

Remark: if we are just interested in functions f_n(x_{0:n}) = f(x_n), storing the full trajectories of the particles is not required; each step of the algorithm involves O(N) operations and requires just that N + N·dim(X) real numbers be stored. Likewise, for functions of the form f_n(x_{0:n}) = f(x_{n-k:n}), only the last k+1 elements of each path particle ξ^i_{0:n} need to be stored. We will see later that one may indeed consider more general functions f_n, as long as they have a specific structure...
40 Choosing the Importance Kernel: (1) the Prior Kernel

As for non-sequential importance sampling, the performance of SIS depends crucially on the choice of the importance kernel R_k (and, to a lesser extent, on that of ρ_0). The most obvious solution is to use the prior kernel R_k = Q: the instrumental kernel at each iteration mimics the state dynamics, which is usually simple to sample from. The incremental weight

[dT^u_k(x, ·)/dQ(x, ·)](x') = (L_k/L_{k+1}) g_{k+1}(x'),  (x, x') ∈ X × X,

does not depend on x ∈ X, hence computing the incremental weight simply amounts to evaluating the conditional likelihood function at the new particle positions. Recall that the importance weights need to be evaluated up to a constant only, hence the non-computable factor L_k/L_{k+1} may be omitted.
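SIS with the prior kernel can be sketched on a toy scalar model (the model, parameter values and variable names are all illustrative; note the absence of any resampling step):

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy scalar model (illustration only): X_{k+1} = 0.9 X_k + U_k,
# Y_k = X_k + V_k, with U_k and V_k standard Gaussian, X_0 ~ N(0, 1).
phi_coef, N, n = 0.9, 2000, 50

# simulate some data
x_true = np.zeros(n + 1)
for k in range(n):
    x_true[k + 1] = phi_coef * x_true[k] + rng.normal()
y = x_true + rng.normal(size=n + 1)

def g(x, yk):                         # conditional likelihood g_k(x), up to a constant
    return np.exp(-0.5 * (yk - x) ** 2)

# SIS with the prior kernel R_k = Q: propagate each particle with the state
# dynamics, then multiply its weight by the conditional likelihood.
xi = rng.normal(size=N)               # particles drawn from nu = N(0, 1)
logw = np.log(g(xi, y[0]))
for k in range(n):
    xi = phi_coef * xi + rng.normal(size=N)     # draw from Q(xi, .)
    logw += np.log(g(xi, y[k + 1]))             # incremental weight g_{k+1}
w = np.exp(logw - logw.max())                   # stabilized exponentiation
w /= w.sum()
filter_mean = np.sum(w * xi)          # estimate of E[X_n | Y_{0:n}]
```

Running this for larger n makes the weight degeneracy discussed below plainly visible: a handful of particles end up carrying almost all the mass.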
41 Lack of Robustness of the Prior Kernel

The prior kernel is a reasonable option which is computationally very simple and is thus often hard to beat, especially in models where the state is not precisely identified by the observations. It is however very sensitive to the presence of outliers:

[Figure: filtering distributions at times k, k+1 and k+2.] Conflict between the prior and the posterior: at time k+1, the observation does not agree with the particle approximation of the predictive distribution. After the reweighting step, the mass becomes concentrated on a single particle.

Due to the multiplicative structure of the importance weights, recovering from this situation is most often impossible.
42 Choosing the Importance Kernel: (2) the Optimal Kernel

To circumvent the problem, one needs to incorporate information both on the state dynamics and on the new observation. Among all possible options, there is only one kernel such that the new weight ω^i_{k+1} is a deterministic function of the current particle ξ^i_k; this is the only choice for which the conditional variance of the new weights is equal to zero. Let R_k = T_k, where

T_k(x, f) := γ_k^{-1}(x) ∫ f(x') Q(x, dx') g_{k+1}(x'),  with  γ_k(x) := ∫_X Q(x, dx') g_{k+1}(x').

Then

[dT^u_k(x, ·)/dT_k(x, ·)](x') = (L_k/L_{k+1}) γ_k(x),  (x, x') ∈ X × X.

Unfortunately, computing γ_k is usually not feasible in models where implementing the filtering recursion is problematic!
43 The Optimal Kernel is More Robust to Outliers

[Figure: filtering distributions at times k, k+1 and k+2, for the prior and for the optimal kernel.] The optimal kernel proposes particles in the regions where the filtering density has most of its mass.
44 Local Approximation of the Optimal Importance Kernel

The aim is to find a distribution which resembles the optimal kernel but for which the incremental weight is computable. Ideally, this distribution should be overdispersed (recall the dµ/dν factor!) but not wildly inaccurate. We can find such a distribution in two steps:

1. locate the high-density region of the (multivariate) optimal distribution, to ensure that our proposal does not entirely miss important regions;
2. create an overdispersed approximation, so that the instrumental distribution dominates the optimal importance distribution.

Of course, because we have to repeat the process for each particle, the overall procedure should be reasonably simple.
45 Application to the Stochastic Volatility Model

Consider the (discrete-time) stochastic volatility model

X_{k+1} = φ X_k + σ U_k,  |φ| < 1,
Y_k = β exp(X_k/2) V_k,

where
1. {U_k}_{k≥0} and {V_k}_{k≥0} are independent standard Gaussian white noise processes;
2. X_0 ~ N(0, σ²/(1 - φ²)).

In this model,

q(x, x') = (2πσ²)^{-1/2} exp( -(x' - φx)²/(2σ²) ),
g_{k+1}(x') = (2πβ²)^{-1/2} exp( -(Y²_{k+1}/(2β²)) exp(-x') - x'/2 ),

and the incremental weight γ_k(x) is not available in closed form.
46 Application to the Stochastic Volatility Model (contd.)

The function x' ↦ q(x, x') g_{k+1}(x') is (strictly) log-concave and thus unimodal. The mode m_k(x) of the optimal transition density is the unique solution of the non-linear equation

-σ^{-2}(x' - φx) + (Y²_{k+1}/(2β²)) exp(-x') - 1/2 = 0.

The solution of this equation can be computed numerically. We use, for instance, as instrumental kernel a t-distribution with η = 5 degrees of freedom, the scale of which is set as the inverse of the negated second-order derivative of x' ↦ log q(x, x') g_{k+1}(x') evaluated at the mode m_k(x), which is given by

σ²_k(x) = ( σ^{-2} + (Y²_{k+1}/(2β²)) exp[-m_k(x)] )^{-1}.

The incremental weight may easily be evaluated once m_k(x) and σ²_k(x) have been computed (note that it now depends both on x and x'). Recall also that we need to repeat these steps independently for each current particle position x = ξ^i_k.
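The mode search and matched scale can be sketched with a few Newton iterations on the concave objective (parameter values and function names are hypothetical):

```python
import numpy as np

# Hypothetical parameter values, for illustration only.
phi_coef, sigma, beta, eta = 0.95, 0.3, 0.7, 5.0

def mode_and_scale(x, y_next, n_iter=20):
    """Newton search for the mode m_k(x) of x' -> log(q(x, x') g_{k+1}(x'))
    and the matched scale sigma_k^2(x) (inverse negated second derivative)."""
    c = y_next**2 / (2 * beta**2)
    m = phi_coef * x                         # start from the prior mean
    for _ in range(n_iter):
        grad = -(m - phi_coef * x) / sigma**2 + c * np.exp(-m) - 0.5
        hess = -1.0 / sigma**2 - c * np.exp(-m)   # strictly negative: concave
        m -= grad / hess                     # Newton step
    scale2 = 1.0 / (1.0 / sigma**2 + c * np.exp(-m))
    return m, scale2

def propose(x, y_next, rng):
    """Draw x' from a t-distribution with eta dof, located at the mode
    and scaled by sigma_k(x) (the overdispersed instrumental kernel)."""
    m, scale2 = mode_and_scale(x, y_next)
    return m + np.sqrt(scale2) * rng.standard_t(eta)
```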
47 Application to the Stochastic Volatility Model (contd.)

[Figure: waterfall representation of the sequence of estimated filtering distributions, with the actual state (1,000 particles).]
49 Weight Degeneracy

The normalized importance weights measure the pertinence of each particle: a relatively small importance weight implies that the associated particle is far from the main body of the posterior distribution and contributes poorly to the sequential importance sampling approximation. If there are too many such ineffective particles, the Monte Carlo approximation becomes highly unreliable.

Empirically, this phenomenon always happens when n gets larger (N being fixed). In simplistic models, it is possible to show that the asymptotic variance of the approximation ˆφ^{IS}_{0:n|n}(f) increases exponentially as n increases (see text).
50 Application to the Stochastic Volatility Model (contd.)

[Figure: histograms of the base 10 logarithm of the normalized importance weights after (from top to bottom) 1, 10 and 100 iterations, for the stochastic volatility model.]
51 Numerical Indicator: (1) Coefficient of Variation

A simple criterion is the coefficient of variation of the normalized weights,

CV_N(ω) = [ N^{-1} Σ_{i=1}^N ( N ω^i / Σ_{j=1}^N ω^j - 1 )² ]^{1/2},  ω = (ω^1, ..., ω^N) ∈ (R^+)^N.

When the weights are all equal to 1/N, the coefficient of variation is equal to 0. At the other extreme, when one normalized weight is equal to 1 and all the others to 0, the coefficient of variation equals √(N-1). Therefore, a large CV_N(ω_k) indicates that there are many ineffective particles and that memory and computation will be wasted.
52 Numerical Indicator: (2) Entropy

Another possible measure of the weight imbalance is the Shannon entropy of the importance weights, defined as

Ent(ω) = - Σ_{i=1}^N ( ω^i / Σ_{j=1}^N ω^j ) log₂( ω^i / Σ_{j=1}^N ω^j ),  ω = (ω^1, ..., ω^N) ∈ (R^+)^N.

When all the importance weights are 0 except one, the entropy is null. On the contrary, if all the weights are equal to 1/N, the entropy is maximal and equal to log₂(N).
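Both indicators can be sketched as follows (function names are ours):

```python
import numpy as np

def coeff_variation(w):
    """Coefficient of variation of the normalized weights: 0 when the weights
    are balanced, sqrt(N - 1) when a single weight carries all the mass."""
    w = np.asarray(w, dtype=float)
    wn = w / w.sum()
    return np.sqrt(np.mean((len(w) * wn - 1.0) ** 2))

def weight_entropy(w):
    """Shannon entropy (base 2) of the normalized weights: log2(N) when
    balanced, 0 when a single weight carries all the mass."""
    w = np.asarray(w, dtype=float)
    wn = w / w.sum()
    wn = wn[wn > 0]                       # convention: 0 log 0 = 0
    return -np.sum(wn * np.log2(wn))
```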
53 Application to the Stochastic Volatility Model (contd.)

[Figure: left, coefficient of variation of the weights; right, weight entropy, as a function of n.]
54 Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation
57 Resampling

The solution, proposed by Gordon, Salmond & Smith (1993), to avoid the degeneracy of the importance weights is to regularly resample the particles according to their importance weights (thus equating all importance weights).

The basic idea of resampling is to (i) eliminate particles which have small importance weights, and (ii) replicate particles which have large importance weights, in proportion to their relevance.

Resampling concentrates the particles in regions of the state space which are pertinent and avoids the exploration of highly improbable areas.
61 Particle Methods for Hidden Markov Models - EPFL, 7 Dec Resampling This idea is clearly rooted in the sampling importance resampling (SIR) technique. However, contrary to standard (non-sequential) SIR, the main aim of the resampling step is not to draw (asymptotically correctly) an i.i.d. sample from a distribution but rather to avoid weight degeneracy. The resampling step, while useful in fighting degeneracy, has a drawback: resampling introduces unnecessary noise into the algorithm, and this extra noise might be far from negligible. Intuitively, when the importance weights are nearly constant, resampling only reduce the number of distinct particles thus introducing an extra noise without much benefit on the weight degeneracy.
62 Particle Methods for Hidden Markov Models - EPFL, 7 Dec Resampling This idea is clearly rooted in the sampling importance resampling (SIR) technique. However, contrary to standard (non-sequential) SIR, the main aim of the resampling step is not to draw (asymptotically correctly) an i.i.d. sample from a distribution but rather to avoid weight degeneracy. The resampling step, while useful in fighting degeneracy, has a drawback: resampling introduces unnecessary noise into the algorithm, and this extra noise might be far from negligible. Intuitively, when the importance weights are nearly constant, resampling only reduce the number of distinct particles thus introducing an extra noise without much benefit on the weight degeneracy. The one-step effect of resampling is thus negative but, on the long-term, resampling is required to guarantee a correct behavior of the algorithm.
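The elimination/replication step described above can be sketched as a multinomial draw on the normalized weights. A minimal illustration (function and variable names are hypothetical, not from the slides):

```python
import numpy as np

def multinomial_resample(particles, weights, rng=None):
    """Draw N particle indices with probabilities proportional to the
    importance weights, then reset all weights to a constant value."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(particles)
    probs = weights / weights.sum()          # normalized importance weights
    idx = rng.choice(n, size=n, p=probs)     # multinomial trial
    return particles[idx], np.full(n, 1.0 / n)

# Particles with one dominant weight: resampling tends to replicate it
# and to eliminate the low-weight particles.
rng = np.random.default_rng(0)
parts = np.array([-1.0, 0.0, 1.0, 2.0])
w = np.array([0.01, 0.01, 0.97, 0.01])
new_parts, new_w = multinomial_resample(parts, w, rng)
```

After the call, `new_parts` contains only values taken from the original particle set, with equal weights `1/N`.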
Sequential Importance Sampling with Resampling (SISR)

For time indices $k \ge 0$, do the following.

Sampling: Draw $(\tilde\xi_{k+1}^1, \dots, \tilde\xi_{k+1}^N)$ conditionally independently given $\{\xi_{0:k}^j,\ j = 1, \dots, N\}$ from the instrumental kernel: $\tilde\xi_{k+1}^i \sim R_k(\xi_k^i, \cdot)$, $i = 1, \dots, N$. Compute the updated importance weights
$$\omega_{k+1}^i = \omega_k^i\, g_{k+1}(\tilde\xi_{k+1}^i)\, \frac{dQ(\xi_k^i, \cdot)}{dR_k(\xi_k^i, \cdot)}(\tilde\xi_{k+1}^i), \qquad i = 1, \dots, N.$$

Resampling (optional): Draw, conditionally independently given $\{(\xi_{0:k}^i, \tilde\xi_{k+1}^j),\ i, j = 1, \dots, N\}$, the multinomial trial $(I_{k+1}^1, \dots, I_{k+1}^N)$ with probabilities of success
$$\frac{\omega_{k+1}^1}{\sum_{j=1}^N \omega_{k+1}^j}, \dots, \frac{\omega_{k+1}^N}{\sum_{j=1}^N \omega_{k+1}^j}.$$
Reset the importance weights $\omega_{k+1}^i$ to a constant value for $i = 1, \dots, N$.
SISR (contd.)

If resampling is not applied, set $I_{k+1}^i = i$ for $i = 1, \dots, N$.

Trajectory update: for $i = 1, \dots, N$, $\xi_{0:k+1}^i = (\xi_{0:k}^{I_{k+1}^i}, \tilde\xi_{k+1}^{I_{k+1}^i})$.

Recall that storing the full particle path is usually not needed.

The SISR algorithm with systematic resampling and $R_k = Q$ (the prior kernel) is known as the bootstrap filter.
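One SISR update, as just described, can be sketched as follows. This is a minimal illustration, not the book's implementation: the function names are hypothetical, log-weights are used for numerical stability, and multinomial rather than systematic resampling is used for brevity.

```python
import numpy as np

def sisr_step(xi, logw, y_next, sample_r, log_q_over_r, loglik, rng,
              resample=True):
    """One SISR update with a general instrumental kernel R_k.
    sample_r(xi, rng) draws xi_{k+1} ~ R_k(xi_k, .); log_q_over_r gives
    log dQ(xi_k, .)/dR_k(xi_k, .) at the new particles; loglik is
    log g_{k+1}. Resampling is optional, as on the slide."""
    xi_new = sample_r(xi, rng)                                  # mutation
    logw = logw + loglik(y_next, xi_new) + log_q_over_r(xi, xi_new)
    if resample:
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(len(xi_new), size=len(xi_new), p=w)    # selection
        xi_new = xi_new[idx]
        logw = np.zeros(len(xi_new))   # reset weights to a constant value
    return xi_new, logw

# Bootstrap-filter instance: R_k = Q, so dQ/dR_k = 1 (log-ratio 0).
rng = np.random.default_rng(0)
xi, logw = rng.standard_normal(200), np.zeros(200)
xi, logw = sisr_step(
    xi, logw, y_next=0.5,
    sample_r=lambda x, r: x + 0.1 * r.standard_normal(len(x)),
    log_q_over_r=lambda x_old, x_new: 0.0,
    loglik=lambda y, x: -0.5 * (y - x) ** 2 / 0.2,
    rng=rng)
```

With `resample=False`, the same function performs plain sequential importance sampling, and the weights accumulate multiplicatively.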
Illustration of the Bootstrap Filter on a Toy Example

Noisy AR(1) model:
$$X_{k+1} - \mu = \phi (X_k - \mu) + \sigma U_k, \qquad Y_k = X_k + \eta V_k,$$
with $\mu = 0.9$, $\phi = 0.95$, $\sigma^2 = 0.01$, $\eta^2 = 0.02 = (\sigma^2/(1 - \phi^2))/5$.

To approximate the predictive distribution $\phi_{k+1|k}$, we use the bootstrap filter with N = 50 particles, plotting the full particle paths $\{\xi_{0:k}^i, \tilde\xi_{k+1}^i\}_{1 \le i \le N}$ for each time index. This example is used because we may also compute the actual filtering densities using Kalman filtering.
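The toy comparison above can be reproduced numerically: the bootstrap filter's mean estimates should track the exact Kalman filtering means. A sketch reusing the slide's parameters (the simulation length, particle count, seed, and variable names are my own choices, and plain multinomial resampling is used rather than systematic resampling):

```python
import numpy as np

mu, phi, sig2, eta2 = 0.9, 0.95, 0.01, 0.02   # slide parameters
rng = np.random.default_rng(1)
n, N = 100, 1000   # N larger than the slide's 50 so the check is tight

# Simulate the noisy AR(1) model, starting from stationarity.
x = np.empty(n)
x[0] = mu + np.sqrt(sig2 / (1 - phi**2)) * rng.standard_normal()
for k in range(1, n):
    x[k] = mu + phi * (x[k-1] - mu) + np.sqrt(sig2) * rng.standard_normal()
y = x + np.sqrt(eta2) * rng.standard_normal(n)

# Bootstrap filter: propagate under the prior kernel Q, weight by the
# Gaussian likelihood g, resample (multinomially) at every step.
xi = mu + np.sqrt(sig2 / (1 - phi**2)) * rng.standard_normal(N)
pf_mean = np.empty(n)
for k in range(n):
    if k > 0:
        xi = mu + phi * (xi - mu) + np.sqrt(sig2) * rng.standard_normal(N)
    w = np.exp(-0.5 * (y[k] - xi) ** 2 / eta2)
    w /= w.sum()
    pf_mean[k] = np.dot(w, xi)                 # filtered mean estimate
    xi = xi[rng.choice(N, size=N, p=w)]

# Kalman filter: exact filtering means for this linear-Gaussian model.
m, P = mu, sig2 / (1 - phi**2)
kf_mean = np.empty(n)
for k in range(n):
    if k > 0:
        m, P = mu + phi * (m - mu), phi**2 * P + sig2   # prediction
    K = P / (P + eta2)                                   # correction
    m, P = m + K * (y[k] - m), (1 - K) * P
    kf_mean[k] = m

err = np.max(np.abs(pf_mean - kf_mean))
```

The maximum discrepancy `err` between the particle and Kalman means stays small, confirming the particle approximation on this tractable example.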
[Figure sequence: predictive densities and evolution of the particle paths (state vs. time index), one frame per time step of the bootstrap filter run.]
Application to the Stochastic Volatility Model (contd.)

[Figure: coefficient of variation (left) and entropy (right) of the normalized importance weights as a function of the number of iterations, when using resampling triggered by $CV_N(\omega) > 1$.]
Application to the Stochastic Volatility Model (contd.)

[Figure: histograms of the base-10 logarithm of the normalized importance weights after (from top to bottom) 1, 10 and 100 iterations, when using resampling triggered by $CV_N(\omega) > 1$.]
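The two weight-degeneracy diagnostics used in these figures are easy to compute. A sketch, assuming the usual definitions (coefficient of variation of the normalized weights, which is 0 for uniform weights, and Shannon entropy with natural logarithm, which is maximal, $\log N$, for uniform weights); function names are hypothetical:

```python
import numpy as np

def coeff_of_variation(w):
    """CV of the normalized weights: 0 for uniform weights,
    sqrt(N - 1) when a single particle carries all the mass."""
    wn = w / w.sum()
    n = len(w)
    return np.sqrt(np.mean((n * wn - 1.0) ** 2))

def weight_entropy(w):
    """Shannon entropy (natural log) of the normalized weights."""
    wn = w / w.sum()
    wn = wn[wn > 0]                    # 0 * log 0 is taken as 0
    return -np.sum(wn * np.log(wn))

w_uniform = np.ones(100)
w_degenerate = np.array([1.0] + [1e-12] * 99)
# A common rule: resample when coeff_of_variation(w) exceeds 1.
```

On the uniform weights the CV is 0 and the entropy is log(100); on the degenerate weights the CV is close to its maximum sqrt(99) and the entropy is close to 0, which triggers resampling.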
Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation
Alternatives to SISR

The resampling step in the SISR algorithm can be seen as a method to sample approximately from $\phi_{0:k+1|k+1}$ given the current particle approximation $\hat\phi_{0:k|k}$. This alternative way of thinking about resampling suggests several sequential Monte Carlo variants.
Sequential Monte Carlo Reinterpreted

Recall that each update consists of two steps.

Prediction step: compute the one-step-ahead predictive distribution from the filtering distribution:
$$\phi_{0:k+1|k} = \phi_{0:k|k} \otimes Q.$$

Correction step (Bayes): compute the filtering distribution from the predictive distribution by taking into account the new observation $Y_{k+1}$:
$$\phi_{0:k+1|k+1}(f_{k+1}) = \frac{\int f_{k+1}(x_{0:k+1})\, g_{k+1}(x_{k+1})\, \phi_{0:k+1|k}(dx_{0:k+1})}{\int g_{k+1}(x_{k+1})\, \phi_{0:k+1|k}(dx_{0:k+1})}.$$

$$\phi_{0:k|k} \xrightarrow{\ \text{prediction}\ } \phi_{0:k+1|k} \xrightarrow{\ \text{correction}\ } \phi_{0:k+1|k+1}$$
Sequential Monte Carlo Reinterpreted

Replace $\phi_{0:k|k}$ by the empirical filtering distribution
$$\hat\phi_{0:k|k} = \sum_{i=1}^N \frac{\omega_k^i}{\sum_{j=1}^N \omega_k^j}\, \delta_{\xi_{0:k}^i}.$$

Applying the prediction and then the correction step to this approximation yields
$$\hat\phi_{0:k|k} \xrightarrow{\ \text{prediction}\ } \tilde\phi_{0:k+1|k} = \sum_{i=1}^N \frac{\omega_k^i}{\sum_{j=1}^N \omega_k^j}\, \delta_{\xi_{0:k}^i} \otimes Q(\xi_k^i, \cdot),$$
$$\xrightarrow{\ \text{correction}\ } \tilde\phi_{0:k+1|k+1}(f_{k+1}) = \frac{\sum_{i=1}^N \omega_k^i \int f_{k+1}(\xi_{0:k}^i, x)\, g_{k+1}(x)\, Q(\xi_k^i, dx)}{\sum_{i=1}^N \omega_k^i \int g_{k+1}(x)\, Q(\xi_k^i, dx)}.$$

The distribution $\tilde\phi_{0:k+1|k+1}$ is sometimes called the empirical filtering distribution. It is in some sense the best approximation to $\phi_{0:k+1|k+1}$ based on the knowledge of $\hat\phi_{0:k|k}$. It is obviously not, in general, a distribution supported by a finite set of points!
Sequential Monte Carlo Reinterpreted

The empirical filtering distribution is a mixture distribution:
$$\tilde\phi_{0:k+1|k+1}(f_{k+1}) = \sum_{i=1}^N \frac{\omega_k^i\, \gamma_k(\xi_k^i)}{\sum_{j=1}^N \omega_k^j\, \gamma_k(\xi_k^j)} \int f_{k+1}(\xi_{0:k}^i, x)\, T_k(\xi_k^i, dx),$$
where
$$\gamma_k(x) = \int Q(x, dx')\, g_{k+1}(x'), \qquad T_k(x, A) = \frac{\int_A Q(x, dx')\, g_{k+1}(x')}{\gamma_k(x)}.$$

Direct sampling from this distribution is usually not possible (because sampling from $T_k$ and evaluating $\gamma_k$ aren't either).
Auxiliary Sampling

But we may in general use importance sampling or SIR, proposing new points $\tilde\xi_{k+1}^1, \dots, \tilde\xi_{k+1}^N$ under the mixture
$$\rho_{0:k+1}(f_{k+1}) = \sum_{i=1}^N \frac{\omega_k^i\, \tau_k^i}{\sum_{j=1}^N \omega_k^j\, \tau_k^j} \int f_{k+1}(\xi_{0:k}^i, x)\, R_k(\xi_k^i, dx),$$
where $\tau_k^1, \dots, \tau_k^N$ are user-selected adjustment weights and $R_k$ is a kernel which is easy to sample from. In doing so, we first need to draw mixture component indicators $I_k^1, \dots, I_k^N$.

It is easily checked that the importance weights are then given by
$$\omega_{k+1}^i = \frac{g_{k+1}(\tilde\xi_{k+1}^i)}{\tau_k^{I_k^i}}\, \frac{dQ(\xi_k^{I_k^i}, \cdot)}{dR_k(\xi_k^{I_k^i}, \cdot)}(\tilde\xi_{k+1}^i).$$

This strategy, named auxiliary sampling and proposed by Pitt & Shephard (1999), is often useful in practice when combined with clever ways of setting $\{\tau_k^i\}_{i=1,\dots,N}$ and $R_k$.
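One auxiliary-sampling update can be sketched on the earlier noisy AR(1) toy model. This is an illustration under specific assumptions: $R_k = Q$ (so the Radon-Nikodym ratio is 1) and adjustment weights $\tau_k^i = g_{k+1}(\text{predicted mean of particle } i)$, a common choice in the spirit of Pitt & Shephard but not the only one; the function name is hypothetical.

```python
import numpy as np

def apf_step(xi, w, y_next, mu, phi, sig2, eta2, rng):
    """One auxiliary-sampling update for the noisy AR(1) model, with
    R_k = Q and tau_k^i = g_{k+1}(predicted mean of particle i)."""
    pred_mean = mu + phi * (xi - mu)
    tau = np.exp(-0.5 * (y_next - pred_mean) ** 2 / eta2)  # adjustment weights
    first = w * tau
    first /= first.sum()
    # Mixture component indicators I_k^1, ..., I_k^N.
    idx = rng.choice(len(xi), size=len(xi), p=first)
    # Propose from R_k = Q started at the selected particles.
    xi_new = pred_mean[idx] + np.sqrt(sig2) * rng.standard_normal(len(xi))
    # Second-stage weights: g_{k+1}(xi_new) / tau^{I}; dQ/dR_k = 1 here.
    w_new = np.exp(-0.5 * (y_next - xi_new) ** 2 / eta2) / tau[idx]
    return xi_new, w_new / w_new.sum()

rng = np.random.default_rng(2)
xi0 = rng.standard_normal(300)
w0 = np.full(300, 1.0 / 300)
xi1, w1 = apf_step(xi0, w0, 0.3, 0.9, 0.95, 0.01, 0.02, rng)
```

Because the first-stage selection already accounts for the next observation through $\tau_k^i$, the second-stage weights are typically much flatter than plain bootstrap weights.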
IID Sampling

It is interesting to consider what happens in cases where sampling from $T_k$ and evaluating $\gamma_k$ are feasible (i.e., when $\tau_k^i = \gamma_k(\xi_k^i)$ and $R_k = T_k$):

Weight computation: for $i = 1, \dots, N$, compute the (unnormalized) importance weights $\alpha_k^i = \gamma_k(\xi_k^i)$.

Selection: draw $I_{k+1}^1, \dots, I_{k+1}^N$ conditionally i.i.d. given $\{\xi_{0:k}^i\}_{1 \le i \le N}$, with probabilities $P(I_{k+1}^1 = j)$ proportional to $\alpha_k^j$, $j = 1, \dots, N$.

Sampling: draw $\xi_{k+1}^1, \dots, \xi_{k+1}^N$ conditionally independently given $\{\xi_{0:k}^i\}_{1 \le i \le N}$ and $\{I_{k+1}^i\}_{1 \le i \le N}$, with distribution $\xi_{k+1}^i \sim T_k(\xi_k^{I_{k+1}^i}, \cdot)$. Set $\xi_{0:k+1}^i = (\xi_{0:k}^{I_{k+1}^i}, \xi_{k+1}^i)$ and $\omega_{k+1}^i = 1$ for $i = 1, \dots, N$.
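For the noisy AR(1) toy model, both $\gamma_k$ and the optimal kernel $T_k$ are Gaussian in closed form, so the selection-then-mutation scheme above can be sketched directly (function name hypothetical; the closed-form Gaussian computations are standard for linear-Gaussian models):

```python
import numpy as np

def iid_sampling_step(xi, y_next, mu, phi, sig2, eta2, rng):
    """One IID-sampling step for the noisy AR(1) model: selection first
    (probabilities proportional to gamma_k(xi^i)), then mutation by
    drawing from the optimal kernel T_k(xi^{I_i}, .)."""
    m = mu + phi * (xi - mu)                   # predictive means
    # gamma_k(x) is proportional to N(y_{k+1}; m(x), sig2 + eta2).
    alpha = np.exp(-0.5 * (y_next - m) ** 2 / (sig2 + eta2))
    idx = rng.choice(len(xi), size=len(xi), p=alpha / alpha.sum())
    # T_k(x, .) is the Gaussian posterior of X_{k+1} given x and y_{k+1}.
    post_var = 1.0 / (1.0 / sig2 + 1.0 / eta2)
    post_mean = post_var * (m[idx] / sig2 + y_next / eta2)
    return post_mean + np.sqrt(post_var) * rng.standard_normal(len(xi))

# With all particles at the prior mean and y = mu, the new particles
# should be centered at mu with the posterior variance.
rng = np.random.default_rng(3)
xi1 = iid_sampling_step(np.zeros(2000), 0.0, 0.0, 0.95, 0.01, 0.02, rng)
```

Note that, in agreement with the slide, the new particles are conditionally i.i.d. given the current generation, and the weights are reset to a constant.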
IID Sampling

Compared with the SISR algorithm for the particular choice $R_k = T_k$, the IID sampling algorithm differs only in the order in which the sampling (or mutation) and selection operations are performed.

The SISR algorithm prescribes that each trajectory be first extended by setting $\xi_{0:k+1}^i = (\xi_{0:k}^i, \tilde\xi_{k+1}^i)$, where $\tilde\xi_{k+1}^i$ is drawn from $T_k(\xi_k^i, \cdot)$. Then resampling is performed in the population of extended trajectories according to their importance weights.

In contrast, the IID sampling algorithm first selects the trajectories based on the weights $\alpha_k^i$ and then simulates an independent extension for each selected trajectory. The new particles $\xi_{k+1}^1, \dots, \xi_{k+1}^N$ are conditionally independent given the current generation of particles $\{\xi_k^i\}_{i=1,\dots,N}$.

This is of course only possible because the optimal importance kernel $T_k$ is used as instrumental kernel, which renders the incremental weights independent of the position of the particle at index $k+1$ and thus allows for early selection. This way of proceeding is provably better than SISR with $R_k = T_k$ (see text).
Roadmap

1. What is a Hidden Markov Model?
2. Filtering and Smoothing Recursions
3. Monte Carlo, Importance Sampling and Sampling Importance Resampling
4. Sequential Importance Sampling
5. Sequential Importance Sampling with Resampling
6. More Sequential Monte Carlo Algorithms
7. Approximation of Sum Functionals and Parameter Estimation
EM and Friends in HMMs: A Long Story Made Short

If the HMM has some unknown parameters $\theta$, likelihood-based parameter inference, be it through Expectation-Maximization (EM) or gradient-based approaches, (only) requires the ability to compute quantities of the form
$$E\left[\left.\sum_{k=0}^{n-1} s_i(X_k, X_{k+1})\ \right|\ Y_{0:n};\ \theta\right],$$
for some model-dependent functions $s_i$.

If exact computation is not feasible, we may use approximate Monte Carlo evaluation in combination with variants of the former methods (MCEM, SAME, SAEM, stochastic gradient, etc.).

Are sequential Monte Carlo methods appropriate for this task?
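A naive sequential Monte Carlo answer to this question is to average the additive functional along stored particle trajectories of a bootstrap filter. The sketch below does this for the noisy AR(1) toy model; it is an illustration only (function name, test statistic and parameters are my own), and this trajectory-based estimator is known to suffer from path degeneracy when $n$ is large.

```python
import numpy as np

def smoothed_sum(y, s, mu, phi, sig2, eta2, N, rng):
    """Particle approximation of E[ sum_{k=0}^{n-2} s(X_k, X_{k+1}) | Y_{0:n-1} ]
    by resampling entire trajectories in a bootstrap filter for the
    noisy AR(1) model (simple, but path-degenerate for long series)."""
    n = len(y)
    paths = mu + np.sqrt(sig2 / (1 - phi**2)) * rng.standard_normal((N, 1))
    for k in range(n):
        if k > 0:
            new = (mu + phi * (paths[:, -1] - mu)
                   + np.sqrt(sig2) * rng.standard_normal(N))
            paths = np.column_stack([paths, new])
        w = np.exp(-0.5 * (y[k] - paths[:, -1]) ** 2 / eta2)
        w /= w.sum()
        paths = paths[rng.choice(N, size=N, p=w)]   # resample whole trajectories
    vals = np.array([sum(s(p[k], p[k + 1]) for k in range(n - 1))
                     for p in paths])
    return float(vals.mean())

rng = np.random.default_rng(4)
y = 0.9 + 0.1 * rng.standard_normal(20)
# Sanity check: with s = 1 the sum functional is deterministic (n - 1).
est = smoothed_sum(y, lambda a, b: 1.0, 0.9, 0.95, 0.01, 0.02, 50, rng)
```

With a genuine statistic such as $s(x, x') = x x'$, the same routine produces the Monte Carlo smoothed sums needed inside an MCEM or stochastic-gradient iteration.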
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More informationLecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis
Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationRobert Collins CSE586, PSU Intro to Sampling Methods
Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Topics to be Covered Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling (CDF) Ancestral Sampling Rejection
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationKalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein
Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationApproximate Bayesian Computation
Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate
More informationDimension Reduction. David M. Blei. April 23, 2012
Dimension Reduction David M. Blei April 23, 2012 1 Basic idea Goal: Compute a reduced representation of data from p -dimensional to q-dimensional, where q < p. x 1,...,x p z 1,...,z q (1) We want to do
More informationBayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference
1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationApril 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning
for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More informationHidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010
Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data
More information13 Notes on Markov Chain Monte Carlo
13 Notes on Markov Chain Monte Carlo Markov Chain Monte Carlo is a big, and currently very rapidly developing, subject in statistical computation. Many complex and multivariate types of random data, useful
More informationIntroduction to Particle Filters for Data Assimilation
Introduction to Particle Filters for Data Assimilation Mike Dowd Dept of Mathematics & Statistics (and Dept of Oceanography Dalhousie University, Halifax, Canada STATMOS Summer School in Data Assimila5on,
More informationECE521 Lecture 19 HMM cont. Inference in HMM
ECE521 Lecture 19 HMM cont. Inference in HMM Outline Hidden Markov models Model definitions and notations Inference in HMMs Learning in HMMs 2 Formally, a hidden Markov model defines a generative process
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationTSRT14: Sensor Fusion Lecture 8
TSRT14: Sensor Fusion Lecture 8 Particle filter theory Marginalized particle filter Gustaf Hendeby gustaf.hendeby@liu.se TSRT14 Lecture 8 Gustaf Hendeby Spring 2018 1 / 25 Le 8: particle filter theory,
More informationAUTOMOTIVE ENVIRONMENT SENSORS
AUTOMOTIVE ENVIRONMENT SENSORS Lecture 5. Localization BME KÖZLEKEDÉSMÉRNÖKI ÉS JÁRMŰMÉRNÖKI KAR 32708-2/2017/INTFIN SZÁMÚ EMMI ÁLTAL TÁMOGATOTT TANANYAG Related concepts Concepts related to vehicles moving
More informationThe Unscented Particle Filter
The Unscented Particle Filter Rudolph van der Merwe (OGI) Nando de Freitas (UC Bereley) Arnaud Doucet (Cambridge University) Eric Wan (OGI) Outline Optimal Estimation & Filtering Optimal Recursive Bayesian
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationThe Hierarchical Particle Filter
and Arnaud Doucet http://go.warwick.ac.uk/amjohansen/talks MCMSki V Lenzerheide 7th January 2016 Context & Outline Filtering in State-Space Models: SIR Particle Filters [GSS93] Block-Sampling Particle
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationA Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute
More informationMCMC and Gibbs Sampling. Kayhan Batmanghelich
MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationSampling Methods (11/30/04)
CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with
More informationHidden Markov Models. Vibhav Gogate The University of Texas at Dallas
Hidden Markov Models Vibhav Gogate The University of Texas at Dallas Intro to AI (CS 4365) Many slides over the course adapted from either Dan Klein, Luke Zettlemoyer, Stuart Russell or Andrew Moore 1
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationCSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection
CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection
More informationSTA205 Probability: Week 8 R. Wolpert
INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and
More informationLecture 6: Markov Chain Monte Carlo
Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline
More information