Computer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics
Department of Mathematics
Lecture 16: Advanced topics in computational statistics
18 May 2017
Computer Intensive Methods (1)
Plan of today's lecture
1. Last time: the EM algorithm
2. Smoothing in general hidden Markov models
Last time: the EM algorithm

The algorithm goes as follows.

Data: initial value $\theta_0$
Result: $\{\theta_\ell;\ \ell \in \mathbb{N}\}$
for $\ell = 0, 1, 2, \ldots$ do
    set $Q_{\theta_\ell}(\theta) \leftarrow \mathbb{E}_{\theta_\ell}\left[\log f_\theta(X, Y) \mid Y\right]$;
    set $\theta_{\ell+1} \leftarrow \arg\max_{\theta \in \Theta} Q_{\theta_\ell}(\theta)$;
end

The two steps within the main loop are referred to as the expectation (E) and maximization (M) steps, respectively.
Last time: the EM inequality

By rearranging the terms we obtain the following.

Theorem (the EM inequality). For all $(\theta, \theta') \in \Theta^2$ it holds that
$$\ell(\theta) - \ell(\theta') \ge Q_{\theta'}(\theta) - Q_{\theta'}(\theta'),$$
where the inequality is strict unless $f_\theta(x \mid Y) = f_{\theta'}(x \mid Y)$.

Thus, by the very construction of $\{\theta_\ell;\ \ell \in \mathbb{N}\}$ it is guaranteed that $\{\ell(\theta_\ell);\ \ell \in \mathbb{N}\}$ is non-decreasing. Hence, the EM algorithm is a monotone optimization algorithm.
EM in exponential families

In order to be practically useful, the E- and M-steps of EM have to be feasible. A rather general context in which this is the case is the following.

Definition (exponential family). The family $\{f_\theta(x, y);\ \theta \in \Theta\}$ defines an exponential family if the complete-data likelihood is of the form
$$f_\theta(x, y) = \exp\left(\psi(\theta)^\top \phi(x) - c(\theta)\right) h(x),$$
where $\phi$ and $\psi$ are (possibly) vector-valued functions on $\mathbb{R}^d$ and $\Theta$, respectively, and $h$ is a non-negative real-valued function on $\mathbb{R}^d$. All these quantities may depend on $y$.
EM in exponential families (cont'd)

The intermediate quantity becomes
$$Q_{\theta'}(\theta) = \psi(\theta)^\top \mathbb{E}_{\theta'}\left[\phi(X) \mid Y\right] - c(\theta) + \underbrace{\mathbb{E}_{\theta'}\left[\log h(X) \mid Y\right]}_{(*)},$$
where $(*)$ does not depend on $\theta$ and may thus be ignored. Consequently, in order to be able to apply EM we need
1. to be able to compute the smoothed sufficient statistics
$$\tau = \mathbb{E}_{\theta'}\left[\phi(X) \mid Y\right] = \int \phi(x)\, f_{\theta'}(x \mid Y)\, dx,$$
2. maximization of $\theta \mapsto \psi(\theta)^\top \tau - c(\theta)$ to be feasible for all $\tau$.
Example: nonlinear Gaussian model

Let $(y_i)_{i=1}^n$ be independent observations of a random variable
$$Y = h(X) + \sigma_y \varepsilon_y,$$
where $h$ is a possibly nonlinear function and $X = \mu + \sigma_x \varepsilon_x$ is not observable. Here $\varepsilon_x$ and $\varepsilon_y$ are independent standard Gaussian noise variables. The parameters $\theta = (\mu, \sigma_x^2)$ governing the distribution of the unobservable variable $X$ are unknown, while it is known that $\sigma_y = 0.4$.
Example: nonlinear Gaussian model (cont.)

In this case the complete-data likelihood belongs to an exponential family with
$$\phi(x_{1:n}) = \left(\sum_{i=1}^n x_i^2,\ \sum_{i=1}^n x_i\right), \qquad \psi(\theta) = \left(-\frac{1}{2\sigma_x^2},\ \frac{\mu}{\sigma_x^2}\right),$$
and
$$c(\theta) = \frac{n}{2}\log \sigma_x^2 + \frac{n\mu^2}{2\sigma_x^2}.$$
Letting $\tau_i = \mathbb{E}_{\theta_\ell}\left[\phi_i(X_{1:n}) \mid Y_{1:n}\right]$, $i \in \{1, 2\}$, leads to the updates
$$\mu_{\ell+1} = \frac{\tau_2}{n}, \qquad (\sigma_x^2)_{\ell+1} = \frac{\tau_1}{n} - \mu_{\ell+1}^2.$$
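In the linear case $h(x) = x$ the smoothed sufficient statistics are available in closed form, so the EM update above can be run exactly. A minimal Python sketch (function and variable names are my own, and the simulated data are illustrative, not the lecture's data record):

```python
import numpy as np

def em_linear(y, sy2, mu=0.0, s2x=1.0, n_iter=500):
    """Exact EM for h(x) = x: here X_i | Y_i = y_i is Gaussian with
    mean m_i = (s2x*y_i + sy2*mu)/(s2x + sy2) and variance
    v = s2x*sy2/(s2x + sy2), so tau_1 and tau_2 are closed-form."""
    n = len(y)
    for _ in range(n_iter):
        # E-step: smoothed sufficient statistics
        v = s2x * sy2 / (s2x + sy2)
        m = (s2x * y + sy2 * mu) / (s2x + sy2)
        tau1, tau2 = np.sum(m**2 + v), np.sum(m)
        # M-step: the updates from the slide
        mu = tau2 / n
        s2x = tau1 / n - mu**2
    return mu, s2x

rng = np.random.default_rng(0)
n, sy = 200, 0.4
x = 1.0 + 0.6 * rng.standard_normal(n)   # X = mu + sigma_x * eps_x (hidden)
y = x + sy * rng.standard_normal(n)      # Y = X + sigma_y * eps_y (observed)
mu_hat, s2x_hat = em_linear(y, sy**2)
```

As a sanity check, the fixed point of this recursion is the marginal MLE: `mu_hat` converges to the sample mean of $y$ and `s2x_hat` to the sample variance of $y$ minus $\sigma_y^2$.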
Example: nonlinear Gaussian model (cont.)

Since the smoothed sufficient statistics are of additive form it holds that, e.g.,
$$\tau_1 = \mathbb{E}_{\theta_\ell}\left[\sum_{i=1}^n X_i^2 \,\Big|\, Y_{1:n}\right] = \sum_{i=1}^n \mathbb{E}_{\theta_\ell}\left[X_i^2 \mid Y_i\right]$$
(and similarly for $\tau_2$). However, computing expectations under
$$f_{\theta_\ell}(x_i \mid y_i) \propto \exp\left(-\frac{1}{2\sigma_y^2}\{y_i - h(x_i)\}^2 - \frac{1}{2(\sigma_x^2)_\ell}\{x_i - \mu_\ell\}^2\right)$$
is in general infeasible (i.e. when the transformation $h$ is a general nonlinear function).
Example: nonlinear Gaussian model (cont.)

Thus, within each iteration of EM we sample from each component $f_{\theta_\ell}(x_i \mid y_i)$ using Metropolis–Hastings (MH). For simplicity we use the independent proposal $r(x_i) = f_{\theta_\ell}(x_i)$, yielding the MH acceptance probability
$$\alpha(X_i^{(k)}, X_i^*) = 1 \wedge \frac{f_{\theta_\ell}(Y_i \mid X_i^*)}{f_{\theta_\ell}(Y_i \mid X_i^{(k)})} = 1 \wedge \exp\left(-\frac{1}{2\sigma_y^2}\left\{h^2(X_i^*) - h^2(X_i^{(k)}) + 2Y_i\left(h(X_i^{(k)}) - h(X_i^*)\right)\right\}\right).$$
Example: nonlinear Gaussian model (cont.)

Data: initial value $\theta_0$; $Y_{1:n}$
Result: $\{\theta_\ell;\ \ell \in \mathbb{N}\}$
for $\ell = 0, 1, 2, \ldots$ do
    set $\hat\tau_j \leftarrow 0$ for $j \in \{1, 2\}$;
    for $i = 1, \ldots, n$ do
        run an MH sampler targeting $f_{\theta_\ell}(x_i \mid Y_i)$, yielding $(X_i^{(k)})_{k=1}^{N_\ell}$;
        set $\hat\tau_1 \leftarrow \hat\tau_1 + \sum_{k=1}^{N_\ell} (X_i^{(k)})^2 / N_\ell$;
        set $\hat\tau_2 \leftarrow \hat\tau_2 + \sum_{k=1}^{N_\ell} X_i^{(k)} / N_\ell$;
    end
    set $\mu_{\ell+1} \leftarrow \hat\tau_2 / n$;
    set $(\sigma_x^2)_{\ell+1} \leftarrow \hat\tau_1 / n - \mu_{\ell+1}^2$;
end
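The loop above can be sketched directly. The following Python sketch (my own naming; the MH kernel is the independence sampler from the previous slide, the data are simulated for illustration, and the linear case $h(x) = x$ is used so the result can be checked against the sample mean):

```python
import numpy as np

def mh_chain(yi, h, mu, s2x, sy2, N, rng):
    """Independence MH targeting f_theta(x_i | y_i), proposing from the
    prior N(mu, s2x); the acceptance ratio then involves the likelihood only."""
    x, out = mu, np.empty(N)
    for k in range(N):
        xp = mu + np.sqrt(s2x) * rng.standard_normal()
        log_alpha = ((yi - h(x))**2 - (yi - h(xp))**2) / (2 * sy2)
        if np.log(rng.uniform()) < log_alpha:
            x = xp
        out[k] = x
    return out

def mcem(y, h, sy2, mu, s2x, n_iter=30, N=200, seed=1):
    """The MCEM loop from the slide: MH estimates of the smoothed
    sufficient statistics, then the exact M-step updates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    for _ in range(n_iter):
        tau1 = tau2 = 0.0
        for i in range(n):
            xs = mh_chain(y[i], h, mu, s2x, sy2, N, rng)
            tau1 += np.mean(xs**2)   # MC estimate of E[X_i^2 | Y_i]
            tau2 += np.mean(xs)      # MC estimate of E[X_i | Y_i]
        mu, s2x = tau2 / n, tau1 / n - (tau2 / n)**2
    return mu, s2x

rng = np.random.default_rng(2017)
n = 40
x = 1.0 + np.sqrt(0.4) * rng.standard_normal(n)
y = x + 0.4 * rng.standard_normal(n)
mu_hat, s2x_hat = mcem(y, h=lambda x: x, sy2=0.16, mu=0.0, s2x=1.0)
```

In the linear case the iterates should fluctuate around the true MLE, as in the figures that follow.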
Example: nonlinear Gaussian model (cont.)

In order to assess the performance of this Monte Carlo EM (MCEM) algorithm we focus on the linear case, $h(x) = x$. A data record comprising $n = 40$ values was produced by simulation under $(\mu, \sigma_x^2) = (1, 0.4)$ with $\sigma_y = 0.4$. In this case the true MLE is known and given by (check this!)
$$\hat\mu(Y_{1:n}) = \frac{1}{n}\sum_{i=1}^n Y_i = 1.04, \qquad \hat\sigma_x^2(Y_{1:n}) = \frac{1}{n}\sum_{i=1}^n \{Y_i - \hat\mu(Y_{1:n})\}^2 - \sigma_y^2 = 0.27.$$
Example: nonlinear Gaussian model (cont.)

Figure: EM learning trajectory (blue curve) for $\mu$ in the case $h(x) = x$ ($\mu$ estimate vs. EM iteration index). The red dashed line indicates the true MLE. $N_\ell = 200$ for all iterations.
Example: nonlinear Gaussian model (cont.)

Figure: EM learning trajectory (blue curve) for $\sigma_x^2$ in the case $h(x) = x$ ($\sigma_x^2$ estimate vs. EM iteration index). The red dashed line indicates the true MLE. $N_\ell = 200$ for all iterations.
Averaging

Roughly, for the MCEM algorithm one may prove that the variance of $\theta_\ell - \hat\theta$ (where $\hat\theta$ is the MLE) is of order $1/N_\ell$. In the idealised situation where the estimates $(\theta_\ell)_\ell$ are uncorrelated (which is not the case here) one may obtain an improved estimator of $\hat\theta$ by combining the individual estimates $\theta_\ell$ in proportion to the inverses of their variances. Starting the averaging at iteration $\ell_0$ leads to
$$\tilde\theta_\ell = \frac{\sum_{\ell'=\ell_0}^{\ell} N_{\ell'}\, \theta_{\ell'}}{\sum_{\ell'=\ell_0}^{\ell} N_{\ell'}}, \qquad \ell \ge \ell_0.$$
In the idealised situation the variance of $\tilde\theta_\ell$ decreases as $1/\sum_{\ell'=\ell_0}^{\ell} N_{\ell'}$ (one over the total number of simulations).
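The averaging scheme is just an $N_\ell$-weighted running mean of the iterates from $\ell_0$ onwards; a minimal Python sketch (my own naming):

```python
import numpy as np

def averaged_iterates(thetas, Ns, l0):
    """tilde_theta_l = sum_{l'=l0}^{l} N_l' * theta_l' / sum_{l'=l0}^{l} N_l',
    returned for l = l0, ..., len(thetas) - 1."""
    th = np.asarray(thetas, dtype=float)[l0:]
    w = np.asarray(Ns, dtype=float)[l0:]
    return np.cumsum(w * th) / np.cumsum(w)

# with constant N_l the scheme reduces to a plain running mean of theta_l, l >= l0
tilde = averaged_iterates([0.0, 2.0, 1.0, 3.0], [100, 100, 100, 100], l0=2)
```

Here `tilde` equals `[1.0, 2.0]`: the running mean of the iterates 1.0 and 3.0 after discarding the first two.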
Example: nonlinear Gaussian model (cont.)

Figure: EM learning trajectory (blue curve) for $\mu$ in the case $h(x) = x$ ($\mu$ estimate vs. EM iteration index). The red dashed line indicates the true MLE. Averaging after $\ell_0 = 30$ iterations.
Example: nonlinear Gaussian model (cont.)

Figure: EM learning trajectory (blue curve) for $\sigma_x^2$ in the case $h(x) = x$ ($\sigma_x^2$ estimate vs. EM iteration index). The red dashed line indicates the true MLE. Averaging after $\ell_0 = 30$ iterations.
General hidden Markov models (HMMs)

A hidden Markov model (HMM) comprises
1. a Markov chain $(X_k)_{k \ge 0}$ with transition density $q$, i.e.
$$X_{k+1} \mid X_k = x_k \sim q(x_{k+1} \mid x_k),$$
which is hidden away from us but partially observed through
2. an observation process $(Y_k)_{k \ge 0}$ such that, conditionally on the chain $(X_k)_{k \ge 0}$,
(i) the $Y_k$'s are independent, with
(ii) the conditional distribution of each $Y_k$ depending on the corresponding $X_k$ only.

The density of the conditional distribution $Y_k \mid (X_k)_{k \ge 0} \stackrel{d}{=} Y_k \mid X_k$ will be denoted by $p(y_k \mid x_k)$.
General HMMs (cont.)

Graphically:

(Observations)     ...  $Y_{k-1}$   $Y_k$   $Y_{k+1}$  ...
(Markov chain)     ...  $X_{k-1} \to X_k \to X_{k+1}$  ...   (each $Y_k$ depends on $X_k$ only)

$Y_k \mid X_k = x_k \sim p(y_k \mid x_k)$ (observation density)
$X_{k+1} \mid X_k = x_k \sim q(x_{k+1} \mid x_k)$ (transition density)
$X_0 \sim \chi(x_0)$ (initial distribution)
The smoothing distribution

In an HMM, the smoothing distribution $f_n(x_{0:n} \mid y_{0:n})$ is the conditional distribution of $X_{0:n}$ given $Y_{0:n} = y_{0:n}$.

Theorem (smoothing distribution).
$$f_n(x_{0:n} \mid y_{0:n}) = \frac{\chi(x_0)\, p(y_0 \mid x_0) \prod_{k=1}^n p(y_k \mid x_k)\, q(x_k \mid x_{k-1})}{L_n(y_{0:n})},$$
where $L_n(y_{0:n})$, the density of the observations $y_{0:n}$, is
$$L_n(y_{0:n}) = \int \chi(x_0)\, p(y_0 \mid x_0) \prod_{k=1}^n p(y_k \mid x_k)\, q(x_k \mid x_{k-1})\, dx_{0:n}.$$
The goal

We wish, for $n \in \mathbb{N}$, to estimate the expected value
$$\tau_n = \mathbb{E}\left[h_n(X_{0:n}) \mid Y_{0:n}\right]$$
under the smoothing distribution. Here $h_n(x_{0:n})$ is of additive form, that is,
$$h_n(x_{0:n}) = \sum_{i=0}^{n-1} h_i(x_{i:i+1}).$$
For instance, the smoothed sufficient statistics needed for the EM algorithm are of this form.
Using basic SISR

Given particles and weights $\{(X_{0:n}^i, \omega_n^i)\}_{i=1}^N$ targeting $f(x_{0:n} \mid y_{0:n})$, we update these to provide an estimate at time $n+1$ using the following steps.

Selection: draw indices $\{I_n^i\}_{i=1}^N \sim \mathrm{Mult}(\{\omega_n^i\}_{i=1}^N)$.
Mutation: for $i = 1, \ldots, N$, draw $X_{n+1}^i \sim q(\cdot \mid X_n^{I_n^i})$ and set $X_{0:n+1}^i = (X_{0:n}^{I_n^i}, X_{n+1}^i)$.
Weighting: for $i = 1, \ldots, N$, set $\omega_{n+1}^i = p(y_{n+1} \mid X_{n+1}^i)$.
Estimation: estimate $\tau_{n+1}$ using
$$\hat\tau_{n+1} = \sum_{i=1}^N \frac{\omega_{n+1}^i}{\sum_{l=1}^N \omega_{n+1}^l}\, h_{n+1}(X_{0:n+1}^i).$$
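The four steps can be sketched as follows (Python; model-agnostic, with my own naming — the user supplies the mutation sampler, the log observation density, and the pairwise terms of $h_n$). Trajectories are stored whole here, which is exactly what makes the degeneracy discussed next visible:

```python
import numpy as np

def sisr_additive(y, sample_init, sample_q, logp, h_pair, N, seed=0):
    """Bootstrap SISR over full trajectories; returns estimates of
    tau_k = E[h_k(X_{0:k}) | Y_{0:k}] with h_k(x_{0:k}) = sum_i h_pair(x_i, x_{i+1})."""
    rng = np.random.default_rng(seed)
    paths = sample_init(N, rng)[None, :]            # row k holds the X_k particles
    logw = logp(y[0], paths[-1])
    w = np.exp(logw - logw.max())
    taus = []
    for k in range(len(y) - 1):
        idx = rng.choice(N, size=N, p=w / w.sum())  # selection
        paths = paths[:, idx]                       # resampled ancestries
        x_new = sample_q(paths[-1], rng)            # mutation
        paths = np.vstack([paths, x_new])
        logw = logp(y[k + 1], x_new)                # weighting
        w = np.exp(logw - logw.max())
        h_val = sum(h_pair(paths[s], paths[s + 1]) for s in range(k + 1))
        taus.append(np.sum(w * h_val) / w.sum())    # estimation
    return np.array(taus)

# illustration on a linear Gaussian model with h_pair(x_a, x_b) = x_b,
# i.e. tau_n = E[sum_{i=1}^n X_i | Y_{0:n}]
rng = np.random.default_rng(3)
phi, sv, sy, n = 0.8, 0.5, 0.3, 30
x = np.empty(n + 1)
x[0] = sv * rng.standard_normal()
for k in range(n):
    x[k + 1] = phi * x[k] + sv * rng.standard_normal()
y = x + sy * rng.standard_normal(n + 1)
taus = sisr_additive(
    y,
    sample_init=lambda N, r: sv * r.standard_normal(N),
    sample_q=lambda xk, r: phi * xk + sv * r.standard_normal(xk.size),
    logp=lambda yk, xk: -(yk - xk) ** 2 / (2 * sy**2),
    h_pair=lambda xa, xb: xb,
    N=500)
```

The weights are computed on the log scale and shifted by their maximum before exponentiation, a standard guard against underflow.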
The problem of resampling

We saw, when comparing with the SIS algorithm, that resampling was necessary to achieve stable estimates. But resampling also depletes the historical trajectories: there will be an integer $k$ such that, at time $n$ in the algorithm, $X_{0:k}^i = X_{0:k}^j$ for all $i$ and $j$.
Figure: particle trajectories collapsing onto a common ancestor path (state value vs. time index), illustrating the degeneracy caused by resampling.
Fixing the degeneracy

The key to fixing this problem is the following observation: conditionally on the observations, the hidden Markov chain is also a Markov chain in the reverse direction. We denote the transition density of this time-reversed chain by $\overleftarrow{q}(x_k \mid x_{k+1}, y_{0:k})$; this density can be written as
$$\overleftarrow{q}(x_k \mid x_{k+1}, y_{0:k}) = \frac{f(x_k \mid y_{0:k})\, q(x_{k+1} \mid x_k)}{\int f(x' \mid y_{0:k})\, q(x_{k+1} \mid x')\, dx'}.$$
Since we can estimate the filter distribution $f(x_k \mid y_{0:k})$ well using SISR, we can use the following particle approximation of the backward distribution:
$$\overleftarrow{q}_N(x_k^i \mid x_{k+1}^j, y_{0:k}) = \frac{\omega_k^i\, q(x_{k+1}^j \mid x_k^i)}{\sum_{l=1}^N \omega_k^l\, q(x_{k+1}^j \mid x_k^l)}.$$
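In code, the particle approximation of the backward kernel is just a normalized reweighting of the time-$k$ particles (a Python sketch with my own naming; the AR(1) Gaussian transition below is an illustrative choice):

```python
import numpy as np

def backward_probs(w_k, x_k, x_next, q_dens):
    """Backward-kernel probabilities over the time-k particles:
    P(ancestor = i) is proportional to w_k[i] * q(x_next | x_k[i])."""
    u = w_k * q_dens(x_next, x_k)
    return u / u.sum()

# illustration: q(x' | x) = N(x'; phi*x, sv^2); the Gaussian normalizing
# constant is omitted since it cancels in the normalization
phi, sv = 0.9, 0.5
q_dens = lambda xn, xk: np.exp(-(xn - phi * xk) ** 2 / (2 * sv**2))
x_k = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
w_k = np.array([0.1, 0.3, 0.2, 0.3, 0.1])
probs = backward_probs(w_k, x_k, x_next=0.2, q_dens=q_dens)
```

The resulting vector is a proper probability distribution over the $N$ time-$k$ particles, one such vector per time-$(k+1)$ particle.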
Forward Filtering Backward Smoothing (FFBSm)

Using this approximation of the backward transition density we get the Forward Filtering Backward Smoothing algorithm. First we run the SISR algorithm and save all the filter estimates $\{(x_k^i, \omega_k^i)\}_{i=1}^N$ for $k = 0, \ldots, n$. We then estimate $\tau_n$ using
$$\hat\tau_n = \sum_{i_0=1}^N \cdots \sum_{i_n=1}^N \left(\prod_{s=0}^{n-1} \frac{\omega_s^{i_s}\, q(x_{s+1}^{i_{s+1}} \mid x_s^{i_s})}{\sum_{l=1}^N \omega_s^l\, q(x_{s+1}^{i_{s+1}} \mid x_s^l)}\right) \frac{\omega_n^{i_n}}{\sum_{l=1}^N \omega_n^l}\, h_n(x_0^{i_0}, x_1^{i_1}, \ldots, x_n^{i_n}).$$
A forward-only version

The previous algorithm requires two passes over the data: first the forward filtering, then the backward smoothing. When working with functions of additive form it is possible to perform the smoothing in a single forward pass. Introduce the auxiliary function
$$T_k(x) = \mathbb{E}\left[h_k(X_{0:k}) \mid Y_{0:k}, X_k = x\right].$$
We can update this function recursively by noting that
$$T_{k+1}(x) = \mathbb{E}\left[T_k(X_k) + h_k(X_k, X_{k+1}) \mid Y_{0:k}, X_{k+1} = x\right],$$
that is, the expected value of the previous function plus the next additive term, under the backward distribution!
Using this decomposition

Assume that we have filter estimates at time $k$, $\{(x_k^i, \omega_k^i)\}_{i=1}^N$, together with estimates $\{\hat T_k(x_k^i)\}_{i=1}^N$. Propagate the filter estimates using one step of the SISR algorithm to get $\{(x_{k+1}^i, \omega_{k+1}^i)\}_{i=1}^N$; then, for $i = 1, \ldots, N$, estimate $T_{k+1}(x_{k+1}^i)$ using
$$\hat T_{k+1}(x_{k+1}^i) = \sum_{j=1}^N \frac{\omega_k^j\, q(x_{k+1}^i \mid x_k^j)}{\sum_{l=1}^N \omega_k^l\, q(x_{k+1}^i \mid x_k^l)} \left(\hat T_k(x_k^j) + h_k(x_k^j, x_{k+1}^i)\right).$$
Finally, $\tau_{k+1}$ is estimated using
$$\hat\tau_{k+1} = \sum_{i=1}^N \frac{\omega_{k+1}^i}{\sum_{l=1}^N \omega_{k+1}^l}\, \hat T_{k+1}(x_{k+1}^i).$$
Speeding up the algorithm

This algorithm works, but it is quite slow: each update is $O(N^2)$.

Idea: can we replace the exact expected value with a Monte Carlo estimator? Can we sample sufficiently fast from the backward distribution? Formally, we wish to sample $J$ such that, given all particles and weights at times $k$ and $k+1$ and an index $i$,
$$\mathbb{P}(J = j) = \frac{\omega_k^j\, q(x_{k+1}^i \mid x_k^j)}{\sum_{l=1}^N \omega_k^l\, q(x_{k+1}^i \mid x_k^l)}.$$
We can do this efficiently using accept-reject sampling!
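A sketch of the accept-reject sampler (Python, my own naming): propose $J$ from the filter weights alone, then accept with probability $q(x_{k+1} \mid x_k^J)/\bar q$, where $\bar q$ bounds the transition density; for a Gaussian transition, $\bar q$ is the density's mode height. This avoids ever computing the $O(N)$ normalizing sum.

```python
import numpy as np

def sample_backward_index(p, x_k, x_next, q_dens, q_max, rng):
    """Draw J with P(J = j) proportional to p[j] * q(x_next | x_k[j])
    by accept-reject, where q_max >= sup q."""
    while True:
        j = rng.choice(len(p), p=p)                   # propose from the weights
        if rng.uniform() * q_max <= q_dens(x_next, x_k[j]):
            return j                                  # accept

phi, sv = 0.9, 0.5
q_dens = lambda xn, xk: np.exp(-(xn - phi * xk) ** 2 / (2 * sv**2)) / (np.sqrt(2 * np.pi) * sv)
q_max = 1.0 / (np.sqrt(2 * np.pi) * sv)               # sup of the Gaussian density
x_k = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
w_k = np.array([0.1, 0.3, 0.2, 0.3, 0.1])
p = w_k / w_k.sum()
rng = np.random.default_rng(4)
draws = np.array([sample_backward_index(p, x_k, 0.2, q_dens, q_max, rng)
                  for _ in range(10000)])
# empirical frequencies should match the exact backward probabilities
exact = w_k * q_dens(0.2, x_k)
exact /= exact.sum()
freq = np.bincount(draws, minlength=5) / draws.size
```

Accepted proposals have distribution proportional to $p_j\, q(x_{k+1} \mid x_k^j)$, which is exactly the target above.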
The PaRIS algorithm

Now we can describe the PaRIS algorithm. Given particles and weights $\{(x_k^i, \omega_k^i)\}_{i=1}^N$ targeting the filter distribution at time $k$, together with values $\{\hat T_k(x_k^i)\}_{i=1}^N$ estimating $T_k$, we proceed by first taking one step of the SISR algorithm to get particles and weights $\{(x_{k+1}^i, \omega_{k+1}^i)\}_{i=1}^N$ targeting the filter at time $k+1$. For $i = 1, \ldots, N$ we draw $\tilde N$ indices $\{J_{i'}\}_{i'=1}^{\tilde N}$ as on the previous slide and calculate
$$\hat T_{k+1}(x_{k+1}^i) = \frac{1}{\tilde N} \sum_{i'=1}^{\tilde N} \left(\hat T_k(x_k^{J_{i'}}) + h_k(x_k^{J_{i'}}, x_{k+1}^i)\right).$$
The estimate of $\tau_{k+1}$ is then given by
$$\hat\tau_{k+1} = \sum_{i=1}^N \frac{\omega_{k+1}^i}{\sum_{l=1}^N \omega_{k+1}^l}\, \hat T_{k+1}(x_{k+1}^i).$$
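Putting the pieces together, here is a compact PaRIS sketch for a linear Gaussian model (Python, my own naming; for brevity the backward indices are drawn by direct evaluation of the backward weights rather than by accept-reject, which is what one would use to reach linear complexity):

```python
import numpy as np

def paris_linear(y, phi, sv, sy, N=300, Ntilde=2, seed=5):
    """PaRIS estimating tau_n = E[sum_{i=1}^n X_i | Y_{0:n}] online,
    i.e. h_k(x_k, x_{k+1}) = x_{k+1} and T_0 = 0."""
    rng = np.random.default_rng(seed)
    x = sv * rng.standard_normal(N)                       # time-0 particles
    logw = -(y[0] - x) ** 2 / (2 * sy**2)
    w = np.exp(logw - logw.max())
    T = np.zeros(N)                                       # T_0(x_0^i) = 0
    taus = [np.sum(w * T) / w.sum()]
    for k in range(1, len(y)):
        idx = rng.choice(N, size=N, p=w / w.sum())        # selection
        x_new = phi * x[idx] + sv * rng.standard_normal(N)  # mutation
        T_new = np.empty(N)
        for i in range(N):
            # backward weights for particle i (pre-resampling filter cloud)
            u = w * np.exp(-(x_new[i] - phi * x) ** 2 / (2 * sv**2))
            js = rng.choice(N, size=Ntilde, p=u / u.sum())  # Ntilde backward draws
            T_new[i] = np.mean(T[js]) + x_new[i]          # h_k = x_{k+1}
        logw = -(y[k] - x_new) ** 2 / (2 * sy**2)
        w = np.exp(logw - logw.max())                     # weighting
        x, T = x_new, T_new
        taus.append(np.sum(w * T) / w.sum())
    return np.array(taus)

rng = np.random.default_rng(6)
phi, sv, sy, n = 0.8, 0.5, 0.3, 25
xs = np.empty(n + 1)
xs[0] = sv * rng.standard_normal()
for k in range(n):
    xs[k + 1] = phi * xs[k] + sv * rng.standard_normal()
ys = xs + sy * rng.standard_normal(n + 1)
taus = paris_linear(ys, phi, sv, sy)
```

Note that the backward weights are formed from the time-$k$ particle cloud before resampling, as the slide's recursion requires.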
The PaRIS algorithm (cont.)

This yields an efficient algorithm for online estimation of expectations under the joint smoothing distribution. We require the target function to be of additive form. The number of backward samples $\tilde N$ needed in the algorithm turns out to be as small as 2. Under some additional assumptions one can prove the following:
- The computational complexity grows linearly with $N$.
- The estimator is asymptotically consistent.
- There is a CLT for the error, which behaves like $1/\sqrt{N}$.
An example: parameter estimation

It turns out that the PaRIS algorithm can be used efficiently for online parameter estimation. We tested the method on the stochastic volatility model
$$X_{t+1} = \phi X_t + \sigma V_{t+1}, \qquad Y_t = \beta \exp(X_t/2)\, U_t, \qquad t \in \mathbb{N},$$
where $\{V_t\}_{t \in \mathbb{N}}$ and $\{U_t\}_{t \in \mathbb{N}}$ are independent sequences of mutually independent standard Gaussian noise variables. The parameters to be estimated are $\theta = (\phi, \sigma^2, \beta^2)$.
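For reference, simulating a record from this model is straightforward (Python sketch; the stationary initialization of $X_0$ and the parameter values are my illustrative assumptions, not taken from the slide):

```python
import numpy as np

def simulate_sv(T, phi, sigma, beta, seed=0):
    """Simulate X_{t+1} = phi X_t + sigma V_{t+1}, Y_t = beta exp(X_t/2) U_t
    with independent standard Gaussian V_t, U_t."""
    rng = np.random.default_rng(seed)
    x = np.empty(T)
    x[0] = sigma / np.sqrt(1 - phi**2) * rng.standard_normal()  # stationary start
    for t in range(1, T):
        x[t] = phi * x[t - 1] + sigma * rng.standard_normal()
    y = beta * np.exp(x / 2) * rng.standard_normal(T)
    return x, y

x, y = simulate_sv(500, phi=0.975, sigma=0.16, beta=0.63)
```

The log-volatility $x$ is an AR(1) process, and the observations $y$ are heteroscedastic returns whose scale is driven by $\exp(x/2)$.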
Parameter estimation (cont.)

Figure: PaRIS-based RML estimates of $\phi$, $\sigma^2$, and $\beta^2$, each plotted against time.
End of the course

That is it for the lectures in this course. Hope you have enjoyed it.
- Review of HA2 sent by mail by 24 May, 12:00
- Exam: 30 May, 14:00–19:00
- Re-exam: 14 August, 08:00–13:00
Computer Intensive Methods (36)
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationVariational Autoencoder
Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational
More informationState Space and Hidden Markov Models
State Space and Hidden Markov Models Kunsch H.R. State Space and Hidden Markov Models. ETH- Zurich Zurich; Aliaksandr Hubin Oslo 2014 Contents 1. Introduction 2. Markov Chains 3. Hidden Markov and State
More informationLecture 6: Gaussian Mixture Models (GMM)
Helsinki Institute for Information Technology Lecture 6: Gaussian Mixture Models (GMM) Pedram Daee 3.11.2015 Outline Gaussian Mixture Models (GMM) Models Model families and parameters Parameter learning
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationIntroduction to Estimation Methods for Time Series models Lecture 2
Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:
More informationA Note on Auxiliary Particle Filters
A Note on Auxiliary Particle Filters Adam M. Johansen a,, Arnaud Doucet b a Department of Mathematics, University of Bristol, UK b Departments of Statistics & Computer Science, University of British Columbia,
More informationMarkov Chain Monte Carlo Methods for Stochastic
Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationLikelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University
Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x q) Likelihood: estimate
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation
More informationTheory of Stochastic Processes 8. Markov chain Monte Carlo
Theory of Stochastic Processes 8. Markov chain Monte Carlo Tomonari Sei sei@mist.i.u-tokyo.ac.jp Department of Mathematical Informatics, University of Tokyo June 8, 2017 http://www.stat.t.u-tokyo.ac.jp/~sei/lec.html
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.
Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture
More informationHidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing
Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech
More informationan introduction to bayesian inference
with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationMS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari
MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind
More informationHidden Markov Models
Hidden Markov Models CI/CI(CS) UE, SS 2015 Christian Knoll Signal Processing and Speech Communication Laboratory Graz University of Technology June 23, 2015 CI/CI(CS) SS 2015 June 23, 2015 Slide 1/26 Content
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationLecture 4: State Estimation in Hidden Markov Models (cont.)
EE378A Statistical Signal Processing Lecture 4-04/13/2017 Lecture 4: State Estimation in Hidden Markov Models (cont.) Lecturer: Tsachy Weissman Scribe: David Wugofski In this lecture we build on previous
More informationMH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution
MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous
More informationA minimalist s exposition of EM
A minimalist s exposition of EM Karl Stratos 1 What EM optimizes Let O, H be a random variables representing the space of samples. Let be the parameter of a generative model with an associated probability
More informationHidden Markov Models in Language Processing
Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications
More informationFactor Analysis and Kalman Filtering (11/2/04)
CS281A/Stat241A: Statistical Learning Theory Factor Analysis and Kalman Filtering (11/2/04) Lecturer: Michael I. Jordan Scribes: Byung-Gon Chun and Sunghoon Kim 1 Factor Analysis Factor analysis is used
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework 3 Due Nov 12, 10.30 am Rules 1. Homework is due on the due date at 10.30 am. Please hand over your homework at the beginning of class. Please see
More information19 : Slice Sampling and HMC
10-708: Probabilistic Graphical Models 10-708, Spring 2018 19 : Slice Sampling and HMC Lecturer: Kayhan Batmanghelich Scribes: Boxiang Lyu 1 MCMC (Auxiliary Variables Methods) In inference, we are often
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm
More informationThe Hierarchical Particle Filter
and Arnaud Doucet http://go.warwick.ac.uk/amjohansen/talks MCMSki V Lenzerheide 7th January 2016 Context & Outline Filtering in State-Space Models: SIR Particle Filters [GSS93] Block-Sampling Particle
More informationEM Algorithm II. September 11, 2018
EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data
More informationHidden Markov Models for precipitation
Hidden Markov Models for precipitation Pierre Ailliot Université de Brest Joint work with Peter Thomson Statistics Research Associates (NZ) Page 1 Context Part of the project Climate-related risks for
More informationApproximate Bayesian Computation and Particle Filters
Approximate Bayesian Computation and Particle Filters Dennis Prangle Reading University 5th February 2014 Introduction Talk is mostly a literature review A few comments on my own ongoing research See Jasra
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationChapter 8.8.1: A factorization theorem
LECTURE 14 Chapter 8.8.1: A factorization theorem The characterization of a sufficient statistic in terms of the conditional distribution of the data given the statistic can be difficult to work with.
More informationControlled sequential Monte Carlo
Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation
More information