Machine Learning 4771
1 Machine Learning 4771 Instructor: Tony Jebara
2 Kalman Filtering
Linear Dynamical Systems and Kalman Filtering; Structure from Motion
3 Linear Dynamical Systems
Examples of hidden state x and observed output y:
Audio: x = pitch, y = acoustic waveform
Vision: x = object pose, y = pixel coordinates
Industrial: x = state of steel, y = temperature/pressure
Military: x = target position, y = radar returns
Bio: x = protein levels, y = gene expression levels
Vision example (Rao)
4 Linear Dynamical Systems
A Linear Dynamical System (LDS) or Kalman Filter (KF) has linear state dynamics and linear outputs:
x_t = A x_{t-1} + G w_t,   w_t ~ N(0, Q)
y_t = C x_t + v_t,         v_t ~ N(0, R)
where x_t is the state (dimension n), y_t the observed output (dimension p), w_t the process noise (dimension n), and v_t the measurement noise (dimension p).
Also called the linear-Gauss-Markov model (note the Markov property).
Stationary LDS: A, C, G, Q, R do not change with t. Non-stationary LDS: A, C, G, Q, R depend on t.
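As a concrete illustration (not from the original slides), here is a minimal numpy sketch that samples a trajectory from the two equations above; the specific values of A, G, C, Q, R are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example parameters: 2-D state, 1-D observation.
A = np.array([[1.0, 0.1],
              [0.0, 0.99]])     # state dynamics
G = np.eye(2)                   # process-noise loading (G = I as in the notes)
C = np.array([[1.0, 0.0]])      # observation matrix
Q = 0.01 * np.eye(2)            # process noise covariance
R = np.array([[0.1]])           # measurement noise covariance

T = 100
x = np.zeros(2)                 # zero-mean initial state
xs, ys = [], []
for t in range(T):
    w = rng.multivariate_normal(np.zeros(2), Q)   # w_t ~ N(0, Q)
    v = rng.multivariate_normal(np.zeros(1), R)   # v_t ~ N(0, R)
    x = A @ x + G @ w                             # x_t = A x_{t-1} + G w_t
    y = C @ x + v                                 # y_t = C x_t + v_t
    xs.append(x)
    ys.append(y)
xs, ys = np.array(xs), np.array(ys)
```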
5 Gaussian Marginals/Conditionals
Conditional & marginal from the joint (see derivation in Jordan 12.4):
p(z | µ, Σ) = (2π)^{-D/2} |Σ|^{-1/2} exp( -1/2 (z - µ)^T Σ^{-1} (z - µ) )
Joint over x and y:
p(x, y) = N( [x; y] | [µ_x; µ_y],  [[Σ_xx, Σ_xy], [Σ_yx, Σ_yy]] )
Marginal:
p(x) = N( x | µ_x, Σ_xx )
Conditional:
p(y | x) = p(x, y) / p(x) = N( y | µ_y + Σ_yx Σ_xx^{-1} (x - µ_x),  Σ_yy - Σ_yx Σ_xx^{-1} Σ_xy )
So y = µ_y + Σ_yx Σ_xx^{-1} (x - µ_x) + noise, or y = Σ_yx Σ_xx^{-1} x + noise if zero-mean.
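The conditioning formula translates directly into code. A minimal numpy sketch (the helper name and argument layout are our own choices, not from the course):

```python
import numpy as np

def condition_gaussian(mu, Sigma, dx, x_obs):
    """Given a joint Gaussian over z = [x; y] with mean mu and covariance
    Sigma, where x has dimension dx, return the mean and covariance of
    p(y | x = x_obs) using the block formulas above."""
    mu_x, mu_y = mu[:dx], mu[dx:]
    Sxx = Sigma[:dx, :dx]
    Sxy = Sigma[:dx, dx:]
    Syx = Sigma[dx:, :dx]
    Syy = Sigma[dx:, dx:]
    gain = Syx @ np.linalg.inv(Sxx)
    mean = mu_y + gain @ (x_obs - mu_x)   # µ_y + Σ_yx Σ_xx^{-1} (x - µ_x)
    cov = Syy - gain @ Sxy                # Σ_yy - Σ_yx Σ_xx^{-1} Σ_xy
    return mean, cov

# Example: 2-D joint, condition on the first coordinate equal to 1.0.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
print(condition_gaussian(mu, Sigma, dx=1, x_obs=np.array([1.0])))
```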
6 LDS as a Gaussian
More statistically, the LDS is one large Gaussian distribution. All variables are related by linear conditional Gaussians:
x_t = A x_{t-1} + G w_t,  w_t ~ N(0, Q)   →   p(x_t | x_{t-1}) = N(x_t | A x_{t-1}, Q)
y_t = C x_t + v_t,        v_t ~ N(0, R)   →   p(y_t | x_t) = N(y_t | C x_t, R)
We are assuming G = I to simplify (Jordan's approach). The first hidden state has no parents, but still needs a Gaussian distribution: p(x_0) = N(x_0 | 0, Q).
Without loss of generality, all are zero-mean Gaussians.
7 LDS as a Gaussian
The LDS has the following graphical model and conditionals:
p(x_t | x_{t-1}) = N(x_t | A x_{t-1}, Q)
p(y_t | x_t) = N(y_t | C x_t, R)
p(x_0) = N(x_0 | 0, P_0)
Note (as with HMMs) this is stationary since A, C, R, Q are constant.
Products of Gaussian distributions form a Gaussian:
p(X, Y) = p(x_0) ∏_{t=1}^{T} p(x_t | x_{t-1}) ∏_{t=0}^{T} p(y_t | x_t) = N( [X; Y] | 0, Σ )
The graphical model describes a particular factorization of this large Gaussian over all variables. This factorization forms a huge yet sparse overall covariance Σ.
8 Properties of the LDS
Since all variables in the LDS are Gaussian, we summarize their conditional probabilities not by tables p(q_t | q_{t-1}) but by the mean & covariance (expectations or moments) of a Gaussian:
p(x_t | y_0, ..., y_{t'}) = N(x_t | x̂_{t|t'}, P_{t|t'})
Conditional mean:        x̂_{t|t'} = E_{p(x_t | y_0...y_{t'})}[ x_t ]
Conditional covariance:  P_{t|t'} = E_{p(x_t | y_0...y_{t'})}[ (x_t - x̂_{t|t'})(x_t - x̂_{t|t'})^T ]
Here we condition on the information up to t', i.e. the observations y_0, ..., y_{t'}.
The Lyapunov equation shows the evolution of the unconditional covariance:
Σ_{t+1} = E[ x_{t+1} x_{t+1}^T ] = E[ (A x_t + w_t)(A x_t + w_t)^T ]
        = A E[x_t x_t^T] A^T + 2 A E[x_t w_t^T] + E[w_t w_t^T] = A Σ_t A^T + 0 + Q
If the eigenvalues of A are < 1 in magnitude:  Σ_∞ = A Σ_∞ A^T + Q
9 Steady-State Covariance
Consider the Lyapunov recursion Σ_{t+1} = A Σ_t A^T + Q as t goes to infinity. If the eigenvalues of A are < 1 in magnitude, Σ_∞ = A Σ_∞ A^T + Q, so we can solve for the covariance at steady state (t = infinity).
Example (Boyd): pick A and Q and solve for the steady-state covariance; iterating from different initializations (e.g. Σ_0 = 0 or Σ_0 = 100 I) converges to the same Σ_∞. (The specific A and Q matrices from the slide did not survive transcription.)
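A small numpy sketch of this experiment, with hypothetical A and Q standing in for the slide's values: the Lyapunov recursion reaches the same fixed point from both initializations.

```python
import numpy as np

def lyapunov_fixed_point(A, Q, Sigma0, iters=500):
    """Iterate Sigma_{t+1} = A Sigma_t A^T + Q; converges when all
    eigenvalues of A lie strictly inside the unit circle."""
    Sigma = Sigma0.copy()
    for _ in range(iters):
        Sigma = A @ Sigma @ A.T + Q
    return Sigma

# Hypothetical stable dynamics and noise covariance.
A = np.array([[0.6, 0.2],
              [-0.3, 0.5]])
Q = np.eye(2)
print(lyapunov_fixed_point(A, Q, np.zeros((2, 2))))   # start from Sigma_0 = 0
print(lyapunov_fixed_point(A, Q, 100 * np.eye(2)))    # start from Sigma_0 = 100 I
```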
10 LDS: Basic Operations
Recall we did 3 basic things with Bayesian networks:
1) Evaluate: given y_0, ..., y_T compute the likelihood p(y_0, ..., y_T) = Σ_{q_0, ..., q_T} p(q, y)
2) Decode: given y_0, ..., y_T compute p(q_t)
3) Max likelihood: given y_0, ..., y_T learn the parameters θ
We want the same 3 basic things with our LDS:
1) Evaluate: given y_0, ..., y_T compute the likelihood p(y_0, ..., y_T) = ∫ ... ∫ p(x, y) dx_0 ... dx_T
2) Filter/Smooth: given y_0, ..., y_T compute p(x_t)
3) Max likelihood: given y_0, ..., y_T learn the parameters θ
We will derive filtering traditionally, but the Junction Tree algorithm does it all.
11 Kalman Filtering
Used in online tracking, with information up to time t.
Filtering: given y_0, ..., y_t compute p(x_t) incrementally via
x̂_{t|t} = E_{p(x_t | y_0...y_t)}[ x_t ],   P_{t|t} = E_{p(x_t | y_0...y_t)}[ (x_t - x̂_{t|t})(x_t - x̂_{t|t})^T ]
Update with 2 steps:  (x̂_{t|t}, P_{t|t})  →  (x̂_{t+1|t}, P_{t+1|t})  →  (x̂_{t+1|t+1}, P_{t+1|t+1})
Time update: moments of p(x_{t+1}) given the same y's up to t:
x̂_{t+1|t} = E[ x_{t+1} ] = E[ A x_t + w_t ] = A E[ x_t ] + 0 = A x̂_{t|t}
P_{t+1|t} = E[ (x_{t+1} - x̂_{t+1|t})(x_{t+1} - x̂_{t+1|t})^T ] = E[ (A x_t + w_t - A x̂_{t|t})(A x_t + w_t - A x̂_{t|t})^T ] = A P_{t|t} A^T + Q
12 Kalman Filtering
Measurement update: moments of p(x_{t+1}) once y_{t+1} arrives. Conditioning on y_0, ..., y_t:
Conditional mean of y_{t+1}:        ŷ_{t+1|t} = E[ y_{t+1} ] = E[ C x_{t+1} + v_{t+1} ] = C x̂_{t+1|t} + 0
Conditional covariance of y_{t+1}:  E[ (y_{t+1} - ŷ_{t+1|t})(y_{t+1} - ŷ_{t+1|t})^T ] = E[ (C x_{t+1} + v_{t+1} - C x̂_{t+1|t})( ... )^T ] = C P_{t+1|t} C^T + R
Conditional cross-correlation:      E[ (x_{t+1} - x̂_{t+1|t})(y_{t+1} - ŷ_{t+1|t})^T ] = P_{t+1|t} C^T
So:
p( [x_{t+1}; y_{t+1}] | y_0, ..., y_t ) = N( [x_{t+1}; y_{t+1}] | [x̂_{t+1|t}; C x̂_{t+1|t}],  [[P_{t+1|t}, P_{t+1|t} C^T], [C P_{t+1|t}, C P_{t+1|t} C^T + R]] )
13 Kalman Filtering
Measurement update: we have the joint
p( x_{t+1}, y_{t+1} | y_0, ..., y_t ) = N( [x_{t+1}; y_{t+1}] | [x̂_{t+1|t}; C x̂_{t+1|t}],  [[P_{t+1|t}, P_{t+1|t} C^T], [C P_{t+1|t}, C P_{t+1|t} C^T + R]] )
and want to condition it to get p( x_{t+1} | y_0, ..., y_{t+1} ). Recall the conditioning formula:
p(x | y) = N( x | µ_x + Σ_xy Σ_yy^{-1} (y - µ_y),  Σ_xx - Σ_xy Σ_yy^{-1} Σ_yx )
Next conditional mean:  x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} ( y_{t+1} - C x̂_{t+1|t} )
Next conditional cov:   P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}
where the Kalman gain is K_{t+1} = P_{t+1|t} C^T ( C P_{t+1|t} C^T + R )^{-1}
(and from the time update: x̂_{t+1|t} = A x̂_{t|t}, P_{t+1|t} = A P_{t|t} A^T + Q).
14 Kalman Filtering (summary)
Start at time t = 0 and run forward until t = T:
x̂_{t+1|t} = A x̂_{t|t}
P_{t+1|t} = A P_{t|t} A^T + Q
K_{t+1} = P_{t+1|t} C^T ( C P_{t+1|t} C^T + R )^{-1}
x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} ( y_{t+1} - C x̂_{t+1|t} )
P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}
Then the log-likelihood is (using the Gaussian model):
log-likelihood = Σ_{t=0}^{T} log N( y_t | C x̂_{t|t-1},  C P_{t|t-1} C^T + R )
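These recursions fit in a few lines of numpy. A minimal sketch (the function name and argument layout are our own, not from the course); it starts from the prior (x̂_{0|-1} = 0, P_{0|-1} = P_0), applies the measurement update for each observation, then the time update, and accumulates the Gaussian log-likelihood above:

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, P0):
    """ys has shape (T+1, p). Returns filtered means x̂_{t|t}, covariances
    P_{t|t}, and sum_t log N(y_t | C x̂_{t|t-1}, C P_{t|t-1} C^T + R)."""
    n = A.shape[0]
    x_pred, P_pred = np.zeros(n), P0            # x̂_{0|-1} = 0, P_{0|-1} = P_0
    x_filt, P_filt, loglik = [], [], 0.0
    for y in ys:
        # Measurement update at time t.
        S = C @ P_pred @ C.T + R                # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)     # Kalman gain
        innov = y - C @ x_pred
        x_new = x_pred + K @ innov
        P_new = P_pred - K @ C @ P_pred
        # Accumulate log N(y_t | C x̂_{t|t-1}, S).
        loglik += -0.5 * (innov @ np.linalg.solve(S, innov)
                          + np.linalg.slogdet(S)[1]
                          + len(y) * np.log(2 * np.pi))
        x_filt.append(x_new)
        P_filt.append(P_new)
        # Time update: predict the next step.
        x_pred = A @ x_new
        P_pred = A @ P_new @ A.T + Q
    return np.array(x_filt), np.array(P_filt), loglik
```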
15 Filtering & Smoothing
We have a recursive formula for going from (x̂_{t|t}, P_{t|t}) → (x̂_{t+1|t}, P_{t+1|t}) → (x̂_{t+1|t+1}, P_{t+1|t+1}).
Initialize with (x̂_{0|-1} = 0, P_{0|-1} = P_0) and loop the updates.
Filtering: lets us compute the distribution over hidden states X online as we get more measurements; given y_0, ..., y_t compute p(x_t). Note p(x_t) does not get any information from future y's.
Smoothing: assume we have access to the whole sequence at once; we can then get BETTER estimates for the hidden states X: given y_0, ..., y_T compute p(x_t). This is not real-time tracking, since we use all observations.
16 RTS Smoothing
Smoothing goes back in time, propagating the expectations we know at the last observation to earlier states.
Procedure: (x̂_{t+1|T}, P_{t+1|T}) → (x̂_{t|T}, P_{t|T}).
Filtering already gave: (x̂_{0|0}, P_{0|0}), ..., (x̂_{t|t}, P_{t|t}), ..., (x̂_{T|T}, P_{T|T}).
Recall x̂_{t+1|t} = A x̂_{t|t}, so E_{p(X | y_0,...,y_t)}[ (x_t - x̂_{t|t})(x_{t+1} - x̂_{t+1|t})^T ] = P_{t|t} A^T.
Thus we have the joint probability:
p( x_t, x_{t+1} | y_0, ..., y_t ) = N( [x_t; x_{t+1}] | [x̂_{t|t}; x̂_{t+1|t}],  [[P_{t|t}, P_{t|t} A^T], [A P_{t|t}, P_{t+1|t}]] )
Using the conditioning rule:
p( x_t | x_{t+1}, y_0, ..., y_t ) = N( x_t | x̂_{t|t} + L_t ( x_{t+1} - x̂_{t+1|t} ),  P_{t|t} - L_t P_{t+1|t} L_t^T )
where we define L_t = P_{t|t} A^T P_{t+1|t}^{-1}.
17 RTS Smoothing
Iteratively get the conditional mean over the whole sequence:
x̂_{t|T} = E_{p(x_t | y_0,...,y_T)}[ x_t ]
        = ∫ x_t p( x_t | y_0, ..., y_T ) dx_t
        = ∫∫ x_t p( x_t, x_{t+1} | y_0, ..., y_T ) dx_t dx_{t+1}
        = ∫ [ ∫ x_t p( x_t | x_{t+1}, y_0, ..., y_t ) dx_t ] p( x_{t+1} | y_0, ..., y_T ) dx_{t+1}
        = ∫ [ x̂_{t|t} + L_t ( x_{t+1} - x̂_{t+1|t} ) ] p( x_{t+1} | y_0, ..., y_T ) dx_{t+1}
        = x̂_{t|t} + L_t ( x̂_{t+1|T} - x̂_{t+1|t} )
Same for the conditional covariance:
P_{t|T} = P_{t|t} + L_t ( P_{t+1|T} - P_{t+1|t} ) L_t^T
End up with: (x̂_{0|T}, P_{0|T}), ..., (x̂_{t|T}, P_{t|T}), ..., (x̂_{T|T}, P_{T|T}).
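A minimal numpy sketch of this backward pass, intended to run on the outputs of the `kalman_filter` sketch above (the helper name and interface are assumptions, not from the slides):

```python
import numpy as np

def rts_smooth(x_filt, P_filt, A, Q):
    """Backward pass over filtered moments (x̂_{t|t}, P_{t|t}),
    producing smoothed moments (x̂_{t|T}, P_{t|T})."""
    T = len(x_filt) - 1
    x_smooth = x_filt.copy()
    P_smooth = P_filt.copy()
    for t in range(T - 1, -1, -1):
        x_pred = A @ x_filt[t]                        # x̂_{t+1|t}
        P_pred = A @ P_filt[t] @ A.T + Q              # P_{t+1|t}
        L = P_filt[t] @ A.T @ np.linalg.inv(P_pred)   # L_t = P_{t|t} A^T P_{t+1|t}^{-1}
        x_smooth[t] = x_filt[t] + L @ (x_smooth[t + 1] - x_pred)
        P_smooth[t] = P_filt[t] + L @ (P_smooth[t + 1] - P_pred) @ L.T
    return x_smooth, P_smooth
```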
18 Junction Tree Approach
Instead of deriving filtering & smoothing we could use the JTA: forward collect = filter, backward distribute = smooth.
The LDS has the same directed graphical model as an HMM, so we get the same junction tree as an HMM: chain cliques ψ(x_0, x_1), ψ(x_1, x_2), ..., ψ(x_{T-1}, x_T), observation cliques ψ(x_0, y_0), ψ(x_1, y_1), ψ(x_2, y_2), ..., ψ(x_T, y_T), and separators φ(x_0), φ(x_1), ..., ζ(x_1), ζ(x_2), ..., ζ(x_T).
Except now cliques & separators are not discrete tables but continuous functions of Gaussian form.
19 Junction Tree Approach
Rewrite Gaussians in canonical or natural parameters (ξ, K) instead of mean and covariance (µ, Σ):
N( x | µ, Σ ) ∝ exp( -1/2 (x - µ)^T Σ^{-1} (x - µ) ) ∝ exp( ξ^T x - 1/2 x^T K x ),   with ξ = Σ^{-1} µ and K = Σ^{-1}
Property: exp( ξ_1^T x - 1/2 x^T K_1 x ) exp( ξ_2^T x - 1/2 x^T K_2 x ) = exp( ξ_{1+2}^T x - 1/2 x^T K_{1+2} x ), i.e. products just add the natural parameters.
Initialize all cliques to the conditional Gaussians:
ψ( x_{t+1}, x_t ) = exp( -1/2 (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) ) ∝ p( x_{t+1} | x_t )
ψ( x_t, y_t ) = exp( -1/2 (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) ∝ p( y_t | x_t )
ζ( x_t ) = φ( x_t ) = 1
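A small sketch of the moment-to-canonical conversion and of the multiplication property, assuming numpy (the function names are our own):

```python
import numpy as np

def to_canonical(mu, Sigma):
    """Convert (mu, Sigma) to natural parameters (xi, K) = (Sigma^{-1} mu, Sigma^{-1})."""
    K = np.linalg.inv(Sigma)
    return K @ mu, K

def multiply_canonical(xi1, K1, xi2, K2):
    """Product of two canonical-form Gaussians just adds natural parameters."""
    return xi1 + xi2, K1 + K2

def to_moment(xi, K):
    """Convert back: Sigma = K^{-1}, mu = K^{-1} xi."""
    Sigma = np.linalg.inv(K)
    return Sigma @ xi, Sigma
```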
20 Junction Tree Algorithm
Canonical-form Gaussians make it easy to compute the continuous version of JTA message passing.
Collect:     φ*_S = ∫_{V \ S} ψ_V,     ψ*_W = (φ*_S / φ_S) ψ_W
Distribute:  φ**_S = ∫_{W \ S} ψ*_W,   ψ**_V = (φ**_S / φ*_S) ψ_V
We get the same implementation as the previous filtering & smoothing. All cliques and separators become the marginals:
ψ**( x_{t+1}, x_t ) ∝ p( x_{t+1}, x_t | y_0, ..., y_T )
ζ**( x_t ) ∝ p( x_t | y_0, ..., y_T )
21 Expectation Maximization
If we knew the states X, the maximum complete likelihood would be easy:
log p(X, Y) = log [ p(x_0) ∏_{t=1}^{T} p(x_t | x_{t-1}) ∏_{t=0}^{T} p(y_t | x_t) ]
            = log p(x_0) + Σ_{t=1}^{T} log p(x_t | x_{t-1}) + Σ_{t=0}^{T} log p(y_t | x_t)
            = -1/2 x_0^T P_0^{-1} x_0 + 1/2 log|P_0^{-1}|
              - Σ_{t=1}^{T} 1/2 (x_t - A x_{t-1})^T Q^{-1} (x_t - A x_{t-1}) + (T/2) log|Q^{-1}|
              - Σ_{t=0}^{T} 1/2 (y_t - C x_t)^T R^{-1} (y_t - C x_t) + ((T+1)/2) log|R^{-1}| + const
For example, take derivatives with respect to C and set to 0:
∂ log p / ∂C = Σ_{t=0}^{T} R^{-1} ( y_t - C x_t ) x_t^T = 0   ⟹   C = ( Σ_{t=0}^{T} y_t x_t^T ) ( Σ_{t=0}^{T} x_t x_t^T )^{-1}
22 Expectation Maximization
But we don't know the hidden states, so we maximize the incomplete likelihood with EM.
E-step: use the current model A, C, Q, R to get expectations; smoothing gives (x̂_{0|T}, P_{0|T}), ..., (x̂_{t|T}, P_{t|T}), ..., (x̂_{T|T}, P_{T|T}).
M-step: maximize the expected complete likelihood (replace the x's & outer products with their expectations):
C = ( Σ_{t=0}^{T} y_t x_t^T ) ( Σ_{t=0}^{T} x_t x_t^T )^{-1}   →   ( Σ_{t=0}^{T} y_t x̂_{t|T}^T ) ( Σ_{t=0}^{T} P_{t|T} )^{-1}
A = ( Σ_{t=1}^{T} E[ x_t x_{t-1}^T ] ) ( Σ_{t=1}^{T} P_{t-1|T} )^{-1},  etc.
(here P_t stands for the expected outer product E[x_t x_t^T]).
The A update actually needs pairwise marginals (see the Ghahramani paper):
p( x_{t-1}, x_t | y_0, ..., y_T ) = N( [x_{t-1}; x_t] | [x̂_{t-1|T}; x̂_{t|T}],  [[P_{t-1|T}, P_{t-1,t|T}], [P_{t,t-1|T}, P_{t|T}]] )
Iterate until convergence.
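A sketch of the C update in numpy, using the smoothed moments from the RTS sketch above. Here the expected outer product is expanded as E[x_t x_t^T] = P_{t|T} + x̂_{t|T} x̂_{t|T}^T; the function name and interface are assumptions, not from the slides:

```python
import numpy as np

def m_step_C(ys, x_smooth, P_smooth):
    """M-step update for C: C = (Σ_t y_t E[x_t]^T)(Σ_t E[x_t x_t^T])^{-1},
    with expectations taken under the smoothed posterior."""
    Syx = sum(np.outer(y, x) for y, x in zip(ys, x_smooth))
    Sxx = sum(P + np.outer(x, x) for P, x in zip(P_smooth, x_smooth))
    return Syx @ np.linalg.inv(Sxx)
```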
23 LDS Smoothing Example
Given some data y_0, ..., y_T, fit an LDS model with EM to get A, C, Q, R, P_0. Then do smoothing to get ..., (x̂_{t|T}, P_{t|T}), ..., and show the expected y's, ..., C x̂_{t|T}, ...
24 Nonlinear Dynamical Systems
What about nonlinear dynamics? The LDS is simplistic, since many real phenomena have nonlinear state evolution and nonlinear output given the state:
x_t = a( x_{t-1} ) + w_t,  w_t ~ N(0, Q)
y_t = c( x_t ) + v_t,      v_t ~ N(0, R)
where x_t is the state (dimension n), w_t the process noise (dimension n), y_t the observed output (dimension p), and v_t the measurement noise (dimension p).
Typically we prespecify the functions a() and c(). The corresponding filter is called an Extended Kalman Filter.
Good news: the basic filtering & smoothing algorithms stay the same and we can recover the hidden states.
Bad news: not guaranteed optimal, and EM learning is harder.
25 Extended Kalman Filtering
The regular Kalman filter becomes non-stationary. We need to approximate the nonlinear a() and c() functions at each step in time with their best constant (linear) estimates A and C given our current guess of the hidden state:
A_t = ∂a(x)/∂x |_{x = x̂_{t-1|t-1}},   C_t = ∂c(x)/∂x |_{x = x̂_{t-1|t-1}}
Change the filtering equations as follows:
x̂_{t+1|t} = a( x̂_{t|t} )
P_{t+1|t} = A_t P_{t|t} A_t^T + Q
K_{t+1} = P_{t+1|t} C_t^T ( C_t P_{t+1|t} C_t^T + R )^{-1}
x̂_{t+1|t+1} = x̂_{t+1|t} + K_{t+1} ( y_{t+1} - c( x̂_{t+1|t} ) )
P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C_t P_{t+1|t}
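A minimal sketch of one EKF step in numpy. The user-supplied functions a, c and their Jacobians A_jac, C_jac are hypothetical names, and in this sketch the output Jacobian is evaluated at the predicted state, which is one common convention rather than necessarily the slide's exact choice:

```python
import numpy as np

def ekf_step(x_filt, P_filt, y_next, a, c, A_jac, C_jac, Q, R):
    """One extended Kalman filter step: linearize a() and c() around the
    current estimates, then apply the usual time and measurement updates."""
    A_t = A_jac(x_filt)                        # linearize dynamics at x̂_{t|t}
    x_pred = a(x_filt)                         # x̂_{t+1|t} = a(x̂_{t|t})
    P_pred = A_t @ P_filt @ A_t.T + Q          # P_{t+1|t}
    C_t = C_jac(x_pred)                        # linearize output at the prediction
    S = C_t @ P_pred @ C_t.T + R               # innovation covariance
    K = P_pred @ C_t.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (y_next - c(x_pred))  # measurement update
    P_new = P_pred - K @ C_t @ P_pred
    return x_new, P_new
```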