Hidden Markov Models (recap BNs)

Similar documents
Hidden Markov models 1

Hidden Markov Models. AIMA Chapter 15, Sections 1 5. AIMA Chapter 15, Sections 1 5 1

PROBABILISTIC REASONING OVER TIME

Temporal probability models. Chapter 15, Sections 1 5 1

CS 188: Artificial Intelligence

CSE 473: Artificial Intelligence

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg

CSEP 573: Artificial Intelligence

Bayesian Networks BY: MOHAMAD ALSABBAGH

Hidden Markov Models. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 19 Apr 2012

Temporal probability models. Chapter 15

CS532, Winter 2010 Hidden Markov Models

Markov Models. CS 188: Artificial Intelligence Fall Example. Mini-Forward Algorithm. Stationary Distributions.

Extensions of Bayesian Networks. Outline. Bayesian Network. Reasoning under Uncertainty. Features of Bayesian Networks.

Approximate Inference

Probabilistic representation and reasoning

Announcements. CS 188: Artificial Intelligence Fall Markov Models. Example: Markov Chain. Mini-Forward Algorithm. Example

Outline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012

CS 188: Artificial Intelligence Fall Recap: Inference Example

CS 188: Artificial Intelligence Spring 2009

Announcements. CS 188: Artificial Intelligence Fall VPI Example. VPI Properties. Reasoning over Time. Markov Models. Lecture 19: HMMs 11/4/2008

Probabilistic representation and reasoning

Artificial Intelligence

Introduction to Artificial Intelligence (AI)

CS 188: Artificial Intelligence Spring Announcements

Markov Chains and Hidden Markov Models

Bayesian Methods in Artificial Intelligence

CS 343: Artificial Intelligence

CS 5522: Artificial Intelligence II

CS 5522: Artificial Intelligence II

CS711008Z Algorithm Design and Analysis

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence

Hidden Markov Models. Vibhav Gogate The University of Texas at Dallas

STA 4273H: Statistical Machine Learning

Hidden Markov Models,99,100! Markov, here I come!

CS Lecture 3. More Bayesian Networks

Probabilistic Graphical Models

Introduction to Artificial Intelligence (AI)

CS 188: Artificial Intelligence Fall 2011

Mini-project 2 (really) due today! Turn in a printout of your work at the end of the class

Linear Dynamical Systems

CS 5522: Artificial Intelligence II

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

CS 5522: Artificial Intelligence II

Markov localization uses an explicit, discrete representation for the probability of all position in the state space.

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Markov Models and Hidden Markov Models

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Professor Wei-Min Shen Week 8.1 and 8.2

Hidden Markov Models. George Konidaris

Why do we care? Measurements. Handling uncertainty over time: predicting, estimating, recognizing, learning. Dealing with time

9 Forward-backward algorithm, sum-product on factor graphs

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein

CSE 473: Artificial Intelligence Spring 2014

Machine Learning for Data Science (CS4786) Lecture 19

Dynamic Bayesian Networks and Hidden Markov Models Decision Trees

Recall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem

Reasoning Under Uncertainty: Bayesian networks intro

Lecture 4: State Estimation in Hidden Markov Models (cont.)

Recall: Modeling Time Series. CSE 586, Spring 2015 Computer Vision II. Hidden Markov Model and Kalman Filter. Modeling Time Series

Robert Collins CSE586 CSE 586, Spring 2015 Computer Vision II

CS 188: Artificial Intelligence Fall 2009

STA 414/2104: Machine Learning

Density Propagation for Continuous Temporal Chains Generative and Discriminative Models

Intelligent Systems (AI-2)

STA 4273H: Statistical Machine Learning

State Space and Hidden Markov Models

Reasoning Under Uncertainty Over Time. CS 486/686: Introduction to Artificial Intelligence

CSE 473: Ar+ficial Intelligence. Probability Recap. Markov Models - II. Condi+onal probability. Product rule. Chain rule.

Rapid Introduction to Machine Learning/ Deep Learning

Introduction to Machine Learning CMU-10701

Bayesian Machine Learning - Lecture 7

CS 7180: Behavioral Modeling and Decision- making in AI

CSE 473: Artificial Intelligence Probability Review à Markov Models. Outline

Intelligent Systems (AI-2)

Intelligent Systems (AI-2)

Probability, Markov models and HMMs. Vibhav Gogate The University of Texas at Dallas

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Directed Probabilistic Graphical Models CMSC 678 UMBC

Final Exam December 12, 2017

Probabilistic Reasoning. (Mostly using Bayesian Networks)

Introduction to Mobile Robotics Probabilistic Robotics

p L yi z n m x N n xi

Probability. CS 3793/5233 Artificial Intelligence Probability 1

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma

CS 5522: Artificial Intelligence II

Bayesian networks. Chapter Chapter

STATS 306B: Unsupervised Learning Spring Lecture 5 April 14

Lecture 2: From Linear Regression to Kalman Filter and Beyond

CS 188: Artificial Intelligence. Bayes Nets

MACHINE LEARNING 2 UGM,HMMS Lecture 7

Machine Learning Lecture 14

CSE 473: Ar+ficial Intelligence

Why do we care? Examples. Bayes Rule. What room am I in? Handling uncertainty over time: predicting, estimating, recognizing, learning

Announcements. Inference. Mid-term. Inference by Enumeration. Reminder: Alarm Network. Introduction to Artificial Intelligence. V22.

CSE 473: Ar+ficial Intelligence. Hidden Markov Models. Bayes Nets. Two random variable at each +me step Hidden state, X i Observa+on, E i

Learning in Bayesian Networks

Final Examination CS 540-2: Introduction to Artificial Intelligence

Transcription:

Probabilistic reasoning over time - Hidden Markov Models (recap BNs) Applied artificial intelligence (EDA132) Lecture 10 2016-02-17 Elin A. Topp Material based on course book, chapter 15 1

A robot s view of the world... 9000 8000 Scan data Robot Distance in mm relative to robot position 7000 6000 5000 4000 3000 2000 1000 0 1000 5000 4000 3000 2000 1000 0 1000 2000 3000 Distance in mm relative to robot position 2

Bayes Rule and conditional independence P( PersonLeg #pointsinrange curvaturecorrect) = α P( #pointsinrange curvaturecorrect PersonLeg) P( PersonLeg) = α P( #pointsinrange PersonLeg) P( curvaturecorrect PersonLeg) P( PersonLeg) An example of a naive Bayes model: P( Cause, Effect1,..., Effectn) = P( Cause) i P( Effecti Cause) Person leg Cause #Points Curvature Effect 1... Effect n The total number of parameters is linear in n 3

Bayesian networks A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions Syntax: a set of nodes, one per random variable a directed, acyclic graph (link directly influences ) a conditional distribution for each node given its parents: P( Xi Parents( Xi)) In the simplest case, conditional distribution represented as a conditional probability table ( CPT) giving the distribution over Xi for each combination of parent values 4

Tracking and associating... while moving... Distance in mm relative to robot start position 5000 4000 3000 2000 1000 0 Target 0 Target 1 Target 2 Robot Robot (1) Distance in mm relative to robot start position 5000 4000 3000 2000 1000 0 Target 3 Target 4 Robot (1) Robot Robot (2) 1000 1000 0 1000 2000 3000 4000 5000 Distance in mm relative to robot start position 1000 1000 0 1000 2000 3000 4000 5000 Distance in mm relative to robot start position Distance in mm relative to robot start position 5000 4000 3000 2000 1000 0 Target 5 Target 6 Target 7 Target 8 Robot (1) Robot 1000 1000 0 1000 2000 3000 4000 5000 Distance in mm relative to robot start position 5

Probabilistic reasoning over time... means to keep track of the current state of - a process (temperature controller, other controllers) - an agent with respect to the world (localisation of a robot in some world ) in order to make predictions or to simply understand what might have caused this current state. This involves both a transition model (how the state is assumed to change) and a sensor model (how observations / percepts are related to the world state). Previously: the focus was on what was possible to happen (e.g., search), now it is on what is likely / unlikely to happen the focus was on static worlds (Bayesian networks), now we look at dynamic processes where everything (state AND observations) depend on time. 6

Three classes of approaches Hidden Markov models (Particle filters) Kalman filters Dynamic Bayesian networks (cover actually the other two as special cases) But first, some basics... 7

Reasoning over time With Xt the current state description at time t Et the evidence obtained at time t we can describe a state transition model and a sensor model that we can use to model a time step sequence - a chain of states and sensor readings according to discrete time steps - so that we can understand the ongoing process. We assume to start out in X0, but evidence will only arrive after the first state transition is made: E1 is then the first piece of evidence to be plugged into the chain. The general transition model would then specify P( Xt X0:t-1)... this would mean we need full joint distributions over all time steps... or not? X

The Markov assumption A process is Markov (i.e., complies with the Markov assumption), when any given state Xt depends only on a finite and fixed number of previous states. X t 2 X t 1 X t X t+1 X t+2 X t 2 X t 1 X t X t+1 X t+2 8

A first-order Markov chain as Bayesian network Rt-1 P(Rt Rt-1) cause / state T 0.7 F 0.3 Raint-1 Raint Raint+1 Umbrellat-1 Umbrellat Umbrellat+1 effect / evidence Rt P(Ut Rt) T 0.9 F 0.2 9

Inference for any t With P( X0) the prior probability distribution in t=0 (i.e., the initial state model), P( Xi Xi-1) the state transition model and P( Ei Xi) the sensor model we have the complete joint distribution for all variables for any t. t P( X0:t, E1:t) = P( X0) P( Xi Xi-1) P( Ei Xi) i=1 X

The Markov assumption First-order Markov chain: State variables (at t) contain ALL information needed for t+1. Sometimes, that is too strong an assumption (or too weak in some sense). Hence, increase either the order (second-order Markov chain) or add information into the state variable(s) (R could include also Season, Humidity, Pressure, Location, instead of only Rain ) Note: It is possible to express an increase in order by increasing the number of state variables, keeping the order fixed - for the umbrella world you could use R = <RainYesterday, RainToday> When things get too complex, rather add another sensor (e.g., observe coats). X

Inference in temporal models - what can we use all this for? Filtering: Finding the belief state, or doing state estimation, i.e., computing the posterior distribution over the most recent state, using evidence up to this point: P( Xt e1:t) Predicting: Computing the posterior over a future state, using evidence up to this point: P( Xt+k e1:t) for some k>0 (can be used to evaluate course of action based on predicted outcome) Smoothing: Computing the posterior over a past state, i.e., understand the past, given information up to this point: P( Xk e1:t) for some k with 0 k < t Explaining: Find the best explanation for a series of observations, i.e., computing argmaxx1:t P( x1:t e1:t) - can be efficiently handled by Viterbi algorithm Learning: If sensor and / or transition model are not known, they can be learned from observations (by-product of inference in Bayesian network - both static or dynamic). Inference gives estimates, estimates are used to update the model, updated models provide new estimates (by inference). Iterate until converging - again, this is an instance of the EM-algorithm. 10

Filtering: Prediction & update (FORWARD-step) P( Xt+1 e1:t+1) = f( P( Xt e1:t), et+1) = f1:t+1 = P( Xt+1 e1:t, et+1) (decompose) = α P( et+1 Xt+1, e1:t)p( Xt+1 e1:t) (Bayes Rule) = α P( et+1 Xt+1) P( Xt+1 e1:t) (1. Markov assumption (sensor model), 2. one-step prediction) = α P( et+1 Xt+1) P( Xt+1 xt, e1:t) P( xt e1:t) (sum over atomic events for X) xt = α P( et+1 Xt+1) P( Xt+1 xt) P( xt e1:t) (Markov assumption) xt P( Xt e1:t) ( forward message, propagated recursively f1:t+1 = α FORWARD( f1:t, et+1) f1:0 = P( X0) through forward step function ) 11

Prediction - filtering without the update P( Xt+k+1 e1:t) = P( Xt+k+1 xt) P( xt+k e1:t) (k-step prediction) xt+k For large k the prediction gets quite blurry and will eventually converge into a stationary distribution at the mixing point, i.e., the point in time when this convergence is reached - in some sense this is when everything is possible. 12

Smoothing: explaining backward P( Xk e1:t) = fb( Xk, e1:k, P( ek+1:t Xk)) with 0 k < t (understand the past from the recent past) = P( Xk e1:k, ek+1:t) (decompose) = α P( Xk e1:k) P( ek+1:t Xk, e1:k) (Bayes Rule) = α P( Xk e1:k) P( ek+1:t Xk) (Markov assumption) = α f1:k bk+1:t (forward-message backward-message) 13

bk+1:t = P( ek+1:t Xk) Smoothing: calculating backward message = P( ek+1:t Xk, xk+1) P( xk+1 Xk) (conditioning on Xk+1, i.e., looking backward ) xk+1 = P( ek+1:t xk+1) P( xk+1 Xk) (cond. indep. - Markov assumption) xk+1 = P( ek+1, ek+2:t xk+1) P( xk+1 Xk) (decompose) xk+1 = P( ek+1 xk+1) P( ek+2:t xk+1) P( xk+1 Xk) (1. sensor, 2. backward msg, 3. transition model) xk+1 = BACKWARD( bk+2:t, ek+1) P( ek+1:t Xk) ( backward message, propagated recursively) bk+1:t = BACKWARD( bk+2:t, ek+1) (through backward step function ) bt+1:t = P( et+1:t Xt) = P( Xt) = 1 14

Smoothing in a nutshell : Forward-Backward-algorithm P( Xk e1:t) = fb( e1:k, P( ek+1:t Xk)) with 0 k < t understand the past from the recent past = α f1:k bk+1:t by first filtering (forward) until step k, then explaining backward from t to k+1 Obviously, it is a good idea to store the filtering (forward) results for later smoothing Drawback of the algorithm: not really suitable for online use (t is growing,...) Consequently, try with fixed-lag-smoothing (keeping a fixed-length window, BUT: simple Forward-Backward does not really do it efficiently - here we need HMMs) 15

HMM Hidden Markov models A specific class of models (sensor and transition) to be plugged into the previously discussed algorithms - which makes the algorithms more specific as well! Main idea: The state is represented by a single discrete random variable, taking on values that represent the (all) possible states of the world. Complex states, e.g., the location and the heading of a robot in a grid world can be merged into one variable; the possible values are then all possible tuples of the values for each original single variable. 16

HMM State transition and sensor model We get the following notation: Xt the state at time t, taking on values 1... S, with S the number of possible states / values. Et the observation at time t The transition model P( Xt Xt-1 ) is then expressed as S x S matrix T: Tij = P( Xt = j Xt-1 = i) in time step t The sensor model for the corresponding observations depending on the current state, i.e., P( et Xt = i) is then expressed as S x S diagonal matrix O in time step t with O e_t ij = P( et Xt = i) for i = j and O e_t ij = 0 for i j 17

Forward-backward equations as matrix-vector operations Forward-equation (recap) P( Xt+1 e1:t+1) = f( P( Xt e1:t), et+1) = f1:t+1 = α P( et+1 Xt+1) P( Xt+1 xt) P( xt e1:t) xt becomes f1:t+1 = α Ot+1 T T f1:t Backward-equation (recap) P( ek+1:t Xk) = bk+1:t = P( ek+1 xk+1) P( ek+2:t xk+1) P( xk+1 Xk) xk+1 becomes bk+1:t = TOk+1 bk+2:t Forward-Backward-equation is then still α f1:k bk+1:t 18

Smoothing in constant space Idea propagate both f and b in the same direction, hence avoiding to store the f1:k for a shifting / growing time slice k:t Propagate the forward-message f backward with f1:t = α (T T ) -1 O -1 t+1 f1:t+1 Start with computing ft:t in a standard forward-run, forgetting all the intermediate messages, then compute both f and b simultaneously backward to do smoothing for each step this should be done for (NOTE: works obviously only if T T and O can be inverted, i.e., every sensor reading must be possible in every state, though it can be very unlikely) X

Fixed-lag smoothing (online) Idea if we can do smoothing with constant space requirements, we can also find an efficient recursive algorithm for online smoothing (a shifting window ), independent of the length d of the investigated time slice t-d (with t growing). We need to compute α f1:t-d bt-d+1:t for time slice t-d. In t+1, when a new observation arrives, we need α f1:t-d+1 bt-d+1:t+1 for time slice t-d+1. We can get f1:t-d+1 from f1:t-d, applying standard filtering. For the backward message, some more inspection has to be done (bt-d+1:t+1 depends on the new evidence in t+1) but there is a way by looking at how bt-d+1:t relates to bt+1:t X

Fixed-lag smoothing (online) Backward recursion: apply the recursive equation for bt-d+1:t d times: t bt-d+1:t = ( TOi)bt+1:t = Bt-d+1:t 1 i=t-d+1 Then, after the next observation, this will be: t+1 bt-d+2:t+1 = ( TOi)bt+2:t+1 = Bt-d+2:t+1 1 i=t-d+2 Do some matrix division and get an incremental update for B (and ultimately bt-d+2:t+1): Bt-d+2:t+1 = O -1 t-d+1 T -1 Bt-d+1:t TOt+1 X

The full algorithm for fixed-lag smoothing function FIXED-LAG-SMOOTHING(e t, hmm, d) returns adistributionoverx t d inputs: e t,thecurrentevidencefortimestept hmm, ahiddenmarkovmodelwiths S transition matrix T d,thelengthofthelagforsmoothing persistent: t, thecurrenttime,initially1 f,theforwardmessagep(x t e 1:t ),initiallyhmm.prior B,thed-step backward transformation matrix, initially the identity matrix e t d:t,double-endedlistofevidencefromt d to t,initiallyempty local variables: O t d, O t,diagonalmatricescontainingthesensormodelinformation add e t to the end of e t d:t O t diagonal matrix containing P(e t X t ) if t>dthen f FORWARD(f,e t ) remove e t d 1 from the beginning of e t d:t O t d diagonal matrix containing P(e t d X t d ) B O 1 t d T 1 BTO t else B BTO t t t +1 if t>dthen return NORMALIZE(f B1) else return null X

Summary Inference in temporal models - Filtering and prediction (FORWARD) - Smoothing (FORWARD-BACKWARD) Hidden Markov Models - Simplified matrix representation for Forward-backward calculations 19

Assignment 3? (look on the course page...) 20