Hidden Markov Models: 99, 100! Markov, here I come!


16.410/413 Principles of Autonomy and Decision-Making. Pedro Santana (psantana@mit.edu), October 7, 2015. Based on material by Brian Williams and Emilio Frazzoli.

Assignments
- Problem set 4: out last Wednesday, due at midnight tonight.
- Problem set 5: out today, due in a week.
Readings
- Today: Probabilistic Reasoning Over Time [AIMA], Ch. 15.

Today's topics
1. Motivation
2. Probability recap: Bayes' rule, marginalization
3. Markov chains
4. Hidden Markov models
5. HMM algorithms: prediction, filtering, smoothing, decoding, and learning (Baum-Welch; won't be covered today, as it is significantly more involved, but you might want to learn more about it)

1. Motivation
Why are we learning this?

Robot navigation

Robust sensor fusion (visual tracking)

Natural language processing (NLP): Li-ve long and pros-per

2. Probability recap
"Probability is common sense reduced to calculation." Pierre-Simon Laplace

Bayes' rule
A, B: random variables. Factoring the joint distribution into a conditional and a marginal, in both orders:
Pr(A, B) = Pr(A | B) Pr(B)
Pr(A, B) = Pr(B | A) Pr(A)
Equating the two right-hand sides:
Pr(A | B) Pr(B) = Pr(B | A) Pr(A)
⇒ Pr(A | B) = Pr(B | A) Pr(A) / Pr(B)   (Bayes' rule!)

Marginalization & graphical models
A → B: A causes B, with a prior Pr(A) on the cause and a distribution Pr(B | A) of the effect. Marginalizing A out:
Pr(B) = Σ_a Pr(A = a, B) = Σ_a Pr(B | A = a) Pr(A = a)
Conditioning on the cause makes the computation easier.
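The two identities above can be checked numerically on a hypothetical two-valued cause. The numbers below (a rare fault A triggering an alarm B) are illustrative assumptions, not from the lecture:

```python
# Hypothetical example: cause A (a fault) with prior Pr(A) = 0.01,
# effect B (an alarm) with Pr(B | A) = 0.9 and Pr(B | not A) = 0.1.
p_a = 0.01
p_b_given_a = 0.9
p_b_given_not_a = 0.1

# Marginalization: Pr(B) = sum_a Pr(B | A = a) Pr(A = a)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: Pr(A | B) = Pr(B | A) Pr(A) / Pr(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_b, 4))          # 0.108
print(round(p_a_given_b, 4))  # 0.0833
```

Even with a reliable sensor, the posterior on the rare cause stays small: the marginal Pr(B) in the denominator is dominated by false alarms.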

Our goal for today
How can we estimate the hidden state of a system from noisy sensor observations?

3. Markov chains
(Andrey Markov)

State transitions over time
S_t: state at time t (a random variable).
S_t = s: a particular value of S_t (not random), with s ∈ S, where S is the state space.
Over time: S_0 → S_1 → … → S_t → S_t+1 → S_t+2 → …

State transitions over time
Pr(S_0, S_1, …, S_t, S_t+1) = Pr(S_0:t+1)
By the chain rule,
Pr(S_0:t+1) = Pr(S_0) Pr(S_1 | S_0) Pr(S_2 | S_0:1) Pr(S_3 | S_0:2) Pr(S_4 | S_0:3) … Pr(S_t+1 | S_0:t)
The past influences the present, so these models grow exponentially with time!

The Markov assumption
Pr(S_t | S_0:t-1) = Pr(S_t | S_t-1)
The path to S_t isn't relevant, given knowledge of S_t-1, so the conditional distributions have constant size!
Definition: Markov chain
If a sequence of random variables S_0, S_1, …, S_t+1 is such that
Pr(S_0:t+1) = Pr(S_0) Pr(S_1 | S_0) Pr(S_2 | S_1) … = Pr(S_0) ∏_{i=1}^{t+1} Pr(S_i | S_i-1),
we say that S_0, S_1, …, S_t+1 form a Markov chain.

Markov chains
S_0 → S_1 → … → S_t → S_t+1 → S_t+2, where each S_t takes values in a discrete set with d elements.
Transition model: T_t is the d × d matrix with
T_t[i, j] = Pr(S_t = i | S_t-1 = j)
If T_t does not depend on t, the Markov chain is stationary:
T[i, j] = Pr(S_t = i | S_t-1 = j), for all t.
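A minimal sketch of this column convention: with T[i, j] = Pr(S_t = i | S_t-1 = j), each column of T sums to 1, and pushing a belief over states one step forward is a matrix-vector product. The 2-state matrix below is an illustrative assumption, not one of the lecture's examples:

```python
import numpy as np

# Columns are "from" states, rows are "to" states: T[i, j] = Pr(S_t = i | S_t-1 = j).
T = np.array([[0.9, 0.3],
              [0.1, 0.7]])

# Each column must be a valid distribution over next states.
assert np.allclose(T.sum(axis=0), 1.0)

# Propagating a distribution over states one step: Pr(S_t) = T @ Pr(S_t-1).
p0 = np.array([1.0, 0.0])  # start in state 0 with certainty
p1 = T @ p0                # [0.9, 0.1]
p2 = T @ p1                # equivalently (T @ T) @ p0
print(p1, p2)
```

Some texts use the transposed row convention (rows sum to 1, beliefs are row vectors); either works as long as the matrix-vector products are written consistently.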

(Very) Simple Wall Street
Stock price* states: H (High), R (Rising), F (Falling), L (Low), S (Steady).
[State-transition diagram over {H, R, F, L, S}; the nonzero entries of T listed on the slide, by row: H_k: 0.1, 0.05, 0.2; R_k: 0.5, 0.8, 0.25; F_k: 0.9, 0.6, 0.25; L_k: 0.1; S_k: 0.45, 0.3, 0.5.]
*Pedagogical example. In no circumstance shall the author be responsible for financial losses due to decisions based on this model.

4. Hidden Markov models (HMMs)
(Andrey Markov)

Observing hidden Markov chains
Hidden: S_0 → S_1 → … → S_t → S_t+1 → S_t+2
Observable: O_1, …, O_t, O_t+1, O_t+2
Definition: Hidden Markov Model (HMM)
A sequence of random variables O_1, O_2, …, O_t, … is an HMM if the distribution of O_t is completely defined by the current (hidden) state S_t according to Pr(O_t | S_t), where S_t is part of an underlying Markov chain.

Hidden Markov models
Observation model: with O_t taking values in a discrete set with m elements, M is the d × m matrix with
M[i, j] = Pr(O_t = j | S_t = i)

The dishonest casino
Hidden states: Fair (F) and Loaded (L). Observations: die rolls 1-6.
Transition model T:
      F_k-1  L_k-1
F_k   0.95   0.05
L_k   0.05   0.95
Observation model M:
      1     2     3     4     5     6
F_k   1/6   1/6   1/6   1/6   1/6   1/6
L_k   1/10  1/10  1/10  1/10  1/10  1/2

Queries
Lowercase letters denote known values, not random variables: o_1, o_2, …, o_t = o_1:t.
Filtering: given the available history of observations, what's the belief about the current hidden state? Pr(S_t | o_1:t)
Smoothing: given the available history of observations, what's the belief about a past hidden state? Pr(S_k | o_1:t), k < t

Queries
Prediction: given the available history of observations, what's the belief about a future hidden state? Pr(S_k | o_1:t), k > t
Decoding: given the available history of observations, what's the most likely sequence of hidden states?
s*_0:t = arg max_{s_0:t} Pr(S_0:t = s_0:t | o_1:t)

5. HMM algorithms
Where we'll learn how to compute answers to the previously seen HMM queries.

Notation
Pr(S_t): the probability distribution of the random variable S_t, a vector of d probability values.
Pr(S_t = s) = Pr(s_t): the probability of observing S_t = s according to Pr(S_t), a number in [0, 1].

Filtering (forward)
Given the available history of observations, what's the belief about the current hidden state? Pr(S_t | o_1:t) = p_t
Pr(S_t | o_1:t) = Pr(S_t | o_t, o_1:t-1)
  ∝ Pr(o_t | S_t, o_1:t-1) Pr(S_t | o_1:t-1)   [Bayes]
  = Pr(o_t | S_t) Pr(S_t | o_1:t-1)   [observation model]
Pr(S_t | o_1:t-1) = Σ_{i=1}^d Pr(S_t | S_t-1 = i, o_1:t-1) Pr(S_t-1 = i | o_1:t-1)   [marginalization]
  = Σ_{i=1}^d Pr(S_t | S_t-1 = i) Pr(S_t-1 = i | o_1:t-1)   [transition model]
The last line is written in terms of the filtered belief at t-1: a recursion!

Filtering
Given the available history of observations, what's the belief about the current hidden state? Pr(S_t | o_1:t) = p_t
1. One-step prediction: p̄_t = T p_t-1, i.e., p̄_t[i] = Σ_{j=1}^d Pr(S_t = i | S_t-1 = j) p_t-1[j]
2. Measurement update: p_t[i] = Pr(o_t | S_t = i) p̄_t[i]
3. Normalize the belief: p_t[i] ← p_t[i] / η, with η = Σ_{j=1}^d p_t[j]
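The three steps above translate directly into a few lines of NumPy. As a sanity check, this sketch runs them on the dishonest casino model introduced earlier (prior (0.8, 0.2) over (Fair, Loaded), observations 1, 2, 4, 6, 6, 6, 3, 6) and reproduces the filtered beliefs shown in the worked example later in the lecture:

```python
import numpy as np

# Dishonest casino model; state 0 = Fair, state 1 = Loaded.
# T[i, j] = Pr(S_t = i | S_t-1 = j).
T = np.array([[0.95, 0.05],
              [0.05, 0.95]])
# M[i, o-1] = Pr(O_t = o | S_t = i) for die rolls o = 1..6.
M = np.array([[1/6] * 6,
              [1/10] * 5 + [1/2]])

def filter_step(p_prev, obs):
    """One forward-filtering step: predict, update, normalize."""
    p_bar = T @ p_prev               # 1. one-step prediction
    p_new = M[:, obs - 1] * p_bar    # 2. measurement update
    return p_new / p_new.sum()       # 3. normalize (divide by eta)

p = np.array([0.8, 0.2])             # Pr(S_0)
for o in [1, 2, 4, 6, 6, 6, 3, 6]:
    p = filter_step(p, o)
print(np.round(p, 4))                # [0.1399 0.8601]
```

Each step costs O(d^2) for the matrix-vector product, independent of the length of the history: the recursion never stores past observations.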

Prediction
Given the available history of observations, what's the belief about a future hidden state? Pr(S_k | o_1:t), k > t
Pr(S_t+1 | o_1:t) = T p_t   [the one-step prediction from the previous slide]
Pr(S_t+2 | o_1:t) = Σ_{i=1}^d Pr(S_t+2 | S_t+1 = i) Pr(S_t+1 = i | o_1:t) = T² p_t
In general: Pr(S_k | o_1:t) = T^{k-t} p_t
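The closed form T^{k-t} p_t is just repeated application of the transition matrix. A minimal illustration with the casino transition matrix (the belief vector (0.8, 0.2) is an assumed filtered belief at time t, chosen for illustration):

```python
import numpy as np

T = np.array([[0.95, 0.05],
              [0.05, 0.95]])
p_t = np.array([0.8, 0.2])  # assumed filtered belief at time t

# Pr(S_{t+k} | o_1:t) = T^k p_t
p_next = np.linalg.matrix_power(T, 1) @ p_t    # one step ahead: [0.77, 0.23]
p_far = np.linalg.matrix_power(T, 100) @ p_t   # far ahead
print(p_next, np.round(p_far, 3))
```

With no new observations, repeated multiplication by T washes out the information in p_t: the prediction relaxes toward the chain's stationary distribution (here, (0.5, 0.5) by symmetry).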

Smoothing (forward-backward)
Given the available history of observations, what's the belief about a past hidden state? Pr(S_k | o_1:t), k < t
Pr(S_k | o_1:t) = Pr(S_k | o_1:k, o_k+1:t)
  ∝ Pr(o_k+1:t | S_k, o_1:k) Pr(S_k | o_1:k)   [Bayes]
  = Pr(o_k+1:t | S_k) Pr(S_k | o_1:k)   [the second factor comes from filtering!]
For the backward term:
Pr(o_k+1:t | S_k) = Σ_{i=1}^d Pr(o_k+1:t | S_k+1 = i, S_k) Pr(S_k+1 = i | S_k)   [marginalization]
  = Σ_{i=1}^d Pr(o_k+1, o_k+2:t | S_k+1 = i) Pr(S_k+1 = i | S_k)
  = Σ_{i=1}^d Pr(o_k+2:t | S_k+1 = i) Pr(o_k+1 | S_k+1 = i) Pr(S_k+1 = i | S_k)   [observation model]
The last line is written in terms of the backward term at k+1: a recursion!

Smoothing
Given the available history of observations, what's the belief about a past hidden state? Pr(S_k | o_1:t) = p_k,t, k < t
1. Perform filtering from 0 to k (forward): Pr(S_k | o_1:k) = p_k
2. Compute the backward recursion from t down to k: Pr(o_k+1:t | S_k) = b_k,t, with b_t,t = 1 and, for k+1 ≤ m ≤ t,
   b_m-1,t[i] = Σ_{j=1}^d b_m,t[j] Pr(o_m | S_m = j) Pr(S_m = j | S_m-1 = i)
3. Combine the two results and normalize:
   p_k,t[i] = b_k,t[i] p_k[i] / η, with η = Σ_{j=1}^d b_k,t[j] p_k[j]
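The forward and backward passes above, sketched in code on the dishonest casino model. The backward step b_m-1 = Tᵀ (b_m ⊙ likelihood) is the same sum as in step 2, written as a matrix product; the result matches the smoothed t=0 belief shown in the worked example later in the lecture:

```python
import numpy as np

# Dishonest casino model; state 0 = Fair, state 1 = Loaded.
T = np.array([[0.95, 0.05], [0.05, 0.95]])     # T[i, j] = Pr(S_t=i | S_t-1=j)
M = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])  # M[i, o-1] = Pr(O_t=o | S_t=i)

def smooth(p0, obs):
    """Forward-backward smoothing: returns Pr(S_k | o_1:t) for k = 0..t."""
    # Forward pass (filtering): fwd[k] = Pr(S_k | o_1:k).
    fwd = [p0]
    for o in obs:
        p_new = M[:, o - 1] * (T @ fwd[-1])
        fwd.append(p_new / p_new.sum())
    # Backward pass: bwd[k] = Pr(o_k+1:t | S_k), with bwd[t] = 1.
    bwd = [np.ones(len(p0))] * (len(obs) + 1)
    for m in range(len(obs), 0, -1):
        bwd[m - 1] = T.T @ (bwd[m] * M[:, obs[m - 1] - 1])
    # Combine the two passes and normalize.
    return [f * b / (f * b).sum() for f, b in zip(fwd, bwd)]

smoothed = smooth(np.array([0.8, 0.2]), [1, 2, 4, 6, 6, 6, 3, 6])
print(np.round(smoothed[0], 4))  # [0.7382 0.2618]
```

At k = t the backward term is 1, so the smoothed belief coincides with the filtered one.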

Decoding
Given the available history of observations, what's the most likely sequence of hidden states so far?
s*_0:t = arg max_{s_0:t} Pr(S_0:t = s_0:t | o_1:t)
[Trellis diagram over states 1, 2, …, d across time.]

Decoding (simple algorithm)
Given the available history of observations, what's the most likely sequence of hidden states so far?
s*_0:t = arg max_{s_0:t} Pr(S_0:t = s_0:t | o_1:t)
Pr(s_0:t | o_1:t) ∝ Pr(o_1:t | s_0:t) Pr(s_0:t)   [Bayes]
  = Pr(s_0) ∏_{i=1}^t Pr(s_i | s_i-1) Pr(o_i | s_i)   [HMM model]

Decoding (simple algorithm)
1. Compute all possible state trajectories from 0 to t: 𝒯_0:t = {s_0:t : s_i ∈ S, i = 0, …, t}. How big is 𝒯_0:t? It contains d^{t+1} trajectories.
2. Choose the most likely trajectory according to
   s*_0:t = arg max_{s_0:t ∈ 𝒯_0:t} Pr(s_0) ∏_{i=1}^t Pr(s_i | s_i-1) Pr(o_i | s_i)
Can we do better?
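The exhaustive d^{t+1} enumeration is only feasible for tiny problems, but it makes the objective concrete. A sketch on the casino model with just three observations (a shortened sequence, chosen here so the enumeration stays at 2^4 = 16 paths):

```python
import itertools
import numpy as np

T = np.array([[0.95, 0.05], [0.05, 0.95]])     # 0 = Fair, 1 = Loaded
M = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])
prior = np.array([0.8, 0.2])
obs = [6, 6, 6]                                # short illustrative sequence

def path_score(path):
    # Pr(s_0) * prod_i Pr(s_i | s_i-1) Pr(o_i | s_i)
    score = prior[path[0]]
    for i, o in enumerate(obs, start=1):
        score *= T[path[i], path[i - 1]] * M[path[i], o - 1]
    return score

# Enumerate all d^(t+1) trajectories and keep the best one.
best = max(itertools.product([0, 1], repeat=len(obs) + 1), key=path_score)
print(['Fair' if s == 0 else 'Loaded' for s in best])  # all 'Loaded'
```

Three sixes in a row are enough to overcome both the prior on Fair and the 0.05 switching penalty, so the best path starts Loaded and stays there.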

Decoding (the Viterbi algorithm)
Given the available history of observations, what's the most likely sequence of hidden states so far?
s*_0:t = arg max_{s_0:t} Pr(S_0:t = s_0:t | o_1:t)
Pr(S_0:t | o_1:t) = Pr(S_0:t | o_1:t-1, o_t)
  ∝ Pr(o_t | S_t) Pr(S_t, S_0:t-1 | o_1:t-1)
  = Pr(o_t | S_t) Pr(S_t | S_t-1) Pr(S_0:t-1 | o_1:t-1)
The last factor is the same quantity at t-1: a recursion! From all paths arriving at s_t-1, record only the most likely one:
max_{s_0:t} Pr(s_0:t | o_1:t) ∝ max_{s_t, s_t-1} Pr(o_t | s_t) Pr(s_t | s_t-1) max_{s_0:t-1} Pr(s_0:t-1 | o_1:t-1)

Decoding (the Viterbi algorithm)
δ_k[s]: the most likely path ending in s_k = s; l_k[s]: the likelihood of δ_k[s] (an unnormalized probability).
Initialize: δ_0[s] = (s), l_0[s] = Pr(S_0 = s).
1. Expand the paths in δ_k according to the transition model:
   pred_k+1[s] = arg max_{s' = 1, …, d} Pr(s_k+1 = s | s_k = s') l_k[s']
   δ_k+1[s] = δ_k[pred_k+1[s]].append(s)
2. Update the likelihood:
   l_k+1[s] = Pr(o_k+1 | s_k+1 = s) Pr(s_k+1 = s | s_k = pred_k+1[s]) l_k[pred_k+1[s]]
3. When k = t, choose the δ_t[s] with the highest l_t[s].
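A sketch of the recursion above in code, run on the dishonest casino model with the full observation sequence; it recovers the all-Loaded decoding shown on the example slide:

```python
import numpy as np

T = np.array([[0.95, 0.05], [0.05, 0.95]])     # 0 = Fair, 1 = Loaded
M = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])

def viterbi(prior, obs):
    """Most likely hidden-state sequence s_0:t given o_1:t."""
    d = len(prior)
    paths = [[s] for s in range(d)]            # delta_k[s]
    lik = prior.copy()                         # l_k[s], unnormalized
    for o in obs:
        scores = T * lik                       # scores[s, s'] = Pr(s | s') * l_k[s']
        pred = scores.argmax(axis=1)           # best predecessor for each state s
        lik = M[:, o - 1] * scores.max(axis=1)
        paths = [paths[pred[s]] + [s] for s in range(d)]
    return paths[int(lik.argmax())]

path = viterbi(np.array([0.8, 0.2]), [1, 2, 4, 6, 6, 6, 3, 6])
print(['Fair' if s == 0 else 'Loaded' for s in path])  # nine 'Loaded' entries
```

The cost is O(t d^2), versus O(d^{t+1}) for the exhaustive enumeration. In practice the products l_k underflow for long sequences, so implementations typically work with log-probabilities and sums instead.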

Dishonest casino example
Hidden states: Fair (F) and Loaded (L). Observations: die rolls 1-6.
Transition model T:
      F_k-1  L_k-1
F_k   0.95   0.05
L_k   0.05   0.95
Observation model M:
      1     2     3     4     5     6
F_k   1/6   1/6   1/6   1/6   1/6   1/6
L_k   1/10  1/10  1/10  1/10  1/10  1/2

Dishonest casino example
Pr(S_0) = (0.8, 0.2) over (Fair, Loaded). Observations = 1, 2, 4, 6, 6, 6, 3, 6. Model T, M as above.
       Filtering        Smoothing
       Fair    Loaded   Fair    Loaded
t=0    0.8000  0.2000   0.7382  0.2618
t=1    0.8480  0.1520   0.6940  0.3060
t=2    0.8789  0.1211   0.6116  0.3884
t=3    0.8981  0.1019   0.4679  0.5321
t=4    0.6688  0.3312   0.2229  0.7771
t=5    0.3843  0.6157   0.1444  0.8556
t=6    0.1793  0.8207   0.1265  0.8735
t=7    0.3088  0.6912   0.1449  0.8551
t=8    0.1399  0.8601   0.1399  0.8601
Coincidence? No: at k = t there are no future observations (b_t,t = 1), so smoothing reduces to filtering.

Dishonest casino example
Pr(S_0) = (0.8, 0.2) over (Fair, Loaded). Observations = 1, 2, 4, 6, 6, 6, 3, 6.
Filtering (MAP): ['Fair', 'Fair', 'Fair', 'Fair', 'Fair', 'Loaded', 'Loaded', 'Loaded', 'Loaded']
Smoothing (MAP): ['Fair', 'Fair', 'Fair', 'Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded']
Decoding:
t=0: ['Fair']
t=1: ['Fair', 'Fair']
t=2: ['Fair', 'Fair', 'Fair']
t=3: ['Fair', 'Fair', 'Fair', 'Fair']
t=4: ['Fair', 'Fair', 'Fair', 'Fair', 'Fair']
t=5: ['Fair', 'Fair', 'Fair', 'Fair', 'Fair', 'Fair']
t=6: ['Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded']
t=7: ['Fair', 'Fair', 'Fair', 'Fair', 'Fair', 'Fair', 'Fair', 'Fair']
t=8: ['Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded', 'Loaded']

Borodovsky & Ekisheva (2006), pp. 80-81
DNA sequence: A G T C A T G …
Hidden states: H (high genetic content: coding DNA) and L (low genetic content: non-coding DNA).
Transition model T:
      H_k-1  L_k-1
H_k   0.5    0.4
L_k   0.5    0.6
Observation model M:
      A     C     G     T
H_k   0.2   0.3   0.3   0.2
L_k   0.3   0.2   0.2   0.3

Borodovsky & Ekisheva (2006), pp. 80-81
Pr(S_0) = (0.5, 0.5) over (High, Low). Observations = G, G, C, A, C, T, G, A, A. Model T, M as above.
       Filtering        Smoothing
       H       L        H       L
t=0    0.5000  0.5000   0.5113  0.4887
t=1    0.5510  0.4490   0.5620  0.4380
t=2    0.5561  0.4439   0.5653  0.4347
t=3    0.5566  0.4434   0.5478  0.4522
t=4    0.3582  0.6418   0.3668  0.6332
t=5    0.5368  0.4632   0.5278  0.4722
t=6    0.3563  0.6437   0.3648  0.6352
t=7    0.5366  0.4634   0.5259  0.4741
t=8    0.3563  0.6437   0.3474  0.6526
t=9    0.3398  0.6602   0.3398  0.6602

Borodovsky & Ekisheva (2006), pp. 80-81
Pr(S_0) = (0.5, 0.5) over (High, Low). Observations = G, G, C, A, C, T, G, A, A.
Filtering (MAP): ['H/L', 'H', 'H', 'H', 'L', 'H', 'L', 'H', 'L', 'L']   (at t=0 the belief is tied at 0.5/0.5)
Smoothing (MAP): ['H', 'H', 'H', 'H', 'L', 'H', 'L', 'H', 'L', 'L']
Decoding:
t=0: ['H']
t=1: ['H', 'H']
t=2: ['H', 'H', 'H']
t=3: ['H', 'H', 'H', 'H']
t=4: ['H', 'H', 'H', 'H', 'L']
t=5: ['H', 'H', 'H', 'H', 'L', 'L']
t=6: ['H', 'H', 'H', 'H', 'L', 'L', 'L']
t=7: ['H', 'H', 'H', 'H', 'L', 'L', 'L', 'H']
t=8: ['H', 'H', 'H', 'H', 'L', 'L', 'L', 'L', 'L']
t=9: ['H', 'H', 'H', 'H', 'L', 'L', 'L', 'L', 'L', 'L']