CS532, Winter 2010 Hidden Markov Models


Dr. Alan Fern, afern@eecs.oregonstate.edu
March 8, 2010

1 Hidden Markov Models

The world is dynamic and evolves over time. An intelligent agent in such a world needs to be able to make observations about the world using its sensors and to use those observations to make inferences about the state of affairs. Beyond artificial intelligence, there are many applications in engineering and science where noisy observations of a system or process are available and we would like to infer something about the underlying state of the system. This general problem of making inferences about the hidden state of a system from noisy observations is one of the primary focuses of the area of probabilistic temporal reasoning.

To formalize this type of reasoning process we will model temporal processes using the notions of states and observations over time:

- X_t is the set of state variables at time t of the process of interest. For example, X_t = {hungry_t, wet_t, tired_t} might be state variables describing a baby.
- E_t is the set of evidence variables, or observations, at time t. For example, E_t = {cryloud_t, crysoft_t, laugh_t, wavearm_t, squirming_t}.

As time passes the process state evolves and each state produces a corresponding observation in terms of the evidence variables. This gives rise to a sequence of observations E_1, ..., E_T and states X_1, ..., X_T. We will assume the ability to directly observe the evidence variables, but the state variables are often hidden, even though they are the variables of primary interest. The goal then is to make inferences about the state variables given the observation sequence.

In order to make such inferences, it is common to assume that:

- the process follows a probabilistic model,
- the evolution of process states is not affected by the evidence,
- the following first-order Markov condition holds: for all t, P(X_t | X_0, ..., X_{t-1}) = P(X_t | X_{t-1}),

- the process is stationary, i.e. the transition model is the same at every time step: for all t and v, P(X_t | X_{t-1}) = P(X_v | X_{v-1}).

Under these assumptions we can encode the state evolution using just a prior on X_0, P(X_0), which gives the probability of the initial state, and P(X_t | X_{t-1}), which gives the probability of the state being X_t when the previous state was X_{t-1}.

It is also common to assume that the evidence E_t at time t depends only on X_t:

P(E_t | E_{0:t-1}, X_{0:t}) = P(E_t | X_t)

Also, assume that the observation model is stationary. That is, for all t and v, P(E_t | X_t) = P(E_v | X_v).

From the above we will specify a first-order Markov process using the following three components:

Initial State Prior: P(X_0)
Transition Prob.:   P(X_t | X_{t-1})
Observation Prob.:  P(E_t | X_t)

Given this model we draw a graphical representation of the first-order Markov process as a chain of state nodes with an observation node attached to each state:

X_0 → X_1 → X_2 → ... → X_T
       |     |           |
      E_1   E_2   ...   E_T

Under this model the joint probability of a pair of observation and state sequences is given by the following:

P(X_0, ..., X_T, E_1, ..., E_T) = P(X_0) ∏_{t=1}^{T} P(X_t | X_{t-1}) P(E_t | X_t)

A Hidden Markov Model (HMM) is a first-order Markov process where X_t and E_t each contain a single discrete variable. HMMs are widely used in speech recognition, computer vision, grammatical inference, bioinformatics, communications, etc. Note that even when X_t and E_t contain multiple variables, we can think of them as forming a single joint variable with many values. Thus, HMMs are really quite general from a representational perspective.
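As a concrete illustration, the three model components and the joint probability factorization can be written down directly in code. The sketch below uses a hypothetical two-state, two-observation HMM; all of the numbers are made up for illustration and are not from the notes.

```python
# A minimal sketch of a discrete HMM and its joint probability.
# The two states, two observation symbols, and all numbers are
# hypothetical, chosen only to illustrate the three model components.

prior = [0.6, 0.4]            # Initial state prior P(X_0)
trans = [[0.7, 0.3],          # Transition prob. P(X_t = j | X_{t-1} = i)
         [0.4, 0.6]]
obs   = [[0.9, 0.1],          # Observation prob. P(E_t = e | X_t = i)
         [0.2, 0.8]]

def joint_prob(states, evidence):
    """P(X_0..X_T, E_1..E_T) = P(X_0) * prod_t P(X_t|X_{t-1}) P(E_t|X_t).

    states has length T+1 (values of X_0..X_T); evidence has length T
    (values of E_1..E_T).
    """
    p = prior[states[0]]
    for t in range(1, len(states)):
        p *= trans[states[t - 1]][states[t]] * obs[states[t]][evidence[t - 1]]
    return p
```

Summing joint_prob over all state and evidence sequences of a fixed length yields 1, which is a useful sanity check that the factorization defines a proper joint distribution.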

2 Inference Problems for HMMs

We are typically interested in four types of temporal inference problems.

2.1 Filtering or Monitoring

Given evidence up to time t, what is the distribution over X_t:

P(X_t | e_{1:t})

Ideally, we would like an incremental method for computing P(X_{t+1} | e_{1:t+1}) from P(X_t | e_{1:t}), since filtering will often be done on streaming data and must be fast.

2.2 Prediction

P(X_{t+k} | e_{1:t})

This inference problem is similar to what a meteorologist must do each day: predict the future weather from past and current observations. The accuracy of the predictions will typically degrade as the value of k increases.

2.3 Smoothing

P(X_t | e_{1:t+k})

Here we want to get the distribution of X_t given both past and future observations. This is similar to filtering, but allows for the use of future observations. This inference scenario arises when one is storing evidence sequences and wants to compute the most accurate distribution over X_t possible. By reasoning about the time both before and after t, higher accuracy can be achieved than in pure filtering, where evidence after time t is not considered.

2.4 Most Likely Sequence

arg max_{x_{0:t}} P(x_{0:t} | e_{1:t})

Here we want to compute the most likely state sequence for a given observation sequence.

3 Filtering with HMMs

We want an online algorithm that, given P(X_t | e_{1:t}), can efficiently compute P(X_{t+1} | e_{1:t+1}) upon observing e_{t+1}. Note that for t = 0 the distribution is given by the initial state distribution of the model.

P(X_{t+1} | e_{1:t+1}) = P(X_{t+1} | e_{1:t}, e_{t+1})
  = P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t}) / P(e_{t+1} | e_{1:t})
  = α P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t})
  = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})

The second factor can be obtained from the current filtering distribution:

P(X_{t+1} | e_{1:t}) = Σ_{x_t} P(X_{t+1}, x_t | e_{1:t}) = Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})

Putting these together gives the filtering update:

P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})

For those familiar with Bayes net inference, this algorithm is just a special case of belief propagation. What is the complexity of each filtering step? We must compute a summation over the values of X_t for each value of X_{t+1}, so the time complexity is O(k^2), where k is the number of states. The space complexity is O(k), since we just need to store P(X_{t+1} | e_{1:t+1}). The book shows an example calculation on page 543.
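The filtering update above is straightforward to implement. The following sketch performs one step of the update using a hypothetical two-state model (the numbers are illustrative, not from the notes):

```python
# One step of HMM filtering: P(X_{t+1} | e_{1:t+1}) from P(X_t | e_{1:t}).
# The model numbers below are hypothetical, for illustration only.

prior = [0.6, 0.4]            # P(X_0)
trans = [[0.7, 0.3],          # P(X_{t+1} = j | X_t = i)
         [0.4, 0.6]]
obs   = [[0.9, 0.1],          # P(E_t = e | X_t = i)
         [0.2, 0.8]]

def filter_step(belief, e):
    """Given belief[x] = P(X_t = x | e_{1:t}) and new evidence e = e_{t+1},
    return the updated belief P(X_{t+1} | e_{1:t+1})."""
    k = len(belief)
    # One-step prediction: sum_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
    predicted = [sum(trans[x][y] * belief[x] for x in range(k))
                 for y in range(k)]
    # Weight by the observation model and normalize (the alpha constant)
    unnorm = [obs[y][e] * predicted[y] for y in range(k)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

belief = filter_step(prior, 1)   # P(X_1 | e_1 = 1), starting from P(X_0)
```

Each call touches every (x_t, x_{t+1}) pair exactly once, matching the O(k^2) time and O(k) space analysis above.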

4 Prediction with HMMs

We can get a recursive form for prediction, allowing us to easily compute the prediction for t + k + 1 from the prediction for t + k:

P(X_{t+k+1} | e_{1:t}) = Σ_{x_{t+k}} P(X_{t+k+1} | x_{t+k}) P(x_{t+k} | e_{1:t})

The base case of k = 0 corresponds exactly to the filtering problem P(X_t | e_{1:t}), which we already described.

5 Smoothing with HMMs

We want to compute P(X_k | e_{1:t}) for each 1 ≤ k ≤ t. Note that P(X_k | e_{1:t}) is just a marginal distribution (or belief) at X_k. This inference problem can be solved with a general technique for graphical models known as belief propagation. We will derive this algorithm for the special case of HMMs; the resulting algorithm is known as the forward-backward algorithm.

P(X_k | e_{1:t}) = P(X_k | e_{1:k}, e_{k+1:t})
  = P(X_k | e_{1:k}) P(e_{k+1:t} | X_k, e_{1:k}) / P(e_{k+1:t} | e_{1:k})
  = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k, e_{1:k})
  = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k)

The above shows that we need to compute two factors for the smoothing problem. We can compute the first factor P(X_k | e_{1:k}) for each k by running a filtering pass over the sequence as already described. For the second factor we can derive:

P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1:t}, x_{k+1} | X_k)
  = Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}, X_k) P(x_{k+1} | X_k)
  = Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)
  = Σ_{x_{k+1}} P(e_{k+1}, e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)
  = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)

The term P(e_{k+2:t} | x_{k+1}) can then be computed recursively. In other words, P(e_{k+1:t} | X_k) is computed for each k in a backward pass over the sequence, initialized with P(e_{t+1:t} | X_t) = 1 for all values of X_t. The book provides an example on page 545.
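Combining the forward filtering pass with the backward recursion gives the forward-backward algorithm. Below is a sketch under the same kind of hypothetical two-state model used earlier (the numbers are illustrative, not from the notes):

```python
# Forward-backward smoothing for a discrete HMM.
# The model numbers below are hypothetical, for illustration only.

prior = [0.6, 0.4]            # P(X_0)
trans = [[0.7, 0.3],          # P(X_{k+1} = j | X_k = i)
         [0.4, 0.6]]
obs   = [[0.9, 0.1],          # P(E_k = e | X_k = i)
         [0.2, 0.8]]

def forward_backward(evidence):
    """Return smoothed marginals P(X_k | e_{1:t}) for k = 0..t."""
    t, n = len(evidence), len(prior)
    # Forward pass: fwd[k][x] = P(X_k = x | e_{1:k})
    fwd = [prior]
    for e in evidence:
        pred = [sum(trans[x][y] * fwd[-1][x] for x in range(n))
                for y in range(n)]
        un = [obs[y][e] * pred[y] for y in range(n)]
        z = sum(un)
        fwd.append([p / z for p in un])
    # Backward pass: bwd[k][x] = P(e_{k+1:t} | X_k = x), base case all ones
    bwd = [[1.0] * n for _ in range(t + 1)]
    for k in range(t - 1, -1, -1):
        e = evidence[k]  # this entry holds the value of e_{k+1}
        bwd[k] = [sum(obs[y][e] * bwd[k + 1][y] * trans[x][y]
                      for y in range(n))
                  for x in range(n)]
    # Combine: P(X_k | e_{1:t}) is proportional to fwd[k] * bwd[k]
    smoothed = []
    for k in range(t + 1):
        un = [fwd[k][x] * bwd[k][x] for x in range(n)]
        z = sum(un)
        smoothed.append([p / z for p in un])
    return smoothed
```

Since the backward message at k = t is all ones, the smoothed marginal at the final step equals the filtered one, as the derivation predicts.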

6 Most Likely Sequence

We want to compute

X*_{1:t} = arg max_{x_{1:t}} P(x_{1:t} | e_{1:t})

This is not the same as computing arg max_{x_k} P(X_k | e_{1:t}) separately for each k.[1] We will use the Viterbi algorithm for this problem. The Viterbi algorithm is simply the max-product algorithm applied using variable elimination order X_1, ..., X_n. See the book for detailed equations.

arg max_{x_{0:t}} P(x_{0:t} | e_{1:t})
  = arg max_{x_{0:t}} log P(x_{0:t}, e_{1:t})
  = arg max_{x_{0:t}} log [P(x_0) P(x_1 | x_0) P(e_1 | x_1) ... P(x_t | x_{t-1}) P(e_t | x_t)]
  = arg max_{x_{0:t}} log P(x_0) + log P(x_1 | x_0) P(e_1 | x_1) + ... + log P(x_t | x_{t-1}) P(e_t | x_t)
  = arg max_{x_{0:t}} W_0(x_0) + W_1(x_0, x_1) + ... + W_t(x_{t-1}, x_t)

The Viterbi algorithm is often visualized using a Viterbi trellis:

[Figure: the Viterbi trellis. A Start node connects to the states of X_0 with weights W_0(x_0); each column of state nodes at time i connects to the next column with edge weights W_i(x_{i-1}, x_i); an End node closes the graph at time t.]

The problem can thus be viewed as finding the best path in the trellis from Start to End (equivalently, a shortest path under negated weights).

[1] You will show this in the homework.
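The trellis view translates directly into code: at each column we keep, for every state, the best log-score of any path ending there plus a backpointer to its predecessor. A sketch under a hypothetical two-state model (the numbers are illustrative, not from the notes):

```python
from math import log

# Viterbi: most likely state sequence x_0..x_T given evidence e_1..e_T.
# The model numbers below are hypothetical, for illustration only.

prior = [0.6, 0.4]            # P(X_0)
trans = [[0.7, 0.3],          # P(X_{t+1} = j | X_t = i)
         [0.4, 0.6]]
obs   = [[0.9, 0.1],          # P(E_t = e | X_t = i)
         [0.2, 0.8]]

def viterbi(evidence):
    n = len(prior)
    # score[x] = max over x_{0:t-1} of log P(x_{0:t}, e_{1:t}) with x_t = x
    score = [log(p) for p in prior]
    backptrs = []
    for e in evidence:
        new_score, ptrs = [], []
        for y in range(n):
            # Best predecessor for state y (the max-product step)
            best = max(range(n), key=lambda x: score[x] + log(trans[x][y]))
            ptrs.append(best)
            new_score.append(score[best] + log(trans[best][y]) + log(obs[y][e]))
        score, backptrs = new_score, backptrs + [ptrs]
    # Trace the best path backward through the trellis
    x = max(range(n), key=lambda y: score[y])
    path = [x]
    for ptrs in reversed(backptrs):
        x = ptrs[x]
        path.append(x)
    path.reverse()
    return path
```

Working in the log domain avoids numerical underflow on long sequences, and the backpointers make the final trace-back a single O(t) pass.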