Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

i.i.d. to sequential data

So far we assumed independent, identically distributed (i.i.d.) data. Sequential (non-i.i.d.) data is different: time-series data such as speech, characters in a sentence, or base pairs along a DNA strand.

Markov Models

Joint distribution of n arbitrary random variables, by the chain rule:
p(X_1, ..., X_n) = p(X_1) p(X_2 | X_1) p(X_3 | X_1, X_2) ... p(X_n | X_1, ..., X_{n-1})
Markov assumption (m-th order): the current observation depends only on the past m observations:
p(X_i | X_1, ..., X_{i-1}) = p(X_i | X_{i-m}, ..., X_{i-1})

Markov Models

Markov assumption:
1st order: p(X_i | X_1, ..., X_{i-1}) = p(X_i | X_{i-1})
2nd order: p(X_i | X_1, ..., X_{i-1}) = p(X_i | X_{i-2}, X_{i-1})

Markov Models

Number of parameters in a stationary model with K-ary variables:
1st order: O(K^2)
m-th order: O(K^(m+1))
(n-1)-th order: O(K^n), i.e. no assumptions; a complete (but directed) graph
A homogeneous/stationary Markov model is one whose probabilities don't depend on the position n.
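
As a quick worked check (my example, not from the slides): with K = 6 die faces, a stationary 1st-order model needs only a 6 x 6 transition table, i.e. K(K-1) = 30 free parameters; a 2nd-order model already needs K^2(K-1) = 180; and dropping the Markov assumption entirely makes the count grow exponentially with the sequence length.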

Hidden Markov Models

Distributions that characterize sequential data with few parameters, but are not limited by strong Markov assumptions.
(Graphical model: a hidden chain S_1 -> S_2 -> ... -> S_{T-1} -> S_T, with each observation O_t emitted by its state S_t.)
Observation space: O_t ∈ {y_1, y_2, ..., y_M}
Hidden states: S_t ∈ {1, ..., K}

Hidden Markov Models

1st-order Markov assumption on the hidden states {S_t}, t = 1, ..., T (can be extended to higher order).
Note: once the hidden states are marginalized out, O_t depends on all previous observations {O_{t-1}, ..., O_1}.

Hidden Markov Models: Parameters

Stationary/homogeneous Markov model (parameters independent of time t):
Initial probabilities: p(S_1 = i) = π_i
Transition probabilities: p(S_t = j | S_{t-1} = i) = p_ij
Emission probabilities: p(O_t = y | S_t = i) = b_i(y)

HMM Example: The Dishonest Casino

A casino has two dice:
Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
The casino player switches back and forth between the fair and loaded die with 5% probability.

HMM Problems

HMM Example

A possible hidden state sequence underlying a run of observed rolls, e.g.: L F F F L L L F.

State Space Representation

Switch between F and L with 5% probability (two-state diagram: F and L, each with a 0.95 self-loop and a 0.05 transition to the other state).
HMM Parameters
Initial probabilities: P(S_1 = L) = P(S_1 = F) = 0.5
Transition probabilities: P(S_t = L | S_{t-1} = L) = P(S_t = F | S_{t-1} = F) = 0.95; P(S_t = F | S_{t-1} = L) = P(S_t = L | S_{t-1} = F) = 0.05
Emission probabilities: P(O_t = y | S_t = F) = 1/6 for y = 1, ..., 6; P(O_t = y | S_t = L) = 1/10 for y = 1, ..., 5 and P(O_t = 6 | S_t = L) = 1/2
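
To make this parameterization concrete, here is a minimal Python/NumPy sketch of the casino HMM with a small sampler (the names pi, A, B and the sampler are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["F", "L"]                 # hidden states: fair, loaded
pi = np.array([0.5, 0.5])           # initial probabilities P(S_1)
A = np.array([[0.95, 0.05],         # transitions: A[i, j] = P(S_t = j | S_{t-1} = i)
              [0.05, 0.95]])
B = np.array([[1/6] * 6,            # emissions: B[i, y] = P(O_t = y+1 | S_t = i)
              [1/10] * 5 + [1/2]])

def sample(T):
    """Sample a hidden state path and an observation sequence of length T."""
    s = rng.choice(2, p=pi)
    path, obs = [], []
    for _ in range(T):
        path.append(s)
        obs.append(rng.choice(6, p=B[s]))  # 0-indexed roll; add 1 for the die face
        s = rng.choice(2, p=A[s])
    return path, obs
```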

HMM Algorithms

Evaluation: what is the probability of the observed sequence? (Forward algorithm)
Decoding: what is the probability that the third roll was loaded, given the observed sequence? (Forward-Backward algorithm) What is the most likely die sequence, given the observed sequence? (Viterbi algorithm)
Learning: under what parameterization is the observed sequence most probable? (Baum-Welch algorithm, an instance of EM)

Evaluation Problem

Given the HMM parameters and an observation sequence O_1, ..., O_T, find the probability of the observed sequence:
p(O_1, ..., O_T) = Σ_{S_1, ..., S_T} p(O_1, ..., O_T, S_1, ..., S_T)
This requires summing over all possible hidden state values at all times: K^T terms, an exponential number! Instead, compute α_T^k recursively, where α_t^k = p(O_1, ..., O_t, S_t = k) is the forward probability.

Forward Probability

Compute the forward probability α_t^k = p(O_1, ..., O_t, S_t = k) recursively over t:
α_t^k = Σ_i p(O_1, ..., O_t, S_{t-1} = i, S_t = k)   (introduce S_{t-1})
= p(O_t | S_t = k) Σ_i p(S_t = k | S_{t-1} = i) α_{t-1}^i   (chain rule + Markov assumption)

Forward Algorithm

Can compute α_t^k for all k, t using dynamic programming:
Initialize: α_1^k = p(O_1 | S_1 = k) p(S_1 = k) for all k
Iterate: for t = 2, ..., T: α_t^k = p(O_t | S_t = k) Σ_i α_{t-1}^i p(S_t = k | S_{t-1} = i), for all k
Termination: p(O_1, ..., O_T) = Σ_k α_T^k
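
A direct NumPy sketch of this recursion, reusing the pi, A, B arrays from the casino example above (function and variable names are mine):

```python
def forward(obs, pi, A, B):
    """Forward pass; row t holds alpha_t^k for every state k (0-indexed t)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]                      # initialize: alpha_1^k
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)  # emission * sum_i alpha_{t-1}^i p(k|i)
    return alpha, alpha[-1].sum()                     # termination: sum_k alpha_T^k = p(O_1..O_T)
```

For long sequences the alphas underflow; practical implementations rescale each row or work in log space.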

Decoding Problem 1

Given the HMM parameters and an observation sequence, find the probability that the hidden state at time t was k:
p(S_t = k, O_1, ..., O_T) = α_t^k β_t^k, where β_t^k = p(O_{t+1}, ..., O_T | S_t = k)
Compute α_t^k and β_t^k recursively.

Backward Probability

Compute the backward probability β_t^k = p(O_{t+1}, ..., O_T | S_t = k) recursively over t:
β_t^k = Σ_i p(O_{t+1}, ..., O_T, S_{t+1} = i | S_t = k)   (introduce S_{t+1})
= Σ_i p(S_{t+1} = i | S_t = k) p(O_{t+1} | S_{t+1} = i) β_{t+1}^i   (chain rule + Markov assumption)

Backward Algorithm

Can compute β_t^k for all k, t using dynamic programming:
Initialize: β_T^k = 1 for all k
Iterate: for t = T-1, ..., 1: β_t^k = Σ_i p(S_{t+1} = i | S_t = k) p(O_{t+1} | S_{t+1} = i) β_{t+1}^i, for all k
Termination: p(S_t = k | O_1, ..., O_T) = α_t^k β_t^k / Σ_i α_t^i β_t^i
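
A matching sketch of the backward pass, plus the state posteriors needed for decoding problem 1 (again with my own names, under the same conventions as the forward sketch):

```python
def backward(obs, A, B):
    """Backward pass; row t holds beta_t^k for every state k (0-indexed t)."""
    T, K = len(obs), A.shape[0]
    beta = np.ones((T, K))                              # initialize: beta_T^k = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # sum_i p(i|k) p(O_{t+1}|i) beta_{t+1}^i
    return beta

def state_posteriors(obs, pi, A, B):
    """gamma[t, k] = p(S_t = k | O_1..O_T), via forward-backward."""
    alpha, _ = forward(obs, pi, A, B)
    beta = backward(obs, A, B)
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

For the casino, state_posteriors(obs, pi, A, B)[2] would answer "how likely was each die on the third roll?"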

Most Likely State vs. Most Likely Sequence

Most likely state assignment at time t, e.g.: which die was most likely used by the casino in the third roll, given the observed sequence?
Most likely assignment of the entire state sequence, e.g.: what was the most likely sequence of dice used by the casino, given the observed sequence?
Not the same solution! The most likely assignment (MLA) of x alone need not agree with the value of x in the MLA of the pair (x, y).

Decoding Problem 2

Given the HMM parameters and an observation sequence, find the most likely assignment of the state sequence:
argmax_{S_1, ..., S_T} p(S_1, ..., S_T | O_1, ..., O_T)
Compute recursively V_T^k, the probability of the most likely sequence of states ending at state S_T = k.

Viterbi Decoding

Compute the probability V_t^k recursively over t:
V_t^k = max_{S_1, ..., S_{t-1}} p(O_1, ..., O_t, S_1, ..., S_{t-1}, S_t = k)
= p(O_t | S_t = k) max_i p(S_t = k | S_{t-1} = i) V_{t-1}^i   (Bayes rule + Markov assumption)

Viterbi Algorithm

Can compute V_t^k for all k, t using dynamic programming:
Initialize: V_1^k = p(O_1 | S_1 = k) p(S_1 = k) for all k
Iterate: for t = 2, ..., T: V_t^k = p(O_t | S_t = k) max_i p(S_t = k | S_{t-1} = i) V_{t-1}^i, for all k, storing the maximizing i
Termination: max_k V_T^k is the probability of the most likely state sequence
Traceback: start from argmax_k V_T^k and follow the stored pointers back to t = 1
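
A sketch of the same recursion in code; working in log space to avoid underflow is my addition, not something shown on the slides:

```python
def viterbi(obs, pi, A, B):
    """Most likely hidden state path, computed in log space."""
    T, K = len(obs), len(pi)
    logV = np.log(pi) + np.log(B[:, obs[0]])    # initialize: log V_1^k
    back = np.zeros((T, K), dtype=int)          # argmax pointers for traceback
    for t in range(1, T):
        scores = logV[:, None] + np.log(A)      # scores[i, k] = log V_{t-1}^i + log p(k|i)
        back[t] = scores.argmax(axis=0)
        logV = np.log(B[:, obs[t]]) + scores.max(axis=0)
    path = [int(logV.argmax())]                 # termination, then traceback
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```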

Computational Complexity

What is the running time of Forward, Backward, and Viterbi? O(K^2 T): linear in T, instead of the O(K^T) exponential in T needed for brute-force enumeration!
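
For concreteness (my numbers, not from the slides): the casino HMM has K = 2 states, so decoding, say, T = 300 rolls costs on the order of K^2 T = 1200 operations, whereas enumerating all 2^300 state sequences is hopeless.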

Learning Problem

Given an HMM with unknown parameters and an observation sequence, find the parameters that maximize the likelihood of the observed data; the hidden variables are the state sequence S_1, ..., S_T. But the likelihood doesn't factorize, since the observations are not i.i.d. EM (Baum-Welch) algorithm:
E-step: fix the parameters, find the expected state assignments.
M-step: fix the expected state assignments, update the parameters.

Baum-Welch (EM) Algorithm

Start with a random initialization of the parameters.
E-step: fix the parameters and find the expected state assignments via the Forward-Backward algorithm:
γ_t^i = p(S_t = i | O_1, ..., O_T) = α_t^i β_t^i / Σ_j α_t^j β_t^j
ξ_t^ij = p(S_t = i, S_{t+1} = j | O_1, ..., O_T) ∝ α_t^i p_ij p(O_{t+1} | S_{t+1} = j) β_{t+1}^j

Baum-Welch (EM) Algorithm

E-step quantities as expected counts:
Σ_{t=1}^{T-1} γ_t^i = expected # times in state i (= expected # transitions out of state i)
Σ_{t=1}^{T-1} ξ_t^ij = expected # transitions from state i to state j
M-step: update the parameters from these counts:
π_i = γ_1^i
p_ij = Σ_{t=1}^{T-1} ξ_t^ij / Σ_{t=1}^{T-1} γ_t^i
b_i(y) = Σ_{t=1}^{T} γ_t^i 1[O_t = y] / Σ_{t=1}^{T} γ_t^i
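
Putting the pieces together, one EM iteration might look like the following sketch (single observation sequence, no numerical rescaling; it reuses the forward and backward functions above, and the names are mine):

```python
def baum_welch_step(obs, pi, A, B):
    """One EM iteration: E-step via forward-backward, M-step via expected counts."""
    K, M = len(pi), B.shape[1]
    alpha, evidence = forward(obs, pi, A, B)
    beta = backward(obs, A, B)
    gamma = alpha * beta / evidence                   # gamma[t, i] = p(S_t = i | O)
    xi = (alpha[:-1, :, None] * A[None]               # xi[t, i, j] = p(S_t=i, S_{t+1}=j | O)
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / evidence
    # M-step: re-estimate parameters from expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros((K, M))
    for y in range(M):
        new_B[:, y] = gamma[np.array(obs) == y].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```

Iterating until the evidence p(O_1, ..., O_T) stops improving yields a local maximum of the likelihood; like any EM procedure, Baum-Welch is sensitive to its random initialization.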

Some Connections

HMM vs. Linear Dynamical Systems (Kalman filters):
HMM: states are discrete; observations are discrete or continuous.
Linear dynamical systems: observations and states are multivariate Gaussians whose means are linear functions of their parent states (see Bishop, Sec. 13.3).

HMMs: What You Should Know

Useful for modeling sequential data with few parameters, using discrete hidden states that satisfy the Markov assumption.
Representation: initial probabilities, transition probabilities, emission probabilities; state space representation.
Algorithms for inference and learning in HMMs:
Computing the marginal likelihood of the observed sequence: forward algorithm.
Predicting a single hidden state: forward-backward.
Predicting an entire sequence of hidden states: Viterbi.
Learning HMM parameters: an EM algorithm known as Baum-Welch.