Hidden Markov Models Hamid R. Rabiee


Hidden Markov Models (HMMs) In the previous slides, we have seen that in many cases the underlying behavior of nature can be modeled as a Markov process. However, most of the time these states are not directly observable; instead, there exist observations that give information about the sequence of states. We assume that, conditioned on the state Z_k, the observation X_k is independent of all other states and observations.

HMM Applications Some applications of HMMs are:
- Speech recognition and processing: recognizing spoken words and phrases, speech synthesis, and many other applications in this field
- Text processing: parsing raw records into structured records, part-of-speech tagging
- Bioinformatics: protein sequence prediction
- Financial: stock market forecasts (price pattern prediction), comparison shopping services

Example (Tracking) Recall the tracking problem discussed in previous slides. We modeled the object movement by a Markov process:
S_t = A_t S_{t-1} + v_t,  v_t ~ N(0, Σ),  with state S_t = (X_t, Y_t, Ẋ_t, Ẏ_t) (position and velocity).
We don't know the exact location of the object, but there is some observable information about the state sequence (obtained via sensors over time). The observation equation depends on the sensor; a simple case is:
X_t = C S_t + D w_t,  w_t ~ N(0, Σ_e),
where C and D are two fixed matrices and w_t is the sensor error at time t (modeled as additive Gaussian noise). To solve the tracking problem we need to find the posterior P(S_1, S_2, … | X_1, X_2, …).
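As a small illustration (not part of the original slides), the following sketch simulates this kind of linear-Gaussian state-space model; the constant-velocity transition matrix and the values of C, D, and the noise covariances are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 1.0
A = np.array([[1, 0, dt, 0],        # constant-velocity motion model:
              [0, 1, 0, dt],        # S_t = A S_{t-1} + v_t
              [0, 0, 1,  0],
              [0, 0, 0,  1]])
C = np.eye(2, 4)                    # sensor observes position only: X_t = C S_t + D w_t
D = np.eye(2)
Sigma   = 0.01 * np.eye(4)          # process-noise covariance
Sigma_e = 0.25 * np.eye(2)          # sensor-noise covariance

S = np.array([0.0, 0.0, 1.0, 0.5])  # initial state (x, y, x-velocity, y-velocity)
for t in range(10):
    S = A @ S + rng.multivariate_normal(np.zeros(4), Sigma)        # state transition
    X = C @ S + D @ rng.multivariate_normal(np.zeros(2), Sigma_e)  # noisy observation
    print(t, X)
```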

Specification of an HMM Components:
- N: number of states
- Q = {q_1, q_2, …, q_T}: set of states over time
- M: number of symbols (observables)
- O = {o_1, o_2, …, o_T}: set of symbols over time
- A: the state transition probability matrix, a_ij = P(q_{t+1} = j | q_t = i)
- B: the observation probability distribution, b_j(k) = P(o_t = k | q_t = j), 1 ≤ k ≤ M
- π: the initial state distribution
The full HMM is thus specified as a triplet λ = (A, B, π).
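As a concrete illustration (not part of the original slides), here is a minimal sketch of such a triplet λ = (A, B, π) in Python/NumPy; the two-state, three-symbol model and all of its numbers are made-up assumptions. The later sketches reuse this (A, B, pi) convention.

```python
import numpy as np

# Hypothetical two-state, three-symbol HMM; all numbers are made up for illustration.
A = np.array([[0.7, 0.3],        # a_ij = P(q_{t+1} = j | q_t = i)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],   # b_j(k) = P(o_t = k | q_t = j), rows indexed by state j
              [0.6, 0.3, 0.1]])
pi = np.array([0.6, 0.4])        # initial state distribution

# Each row of A and B, and pi itself, must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```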

Central problems in HMM modelling
Problem 1 (Evaluation): the probability of occurrence of a particular observation sequence O = {o_1, …, o_k}, given the model, i.e. P(O | λ). Complicated by the hidden states; useful in sequence classification.
Problem 2 (Decoding): the optimal state sequence to produce the given observations O = {o_1, …, o_k}, given the model, under some optimality criterion. Useful in recognition problems.
Problem 3 (Learning): determine the optimum model, given a training set of observations.

Problem 1: Naïve solution
Consider a state sequence Q = (q_1, …, q_T).
Assume independent observations: observations are mutually independent given the hidden states (the joint distribution of independent variables factorises into the product of their marginals), so
P(O | Q, λ) = ∏_{t=1}^{T} P(o_t | q_t, λ) = b_{q_1}(o_1) b_{q_2}(o_2) … b_{q_T}(o_T).
Observe that:
P(Q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} … a_{q_{T-1} q_T}.
And finally:
P(O | λ) = Σ_Q P(O | Q, λ) P(Q | λ).
This summation runs over all N^T possible state sequences, so the naïve solution is computationally infeasible for realistic T.
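For illustration only (not from the slides), a brute-force sketch of this summation, using the (A, B, pi) layout from the earlier sketch and a list of integer observation symbols `obs`; it is exponential in T and only usable for tiny examples:

```python
import itertools
import numpy as np

def naive_evaluate(A, B, pi, obs):
    """P(O | lambda) by brute force: sum over all N**T possible state sequences."""
    N, T = A.shape[0], len(obs)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):   # every state sequence q_1..q_T
        p = pi[q[0]] * B[q[0], obs[0]]                # pi_{q_1} * b_{q_1}(o_1)
        for t in range(1, T):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]  # a_{q_{t-1} q_t} * b_{q_t}(o_t)
        total += p
    return total
```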

Problem 1: Efficient solution
Forward algorithm: a dynamic programming approach.
Define the auxiliary forward variable α:
α_t(i) = P(o_1, …, o_t, q_t = i | λ)
α_t(i) is the probability of observing the partial sequence of observables o_1, …, o_t such that at time t the state is q_t = i.
Recursive algorithm:
Initialise: α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N
Calculate: α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(o_{t+1})
Obtain: P(O | λ) = Σ_{i=1}^{N} α_T(i)
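A minimal NumPy sketch of this recursion (my own code, not from the slides), again assuming the (A, B, pi) layout from the specification sketch and integer-coded observations:

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: returns P(O | lambda) and the full alpha table."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                    # initialise: alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                           # recursion over time
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum(), alpha                   # terminate: sum_i alpha_T(i)
```

For longer sequences one would normally rescale alpha at each step or work in the log domain to avoid numerical underflow.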

Problem 1: Alternative solution
Backward algorithm: again a dynamic programming approach.
Define the auxiliary backward variable β:
β_t(i) = P(o_{t+1}, o_{t+2}, …, o_T | q_t = i, λ)
β_t(i) is the probability of observing the sequence of observables o_{t+1}, …, o_T given state q_t = i at time t, and the model λ.
Recursive algorithm:
Initialise: β_T(j) = 1, 1 ≤ j ≤ N
Calculate: β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j)
Terminate: P(O | λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)
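Under the same assumptions as the forward sketch, a minimal backward-recursion sketch (not from the slides); it should return the same P(O | λ) as the forward algorithm, which is a convenient sanity check:

```python
import numpy as np

def backward(A, B, pi, obs):
    """Backward algorithm: returns P(O | lambda) and the full beta table."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                      # initialise: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                      # recursion, backwards in time
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
    prob = np.sum(pi * B[:, obs[0]] * beta[0])          # terminate
    return prob, beta
```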

Problem 2: Decoding
Choose the state sequence that maximises the probability of the observation sequence.
Viterbi algorithm: an inductive algorithm that keeps the best state sequence at each instant; it utilizes dynamic programming.
The state sequence should maximise P(q_1, q_2, …, q_T | O, λ).
Define the auxiliary variable δ:
δ_t(i) = max_{q_1, q_2, …, q_{t-1}} P(q_1, q_2, …, q_{t-1}, q_t = i, o_1, o_2, …, o_t | λ)
δ_t(i) is the probability of the most probable path ending in state q_t = i.

Problem 2: Decoding
Recurrent property: δ_{t+1}(j) = [ max_{1 ≤ i ≤ N} δ_t(i) a_ij ] b_j(o_{t+1})
Algorithm: to recover the state sequence, we need to keep track of the argument that maximises this for each t and j; this is done via the array ψ_t(j).
1. Initialise: δ_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N; ψ_1(i) = 0
2. Recursion: δ_t(j) = max_{1 ≤ i ≤ N} [ δ_{t-1}(i) a_ij ] b_j(o_t), ψ_t(j) = arg max_{1 ≤ i ≤ N} [ δ_{t-1}(i) a_ij ], for 2 ≤ t ≤ T, 1 ≤ j ≤ N
3. Terminate: P* = max_{1 ≤ i ≤ N} δ_T(i), q_T* = arg max_{1 ≤ i ≤ N} δ_T(i)
   P* gives the state-optimised probability; Q* = {q_1*, q_2*, …, q_T*} is the optimal state sequence.
4. Backtrack the state sequence: q_t* = ψ_{t+1}(q_{t+1}*), for t = T-1, T-2, …, 1
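A minimal sketch of these four steps in NumPy (my own code, assuming the same (A, B, pi) layout as before); in practice the recursion is usually run with log probabilities to avoid underflow:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi algorithm: most probable state path and its probability P*."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                  # initialise: delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                         # recursion
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)            # best predecessor for each state j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                 # terminate: q_T* = argmax_i delta_T(i)
    for t in range(T - 2, -1, -1):                # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```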

Problem 3: Learning
Train the HMM to encode the observation sequence in such a way that the HMM will identify a similar observation sequence in the future.
Maximum likelihood criterion: find λ = (A, B, π) maximising P(O | λ).
General algorithm:
1. Initialise: λ_0
2. Compute a new model λ, using λ_0 and the observed sequence O
3. Then set λ_0 ← λ
4. Repeat steps 2 and 3 until log P(O | λ) − log P(O | λ_0) < d
We don't cover the learning algorithms in this course.
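The slides deliberately skip the learning algorithm, but for completeness here is a hedged sketch of the standard Baum-Welch (EM) re-estimation that is typically used for step 2; the single-sequence setup, names, and array layout are my own assumptions, not the course's notation:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch (EM) re-estimation step for a single observation sequence obs."""
    N, T = A.shape[0], len(obs)
    obs = np.asarray(obs)

    # Forward and backward tables (same recursions as the sketches above).
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()                      # P(O | lambda_0)

    # E-step: posteriors over states and transitions.
    gamma = alpha * beta / likelihood                 # gamma_t(i) = P(q_t = i | O, lambda_0)
    xi = np.zeros((T - 1, N, N))                      # xi_t(i,j) = P(q_t = i, q_{t+1} = j | O, lambda_0)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / likelihood

    # M-step: re-estimate lambda = (A, B, pi).
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi, likelihood
```

Repeating this step, with the new model replacing λ_0, until the log-likelihood improvement falls below the threshold d implements the loop above.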

Word Recognition Example
Typed word recognition; assume all characters are separated. A character recognizer outputs the probability of the image being a particular character, P(image | character).
(The slide shows a column of recognizer scores for an example image: about 0.5 for 'a', 0.03 for 'b', 0.005 for 'c', …, 0.31 for 'z'. The character is the hidden state and the image is the observation.)

Word Recognition Example
Hidden states of the HMM = characters. Observations = typed images of characters segmented from the image. Note that the number of possible observations is infinite. Observation probabilities = character recognizer scores. Transition probabilities will be defined differently in the two subsequent models.

Word Recognition Example
If a lexicon is given, we can construct a separate HMM model for each lexicon word, e.g. a left-to-right chain of character states for "Amherst" (a → m → h → e → r → s → t) and for "Buffalo" (b → u → f → f → a → l → o).
Here recognition of the word image is equivalent to the problem of evaluating a few HMM models. This is an application of the Evaluation problem.
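A small sketch (not from the slides) of this evaluation-based recognition: each lexicon word gets its own (A, B, pi), the word image is turned into a sequence of observation symbols `obs`, and the word whose model scores highest under the forward algorithm wins. All names here are hypothetical.

```python
import numpy as np

def forward_prob(A, B, pi, obs):
    # Forward recursion (same as the earlier sketch), returning only P(O | lambda).
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def recognize(word_models, obs):
    """word_models maps each lexicon word, e.g. "Amherst" or "Buffalo", to its (A, B, pi)."""
    scores = {word: forward_prob(A, B, pi, obs) for word, (A, B, pi) in word_models.items()}
    return max(scores, key=scores.get), scores
```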

Word Recognition Example
We can also construct a single HMM for all words. Hidden states = all characters in the alphabet. Transition probabilities and initial probabilities are calculated from a language model. Observations and observation probabilities are as before. (The slide shows a graph whose nodes are characters, e.g. a, b, e, f, h, m, o, r, s, t, v.)
Here we have to determine the best sequence of hidden states, the one that most likely produced the word image (an application of the Decoding problem).

Acknowledgement Thanks to Jafar Muhammadi for the preparation of these slides.

Further Reading
- L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, pp. 257-286, 1989.
- R. Dugad and U. B. Desai, "A tutorial on hidden Markov models," Signal Processing and Artificial Neural Networks Laboratory, Dept. of Electrical Engineering, Indian Institute of Technology, Bombay, Technical Report SPANN-96.1, 1996.
- Andrew W. Moore, HMM tutorial, www.autonlab.org/tutorials.