Hidden Markov Models. Following a lecture by Andrew W. Moore, Carnegie Mellon University.

Hidden Markov Models. Following a lecture by Andrew W. Moore, Carnegie Mellon University. www.cs.cmu.edu/~awm/tutorials

A Markov System
Has N states, called s_1, s_2, .., s_N.
There are discrete timesteps, t = 0, t = 1, ...
On the t'th timestep the system is in exactly one of the available states. Call it q_t. Note: q_t ∈ {s_1, s_2, .., s_N}.
Between each timestep, the next state is chosen randomly.
The current state determines the probability distribution for the next state. Often notated with arcs between states. For the three-state example in the diagram (N = 3):
P(q_{t+1} = s_1 | q_t = s_1) = 0,   P(q_{t+1} = s_2 | q_t = s_1) = 0,   P(q_{t+1} = s_3 | q_t = s_1) = 1
P(q_{t+1} = s_1 | q_t = s_2) = 1/2, P(q_{t+1} = s_2 | q_t = s_2) = 1/2, P(q_{t+1} = s_3 | q_t = s_2) = 0
P(q_{t+1} = s_1 | q_t = s_3) = 1/3, P(q_{t+1} = s_2 | q_t = s_3) = 2/3, P(q_{t+1} = s_3 | q_t = s_3) = 0
[Diagram: three states s_1, s_2, s_3 with the current state highlighted and arcs labelled by these probabilities.]
Hidden Markov Models: Slides 2-6

Markov Property
q_{t+1} is conditionally independent of {q_{t-1}, q_{t-2}, ..., q_1, q_0} given q_t. In other words:
P(q_{t+1} = s_j | q_t = s_i) = P(q_{t+1} = s_j | q_t = s_i, any earlier history)
Each of these N probability tables is identical.
Notation: a_ij = P(q_{t+1} = s_j | q_t = s_i)
Question: what would be the best Bayes Net structure to represent the Joint Distribution of (q_0, q_1, q_2, q_3, q_4)?
Hidden Markov Models: Slides 7-9
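To make the transition table above concrete, here is a minimal Python sketch (not part of the original slides) that encodes it as a matrix and samples a state sequence. The start state, step count, and random seed are arbitrary illustrative choices.

```python
import numpy as np

# Transition matrix A[i, j] = P(q_{t+1} = s_{j+1} | q_t = s_{i+1}), rows sum to 1.
A = np.array([
    [0.0, 0.0, 1.0],   # from s1
    [0.5, 0.5, 0.0],   # from s2
    [1/3, 2/3, 0.0],   # from s3
])

rng = np.random.default_rng(0)

def simulate(A, q0, steps):
    """Sample a state sequence q_0, q_1, ..., q_steps from the Markov system."""
    path = [q0]
    for _ in range(steps):
        path.append(rng.choice(len(A), p=A[path[-1]]))
    return path

# Example run, starting (arbitrarily) in s1 (index 0):
print([f"s{i + 1}" for i in simulate(A, q0=0, steps=10)])
```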

A Blind Robot
A human and a robot wander around randomly on a grid.
STATE q = (Location of Robot, Location of Human)
Note: N (num. states) = 18 * 18 = 324
Hidden Markov Models: Slide 10

Dynamics of System
q_0 = [diagram: the grid with the initial positions of H (human) and R (robot)]
Each timestep the human moves randomly to an adjacent cell. And the Robot also moves randomly to an adjacent cell.
Typical Questions:
What is the expected time until the human is crushed like a bug?
What is the probability that the robot will hit the left wall before it hits the human?
What is the probability the Robot crushes the human on the next time step?
Hidden Markov Models: Slide 11

Example Question
It is currently time t, and the human remains uncrushed. What is the probability of crushing occurring at time t + 1?
If the robot is blind: we can compute this in advance. (We'll do this first.)
If the robot is omnipotent (i.e. if the robot knows the state at time t): it can compute it directly. (Too easy. We won't do this.)
If the robot has some sensors, but incomplete state information: Hidden Markov Models are applicable! (Main body of lecture.)
Hidden Markov Models: Slide 12

What is P(q_t = s)? Slow, stupid answer
Step 1: Work out how to compute P(Q) for any path Q = q_1 q_2 q_3 .. q_t.
Given we know the start state q_1 (i.e. P(q_1) = 1):
P(q_1 q_2 .. q_t) = P(q_1 q_2 .. q_{t-1}) P(q_t | q_1 q_2 .. q_{t-1})
                  = P(q_1 q_2 .. q_{t-1}) P(q_t | q_{t-1})     (WHY?)
                  = P(q_2 | q_1) P(q_3 | q_2) ... P(q_t | q_{t-1})
Step 2: Use this knowledge to get P(q_t = s):
P(q_t = s) = sum of P(Q) over all paths Q of length t that end in s.
Computation is exponential in t.
Hidden Markov Models: Slide 13
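A small sketch (not in the original slides) of exactly this "slow, stupid" computation, assuming the chain starts deterministically in a given state and using the three-state transition matrix from the earlier slides; function and variable names are mine.

```python
import itertools
import numpy as np

A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0],
              [1/3, 2/3, 0.0]])

def p_state_bruteforce(A, start, s, t):
    """P(q_t = s) by summing P(Q) over every path q_0..q_t that begins in
    `start` and ends in `s` -- the O(N^t) enumeration described above."""
    N = len(A)
    total = 0.0
    for middle in itertools.product(range(N), repeat=t - 1):
        path = (start,) + middle + (s,)
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= A[a, b]          # product of one-step transition probabilities
        total += prob
    return total

print(p_state_bruteforce(A, start=0, s=1, t=3))   # P(q_3 = s2 | start in s1)
```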

What is P(q_t = s)? Clever answer
For each state s_i, define p_t(i) = Prob. the state is s_i at time t = P(q_t = s_i).
Easy to do inductive definition:
p_0(i) = 1 if s_i is the start state, 0 otherwise.
For all j:
p_{t+1}(j) = P(q_{t+1} = s_j)
           = Σ_{i=1}^{N} P(q_{t+1} = s_j ∧ q_t = s_i)
           = Σ_{i=1}^{N} P(q_{t+1} = s_j | q_t = s_i) P(q_t = s_i)
           = Σ_{i=1}^{N} a_ij p_t(i)          (remember, a_ij = P(q_{t+1} = s_j | q_t = s_i))
Computation is simple. Just fill in this table in this order: rows t = 0, 1, .., t_final and columns p_t(1), p_t(2), .., p_t(N), each row computed from the previous one.
Cost of computing p_t(i) for all states s_i is now O(t N^2). The stupid way was O(N^t).
This was a simple example. It was meant to warm you up to this trick, called Dynamic Programming, because HMMs do many tricks like this.
Hidden Markov Models: Slides 14-19
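The same quantity computed the "clever" way, as a minimal sketch (not from the slides): the table p_t(i) is filled in row by row with the recursion p_{t+1}(j) = Σ_i a_ij p_t(i), which in matrix form is just a vector-matrix product.

```python
import numpy as np

A = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0],
              [1/3, 2/3, 0.0]])

def p_state_dp(A, start, t):
    """Dynamic-programming version: p_0 is a point mass on the start state,
    and p_{t+1} = p_t A. Cost O(t N^2) instead of O(N^t)."""
    p = np.zeros(len(A))
    p[start] = 1.0                 # p_0(i): 1 if s_i is the start state, else 0
    for _ in range(t):
        p = p @ A                  # p_{t+1}(j) = sum_i a_ij p_t(i)
    return p

print(p_state_dp(A, start=0, t=3))   # vector of P(q_3 = s_i), matches the brute force
```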

Hidden State
It is currently time t, and the human remains uncrushed. What is the probability of crushing occurring at time t + 1?
If the robot is blind: we can compute this in advance. (We just did this.)
If the robot is omnipotent (i.e. if the robot knows the state at time t): it can compute it directly. (Too easy. We won't do this.)
If the robot has some sensors, but incomplete state information: Hidden Markov Models are applicable! (Main body of lecture.)
Hidden Markov Models: Slide 20

Hidden State
The previous example tried to estimate P(q_t = s_i) unconditionally (using no observed evidence).
Suppose we can observe something that is affected by the true state. Example: proximity sensors (tell us the contents of the 8 adjacent squares).
[Diagram: the true state q_t (grid with H and R) and what the robot sees: observation O_t, a reading of the 8 adjacent squares, where W denotes WALL.]
Hidden Markov Models: Slide 21

Noisy Hidden State
Example: noisy proximity sensors (unreliably tell us the contents of the 8 adjacent squares).
[Diagram: the true state q_t, the uncorrupted observation, and what the robot sees: the corrupted observation O_t. W denotes WALL.]
Hidden Markov Models: Slide 22

Noisy Hidden State
Example: noisy proximity sensors (unreliably tell us the contents of the 8 adjacent squares).
O_t is noisily determined depending on the current state. Assume that O_t is conditionally independent of {q_{t-1}, q_{t-2}, ..., q_1, q_0, O_{t-1}, O_{t-2}, ..., O_1, O_0} given q_t. In other words:
P(O_t = X | q_t = s_i) = P(O_t = X | q_t = s_i, any earlier history)
[Diagram: the true state q_t, the uncorrupted observation, and the corrupted observation O_t the robot actually sees. W denotes WALL.]
Hidden Markov Models: Slides 23-24

Hidden Markov Models
Our robot with noisy sensors is a good example of an HMM.
Question 1: State Estimation. What is P(q_T = S_i | O_1 O_2 ... O_T)? It will turn out that a new cute D.P. trick will get this for us.
Question 2: Most Probable Path. Given O_1 O_2 ... O_T, what is the most probable path that I took? And what is that probability? Yet another famous D.P. trick, the VITERBI algorithm, gets this.
Question 3: Learning HMMs. Given O_1 O_2 ... O_T, what is the maximum likelihood HMM that could have produced this string of observations? Very very useful. Uses the E.M. Algorithm.
Hidden Markov Models: Slide 25

Are H.M.M.s Useful?
You bet!!
Robot planning + sensing when there's uncertainty.
Speech Recognition/Understanding: Phones → Words, Signal → phones.
Human Genome Project: complicated stuff your lecturer knows nothing about.
Consumer decision modeling.
Economics & Finance.
Plus at least 5 other things I haven't thought of.
Hidden Markov Models: Slide 26

HMM Notation (from Rabiner's survey*)
*L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
The states are labeled S_1, S_2, .., S_N.
For a particular trial, let T be the number of observations. T is also the number of states passed through.
O = O_1 O_2 .. O_T is the sequence of observations.
Q = q_1 q_2 .. q_T is the notation for a path of states.
λ = (N, M, {π_i}, {a_ij}, {b_i(j)}) is the specification of an HMM.
Hidden Markov Models: Slide 27

HMM Formal Definition
An HMM, λ, is a 5-tuple consisting of:
N, the number of states;
M, the number of possible observations;
{π_1, π_2, .., π_N}, the starting state probabilities: P(q_0 = S_i) = π_i. (This is new. In our previous example, the start state was deterministic.)
{a_ij}, the N x N matrix of state transition probabilities: P(q_{t+1} = S_j | q_t = S_i) = a_ij;
{b_i(k)}, the N x M matrix of observation probabilities: P(O_t = k | q_t = S_i) = b_i(k).
Hidden Markov Models: Slide 28

Here is an HMM
N = 3, M = 3
π_1 = 1/2, π_2 = 1/2, π_3 = 0
a_11 = 0,   a_12 = 1/3, a_13 = 2/3
a_21 = 1/3, a_22 = 0,   a_23 = 2/3
a_31 = 1/3, a_32 = 1/3, a_33 = 1/3
b_1(X) = 1/2, b_1(Y) = 1/2, b_1(Z) = 0
b_2(X) = 0,   b_2(Y) = 1/2, b_2(Z) = 1/2
b_3(X) = 1/2, b_3(Y) = 0,   b_3(Z) = 1/2
[Diagram: state S_1 emits X or Y, state S_2 emits Y or Z, state S_3 emits Z or X, with arcs labelled by the transition probabilities above.]
Start randomly in state 1 or 2. Choose one of the output symbols in each state at random.
Hidden Markov Models: Slide 29

Here is an HMM (continued)
(The HMM specification of the previous slide is repeated on each of these slides.)
Start randomly in state 1 or 2. Choose one of the output symbols in each state at random. Let's generate a sequence of observations:
Slide 30: 50-50 choice between S_1 and S_2 (q_0, q_1, q_2 and O_0, O_1, O_2 still blank).
Slide 31: q_0 = S_1. 50-50 choice between X and Y.
Slide 32: O_0 = X. Go to S_3 with probability 2/3 or S_2 with prob. 1/3.
Slide 33: q_1 = S_3. 50-50 choice between Z and X.
Slide 34: O_1 = X. Each of the three next states is equally likely.
Slide 35: q_2 = S_3. 50-50 choice between Z and X.
Slide 36: O_2 = Z. The generated trace is q_0 q_1 q_2 = S_1 S_3 S_3 and O_0 O_1 O_2 = X X Z.
Hidden Markov Models: Slides 30-36

State Estimation
(Same HMM specification as Slide 29.)
Start randomly in state 1 or 2. Choose one of the output symbols in each state at random. We generated a sequence of observations.
This is what the observer has to work with: the observations O_0 O_1 O_2 = X X Z, while the states q_0 q_1 q_2 = ? ? ? remain hidden.
Hidden Markov Models: Slide 37
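A minimal Python sketch (not from the slides) of the generative procedure just described, with the example HMM written out as arrays. The random seed is arbitrary, so the sampled trace will generally differ from the S_1 S_3 S_3 / X X Z run shown on the slides.

```python
import numpy as np

# The example HMM of Slides 29-36: states S1, S2, S3; output symbols X, Y, Z.
pi = np.array([0.5, 0.5, 0.0])                    # pi_i = P(start in S_i)
A  = np.array([[0.0, 1/3, 2/3],                   # a_ij = P(q_{t+1} = S_j | q_t = S_i)
               [1/3, 0.0, 2/3],
               [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0],                   # b_i(k) = P(O_t = k | q_t = S_i)
               [0.0, 0.5, 0.5],                   # columns: X, Y, Z
               [0.5, 0.0, 0.5]])
symbols = "XYZ"

rng = np.random.default_rng(1)

def generate(pi, A, B, T):
    """Sample a hidden state path and an observation sequence of length T."""
    states, obs = [], []
    q = rng.choice(len(pi), p=pi)                 # start state drawn from pi
    for _ in range(T):
        states.append(q)
        obs.append(rng.choice(B.shape[1], p=B[q]))  # emit a symbol from state q
        q = rng.choice(len(A), p=A[q])              # move to the next state
    return states, obs

states, obs = generate(pi, A, B, T=3)
print("states:      ", [f"S{i + 1}" for i in states])
print("observations:", [symbols[k] for k in obs])
```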

Prob. of a series of observations
What is P(O) = P(O_1 O_2 O_3) = P(O_1 = X ∧ O_2 = X ∧ O_3 = Z)?
Slow, stupid way:
P(O) = Σ_{Q ∈ paths of length 3} P(O ∧ Q) = Σ_{Q ∈ paths of length 3} P(O | Q) P(Q)
How do we compute P(Q) for an arbitrary path Q?
P(Q) = P(q_1, q_2, q_3) = P(q_1) P(q_2, q_3 | q_1)          (chain rule)
     = P(q_1) P(q_2 | q_1) P(q_3 | q_2, q_1)                (chain)
     = P(q_1) P(q_2 | q_1) P(q_3 | q_2)                     (why?)
Example in the case Q = S_1 S_3 S_3: 1/2 * 2/3 * 1/3 = 1/9.
How do we compute P(O | Q) for an arbitrary path Q?
P(O | Q) = P(O_1 O_2 O_3 | q_1 q_2 q_3) = P(O_1 | q_1) P(O_2 | q_2) P(O_3 | q_3)   (why?)
Example in the case Q = S_1 S_3 S_3: P(X | S_1) P(X | S_3) P(Z | S_3) = 1/2 * 1/2 * 1/2 = 1/8.
P(O) would need 27 P(Q) computations and 27 P(O | Q) computations. A sequence of 20 observations would need 3^20 ≈ 3.5 billion P(Q) computations and 3.5 billion P(O | Q) computations. So let's be smarter.
Hidden Markov Models: Slides 38-41
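For concreteness, a sketch (mine, not the slides') of the slow way: enumerate all 3^T = 27 state paths and sum P(O | Q) P(Q). With the example HMM this also gives the value that the forward algorithm will reproduce cheaply below.

```python
import itertools
import numpy as np

pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3], [1/3, 0.0, 2/3], [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
X, Y, Z = 0, 1, 2

def prob_obs_bruteforce(pi, A, B, obs):
    """P(O) = sum over all state paths Q of P(O | Q) P(Q): 3^T terms."""
    N, total = len(pi), 0.0
    for Q in itertools.product(range(N), repeat=len(obs)):
        p_Q = pi[Q[0]] * np.prod([A[i, j] for i, j in zip(Q, Q[1:])])
        p_O_given_Q = np.prod([B[q, o] for q, o in zip(Q, obs)])
        total += p_Q * p_O_given_Q
    return total

print(prob_obs_bruteforce(pi, A, B, [X, X, Z]))   # P(O_1=X, O_2=X, O_3=Z) = 1/36
```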

The Prob. of a given series of observations, non-exponential-cost-style
Given observations O_1 O_2 ... O_T, define
α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i | λ)     where 1 ≤ t ≤ T
α_t(i) = Probability that, in a random trial, we'd have seen the first t observations and we'd have ended up in S_i as the t'th state visited.
In our example, what is α_2(3)?
Hidden Markov Models: Slide 42

α_t(i): easy to define recursively
α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i | λ)
(α_t(i) can be defined stupidly by considering all paths of length t. How?)
α_1(i) = P(O_1 ∧ q_1 = S_i) = P(q_1 = S_i) P(O_1 | q_1 = S_i) = π_i b_i(O_1)
α_{t+1}(j) = P(O_1 O_2 ... O_{t+1} ∧ q_{t+1} = S_j)
           = Σ_{i=1}^{N} P(O_1 ... O_t ∧ q_t = S_i ∧ O_{t+1} ∧ q_{t+1} = S_j)
           = Σ_{i=1}^{N} P(O_{t+1}, q_{t+1} = S_j | O_1 ... O_t ∧ q_t = S_i) P(O_1 ... O_t ∧ q_t = S_i)
           = Σ_{i=1}^{N} P(O_{t+1}, q_{t+1} = S_j | q_t = S_i) α_t(i)
           = Σ_{i=1}^{N} P(O_{t+1} | q_{t+1} = S_j) P(q_{t+1} = S_j | q_t = S_i) α_t(i)
           = Σ_{i=1}^{N} b_j(O_{t+1}) a_ij α_t(i)
Hidden Markov Models: Slides 43-44

In our example
α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i | λ),   α_1(i) = b_i(O_1) π_i,   α_{t+1}(j) = b_j(O_{t+1}) Σ_i a_ij α_t(i)
WE SAW O_1 O_2 O_3 = X X Z, so:
α_1(1) = 1/4,  α_1(2) = 0,     α_1(3) = 0
α_2(1) = 0,    α_2(2) = 0,     α_2(3) = 1/12
α_3(1) = 0,    α_3(2) = 1/72,  α_3(3) = 1/72
Hidden Markov Models: Slide 45
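A short sketch (mine) of the forward recursion just derived, checked against the α values above; the last two lines also answer the "Easy Question" that follows, giving P(O_1 O_2 O_3) and the filtered posterior over q_3.

```python
import numpy as np

pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3], [1/3, 0.0, 2/3], [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
X, Y, Z = 0, 1, 2

def forward(pi, A, B, obs):
    """alpha[t, i] = P(O_1..O_{t+1} and q_{t+1} = S_i): the recursion of Slides 43-44."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                        # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]    # alpha_{t+1}(j) = b_j(O_{t+1}) sum_i a_ij alpha_t(i)
    return alpha

alpha = forward(pi, A, B, [X, X, Z])
print(alpha)                        # rows: [1/4, 0, 0], [0, 0, 1/12], [0, 1/72, 1/72]
print(alpha[-1].sum())              # P(O_1 O_2 O_3) = 1/36
print(alpha[-1] / alpha[-1].sum())  # P(q_3 = S_i | O_1 O_2 O_3)
```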

Easy Question
We can cheaply compute α_t(i) = P(O_1 O_2 ... O_t ∧ q_t = S_i).
(How) can we cheaply compute P(O_1 O_2 ... O_t)?   Answer: Σ_{i=1}^{N} α_t(i).
(How) can we cheaply compute P(q_t = S_i | O_1 O_2 ... O_t)?   Answer: α_t(i) / Σ_{j=1}^{N} α_t(j).
Hidden Markov Models: Slides 46-47

Most probable path given observations
What's the most probable path given O_1 O_2 ... O_T, i.e. what is argmax_Q P(Q | O_1 O_2 ... O_T)?
Slow, stupid answer:
argmax_Q P(Q | O_1 O_2 ... O_T)
  = argmax_Q P(O_1 O_2 ... O_T | Q) P(Q) / P(O_1 O_2 ... O_T)
  = argmax_Q P(O_1 O_2 ... O_T | Q) P(Q)
Hidden Markov Models: Slide 48

Efficient MPP computation
We're going to compute the following variables:
δ_t(i) = max over q_1 q_2 .. q_{t-1} of P(q_1 q_2 .. q_{t-1} ∧ q_t = S_i ∧ O_1 .. O_t)
       = the probability of the path of length t-1 with the maximum chance of doing all these things: OCCURRING, and ENDING UP IN STATE S_i, and PRODUCING OUTPUT O_1 .. O_t.
DEFINE: mpp_t(i) = that path. So: δ_t(i) = Prob(mpp_t(i)).
Hidden Markov Models: Slide 49

" mpp " The Verb Algorhm ( ) P( q q... q # q S # O O.. O ) ( ) P( q q... q # q S # O O.. O ) ( ) one choce P( q S # O ) q q P q q ( q S ) P( O q S ) 2 arg max 2 max max... q... q ( O ) $ $! b Now, suppose we have all he δ () s and mpp () s for all. 2 2 $ $ HOW TO GET δ + (j) and mpp + (j)? 2 2 mpp () Probδ () mpp (2) : mpp (N) S? S 2 : Probδ (2) S N Probδ (N) S j q q + Hdden Markov Models: Slde 50

The Viterbi Algorithm (continued)
[Diagram: states S_1 .. S_N at time t, with S_j at time t+1.]
The most prob path with last two states S_i S_j is the most prob path to S_i, followed by the transition S_i → S_j.
What is the prob of that path?
δ_t(i) x P(S_i → S_j ∧ O_{t+1} | λ) = δ_t(i) a_ij b_j(O_{t+1})
SO the most probable path to S_j has S_i* as its penultimate state, where i* = argmax_i δ_t(i) a_ij b_j(O_{t+1}).
Summary (with i* defined as above):
δ_{t+1}(j) = δ_t(i*) a_i*j b_j(O_{t+1})
mpp_{t+1}(j) = mpp_t(i*) S_i*
Hidden Markov Models: Slides 51-53
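A compact sketch (mine, not the slides') of the Viterbi recursion just summarized, run on the X X Z example; ties between equally probable paths are broken arbitrarily by argmax.

```python
import numpy as np

pi = np.array([0.5, 0.5, 0.0])
A  = np.array([[0.0, 1/3, 2/3], [1/3, 0.0, 2/3], [1/3, 1/3, 1/3]])
B  = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
X, Y, Z = 0, 1, 2

def viterbi(pi, A, B, obs):
    """delta[t, j] = max prob of any state path that ends in S_j at step t+1 and
    produces O_1..O_{t+1}; back-pointers recover the most probable path."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    back = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                     # delta_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        cand = delta[t - 1, :, None] * A             # cand[i, j] = delta_t(i) a_ij
        back[t] = cand.argmax(axis=0)                # i* for each destination j
        delta[t] = cand.max(axis=0) * B[:, obs[t]]   # delta_{t+1}(j) = delta_t(i*) a_{i*j} b_j(O_{t+1})
    path = [int(delta[-1].argmax())]                 # best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))          # follow back-pointers
    path.reverse()
    return path, delta[-1].max()

path, prob = viterbi(pi, A, B, [X, X, Z])
print([f"S{i + 1}" for i in path], prob)             # a most probable path and its joint probability
```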

What is Viterbi used for?
Classic example, speech recognition: the HMM observable is the signal; the hidden state is part of the word formation. What is the most probable word given this signal?
UTTERLY GROSS SIMPLIFICATION. In practice: many levels of inference; not one big jump.
Hidden Markov Models: Slide 54

HMMs are used and useful. But how do you design an HMM?
Occasionally (e.g. in our robot example) it is reasonable to deduce the HMM from first principles.
But usually, especially in Speech or Genetics, it is better to infer it from large amounts of data: O_1 O_2 .. O_T with a big T.
Observations previously in lecture: O_1 O_2 .. O_T. Observations in the next bit: O_1 O_2 .. O_T.
Hidden Markov Models: Slide 55

Inferring an HMM
Remember, we've been doing things like P(O_1 O_2 .. O_T | λ). That λ is the notation for our HMM parameters.
Now we have some observations and we want to estimate λ from them.
AS USUAL, we could use:
(i) MAX LIKELIHOOD: λ = argmax_λ P(O_1 .. O_T | λ)
(ii) BAYES: work out P(λ | O_1 .. O_T) and then take E[λ] or max_λ P(λ | O_1 .. O_T)
Hidden Markov Models: Slide 56

Max likelihood HMM estimation
Define:
γ_t(i) = P(q_t = S_i | O_1 O_2 ... O_T, λ)
ε_t(i, j) = P(q_t = S_i ∧ q_{t+1} = S_j | O_1 O_2 ... O_T, λ)
γ_t(i) and ε_t(i, j) can be computed efficiently for all i, j, t (details in the Rabiner paper).
Σ_{t=1}^{T-1} γ_t(i) = expected number of transitions out of state i during the path
Σ_{t=1}^{T-1} ε_t(i, j) = expected number of transitions from state i to state j during the path
Notice these are expected counts, so we can re-estimate the transition probabilities:
a_ij = (expected frequency of "Next state S_j given This state S_i") = Σ_t ε_t(i, j) / Σ_t γ_t(i)
We can also re-estimate the observation probabilities b_i(O_k) as the expected frequency of being in state i and observing O_k, divided by the expected frequency of being in state i (see Rabiner).
Hidden Markov Models: Slides 57-58

EM for HMMs
If we knew λ we could estimate EXPECTATIONS of quantities such as
- expected number of times in state i
- expected number of transitions i → j
If we knew the quantities such as
- expected number of times in state i
- expected number of transitions i → j
we could compute the MAX LIKELIHOOD estimate of λ = ({a_ij}, {b_i(j)}, {π_i}).
Roll on the EM Algorithm.
Hidden Markov Models: Slide 59

EM 4 HMMs
1. Get your observations O_1 ... O_T.
2. Guess your first λ estimate λ(0); k = 0.
3. k := k + 1.
4. Given O_1 ... O_T and λ(k), compute γ_t(i), ε_t(i, j) for all 1 ≤ t ≤ T, 1 ≤ i ≤ N, 1 ≤ j ≤ N.
5. Compute the expected frequency of state i, and the expected frequency of transitions i → j.
6. Compute new estimates of a_ij, b_j(k), π_i accordingly. Call them λ(k+1).
7. Go to 3, unless converged.
Also known (for the HMM case) as the BAUM-WELCH algorithm.
Hidden Markov Models: Slide 60
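A compact sketch (mine) of the Baum-Welch loop just outlined, under simplifying assumptions: a single observation sequence, no probability scaling (so it underflows on long sequences), and random Dirichlet initialization; real implementations use the scaling and multiple-sequence extensions described in Rabiner's tutorial.

```python
import numpy as np

def forward(pi, A, B, obs):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch(obs, N, M, iters=50, seed=0):
    """E-step: compute gamma_t(i) and epsilon_t(i,j) under the current lambda.
    M-step: re-estimate (pi, A, B) from those expected counts. Local optimum only."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(N))
    A = rng.dirichlet(np.ones(N), size=N)
    B = rng.dirichlet(np.ones(M), size=N)
    obs = np.asarray(obs)
    for _ in range(iters):
        alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
        p_obs = alpha[-1].sum()                                     # P(O | lambda)
        gamma = alpha * beta / p_obs                                # gamma[t, i] = P(q_t = S_i | O, lambda)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_obs   # xi[t, i, j] = P(q_t=S_i, q_{t+1}=S_j | O, lambda)
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]        # expected i->j transitions / transitions out of i
        B_new = np.zeros_like(B)
        for k in range(M):
            B_new[:, k] = gamma[obs == k].sum(axis=0)               # expected times in state i observing symbol k
        B = B_new / gamma.sum(axis=0)[:, None]
    return pi, A, B

# Toy run on a short synthetic observation string over 3 symbols (illustrative only):
pi, A, B = baum_welch(obs=[0, 0, 2, 1, 2, 0, 1, 1, 2, 0], N=3, M=3, iters=100)
print(np.round(A, 3))
print(np.round(B, 3))
```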

Bad News
There are lots of local minima.
Trade-off between too few states (inadequately modeling the structure in the data) and too many (fitting the noise). Thus #states is a regularization parameter. Blah blah blah bias-variance tradeoff, blah blah cross-validation, blah blah AIC, BIC, blah blah (same ol' same ol').
Good News
The local minima are usually adequate models of the data.
Notice
EM does not estimate the number of states. That must be given.
Often, HMMs are forced to have some links with zero probability. This is done by setting a_ij = 0 in the initial estimate λ(0).
Easy extension of everything seen today: HMMs with real-valued outputs.
Hidden Markov Models: Slides 61-62

What You Should Know
What is an HMM?
Computing (and defining) α_t(i).
The Viterbi algorithm.
Outline of the EM algorithm.
To be very happy with the kind of maths and analysis needed for HMMs.
Fairly thorough reading of Rabiner* up to page 266 [up to but not including "IV. Types of HMMs"]. DON'T PANIC: it starts on p. 257.
*L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
Hidden Markov Models: Slide 63