Hidden Markov Models


Hidden Markov Models. Hauptseminar Machine Learning, 18.11.2003. Speaker: Nikolas Dörfler

Overview
- Markov Models
- Hidden Markov Models
- Types of Hidden Markov Models
- Applications using HMMs
- Three central problems:
  - Evaluation: Forward algorithm, Backward algorithm
  - Decoding: Viterbi algorithm
  - Learning: Forward-Backward algorithm
- Speech recognition with HMMs: isolated word recognizer, measured performance
- Conclusion

Markov Models
- A system of n states: at time t the system is in state w(t)
- Changes of state are nondeterministic but depend on the previous states: in a first-order, ..., n-th-order model, changes depend on the previous 1, ..., n states
- Transition probabilities: often shown as a state transition matrix
- Vector of initial probabilities $\pi$

Example: weather system. Three states: sunny, cloudy, rainy.

Transition matrix A:

$A = \begin{pmatrix} 0.5 & 0.375 & 0.125 \\ 0.25 & 0.125 & 0.625 \\ 0.25 & 0.375 & 0.375 \end{pmatrix}$

State vector $\pi = (1, 0, 0)^T$
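As a small sketch (not part of the original slides), the probability of a concrete weather sequence under the first-order Markov assumption can be computed directly from this transition matrix and initial state vector; the function name and state constants below are illustrative only:

```python
import numpy as np

# Transition matrix from the slide; rows and columns ordered (sun, cloud, rain)
A = np.array([[0.500, 0.375, 0.125],
              [0.250, 0.125, 0.625],
              [0.250, 0.375, 0.375]])
pi = np.array([1.0, 0.0, 0.0])   # the chain starts in "sun" with probability 1

SUN, CLOUD, RAIN = 0, 1, 2

def sequence_probability(states):
    """P(w(1), ..., w(T)) = pi[w(1)] * product over t of A[w(t-1), w(t)]."""
    p = pi[states[0]]
    for prev, curr in zip(states, states[1:]):
        p *= A[prev, curr]
    return p

print(sequence_probability([SUN, SUN, CLOUD, RAIN]))   # 1.0 * 0.5 * 0.375 * 0.625 = 0.1171875
```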

Hidden Markov Models
- n invisible (hidden) states
- Every state emits at time t a visible symbol/state v(t)
- The system generates a sequence of symbols (states) $V^T = \{v(1), v(2), v(3), \dots, v(T)\}$
- Transition probability: $P(w(t+1)=j \mid w(t)=i) = a_{ij}$, collected in the transition matrix A
- Probability of emitting symbol k in state w(t) = j: $P(v(t)=k \mid w(t)=j) = b_j(k)$, collected in the confusion matrix B
- Vector of initial probabilities $\pi$
- Normalization conditions: $\sum_j a_{ij} = 1$ for all i and $\sum_k b_j(k) = 1$ for all j

Example of a hidden Markov model: indirect observation of the weather using a piece of seaweed

Transition matrix A (rows: previous weather, columns: next weather; order sun, clouds, rain):

$A = \begin{pmatrix} 0.5 & 0.375 & 0.125 \\ 0.25 & 0.125 & 0.625 \\ 0.25 & 0.375 & 0.375 \end{pmatrix}$

Confusion matrix B (rows: sun, clouds, rain; columns: dry, dryish, damp, soggy):

$B = \begin{pmatrix} 0.6 & 0.2 & 0.15 & 0.05 \\ 0.25 & 0.25 & 0.25 & 0.25 \\ 0.05 & 0.1 & 0.35 & 0.5 \end{pmatrix}$

State vector $\pi = (1, 0, 0)^T$
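A minimal sketch of this weather/seaweed HMM in NumPy (state order sun, clouds, rain; observation order dry, dryish, damp, soggy). It only samples a hidden weather path together with the seaweed observations it emits; the helper name `sample` is an assumption, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden states: 0=sun, 1=clouds, 2=rain; observations: 0=dry, 1=dryish, 2=damp, 3=soggy
A = np.array([[0.500, 0.375, 0.125],      # transition matrix
              [0.250, 0.125, 0.625],
              [0.250, 0.375, 0.375]])
B = np.array([[0.60, 0.20, 0.15, 0.05],   # confusion (emission) matrix
              [0.25, 0.25, 0.25, 0.25],
              [0.05, 0.10, 0.35, 0.50]])
pi = np.array([1.0, 0.0, 0.0])            # initial state probabilities

def sample(T):
    """Generate a hidden state path w(0..T-1) and the visible sequence v(0..T-1)."""
    states, observations = [], []
    w = rng.choice(3, p=pi)
    for _ in range(T):
        states.append(int(w))
        observations.append(int(rng.choice(4, p=B[w])))
        w = rng.choice(3, p=A[w])
    return states, observations

print(sample(5))
```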

Types of Hidden Markov Models

Ergodic model: all states can be reached within one step from everywhere; the entries of the transition matrix A are never zero:

$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}$

Left-right models: no backward transitions, i.e. $a_{ij} = 0$ for $j < i$. Additional constraint: $a_{ij} = 0$ for $j > i + \Delta$ (e.g. $\Delta = 2$ means no jumps of more than 2 states):

$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & 0 \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{pmatrix}$

Applications using HMMs
- Speech recognition
- Language modelling
- Protein sequence analysis
- Recognition of handwriting
- Financial/economic models

Three central problems
1. Evaluation: given an HMM $(A, B, \pi)$, find the probability that a sequence of visible states $V^T$ was generated by that model.
2. Decoding: given an HMM $(A, B, \pi)$ and a set of observations, find the most probable sequence of hidden states that led to those observations.
3. Learning: given the number of visible and hidden states and some sequences of training observations, find the parameters $a_{ij}$ and $b_j(k)$.

Evaluation: probability that the model M produces a sequence $V^T$:

$P(V^T) = \sum_{r=1}^{r_{max}} P(V^T \mid w_r^T)\, P(w_r^T)$

for all possible sequences $w_r^T = \{w(1), w(2), \dots, w(T)\}$. That means: take all sequences of hidden states, calculate the probability that each of them generated $V^T$, and add them up:

$P(V^T) = \sum_{r=1}^{N^T} \prod_{t=1}^{T} P(v(t) \mid w(t))\, P(w(t) \mid w(t-1))$

N is the number of states, T is the number of visible symbols / steps to go. Problem: the complexity is $O(N^T \cdot T)$.
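To make the $O(N^T \cdot T)$ complexity concrete, here is a brute-force sketch that literally sums over all $N^T$ hidden paths (illustration only; the function name is an assumption). It is usable only for very short sequences, which is exactly the point of the forward algorithm on the next slide:

```python
import numpy as np
from itertools import product

def brute_force_likelihood(A, B, pi, V):
    """P(V^T) by explicit summation over all N^T hidden state sequences."""
    N, T = A.shape[0], len(V)
    total = 0.0
    for path in product(range(N), repeat=T):              # all N^T hidden paths
        p = pi[path[0]] * B[path[0], V[0]]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[path[t], V[t]]
        total += p
    return total
```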

The forward algorithm: calculate the problem recursively.

$\alpha_j(t) = \begin{cases} \pi_j\, b_j(v(0)) & t = 0 \\ \left[\sum_{i=1}^{N} \alpha_i(t-1)\, a_{ij}\right] b_j(v(t)) & \text{else} \end{cases}$

$b_j(v(t))$ is the probability of emitting the symbol selected by v(t); $\alpha_j(t)$ is the probability that the model is in state j and has produced the first t elements of $V^T$.

Forward algorithm

initialize $\alpha_j(0) = \pi_j\, b_j(v(0))$, $t = 0$; given $a_{ij}$, $b_j(k)$ and the visible sequence $V^T$
for $t \leftarrow t + 1$:
    $\alpha_j(t) = \left[\sum_{i=1}^{N} \alpha_i(t-1)\, a_{ij}\right] b_j(v(t))$ for all $j \le N$
until $t = T$
return $P(V^T) = \alpha_{final}(T)$
end

Complexity of this algorithm: $O(N^2 T)$
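A NumPy sketch of this pseudocode (not from the slides). The termination here sums $\alpha_j$ over all states at the final step, which is the usual formulation; A, B, $\pi$ and the observation indices follow the weather example from the earlier slides:

```python
import numpy as np

def forward(A, B, pi, V):
    """alpha[t, j] = P(v(0..t), w(t)=j); returns the alpha table and P(V^T)."""
    N, T = A.shape[0], len(V)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, V[0]]                       # initialization: pi_j * b_j(v(0))
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, V[t]]   # [sum_i alpha_i(t-1) a_ij] * b_j(v(t))
    return alpha, float(alpha[-1].sum())             # P(V^T): sum over the final alphas

A = np.array([[0.500, 0.375, 0.125],
              [0.250, 0.125, 0.625],
              [0.250, 0.375, 0.375]])
B = np.array([[0.60, 0.20, 0.15, 0.05],
              [0.25, 0.25, 0.25, 0.25],
              [0.05, 0.10, 0.35, 0.50]])
pi = np.array([1.0, 0.0, 0.0])

_, likelihood = forward(A, B, pi, [0, 2, 3])         # observations: dry, damp, soggy
print(likelihood)                                    # O(N^2 T) work instead of O(N^T T)
```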

Example: forward algorithm trellis (states 1-3 unrolled over time steps t = 1, 2, 3; figure).

Link to Java applet example

Backward algorithm

initialize $\beta_i(T) = 1$, $t = T$; given $a_{ij}$, $b_j(k)$ and the visible sequence $V^T$
for $t \leftarrow t - 1$:
    $\beta_i(t) = \sum_{j=1}^{N} \beta_j(t+1)\, a_{ij}\, b_j(v(t+1))$ for all $i \le N$
until $t = 0$
return $P(V^T) = \beta_i(0)$
end

$\beta_i(t)$ is the probability that the model is in state i and will produce the last $T - t$ elements of $V^T$.
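A matching sketch of the backward pass (again an illustration, not from the slides). The termination shown here weights $\beta_i(0)$ by $\pi_i\, b_i(v(0))$ and sums over i, which reduces to the slide's $\beta_i(0)$ when the model starts deterministically in a single state:

```python
import numpy as np

def backward(A, B, pi, V):
    """beta[t, i] = P(v(t+1..T-1) | w(t)=i); returns the beta table and P(V^T)."""
    N, T = A.shape[0], len(V)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                  # initialization: beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, V[t + 1]] * beta[t + 1])   # sum_j a_ij b_j(v(t+1)) beta_j(t+1)
    return beta, float((pi * B[:, V[0]] * beta[0]).sum())
```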

Decoding (Viterbi algorithm)

Finds the sequence of hidden states in an N-state model M that most probably generated the sequence $V^T = \{v(0), v(1), \dots, v(T)\}$ of visible states. It can be calculated recursively with the Viterbi algorithm:

$\delta_i(t) = \max P(w(1), \dots, w(t)=i,\ v(1), \dots, v(t) \mid M)$ over all paths w

recursively: $\delta_j(t) = \max_i \left[\delta_i(t-1)\, a_{ij}\right] b_j(v(t))$ with $\delta_i(0) = \pi(i)\, b_i(v(0))$

$\delta_i(t)$ is the maximal probability along a path $w(1), \dots, w(t)=i$ to generate the sequence $v(1), \dots, v(t)$ (partial best path). To keep track of the path maximizing $\delta_j(t)$, the array $\psi_j(t)$ is used.

Sequence calculation:

initialize $\delta_i(0) = \pi(i)\, b_i(v(0))$, $\psi_i(0) = 0$, $t = 0$ for all i
for $t \leftarrow t + 1$:
    for all states j:
        $\delta_j(t) = \max_{1 \le i \le N} \left[\delta_i(t-1)\, a_{ij}\right] b_j(v(t))$
        $\psi_j(t) = \arg\max_{1 \le i \le N} \left[\delta_i(t-1)\, a_{ij}\right]$
until $t = T$

Sequence termination: $w(T) = \arg\max_{1 \le i \le N} \left[\delta_i(T)\right]$

Sequence backtracking: for $t = T-1$, $t \leftarrow t - 1$: $w(t) = \psi_{w(t+1)}(t+1)$ until $t = 0$
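A compact NumPy sketch of the Viterbi recursion, termination and backtracking described above (illustrative; array layout and names are assumptions):

```python
import numpy as np

def viterbi(A, B, pi, V):
    """Most probable hidden state path for the observation sequence V."""
    N, T = A.shape[0], len(V)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, V[0]]                       # delta_i(0) = pi(i) * b_i(v(0))
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A           # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)               # best predecessor i for each state j
        delta[t] = scores.max(axis=0) * B[:, V[t]]
    path = [int(delta[T - 1].argmax())]              # termination: best final state
    for t in range(T - 1, 0, -1):                    # backtracking through psi
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[T - 1].max())
```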

Example (figure).

Link to Java applet example

Positive aspects of the Viterbi algorithm
- Reduction of computational complexity by recursion
- Takes the entire context into account to find the optimal solution, therefore lower error rates on noisy data

Learning (Forward-Backward algorithm)

Determine the N-state model parameters $a_{ij}$ and $b_j(k)$ based on training sequences by iteratively calculating better values.

Definition: $\xi_{ij}(t) = P(w_i(t), w_j(t+1) \mid V^T, M)$ is the probability of a transition from $w_i(t)$ to $w_j(t+1)$, given that the model M generated $V^T$ by any path:

$\xi_{ij}(t) = \frac{\alpha_i(t)\, a_{ij}\, b_j(v(t+1))\, \beta_j(t+1)}{P(V^T \mid M)} = \frac{\alpha_i(t)\, a_{ij}\, b_j(v(t+1))\, \beta_j(t+1)}{\sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_k(t)\, a_{kl}\, b_l(v(t+1))\, \beta_l(t+1)}$

$\gamma_i(t) = \sum_{j=1}^{N} \xi_{ij}(t)$ is the probability of being in state $w_i$ at time t.


 T - 1 g ( t= 1 i t )  T t = g i ( t 1  T - 1 x ( t= 1 i t ) ) Expected number of trnsitions from w i to ny other stte Expected number of times in w i Expected number of trnsitions from w i to w which gives us better vlues for i ' i   T -1 t= 1 = T -1 t= 1 x ( t) i g ( t) i T -1  t= 1 = T - 1 N  t= 1 k = 1 ( t) it k i ( t) b ( v( t + 1)) b ( t ik i b k ( v( t + 1)) b ( t k + 1) + 1) 25

Calculation of the $b'_j(k)$:

$b'_j(k) = \frac{\sum_{t=1,\, v(t)=k}^{T} \gamma_j(t)}{\sum_{t=1}^{T} \gamma_j(t)} = \frac{\text{expected number of times in } w_j \text{ emitting } v_k}{\text{expected number of times in } w_j}$

$\pi'(i) = \gamma_i(0)$: probability of being in state i at time 0.
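A sketch of one complete re-estimation step for a single training sequence, combining the $\xi$, $\gamma$, $a'_{ij}$, $b'_j(k)$ and $\pi'(i)$ formulas above (illustrative names; no scaling, so it is only numerically safe for short sequences, see the scaling slide below):

```python
import numpy as np

def baum_welch_step(A, B, pi, V):
    """One forward-backward (Baum-Welch) re-estimation step on a single sequence V."""
    N, T = A.shape[0], len(V)
    # forward and backward passes
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, V[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, V[t]]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, V[t + 1]] * beta[t + 1])
    p_v = alpha[-1].sum()                                     # P(V^T | M)

    # xi[t, i, j] = P(w(t)=i, w(t+1)=j | V, M); gamma[t, i] = P(w(t)=i | V, M)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, V[t + 1]] * beta[t + 1] / p_v
    gamma = np.vstack([xi.sum(axis=2), alpha[-1:] * beta[-1:] / p_v])

    # re-estimation formulas from the slides
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(V) == k].sum(axis=0) / gamma.sum(axis=0)
    pi_new = gamma[0]
    return A_new, B_new, pi_new
```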

Positive aspect: arbitrary precision of the estimation.

Problems: how to choose the initial parameters $a_{ij}$ and $b_j$?
- For $\pi$ and the $a_{ij}$: either random or uniform values
- For the $b_j$: better initial estimates are very useful for fast convergence; estimate these parameters by manual segmentation or maximum-likelihood segmentation

What is the appropriate model? It must be decided based on the kind of signal being modeled. In speech recognition, often a left-right model is used to model the advancing of time, e.g. every syllable gets a state, plus a final silent state.

Scaling

Problem: the $\alpha_i$ and $\beta_i$ are always smaller than 1, so the calculations converge towards zero. This exceeds the precision range even in double precision.

Solution: multiply the $\alpha_i$ and $\beta_i$ by a scaling coefficient $c_t$ that is independent of i but depends on t, for example:

$c_t = \frac{1}{\sum_{i=1}^{N} \alpha_i(t)}$

These scaling factors cancel out in the calculation of the $a_{ij}$ and $b_j$.
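A sketch of the forward pass with this rescaling applied at every step (assumed helper name; the same coefficients would be applied to the betas). The log-likelihood is recovered from the scaling coefficients, so nothing underflows:

```python
import numpy as np

def forward_scaled(A, B, pi, V):
    """Forward pass with per-step rescaling; returns scaled alphas and log P(V^T)."""
    N, T = A.shape[0], len(V)
    alpha_hat = np.zeros((T, N))
    c = np.zeros(T)                                 # c_t = 1 / sum_i alpha_i(t)
    alpha_hat[0] = pi * B[:, V[0]]
    c[0] = 1.0 / alpha_hat[0].sum()
    alpha_hat[0] *= c[0]
    for t in range(1, T):
        alpha_hat[t] = (alpha_hat[t - 1] @ A) * B[:, V[t]]
        c[t] = 1.0 / alpha_hat[t].sum()
        alpha_hat[t] *= c[t]                        # keep each row summing to 1
    log_likelihood = -np.log(c).sum()               # log P(V^T) = -sum_t log c_t
    return alpha_hat, log_likelihood
```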

Speech recognizers using HMMs: isolated word recognizer

Build an HMM for each word in the vocabulary and calculate its $(A, B, \pi)$ parameters (train the model). For each word to be recognized:
- Feature analysis (vector quantization): generates observation vectors from the signal
- Run Viterbi on all models to find the most probable model for that observation sequence
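A toy sketch of the recognition step: each vocabulary word has its own trained $(A, B, \pi)$, and the word whose model explains the observation sequence best wins. For brevity the scoring here uses the forward likelihood; the slide's Viterbi score could be substituted without changing the structure. All names are illustrative:

```python
import numpy as np

def score(A, B, pi, V):
    """Forward-pass likelihood P(V | word model), as on the evaluation slides."""
    alpha = pi * B[:, V[0]]
    for v in V[1:]:
        alpha = (alpha @ A) * B[:, v]
    return float(alpha.sum())

def recognize(V, word_models):
    """word_models: dict word -> (A, B, pi). Returns the word whose HMM best explains V."""
    return max(word_models, key=lambda w: score(*word_models[w], V))
```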

Block diagram of an isolated word recognizer (figure).

(a) Log energy and (b) state assignment for the word "six" (figure).

Measured performance of an isolated word recognizer

100 digits spoken by 100 talkers (50 female / 50 male):
- Original training: the original training set was used
- TS2: the original speakers as in the training
- TS3: a completely new set of speakers
- TS4: another new set of speakers

Conclusion

There are various processes where the real activity is invisible and only a generated pattern can be observed. These can be modeled by HMMs.

Limitations of HMMs:
- They need enough training data
- The Markov assumption that each state only depends on the previous state is not always true

Advantages:
- Acceptable computational complexity
- Low error rates

HMMs are the predominant method for current automatic speech recognition and play a great role in other recognition systems.

Bibliography
- Rabiner, L. R.: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, Iss. 2, Feb 1989, pp. 257-286.
- Richard O. Duda, Peter E. Hart, David G. Stork: Pattern Classification, chapter 3.10.
- Eric Keller (editor): Fundamentals of Speech Synthesis and Speech Recognition, chapter 8 by Kari Torkkola.

Websites used:
- http://www.comp.leeds.ac.uk/roger/hiddenmarkovmodels/html_dev/main.html - Introduction to Hidden Markov Models, University of Leeds
- http://www.cnel.ufl.edu/~ydu/report1.html - Speech Recognition Using Hidden Markov Models, Yadunandana Nagar Rao, University of Florida