Hidden Markov Models: Representing Sequence Data. CISC 5800, Professor Daniel Leeds (4/26/2018).

Representing sequence data
Hidden Markov Models (CISC 5800, Professor Daniel Leeds)
Examples of sequence data: spoken language, DNA sequences, daily stock values.

Example: spoken language. "F?r plu? fi?e is nine"
- Between "F" and "r" we expect a vowel: "aw", "ee", "ah"; NOT "oh", "uh"
- At the end of "plu" we expect a consonant: "g", "m", "s"; NOT "d", "p"

Markov Models
Start with:
- n states: s_1, ..., s_n
- Probability of initial start states: Π_1, ..., Π_n
- Probability of transition between states: A_{i,j} = P(q_t = s_i | q_{t-1} = s_j)
[Slide figure: a three-state transition diagram with transition probabilities labeled on the arcs]

A dice-y example
Two colored dice, states s_A and s_B. From the worked numbers on the slides: Π_A = 0.3, Π_B = 0.7, A_{A,A} = A_{B,B} = 0.8, A_{A,B} = A_{B,A} = 0.2.
- What is the probability we start at s_A? 0.3
- What is the probability we have the sequence of die choices s_A, s_A? 0.3 x 0.8 = 0.24
- What is the probability we have the sequence of die choices s_B, s_A, s_B, s_A? 0.7 x 0.2 x 0.2 x 0.2 = 0.0056
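The dice example can be reproduced in a few lines. The sketch below is a minimal illustration (not lecture code), assuming the parameter values worked out on the slides (Π_A = 0.3, Π_B = 0.7, self-transition 0.8, switch 0.2); the function name `sequence_prob` is illustrative.

```python
# Minimal sketch of the dice-y Markov chain from the slides.
# The numbers (pi, A) come from the worked example; the function name is illustrative.

pi = {"A": 0.3, "B": 0.7}                       # initial state probabilities
A = {("A", "A"): 0.8, ("A", "B"): 0.2,          # A[(i, j)] = P(q_t = i | q_{t-1} = j)
     ("B", "B"): 0.8, ("B", "A"): 0.2}

def sequence_prob(states):
    """P(q_1, ..., q_T) = pi(q_1) * prod_t A[(q_t, q_{t-1})]."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[(cur, prev)]
    return p

print(sequence_prob(["A", "A"]))                # 0.3 * 0.8 = 0.24
print(sequence_prob(["B", "A", "B", "A"]))      # 0.7 * 0.2 * 0.2 * 0.2 = 0.0056
```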

A dice-y example
What is the probability we have die choice s_B at time t=5?
Dynamic programming: find the answer for q_t, then compute q_{t+1}.

State \ Time   t=1    t=2    t=3
s_A            0.3    0.38   0.428
s_B            0.7    0.62   0.572

p_t(i) = Σ_j P(q_t = s_i | q_{t-1} = s_j) p_{t-1}(j), where p_t(i) = P(q_t = s_i) is the probability of state i at time t.

Hidden Markov Models
The actual state q is hidden; each state produces visible data o. The probability of observing value x_i when the state is s_j is φ_{i,j} = P(o_t = x_i | q_t = s_j).
P(O, Q | θ) = [ p(q_1 | π) Π_t p(q_t | q_{t-1}, A) ] x [ Π_t p(o_t | q_t, φ) ]
- the first factor is the probability of the state sequence
- the second factor is the probability of the observation sequence, given the states

Deducing the die based on observed emissions
Each color (die) is biased. Intuition: balance transition and emission probabilities.
- Observed numbers 554565254556: the 2 is probably from s_B
- Observed numbers 554565213321: the 2 is probably from s_A

Deducing the die based on observed emissions
[Slide table: per-face emission probabilities P(o | s_R) and P(o | s_B) for the two dice]
- We see 5. What is the probability of o=5, q=B (blue)? Π_B φ_{5,B} = 0.7 x 0.2 = 0.14
- We see 5, 3. What is the probability of o=5,3 and q=B,B? Π_B φ_{5,B} A_{B,B} φ_{3,B} = 0.7 x 0.2 x 0.8 x 0.1 = 0.0112
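The State \ Time table above comes from iterating the recursion p_t(i) = Σ_j A_{i,j} p_{t-1}(j). Below is a minimal sketch of that dynamic program, again assuming the dice parameters from the slides; `state_marginals` is an illustrative name, not lecture code.

```python
# Sketch of the recursion p_t(i) = sum_j A[(i, j)] * p_{t-1}(j);
# the printed values reproduce the State \ Time table.

pi = {"A": 0.3, "B": 0.7}
A = {("A", "A"): 0.8, ("A", "B"): 0.2,
     ("B", "B"): 0.8, ("B", "A"): 0.2}

def state_marginals(T):
    """Return [p_1, ..., p_T], where p_t[i] = P(q_t = s_i)."""
    ps = [dict(pi)]
    for _ in range(T - 1):
        prev = ps[-1]
        ps.append({i: sum(A[(i, j)] * prev[j] for j in prev) for i in prev})
    return ps

for t, p in enumerate(state_marginals(3), start=1):
    print(t, round(p["A"], 3), round(p["B"], 3))
# t=1: 0.3 0.7   t=2: 0.38 0.62   t=3: 0.428 0.572
```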

Goal: calculate the most likely states given the observable data.

Viterbi algorithm: δ_t(i)
Define and use δ_t(i):
- δ_t(i) is the maximum possible value of P(q_1, ..., q_t, o_1, ..., o_t) given that we insist q_t = s_i
- that is, find the most likely path from q_1 to q_t such that q_t = s_i, with outputs o_1, ..., o_t

δ_1(i) = Π_i P(o_1 | q_1 = s_i) = Π_i φ_{o_1,i}
δ_t(i) = P(o_t | q_t = s_i) max_j [ P(q_t = s_i | q_{t-1} = s_j) δ_{t-1}(j) ] = φ_{o_t,i} max_j [ A_{i,j} δ_{t-1}(j) ]
Q* = argmax_Q P(Q | O); the final state of the best path is argmax_i δ_T(i).

Viterbi algorithm: bigger picture
Compute all the δ_t(i)'s:
- at time t=1 compute δ_1(i) for every state i
- at time t=2 compute δ_2(i) for every state i (based on the δ_1(i) values)
- ...
- at time t=T compute δ_T(i) for every state i (based on the δ_{T-1}(i) values)
Find the states going from t=T back to t=1 that lead to max_i δ_T(i):
- find the state j that gives the maximum value of δ_T(j)
- find the state k at time T-1 used to maximize δ_T(j)
- ...
- find the state z at time 1 used to maximize δ_2(y)

Viterbi in action: observe 5, 1

             t=1 (o_1=5)       t=2 (o_2=1)
q_t = s_A    .3 x .1 = .03     .0084 (from B)
q_t = s_B    .7 x .2 = .14     .0112 (from B)

δ_2(A) = .3 x max(.8 x .03, .2 x .14) = .3 x .028 = .0084
δ_2(B) = .1 x max(.2 x .03, .8 x .14) = .1 x .112 = .0112
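As a rough illustration of the δ recursion and the backtracking step, here is a sketch of Viterbi for the dice model. Π and A are the slide values; the emission entries in `phi` are inferred from the worked δ values (the slide's full emission table is not reproduced in this transcript), so treat them as assumptions, and the function name `viterbi` is illustrative.

```python
# Sketch of delta_t(i) = phi[(o_t, i)] * max_j A[(i, j)] * delta_{t-1}(j),
# with backpointers to recover the best state path.

pi = {"A": 0.3, "B": 0.7}
A = {("A", "A"): 0.8, ("A", "B"): 0.2,
     ("B", "B"): 0.8, ("B", "A"): 0.2}
phi = {(5, "A"): 0.1, (5, "B"): 0.2,     # phi[(x, i)] = P(o_t = x | q_t = s_i)
       (1, "A"): 0.3, (1, "B"): 0.1,     # values inferred from the worked deltas
       (2, "A"): 0.2, (2, "B"): 0.1}

def viterbi(obs):
    states = list(pi)
    delta = [{i: pi[i] * phi[(obs[0], i)] for i in states}]
    back = []
    for o in obs[1:]:
        prev = delta[-1]
        d, b = {}, {}
        for i in states:
            # best previous state j for landing in state i now
            b[i] = max(states, key=lambda j: A[(i, j)] * prev[j])
            d[i] = phi[(o, i)] * A[(i, b[i])] * prev[b[i]]
        delta.append(d)
        back.append(b)
    # backtrack from the best final state
    path = [max(states, key=lambda i: delta[-1][i])]
    for b in reversed(back):
        path.append(b[path[-1]])
    return list(reversed(path))

print(viterbi([5, 1, 1, 1, 2]))   # ['B', 'A', 'A', 'A', 'A']
```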

Viterbi in action: observe 5, 1, 1
δ_3(A) = .3 x max(.8 x .0084, .2 x .0112) = .3 x .00672 = .00202
δ_3(B) = .1 x max(.2 x .0084, .8 x .0112) = .1 x .00896 = .000896

             t=1 (o_1=5)      t=2 (o_2=1)      t=3 (o_3=1)
q_t = s_A    .3 x .1 = .03    .0084 (from B)   .00202 (from A)
q_t = s_B    .7 x .2 = .14    .0112 (from B)   .000896 (from B)

Viterbi in action: observe 5, 1, 1, 1
δ_4(A) = .3 x max(.8 x .00202, .2 x .0009) = .3 x .0016 = .00048
δ_4(B) = .1 x max(.2 x .00202, .8 x .0009) = .1 x .00072 = .00007

             t=1 (o_1=5)      t=2 (o_2=1)      t=3 (o_3=1)      t=4 (o_4=1)
q_t = s_A    .3 x .1 = .03    .0084 (from B)   .00202 (from A)  .00048 (from A)
q_t = s_B    .7 x .2 = .14    .0112 (from B)   .000896 (from B) .00007 (from B)

Viterbi in action: observe 5, 1, 1, 1, 2
δ_5(A) = .2 x max(.8 x .00048, .2 x .00007) = .2 x .00038 = .00008
δ_5(B) = .1 x max(.2 x .00048, .8 x .00007) = .1 x .000096 = .00001

             t=1 (o_1=5)   t=2 (o_2=1)    t=3 (o_3=1)    t=4 (o_4=1)    t=5 (o_5=2)
q_t = s_A    .03           .0084 (<-B)    .00202 (<-A)   .00048 (<-A)   .00008 (<-A)
q_t = s_B    .14           .0112 (<-B)    .0009 (<-B)    .00007 (<-B)   .00001 (<-A)

State sequence: B, A, A, A, A

Parameters in an HMM
- Initial probabilities: π_i
- Transition probabilities: A_{i,j}
- Emission probabilities: φ_{i,j}
How do we learn these values?
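The δ columns above can be re-derived numerically. The short check below recomputes them under the same assumed emission values used in the Viterbi sketch.

```python
# Quick numerical check of the delta table (a sketch; phi entries are assumptions
# inferred from the worked example, as in the Viterbi sketch).

pi = {"A": 0.3, "B": 0.7}
A = {("A", "A"): 0.8, ("A", "B"): 0.2, ("B", "B"): 0.8, ("B", "A"): 0.2}
phi = {(5, "A"): 0.1, (5, "B"): 0.2, (1, "A"): 0.3, (1, "B"): 0.1,
       (2, "A"): 0.2, (2, "B"): 0.1}

obs = [5, 1, 1, 1, 2]
delta = {i: pi[i] * phi[(obs[0], i)] for i in pi}
print(1, {i: round(v, 5) for i, v in delta.items()})
for t, o in enumerate(obs[1:], start=2):
    delta = {i: phi[(o, i)] * max(A[(i, j)] * delta[j] for j in delta) for i in delta}
    print(t, {i: round(v, 5) for i, v in delta.items()})
# t=4: {'A': 0.00048, 'B': 7e-05}   t=5: {'A': 8e-05, 'B': 1e-05}  (rounded)
```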

Learning HMM parameters: π_i (first, assume we know the states)
Training data (state sequences):
x_1: A, B, A, A, B
x_2: B, B, B, A, A
x_3: A, A, B, A, B
Compute the MLE for each parameter, i.e. maximize Π_k [ π(q_1^(k)) Π_t p(q_t^(k) | q_{t-1}^(k), A) ]:
π_A = #D(q_1 = s_A) / #D

Learning HMM parameters: A_{i,j} (first, assume we know the states)
Same training data. Compute the MLE:
A_{i,j} = #D(q_t = s_i, q_{t-1} = s_j) / #D(q_{t-1} = s_j)

Learning HMM parameters: φ_{i,j} (first, assume we know the states)
x_1: A, B, A, A, B    o_1: 2, 5, 3, 3, 6
x_2: B, B, B, A, A    o_2: 4, 5, 1, 3, 2
x_3: A, A, B, A, B    o_3: 1, 4, 5, 2, 6
Compute the MLE:
φ_{i,j} = #D(o_t = x_i, q_t = s_j) / #D(q_t = s_j)

Challenges in HMM learning
- Learning the parameters (π, A, φ) with known states is not too hard
- BUT usually the states are unknown
- If we had the parameters and the observations, we could figure out the states with Viterbi: Q* = argmax_Q P(Q | O)
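A sketch of the known-state (supervised) MLE counts, applied to the three labeled sequences on the slide; the use of `collections.Counter` is an implementation choice, not part of the lecture.

```python
# Known-state MLE: pi, A, and phi are just normalized counts over the labeled data.
from collections import Counter

state_seqs = [list("ABAAB"), list("BBBAA"), list("AABAB")]
obs_seqs = [[2, 5, 3, 3, 6], [4, 5, 1, 3, 2], [1, 4, 5, 2, 6]]

# pi_i = #D(q_1 = s_i) / #D
first = Counter(q[0] for q in state_seqs)
pi = {i: first[i] / len(state_seqs) for i in "AB"}

# A_{i,j} = #D(q_t = s_i, q_{t-1} = s_j) / #D(q_{t-1} = s_j)
trans = Counter((cur, prev) for q in state_seqs for prev, cur in zip(q, q[1:]))
from_counts = Counter(prev for q in state_seqs for prev in q[:-1])
A = {(i, j): trans[(i, j)] / from_counts[j] for i in "AB" for j in "AB"}

# phi_{x,j} = #D(o_t = x, q_t = s_j) / #D(q_t = s_j)
emit = Counter((o, q) for qs, os in zip(state_seqs, obs_seqs) for q, o in zip(qs, os))
state_counts = Counter(q for qs in state_seqs for q in qs)
phi = {(x, j): emit[(x, j)] / state_counts[j] for (x, j) in emit}

print(pi)   # with these sequences, pi['A'] = 2/3, pi['B'] = 1/3
```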

Expectation-Maximization, or EM
Problem: uncertain of the y_i (class labels) and uncertain of θ (parameters).
Solution: guess y_i, deduce θ, re-compute y_i, re-compute θ, etc.
(OR: guess θ, deduce y_i, re-compute θ, re-compute y_i, ...)
This will converge to a solution.
- E step: fill in expected values for the missing labels y
- M step: regular MLE for θ given the known and filled-in variables
EM is also useful when there are holes in your data.

Computing states q_t
Instead of picking one state q_t = s_i, find P(q_t = s_i | o):
P(q_t = s_i | o_1, ..., o_T) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)
- Forward probability: α_t(i) = P(o_1, ..., o_t, q_t = s_i)
- Backward probability: β_t(i) = P(o_{t+1}, ..., o_T | q_t = s_i)

Details of the forward probability
α_1(i) = φ_{o_1,i} Π_i = P(o_1 | q_1 = s_i) P(q_1 = s_i)
α_t(i) = φ_{o_t,i} Σ_j A_{i,j} α_{t-1}(j) = P(o_t | q_t = s_i) Σ_j P(q_t = s_i | q_{t-1} = s_j) α_{t-1}(j)

Details of the backward probability
β_t(i) = Σ_j A_{j,i} φ_{o_{t+1},j} β_{t+1}(j) = Σ_j P(q_{t+1} = s_j | q_t = s_i) P(o_{t+1} | q_{t+1} = s_j) β_{t+1}(j)
Final β (the base of the recursion, equivalently β_T(i) = 1):
β_{T-1}(i) = Σ_j A_{j,i} φ_{o_T,j} = Σ_j P(q_T = s_j | q_{T-1} = s_i) P(o_T | q_T = s_j)
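A sketch of the forward and backward recursions as defined above, and of the posterior P(q_t = s_i | o_1..o_T) built from them. The parameters are the slide's dice values, with the same assumed emission entries as in the earlier sketches; the function names are illustrative.

```python
# Forward: alpha_t(i) = P(o_1..o_t, q_t = s_i)
# Backward: beta_t(i) = P(o_{t+1}..o_T | q_t = s_i), with beta_T(i) = 1

pi = {"A": 0.3, "B": 0.7}
A = {("A", "A"): 0.8, ("A", "B"): 0.2, ("B", "B"): 0.8, ("B", "A"): 0.2}
phi = {(5, "A"): 0.1, (5, "B"): 0.2, (1, "A"): 0.3, (1, "B"): 0.1,
       (2, "A"): 0.2, (2, "B"): 0.1}
states = ["A", "B"]

def forward(obs):
    alpha = [{i: phi[(obs[0], i)] * pi[i] for i in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({i: phi[(o, i)] * sum(A[(i, j)] * prev[j] for j in states)
                      for i in states})
    return alpha

def backward(obs):
    beta = [{i: 1.0 for i in states}]              # beta_T(i) = 1
    for o in reversed(obs[1:]):                    # fills beta_{T-1}, ..., beta_1
        nxt = beta[0]
        beta.insert(0, {i: sum(A[(j, i)] * phi[(o, j)] * nxt[j] for j in states)
                        for i in states})
    return beta

obs = [5, 1, 1]
alpha, beta = forward(obs), backward(obs)
# posterior P(q_t = s_i | o_1..o_T) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)
for t in range(len(obs)):
    z = sum(alpha[t][j] * beta[t][j] for j in states)
    print(t + 1, {i: round(alpha[t][i] * beta[t][i] / z, 3) for i in states})
```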

E-step: state probabilities
One state:
S_t(i) = P(q_t = s_i | o_1, ..., o_T) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)
Two states in a row:
S_t(i, j) = P(q_t = s_j, q_{t+1} = s_i | o_1, ..., o_T) = α_t(j) A_{i,j} φ_{o_{t+1},i} β_{t+1}(i) / Σ_f Σ_g α_t(g) A_{f,g} φ_{o_{t+1},f} β_{t+1}(f)
Recall, when the states are known:
π_A = #D(q_1 = s_A) / #D
A_{i,j} = #D(q_t = s_i, q_{t-1} = s_j) / #D(q_{t-1} = s_j)
φ_{i,j} = #D(o_t = x_i, q_t = s_j) / #D(q_t = s_j)

M-step
π_i = S_1(i)
A_{i,j} = Σ_t S_t(i, j) / Σ_t S_t(j)
φ_{obs,i} = Σ_{t : o_t = obs} S_t(i) / Σ_t S_t(i)
Known-state counterparts:
π_A = #D(q_1 = s_A) / #D
A_{i,j} = #D(q_t = s_i, q_{t-1} = s_j) / #D(q_{t-1} = s_j)
φ_{i,j} = #D(o_t = x_i AND q_t = s_j) / #D(q_t = s_j)

Review of HMMs in action
For classification, find the highest-probability class given the features.
Features for one sound: [q_1, o_1, q_2, o_2, ..., q_T, o_T]
Conclude a word, which generates states Q, which generate observations O.
[Slide figure: a word generating a chain of states, each emitting a sound (sound1, sound2, sound3)]
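Putting the pieces together, here is a sketch of one EM (Baum-Welch) iteration: forward/backward, the E-step quantities S_t(i) and S_t(i, j), and the M-step re-estimates. A single observation sequence is used for brevity, and the starting parameters are the same assumed values as in the earlier sketches.

```python
# One EM iteration for the dice HMM (sketch, single sequence).

pi = {"A": 0.3, "B": 0.7}
A = {("A", "A"): 0.8, ("A", "B"): 0.2, ("B", "B"): 0.8, ("B", "A"): 0.2}
phi = {(5, "A"): 0.1, (5, "B"): 0.2, (1, "A"): 0.3, (1, "B"): 0.1,
       (2, "A"): 0.2, (2, "B"): 0.1}
states = ["A", "B"]
obs = [5, 1, 1, 1, 2]
T = len(obs)

# --- forward/backward (as in the previous sketch) ---
alpha = [{i: phi[(obs[0], i)] * pi[i] for i in states}]
for o in obs[1:]:
    prev = alpha[-1]
    alpha.append({i: phi[(o, i)] * sum(A[(i, j)] * prev[j] for j in states)
                  for i in states})
beta = [{i: 1.0 for i in states}]
for o in reversed(obs[1:]):
    nxt = beta[0]
    beta.insert(0, {i: sum(A[(j, i)] * phi[(o, j)] * nxt[j] for j in states)
                    for i in states})

# --- E-step: S_t(i) and S_t(i, j) ---
S1 = []   # S1[t][i]      = P(q_t = s_i | O)
S2 = []   # S2[t][(i, j)] = P(q_t = s_j, q_{t+1} = s_i | O)
for t in range(T):
    z = sum(alpha[t][j] * beta[t][j] for j in states)
    S1.append({i: alpha[t][i] * beta[t][i] / z for i in states})
for t in range(T - 1):
    z = sum(alpha[t][g] * A[(f, g)] * phi[(obs[t + 1], f)] * beta[t + 1][f]
            for f in states for g in states)
    S2.append({(i, j): alpha[t][j] * A[(i, j)] * phi[(obs[t + 1], i)] * beta[t + 1][i] / z
               for i in states for j in states})

# --- M-step: re-estimate pi, A, phi from the expected counts ---
pi_new = {i: S1[0][i] for i in states}
A_new = {(i, j): sum(S2[t][(i, j)] for t in range(T - 1)) /
                 sum(S1[t][j] for t in range(T - 1))
         for i in states for j in states}
phi_new = {(x, i): sum(S1[t][i] for t in range(T) if obs[t] == x) /
                   sum(S1[t][i] for t in range(T))
           for i in states for x in set(obs)}

print(pi_new)
```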