Hidden Markov Models. Dr. Naomi Harte

The Talk
- Hidden Markov Models: what are they? Why are they useful?
- The maths part: probability calculations; training (optimising parameters); Viterbi (decoding unseen sequences)
- Real systems

Background
- Discrete Markov process: the system can be in any of N states S_1 ... S_N
- The state changes at each time instant t_1, t_2, t_3, etc.; the actual state at time t is q_t
- For a first-order Markov process, P(q_t = S_j | q_{t-1} = S_i, q_{t-2} = S_k, ...) simplifies to P(q_t = S_j | q_{t-1} = S_i)

Background
- P(q_t = S_j | q_{t-1} = S_i) is independent of time
- State transition probabilities: a_ij = P(q_t = S_j | q_{t-1} = S_i), with i, j in 1..N
- a_ij >= 0 and Σ_{j=1}^{N} a_ij = 1
- This is an observable Markov model
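To make the transition probabilities concrete, here is a minimal Python sketch of an observable Markov chain; the three states and all the numbers are invented for illustration:

```python
import numpy as np

# Hypothetical 3-state weather chain (sunny, cloudy, rainy).
# A[i, j] = P(q_t = S_j | q_{t-1} = S_i); every row must sum to 1.
A = np.array([[0.8, 0.15, 0.05],
              [0.3, 0.4,  0.3 ],
              [0.2, 0.3,  0.5 ]])
assert np.allclose(A.sum(axis=1), 1.0)

# Probability of the observable sequence sunny -> sunny -> rainy:
seq = [0, 0, 2]
pi = np.array([1/3, 1/3, 1/3])        # uniform initial state distribution
p = pi[seq[0]]
for prev, cur in zip(seq, seq[1:]):
    p *= A[prev, cur]                 # first-order Markov: only q_{t-1} matters
print(p)                              # 1/3 * 0.8 * 0.05
```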

Example (from Rabiner)

Hidden Markov Model
- In the observable model, each state corresponded to an observable event: restrictive
- In an HMM, the observation is a probabilistic function of the state
- Hidden states, observable outputs [O_1, O_2, O_3, ..., O_T]

Jack Ferguson's Urn and Ball Model
- N urns, M colours
- Example observation sequence: O = {Red, Green, Green, Pink, Orange, Blue, Orange, Yellow}
- Each urn has its own colour distribution: URN 1 has P(Red) = b_1(1), P(Blue) = b_1(2), P(Green) = b_1(3), ..., P(Pink) = b_1(M); URN 2 has b_2(1) ... b_2(M); and so on up to URN N with b_N(1) ... b_N(M) (figure after Rabiner)

Ball and Urn
- The simplest HMM: the state is the urn
- A colour probability distribution is defined for each state (urn)
- The state transition matrix governs the choice of urn

HMM elements
- N: number of states
- A: state transition probabilities, a_ij = P(q_t = S_j | q_{t-1} = S_i)
- B: observation probability in state j, b_j(O_t) = P(O_t | q_t = S_j); discrete (O_t is v_k, k = 1..M) or continuous (e.g. Gaussian mixture)
- Initial state distribution: π_i = P(q_1 = S_i)
- Model: λ = (A, B, π)
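Putting these elements together, a minimal sketch of the urn-and-ball model as a generator; the two-urn, three-colour model and all its probabilities are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy urn-and-ball HMM: N = 2 urns (states), M = 3 colours (symbols).
A = np.array([[0.7, 0.3],            # a_ij = P(q_t = S_j | q_{t-1} = S_i)
              [0.4, 0.6]])
B = np.array([[0.6, 0.3, 0.1],       # b_j(k) = P(colour v_k | urn S_j)
              [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])            # initial urn distribution

def sample(T):
    """Generate a hidden urn sequence and the observable colour sequence O."""
    q = rng.choice(2, p=pi)
    states, obs = [], []
    for _ in range(T):
        states.append(q)
        obs.append(rng.choice(3, p=B[q]))   # draw a ball from the current urn
        q = rng.choice(2, p=A[q])           # move to the next urn
    return states, obs

states, O = sample(8)
print(states, O)   # only O is observable; the urn sequence stays hidden
```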

What are HMMs useful for?
- Modelling temporally evolving events that have a reproducible pattern, some reasonable level of variation, and measurable features at intervals
- Well structured? Use a left-to-right HMM
- More random? Use a fully connected (ergodic) HMM (both structures are sketched below)
- Applications in BOTH audio and video
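For illustration, the two transition structures look like this (made-up 3-state matrices):

```python
import numpy as np

# Left-to-right (Bakis) structure: upper-triangular A, so states can only
# stay or move forward. Typical for speech, where events follow a fixed order.
A_left_right = np.array([[0.6, 0.4, 0.0],
                         [0.0, 0.7, 0.3],
                         [0.0, 0.0, 1.0]])

# Ergodic (fully connected): every state reachable from every other state.
A_ergodic = np.full((3, 3), 1/3)
```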

HMM applications
- Need labelled training data!! The usual reason NOT to use HMMs
- Speech and audio-visual applications: research databases are labelled/transcribed

What might an HMM model?
- A sequence of events, with features sampled at intervals
- In speech recognition: a word, a phoneme, a syllable
- In speech analysis for home monitoring: normal speech, emotionally distressed speech, slurred speech
- In music, to transcribe scores: a violin, a piano, a trumpet, a mixture of instruments
- In sports video, to automatically extract highlights: a tennis serve, volley, rally, passing shot, etc.; in snooker: pot black, pot colour, pot red, foul
- In cell biology video, to flag specific events: nothing happening, fluorescence, cells growing, cells shrinking, cell death or division

Observations
- What is this observation sequence O = [O_1, O_2, O_3, ..., O_T]?
- Pertinent features or measures, taken at regular time intervals, that compactly describe the events of interest
- Spectral features, pitch, speaking rate in speech
- Colour, shape, motion in video

Example
- Each observation O_t is a vector of cepstral coefficients c_1 ... c_12, giving O = [O_1, O_2, O_3, ..., O_T]
- Take the DCT of the log spectrum on 20 ms windows with 50% overlap
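A rough sketch of such a feature extractor (plain cepstral coefficients via the DCT of the log magnitude spectrum; real front ends typically add mel warping and other refinements, so treat this as an illustration only):

```python
import numpy as np
from scipy.fft import dct

def cepstral_features(x, fs, n_coef=12):
    """DCT of the log magnitude spectrum on 20 ms windows, 50% overlap."""
    win = int(0.020 * fs)        # 20 ms window length in samples
    hop = win // 2               # 50% overlap
    feats = []
    for start in range(0, len(x) - win + 1, hop):
        frame = x[start:start + win] * np.hamming(win)
        log_spec = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
        feats.append(dct(log_spec, type=2, norm='ortho')[1:n_coef + 1])
    return np.array(feats)       # shape (T, 12): the sequence O_1 ... O_T

O = cepstral_features(np.random.randn(16000), fs=16000)  # 1 s of toy "audio"
print(O.shape)
```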

HMM problem 1: Evaluation
- Given O = [O_1, O_2, ..., O_T] and model λ, how do we efficiently compute P(O | λ)?
- Which model gives the best score?
- Solved with the Forward-Backward procedure

HMM problem 2: Decoding
- Given O = [O_1, O_2, ..., O_T] and model λ, how do we choose a state sequence Q = [q_1, q_2, ..., q_T] that is optimal?
- Uncovers the hidden part; there is no single "correct" sequence
- Solved with the Viterbi algorithm

HMM problem 3: Training
- How do we adjust the parameters of model λ = (A, B, π) to maximise P(O | λ)?
- Adapt the parameters to the observed training data
- Use Baum-Welch: an iterative solution (expectation-maximisation)

Notation
- Follows the Rabiner tutorial

Back to Problem 1
- Given O = [O_1, O_2, ..., O_T] and model λ, how do we efficiently compute P(O | λ)?
- Consider ALL possible state sequences; take one particular sequence Q = [q_1, q_2, ..., q_T]
- Probability of O given Q and λ:
  P(O | Q, λ) = ∏_{t=1}^{T} P(O_t | q_t, λ) = b_{q_1}(O_1) b_{q_2}(O_2) ··· b_{q_T}(O_T)

Observation probability ctd.
- Probability of the state sequence:
  P(Q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ··· a_{q_{T-1} q_T}
- JOINT probability of O and Q:
  P(O, Q | λ) = P(O | Q, λ) P(Q | λ)
- Probability of O over ALL possible Q:
  P(O | λ) = Σ_{all Q} P(O | Q, λ) P(Q | λ)
- Gets crazy as N and T increase!! (see the brute-force sketch below)
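To make "gets crazy" concrete, here is a brute-force sketch that literally sums over all N^T state sequences for the toy urn model above (all numbers invented); the loop body runs N^T times, so this only works for tiny N and T:

```python
import numpy as np
from itertools import product

# Toy model from the urn sketch above.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])
O = [0, 2, 1, 0]              # an observation sequence (symbol indices)
N, T = 2, len(O)

# P(O | lambda) = sum over ALL N^T state sequences Q of P(O | Q) P(Q).
p_total = 0.0
for Q in product(range(N), repeat=T):           # N^T terms: hopeless for real T
    p_Q = pi[Q[0]] * np.prod([A[Q[t - 1], Q[t]] for t in range(1, T)])
    p_O_given_Q = np.prod([B[Q[t], O[t]] for t in range(T)])
    p_total += p_Q * p_O_given_Q
print(p_total)
```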

Forward-Backward Procedure
- Be smart! There are only N states, so any state at time t+1 can only be reached from the N states at time t
- Reuses calculations

Forward variable (Rabiner)
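A minimal sketch of the forward recursion, following the standard initialisation and induction steps from Rabiner (variable names are mine; scaling to avoid underflow on long sequences is omitted for clarity):

```python
import numpy as np

def forward(A, B, pi, O):
    """Forward variable: alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda).
    Cost is O(N^2 T), versus O(T N^T) for the brute-force sum."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # initialisation
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # induction
    return alpha

# With A, B, pi, O from the brute-force sketch above:
# alpha = forward(A, B, pi, O)
# print(alpha[-1].sum())   # P(O | lambda): matches the brute-force total
```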

Exercise
- What is the corresponding backward variable?
- Probability of the partial observation sequence from t+1 to the end, given state i at time t and model λ
- Answer in the Rabiner paper!
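For reference once you have attempted the exercise, a minimal sketch of the backward recursion under the same conventions as the forward sketch above:

```python
import numpy as np

def backward(A, B, O):
    """Backward variable: beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    N, T = A.shape[0], len(O)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                    # initialisation
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])  # induction
    return beta
```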

Observation Probability
- Observation probability in state j: b_j(O_t) = P(O_t | q_t = S_j)
- Discrete: O_t is v_k, k = 1..M
- Continuous: a multivariate Gaussian mixture density is most common
- Are the features independent? (1st years: how does this affect the pdf?)

What if the features are not independent?
- Use full-covariance HMMs: slow, and they need more training data
- Or decorrelate the features: PCA, LDA, DCT (see the sketch below)
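A toy sketch of the decorrelation idea using PCA on synthetic data (LDA and the DCT play a similar role in practice); after rotating onto the eigenvectors of the covariance matrix, the off-diagonal covariances are near zero, so a diagonal-covariance Gaussian becomes a much better fit:

```python
import numpy as np

# Correlated toy feature matrix X, shape (T, d).
X = np.random.randn(500, 12) @ np.random.randn(12, 12)
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric cov: use eigh
Y = Xc @ eigvecs                         # rotated, decorrelated features
print(np.round(np.cov(Y, rowvar=False), 2))  # near-diagonal
```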

Problem 2
- Given O = [O_1, O_2, ..., O_T] and model λ, how do we choose a state sequence Q = [q_1, q_2, ..., q_T] that is optimal?
- Well explained in the Rabiner paper
- Single best state sequence Q: the best score along a path at time t, accounting for the first t observations and ending in state i:
  δ_t(i) = max_{q_1, q_2, ..., q_{t-1}} P[q_1 q_2 ··· q_t = i, O_1 O_2 ··· O_t | λ]

Viterbi Trellis
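A minimal sketch of the Viterbi recursion over such a trellis, with backpointers ψ for recovering the single best path (names and structure are mine, following Rabiner's δ_t(i); log-probabilities would be used in practice to avoid underflow):

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Single best state sequence Q and its score, via delta_t(i)."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)        # backpointers
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j]: come from i, go to j
        psi[t] = scores.argmax(axis=0)       # best predecessor for each state j
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    q = [delta[-1].argmax()]                 # backtrack from the best end state
    for t in range(T - 1, 0, -1):
        q.append(psi[t, q[-1]])
    return q[::-1], delta[-1].max()

# With A, B, pi, O from the sketches above:
# path, score = viterbi(A, B, pi, O)
```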

Back to Problem 3
- How do we adjust the parameters of model λ = (A, B, π) to maximise P(O | λ)?
- Training of models: Baum-Welch, an implementation of the EM algorithm (tutorial from David)
- Start with a good estimate, e.g. clustering with k-means
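A sketch of one Baum-Welch re-estimation step for a discrete-observation HMM, reusing the forward() and backward() sketches above. This illustrates the update formulas only; production training code scales or works in the log domain, and pools statistics over many training sequences:

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One EM re-estimation of lambda = (A, B, pi) from one sequence O."""
    N, T = len(pi), len(O)
    O = np.asarray(O)
    alpha = forward(A, B, pi, O)     # from the earlier sketches
    beta = backward(A, B, O)
    p_O = alpha[-1].sum()            # P(O | lambda)

    # gamma[t, i] = P(q_t = S_i | O, lambda)
    gamma = alpha * beta / p_O
    # xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, O[1:]].T * beta[1:])[:, None, :]) / p_O

    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):      # expected counts of each symbol per state
        B_new[:, k] = gamma[O == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new, pi_new

# Iterate until P(O | lambda) stops improving, e.g.:
# for _ in range(20):
#     A, B, pi = baum_welch_step(A, B, pi, O)
```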

Training strategies
- Choice of the number of states
- Controlling transitions: fully connected, or left-right HMM
- Gradually increasing the number of mixtures per state

More Information
- HTK, the Hidden Markov Model Toolkit from Cambridge University: htk.eng.cam.ac.uk
- The Rabiner paper: Rabiner, L.R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989
- Speech recognition books