Hidden Markov Models

CI/CI(CS) UE, SS 2015. Christian Knoll, Signal Processing and Speech Communication Laboratory, Graz University of Technology. June 23, 2015.

Content
- Motivation/Goals
- Modeling of sequential data
- Definition of (Hidden) Markov Models
- Parameters of HMMs
- Calculations / Illustrations / Algorithms
- Examples

Motivation
For a list of samples, the joint likelihood factorizes as
p(X | Θ) = ∏_{n=1}^{N} p(x_n | Θ).
This requires independence of the samples x_n (i.i.d.)! A strong requirement that might not be met, e.g. for a speech signal, in contrast to white noise. (Figure: white noise vs. speech signal over 200 samples.) How can we model dependencies?
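As a small numerical illustration of this factorized likelihood (my sketch, not part of the slides; the per-sample density is an assumed Gaussian):

```python
import numpy as np

# Assumed i.i.d. Gaussian model Theta = (mu, sigma); any per-sample density would do.
x = np.array([0.2, -0.1, 0.4, 0.0])   # samples x_1 ... x_N
mu, sigma = 0.0, 1.0

# log p(X | Theta) = sum_n log p(x_n | Theta), valid only if the x_n are independent
log_p = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
print(log_p.sum())
```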

Example Weather (1)
Observations: Q, the weather observed on each of several specific days. If the data are modeled as i.i.d., the probability of rain on one day depends only on the relative frequency of rain in Q; correlations/trends over multiple days are ignored.

Example Weather (2)
The weather on day n is q_n, one of three weather states. We are interested in the conditional probability
P(q_n | q_{n-1}, q_{n-2}, ..., q_1),
e.g.: how probable is rain on day 4, given the weather observed on days 1 to 3, P(q_4 = rain | q_3, q_2, q_1)? One possibility: estimate it from prior observations of the whole four-day sequence. Problem: the complexity grows exponentially with the sequence length.

Simplification: Markov-Assumption
The following assumption simplifies the problem:
P(q_n | q_{n-1}, q_{n-2}, ..., q_1) = P(q_n | q_{n-1})
The weather on day n depends only on the weather on day n-1: the first-order Markov assumption. A model using this assumption is called a Markov model; an output sequence of it is a Markov chain. The probability of a sequence {q_1, q_2, ..., q_n} is
P(q_n, ..., q_1) = ∏_{i=1}^{n} P(q_i | q_{i-1}),
where the first factor P(q_1 | q_0) is understood as the prior P(q_1).

Simplification: Markov-Assumption (2)
Now only 3 × 3 = 9 statistics are necessary. Arranged in a table (rows: weather today, columns: weather tomorrow):

 today \ tomorrow   s_1    s_2    s_3
 s_1                0.8    0.05   0.15
 s_2                0.2    0.6    0.2
 s_3                0.2    0.3    0.5

Each row sums to 1: tomorrow there will certainly be some weather. The table holds the transition probabilities P(q_i | q_{i-1}). Representation as an automaton: blackboard.
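To make the use of this table concrete, here is a minimal sketch (not from the slides) that evaluates P(q_1, ..., q_n) = P(q_1) ∏ P(q_i | q_{i-1}) for a weather sequence; the initial distribution pi is an assumption, since the slide only specifies the transition table:

```python
import numpy as np

# Transition table from the slide: row = today's state, column = tomorrow's state.
A = np.array([[0.8, 0.05, 0.15],
              [0.2, 0.60, 0.20],
              [0.2, 0.30, 0.50]])
pi = np.array([1/3, 1/3, 1/3])  # assumed uniform prior over the first day's weather

def markov_sequence_prob(states, A, pi):
    """P(q_1, ..., q_n) = P(q_1) * prod_i P(q_i | q_{i-1}), states given as indices 0..2."""
    p = pi[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= A[prev, cur]
    return p

print(markov_sequence_prob([0, 0, 1, 2], A, pi))  # e.g. s_1 -> s_1 -> s_2 -> s_3
```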

Computation of Probabilities
Reminder, product rule: P(X, Y) = P(Y | X) P(X).
- Tomorrow, given yesterday and today: P(q_3 | q_2, q_1)
- Tomorrow and the day after tomorrow, given today: P(q_3, q_2 | q_1)
- The day after tomorrow, given today: P(q_3 | q_1)
(Blackboard.)
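The last of these blackboard calculations can also be sketched numerically (my illustration, not the slide's): marginalizing out tomorrow's weather gives P(q_3 = s_j | q_1 = s_i) = Σ_k a_{i,k} a_{k,j}, i.e. the (i, j) entry of A·A:

```python
import numpy as np

A = np.array([[0.8, 0.05, 0.15],
              [0.2, 0.60, 0.20],
              [0.2, 0.30, 0.50]])

# Two-step transition probabilities: P(q_3 = s_j | q_1 = s_i) = sum_k a_{i,k} * a_{k,j}
A2 = A @ A
print(A2[0])        # weather distribution for the day after tomorrow, given today is s_1
print(A2[0].sum())  # rows still sum to 1
```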

Hidden Markov Models (HMMs)
The states (weather) of the system are no longer directly observable. Instead there are observations that depend on the states (emission probabilities), e.g. whether the prison guard carries an umbrella with him or not.

 weather q_i   P(umbrella | q_i)
 s_1           0.1
 s_2           0.8
 s_3           0.3

Hidden states q_i ∈ {s_1, s_2, s_3} and observations x_i ∈ {umbrella, no umbrella}. We want P(q_i | x_i); with Bayes' rule,
P(q_i | x_i) = P(x_i | q_i) P(q_i) / P(x_i).
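A hedged numerical sketch of this Bayes step (not from the slides; the prior P(q_i) is not specified here, so a uniform prior is assumed purely for illustration):

```python
import numpy as np

p_umbrella_given_q = np.array([0.1, 0.8, 0.3])  # P(umbrella | q_i) from the slide
prior_q = np.array([1/3, 1/3, 1/3])             # assumed uniform prior P(q_i)

# Bayes: P(q_i | x_i = umbrella) = P(umbrella | q_i) * P(q_i) / P(umbrella)
joint = p_umbrella_given_q * prior_q
posterior = joint / joint.sum()
print(posterior)  # posterior over the hidden weather state, given that the umbrella is seen
```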

HMMs: Definition and Parameters
An HMM consists of a set of N_s states S = {s_1, ..., s_{N_s}} and parameters Θ = {π, A, B}:
- π_i = P(q_1 = s_i): a-priori probabilities of the states (first state in the sequence)
- a_{i,j} = P(q_{n+1} = s_j | q_n = s_i): transition probabilities
- the entries of B are the emission probabilities:
  - for discrete observations x_n ∈ {v_1, ..., v_K}: b_{i,k} = P(x_n = v_k | q_n = s_i), the probability of observing x_n = v_k in state s_i
  - for continuous observations, e.g. x_n ∈ R^D: b_i(x_n) = p(x_n | q_n = s_i), the PDFs of the observations

HMMs: Definition and Parameters, Transition Matrix A
The weather table from before, with rows indexed by q_n and columns by q_{n+1}, gives the transition matrix with entries a_{i,j} = P(q_{n+1} = s_j | q_n = s_i):

     [ 0.8  0.05  0.15 ]
 A = [ 0.2  0.6   0.2  ]
     [ 0.2  0.3   0.5  ]

HMMs: Definition and Parameters, Emission Matrix B
The umbrella table from before, P(x_i = umbrella | q_i) = 0.1, 0.8, 0.3 for the three states, gives the emission matrix. With x_n ∈ {v_1 = umbrella, v_2 = no umbrella} and b_{i,k} = P(x_n = v_k | q_n = s_i):

     [ 0.1  0.9 ]
 B = [ 0.8  0.2 ]
     [ 0.3  0.7 ]

HMMs: Useful formulas
Most of this is already known; here it is just written in HMM notation.
Probability of a state sequence Q = {q_1, ..., q_N}:
P(Q | Θ) = π_{q_1} ∏_{n=1}^{N-1} a_{q_n, q_{n+1}}
Probability of an observation sequence given a state sequence:
P(X | Q, Θ) = ∏_{n=1}^{N} b_{q_n, x_n}
Joint likelihood of X and Q (product rule):
P(X, Q | Θ) = P(X | Q, Θ) P(Q | Θ)
Likelihood of an observation sequence given an HMM:
P(X | Θ) = Σ_{all Q} P(X, Q | Θ)
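These four formulas translate almost literally into code. The following brute-force sketch (my own, with an assumed uniform π since the slides do not specify one) enumerates all state sequences Q, which is feasible only for tiny examples but mirrors P(X | Θ) = Σ_Q P(X, Q | Θ) exactly:

```python
import itertools
import numpy as np

A = np.array([[0.8, 0.05, 0.15],    # transition matrix from the weather example
              [0.2, 0.60, 0.20],
              [0.2, 0.30, 0.50]])
B = np.array([[0.1, 0.9],           # emission matrix: columns = umbrella / no umbrella
              [0.8, 0.2],
              [0.3, 0.7]])
pi = np.array([1/3, 1/3, 1/3])      # assumed uniform a-priori state probabilities

def prob_states(Q):                 # P(Q | Theta) = pi_{q1} * prod_n a_{q_n, q_{n+1}}
    p = pi[Q[0]]
    for a, b in zip(Q[:-1], Q[1:]):
        p *= A[a, b]
    return p

def prob_obs_given_states(X, Q):    # P(X | Q, Theta) = prod_n b_{q_n, x_n}
    return np.prod([B[q, x] for q, x in zip(Q, X)])

def likelihood(X):                  # P(X | Theta) = sum over all Q of P(X, Q | Theta)
    N_s, N = A.shape[0], len(X)
    return sum(prob_obs_given_states(X, Q) * prob_states(Q)
               for Q in itertools.product(range(N_s), repeat=N))

X = [0, 0, 1]                       # observations: umbrella, umbrella, no umbrella
print(likelihood(X))
```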

HMMs: Trellis Diagram
Representation of an HMM and its parameters with time evolution: the states (State 1, 2, 3) are unrolled over the time steps n = 1, 2, ..., N; transitions a_{i,j} connect the states of consecutive time steps, and at every time step n each state s_i emits the observation x_n with probability b_{i,k}. (Trellis figure on the slide, with the observation sequence x_1, x_2, ..., x_N along the time axis.)

HMMs with continuous observations
The observations are continuous; B might hold the parameters of Gaussians. Example here: speech recognition. The observations are feature vectors; a state could be a word element, an observation a feature vector. (Figures on the slide: Gaussian observation densities over the features F1/F2 for states s1, s2 and s1, s2, s3.)
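As a hedged sketch of such continuous emissions (the means and covariances below are made up for illustration and are not the slide's values), each state's emission probability becomes a Gaussian density b_i(x) = N(x; μ_i, Σ_i) evaluated at the feature vector:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical per-state Gaussians over a 2-D feature vector (e.g. formants F1, F2).
mus = [np.array([400.0, 1800.0]), np.array([600.0, 1200.0]), np.array([300.0, 2300.0])]
covs = [np.diag([80.0**2, 200.0**2])] * 3

def b(i, x):
    """Continuous emission density b_i(x) = p(x | q_n = s_i)."""
    return multivariate_normal.pdf(x, mean=mus[i], cov=covs[i])

x = np.array([420.0, 1750.0])       # one observed feature vector
print([b(i, x) for i in range(3)])
```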

HMMs: Examples
Matlab examples.

HMMs: 3 basic problems
1. Given an observation sequence X and a model Θ, calculate P(X | Θ), the probability that Θ generated the sequence; e.g. for classification (forward-backward algorithm).
2. Given an observation sequence X and a model Θ, estimate the optimal state sequence Q*, i.e. find the hidden states (Viterbi algorithm).
3. Given an observation sequence X (training sequence), estimate the model Θ such that P(X | Θ) is maximized; the learning problem (e.g. Baum-Welch algorithm).
We cover 1) and 2).

Sequence classification with HMMs
Given an observation sequence X and multiple HMMs Θ_k. X could be extracted speech features; the HMMs model word elements. Problem: which of the HMMs has generated the sequence? Calculate the likelihood of X under every model. We already know the equation
P(X | Θ) = Σ_{all Q} P(X, Q | Θ),
a sum over all state sequences that the model allows. (Matlab.)
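The sum over all Q does not have to be evaluated by brute force. The standard forward recursion, i.e. the forward half of the forward-backward algorithm mentioned above, computes P(X | Θ) in O(N · N_s²); a sketch, again with an assumed π:

```python
import numpy as np

def forward_likelihood(X, pi, A, B):
    """P(X | Theta) via the forward recursion:
       alpha_1(i)     = pi_i * b_{i, x_1}
       alpha_{n+1}(j) = (sum_i alpha_n(i) * a_{i,j}) * b_{j, x_{n+1}}
       P(X | Theta)   = sum_i alpha_N(i)"""
    alpha = pi * B[:, X[0]]
    for x in X[1:]:
        alpha = (alpha @ A) * B[:, x]
    return alpha.sum()

A = np.array([[0.8, 0.05, 0.15], [0.2, 0.6, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
pi = np.array([1/3, 1/3, 1/3])   # assumed initial distribution

print(forward_likelihood([0, 0, 1], pi, A, B))  # matches the brute-force sum over all Q
```

For classification one would evaluate this likelihood under each Θ_k and pick the model with the largest value.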

Optimal State Sequence
Given an observation sequence X and an HMM Θ. Problem: what is the optimal state sequence, i.e. the Q* that explains X best? That is the Q which maximizes P(Q | X, Θ): the sequence with maximum a-posteriori probability. We can calculate L(q_1, ..., q_n | x_1, ..., x_n) for every path and keep the ML path, but the complexity grows exponentially with n. The remedy is the Viterbi algorithm (used in GSM, WLAN, satellite communication, speech recognition, ...).

Optimal state sequence: Weather example
E.g. at n = 3: every state is still possible, but for each state exactly one incoming path has the largest likelihood up to this point (the survivor path). In the example trellis, the three survivor likelihoods at n = 3 are δ_3 = 0.0019, 0.0269 and 0.0052, one per state, for the observed sequence x_1, x_2, x_3. (Trellis figure on the slide.)

Viterbi Algorithm: General
(Blackboard.)

Viterbi Algorithm: Induction
Assume δ_n(i) holds the maximum likelihood of a path ending in q_n = s_i at time n. The likelihood at the next time step is then straightforward:
δ_{n+1}(j) = max_i {δ_n(i) a_{i,j}} · b_{j, x_{n+1}}
and the most probable predecessor is
ψ_{n+1}(j) = argmax_i {δ_n(i) a_{i,j}}.
If we also know δ_1(i), we are (almost) done. For the initialization we take the a-priori probabilities of the states:
δ_1(i) = π_i b_{i, x_1},  ψ_1(i) = 0.

Viterbi Algorithm: Weather example (1)
Most probable path into a given state at n = 2:
δ_2(s_j) = max(δ_1(s_1) a_{1,j}, δ_1(s_2) a_{2,j}, δ_1(s_3) a_{3,j}) · b_{j, x_2},
with ψ_2(s_j) the state attaining the maximum (the slide's trellis shows δ_1 = 0.3 for one of the states). (Trellis figure on the slide for the observed sequence x_1, x_2, x_3.)

Viterbi Algorithm: Weather example (2)
The three most probable (survivor) paths at n = 3: δ_3 = 0.0019, 0.0269 and 0.0052, one per state, for the observed sequence x_1, x_2, x_3. (Trellis figure on the slide.)

Viterbi Algorithm: Termination and Backtracking
From the δ_n(i) and ψ_n(i) we want to recover the optimal state sequence Q* = {q*_1, ..., q*_N}. At the last time step n = N:
p*(X | Θ) = max_{1 ≤ i ≤ N_s} δ_N(i)   (maximum likelihood),
q*_N = argmax_{1 ≤ i ≤ N_s} δ_N(i)   (last state of the ML path).
With the last state known, the earlier states can be traced back via the ψ_n (backtracking):
q*_{n-1} = ψ_n(q*_n).
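Putting initialization, induction, termination and backtracking together gives the following sketch (my own implementation of the recursions above, with an assumed uniform π and an example observation sequence, so the printed numbers need not match the δ values on the slides):

```python
import numpy as np

def viterbi(X, pi, A, B):
    """Most probable state sequence Q* and its likelihood for discrete observations X."""
    N, Ns = len(X), len(pi)
    delta = np.zeros((N, Ns))            # delta[n, i]: max likelihood of a path ending in s_i at n
    psi = np.zeros((N, Ns), dtype=int)   # psi[n, j]: best predecessor of s_j at time n

    delta[0] = pi * B[:, X[0]]           # initialization: delta_1(i) = pi_i * b_{i, x_1}
    for n in range(1, N):
        trans = delta[n - 1, :, None] * A          # delta_n(i) * a_{i,j}
        psi[n] = trans.argmax(axis=0)
        delta[n] = trans.max(axis=0) * B[:, X[n]]  # induction step

    # Termination and backtracking
    q = np.zeros(N, dtype=int)
    q[-1] = delta[-1].argmax()
    p_star = delta[-1, q[-1]]
    for n in range(N - 1, 0, -1):
        q[n - 1] = psi[n, q[n]]          # q*_{n-1} = psi_n(q*_n)
    return q, p_star

A = np.array([[0.8, 0.05, 0.15], [0.2, 0.6, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
pi = np.array([1/3, 1/3, 1/3])           # assumed
print(viterbi([0, 0, 1], pi, A, B))
```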

Viterbi Algorithm: Weather example (3)
Backtracking of the optimal path: start at n = 3 from the state with the largest terminal value, δ_3 = 0.0269, and follow the ψ pointers back to n = 1. (Trellis figure on the slide for the observed sequence x_1, x_2, x_3.)