The main algorithms used in the seqhmm package

Jouni Helske

University of Jyväskylä, Finland

May 9, 2018

1 Introduction

This vignette describes the main algorithms used in the seqhmm package (Helske and Helske, 2017). First, the forward-backward algorithm is presented, followed by the Viterbi algorithm and the derivations of the gradients used in the numerical optimisation routines.

2 Forward-Backward Algorithm

Following Rabiner (1989), the forward variable $\alpha_{it}(s) = P(y_{i1}, \ldots, y_{it}, z_t = s \mid M)$ is the joint probability of the partial observation sequences of subject $i$ up to time $t$ and the hidden state $s$ at time $t$, given the model $M$. Let us denote by $b_s(y_{it}) = b_s(y_{it1}) \cdots b_s(y_{itC})$ the joint emission probability of the observations at time $t$ in channels $1, \ldots, C$, given hidden state $s$. The forward variables can be computed recursively for each subject $i = 1, \ldots, N$:

1. Initialization: For $s = 1, \ldots, S$, set
$$\alpha_{i1}(s) = \pi_s b_s(y_{i1})$$

2. Recursion: For $t = 1, \ldots, T-1$ and $r = 1, \ldots, S$, compute
$$\alpha_{i(t+1)}(r) = \left[ \sum_{s=1}^{S} \alpha_{it}(s) a_{sr} \right] b_r(y_{i(t+1)})$$

3. Termination: Compute the likelihood
$$P(Y_i \mid M) = \sum_{s=1}^{S} \alpha_{iT}(s)$$
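For concreteness, the recursion above translates almost line for line into R. The sketch below is purely illustrative; the function and argument names (`forward`, `init`, `A`, `B`, `y`) are mine and not part of seqhmm, and it handles a single subject and a single channel with observations coded as integers $1, \ldots, M$.

```r
# Illustrative sketch of the unscaled forward recursion (single subject,
# single channel); 'init', 'A', 'B', and 'y' are hypothetical names, not
# part of seqhmm. init: initial probabilities (length S); A: S x S
# transition matrix with elements a_sr; B: S x M emission matrix with
# elements b_s(m); y: observations coded as integers in 1..M.
forward <- function(init, A, B, y) {
  S <- length(init); Tt <- length(y)
  alpha <- matrix(0, S, Tt)
  alpha[, 1] <- init * B[, y[1]]                       # initialization
  for (t in seq_len(Tt - 1)) {
    # alpha_{t+1}(r) = [sum_s alpha_t(s) a_sr] * b_r(y_{t+1})
    alpha[, t + 1] <- as.vector(t(A) %*% alpha[, t]) * B[, y[t + 1]]
  }
  list(alpha = alpha, likelihood = sum(alpha[, Tt]))   # termination
}
```

Because the $\alpha$'s are products of many probabilities, they shrink geometrically in $T$, which is exactly why the scaled recursions presented below are used in practice.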

The backward variable $\beta_{it}(s) = P(y_{i(t+1)}, \ldots, y_{iT} \mid z_t = s, M)$ is the joint probability of the partial observation sequence after time $t$, given hidden state $s$ at time $t$ and the model $M$. For subject $i = 1, \ldots, N$, the backward variables can be computed as follows:

1. Initialization: For $s = 1, \ldots, S$, set $\beta_{iT}(s) = 1$

2. Recursion: For $t = T-1, \ldots, 1$ and $s = 1, \ldots, S$, compute
$$\beta_{it}(s) = \sum_{r=1}^{S} a_{sr} b_r(y_{i(t+1)}) \beta_{i(t+1)}(r)$$

In practice the forward-backward algorithm is prone to numerical underflow, as the unscaled variables are products of many probabilities. Typically the forward and backward probabilities are therefore scaled as follows (Rabiner, 1989). For subject $i = 1, \ldots, N$,

1. Initialization: For $s = 1, \ldots, S$, set
$$\alpha_{i1}(s) = \pi_s b_s(y_{i1}), \quad c_1 = 1 \Big/ \sum_{s=1}^{S} \alpha_{i1}(s), \quad \hat\alpha_{i1}(s) = c_1 \alpha_{i1}(s) \quad (1)$$

2. Recursion: For $t = 1, \ldots, T-1$ and $r = 1, \ldots, S$, compute
$$\alpha_{i(t+1)}(r) = \left[ \sum_{s=1}^{S} \hat\alpha_{it}(s) a_{sr} \right] b_r(y_{i(t+1)}), \quad c_{t+1} = 1 \Big/ \sum_{s=1}^{S} \alpha_{i(t+1)}(s), \quad \hat\alpha_{i(t+1)}(s) = c_{t+1} \alpha_{i(t+1)}(s) \quad (2)$$

3. Termination: Compute the log-likelihood
$$\log P(Y_i \mid M) = -\sum_{t=1}^{T} \log c_t$$

The scaling factors $c_t$ from the forward algorithm are commonly used to scale the backward variables as well, although other scaling schemes are also possible. In seqhmm, the scaled backward variables for subject $i = 1, \ldots, N$ are computed as

1. Initialization: For $s = 1, \ldots, S$, set
$$\hat\beta_{iT}(s) = 1 \quad (3)$$

2. Recursion: For $t = T-1, \ldots, 1$ and $s = 1, \ldots, S$, compute
$$\beta_{it}(s) = \sum_{r=1}^{S} a_{sr} b_r(y_{i(t+1)}) \hat\beta_{i(t+1)}(r), \quad \hat\beta_{it}(s) = c_t \beta_{it}(s) \quad (4)$$
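As an illustration of Equations (1)-(4), the following R sketch extends the hypothetical `forward()` above with scaling. It is a sketch of the idea rather than the package's implementation (seqhmm performs these computations in compiled code for all subjects at once).

```r
# Illustrative sketch of the scaled recursions (1)-(4); same hypothetical
# init/A/B/y representation as in the forward() sketch. Returns the scaled
# variables and the log-likelihood -sum(log(c_t)).
forward_backward_scaled <- function(init, A, B, y) {
  S <- length(init); Tt <- length(y)
  alpha_hat <- matrix(0, S, Tt); beta_hat <- matrix(0, S, Tt)
  cc <- numeric(Tt)
  a <- init * B[, y[1]]                                # equation (1)
  cc[1] <- 1 / sum(a); alpha_hat[, 1] <- cc[1] * a
  for (t in seq_len(Tt - 1)) {                         # equation (2)
    a <- as.vector(t(A) %*% alpha_hat[, t]) * B[, y[t + 1]]
    cc[t + 1] <- 1 / sum(a); alpha_hat[, t + 1] <- cc[t + 1] * a
  }
  beta_hat[, Tt] <- 1                                  # equation (3)
  for (t in rev(seq_len(Tt - 1))) {                    # equation (4)
    b <- as.vector(A %*% (B[, y[t + 1]] * beta_hat[, t + 1]))
    beta_hat[, t] <- cc[t] * b
  }
  list(alpha = alpha_hat, beta = beta_hat, logLik = -sum(log(cc)))
}
```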

This scaling method works well most of the time, but in some ill-conditioned cases the default scaling can still lead to underflow in the backward algorithm. For such cases, seqhmm also supports computing the forward and backward variables in log space. Although numerically more stable, this approach is somewhat slower due to the repeated use of the log-sum-exp trick.

3 Viterbi Algorithm

We define the score
$$\delta_{it}(s) = \max_{z_{i1}, z_{i2}, \ldots, z_{i(t-1)}} P(z_{i1} \cdots z_{it} = s, y_{i1} \cdots y_{it} \mid M),$$
which is the highest probability of a hidden state sequence up to time $t$ that ends in state $s$. By induction we have
$$\delta_{i(t+1)}(r) = \left[ \max_s \delta_{it}(s) a_{sr} \right] b_r(y_{i(t+1)}). \quad (5)$$
We collect the arguments maximizing Equation (5) in an array $\psi_{it}(r)$ to keep track of the best hidden state sequence. The full Viterbi algorithm can be stated as follows:

1. Initialization:
$$\delta_{i1}(s) = \pi_s b_s(y_{i1}), \quad \psi_{i1}(s) = 0, \quad s = 1, \ldots, S$$

2. Recursion:
$$\delta_{it}(r) = \max_{s=1,\ldots,S} \left( \delta_{i(t-1)}(s) a_{sr} \right) b_r(y_{it}), \quad \psi_{it}(r) = \mathop{\arg\max}_{s=1,\ldots,S} \left( \delta_{i(t-1)}(s) a_{sr} \right), \quad r = 1, \ldots, S; \; t = 2, \ldots, T$$

3. Termination:
$$\hat P = \max_{s=1,\ldots,S} \delta_{iT}(s), \quad \hat z_{iT} = \mathop{\arg\max}_{s=1,\ldots,S} \delta_{iT}(s)$$

4. Sequence backtracking:
$$\hat z_{it} = \psi_{i(t+1)}(\hat z_{i(t+1)}), \quad t = T-1, \ldots, 1.$$

To avoid numerical underflow due to multiplying many small probabilities, the Viterbi algorithm can be straightforwardly computed in log space, i.e., by working with $\log \delta_{it}(s)$.
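A minimal R sketch of the log-space variant follows, again using the illustrative `init`/`A`/`B`/`y` representation from the earlier sketches rather than seqhmm's internals (in the package, the most probable paths are obtained with hidden_paths()).

```r
# Illustrative sketch of the Viterbi algorithm in log space; assumes a
# sequence of length at least 2. Names are hypothetical, not seqhmm's API.
viterbi_log <- function(init, A, B, y) {
  S <- length(init); Tt <- length(y)
  logA <- log(A)
  delta <- matrix(-Inf, S, Tt)   # delta[s, t] = log delta_t(s)
  psi <- matrix(0L, S, Tt)       # backpointers psi_t(r)
  delta[, 1] <- log(init) + log(B[, y[1]])
  for (t in 2:Tt) {
    for (r in 1:S) {
      cand <- delta[, t - 1] + logA[, r]   # log(delta_{t-1}(s) a_sr)
      psi[r, t] <- which.max(cand)
      delta[r, t] <- max(cand) + log(B[r, y[t]])
    }
  }
  z <- integer(Tt)
  z[Tt] <- which.max(delta[, Tt])                               # termination
  for (t in rev(seq_len(Tt - 1))) z[t] <- psi[z[t + 1], t + 1]  # backtracking
  list(path = z, logProb = max(delta[, Tt]))
}
```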

4 Gradients

Following Levinson, Rabiner, and Sondhi (1983), using the scaled forward and backward variables we have
$$\frac{\partial P(Y_i \mid M)}{\partial \pi_s} = b_s(y_{i1}) \hat\beta_{i1}(s),$$
$$\frac{\partial P(Y_i \mid M)}{\partial a_{sr}} = \sum_{t=1}^{T-1} \hat\alpha_{it}(s) b_r(y_{i(t+1)}) \hat\beta_{i(t+1)}(r),$$
and
$$\frac{\partial P(Y_i \mid M)}{\partial b_{rc}(m)} = \sum_{t:\, y_{i(t+1)c} = m} \sum_{s=1}^{S} \hat\alpha_{it}(s) a_{sr} \hat\beta_{i(t+1)}(r) + I(y_{i1c} = m)\, \pi_r \hat\beta_{i1}(r).$$

In the direct numerical optimisation algorithms used by seqhmm, the model is parameterised using unconstrained parameters $\pi'_s$, $a'_{sr}$, and $b'_{rc}(m)$ such that $a_{sr} = \exp(a'_{sr}) \big/ \sum_{k=1}^{S} \exp(a'_{sk})$, and similarly for the emission and initial probabilities. This leads to
$$\frac{\partial \log P(Y_i \mid M)}{\partial \pi'_s} = \frac{\partial \log P(Y_i \mid M)}{\partial \pi_s} \pi_s (1 - \pi_s),$$
$$\frac{\partial \log P(Y_i \mid M)}{\partial a'_{sr}} = \frac{\partial \log P(Y_i \mid M)}{\partial a_{sr}} a_{sr} (1 - a_{sr}),$$
and
$$\frac{\partial \log P(Y_i \mid M)}{\partial b'_{rc}(m)} = \frac{\partial \log P(Y_i \mid M)}{\partial b_{rc}(m)} b_{rc}(m) (1 - b_{rc}(m)).$$
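To make the reparameterisation concrete, here is a small illustrative R sketch (my own names, not seqhmm internals) that maps one row of unconstrained parameters to probabilities via the softmax and computes its full Jacobian. The diagonal entries $a_{sr}(1 - a_{sr})$ are exactly the factors appearing in the formulas above; the off-diagonal entries $-a_{sr} a_{sk}$ capture the cross-dependencies induced by the normalisation.

```r
# Illustrative sketch of the softmax reparameterisation for one row of the
# transition matrix; 'x' holds the unconstrained parameters a'_sr.
softmax <- function(x) { e <- exp(x - max(x)); e / sum(e) }

# Full Jacobian d a_sr / d a'_sk = a_sr (1{r = k} - a_sk): the diagonal
# entries a_sr (1 - a_sr) are the factors used in the text; the
# off-diagonal entries are -a_sr * a_sk.
softmax_jacobian <- function(x) {
  p <- softmax(x)
  diag(p) - tcrossprod(p)
}

# Chain rule: gradient w.r.t. the unconstrained parameters, given the
# gradient of the log-likelihood w.r.t. the probabilities themselves.
grad_unconstrained <- function(x, grad_prob) {
  as.vector(softmax_jacobian(x) %*% grad_prob)
}
```

A finite-difference comparison against such a sketch is a convenient way to validate analytic gradients of this kind.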

4.1 MHMM case

For a mixture HMM with $K$ clusters, we define a full model with $S = S_1 + \cdots + S_K$ states in block form, with initial state probabilities $\pi_i = (w_{i1} \pi^1, \ldots, w_{iK} \pi^K)$, where $\pi^k$, $k = 1, \ldots, K$, is the vector of initial probabilities for the submodel $M_k$, and $w_{ik} = \exp(x_i^\top \gamma_k) \big/ (1 + \sum_{j=2}^{K} \exp(x_i^\top \gamma_j))$, with $\gamma_1 = 0$.

First note that the likelihood of the HMM for the $i$th subject can be written as
$$P(Y_i \mid M) = \sum_{s=1}^{S} \sum_{r=1}^{S} \alpha_t(s) a_{sr} b_r(y_{i(t+1)}) \beta_{t+1}(r)$$
for any $t = 1, \ldots, T-1$. Thus for $t = 1$ we have
$$P(Y_i \mid M) = \sum_{s=1}^{S} \sum_{r=1}^{S} \alpha_1(s) a_{sr} b_r(y_{i2}) \beta_2(r) = \sum_{s=1}^{S} \alpha_1(s) \sum_{r=1}^{S} a_{sr} b_r(y_{i2}) \beta_2(r) = \sum_{s=1}^{S} \alpha_1(s) \beta_1(s) = \sum_{s=1}^{S} \pi_{is} b_s(y_{i1}) \beta_1(s). \quad (6)$$

Therefore the gradients for the unconstrained parameters $\pi_s^{k\prime}$ of the $k$th cluster are given by
$$\frac{\partial \log P(Y_i \mid M_k)}{\partial \pi_s^{k\prime}} = \frac{\partial \log P(Y_i \mid M_k)}{\partial \pi_s^k} \pi_s^k (1 - \pi_s^k) w_{ik}.$$

For $\gamma_k$, the gradients are of the form
$$\frac{\partial P(Y_i \mid M)}{\partial \gamma_k} = \sum_{s=1}^{S} b_s(y_{i1}) \hat\beta_1(s) \frac{\partial \pi_{is}}{\partial \gamma_k}. \quad (7)$$

Now if state $s$ belongs to cluster $k$, we have
$$\frac{\partial \pi_{is}}{\partial \gamma_k} = \frac{\partial}{\partial \gamma_k} \left[ \pi_s^k \frac{\exp(x_i^\top \gamma_k)}{\sum_{j=1}^{K} \exp(x_i^\top \gamma_j)} \right] = \pi_s^k x_i w_{ik} (1 - w_{ik}), \quad (8)$$
and otherwise
$$\frac{\partial \pi_{is}}{\partial \gamma_k} = -\pi_s^h x_i w_{ih} w_{ik},$$
where $h$ is the index of the cluster containing state $s$.

References

Satu Helske and Jouni Helske. Mixture hidden Markov models for sequence data: the seqhmm package in R. arXiv preprint arXiv:1704.00543, 2017. URL http://arxiv.org/abs/1704.00543.

S. E. Levinson, L. R. Rabiner, and M. M. Sondhi. An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell System Technical Journal, 62(4):1035-1074, 1983. doi: 10.1002/j.1538-7305.1983.tb03114.x.

Lawrence Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989. doi: 10.1109/5.18626.