Implementation of EM algorithm in HMM training
Jens Lagergren
April 22, 2009

EM Algorithms

An expectation-maximization (EM) algorithm is used in statistics for finding maximum likelihood estimates of parameters in probabilistic models where the model depends on hidden variables. EM is a two-step iterative method. In the first step (E-step) it computes the expectation of the log likelihood with respect to the current estimate of the distribution over the hidden variables. In the second step (M-step) it computes the parameters that maximize this expected log likelihood. Each cycle uses the estimates from the previous step, so the approximation improves progressively [Wikipedia]. In bioinformatics, EM is commonly used to train HMMs from selected genetic sequences.

Training Hidden Markov Models using EM

HMM profiles are usually trained with a set of sequences that are known to belong to a single family. Once the HMM has been trained and optimized to identify these known members, it is used to classify unknown genes. Namely, this requires us to construct an HMM $M$ that generates a collection of sequences $F$ with maximum probability:

$$\max_{\mathrm{HMMs}\ M} \Pr[x \mid M] \qquad (1)$$

But this can be computationally hard. Therefore we break the problem down into the following sub-problems:

- Finding the right structure, i.e. determining the number of layers. (We will not consider this sub-problem at the moment.)
- Finding optimal parameters for a given structure.

To solve the latter problem we now let

- $A_{\pi\tau}$ be the expected number of $\pi \to \tau$ transitions,
- $A_\pi$ be the expected number of visits to $\pi$,
- $G_{\pi,\sigma}$ be the expected number of times $\sigma$ is generated from state $\pi$,
- $\theta$ be the set of parameters to be optimized, composed of:
  - $e_\pi(\sigma)$, the probability of generating the symbol $\sigma$ in state $\pi$,
  - $a_{\pi\tau}$, the transition probability from state $\pi$ to $\tau$.

We also continue with the notation $X_i := x_1,\dots,x_i$ and $\Pi_i := \pi_1,\dots,\pi_i$ from the previous notes.
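To make the parameter set $\theta$ concrete, here is a minimal illustration in Python; the state names, alphabet, and probability values are made up for this sketch and are not part of the lecture.

```python
# Illustrative representation of theta = (a, e) for a tiny two-state HMM.
# a[pi][tau] = a_{pi,tau}: transition probability from state pi to state tau.
# e[pi][sigma] = e_pi(sigma): probability of emitting symbol sigma in state pi.
states = ["pi1", "pi2"]            # hypothetical state names
alphabet = ["A", "C", "G", "T"]    # hypothetical alphabet

a = {
    "pi1": {"pi1": 0.9, "pi2": 0.1},
    "pi2": {"pi1": 0.2, "pi2": 0.8},
}
e = {
    "pi1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "pi2": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

# Each row of a and e must be a probability distribution.
for pi in states:
    assert abs(sum(a[pi].values()) - 1.0) < 1e-9
    assert abs(sum(e[pi].values()) - 1.0) < 1e-9
```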

Expectation Maximization

We will use the EM algorithm to optimize the parameters and create the best-fitting HMM for a given set of sequences $F$. From the initial HMM $M$ and the initial parameter set $\theta$ we can generate a new, optimized parameter set $\theta'$; by iterating this procedure the approximation eventually converges to a local maximum. We calculate the new set $\theta'$ as follows:

$$a'_{\pi\tau} = \frac{E[A_{\pi\tau} \mid X, \theta]}{E[A_\pi \mid X, \theta]},
\qquad
e'_\pi(\sigma) = \frac{E[G_{\pi,\sigma} \mid X, \theta]}{E[A_\pi \mid X, \theta]}$$

We can compute each numerator and denominator separately:

1. $E[A_\pi \mid X, \theta]$ (the denominator)
2. $E[G_{\pi,\sigma} \mid X, \theta]$ (the numerator for $e'_\pi(\sigma)$)
3. $E[A_{\pi\tau} \mid X, \theta]$ (the numerator for $a'_{\pi\tau}$)

1. Calculating the denominator $E[A_\pi \mid X, \theta]$: Notice that

$$E[A_\pi \mid X, \theta] = \sum_\tau E[A_{\pi\tau} \mid X, \theta].$$

Therefore $E[A_\pi \mid X, \theta]$ is easily computed once $E[A_{\pi\tau} \mid X, \theta]$ is resolved in point 3.

2. Calculating the numerator $E[G_{\pi,\sigma} \mid X, \theta]$:

$$E[G_{\pi,\sigma} \mid X, \theta]
= \sum_{i:\, x_i = \sigma} \Pr[\pi_i = \pi \mid X, \theta]
= \sum_{i:\, x_i = \sigma} \frac{\Pr[\pi_i = \pi, X \mid \theta]}{\Pr[X \mid \theta]}
= \sum_{i:\, x_i = \sigma} \sum_\tau \frac{\Pr[\pi_{i-1} = \tau, \pi_i = \pi, X \mid \theta]}{\Pr[X \mid \theta]}$$

3. Calculating $E[A_{\pi\tau} \mid X, \theta]$:

$$E[A_{\pi\tau} \mid X, \theta]
= \sum_i \Pr[\pi_i = \pi, \pi_{i+1} = \tau \mid X, \theta]
= \sum_i \frac{\Pr[\pi_i = \pi, \pi_{i+1} = \tau, X \mid \theta]}{\Pr[X \mid \theta]}$$
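As a sketch of how the two update formulas above might be implemented, assuming the expected counts are stored as nested dictionaries (a layout chosen for this example, not prescribed by the lecture):

```python
def reestimate(A, G):
    """M-step sketch: turn expected counts into new parameters.

    A[pi][tau]   ~ E[A_{pi,tau} | X, theta]
    G[pi][sigma] ~ E[G_{pi,sigma} | X, theta]
    Returns a'_{pi,tau} = A[pi][tau] / E[A_pi] and
            e'_pi(sigma) = G[pi][sigma] / E[A_pi],
    where E[A_pi] = sum_tau A[pi][tau] (assumed nonzero for every state).
    """
    a_new, e_new = {}, {}
    for pi in A:
        visits = sum(A[pi].values())   # E[A_pi | X, theta]
        a_new[pi] = {tau: count / visits for tau, count in A[pi].items()}
        e_new[pi] = {sigma: count / visits for sigma, count in G[pi].items()}
    return a_new, e_new
```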

So it is enough to be able to compute $\Pr[\pi_i = \pi, \pi_{i+1} = \tau, X \mid \theta]$:

$$\begin{aligned}
\Pr[\pi_i = \pi, \pi_{i+1} = \tau, X \mid \theta]
&= \Pr[x_{i+1},\dots,x_n, \pi_{i+1} = \tau \mid X_i, \pi_i = \pi, \theta] \cdot \Pr[X_i, \pi_i = \pi \mid \theta] \\
&= \Pr[x_{i+1},\dots,x_n, \pi_{i+1} = \tau \mid \pi_i = \pi, \theta] \cdot \underbrace{\Pr[X_i, \pi_i = \pi \mid \theta]}_{f_\pi(i)}
&& \text{(the Markov property!)} \\
&= \Pr[x_{i+1},\dots,x_n \mid \pi_{i+1} = \tau, \pi_i = \pi, \theta] \cdot \underbrace{\Pr[\pi_{i+1} = \tau \mid \pi_i = \pi, \theta]}_{a_{\pi\tau}} \cdot f_\pi(i) \\
&= \underbrace{\Pr[x_{i+2},\dots,x_n \mid \pi_{i+1} = \tau, \theta]}_{b_\tau(i+1)} \cdot \underbrace{\Pr[x_{i+1} \mid \pi_{i+1} = \tau]}_{e_\tau(x_{i+1})} \cdot a_{\pi\tau}\, f_\pi(i) \\
&= b_\tau(i+1)\, e_\tau(x_{i+1})\, a_{\pi\tau}\, f_\pi(i)
\end{aligned}$$

where

1. $b_\tau(i+1)$ is the backward variable, defined by $b_\pi(i) = \Pr[x_{i+1},\dots,x_n \mid \pi_i = \pi, \theta]$, which can be computed using dynamic programming in a similar way as $f_\pi(i)$ was computed in lecture 5;
2. $f_\pi(i)$ is the forward variable defined in lecture 5;
3. $e_\tau(x_{i+1})$ and $a_{\pi\tau}$ were already calculated in the last cycle.

Stopping the iteration

When to stop? Notice that for $\theta'$ given by $a'_{\pi\tau}$ and $e'_\pi(\sigma)$, either

- $\theta'$ are the locally optimal parameters, in which case you may stop the algorithm, or
- $\Pr[x \mid \theta'] > \Pr[x \mid \theta]$, which means a solution has not been reached yet.

Time analysis of the EM algorithm in HMM training

A full time analysis of the complete EM algorithm would need to account for the number of iterations required for convergence, which varies from case to case; nevertheless, one may estimate the time required to calculate some of its components, for example:

Claim 1. $\max_{\pi_1,\dots,\pi_n} \Pr[x_1,\dots,x_n, \pi_0,\dots,\pi_n]$ can be computed in time $O(|Q|^2 n)$.
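A minimal sketch of how the forward and backward variables, and from them the expected counts, could be computed by dynamic programming as described above. The dict-based layout and the `init` distribution, which stands in for the transitions out of the start state $q_{\text{start}}$, are assumptions of this sketch, not the lecture's reference implementation.

```python
from collections import defaultdict

def forward(x, states, init, a, e):
    """f[i][pi] = Pr[x_1..x_i, pi_i = pi | theta] (positions are 1-based)."""
    n = len(x)
    f = [defaultdict(float) for _ in range(n + 1)]
    for pi in states:                       # first step out of the silent start state
        f[1][pi] = init[pi] * e[pi][x[0]]
    for i in range(2, n + 1):
        for pi in states:
            f[i][pi] = e[pi][x[i - 1]] * sum(f[i - 1][tau] * a[tau][pi]
                                             for tau in states)
    return f

def backward(x, states, a, e):
    """b[i][pi] = Pr[x_{i+1}..x_n | pi_i = pi, theta]."""
    n = len(x)
    b = [defaultdict(float) for _ in range(n + 1)]
    for pi in states:
        b[n][pi] = 1.0                      # no explicit end state assumed
    for i in range(n - 1, 0, -1):
        for pi in states:
            b[i][pi] = sum(a[pi][tau] * e[tau][x[i]] * b[i + 1][tau]
                           for tau in states)
    return b

def expected_counts(x, states, init, a, e):
    """E-step sketch: expected transition counts A and emission counts G."""
    n = len(x)
    f, b = forward(x, states, init, a, e), backward(x, states, a, e)
    px = sum(f[n][pi] for pi in states)     # Pr[X | theta]
    A = defaultdict(lambda: defaultdict(float))
    G = defaultdict(lambda: defaultdict(float))
    # Pr[pi_i = pi, pi_{i+1} = tau, X | theta] = f_pi(i) a_{pi,tau} e_tau(x_{i+1}) b_tau(i+1)
    for i in range(1, n):
        for pi in states:
            for tau in states:
                A[pi][tau] += f[i][pi] * a[pi][tau] * e[tau][x[i]] * b[i + 1][tau] / px
    # Pr[pi_i = pi | X, theta] = f_pi(i) b_pi(i) / Pr[X | theta], accumulated per emitted symbol
    for i in range(1, n + 1):
        for pi in states:
            G[pi][x[i - 1]] += f[i][pi] * b[i][pi] / px
    return A, G, px
```

Feeding the counts `A` and `G` returned here into the re-estimation sketch above yields $\theta'$; comparing the likelihood `px` obtained under $\theta'$ with the one obtained under $\theta$ gives the stopping test discussed above.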

Proof: Let

$$v_\pi(i) = \max_{\pi_1,\dots,\pi_{i-1}} \Pr[x_1,\dots,x_i, \pi_0,\dots,\pi_{i-1}, \pi_i = \pi]$$

and

$$v_\pi(0) = \begin{cases} 1 & \text{when } \pi = q_{\text{start}} \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\begin{aligned}
v_\pi(i) &= \max_{\pi_1,\dots,\pi_{i-1}} \Pr[x_1,\dots,x_i, \pi_0,\dots,\pi_{i-1}, \pi_i = \pi] \\
&= \max_\tau \max_{\pi_1,\dots,\pi_{i-2}} \Pr[X_i, \Pi_{i-2}, \pi_{i-1} = \tau, \pi_i = \pi] \\
&= \max_\tau \max_{\pi_1,\dots,\pi_{i-2}} \Pr[X_i, \Pi_{i-2}, \pi_{i-1} = \tau, \pi_i = \pi \mid X_{i-1}, \Pi_{i-2}, \pi_0, \pi_{i-1} = \tau] \cdot \Pr[X_{i-1}, \Pi_{i-2}, \pi_0, \pi_{i-1} = \tau] \\
&= \max_\tau \max_{\pi_1,\dots,\pi_{i-2}} \Pr[x_i, \pi_i = \pi \mid \pi_{i-1} = \tau] \cdot \Pr[X_{i-1}, \Pi_{i-2}, \pi_0, \pi_{i-1} = \tau]
&& \text{(the Markov property!)} \\
&= \max_\tau \max_{\pi_1,\dots,\pi_{i-2}} \Pr[x_i \mid \pi_i = \pi]\, \Pr[\pi_i = \pi \mid \pi_{i-1} = \tau] \cdot \Pr[X_{i-1}, \Pi_{i-2}, \pi_0, \pi_{i-1} = \tau] \\
&= \underbrace{\Pr[x_i \mid \pi_i = \pi]}_{e_\pi(x_i)} \max_\tau \underbrace{\Pr[\pi_i = \pi \mid \pi_{i-1} = \tau]}_{a_{\tau\pi}} \underbrace{\max_{\pi_1,\dots,\pi_{i-2}} \Pr[X_{i-1}, \Pi_{i-2}, \pi_0, \pi_{i-1} = \tau]}_{v_\tau(i-1)}
\end{aligned}$$

That is,

$$v_\pi(i) = \max_\tau v_\tau(i-1)\, a_{\tau\pi}\, e_\pi(x_i).$$

Since there are $|Q| \cdot n$ subproblems $v_\pi(i)$ and each can be computed in time $O(|Q|)$, this recursion can be computed in time $O(|Q|^2 n)$.
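The recursion in the proof translates directly into a dynamic program. The following is a minimal sketch, again with an `init` distribution standing in for the transitions out of $q_{\text{start}}$, and with backpointers added to recover the maximizing path, which the proof itself does not need.

```python
def viterbi(x, states, init, a, e):
    """Compute max_{pi_1..pi_n} Pr[x_1..x_n, pi_1..pi_n] in O(|Q|^2 n) time.

    Fills v[i][pi] = e_pi(x_i) * max_tau v[i-1][tau] * a_{tau,pi}
    for all |Q| states at each of the n positions.
    """
    n = len(x)
    v = [{} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]          # backpointers for the traceback
    for pi in states:                           # i = 1: leave the start state
        v[1][pi] = init[pi] * e[pi][x[0]]
    for i in range(2, n + 1):
        for pi in states:
            tau_best = max(states, key=lambda tau: v[i - 1][tau] * a[tau][pi])
            v[i][pi] = e[pi][x[i - 1]] * v[i - 1][tau_best] * a[tau_best][pi]
            back[i][pi] = tau_best
    # Best final state, then trace back the maximizing path pi_1..pi_n.
    best = max(states, key=lambda pi: v[n][pi])
    path = [best]
    for i in range(n, 1, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return v[n][best], path
```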