Computational Genomics and Molecular Biology, Fall 2011

HMM Lecture Notes
Dannie Durand and Rose Hoberman
October 11th

1 Hidden Markov Models

In the last few lectures, we have focused on three problems related to local multiple sequence alignments: discovery, modeling, and recognition. We have also discussed the use of Hidden Markov Models (HMMs) to model motifs. In today's lecture, we discussed two algorithms that are used to solve the recognition problem with HMMs: the Viterbi algorithm and the forward algorithm. The problem of motif discovery is left for future lectures.

HMMs are defined formally as follows:

1. $N$ states $S_1, \ldots, S_N$
2. $M$ symbols in the alphabet
3. Transition probability matrix $a_{ij}$
4. Emission probabilities $e_i(a)$, the probability that state $i$ emits character $a$
5. Initial distribution vector $\pi = (\pi_i)$

We refer to the transition probabilities, the emission probabilities, and the initial distribution collectively as the parameters, designated $\lambda = (a_{ij}, e_i(a), \pi)$.

Following the notation used in Durbin, we will refer to the sequence of observed symbols as $O = O_1, O_2, O_3, \ldots$ and the sequence of states visited as $Q = q_1, q_2, q_3, \ldots$ (the state path). When considering more than one sequence or state path, we will use superscripts to distinguish them: $O^i = O^i_1, O^i_2, O^i_3, \ldots$ and $Q^j = q^j_1, q^j_2, q^j_3, \ldots$

The parameters of an HMM can be learned from labeled data:

$$a_{ij} = \frac{A_{ij}}{\sum_{j'} A_{ij'}}, \qquad e_i(x) = \frac{E_i(x)}{\sum_{x'} E_i(x')},$$

where $A_{ij}$ is the number of transitions from state $S_i$ to $S_j$ in the training data and $E_i(x)$ is the number of instances of the symbol $x$ that are labeled with state $S_i$.
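To make the counting estimates above concrete, here is a minimal sketch of supervised parameter estimation from labeled sequences: it tallies transition and emission counts and normalizes each row. The function name estimate_parameters and the input format (a list of (observations, state labels) pairs) are illustrative choices, not part of the course materials.

```python
from collections import defaultdict

def estimate_parameters(labeled_seqs):
    """Estimate HMM parameters by counting from labeled training data.

    labeled_seqs: list of (observations, states) pairs of equal-length
    sequences, e.g. ("HLLH", "CMME").
    Returns (a, e, pi) as nested dicts: a[i][j], e[i][x], pi[i].
    """
    A = defaultdict(lambda: defaultdict(int))   # A[i][j]: transitions S_i -> S_j
    E = defaultdict(lambda: defaultdict(int))   # E[i][x]: symbol x emitted in state S_i
    start = defaultdict(int)                    # counts of initial states

    for obs, states in labeled_seqs:
        start[states[0]] += 1
        for t, (x, s) in enumerate(zip(obs, states)):
            E[s][x] += 1
            if t > 0:
                A[states[t - 1]][s] += 1

    # Normalize counts into probabilities (each row sums to 1).
    a = {i: {j: n / sum(row.values()) for j, n in row.items()} for i, row in A.items()}
    e = {i: {x: n / sum(row.values()) for x, n in row.items()} for i, row in E.items()}
    pi = {i: n / sum(start.values()) for i, n in start.items()}
    return a, e, pi
```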

In an HMM, each state emits symbols from a fixed alphabet each time the state is visited. Emission probabilities are state-dependent, but not time-dependent. A symbol may be emitted by more than one state; similarly, a state can emit more than one symbol.

Note that an HMM is a generative model. It gives the probability of generating a particular sequence (hence, the emission probabilities). This allows us to ask: given an observed sequence, $O$, and an HMM, what is the probability that $O$ was generated by this HMM?

In an HMM, there may be more than one, and often very many, state paths associated with $O$. Therefore, the true sequence of states that generated the observed sequence is unknown, or hidden; hence the term, Hidden Markov Model. The state sequence is hidden because it is not possible to tell the state merely from the output symbol. This hidden sequence of states corresponds to what we want to know, namely the classification of each symbol.

1.1 HMMs and probabilities

An HMM emits every sequence $O^i \in \Sigma^*$ with probability $P(O^i) \geq 0$. Every time an HMM is executed it will emit a sequence; therefore,

$$\sum_i P(O^i) = 1.$$

Since each sequence can, potentially, be generated by several state paths,

$$P(O) = \sum_j P(O \mid Q^j) \, P(Q^j).$$

Fig. 1 below shows $P(O, Q)$ for the set of all possible state paths, $Q$. From this, we obtain

$$\sum_i P(O^i) = \sum_i \sum_j P(O^i \mid Q^j) \, P(Q^j) = \sum_i \sum_j P(O^i, Q^j) = 1.$$
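The decomposition $P(O) = \sum_j P(O \mid Q^j) P(Q^j)$ can be checked by brute force on very small models by enumerating every state path. The sketch below does exactly that; the dict-based parameter representation (a[i][j], e[i][x], pi[i]) is an assumption carried through the later sketches, and the exponential cost ($N^T$ paths) is what motivates the Forward algorithm in Section 2.2.

```python
from itertools import product

def prob_obs_brute_force(obs, states, a, e, pi):
    """Compute P(O) = sum over all state paths Q of P(O, Q).

    Feasible only for tiny models: there are N**T paths for N states and
    T observations. Illustrates the sum that the Forward algorithm
    computes efficiently.
    """
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = pi[path[0]] * e[path[0]][obs[0]]            # P(q_1) * P(O_1 | q_1)
        for t in range(1, len(obs)):
            p *= a[path[t - 1]][path[t]] * e[path[t]][obs[t]]
        total += p                                       # add P(O, Q) for this path
    return total
```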

Figure 1: The probability of sequence $O$ and state path $Q$. The area under this curve is $P(O)$. The maximum point on the curve is the most likely path, $Q^*$.

2 Recognition questions

Given a sequence, $O$, there are several questions we may wish to ask:

1. What is the true path? Otherwise stated, we wish to assign labels to an unlabeled sequence. Example: identify the cytosolic, transmembrane, and extracellular regions in the sequence. In this case, we wish to assign the labels E, M, or C to the unlabeled data.

2. What is the probability that a given sequence, $O$, was generated by the HMM? Example: is the sequence a transmembrane protein?

For the first question, we use the Viterbi algorithm to find the most probable path through the HMM, given $O$, which we take as an estimate of the true path. For the second question, we use the Forward algorithm to calculate $P(O)$.

2.1 The Viterbi algorithm

In the transmembrane example, the amino acid sequence is known, but the subcellular location of each residue is hidden. In the HMM, the subcellular location of each residue is represented by the (hidden) state that emitted the symbol associated with that residue. We will infer the subcellular locations of the residues by inferring the sequence of states visited by the HMM. The boundaries between the transmembrane, extracellular, and cytosolic regions are defined by the transitions between C, M, and E states along this state path.

We approach this problem by assuming that the most likely path,

$$Q^* = \operatorname{argmax}_Q P(Q \mid O),$$

is a good estimate of the sequence of states that generated a given observed sequence $O$. This process is called decoding because we decode the sequence of symbols to determine the hidden sequence of states. HMMs were developed in the field of speech recognition, where recorded speech is decoded into words or phonemes to determine the meaning of the utterance.

We could use brute force by calculating $P(Q \mid O)$ for all paths. However, this becomes intractable as soon as the number of states gets large, since the number of state sequences grows exponentially ($N^T$). Instead, we calculate $\operatorname{argmax}_Q P(Q, O)$ using a dynamic programming algorithm called the Viterbi algorithm. Note that this still gives us the most probable path, because the path that maximizes $P(Q, O)$ also maximizes $P(Q \mid O)$:

$$\operatorname{argmax}_Q P(Q \mid O) = \operatorname{argmax}_Q \frac{P(Q, O)}{P(O)} = \operatorname{argmax}_Q P(Q, O).$$

Let $\delta(t, i)$ be the probability of observing the first $t$ residues and ending up in state $S_i$ via the most probable path. We calculate $\delta(t, i)$ as follows:

Initialization:
$$\delta(1, i) = \pi_i \, e_i(O_1)$$

Recursion:
$$\delta(t, i) = \max_{1 \leq j \leq N} \left[ \delta(t-1, j) \, a_{ji} \right] e_i(O_t)$$

Final: the probability of the most probable path is
$$P(Q^*) = \max_{1 \leq i \leq N} \delta(T, i).$$

The final state, $q_T$, is the state that maximizes $\delta(T, i)$. The state path can be reconstructed by tracing back through the dynamic programming matrix, similar to the traceback in pairwise sequence alignment.

Running time: $O(TN^2)$. There are $TN$ entries in the dynamic programming matrix, and each entry requires calculating $N$ terms.
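The recursion above maps directly onto a $T \times N$ dynamic programming table. Below is a minimal sketch of the Viterbi algorithm in the same dict-based representation used earlier; the function name viterbi and the return convention (the probability $P(Q^*)$ together with the traced-back path) are illustrative choices.

```python
def viterbi(obs, states, a, e, pi):
    """Return (P(Q*), Q*): the probability of the most probable state path
    for the observations, and the path itself recovered by traceback."""
    T = len(obs)
    delta = [{} for _ in range(T)]   # delta[t][i] = prob. of best path ending in i at time t
    back = [{} for _ in range(T)]    # back[t][i]  = predecessor state on that best path

    # Initialization: delta(1, i) = pi_i * e_i(O_1)
    for i in states:
        delta[0][i] = pi[i] * e[i][obs[0]]

    # Recursion: delta(t, i) = max_j [delta(t-1, j) * a_ji] * e_i(O_t)
    for t in range(1, T):
        for i in states:
            best_j = max(states, key=lambda j: delta[t - 1][j] * a[j][i])
            delta[t][i] = delta[t - 1][best_j] * a[best_j][i] * e[i][obs[t]]
            back[t][i] = best_j

    # Termination: P(Q*) = max_i delta(T, i); traceback recovers the path.
    last = max(states, key=lambda i: delta[T - 1][i])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return delta[T - 1][last], path
```

In practice the product of many small probabilities underflows, so implementations usually work with log probabilities and sums; the sketch keeps plain probabilities to match the equations above.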

2.2 The Forward Algorithm

We may also wish to determine the probability that a particular sequence, $O$, was generated, either to compare the likelihoods of different sequences or to compare the likelihood of one sequence under different models. For example, suppose we want to compute the probability of observing a particular transmembrane sequence, given our HMM. We can compute this by summing over all state paths:

$$P(O) = \sum_Q P(O, Q).$$

As before, this becomes intractable as soon as the number of states gets large, since the number of state sequences grows exponentially ($N^T$). The Forward algorithm is a dynamic programming algorithm that allows us to efficiently calculate $P(O \mid \lambda)$, the probability that the model generated the observed sequence. Note that we do not know the state sequence, so we must consider all state sequences. We do this by iteratively calculating the probability of being in state $i$ after generating the sequence up to observation $O_t$. We designate this quantity

$$\alpha(t, i) = P(O_1, O_2, O_3, \ldots, O_t, q_t = S_i).$$

Initialization:
$$\alpha(1, i) = \pi_i \, e_i(O_1)$$

Iteration:
$$\alpha(t+1, i) = \left[ \sum_{j=1}^{N} \alpha(t, j) \, a_{ji} \right] e_i(O_{t+1})$$

The probability of observing the entire sequence is given by the sum over all possible final states:

$$P(O) = \sum_{i=1}^{N} \alpha(T, i)$$

Running time: $O(TN^2)$.

Note that this algorithm is very similar to the Viterbi algorithm, except that it uses a sum rather than taking the maximum.
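For comparison, here is a minimal sketch of the Forward algorithm in the same representation; it differs from the Viterbi sketch above only in replacing the maximum with a sum, exactly as noted. The function name forward is an illustrative choice.

```python
def forward(obs, states, a, e, pi):
    """Return P(O): the probability of the observations under the model,
    computed by summing alpha(T, i) over all final states."""
    # Initialization: alpha(1, i) = pi_i * e_i(O_1)
    alpha = {i: pi[i] * e[i][obs[0]] for i in states}

    # Iteration: alpha(t+1, i) = [sum_j alpha(t, j) * a_ji] * e_i(O_{t+1})
    for x in obs[1:]:
        alpha = {i: sum(alpha[j] * a[j][i] for j in states) * e[i][x]
                 for i in states}

    # Termination: P(O) = sum_i alpha(T, i)
    return sum(alpha.values())
```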

3 An example

In class, we used the three-state HMM shown in Fig. 2 as an example. The parameters for this model are

           E      M      C
π_i        0      0      1
e_i(H)     0.2    0.9    0.3
e_i(L)     0.8    0.1    0.7

Figure 2: Three-state transmembrane HMM

For some examples, we assumed that all transmembrane sequences start in the cytosol; i.e., π(C) = 1.0. Later, we applied the Viterbi and Forward algorithms to this example (see worksheets attached to the class website). For those examples, we assumed that π(C) = 0.5 and π(E) = 0.5.
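To tie the sketches together, the setup below encodes the emission and initial probabilities from the table (with π(C) = π(E) = 0.5, as in the worksheets). The transition probabilities are not given in these notes, only in Fig. 2, so the values below are hypothetical placeholders chosen purely so the code runs end to end; substitute the probabilities from the figure to reproduce the worksheet results.

```python
# Emission and initial probabilities from the table above (pi(C) = pi(E) = 0.5).
states = ["E", "M", "C"]
pi = {"E": 0.5, "M": 0.0, "C": 0.5}
e = {"E": {"H": 0.2, "L": 0.8},
     "M": {"H": 0.9, "L": 0.1},
     "C": {"H": 0.3, "L": 0.7}}

# Transition probabilities are NOT specified in these notes (they are in Fig. 2);
# the numbers below are hypothetical placeholders for illustration only.
a = {"E": {"E": 0.8, "M": 0.2, "C": 0.0},
     "M": {"E": 0.1, "M": 0.8, "C": 0.1},
     "C": {"E": 0.0, "M": 0.2, "C": 0.8}}

obs = "LHHHL"
print(viterbi(obs, states, a, e, pi))   # P(Q*) and the most probable state path
print(forward(obs, states, a, e, pi))   # P(O) under the model
```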