Hidden Markov Models: Three Classic HMM Problems

An Introduction to Bioinformatics Algorithms (www.bioalgorithms.info)
Hidden Markov Models
Slides revised and adapted for Computational Biology, IST 2015/2016, Ana Teresa Freitas

Three classic HMM problems

Evaluation: given a model and an output sequence, what is the probability that the model generated that output?
Decoding: given a model and an output sequence, what is the most likely state sequence (path) through the model that produced that sequence?
Learning: given a model and a set of observed sequences, what should the model parameters be so that the model has a high probability of generating those sequences?

Evaluation Problem: HMM parameters

Σ: the set of emission characters.
Examples: Σ = {H, T} (or {0, 1}) for coin tossing; Σ = {A, C, T, G} for DNA.

Q: the set of hidden states, each emitting symbols from Σ.
Example: Q = {F, B} (Fair, Biased) for coin tossing.

A = (a_kl): a |Q| × |Q| matrix with the probability of changing from state k to state l.
a_FF = 0.9, a_FB = 0.1, a_BF = 0.1, a_BB = 0.9

E = (e_k(b)): a |Q| × |Σ| matrix with the probability of emitting symbol b while in state k.
e_F(0) = 1/2, e_B(0) = 1/4, e_F(1) = 1/2, e_B(1) = 3/4
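As a concrete aside (not part of the original slides), the coin-tossing HMM above can be written down as plain Python dictionaries; the names ALPHABET, STATES, TRANSITIONS and EMISSIONS are illustrative choices reused by the sketches that follow.

```python
# Illustrative sketch: the fair/biased coin HMM above as plain dictionaries.

ALPHABET = ["0", "1"]   # Sigma: emission characters
STATES = ["F", "B"]     # Q: hidden states (Fair, Biased)

# A = (a_kl): probability of changing from state k to state l
TRANSITIONS = {
    ("F", "F"): 0.9, ("F", "B"): 0.1,
    ("B", "F"): 0.1, ("B", "B"): 0.9,
}

# E = (e_k(b)): probability of emitting symbol b while in state k
EMISSIONS = {
    ("F", "0"): 0.5,  ("F", "1"): 0.5,
    ("B", "0"): 0.25, ("B", "1"): 0.75,
}
```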

P(x, π) Calculation

P(x, π) is the probability that sequence x was generated by the path π:

P(x, π) = P(π_0 → π_1) · ∏_{i=1}^{n} P(x_i | π_i) · P(π_i → π_{i+1})
        = a_{π_0, π_1} · ∏_{i=1}^{n} e_{π_i}(x_i) · a_{π_i, π_{i+1}}

Here π_0 and π_{n+1} denote the fictitious initial and terminal states (begin and end).

Decoding Problem

Goal: find an optimal hidden path of states given the observations.
Input: a sequence of observations x = x_1 … x_n generated by an HMM M(Σ, Q, A, E).
Output: a path π that maximizes P(x | π) over all possible paths.
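The product above translates directly into code. The following sketch (an illustration, not from the slides) assumes the dictionary encoding introduced earlier; an `initial` distribution plays the role of the begin-state transitions a_{begin,k}, and the terminal (end) state is omitted for simplicity.

```python
def joint_probability(x, path, transitions, emissions, initial):
    """P(x, pi) = a_{pi_0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i,pi_{i+1}}.

    `initial[k]` stands in for the begin-state transition a_{begin,k};
    the fictitious end state is left out of this sketch.
    """
    p = initial[path[0]]
    for i, (symbol, state) in enumerate(zip(x, path)):
        p *= emissions[(state, symbol)]             # e_{pi_i}(x_i)
        if i + 1 < len(path):
            p *= transitions[(state, path[i + 1])]  # a_{pi_i, pi_{i+1}}
    return p

# Example with the coin HMM above, assuming a uniform 0.5/0.5 start:
# joint_probability("011", ["F", "F", "B"], TRANSITIONS, EMISSIONS,
#                   {"F": 0.5, "B": 0.5})
```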

Edit Graph for the Decoding Problem

Andrew Viterbi used the Manhattan grid model to solve the Decoding Problem. Every choice of path π = π_1 … π_n corresponds to a path in the graph, and the only valid direction in the graph is eastward. This graph has |Q|² (n − 1) edges.

Decoding Problem vs. Alignment Problem

(Figures on the slide: valid directions in the alignment problem; valid directions in the decoding problem.)

Decoding Problem as Finding a Longest Path in a DAG

The Decoding Problem is reduced to finding a longest path in the directed acyclic graph (DAG) above. Note: the length of a path is defined as the product of its edge weights, not their sum.

Decoding Problem (cont'd)

Every path in the graph has probability P(x | π). The Viterbi algorithm finds the path that maximizes P(x | π) among all possible paths, and it runs in O(n |Q|²) time.

Decoding Problem: weights of edges

Edge (k, i) → (l, i+1): the weight w is given by ???

Decoding Problem: weights of edges (cont'd)

P(x | π) = ∏_{i=0}^{n−1} e_{π_{i+1}}(x_{i+1}) · a_{π_i, π_{i+1}}

Edge (k, i) → (l, i+1): the weight w is given by ??

The i-th term of this product is e_{π_{i+1}}(x_{i+1}) · a_{π_i, π_{i+1}}.

Edge (k, i) → (l, i+1): the weight w is given by ?

Decoding Problem: weights of edges (cont'd)

The i-th term equals e_l(x_{i+1}) · a_{kl} for π_i = k and π_{i+1} = l, so the weight of the edge (k, i) → (l, i+1) is
w = e_l(x_{i+1}) · a_{kl}

Decoding Problem and Dynamic Programming

Let s_{k,i} be the probability of the most probable path for the prefix x_1, …, x_i that ends in state k. Then

s_{l,i+1} = max_{k ∈ Q} { s_{k,i} · (weight of the edge between (k, i) and (l, i+1)) }
          = max_{k ∈ Q} { s_{k,i} · a_{kl} · e_l(x_{i+1}) }
          = e_l(x_{i+1}) · max_{k ∈ Q} { s_{k,i} · a_{kl} }

Decoding Problem (cont'd)

Initialization: s_{begin,0} = 1 and s_{k,0} = 0 for k ≠ begin.

For each i = 0, …, n−1 and for each l ∈ Q, recursively calculate:
s_{l,i+1} = e_l(x_{i+1}) · max_{k ∈ Q} { s_{k,i} · a_{kl} }

Let π* be the optimal path. Then
P(x | π*) = max_{k ∈ Q} { s_{k,n} · a_{k,end} }

Viterbi Algorithm

The value of the product can become extremely small, which leads to underflow. To avoid underflow, use log values instead:
s_{l,i+1} = log e_l(x_{i+1}) + max_{k ∈ Q} { s_{k,i} + log(a_{kl}) }

Initialization: s_{begin,0} = 0 and s_{k,0} = −∞ for k ≠ begin.

The score of the best path is
Score(x, π*) = max_{k ∈ Q} { s_{k,n} + log(a_{k,end}) }
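As an illustration of the log-space recurrence above (again using the dictionary encoding, with an `initial` distribution in place of the begin state, no explicit end state, and assuming the probabilities used along feasible paths are non-zero), here is a minimal Viterbi sketch with backtracking:

```python
import math

def viterbi(x, states, transitions, emissions, initial):
    """Log-space Viterbi: returns (best log score, most probable path pi*)."""
    # Initialization: s_{k,1} = log initial(k) + log e_k(x_1)
    score = {k: math.log(initial[k]) + math.log(emissions[(k, x[0])]) for k in states}
    back = []  # back[i-1][l] = best predecessor of state l at position i

    for i in range(1, len(x)):
        new_score, pointers = {}, {}
        for l in states:
            # s_{l,i+1} = log e_l(x_{i+1}) + max_k { s_{k,i} + log a_{kl} }
            best_k = max(states, key=lambda k: score[k] + math.log(transitions[(k, l)]))
            new_score[l] = (math.log(emissions[(l, x[i])])
                            + score[best_k] + math.log(transitions[(best_k, l)]))
            pointers[l] = best_k
        score = new_score
        back.append(pointers)

    # Termination (no explicit end state) and backtracking of pi*.
    last = max(states, key=lambda k: score[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return score[last], list(reversed(path))
```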

Viterbi Algorithm: Complexity

We calculate the values of O(|Q| · L) cells of the matrix V, spending O(|Q|) operations per cell.
Overall time complexity: O(L · |Q|²). Space complexity: O(L · |Q|).

Example

Consider an HMM with two hidden states, S1 and S2. The initial probabilities of S1 and S2 are both 0.5, and the transition probabilities are a_{S1,S1} = 0.6, a_{S1,S2} = 0.4, a_{S2,S1} = 0.5, a_{S2,S2} = 0.5. Nucleotides are emitted with probabilities e_A = e_T = 0.3 and e_C = e_G = 0.2 from state S1, and e_A = e_T = 0.2 and e_C = e_G = 0.3 from state S2.

Problem 1: Evaluation

Given the HMM described in the example and the output sequence x = GGCACTGAA, what is the probability that the HMM generated x using the following path (sequence of hidden states)?
π = S1 S1 S2 S2 S2 S2 S1 S1 S1

Problem 2: Decoding

Use the Viterbi algorithm to compute the most likely path generating the sequence x = GGCACTGAA with the HMM in the example.
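These two problems can be tried numerically by reusing the `joint_probability` and `viterbi` sketches above, once the example HMM is encoded in the same dictionary form (the `EX_*` names below are illustrative, not from the slides):

```python
# The two-state example HMM from the slides, in the dictionary form used above.
EX_STATES = ["S1", "S2"]
EX_INITIAL = {"S1": 0.5, "S2": 0.5}
EX_TRANSITIONS = {("S1", "S1"): 0.6, ("S1", "S2"): 0.4,
                  ("S2", "S1"): 0.5, ("S2", "S2"): 0.5}
EX_EMISSIONS = {("S1", "A"): 0.3, ("S1", "T"): 0.3, ("S1", "C"): 0.2, ("S1", "G"): 0.2,
                ("S2", "A"): 0.2, ("S2", "T"): 0.2, ("S2", "C"): 0.3, ("S2", "G"): 0.3}

x = "GGCACTGAA"
given_path = ["S1", "S1", "S2", "S2", "S2", "S2", "S1", "S1", "S1"]

# Problem 1: P(x, pi) for the given path.
print(joint_probability(x, given_path, EX_TRANSITIONS, EX_EMISSIONS, EX_INITIAL))

# Problem 2: most probable path for the same sequence (log-space Viterbi).
print(viterbi(x, EX_STATES, EX_TRANSITIONS, EX_EMISSIONS, EX_INITIAL))
```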

Forward Algorithm

For Markov chains we can calculate the probability of a sequence, P(x). How do we calculate this probability for an HMM as well? We must add the probabilities over all possible paths:

P(x) = Σ_π P(x, π)

Define the forward probability f_{k,i} as the probability of emitting the prefix x_1 … x_i and reaching the state π_i = k:
f_{k,i} = P(x_1 … x_i, π_i = k)

The recurrence for the forward algorithm:
f_{l,i+1} = e_l(x_{i+1}) · Σ_{k ∈ Q} f_{k,i} · a_{kl}

Forward Algorithm (cont'd)

Initialization: f_{begin,0} = 1 and f_{k,0} = 0 for k ≠ begin.

For each i = 1, …, L calculate:
f_{l,i} = e_l(x_i) · Σ_{k ∈ Q} f_{k,i−1} · a_{kl}

Termination:
P(x) = Σ_{k ∈ Q} f_{k,L} · a_{k,end}

(The example reuses the two-state HMM, with states S1 and S2, defined earlier.)
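A sketch of the complete forward algorithm above, in the same dictionary encoding; as before, an `initial` distribution replaces the begin state, and because no explicit end state is modelled the termination simply sums the last column:

```python
def forward_probability(x, states, transitions, emissions, initial):
    """P(x) = sum over all paths pi of P(x, pi), via the forward recurrence."""
    # Initialization: f_{k,1} = initial(k) * e_k(x_1)
    f = {k: initial[k] * emissions[(k, x[0])] for k in states}

    for symbol in x[1:]:
        # f_{l,i} = e_l(x_i) * sum_k f_{k,i-1} * a_{kl}
        f = {l: emissions[(l, symbol)] * sum(f[k] * transitions[(k, l)] for k in states)
             for l in states}

    # Termination without an explicit end state: P(x) = sum_k f_{k,L}
    return sum(f.values())

# Example usage with the two-state HMM encoded earlier:
# forward_probability("GGC", EX_STATES, EX_TRANSITIONS, EX_EMISSIONS, EX_INITIAL)
```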

Problem 3: Compute P(x)

Use the forward algorithm to compute the probability P(x) of the sequence x = GGC, using the HMM in the example.

Forward-Backward Problem

Given: a sequence of coin tosses generated by an HMM.
Goal: find the probability that the dealer was using a biased coin at a particular time.

Backward Algorithm

The forward probability is not the only factor affecting P(π_i = k | x): the transitions and emissions that the HMM undergoes between π_{i+1} and π_n also affect P(π_i = k | x). (Figure on the slide: the sequence split at x_i into a forward part and a backward part.)

Backward Algorithm (cont'd)

Define the backward probability b_{k,i} as the probability of being in state π_i = k and emitting the suffix x_{i+1} … x_n.

The recurrence for the backward algorithm:
b_{k,i} = Σ_{l ∈ Q} e_l(x_{i+1}) · b_{l,i+1} · a_{kl}

Forward-Backward Algorithm

The probability that the dealer used a biased coin at moment i:

P(π_i = k | x) = P(x, π_i = k) / P(x) = f_k(i) · b_k(i) / P(x)

where P(x) is the sum of P(x, π_i = k) over all k.

Example

Consider an HMM with two hidden states, S1 and S2, starting probabilities 0.4 for S1 and 0.6 for S2, and the transition (T) and emission (E) probability matrices given on the slide.
Compute the most likely path generating the sequence X = TAC.
What is the probability of generating the sequence X = TACG and being in state S2 when generating the symbol C?
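The backward recurrence and the posterior formula above can be sketched together; the second question in the example is exactly a query of this kind. This is an illustrative implementation (full forward and backward tables, no explicit end state), not the slides' own code:

```python
def posterior_probabilities(x, states, transitions, emissions, initial):
    """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x) for every position i and state k."""
    n = len(x)

    # Forward table: f[i][k] ~ f_{k,i+1} in the slides' 1-based notation.
    f = [{k: initial[k] * emissions[(k, x[0])] for k in states}]
    for i in range(1, n):
        f.append({l: emissions[(l, x[i])]
                     * sum(f[i - 1][k] * transitions[(k, l)] for k in states)
                  for l in states})

    # Backward table: b[i][k] ~ b_{k,i+1}, with b[n-1][k] = 1 for all k.
    b = [dict.fromkeys(states, 1.0) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(emissions[(l, x[i + 1])] * b[i + 1][l] * transitions[(k, l)]
                       for l in states)
                for k in states}

    px = sum(f[n - 1][k] for k in states)   # P(x), as in the forward algorithm
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(n)]
```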

HMM Parameter Estimation

So far we have assumed that the transition and emission probabilities are known. However, in most HMM applications the probabilities are not known, and estimating them is hard.

HMM Parameter Estimation Problem

Given:
- an HMM with states and alphabet (emission characters);
- independent training sequences x^1, …, x^m.
Find: HMM parameters Θ (that is, a_kl and e_k(b)) that maximize P(x^1, …, x^m | Θ), the joint probability of the training sequences.

Maximizing the likelihood

P(x^1, …, x^m | Θ), as a function of Θ, is called the likelihood of the model. The training sequences are assumed independent, therefore

P(x^1, …, x^m | Θ) = ∏_i P(x^i | Θ)

The parameter estimation problem seeks the Θ that achieves

max_Θ ∏_i P(x^i | Θ)

In practice the log-likelihood is computed to avoid underflow errors.

Two situations

Known paths for the training sequences:
- CpG islands are marked on the training sequences.
- One evening the casino dealer allows us to see when he changes dice.
Unknown paths:
- CpG islands are not marked.
- We do not see when the casino dealer changes dice.

Known paths

A_kl = number of times the transition k → l is taken in the training sequences.
E_k(b) = number of times b is emitted from state k in the training sequences.

Compute a_kl and e_k(b) as maximum likelihood estimators:

a_kl = A_kl / Σ_{l'} A_{kl'}
e_k(b) = E_k(b) / Σ_{b'} E_k(b')

Pseudocounts

Some state k may not appear in any of the training sequences. This means A_kl = 0 for every state l, and a_kl cannot be computed with the equation above. To avoid this overfitting, use predetermined pseudocounts r_kl and r_k(b):

A_kl = (number of transitions k → l) + r_kl
E_k(b) = (number of emissions of b from k) + r_k(b)

The pseudocounts reflect our prior biases about the probability values.
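When the paths are known, the counting and normalisation above fit in a few lines. The sketch below assumes each training example is a (sequence, path) pair and, for simplicity, a single uniform pseudocount for every transition and emission (an illustrative choice, not the slides' own):

```python
from collections import defaultdict

def estimate_parameters(training, states, alphabet, pseudocount=1.0):
    """Maximum likelihood estimates of a_kl and e_k(b), with pseudocounts."""
    A = defaultdict(lambda: pseudocount)   # A_kl   = counts + r_kl
    E = defaultdict(lambda: pseudocount)   # E_k(b) = counts + r_k(b)

    for x, path in training:
        for i, (symbol, state) in enumerate(zip(x, path)):
            E[(state, symbol)] += 1
            if i + 1 < len(path):
                A[(state, path[i + 1])] += 1

    # a_kl = A_kl / sum_{l'} A_{kl'},  e_k(b) = E_k(b) / sum_{b'} E_k(b')
    a = {(k, l): A[(k, l)] / sum(A[(k, l2)] for l2 in states)
         for k in states for l in states}
    e = {(k, b): E[(k, b)] / sum(E[(k, b2)] for b2 in alphabet)
         for k in states for b in alphabet}
    return a, e
```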

Unknown paths: Viterbi training

Idea: use Viterbi decoding to compute the most probable path for each training sequence x. Start with some guess for the initial parameters and compute π*, the most probable path for x under those parameters. Iterate until there is no change in π* (a code sketch follows after the analysis below):
1. Determine A_kl and E_k(b) as before.
2. Compute new parameters a_kl and e_k(b) using the same formulas as before.
3. Compute a new π* for x and the current parameters.

Viterbi training analysis

- The algorithm converges precisely: there are finitely many possible paths and the new parameters are uniquely determined by the current π*. There may be several paths for x with the same probability, so the new π* must be compared with all previous highest-probability paths.
- It does not maximize the likelihood ∏_x P(x | Θ), but rather the contribution to the likelihood of the most probable paths, ∏_x P(x | Θ, π*).
- In general it performs less well than Baum-Welch.
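A sketch of the Viterbi-training loop described above, reusing the `viterbi` and `estimate_parameters` sketches from earlier sections; the initial distribution is kept fixed, and convergence is declared when the decoded paths stop changing (all of this is illustrative, not the slides' own pseudocode):

```python
def viterbi_training(sequences, states, alphabet, initial, a, e, max_iter=50):
    """Iteratively re-estimate (a, e) from the current most probable paths pi*."""
    previous_paths = None
    for _ in range(max_iter):
        # Decode pi* for every training sequence under the current parameters.
        paths = [viterbi(x, states, a, e, initial)[1] for x in sequences]
        if paths == previous_paths:      # no change in any pi*: stop
            break
        previous_paths = paths
        # Re-estimate a_kl and e_k(b) from the decoded paths, as in the known-path case.
        a, e = estimate_parameters(list(zip(sequences, paths)), states, alphabet)
    return a, e
```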

Unknown paths: Baum-Welch

Idea:
1. Guess initial values for the parameters (art and experience, not science).
2. Estimate new (better) values for the parameters. How?
3. Repeat until a stopping criterion is met. What criterion?

Better values for the parameters

We would need the A_kl and E_k(b) values, but we cannot count them (the path is unknown), and we do not want to rely on a single most probable path. Instead, for all states k and l, every symbol b, and every training sequence x, compute A_kl and E_k(b) as expected values given the current parameters.
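For a single training sequence, these expected counts can be accumulated from the forward and backward tables; the sketch below (illustrative, no explicit end state, plain probabilities rather than logs) is one way to write that step:

```python
def expected_counts(x, states, transitions, emissions, initial):
    """Expected A_kl and E_k(b) for one sequence, given the current parameters."""
    n = len(x)

    # Forward and backward tables, as in the earlier sketches.
    f = [{k: initial[k] * emissions[(k, x[0])] for k in states}]
    for i in range(1, n):
        f.append({l: emissions[(l, x[i])]
                     * sum(f[i - 1][k] * transitions[(k, l)] for k in states)
                  for l in states})
    b = [dict.fromkeys(states, 1.0) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(emissions[(l, x[i + 1])] * b[i + 1][l] * transitions[(k, l)]
                       for l in states)
                for k in states}
    px = sum(f[n - 1][k] for k in states)          # P(x)

    # Expected transitions: A_kl = (1/P(x)) * sum_i f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1)
    A = {(k, l): sum(f[i][k] * transitions[(k, l)] * emissions[(l, x[i + 1])] * b[i + 1][l]
                     for i in range(n - 1)) / px
         for k in states for l in states}
    # Expected emissions: E_k(s) = (1/P(x)) * sum over positions i with x_i = s of f_k(i) * b_k(i)
    E = {(k, s): sum(f[i][k] * b[i][k] for i in range(n) if x[i] == s) / px
         for k in states for s in set(x)}
    return A, E
```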