Hidden Markov Models


Slides revised and adapted for Bioinformática 55, Engª Biomédica/IST 2005, Ana Teresa Freitas

Forward Algorithm
For Markov chains we can calculate the probability of a sequence, P(x). How do we calculate this probability for an HMM as well? We must add up the probabilities of all possible paths:
$P(x) = \sum_{\pi} P(x, \pi)$

Forward Algorithm
Define f_{k,i} (the forward probability) as the probability of emitting the prefix x_1 ... x_i and eventually reaching the state π_i = k:
$f_{k,i} = P(x_1 \dots x_i, \pi_i = k)$
The recurrence for the forward algorithm:
$f_{l,i+1} = e_l(x_{i+1}) \cdot \sum_{k \in Q} f_{k,i} \, a_{kl}$

Forward algorithm
Initialization: f_{begin,0} = 1 and f_{k,0} = 0 for k ≠ begin.
For each i = 1, ..., L calculate:
$f_{l,i} = e_l(x_i) \sum_{k} f_{k,i-1} \, a_{kl}$
Termination:
$P(x) = \sum_{k} f_{k,L} \, a_{k,\mathrm{end}}$
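The recurrence translates almost directly into code. Below is a minimal sketch, assuming the model is stored in plain dictionaries: `a[k][l]` for transition probabilities (including a row for an explicit 'begin' state and entries into 'end') and `e[k][b]` for emission probabilities. These names and the dictionary layout are illustrative choices, not something fixed by the slides.

```python
def forward(x, states, a, e):
    """Forward algorithm for an HMM with explicit 'begin' and 'end' states.
    x: observed sequence; states: emitting states; a[k][l]: transition
    probabilities; e[k][b]: emission probabilities."""
    L = len(x)
    # f[i][k] = P(x_1..x_i, pi_i = k); row 0 is the initialization.
    f = [dict() for _ in range(L + 1)]
    f[0] = {k: 0.0 for k in states}
    f[0]['begin'] = 1.0
    for i in range(1, L + 1):
        for l in states:
            # f_{l,i} = e_l(x_i) * sum_k f_{k,i-1} * a_{kl}
            f[i][l] = e[l][x[i - 1]] * sum(f[i - 1][k] * a[k][l] for k in f[i - 1])
    # Termination: P(x) = sum_k f_{k,L} * a_{k,end}
    px = sum(f[L][k] * a[k]['end'] for k in states)
    return f, px
```

For long sequences these products underflow, which is one reason the log likelihood is used in the parameter-estimation slides later on.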

Forward-Backward Problem
Given: a sequence of coin tosses generated by an HMM.
Goal: find the probability that the dealer was using a biased coin at a particular time.

Backward Algorithm
However, the forward probability is not the only factor affecting P(π_i = k | x). The sequence of transitions and emissions that the HMM undergoes between π_{i+1} and π_n also affects P(π_i = k | x).
(Figure: the sequence is split at x_i into a forward part and a backward part.)

Backward Algorithm (cont'd)
Define the backward probability b_{k,i} as the probability of being in state π_i = k and then emitting the suffix x_{i+1} ... x_n:
$b_{k,i} = P(x_{i+1} \dots x_n \mid \pi_i = k)$
The recurrence for the backward algorithm:
$b_{k,i} = \sum_{l \in Q} e_l(x_{i+1}) \, b_{l,i+1} \, a_{kl}$

Backward-Forward Algorithm
The probability that the dealer used a biased coin at moment i:
$P(\pi_i = k \mid x) = P(x, \pi_i = k) / P(x) = f_k(i) \, b_k(i) / P(x)$
where P(x) is the sum of P(x, π_i = k) over all k.
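Reusing the forward() sketch above, the backward recurrence and the posterior P(π_i = k | x) can be sketched the same way (again with illustrative names and the same dictionary layout):

```python
def backward(x, states, a, e):
    """Backward algorithm: b[i][k] = P(x_{i+1}..x_L | pi_i = k)."""
    L = len(x)
    b = [dict() for _ in range(L + 1)]
    b[L] = {k: a[k]['end'] for k in states}      # last column: transition into 'end'
    for i in range(L - 1, 0, -1):
        for k in states:
            # b_{k,i} = sum_l e_l(x_{i+1}) * b_{l,i+1} * a_{kl}
            b[i][k] = sum(e[l][x[i]] * b[i + 1][l] * a[k][l] for l in states)
    return b

def posterior(x, states, a, e):
    """P(pi_i = k | x) = f_{k,i} * b_{k,i} / P(x) for i = 1..L."""
    f, px = forward(x, states, a, e)
    b = backward(x, states, a, e)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(1, len(x) + 1)]
```

For the dishonest casino, with hypothetical states named 'Fair' and 'Biased', posterior(x, states, a, e)[i - 1]['Biased'] is exactly the probability that the dealer was using the biased coin at toss i.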

HMM Parameter Estimation
So far, we have assumed that the transition and emission probabilities are known. However, in most HMM applications the probabilities are not known, and estimating them is hard.

HMM Parameter Estimation Problem
Given: an HMM with states and alphabet (emission characters), and independent training sequences x^1, ..., x^m.
Find: HMM parameters Θ (that is, a_{kl}, e_k(b)) that maximize P(x^1, ..., x^m | Θ), the joint probability of the training sequences.

Maximize the likelihood
$P(x^1, \dots, x^m \mid \Theta)$, viewed as a function of Θ, is called the likelihood of the model. The training sequences are assumed independent, therefore
$P(x^1, \dots, x^m \mid \Theta) = \prod_i P(x^i \mid \Theta)$
The parameter estimation problem seeks the Θ that realizes
$\max_\Theta \prod_i P(x^i \mid \Theta)$
In practice the log likelihood is computed to avoid underflow errors.

Two situations
Known paths for the training sequences: CpG islands are marked on the training sequences; one evening the casino dealer allows us to see when he changes dice.
Unknown paths: CpG islands are not marked; we do not see when the casino dealer changes dice.

Known paths
A_{kl} = number of times each transition k → l is taken in the training sequences.
E_k(b) = number of times b is emitted from state k in the training sequences.
Compute a_{kl} and e_k(b) as maximum likelihood estimators:
$a_{kl} = A_{kl} \,/\, \sum_{l'} A_{kl'}$
$e_k(b) = E_k(b) \,/\, \sum_{b'} E_k(b')$

Pseudocounts
Some state k may not appear in any of the training sequences. This means A_{kl} = 0 for every state l, and a_{kl} cannot be computed with the equation above. To avoid this overfitting, use predetermined pseudocounts r_{kl} and r_k(b):
A_{kl} = number of transitions k → l + r_{kl}
E_k(b) = number of emissions of b from k + r_k(b)
The pseudocounts reflect our prior biases about the probability values.
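When the paths are known, this counting is only a few lines of code. A minimal sketch, assuming the training data is given as (sequence, state path) pairs; the function name, the pair representation and the uniform pseudocount defaults are illustrative choices:

```python
def estimate_from_labeled_paths(training, states, alphabet, r_trans=1.0, r_emit=1.0):
    """Maximum likelihood estimation of a_kl and e_k(b) when state paths are known.
    training: list of (x, pi) pairs where pi[i] is the state that emitted x[i].
    r_trans, r_emit: pseudocounts added to every transition/emission count."""
    A = {k: {l: r_trans for l in states} for k in states}
    E = {k: {b: r_emit for b in alphabet} for k in states}
    for x, pi in training:
        for i in range(len(x)):
            E[pi[i]][x[i]] += 1                  # emission of x[i] from state pi[i]
            if i + 1 < len(x):
                A[pi[i]][pi[i + 1]] += 1         # transition pi[i] -> pi[i+1]
    # a_kl = A_kl / sum_l' A_kl'   and   e_k(b) = E_k(b) / sum_b' E_k(b')
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e
```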

Unknown paths: Viterbi training
Idea: use Viterbi decoding to compute the most probable path for each training sequence x.
Start with some guess for the initial parameters and compute π*, the most probable path for x under those parameters.
Iterate until there is no change in π*:
1. Determine A_{kl} and E_k(b) as before.
2. Compute new parameters a_{kl} and e_k(b) using the same formulas as before.
3. Compute a new π* for x under the current parameters.
A sketch of this loop is given after the analysis below.

Viterbi training analysis
The algorithm converges exactly: there are finitely many possible paths, and the new parameters are uniquely determined by the current π*. There may be several paths for x with the same probability, so the new π* must be compared with all previous paths of highest probability.
Viterbi training does not maximize the likelihood $\prod_x P(x \mid \Theta)$ but only the contribution to the likelihood of the most probable paths, $\prod_x P(x \mid \Theta, \pi^*)$. In general it performs less well than Baum-Welch.
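A minimal sketch of that loop, reusing estimate_from_labeled_paths from above and assuming a viterbi(x, states, a, e) helper that returns the most probable state path (the Viterbi decoder itself is covered in the decoding lecture and is not defined here; begin/end transitions are left out of this sketch for brevity):

```python
def viterbi_training(seqs, states, alphabet, a, e, viterbi, max_iter=100):
    """Viterbi training: re-estimate parameters from the current most probable
    paths until those paths stop changing (or max_iter is reached)."""
    prev_paths = None
    for _ in range(max_iter):
        paths = [viterbi(x, states, a, e) for x in seqs]   # pi* for every sequence
        if paths == prev_paths:                            # no change in any pi*
            break
        prev_paths = paths
        # Count A_kl and E_k(b) from the decoded paths, then renormalize
        # exactly as in the known-path case.
        a, e = estimate_from_labeled_paths(list(zip(seqs, paths)), states, alphabet)
    return a, e
```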

Unknown paths: Baum-Welch
Idea:
1. Guess initial values for the parameters (art and experience, not science).
2. Estimate new (better) values for the parameters. How?
3. Repeat until a stopping criterion is met. What criterion?

Better values for parameters
We would need the A_{kl} and E_k(b) counts, but we cannot count them (the path is unknown), and we do not want to rely on a single most probable path. Instead, for all states k, l, every symbol b and every training sequence x, compute A_{kl} and E_k(b) as expected values given the current parameters.

Notation
For any sequence of characters x emitted along some unknown path π, denote by π_i = k the assumption that the state at position i (in which x_i is emitted) is k.

Probabilistic setting for A_{kl}
Given x^1, ..., x^m, consider a discrete probability space with elementary events
ε_{k,l} = "the transition k → l is taken in x^1, ..., x^m".
For each x in {x^1, ..., x^m} and each position i in x, let Y_{x,i} be the random variable defined by
$Y_{x,i}(\varepsilon_{k,l}) = \begin{cases} 1, & \text{if } \pi_i = k \text{ and } \pi_{i+1} = l \\ 0, & \text{otherwise} \end{cases}$
Define Y = Σ_x Σ_i Y_{x,i}, the random variable that counts the number of times the event ε_{k,l} happens in x^1, ..., x^m.

The meaning of A_{kl}
Let A_{kl} be the expectation of Y:
$E(Y) = \sum_x \sum_i E(Y_{x,i}) = \sum_x \sum_i P(Y_{x,i} = 1) = \sum_x \sum_i P(\{\varepsilon_{k,l} : \pi_i = k \text{ and } \pi_{i+1} = l\}) = \sum_x \sum_i P(\pi_i = k, \pi_{i+1} = l \mid x)$
So we need to compute P(π_i = k, π_{i+1} = l | x).

Probabilistic setting for E_k(b)
Given x^1, ..., x^m, consider a discrete probability space with elementary events
ε_{k,b} = "b is emitted in state k in x^1, ..., x^m".
For each x in {x^1, ..., x^m} and each position i in x, let Y_{x,i} be the random variable defined by
$Y_{x,i}(\varepsilon_{k,b}) = \begin{cases} 1, & \text{if } x_i = b \text{ and } \pi_i = k \\ 0, & \text{otherwise} \end{cases}$
Define Y = Σ_x Σ_i Y_{x,i}, the random variable that counts the number of times the event ε_{k,b} happens in x^1, ..., x^m.

The meaning of E_k(b)
Let E_k(b) be the expectation of Y:
$E(Y) = \sum_x \sum_i E(Y_{x,i}) = \sum_x \sum_i P(Y_{x,i} = 1) = \sum_x \sum_i P(\{\varepsilon_{k,b} : x_i = b \text{ and } \pi_i = k\}) = \sum_x \sum_{i : x_i = b} P(\pi_i = k \mid x)$
So we need to compute P(π_i = k | x).

Computing new parameters
Consider a training sequence x = x_1 ... x_n and concentrate on positions i and i+1. Use the forward-backward values:
$f_{k,i} = P(x_1 \dots x_i, \pi_i = k)$
$b_{k,i} = P(x_{i+1} \dots x_n \mid \pi_i = k)$

Compute A_{kl} (1)
The probability that the transition k → l is taken at position i of x:
$P(\pi_i = k, \pi_{i+1} = l \mid x_1 \dots x_n) = P(x, \pi_i = k, \pi_{i+1} = l) \,/\, P(x)$
Compute P(x) using either the forward or the backward values. We will show that
$P(x, \pi_i = k, \pi_{i+1} = l) = b_{l,i+1} \, e_l(x_{i+1}) \, a_{kl} \, f_{k,i}$
The expected number of times k → l is used in the training sequences is then
$A_{kl} = \sum_x \sum_i b_{l,i+1} \, e_l(x_{i+1}) \, a_{kl} \, f_{k,i} \,/\, P(x)$

Compute A_{kl} (2)
$P(x, \pi_i = k, \pi_{i+1} = l)$
$= P(x_1 \dots x_i, \pi_i = k, \pi_{i+1} = l, x_{i+1} \dots x_n)$
$= P(\pi_{i+1} = l, x_{i+1} \dots x_n \mid x_1 \dots x_i, \pi_i = k) \, P(x_1 \dots x_i, \pi_i = k)$
$= P(\pi_{i+1} = l, x_{i+1} \dots x_n \mid \pi_i = k) \, f_{k,i}$
$= P(x_{i+1} \dots x_n \mid \pi_i = k, \pi_{i+1} = l) \, P(\pi_{i+1} = l \mid \pi_i = k) \, f_{k,i}$
$= P(x_{i+1} \dots x_n \mid \pi_{i+1} = l) \, a_{kl} \, f_{k,i}$
$= P(x_{i+2} \dots x_n \mid x_{i+1}, \pi_{i+1} = l) \, P(x_{i+1} \mid \pi_{i+1} = l) \, a_{kl} \, f_{k,i}$
$= P(x_{i+2} \dots x_n \mid \pi_{i+1} = l) \, e_l(x_{i+1}) \, a_{kl} \, f_{k,i}$
$= b_{l,i+1} \, e_l(x_{i+1}) \, a_{kl} \, f_{k,i}$

Compute E_k(b)
The probability that x_i of x is emitted in state k:
$P(\pi_i = k \mid x_1 \dots x_n) = P(\pi_i = k, x_1 \dots x_n) \,/\, P(x)$
$P(\pi_i = k, x_1 \dots x_n) = P(x_1 \dots x_i, \pi_i = k, x_{i+1} \dots x_n)$
$= P(x_{i+1} \dots x_n \mid x_1 \dots x_i, \pi_i = k) \, P(x_1 \dots x_i, \pi_i = k)$
$= P(x_{i+1} \dots x_n \mid \pi_i = k) \, f_{k,i} = b_{k,i} \, f_{k,i}$
The expected number of times b is emitted in state k:
$E_k(b) = \sum_x \sum_{i : x_i = b} f_{k,i} \, b_{k,i} \,/\, P(x)$

Finally, new parameters
$a_{kl} = A_{kl} \,/\, \sum_{l'} A_{kl'}$
$e_k(b) = E_k(b) \,/\, \sum_{b'} E_k(b')$
Pseudocounts can be added as before.
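Putting the two expectations together gives the E-step of Baum-Welch. A sketch using the forward() and backward() functions from above; the function name and the per-sequence dictionary layout are illustrative:

```python
import math

def expected_counts(seqs, states, alphabet, a, e):
    """Expected transition counts A_kl and emission counts E_k(b) under the
    current parameters, plus the total log-likelihood sum_x log P(x | theta)."""
    A = {k: {l: 0.0 for l in states} for k in states}
    E = {k: {b: 0.0 for b in alphabet} for k in states}
    total_ll = 0.0
    for x in seqs:
        f, px = forward(x, states, a, e)
        b = backward(x, states, a, e)
        total_ll += math.log(px)
        L = len(x)
        for i in range(1, L + 1):
            for k in states:
                # P(pi_i = k | x) = f_{k,i} * b_{k,i} / P(x)
                E[k][x[i - 1]] += f[i][k] * b[i][k] / px
                if i < L:
                    for l in states:
                        # P(pi_i = k, pi_{i+1} = l | x)
                        #   = f_{k,i} * a_kl * e_l(x_{i+1}) * b_{l,i+1} / P(x)
                        A[k][l] += f[i][k] * a[k][l] * e[l][x[i]] * b[i + 1][l] / px
    return A, E, total_ll
```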

Stopping criteria
We cannot actually reach the maximum (this is optimization of a continuous function), so we need stopping criteria:
Compute the log likelihood of the model for the current Θ,
$\sum_x \log P(x \mid \Theta)$
compare it with the previous log likelihood, and stop if the difference is small; alternatively, stop after a certain number of iterations.

The Baum-Welch algorithm
Initialization: pick a best guess for the model parameters (or arbitrary values).
Iteration:
1. Run the forward algorithm for each x.
2. Run the backward algorithm for each x.
3. Calculate A_{kl}, E_k(b).
4. Calculate the new a_{kl}, e_k(b).
5. Calculate the new log-likelihood.
Repeat until the log-likelihood no longer changes much. A sketch of this loop follows.
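A sketch of the full loop, built on expected_counts above. To keep the example short it re-estimates only the transitions between emitting states, holding the begin/end transitions fixed and renormalizing the remaining probability mass; that simplification, the tolerance value, and the function name are choices made here, not part of the slides.

```python
def baum_welch(seqs, states, alphabet, a, e, max_iter=100, tol=1e-4):
    """Baum-Welch (EM) training: iterate E-step and M-step until the
    log-likelihood improvement falls below tol or max_iter is reached."""
    prev_ll = float('-inf')
    for _ in range(max_iter):
        A, E, ll = expected_counts(seqs, states, alphabet, a, e)    # E-step
        # M-step: a_kl = A_kl / sum_l' A_kl',  e_k(b) = E_k(b) / sum_b' E_k(b')
        new_a = dict(a)                               # keeps the 'begin' row unchanged
        for k in states:
            total = sum(A[k].values()) or 1.0         # guard against an unvisited state
            new_a[k] = {l: (1.0 - a[k]['end']) * A[k][l] / total for l in states}
            new_a[k]['end'] = a[k]['end']             # end transition held fixed
        a = new_a
        e = {k: {b: E[k][b] / (sum(E[k].values()) or 1.0) for b in alphabet}
             for k in states}
        if ll - prev_ll < tol:   # ll is the log-likelihood under the previous parameters
            break
        prev_ll = ll
    return a, e
```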

Baum-Welch analysis
The log-likelihood increases with each iteration. Baum-Welch is a particular case of the EM (expectation maximization) algorithm. It converges to a local maximum, and the choice of initial parameters determines which local maximum the algorithm converges to.

Finding Distant Members of a Protein Family
A distant cousin of functionally related sequences in a protein family may have only weak pairwise similarities with each member of the family and thus fail a significance test. However, it may have weak similarities with many members of the family. The goal is therefore to align a sequence to all members of the family at once. A family of related proteins can be represented by its multiple alignment and the corresponding profile.

Profile Representation of Protein Families
Aligned DNA sequences can be represented by a 4 × n profile matrix reflecting the frequencies of nucleotides in every aligned position. A protein family can be represented by a 20 × n profile representing the frequencies of amino acids.

Profiles and HMMs
HMMs can also be used to align a sequence against a profile representing a protein family. A 20 × n profile P corresponds to n sequentially linked match states M_1, ..., M_n in the profile HMM of P.
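As a concrete illustration of the profile matrix (not code from the slides), a column-wise frequency profile of already aligned sequences can be computed like this; the 4-letter DNA alphabet is the default, and a 20-letter amino acid alphabet works the same way:

```python
def profile_matrix(aligned, alphabet="ACGT"):
    """|alphabet| x n profile: frequency of each symbol in every aligned column.
    Gap characters ('-') are ignored when computing the column frequencies."""
    n = len(aligned[0])
    profile = {b: [0.0] * n for b in alphabet}
    for col in range(n):
        column = [s[col] for s in aligned if s[col] != '-']
        for b in alphabet:
            profile[b][col] = column.count(b) / max(len(column), 1)
    return profile

# Example: profile_matrix(["ACGT-A", "ACGAAA", "ACGT-A"]) gives, per column,
# the frequency of A, C, G and T among the non-gap characters.
```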

Multiple Alignments and Protein Family Classification
A multiple alignment of a protein family shows variations in conservation along the length of a protein. Example: after aligning many globin proteins, biologists recognized that the helix regions in globins are more conserved than the others.

What are Profile HMMs?
A profile HMM is a probabilistic representation of a multiple alignment. A given multiple alignment (of a protein family) is used to build a profile HMM. This model may then be used to find and score less obvious potential matches of new protein sequences.

Profile HMMs
(Figure: a profile HMM with a Start state, match states M_1 ... M_4, insert states I_0 ... I_4, delete states D_1 ... D_4, and an End state, aligning sequences such as ILMWKE and ILWK.)
Match state: a conserved position with a specialized emission probability.
Insert state: an insertion with a general (background) emission probability.
Delete state: a deletion; a silent state without any emission.

Building a profile HMM
The multiple alignment is used to construct the HMM model:
Assign each column to a match state in the HMM, and add insertion and deletion states.
Estimate the emission probabilities according to the amino acid counts in each column; different positions in the protein will have different emission probabilities.
Estimate the transition probabilities between the match, deletion and insertion states.
The HMM model is then trained to derive the optimal parameters. A sketch of the emission side of this construction is given below.

States of Profile HMM
Match states M_1, ..., M_n
Insertion states I_0, I_1, ..., I_n
Deletion states D_1, ..., D_n
(plus begin/end states)
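A minimal sketch of that emission-estimation step, assuming a common convention in which columns with at most 50% gaps become match states; the gap-fraction threshold, the pseudocount, and the function name are assumptions made for this example:

```python
def match_state_emissions(alignment, alphabet, gap_fraction=0.5, pseudocount=1.0):
    """Choose match columns and estimate each match state's emission
    probabilities from the residue counts in its column (with pseudocounts)."""
    n_cols = len(alignment[0])
    match_cols = [c for c in range(n_cols)
                  if sum(s[c] == '-' for s in alignment) / len(alignment) <= gap_fraction]
    emissions = []
    for c in match_cols:
        counts = {b: pseudocount for b in alphabet}
        for s in alignment:
            if s[c] != '-':
                counts[s[c]] += 1
        total = sum(counts.values())
        emissions.append({b: counts[b] / total for b in alphabet})
    return match_cols, emissions
```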

Transition Probabilities in Profile HMM
$\log(a_{MI}) + \log(a_{IM})$ corresponds to the gap initiation penalty.
$\log(a_{II})$ corresponds to the gap extension penalty.

Emission Probabilities in Profile HMM
The probability of emitting a symbol a at an insertion state I_j:
$e_{I_j}(a) = p(a)$
where p(a) is the frequency of occurrence of the symbol a in all the sequences.
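These two slides translate into a couple of small helpers; a sketch with illustrative names, where the correspondence to affine gap penalties is simply the log of the relevant transition probabilities:

```python
import math

def background_frequencies(sequences, alphabet):
    """p(a): overall frequency of each symbol; used as the insert-state
    emission distribution e_{I_j}(a) = p(a)."""
    counts = {b: 0 for b in alphabet}
    for s in sequences:
        for ch in s:
            if ch in counts:               # skip gaps and unknown characters
                counts[ch] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in alphabet}

def gap_penalties(a_MI, a_IM, a_II):
    """Gap initiation penalty log(a_MI) + log(a_IM); gap extension penalty log(a_II)."""
    return math.log(a_MI) + math.log(a_IM), math.log(a_II)
```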

Profile HMM Alignment
Define v^M_j(i) as the log-likelihood score of the best path matching x_1 ... x_i to the profile HMM that ends with x_i emitted by state M_j. v^I_j(i) and v^D_j(i) are defined similarly.

Profile HMM Alignment: Dynamic Programming
$v^M_j(i) = \log\big(e_{M_j}(x_i)/p(x_i)\big) + \max\begin{cases} v^M_{j-1}(i-1) + \log a_{M_{j-1},M_j} \\ v^I_{j-1}(i-1) + \log a_{I_{j-1},M_j} \\ v^D_{j-1}(i-1) + \log a_{D_{j-1},M_j} \end{cases}$

$v^I_j(i) = \log\big(e_{I_j}(x_i)/p(x_i)\big) + \max\begin{cases} v^M_j(i-1) + \log a_{M_j,I_j} \\ v^I_j(i-1) + \log a_{I_j,I_j} \\ v^D_j(i-1) + \log a_{D_j,I_j} \end{cases}$
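A compact sketch of these recurrences (plus the analogous delete-state recurrence, which is standard but not written out on the slide). Initialization and traceback are simplified, all emission probabilities are assumed nonzero (e.g. pseudocounted), and the parameter layout — eM[j][b], eI[j][b], p[b], and a[(state_j, state_j')] — is an assumption made for this example:

```python
import math

NEG_INF = float('-inf')

def log_a(a, s_from, s_to):
    """log of a transition probability; missing transitions score -inf."""
    prob = a.get((s_from, s_to), 0.0)
    return math.log(prob) if prob > 0 else NEG_INF

def profile_align_score(x, n, eM, eI, p, a):
    """Best log-odds score of aligning x against a profile HMM with n match
    states. States are keyed as ('M', j), ('I', j), ('D', j); ('M', 0) acts as begin."""
    L = len(x)
    vM = [[NEG_INF] * (L + 1) for _ in range(n + 1)]
    vI = [[NEG_INF] * (L + 1) for _ in range(n + 1)]
    vD = [[NEG_INF] * (L + 1) for _ in range(n + 1)]
    vM[0][0] = 0.0
    for j in range(n + 1):
        for i in range(L + 1):
            if j > 0 and i > 0:            # match state M_j emits x_i
                vM[j][i] = math.log(eM[j][x[i-1]] / p[x[i-1]]) + max(
                    vM[j-1][i-1] + log_a(a, ('M', j-1), ('M', j)),
                    vI[j-1][i-1] + log_a(a, ('I', j-1), ('M', j)),
                    vD[j-1][i-1] + log_a(a, ('D', j-1), ('M', j)))
            if i > 0:                      # insert state I_j emits x_i
                vI[j][i] = math.log(eI[j][x[i-1]] / p[x[i-1]]) + max(
                    vM[j][i-1] + log_a(a, ('M', j), ('I', j)),
                    vI[j][i-1] + log_a(a, ('I', j), ('I', j)),
                    vD[j][i-1] + log_a(a, ('D', j), ('I', j)))
            if j > 0:                      # delete state D_j is silent
                vD[j][i] = max(
                    vM[j-1][i] + log_a(a, ('M', j-1), ('D', j)),
                    vI[j-1][i] + log_a(a, ('I', j-1), ('D', j)),
                    vD[j-1][i] + log_a(a, ('D', j-1), ('D', j)))
    # Termination (end-transition terms omitted for brevity).
    return max(vM[n][L], vI[n][L], vD[n][L])
```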

Paths in Edit Graph and Profile HMM
(Figure: a path through an edit graph and the corresponding path through a profile HMM.)