Hidden Markov Models


Hidden Markov Models
[Trellis diagram: hidden states 1, 2, …, K in each column, emitting the observations x_1 x_2 x_3 … x_K]

Viterbi, Forward, Backward

VITERBI
Initialization: V_0(0) = 1; V_k(0) = 0 for all k > 0
Iteration:      V_l(i) = e_l(x_i) max_k V_k(i-1) a_kl
Termination:    P(x, π*) = max_k V_k(N)

FORWARD
Initialization: f_0(0) = 1; f_k(0) = 0 for all k > 0
Iteration:      f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
Termination:    P(x) = Σ_k f_k(N)

BACKWARD
Initialization: b_k(N) = 1 for all k
Iteration:      b_l(i) = Σ_k e_k(x_{i+1}) a_lk b_k(i+1)
Termination:    P(x) = Σ_k a_0k e_k(x_1) b_k(1)
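The three recurrences above map directly to a few lines of dynamic programming. The following is a minimal NumPy sketch, not taken from the slides: the array names a0, a, e and the conventions (states 0..K-1, observations as integer indices, the silent begin state 0 folded into the initial distribution a0) are assumptions for illustration, and scaling / log-space is omitted, so long sequences will underflow.

```python
import numpy as np

def viterbi(x, a0, a, e):
    """Most probable path pi* and P(x, pi*)."""
    N, K = len(x), len(a0)
    V = np.zeros((N, K))
    ptr = np.zeros((N, K), dtype=int)
    V[0] = a0 * e[:, x[0]]                       # begin state folded into a0
    for i in range(1, N):
        scores = V[i - 1][:, None] * a           # scores[k, l] = V_k(i-1) a_kl
        ptr[i] = np.argmax(scores, axis=0)
        V[i] = e[:, x[i]] * scores.max(axis=0)   # V_l(i) = e_l(x_i) max_k V_k(i-1) a_kl
    pi = np.zeros(N, dtype=int)
    pi[-1] = np.argmax(V[-1])
    for i in range(N - 1, 0, -1):                # traceback
        pi[i - 1] = ptr[i, pi[i]]
    return pi, V[-1].max()                       # (pi*, P(x, pi*))

def forward(x, a0, a, e):
    """Forward values f and total probability P(x)."""
    N, K = len(x), len(a0)
    f = np.zeros((N, K))
    f[0] = a0 * e[:, x[0]]
    for i in range(1, N):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)       # f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
    return f, f[-1].sum()                        # P(x) = Σ_k f_k(N)

def backward(x, a0, a, e):
    """Backward values b and P(x) computed from the other end."""
    N, K = len(x), len(a0)
    b = np.ones((N, K))                          # b_k(N) = 1 for all k
    for i in range(N - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])   # b_k(i) = Σ_l a_kl e_l(x_{i+1}) b_l(i+1)
    return b, (a0 * e[:, x[0]] * b[0]).sum()     # P(x) = Σ_k a_0k e_k(x_1) b_k(1)
```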

Learning: re-estimate the parameters of the model based on training data.

Two learning scenarios
1. Estimation when the right answer is known
   Examples:
   GIVEN: a genomic region x = x_1 … x_1,000,000 where we have good (experimental) annotations of the CpG islands
   GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls
2. Estimation when the right answer is unknown
   Examples:
   GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
   GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice
QUESTION: Update the parameters θ of the model to maximize P(x | θ)

1. When the states are known
Given x = x_1 … x_N for which the true path π = π_1 … π_N is known, define:
  A_kl   = # times the k → l transition occurs in π
  E_k(b) = # times state k in π emits b in x
We can show that the maximum likelihood parameters θ (maximizing P(x | θ)) are:
  a_kl = A_kl / Σ_i A_ki        e_k(b) = E_k(b) / Σ_c E_k(c)

1. When the states are known
Intuition: when we know the underlying states, the best estimate is the normalized frequency of the transitions & emissions that occur in the training data.
Drawback: given little data, there may be overfitting: P(x | θ) is maximized, but θ is unreasonable (0 probabilities — BAD).
Example: given 10 casino rolls, we observe
  x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
  π = F, F, F, F, F, F, F, F, F, F
Then: a_FF = 1; a_FL = 0
  e_F(1) = e_F(3) = 0.2; e_F(2) = 0.3; e_F(4) = 0; e_F(5) = 0.1; e_F(6) = 0.2

Pseudocounts
Solution for small training sets: add pseudocounts
  A_kl   = (# times the k → l transition occurs in π) + r_kl
  E_k(b) = (# times state k in π emits b in x) + r_k(b)
r_kl, r_k(b) are pseudocounts representing our prior belief.
Larger pseudocounts → strong prior belief.
Small pseudocounts (ε < 1): just to avoid 0 probabilities.
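Counting-based estimation with pseudocounts might look like the sketch below. It reuses the array conventions assumed earlier; the helper name and the default pseudocount values are placeholders, not part of the slides.

```python
import numpy as np

def estimate_known_path(x, pi, K, M, r_trans=1e-3, r_emit=1e-3):
    """ML estimation from a labeled path pi, with pseudocounts r_trans, r_emit."""
    A = np.full((K, K), r_trans)          # A_kl, seeded with pseudocounts r_kl
    E = np.full((K, M), r_emit)           # E_k(b), seeded with pseudocounts r_k(b)
    for i in range(len(x)):
        E[pi[i], x[i]] += 1               # state pi_i emits x_i
        if i + 1 < len(x):
            A[pi[i], pi[i + 1]] += 1      # transition pi_i -> pi_{i+1}
    a = A / A.sum(axis=1, keepdims=True)  # a_kl = A_kl / Σ_i A_ki
    e = E / E.sum(axis=1, keepdims=True)  # e_k(b) = E_k(b) / Σ_c E_k(c)
    return a, e
```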

2. When the states are hidden
We don't know the true A_kl, E_k(b).
Idea:
  We estimate our best guess of what A_kl, E_k(b) are (or we start with random / uniform values)
  We update the parameters of the model, based on our guess
  We repeat

2. When the states are hidden
Starting with our best guess of a model M with parameters θ:
Given x = x_1 … x_N for which the true path π = π_1 … π_N is unknown,
we can get to a provably more likely parameter set θ', i.e., a θ' that increases the probability P(x | θ).
Principle: EXPECTATION MAXIMIZATION
1. Estimate A_kl, E_k(b) in the training data
2. Update θ according to A_kl, E_k(b)
3. Repeat 1 & 2, until convergence

Estimating new parameters
To estimate A_kl (assume θ = θ_CURRENT in all formulas below):
At each position i of sequence x, find the probability that the transition k → l is used:
  P(π_i = k, π_{i+1} = l | x) = [1/P(x)] P(π_i = k, π_{i+1} = l, x_1 … x_N) = Q / P(x)
where
  Q = P(x_1 … x_i, π_i = k, π_{i+1} = l, x_{i+1} … x_N)
    = P(π_{i+1} = l, x_{i+1} … x_N | π_i = k) P(x_1 … x_i, π_i = k)
    = P(π_{i+1} = l, x_{i+1} x_{i+2} … x_N | π_i = k) f_k(i)
    = P(x_{i+2} … x_N | π_{i+1} = l) P(x_{i+1} | π_{i+1} = l) P(π_{i+1} = l | π_i = k) f_k(i)
    = b_l(i+1) e_l(x_{i+1}) a_kl f_k(i)
So:
  P(π_i = k, π_{i+1} = l | x, θ) = f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x | θ_CURRENT)

Estimating new parameters
So, A_kl is the expected number of times the k → l transition is used, given the current θ:
  A_kl = Σ_i P(π_i = k, π_{i+1} = l | x, θ) = Σ_i f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x | θ)
[Diagram: x_1 … x_{i-1}, state k emitting x_i, transition a_kl, state l emitting x_{i+1}, then x_{i+2} … x_N; f_k(i) covers the prefix, b_l(i+1) the suffix]
Similarly,
  E_k(b) = [1/P(x | θ)] Σ_{i : x_i = b} f_k(i) b_k(i)
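As a sketch, the two expected-count formulas can be accumulated directly from the forward and backward tables computed earlier. The function below is illustrative only; f, b, Px are assumed to come from the forward/backward sketches above, and the array conventions are the same assumptions as before.

```python
import numpy as np

def expected_counts(x, a, e, f, b, Px):
    """Expected transition counts A_kl and emission counts E_k(b) under the current theta."""
    N, K = len(x), a.shape[0]
    A = np.zeros((K, K))
    E = np.zeros((K, e.shape[1]))
    for i in range(N - 1):
        # A_kl += f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x | theta)
        A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / Px
    for i in range(N):
        # E_k(x_i) += f_k(i) b_k(i) / P(x | theta)
        E[:, x[i]] += f[i] * b[i] / Px
    return A, E
```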

The Baum-Welch Algorithm
Initialization: pick the best-guess model parameters (or arbitrary ones)
Iteration:
  1. Forward
  2. Backward
  3. Calculate A_kl, E_k(b), given θ_CURRENT
  4. Calculate the new model parameters θ_NEW: a_kl, e_k(b)
  5. Calculate the new log-likelihood P(x | θ_NEW)
     (guaranteed to be higher, by expectation maximization)
Until P(x | θ) does not change much
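Putting the pieces together, a minimal Baum-Welch loop could look like the sketch below. It reuses the hypothetical forward, backward, and expected_counts helpers sketched above; the convergence tolerance, the fixed a0, and the absence of scaling are simplifications, not part of the slides.

```python
import numpy as np

def baum_welch(x, a0, a, e, tol=1e-6, max_iter=200):
    """EM re-estimation of (a, e) for a single training sequence x; a0 kept fixed."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        f, Px = forward(x, a0, a, e)                  # 1. Forward
        b, _ = backward(x, a0, a, e)                  # 2. Backward
        A, E = expected_counts(x, a, e, f, b, Px)     # 3. A_kl, E_k(b) given theta_CURRENT
        a = A / A.sum(axis=1, keepdims=True)          # 4. new a_kl
        e = E / E.sum(axis=1, keepdims=True)          #    new e_k(b)
        ll = np.log(Px)                               # 5. new log-likelihood
        if ll - prev_ll < tol:                        # until P(x | theta) does not change much
            break
        prev_ll = ll
    return a, e
```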

The Baum-Welch Algorithm
Time complexity: (# iterations) × O(K²N)
Guaranteed to increase the log likelihood P(x | θ)
Not guaranteed to find the globally best parameters: converges to a local optimum, depending on initial conditions
Too many parameters / too large a model: overtraining

Alternative: Viterbi Training
Initialization: same
Iteration:
  1. Perform Viterbi, to find π*
  2. Calculate A_kl, E_k(b) according to π* + pseudocounts
  3. Calculate the new parameters a_kl, e_k(b)
Until convergence
Notes:
  Not guaranteed to increase P(x | θ)
  Guaranteed to increase P(x | θ, π*)
  In general, worse performance than Baum-Welch
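A hedged sketch of Viterbi training, reusing the hypothetical viterbi and estimate_known_path helpers from above; a fixed iteration count stands in for a proper convergence test.

```python
def viterbi_training(x, a0, a, e, K, M, n_iter=20):
    """Relabel with Viterbi, then re-estimate by counting; repeat."""
    for _ in range(n_iter):                              # simplification: fixed # iterations
        pi_star, _ = viterbi(x, a0, a, e)                # 1. Viterbi: find pi*
        a, e = estimate_known_path(x, pi_star, K, M)     # 2-3. counts + pseudocounts -> new a_kl, e_k(b)
    return a, e
```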

Pair-HMMs and CRFs Slide Credits: Chuong B. Do

Quick recap of HMMs
Formally, an HMM is a tuple (Σ, Q, A, a_0, e):
  alphabet: Σ = {b_1, …, b_M}
  set of states: Q = {1, …, K}
  transition probabilities: A = [a_ij]
  initial state probabilities: a_0i
  emission probabilities: e_i(b_k)
[State diagram: the dishonest casino, states FAIR and LOADED, self-transition probability 0.95 and switch probability 0.05 each way]
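For concreteness, the dishonest casino example can be written out as arrays. Only the 0.95 / 0.05 transitions come from the diagram above; the uniform start distribution and the loaded-die emission values (0.5 for a six, 0.1 otherwise) are assumed here for illustration.

```python
import numpy as np

a0 = np.array([0.5, 0.5])            # start in FAIR or LOADED with equal probability (assumption)
a  = np.array([[0.95, 0.05],         # FAIR -> FAIR, FAIR -> LOADED
               [0.05, 0.95]])        # LOADED -> FAIR, LOADED -> LOADED
e  = np.array([[1/6] * 6,            # FAIR emits 1..6 uniformly
               [0.1] * 5 + [0.5]])   # LOADED favors sixes (assumed values)
```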

Pair-HMMs
Consider an HMM over the alphabet (Σ_1 ∪ {η}) × (Σ_2 ∪ {η}), with states Q, transitions A, initial probabilities a_0, and emissions e.
Instead of always emitting a pair of letters, in some states we may emit a letter paired with η (the empty string).
For simplicity, assume η is never emitted for both observation sequences simultaneously.
Call the two observation sequences x and y.

Application: sequence alignment
Consider the following pair-HMM:
[State diagram: match state M emits P(x_i, y_j); insert state I emits P(x_i, η); insert state J emits P(η, y_j). Transitions: M → M with probability 1 - 2δ, M → I and M → J with probability δ each, I → I and J → J with probability ε, I → M and J → M with probability 1 - ε (I ↔ J transitions optional). For all c ∈ Σ, P(η, c) = P(c, η) = Q(c).]
QUESTION: What are the interpretations of P(c, d) and Q(c) for c, d ∈ Σ?
QUESTION: What does this model have to do with alignments?
QUESTION: What is the average length of a gapped region in alignments generated by this model? The average length of matched regions?

Recap: Viterbi for single-sequence HMMs
[Trellis diagram: states 1, …, K at each position, emitting x_1 x_2 x_3 … x_K]
Algorithm:
  V_k(i) = max_{π_1 … π_{i-1}} P(x_1 … x_{i-1}, π_1 … π_{i-1}, x_i, π_i = k)
Compute using dynamic programming!

(Broken) Viterbi for pair-HMMs
In the single-sequence case, we defined
  V_k(i) = max_{π_1 … π_{i-1}} P(x_1 … x_{i-1}, π_1 … π_{i-1}, x_i, π_i = k)
         = e_k(x_i) max_j a_jk V_j(i - 1)
In the pairwise case, (x_1, y_1) … (x_{i-1}, y_{i-1}) no longer correspond to the first i - 1 letters of x and y.

(Fixed) Viterbi for pair-HMMs
Consider the special case of the alignment pair-HMM above (states M, I, J with parameters δ, ε):

  V_M(i, j) = P(x_i, y_j) · max { (1 - 2δ) V_M(i - 1, j - 1),
                                  (1 - ε) V_I(i - 1, j - 1),
                                  (1 - ε) V_J(i - 1, j - 1) }

  V_I(i, j) = Q(x_i) · max { δ V_M(i - 1, j),
                             ε V_I(i - 1, j) }

  V_J(i, j) = Q(y_j) · max { δ V_M(i, j - 1),
                             ε V_J(i, j - 1) }

Similar recurrences hold for the forward/backward algorithms (see Durbin et al. for details).
QUESTION: What is the computational complexity of the DP?
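A hedged sketch of these recurrences in log space: logP and logQ are assumed to be a log substitution matrix and a log background vector indexed by integer-encoded letters, and boundary handling is deliberately simplified. The two nested loops make the O(|x|·|y|) cost of the DP (times a constant number of states here) explicit.

```python
import numpy as np

def pair_viterbi(x, y, logP, logQ, delta, eps):
    """Log-probability of the best alignment of x and y under the 3-state pair-HMM (no traceback)."""
    n, m = len(x), len(y)
    NEG = -np.inf
    VM = np.full((n + 1, m + 1), NEG)
    VI = np.full((n + 1, m + 1), NEG)
    VJ = np.full((n + 1, m + 1), NEG)
    VM[0, 0] = 0.0                                   # empty prefix aligned to empty prefix
    l2d, l1e = np.log(1 - 2 * delta), np.log(1 - eps)
    ld, le = np.log(delta), np.log(eps)
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:                      # M: emit the pair (x_i, y_j)
                VM[i, j] = logP[x[i - 1], y[j - 1]] + max(
                    l2d + VM[i - 1, j - 1], l1e + VI[i - 1, j - 1], l1e + VJ[i - 1, j - 1])
            if i > 0:                                # I: emit (x_i, eta)
                VI[i, j] = logQ[x[i - 1]] + max(ld + VM[i - 1, j], le + VI[i - 1, j])
            if j > 0:                                # J: emit (eta, y_j)
                VJ[i, j] = logQ[y[j - 1]] + max(ld + VM[i, j - 1], le + VJ[i, j - 1])
    return max(VM[n, m], VI[n, m], VJ[n, m])
```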

Connection to NW with affine gaps

  V_M(i, j) = P(x_i, y_j) · max { (1 - 2δ) V_M(i - 1, j - 1),
                                  (1 - ε) V_I(i - 1, j - 1),
                                  (1 - ε) V_J(i - 1, j - 1) }
  V_I(i, j) = Q(x_i) · max { δ V_M(i - 1, j), ε V_I(i - 1, j) }
  V_J(i, j) = Q(y_j) · max { δ V_M(i, j - 1), ε V_J(i, j - 1) }

QUESTION: How would the optimal alignment change if we divided the probability of every single alignment by ∏_{i=1}^{|x|} Q(x_i) ∏_{j=1}^{|y|} Q(y_j)?

Connection to NW with affine gaps

  V_M(i, j) = [P(x_i, y_j) / (Q(x_i) Q(y_j))] · max { (1 - 2δ) V_M(i - 1, j - 1),
                                                      (1 - ε) V_I(i - 1, j - 1),
                                                      (1 - ε) V_J(i - 1, j - 1) }
  V_I(i, j) = max { δ V_M(i - 1, j), ε V_I(i - 1, j) }
  V_J(i, j) = max { δ V_M(i, j - 1), ε V_J(i, j - 1) }

Account for the extra terms along the way.

Connection to NW with affine gaps

  log V_M(i, j) = log [P(x_i, y_j) / (Q(x_i) Q(y_j))] + max { log(1 - 2δ) + log V_M(i - 1, j - 1),
                                                              log(1 - ε) + log V_I(i - 1, j - 1),
                                                              log(1 - ε) + log V_J(i - 1, j - 1) }
  log V_I(i, j) = max { log δ + log V_M(i - 1, j), log ε + log V_I(i - 1, j) }
  log V_J(i, j) = max { log δ + log V_M(i, j - 1), log ε + log V_J(i, j - 1) }

Take logs, and ignore a couple of terms.

Connection to NW with affine gaps

  M(i, j) = S(x_i, y_j) + max { M(i - 1, j - 1), I(i - 1, j - 1), J(i - 1, j - 1) }
  I(i, j) = max { d + M(i - 1, j), e + I(i - 1, j) }
  J(i, j) = max { d + M(i, j - 1), e + J(i, j - 1) }

Rename!
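The renamed recurrence is exactly Needleman-Wunsch with affine gaps. A hedged sketch under the same conventions as the pair-HMM code above (S is a substitution score matrix over integer-encoded letters; d and e are negative gap-open and gap-extend scores; boundary handling is simplified):

```python
import numpy as np

def nw_affine_score(x, y, S, d, e):
    """Best global alignment score of x and y with affine gap penalties (no traceback)."""
    n, m = len(x), len(y)
    NEG = -np.inf
    M = np.full((n + 1, m + 1), NEG)
    I = np.full((n + 1, m + 1), NEG)
    J = np.full((n + 1, m + 1), NEG)
    M[0, 0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                M[i, j] = S[x[i - 1], y[j - 1]] + max(M[i - 1, j - 1], I[i - 1, j - 1], J[i - 1, j - 1])
            if i > 0:
                I[i, j] = max(d + M[i - 1, j], e + I[i - 1, j])   # gap in y
            if j > 0:
                J[i, j] = max(d + M[i, j - 1], e + J[i, j - 1])   # gap in x
    return max(M[n, m], I[n, m], J[n, m])
```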

A simple example
Let's work out an example; assume sequence identity 88%.
Calculate match & mismatch scores:
  P(A, A) + … + P(T, T) = 0.88, therefore P(A, A) = 0.22
  P(A, C) + … + P(G, T) = 0.12, therefore P(x, y) = 0.01 for x ≠ y
  Match score    = log(0.22 / 0.25²) = 1.25846
  Mismatch score = log(0.01 / 0.25²) = -1.83258
When is the score of an ungapped aligned region equal to 0?
  Assume a fraction p of matches:
  1.25846 p - 1.83258 (1 - p) = 0
  Therefore p = 1.83258 / (1.25846 + 1.83258) = 0.5929
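A quick check of the arithmetic on this slide (natural logarithms assumed):

```python
import math

p_match, p_mismatch, q = 0.88 / 4, 0.12 / 12, 0.25
match_score    = math.log(p_match / q**2)                    # ≈  1.25846
mismatch_score = math.log(p_mismatch / q**2)                 # ≈ -1.83258
p_zero = -mismatch_score / (match_score - mismatch_score)    # fraction of matches giving score 0
print(match_score, mismatch_score, p_zero)                   # ≈ 1.2585 -1.8326 0.5929
```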