CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)


a. Likelihood: forward algorithm
b. Decoding: Viterbi algorithm
c. Model building: Baum-Welch algorithm; Viterbi training

Hidden Markov models
A Markov chain of states; at each state there is a set of possible observations. E.g., the dishonest casino:

  Fair die:    1: 1/6   2: 1/6   3: 1/6   4: 1/6   5: 1/6   6: 1/6
  Loaded die:  1: 1/10  2: 1/10  3: 1/10  4: 1/10  5: 1/10  6: 1/2
  Transitions: Fair -> Fair 0.95, Fair -> Loaded 0.05, Loaded -> Loaded 0.9, Loaded -> Fair 0.1

Three major problems:
- The most probable state path
- The likelihood
- Parameter estimation for HMMs

A biological example: CpG islands

The higher rate of methyl-C mutating to T in CpG dinucleotides leads to a generally lower CpG presence in the genome, except in some biologically important regions, e.g., in promoters; such regions are called CpG islands.

The conditional probabilities below were collected from ~60,000 bps of human genome sequence; + stands for CpG islands and - for non-CpG islands.

  +     A      C      G      T            -     A      C      G      T
  A   0.180  0.274  0.426  0.120          A   0.300  0.205  0.285  0.210
  C   0.171  0.368  0.274  0.188          C   0.322  0.298  0.078  0.302
  G   0.161  0.339  0.375  0.125          G   0.248  0.246  0.298  0.208
  T   0.079  0.355  0.384  0.182          T   0.177  0.239  0.292  0.292

Task 1: given a sequence x, determine whether it is a CpG island.
Solution: compute the log-odds ratio scored by the two Markov chains:

  S(x) = log [ P(x | model +) / P(x | model -) ]

[Figure: histogram of the length-normalized scores S(x); y-axis: # of sequences.]
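As a small illustration of the Task 1 scoring, here is a Python sketch that computes S(x) over the two tables above. It is only a sketch: the names PLUS, MINUS, and log_odds are illustrative choices, not part of the original notes.

```python
import math

# Transition probabilities from the tables above (row: current base, column: next base).
PLUS = {  # inside CpG islands
    'A': {'A': 0.180, 'C': 0.274, 'G': 0.426, 'T': 0.120},
    'C': {'A': 0.171, 'C': 0.368, 'G': 0.274, 'T': 0.188},
    'G': {'A': 0.161, 'C': 0.339, 'G': 0.375, 'T': 0.125},
    'T': {'A': 0.079, 'C': 0.355, 'G': 0.384, 'T': 0.182},
}
MINUS = {  # outside CpG islands
    'A': {'A': 0.300, 'C': 0.205, 'G': 0.285, 'T': 0.210},
    'C': {'A': 0.322, 'C': 0.298, 'G': 0.078, 'T': 0.302},
    'G': {'A': 0.248, 'C': 0.246, 'G': 0.298, 'T': 0.208},
    'T': {'A': 0.177, 'C': 0.239, 'G': 0.292, 'T': 0.292},
}

def log_odds(x):
    """S(x) = log P(x | model +) - log P(x | model -), summed over adjacent base pairs."""
    return sum(math.log(PLUS[a][b]) - math.log(MINUS[a][b]) for a, b in zip(x, x[1:]))

print(log_odds("CGCGCGCG"))  # clearly positive: looks like a CpG island
print(log_odds("ATATATAT"))  # negative: looks like background sequence
```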

Task 2: For a long genomic sequence x, label the CpG islands, if there are any.

Approach 1: Adopt the method for Task 1 by calculating the log-odds score for a window of, say, 100 bps around every nucleotide and plotting it.
Problems with this approach:
- It won't do well if CpG islands have sharp boundaries and variable lengths.
- There is no effective way to choose a good window size.

Approach 2: Use a hidden Markov model.

[Figure: a two-state HMM.
  State +:  A: 0.17   C: 0.368  G: 0.274  T: 0.188
  State -:  A: 0.372  C: 0.198  G: 0.112  T: 0.338
  Transitions: a_++ = 0.9, a_+- = 0.1, a_-+ = 0.05, a_-- = 0.95]

The model has two states, + for CpG island and - for non-CpG island. The numbers are made up here, and should be fixed by learning from training examples. A reasonable assignment for the emission frequencies may use the respective eigenvectors (stationary distributions) of the two Markov chains in Approach 1.

We use the same notation as in the text: a_kl is the transition probability from state k to state l, and e_k(b) is the emission probability that symbol b is seen when in state k.
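For concreteness, the two-state model above can be written out as plain Python data structures, which the algorithm sketches later in these notes reuse. This is a minimal sketch: the variable names (STATES, INIT, TRANS, EMIT) are illustrative, and the 0.5/0.5 begin-state probabilities are an assumption consistent with the worked Viterbi example further below, not something stated on this slide.

```python
# The two-state CpG-island HMM, written out as plain Python dictionaries.
STATES = ['+', '-']

# Begin-state transitions a_0k (assumed 0.5/0.5; implied by the Viterbi example below).
INIT = {'+': 0.5, '-': 0.5}

# a_kl: transition probability from state k to state l.
TRANS = {'+': {'+': 0.90, '-': 0.10},
         '-': {'+': 0.05, '-': 0.95}}

# e_k(b): probability of emitting symbol b while in state k (values from the slide).
EMIT = {'+': {'A': 0.170, 'C': 0.368, 'G': 0.274, 'T': 0.188},
        '-': {'A': 0.372, 'C': 0.198, 'G': 0.112, 'T': 0.338}}
```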

[Figure: the two-state CpG-island HMM from the previous slide.]

Joint probability of an observed sequence x and a state path π:

  i:  1 2 3 4 5 6 7 8 9
  x:  T G C G C G T A C
  π:  - - + + + + - - -

  P(x, π) = Π_{i=1..L} e_{πi}(x_i) a_{πi, πi+1}   (the transition out of the last state is omitted)

  P(x, π) = 0.338 × 0.95 × 0.112 × 0.05 × 0.368 × 0.9 × 0.274 × 0.9 × 0.368 × 0.9 × 0.274 × 0.1 × 0.338 × 0.95 × 0.372 × 0.95 × 0.198

Then the probability of observing sequence x under the model is P(x) = Σ_π P(x, π), which is also called the likelihood of the model.

Q: Given a sequence x of length L, how many state paths are there?
A: N^L, where N is the number of states in the model. Being exponential in the input size, this precludes enumerating all possible state paths to compute P(x).

Example: Suppose the model has a set of two states S = {0, 1}. For L = 3, there are 2^3 = 8 paths: Φ = {000, 001, 010, 011, 100, 101, 110, 111}, and P(x) = Σ_π P(x, π) sums over all of them.
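The slide's term-by-term product can be checked with a short Python sketch. The parameter dictionaries are repeated here in compact form so the snippet runs on its own; the function name joint_prob is illustrative, and, following the slide, no begin or end transition is included.

```python
TRANS = {'+': {'+': 0.9, '-': 0.1}, '-': {'+': 0.05, '-': 0.95}}
EMIT  = {'+': {'A': 0.17, 'C': 0.368, 'G': 0.274, 'T': 0.188},
         '-': {'A': 0.372, 'C': 0.198, 'G': 0.112, 'T': 0.338}}

def joint_prob(x, path):
    """P(x, pi) = product over i of e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}}."""
    p = 1.0
    for i, (sym, state) in enumerate(zip(x, path)):
        p *= EMIT[state][sym]              # emission at position i
        if i + 1 < len(path):              # transition to the state at position i+1
            p *= TRANS[state][path[i + 1]]
    return p

print(joint_prob("TGCGCGTAC", "--++++---"))  # the slide's example path
```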

Specifically, write the sum over all paths as a product of bracketed sums,

  [e_1(x_1) + ... + e_N(x_1)]  a  ...  a  [e_1(x_i) + ... + e_k(x_i) + ... + e_N(x_i)]  a  ...  a  [e_1(x_L) + ... + e_N(x_L)]

where the a's are the transition probabilities linking consecutive positions, and let f_k(i) be the intermediate result accumulated up to the k-th term in the i-th bracket. f_k(i) is the probability contributed by all paths from the beginning up to (and including) position i whose state at position i is k, and it can be written recursively as

  f_k(i) = [Σ_j f_j(i-1) a_jk] e_k(x_i)

Graphically: [Figure: the computation laid out as a trellis, one column of states 1..N per position x_1, x_2, x_3, ..., x_L; a silent begin state 0 is introduced for better presentation.]

Forward algorithm
  Initialization: f_0(0) = 1, f_k(0) = 0 for k > 0.
  Recursion:      f_k(i) = e_k(x_i) Σ_j f_j(i-1) a_jk.
  Termination:    P(x) = Σ_k f_k(L) a_k0.
Time complexity: O(N^2 L), where N is the number of states and L is the sequence length.

Similarly, we can compute P(x) backwards.

Backward algorithm
  Initialization: b_k(L) = a_k0 for all k.
  Recursion:      b_k(i) = Σ_l a_kl e_l(x_{i+1}) b_l(i+1).
  Termination:    P(x) = Σ_k a_0k e_k(x_1) b_k(1).

Therefore f_k(i) b_k(i) gives P(x, π_i = k), the probability contributed by all paths that go through state k at position i. Note: b_k(i) does not include the emission of x_i; this avoids double counting e_k(x_i) in the product f_k(i) b_k(i).
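Below is a forward-algorithm sketch for the two-state model, following the initialization/recursion/termination scheme above. It is a sketch, not a production implementation: it uses plain probabilities (no scaling or log-space), so it is only suitable for short sequences, and it assumes a_0+ = a_0- = 0.5 and end transitions a_k0 = 1, which are simplifications of mine rather than values given on the slide.

```python
STATES = ['+', '-']
INIT   = {'+': 0.5, '-': 0.5}   # assumed begin-state transitions a_0k
TRANS  = {'+': {'+': 0.9, '-': 0.1}, '-': {'+': 0.05, '-': 0.95}}
EMIT   = {'+': {'A': 0.17, 'C': 0.368, 'G': 0.274, 'T': 0.188},
          '-': {'A': 0.372, 'C': 0.198, 'G': 0.112, 'T': 0.338}}

def forward(x):
    """Return P(x) = sum_k f_k(L), with the end transition a_k0 taken as 1."""
    # Initialization (position 1): f_k(1) = a_0k * e_k(x_1)
    f = {k: INIT[k] * EMIT[k][x[0]] for k in STATES}
    # Recursion: f_k(i) = e_k(x_i) * sum_j f_j(i-1) * a_jk
    for sym in x[1:]:
        f = {k: EMIT[k][sym] * sum(f[j] * TRANS[j][k] for j in STATES)
             for k in STATES}
    # Termination (no explicit end state here)
    return sum(f.values())

print(forward("TGCGCGTAC"))  # likelihood of the example sequence
```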

Decoding: Given an observed sequence x, what is the most probable state path, i.e.,

  π* = argmax_π P(x, π)

Organizing the computation as in the bracketed expansion used for the forward algorithm, but replacing each sum with a maximization,

  [e_1(x_1) ... e_N(x_1)]  a  ...  a  [e_1(x_i) ... e_k(x_i) ... e_N(x_i)]  a  ...  a  [e_1(x_L) ... e_N(x_L)]

the intermediate quantity accumulated up to the term e_k(x_i) is v_k(i).

Viterbi algorithm
  Initialization: v_0(0) = 1, v_k(0) = 0 for k > 0.
  Recursion:      v_k(i) = e_k(x_i) max_j (v_j(i-1) a_jk);  ptr_i(k) = argmax_j (v_j(i-1) a_jk).
  Termination:    P(x, π*) = max_k (v_k(L) a_k0);  π*_L = argmax_j (v_j(L) a_j0).
  Traceback:      π*_{i-1} = ptr_i(π*_i).

[Figure: the two-state CpG-island HMM from before.]

Example: v_k(i) = e_k(x_i) max_j (v_j(i-1) a_jk) for x = T G C T A, with begin-state transitions a_0+ = a_0- = 0.5:

  v_k(i)      T        G        C         T         A
    +       0.094    0.023    0.0076    0.0013    0.0002
    -       0.169    0.018    0.0033    0.0011    0.00038
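A matching Viterbi sketch is given below; on x = TGCTA it reproduces the table above up to rounding. As in the forward sketch, plain probabilities, a_0+ = a_0- = 0.5, and end transitions of 1 are simplifying assumptions; the names viterbi and pointers are illustrative.

```python
STATES = ['+', '-']
INIT   = {'+': 0.5, '-': 0.5}   # assumed begin-state transitions a_0k
TRANS  = {'+': {'+': 0.9, '-': 0.1}, '-': {'+': 0.05, '-': 0.95}}
EMIT   = {'+': {'A': 0.17, 'C': 0.368, 'G': 0.274, 'T': 0.188},
          '-': {'A': 0.372, 'C': 0.198, 'G': 0.112, 'T': 0.338}}

def viterbi(x):
    """Return (max_pi P(x, pi), pi*), with the end transition a_k0 taken as 1."""
    v = {k: INIT[k] * EMIT[k][x[0]] for k in STATES}   # v_k(1)
    pointers = []                                      # ptr_i(k) for i = 2..L
    for sym in x[1:]:
        new_v, ptr = {}, {}
        for k in STATES:
            # v_k(i) = e_k(x_i) * max_j v_j(i-1) * a_jk, remembering the argmax j
            best_j = max(STATES, key=lambda j: v[j] * TRANS[j][k])
            ptr[k] = best_j
            new_v[k] = EMIT[k][sym] * v[best_j] * TRANS[best_j][k]
        v = new_v
        pointers.append(ptr)
    # Termination and traceback
    last = max(STATES, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    return v[last], ''.join(reversed(path))

print(viterbi("TGCTA"))       # the table's example sequence
print(viterbi("TGCGCGTAC"))   # the decoding example from the earlier slide
```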

Posterior decoding

  P(π_i = k | x) = P(x, π_i = k) / P(x) = f_k(i) b_k(i) / P(x)

Algorithm: for i = 1 to L, assign to position i the state argmax_k P(π_i = k | x).

Notes:
1. Posterior decoding may be useful when there are multiple, almost equally probable state paths, or when a function is defined on the states.
2. The state path identified by posterior decoding may not be the most probable path overall, and may not even be a viable path (it can use transitions that have probability zero in the model).
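The posterior decoder can be sketched by combining a forward and a backward pass; note that dividing by P(x) does not change the per-position argmax, so it is skipped here. Same assumptions as before (unscaled probabilities, a_0+ = a_0- = 0.5, end transitions of 1); the name posterior_decode is illustrative.

```python
STATES = ['+', '-']
INIT   = {'+': 0.5, '-': 0.5}   # assumed begin-state transitions a_0k
TRANS  = {'+': {'+': 0.9, '-': 0.1}, '-': {'+': 0.05, '-': 0.95}}
EMIT   = {'+': {'A': 0.17, 'C': 0.368, 'G': 0.274, 'T': 0.188},
          '-': {'A': 0.372, 'C': 0.198, 'G': 0.112, 'T': 0.338}}

def posterior_decode(x):
    """Label each position i with argmax_k f_k(i) * b_k(i)."""
    L = len(x)
    # Forward pass: f[i][k] holds f_k(i+1) (0-based list index).
    f = [{k: INIT[k] * EMIT[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        f.append({k: EMIT[k][sym] * sum(f[-1][j] * TRANS[j][k] for j in STATES)
                  for k in STATES})
    # Backward pass: b[i][k] holds b_k(i+1); b_k(i) excludes the emission of x_i.
    b = [None] * L
    b[L - 1] = {k: 1.0 for k in STATES}        # end transition a_k0 taken as 1
    for i in range(L - 2, -1, -1):
        b[i] = {k: sum(TRANS[k][l] * EMIT[l][x[i + 1]] * b[i + 1][l]
                       for l in STATES) for k in STATES}
    # Dividing f_k(i)*b_k(i) by P(x) would not change the argmax, so it is omitted.
    return ''.join(max(STATES, key=lambda k: f[i][k] * b[i][k]) for i in range(L))

print(posterior_decode("TGCGCGTAC"))  # a per-position +/- labelling of x
```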

Model building
- Topology: requires domain knowledge.
- Parameters:
  - When states are labeled, use simple counting:
      a_kl = A_kl / Σ_l' A_kl'    and    e_k(b) = E_k(b) / Σ_b' E_k(b')
    where A_kl and E_k(b) are the observed transition and emission counts.
  - When states are not labeled:
    Method 1 (Viterbi training):
      1. Use the Viterbi algorithm to label the training sequences;
      2. Do the counting to collect new a_kl and e_k(b);
      3. Repeat steps 1 and 2 until a stopping criterion is met.
    Method 2: the Baum-Welch algorithm, described next.

Baum-Welch algorithm (Expectation-Maximization)
An iterative procedure similar to Viterbi training, but based on expected counts rather than counts from a single best path:

  P(π_i = k, π_{i+1} = l | x, θ) = f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)

Summing over positions i and over training sequences x^j gives the expected counts

  A_kl   = Σ_j (1 / P(x^j)) [ Σ_i f_k^j(i) a_kl e_l(x^j_{i+1}) b_l^j(i+1) ]
  E_k(b) = Σ_j (1 / P(x^j)) [ Σ_{i: x^j_i = b} f_k^j(i) b_k^j(i) ]

which are then re-normalized, as in the labeled case, to give new a_kl and e_k(b). A worked sketch of one such iteration is given below.
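To make the expected-count formulas concrete, here is a sketch of a single Baum-Welch iteration for one training sequence (the slide sums over several sequences x^j; this sketch fixes j to a single sequence for brevity). As in the earlier sketches, unscaled probabilities, a_0+ = a_0- = 0.5, and end transitions of 1 are simplifying assumptions, and the function names are illustrative.

```python
STATES = ['+', '-']
INIT   = {'+': 0.5, '-': 0.5}   # assumed begin-state transitions a_0k
TRANS  = {'+': {'+': 0.9, '-': 0.1}, '-': {'+': 0.05, '-': 0.95}}
EMIT   = {'+': {'A': 0.17, 'C': 0.368, 'G': 0.274, 'T': 0.188},
          '-': {'A': 0.372, 'C': 0.198, 'G': 0.112, 'T': 0.338}}

def forward_backward(x):
    """Forward and backward tables (0-based lists of dicts) and P(x)."""
    L = len(x)
    f = [{k: INIT[k] * EMIT[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        f.append({k: EMIT[k][sym] * sum(f[-1][j] * TRANS[j][k] for j in STATES)
                  for k in STATES})
    b = [None] * L
    b[L - 1] = {k: 1.0 for k in STATES}        # end transition a_k0 taken as 1
    for i in range(L - 2, -1, -1):
        b[i] = {k: sum(TRANS[k][l] * EMIT[l][x[i + 1]] * b[i + 1][l]
                       for l in STATES) for k in STATES}
    return f, b, sum(f[-1].values())

def baum_welch_step(x):
    """One EM update of the transition and emission parameters from sequence x."""
    f, b, px = forward_backward(x)
    # E-step: expected transition counts A_kl and emission counts E_k(b).
    A = {k: {l: 0.0 for l in STATES} for k in STATES}
    E = {k: {c: 0.0 for c in "ACGT"} for k in STATES}
    for i in range(len(x)):
        for k in STATES:
            E[k][x[i]] += f[i][k] * b[i][k] / px
            if i + 1 < len(x):
                for l in STATES:
                    A[k][l] += (f[i][k] * TRANS[k][l]
                                * EMIT[l][x[i + 1]] * b[i + 1][l]) / px
    # M-step: re-normalize the expected counts into new parameters.
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in STATES}
                 for k in STATES}
    new_emit = {k: {c: E[k][c] / sum(E[k].values()) for c in "ACGT"}
                for k in STATES}
    return new_trans, new_emit

print(baum_welch_step("TGCGCGTACGCGCGTA"))  # updated (transition, emission) estimates
```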