CSCE 478/878 Lecture 9: Hidden Markov Models. Stephen Scott (sscott@cse.unl.edu).


Introduction

- Hidden Markov models are useful for modeling and making predictions on sequential data, e.g., biological sequences, text, and series of sounds/spoken words
- We will return to graphical models that are generative

Outline

- Bioinformatics example: CpG islands
- Markov chains
- Hidden Markov models (HMMs)
- Formal definition
- Finding the most probable state path (Viterbi algorithm)
- Forward and backward algorithms
- Specifying an HMM

Bioinformatics Example: CpG Islands

- Focus on nucleotide sequences: sequences of symbols from the alphabet {A, C, G, T}
- The dinucleotide CG (written "CpG") tends to appear more frequently in some places than in others
- Such CpG islands are usually 10^2 to 10^3 bases long
- Questions:
  1. Given a short segment, is it from a CpG island?
  2. Given a long segment, where are its islands?

Modeling CpG Islands

- The model will be a CpG generator
- We want the probability of the next symbol to depend on the current symbol
- We will use a standard (non-hidden) Markov model: a probabilistic state machine in which each state emits a symbol
- [Figure: a four-state chain over A, C, G, T with transition probabilities such as P(A | T) labeling the edges]
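
To make the "probabilistic state machine" picture concrete, here is a minimal generator sketch in Python. The transition table is a made-up placeholder, not trained CpG probabilities, and the start symbol is an arbitrary choice.

```python
import random

# A hypothetical first-order Markov chain over {A, C, G, T}: each state emits its
# own symbol and the next state depends only on the current one. The transition
# probabilities below are illustrative placeholders, not trained CpG values.
TRANSITIONS = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
    "C": {"A": 0.25, "C": 0.30, "G": 0.25, "T": 0.20},
    "G": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "T": {"A": 0.25, "C": 0.20, "G": 0.30, "T": 0.25},
}

def generate(length, start="A"):
    """Generate a nucleotide sequence by walking the chain from `start`."""
    seq = [start]
    for _ in range(length - 1):
        nxt = TRANSITIONS[seq[-1]]
        seq.append(random.choices(list(nxt), weights=list(nxt.values()))[0])
    return "".join(seq)

print(generate(20))
```

Multiplying the TRANSITIONS entries visited along such a walk gives the sequence probability, which is exactly the product formula in the next section.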

The Markov Property

- A first-order Markov model (what we study) has the property that observing symbol x_i while in state \pi_i depends only on the previous state \pi_{i-1} (which generated x_{i-1})
- The standard Markov model has a 1-1 correspondence between symbols and states, thus P(x_i | x_{i-1}, ..., x_1) = P(x_i | x_{i-1}) and
  P(x_1, ..., x_L) = P(x_1) \prod_{i=2}^{L} P(x_i | x_{i-1})
- For convenience, we can add special begin (B) and end (E) states to clarify the equations and define a distribution over sequence lengths; they emit empty (null) symbols x_0 and x_{L+1} to mark the ends of the sequence, so that
  P(x_1, ..., x_L) = \prod_{i=1}^{L+1} P(x_i | x_{i-1})
- [Figure: B feeding into the fully connected {A, C, G, T} chain, which feeds into E]

Using Markov Chains for Discrimination

- How do we use this to differentiate islands from non-islands?
- Define two Markov chain models: islands ("+") and non-islands ("-"); each model gets 4 states (A, C, G, T)
- Take a training set of known islands and non-islands, and let c^+_{st} be the number of times symbol t followed symbol s in an island:
  \hat{P}^+(t | s) = c^+_{st} / \sum_{t'} c^+_{st'}
  (and analogously for \hat{P}^-(t | s) from the non-island counts)
- Now score a sequence X = <x_1, ..., x_L> by summing the log-odds ratios (a scoring sketch follows this section):
  log( \hat{P}(X | +) / \hat{P}(X | -) ) = \sum_{i=1}^{L+1} log( \hat{P}^+(x_i | x_{i-1}) / \hat{P}^-(x_i | x_{i-1}) )

Finding CpG Islands in a Long Sequence

- Second CpG question: given a long sequence, where are its islands?
- We could use the tools just presented by passing a fixed-width window over the sequence and computing scores, but this runs into trouble if island lengths vary
- Prefer a single, unified model for islands vs. non-islands: eight states A+, C+, G+, T+ and A-, C-, G-, T-, with complete connectivity between all pairs
- Within the + group, transition probabilities are similar to those of the separate + model, but there is a small chance of switching to a state in the - group (and vice versa)

What's Hidden?

- We no longer have a one-to-one correspondence between states and emitted characters; e.g., was C emitted by C+ or C-?
- Must differentiate the symbol sequence X from the state sequence \pi = <\pi_1, ..., \pi_L>
- State transition probabilities are the same as before: P(\pi_i = l | \pi_{i-1} = j), written P(l | j)
- Now each state has a probability of emitting any value: P(x_i = x | \pi_i = j), written P(x | j)
- [In the CpG HMM, the emission probabilities are discrete and equal to 0 or 1; one possible encoding is sketched below]
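
Returning to the first CpG question, here is a minimal sketch of the log-odds scorer described above, assuming the two transition tables \hat{P}^+ and \hat{P}^- have already been estimated. The numbers below are illustrative placeholders rather than counts from real CpG data, and the begin/end terms are omitted for brevity.

```python
from math import log

# Placeholder transition tables P^+(t|s) and P^-(t|s); in practice these come from
# the counts c_st over labeled island / non-island training sequences.
P_PLUS  = {s: {t: 0.25 for t in "ACGT"} for s in "ACGT"}
P_MINUS = {s: {t: 0.25 for t in "ACGT"} for s in "ACGT"}
P_PLUS["C"]["G"], P_MINUS["C"]["G"] = 0.40, 0.05    # CpG enriched in islands (toy values)
for model in (P_PLUS, P_MINUS):                     # renormalize the modified C rows
    z = sum(model["C"].values())
    model["C"] = {t: p / z for t, p in model["C"].items()}

def log_odds(x):
    """Sum of log(P^+(x_i | x_{i-1}) / P^-(x_i | x_{i-1})); begin/end terms omitted."""
    return sum(log(P_PLUS[s][t] / P_MINUS[s][t]) for s, t in zip(x, x[1:]))

print(log_odds("ACGCGCGT"))   # positive scores favor "CpG island" under these toy tables
```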

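Before moving on, here is one possible (assumed) encoding of the eight-state island/non-island HMM just described, with 0/1 emission probabilities and a small group-switching probability. The uniform within-group transitions and the SWITCH value are placeholders; a real model would reuse the trained + and - chain probabilities.

```python
# Eight hidden states: A+, C+, G+, T+ and A-, C-, G-, T-. Emission probabilities
# are 0/1 (state "C+" can only emit "C"). Transition values are placeholders.
STATES = [base + grp for grp in "+-" for base in "ACGT"]   # "A+", "C+", ..., "T-"
SWITCH = 0.01                                              # assumed group-switch probability

def transition(k, l):
    """P(l | k): spread (1 - SWITCH) over the 4 same-group states, SWITCH over the rest."""
    return (1.0 - SWITCH) / 4 if k[1] == l[1] else SWITCH / 4

def emission(k, symbol):
    """P(symbol | k): deterministic, each state emits its own base."""
    return 1.0 if k[0] == symbol else 0.0

# sanity check: each row of the transition matrix sums to 1
assert all(abs(sum(transition(k, l) for l in STATES) - 1.0) < 1e-9 for k in STATES)
```
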
Example: The Occasionally Dishonest Casino

- Assume the casino is typically fair, but with probability 0.05 it switches to a loaded die, and switches back with probability 0.1
- Fair die: P(1) = P(2) = ... = P(6) = 1/6; stays fair with probability 0.95, switches to loaded with probability 0.05
- Loaded die: P(1) = ... = P(5) = 1/10, P(6) = 1/2; stays loaded with probability 0.9, switches to fair with probability 0.1
- Given a sequence of rolls, what's hidden?

The Viterbi Algorithm

- The probability of seeing symbol sequence X and state sequence \pi is
  P(X, \pi) = P(\pi_1 | \pi_0) \prod_{i=1}^{L} P(x_i | \pi_i) P(\pi_{i+1} | \pi_i)
- We can use this to find the most likely path
  \pi^* = argmax_\pi P(X, \pi)
  and trace it to identify islands (paths through + states)
- There are exponentially many paths through the chain, so how do we find the most likely one?

The Viterbi Algorithm (cont'd)

- Assume that we know, for all k, v_k(i) = probability of the most likely path ending in state k with observation x_i
- Then
  v_l(i+1) = P(x_{i+1} | l) max_k { v_k(i) P(l | k) }
  where the max ranges over all states k at position i and l is the state at position i+1
- Given this recurrence, we can fill in the table with dynamic programming (a code sketch follows this section):
  - Initialization: v_0(0) = 1, v_k(0) = 0 for k > 0
  - For i = 1 to L, for l = 1 to M (# states):
    v_l(i) = P(x_i | l) max_k { v_k(i-1) P(l | k) }
    ptr_i(l) = argmax_k { v_k(i-1) P(l | k) }
  - Termination: P(X, \pi^*) = max_k { v_k(L) P(0 | k) } and \pi^*_L = argmax_k { v_k(L) P(0 | k) }
  - Traceback: for i = L to 1, \pi^*_{i-1} = ptr_i(\pi^*_i)
- To avoid underflow, use log(v_l(i)) and add instead of multiplying

The Forward Algorithm

- Given a sequence X, find P(X) = \sum_\pi P(X, \pi)
- Use dynamic programming as in Viterbi, replacing max with sum and v_k(i) with
  f_k(i) = P(x_1, ..., x_i, \pi_i = k)
  (the probability of the observed sequence through x_i, stopping in state k)
- Initialization: f_0(0) = 1, f_k(0) = 0 for k > 0
- For i = 1 to L, for l = 1 to M (# states):
  f_l(i) = P(x_i | l) \sum_k f_k(i-1) P(l | k)
- Termination: P(X) = \sum_k f_k(L) P(0 | k)
- To avoid underflow, we can again use logs, though the exactness of the results is compromised

The Backward Algorithm

- Given a sequence X, find the probability that x_i was emitted by state k, i.e.,
  P(\pi_i = k | X) = P(\pi_i = k, X) / P(X)
- The numerator factors as
  P(\pi_i = k, X) = f_k(i) b_k(i)
  where f_k(i) = P(x_1, ..., x_i, \pi_i = k) and P(X) are computed by the forward algorithm, and b_k(i) = P(x_{i+1}, ..., x_L | \pi_i = k) by the backward algorithm (a combined sketch follows this section)
- Initialization: b_k(L) = P(0 | k) for all k
- For i = L-1 to 1, for k = 1 to M (# states):
  b_k(i) = \sum_l P(l | k) P(x_{i+1} | l) b_l(i+1)
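
Here is a minimal Viterbi sketch for the dishonest-casino HMM above, computed in log space as the slide suggests. The transition and emission numbers are the ones on the slide; the uniform start distribution and the example roll string are assumptions added for illustration.

```python
import math

# The occasionally dishonest casino as an HMM (transition/emission values from the
# slide); the uniform start distribution is an added assumption.
STATES = ["Fair", "Loaded"]
START  = {"Fair": 0.5, "Loaded": 0.5}
TRANS  = {"Fair":   {"Fair": 0.95, "Loaded": 0.05},
          "Loaded": {"Fair": 0.10, "Loaded": 0.90}}
EMIT   = {"Fair":   {r: 1 / 6 for r in "123456"},
          "Loaded": {**{r: 1 / 10 for r in "12345"}, "6": 1 / 2}}

def viterbi(rolls):
    """Most likely hidden state path, computed in log space to avoid underflow."""
    v = [{k: math.log(START[k]) + math.log(EMIT[k][rolls[0]]) for k in STATES}]
    ptrs = []
    for x in rolls[1:]:
        prev, col, back = v[-1], {}, {}
        for l in STATES:
            best = max(STATES, key=lambda k: prev[k] + math.log(TRANS[k][l]))
            back[l] = best
            col[l] = prev[best] + math.log(TRANS[best][l]) + math.log(EMIT[l][x])
        v.append(col)
        ptrs.append(back)
    path = [max(STATES, key=lambda k: v[-1][k])]     # best final state
    for back in reversed(ptrs):                      # trace the pointers backward
        path.append(back[path[-1]])
    return list(reversed(path))

print(viterbi("1266662546666"))   # runs of 6s tend to come out "Loaded"
```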

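And a matching forward/backward sketch on the same casino model, combining the two tables into the posterior P(\pi_i = k | X). It is unscaled and uses no explicit end state (the slide's P(0 | k) terms), so it is only a short-sequence sketch; real implementations rescale or work in log space.

```python
# Same casino tables as the Viterbi sketch; repeated here so this block runs alone.
STATES = ["Fair", "Loaded"]
START  = {"Fair": 0.5, "Loaded": 0.5}
TRANS  = {"Fair":   {"Fair": 0.95, "Loaded": 0.05},
          "Loaded": {"Fair": 0.10, "Loaded": 0.90}}
EMIT   = {"Fair":   {r: 1 / 6 for r in "123456"},
          "Loaded": {**{r: 1 / 10 for r in "12345"}, "6": 1 / 2}}

def forward(x):
    """f_k(i) = P(x_1..x_i, state_i = k), unscaled."""
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        prev = f[-1]
        f.append({l: EMIT[l][sym] * sum(prev[k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    return f

def backward(x):
    """b_k(i) = P(x_{i+1}..x_L | state_i = k); b_k(L) = 1 (no explicit end state)."""
    b = [{k: 1.0 for k in STATES}]
    for sym in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(TRANS[k][l] * EMIT[l][sym] * nxt[l] for l in STATES)
                     for k in STATES})
    return b

def posterior(x):
    """P(state_i = k | x) = f_k(i) * b_k(i) / P(x) for every position i."""
    f, b = forward(x), backward(x)
    px = sum(f[-1][k] for k in STATES)               # P(x) from the forward table
    return [{k: f[i][k] * b[i][k] / px for k in STATES} for i in range(len(x))]

for i, post in enumerate(posterior("126666"), start=1):
    print(i, round(post["Loaded"], 3))
```
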
Using Forward/Backward for Posterior Decoding

- Define g(k) = 1 if k \in {A+, C+, G+, T+} and 0 otherwise
- Then G(i | X) = \sum_k P(\pi_i = k | X) g(k) = probability that x_i is in an island
- For each state k, compute P(\pi_i = k | X) with the forward/backward algorithm
- The technique is applicable to any HMM whose set of states is partitioned into classes, and can be used to label individual parts of a sequence

Specifying an HMM

- Two problems: defining the structure (the set of states) and the parameters (transition and emission probabilities)
- Start with the latter problem: given a training set X^1, ..., X^N of independently generated sequences, learn a good set of parameters \theta
- The goal is to maximize the (log) likelihood of seeing the training set given that \theta is the set of parameters for the HMM generating it:
  \sum_{j=1}^{N} log P(X^j ; \theta)

Specifying an HMM: State Sequence Known

- Estimating parameters when, e.g., the islands are already identified in the training set (a counting sketch follows this section)
- Let A_{kl} = number of k -> l transitions and E_k(b) = number of emissions of b in state k; then
  P(l | k) = A_{kl} / \sum_{l'} A_{kl'}
  P(b | k) = E_k(b) / \sum_{b'} E_k(b')
- Be careful if little training data is available; e.g., an unused state k will have undefined parameters
- Workaround: add pseudocounts r_{kl} to A_{kl} and r_k(b) to E_k(b) that reflect prior biases about the probabilities; increasing the training data decreases the prior's influence

Specifying an HMM: The Baum-Welch Algorithm

- Used for estimating the parameters when the state sequence is unknown; a special case of expectation maximization (EM)
- Start with arbitrary P(l | k) and P(b | k), and use them to estimate A_{kl} and E_k(b) as expected numbers of occurrences given the training set (the superscript j denotes the jth training example):
  A_{kl} = \sum_j (1 / P(X^j)) \sum_i f^j_k(i) P(l | k) P(x^j_{i+1} | l) b^j_l(i+1)
  (the probability of a transition from k to l at position i of sequence j, summed over all positions of all sequences)
  E_k(b) = \sum_j \sum_{i : x^j_i = b} P(\pi_i = k | X^j) = \sum_j (1 / P(X^j)) \sum_{i : x^j_i = b} f^j_k(i) b^j_k(i)
- Use these counts (plus pseudocounts) to recompute P(l | k) and P(b | k), then repeat (an iteration sketch follows this section)
- After each iteration, compute the log likelihood and halt if there is no improvement
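
For the state-sequence-known case, here is a small counting sketch with uniform pseudocounts. The two-state "+/-" labeling, the toy labeled pairs, and the single pseudocount value are assumptions for illustration, not the lecture's specific r_{kl}, r_k(b) scheme.

```python
from collections import defaultdict

def estimate(labeled, states, alphabet, pseudo=1.0):
    """Transition/emission estimates from (sequence, state path) pairs, with a
    uniform pseudocount so unused states still get defined parameters."""
    A = {k: defaultdict(float) for k in states}      # transition counts A_kl
    E = {k: defaultdict(float) for k in states}      # emission counts E_k(b)
    for x, pi in labeled:
        for i in range(len(x)):
            E[pi[i]][x[i]] += 1
            if i + 1 < len(x):
                A[pi[i]][pi[i + 1]] += 1
    trans, emit = {}, {}
    for k in states:
        a_tot = sum(A[k].values()) + pseudo * len(states)
        e_tot = sum(E[k].values()) + pseudo * len(alphabet)
        trans[k] = {l: (A[k][l] + pseudo) / a_tot for l in states}
        emit[k]  = {b: (E[k][b] + pseudo) / e_tot for b in alphabet}
    return trans, emit

# toy labeled data: per-position island (+) / non-island (-) labels
data = [("ACGCG", "+++++"), ("ATATA", "-----"), ("ACGTA", "++---")]
trans, emit = estimate(data, states="+-", alphabet="ACGT")
print(trans["+"]["-"], emit["+"]["C"])
```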

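For the unknown-state-sequence case, here is a sketch of one Baum-Welch iteration on the casino model. The forward/backward recurrences repeat the earlier sketch so the block runs on its own; the start distribution is held fixed, and the pseudocount value and toy roll sequences are assumptions for illustration.

```python
# One Baum-Welch iteration for the two-state casino HMM. The uniform start
# distribution, pseudocount value, and toy roll sequences are assumed for
# illustration; no explicit end state is modeled in this sketch.
STATES = ["Fair", "Loaded"]
SYMS = "123456"
START = {k: 0.5 for k in STATES}
trans = {"Fair": {"Fair": 0.95, "Loaded": 0.05}, "Loaded": {"Fair": 0.10, "Loaded": 0.90}}
emit = {"Fair": {s: 1 / 6 for s in SYMS},
        "Loaded": {**{s: 1 / 10 for s in "12345"}, "6": 1 / 2}}

def forward(x):
    f = [{k: START[k] * emit[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        prev = f[-1]
        f.append({l: emit[l][sym] * sum(prev[k] * trans[k][l] for k in STATES) for l in STATES})
    return f

def backward(x):
    b = [{k: 1.0 for k in STATES}]
    for sym in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(trans[k][l] * emit[l][sym] * nxt[l] for l in STATES) for k in STATES})
    return b

def baum_welch_step(seqs, pseudo=0.1):
    """E-step: expected counts A_kl and E_k(b); M-step: renormalize into new parameters."""
    A = {k: {l: pseudo for l in STATES} for k in STATES}
    E = {k: {s: pseudo for s in SYMS} for k in STATES}
    for x in seqs:
        f, b = forward(x), backward(x)
        px = sum(f[-1][k] for k in STATES)             # P(x), from the forward table
        for i in range(len(x)):
            for k in STATES:
                E[k][x[i]] += f[i][k] * b[i][k] / px   # P(state_i = k | x)
                if i + 1 < len(x):
                    for l in STATES:                   # expected k -> l transitions at i
                        A[k][l] += f[i][k] * trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l] / px
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in STATES} for k in STATES}
    new_emit = {k: {s: E[k][s] / sum(E[k].values()) for s in SYMS} for k in STATES}
    return new_trans, new_emit

print(baum_welch_step(["1266645362666", "66651663"]))
```

In a full training loop this step would be repeated, recomputing the log likelihood each time and stopping when it no longer improves, as the slides describe.
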
Specifying an HMM: Structure

- How do we specify the HMM's states and connections?
- States come from background knowledge of the problem; e.g., a size-4 alphabet plus the +/- labels imply 8 states
- Connections: it is tempting to specify complete connectivity and let Baum-Welch sort it out
- Problem: the huge number of parameters could lead to a local maximum
- Better to use background knowledge to invalidate some connections by initializing P(l | k) = 0; Baum-Welch will respect this

Specifying an HMM: Silent States

- We may want to allow the model to generate sequences with certain parts deleted
- E.g., when aligning DNA or protein sequences against a fixed model, or matching a sequence of spoken words against a fixed model, some parts of the input might be omitted
- Adding direct connections to allow such skips creates a problem: a huge number of connections, slow training, and local maxima

Specifying an HMM: Silent States (cont'd)

- Silent states (like the begin and end states) don't emit symbols, so they can bypass a regular state
- If there are no purely silent loops, the Viterbi, forward, and backward algorithms can be updated to work with silent states
- Silent states are used extensively in profile HMMs for modeling sequences of protein families (aka multiple alignments)