CSCE 471/871 Lecture 3: Markov Chains and Hidden Markov Models

Stephen Scott.


1 / 26 CSCE 471/871 Lecture 3: Markov Chains and Hidden Markov Models. Stephen Scott (sscott@cse.unl.edu)

2 / 26 Outline
- Markov chains
- Hidden Markov models (HMMs): formal definition; finding the most probable state path (Viterbi algorithm); forward and backward algorithms
- Parameter estimation: state sequence known; state sequence unknown
- HMM structure

3 / 26 An Example: CpG Islands
Focus on nucleotide sequences. The sequence CG (written "CpG") tends to appear more frequently in some places than in others. Such CpG islands are usually $10^2$ to $10^3$ bases long. Questions:
1. Given a short segment, is it from a CpG island?
2. Given a long segment, where are its islands?

4 / 26 Modeling CpG Islands
The model will be a CpG generator. We want the probability of the next symbol to depend on the current symbol, so we will use a standard (non-hidden) Markov model: a probabilistic state machine in which each state emits a symbol.

5 / 26 Modeling CpG Islands (2)
[Figure: a fully connected Markov chain over the four states A, C, G, T; each edge carries a transition probability such as $P(A \mid T)$.]

6 / 26 The Markov Property
A first-order Markov model (what we study) has the property that observing symbol $x_i$ while in state $\pi_i$ depends only on the previous state $\pi_{i-1}$ (which generated $x_{i-1}$). The standard Markov model has a one-to-one correspondence between symbols and states, thus
$$P(x_i \mid x_{i-1}, \ldots, x_1) = P(x_i \mid x_{i-1})$$
$$P(x_1, \ldots, x_L) = P(x_1) \prod_{i=2}^{L} P(x_i \mid x_{i-1})$$
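To make the factorization concrete, here is a minimal Python sketch of scoring a nucleotide sequence under a first-order Markov chain. The transition and initial probabilities below are made-up placeholders, not the values from Durbin et al.

```python
import math

# Illustrative first-order transition probabilities P(next | current);
# these numbers are placeholders, not estimates from real CpG data.
trans = {
    'A': {'A': 0.30, 'C': 0.20, 'G': 0.30, 'T': 0.20},
    'C': {'A': 0.15, 'C': 0.35, 'G': 0.30, 'T': 0.20},
    'G': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
    'T': {'A': 0.20, 'C': 0.20, 'G': 0.30, 'T': 0.30},
}
init = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}  # P(x_1)

def log_prob(seq):
    """log P(x_1,...,x_L) = log P(x_1) + sum_{i=2..L} log P(x_i | x_{i-1})."""
    lp = math.log(init[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(trans[prev][cur])
    return lp

print(log_prob("ACGCGT"))
```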

7 / 26 Begin and End States
For convenience, we can add special begin (B) and end (E) states to clarify the equations and define a distribution over sequence lengths. They emit empty (null) symbols $x_0$ and $x_{L+1}$ to mark the ends of the sequence.
[Figure: the A, C, G, T chain augmented with a begin state B and an end state E.]

8 / 26 Markov Chains for Discrimination
How do we use this to differentiate islands from non-islands? Define two Markov chain models: islands (+) and non-islands (−). Each model gets 4 states (A, C, G, T). Take a training set of known islands and non-islands, and let $c^+_{st}$ = number of times symbol $t$ followed symbol $s$ in an island:
$$\hat{P}^+(t \mid s) = \frac{c^+_{st}}{\sum_{t'} c^+_{st'}}$$
Example probabilities are in [Durbin et al., p. 51]. Now score a sequence $X = x_1, \ldots, x_L$ by summing the log-odds ratios:
$$\log\left(\frac{\hat{P}(X \mid +)}{\hat{P}(X \mid -)}\right) = \sum_{i=1}^{L+1} \log\left(\frac{\hat{P}^+(x_i \mid x_{i-1})}{\hat{P}^-(x_i \mid x_{i-1})}\right)$$
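A minimal sketch of this procedure in Python: estimate the two transition matrices from labeled training sequences, then score a query by the log-odds sum. The training sequences are toy data, the begin/end transitions are omitted, and a small epsilon stands in for proper pseudocounts.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def estimate_transitions(seqs):
    """Maximum-likelihood estimate of P_hat(t | s) from dinucleotide counts c_st."""
    counts = {s: defaultdict(float) for s in ALPHABET}
    for seq in seqs:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1.0
    probs = {}
    for s in ALPHABET:
        total = sum(counts[s].values()) or 1.0
        probs[s] = {t: counts[s][t] / total for t in ALPHABET}
    return probs

# Toy labeled training data (illustrative only; real values are in Durbin et al.)
plus  = estimate_transitions(["CGCGGCGCCG", "GCGCGCGGC"])   # known islands
minus = estimate_transitions(["ATATTAAATG", "TTATACATAT"])  # known non-islands

def log_odds(x, eps=1e-12):
    """Sum of log( P+(x_i | x_{i-1}) / P-(x_i | x_{i-1}) ); positive favors island."""
    return sum(math.log(plus[s][t] + eps) - math.log(minus[s][t] + eps)
               for s, t in zip(x, x[1:]))

print(log_odds("GCGCGC"), log_odds("ATATAT"))
```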

9 / 26 Hidden Markov Models
Second CpG question: given a long sequence, where are its islands? We could use the tools just presented by passing a fixed-width window over the sequence and computing scores, but this runs into trouble if island lengths vary. Prefer a single, unified model for islands vs. non-islands.
[Figure: eight states A+, C+, G+, T+ and A−, C−, G−, T−, with complete connectivity between all pairs.]
Within the + group, transition probabilities are similar to those of the separate + model, but there is a small chance of switching to a state in the − group.

10 / 26 What's in an HMM?
We no longer have a one-to-one correspondence between states and emitted characters: e.g., was C emitted by C+ or C−? We must differentiate the symbol sequence $X$ from the state sequence $\pi = \pi_1, \ldots, \pi_L$. State transition probabilities are the same as before: $P(\pi_i = l \mid \pi_{i-1} = j)$, i.e., $P(l \mid j)$. Now each state also has a probability of emitting any value: $P(x_i = x \mid \pi_i = j)$, i.e., $P(x \mid j)$.

11 / 26 What's in an HMM? (2)
[Figure: transition and emission parameters of the CpG HMM. In the CpG model, the emission probabilities are discrete and equal to 0 or 1.]

12 / 26 Example: The Occasionally Dishonest Casino
Assume that a casino is typically fair, but with probability 0.05 it switches to a loaded die, and switches back with probability 0.1.
[Figure: two-state HMM. Fair state: emits 1-6 each with probability 1/6; stays Fair with probability 0.95, switches to Loaded with probability 0.05. Loaded state: emits 1-5 each with probability 1/10 and 6 with probability 1/2; stays Loaded with probability 0.9, switches to Fair with probability 0.1.]
Given a sequence of rolls, what's hidden?
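As a concrete illustration (not part of the original slide), here is a minimal Python sketch of the casino HMM as a generative model, using the transition and emission probabilities from the figure; starting in the Fair state is an assumption.

```python
import random

# Casino HMM: states, transition probabilities, emission probabilities
STATES = ["Fair", "Loaded"]
TRANS = {"Fair":   {"Fair": 0.95, "Loaded": 0.05},
         "Loaded": {"Fair": 0.10, "Loaded": 0.90}}
EMIT = {"Fair":   {r: 1/6 for r in range(1, 7)},
        "Loaded": {**{r: 1/10 for r in range(1, 6)}, 6: 1/2}}

def generate(length, start="Fair"):
    """Sample a (roll sequence, hidden state sequence) pair from the HMM."""
    state, rolls, path = start, [], []
    for _ in range(length):
        roll = random.choices(list(EMIT[state]), weights=list(EMIT[state].values()))[0]
        rolls.append(roll)
        path.append(state)
        state = random.choices(STATES, weights=[TRANS[state][s] for s in STATES])[0]
    return rolls, path

rolls, path = generate(20)
print(rolls)
print(path)  # the hidden part: which die generated each roll
```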

13 / 26 The Viterbi Algorithm
The probability of seeing symbol sequence $X$ and state sequence $\pi$ is
$$P(X, \pi) = P(\pi_1 \mid 0) \prod_{i=1}^{L} P(x_i \mid \pi_i)\, P(\pi_{i+1} \mid \pi_i)$$
We can use this to find the most likely path
$$\pi^* = \operatorname*{argmax}_{\pi} P(X, \pi)$$
and trace it to identify islands (paths through + states). There are an exponential number of paths through the chain, so how do we find the most likely one?

14 / 26 The Viterbi Algorithm (2)
Assume that we know, for all $k$, $v_k(i)$ = probability of the most likely path ending in state $k$ with observation $x_i$. Then
$$v_l(i+1) = P(x_{i+1} \mid l)\, \max_k \{ v_k(i)\, P(l \mid k) \}$$
[Figure: all states at position $i$ feeding into state $l$ at position $i+1$.]

15 / 26 The Viterbi Algorithm (3)
Given the formula, we can fill in the table with dynamic programming:
- $v_0(0) = 1$; $v_k(0) = 0$ for $k > 0$
- For $i = 1$ to $L$; for $l = 1$ to $M$ (number of states):
  $v_l(i) = P(x_i \mid l) \max_k \{ v_k(i-1)\, P(l \mid k) \}$;
  $\mathrm{ptr}_i(l) = \operatorname*{argmax}_k \{ v_k(i-1)\, P(l \mid k) \}$
- $P(X, \pi^*) = \max_k \{ v_k(L)\, P(0 \mid k) \}$; $\pi^*_L = \operatorname*{argmax}_k \{ v_k(L)\, P(0 \mid k) \}$
- For $i = L$ down to $1$: $\pi^*_{i-1} = \mathrm{ptr}_i(\pi^*_i)$
To avoid underflow, use $\log(v_l(i))$ and add.
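A minimal, self-contained Python sketch of this recurrence in log space, run on the casino HMM from slide 12. The uniform start distribution and the omission of an explicit end transition are assumptions made for brevity, not part of the slide.

```python
import math

STATES = ["Fair", "Loaded"]
TRANS = {"Fair":   {"Fair": 0.95, "Loaded": 0.05},
         "Loaded": {"Fair": 0.10, "Loaded": 0.90}}
EMIT = {"Fair":   {r: 1/6 for r in range(1, 7)},
        "Loaded": {**{r: 1/10 for r in range(1, 6)}, 6: 1/2}}
START = {"Fair": 0.5, "Loaded": 0.5}  # assumed uniform start

def viterbi(x):
    """Most probable state path for observation sequence x (log-space DP)."""
    V = [{k: math.log(START[k]) + math.log(EMIT[k][x[0]]) for k in STATES}]
    ptr = []
    for obs in x[1:]:
        row, back = {}, {}
        for l in STATES:
            best_k = max(STATES, key=lambda k: V[-1][k] + math.log(TRANS[k][l]))
            row[l] = V[-1][best_k] + math.log(TRANS[best_k][l]) + math.log(EMIT[l][obs])
            back[l] = best_k
        V.append(row)
        ptr.append(back)
    # Trace back from the best final state
    state = max(STATES, key=lambda k: V[-1][k])
    path = [state]
    for back in reversed(ptr):
        state = back[state]
        path.append(state)
    return list(reversed(path))

rolls = [1, 6, 6, 6, 6, 6, 6, 6, 2, 5, 1]
print(viterbi(rolls))  # the long run of sixes should be labeled "Loaded"
```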

16 / 26 The Forward Algorithm
Given a sequence $X$, find $P(X) = \sum_{\pi} P(X, \pi)$. Use dynamic programming as in Viterbi, replacing max with sum and $v_k(i)$ with $f_k(i) = P(x_1, \ldots, x_i, \pi_i = k)$ (= probability of the observed sequence through $x_i$, stopping in state $k$):
- $f_0(0) = 1$; $f_k(0) = 0$ for $k > 0$
- For $i = 1$ to $L$; for $l = 1$ to $M$ (number of states): $f_l(i) = P(x_i \mid l) \sum_k f_k(i-1)\, P(l \mid k)$
- $P(X) = \sum_k f_k(L)\, P(0 \mid k)$
To avoid underflow, we can again use logs, though the exactness of the results is compromised (Section 3.6).
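A minimal sketch of the forward recursion on the same casino HMM, using plain probabilities (fine for short sequences; long sequences would need log-sum-exp or scaling). The uniform start distribution and the end transition taken as 1 are assumptions.

```python
STATES = ["Fair", "Loaded"]
TRANS = {"Fair":   {"Fair": 0.95, "Loaded": 0.05},
         "Loaded": {"Fair": 0.10, "Loaded": 0.90}}
EMIT = {"Fair":   {r: 1/6 for r in range(1, 7)},
        "Loaded": {**{r: 1/10 for r in range(1, 6)}, 6: 1/2}}
START = {"Fair": 0.5, "Loaded": 0.5}  # assumed uniform start

def forward(x):
    """Forward table f[i][k] = P(x_1..x_i, pi_i = k), plus the total P(X)."""
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for obs in x[1:]:
        f.append({l: EMIT[l][obs] * sum(f[-1][k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    return f, sum(f[-1][k] for k in STATES)  # end transition taken as 1

f, px = forward([1, 6, 6, 6, 2])
print(px)
```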

17 / 26 The Backward Algorithm
Given a sequence $X$, find the probability that $x_i$ was emitted by state $k$, i.e.,
$$P(\pi_i = k \mid X) = \frac{P(\pi_i = k, X)}{P(X)} = \frac{\overbrace{P(x_1, \ldots, x_i, \pi_i = k)}^{f_k(i)}\;\overbrace{P(x_{i+1}, \ldots, x_L \mid \pi_i = k)}^{b_k(i)}}{\underbrace{P(X)}_{\text{computed by forward alg.}}}$$
Algorithm:
- $b_k(L) = P(0 \mid k)$ for all $k$
- For $i = L-1$ down to $1$; for $k = 1$ to $M$ (number of states): $b_k(i) = \sum_l P(l \mid k)\, P(x_{i+1} \mid l)\, b_l(i+1)$
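A matching sketch of the backward recursion on the casino HMM, with the end transitions $P(0 \mid k)$ taken as 1 (an assumption, consistent with the forward sketch above) and positions 0-indexed.

```python
STATES = ["Fair", "Loaded"]
TRANS = {"Fair":   {"Fair": 0.95, "Loaded": 0.05},
         "Loaded": {"Fair": 0.10, "Loaded": 0.90}}
EMIT = {"Fair":   {r: 1/6 for r in range(1, 7)},
        "Loaded": {**{r: 1/10 for r in range(1, 6)}, 6: 1/2}}

def backward(x):
    """Backward table: b[i][k] = P(x[i+1..] | state at position i is k)."""
    L = len(x)
    b = [None] * L
    b[L - 1] = {k: 1.0 for k in STATES}  # end transition taken as 1
    for i in range(L - 2, -1, -1):
        b[i] = {k: sum(TRANS[k][l] * EMIT[l][x[i + 1]] * b[i + 1][l]
                       for l in STATES)
                for k in STATES}
    return b

b = backward([1, 6, 6, 6, 2])
print(b[0])
```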

18 / 26 Example Use of the Forward/Backward Algorithm
Define $g(k) = 1$ if $k \in \{A^+, C^+, G^+, T^+\}$ and $0$ otherwise. Then
$$G(i \mid X) = \sum_k P(\pi_i = k \mid X)\, g(k) = \text{probability that } x_i \text{ is in an island}$$
For each state $k$, compute $P(\pi_i = k \mid X)$ with the forward/backward algorithm. The technique is applicable to any HMM whose set of states is partitioned into classes, and can be used to label individual parts of a sequence.
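In the casino model, the analogous quantity is the posterior probability that each roll came from the loaded die. Here is a minimal self-contained sketch that combines the forward and backward tables (uniform start and unit end transitions assumed, as before).

```python
STATES = ["Fair", "Loaded"]
TRANS = {"Fair":   {"Fair": 0.95, "Loaded": 0.05},
         "Loaded": {"Fair": 0.10, "Loaded": 0.90}}
EMIT = {"Fair":   {r: 1/6 for r in range(1, 7)},
        "Loaded": {**{r: 1/10 for r in range(1, 6)}, 6: 1/2}}
START = {"Fair": 0.5, "Loaded": 0.5}  # assumed uniform start

def posterior_loaded(x):
    """P(pi_i = Loaded | X) for each position i, via forward-backward."""
    L = len(x)
    # Forward pass
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for obs in x[1:]:
        f.append({l: EMIT[l][obs] * sum(f[-1][k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    px = sum(f[-1][k] for k in STATES)
    # Backward pass
    b = [None] * L
    b[L - 1] = {k: 1.0 for k in STATES}
    for i in range(L - 2, -1, -1):
        b[i] = {k: sum(TRANS[k][l] * EMIT[l][x[i + 1]] * b[i + 1][l]
                       for l in STATES) for k in STATES}
    # Posterior of the "Loaded" class, analogous to G(i | X) for islands
    return [f[i]["Loaded"] * b[i]["Loaded"] / px for i in range(L)]

print(posterior_loaded([1, 6, 6, 6, 6, 2, 3, 1]))
```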

19 / 26 Parameter Estimation
Two problems: defining the structure (the set of states) and the parameters (the transition and emission probabilities). Start with the latter problem: given a training set $X^1, \ldots, X^N$ of independently generated sequences, learn a good set of parameters $\theta$. The goal is to maximize the (log) likelihood of seeing the training set given that $\theta$ is the set of parameters for the HMM generating them:
$$\sum_{j=1}^{N} \log\bigl(P(X^j; \theta)\bigr)$$

20 / 26 When State Sequence Known
Estimating parameters when, e.g., the islands are already identified in the training set. Let $A_{kl}$ = number of $k \to l$ transitions and $E_k(b)$ = number of emissions of $b$ in state $k$:
$$P(l \mid k) = \frac{A_{kl}}{\sum_{l'} A_{kl'}} \qquad P(b \mid k) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$
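A minimal sketch of these counting estimates in Python for the casino setting, where the training sequences come with known Fair/Loaded labels; the labeled data below is made up for illustration.

```python
from collections import defaultdict

def estimate(sequences):
    """sequences: list of (observations, state path) pairs with known states."""
    A = defaultdict(lambda: defaultdict(float))  # A[k][l]: count of k -> l transitions
    E = defaultdict(lambda: defaultdict(float))  # E[k][b]: count of emissions of b in k
    for obs, path in sequences:
        for x, k in zip(obs, path):
            E[k][x] += 1
        for k, l in zip(path, path[1:]):
            A[k][l] += 1
    trans = {k: {l: A[k][l] / sum(A[k].values()) for l in A[k]} for k in A}
    emit = {k: {b: E[k][b] / sum(E[k].values()) for b in E[k]} for k in E}
    return trans, emit

# Toy labeled data (illustrative only)
data = [([1, 6, 6, 6, 2, 3],
         ["Fair", "Loaded", "Loaded", "Loaded", "Fair", "Fair"])]
trans, emit = estimate(data)
print(trans)
print(emit)
```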

21 / 26 When State Sequence Known (2)
Be careful if little training data is available: e.g., an unused state $k$ will have undefined parameters. Workaround: add pseudocounts $r_{kl}$ to $A_{kl}$ and $r_k(b)$ to $E_k(b)$ that reflect prior biases about the probabilities. Increased training data decreases the prior's influence [Sjölander et al. 96].

22 / 26 The Baum-Welch Algorithm
Estimating parameters when the state sequence is unknown. This is a special case of the expectation maximization (EM) algorithm. Start with arbitrary $P(l \mid k)$ and $P(b \mid k)$, and use them to estimate $A_{kl}$ and $E_k(b)$ as the expected numbers of occurrences given the training set (the superscript $j$ denotes the $j$th training example):
$$A_{kl} = \sum_{j=1}^{N} \frac{1}{P(X^j)} \sum_{i=1}^{L} f^j_k(i)\, P(l \mid k)\, P(x^j_{i+1} \mid l)\, b^j_l(i+1)$$
(the probability of a transition from $k$ to $l$ at position $i$ of sequence $j$, summed over all positions of all sequences)
$$E_k(b) = \sum_{j=1}^{N} \sum_{i : x^j_i = b} P(\pi_i = k \mid X^j) = \sum_{j=1}^{N} \frac{1}{P(X^j)} \sum_{i : x^j_i = b} f^j_k(i)\, b^j_k(i)$$

23 / 26 The Baum-Welch Algorithm (2)
$$A_{kl} = \sum_{j=1}^{N} \frac{1}{P(X^j)} \sum_{i=1}^{L} f^j_k(i)\, P(l \mid k)\, P(x^j_{i+1} \mid l)\, b^j_l(i+1)$$
$$E_k(b) = \sum_{j=1}^{N} \sum_{i : x^j_i = b} P(\pi_i = k \mid X^j) = \sum_{j=1}^{N} \frac{1}{P(X^j)} \sum_{i : x^j_i = b} f^j_k(i)\, b^j_k(i)$$
Use these (and pseudocounts) to recompute $P(l \mid k)$ and $P(b \mid k)$. After each iteration, compute the log likelihood and halt if there is no improvement.
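A minimal sketch of one way to implement this loop for the two-state casino HMM on unlabeled rolls. It uses a fixed number of iterations instead of the log-likelihood stopping check, omits the end-transition term, and relies on toy data and made-up initial parameters.

```python
STATES = ["Fair", "Loaded"]
ROLLS = list(range(1, 7))

def forward_backward(x, trans, emit, start):
    """Return the forward table f, backward table b, and P(X)."""
    L = len(x)
    f = [{k: start[k] * emit[k][x[0]] for k in STATES}]
    for obs in x[1:]:
        f.append({l: emit[l][obs] * sum(f[-1][k] * trans[k][l] for k in STATES)
                  for l in STATES})
    px = sum(f[-1][k] for k in STATES)
    b = [None] * L
    b[L - 1] = {k: 1.0 for k in STATES}
    for i in range(L - 2, -1, -1):
        b[i] = {k: sum(trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l]
                       for l in STATES) for k in STATES}
    return f, b, px

def baum_welch(seqs, trans, emit, start, pseudo=0.1, iters=20):
    """Re-estimate transition/emission probabilities from unlabeled sequences."""
    for _ in range(iters):
        A = {k: {l: pseudo for l in STATES} for k in STATES}  # expected transitions
        E = {k: {r: pseudo for r in ROLLS} for k in STATES}   # expected emissions
        for x in seqs:
            f, b, px = forward_backward(x, trans, emit, start)
            for i in range(len(x) - 1):
                for k in STATES:
                    for l in STATES:
                        A[k][l] += (f[i][k] * trans[k][l] *
                                    emit[l][x[i + 1]] * b[i + 1][l]) / px
            for i in range(len(x)):
                for k in STATES:
                    E[k][x[i]] += f[i][k] * b[i][k] / px
        trans = {k: {l: A[k][l] / sum(A[k].values()) for l in STATES} for k in STATES}
        emit = {k: {r: E[k][r] / sum(E[k].values()) for r in ROLLS} for k in STATES}
    return trans, emit

# Rough initial guesses (symmetry broken in the emissions) and one toy sequence
trans0 = {k: {l: 0.5 for l in STATES} for k in STATES}
emit0 = {"Fair": {r: 1/6 for r in ROLLS},
         "Loaded": {r: (0.3 if r == 6 else 0.14) for r in ROLLS}}
start = {"Fair": 0.5, "Loaded": 0.5}
seqs = [[1, 6, 6, 6, 2, 6, 6, 3, 4, 6, 6, 6, 1, 2, 5]]
print(baum_welch(seqs, trans0, emit0, start))
```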

24 / 26 HMM Structure
How do we specify the states and connections? States come from background knowledge of the problem, e.g., a size-4 alphabet and +/− labels give 8 states. Connections: it is tempting to specify complete connectivity and let Baum-Welch sort it out. Problem: the huge number of parameters could lead to a local maximum. It is better to use background knowledge to invalidate some connections by initializing $P(l \mid k) = 0$; Baum-Welch will respect this.

25 / 26 Silent States
We may want to allow the model to generate sequences with certain parts deleted; e.g., when aligning sequences against a fixed model, some parts of the input might be omitted. Allowing every possible skip via explicit extra transitions creates a problem: a huge number of connections, slow training, and local maxima.

26 / 26 Silent States (2)
Silent states (like the begin and end states) don't emit symbols, so they can be used to bypass a regular state. If there are no purely silent loops, the Viterbi, forward, and backward algorithms can be updated to work with silent states [Durbin et al., p. 72]. Silent states are used extensively in profile HMMs for modeling the sequences of protein families (aka multiple alignments).