Today's Lecture: HMMs

Today's Lecture: HMMs
- Definitions
- Examples
- Probability calculations
- WDAG
- Dynamic programming algorithms: Forward, Viterbi
- Parameter estimation
- Viterbi training

Hidden Markov Models
Probability models for sequences of observed symbols, e.g.
- nucleotide or amino acid residues
- aligned pairs of residues
- aligned sets of residues corresponding to the leaves of an underlying evolutionary tree
- angles in a protein chain (structure modelling)
- sounds (speech recognition)

Assume a sequence of hidden (unobserved) states underlies each observed symbol sequence.
Each state emits symbols (one symbol at a time).
States may correspond to an underlying reality we are trying to infer, e.g. an unobserved biological feature:
- (positions within a) site
- coding region of a gene
- rate of evolution
- protein structural element
- speech phoneme

[Figure: observed symbols A G C A T ... emitted by unobserved states 1, 2, 3, ..., i, ..., n]

[Figure: the same diagram with a begin state (0) added before the first hidden state; an end state could also be added, but we do not include one]

Advantages of HMMs
- Flexible: gives reasonably good models in a wide variety of situations
- Computationally efficient
- Often interpretable: hidden states can correspond to biological features; we can find the most probable sequence of hidden states = a biological parsing of the residue sequence

HMMs: Formal Definition
- Alphabet B = {b} of observed symbols
- Set S = {k} of hidden states (usually k = 0, 1, 2, ..., m; 0 is reserved for the begin state, and sometimes also an end state)
- Markov chain property: the probability of a state occurring at a given position depends only on the immediately preceding state, and is given by the transition probabilities (a_kl):
  a_kl = Prob(next state is l | current state is k), with Σ_l a_kl = 1 for each k.
- Usually, many transition probabilities are set to 0. The model topology is the number of states and the allowed (i.e. a_kl ≠ 0) transitions.
- Sometimes the begin state is omitted, in which case we need initiation probabilities (p_k) for the sequence starting in a given state.

[Figure: HMM diagram; state i emits the i-th observed symbol with emission probability e_i(b), e.g. e_1(A), e_2(G), e_3(C), ..., e_n(T), and consecutive states are linked by transition probabilities a_01, a_12, a_23, ..., a_{i,i+1}, with begin state 0]

The probability that a symbol occurs at a given sequence position depends only on the hidden state at that position, and is given by the emission probabilities:
  e_k(b) = Prob(observed symbol is b | current state is k)
(begin and end states do not emit symbols).
Note that there are no direct dependencies between observed symbols in the sequence; however, there are indirect dependencies implied by the state dependencies.
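To make the definition concrete, here is a minimal sketch (my own illustration, not part of the lecture) of an HMM stored as plain Python dicts: a[k][l] holds the transition probabilities a_kl, with 0 as the begin state, and e[k][b] holds the emission probabilities e_k(b). The states, alphabet, and numbers are invented.

```python
states = [1, 2]                       # hidden states; 0 is the begin state
alphabet = ["A", "C", "G", "T"]       # observed symbols B

# transition probabilities a_kl as a[k][l]; each row sums to 1
a = {
    0: {1: 0.5, 2: 0.5},              # begin state -> first hidden state
    1: {1: 0.9, 2: 0.1},
    2: {1: 0.1, 2: 0.9},
}

# emission probabilities e_k(b) as e[k][b]; the begin state emits nothing
e = {
    1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

# sanity check: every row of a and e is a probability distribution
for row in list(a.values()) + list(e.values()):
    assert abs(sum(row.values()) - 1.0) < 1e-9
```

The later sketches in these notes assume this same dict layout.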

Where do the parameters come from?
Parameter values can either be defined a priori or estimated from training data (observed sequences of the type to be modelled). Usually one does a mixture of both: the model topology is defined (some transitions set to 0), but the remaining parameters are estimated.

[Figure: the Hidden Markov Model diagram from above, repeated]

HMM Examples
Site models: states correspond to positions (columns in the tables).
- State i transitions only to state i+1: a_{i,i+1} = 1 for all i; all other a_ij are 0.
- Emission probabilities are position-specific frequencies: the values in the frequency table columns.

[Figure: topology for the site HMM; the allowed transitions (those with non-zero probability, here all equal to 1) form a chain through states 0, 1, 2, ..., 12]

Emission probabilities: HMM for C. elegans 3' splice sites (3' ss). The invariant AG at columns 8-9 marks the 3' end of the intron; the remaining columns to the right lie in the exon. Hidden states 1-12 correspond to the 12 table columns (state 0 is the begin state).

Counts:
  pos:       1     2     3     4     5     6     7     8     9    10    11    12
  A:      3276  3516  2313   476    67   757   240  8192     0  3359  2401  2514
  C:       970   648   664   236   129  1109  6830     0     0  1277  1533  1847
  G:       593   575   516   144    39   595    12     0  8192  2539  1301  1567
  T:      3353  3453  4699  7336  7957  5731  1110     0     0  1017  2957  2264
  CONSENSUS:  W     W     W     T     T     t     C     A     G     r     w     w

Emission probabilities e_k(b):
  pos:       1     2     3     4     5     6     7     8     9    10    11    12
  A:     0.400 0.429 0.282 0.058 0.008 0.092 0.029 1.000 0.000 0.410 0.293 0.307
  C:     0.118 0.079 0.081 0.029 0.016 0.135 0.834 0.000 0.000 0.156 0.187 0.225
  G:     0.072 0.070 0.063 0.018 0.005 0.073 0.001 0.000 1.000 0.310 0.159 0.191
  T:     0.409 0.422 0.574 0.896 0.971 0.700 0.135 0.000 0.000 0.124 0.361 0.276
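Since every transition probability in the site model is 1, the probability of a candidate site is just the product of the position-specific emission frequencies. A small sketch (my own; the frequencies are transcribed from the table above and the example 12-mer is invented):

```python
# Position-specific emission frequencies from the 3' splice-site table above.
site_freq = {
    "A": [0.400, 0.429, 0.282, 0.058, 0.008, 0.092, 0.029, 1.000, 0.000, 0.410, 0.293, 0.307],
    "C": [0.118, 0.079, 0.081, 0.029, 0.016, 0.135, 0.834, 0.000, 0.000, 0.156, 0.187, 0.225],
    "G": [0.072, 0.070, 0.063, 0.018, 0.005, 0.073, 0.001, 0.000, 1.000, 0.310, 0.159, 0.191],
    "T": [0.409, 0.422, 0.574, 0.896, 0.971, 0.700, 0.135, 0.000, 0.000, 0.124, 0.361, 0.276],
}

def site_prob(seq):
    """Probability of a 12-base sequence under the site model (all a_{i,i+1} = 1)."""
    assert len(seq) == 12
    p = 1.0
    for i, b in enumerate(seq):
        p *= site_freq[b][i]          # position-specific emission e_i(b)
    return p

# an invented consensus-like candidate 3' splice site
print(site_prob("TTTTTTCAGGTT"))
```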

The model can be expanded to allow omission of a nucleotide at some positions by including other (downstream) transitions (or via "silent states"). Insertions can be allowed by including additional states. The transition probabilities are then no longer necessarily 1 or 0.

[Figure: insertions & deletions in the site model; an insertion state is added, and other (skipping) transitions correspond to deletions]

Examples (cont'd)
1-state HMMs: a single state, emitting residues with specified frequencies = a background model.
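A tiny sketch of the background model (the background frequencies below are assumed values, not from the lecture): the probability of a sequence is simply the product of the per-residue frequencies.

```python
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}   # assumed background frequencies

def background_prob(seq):
    """Probability of a sequence under the single-state background model."""
    p = 1.0
    for b in seq:
        p *= background[b]
    return p

print(background_prob("ACGTT"))    # 0.3 * 0.2 * 0.2 * 0.3 * 0.3
```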

Examples (cont'd)
2-state HMMs:
- If a_11 and a_22 are small (close to 0), and a_12 and a_21 are large (close to 1), we get a (nearly) periodic model with period 2; e.g. a dinucleotide repeat in DNA, or (some) beta strands in proteins.
- If a_11 and a_22 are large, and a_12 and a_21 are small, we get models of alternating regions of different compositions (specified by the emission probabilities), e.g. higher vs. lower G+C content regions (RNA genes in thermophilic bacteria), or hydrophobic vs. hydrophilic regions of proteins (e.g. transmembrane domains).

[Figure: example sequence A A T G C C T G G A T A with each position assigned to either the A+T-rich state or the G+C-rich state]
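To illustrate the second case as a generative model (a sketch with assumed parameter values, not taken from the slides), we can sample a symbol sequence together with the hidden state path that produced it:

```python
import random

# assumed parameters: large self-transitions give long alternating regions
a = {1: {1: 0.95, 2: 0.05},          # state 1 = A+T-rich
     2: {1: 0.05, 2: 0.95}}          # state 2 = G+C-rich
e = {1: {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
     2: {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35}}

def sample(n, start=1, seed=0):
    """Generate n symbols and the hidden state path that produced them."""
    rng = random.Random(seed)
    states, symbols = [], []
    k = start
    for _ in range(n):
        states.append(k)
        symbols.append(rng.choices(list(e[k]), weights=list(e[k].values()))[0])
        k = rng.choices(list(a[k]), weights=list(a[k].values()))[0]
    return symbols, states

symbols, states = sample(60)
print("".join(symbols))
print("".join(str(k) for k in states))
```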

2-state HMMs
We can find the most probable state decomposition (the "Viterbi path") consistent with the observed sequence.
Advantages over the linked-list dynamic programming method (lecture 3) for finding high-scoring segments:
- That method assumes you know appropriate parameters to find the targeted regions; the HMM method can estimate the parameters.
- The HMM (easily) finds multiple segments.
- The HMM can attach probabilities to alternative decompositions.
- HMM generalization to more than 2 types of segments is easy: just allow more states!
Disadvantage: the Markov assumption on state transitions implies a geometric distribution for the lengths of regions, which may not be appropriate.

[Figure: the Hidden Markov Model diagram from above, repeated]

HMM Probabilities of Sequences
- The probability of the sequence of states 1, 2, 3, ..., n is a_{0,1} a_{1,2} a_{2,3} a_{3,4} ... a_{n-1,n}.
- The probability of the sequence of observed symbols b_1 b_2 b_3 ... b_n, conditional on the state sequence, is e_1(b_1) e_2(b_2) e_3(b_3) ... e_n(b_n).
- Joint probability = a_{0,1} ∏_{i=1..n} a_{i,i+1} e_i(b_i) (where we define a_{n,n+1} to be 1).
- The (unconditional) probability of the observed sequence = the sum (of the joint probabilities) over all possible state paths. This is not practical to compute directly, by brute force! We will use dynamic programming.
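A short sketch of the joint-probability calculation (my own; it assumes the dict-based a and e parameters from the earlier sketches, a begin state 0, and a state path given explicitly as a list):

```python
def joint_prob(symbols, path, a, e):
    """P(state path and observed symbols) for an HMM with begin state 0."""
    assert len(symbols) == len(path)
    p = a[0][path[0]]                        # begin state -> first state on the path
    for i, (k, b) in enumerate(zip(path, symbols)):
        p *= e[k][b]                         # emission at position i
        if i + 1 < len(path):
            p *= a[k][path[i + 1]]           # transition to the next state
    return p

# e.g. joint_prob("AATG", [1, 1, 2, 2], a, e) with the 2-state parameters sketched earlier
```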

Computing HMM Probabilities
WDAG structure for sequence HMMs:
- For the i-th position in the sequence (i = 1, ..., n), there are 2 nodes for each state; total # nodes = 2ns + 1, where n = sequence length and s = # states.
- The pair of nodes for a given state at the i-th position is connected by an emission edge, whose weight is the emission probability for the i-th observed residue. The node pair can be omitted if the emission probability = 0.
- Transition edges connect the (right-hand) state nodes at position i with the (left-hand) state nodes at position i+1; their weights are the transition probabilities. Edges with transition probability = 0 can be omitted.

[Figure: WDAG for a 3-state HMM and a length-n sequence; emission edges at position i carry weights e_k(b_i) for the i-th residue b_i, and transition edges between positions carry weights a_kl (e.g. a_11, a_12, a_33)]

[Figure: beginning of the graph; the begin state node connects to the state nodes at position 1, followed by positions 2 and 3]

Paths through the graph from the begin node to the end node correspond to sequences of states.
- The product weight along a path = the joint probability of the state sequence & observed symbol sequence.
- The sum of the (product) path weights, over all paths, = the probability of the observed sequence.
- The sum of the (product) path weights over all paths going through a particular node, or all paths that include a particular edge, divided by the probability of the observed sequence, = the posterior probability of that node or edge.
- The highest-weight path = the highest-probability state sequence.

[Figure: path weights; a path segment through positions i-1, i, i+1 picks up weights e_1(b_{i-1}), a_12, e_2(b_i), a_23, e_3(b_{i+1})]

By general results on WDAGs, we can use dynamic programming to find
- the sum of all product path weights = the forward algorithm, for the probability of the observed sequence;
- the highest-weight path = the Viterbi algorithm, to find the highest-probability path;
- the sum of all product path weights through a particular node or edge = the forward/backward algorithm, to find posterior probabilities.

In each case, compute successively for each node (by increasing depth: left to right) the sum (for the forward & forward/backward algorithms), or the maximum (for the Viterbi algorithm), of the weights of all paths ending at that node. In the forward/backward approach, work through all nodes twice, once in each direction. (N.B. paths are constrained to begin at the begin node!)

For each vertex v, let f(v) = Σ_{paths p ending at v} weight(p), where weight(p) = the product of the edge weights in p. Only paths starting at the begin node are considered.
Compute f(v) by dynamic programming: f(v) = Σ_i w_i f(v_i), where v_i ranges over the parents of v, and w_i = the weight of the edge from v_i to v.
[Figure: a vertex v with parents v_1, v_2, v_3 and incoming edge weights w_1, w_2, w_3]
Similarly for m(v) = max_{paths p ending at v} weight(p), and for b(v) = Σ_{paths p beginning at v} weight(p) (the paths beginning at v are the ones ending at v in the reverse graph).
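Specialized to the sequence-HMM graph, the f(v) recurrence becomes the forward algorithm. A minimal sketch, assuming the dict-based parameters a and e, the state list, and the begin state 0 from the earlier sketches (no end state is modelled):

```python
def forward(symbols, states, a, e):
    """Return P(observed sequence): the sum of product path weights."""
    # f[k] = total weight of all paths that emit the symbols so far and end in state k
    f = {k: a[0][k] * e[k][symbols[0]] for k in states}
    for b in symbols[1:]:
        prev = f
        f = {l: e[l][b] * sum(prev[k] * a[k][l] for k in states) for l in states}
    return sum(f.values())

# e.g. forward("AATGCCTGGATA", [1, 2], a, e) with the 2-state parameters sketched earlier
```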

The Viterbi path is the most probable parse!
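A matching sketch of the Viterbi algorithm (again my own illustration with the same assumed dict-based parameters): the sums in the forward recurrence are replaced by maxima, and back-pointers recover the most probable state path.

```python
def viterbi(symbols, states, a, e):
    """Return (probability of the best path, the best state path)."""
    # m[i][l] = (max weight of a path ending in state l at position i, best previous state)
    m = [{l: (a[0][l] * e[l][symbols[0]], None) for l in states}]
    for b in symbols[1:]:
        prev = m[-1]
        col = {}
        for l in states:
            k_best = max(states, key=lambda k: prev[k][0] * a[k][l])
            col[l] = (prev[k_best][0] * a[k_best][l] * e[l][b], k_best)
        m.append(col)
    # traceback from the best final state
    last = max(states, key=lambda k: m[-1][k][0])
    path = [last]
    for col in reversed(m[1:]):
        path.append(col[path[-1]][1])
    return m[-1][last][0], path[::-1]
```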

Numerical issues: multiplying many small values can cause underflow. Remedies:
- Scale the weights to be close to 1 (this affects all paths by the same constant factor, which can be multiplied back later); or
- (where possible) use log weights, so we can add instead of multiplying.
(See the Rabiner & Tobias Mann links on the web page; we will discuss this further in the discussion section.)
Complexity: O(|V| + |E|), the total number of vertices and edges. # nodes = 2ns + 2, where n = sequence length and s = # states; # edges = (n-1)s^2 + ns + 2s. So the overall complexity is O(ns^2) (actually s^2 can be reduced to the number of allowed transitions between states, which depends on the model topology).
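As one illustration of the log-weight remedy (a sketch, not the lecture's code), the forward recursion can be run entirely in log space using a log-sum-exp; la and le below are assumed to hold log transition and log emission probabilities in the same dict layout as before:

```python
import math

def logsumexp(xs):
    """log(sum(exp(x) for x in xs)), computed without underflow."""
    m = max(xs)
    if m == -math.inf:                       # all paths have probability 0
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_forward(symbols, states, la, le):
    """Forward algorithm using log transition (la) and log emission (le) probabilities."""
    f = {k: la[0][k] + le[k][symbols[0]] for k in states}
    for b in symbols[1:]:
        prev = f
        f = {l: le[l][b] + logsumexp([prev[k] + la[k][l] for k in states]) for l in states}
    return logsumexp(list(f.values()))
```

(For the Viterbi algorithm, log weights are even simpler: products become sums and the maxima are unchanged.)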

HMM Parameter Estimation
Suppose the parameter values (transition & emission probabilities) are unknown; we need to estimate them from a set of training sequences.
Maximum likelihood (ML) estimation (= the choice of parameter values that maximizes the probability of the data) is preferred; optimality properties of ML estimates are discussed in Ewens & Grant.

[Figure: the Hidden Markov Model diagram from above, repeated]

Parameter estimation when the state sequence is known
When the underlying state sequence for each training sequence is known (e.g. weight matrix, Markov model), the ML estimates are given by:
- emission probabilities: ê_k(b) = (# times symbol b is emitted by state k) / (# times state k occurs)
- transition probabilities: â_kl = (# times state k is followed by state l) / (# times state k occurs)
In the denominators above, omit an occurrence at the last position of the sequence for the transition probabilities, but include it for the emission probabilities.
Pseudocounts can be included, to incorporate prior expectations and avoid small-sample overfitting (Bayesian justification).
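A sketch of this counting scheme, assuming the training data arrive as (symbol sequence, known state path) pairs, the dict-based parameterization used earlier, and a uniform pseudocount added to every count. Estimating the begin-state transitions a_0k from the first state of each path is an added assumption, not stated on the slide.

```python
from collections import defaultdict

def estimate(training, states, alphabet, pseudocount=1.0):
    """ML estimates (with pseudocounts) from (symbols, state_path) pairs with known paths."""
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    for symbols, path in training:
        trans[0][path[0]] += 1                      # assumed: begin state 0 -> first state
        for i, (k, b) in enumerate(zip(path, symbols)):
            emit[k][b] += 1                         # emissions count every position...
            if i + 1 < len(path):                   # ...transitions omit the last one
                trans[k][path[i + 1]] += 1
    a = {}
    for k in [0] + list(states):
        tot = sum(trans[k][l] + pseudocount for l in states)
        a[k] = {l: (trans[k][l] + pseudocount) / tot for l in states}
    e = {}
    for k in states:
        tot = sum(emit[k][b] + pseudocount for b in alphabet)
        e[k] = {b: (emit[k][b] + pseudocount) / tot for b in alphabet}
    return a, e
```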

Parameter estimation when state sequences are unknown: Viterbi training
1. Choose starting parameter values.
2. Find the highest-weight (Viterbi) path for each sequence.
3. Estimate new emission and transition probabilities as above, assuming the Viterbi state sequences are true.
4. Iterate steps 2 and 3 until convergence (not guaranteed to occur, but it nearly always does).
5. This does not necessarily give ML estimates, but they are often reasonably good.
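Steps 2-4 can be sketched as a short loop. This is an illustration, not the lecture's implementation: it reuses the hypothetical viterbi() and estimate() helpers from the earlier sketches and detects convergence by the Viterbi paths no longer changing, which is one common choice.

```python
def viterbi_train(seqs, states, alphabet, a, e, max_iters=100):
    """Viterbi training: start from initial parameters a, e (step 1) and iterate."""
    prev_paths = None
    for _ in range(max_iters):                          # step 4: iterate
        # step 2: most probable (Viterbi) path for each training sequence
        paths = [viterbi(s, states, a, e)[1] for s in seqs]
        if paths == prev_paths:                         # paths stable => converged
            break
        # step 3: re-estimate as if the Viterbi paths were the true state sequences
        a, e = estimate(list(zip(seqs, paths)), states, alphabet)
        prev_paths = paths
    return a, e
```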