Lecture 7: Sequence analysis. Hidden Markov Models


Nicolas Lartillot (Université de Montréal), BIN6009, May 2012

Outline
1. Motivation
2. Examples of Hidden Markov models
3. Hidden Markov models: formal definition; most likely path (Viterbi algorithm); summing over paths; sampling paths; learning HMM parameters and structure
4. Examples
5. Sequence alignment
6. Hidden Markov models and sequence alignment: pairwise alignment; multiple alignment (profile HMMs)
7. Summary

Motivation: models of gene structure
- locally, detecting small signal sequences and signatures
- the global arrangement (order) of elements has to be meaningful

Motivation: promoters and enhancers
[figure: structure of a promoter]
Structure of enhancers:
- small and degenerate binding sites
- clustered, and separated by intervals of variable length

Motivation: position-specific scoring matrix (PSSM)
Experimentally validated binding sites [figure: aligned binding-site sequences]

Binding site, q = (q_ij), i = 1..5, j = A,C,G,T:
position   1    2    3    4    5
A        0.5  0.4  0.1  0.5  0.2
C        0.2  0.2  0.7  0.2  0.2
G        0.1  0.2  0.1  0.2  0.1
T        0.2  0.2  0.1  0.1  0.5

Background, r = (r_j), j = A,C,G,T:
A 0.4   C 0.3   G 0.2   T 0.2
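A minimal sketch of how such a matrix is used for scanning, assuming a log-odds score against the background (the sequence, variable names, and scoring scheme below are illustrative, not from the slides):

```python
import math

# PSSM q[i][base] (5 positions) and background r[base], numbers from the
# tables above; everything else in this sketch is an illustrative assumption.
q = [
    {"A": 0.5, "C": 0.2, "G": 0.1, "T": 0.2},
    {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.5, "C": 0.2, "G": 0.2, "T": 0.1},
    {"A": 0.2, "C": 0.2, "G": 0.1, "T": 0.5},
]
r = {"A": 0.4, "C": 0.3, "G": 0.2, "T": 0.2}

def log_odds_score(site: str) -> float:
    """Log-odds that a 5-mer is a binding site rather than background."""
    return sum(math.log(q[i][b] / r[b]) for i, b in enumerate(site))

# Slide a window along a toy sequence and report the best-scoring 5-mer.
seq = "TTACATCATAAC"
scores = {seq[i:i + 5]: log_odds_score(seq[i:i + 5]) for i in range(len(seq) - 4)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```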

Isochores
[Figure: G. Bernardi, Gene 241 (2000), Fig. 6: ideogram of human chromosomes at 850-band resolution (Francke, 1994), showing the H3+ bands in red (from Saccone et al., 1999); black and grey bands are G (Giemsa) bands, white and red bands are R (Reverse) bands.]
Isochores:
- alternating GC-rich and GC-poor regions of several kb
- due to biased gene conversion, dependent on the local recombination rate

Examples of Hidden Markov models: isochores

Emission probabilities:
state P: A 0.40, C 0.06, G 0.09, T 0.45
state R: A 0.25, C 0.25, G 0.30, T 0.20

Transitions: P -> R with probability δ (P -> P with 1 − δ); R -> P with probability ε (R -> R with 1 − ε).

observed sequence (S): ACTGCGA
hidden sequence (H):   PPPRRRP

p(S, H) = π_P e_P(A) (1−δ) e_P(C) (1−δ) e_P(T) δ e_R(G) (1−ε) e_R(C) (1−ε) e_R(G) ε e_P(A)
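The joint probability is just a product of one initial term and alternating transition and emission terms. A small sketch that evaluates it numerically, assuming illustrative values for δ and ε and a uniform initial distribution π (the slide leaves them symbolic):

```python
# Isochore HMM from the slide: states P (GC-poor) and R (GC-rich).
# delta = P->R switch probability, eps = R->P; the numeric values of
# delta, eps and pi are assumptions, the slide keeps them symbolic.
e = {"P": {"A": 0.40, "C": 0.06, "G": 0.09, "T": 0.45},
     "R": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20}}
delta, eps = 0.01, 0.01
q = {("P", "P"): 1 - delta, ("P", "R"): delta,
     ("R", "R"): 1 - eps,   ("R", "P"): eps}
pi = {"P": 0.5, "R": 0.5}  # assumed uniform initial distribution

def joint_prob(obs: str, hidden: str) -> float:
    """p(S, H) = pi(h_1) e_{h_1}(x_1) * prod_t q_{h_t h_{t+1}} e_{h_{t+1}}(x_{t+1})."""
    p = pi[hidden[0]] * e[hidden[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= q[(hidden[t - 1], hidden[t])] * e[hidden[t]][obs[t]]
    return p

print(joint_prob("ACTGCGA", "PPPRRRP"))  # the example from the slide
```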

Isochores (continued): track lengths follow geometric distributions; the mean length of P-tracks is 1/δ, and the mean length of R-tracks is 1/ε.

Isochores with begin and end states

Emission probabilities:
state P: A 0.40, C 0.06, G 0.09, T 0.45
state R: A 0.25, C 0.25, G 0.30, T 0.20

Transition probabilities:
      B     P     R     E
B   0.00  0.49  0.49  0.02
P   0.00  0.98  0.01  0.01
R   0.00  0.01  0.98  0.01
E   0.00  0.00  0.00  0.00

Secondary structure
- 3 hidden states: L (loop), H (helix), E (strand)
- in each state, 20 emission probabilities (one per amino acid)

Secondary structure

Transition probabilities between hidden states, q = (q_kl), k,l = L,H,E:
      L     H     E
L   0.85  0.07  0.08
H   0.05  0.94  0.01
E   0.06  0.01  0.93

Emission probabilities, e = (e_k(x)), k = L,H,E, x = A,C,...,Y:
      A     C   ...   Y
L   0.05  0.07  ...  0.08
H   0.03  0.03  ...  0.12
E   0.10  0.11  ...  0.03

Secondary structure
[figure: protein sequence with its secondary-structure annotation]
p(S, H) = π_L e_L(G) q_LH e_H(L) q_HH e_H(L) q_HH e_H(E) q_HH e_H(L) ...

Enhancers
[state diagram: background states O and E, with switching controlled by δ and ε; chains of states A_1 ... A_4 and B_1 ... B_3 for the binding sites of the two factors]

ATGCAAACATCCCGACACTAGAACCATCAGCAGGACACTAGCA
OOOOEEEAAAAEEEBBBEEBBBEEEEAAAAEEEEOOOOOOOOO

- states O and E emit bases according to background probabilities
- states A_i and B_i emit bases according to the position-specific scoring matrices of transcription factors 1 and 2

Enhancers (same model and example as above)
- δ < ε: tracks of E are shorter than tracks of O
- a way of modelling the clustering of transcription-factor binding sites

Formal definition
A hidden Markov model M is characterized by:
- an alphabet of hidden states s_1, s_2, ..., s_K
- an alphabet of observable symbols v_1, v_2, ..., v_P
- initial probabilities over hidden states: π (vector of dimension K)
- transition probabilities between hidden states: q_kl (K × K matrix)
- emission probabilities: e_k(v_p) (K vectors of dimension P)

A realization consists of 2 parallel series of random variables:
- the hidden state path h = (h_t), t = 1..T
- the observed sequence of emitted symbols x = (x_t), t = 1..T

Probability of the joint (hidden and observed) sequence:
p(x, h | M) = π(h_1) e_{h_1}(x_1) ∏_{t=1}^{T−1} q_{h_t h_{t+1}} e_{h_{t+1}}(x_{t+1})
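Since the model is generative, a realization can be drawn by simulating the two series forward in time. A sketch with a toy two-state, three-symbol model (all numbers are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small HMM in the lecture's notation: K hidden states, P symbols,
# initial pi, transitions q (K x K), emissions e (K x P).
pi = np.array([0.5, 0.5])
q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
e = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])

def sample_realization(T: int):
    """Draw a hidden path h and an observed sequence x of length T."""
    h = np.empty(T, dtype=int)
    x = np.empty(T, dtype=int)
    h[0] = rng.choice(len(pi), p=pi)
    x[0] = rng.choice(e.shape[1], p=e[h[0]])
    for t in range(1, T):
        h[t] = rng.choice(q.shape[0], p=q[h[t - 1]])  # transition
        x[t] = rng.choice(e.shape[1], p=e[h[t]])      # emission
    return h, x

h, x = sample_realization(10)
print("hidden:  ", h)
print("observed:", x)
```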

HMM with begin and end states
- by convention, hidden state 0 stands for the begin and end state
- the begin/end state does not emit symbols
- for k = 1..K: q_0k is the probability of starting the chain in state k (q_0k = π_k); q_k0 is the probability of ending the chain after being in state k

Joint probability:
p(x, h | M) = q_{0 h_1} e_{h_1}(x_1) [∏_{t=1}^{T−1} q_{h_t h_{t+1}} e_{h_{t+1}}(x_{t+1})] q_{h_T 0}

The main algorithmic problems of HMMs

Paths: given an observed sequence x,
- find the most likely hidden path
- integrate the probability over all paths
- sample paths according to their probability

Learning HMMs:
- estimating the parameters (transition and emission probabilities)
- choosing the structure (how many hidden states, etc.)

Two types of learning:
- supervised: given annotated examples ((x, h) pairs)
- unsupervised: given non-annotated examples (x only)

Most likely path: Viterbi algorithm
- h = (h_t), t = 1..T: the sequence of hidden states (hidden path)
- x = (x_t), t = 1..T: the observed sequence of emitted symbols
- p(x, h | M): the joint probability

Question: given x = (x_t), t = 1..T, find ĥ = (ĥ_t), t = 1..T such that
ĥ = argmax_h p(x, h | M)
- the maximum is over K^T possible paths
- it cannot be computed by brute-force enumeration
- but an exhaustive search is possible using dynamic programming

Viterbi algorithm: example
Given an HMM (with emission and transition probabilities specified) and a protein sequence, find the most likely secondary-structure annotation.

Viterbi recursion

Definition:
v_k(t): probability of the most probable path ending in state k at time t,
v_k(t) = max_{h_1,...,h_{t−1}} p(h_1, ..., h_{t−1}, h_t = k, x_1, ..., x_t | M)
ptr_k(t): hidden state at time t−1 on this most probable path

Recursion:
v_l(t+1) = max_k v_k(t) q_kl e_l(x_{t+1})

Viterbi recursion
[figure: dynamic-programming grid over the sequence M V K R L ..., one row per hidden state (1, 2, 3), holding v_k(t) and ptr_k(t)]
Definition: v_k(t) is the probability of the most probable path ending in state k at time t; ptr_k(t) is the hidden state at time t−1 on this path.

Viterbi recursion
One candidate for v_1(t+1), coming from state 1: v_1(t) q_11 e_1(L)

Viterbi recursion
Another candidate, coming from state 2: v_2(t) q_21 e_1(L)

Viterbi recursion
And the candidate coming from state 3: v_3(t) q_31 e_1(L)

Viterbi recursion
Take the best candidate: v_1(t+1) = max_{k=1,2,3} v_k(t) q_k1 e_1(x_{t+1})

Viterbi recursion
Record the pointer: ptr_1(t+1) = argmax_{k=1,2,3} v_k(t) q_k1 e_1(x_{t+1})

Viterbi recursion
The same update is applied for every state:
v_l(t+1) = max_k v_k(t) q_kl e_l(x_{t+1})

Viterbi algorithm

initialization: v_0(0) = 1; v_k(0) = 0 for k > 0
recursion, for t = 0 ... T−1:
  v_l(t+1) = e_l(x_{t+1}) max_k v_k(t) q_kl
  ptr_l(t+1) = argmax_k v_k(t) q_kl
termination:
  p(x, ĥ | M) = max_k v_k(T)
  ĥ_T = argmax_k v_k(T)
traceback, for t = T−1 ... 1:
  ĥ_t = ptr_{ĥ_{t+1}}(t+1)
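A sketch of this algorithm on a toy model, working in log space so that long sequences do not underflow (the model numbers are illustrative assumptions):

```python
import numpy as np

# Toy model (illustrative): pi, q, e as in the lecture's notation.
pi = np.array([0.5, 0.5])
q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
e = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])

def viterbi(x):
    T, K = len(x), len(pi)
    logv = np.full((T, K), -np.inf)
    ptr = np.zeros((T, K), dtype=int)
    logv[0] = np.log(pi) + np.log(e[:, x[0]])
    for t in range(1, T):
        # cand[k, l] = log v_k(t-1) + log q_kl
        cand = logv[t - 1][:, None] + np.log(q)
        ptr[t] = cand.argmax(axis=0)
        logv[t] = cand.max(axis=0) + np.log(e[:, x[t]])
    # termination and traceback
    path = np.zeros(T, dtype=int)
    path[-1] = logv[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = ptr[t + 1][path[t + 1]]
    return path, logv[-1].max()

path, logp = viterbi([0, 0, 2, 2, 1, 0])
print(path, logp)  # most likely hidden path and its log joint probability
```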

Complexity of the Viterbi algorithm
- compute a product of probabilities at each time step t = 1..T, for each state k = 1..K at time t and each state l = 1..K at time t+1
- complexity: O(TK²)
- linear in T: very efficient, even for long genomic sequences
- quadratic in K: keep the model simple (a small number of hidden states)

Integrating over all paths. Baum-Welch algorithm
- h = (h_t), t = 1..T: the hidden path
- x = (x_t), t = 1..T: the observed sequence of emitted symbols
- p(x, h | M): the joint probability

Question: given x = (x_t), t = 1..T, compute
p(x | M) = Σ_h p(x, h | M)
- the sum is over K^T possible paths
- again, the exhaustive sum is possible using dynamic programming (the forward and backward algorithms)


Forward algorithm
[figure: dynamic-programming grid over the sequence M V K R L, states 1-3, holding f_k(t)]
Definition: f_k(t) = p(x_1, ..., x_t, h_t = k)

Forward algorithm
Recursion: f_l(t+1) = e_l(x_{t+1}) Σ_k f_k(t) q_kl


Forward algorithm
initialization: f_0(0) = 1; f_k(0) = 0 for k > 0
recursion, for t = 0 ... T−1: f_l(t+1) = e_l(x_{t+1}) Σ_k f_k(t) q_kl
termination: p(x | M) = Σ_k f_k(T)
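A sketch of the forward recursion on the same toy model, with per-step rescaling so that log p(x | M) stays computable for long sequences (numbers are illustrative assumptions):

```python
import numpy as np

# Toy model (illustrative); state 0 here is an ordinary state, not the
# non-emitting begin state of the slides.
pi = np.array([0.5, 0.5])
q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
e = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])

def forward(x):
    T, K = len(x), len(pi)
    f = np.zeros((T, K))
    scale = np.zeros(T)
    f[0] = pi * e[:, x[0]]
    scale[0] = f[0].sum(); f[0] /= scale[0]
    for t in range(1, T):
        f[t] = e[:, x[t]] * (f[t - 1] @ q)   # f_l(t) = e_l(x_t) sum_k f_k(t-1) q_kl
        scale[t] = f[t].sum(); f[t] /= scale[t]  # rescale to avoid underflow
    return f, np.log(scale).sum()            # log p(x | M)

f, logpx = forward([0, 0, 2, 2, 1, 0])
print(logpx)
```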

Backward algorithm
[figure: dynamic-programming grid over the sequence D F Y M A, states 1-3, holding b_k(t)]
Definition: b_k(t) = p(x_{t+1}, ..., x_T | h_t = k)

Backward algorithm
Recursion: b_k(t−1) = Σ_l q_kl e_l(x_t) b_l(t)


Backward algorithm
initialization: b_k(T) = q_k0
recursion, for t = T ... 2: b_k(t−1) = Σ_l q_kl e_l(x_t) b_l(t)
termination: p(x | M) = Σ_l q_0l e_l(x_1) b_l(1)

Marginal posterior probabilities

Question: given x = (x_t), t = 1..T, compute the pointwise marginal probability
p(h_t = k | x, M) = (1 / p(x | M)) Σ_{h : h_t = k} p(x, h | M)

The sum factorizes:
Σ_{h : h_t = k} p(x, h) = p(x_1, ..., x_t, h_t = k) · p(x_{t+1}, ..., x_T | h_t = k) = f_k(t) b_k(t)

so that
p(h_t = k | x) = f_k(t) b_k(t) / p(x)
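A sketch combining the forward and backward recursions to get these marginals, on a short toy example where unscaled probabilities are safe (model numbers are illustrative; no end state, so b_k(T) = 1). The last line already performs the posterior decoding of the next slide:

```python
import numpy as np

# Toy model (illustrative assumptions).
pi = np.array([0.5, 0.5])
q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
e = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])

def forward_backward(x):
    T, K = len(x), len(pi)
    f = np.zeros((T, K))
    b = np.ones((T, K))                           # b_k(T) = 1: no end state
    f[0] = pi * e[:, x[0]]
    for t in range(1, T):
        f[t] = e[:, x[t]] * (f[t - 1] @ q)        # forward recursion
    for t in range(T - 2, -1, -1):
        b[t] = q @ (e[:, x[t + 1]] * b[t + 1])    # backward recursion
    px = f[-1].sum()                              # p(x | M)
    return f * b / px, px                         # posterior marginals, p(x)

post, px = forward_backward([0, 0, 2, 2, 1, 0])
print(np.round(post, 3))        # each row sums to 1
print(post.argmax(axis=1))      # posterior decoding: argmax_k p(h_t = k | x)
```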

Posterior decoding
- for each t = 1..T, find the h_t that maximizes p(h_t | x)
- different from the Viterbi path
- does not even necessarily represent a valid path
- more stable than Viterbi

Sampling paths
Question: given x = (x_t), t = 1..T, sample a path h = (h_t), t = 1..T probabilistically, h ∼ p(h | x, M)
- the Viterbi path is the most likely outcome (by definition)
- but suboptimal paths may matter, in particular if the total probability is spread over many paths
- sampling and averaging gives an idea of the uncertainty about the path

Sampling paths
- could in principle be done by Gibbs sampling
- but a more efficient dynamic-programming method exists

Sampling the first state
By Bayes' theorem:
p(h_1 = k) = q_0k
p(x_1, ..., x_T | h_1 = k) = e_k(x_1) b_k(1)
p_1k = p(h_1 = k | x_1, ..., x_T) ∝ q_0k e_k(x_1) b_k(1)

Compute (p_1k), k = 1..K, and sample h_1 at random: h_1 ∼ (p_1k)_k


Sampling the next state
Suppose we have sampled h_1 ... h_t. Then:
p(h_{t+1} = l | h_t = k) = q_kl
p(x_{t+1}, ..., x_T | h_{t+1} = l) = e_l(x_{t+1}) b_l(t+1)
p_{t+1,l} = p(h_{t+1} = l | x, h_1 ... h_t) ∝ q_kl e_l(x_{t+1}) b_l(t+1), with k = h_t

Compute (p_{t+1,l}), l = 1..K, and sample h_{t+1} at random: h_{t+1} ∼ (p_{t+1,l})_l


... and so on, until the entire path has been sampled.
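A sketch of this sampling scheme: compute the backward variables once, then draw h_1, h_2, ... forward, reweighting by b at each step (toy model; all numbers are illustrative assumptions, and there is no end state, so b_k(T) = 1):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (illustrative). Sampling h ~ p(h | x, M) as on the slides:
# p(h_1 = k | x)       ∝ pi_k e_k(x_1) b_k(1)
# p(h_{t+1}=l | h_t=k) ∝ q_kl e_l(x_{t+1}) b_l(t+1)
pi = np.array([0.5, 0.5])
q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
e = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])

def sample_path(x):
    T, K = len(x), len(pi)
    b = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        b[t] = q @ (e[:, x[t + 1]] * b[t + 1])   # backward recursion
    h = np.empty(T, dtype=int)
    w = pi * e[:, x[0]] * b[0]
    h[0] = rng.choice(K, p=w / w.sum())
    for t in range(T - 1):
        w = q[h[t]] * e[:, x[t + 1]] * b[t + 1]
        h[t + 1] = rng.choice(K, p=w / w.sum())
    return h

x = [0, 0, 2, 2, 1, 0]
samples = np.array([sample_path(x) for _ in range(1000)])
# per-position frequency of state 1 approximates the posterior p(h_t = 1 | x)
print(samples.mean(axis=0))
```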

Learning HMM parameters

Question: given a training database of annotated (x, h) pairs, learn the parameters of the HMM. Just compute:
- N_kl: total number of transitions from k to l in the database
- N_k: total number of transitions from k, N_k = Σ_l N_kl
- M_kp: total number of emissions of symbol p when in state k
- M_k: total number of emissions in state k, M_k = Σ_p M_kp

Maximum-likelihood (ML) estimate:
q_kl = N_kl / N_k,   e_k(v_p) = M_kp / M_k

Learning HMM parameters (same counts as above)

Bayes estimate (add pseudocounts):
q_kl = (N_kl + n_l) / (N_k + n),   e_k(v_p) = (M_kp + m_p) / (M_k + m)
with n = Σ_l n_l and m = Σ_p m_p.
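A sketch of the supervised counts and both estimates, on made-up annotated pairs over the isochore alphabet (the training data are invented, and the pseudocounts are taken uniform for simplicity, so n = n_l · K and m = m_p · P):

```python
from collections import Counter

# Made-up annotated (x, h) training pairs, states P/R, symbols A/C/G/T.
data = [("ACTGCGA", "PPPRRRP"), ("GGCATTA", "RRRPPPP")]
states, symbols = "PR", "ACGT"

N = Counter()  # N[(k, l)] = number of transitions k -> l
M = Counter()  # M[(k, p)] = number of emissions of symbol p in state k
for x, h in data:
    for t in range(len(x)):
        M[(h[t], x[t])] += 1
        if t + 1 < len(x):
            N[(h[t], h[t + 1])] += 1

n, m = 1.0, 1.0  # uniform pseudocounts (Bayes estimate); set to 0 for plain ML
q = {k: {l: (N[(k, l)] + n) / (sum(N[(k, l2)] for l2 in states) + n * len(states))
         for l in states} for k in states}
e = {k: {p: (M[(k, p)] + m) / (sum(M[(k, p2)] for p2 in symbols) + m * len(symbols))
         for p in symbols} for k in states}
print(q["P"])
print(e["R"])
```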

Unsupervised training

Viterbi training (approximate):
- start from rough parameter estimates
- use the Viterbi algorithm to infer paths
- compute N_kl and M_kp on the Viterbi paths
- re-estimate the parameters based on N_kl and M_kp
- iterate

Unsupervised training

Baum-Welch training (exact: EM):
- start from rough parameter estimates
- use the Baum-Welch (forward-backward) algorithm to compute the expectations of N_kl and M_kp over all paths
- re-estimate the parameters based on those expected counts
- iterate
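A compact sketch of the approximate variant (Viterbi training): alternate Viterbi paths with count-based re-estimation. Replacing the path step by forward-backward expected counts would give Baum-Welch. The random initialization and toy data are assumptions, and π is kept fixed at uniform for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)

def viterbi(x, pi, q, e):
    """Most likely path under the current parameters (log space)."""
    logv = np.log(pi) + np.log(e[:, x[0]])
    ptrs = []
    for t in range(1, len(x)):
        cand = logv[:, None] + np.log(q)
        ptrs.append(cand.argmax(axis=0))
        logv = cand.max(axis=0) + np.log(e[:, x[t]])
    path = [int(logv.argmax())]
    for ptr in reversed(ptrs):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

def viterbi_training(seqs, K, P, n_iter=20, pseudo=1.0):
    """E step: Viterbi paths; M step: count-based re-estimation (with pseudocounts)."""
    pi = np.full(K, 1 / K)                     # kept fixed for brevity
    q = rng.dirichlet(np.ones(K), size=K)      # random start
    e = rng.dirichlet(np.ones(P), size=K)
    for _ in range(n_iter):
        N = np.full((K, K), pseudo)            # transition counts
        M = np.full((K, P), pseudo)            # emission counts
        for x in seqs:
            h = viterbi(x, pi, q, e)
            for t in range(len(x)):
                M[h[t], x[t]] += 1
                if t + 1 < len(x):
                    N[h[t], h[t + 1]] += 1
        q = N / N.sum(axis=1, keepdims=True)
        e = M / M.sum(axis=1, keepdims=True)
    return q, e

seqs = [[0, 0, 0, 2, 2, 1, 2, 0, 0], [2, 2, 1, 2, 0, 0, 0, 0, 2]]
q, e = viterbi_training(seqs, K=2, P=3)
print(np.round(q, 2))
print(np.round(e, 2))
```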

Examples: predicting secondary structure
[figure] Martin et al., 2006

Examples: modelling protein architecture
Handcrafted HMMs. [figure] Stultz et al., 1993

Examples: automatic annotation of gene structure
[figure] Burge and Karlin, 1997

Examples: automatic annotation of gene structure
[figure]

Using HMMs for aligning sequences
[Figure 5 from Bradley et al.: the Java GUI allows users to visualize the estimated alignment accuracy under FSA's statistical model. FSA's alignment is colored according to the expected accuracy under FSA's statistical model (top) as well as according to the true accuracy (bottom), given by a comparison between FSA's alignment and the reference structural alignment.]

Pair HMM for pairwise alignment

Three hidden states:
- M: emits an aligned pair of symbols (one from X, one from Y)
- X: emits a symbol from sequence X only (gap in Y)
- Y: emits a symbol from sequence Y only (gap in X)

Transitions: M → M with probability 1 − 2δ; M → X and M → Y with probability δ each (gap opening); X → X and Y → Y with probability ε (gap extension); X → M and Y → M with probability 1 − ε.

sequence X:       HEAG--AW
sequence Y:       P-AGGHA-
hidden sequence:  MXMMYYMX
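A sketch that scores the slide's example alignment (one fixed hidden path) under such a pair HMM. The gap parameters δ and ε play the roles described above; the numeric values, the pair and gap emission models, and the start-in-M convention are crude assumptions for illustration:

```python
import math

# Toy pair HMM: delta = gap open, eps = gap extend (values assumed).
delta, eps = 0.05, 0.3

def e_match(a, b):
    """Pair emission in state M: crude identity-based toy model."""
    return 0.03 if a == b else 0.4 / 380

def e_gap(_):
    """Single-residue emission in states X and Y: uniform over 20 amino acids."""
    return 1 / 20

trans = {("M", "M"): 1 - 2 * delta, ("M", "X"): delta, ("M", "Y"): delta,
         ("X", "X"): eps, ("X", "M"): 1 - eps,
         ("Y", "Y"): eps, ("Y", "M"): 1 - eps}

def alignment_logprob(row_x, row_y, path):
    """Log joint probability of one alignment; start distribution omitted
    (the path is assumed to begin in its first state)."""
    logp, prev = 0.0, None
    for cx, cy, s in zip(row_x, row_y, path):
        if prev is not None:
            logp += math.log(trans[(prev, s)])
        logp += math.log(e_match(cx, cy) if s == "M"
                         else e_gap(cx if s == "X" else cy))
        prev = s
    return logp

print(alignment_logprob("HEAG--AW", "P-AGGHA-", "MXMMYYMX"))
```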

Pair HMM for pairwise alignment, with begin (B) and end (E) states
[state diagram: B → {M, X, Y} → E]

sequence X:       HEAG--AW
sequence Y:       P-AGGHA-
hidden sequence:  MXMMYYMX

Assessing homology between 2 sequences

Model for homologous sequences (M_1): the pair HMM above
HEAG--AW
P-AGGHA-

Model for non-homologous sequences (M_0): the two sequences are emitted independently
HEAGAW------
------PAGGHA

L = p(s_1, s_2 | M_1) / p(s_1, s_2 | M_0)
- L > 1: homologous
- L < 1: not homologous (but needs calibration)

Posterior decoding (FSA)
- for each pair of positions, compute the posterior probability of being aligned (i.e., jointly emitted by a match state)
- put in the same column the positions that have a high match probability
[Figure 5: the Java GUI visualizes the estimated alignment accuracy under FSA's statistical model; estimated accuracies correspond closely to the true accuracies. Sequences are from alignment BBS12030 in the RV12 dataset of BAliBASE 3.]
Bradley et al., PLoS Comput Biol, 2009

Multiple alignment
- multiple HMM: the generalization of the pair HMM to P > 2 sequences is possible; however, the number of hidden states is exponential in P
- pair HMM: merge pairwise alignments obtained with a pair HMM, as in the previous slide (posterior decoding)
- profile HMM

Multiple alignment: profile HMMs
[figure: multiple alignment with secondary-structure annotation]
- gaps are not uniformly distributed along the sequences
- conserved blocks alternate with regions of more variable length
- this corresponds to structured / loop regions of the conformation

Profile HMM
[figure]

Profile HMM, trained on a seed alignment:
FQDN-R
FK-N-R
FK-E-R
FK-EVR
...
012.3.45

Profile HMM (same seed alignment as above), applied to a new sequence:
AEFWQ-RAI
...
0ii1i2d4ii5
(i = insert, d = delete, digits = match states)

Profile HMM: overall procedure

For each protein family:
- make a seed alignment (based on the superposition of 3D structures)
- train a profile HMM on the seed alignment
- homologues should have a higher p(s | M): summing over paths (Baum-Welch / forward machinery), search for homologues in databases
- using Viterbi (or posterior decoding), align homologues with the seed alignment

Profile HMM
- seed alignments are often small
- using a mixture of priors to model match-state emission probabilities (see the lab session) improves the detection of distant homologues (Brown et al., 1993)

Example: the globin family
[figure]

Example: the globin family
[figure] Krogh et al., 1994