Lecture 7 Sequence analysis. Hidden Markov Models


1 Lecture 7 Sequence analysis. Hidden Markov Models
Nicolas Lartillot (Université de Montréal), BIN6009, May 2012

2 Outline
1. Motivation
2. Examples of Hidden Markov models
3. Hidden Markov Models: formal definition; most likely path (Viterbi algorithm); summing over paths; sampling paths; learning HMM parameters and structure
4. Examples
5. Sequence alignment
6. Hidden Markov models and sequence alignment: pairwise alignment; multiple alignment (profile HMMs)
7. Summary

3 Motivation
Models of gene structure
[Figure: model of gene structure]
- locally, detecting small signal sequences and signatures
- global arrangement (order) has to be meaningful


4 Motivation
Promoters and enhancers
[Figure: structure of a promoter]
Structure of enhancers:
- small and degenerate binding sites
- clustered, and separated by intervals of variable length

5 Motivation
Position Specific Scoring Matrix
[Figure: alignment of experimentally validated binding sites]
Binding site: one distribution over A, C, G, T per position, $q = (q_{ij})_{i=1..5,\ j=A,C,G,T}$
Background: A 0.4, C 0.3, G 0.2, T 0.2, $r = (r_j)_{j=A,C,G,T}$

6 Isochores
[Figure: G. Bernardi / Gene 241 (2000), Fig. 6. Ideogram of human chromosomes at an 850 band resolution (Francke, 1994) showing the H3+ bands as red bands (from Saccone et al., 1999). Black and grey bands are G (Giemsa) bands, white and red bands are R (Reverse) bands.]
Isochores:
- alternating GC-rich and GC-poor regions of several kb
- due to biased gene conversion
- dependent on local recombination rate

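7 Examples of Hidden Markov models
Isochores
Emission probabilities, state P: A 0.40, C 0.06, G 0.09, T 0.45
Emission probabilities, state R: A 0.25, C 0.25, G 0.30, T 0.20
Transitions: P to R with probability $\delta$, P to P with $1-\delta$; R to P with $\epsilon$, R to R with $1-\epsilon$
observed sequence (S): ACTGCGA
hidden sequence (H): PPPRRRP
$p(S, H) = \pi_P\, e_P(A)\,(1-\delta)\, e_P(C)\,(1-\delta)\, e_P(T)\,\delta\, e_R(G)\,(1-\epsilon)\, e_R(C)\,(1-\epsilon)\, e_R(G)\,\epsilon\, e_P(A)$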
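
The product above can be checked numerically. The following is a minimal sketch, not the lecture's code: the emission tables are taken from the slide (assuming the first table belongs to state P and the second to state R), while the values of `pi`, `delta` and `eps` are illustrative assumptions.

```python
# Emission tables from the slide (assignment of tables to P and R is assumed).
emit = {
    "P": {"A": 0.40, "C": 0.06, "G": 0.09, "T": 0.45},
    "R": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20},
}
delta, eps = 0.1, 0.2        # assumed switch probabilities P->R and R->P
pi = {"P": 0.5, "R": 0.5}    # assumed initial probabilities
trans = {
    "P": {"P": 1 - delta, "R": delta},
    "R": {"R": 1 - eps, "P": eps},
}

def joint_probability(seq, path):
    """p(S, H): initial term, then one transition and one emission per step."""
    p = pi[path[0]] * emit[path[0]][seq[0]]
    for t in range(1, len(seq)):
        p *= trans[path[t - 1]][path[t]] * emit[path[t]][seq[t]]
    return p

p = joint_probability("ACTGCGA", "PPPRRRP")
print(p)
```

For long sequences such a product underflows; summing logarithms of the same factors is the usual remedy.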

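8 Examples of Hidden Markov models
Isochores
Emission probabilities, state P: A 0.40, C 0.06, G 0.09, T 0.45
Emission probabilities, state R: A 0.25, C 0.25, G 0.30, T 0.20
Transitions: P to R with probability $\delta$, P to P with $1-\delta$; R to P with $\epsilon$, R to R with $1-\epsilon$
observed sequence (S): ACTGCGA
hidden sequence (H): PPPRRRP
Track lengths follow geometric distributions: the mean length of P-tracks is $1/\delta$, and the mean length of R-tracks is $1/\epsilon$.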
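
As a short worked derivation of the mean track length (a sketch using only the transition probabilities above): a P-track has length $n$ when the chain stays in P for $n-1$ steps and then leaves.

```latex
P(\text{length} = n) = (1-\delta)^{n-1}\,\delta, \qquad n = 1, 2, \dots
\\
E[\text{length}] = \sum_{n \ge 1} n\,(1-\delta)^{n-1}\,\delta
                 = \delta \cdot \frac{1}{\bigl(1-(1-\delta)\bigr)^{2}}
                 = \frac{1}{\delta}
```

The same argument with $\epsilon$ in place of $\delta$ gives $1/\epsilon$ for R-tracks.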

9 Examples of Hidden Markov models
Isochores with begin and end state
States: Begin (B), P, R, End (E)
Emission probabilities, state P: A 0.40, C 0.06, G 0.09, T 0.45
Emission probabilities, state R: A 0.25, C 0.25, G 0.30, T 0.20
Transition probabilities: table indexed by B, P, R, E

10 Examples of Hidden Markov models
Secondary structure
[Diagram: states H, L, E]
- 3 hidden states (L, H and E)
- in each state, 20 emission probabilities (amino acids)

11 Examples of Hidden Markov models
Secondary structure
Transition probabilities between hidden states: table over L, H, E, $q = (q_{kl})_{k,l = L,H,E}$
Emission probabilities: table with rows L, H, E and columns A, C, ..., Y, $e = (e_k(x))_{k = L,H,E;\ x = A,C,\dots,Y}$

12 Examples of Hidden Markov models
Secondary structure
[Figure: protein sequence annotated with hidden states H, L, E]
$p(S, H) = \pi_L\, e_L(G)\, q_{LH}\, e_H(L)\, q_{HH}\, e_H(L)\, q_{HH}\, e_H(E)\, q_{HH}\, e_H(L) \cdots$

13 Enhancers
[Diagram: states O and E, with transition parameters $\epsilon$ and $\delta$, and motif states $A_1, \dots, A_4$ and $B_1, B_2, B_3$]
observed: ATGCAAACATCCCGACACTAGAACCATCAGCAGGACACTAGCA
hidden:   OOOOEEEAAAAEEEBBBEEBBBEEEEAAAAEEEEOOOOOOOOO
- states O and E emit bases according to background probabilities
- states $A_i$ and $B_i$ emit bases according to the position specific scoring matrices of transcription factors 1 and 2

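14 Enhancers
[Diagram: states O and E, with transition parameters $\epsilon$ and $\delta$, and motif states $A_1, \dots, A_4$ and $B_1, B_2, B_3$]
observed: ATGCAAACATCCCGACACTAGAACCATCAGCAGGACACTAGCA
hidden:   OOOOEEEAAAAEEEBBBEEBBBEEEEAAAAEEEEOOOOOOOOO
- $\delta < \epsilon$: tracks of E are shorter than tracks of O
- a way of modelling the clustering of transcription factors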

15 Formal definition
A hidden Markov chain $M$ is characterized by:
- an alphabet of hidden states $s_1, s_2, \dots, s_K$
- an alphabet of observable symbols $v_1, v_2, \dots, v_P$
- initial probabilities (over hidden states) $\pi$ (vector of dimension $K$)
- transition probabilities (between hidden states) $q_{kl}$ ($K \times K$ matrix)
- emission probabilities $e_k(v_p)$ ($K$ vectors of dimension $P$)
A realization consists of 2 parallel series of random variables:
- the hidden state path $h = (h_t)_{t=1..T}$
- the observed sequence of emitted symbols $x = (x_t)_{t=1..T}$
Probability of the joint (hidden and observed) sequence:
$p(x, h \mid M) = \pi(h_1)\, e_{h_1}(x_1) \left[ \prod_{t=1}^{T-1} q_{h_t h_{t+1}}\, e_{h_{t+1}}(x_{t+1}) \right]$

16 Formal definition
HMM with begin and end states
- by convention, hidden state 0 stands for both the begin and the end state
- the begin/end state does not emit symbols
- for $k = 1 \dots K$:
  $q_{0k}$: probability of starting the chain in state $k$ ($q_{0k} = \pi_k$)
  $q_{k0}$: probability of ending the chain after being in state $k$
Joint probability:
$p(x, h \mid M) = q_{0 h_1}\, e_{h_1}(x_1) \left[ \prod_{t=1}^{T-1} q_{h_t h_{t+1}}\, e_{h_{t+1}}(x_{t+1}) \right] q_{h_T 0}$

17 Formal definition
The main algorithmic problems of HMMs
Paths: given an observed sequence $x$,
- find the most likely hidden path
- integrate probability over all paths
- sample paths according to their probability
Learning HMMs:
- estimating parameters (transition and emission probabilities)
- choosing structure (how many hidden states, etc.)
Two types of learning:
- supervised: given annotated examples ($(x, h)$ pairs)
- unsupervised: given non-annotated examples ($x$ only)

18 Most likely path: Viterbi algorithm
- $h = (h_t)_{t=1..T}$: the sequence of hidden states (hidden path)
- $x = (x_t)_{t=1..T}$: the observed sequence of emitted symbols
- $p(x, h \mid M)$: the joint probability
Question: given $x = (x_t)_{t=1..T}$, find $\hat h = (\hat h_t)_{t=1..T}$ such that
$\hat h = \arg\max_h\, p(x, h \mid M)$
- the maximum is over $K^T$ possible paths
- it cannot be computed by brute-force enumeration
- but an exact search over all paths is possible using dynamic programming

19 Most likely path: Viterbi algorithm
Viterbi algorithm example
[Figure: protein sequence annotated with hidden states H, L, E]
Given an HMM (with emission and transition probabilities specified) and a protein sequence, find the most likely secondary structure annotation.

20 Most likely path: Viterbi algorithm
Viterbi recursion
Definition
$v_k(t)$: probability of the most probable path ending in state $k$ at time $t$:
$v_k(t) = \max_{h_1, \dots, h_{t-1}} p(h_1, \dots, h_{t-1}, h_t = k, x_1, \dots, x_t \mid M)$
$\mathrm{ptr}_k(t)$: hidden state at time $t-1$ of this most probable path
Recursion
$v_l(t+1) = \max_k\, v_k(t)\, q_{kl}\, e_l(x_{t+1})$

21 Most likely path: Viterbi algorithm
Viterbi recursion
[Figure: trellis over the observed sequence M V K R L with states 1, 2, 3, values $v_1(t), v_2(t), v_3(t)$ and pointers $\mathrm{ptr}_1(t), \mathrm{ptr}_2(t), \mathrm{ptr}_3(t)$]
Definition
$v_k(t)$: probability of the most probable path ending in state $k$ at time $t$.
$\mathrm{ptr}_k(t)$: hidden state at time $t-1$ of this most probable path

22 Most likely path: Viterbi algorithm
Viterbi recursion
[Figure: trellis over M V K R L with $v_1(t), v_2(t), v_3(t)$ known and $v_1(t+1)$ to be computed]
Candidate coming from state 1: $v_1(t+1) \overset{?}{=} v_1(t)\, q_{11}\, e_1(L)$

23 Most likely path: Viterbi algorithm
Viterbi recursion
[Figure: trellis over M V K R L with $v_1(t), v_2(t), v_3(t)$ known and $v_1(t+1)$ to be computed]
Candidate coming from state 2: $v_1(t+1) \overset{?}{=} v_2(t)\, q_{21}\, e_1(L)$

24 Most likely path: Viterbi algorithm
Viterbi recursion
[Figure: trellis over M V K R L with $v_1(t), v_2(t), v_3(t)$ known and $v_1(t+1)$ to be computed]
Candidate coming from state 3: $v_1(t+1) \overset{?}{=} v_3(t)\, q_{31}\, e_1(L)$

25 Most likely path: Viterbi algorithm
Viterbi recursion
[Figure: trellis over M V K R L]
Recursion
$v_1(t+1) = \max_{k=1,2,3}\, v_k(t)\, q_{k1}\, e_1(x_{t+1})$

26 Most likely path: Viterbi algorithm
Viterbi recursion
[Figure: trellis over M V K R L with pointer $\mathrm{ptr}_1(t+1)$ from $v_1(t+1)$ back to the best previous state]
Recursion
$\mathrm{ptr}_1(t+1) = \arg\max_{k=1,2,3}\, v_k(t)\, q_{k1}\, e_1(x_{t+1})$

27 Most likely path: Viterbi algorithm
Viterbi recursion
[Figure: trellis over M V K R L with $v_1, v_2, v_3$ at times $t$ and $t+1$]
Recursion
$v_l(t+1) = \max_k\, v_k(t)\, q_{kl}\, e_l(x_{t+1})$

28 Most likely path: Viterbi algorithm
Viterbi algorithm
Initialization: $v_0(0) = 1$, $v_k(0) = 0$ for $k > 0$.
Recursion, for $t = 0 \dots T-1$:
$v_l(t+1) = e_l(x_{t+1})\, \max_k\, v_k(t)\, q_{kl}$
$\mathrm{ptr}_l(t+1) = \arg\max_k\, v_k(t)\, q_{kl}$
Termination:
$p(x, \hat h \mid M) = \max_k v_k(T)$
$\hat h_T = \arg\max_k v_k(T)$
Traceback, for $t = T-1 \dots 1$:
$\hat h_t = \mathrm{ptr}_{\hat h_{t+1}}(t+1)$
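
The four steps above (initialization, recursion, termination, traceback) can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's implementation: it uses the two-state isochore model with the emission tables from the earlier slides, while the start probabilities `q0` and transition matrix `q` are assumed values, and no explicit end state is modelled.

```python
states = ["P", "R"]
q0 = {"P": 0.5, "R": 0.5}                       # assumed start probabilities
q = {"P": {"P": 0.9, "R": 0.1},                 # assumed transition probabilities
     "R": {"P": 0.2, "R": 0.8}}
e = {"P": {"A": 0.40, "C": 0.06, "G": 0.09, "T": 0.45},
     "R": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20}}

def joint(x, h):
    """Joint probability p(x, h) of a sequence and a given hidden path."""
    p = q0[h[0]] * e[h[0]][x[0]]
    for t in range(1, len(x)):
        p *= q[h[t - 1]][h[t]] * e[h[t]][x[t]]
    return p

def viterbi(x):
    """Return (most likely path, its joint probability)."""
    v = [{k: q0[k] * e[k][x[0]] for k in states}]   # v_k(1)
    ptr = [{}]                                      # no pointer at the first position
    for t in range(1, len(x)):
        v_t, ptr_t = {}, {}
        for l in states:
            best = max(states, key=lambda k: v[-1][k] * q[k][l])
            ptr_t[l] = best
            v_t[l] = e[l][x[t]] * v[-1][best] * q[best][l]
        v.append(v_t)
        ptr.append(ptr_t)
    last = max(states, key=lambda k: v[-1][k])      # termination
    path = [last]
    for t in range(len(x) - 1, 0, -1):              # traceback
        path.append(ptr[t][path[-1]])
    return "".join(reversed(path)), v[-1][last]

path, prob = viterbi("ACTGCGA")
print(path, prob)
```

In practice the recursion is usually run on log probabilities (sums instead of products) so that long sequences do not underflow.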

29 Most likely path: Viterbi algorithm
Complexity of Viterbi algorithm
Compute a product of probabilities:
- at each time step $t = 1..T$
- for each state $k = 1..K$ at time $t$
- and for each state $l = 1..K$ at time $t+1$
Complexity: $T K^2$
- linear in $T$: very efficient, even for long genomic sequences
- quadratic in $K$: should keep the model simple (low number of hidden states)

30 Most likely path: Viterbi algorithm
Integrating over all paths. Baum-Welch algorithm
- $h = (h_t)_{t=1..T}$: the sequence of hidden states (hidden path)
- $x = (x_t)_{t=1..T}$: the observed sequence of emitted symbols
- $p(x, h \mid M)$: the joint probability
Question: given $x = (x_t)_{t=1..T}$, compute
$p(x \mid M) = \sum_h p(x, h \mid M)$
- the sum is over $K^T$ possible paths
- again, an exhaustive sum is possible using dynamic programming (forward and backward algorithms)

31 Summing over paths
Integrating over all paths. Baum-Welch algorithm
Question: given $x = (x_t)_{t=1..T}$, compute
$p(x \mid M) = \sum_h p(x, h \mid M)$
- the sum is over $K^T$ possible paths
- again, an exhaustive sum is possible using dynamic programming

32 Summing over paths
Forward algorithm
[Figure: trellis over the observed sequence M V K R L, showing $f_1(t)$]
Definition
$f_k(t) = p(x_1, \dots, x_t, h_t = k)$

33 Summing over paths
Forward algorithm
[Figure: trellis over M V K R L with $f_1(t), f_2(t), f_3(t)$ feeding into $f_1(t+1)$]
Recursion
$f_l(t+1) = e_l(x_{t+1}) \sum_k f_k(t)\, q_{kl}$

34 Summing over paths
Forward algorithm
[Figure: trellis over M V K R L with $f_1, f_2, f_3$ at times $t$ and $t+1$]
Recursion
$f_l(t+1) = e_l(x_{t+1}) \sum_k f_k(t)\, q_{kl}$

35 Summing over paths
Forward algorithm
Initialization: $f_0(0) = 1$, $f_k(0) = 0$ for $k > 0$.
Recursion, for $t = 0 \dots T-1$:
$f_l(t+1) = e_l(x_{t+1}) \sum_k f_k(t)\, q_{kl}$
Termination:
$p(x \mid M) = \sum_k f_k(T)$
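
A minimal sketch of the forward algorithm, under stated assumptions: the two-state isochore model with the emission tables from the earlier slides, assumed start probabilities `q0` and transitions `q`, and no explicit end state (so termination is a plain sum over the last column).

```python
states = ["P", "R"]
q0 = {"P": 0.5, "R": 0.5}                       # assumed start probabilities
q = {"P": {"P": 0.9, "R": 0.1},                 # assumed transition probabilities
     "R": {"P": 0.2, "R": 0.8}}
e = {"P": {"A": 0.40, "C": 0.06, "G": 0.09, "T": 0.45},
     "R": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20}}

def forward(x):
    """Return (table f with f[t][k] = p(x_1..x_{t+1}, h = k), likelihood p(x))."""
    f = [{k: q0[k] * e[k][x[0]] for k in states}]
    for t in range(1, len(x)):
        prev = f[-1]
        f.append({l: e[l][x[t]] * sum(prev[k] * q[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1].values())

f, px = forward("ACTGCGA")
print(px)
```

Because no end state is modelled here, the likelihoods of all sequences of a given length add up to 1, which gives a simple sanity check on the recursion.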

36 Summing over paths
Backward algorithm
[Figure: trellis over the observed sequence D F Y M A, showing $b_1(t)$]
Definition
$b_k(t) = p(x_{t+1}, \dots, x_T \mid h_t = k)$

37 Summing over paths
Backward algorithm
[Figure: trellis over D F Y M A with $b_1(t), b_2(t), b_3(t)$ feeding into $b_1(t-1)$]
Recursion
$b_k(t-1) = \sum_l q_{kl}\, e_l(x_t)\, b_l(t)$

38 Summing over paths
Backward algorithm
[Figure: trellis over D F Y M A with $b_1, b_2, b_3$ at times $t-1$ and $t$]
Recursion
$b_k(t-1) = \sum_l q_{kl}\, e_l(x_t)\, b_l(t)$

39 Summing over paths
Backward algorithm
Initialization: $b_k(T) = q_{k0}$ (with no explicit end state, $b_k(T) = 1$).
Recursion, for $t = T \dots 2$:
$b_k(t-1) = \sum_l q_{kl}\, e_l(x_t)\, b_l(t)$
Termination:
$p(x \mid M) = \sum_l q_{0l}\, e_l(x_1)\, b_l(1)$
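
A minimal sketch of the backward algorithm, under stated assumptions: the two-state isochore model with emission tables from the earlier slides, assumed values for `q0` and `q`, and an optional hypothetical argument `q_end` holding the end probabilities $q_{k0}$ (when omitted, no end state is modelled and the last column is set to 1).

```python
states = ["P", "R"]
q0 = {"P": 0.5, "R": 0.5}                       # assumed start probabilities
q = {"P": {"P": 0.9, "R": 0.1},                 # assumed transition probabilities
     "R": {"P": 0.2, "R": 0.8}}
e = {"P": {"A": 0.40, "C": 0.06, "G": 0.09, "T": 0.45},
     "R": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20}}

def backward(x, q_end=None):
    """Return (table b with one column per position, likelihood p(x))."""
    last = {k: (1.0 if q_end is None else q_end[k]) for k in states}
    b = [last]
    # Walk from x_T down to x_2, prepending column t-1 computed from column t.
    for c in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(q[k][l] * e[l][c] * nxt[l] for l in states)
                     for k in states})
    px = sum(q0[l] * e[l][x[0]] * b[0][l] for l in states)   # termination
    return b, px

b, px = backward("ACTGCGA")
print(px)
```

Run without an end state, this returns the same likelihood as a forward pass over the same model, which is a convenient consistency check.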

40 Summing over paths
Marginal posterior probabilities
Question: given $x = (x_t)_{t=1..T}$, compute the pointwise marginal probability:
$p(h_t = k \mid x, M) = \frac{1}{p(x \mid M)} \sum_{h:\, h_t = k} p(x, h \mid M)$
$\sum_{h:\, h_t = k} p(x, h) = p(x_1, \dots, x_t, h_t = k)\; p(x_{t+1}, \dots, x_T \mid h_t = k) = f_k(t)\, b_k(t)$
$p(h_t = k \mid x) = \frac{f_k(t)\, b_k(t)}{p(x)}$

41 Summing over paths
Marginal posterior probabilities
Question: given $x = (x_t)_{t=1..T}$, compute the pointwise marginal probability:
$p(h_t = k \mid x, M) = \frac{1}{p(x \mid M)} \sum_{h:\, h_t = k} p(x, h \mid M)$
Posterior decoding
- for each $t = 1..T$, find the $h_t$ that maximizes $p(h_t \mid x)$
- different from the Viterbi path
- does not even necessarily represent a valid path
- more stable than Viterbi
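
Posterior decoding can be sketched by combining a forward and a backward pass. This is an illustration only: the isochore emission tables come from the earlier slides, the values of `q0` and `q` are assumptions, and no explicit end state is modelled (so the last backward column is 1).

```python
states = ["P", "R"]
q0 = {"P": 0.5, "R": 0.5}                       # assumed start probabilities
q = {"P": {"P": 0.9, "R": 0.1},                 # assumed transition probabilities
     "R": {"P": 0.2, "R": 0.8}}
e = {"P": {"A": 0.40, "C": 0.06, "G": 0.09, "T": 0.45},
     "R": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20}}

def forward(x):
    f = [{k: q0[k] * e[k][x[0]] for k in states}]
    for c in x[1:]:
        prev = f[-1]
        f.append({l: e[l][c] * sum(prev[k] * q[k][l] for k in states) for l in states})
    return f

def backward(x):
    b = [{k: 1.0 for k in states}]
    for c in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(q[k][l] * e[l][c] * nxt[l] for l in states) for k in states})
    return b

def posteriors(x):
    """post[t][k] = p(h_t = k | x) = f_k(t) b_k(t) / p(x)."""
    f, b = forward(x), backward(x)
    px = sum(f[-1].values())
    return [{k: f[t][k] * b[t][k] / px for k in states} for t in range(len(x))]

def posterior_decode(x):
    """Pick, independently at each position, the state with highest posterior."""
    return "".join(max(states, key=lambda k: col[k]) for col in posteriors(x))

post = posteriors("ACTGCGA")
decoded = posterior_decode("ACTGCGA")
print(decoded)
```

Since each position is decided separately, the decoded string may contain a transition of probability zero in models that forbid some transitions; that is the sense in which it need not be a valid path.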

42 Sampling paths
Question: given $x = (x_t)_{t=1..T}$, sample a path $h = (h_t)_{t=1..T}$ probabilistically from
$h \sim p(h \mid x, M)$
- the Viterbi path is the most likely outcome (by definition)
- but suboptimal paths may matter
- in particular, if the total probability is spread over many paths
- sampling and averaging gives an idea of the uncertainty about the path

43 Sampling paths
Question: given $x = (x_t)_{t=1..T}$, sample a path $h = (h_t)_{t=1..T}$ probabilistically from
$h \sim p(h \mid x, M)$
- could in principle be done by Gibbs sampling
- but a more efficient dynamic programming method exists

44 Sampling paths
[Figure: trellis over the observed sequence D F Y M A, showing $b_1(1), b_2(1), b_3(1)$]
Bayes theorem
$p(h_1 = k) = q_{0k}$
$p(x_1 \dots x_T \mid h_1 = k) = e_k(x_1)\, b_k(1)$
$p_{1k} = p(h_1 = k \mid x_1 \dots x_T) \propto q_{0k}\, e_k(x_1)\, b_k(1)$

45 Sampling paths
[Figure: trellis over D F Y M A, showing $b_1(1), b_2(1), b_3(1)$]
Compute $(p_{1k})_{k=1..K}$, and sample $h_1$ at random: $h_1 \sim p_{1k}$

46 Sampling paths
[Figure: trellis over D F Y M A, with state 1 selected at the first position]
Compute $(p_{1k})_{k=1..K}$, and sample $h_1$ at random: $h_1 \sim p_{1k}$

47 Sampling paths
[Figure: trellis over D F Y M A, showing $b_1(t+1), b_2(t+1), b_3(t+1)$]
Suppose we have sampled $h_1 \dots h_t$, with $h_t = k$:
$p(h_{t+1} = l \mid h_t = k) = q_{kl}$
$p(x_{t+1} \dots x_T \mid h_{t+1} = l) = b_l(t+1)\, e_l(x_{t+1})$
$p_{t+1,l} = p(h_{t+1} = l \mid x_{t+1} \dots x_T, h_1 \dots h_t) \propto q_{kl}\, b_l(t+1)\, e_l(x_{t+1})$

48 Sampling paths
[Figure: trellis over D F Y M A, showing $b_1(t+1), b_2(t+1), b_3(t+1)$]
Compute $(p_{t+1,k})_{k=1..K}$, and sample $h_{t+1}$ at random: $h_{t+1} \sim p_{t+1,k}$

49 Sampling paths
[Figure: trellis over D F Y M A, with state 1 selected at position $t+1$]
Compute $(p_{t+1,k})_{k=1..K}$, and sample $h_{t+1}$ at random: $h_{t+1} \sim p_{t+1,k}$

50 Sampling paths
[Figure: trellis over D F Y M A]
... until the entire path has been sampled
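
The sampling scheme above (backward pass first, then states drawn left to right with weights $q_{kl}\, e_l(x_{t+1})\, b_l(t+1)$) can be sketched as follows. This is an illustration under assumptions: isochore emission tables from the earlier slides, assumed `q0` and `q`, no explicit end state, and a function name `sample_path` chosen here for convenience.

```python
import random

states = ["P", "R"]
q0 = {"P": 0.5, "R": 0.5}                       # assumed start probabilities
q = {"P": {"P": 0.9, "R": 0.1},                 # assumed transition probabilities
     "R": {"P": 0.2, "R": 0.8}}
e = {"P": {"A": 0.40, "C": 0.06, "G": 0.09, "T": 0.45},
     "R": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20}}

def backward(x, states, q, e):
    """Backward table with last column 1 (no end state)."""
    b = [{k: 1.0 for k in states}]
    for c in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(q[k][l] * e[l][c] * nxt[l] for l in states) for k in states})
    return b

def sample_path(x, states, q0, q, e, rng):
    """Draw one hidden path from p(h | x), one position at a time."""
    b = backward(x, states, q, e)
    path = []
    for t, c in enumerate(x):
        if t == 0:
            weights = [q0[k] * e[k][c] * b[0][k] for k in states]
        else:
            prev = path[-1]
            weights = [q[prev][l] * e[l][c] * b[t][l] for l in states]
        # random.choices normalizes the weights internally.
        path.append(rng.choices(states, weights=weights)[0])
    return "".join(path)

rng = random.Random(0)                          # fixed seed for reproducibility
samples = [sample_path("ACTGCGA", states, q0, q, e, rng) for _ in range(5)]
print(samples)
```

Drawing many paths and tabulating the state at each position approximates the marginal posteriors, which is one way to gauge the uncertainty about the path.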

51 Learning HMM parameters and structure
Learning HMM parameters
Question: given a training database of annotated $(x, h)$ pairs, learn the parameters of the HMM.
Just compute:
- $N_{kl}$: total number of transitions from $k$ to $l$ in the database
- $N_k$: total number of transitions from $k$: $N_k = \sum_l N_{kl}$
- $M_{kp}$: total number of emissions of symbol $p$ when in state $k$
- $M_k$: total number of emissions in state $k$: $M_k = \sum_p M_{kp}$
ML estimate:
$q_{kl} = \frac{N_{kl}}{N_k}, \qquad e_k(v_p) = \frac{M_{kp}}{M_k}$

52 Learning HMM parameters and structure
Learning HMM parameters
Question: given a training database of annotated $(x, h)$ pairs, learn the parameters of the HMM.
Just compute:
- $N_{kl}$: total number of transitions from $k$ to $l$ in the database
- $N_k$: total number of transitions from $k$: $N_k = \sum_l N_{kl}$
- $M_{kp}$: total number of emissions of symbol $p$ when in state $k$
- $M_k$: total number of emissions in state $k$: $M_k = \sum_p M_{kp}$
Bayes estimate (add pseudocounts $n_l$ and $m_p$, with $n = \sum_l n_l$ and $m = \sum_p m_p$):
$q_{kl} = \frac{N_{kl} + n_l}{N_k + n}, \qquad e_k(v_p) = \frac{M_{kp} + m_p}{M_k + m}$
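
The counting estimates can be sketched as below. This is an illustration, not the lecture's code: the function name `estimate` is hypothetical, and a single uniform pseudocount `pseudo` is assumed for every $n_l$ and $m_p$ (setting it to 0 gives the ML estimate).

```python
from collections import Counter

def estimate(pairs, states, alphabet, pseudo=0.0):
    """Estimate transition and emission probabilities from annotated (x, h) pairs."""
    N = Counter()   # N[(k, l)]: number of transitions k -> l
    M = Counter()   # M[(k, a)]: number of emissions of symbol a in state k
    for x, h in pairs:
        M.update(zip(h, x))
        N.update(zip(h, h[1:]))
    q, e = {}, {}
    for k in states:
        n_k = sum(N[(k, l)] for l in states)
        m_k = sum(M[(k, a)] for a in alphabet)
        # With pseudo == 0 a state that never occurs would divide by zero.
        q[k] = {l: (N[(k, l)] + pseudo) / (n_k + pseudo * len(states)) for l in states}
        e[k] = {a: (M[(k, a)] + pseudo) / (m_k + pseudo * len(alphabet)) for a in alphabet}
    return q, e

training = [("ACTGCGA", "PPPRRRP")]
q_ml, e_ml = estimate(training, "PR", "ACGT")
q_bayes, e_bayes = estimate(training, "PR", "ACGT", pseudo=1.0)
print(q_ml["P"], e_bayes["P"])
```

With a single short training pair, the ML estimate assigns probability zero to any emission that was never seen (here G in state P), which is the situation pseudocounts are meant to avoid.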

53 Learning HMM parameters and structure
Unsupervised training
Viterbi training (approximate):
- start from rough parameter estimates
- use the Viterbi algorithm to infer paths
- compute $N_{kl}$ and $M_{kp}$ on the Viterbi paths
- estimate parameters based on $N_{kl}$ and $M_{kp}$
- iterate

54 Learning HMM parameters and structure
Unsupervised training
Viterbi training (approximate):
- start from rough parameter estimates
- use the Viterbi algorithm to infer paths
- compute $N_{kl}$ and $M_{kp}$ on the Viterbi paths
- estimate parameters based on $N_{kl}$ and $M_{kp}$
- iterate
Baum-Welch training (exact: EM):
- start from rough parameter estimates
- use the Baum-Welch algorithm (forward and backward passes) to compute the expectations of $N_{kl}$ and $M_{kp}$ over all paths
- estimate parameters based on those expected values of $N_{kl}$ and $M_{kp}$
- iterate
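
One Baum-Welch iteration can be sketched as follows, for a single training sequence and without an end state. This is a hedged illustration: the function name `baum_welch_step`, the starting parameters and the example sequence are all assumptions, and no scaling is applied, so it is only suitable for short sequences.

```python
def forward(x, S, q0, q, e):
    f = [{k: q0[k] * e[k][x[0]] for k in S}]
    for c in x[1:]:
        prev = f[-1]
        f.append({l: e[l][c] * sum(prev[k] * q[k][l] for k in S) for l in S})
    return f

def backward(x, S, q, e):
    b = [{k: 1.0 for k in S}]
    for c in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(q[k][l] * e[l][c] * nxt[l] for l in S) for k in S})
    return b

def baum_welch_step(x, S, A, q0, q, e):
    """One EM update; returns (new q0, new q, new e, likelihood under the OLD parameters)."""
    f, b = forward(x, S, q0, q, e), backward(x, S, q, e)
    px, T = sum(f[-1].values()), len(x)
    # E-step: expected transition counts N[k][l] and emission counts M[k][a].
    N = {k: {l: sum(f[t][k] * q[k][l] * e[l][x[t + 1]] * b[t + 1][l]
                    for t in range(T - 1)) / px for l in S} for k in S}
    M = {k: {a: sum(f[t][k] * b[t][k] for t in range(T) if x[t] == a) / px
             for a in A} for k in S}
    # M-step: normalize the expected counts.
    new_q = {k: {l: N[k][l] / sum(N[k].values()) for l in S} for k in S}
    new_e = {k: {a: M[k][a] / sum(M[k].values()) for a in A} for k in S}
    new_q0 = {k: f[0][k] * b[0][k] / px for k in S}
    return new_q0, new_q, new_e, px

S, A = ["P", "R"], "ACGT"
q0 = {"P": 0.5, "R": 0.5}                                   # rough starting values
q = {"P": {"P": 0.9, "R": 0.1}, "R": {"P": 0.2, "R": 0.8}}
e = {"P": {"A": 0.40, "C": 0.06, "G": 0.09, "T": 0.45},
     "R": {"A": 0.25, "C": 0.25, "G": 0.30, "T": 0.20}}
x = "ATTATAGCGCGGCATTTAAT"                                  # made-up example sequence
q0_1, q_1, e_1, lik_0 = baum_welch_step(x, S, A, q0, q, e)
_, _, _, lik_1 = baum_welch_step(x, S, A, q0_1, q_1, e_1)
print(lik_0, lik_1)
```

Each EM iteration cannot decrease the likelihood, so comparing `lik_0` and `lik_1` is a quick check on the update. Viterbi training would replace the expected counts by counts taken along the single best path.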

55 Examples
Predicting secondary structure
Martin et al 2006

56 Examples
Modelling protein architecture
Handcrafted HMMs. Stultz et al

57 Examples
Automatic annotation of gene structure
Burge and Karlin 1997

58 Examples
Automatic annotation of gene structure

59 Hidden Markov models and sequence alignment
Using HMMs for aligning sequences

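60 Hidden Markov models and sequence alignment
Pairwise alignment
Pair HMM for pairwise alignment
States: M (match), X (residue of sequence X facing a gap), Y (residue of sequence Y facing a gap)
Transitions: M to M with probability $1 - 2\delta$; M to X and M to Y with probability $\delta$ each; X to X and Y to Y with probability $\epsilon$; X to M and Y to M with probability $1 - \epsilon$
sequence X:      HEAG--AW
sequence Y:      P-AGGHA-
hidden sequence: MXMMYYMX
- gap opening with probability $\delta$
- gap extension with probability $\epsilon$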

61 Hidden Markov models and sequence alignment
Pairwise alignment
Pair HMM for pairwise alignment
[Diagram: states B (begin), M, X, Y, E (end)]
sequence X:      HEAG--AW
sequence Y:      P-AGGHA-
hidden sequence: MXMMYYMX

62 Assessing homology between 2 sequences
Model for homologous sequences ($M_1$): states B, M, X, Y, E; example: HEAG--AW / P-AGGHA-
Model for non-homologous sequences ($M_0$): states B, X, Y, E (no match state); example: HEAGAW / PAGGHA
$L = \frac{p(s_1, s_2 \mid M_1)}{p(s_1, s_2 \mid M_0)}$
- $L > 1$: homologous
- $L < 1$: not homologous (but needs calibration)


63 Hidden Markov models and sequence alignment
Pairwise alignment
Posterior decoding (FSA)
[Figure 5 of Bradley et al.: The Java GUI allows users to visualize the estimated alignment accuracy under FSA's statistical model. FSA's alignment is colored according to the expected accuracy under FSA's statistical model (top) as well as according to the true accuracy (bottom) given from a comparison between FSA's alignment and the reference structural alignment. It is clear from inspection that accuracies estimated under FSA's statistical model correspond closely to the true accuracies. Sequences are from alignment BBS12030 in the RV12 dataset of BAliBASE 3 [24].]
- for each pair of positions, compute the posterior probability of being aligned (i.e. jointly emitted by a match state)
- put in the same columns positions that have a high match probability
Bradley et al, PLoS Comput Biol

64 Hidden Markov models and sequence alignment
Multiple alignment: profile HMMs
Multiple alignment
- multiple HMM: a generalization of the pair HMM to $P > 2$ sequences is possible; however, the number of hidden states is exponential in $P$
- pair HMM: merging pairwise alignments obtained using a pair HMM as in the previous slide (posterior decoding)
- profile HMM


65 Hidden Markov models and sequence alignment
Multiple alignment: profile HMMs
[Figure: multiple alignment with conserved blocks and gapped regions]
- gaps are not uniformly distributed along sequences
- conserved blocks, alternating with regions of more variable length
- corresponds to structured / loop regions of the conformation

66 Hidden Markov models and sequence alignment
Multiple alignment: profile HMMs
Profile HMM

67 Hidden Markov models and sequence alignment
Multiple alignment: profile HMMs
Profile HMM
trained on seed alignment:
FQDN-R
FK-N-R
FK-E-R
FK-EVR

68 Hidden Markov models and sequence alignment
Multiple alignment: profile HMMs
Profile HMM
trained on seed alignment:
FQDN-R
FK-N-R
FK-E-R
FK-EVR
applied on new sequence:
AEFWQ-RAI...
0ii1i2d4ii5

69 Hidden Markov models and sequence alignment
Multiple alignment: profile HMMs
Profile HMM: overall procedure
For each protein family:
- make a seed alignment (based on superposition of 3D structures)
- train a profile HMM on the seed alignment
- homologues should have higher $p(s \mid M)$: using Baum-Welch, search for homologues in databases
- using Viterbi (or posterior decoding), align homologues with the seed alignment

70 Hidden Markov models and sequence alignment
Multiple alignment: profile HMMs
Profile HMM
- seed alignments are often small
- using a mixture of priors to model match state emission probabilities (see the practical session)
- improves detection of distant homologues (Brown et al 1993)

71 Hidden Markov models and sequence alignment
Multiple alignment: profile HMMs
Example: globin family

72 Hidden Markov models and sequence alignment
Multiple alignment: profile HMMs
Example: globin family
Krogh et al 1994
