Hidden Markov Models for biological sequence analysis

Hidden Markov Models for biological sequence analysis Master in Bioinformatics UPF 2017-2018 http://comprna.upf.edu/courses/master_agb/ Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain

Example: CpG Islands. Wherever a CG dinucleotide (CpG) occurs in the genome, the C tends to be chemically modified by methylation, and methylated C is then likely to mutate into a T. As a result, CpG dinucleotides are less frequent than expected by chance (less than 1/16). This methylation is suppressed in some regions (e.g. the promoter regions of genes), which therefore have a high CpG content. These regions are called CpG islands. CpG islands are of variable length, from hundreds to thousands of base pairs. We would like to answer the following questions: given a short DNA sequence, is it part of a CpG island? Given a large DNA region, can we find the CpG islands in it?

Example: CpG Islands. Let a+_{st} be the transition probability between two adjacent positions inside CpG islands, and a-_{st} the transition probability between two adjacent positions outside CpG islands. Given a sequence S, the log-likelihood ratio is: σ = log [ P(S | +) / P(S | -) ] = Σ_{i=1}^{N} log ( a+_{i-1,i} / a-_{i-1,i} ). The larger the value of σ, the more likely S is to be a CpG island.

Example: CpG Islands. Given a large stretch of DNA of length L, we extract windows of l nucleotides: S^(k) = (s_{k+1}, ..., s_{k+l}), 1 ≤ k ≤ L - l, with l << L. For each window we calculate σ(S^(k)) = log [ P(S^(k) | +) / P(S^(k) | -) ] = Σ_i log ( a+_{i-1,i} / a-_{i-1,i} ). Windows with σ(S^(k)) > 0 (or above a set threshold) are possible CpG islands. Limitation: the Markov chain model cannot determine the lengths of the CpG islands.
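As an illustration of this window scan, here is a minimal Python sketch of the log-odds score σ computed over sliding windows. The transition tables A_PLUS and A_MINUS hold illustrative values (close to those tabulated in Durbin et al.); in practice they would be estimated from known CpG-island and background sequences, and the window length and threshold below are arbitrary choices.

```python
import math

# Illustrative transition probabilities a+_{st} (inside islands) and a-_{st} (outside).
A_PLUS = {
    "A": {"A": 0.18, "C": 0.27, "G": 0.43, "T": 0.12},
    "C": {"A": 0.17, "C": 0.37, "G": 0.27, "T": 0.19},
    "G": {"A": 0.16, "C": 0.34, "G": 0.38, "T": 0.12},
    "T": {"A": 0.08, "C": 0.36, "G": 0.38, "T": 0.18},
}
A_MINUS = {
    "A": {"A": 0.30, "C": 0.21, "G": 0.28, "T": 0.21},
    "C": {"A": 0.32, "C": 0.30, "G": 0.08, "T": 0.30},
    "G": {"A": 0.25, "C": 0.24, "G": 0.30, "T": 0.21},
    "T": {"A": 0.18, "C": 0.24, "G": 0.29, "T": 0.29},
}

def log_odds(window):
    """sigma(S) = sum_i log( a+_{s_{i-1}, s_i} / a-_{s_{i-1}, s_i} ) over adjacent pairs."""
    return sum(
        math.log(A_PLUS[x][y] / A_MINUS[x][y])
        for x, y in zip(window, window[1:])
    )

def scan(sequence, l=100, threshold=0.0):
    """Slide a window of length l over the sequence and report windows scoring above threshold."""
    for k in range(len(sequence) - l + 1):
        sigma = log_odds(sequence[k:k + l])
        if sigma > threshold:
            yield k, sigma
```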

Example: CpG Islands. We now have two Markov chains, one with states A+, C+, G+, T+ (inside islands) and one with states A-, C-, G-, T- (outside islands). Can we build a single model for the entire sequence that incorporates both Markov chains?

Example: CpG Islands. Include the probability of switching from one model to the other: for every symbol (A+, C+, G+, T+, A-, C-, G-, T-), in addition to the transitions to symbols of the same type (+ or -), we also consider the transitions to symbols of the other type. But this approach requires the estimation of many probabilities.

Example: CpG Islands. Simplify by defining a single probability of switching from one model to the other: we summarize the transitions between + and - symbols into single transition probabilities between the + and - states. The change between the + and - states determines the limits of the CpG island.

Example: CpG Islands. We can extend the model with all possible transition and emission probabilities. Transition probabilities between the states: P(+|+) = a_{++}, P(-|+) = a_{+-}, P(+|-) = a_{-+}, P(-|-) = a_{--}, and from the begin state P(+|begin) = a_{0+}, P(-|begin) = a_{0-}. Emission probabilities in each state: for the first symbol, P(a|+) = e_+(a) (CpG) and P(a|-) = e_-(a) (non-CpG); for a pair of symbols at any position beyond the first one, P(ab|+) = e_+(ab) and P(ab|-) = e_-(ab). In summary, we have transition probabilities between states, a_{++}, a_{--}, a_{+-}, a_{-+}, and emission probabilities in each state, e_+(a), e_+(ab), e_-(a), e_-(ab).

Example: CpG Islands. In fact, we can redraw the CpG island model so that each state (+ and -) is itself a Markov chain model: we have transitions between the states, and each state can generate symbols.

Example: The coin tossing problem. A simpler version of the CpG island problem. Game: flip a coin, with two possible outcomes, Head (H) or Tail (T). There are two coins. Fair coin (F): head or tail with the same probability 1/2, i.e. P(H|F) = e_F(H) = 0.5 and P(T|F) = e_F(T) = 0.5. Loaded coin (L): head with probability 3/4 and tail with probability 1/4, i.e. P(H|L) = e_L(H) = 0.75 and P(T|L) = e_L(T) = 0.25.

Example: The coin tossing problem. HTTHHHHHHTTHTHHTHTTTHTHTHHTTHHTHHHHHHTTTTHTHHTHT. Which coin produced each observed symbol? Fair coin (F): P(H|F) = e_F(H) = 0.5, P(T|F) = e_F(T) = 0.5. Loaded coin (L): P(H|L) = e_L(H) = 0.75, P(T|L) = e_L(T) = 0.25.

Example: The coin tossing problem. The crooked dealer switches between the fair and the loaded coin 10% of the time (90% of the time he keeps using the same coin). We can define transition probabilities between the coin states: P(F|F) = a_FF = 0.9, P(L|L) = a_LL = 0.9, P(L|F) = a_FL = 0.1, P(F|L) = a_LF = 0.1.

Example: The coin tossing problem. We can represent this model graphically with two states, F (fair) and L (loaded). Each state has emission probabilities (head or tail): e_F(H) = 0.5, e_F(T) = 0.5, e_L(H) = 0.75, e_L(T) = 0.25. There are transition probabilities between the states (changing the coin): a_FF = 0.9, a_LL = 0.9, a_FL = 0.1, a_LF = 0.1.
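For the later sketches it is convenient to write this two-state coin model down explicitly in Python. The dictionary names (STATES, TRANS, EMIT, START) are my own; the slide does not fix an initial distribution, so equal starting probabilities are assumed here.

```python
# Dishonest-dealer coin model from the slides, as plain dictionaries.
STATES = ("F", "L")                       # fair, loaded
TRANS = {                                 # a_kl = P(next state l | current state k)
    "F": {"F": 0.9, "L": 0.1},
    "L": {"F": 0.1, "L": 0.9},
}
EMIT = {                                  # e_k(b) = P(symbol b | state k)
    "F": {"H": 0.5,  "T": 0.5},
    "L": {"H": 0.75, "T": 0.25},
}
# The slide does not specify an initial distribution; assume both coins are
# equally likely at the start.
START = {"F": 0.5, "L": 0.5}
```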

HMM definition. A Hidden Markov Model (HMM) is a triplet M = (Σ, Q, Θ): Σ is an alphabet of symbols; Q is a set of states that can emit symbols from the alphabet; Θ is a set of probabilities, namely the transition probabilities a_kl between states k, l ∈ Q and the emission probabilities e_k(b), for k ∈ Q and b ∈ Σ.

HMM example. a_{π_i,π_{i+1}} is the probability of a transition between states π_i and π_{i+1}, and e_{π_i}(s_i) is the probability that s_i was emitted from state π_i. For instance, a_13 is the probability of a transition from state 1 to state 3, and e_2(A) is the probability of emitting character A in state 2. There is no emission at the begin state: the begin/end states on the path are silent states that do not emit symbols.

HMM definition. A path in the model M is a sequence of states Π = (π_1, ..., π_N), to which we add an initial state π_0 and a final state π_{N+1}. Given a sequence S = s_1 s_2 s_3 ... s_N of symbols, the transition probabilities between states k and l, and the emission probability of character b at state k, are written as: a_kl = P(π_i = l | π_{i-1} = k) = P(π_i | π_{i-1}) and e_k(b) = P(s_i = b | π_i = k) = P(s_i | π_i). The probability that sequence S is generated by model M given the path is: P(S | Π) = P(s_1 ... s_N | π_1 ... π_N) = P(π_1 | π_0) P(s_1 | π_1) P(π_2 | π_1) P(s_2 | π_2) ... P(π_N | π_{N-1}) P(s_N | π_N) P(π_{N+1} | π_N) = a_{π_0,π_1} e_{π_1}(s_1) a_{π_1,π_2} e_{π_2}(s_2) ... a_{π_{N-1},π_N} e_{π_N}(s_N) a_{π_N,π_{N+1}} = a_{π_0,π_1} Π_{i=1}^{N} e_{π_i}(s_i) a_{π_i,π_{i+1}}.

Why Hidden? In Markov chains, each state accounts for one specific part of the sequence. In HMMs we distinguish between the observable parts of the problem (the emitted characters) and the hidden parts (the states). In a Hidden Markov Model, multiple states could account for the same part of the sequence: the actual state is hidden. That is, there is no one-to-one correspondence between states and observations.

HMM example. Π = 0113, S = AAC. There is a probability associated with this sequence, given by the transition and emission probabilities: P(S | Π) = a_{π_0,π_1} Π_{i=1}^{L} e_{π_i}(s_i) a_{π_i,π_{i+1}}. Here P(AAC | Π) = a_01 e_1(A) a_11 e_1(A) a_13 e_3(C) a_35 = 0.5 × 0.4 × 0.2 × 0.4 × 0.8 × 0.3 × 0.6 = 0.002304.

Why Hidden? E.g. the toss of a coin is the observable; the coin being used is the state, which we cannot observe. HTTHHHHHHTTHTHHTHTTTHTHTHHTTHHTHHHHHHTTTTHTHHTHT: which coin produced each observed symbol? Fair coin (F): P(H|F) = e_F(H) = 0.5, P(T|F) = e_F(T) = 0.5. Loaded coin (L): P(H|L) = e_L(H) = 0.75, P(T|L) = e_L(T) = 0.25. Similarly, given a biological sequence, the interesting features are hidden (genes, splice sites, etc.). We will have to calculate the probability of being in a particular state.

HMMs are memory-less. At each time (or position) step, the only thing that affects future (or downstream) states and emissions is the current state: P(π_i = l | "whatever happened so far") = P(π_i = l | π_1, π_2, ..., π_{i-1}, s_1, s_2, ..., s_{i-1}) = P(π_i = l | π_{i-1} = k) = a_kl. At each time/position, the emission of a character depends only on the current state: P(s_i = b | "whatever happened so far") = P(s_i = b | π_1, π_2, ..., π_i, s_1, s_2, ..., s_{i-1}) = P(s_i = b | π_i = k) = e_k(b).

HMMs can be seen as generative models. We can sample paths of states (e.g. the orange path in the figure), and a path in the HMM generates a sequence: Π = 0113, S = AAC. We generate paths using the transition probabilities and emit symbols using the emission probabilities, as in the sketch below.
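A minimal sketch of this generative view, reusing the coin-model dictionaries from the earlier sketch (TRANS, EMIT, START are assumed names): states are sampled from the transition probabilities and symbols from the emission probabilities.

```python
import random

def sample(trans, emit, start, length):
    """Generate (path, sequence) by choosing states, then emitting one symbol per state."""
    def draw(dist):
        # Draw one key from a {outcome: probability} dictionary.
        r, acc = random.random(), 0.0
        for outcome, p in dist.items():
            acc += p
            if r < acc:
                return outcome
        return outcome  # guard against rounding error

    path, seq = [], []
    state = draw(start)
    for _ in range(length):
        path.append(state)
        seq.append(draw(emit[state]))   # emit a symbol from the current state
        state = draw(trans[state])      # then move to the next state
    return "".join(path), "".join(seq)

# Example: path, seq = sample(TRANS, EMIT, START, 20)
```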

HMM questions. However, for Π = 0113 and S = AAC: this is not necessarily the most probable sequence of states, and it is also not the total probability of the sequence of symbols, since other combinations of states can give the same sequence of symbols.

HMM questions. The LEARNING problem: find the transition and emission probabilities of the model, i.e. learn the model. The EVALUATION problem: what is the probability that this sequence has been generated by the crooked dealer? The DECODING problem: find the underlying sequence of states that led to the observed sequence of results, i.e. label the results by their state. The POSTERIOR DECODING problem: find the probability that the dealer was using a biased coin at a particular time.

HMM questions (algorithms). The LEARNING problem: what are the parameters of the model? Maximum likelihood estimation, Baum-Welch, EM. The EVALUATION problem: how likely is a given sequence? The Forward algorithm. The DECODING problem: what is the most probable path for generating a given sequence? The Viterbi algorithm. The POSTERIOR DECODING problem: which is the most likely state at a given position? The Forward-Backward algorithm.

HMM questions (algorithms). The LEARNING problem: what are the parameters of the model? Maximum likelihood estimation, Baum-Welch, EM. The EVALUATION problem: how likely is a given sequence? The Forward algorithm. The DECODING problem: what is the most probable path for generating a given sequence? The Viterbi algorithm. The POSTERIOR DECODING problem: which is the most likely state at a given position? The Forward-Backward algorithm.

HMM questions. The LEARNING problem: given a series of labelled sequences, e.g. HTTHHHHHHTTHTHHTHTTTHTHTHHTTHHTHHHHHHTTTTHTHHTHT with segments labelled loaded / fair / loaded / fair, the goal is to find the transition and emission probabilities of the model (learn the model): a_FF = 0.9, a_LL = 0.9, a_FL = 0.1, a_LF = 0.1, e_F(H) = 0.5, e_F(T) = 0.5, e_L(H) = 0.75, e_L(T) = 0.25.

HMM Parameter estimation. The simplest way to obtain the parameters of the model is from a set of labelled observations, e.g. a number of sequences for which we know the actual sequence of states. Let A_kl be the number of observed transitions between states k and l, and E_k(b) the number of times the symbol b is emitted by state k. We can then estimate the probabilities as: a_kl = A_kl / Σ_{l'} A_{kl'} and e_k(b) = E_k(b) / Σ_{b'} E_k(b').

HMM Parameter estimation. This estimation is called maximum likelihood, since the probability of all the training sequences given the parameters, P(s^1 ... s^N | θ) = Π_{j=1}^{N} P(s^j | θ), is maximal for the parameters estimated from the observed frequencies, assuming that the sequences are independent. To avoid overfitting, use pseudocounts: A_kl ← A_kl + r_kl and E_k(b) ← E_k(b) + r_k(b). Pseudocounts reflect our prior knowledge. A sketch of this estimation is given below.
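A possible sketch of this maximum-likelihood estimation with pseudocounts, assuming labelled (sequence, path) pairs and explicit state and symbol sets; a single constant pseudocount is used here for simplicity.

```python
def estimate(labelled, states, alphabet, pseudo=1.0):
    """Maximum-likelihood estimates from labelled (sequence, path) pairs.
    A constant pseudocount r is added to every count: A_kl <- A_kl + r, E_k(b) <- E_k(b) + r."""
    A = {k: {l: pseudo for l in states} for k in states}    # transition counts A_kl
    E = {k: {b: pseudo for b in alphabet} for k in states}  # emission counts E_k(b)
    for seq, path in labelled:
        for symbol, state in zip(seq, path):
            E[state][symbol] += 1
        for k, l in zip(path, path[1:]):
            A[k][l] += 1
    # Normalise: a_kl = A_kl / sum_l' A_kl',  e_k(b) = E_k(b) / sum_b' E_k(b')
    trans = {k: {l: n / sum(row.values()) for l, n in row.items()} for k, row in A.items()}
    emit = {k: {b: n / sum(row.values()) for b, n in row.items()} for k, row in E.items()}
    return trans, emit

# Example with coin data:
# trans, emit = estimate([("HTTHHT", "FFFLLL")], states=("F", "L"), alphabet=("H", "T"))
```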

HMM questions. The LEARNING problem: what are the parameters of the model? Maximum likelihood estimation, Baum-Welch, EM. The EVALUATION problem: how likely is a given sequence? The Forward algorithm. The DECODING problem: what is the most probable path for generating a given sequence? The Viterbi algorithm. The POSTERIOR DECODING problem: which is the most likely state at a given position? The Forward-Backward algorithm.

The DECODING problem. Given a sequence of heads (H) and tails (T), HTTHHHHHHTTHTHHTHTTTHTHTHHTTHHTHHHHHHTTTTHTHHTHT, is it possible to reconstruct the coin types the dealer has used? Goal: determine which coin produced each outcome. We must find the underlying sequence of states that led to the observed sequence of results, i.e. label the results by their state. For instance: Observations HTTHHHHHHTTHTHHTHTTTHTHTHHTTHHTHHHHHHTTTTHTHHTHT, States LLLLLLLLLLLFFFFFFFFFFFFFFFFFFFFLLLLLLLLFFFFFFFFF (loaded / fair / loaded / fair). The label assignment has an associated probability.

The DECODING problem. Finding the most probable path: the Viterbi algorithm. We want to find the optimal path Π* that maximizes the probability of the observed sequence, i.e. the path for which P(S | Π*) is maximal. We can express this as: Π* = argmax_Π { P(S | Π) }. We solve this problem using dynamic programming (a maximization over all possible paths).

The DECODING problem. Finding the most probable path: the Viterbi algorithm. We define a dynamic programming table of size |Q| × L, where |Q| is the number of states and L is the length of the sequence. Each element v_k(i), k = 1, ..., |Q|, i = 1, ..., L, is the probability of the most probable path for the prefix s_1 s_2 ... s_i (up to character s_i) ending in state k: v_k(i) = max_{π_1,...,π_{i-1}} P(s_1 ... s_{i-1}, π_1 ... π_{i-1}, s_i, π_i = k). We want to calculate v_end(L), the probability of the most probable path accounting for the whole sequence and ending in the end state.

The DECODING problem. Finding the most probable path: the Viterbi algorithm. We can define this probability recursively:
v_k(i+1) = max_{π_1,...,π_i} P(s_1 ... s_i, π_1, ..., π_i, s_{i+1}, π_{i+1} = k)
= max_{π_1,...,π_i} P(s_{i+1}, π_{i+1} = k | s_1 ... s_i, π_1 ... π_i) P(s_1 ... s_i, π_1 ... π_i)   [conditional probability]
= max_{π_1,...,π_i} P(s_{i+1}, π_{i+1} = k | π_i) P(s_1 ... s_{i-1}, π_1 ... π_{i-1}, s_i, π_i)   [Markov property]
= max_l P(s_{i+1}, π_{i+1} = k | π_i = l) max_{π_1,...,π_{i-1}} P(s_1 ... s_{i-1}, π_1 ... π_{i-1}, s_i, π_i = l)   [optimal substructure; definition of v_l(i)]
= max_l [ P(s_{i+1} | π_{i+1} = k) P(π_{i+1} = k | π_i = l) v_l(i) ]   [definition of the emission and transition probabilities]
= e_k(s_{i+1}) max_l [ v_l(i) a_lk ]

The DECODING problem. Finding the most probable path: the Viterbi algorithm. 1.- Initialization: v_begin(0) = 1; v_k(0) = 0 for k ≠ begin. 2.- Recursion: for positions i = 0, ..., L-1 and states l ∈ Q: v_l(i+1) = e_l(s_{i+1}) max_{k ∈ Q} { v_k(i) a_kl }. 3.- Termination: P(S, Π*) = max_{k ∈ Q} { v_k(L) a_{k,end} }.

The DECODING problem. Finding the most probable path: the Viterbi algorithm (keeping pointers for the back-tracking of states). 1.- Initialization: v_begin(0) = 1; v_k(0) = 0 for k ≠ begin. 2.- Recursion: for positions i = 0, ..., L-1 and states l ∈ Q: v_l(i+1) = e_l(s_{i+1}) max_{k ∈ Q} { v_k(i) a_kl }, and ptr_{i+1}(l) = argmax_{k ∈ Q} { v_k(i) a_kl } keeps track of the most probable path. 3.- Termination: P(S, Π*) = max_{k ∈ Q} { v_k(L) a_{k,end} } and π*_L = argmax_{k ∈ Q} { v_k(L) a_{k,end} }. The optimal path is found by following the pointers back, starting at π*_L, using the recursion π*_i = ptr_{i+1}(π*_{i+1}).

The DECODING problem. Finding the most probable path: the Viterbi algorithm. Use logarithms to avoid underflow problems (the entries v_k now hold log-probabilities). 1.- Initialization: v_begin(0) = 0; v_k(0) = -∞ for k ≠ begin. 2.- Recursion: for positions i = 0, ..., L-1 and states l ∈ Q: v_l(i+1) = log( e_l(s_{i+1}) ) + max_{k ∈ Q} { v_k(i) + log(a_kl) }, ptr_{i+1}(l) = argmax_{k ∈ Q} { v_k(i) + log(a_kl) }. 3.- Termination: log P(S, Π*) = max_{k ∈ Q} { v_k(L) + log(a_{k,end}) } and π*_L = argmax_{k ∈ Q} { v_k(L) + log(a_{k,end}) }. The optimal path is found with the same back-tracking recursion π*_i = ptr_{i+1}(π*_{i+1}).
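A sketch of the log-space Viterbi recursion with traceback, using dictionaries like the coin-model ones above (TRANS, EMIT, START are assumed names); start[k] plays the role of a_{begin,k} and end[k] of a_{k,end} (set end[k] = 1 for every state if the model has no explicit end state).

```python
import math

NEG_INF = float("-inf")

def viterbi(seq, states, trans, emit, start, end):
    """Log-space Viterbi: returns (log P(S, Pi*), most probable path).
    v[i][k] corresponds to v_k(i+1) on the slides (prefix s_1..s_{i+1} ending in state k)."""
    log = lambda p: math.log(p) if p > 0 else NEG_INF
    L = len(seq)
    v = [{k: NEG_INF for k in states} for _ in range(L)]
    ptr = [{k: None for k in states} for _ in range(L)]

    # Initialisation: v_k(1) = log a_{begin,k} + log e_k(s_1)
    for k in states:
        v[0][k] = log(start[k]) + log(emit[k][seq[0]])

    # Recursion: v_l(i+1) = log e_l(s_{i+1}) + max_k [ v_k(i) + log a_kl ]
    for i in range(1, L):
        for l in states:
            best_k = max(states, key=lambda k: v[i - 1][k] + log(trans[k][l]))
            ptr[i][l] = best_k
            v[i][l] = log(emit[l][seq[i]]) + v[i - 1][best_k] + log(trans[best_k][l])

    # Termination: log P(S, Pi*) = max_k [ v_k(L) + log a_{k,end} ], then trace pointers back.
    last = max(states, key=lambda k: v[L - 1][k] + log(end[k]))
    score = v[L - 1][last] + log(end[last])
    path = [last]
    for i in range(L - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return score, list(reversed(path))

# Example with the coin model (no explicit end state):
# score, path = viterbi("HTTHHHHT", ("F", "L"), TRANS, EMIT, START, {"F": 1.0, "L": 1.0})
```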

Exercise

Viterbi algorithm - Example From a DNA sequence, we want to predict whether a region is likely to be exonic or intronic We do this by labelling the observed bases in terms of a number of states (E: exon, I: intron), e.g.: AGGGCTAGGGTGTCTGGTGCAACTCAGCAGCTCTGTACCGTCCCTAGCCTAAGTATGTTCG EEEEEEEEEEEIIIIIIIIIIIIIIIIIIIEEEEEEEEEIIIIIIIIIIIIIIEEEEEEEE Provided a series of transition probabilities between states and emission probabilities for symbols (nucleotides), we want to determine the most probable labeling of an input sequence (the most probable path/parse)

Viterbi algorithm - Example. A simple HMM for 5' splice-site recognition (Eddy 2004). This model describes sequences whose initial part is exonic (homogeneous composition), followed by a single nucleotide that forms the 5' splice site (mostly G), with the rest of the sequence belonging to the intron (rich in A and T). Example training sequences: ACATCGTAGTGA, CGACTGTAATGT, GCGTAGAAGTAT, TACGGGTAAATA. The parameters are in general derived from a training (labelled) set. The problem consists in finding the 5' splice site (donor) in an unknown DNA sequence.

Viterbi algorithm - Example. Given the path Π = E E D I I (E: exon, D: donor, I: intron), the probability of observing the sequence S = C A G T A is: P(S | Π) = a_BE × e_E(C) × a_EE × e_E(A) × a_ED × e_D(G) × a_DI × e_I(T) × a_II × e_I(A) = 1.0 × 0.25 × 0.9 × 0.25 × 0.1 × 0.95 × 1.0 × 0.4 × 0.9 × 0.4 = 0.0007695. For the other path E D I I I over the same sequence, P(S | Π) = a_BE × e_E(C) × a_ED × e_D(A) × a_DI × e_I(G) × a_II × e_I(T) × a_II × e_I(A) = 1.0 × 0.25 × 0.1 × 0.05 × 1.0 × 0.1 × 0.9 × 0.4 × 0.9 × 0.4 = 0.0000162, which is smaller.
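The two path probabilities can be checked with a short script. The parameter values are those read off the slide and Eddy (2004); emissions not used above (e.g. e_D(C), e_D(T), e_I(C)) are filled in from that paper, and the end transition is omitted, as in the slide's formula.

```python
# 5' splice-site model (Eddy 2004): B = begin, E = exon, D = donor, I = intron.
A = {"B": {"E": 1.0}, "E": {"E": 0.9, "D": 0.1}, "D": {"I": 1.0}, "I": {"I": 0.9}}
E = {
    "E": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "D": {"A": 0.05, "C": 0.0, "G": 0.95, "T": 0.0},   # C and T assumed 0, as in Eddy (2004)
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},     # C assumed 0.1, as in Eddy (2004)
}

def path_probability(seq, path):
    """P(S | Pi) = a_{B,pi_1} * prod_i e_{pi_i}(s_i) * a_{pi_i,pi_{i+1}} (end transition omitted)."""
    p = A["B"][path[0]]
    for i, (s, k) in enumerate(zip(seq, path)):
        p *= E[k][s]
        if i + 1 < len(path):
            p *= A[k][path[i + 1]]
    return p

print(path_probability("CAGTA", "EEDII"))   # ~ 0.0007695
print(path_probability("CAGTA", "EDIII"))   # ~ 1.62e-05
```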

Viterbi algorithm - Example. The same model, but using logarithms: the logs of the probabilities of the different sequences of states for the same DNA sequence.

Viterbi algorithm - Exercise. Fill in the Viterbi dynamic programming matrix (states × sequence positions) and obtain the most probable path.

Viterbi algorithm - Exercise. For instance, using the log form of the algorithm: v_3(8) = v_I(8) = log e_I(T) + max_{k ∈ {D,I}} { v_k(7) + log a_kI } = log e_I(T) + max { v_I(7) + log a_II ; v_D(7) + log a_DI } = -0.92 + max{ -11.57 - 0.11 ; -11.24 + 0 } = -0.92 - 11.24 = -12.16. The maximum is given by the transition from D to I, so ptr_8(3) = ptr_8(I) = D. We only need to consider the allowed transitions; transitions that are not allowed can be given probability 0 (-∞ in log form).
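A quick numeric check of this cell, assuming natural logarithms and restoring the minus signs on the log values (v_I(7) = -11.57 and v_D(7) = -11.24 are taken from the slide's matrix):

```python
import math

# Numeric check of the v_I(8) cell above.
log_e_I_T = math.log(0.4)            # ~ -0.92
log_a_II = math.log(0.9)             # ~ -0.11
log_a_DI = math.log(1.0)             #    0
v_I_7, v_D_7 = -11.57, -11.24        # previous column of the matrix (from the slide)
v_I_8 = log_e_I_T + max(v_I_7 + log_a_II, v_D_7 + log_a_DI)
print(round(v_I_8, 2))               # -12.16
```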

Exercise: fill in the Viterbi matrix and recover the optimal path of states, using the 5' SS model in log scale and the following sequence.

Viterbi matrix (to be filled):
i       0   1   2   3   4   5   6   7   8   9   10
x       -   A   C   C   C   G   A   G   T   A   A
begin
Exon
donor
intron
end

Optimal path of states: Π*_i for i = 1, ..., 10.

Exercise (exam 2014-2015). Consider the hidden Markov model of the coin-tossing problem, with a start (silent) state S and the fair (F) and loaded (L) coin states. Transition probabilities: P(F|F) = 3/4, P(L|F) = 1/4, P(F|L) = 1/4, P(L|L) = 3/4, with initial probabilities P(F|S) and P(L|S). Emission probabilities: P(h|F) = 1/2, P(t|F) = 1/2, P(h|L) = 3/4, P(t|L) = 1/4. Find a value (or condition) for P(F|S) such that the most probable path for two heads in a row, hh, is given by the fair coin, i.e. FF. Can you give an interpretation of this result?

Exercise (exam 2014-2015). Note: use the Viterbi algorithm. Recall that the Viterbi algorithm defines a dynamic programming matrix v_k(i), which represents the probability of the most probable path for the sequence prefix s_1 s_2 ... s_i ending at state k, for k = 1, ..., |Q| and i = 1, ..., L, where |Q| is the number of states in the model and L is the length of the sequence. Also, recall that the Viterbi matrix is filled with the following algorithm. Initialization: v_start(0) = 1; v_k(0) = 0 for k ≠ start. Recursion: v_l(i+1) = e_l(s_{i+1}) max_{k ∈ Q} { v_k(i) a_kl }, ptr_{i+1}(l) = argmax_{k ∈ Q} { v_k(i) a_kl }. Finalization: P(S, Π*) = max_{k ∈ Q} { v_k(L) a_{k,end} }, π*_L = argmax_{k ∈ Q} { v_k(L) a_{k,end} }. Here e_l(s_i) is the emission probability of the symbol s_i in state l and a_kl is the transition probability between states k and l. The optimal path Π* is recovered using the recursion π*_i = ptr_{i+1}(π*_{i+1}).

HMM questions. The LEARNING problem: what are the parameters of the model? Maximum likelihood estimation, Baum-Welch, EM. The EVALUATION problem: how likely is a given sequence? The Forward algorithm. The DECODING problem: what is the most probable path for generating a given sequence? The Viterbi algorithm. The POSTERIOR DECODING problem: which is the most likely state at a given position? The Forward-Backward algorithm.

HMM questions. The EVALUATION problem: given a sequence of heads (H) and tails (T), HTTHHHHHHTTHTHHTHTTTHTHTHHTTHHTHHHHHHTTTTHTHHTHT, and assuming that we have modelled the coins correctly, what is the probability that this sequence was generated by the crooked dealer? E.g. P(S) = 1.3 × 10^-35.

How likely is a sequence? But P(S | Π) is the probability of the sequence given a specific path. We want the probability of the sequence over all paths, i.e.: P(S) = P(s_1 ... s_L) = Σ_Π P(s_1 ... s_L, Π). The number of possible paths grows exponentially: there are N^L possible paths for N states and a sequence of length L. The Forward algorithm allows us to calculate this probability efficiently. This algorithm will also allow us to calculate the probability of being in a specific state k at a position i of the sequence.
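For a short sequence, the sum over all paths can still be done by brute force, which makes the exponential blow-up explicit and gives a reference value to compare the Forward algorithm against; TRANS, EMIT, START are the assumed coin-model dictionaries from the earlier sketch.

```python
from itertools import product

def total_probability_bruteforce(seq, states, trans, emit, start, end):
    """P(S) = sum over all N^L paths of P(S, Pi); only feasible for short sequences."""
    total = 0.0
    for path in product(states, repeat=len(seq)):
        p = start[path[0]] * emit[path[0]][seq[0]]
        for i in range(1, len(seq)):
            p *= trans[path[i - 1]][path[i]] * emit[path[i]][seq[i]]
        total += p * end[path[-1]]
    return total

# For the coin model (no explicit end state):
# total_probability_bruteforce("HTTH", ("F", "L"), TRANS, EMIT, START, {"F": 1.0, "L": 1.0})
# agrees with the Forward algorithm below, but the number of paths (2^4 here) grows exponentially.
```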

The Forward algorithm. We define the probability of generating the first i letters of S and ending up in state k (forward probability): f_k(i) = P(s_1 ... s_i, π_i = k). Then
f_k(i) = P(s_1 ... s_i, π_i = k)
= Σ_{π_1 ... π_{i-1}} P(s_1 ... s_{i-1}, π_1 ... π_{i-1}, π_i = k) e_k(s_i)
= Σ_l Σ_{π_1 ... π_{i-2}} P(s_1 ... s_{i-1}, π_1 ... π_{i-2}, π_{i-1} = l) a_lk e_k(s_i)
= Σ_l P(s_1 ... s_{i-1}, π_{i-1} = l) a_lk e_k(s_i)
= e_k(s_i) Σ_l f_l(i-1) a_lk.
We obtain a recursion formula for this probability, so we can apply a dynamic programming approach.

The Forward algorithm. 1.- Initialization: f_begin(0) = 1; f_k(0) = 0 for k ≠ begin (the probability that we are in the begin state and have observed zero characters from the sequence). 2.- Recursion: for positions i = 0, ..., L-1 and states l ∈ Q: f_l(i+1) = e_l(s_{i+1}) Σ_{k ∈ Q} f_k(i) a_kl. 3.- Termination: P(S) = Σ_{k ∈ Q} f_k(L) a_{k,end} (the probability that we are in the end state and have observed the entire sequence). The Forward algorithm is very similar to Viterbi; the only difference is that here we sum instead of maximizing.
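A sketch of the Forward recursion in the same dictionary style as the Viterbi sketch (start/end again stand in for the begin/end transitions); note that for long sequences the probabilities underflow, so in practice one works with scaled values or log-sum-exp.

```python
def forward(seq, states, trans, emit, start, end):
    """Forward algorithm: returns (P(S), forward matrix f).
    f[i][k] corresponds to f_k(i+1) on the slides."""
    L = len(seq)
    # Initialisation: f_k(1) = a_{begin,k} e_k(s_1)
    f = [{k: start[k] * emit[k][seq[0]] for k in states}]
    # Recursion: f_l(i+1) = e_l(s_{i+1}) * sum_k f_k(i) a_kl
    for i in range(1, L):
        f.append({
            l: emit[l][seq[i]] * sum(f[i - 1][k] * trans[k][l] for k in states)
            for l in states
        })
    # Termination: P(S) = sum_k f_k(L) a_{k,end}
    return sum(f[L - 1][k] * end[k] for k in states), f

# Example with the coin model (no explicit end state, so end = 1 for all states):
# p, f = forward("HTTHHHHT", ("F", "L"), TRANS, EMIT, START, {"F": 1.0, "L": 1.0})
```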

The Forward algorithm: Example S = TAGA

The Forward algorithm. In some cases the algorithm can be made more efficient by taking into account the minimum number of steps that must be taken to reach a state. E.g. for this model, we do not need to initialize or calculate the values f_3(0), f_4(0), f_5(0), f_5(1).

HMM questions. The LEARNING problem: what are the parameters of the model? Maximum likelihood estimation, Baum-Welch, EM. The EVALUATION problem: how likely is a given sequence? The Forward algorithm. The DECODING problem: what is the most probable path for generating a given sequence? The Viterbi algorithm. The POSTERIOR DECODING problem: which is the most likely state at a given position? The Forward-Backward algorithm.

HMM questions. The POSTERIOR DECODING problem: given a sequence of heads and tails, HTTHHHHHHTTHTHHTHTTTHTHTHHTTHHTHHHHHHTTTTHTHHTHT, is it possible to catch the dealer cheating? Goal: find the probability that the dealer was using a biased coin at a particular time, e.g. P(Loaded) = 80% at a given position.

The posterior probability of a state (Backward algorithm). We want to calculate the most probable state given an observation s_i; for instance, we want to catch the coin dealer cheating. For this, we calculate the probability distribution for the i-th position, i.e. the probability that a given position in the sequence has a particular underlying state: P(π_i = k | S). We can compute this posterior probability as: P(π_i = k | S) = P(S, π_i = k) / P(S).

The posterior probability of a state (Backward algorithm). We can compute the posterior probability as P(π_i = k | S) = P(S, π_i = k) / P(S). Consider the joint probability: P(S, π_i = k) = P(s_1 ... s_i, π_i = k) P(s_{i+1} ... s_L | s_1 ... s_i, π_i = k) = P(s_1 ... s_i, π_i = k) P(s_{i+1} ... s_L | π_i = k) = f_k(i) b_k(i). The first factor, the probability of generating the first i letters of S and ending up in state k, is the forward probability f_k(i). The second factor, the probability of generating the rest of the sequence (s_{i+1} ... s_L) starting from state k and ending at the end state, is the backward probability b_k(i).

The posterior probability of a state (Backward algorithm). In P(π_i = k | S) = P(S, π_i = k) / P(S), the numerator is given by the Forward and Backward algorithms and the denominator by the Forward algorithm. The backward probability satisfies:
b_k(i) = P(s_{i+1} ... s_L | π_i = k)
= Σ_{π_{i+1} ... π_N} P(s_{i+1} s_{i+2} ... s_L, π_{i+1} ... π_N | π_i = k)
= Σ_l Σ_{π_{i+2} ... π_N} P(s_{i+1} s_{i+2} ... s_L, π_{i+1} = l, π_{i+2} ... π_N | π_i = k)
= Σ_l e_l(s_{i+1}) a_kl P(s_{i+2} ... s_L | π_{i+1} = l)
= Σ_l e_l(s_{i+1}) a_kl b_l(i+1).
We can again apply dynamic programming to calculate the backward probabilities.

The posterior probability of a state (Backward algorithm). 1.- Initialization (i = L): b_k(L) = a_{k,end} for all k ∈ Q. 2.- Recursion: for positions i = L-1, ..., 1 and states k ∈ Q: b_k(i) = Σ_{l ∈ Q} a_kl e_l(s_{i+1}) b_l(i+1). 3.- Termination: P(S) = Σ_{l ∈ Q} a_{begin,l} e_l(s_1) b_l(1) (not usually necessary, as we normally use the Forward algorithm to derive P(S)). Combining the two: P(π_i = k, S) = f_k(i) b_k(i) and P(π_i = k, S) = P(π_i = k | S) P(S), so the posterior probability is P(π_i = k | S) = f_k(i) b_k(i) / P(S).
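A sketch of the Backward recursion and of the posterior P(π_i = k | S) = f_k(i) b_k(i) / P(S), reusing the forward() function and the coin-model dictionaries from the earlier sketches (all names are assumptions of these sketches, not of the slides).

```python
def backward(seq, states, trans, emit, end):
    """Backward algorithm: b[i][k] corresponds to b_k(i+1) = P(s_{i+2}...s_L | pi_{i+1} = k)."""
    L = len(seq)
    b = [dict() for _ in range(L)]
    b[L - 1] = {k: end[k] for k in states}                 # b_k(L) = a_{k,end}
    for i in range(L - 2, -1, -1):
        # b_k(i) = sum_l a_kl e_l(s_{i+1}) b_l(i+1)
        b[i] = {
            k: sum(trans[k][l] * emit[l][seq[i + 1]] * b[i + 1][l] for l in states)
            for k in states
        }
    return b

def posterior(seq, states, trans, emit, start, end):
    """Posterior state probabilities P(pi_i = k | S) = f_k(i) b_k(i) / P(S)."""
    p_s, f = forward(seq, states, trans, emit, start, end)   # forward() from the earlier sketch
    b = backward(seq, states, trans, emit, end)
    return [{k: f[i][k] * b[i][k] / p_s for k in states} for i in range(len(seq))]

# Example: probability that the loaded coin was in use at each toss
# post = posterior("HTTHHHHHHT", ("F", "L"), TRANS, EMIT, START, {"F": 1.0, "L": 1.0})
# loaded_track = [round(p["L"], 2) for p in post]
```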

Posterior decoding. At each position we can ask which state most likely produced that character of the sequence, i.e. the probability that a symbol was produced by a given state, given the full sequence. E.g. we may want to know, for each G, the probability that it was emitted by state D (the donor site), comparing the Viterbi path with the Forward-Backward (posterior) probabilities.

Posterior decoding. If P(Fair) > 0.5 at a given position, the corresponding observation (a coin toss here, or a die roll in the casino example) is more likely to have been generated by the fair state than by the loaded one.

References. What is a hidden Markov model? Sean R. Eddy (2004). Nature Biotechnology 22(10):1315. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Cambridge University Press, 1998. Problems and Solutions in Biological Sequence Analysis. Mark Borodovsky, Svetlana Ekisheva. Cambridge University Press, 2006.