Genome 373: Hidden Markov Models II Doug Fowler
Review From Hidden Markov Models I

What does a Markov model describe? A Markov model describes a random process of transitions from one state to another in a state space. [Diagram: two states, A and T, with transitions between them]

What does a Markov model produce, if used generatively? A sequence of states/symbols, e.g. AAAATTTT. And what governs the sequence? The transition probabilities.
Review From Hidden Markov Models I

What is hidden in a hidden Markov model, and how does this relate to emission probabilities? [Diagram: two hidden states, A-rich (A: 0.8, T: 0.2) and T-rich (A: 0.2, T: 0.8)]

Sequence: AAAATTTT. State path: ????????. In an HMM, the states are unknown to us, and each state is associated with a set of emission probabilities, so that many different state paths can generate a given sequence.
Review From Hidden Markov Models I

Sequence: AAAATTTT. State path #1: aaaatttt. State path #2: ttttaaaa.

Finally, recall that we can calculate the probability of any particular (hidden) state path giving rise to a sequence:

$P(x, \pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$

where $a_{0\pi_1}$ is the probability of the initial state, $e_{\pi_i}(x_i)$ is the probability of emitting symbol $x_i$ in state $\pi_i$, and $a_{\pi_i \pi_{i+1}}$ is the probability of the transition from state $\pi_i$ to state $\pi_{i+1}$.

This is the crux of an HMM, and it illustrates how we can calculate the probability of a particular state path if we have a model and its emission/transition probabilities.
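To make the formula concrete, here is a minimal Python sketch of $P(x, \pi)$. The dictionary layout and state names are illustrative, not from the slides; the emission probabilities are the review model's, and the 0.9 stay / 0.1 switch transition probabilities are the values used in the Viterbi example later in this lecture.

```python
# A minimal sketch of the path probability formula above. The model
# parameters are illustrative: emissions from the review slide, and the
# 0.9 stay / 0.1 switch transitions used in the Viterbi example below.
init = {'a-rich': 0.5, 't-rich': 0.5}                 # a_{0,pi_1}
trans = {'a-rich': {'a-rich': 0.9, 't-rich': 0.1},    # a_{pi_i, pi_{i+1}}
         't-rich': {'a-rich': 0.1, 't-rich': 0.9}}
emit = {'a-rich': {'A': 0.8, 'T': 0.2},               # e_{pi_i}(x_i)
        't-rich': {'A': 0.2, 'T': 0.8}}

def joint_probability(x, path):
    """P(x, pi): initial prob times emissions and transitions along the path."""
    p = init[path[0]] * emit[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= trans[path[i - 1]][path[i]] * emit[path[i]][x[i]]
    return p

x = 'AAAATTTT'
print(joint_probability(x, ['a-rich'] * 4 + ['t-rich'] * 4))  # state path #1
print(joint_probability(x, ['t-rich'] * 4 + ['a-rich'] * 4))  # state path #2
```

Path #1 comes out several orders of magnitude more probable than path #2, which is the point of the slide: many paths can generate the same sequence, but not with equal probability.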
Outline

The Viterbi Algorithm (or: how can we find the most probable state path?)
The Forward-Backward Algorithm (or: how can we find the probability of a state at a particular position?)

What is an algorithm, anyhow? A procedure for solving a problem.
Recalling Our Motivation

Given a sequence, we want to be able to predict the major features of genes in the sequence (e.g. create gene models). [Figure: a long DNA sequence shown raw, then annotated with gene features: Start, Exon 1, Intron 1, Exon 2, Stop]
Recalling Our Motivation

We want a model that can predict whether each base in a sequence is in one of a known set of states (intergenic, start, exon, intron, stop). [Figure: the same annotated DNA sequence]
How Can We Find the Most Probable Path?

Can anyone tell me a way to find the most probable state path? Hint: we talked about a way to calculate the probability of any individual state path given a sequence:

$P(x, \pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$
Could We Work Out Every Possibility? No.

Simplest answer: calculate all possible state path probabilities and choose the largest. However, there is a big problem with this way of doing things: the number of possible state paths is very large. In fact, there are $S^N$ possibilities for $S$ states and $N$ symbols. [Diagram: the two-state A-rich/T-rich model] With two states and 100 positions, that is $2^{100} \approx 1.3 \times 10^{30}$ paths; even a fast computer won't help you much.
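To see why, a brute-force enumerator is easy to write but only usable for toy inputs. This sketch reuses the hypothetical joint_probability() and model dictionaries from the earlier example:

```python
from itertools import product

# Brute force: enumerate all S^N state paths and keep the best one.
# Reuses joint_probability() and the model sketched above; only
# feasible for toy sequences.
def best_path_brute_force(x, states=('a-rich', 't-rich')):
    return max(product(states, repeat=len(x)),
               key=lambda path: joint_probability(x, path))

print(best_path_brute_force('AAT'))
# Fine for ~10 symbols; for 100 positions there are 2**100 ~ 1.3e30 paths.
```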
Sound Like A Familiar Problem?

You have all seen something very similar to this already: sequence alignment. The goal is to find the optimal path without having to explicitly test every possibility. There, you learned about dynamic programming approaches that find the best alignment between two sequences without examining all the possibilities. The Viterbi algorithm is similar, finding the most probable state path given a sequence and a model without examining all the possible state paths.
The Viterbi Algorithm

[Diagram: A-rich state (emits A: 0.7, T: 0.3) and T-rich state (emits A: 0.2, T: 0.8); each state stays put with probability 0.9 and switches with probability 0.1, the values used in the calculations below]

Let's go back to our A/T example, with one small change (note the A-rich emission probabilities). Can someone talk through the parts of this model?
The Viterbi Algorithm

We can write down a graph of all possible state paths for an example sequence (AAT). [Trellis: Begin (probability 0.5 into each state), then an a-rich and a t-rich node at each of the three positions, emitting A, A, T]
The Viterbi Algorithm

We calculate the probability of each transition/emission step. Position 1: a-rich: 0.5 × 0.7 = 0.35; t-rich: 0.5 × 0.2 = 0.1.
The Viterbi Algorithm

What calculation should we do here, to get the joint probability P(AA, a-rich, a-rich) of seeing AA along the path a-rich → a-rich?
The Viterbi Algorithm

0.35 × 0.9 × 0.7 = 0.22. Multiply the 0.35 from P(A, a-rich) by the appropriate transition (0.9) and emission (0.7) probabilities.
The Viterbi Algorithm

We calculate the probability of each transition/emission step for every path into the state. From t-rich: 0.1 × 0.1 × 0.7 = 0.007.
The Viterbi Algorithm

We calculate the probability of each transition/emission step and discard all but the most likely path leading to each state: a-rich at position 2 keeps 0.22 and discards 0.007.
The Viterbi Algorithm

How about for the t-rich state at position 2? Take a minute to do the two probability calculations.
The Viterbi Algorithm

From a-rich: 0.35 × 0.1 × 0.2 = 0.007. From t-rich: 0.1 × 0.9 × 0.2 = 0.018. And which path should we discard?
The Viterbi Algorithm

Here is the answer for the third step. Into a-rich: 0.22 × 0.9 × 0.3 = 0.0594; into t-rich: 0.22 × 0.1 × 0.8 = 0.0176. Of the 8 possible paths (2 states over 3 symbols, so 2^3 = 8), the Viterbi algorithm leaves us with two.
The Viterbi Algorithm

Which should we pick? We pick the most likely: the path ending in a-rich with probability 0.0594, i.e. the state path a-rich, a-rich, a-rich.
The Viterbi Algorithm

Note that we didn't switch to the t-rich state at this step. What would happen if we kept getting T's?
The Viterbi Algorithm

Eventually, the likeliest state path would become one with a transition to the t-rich state!
The Viterbi Algorithm

Said another way: if several paths converge on a particular state, then instead of recalculating them all when we compute probabilities for the next step, we discard the less likely paths and carry forward only the best.
The Viterbi Algorithm

For practical reasons, we typically operate in log space (i.e. take the log of the probabilities and add rather than multiply), since the probabilities get very small very quickly.
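Putting the walkthrough together, here is a sketch of the Viterbi algorithm in log space for the two-state model on these slides. The function and variable names are illustrative, not from the lecture:

```python
import math

# A sketch of the Viterbi algorithm in log space, for the A-rich/T-rich
# model above (0.5 initial probabilities, 0.9 stay / 0.1 switch).
def viterbi(x, states, init, trans, emit):
    # V[k]: log probability of the best path ending in state k so far.
    V = {k: math.log(init[k]) + math.log(emit[k][x[0]]) for k in states}
    back = []  # back[i][k]: predecessor state of k on that best path
    for i in range(1, len(x)):
        V_new, ptr = {}, {}
        for k in states:
            # Keep only the likeliest path into k; discard the rest.
            prev = max(states, key=lambda l: V[l] + math.log(trans[l][k]))
            V_new[k] = V[prev] + math.log(trans[prev][k]) + math.log(emit[k][x[i]])
            ptr[k] = prev
        V = V_new
        back.append(ptr)
    # Trace back from the likeliest final state.
    state = max(states, key=V.get)
    path = [state]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, math.exp(V[state])

states = ('a-rich', 't-rich')
init = {'a-rich': 0.5, 't-rich': 0.5}
trans = {'a-rich': {'a-rich': 0.9, 't-rich': 0.1},
         't-rich': {'a-rich': 0.1, 't-rich': 0.9}}
emit = {'a-rich': {'A': 0.7, 'T': 0.3},
        't-rich': {'A': 0.2, 'T': 0.8}}
path, p = viterbi('AAT', states, init, trans, emit)
print(path, p)  # ['a-rich', 'a-rich', 'a-rich'], ~0.0595 (the slides'
                # 0.0594 comes from rounding 0.2205 to 0.22 at step 2)
```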
Outline

The Viterbi Algorithm (or: how can we find the most probable state path?)
The Forward-Backward Algorithm (or: how can we find the probability of a state at a particular position?)
A slightly different question

$P(\pi_i = k \mid x)$: what if we are interested in the probability that the HMM was in a particular state $k$ at a particular position $i$? Any thoughts about conceptually how to do this?
A slightly different question

$P(\pi_i = k \mid x) = \dfrac{P(x, \pi_i = k)}{P(x)}$, where $P(x, \pi_i = k) = \sum_{\pi : \pi_i = k} P(\pi, x)$ and $P(x) = \sum_{\pi} P(\pi, x)$

We can obtain this probability by dividing the summed probability of all state paths with $\pi_i = k$ by the sum of the probabilities of all paths.
A slightly different question

What problem are we going to run into here, without an algorithm to help?
A slightly different question

As before, the number of possible state paths is too large to brute-force.
The forward-backward algorithm

Let's revisit our simple example. Our goal is to calculate the probability, given the model and the sequence, that the state at position 2 was t-rich: $P(\pi_2 = \text{t-rich} \mid AAT)$. [Trellis for the sequence A, A, T]
The forward-backward algorithm

What arrows should we remove to illustrate the possible paths through state space that correspond to our question?
The forward-backward algorithm

These are the possible paths through state space where $\pi_2 = \text{t-rich}$. [Trellis showing only the paths that pass through t-rich at position 2]
The forward-backward algorithm

Let's first consider just the forward part of the problem: the probability of seeing AA and reaching the t-rich state.
The forward-backward algorithm

$f_{\text{t-rich}}(2) = 0.2 \times (0.35 \times 0.1 + 0.1 \times 0.9)$

In the forward algorithm, we sum all the joint transition/emission probabilities of paths leading to $\pi_2 = \text{t-rich}$.
The forward-backward algorithm

$f_{\text{t-rich}}(2) = 0.025$: this gives us the probability of seeing AA and reaching the t-rich state.
The forward-backward algorithm

But if our goal is to calculate $P(\pi_2 = \text{t-rich} \mid AAT)$, we're not done yet. Why? Because we have the rest of the sequence to account for!
The forward-backward algorithm

So let's consider the backward part of the problem: the probability of generating the rest of the sequence given that $\pi_2 = \text{t-rich}$.
The forward-backward algorithm

$b_{\text{t-rich}}(2) = 0.1 \times 0.3 + 0.9 \times 0.8 = 0.75$

In the backward algorithm, we sum the transition and emission probabilities over all the states the model could move to next.
The forward-backward algorithm

Now we can solve our problem:

$P(\pi_2 = \text{t-rich} \mid AAT) = \dfrac{f_{\text{t-rich}}(2)\, b_{\text{t-rich}}(2)}{P(x)}$

The probability of the model being in the t-rich state at position 2 is the product of the forward and backward probabilities, divided by the probability of all paths, $P(x)$. How could we obtain this quantity?
The forward-backward algorithm

One way is to use the forward algorithm all the way to the end of the sequence, summing over all the possible ending states at the final position: $P(x) = \sum_k f_k(L)$.
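Carrying the numbers through for our example (a worked check; these intermediate values are not on the original slides): $f_{\text{a-rich}}(1) = 0.35$ and $f_{\text{t-rich}}(1) = 0.1$; $f_{\text{a-rich}}(2) = 0.7 \times (0.35 \times 0.9 + 0.1 \times 0.1) = 0.2275$; $f_{\text{a-rich}}(3) = 0.3 \times (0.2275 \times 0.9 + 0.025 \times 0.1) \approx 0.0622$; $f_{\text{t-rich}}(3) = 0.8 \times (0.2275 \times 0.1 + 0.025 \times 0.9) \approx 0.0362$. So $P(x) \approx 0.0622 + 0.0362 = 0.0984$, and $P(\pi_2 = \text{t-rich} \mid AAT) \approx (0.025 \times 0.75)/0.0984 \approx 0.19$.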
General form of the F-B Algorithm

$f_k(i) = e_k(x_i) \sum_l f_l(i-1)\, a_{lk}$

where $e_k(x_i)$ is the probability of emitting $x_i$ given $\pi_i = k$, $a_{lk}$ is the probability of transitioning from state $l$ to state $k$, $f_k(i)$ is the probability of the sequence from 1 to $i$ with $\pi_i = k$, and $f_l(i-1)$ is the probability of the sequence from 1 to $i-1$ with $\pi_{i-1} = l$.

In our simple example with a three-symbol sequence, we calculated one step forward and one step backward to reach the middle position. The forward algorithm is recursive, with each calculation being reused rather than recomputed.
General form of the F-B Algorithm

$f_k(i) = e_k(x_i) \sum_l f_l(i-1)\, a_{lk} \qquad b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$

The power of these algorithms is that they eliminate the need to calculate all possible state paths.
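Here is a sketch of the two recursions and the posterior $P(\pi_i = k \mid x)$, written in plain probabilities for clarity (a real implementation would scale or work in log space). It reuses the model dictionaries from the Viterbi sketch; the function names are illustrative:

```python
# Forward recursion: f_k(i) = e_k(x_i) * sum_l f_l(i-1) * a_lk
def forward(x, states, init, trans, emit):
    f = [{k: init[k] * emit[k][x[0]] for k in states}]
    for i in range(1, len(x)):
        f.append({k: emit[k][x[i]] * sum(f[-1][l] * trans[l][k] for l in states)
                  for k in states})
    return f

# Backward recursion: b_k(i) = sum_l a_kl * e_l(x_{i+1}) * b_l(i+1)
def backward(x, states, trans, emit):
    b = [{k: 1.0 for k in states}]  # b_k(L) = 1 for every state k
    for i in range(len(x) - 2, -1, -1):
        b.insert(0, {k: sum(trans[k][l] * emit[l][x[i + 1]] * b[0][l]
                            for l in states) for k in states})
    return b

# Posterior: P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)
def posteriors(x, states, init, trans, emit):
    f = forward(x, states, init, trans, emit)
    b = backward(x, states, trans, emit)
    px = sum(f[-1][k] for k in states)  # P(x): forward summed over final states
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]

post = posteriors('AAT', states, init, trans, emit)
print(round(post[1]['t-rich'], 2))  # ~0.19, matching the worked example above
```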
Viterbi vs F-B Algorithms

Viterbi. Start: at the beginning of the sequence of symbols x. Algorithm: moving forward, find the likeliest path leading to each state at each position and discard all the rest. Result: the most likely state path.

This process is often referred to as decoding an HMM, because it reveals the most likely sequence of (hidden, encoded) states.
Viterbi vs F-B Algorithms

F-B Algorithm. Start: at the i-th position. Algorithm: moving forward or backward, sum the probabilities of the paths leading to a particular state $\pi_i = k$. Result: the probability that the model was in state k at position i.

The F-B algorithm can be used to solve many other decoding problems (e.g. find the most probable state at each position i).
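That last decoding problem (often called posterior decoding) falls straight out of the posteriors() sketch above; a hedged one-liner, assuming the same model dictionaries:

```python
# Posterior decoding: pick the individually most probable state at each
# position. Note this can differ from the single best Viterbi path.
decoded = [max(p, key=p.get) for p in posteriors('AAT', states, init, trans, emit)]
print(decoded)  # ['a-rich', 'a-rich', 'a-rich'] for this short example
```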