Hidden Markov Models (Part )
BMI/CS 576
www.biostat.wisc.edu/bmi576.html
Mark Craven
craven@biostat.wisc.edu
Fall

A simple HMM
[Figure: an HMM with two states, each able to emit A, C, G, and T]
- given, say, a T in our input sequence, which state emitted it?
The hidden part of the problem
- we'll distinguish between the observed parts of a problem and the hidden parts
- in the Markov models we've considered previously, it is clear which state accounts for each part of the observed sequence
- in the model above, there are multiple states that could account for each part of the observed sequence; this is the hidden part of the problem

Simple HMM for gene finding
Figure from A. Krogh, "An Introduction to Hidden Markov Models for Biological Sequences"
[Figure: a gene-finding HMM with start states, single-nucleotide A/C/G/T states, and states labeled with 5-mers such as AAAAA, TTTTT, CTACA, CTACC, CTACG, CTACT, GCTAC]
The parameters of an HMM
- as in Markov chain models, we have transition probabilities:

      a_kl = P(π_i = l | π_{i-1} = k)

  the probability of a transition from state k to state l
- π represents a path (a sequence of states) through the model
- since we've decoupled states and characters, we also have emission probabilities:

      e_k(b) = P(x_i = b | π_i = k)

  the probability of emitting character b in state k

A simple HMM with emission parameters
[Figure: an HMM with a silent start state (0) and end state (5); each edge is labeled with its transition probability, and each emitting state lists its emission probabilities for A, C, G, and T]
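The two parameter sets can be represented directly as nested dictionaries. This is a minimal sketch: the state numbering (0 = silent start, 1-2 = emitting, 3 = silent end) and all of the probabilities below are illustrative assumptions, not the values from the lecture's figure.

```python
# transition probabilities: A[k][l] = P(pi_i = l | pi_{i-1} = k)
# (toy values -- assumptions for illustration, not the lecture's figure)
A = {
    0: {1: 0.5, 2: 0.5},
    1: {1: 0.5, 2: 0.3, 3: 0.2},
    2: {1: 0.3, 2: 0.5, 3: 0.2},
}

# emission probabilities: E[k][b] = P(x_i = b | pi_i = k)
E = {
    1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
    2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

# sanity check: every state's outgoing-transition distribution and
# emission distribution must each sum to 1
for row in list(A.values()) + list(E.values()):
    assert abs(sum(row.values()) - 1.0) < 1e-9
```

Silent states (the start and end states here) simply have no entry in `E`.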
Three important questions
- How likely is a given sequence? (the Forward algorithm)
- What is the most probable path for generating a given sequence? (the Viterbi algorithm)
- How can we learn the HMM parameters given a set of sequences? (the Forward-Backward / Baum-Welch algorithm)

Path notation
- let π be a vector representing a path through the HMM
[Figure: the simple HMM with its transition and emission probabilities]
- e.g. for x_1 = A, x_2 = A, x_3 = C, one path is π_0 = 0, π_1 = 1, π_2 = 1, π_3 = 3, π_4 = 5
How likely is a given sequence?
- the probability that the path π_0 … π_N is taken and the sequence x_1 … x_L is generated:

      P(x_1 … x_L, π_0 … π_N) = a_{0 π_1} × ∏_{i=1..L} e_{π_i}(x_i) × a_{π_i π_{i+1}}

  (assuming begin/end are the only silent states on the path)

How likely is a given sequence?
[Figure: the simple HMM with its transition and emission probabilities]

      P(AAC, π) = a_{01} × e_1(A) × a_{11} × e_1(A) × a_{13} × e_3(C) × a_{35}
                = 0.5 × … × 0.6

  (the intermediate factors are read off the emission and transition probabilities in the figure)
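The product above can be sketched directly in code. This uses a toy parameterization (states 0 = silent start, 1-2 = emitting, 3 = silent end) whose probabilities are illustrative assumptions, not the values from the lecture's figure.

```python
# toy HMM parameters (assumptions for illustration)
A = {0: {1: 0.5, 2: 0.5},
     1: {1: 0.5, 2: 0.3, 3: 0.2},
     2: {1: 0.3, 2: 0.5, 3: 0.2}}
E = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def joint_prob(x, path, A, E, end=3):
    """P(x, path): path lists the emitting states pi_1..pi_L."""
    p = A[0][path[0]]                      # a_{0, pi_1}
    for i, (ch, k) in enumerate(zip(x, path)):
        nxt = path[i + 1] if i + 1 < len(path) else end
        p *= E[k][ch] * A[k].get(nxt, 0.0)  # e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}}
    return p

print(joint_prob("AAC", [1, 1, 2], A, E))
```

Each emitted character contributes one emission factor and one transition factor, bracketed by the transition out of the start state and into the end state.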
How likely is a given sequence?
- the probability over all paths is:

      P(x_1 … x_L) = Σ_π P(x_1 … x_L, π_0 … π_N)

Number of paths
- for a sequence of length L, how many possible paths through this HMM are there?
[Figure: an HMM with two emitting states]
- 2^L
- the Forward algorithm enables us to compute P(x_1 … x_L) efficiently
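For short sequences, the sum over all paths can be checked by brute force: enumerate every length-L assignment of emitting states (2^L of them for a two-emitting-state model) and sum the joint probabilities. The toy parameters are illustrative assumptions, not the lecture's.

```python
from itertools import product

# toy HMM parameters (assumptions for illustration)
A = {0: {1: 0.5, 2: 0.5},
     1: {1: 0.5, 2: 0.3, 3: 0.2},
     2: {1: 0.3, 2: 0.5, 3: 0.2}}
E = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def joint_prob(x, path, A, E, end=3):
    # P(x, path) for one assignment of emitting states pi_1..pi_L
    p = A[0][path[0]]
    for i, (ch, k) in enumerate(zip(x, path)):
        nxt = path[i + 1] if i + 1 < len(path) else end
        p *= E[k][ch] * A[k].get(nxt, 0.0)
    return p

def total_prob_brute_force(x, A, E):
    # P(x) = sum over all 2^L paths of P(x, pi) -- exponential in L,
    # which is exactly why the Forward algorithm is needed
    emitting = list(E)
    return sum(joint_prob(x, pi, A, E)
               for pi in product(emitting, repeat=len(x)))

print(total_prob_brute_force("TAGA", A, E))
```

This is exponential in L, so it is only a correctness check; the Forward algorithm below computes the same quantity in time linear in L.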
How likely is a given sequence: the Forward algorithm
- define f_k(i) to be the probability of being in state k having observed the first i characters of x
- we want to compute f_N(L), the probability of being in the end state having observed all of x
- can define this recursively

The Forward algorithm
- because of the Markov property, we don't have to explicitly enumerate every path; we can use dynamic programming instead
[Figure: the simple HMM with its transition and emission probabilities]
- e.g. compute f_k(i) using the values f_j(i-1) for the states j with transitions into k
The Forward algorithm
initialization:

      f_0(0) = 1      probability that we're in the start state and have observed 0 characters from the sequence
      f_k(0) = 0, for all k that are not silent states

The Forward algorithm
recursion for emitting states (i = 1 … L):

      f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl

recursion for silent states:

      f_l(i) = Σ_k f_k(i) a_kl
The Forward algorithm
termination:

      P(x) = P(x_1 … x_L) = f_N(L) = Σ_k f_k(L) a_kN

  the probability that we're in the end state and have observed the entire sequence

Forward algorithm example
[Figure: the simple HMM with its transition and emission probabilities]
- given the sequence x = TAGA
Forward algorithm example
- given the sequence x = TAGA
- initialization:

      f_0(0) = 1,   f_1(0) = 0,   f_2(0) = 0

- computing other values:

      f_1(1) = e_1(T) × (f_0(0) a_{01} + f_1(0) a_{11})
      f_2(1) = e_2(T) × (f_0(0) a_{02} + f_2(0) a_{22})
      f_1(2) = e_1(A) × (f_0(1) a_{01} + f_1(1) a_{11})
      …
      P(TAGA) = f_5(4) = f_3(4) a_{35} + f_4(4) a_{45}

  (the numeric values plug in from the transition and emission probabilities in the figure)

Forward algorithm note
- in some cases, we can make the algorithm more efficient by taking into account the minimum number of steps that must be taken to reach a state
- e.g. for this HMM, we don't need to initialize or compute the values f_3(0), f_4(0), f_5(0), f_5(1)
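The initialization, recursion, and termination steps above can be sketched as one function. This follows the toy parameterization used earlier (states 0 = silent start, 1-2 = emitting, 3 = silent end); the probabilities are illustrative assumptions, not the lecture's values.

```python
# toy HMM parameters (assumptions for illustration)
A = {0: {1: 0.5, 2: 0.5},
     1: {1: 0.5, 2: 0.3, 3: 0.2},
     2: {1: 0.3, 2: 0.5, 3: 0.2}}
E = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def forward(x, A, E, start=0, end=3):
    """P(x) via the Forward algorithm: f[k][i] = P(x_1..x_i, pi_i = k)."""
    emitting = list(E)
    # initialization: f_0(0) = 1; f_k(0) = 0 for the non-silent states
    f = {start: [1.0] + [0.0] * len(x)}
    for k in emitting:
        f[k] = [0.0] * (len(x) + 1)
    # recursion for emitting states: f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl
    for i, ch in enumerate(x, start=1):
        for l in emitting:
            f[l][i] = E[l][ch] * sum(f[k][i - 1] * A[k].get(l, 0.0)
                                     for k in A)
    # termination: P(x) = f_N(L) = sum_k f_k(L) a_kN
    return sum(f[k][len(x)] * A[k].get(end, 0.0) for k in emitting)

print(forward("TAGA", A, E))
```

Because each column of `f` depends only on the previous column, the cost is linear in L (times the number of transitions), rather than the 2^L cost of enumerating paths.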
Three important questions
- How likely is a given sequence?
- What is the most probable path for generating a given sequence?
- How can we learn the HMM parameters given a set of sequences?

Finding the most probable path: the Viterbi algorithm
- define v_k(i) to be the probability of the most probable path accounting for the first i characters of x and ending in state k
- we want to compute v_N(L), the probability of the most probable path accounting for all of the sequence and ending in the end state
- can define recursively; use dynamic programming to find v_N(L) efficiently
Finding the most probable path: the Viterbi algorithm
- initialization:

      v_0(0) = 1
      v_k(0) = 0, for all k that are not silent states

The Viterbi algorithm
- recursion for emitting states (i = 1 … L):

      v_l(i) = e_l(x_i) max_k [v_k(i-1) a_kl]
      ptr_l(i) = argmax_k [v_k(i-1) a_kl]

- recursion for silent states:

      v_l(i) = max_k [v_k(i) a_kl]
      ptr_l(i) = argmax_k [v_k(i) a_kl]

  the ptr values keep track of the most probable path
The Viterbi algorithm
- termination:

      P(x, π) = max_k (v_k(L) a_kN)
      π_L = argmax_k (v_k(L) a_kN)

- traceback: follow pointers back starting at π_L

Three important questions
- How likely is a given sequence?
- What is the most probable path for generating a given sequence?
- How can we learn the HMM parameters given a set of sequences?
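The Viterbi recursion, termination, and traceback can be sketched the same way as the Forward algorithm, replacing the sum with a max and recording pointers. As before, the toy parameters (states 0 = silent start, 1-2 = emitting, 3 = silent end) are illustrative assumptions, not the lecture's values.

```python
# toy HMM parameters (assumptions for illustration)
A = {0: {1: 0.5, 2: 0.5},
     1: {1: 0.5, 2: 0.3, 3: 0.2},
     2: {1: 0.3, 2: 0.5, 3: 0.2}}
E = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def viterbi(x, A, E, start=0, end=3):
    """Most probable path: returns (P(x, pi*), [pi_1, ..., pi_L])."""
    emitting = list(E)
    # initialization: v_0(0) = 1; v_k(0) = 0 for non-silent states
    v = {start: [1.0] + [0.0] * len(x)}
    ptr = {k: [None] * (len(x) + 1) for k in emitting}
    for k in emitting:
        v[k] = [0.0] * (len(x) + 1)
    # recursion: v_l(i) = e_l(x_i) * max_k v_k(i-1) a_kl, with pointers
    for i, ch in enumerate(x, start=1):
        for l in emitting:
            best_k = max(v, key=lambda k: v[k][i - 1] * A[k].get(l, 0.0))
            v[l][i] = E[l][ch] * v[best_k][i - 1] * A[best_k].get(l, 0.0)
            ptr[l][i] = best_k
    # termination: pi_L = argmax_k v_k(L) a_kN
    last = max(emitting, key=lambda k: v[k][len(x)] * A[k].get(end, 0.0))
    p = v[last][len(x)] * A[last].get(end, 0.0)
    # traceback: follow pointers back from pi_L to pi_1
    path = [last]
    for i in range(len(x), 1, -1):
        path.append(ptr[path[-1]][i])
    return p, path[::-1]

print(viterbi("TAGA", A, E))
```

The only structural differences from `forward` are the max in place of the sum, the pointer table, and the traceback loop that reads the path out in reverse.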