Hidden Markov Models (HMMs) November 14, 2017
inferring a hidden truth 1) You hear a static-filled radio transmission. How can you determine what the sender intended to say? 2) You know that genes have open reading frames and are segmented (interrupted by introns) in eukaryotic genomes. Looking at a large genomic region with lots of open reading frames, which ones belong to genes?
inferring a hidden truth: simple HMMs Hidden states (intended words, gene presence) are related to the observed states (static, lots of open reading frames). Each hidden state emits one observed state. Characteristics of the current hidden state are governed in some way by the previous hidden state, but not by that state's previous state or any earlier states. Characteristics of the current observed state are governed by the current hidden state.
Markov chains 3 states of weather: sunny, cloudy, rainy. Observed once a day at the same time. All transitions are possible, with some probability. Each state depends only on the previous state.
Markov chains: another view [diagram: start → state1 → state2 → state3 → state4 → state5, observed at times t1 ... t5]
Markov chains State transition matrix: the probability of the weather today given yesterday's weather. The rows of the transition matrix must sum to one. An initial distribution must be defined (day one: p(sunny)=?, p(cloudy)=?, p(rainy)=?).
Markov chains P(x_L | x_{L-1}, x_{L-2}, ..., x_1) = P(x_L | x_{L-1}) for all L. What does this mean? The current state, x_L, does not depend on anything but the previous state. This is the memoryless property. Very important.
First-order Markov model P(x) = probability of a particular sequence of observations x = {x_1, x_2, ..., x_n}. p_ij = probability that if the previous symbol is i, the next symbol will be j. Under this model, p(ACCGATA) (the probability of observing this sequence) is just p_A p_AC p_CC p_CG p_GA p_AT p_TA, where p_AC = p(there will be a C after an A) = p(C | A), and that probability does NOT depend on anything in the sequence besides the preceding A. Then p(x) = p_{x_1} Π_i p_{x_i x_{i+1}}.
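The chain-rule calculation above is easy to make concrete. Below is a minimal Python sketch; the initial and transition probabilities are illustrative placeholders, not values from the slides.

```python
# Sketch: probability of a DNA sequence under a first-order Markov model.
# The initial and transition probabilities here are arbitrary placeholders.

initial = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}   # p(first symbol)
transition = {                                            # p(next | previous)
    "A": {"A": 0.3, "C": 0.3, "G": 0.2, "T": 0.2},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

def markov_prob(seq):
    """p(seq) = p(x1) * product over i of p(x_{i+1} | x_i)."""
    p = initial[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        p *= transition[prev][nxt]
    return p

print(markov_prob("ACCGATA"))
```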
Higher order Markov chains Sunny = S, cloudy = C. 2nd-order Markov model: today's weather depends on yesterday plus the day before. Equivalently, a first-order chain over the pair states SS, SC, CS, CC; not all transitions between pair states are possible, because consecutive pairs must overlap. For example, SSCSCC decomposes into the overlapping pairs S1S2, S2C3, C3S4, S4C5, C5C6.
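To make the pair-state idea concrete, here is a small sketch that rewrites a hypothetical 2nd-order chain over {S, C} as a 1st-order chain over the pair states; the probabilities are placeholders, and the point is that a transition from pair (a, b) to pair (c, d) is only possible when b = c.

```python
# Sketch: a 2nd-order Markov chain over {S, C} rewritten as a 1st-order
# chain over pair states.  Probabilities are arbitrary placeholders.
from itertools import product

second_order = {            # p(next | (two days ago, yesterday))
    ("S", "S"): {"S": 0.7, "C": 0.3},
    ("S", "C"): {"S": 0.4, "C": 0.6},
    ("C", "S"): {"S": 0.6, "C": 0.4},
    ("C", "C"): {"S": 0.2, "C": 0.8},
}

# Equivalent 1st-order transition matrix over pair states: a move from
# pair (a, b) to pair (c, d) is only possible when b == c.
pair_states = list(product("SC", repeat=2))
first_order = {}
for (a, b) in pair_states:
    for (c, d) in pair_states:
        prob = second_order[(a, b)][d] if b == c else 0.0
        first_order[((a, b), (c, d))] = prob

print(first_order[(("S", "S"), ("S", "C"))])  # possible transition
print(first_order[(("S", "S"), ("C", "S"))])  # impossible: 0.0
```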
Hidden Markov Models Back to the weather example. All we can observe now is the behavior of a dog; only the dog can see the weather, we cannot! The dog can be in, out, or standing pathetically on the porch. This depends on the weather in a quantifiable way. How do we figure out what the weather is if we can only observe the dog?
Hidden Markov Models The dog's behavior is the emission of the weather (the hidden states). Output matrix = emission probabilities. Hidden states = the system described by the Markov model. Observable states = side effects of the Markov model.
Hidden Markov Models: another view [diagram: hidden states q1 → q2 → q3 → q4 → q5 at times t1 ... t5, with initial probability π_q1 and transition probabilities p_q1q2 ... p_q4q5; each hidden state emits an observation 1 ... 5.] The q's can be sunny, cloudy, or rainy. Observations are the dog's behavior (in, out, porch).
Hidden Markov Models All we observe is the dog: IOOOIPIIIOOOOOPPIIIIIPI. What's the underlying weather (the hidden states)? How likely is this sequence, given our model of how the dog works? What portion of the sequence was generated by each state?
Hidden Markov Models start: p(C)=0.2, p(R)=0.2, p(S)=0.6. All we observe is the dog: IOIOIPI. Weather? Guess RSRSRRR? p(dog's behavior | weather) = 0.023, but p(RSRSRRR) is only 0.00012. CCCCCCC? p(dog's behavior | weather) = 0.002, but p(weather) = 0.0094.
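A small sketch of how such numbers are computed: score one guessed weather sequence by multiplying initial, transition, and emission probabilities. Only the initial distribution (0.6/0.2/0.2) comes from the slides; the transition and emission values below are placeholders, so the printed numbers will not reproduce the 0.023 and 0.00012 figures quoted above.

```python
# Sketch: scoring one guessed weather sequence against the observed dog
# behavior.  Transition and emission numbers are placeholders.

initial = {"S": 0.6, "C": 0.2, "R": 0.2}
trans = {"S": {"S": 0.6, "C": 0.3, "R": 0.1},
         "C": {"S": 0.3, "C": 0.4, "R": 0.3},
         "R": {"S": 0.2, "C": 0.4, "R": 0.4}}
emit = {"S": {"I": 0.1, "O": 0.7, "P": 0.2},     # b_i(a): dog behavior given weather
        "C": {"I": 0.3, "O": 0.4, "P": 0.3},
        "R": {"I": 0.6, "O": 0.1, "P": 0.3}}

def path_prob(hidden):
    """p(Q): probability of the hidden weather sequence alone."""
    p = initial[hidden[0]]
    for a, b in zip(hidden, hidden[1:]):
        p *= trans[a][b]
    return p

def obs_given_path(obs, hidden):
    """p(x | Q): probability of the dog's behavior given the weather."""
    p = 1.0
    for o, h in zip(obs, hidden):
        p *= emit[h][o]
    return p

obs = "IOIOIPI"
for guess in ("RSRSRRR", "CCCCCCC"):
    print(guess, obs_given_path(obs, guess), path_prob(guess))
```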
Hidden Markov Models: the three questions 1) Evaluation: given a HMM, M, and a sequence of observations, x, find P(x | M). 2) Decoding: given a HMM, M, and a sequence of observations, x, find the sequence Q of hidden states that maximizes P(x, Q | M). 3) Learning: given a HMM, M, with unknown parameters and a sequence of observations, x, find the parameters θ that maximize P(x | θ, M).
Hidden Markov model: Five components 1. A set of N hidden states S_1, S_2, ..., S_N. Here S_1 = sunny, S_2 = cloudy, S_3 = rainy, and N = 3.
Hidden Markov model: Five components 2. An alphabet of distinct observation symbols A = {in, out, porch} = {I,O,P}
Hidden Markov model: Five components 3. Transition probability matrix P = (p_ij), where q_t is shorthand for the hidden state at time t (q_t = S_i means that the hidden state at time t was state S_i). p_ij = P(q_{t+1} = S_j | q_t = S_i). The transition matrix is over the hidden states!
Hidden Markov model: Five components 4. Emission probabilities: for each state S_i and each symbol a in A, b_i(a) = p(S_i emits symbol a). The probabilities b_i(a) form an NxM matrix, where N = #hidden states and M = #observation symbols. For example, b_1(O) = p(S_1 emits "out") = 0.7.
Hidden Markov model: Five components 5. An initial distribution vector π = (π_i), where π_i = P(q_1 = S_i). Start: p(C)=0.2, p(R)=0.2, p(S)=0.6. p(q_1 = S_1) = probability that the (hidden) first state is sunny = 0.6, so π = (0.6, 0.2, 0.2). NOTE that the first emitted symbol is not specified in the initial distribution vector; that's part of the model.
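Putting the five components together, here is a minimal sketch of the weather/dog model as NumPy arrays. Only the initial distribution (0.6, 0.2, 0.2) and b_1(O) = 0.7 come from the slides; the remaining transition and emission values are assumed placeholders.

```python
# Sketch: the five HMM components for the weather/dog example as NumPy arrays.
# Only pi and B[0, O] = 0.7 are taken from the slides; the rest of A and B
# below are placeholder values.
import numpy as np

states = ["sunny", "cloudy", "rainy"]        # 1. N hidden states
symbols = ["I", "O", "P"]                    # 2. alphabet of observation symbols

A = np.array([[0.6, 0.3, 0.1],               # 3. transition matrix p_ij,
              [0.3, 0.4, 0.3],               #    rows sum to one
              [0.2, 0.4, 0.4]])

B = np.array([[0.1, 0.7, 0.2],               # 4. emission matrix b_i(a),
              [0.3, 0.4, 0.3],               #    N x M, rows sum to one
              [0.6, 0.1, 0.3]])

pi = np.array([0.6, 0.2, 0.2])               # 5. initial distribution pi_i

assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
print(B[0, symbols.index("O")])              # b_1(O) = p(sunny emits "out") = 0.7
```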
HMMs: another view x = {O, I, I, O, P}, S = {S, C, R}. [diagram: hidden states q1 → q2 → q3 → q4 → q5 at times t1 ... t5, with initial probability π_q1 and transition probabilities p_q1q2 ... p_q4q5; each q_t emits observation x_t with probability b_qt(x_t).]
HMM: solve problem 1 (evaluation) Given a HMM, M, and a sequence, x, find P(x | M); this tells you how unusual the observations are, regardless of hidden states. One way to do this is brute force: find all possible sequences of hidden states, calculate p(x | Q) for each, then p(x) = Σ_Q p(x | Q) p(Q) (sum over ALL hidden state sequences Q). But this takes an exponential number of calculations, on the order of 2T·N^T, where N = #hidden states and T = length of the observed sequence.
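A sketch of the brute-force approach, enumerating every hidden state sequence Q (same placeholder model as above); even for five observations it already sums over 3^5 = 243 paths.

```python
# Sketch: brute-force evaluation of P(x | M) by summing over every hidden
# state sequence Q.  Model parameters are placeholders, as above.
import numpy as np
from itertools import product

A = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.2, 0.4, 0.4]])   # transitions
B = np.array([[0.1, 0.7, 0.2], [0.3, 0.4, 0.3], [0.6, 0.1, 0.3]])   # emissions (I, O, P)
pi = np.array([0.6, 0.2, 0.2])                                      # initial
x = [1, 0, 0, 1, 2]          # observed symbols as indices: O I I O P

total = 0.0
for Q in product(range(3), repeat=len(x)):            # N^T hidden paths
    p = pi[Q[0]] * B[Q[0], x[0]]                      # pi_q1 * b_q1(x1)
    for t in range(1, len(x)):
        p *= A[Q[t - 1], Q[t]] * B[Q[t], x[t]]        # p_{q(t-1),qt} * b_qt(xt)
    total += p                                        # sum of p(x, Q)
print(total)                                          # = P(x | M)
```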
HMM: solve problem 1 (evaluation) Given a HMM, M, and a sequence of observations, x, find P(x | M). Forward algorithm: calculate the probability of the sequence of observations up to and including time t: P(x_1 x_2 x_3 x_4 ... x_t) = ?? That's the same problem. If we knew the hidden state at time t, we could use that, so let α(t, i) = P(x_1 x_2 x_3 x_4 ... x_t, q_t = S_i) (a joint probability)
HMM: solve problem 1 (evaluation) Forward recursion: α(1, i) = π_i b_i(x_1); α(t+1, j) = [Σ_i α(t, i) p_ij] b_j(x_{t+1}); finally, P(x | M) = Σ_i α(T, i).
evaluation [diagram: observed sequence I O P P I emitted by hidden states q1 ... q5 at times t1 ... t5 = T, with initial probability π_q1, transitions p_q1q2 ... p_q4q5, and emission probabilities b_qt(x_t).]
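A minimal NumPy sketch of the forward recursion for the I O P P I example; the model matrices are the same placeholders used above.

```python
# Sketch: forward algorithm.  alpha[t, i] = P(x_1 ... x_t, q_t = S_i).
# Model matrices are placeholders, as before.
import numpy as np

A = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.2, 0.4, 0.4]])
B = np.array([[0.1, 0.7, 0.2], [0.3, 0.4, 0.3], [0.6, 0.1, 0.3]])
pi = np.array([0.6, 0.2, 0.2])

def forward(x):
    T, N = len(x), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, x[0]]                       # alpha(1, i) = pi_i * b_i(x1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]   # sum_i alpha(t-1, i) p_ij, times b_j(xt)
    return alpha

x = [0, 1, 2, 2, 0]                                  # I O P P I as indices into (I, O, P)
alpha = forward(x)
print(alpha[-1].sum())                               # P(x | M): O(N^2 T) work instead of N^T
```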
HMM: solve problem 1 (evaluation) Given a HMM, M, and a sequence of observations, x, find P(x | M). We can also use the Backward algorithm for this problem. Briefly: define β(t, i) = P(x_{t+1} x_{t+2} ... x_T | q_t = S_i), where T is the total number of observations, and set β(T, i) = 1. Generalize: β(t, i) = Σ_j p_ij b_j(x_{t+1}) β(t+1, j), and P(x | M) = Σ_i π_i b_i(x_1) β(1, i).
evaluation [diagram: the same I O P P I example, now evaluated by the backward algorithm, working from time T back toward t1.]
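And the corresponding backward recursion as a sketch; run on the same example, it recovers the same P(x | M) as the forward pass.

```python
# Sketch: backward algorithm.  beta[t, i] = P(x_{t+1} ... x_T | q_t = S_i).
# Same placeholder model as in the forward sketch.
import numpy as np

A = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.2, 0.4, 0.4]])
B = np.array([[0.1, 0.7, 0.2], [0.3, 0.4, 0.3], [0.6, 0.1, 0.3]])
pi = np.array([0.6, 0.2, 0.2])

def backward(x):
    T, N = len(x), len(pi)
    beta = np.ones((T, N))                            # beta(T, i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])  # sum_j p_ij b_j(x_{t+1}) beta(t+1, j)
    return beta

x = [0, 1, 2, 2, 0]                                   # I O P P I
beta = backward(x)
print((pi * B[:, x[0]] * beta[0]).sum())              # P(x | M), same value as the forward pass
```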
HMM forward-backward algorithm If you're following this, you realize the forward and backward algorithms alone are just formalizations of the brute force method! How does this help? Now, if we fix the identity of the hidden state at time t, we can calculate the probability of the sequence. These formulas come in handy later.
HMM forward-backward algorithm The forward and backward algorithms work together: I know how to calculate the probability of a sequence up to a time point, if I know the hidden state at that time point (α(t,i)). I know how to calculate the probability of an observed sequence from the end backward to a time point, given the hidden state at that time point (β(t,i))
HMM forward-backward algorithm dog: IOOOPPPIO (positions 1-9). p(obs 1-4, q4 = S)·p(obs 5-9 | q4 = S) + p(obs 1-4, q4 = C)·p(obs 5-9 | q4 = C) + p(obs 1-4, q4 = R)·p(obs 5-9 | q4 = R) = p(IOOOPPPIO).
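A sketch of the two passes combined: at any fixed time point t, summing α(t, i)·β(t, i) over states gives P(x), and normalizing gives the posterior probability of each weather state at that time (placeholder model as before).

```python
# Sketch: combining forward and backward.  For any time t,
#   P(x) = sum_i alpha(t, i) * beta(t, i),
# and the posterior P(q_t = S_i | x) = alpha(t, i) * beta(t, i) / P(x).
# Same placeholder weather/dog model as above.
import numpy as np

A = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.2, 0.4, 0.4]])
B = np.array([[0.1, 0.7, 0.2], [0.3, 0.4, 0.3], [0.6, 0.1, 0.3]])
pi = np.array([0.6, 0.2, 0.2])
symbols = {"I": 0, "O": 1, "P": 2}

def forward(x):
    alpha = np.zeros((len(x), 3))
    alpha[0] = pi * B[:, x[0]]
    for t in range(1, len(x)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
    return alpha

def backward(x):
    beta = np.ones((len(x), 3))
    for t in range(len(x) - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
    return beta

x = [symbols[c] for c in "IOOOPPPIO"]
alpha, beta = forward(x), backward(x)
t = 3                                                     # time point 4 (0-based index 3)
print((alpha[t] * beta[t]).sum())                         # P(x): same value for any t
print(alpha[t] * beta[t] / (alpha[t] * beta[t]).sum())    # posterior over S, C, R at t = 4
```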
HMM: solve problem 2 (decoding) Decoding: given a HMM M and a sequence x, find the sequence Q of hidden states that maximizes P(x, Q | M). We'll use the Viterbi algorithm. Assumptions needed for the Viterbi algorithm: 1) the observed and hidden events must be in a sequence; 2) an observed event must correspond to one and only one hidden event; 3) computing the most likely sequence up to point t depends only on the observed event at t and the most likely sequence up to t-1.
HMM: solve problem 2 (decoding) Viterbi algorithm recursion: δ(1, i) = π_i b_i(x_1); δ(t+1, j) = [max_i δ(t, i) p_ij] b_j(x_{t+1}), keeping a back-pointer to the maximizing i; the best path ends at argmax_i δ(T, i) and is recovered by following the back-pointers.
HMM: solve problem 2 (decoding) Viterbi algorithm In this problem, we don't really care what the probability of the sequence is. It happened. What we want to know is whether it was sunny when the dog was inside on the third day.
HMM: solve problem 2 (decoding) Viterbi algorithm: uses a form of dynamic programming to decode the hidden state at each time point, using the forward and backward algorithms. Observed: OOIP. What is q3?
HMM: solve problem 2 (decoding) OOIP — what is q3? [trellis: hidden states q1 → q2 → q3 → q4 at times t1 ... t4, with initial probability π(q1), transitions p12, p23, p34, and emission probabilities bq1(O), bq2(O), bq3(I), bq4(P) producing the observed sequence O O I P.]
HMM: solve problem 2 (decoding) OOIP — what is q3? [same trellis] We need to figure out the most likely sequence of hidden states and look at which state occupies t3.
HMM: solve problem 2 (decoding) [same trellis] The path score up through t3 is π(q1) · bq1(O) · p12 · bq2(O) · p23 · bq3(I); the remaining factor for t4 is p34 · bq4(P).
HMM: solve problem 2 (decoding) [trellis expanded: all three hidden states Sunny, Cloudy, and Rainy are considered at each time point, emitting O O I P.]
HMM: solve problem 2 (decoding) O O I P. Fixing q3 = Sunny: the path score up through t3 is π(q1) · bq1(O) · p12 · bq2(O) · p2S · bS(I); the remaining factor for t4 is pS4 · bq4(P).
HMM: solve problem 2 (decoding) O O I P. [The same calculation is repeated with q3 fixed to Cloudy and to Rainy; whichever choice of q3 gives the highest probability is the decoded state at t3.]
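A sketch of the full Viterbi computation for the OOIP example. It implements the standard max-product recursion with back-pointers (δ and ψ) rather than literally reusing the forward/backward sums, and it uses the same placeholder model matrices as earlier, so the decoded path is only illustrative.

```python
# Sketch: Viterbi decoding of the OOIP example.  delta(t, i) is the
# probability of the best path ending in state i at time t; psi stores
# back-pointers.  Model matrices are placeholders, as before.
import numpy as np

states = ["sunny", "cloudy", "rainy"]
A = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.2, 0.4, 0.4]])
B = np.array([[0.1, 0.7, 0.2], [0.3, 0.4, 0.3], [0.6, 0.1, 0.3]])
pi = np.array([0.6, 0.2, 0.2])
symbols = {"I": 0, "O": 1, "P": 2}

def viterbi(x):
    T, N = len(x), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, x[0]]                      # delta(1, i) = pi_i * b_i(x1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A          # delta(t-1, i) * p_ij
        psi[t] = scores.argmax(axis=0)              # best predecessor for each j
        delta[t] = scores.max(axis=0) * B[:, x[t]]  # times b_j(xt)
    path = [int(delta[-1].argmax())]                # best final state
    for t in range(T - 1, 0, -1):                   # trace back-pointers
        path.append(int(psi[t][path[-1]]))
    return [states[i] for i in reversed(path)]

obs = "OOIP"
path = viterbi([symbols[c] for c in obs])
print(path)              # most likely hidden sequence
print("q3 =", path[2])   # the decoded state at t3
```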
HMM: solve problem 2 (decoding) The Viterbi algorithm is very powerful and can distinguish subtle features of strings. Originally developed for decoding noisy digital communication signals, it is widely used in speech processing. Dishonest casino problem.
One example: very crude gene finding. Evaluate nonoverlapping chunks of 21 bp of sequence: C = coding (no stop codon), N = noncoding (one or more stop codons).
Transition probabilities (row = hidden state n_i, column = n_{i+1}):
             exon   intron   intergenic
exon         0.4    0.5      0.1
intron       0.2    0.8      0
intergenic   0.1    0        0.9
Emission probabilities:
             21bp coding (C)   21bp noncoding (N)
exon         0.9               0.1
intron       0.2               0.8
intergenic   0.3               0.7
Observed: CCCCCNNNNNNCCNNNCCCCCCCCCNNCNCCCNNNNNN
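A sketch of this crude gene finder as an HMM decoded with Viterbi. The transition and emission probabilities come from the tables above; the initial distribution is not given on the slide, so a uniform start is assumed here.

```python
# Sketch: the crude gene-finding HMM above, decoded with Viterbi.
# Transitions and emissions are from the slide; the uniform initial
# distribution is an assumption.
import numpy as np

states = ["exon", "intron", "intergenic"]
A = np.array([[0.4, 0.5, 0.1],      # transitions between hidden states n_i -> n_{i+1}
              [0.2, 0.8, 0.0],
              [0.1, 0.0, 0.9]])
B = np.array([[0.9, 0.1],           # emissions: P(C | state), P(N | state)
              [0.2, 0.8],
              [0.3, 0.7]])
pi = np.array([1/3, 1/3, 1/3])      # assumed: uniform start (not given on the slide)
symbols = {"C": 0, "N": 1}

def viterbi(x):
    T, N = len(x), len(pi)
    delta, psi = np.zeros((T, N)), np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, x[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, x[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return [states[i] for i in reversed(path)]

obs = "CCCCCNNNNNNCCNNNCCCCCCCCCNNCNCCCNNNNNN"   # each symbol = one 21 bp chunk
labels = viterbi([symbols[c] for c in obs])
short = {"exon": "E", "intron": "I", "intergenic": "-"}
print("".join(short[l] for l in labels))        # one decoded label per chunk
```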
Applications of HMMs and Markov Models Sequence alignment, pairwise and multiple (PFAM, HMMPro, HMMER, SAM); making profiles to describe sequence families; finding signals in DNA; gene finding (GLIMMER, GENSCAN); motif finding; segmentation analysis (microarray data, any signals); finding CpG islands.