Hidden Markov Models


1 Hidden Markov Models. Ben Langmead, Department of Computer Science. Please sign the guestbook (www.langmead-lab.org/teaching-materials) to tell me briefly how you are using the slides. For original Keynote files, email me.

2 Sequence models. Can Markov chains find CpG islands in a sea of genome? A Markov chain assigns a score to a string; it doesn't naturally give a running score across a long sequence. [Figure: probability of being in an island vs. genome position.] But we can adapt it using a sliding window.

3 Sequence models. [Figure: probability of being in an island vs. genome position, for sliding windows of size k.] The choice of k requires an assumption about island lengths: if k is too large, we miss small islands; if k is too small, we see many spurious small islands. We'd like a method that switches between Markov chains when entering or exiting a CpG island.

4 Sequence models. Something like this: [Diagram: eight states, A/C/G/T "inside" and A/C/G/T "outside", with switching edges connecting the inside and outside groups.]

5 [Diagram: hidden states p_1, p_2, p_3, p_4, ..., p_n, each emitting x_1, x_2, x_3, x_4, ..., x_n.] Steps 1 through n. p = { p_1, p_2, ..., p_n } is a sequence of states (AKA a path). Each p_i takes a value from set Q. We do not observe p. x = { x_1, x_2, ..., x_n } is a sequence of emissions. Each x_i takes a value from set Σ. We do observe x.

6 [Same diagram.] Edges convey conditional independence: x_2 is conditionally independent of everything else given p_2, and p_4 is conditionally independent of everything else given p_3.

7 Example: occasionally dishonest casino. Dealer repeatedly flips a coin. Sometimes the coin is fair, with P(heads) = 0.5; sometimes it's loaded, with P(heads) = 0.8. Between each flip, the dealer switches coins (invisibly) with prob. 0.4. [Diagram: hidden fair/loaded states emitting a flip sequence such as T H H T H.] Emissions are heads/tails; states are loaded/fair.

8 [Diagram as before.] The joint probability of a given p, x is easy to calculate: repeated applications of the multiplication rule, simplified using the Markov assumptions (implied by the edges above), give a product of conditional probabilities (one per edge) times the marginal P(p_1).

9 P(p_1, p_2, ..., p_n, x_1, x_2, ..., x_n) = P(p_1) × Π_{k=1..n} P(x_k | p_k) × Π_{k=2..n} P(p_k | p_{k-1})
The |Q| × |Σ| emission matrix E encodes the P(x_i | p_i)s: E[p_i, x_i] = P(x_i | p_i).
The |Q| × |Q| transition matrix A encodes the P(p_i | p_{i-1})s: A[p_{i-1}, p_i] = P(p_i | p_{i-1}).
The |Q|-element array I encodes the initial probability of each state: I[p] = P(p_1 = p).
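To make the factorization concrete, here is a minimal Python sketch (my own function and parameter names, not code from the course) that multiplies out I[p_1], the emission terms, and the transition terms. I, A, and E are assumed to be dicts keyed by state and emission symbols.

def joint_prob(p, x, I, A, E):
    # I[k]: initial probability of state k
    # A[j][k]: transition probability P(k | j)
    # E[k][c]: emission probability P(c | k)
    prob = I[p[0]] * E[p[0]][x[0]]  # marginal P(p_1) times first emission
    for k in range(1, len(p)):
        # one transition probability and one emission probability per step
        prob *= A[p[k-1]][p[k]] * E[p[k]][x[k]]
    return prob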

10 Dealer repeatedly flips a coin. The coin is sometimes fair, with P(heads) = 0.5, sometimes loaded, with P(heads) = 0.8. The dealer occasionally switches coins, invisibly to you: after each flip, the dealer switches coins with probability 0.4.
Transition matrix A (rows = from, columns = to):      Emission matrix E:
        F     L                                               H     T
  F    0.6   0.4                                        F    0.5   0.5
  L    0.4   0.6                                        L    0.8   0.2
The |Q| × |Σ| emission matrix E encodes the P(x_i | p_i)s: E[p_i, x_i] = P(x_i | p_i). The |Q| × |Q| transition matrix A encodes the P(p_i | p_{i-1})s: A[p_{i-1}, p_i] = P(p_i | p_{i-1}).

11 Given A and E (above), what is the joint probability of p and x?
  p:  F F F L L L F F F F F
  x:  T H T H H H T H T T H
Multiply one P(x_i | p_i) per position and one P(p_i | p_{i-1}) per transition. If P(p_1 = F) = 0.5, then the joint probability is 0.5 × (0.5^8 × 0.8^3) × (0.6^8 × 0.4^2) ≈ 2.69 × 10^-6.
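Plugging the casino parameters into the joint_prob sketch above reproduces this number:

I = {"F": 0.5, "L": 0.5}
A = {"F": {"F": 0.6, "L": 0.4}, "L": {"F": 0.4, "L": 0.6}}
E = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.8, "T": 0.2}}
print(joint_prob("FFFLLLFFFFF", "THTHHHTHTTH", I, A, E))  # ~2.69e-06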

12 Given the flip outcomes (heads or tails) and the conditional & marginal probabilities, when was the dealer using the loaded coin? There are many possible paths p, but one of them is p*, the most likely given the emissions:
  p* = argmax_p P(p | x) = argmax_p P(p, x)
Finding p* given x and using the Markov assumption is often called decoding. Viterbi is a common decoding algorithm. [Photo: Andrew Viterbi.]

13 Viterbi algorithm. Fill in a dynamic programming matrix S with one row per state in Q (Loaded, Fair) and one column per emission in x = H H T T H T H H H H H. S[k, i] = greatest joint probability of observing the length-i prefix of x and any sequence of states ending in state k. For example, S[Fair, 4] is the max over paths ending in Fair of P(path, HHTT).

14 Viterbi algorithm. Say x_i is Heads. Then:
  S[Fair, i] = P(Heads | Fair) × max over k ∈ {Fair, Loaded} of { S[k, i-1] × P(Fair | k) }
The leading factor is the emission probability; each term inside the max uses a transition probability.

15 Viterbi algorithm. Say x_i is Heads. Expanding the max:
  S[Fair, i] = P(Heads | Fair) × max{ S[Fair, i-1] × P(Fair | Fair), S[Loaded, i-1] × P(Fair | Loaded) }

16 Viterbi algorithm. Likewise for the Loaded row:
  S[Loaded, i] = P(Heads | Loaded) × max{ S[Fair, i-1] × P(Loaded | Fair), S[Loaded, i-1] × P(Loaded | Loaded) }

17 Viterbi algorithm. [Empty DP matrix: rows Loaded and Fair, one column per emission in x = H H T T H T H H H H H.]

18 Viterbi algorithm. [Same DP matrix.] Dealer repeatedly flips a coin. Sometimes the coin is fair, with P(heads) = 0.5; sometimes it's loaded, with P(heads) = 0.8. Between each flip, the dealer switches coins (invisibly) with prob. 0.4.

19 Viterbi algorithm. Assume we start with Fair/Loaded with equal probability. The first column (x_0 = H) is:
  S[L, 0] = P(H | L) × 0.5 = 0.8 × 0.5 = 0.4
  S[F, 0] = P(H | F) × 0.5 = 0.5 × 0.5 = 0.25

20 Viterbi algorithm. Next column (x_1 = H), Loaded row, using A and E as given above:
  S[L, 1] = P(H | L) × max{ S[L, 0] × P(L | L), S[F, 0] × P(L | F) }

21 Viterbi algorithm. Substituting S[L, 0] = 0.4 and S[F, 0] = 0.25:
  S[L, 1] = P(H | L) × max{ 0.4 × P(L | L), 0.25 × P(L | F) }

22 Viterbi algorithm.
  S[L, 1] = 0.8 × max{ 0.4 × 0.6, 0.25 × 0.4 } = 0.8 × max{ 0.24, 0.10 } = 0.192

23 Viterbi algorithm. Fair row:
  S[F, 1] = P(H | F) × max{ 0.4 × P(F | L), 0.25 × P(F | F) }

24 Viterbi algorithm.
  S[F, 1] = 0.5 × max{ 0.4 × 0.4, 0.25 × 0.6 } = 0.5 × max{ 0.16, 0.15 } = 0.08
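As a sanity check, these two cell values fall out of a few lines of Python (variable names are mine):

s_L0, s_F0 = 0.8 * 0.5, 0.5 * 0.5         # initialization: 0.4 and 0.25
s_L1 = 0.8 * max(s_L0 * 0.6, s_F0 * 0.4)  # P(H|L) * max{...} = 0.192
s_F1 = 0.5 * max(s_L0 * 0.4, s_F0 * 0.6)  # P(H|F) * max{...} = 0.08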

25 Viterbi algorithm. [Filled DP matrix for x = H H T T H T H H H H H; the slide highlights entries of roughly 5E-04 (Loaded row) and 8E-05 (Fair row), each cell's arrow points to the predecessor term that wins its maximum, and the traceback spells out the path L L L L F F F F F L L.] Traceback: start from the greatest score in the final column, then keep asking "how did I get here?" (which predecessor state won the maximum) until we reach the first column.
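Putting the initialization, recurrence, and traceback together: below is a minimal sketch of the whole algorithm (my own function, not the course's HMM class), using the same dict-based I, A, E as above and returning a (probability, path) pair like the hmm.viterbi calls on slide 27.

def viterbi(x, Q, I, A, E):
    # S[i][k] = best joint probability of x[:i+1] over paths ending in state k
    # back[i][k] = predecessor state that achieved that maximum
    S = [{k: I[k] * E[k][x[0]] for k in Q}]
    back = [{}]
    for i in range(1, len(x)):
        S.append({})
        back.append({})
        for k in Q:
            # argmax over predecessor states
            prev = max(Q, key=lambda j: S[i-1][j] * A[j][k])
            S[i][k] = E[k][x[i]] * S[i-1][prev] * A[prev][k]
            back[i][k] = prev
    # traceback: start from the greatest score in the final column
    last = max(Q, key=lambda k: S[-1][k])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return S[-1][last], "".join(reversed(path))

For example, viterbi("THTHHHTHTTH", ["F", "L"], I, A, E) with the casino parameters from earlier returns the probability shown on slide 27.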

26 Viterbi algorithm. How much work is this? Let Q be the set of states and n the length of the emission string. There are n × |Q| values S[k, i] to calculate, and each involves a max over |Q| products, so the total time is O(n |Q|^2). Matrix A has |Q|^2 elements, E has |Q| × |Σ| elements, and I has |Q| elements.

27 Viterbi algorithm.
>>> hmm = HMM({"FF":0.6, "FL":0.4, "LF":0.4, "LL":0.6},
...           {"FH":0.5, "FT":0.5, "LH":0.8, "LT":0.2},
...           {"F":0.5, "L":0.5})
>>> prob, _ = hmm.viterbi("THTHHHTHTTH")
>>> print prob
2.86654464e-06
>>> prob, _ = hmm.viterbi("THTHHHTHTTH" * 100)
>>> print prob
0.0
This is the occasionally dishonest casino setup; the second call repeats the string 100 times. What happened? Underflow!

28 Viterbi algorithm.
>>> logprob, _ = hmm.viterbiL("THTHHHTHTTH" * 100)
>>> print logprob
[a large negative log probability]
Here viterbiL is a log-space Viterbi, run on the same string repeated 100 times. Solution: switch to log probabilities; multiplies become adds.
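A log-space version only changes a few lines: take logs of the parameters once and replace products with sums. A sketch with the same assumed dict layout as above, assuming all parameters are nonzero (zeros would need -infinity handling):

import math

def viterbi_log(x, Q, I, A, E):
    # same model as before, but all arithmetic in log space
    logI = {k: math.log(I[k]) for k in Q}
    logA = {j: {k: math.log(A[j][k]) for k in Q} for j in Q}
    logE = {k: {c: math.log(pr) for c, pr in E[k].items()} for k in Q}
    S = {k: logI[k] + logE[k][x[0]] for k in Q}
    for c in x[1:]:
        # adds where the probability-space version multiplies
        S = {k: logE[k][c] + max(S[j] + logA[j][k] for j in Q) for k in Q}
    return max(S.values())  # greatest log joint probability (traceback omitted)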

29 We know what an HMM is, how to calculate the joint probability, and how to find the most likely path given an emission string (Viterbi). Can we design an HMM for finding CpG islands? [Diagram: states p_1 ... p_n emitting x_1 ... x_n.]

30 Idea 1: Q = { inside, outside }, Σ = { A, C, G, T }. [Diagram: each hidden state p_i takes value I or O; each emission x_i takes value A, C, G, or T.]

31 Idea 1: I = inside, O = outside. The 2 × 2 transition matrix A is indexed by { I, O } on both axes; estimate A[I, O] as the fraction of I's followed by O's, and similarly for the other entries. The 2 × 4 emission matrix E is indexed by { I, O } and { A, C, G, T }; estimate E[I, C] as the fraction of nucleotides inside islands that are C, and similarly for the rest.
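Given a training sequence labeled with I/O, these estimates are simple count ratios. A minimal sketch (my own names; no smoothing, and it assumes every label occurs so no denominator is zero):

from collections import Counter

def train_idea1(x, p):
    # x: nucleotide string; p: matching string of state labels, 'I' or 'O'
    trans = Counter(zip(p, p[1:]))  # counts of consecutive label pairs
    emit = Counter(zip(p, x))       # counts of (label, nucleotide) pairs
    A = {j: {k: trans[j, k] / sum(trans[j, k2] for k2 in "IO") for k in "IO"}
         for j in "IO"}
    E = {j: {c: emit[j, c] / sum(emit[j, c2] for c2 in "ACGT") for c in "ACGT"}
         for j in "IO"}
    return A, E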

32 Example 1 using HMM idea 1 (with the trained A and E matrices):
x: ATATATACGCGCGCGCGCGCGATATATATATATA
p: OOOOOOOIIIIIIIIIIIIIIOOOOOOOOOOOOO (from Viterbi)

33 Example 2 using HMM idea 1:
x: ATATCGCGCGCGATATATCGCGCGCGATATATAT
p: OOOOIIIIIIIIOOOOOOIIIIIIIIOOOOOOOO (from Viterbi)

34 Example 3 using HMM idea 1:
x: ATATATACCCCCCCCCCCCCCATATATATATATA
p: OOOOOOOIIIIIIIIIIIIIIOOOOOOOOOOOOO (from Viterbi)
Oops: a plain run of C's is clearly not a CpG island, but the 2-state model calls it one.

35 Idea 2: Q = { A_i, C_i, G_i, T_i, A_o, C_o, G_o, T_o }, Σ = { A, C, G, T }. [Diagram: each hidden state p_i takes one of the eight inside/outside nucleotide states; each emission x_i is the corresponding nucleotide A, C, G, or T.]

36 Idea 2: Q = { A_i, C_i, G_i, T_i, A_o, C_o, G_o, T_o }, Σ = { A, C, G, T }. [Diagram: the eight states, A/C/G/T inside and A/C/G/T outside, with all inside-outside edges drawn.]

37 Idea 2: the 8 × 8 transition matrix A is indexed by { A_i, ..., T_o } on both axes; estimate P(C_i | T_i) as the number of T_iC_i dinucleotides divided by the number of T_i's. The 8 × 4 emission matrix E is indexed by the eight states and { A, C, G, T }.
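The same counting recipe applies here; for instance, a hedged sketch of the P(C_i | T_i) estimate, assuming the training data is given as a list of (base, side) pairs such as ('T', 'i'):

def transition_estimate(labeled, frm, to):
    # labeled: list of (base, side) pairs; frm, to: states like ('T', 'i'), ('C', 'i')
    pairs = list(zip(labeled, labeled[1:]))
    num = sum(1 for a, b in pairs if a == frm and b == to)  # e.g. # of T_i C_i dinucleotides
    den = sum(1 for a, b in pairs if a == frm)              # e.g. # of T_i's with a successor
    return num / den if den else 0.0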

38 Trained transition matrix (uppercase = inside, lowercase = outside): [8 × 8 table of trained probabilities with rows and columns A, C, G, T, a, c, g, t; the numeric entries are shown on the slide.]

39 Trained transition matrix A, shown as a heatmap (red = low probability, yellow = high probability, black = probability 0): once inside, we're likely to stay inside for a while; when we exit, the last inside base is a G; when we enter, the first inside base is a C. Same for outside.

40 Viterbi result; lowercase = outside, uppercase = inside: atatatatatatatatatatatatatatatatatatatatcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcgcg CGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCG CGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGatatatatatatatatatatatatatatatatatatatatatatatatatatat

41 Viterbi result; lowercase = outside, uppercase = inside (unlike the 2-state model, the trained 8-state model leaves the run of g's outside): atatatatatatatatatatatatatatatatatatatatgggggggggggggggggggggggggggggggggggggggggggggggggggggggggg gggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg ggggggggggggggggggggggggggggggggggggggggggggatatatatatatatatatatatatatatatatatatatatatatatatatatat

42 Many of the Markov chains and HMMs we've discussed are first order, but we can also design models of higher order. [Diagrams: in a first-order Markov chain each state depends on the previous state; in a second-order chain each state depends on the previous two.]

43 For higher-order HMMs, the Viterbi value S[k, i] no longer depends on just the previous state assignment. [Diagram: a second-order trellis in which each cell looks back at pairs of assignments for p_{i-2} and p_{i-1}.] Equivalently, we can expand the state space, as we did for CpG islands.
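One way to realize that expansion in code: build a first-order model whose states are pairs of original states, so a second-order transition P(p_i | p_{i-1}, p_{i-2}) becomes an ordinary first-order transition between pair states. A sketch with assumed names:

from itertools import product

def expand_second_order(Q, A2):
    # A2[(a, b)][c] = P(next = c | two states back = a, previous = b)
    # Build first-order transitions (a, b) -> (b, c); any pair-to-pair
    # transition that disagrees on the shared state b is impossible.
    Qpair = list(product(Q, repeat=2))
    A = {s: {t: 0.0 for t in Qpair} for s in Qpair}
    for (a, b), (b2, c) in product(Qpair, repeat=2):
        if b == b2:
            A[(a, b)][(b2, c)] = A2[(a, b)][c]
    return Qpair, A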
