VL Algorithmen und Datenstrukturen für Bioinformatik (19400001), WS 15/16, Week 16. Tim Conrad, AG Medical Bioinformatics, Institut für Mathematik & Informatik, Freie Universität Berlin. Based on slides by B. Chor (Tel Aviv University) and E. Alpaydın (MIT)
Introduction to Hidden Markov Models
A stochastic finite state machine (Markov chain model) for weather
A Hidden Markov Model for weather. What MORE can we do with this? Annotation of a realization with its underlying state (here H / L), i.e. which alternative model most likely produced which position.
Once more: the MARKOV MODEL. Tim Conrad, VL AlDaBi, WS 15/16
Markov process with a non-hidden observation process (a stochastic automaton).
Three urns, each full of balls of one color: S_1: red, S_2: blue, S_3: green.
Initial probabilities: pi = (0.5, 0.2, 0.3)
Transition matrix:
A = | 0.4 0.3 0.3 |
    | 0.2 0.6 0.2 |
    | 0.1 0.1 0.8 |
Probability of the observation sequence O = {S_1, S_1, S_3, S_3}:
P(O | A, pi) = P(S_1) P(S_1|S_1) P(S_3|S_1) P(S_3|S_3) = pi_1 * a_11 * a_13 * a_33 = 0.5 * 0.4 * 0.3 * 0.8 = 0.048
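The calculation above can be sketched in a few lines of Python. The probabilities are the ones from the slide (pi = (0.5, 0.2, 0.3) and the 3x3 transition matrix); the state labels S1-S3 are just names.

```python
# Observable Markov chain from the urn example (values from the slide):
# states S1 = red, S2 = blue, S3 = green
pi = {"S1": 0.5, "S2": 0.2, "S3": 0.3}  # initial probabilities
A = {  # A[k][l] = P(next state = l | current state = k)
    "S1": {"S1": 0.4, "S2": 0.3, "S3": 0.3},
    "S2": {"S1": 0.2, "S2": 0.6, "S3": 0.2},
    "S3": {"S1": 0.1, "S2": 0.1, "S3": 0.8},
}

def chain_probability(states):
    """P(O) = pi(s_1) times the product of transition probabilities along the path."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# 0.5 * 0.4 * 0.3 * 0.8 = 0.048, as on the slide
p = chain_probability(["S1", "S1", "S3", "S3"])
```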
A plot of 100 observed numbers for the stochastic automaton
Histogram for the stochastic automaton. The proportions reflect the stationary distribution of the chain.
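The stationary distribution mentioned above can be computed by power iteration: repeatedly multiplying a distribution by the transition matrix until it stops changing. This sketch uses the transition matrix from the urn example; the iteration count is an arbitrary choice.

```python
# Power iteration to find the stationary distribution pi = pi * A
# for the urn chain (transition matrix from the earlier slide).
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]

def stationary(A, iters=1000):
    n = len(A)
    pi = [1.0 / n] * n                       # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[k] * A[k][j] for k in range(n)) for j in range(n)]
    return pi

pi = stationary(A)
# The exact fixed point is (2/11, 3/11, 6/11), i.e. roughly (0.182, 0.273, 0.545):
# green (the "sticky" state with self-transition 0.8) dominates the histogram.
```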
Finally: the HIDDEN MARKOV MODEL
From Markov To Hidden Markov. The previous model assumes that each state can be uniquely associated with an observable event: once an observation is made, the state of the system is trivially retrieved. This model, however, is too restrictive to be of practical use for most realistic problems. To make the model more flexible, we will assume that the outcomes or observations of the model are a probabilistic function of each state: each state can produce a number of outputs according to a unique probability distribution, and each distinct output can potentially be generated at any state. These are known as Hidden Markov Models (HMMs), because the state sequence is not directly observable; it can only be approximated from the sequence of observations produced by the system.
Hidden Markov Models. States are not observable. Discrete observations {v_1, v_2, ..., v_M} are recorded; they are a probabilistic function of the state. Emission probabilities: b_j(m) = P(O_t = v_m | q_t = S_j). Example: in each urn there are balls of different colors, but with different probabilities. For each observation sequence, there are multiple possible state sequences.
Hidden Sequence. n urns containing colored balls, with v distinct colors. Each urn has a (possibly) different distribution of colors. Sequence generation algorithm:
1. (Behind the curtain) Pick the initial urn according to some random process.
2. (Behind the curtain) Randomly pick a ball from the urn.
3. Show it to the audience and put it back.
4. (Behind the curtain) Select another urn according to the random selection process associated with the current urn.
5. Repeat steps 2-4.
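The generation procedure above can be sketched as follows. The urn contents and transition probabilities here are illustrative assumptions, not values from the lecture; only the structure (hidden urn sequence, observed ball colors) matches the slide.

```python
import random

# Sketch of the urn procedure: 3 urns, 3 colors (assumed toy parameters).
colors = ["red", "blue", "green"]
E = {0: [0.8, 0.1, 0.1],    # urn 0 is mostly red
     1: [0.1, 0.8, 0.1],    # urn 1 is mostly blue
     2: [0.1, 0.1, 0.8]}    # urn 2 is mostly green
T = {0: [0.9, 0.05, 0.05],  # urns tend to repeat themselves
     1: [0.05, 0.9, 0.05],
     2: [0.05, 0.05, 0.9]}
pi = [1/3, 1/3, 1/3]

def generate(n, seed=0):
    rng = random.Random(seed)
    urn = rng.choices(range(3), weights=pi)[0]          # step 1: pick initial urn
    hidden, observed = [], []
    for _ in range(n):
        ball = rng.choices(colors, weights=E[urn])[0]   # steps 2-3: draw a ball, show it
        hidden.append(urn)
        observed.append(ball)
        urn = rng.choices(range(3), weights=T[urn])[0]  # step 4: move to the next urn
    return hidden, observed

hidden, observed = generate(20)  # the audience sees only `observed`
```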
Hidden Markov Models: transition probabilities (T) between hidden states, emission probabilities (E) from states to symbols.
Typical questions
HMMs: Main Problems
Evaluation: given a particular realization (observations), what is the probability that it was produced by a given HMM? (-> Forward OR Backward algorithm)
Decoding: given a particular realization (observations), what is the most likely (hidden!) STATE sequence that produced this realization? (-> Viterbi algorithm)
Training: given a model structure (!) and training data, what are the best model parameters? (-> Maximum-Likelihood estimation; Baum-Welch = Forward-Backward algorithm)
MODEL SELECTION
The coin-toss problem. Consider the following scenario: assume that you are placed in a room with a curtain. Behind the curtain there is a person performing a coin-toss experiment. This person selects one of several coins and tosses it: heads (H) or tails (T). The person tells you the outcome (H or T), but not which coin was used each time. Your goal is to build a probabilistic model that best explains a sequence of observations O = {o_1, o_2, o_3, o_4, ...} = {H, T, T, H, ...}. The coins represent the states; these are hidden because you do not know which coin was tossed each time. The outcome of each toss represents an observation. A likely sequence of coins may be inferred from the observations, but this state sequence will not be unique.
The Coin Toss Example: 1 coin. With a single coin the Markov model is observable: we may describe the system with a deterministic model where the states are the actual observations (see figure). The model parameter P(H) may be estimated from the ratio of heads to tails.
O = H H H T T H
S = 1 1 1 2 2 1
(Lecture Notes for E. Alpaydın, 2004, Introduction to Machine Learning, The MIT Press, V1.1)
The Coin Toss Example: 2 coins
The Coin Toss Example: 3 coins
1, 2 or 3 coins? Which of these models is best? Since the states are not observable, the best we can do is select the model that best explains the data (e.g., by a Maximum-Likelihood criterion). Whether the observation sequence is long and rich enough to warrant a more complex model is a different question, though.
FROM COINS TO DNA
Hidden Markov Models. In 1989 Gary Churchill introduced the use of HMMs for DNA segmentation. CENTRAL IDEAS: The (DNA) string is generated by a system. The system can be in a number of distinct states. The system can change between states with transition probabilities T. In each state the system emits symbols to the string with emission probabilities E.
Example: Change points in the Lambda phage genome.
Two states, CG-RICH and AT-RICH; each state stays put with probability 0.9998 and switches to the other state with probability 0.0002.
CG-RICH emissions: A: 0.2462, C: 0.2476, G: 0.2985, T: 0.2077
AT-RICH emissions: A: 0.2700, C: 0.2084, G: 0.1981, T: 0.3236
Hidden Markov Models. Three states connected by transitions T(1,2) and T(2,3); each state k emits A, T, C, G with probabilities p_A_k, p_T_k, p_C_k, p_G_k.
s = TTCACTGTGAACGATCCGAATCGACCAGTACTACGGCACGTTGCCAAAGCGCTTATCTAGC
h = 1111111111111111111111112222222222222333333333333333333333333
HMM for gene prediction.
The Markov model: states Intergenic, Exon, Intron (initial state q_0), connected by Start codon, Stop codon, Donor and Acceptor transitions.
The input sequence: AGCTAGCAGTATGTCATGGCATGTTCGGAGGTAGTACGTAGAGGTAGCTAGTATAGGTCGATAGTACGCGA
The gene prediction: exon 1, exon 2, exon 3
HMM Essentials
TRANSITION MATRIX = the probability of a state change: T(k, l) = P(h_{i+1} = l | h_i = k)
EMISSION PROBABILITY = the symbol probability distribution in a certain state: E(k, b) = P(s_i = b | h_i = k)
HMM Essentials
INITIAL PROBABILITY of a state: T(0, k) = P(h_1 = k)
Sequence of the states visited: h
Sequence of the generated symbols: s
HMM Essentials
Probability of the hidden states h: P(h) = T(0, h_1) * T(h_1, h_2) * ... * T(h_{n-1}, h_n)
Probability of the generated symbol string s given the hidden states h: P(s | h) = E(h_1, s_1) * E(h_2, s_2) * ... * E(h_n, s_n)
HMM Essentials
Joint probability of the symbol string s (observations) and the hidden states h: P(s, h) = P(s | h) * P(h)
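The joint probability P(s, h) = P(s | h) * P(h) can be sketched directly for the two-state lambda-phage model from the earlier slide. The transition and emission values are taken from that slide; the uniform initial distribution T(0, k) = 0.5 is an assumption, and log probabilities are used for numerical safety.

```python
import math

# Two-state lambda-phage model (values from the slide; uniform start assumed).
T0 = {"CG": 0.5, "AT": 0.5}
T = {"CG": {"CG": 0.9998, "AT": 0.0002},
     "AT": {"CG": 0.0002, "AT": 0.9998}}
E = {"CG": {"A": 0.2462, "C": 0.2476, "G": 0.2985, "T": 0.2077},
     "AT": {"A": 0.2700, "C": 0.2084, "G": 0.1981, "T": 0.3236}}

def joint_log_prob(s, h):
    """log P(s, h) = log P(h) + log P(s | h), accumulated position by position."""
    lp = math.log(T0[h[0]]) + math.log(E[h[0]][s[0]])
    for i in range(1, len(s)):
        lp += math.log(T[h[i-1]][h[i]]) + math.log(E[h[i]][s[i]])
    return lp

# A CG-rich path explains "GCGC" better than an AT-rich path, and vice versa for "TATA":
cg_better = joint_log_prob("GCGC", ["CG"] * 4) > joint_log_prob("GCGC", ["AT"] * 4)
```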
HMM Essentials
Theorem of total probability: P(s) = Σ_{h in H^n} P(s, h) = Σ_{h in H^n} P(s | h) * P(h)
Most likely (hidden) sequence: h* = argmax_{h in H^n} P(h | s)
ALGORITHMS
Algorithms for HMM computations
(1) The probability of a sequence s given an HMM is: P(s) = Σ_{h in H^n} P(s | h) * P(h)
(2) The most probable (hidden) sequence is: h* = argmax_{h in H^n} P(h | s)
How do we get the probability of a sequence s? The FORWARD ALGORITHM
What is P(s)? In Markov chains, the probability of a sequence is calculated as:
P(s) = P(s_L | s_{L-1}) * P(s_{L-1} | s_{L-2}) * ... * P(s_2 | s_1) * P(s_1) = P(s_1) * Π_{i=2}^{L} t_{s_{i-1}, s_i}
What is the probability P(s) for an HMM?
HMM Recognition
For a given model M = {T, E, p} and a given state sequence h_1 h_2 ... h_L, the probability of an observation (symbol) sequence s_1 s_2 ... s_L is
P(s | h, M) = e_{h_1}(s_1) * e_{h_2}(s_2) * ... * e_{h_L}(s_L)
For a given hidden Markov model M = {T, E, p}, the probability of the state sequence h_1 h_2 ... h_L is (the initial probability of h_1 is taken to be p_{h_1})
P(h | M) = p_{h_1} * t_{h_1 h_2} * t_{h_2 h_3} * ... * t_{h_{L-1} h_L}
So, for a given HMM M, the probability of an observation sequence s_1 s_2 ... s_L is obtained by summing over all possible state sequences.
HMM Recognition (cont.)
P(s | M) = Σ_h P(s | h, M) * P(h | M) = Σ_h p_{h_1} e_{h_1}(s_1) * t_{h_1 h_2} e_{h_2}(s_2) * t_{h_2 h_3} e_{h_3}(s_3) * ...
This requires summing over exponentially many paths. Can this be made more efficient?
HMM Recognition (cont.)
Why isn't it efficient? It is O(2L * H^L):
For a given state sequence of length L we have about 2L multiplications:
P(h | M) = p_{h_1} * t_{h_1 h_2} * t_{h_2 h_3} * ... * t_{h_{L-1} h_L}
P(s | h) = e_{h_1}(s_1) * e_{h_2}(s_2) * ... * e_{h_L}(s_L)
There are H^L possible (hidden) state sequences.
So, if H = 5 and L = 100, the naive algorithm requires about 200 * 5^100 computations.
We can use the forward-backward (F-B) algorithm to do things efficiently.
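To see the H^L blow-up concretely, here is the naive evaluation of P(s | M) that enumerates every hidden path, which is feasible only for tiny examples. The two-state coin-style model values are illustrative assumptions, not parameters from the lecture.

```python
from itertools import product

# Brute-force P(s | M): sum P(s | h) * P(h) over all H^L hidden paths.
states = ["1", "2"]
p0 = {"1": 0.5, "2": 0.5}
t = {"1": {"1": 0.7, "2": 0.3}, "2": {"1": 0.4, "2": 0.6}}
e = {"1": {"H": 0.9, "T": 0.1}, "2": {"H": 0.2, "T": 0.8}}

def brute_force_prob(s):
    total = 0.0
    for h in product(states, repeat=len(s)):          # H^L paths
        p = p0[h[0]] * e[h[0]][s[0]]                  # p_{h_1} * e_{h_1}(s_1)
        for i in range(1, len(s)):
            p *= t[h[i-1]][h[i]] * e[h[i]][s[i]]      # t_{h_{i-1} h_i} * e_{h_i}(s_i)
        total += p
    return total

p = brute_force_prob("HTH")  # already 2^3 = 8 paths for this short sequence
```

Summing brute_force_prob over all possible observation strings of a fixed length gives 1, which is a handy sanity check for the model definition.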
The FORWARD algorithm
Given a sequence s of length n and an HMM with parameters (T, E, p):
1. Create a table F of size H x (n+1);
2. Initialize: F(0,0) = 1; F(k,0) = 0 for k > 0;
3. For i = 1:n, compute each entry using the recursive relation:
   F(j,i) = E(j, s(i)) * Σ_k { F(k, i-1) * T(k, j) }
4. OUTPUT: P(s) = Σ_k { F(k, n) }
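The recursion above can be sketched as follows, keeping only the current column of F. The two-state model values are illustrative assumptions, not parameters from the lecture.

```python
# A minimal forward-algorithm sketch (dict-based model, assumed toy values).
states = ["1", "2"]
p0 = {"1": 0.5, "2": 0.5}
T = {"1": {"1": 0.7, "2": 0.3}, "2": {"1": 0.4, "2": 0.6}}
E = {"1": {"H": 0.9, "T": 0.1}, "2": {"H": 0.2, "T": 0.8}}

def forward(s):
    # F[j] holds F(j, i) for the current position i.
    F = {j: p0[j] * E[j][s[0]] for j in states}                       # i = 1
    for c in s[1:]:
        F = {j: E[j][c] * sum(F[k] * T[k][j] for k in states)         # recursion
             for j in states}
    return sum(F.values())                                            # P(s) = sum_k F(k, n)

p = forward("HTH")
```

Unlike the exponential summation over paths, this runs in O(L * H^2) time: one column of H entries per position, each entry a sum over H predecessors.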
How do we get h*? DECODING (VITERBI)
Decoding: the most probable path for the sequence CGCG
Decoding
INPUT: a hidden Markov model M = (T, E, p) and a sequence s for which the generating path h = (h_1, ..., h_L) is unknown.
QUESTION: what is the most probable generating path h* for s?
In general there may be many state sequences that could give rise to any particular sequence of symbols. If we know the identity of h_i, then the most probable state sequence on positions i+1, ..., n does not depend on observations before time i.
The VITERBI Dynamic Programming algorithm
Given a sequence s of length n and an HMM with parameters (T, E, p):
1. Create tables V and pointer of size H x (n+1);
2. Initialize: V(0,0) = 1; V(k,0) = 0 for k > 0;
3. For i = 1:n, compute each entry using the recursive relation:
   V(j,i) = E(j, s(i)) * max_k { V(k, i-1) * T(k, j) }
   pointer(i,j) = argmax_k { V(k, i-1) * T(k, j) }
4. OUTPUT: P(s, h*) = max_k { V(k, n) }, with h*_n = argmax_k { V(k, n) }
5. Trace-back for i = n:1, using: h*_{i-1} = pointer(i, h*_i)
Time complexity: O(L * H^2). Space complexity: O(L * H).
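A sketch of the Viterbi recursion and trace-back, returning the most probable hidden path together with its joint probability. The two-state model values are again illustrative assumptions.

```python
# Minimal Viterbi sketch (dict-based model, assumed toy values).
states = ["1", "2"]
p0 = {"1": 0.5, "2": 0.5}
T = {"1": {"1": 0.7, "2": 0.3}, "2": {"1": 0.4, "2": 0.6}}
E = {"1": {"H": 0.9, "T": 0.1}, "2": {"H": 0.2, "T": 0.8}}

def viterbi(s):
    V = [{j: p0[j] * E[j][s[0]] for j in states}]       # V(j, 1)
    ptr = []                                            # back-pointers per position
    for c in s[1:]:
        col, back = {}, {}
        for j in states:
            k_best = max(states, key=lambda k: V[-1][k] * T[k][j])
            back[j] = k_best                            # pointer(i, j)
            col[j] = E[j][c] * V[-1][k_best] * T[k_best][j]
        V.append(col)
        ptr.append(back)
    # Trace back from the best final state.
    h = [max(states, key=lambda k: V[-1][k])]
    for back in reversed(ptr):
        h.append(back[h[-1]])
    h.reverse()
    return h, V[-1][h[-1]]                              # h*, P(s, h*)

path, p = viterbi("HHTT")  # state 1 favors H, state 2 favors T
```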
Some comments on the Viterbi, Forward and Backward algorithms:
Complexity: time O(L * Q^2), space O(L * Q), where Q is the number of states and L the sequence length.
Implementation should be done in log space to avoid underflow errors.
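The log-space remark above can be sketched as a variant of the forward recursion, where products become sums of logs and the inner sum uses the log-sum-exp trick (for Viterbi the sum would simply become a max). The two-state model values are illustrative assumptions.

```python
import math

# Log-space forward recursion to avoid underflow on long sequences
# (same assumed toy model as before).
states = ["1", "2"]
p0 = {"1": 0.5, "2": 0.5}
T = {"1": {"1": 0.7, "2": 0.3}, "2": {"1": 0.4, "2": 0.6}}
E = {"1": {"H": 0.9, "T": 0.1}, "2": {"H": 0.2, "T": 0.8}}

def logsumexp(xs):
    """log(sum(exp(x))) computed stably by factoring out the maximum."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_forward(s):
    F = {j: math.log(p0[j]) + math.log(E[j][s[0]]) for j in states}
    for c in s[1:]:
        F = {j: math.log(E[j][c])
                + logsumexp([F[k] + math.log(T[k][j]) for k in states])
             for j in states}
    return logsumexp(list(F.values()))   # log P(s)

# Stays finite even where the plain-probability version would underflow to 0.0:
lp = log_forward("HT" * 500)
```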
NEXT SLIDES NOT RELEVANT FOR THE EXAM
The Baum-Welch algorithm is a heuristic algorithm for solving the problem of PARAMETER ESTIMATION.
The EXPECTATION MAXIMIZATION algorithm
Given a sequence s and an HMM with unknown (T, E):
1. Initialize h, E and T;
2. Given s and h, estimate E and T simply by counting the symbols and transitions;
3. Given s, E and T, estimate h, e.g. with the Viterbi algorithm;
4. Repeat steps 2 and 3 until some convergence criterion is met.
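Step 2 above (re-estimating E and T by counting, given s and the current guess for h) can be sketched as follows. The input string and hidden path are made-up toy data, and no smoothing is applied, so states or symbols never seen in the data get probability zero.

```python
from collections import Counter

# Re-estimation by counting (step 2 of the iteration above); toy input.
s = "TTCACTGTGAACGA"   # observed symbols
h = "11111111222222"   # current guess for the hidden path (same length as s)

def reestimate(s, h):
    trans = Counter(zip(h, h[1:]))   # counts of state-to-state transitions
    emit = Counter(zip(h, s))        # counts of (state, emitted symbol) pairs
    states = set(h)
    T = {k: {l: trans[(k, l)] / sum(trans[(k, m)] for m in states)
             for l in states} for k in states}
    E = {k: {b: emit[(k, b)] / sum(n for (q, _), n in emit.items() if q == k)
             for b in set(s)} for k in states}
    return T, E

T, E = reestimate(s, h)
# e.g. state "1" emitted T in 4 of its 8 positions, so E["1"]["T"] = 0.5
```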
More information online at medicalbioinformatics.de/teaching. Tim Conrad, AG Medical Bioinformatics. Further questions: www.medicalbioinformatics.de