VL Algorithmen und Datenstrukturen für Bioinformatik (19400001), WS15/2016, Woche 16. Tim Conrad, AG Medical Bioinformatics, Institut für Mathematik & Informatik, Freie Universität Berlin. Based on slides by B. Chor (Tel Aviv University) and E. Alpaydın (MIT).

Introduction to Hidden Markov Models

A stochastic finite state machine (Markov chain model) for weather

A Hidden Markov Model for Weather. What more can we do with this? Annotation of a realization with its underlying state (here H / L), i.e. which alternative model most likely produced which position.

Once more: MARKOV MODEL

Markov process with a non-hidden observation process (stochastic automaton). Three urns, each full of balls of one color: S1 = red, S2 = blue, S3 = green.

Initial probabilities: Π = (0.5, 0.2, 0.3)

Transition matrix:
A = | 0.4  0.3  0.3 |
    | 0.2  0.6  0.2 |
    | 0.1  0.1  0.8 |

For the observation sequence O = {S1, S1, S3, S3}:
P(O | A, Π) = P(S1) · P(S1 | S1) · P(S3 | S1) · P(S3 | S3) = π1 · a11 · a13 · a33 = 0.5 · 0.4 · 0.3 · 0.8 = 0.048
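To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. The initial distribution, transition matrix and observation sequence are taken from the slide above; the function name is illustrative only.

```python
# Probability of a state sequence in a plain (non-hidden) Markov chain,
# using the urn parameters from the slide above.
PI = {"S1": 0.5, "S2": 0.2, "S3": 0.3}                 # initial probabilities
A = {"S1": {"S1": 0.4, "S2": 0.3, "S3": 0.3},          # transition probabilities A[from][to]
     "S2": {"S1": 0.2, "S2": 0.6, "S3": 0.2},
     "S3": {"S1": 0.1, "S2": 0.1, "S3": 0.8}}

def sequence_probability(states):
    """P(O | A, PI) = PI[o1] * A[o1][o2] * ... * A[o_{n-1}][o_n]."""
    p = PI[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

print(sequence_probability(["S1", "S1", "S3", "S3"]))  # 0.5 * 0.4 * 0.3 * 0.8 = 0.048
```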

A plot of 100 observed numbers from the stochastic automaton

Histogram for the stochastic automaton. The proportions reflect the stationary distribution of the chain.

Finally: HIDDEN MARKOV MODEL

From Markov to Hidden Markov. The previous model assumes that each state can be uniquely associated with an observable event: once an observation is made, the state of the system is trivially retrieved. This model, however, is too restrictive to be of practical use for most realistic problems. To make the model more flexible, we assume that the observations are a probabilistic function of each state: each state can produce a number of outputs according to its own probability distribution, and each distinct output can potentially be generated in any state. These are known as Hidden Markov Models (HMMs), because the state sequence is not directly observable; it can only be inferred from the sequence of observations produced by the system.

Hidden Markov Models. States are not observable. Discrete observations {v_1, v_2, ..., v_M} are recorded; they are a probabilistic function of the state. Emission probabilities: b_j(m) = P(O_t = v_m | q_t = S_j). Example: in each urn there are balls of different colors, but with different proportions per urn. For each observation sequence there are multiple possible state sequences.

Hidden Sequence. n urns containing colored balls; v distinct colors. Each urn has a (possibly) different distribution of colors. Sequence generation algorithm:
1. (Behind the curtain) Pick an initial urn according to some random process.
2. (Behind the curtain) Randomly pick a ball from the urn.
3. Show it to the audience and put it back.
4. (Behind the curtain) Select another urn according to the random selection process associated with the current urn.
5. Repeat steps 2-4.
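The urn-and-ball procedure is easy to mirror in code. Below is a minimal Python sketch of the generative process under assumed toy parameters (two urns, three colors); the urn names, colors and numbers are illustrative and not part of the lecture example.

```python
import random

# Illustrative toy parameters: 2 hidden urns, 3 ball colors.
INIT = {"urn1": 0.6, "urn2": 0.4}                              # step 1: initial urn
TRANS = {"urn1": {"urn1": 0.7, "urn2": 0.3},                   # step 4: next urn
         "urn2": {"urn1": 0.2, "urn2": 0.8}}
EMIT = {"urn1": {"red": 0.6, "blue": 0.3, "green": 0.1},       # steps 2-3: ball color
        "urn2": {"red": 0.1, "blue": 0.2, "green": 0.7}}

def draw(dist):
    """Sample one key from a {value: probability} dictionary."""
    return random.choices(list(dist), weights=list(dist.values()), k=1)[0]

def generate(length):
    """Run the hidden urn procedure; only the balls would be shown to the audience."""
    urn = draw(INIT)
    urns, balls = [], []
    for _ in range(length):
        urns.append(urn)
        balls.append(draw(EMIT[urn]))      # show the ball, put it back
        urn = draw(TRANS[urn])             # pick the next urn behind the curtain
    return urns, balls

hidden, observed = generate(10)
print(observed)   # visible observation sequence
print(hidden)     # hidden state sequence (unknown to the audience)
```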

Hidden Markov Models: the model is specified by transition probabilities (T) and emission probabilities (E).

Typical questions

HMMs: Main Problems.
Evaluation: given a particular realization (observations), what is the probability that it was produced by a given HMM? (-> forward or backward algorithm)
Decoding: given a particular realization (observations), what is the most likely (hidden!) state sequence that produced this realization? (-> Viterbi algorithm)
Training: given a model structure (!) and training data, what are the best model parameters? (-> Maximum-Likelihood estimation, Baum-Welch = forward-backward algorithm)

MODEL SELECTION

The coin-toss problem. Consider the following scenario: you are placed in a room with a curtain. Behind the curtain there is a person performing a coin-toss experiment. This person selects one of several coins and tosses it: heads (H) or tails (T). The person tells you the outcome (H, T), but not which coin was used each time. Your goal is to build a probabilistic model that best explains a sequence of observations O = {o1, o2, o3, o4, ...} = {H, T, T, H, ...}. The coins represent the states; these are hidden because you do not know which coin was tossed each time. The outcome of each toss represents an observation. A likely sequence of coins may be inferred from the observations, but this state sequence will not be unique.

The Coin Toss Example, 1 coin. The Markov model is observable since there is only one coin: we may describe the system with a deterministic model where the states are the actual observations (see figure). The model parameter P(H) may be found from the ratio of heads and tails. O = H H H T T H, S = 1 1 1 2 2 1. (Lecture Notes for E. Alpaydın, 2004, Introduction to Machine Learning, The MIT Press, V1.1)

The Coin Toss Example, 2 coins

The Coin Toss Example, 3 coins

1, 2 or 3 coins? Which of these models is best? Since the states are not observable, the best we can do is select the model that best explains the data (e.g. by a maximum-likelihood criterion). Whether the observation sequence is long and rich enough to warrant a more complex model is a different story, though.

FROM COINS TO DNA

Hidden Markov Models. In 1989 Gary Churchill introduced the use of HMMs for DNA segmentation. CENTRAL IDEAS: The (DNA) string is generated by a system. The system can be in a number of distinct states. The system can change between states with transition probabilities T. In each state the system emits symbols into the string with emission probabilities E.

Example: Change points in the lambda phage. Two-state HMM with states CG RICH and AT RICH; transition probabilities 0.9998 for staying in the same state and 0.0002 for switching. Emission probabilities:
CG RICH: A 0.2462, C 0.2476, G 0.2985, T 0.2077
AT RICH: A 0.2700, C 0.2084, G 0.1981, T 0.3236

Hidden Markov Models. Three states, each with its own emission distribution over A, C, G, T (p_A_1 ... p_G_1 for state 1, p_A_2 ... p_G_2 for state 2, p_A_3 ... p_G_3 for state 3), connected by transitions T(1,2) and T(2,3). Example of an emitted symbol string s with its hidden state string h:
s = TTCACTGTGAACGATCCGAATCGACCAGTACTACGGCACGTTGCCAAAGCGCTTATCTAGC
h = 1111111111111111111111112222222222222333333333333333333333333

HMM for gene prediction. The Markov model: states for Intergenic, Start codon, Exon, Donor, Intron, Acceptor and Stop codon, with start state q_0. The input sequence: AGCTAGCAGTATGTCATGGCATGTTCGGAGGTAGTACGTAGAGGTAGCTAGTATAGGTCGATAGTACGCGA. The gene prediction: exon 1, exon 2, exon 3.

HMM Essentials. TRANSITION MATRIX = the probability of a state change: T(k,l) = P(h_{i+1} = l | h_i = k). EMISSION PROBABILITY = the symbol probability distribution in a certain state: E(k,b) = P(s_i = b | h_i = k).

HMM Essentials. INITIAL PROBABILITY of a state: T(0,k) = P(h_1 = k). Sequence of the states visited: h. Sequence of the generated symbols: s.

HMM Essentials. Probability of the hidden states h: P(h) = T(0,h_1) · T(h_1,h_2) · ... · T(h_{n-1},h_n). Probability of the generated symbol string s given the hidden states h: P(s | h) = E(h_1,s_1) · E(h_2,s_2) · ... · E(h_n,s_n).

HMM Essentials. Joint probability of the symbol string s (observations) and the hidden states h: P(s,h) = P(s | h) · P(h).
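To make P(s,h) = P(s | h) · P(h) concrete, here is a minimal Python sketch using the CG-rich / AT-rich transition and emission values from the lambda-phage slide; the 0.5/0.5 initial distribution is an assumption, since the slide does not state one.

```python
# Two-state lambda-phage HMM from the earlier slide.
INIT = {"CG": 0.5, "AT": 0.5}                      # assumed; not given on the slide
T = {"CG": {"CG": 0.9998, "AT": 0.0002},
     "AT": {"CG": 0.0002, "AT": 0.9998}}
E = {"CG": {"A": 0.2462, "C": 0.2476, "G": 0.2985, "T": 0.2077},
     "AT": {"A": 0.2700, "C": 0.2084, "G": 0.1981, "T": 0.3236}}

def joint_probability(symbols, states):
    """P(s, h) = P(h) * P(s | h) for one concrete hidden path h."""
    p = INIT[states[0]] * E[states[0]][symbols[0]]
    for i in range(1, len(symbols)):
        p *= T[states[i - 1]][states[i]] * E[states[i]][symbols[i]]
    return p

print(joint_probability("GCGC", ["CG", "CG", "CG", "CG"]))
print(joint_probability("GCGC", ["AT", "AT", "AT", "AT"]))   # same string, less likely path
```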

HMM Essentials. Theorem of total probability: P(s) = Σ_{h ∈ H^n} P(s,h) = Σ_{h ∈ H^n} P(s | h) · P(h). Most likely (hidden) sequence: h* = argmax_{h ∈ H^n} P(h | s).

ALGORITHMS

Algorithms for HMM computations. (1) The probability of a sequence s given an HMM is: P(s) = Σ_{h ∈ H^n} P(s,h) = Σ_{h ∈ H^n} P(s | h) · P(h). (2) The most probable (hidden) sequence is: h* = argmax_{h ∈ H^n} P(h | s).

How to get the probability of sequence s? FORWARD ALGORITHM

What is P(s)? In Markov chains, the probability of a sequence is calculated as: P(s) = P(s_L | s_{L-1}) · P(s_{L-1} | s_{L-2}) · ... · P(s_2 | s_1) · P(s_1) = P(s_1) · ∏_{i=2}^{L} t_{s_{i-1} s_i}. What is the probability P(s) for an HMM?

HMM Recognition. For a given model M = {T, E, p} and a given state sequence h_1 h_2 ... h_L, the probability of an observation (symbol) sequence s_1 s_2 ... s_L is P(s | h, M) = e_{h_1}(s_1) · e_{h_2}(s_2) · ... · e_{h_L}(s_L). For a given hidden Markov model M = {T, E, p}, the probability of the state sequence h_1 h_2 ... h_L is (the initial probability of h_1 is taken to be p_{h_1}): P(h | M) = p_{h_1} · t_{h_1 h_2} · t_{h_2 h_3} · ... · t_{h_{L-1} h_L}. So, for a given HMM M, the probability of an observation sequence s_1 s_2 ... s_L is obtained by summing over all possible state sequences.

HMM Recognition (cont.). P(s | M) = Σ_h P(s | h, M) · P(h | M) = Σ_h p_{h_1} e_{h_1}(s_1) t_{h_1 h_2} e_{h_2}(s_2) t_{h_2 h_3} e_{h_3}(s_3) · ... This requires summing over exponentially many paths. Can this be made more efficient?

HMM Recognition (cont.). Why isn't it efficient? It costs O(2L · H^L): for a given state sequence of length L we have about 2L multiplications, P(h | M) = p_{h_1} · t_{h_1 h_2} · ... · t_{h_{L-1} h_L} and P(s | h) = e_{h_1}(s_1) · e_{h_2}(s_2) · ... · e_{h_L}(s_L), and there are H^L possible (hidden) state sequences. So, if H = 5 and L = 100, the naive algorithm requires about 200 · 5^100 computations. We can use the forward-backward (F-B) algorithm to do this efficiently.

The FORWARD algorithm. Given a sequence s of length n and an HMM with parameters (T,E,p):
1. Create a table F of size |H| x (n+1);
2. Initialize i=0: F(0,0)=1; F(k,0)=0 for k>0;
3. For i=1..n, compute each entry using the recursive relation: F(j,i) = E(j,s(i)) · Σ_k { F(k,i-1) · T(k,j) };
4. OUTPUT: P(s) = Σ_k { F(k,n) }.
A Python sketch of this recursion follows below.
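A minimal Python transcription of the recursion above; the slide's silent begin state 0 is folded into the initial probabilities p, and all names are illustrative. The tiny usage example reuses the lambda-phage parameters, again assuming a 0.5/0.5 start.

```python
def forward(s, states, p, T, E):
    """Forward algorithm: P(s), summed over all hidden paths, in O(n * |states|^2) time."""
    n = len(s)
    F = {k: [0.0] * n for k in states}
    for k in states:                                     # initialization (position 1)
        F[k][0] = p[k] * E[k][s[0]]
    for i in range(1, n):                                # recursion over positions
        for j in states:
            F[j][i] = E[j][s[i]] * sum(F[k][i - 1] * T[k][j] for k in states)
    return sum(F[k][n - 1] for k in states)              # termination: P(s)

# Usage with the two-state lambda-phage model (0.5/0.5 start assumed):
p = {"CG": 0.5, "AT": 0.5}
T = {"CG": {"CG": 0.9998, "AT": 0.0002}, "AT": {"CG": 0.0002, "AT": 0.9998}}
E = {"CG": {"A": 0.2462, "C": 0.2476, "G": 0.2985, "T": 0.2077},
     "AT": {"A": 0.2700, "C": 0.2084, "G": 0.1981, "T": 0.3236}}
print(forward("GCGC", ["CG", "AT"], p, T, E))
```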

How to get h*? DECODING (VITERBI)

Decoding: the most probable path for the sequence CGCG

Decoding. INPUT: A hidden Markov model M = (T,E,p) and a sequence s, for which the generating path h = (h_1, ..., h_L) is unknown. QUESTION: What is the most probable generating path h* for s? In general there may be many state sequences that could give rise to any particular sequence of symbols. If we know the identity of h_i, then the most probable state sequence on positions i+1, ..., n does not depend on the observations before time i.

The VITERBI dynamic programming algorithm. Given a sequence s of length n and an HMM with parameters (T,E,p):
1. Create a table V of size |H| x (n+1);
2. Initialize i=0: V(0,0)=1; V(k,0)=0 for k>0;
3. For i=1..n, compute each entry using the recursive relation: V(j,i) = E(j,s(i)) · max_k { V(k,i-1) · T(k,j) }, pointer(i,j) = argmax_k { V(k,i-1) · T(k,j) };
4. OUTPUT: P(s,h*) = max_k { V(k,n) }, and h*_n = argmax_k { V(k,n) };
5. Trace-back for i=n..2: h*_{i-1} = pointer(i, h*_i);
6. OUTPUT: h* = (h*_1, ..., h*_n).
Time complexity: O(L · |S|^2). Space complexity: O(L · |S|).
A Python sketch of this algorithm follows below.
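The same translation for Viterbi, a minimal sketch in plain probability space (see the next slide for the log-space caveat); the argmax pointers implement the trace-back of steps 5-6, and all names are illustrative.

```python
def viterbi(s, states, p, T, E):
    """Viterbi decoding: most probable hidden path h* and its joint probability P(s, h*)."""
    n = len(s)
    V = {k: [0.0] * n for k in states}
    ptr = {k: [None] * n for k in states}
    for k in states:                                             # initialization
        V[k][0] = p[k] * E[k][s[0]]
    for i in range(1, n):                                        # recursion
        for j in states:
            best = max(states, key=lambda k: V[k][i - 1] * T[k][j])
            V[j][i] = E[j][s[i]] * V[best][i - 1] * T[best][j]
            ptr[j][i] = best                                     # remember the argmax
    last = max(states, key=lambda k: V[k][n - 1])                # h*_n
    path = [last]
    for i in range(n - 1, 0, -1):                                # trace-back
        path.append(ptr[path[-1]][i])
    path.reverse()
    return path, V[last][n - 1]

# Usage with the lambda-phage dictionaries from the forward-algorithm sketch:
# path, prob = viterbi("GCGC", ["CG", "AT"], p, T, E)
```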

Some comments. In the Viterbi, forward and backward algorithms: time complexity O(L · |Q|^2), space complexity O(L · |Q|). Implementation should be done in log space to avoid underflow errors.
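For long sequences the products in the recursions underflow double precision, which is why the slide recommends log space. A minimal sketch of the log-space forward recursion, replacing the plain sum with a logsumexp helper (names are illustrative):

```python
import math

def logsumexp(xs):
    """Stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))

def forward_log(s, states, p, T, E):
    """Forward algorithm in log space: returns log P(s).
    Assumes strictly positive parameters; use -inf for genuinely forbidden entries."""
    F = {k: math.log(p[k]) + math.log(E[k][s[0]]) for k in states}
    for c in s[1:]:
        F = {j: math.log(E[j][c]) + logsumexp([F[k] + math.log(T[k][j]) for k in states])
             for j in states}
    return logsumexp(list(F.values()))
```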

NEXT SLIDES NOT RELEVANT FOR THE EXAM

The Baum-Welch algorithm is a heuristic algorithm for finding a solution to the problem of PARAMETER ESTIMATION.

The EXPECTATION MAXIMIZATION algorithm. Given a sequence s and an HMM with unknown (T,E):
1. Initialize h, E and T;
2. Given s and h, estimate E and T simply by counting the symbols;
3. Given s, E and T, estimate h, e.g. with the Viterbi algorithm;
4. Repeat steps 2 and 3 until some stopping criterion is met (a sketch of this loop follows below).
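A minimal sketch of the iteration described above. Note that this is "Viterbi training", a hard-assignment variant; full Baum-Welch would use expected counts from the forward-backward algorithm instead. The pseudocounts are an addition not on the slide (they avoid zero probabilities), viterbi() is the decoder sketched earlier, and all other names are illustrative.

```python
from collections import Counter

def reestimate(s, h, states, alphabet, pseudocount=1.0):
    """Step 2: re-estimate T and E by counting transitions/emissions along the path h."""
    trans = Counter(zip(h, h[1:]))
    emit = Counter(zip(h, s))
    T = {a: {b: (trans[(a, b)] + pseudocount) /
                (sum(trans[(a, c)] for c in states) + pseudocount * len(states))
             for b in states} for a in states}
    E = {a: {x: (emit[(a, x)] + pseudocount) /
                (sum(emit[(a, y)] for y in alphabet) + pseudocount * len(alphabet))
             for x in alphabet} for a in states}
    return T, E

def viterbi_training(s, states, alphabet, p, T, E, iterations=10):
    """Alternate decoding (step 3) and counting (step 2)."""
    for _ in range(iterations):
        h, _ = viterbi(s, states, p, T, E)          # decoder from the earlier sketch
        T, E = reestimate(s, h, states, alphabet)
    return T, E
```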

More information online at medicalbioinformatics.de/teaching. Tim Conrad, AG Medical Bioinformatics. Further questions: www.medicalbioinformatics.de