Hidden Markov Models. Based on chapters from the book: Durbin, Eddy, Krogh and Mitchison, Biological Sequence Analysis, via Shamir's lecture notes.


motivation: music recognition has to deal with variations in
- actual sound
- timing

the same idea for biological sequences: deal with variations in
- actual sound → actual base
- timing → insertions/deletions
application: gene finding

basic questions: we start with a sequence of observations and a probabilistic model of our domain.
- does this sequence belong to a certain family? → Markov chains
- can we say something about the internal structure? (indirect observation) → HMM: Hidden Markov Models

introduction: Markov chains are discrete-time, discrete-space models given by states and transitions; there is no state history: the next step depends on the present state only. [diagram: states A, B, C with transition probabilities]
notation: P(X) = probability of X; P(X,Y) = probability of X and Y; P(X|Y) = probability of X, given Y

Markov model: M = (Q, P, T) with
- states Q
- initial probabilities p_x
- transition probabilities t_xy (first order: no history), given as a matrix or as a labeled graph (edges t_AA, t_AC, ...)
An observation X is a sequence of states; we ask for the probability of the observation given the model.

example: Q = { A, B, C }, P = (1, 0, 0), i.e. unique starting state A, and transition matrix

        A    B    C
  A    0.7  0.3  0
  B    0    0.2  0.8
  C    0.4  0    0.6

P(AABBCCC | M) = 1 · 0.7 · 0.3 · 0.2 · 0.8 · 0.6 · 0.6 ≈ 1.2 · 10⁻², vs. ≈ 2.2 · 10⁻² for the same observation under a second model with different transition probabilities.
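As a quick check, a minimal Python sketch of this computation (the dictionaries transcribe the slide's model; the function name is ours):

```python
# Minimal sketch of the example chain above.

init = {"A": 1.0, "B": 0.0, "C": 0.0}          # P = (1, 0, 0): unique start A
trans = {                                      # T, rows = from, columns = to
    "A": {"A": 0.7, "B": 0.3, "C": 0.0},
    "B": {"A": 0.0, "B": 0.2, "C": 0.8},
    "C": {"A": 0.4, "B": 0.0, "C": 0.6},
}

def chain_probability(seq):
    """P(seq | M): initial probability times one transition factor per step."""
    p = init[seq[0]]
    for x, y in zip(seq, seq[1:]):
        p *= trans[x][y]
    return p

print(chain_probability("AABBCCC"))   # 1*0.7*0.3*0.2*0.8*0.6*0.6 ~ 1.2e-2
```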

Markov model, continued: instead of initial probabilities, one can fix an initial state 0 with transitions t_0A, t_0B, t_0C into the chain; a final state can be added the same way (not in this picture). Practical note: multiplying many small values leads to underflow.

comparing models: is observation X best explained by model M1 or M2? Naively compare P(X|M1) vs. P(X|M2), but what we actually want is P(M1|X) vs. P(M2|X)!
Bayes: P(A|B) = P(B|A) · P(A) / P(B), hence

  P(M1|X) / P(M2|X) = P(X|M1) · P(M1) / ( P(X|M2) · P(M2) )

motto: bases are not random

application: CpG islands. Observed transition frequencies inside islands (+) and outside islands (-):

  +     A      C      G      T
  A   0.180  0.274  0.426  0.120
  C   0.171  0.368  0.274  0.188
  G   0.161  0.339  0.375  0.125
  T   0.079  0.355  0.384  0.182

  -     A      C      G      T
  A   0.300  0.205  0.285  0.210
  C   0.322  0.298  0.078  0.302
  G   0.248  0.246  0.298  0.208
  T   0.177  0.239  0.292  0.292

The consecutive pair CG is special: its C tends to mutate (CG → TG), so CG pairs are mostly rare, although islands of higher CG frequency occur; they signal e.g. promoter regions.

basic questions, instantiated: observation = DNA sequence, model = CpG islands / non-islands.
- does this sequence belong to a certain family? (Markov chains) → is this a CpG island or not?
- can we say something about the internal structure? (HMM: Hidden Markov Models) → where are the CpG islands?

application: CpG islands. Score a sequence X by the ratio of its probabilities under the island (+) and non-island (-) charts above, one factor per transition:

  score(ACGT) = (0.274 / 0.205) · (0.274 / 0.078) · (0.125 / 0.208) ≈ 2.82

application: CpG islands, in bits (log2). Per-transition log-likelihood ratios LLR(x → y) = log2( t+_xy / t-_xy ):

  LLR    A      C      G      T
  A    -0.74   0.42   0.58  -0.80
  C    -0.91   0.30   1.81  -0.69
  G    -0.62   0.46   0.33  -0.73
  T    -1.17   0.57   0.39  -0.68

  log-score(ACGT) = log2(0.274/0.205) + log2(0.274/0.078) + log2(0.125/0.208)
                  = 0.42 + 1.81 - 0.73 = 1.50

CpG log-likelihood ratio, continued: LLR(ACGT) = 0.42 + 1.81 - 0.73 = 1.50, about 0.37 bits per base.
- is a (short) sequence a CpG island? compare its score with observed data (normalized for length)
- where in a long sequence are the CpG islands? first approach: sliding window; but which length of window?
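Both scores are easy to reproduce; a small sketch using the two frequency charts (the dictionaries transcribe the + and - tables; the helper name llr is ours):

```python
from math import log2

plus = {"A": {"A": .180, "C": .274, "G": .426, "T": .120},
        "C": {"A": .171, "C": .368, "G": .274, "T": .188},
        "G": {"A": .161, "C": .339, "G": .375, "T": .125},
        "T": {"A": .079, "C": .355, "G": .384, "T": .182}}
minus = {"A": {"A": .300, "C": .205, "G": .285, "T": .210},
         "C": {"A": .322, "C": .298, "G": .078, "T": .302},
         "G": {"A": .248, "C": .246, "G": .298, "T": .208},
         "T": {"A": .177, "C": .239, "G": .292, "T": .292}}

def llr(seq):
    """Log2 likelihood ratio over consecutive pairs; positive = island-like."""
    return sum(log2(plus[x][y] / minus[x][y]) for x, y in zip(seq, seq[1:]))

print(llr("ACGT"))                 # 0.42 + 1.81 - 0.73 ~ 1.50
print(llr("ACGT") / len("ACGT"))   # ~ 0.37 bits per base
print(2 ** llr("ACGT"))            # ~ 2.82, the plain ratio score
```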

empirical data: is a (short) sequence a CpG island? compare its score with observed data (normalized for length)

where in a long sequence are the CpG islands? first approach: sliding window, e.g. CpGplot on the sequence

ACCGATACGATGAGAATGAGCAATGTAGTGAATCGTTTCAGCTACT
CTCTATCGTAGCATTACTATGCAGTCAGTGATGCGCGCTAGCCGCG
TAGCTCGCGGTCGCATCGCTGGCCGTAGCTGCGTACGATCTGCTGT
ACGCTGATCGGAGCGCTGCATCTCAACTGACTCATACTCATATGTC
TACATCATCATCATTCATGTCAGTCTAGCATACTATTATCGACGAC
TGATCGATCTGACTGCTAGTAGACGTACCGAGCCAGGCATACGACA
TCAGTCGACT

CpGplot output [plots: observed vs. expected CpG ratio, C+G percentage, putative islands]:

  Islands of unusual CG composition
  EMBOSS_001 from 1 to 286
    Observed/Expected ratio > 0.60
    Percent C + Percent G > 50.00
    Length > 50
  Length 114 (51..164)
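CpGplot is an EMBOSS program; the sketch below is a simplified stand-in, not its implementation. It computes the two per-window statistics plotted above and applies the island criteria printed in the output; the window length stays the open parameter from the previous slide:

```python
# Simplified sliding-window sketch in the spirit of CpGplot (illustrative only).

def window_stats(seq, window):
    """Yield (start, CpG observed/expected ratio, %C+%G) per window."""
    for i in range(len(seq) - window + 1):
        w = seq[i:i + window]
        c, g = w.count("C"), w.count("G")
        cpg = sum(1 for a, b in zip(w, w[1:]) if a + b == "CG")
        expected = c * g / window                  # expected CpG count
        ratio = cpg / expected if expected else 0.0
        yield i + 1, ratio, 100.0 * (c + g) / window

seq = ("ACCGATACGATGAGAATGAGCAATGTAGTGAATCGTTTCAGCTACT"
       "CTCTATCGTAGCATTACTATGCAGTCAGTGATGCGCGCTAGCCGCG"
       "TAGCTCGCGGTCGCATCGCTGGCCGTAGCTGCGTACGATCTGCTGT"
       "ACGCTGATCGGAGCGCTGCATCTCAACTGACTCATACTCATATGTC"
       "TACATCATCATCATTCATGTCAGTCTAGCATACTATTATCGACGAC"
       "TGATCGATCTGACTGCTAGTAGACGTACCGAGCCAGGCATACGACA"
       "TCAGTCGACT")                                # the 286 bp example above

print(max(window_stats(seq, window=50), key=lambda r: r[1]))  # best window
for start, ratio, gc in window_stats(seq, window=50):
    if ratio > 0.60 and gc > 50.00:                # CpGplot's island criteria
        print("island candidate at", start)
```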

where in a long sequence are the CpG islands? second approach: hidden Markov model

Sean R. Eddy: What is a hidden Markov model? Nature Biotechnology 22, 1315-1316 (2004).

weather example: hidden states H, M, L (high / medium / low air pressure), observed weather R = rain, C = cloudy, S = sunny.
initial probabilities: p_H = 0.4, p_M = 0.2, p_L = 0.4
transition probabilities:
        H    M    L
  H    0.6  0.3  0.1
  M    0.4  0.2  0.4
  L    0.1  0.5  0.4
emission probabilities (R, C, S):
  H: (0.1, 0.2, 0.7)   M: (0.3, 0.4, 0.3)   L: (0.6, 0.3, 0.1)
we observe the weather, not the pressure.

weather example, continued: for the observation RCCSS,

  P(RCCSS | HHHHH) = 0.1 · 0.2 · 0.2 · 0.7 · 0.7 = 196 · 10⁻⁵
  P(RCCSS | MMMMM) = 0.3 · 0.4 · 0.4 · 0.3 · 0.3 = 432 · 10⁻⁵
  P(RCCSS, HHHHH) = 0.4·0.1 · 0.6·0.2 · 0.6·0.2 · 0.6·0.7 · 0.6·0.7 ≈ 1016 · 10⁻⁷
  P(RCCSS, MMMMM) = 0.2·0.3 · 0.2·0.4 · 0.2·0.4 · 0.2·0.3 · 0.2·0.3 ≈ 14 · 10⁻⁷
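The four numbers are easy to verify; a sketch of the model (the matrices transcribe the diagram; the function names are ours):

```python
init = {"H": 0.4, "M": 0.2, "L": 0.4}
trans = {"H": {"H": 0.6, "M": 0.3, "L": 0.1},
         "M": {"H": 0.4, "M": 0.2, "L": 0.4},
         "L": {"H": 0.1, "M": 0.5, "L": 0.4}}
emit = {"H": {"R": 0.1, "C": 0.2, "S": 0.7},
        "M": {"R": 0.3, "C": 0.4, "S": 0.3},
        "L": {"R": 0.6, "C": 0.3, "S": 0.1}}

def conditional(obs, path):
    """P(obs | path): emission factors only."""
    p = 1.0
    for x, q in zip(obs, path):
        p *= emit[q][x]
    return p

def joint(obs, path):
    """P(obs, path): initial probability, then transitions and emissions."""
    p = init[path[0]] * emit[path[0]][obs[0]]
    for i in range(1, len(obs)):
        p *= trans[path[i - 1]][path[i]] * emit[path[i]][obs[i]]
    return p

print(conditional("RCCSS", "HHHHH"))   # 196e-5
print(conditional("RCCSS", "MMMMM"))   # 432e-5
print(joint("RCCSS", "HHHHH"))         # ~ 1016e-7
print(joint("RCCSS", "MMMMM"))         # ~ 14e-7
```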

hidden Markov model: M = (Σ, Q, T) with
- alphabet Σ: what we see, the emitted symbols x, y, ...
- hidden states Q, the underlying process, with transition probabilities t_pq
- emission probabilities e_q(x): each state emits symbols, so the states are observed only indirectly
probability of an observation given the model? there may be many state sequences producing it

HMM main questions, for an observation X ∈ Σ*:
- probability of this observation?
- most probable state sequence?
- how to find the model (the t_pq and e_q)? training

most probable state vs. optimal path: note these can disagree!
- most probable i-th state (probability of a state at position i, over all state sequences): posterior decoding, via forward & backward probabilities
- most probable path (a single state sequence): Viterbi
[toy example graph omitted]

probability of observation: too many state sequences to enumerate; instead, dynamic programming: for every position i and state q, compute the probability of the prefix x_1 … x_i ending in state q (one DP row per state A, B, C).

forward probability: f_q(i) = P(x_1 … x_i, state at position i is q), computed by the recursion

  f_q(i) = e_q(x_i) · Σ_p f_p(i-1) · t_pq

probability of the observation: P(X) = Σ_q f_q(L).

weather example: forward computation for P(RCCSS), first two columns:

  f(1):  f_H = 0.4 · 0.1 = 0.04
         f_M = 0.2 · 0.3 = 0.06
         f_L = 0.4 · 0.6 = 0.24
  f(2):  f_H = (0.04·0.6 + 0.06·0.4 + 0.24·0.1) · 0.2 = 0.0144
         f_M = (0.04·0.3 + 0.06·0.2 + 0.24·0.5) · 0.4 = 0.0576
         f_L = (0.04·0.1 + 0.06·0.4 + 0.24·0.4) · 0.3 = 0.0372
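The full forward computation, as a sketch (model dictionaries as in the earlier snippet; f[i][q] = P(x_1 … x_i, state_i = q)):

```python
init = {"H": 0.4, "M": 0.2, "L": 0.4}
trans = {"H": {"H": 0.6, "M": 0.3, "L": 0.1},
         "M": {"H": 0.4, "M": 0.2, "L": 0.4},
         "L": {"H": 0.1, "M": 0.5, "L": 0.4}}
emit = {"H": {"R": 0.1, "C": 0.2, "S": 0.7},
        "M": {"R": 0.3, "C": 0.4, "S": 0.3},
        "L": {"R": 0.6, "C": 0.3, "S": 0.1}}

def forward(obs):
    f = [{q: init[q] * emit[q][obs[0]] for q in init}]
    for x in obs[1:]:
        prev = f[-1]
        f.append({q: emit[q][x] * sum(prev[p] * trans[p][q] for p in init)
                  for q in init})
    return f

f = forward("RCCSS")
print(f[0])                    # {'H': 0.04, 'M': 0.06, 'L': 0.24}
print(f[1])                    # H: 0.0144, M: 0.0576, L: 0.0372 (as above)
print(sum(f[-1].values()))     # P(RCCSS): sum over the final column
```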

posterior decoding: probability that the i-th state equals q, given the whole observation:

  P(state_i = q | X) = f_q(i) · b_q(i) / P(X)

where f_q(i) is the forward probability and b_q(i) = P(x_{i+1} … x_L | state_i = q) is the backward probability, computed by an analogous recursion from the end of the sequence.
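A posterior-decoding sketch for the same weather model, computing b_q(i) backwards and then the per-position posteriors (model and forward() repeated so the snippet runs on its own; helper names are ours):

```python
init = {"H": 0.4, "M": 0.2, "L": 0.4}
trans = {"H": {"H": 0.6, "M": 0.3, "L": 0.1},
         "M": {"H": 0.4, "M": 0.2, "L": 0.4},
         "L": {"H": 0.1, "M": 0.5, "L": 0.4}}
emit = {"H": {"R": 0.1, "C": 0.2, "S": 0.7},
        "M": {"R": 0.3, "C": 0.4, "S": 0.3},
        "L": {"R": 0.6, "C": 0.3, "S": 0.1}}

def forward(obs):
    f = [{q: init[q] * emit[q][obs[0]] for q in init}]
    for x in obs[1:]:
        prev = f[-1]
        f.append({q: emit[q][x] * sum(prev[p] * trans[p][q] for p in init)
                  for q in init})
    return f

def backward(obs):
    b = [{q: 1.0 for q in init}]                  # b_q(L) = 1
    for x in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {p: sum(trans[p][q] * emit[q][x] * nxt[q] for q in init)
                     for p in init})
    return b

def posterior(obs):
    f, b = forward(obs), backward(obs)
    px = sum(f[-1].values())                      # P(X)
    return [{q: f[i][q] * b[i][q] / px for q in init} for i in range(len(obs))]

for i, row in enumerate(posterior("RCCSS"), start=1):
    print(i, max(row, key=row.get), {q: round(p, 3) for q, p in row.items()})
```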

HMM main questions, for an observation X ∈ Σ*:
- probability of this observation? forward algorithm, done
- most probable state sequence? again we cannot try all possibilities: Viterbi
- how to find the model? training

Viterbi algorithm (1): most probable state sequence for an observation, again by dynamic programming: v_q(i) = probability of the most probable path for x_1 … x_i that ends in state q:

  v_q(i) = e_q(x_i) · max_p v_p(i-1) · t_pq

(one DP row per state A, B, C; remember which p achieved the maximum)

Viterbi algorithm (2): traceback for the most probable state sequence: start with the maximum entry in the final column (position L) and follow the stored pointers back to the start.
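Both steps together, as a sketch for the weather model (per-cell pointer bookkeeping is one common way to implement the traceback; model dictionaries repeated, function name ours):

```python
init = {"H": 0.4, "M": 0.2, "L": 0.4}
trans = {"H": {"H": 0.6, "M": 0.3, "L": 0.1},
         "M": {"H": 0.4, "M": 0.2, "L": 0.4},
         "L": {"H": 0.1, "M": 0.5, "L": 0.4}}
emit = {"H": {"R": 0.1, "C": 0.2, "S": 0.7},
        "M": {"R": 0.3, "C": 0.4, "S": 0.3},
        "L": {"R": 0.6, "C": 0.3, "S": 0.1}}

def viterbi(obs):
    v = [{q: init[q] * emit[q][obs[0]] for q in init}]
    pointers = []
    for x in obs[1:]:
        col, back = {}, {}
        for q in init:
            best = max(init, key=lambda p: v[-1][p] * trans[p][q])
            back[q] = best                          # remember the maximizing p
            col[q] = emit[q][x] * v[-1][best] * trans[best][q]
        v.append(col)
        pointers.append(back)
    state = max(init, key=lambda q: v[-1][q])       # final maximum
    path = [state]
    for back in reversed(pointers):                 # follow pointers backwards
        state = back[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("RCCSS"))                             # most probable single path
```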

CpG islands continued: an HMM with 8 states A+, C+, G+, T+ and A-, C-, G-, T-; each state emits its own letter (a unique observation per state). Within the + and - groups the transitions follow the island and non-island charts above; the model stays in the + group with probability 0.999 (switch to -: 0.001) and in the - group with probability 0.99999 (switch to +: 0.00001). These switch probabilities are estimates.
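A sketch of the combined transition function. One detail the slide leaves open is how the small switch probability is split over the four target states; scaling the other chart's row, as below, is one assumption that keeps every row summing to 1 (up to table rounding):

```python
plus = {"A": {"A": .180, "C": .274, "G": .426, "T": .120},
        "C": {"A": .171, "C": .368, "G": .274, "T": .188},
        "G": {"A": .161, "C": .339, "G": .375, "T": .125},
        "T": {"A": .079, "C": .355, "G": .384, "T": .182}}
minus = {"A": {"A": .300, "C": .205, "G": .285, "T": .210},
         "C": {"A": .322, "C": .298, "G": .078, "T": .302},
         "G": {"A": .248, "C": .246, "G": .298, "T": .208},
         "T": {"A": .177, "C": .239, "G": .292, "T": .292}}
STAY = {"+": 0.999, "-": 0.99999}

states = [b + s for s in "+-" for b in "ACGT"]     # A+ .. T+, A- .. T-

def t(p, q):
    (bp, sp), (bq, sq) = p, q
    chart = plus if sq == "+" else minus           # next letter: target chart
    factor = STAY[sp] if sp == sq else 1 - STAY[sp]
    return factor * chart[bp][bq]

def e(q, x):
    return 1.0 if x == q[0] else 0.0               # each state emits its letter

for p in states:                                   # rows sum to 1 (tables rounded)
    assert abs(sum(t(p, q) for q in states) - 1.0) < 2e-3
print(t("C+", "G+"), t("C+", "G-"))
```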

dishonest casino dealer: a casino occasionally switches between a fair die (F) and a loaded die (L); we observe only the rolls. [model diagrams]

dishonest casino dealer: decoding a roll sequence (F = fair, L = loaded):

  Observation       366163666466232534413661661163252562462255265252266435353336
  Viterbi           LLLLLLLLLLLLFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  Forward           FFLLLLLLLLLLLLFFFFFFFFLFLLLFLLFFFFFFFFFFFFFFFFFFFFLFFFFFFFFF
  Posterior (total) LLLLLLLLLLLLFFFFFFFFFLLLLLLFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

training: given training sequences X^(i), optimize the score of the model. Parameter estimation when the state sequences are known: count the transitions p → q and the emissions of symbol b in state q, then divide by the totals (all transitions out of p, all emissions in q). Apply a Laplace correction (add pseudocounts) so that unseen events do not get probability 0.
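A sketch of this counting scheme (the toy observation/path pair below is illustrative, in the style of the casino example; it is not from the slides):

```python
from collections import defaultdict

def estimate(pairs, states, alphabet, pseudo=1.0):
    """pairs: (observation, state path) of equal lengths; pseudo = Laplace correction."""
    A = defaultdict(lambda: defaultdict(lambda: pseudo))   # transitions p -> q
    E = defaultdict(lambda: defaultdict(lambda: pseudo))   # emissions of b in q
    for obs, path in pairs:
        for x, q in zip(obs, path):
            E[q][x] += 1
        for p, q in zip(path, path[1:]):
            A[p][q] += 1
    trans = {p: {q: A[p][q] / sum(A[p][r] for r in states) for q in states}
             for p in states}
    emit = {q: {b: E[q][b] / sum(E[q][c] for c in alphabet) for b in alphabet}
            for q in states}
    return trans, emit

trans, emit = estimate([("366163666466", "LLLLLLFFFFFF")],
                       states="FL", alphabet="123456")
print(trans["L"]["F"], emit["L"]["6"])
```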

Baum-Welch: when the state sequences are unknown, train iteratively: based on the current model, compute expected numbers of transitions and emissions, build a new (better) model from them, and iterate. With forward and backward values for each training sequence X:

  A_pq   = Σ_X 1/P(X) · Σ_i f_p(i) · t_pq · e_q(x_{i+1}) · b_q(i+1)
           (sum over all training sequences X and all positions i)
  E_q(b) = Σ_X 1/P(X) · Σ_{i : x_i = b} f_q(i) · b_q(i)
           (sum over all training sequences X and all positions i with x_i = b)
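A compact single-sequence Baum-Welch sketch (no scaling or log-space, so only suitable for short observations; initial probabilities kept fixed for simplicity; the casino-style toy model at the bottom is illustrative, not the slides' parameters):

```python
def forward(obs, init, trans, emit):
    f = [{q: init[q] * emit[q][obs[0]] for q in init}]
    for x in obs[1:]:
        prev = f[-1]
        f.append({q: emit[q][x] * sum(prev[p] * trans[p][q] for p in init)
                  for q in init})
    return f

def backward(obs, init, trans, emit):
    b = [{q: 1.0 for q in init}]
    for x in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {p: sum(trans[p][q] * emit[q][x] * nxt[q] for q in init)
                     for p in init})
    return b

def baum_welch_step(obs, init, trans, emit):
    f = forward(obs, init, trans, emit)
    b = backward(obs, init, trans, emit)
    px = sum(f[-1].values())
    # expected transitions: A_pq = sum_i f_p(i) t_pq e_q(x_{i+1}) b_q(i+1) / P(X)
    A = {p: {q: sum(f[i][p] * trans[p][q] * emit[q][obs[i + 1]] * b[i + 1][q]
                    for i in range(len(obs) - 1)) / px
             for q in trans} for p in trans}
    # expected emissions: E_q(s) = sum over i with x_i = s of f_q(i) b_q(i) / P(X)
    symbols = sorted(set(obs))
    E = {q: {s: sum(f[i][q] * b[i][q] for i in range(len(obs)) if obs[i] == s) / px
             for s in symbols} for q in trans}
    new_trans = {p: {q: A[p][q] / sum(A[p].values()) for q in trans} for p in trans}
    new_emit = {q: {s: E[q][s] / sum(E[q].values()) for s in symbols} for q in trans}
    return new_trans, new_emit

init = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.2, "L": 0.8}}
emit = {q: {s: 1 / 6 for s in "123456"} for q in "FL"}
emit["L"] = {s: 0.5 if s == "6" else 0.1 for s in "123456"}
obs = "366163666466232534413661661163"
for _ in range(5):                       # build new (better) model & iterate
    trans, emit = baum_welch_step(obs, init, trans, emit)
print(trans["F"]["L"], emit["L"]["6"])
```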

Baum-Welch training, concerns:
- guaranteed to converge, but in the target score, not in the parameters Θ (unstable solutions!)
- converges to a local maximum
tips: repeat for several initial Θ; start with a meaningful Θ.
Viterbi training (an alternative): determine the optimal paths under the current model, then re-estimate as if the paths were known; iterate. Note: the score may decrease!