Lecture #5. Dependencies along the genome


Markov Chains

Lecture #5. Background readings: Durbin et al., Section 3.1; Polanski & Kimmel, Section 2.8. Prepared by Shlomo Moran, based on Danny Geiger's and Nir Friedman's lecture notes.

Dependencies along the genome

In previous classes we assumed that every letter in a sequence is sampled randomly from some distribution q(·) over the alphabet {A, C, T, G}. This is too restrictive for real genomes:
1. There are special subsequences in the genome, like TATA within the regulatory area upstream of a gene.
2. The pattern CG is less common than expected under independent random sampling.

We model such dependencies by Markov chains and hidden Markov models, which we define next.

Finite Markov Chain

An integer-time stochastic process, consisting of a domain D of m > 1 states {s_1, …, s_m} and:
1. An m-dimensional initial distribution vector (p(s_1), …, p(s_m)).
2. An m×m transition probability matrix M = (m_{s_i s_j}).

For example, D can be the letters {A, C, T, G}, p(A) the probability that A is the 1st letter in a sequence, and m_AG the probability that G follows A in a sequence.

Markov Chain (cont.)

X_1 → X_2 → … → X_{n-1} → X_n

For each integer n, a Markov chain assigns a probability to every sequence (x_1, …, x_n) in D^n (i.e., x_i ∈ D) as follows:

p(x_1, x_2, …, x_n) = p(x_1) · ∏_{i=2}^{n} p(x_i | x_1, …, x_{i-1}) = p(x_1) · ∏_{i=2}^{n} m_{x_{i-1} x_i}
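A minimal sketch of this definition in Python (the alphabet ordering, initial distribution, and transition values here are illustrative assumptions, not the lecture's numbers):

```python
import numpy as np

STATES = ["A", "C", "T", "G"]
IDX = {s: i for i, s in enumerate(STATES)}

p0 = np.array([0.25, 0.25, 0.25, 0.25])  # initial distribution (p(s_1),...,p(s_m))
M = np.array([                            # transition matrix (m_st); rows sum to 1
    [0.30, 0.20, 0.30, 0.20],  # from A
    [0.25, 0.30, 0.25, 0.20],  # from C
    [0.20, 0.25, 0.30, 0.25],  # from T
    [0.25, 0.25, 0.25, 0.25],  # from G
])

def sequence_probability(seq: str) -> float:
    """p(x_1..x_n) = p(x_1) * prod_{i>=2} m_{x_{i-1} x_i}."""
    prob = p0[IDX[seq[0]]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= M[IDX[prev], IDX[cur]]
    return prob

print(sequence_probability("ACGT"))
```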

Markov Chain (cont.)

Similarly, each X_i is a probability distribution over D, which is determined by the initial distribution (p_1, …, p_m) and the transition matrix M. There is a rich theory studying the properties of such Markov sequences (X_1, …, X_i, …). A bit of this theory is presented next.

Matrix Representation

The transition probability matrix M = (m_st) of a chain over the states {A, B, C, D}:

        A     B     C     D
  A   0.95  0.05  0     0
  B   0.2   0.5   0     0.3
  C   0     0.2   0     0.8
  D   0     0     1     0

M is a stochastic matrix: every row sums to one, ∑_t m_st = 1.

The initial distribution vector (u_1, …, u_m) defines the distribution of X_1: p(X_1 = s_i) = u_i. After one move, the distribution changes to X_2 = X_1 M.

Matrix Representation (cont.)

Example: if X_1 = (0, 1, 0, 0), then X_2 = (0.2, 0.5, 0, 0.3). And if X_1 = (0, 0, 0.5, 0.5), then X_2 = (0, 0.1, 0.5, 0.4).

The i-th distribution is X_i = X_1 M^{i-1}.

Basic question: what can we tell about the distributions X_i as i grows?

Representation of a Markov Chain as a Digraph

[Figure: the digraph of the chain above, with edges A→A (0.95), A→B (0.05), B→A (0.2), B→B (0.5), B→D (0.3), C→B (0.2), C→D (0.8), D→C (1).]

Each directed edge A→B is associated with the positive transition probability from A to B.
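The slide's example can be checked directly with numpy; this short sketch propagates distributions with X_2 = X_1 M and X_i = X_1 M^(i-1), using the matrix reconstructed above:

```python
import numpy as np

M = np.array([
    [0.95, 0.05, 0.0, 0.0],  # A
    [0.20, 0.50, 0.0, 0.3],  # B
    [0.00, 0.20, 0.0, 0.8],  # C
    [0.00, 0.00, 1.0, 0.0],  # D
])

x1 = np.array([0.0, 1.0, 0.0, 0.0])  # all mass on B
print(x1 @ M)                        # -> [0.2 0.5 0.  0.3]

x1 = np.array([0.0, 0.0, 0.5, 0.5])
print(x1 @ M)                        # -> [0.  0.1 0.5 0.4]

# The i-th distribution: X_i = X_1 M^(i-1)
print(x1 @ np.linalg.matrix_power(M, 9))  # X_10
```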

Properties of Markov Chain States

States of Markov chains are classified by the digraph representation (omitting the actual probability values).

[Figure: a digraph in which A, C and D are recurrent and B is transient.]

A, C and D are recurrent states: they lie in strongly connected components which are sinks in the graph. B is not recurrent; it is a transient state.

Alternative definition: a state s is recurrent if it can be reached from any state reachable from s; otherwise it is transient.

Another Example of Recurrent and Transient States

[Figure: a digraph in which A and B are transient and C and D are recurrent.]

A and B are transient states; C and D are recurrent states. Once the process moves from B to D, it will never come back.
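A rough sketch of the alternative definition as code (the helper classify_states and the example matrix are my own illustrative additions; the matrix mimics the structure of the second example, where leaving {A, B} for {C, D} is irreversible):

```python
import numpy as np

def classify_states(M: np.ndarray, names: str) -> dict:
    """A state s is recurrent iff every state reachable from s can reach s back."""
    n = len(M)
    reach = (M > 0) | np.eye(n, dtype=bool)
    for k in range(n):                        # Warshall's transitive closure
        reach |= reach[:, [k]] & reach[[k], :]
    out = {}
    for s in range(n):
        reachable = np.flatnonzero(reach[s])  # states reachable from s
        recurrent = all(reach[t, s] for t in reachable)
        out[names[s]] = "recurrent" if recurrent else "transient"
    return out

M = np.array([[0.5, 0.5, 0.0, 0.0],   # A -> A or B
              [0.5, 0.0, 0.0, 0.5],   # B -> A or D
              [0.0, 0.0, 0.0, 1.0],   # C -> D
              [0.0, 0.0, 1.0, 0.0]])  # D -> C
print(classify_states(M, "ABCD"))
# {'A': 'transient', 'B': 'transient', 'C': 'recurrent', 'D': 'recurrent'}
```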

Irreducible Markov Chains

A Markov chain is irreducible if the corresponding graph is strongly connected (and thus all its states are recurrent).

Periodic States

[Figure: an example chain in which the period of A is 2.]

A state s has period k if k is the GCD of the lengths of all the cycles that pass through s (in the graph shown, the period of A is 2).

Exercise: all the states in the same strongly connected component have the same period.

A Markov chain is periodic if all the states in it have a period k > 1. It is aperiodic otherwise.
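The period of a state can also be estimated numerically: it is the GCD of all cycle lengths through the state. Scanning lengths up to a fixed bound, as in this sketch, is a heuristic of mine that suffices for small toy chains:

```python
import numpy as np
from math import gcd

def period(M: np.ndarray, s: int, max_n: int = 64) -> int:
    """gcd of the lengths n <= max_n of cycles through state s (0 if none found)."""
    A = (M > 0).astype(int)               # adjacency matrix of the digraph
    paths = np.eye(len(M), dtype=int)
    g = 0
    for n in range(1, max_n + 1):
        paths = np.minimum(paths @ A, 1)  # 0/1: is there a length-n path?
        if paths[s, s]:
            g = gcd(g, n)
    return g

# A deterministic 2-cycle: each state has period 2.
M2 = np.array([[0.0, 1.0],
               [1.0, 0.0]])
print(period(M2, 0))  # -> 2
```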

Ergodic Markov Chains

A Markov chain is ergodic if:
1. The corresponding graph is strongly connected.
2. It is not periodic.

Ergodic Markov chains are important since they guarantee that the corresponding Markovian process converges to a unique distribution, in which each state has a strictly positive probability.

Stationary Distributions for Markov Chains

Let M be a Markov chain of m states, and let V = (v_1, …, v_m) be a probability distribution over the m states.

V = (v_1, …, v_m) is a stationary distribution for M if VM = V (i.e., one step of the process does not change the distribution).

V is a stationary distribution ⇔ V is a non-negative left (row) eigenvector of M with eigenvalue 1.

Stationary Distributions for a Markov Chain M

Exercise: a stochastic matrix always has a real left eigenvector with eigenvalue 1. (Hint: show that a stochastic matrix has a right eigenvector with eigenvalue 1, and note that the left eigenvalues of a matrix are the same as its right eigenvalues.) It can be shown that this eigenvector V can be chosen non-negative. Hence each Markov chain has a stationary distribution.

Good Markov Chains

A Markov chain is good if the distributions X_i, as i → ∞:
(1) converge to a unique distribution, independent of the initial distribution;
(2) in that unique distribution, each state has a positive probability.

The Fundamental Theorem of Finite Markov Chains: a Markov chain is good ⇔ the corresponding graph is ergodic.

We will prove the ⇒ part, by showing that non-ergodic Markov chains are not good.
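A minimal sketch of finding a stationary distribution numerically: take a left eigenvector of M for eigenvalue 1 (i.e., a right eigenvector of M^T) and normalize it to sum to 1 (the 2-state matrix is an illustrative assumption):

```python
import numpy as np

def stationary_distribution(M: np.ndarray) -> np.ndarray:
    """Return V with V M = V, as the eigenvalue-1 left eigenvector of M."""
    eigvals, eigvecs = np.linalg.eig(M.T)
    k = np.argmin(np.abs(eigvals - 1.0))  # column closest to eigenvalue 1
    v = np.real(eigvecs[:, k])
    return v / v.sum()                    # normalize to a probability vector

M = np.array([[0.9, 0.1],
              [0.4, 0.6]])
V = stationary_distribution(M)
print(V)       # -> [0.8 0.2]
print(V @ M)   # equals V up to floating-point error
```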

Examples of Bad Markov Chains

A Markov chain is bad if either:
1. It does not converge to a unique distribution (independent of the initial distribution), or
2. It does converge to a unique distribution, but some states in that distribution have zero probability.

Bad Case 1: Mutual Unreachability

[Figure: a chain in which A is absorbing and {C, D} form a closed component unreachable from A.]

Consider two initial distributions:
a) p(X_1 = A) = 1 (i.e., p(X_1 = x) = 0 for x ≠ A);
b) p(X_1 = C) = 1.

In case a), the sequence will stay at A forever. In case b), it will stay in {C, D} forever.

Fact 1: if G has two states which are unreachable from each other, then {X_i} cannot converge to a distribution which is independent of the initial distribution.

Bad Case 2: Transient States

[Figure: a chain with transient states A, B and recurrent states C, D.]

Once the process moves from B to D, it will never come back.

Fact 2: for each initial distribution, with probability 1 a transient state will be visited only a finite number of times.

Proof: let A be a transient state, and let X be the set of states from which A is unreachable. It is enough to show that, starting from any state, with probability 1 a state in X is reached after a finite number of steps. (Exercise: complete the proof.)

Corollary: a good Markov chain is irreducible.

Bad Case 3: Periodic Markov Chains

[Figure: a chain over the states {A, B, C, D, E} with period 2.]

Recall: a Markov chain is periodic if all the states in it have a period k > 1. The above chain has period 2.

In the above chain, consider the initial distribution p(B) = 1. Then the states {B, C} are visited (with positive probability) only in odd steps, and the states {A, D, E} are visited only in even steps.
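The oscillation is easy to see numerically; in this tiny sketch (an illustrative 2-state chain of period 2) the distributions X_i alternate forever instead of converging:

```python
import numpy as np

M = np.array([[0.0, 1.0],   # a deterministic 2-cycle: period 2
              [1.0, 0.0]])
x = np.array([1.0, 0.0])    # all mass on the first state
for i in range(4):
    x = x @ M
    print(i + 1, x)         # alternates between [0. 1.] and [1. 0.]
```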

Bad Case 3: Periodic States (cont.)

Fact 3: in a periodic Markov chain (of period k > 1) there are initial distributions under which the states are visited in a periodic manner. Under such initial distributions X_i does not converge as i → ∞.

Corollary: a good Markov chain is not periodic.

The Fundamental Theorem of Finite Markov Chains

We have proved that non-ergodic Markov chains are not good. A proof of the opposite direction, stated below, is based on Perron-Frobenius theory of non-negative matrices and is beyond the scope of this course:

If a Markov chain is ergodic, then:
1. It has a unique stationary distribution vector V > 0, which is a left (row) eigenvector of the transition matrix.
2. For any initial distribution, the distributions X_i converge to V as i → ∞.
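The theorem's convergence claim can be illustrated by power iteration; in this sketch (illustrative ergodic matrix), two different initial distributions reach the same stationary vector:

```python
import numpy as np

M = np.array([[0.9, 0.1],
              [0.4, 0.6]])

for x in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    for _ in range(50):   # X_{i+1} = X_i M
        x = x @ M
    print(x)              # both print ~[0.8 0.2], the stationary V
```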

Use of Markov Chains in Genome Search: Modeling CpG Islands

In human genomes the pair CG often transforms to (methyl-C)G, which often transforms to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as in the start regions of many genes. These areas are called CpG islands (C-phosphate-G).

Example: CpG Island (cont.)

We consider two questions:

Question 1: given a short stretch of genomic data, does it come from a CpG island?

Question 2: given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?

We solve the first question by modeling strings with and without CpG islands as Markov chains over the same states {A, C, G, T} but with different transition probabilities:

Example: CpG Island (cont.)

The '+' model: use the transition matrix M+ = (m+_st), where m+_st = the probability that t follows s in a CpG island.

The '-' model: use the transition matrix M- = (m-_st), where m-_st = the probability that t follows s in a non-CpG island.

With this model, to solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the '+' model or from the '-' model. This is done using the definition of a Markov chain, with the parameters estimated from known data, and the log odds-ratio test.

Question 1: Using Two Markov Chains

M+ (for CpG islands): we need to specify p+(x_i | x_{i-1}), where '+' stands for CpG island. From Durbin et al. we have:

  p+      A      C      G      T     (columns: x_i; rows: x_{i-1})
  A     0.180  0.274  0.426  0.120
  C     0.171  0.368  0.274  0.188
  G     0.161  0.339  0.375  0.125
  T     0.079  0.355  0.384  0.182

(Recall: rows must add up to one; columns need not.)

M- (for non-CpG islands): and for p-(x_i | x_{i-1}) (where '-' stands for non-CpG island) we have:

  p-      A      C      G      T
  A     0.300  0.205  0.285  0.210
  C     0.322  0.298  0.078  0.302
  G     0.248  0.246  0.298  0.208
  T     0.177  0.239  0.292  0.292

Discriminating Between the Two Models

Given a string x = (x_1 … x_L), we compute the ratio

RATIO = p(x | + model) / p(x | - model) = ∏_{i=1}^{L} p+(x_i | x_{i-1}) / ∏_{i=1}^{L} p-(x_i | x_{i-1})

If RATIO > 1, a CpG island is more likely. In practice, the log of this ratio is computed.

Note: p+(x_1 | x_0) is defined for convenience as p+(x_1), and p-(x_1 | x_0) is defined for convenience as p-(x_1).

Log Odds-Ratio Test

Taking the logarithm yields

log Q = log [ p(x_1 … x_L | +) / p(x_1 … x_L | -) ] = ∑_{i=1}^{L} log [ p+(x_i | x_{i-1}) / p-(x_i | x_{i-1}) ]

If log Q > 0, then '+' is more likely (CpG island). If log Q < 0, then '-' is more likely (non-CpG island).
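A sketch of the test using the Durbin et al. tables above. One assumption of mine: the first letter is scored with uniform p(x_1) = 0.25 under both models, so its contribution cancels and only transitions matter:

```python
import math

PLUS = {  # p+(x_i | x_{i-1})
    "A": {"A": .180, "C": .274, "G": .426, "T": .120},
    "C": {"A": .171, "C": .368, "G": .274, "T": .188},
    "G": {"A": .161, "C": .339, "G": .375, "T": .125},
    "T": {"A": .079, "C": .355, "G": .384, "T": .182},
}
MINUS = {  # p-(x_i | x_{i-1})
    "A": {"A": .300, "C": .205, "G": .285, "T": .210},
    "C": {"A": .322, "C": .298, "G": .078, "T": .302},
    "G": {"A": .248, "C": .246, "G": .298, "T": .208},
    "T": {"A": .177, "C": .239, "G": .292, "T": .292},
}

def log_odds(x: str) -> float:
    """log Q = sum_i log( p+(x_i|x_{i-1}) / p-(x_i|x_{i-1}) )."""
    return sum(math.log(PLUS[p][c] / MINUS[p][c]) for p, c in zip(x, x[1:]))

print(log_odds("CGCGCG"))  # positive -> CpG island more likely
print(log_odds("ATATAT"))  # negative -> non-CpG more likely
```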

Where Do the Parameters (Transition Probabilities) Come From?

Learning from complete data, namely when the label is given and every x_i is measured:

Source: a collection of sequences from CpG islands, and a collection of sequences from non-CpG islands.
Input: tuples of the form (x_1, …, x_L, h), where h is + or -.
Output: maximum likelihood (ML) parameters.

Count all pairs (X_i = a, X_{i-1} = b) with label + and with label -; say the counts are N_{ba,+} and N_{ba,-}.

Maximum Likelihood Estimate (MLE) of the Parameters (Using Labeled Data)

The needed parameters are: p+(x_1), p+(x_i | x_{i-1}), p-(x_1), p-(x_i | x_{i-1}).

The ML estimates are given by:

p+(X_1 = a) = N_{a,+} / ∑_x N_{x,+}

where N_{x,+} is the number of times the letter x appears in CpG islands in the dataset, and

p+(X_i = a | X_{i-1} = b) = N_{ba,+} / ∑_x N_{bx,+}

where N_{bx,+} is the number of times the letter x appears after the letter b in CpG islands in the dataset. (The '-' parameters are estimated analogously from the non-CpG sequences.)

Using MLE is justified when we have a large sample. The numbers appearing in our tables (taken from Durbin et al., p. 50) are based on 60,000 nucleotides.
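A minimal sketch of this counting scheme (the training strings are made-up placeholders; a real run would use labeled CpG and non-CpG sequences):

```python
from collections import Counter

def mle_transitions(sequences) -> dict:
    """p(a|b) = N_ba / sum_x N_bx, estimated from labeled training sequences."""
    pair_counts = Counter()                  # N_ba: letter b followed by letter a
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))
    probs = {}
    for b in "ACGT":
        row_total = sum(pair_counts[(b, a)] for a in "ACGT")
        if row_total:
            probs[b] = {a: pair_counts[(b, a)] / row_total for a in "ACGT"}
    return probs

cpg_training = ["CGCGGC", "GCGCGC"]          # hypothetical '+' labeled data
print(mle_transitions(cpg_training))
```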

Question 2: Finding CpG Islands

Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, all interconnected (hence it is ergodic):

A+  C+  G+  T+
A-  C-  G-  T-

The problem is that we don't know the sequence of states which is traversed, only the sequence of letters. Therefore we use a Hidden Markov Model.

Hidden Markov Model

S_1 → S_2 → … → S_{L-1} → S_L
 ↓     ↓          ↓         ↓
x_1   x_2      x_{L-1}     x_L

A Markov chain (S_1, …, S_L):

p(s_1, …, s_L) = ∏_{i=1}^{L} p(s_i | s_{i-1})

and for each state s and symbol x we have the emission probability p(x_i = x | S_i = s).

Application in communication: the message sent is (s_1, …, s_m) but we receive (x_1, …, x_m). Compute the most likely message sent.

Application in speech recognition: the word said is (s_1, …, s_m) but we recorded (x_1, …, x_m). Compute the most likely word said.

Hidden Markov Model (cont.)

Notation:
Markov chain transition probabilities: p(S_{i+1} = t | S_i = s) = m_st.
Emission probabilities: p(x_i = b | S_i = s) = e_s(b).

For Markov chains we know:

p(s) = p(s_1, …, s_L) = ∏_{i=1}^{L} p(s_i | s_{i-1})

What is p(s, x) = p(s_1, …, s_L; x_1, …, x_L)?

p(x_i = b | S_i = s) = e_s(b) means that the probability of x_i depends only on the value of s_i. Formally, this is equivalent to the conditional independence assumption:

p(x_i = x_i | x_1, …, x_{i-1}, x_{i+1}, …, x_L, s_1, …, s_i, …, s_L) = e_{s_i}(x_i)

Thus:

p(s, x) = p(s_1, …, s_L; x_1, …, x_L) = ∏_{i=1}^{L} p(s_i | s_{i-1}) · e_{s_i}(x_i)

(with p(s_1 | s_0) taken to be the initial probability p(s_1)).
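A short sketch of computing p(s, x) for a toy two-state HMM (the '+'/'-' states echo the CpG setting, but all probabilities here are illustrative assumptions, not the 8-state model's parameters):

```python
import math

INIT = {"+": 0.5, "-": 0.5}                         # p(s_1)
TRANS = {"+": {"+": 0.9, "-": 0.1},                 # p(s_{i+1} = t | s_i = s)
         "-": {"+": 0.1, "-": 0.9}}
EMIT = {"+": {"A": .1, "C": .4, "G": .4, "T": .1},  # e_s(b)
        "-": {"A": .3, "C": .2, "G": .2, "T": .3}}

def joint_log_prob(states: str, symbols: str) -> float:
    """log p(s, x) = log p(s_1) + sum_i [log p(s_i|s_{i-1}) + log e_{s_i}(x_i)]."""
    lp = math.log(INIT[states[0]]) + math.log(EMIT[states[0]][symbols[0]])
    for i in range(1, len(states)):
        lp += math.log(TRANS[states[i - 1]][states[i]])
        lp += math.log(EMIT[states[i]][symbols[i]])
    return lp

print(joint_log_prob("++--", "CGAT"))
```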