Models of Molecular Evolution

Models of Molecular Evolution
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin Madison
September 15, 2007

Genetics 875 (Fall 2009), Molecular Evolution, September 15, 2009

Features of Molecular Evolution

1. Possible multiple changes on edges
2. Transition/transversion bias
3. Non-uniform base composition
4. Rate variation across sites
5. Dependence among sites
6. Codon position
7. Protein structure

A Famous Quote About Models

"Essentially, all models are wrong, but some are useful." (George Box)

Probability Models

- A probabilistic framework provides a platform for formal statistical inference.
- Examining goodness of fit can lead to model refinement and a better understanding of the actual biological process.
- Model refinement is a continuing area of research.
- Most common models of molecular evolution treat sites as independent, so these models need only describe the substitutions among the four bases at a single site over time.

The Markov Property

Use the notation X(t) to represent the base at time t.

Formal statement:

P{X(s + t) = j | X(s) = i, X(u) = x(u) for u < s} = P{X(s + t) = j | X(s) = i}

Informal understanding: given the present, the future is independent of the past.

If this probability does not depend on the time s, the Markov process is called homogeneous.

Rate Matrix

- Off-diagonal entries q_ij (i ≠ j) are the positive rates of transition from i to j.
- Each diagonal entry is negative: minus the total rate of leaving that state.
- Row sums are zero.

Example (states ordered A, C, G, T):

Q = {q_ij} =
[ -1.1   0.3   0.6   0.2 ]
[  0.2  -1.1   0.3   0.6 ]
[  0.4   0.3  -0.9   0.2 ]
[  0.2   0.9   0.3  -1.4 ]
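The three structural properties above are easy to verify numerically. A minimal sketch (the A, C, G, T state order is an assumption consistent with the later path example):

```python
# Example rate matrix Q from the slides; assumed state order A, C, G, T.
Q = [
    [-1.1,  0.3,  0.6,  0.2],
    [ 0.2, -1.1,  0.3,  0.6],
    [ 0.4,  0.3, -0.9,  0.2],
    [ 0.2,  0.9,  0.3, -1.4],
]

for i, row in enumerate(Q):
    # Off-diagonal rates must be positive, the diagonal negative.
    assert all(rate > 0 for j, rate in enumerate(row) if j != i)
    assert row[i] < 0
    # Each row must sum to zero (the diagonal is minus the total exit rate).
    assert abs(sum(row)) < 1e-12

print("Q is a valid rate matrix")
```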

Alarm Clock Description

If the current state is i, the time to the next event is exponentially distributed with rate q_i, defined to be −q_ii (the total rate of leaving state i).

Given that a transition occurs from state i, the probability that the transition is to state j is proportional to q_ij, namely q_ij / Σ_{k≠i} q_ik = q_ij / q_i.
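The alarm-clock description translates directly into a simulation: hold an Exponential(q_i) time in state i, then jump to j with probability q_ij / q_i. A sketch for the example Q (the function name, seed, and A, C, G, T state order are illustrative choices, not from the slides):

```python
import random

# Simulate the "alarm clock" mechanism for the example rate matrix Q
# (assumed state order A, C, G, T).
Q = [
    [-1.1,  0.3,  0.6,  0.2],
    [ 0.2, -1.1,  0.3,  0.6],
    [ 0.4,  0.3, -0.9,  0.2],
    [ 0.2,  0.9,  0.3, -1.4],
]
STATES = "ACGT"

def simulate_path(start, t_end, rng):
    """Return the list of (time, state) pairs visited up to time t_end."""
    t, i = 0.0, start
    path = [(0.0, STATES[i])]
    while True:
        q_i = -Q[i][i]                 # total rate of leaving state i
        t += rng.expovariate(q_i)      # exponential holding time in state i
        if t >= t_end:
            return path
        # Jump to state j with probability q_ij / q_i.
        probs = [Q[i][j] / q_i if j != i else 0.0 for j in range(4)]
        i = rng.choices(range(4), weights=probs)[0]
        path.append((t, STATES[i]))

print(simulate_path(0, 1.0, random.Random(1)))
```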

Path Probability Density Calculation

Example: begin at A, change to G at time 0.3, change to C at time 0.8, and then no more changes before time t = 1.

P{path} = P{begin at A} × (1.1 e^(−(1.1)(0.3)) × 0.6/1.1) × (0.9 e^(−(0.9)(0.5)) × 0.3/0.9) × e^(−(1.1)(0.2))

Each waiting time contributes an exponential density, each jump contributes its conditional probability q_ij / q_i, and the final factor is the probability of no change during the last 0.2 time units.
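Multiplying out the factors gives a concrete number. A minimal check, taking P{begin at A} = 1 for illustration:

```python
from math import exp

# Density of the example path: A on [0, 0.3), G on [0.3, 0.8), C on [0.8, 1].
# Rates from the example Q: q_A = 1.1, q_AG = 0.6, q_G = 0.9, q_GC = 0.3,
# q_C = 1.1. P{begin at A} is taken to be 1 here for illustration.
density = (
    1.1 * exp(-1.1 * 0.3) * (0.6 / 1.1)    # wait 0.3 in A, then jump to G
    * 0.9 * exp(-0.9 * 0.5) * (0.3 / 0.9)  # wait 0.5 in G, then jump to C
    * exp(-1.1 * 0.2)                      # no change in C for the last 0.2
)
print(round(density, 5))  # → 0.06622
```

The telescoping is worth noting: the 1.1 and 0.9 in the exponential densities cancel with the jump-probability denominators, leaving 0.6 × 0.3 × e^(−1.0) ≈ 0.0662.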

Transition Probabilities

For a continuous-time Markov chain, the transition matrix P(t), whose ij element is the probability of being in state j at time t given that the process begins in state i at time 0, is

P(t) = e^(Qt)

A probability transition matrix has non-negative entries, and each row sums to one: each row is a probability distribution over the possible states of the Markov process.

Examples

P(0.1) =
[ 0.897  0.029  0.055  0.019 ]
[ 0.019  0.899  0.029  0.053 ]
[ 0.037  0.029  0.916  0.019 ]
[ 0.019  0.080  0.029  0.872 ]

P(0.5) =
[ 0.605  0.118  0.199  0.079 ]
[ 0.079  0.629  0.118  0.174 ]
[ 0.132  0.118  0.671  0.079 ]
[ 0.079  0.261  0.118  0.542 ]

P(1) =
[ 0.407  0.190  0.276  0.126 ]
[ 0.126  0.464  0.190  0.219 ]
[ 0.184  0.190  0.500  0.126 ]
[ 0.126  0.329  0.190  0.355 ]

P(10) =
[ 0.200  0.300  0.300  0.200 ]
[ 0.200  0.300  0.300  0.200 ]
[ 0.200  0.300  0.300  0.200 ]
[ 0.200  0.300  0.300  0.200 ]
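These matrices can be reproduced from Q by computing the matrix exponential. The sketch below uses a scaled Taylor series with repeated squaring, written out in plain Python for transparency; in practice one would call a library routine such as scipy.linalg.expm.

```python
# Reconstruct P(t) = e^{Qt} for the example Q (assumed state order A, C, G, T).
Q = [
    [-1.1,  0.3,  0.6,  0.2],
    [ 0.2, -1.1,  0.3,  0.6],
    [ 0.4,  0.3, -0.9,  0.2],
    [ 0.2,  0.9,  0.3, -1.4],
]

def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(Q, t, terms=20, squarings=10):
    """Approximate e^{Qt}: Taylor series for e^{Qt/2^s}, then square s times."""
    n = len(Q)
    A = [[q * t / 2 ** squarings for q in row] for row in Q]
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in P]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mult(term, A)]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        P = mat_mult(P, P)
    return P

print([round(x, 3) for x in expm(Q, 0.1)[0]])   # first row of P(0.1)
print([round(x, 3) for x in expm(Q, 10.0)[0]])  # close to the stationary distribution
```

Scaling before the Taylor series keeps the entries of Qt/2^s small, so a short series is accurate even for large t such as t = 10.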

The Stationary Distribution

Well-behaved continuous-time Markov chains have a stationary distribution, often designated π (not the constant close to 3.14 related to circles).

When the time t is large enough, the probability P_ij(t) will be close to π_j for each i. (See P(10) from earlier.)

The stationary distribution can be thought of as a long-run average: over a long time, the proportion of time the chain spends in state i converges to π_i.
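For the example Q, the identical rows of P(10) suggest π = (0.2, 0.3, 0.3, 0.2) in the assumed A, C, G, T order. A stationary distribution of a continuous-time chain satisfies the balance condition πQ = 0, which is easy to check:

```python
# Check that pi = (0.2, 0.3, 0.3, 0.2) satisfies pi Q = 0 for the example Q,
# the defining condition for a stationary distribution of a CTMC.
Q = [
    [-1.1,  0.3,  0.6,  0.2],
    [ 0.2, -1.1,  0.3,  0.6],
    [ 0.4,  0.3, -0.9,  0.2],
    [ 0.2,  0.9,  0.3, -1.4],
]
pi = [0.2, 0.3, 0.3, 0.2]

piQ = [sum(pi[i] * Q[i][j] for i in range(4)) for j in range(4)]
assert all(abs(x) < 1e-12 for x in piQ)   # pi Q = 0
assert abs(sum(pi) - 1.0) < 1e-12         # pi is a probability distribution
print("pi is stationary for Q")
```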

Parameterization

The matrix Q = {q_ij} is typically parameterized as q_ij = r_ij π_j / μ for i ≠ j, which guarantees that π is the stationary distribution when r_ij = r_ji.

Scaling

The expected number of substitutions per unit time is the average rate of substitution: a weighted average of the rates for each state, weighted by the stationary distribution.

μ = Σ_i π_i q_i

If the matrix Q is reparameterized so that all elements are divided by μ, then the unit of measurement becomes one substitution.
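The parameterization and scaling slides can be combined into one short construction: build Q from symmetric exchangeabilities r_ij and a target π, then divide by μ so time is measured in expected substitutions. The numeric values of r and π below are made up for illustration; they are not from the slides.

```python
# Build a rate matrix as q_ij = r_ij * pi_j (i != j), then rescale by the
# mean rate mu = sum_i pi_i * q_i. Values of r and pi are illustrative only.
pi = [0.1, 0.2, 0.3, 0.4]
r = [
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 2.0, 1.0, 0.0],
]  # symmetric: r[i][j] == r[j][i]

n = 4
Q = [[r[i][j] * pi[j] for j in range(n)] for i in range(n)]
for i in range(n):
    Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)

# Mean substitution rate: each state's exit rate q_i weighted by pi_i.
mu = sum(pi[i] * (-Q[i][i]) for i in range(n))
Q = [[q / mu for q in row] for row in Q]

# After rescaling, the average rate is exactly one substitution per unit time,
# and pi is stationary because r is symmetric.
assert abs(sum(pi[i] * (-Q[i][i]) for i in range(n)) - 1.0) < 1e-12
assert all(abs(sum(pi[i] * Q[i][j] for i in range(n))) < 1e-12 for j in range(n))
print("mean rate after rescaling: 1 substitution per unit time")
```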

Time-reversibility

The matrix Q is the rate matrix of a time-reversible Markov chain when π_i q_ij = π_j q_ji for all i and j. That is, the overall rate of substitutions from i to j equals the overall rate of substitutions from j to i for every pair of states i and j.
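The example Q from earlier in fact satisfies this detailed-balance condition with its stationary distribution π = (0.2, 0.3, 0.3, 0.2), which can be checked pair by pair:

```python
# Verify detailed balance pi_i q_ij = pi_j q_ji for the example Q and its
# stationary distribution pi (assumed state order A, C, G, T).
Q = [
    [-1.1,  0.3,  0.6,  0.2],
    [ 0.2, -1.1,  0.3,  0.6],
    [ 0.4,  0.3, -0.9,  0.2],
    [ 0.2,  0.9,  0.3, -1.4],
]
pi = [0.2, 0.3, 0.3, 0.2]

for i in range(4):
    for j in range(4):
        # Flow i -> j must equal flow j -> i.
        assert abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) < 1e-12
print("Q is time-reversible with respect to pi")
```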