What Is Conservation?

Similar documents
How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Lecture Notes: Markov chains

Lecture 4. Models of DNA and protein change. Likelihood methods

Reading for Lecture 13 Release v10

Letter to the Editor. Department of Biology, Arizona State University

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Phylogenetic Gibbs Recursive Sampler. Locating Transcription Factor Binding Sites

Quantifying sequence similarity

A Phylogenetic Gibbs Recursive Sampler for Locating Transcription Factor Binding Sites

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Maximum Likelihood in Phylogenetics

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Evolutionary Models. Evolutionary Models

Lab 9: Maximum Likelihood and Modeltest

EVOLUTIONARY DISTANCES

KaKs Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging

Dr. Amira A. AL-Hosary

Consensus methods. Strict consensus methods

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Nucleotide substitution models

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Evolutionary Analysis of Viral Genomes

Lie Markov models. Jeremy Sumner. School of Physical Sciences University of Tasmania, Australia

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Processes of Evolution

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Inferring Speciation Times under an Episodic Molecular Clock

Chapter 7: Models of discrete character evolution

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Lecture 4. Models of DNA and protein change. Likelihood methods

Estimating Divergence Dates from Molecular Sequences

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Constructing Evolutionary/Phylogenetic Trees

The Phylo- HMM approach to problems in comparative genomics, with examples.

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Minimum evolution using ordinary least-squares is less robust than neighbor-joining

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Phylogenetics. BIOL 7711 Computational Bioscience

Molecular evolution 2. Please sit in row K or forward

Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

EVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Mutation models I: basic nucleotide sequence mutation models

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Phylogenetic Inference using RevBayes

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Molecular Evolution and Phylogenetic Tree Reconstruction

Constructing Evolutionary/Phylogenetic Trees

Taming the Beast Workshop

Maximum Likelihood in Phylogenetics

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Modeling Noise in Genetic Sequences

Using algebraic geometry for phylogenetic reconstruction

Metric learning for phylogenetic invariants

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Phylogenetic inference

Variance and Covariances of the Numbers of Synonymous and Nonsynonymous Substitutions per Site

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Phylogenetic Tree Reconstruction

In: P. Lemey, M. Salemi and A.-M. Vandamme (eds.). To appear in: The. Chapter 4. Nucleotide Substitution Models

Inferring Molecular Phylogeny

Are Guinea Pigs Rodents? The Importance of Adequate Models in Molecular Phylogenetics

Inferring Phylogenies from Protein Sequences by. Parsimony, Distance, and Likelihood Methods. Joseph Felsenstein. Department of Genetics

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

Natural selection on the molecular level

AGREE or DISAGREE? What s your understanding of EVOLUTION?

Molecular Evolution, course # Final Exam, May 3, 2006

Variances of the Average Numbers of Nucleotide Substitutions Within and Between Populations

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

π b = a π a P a,b = Q a,b δ + o(δ) = 1 + Q a,a δ + o(δ) = I 4 + Qδ + o(δ),

Understanding relationship between homologous sequences

Lecture 6 Phylogenetic Inference

Distances that Perfectly Mislead

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Computational Biology and Chemistry

Lecture Notes: BIOL2007 Molecular Evolution

A Statistical Test of Phylogenies Estimated from Sequence Data

In: M. Salemi and A.-M. Vandamme (eds.). To appear. The. Phylogenetic Handbook. Cambridge University Press, UK.

Theory of Evolution Charles Darwin

Predicting the Evolution of two Genes in the Yeast Saccharomyces Cerevisiae

Preliminaries. Download PAUP* from: Tuesday, July 19, 16

Reconstruire le passé biologique modèles, méthodes, performances, limites

Lecture 11 Friday, October 21, 2011

进化树构建方法的概率方法 第 4 章 : 进化树构建的概率方法 问题介绍. 部分 lid 修改自 i i f l 的 ih l i

X X (2) X Pr(X = x θ) (3)

Transcription:

What Is Conservation? Lee A. Newberg February 22, 2005 A Central Dogma Junk DNA mutates at a background rate, but functional DNA exhibits conservation. Today s Question What is this conservation?

Lee A. Newberg, page 2 Definition Possibilities Sequence is said to be conserved across species... Parsimony if there are few base mismatches. Statistical #1 if the best model shows reduced phylogenetic distances between the species. Statistical #2 if the best model requires the incorporation of selection pressures. Statistical #1 vs. Statistical #2 They aren t necessarily the same!

Lee A. Newberg, page 3 Example: Selection Is Ten Times as Significant as Mutation Mutation: Suppose background mutation rate is 1% per 10 6 years, with each of A, C, G, and T equally likely to mutate. Selection: Suppose that after 10 5 years, the expected number of descendents of a C genotype (or G or T genotype) is 1% less than the expected number of A genotype descendents. Change in Equilibrium With a statistical model that incorporates selection pressures, we can compute the population equilibrium exactly: No Change in Mutation Rate (A,C,G,T) (0.90, 0.03, 0.03, 0.03). (1) However, even with these selection pressures and this skew equilibrium, we expect approximately one in 10 8 nucleotides to mutate each year.

Lee A. Newberg, page 4 Conclusion? Conservation would better be used to indicate the nonuniformity of an equilibrium distribution rather than a reduced rate of substitution. But This Isn t the Popular Definition So why do folks reduce the rate of mutation?

Lee A. Newberg, page 5 Parsimony: Counting Mismatches A skew distribution reduces the number of mismatches. Example With equilibrium distribution, e.g., (A,C,G,T) = (0.7, 0.1, 0.1, 0.1), infinitely evolutionarily distant species show joint probability distribution of 0.49 0.07 0.07 0.07 0.07 0.01 0.01 0.01 0.07 0.01 0.01 0.01 0.07 0.01 0.01 0.01, (2) regardless of the nucleotide mutation model. This is 48% mismatches, compared to 75% for neutral sites. Based upon parsimony criteria the species appear closer!

Lee A. Newberg, page 6 Statistician would say previous example shows statistical independence still infinitely distant. But,.... Statistical Phylogeny of Mixed Distributions First codon positions are conserved. Suppose first codon positions come in four kinds: A predominant with equilibrium (0.7, 0.1, 0.1, 0.1) and also C predominant, G predominant, and T predominant. For infinitely evolutionarily distant species, the joint probability distribution for the A predominant kind is as before: 0.49 0.07 0.07 0.07 0.07 0.01 0.01 0.01 0.07 0.01 0.01 0.01 0.07 0.01 0.01 0.01, (3) regardless of the nucleotide mutation model, and likewise for C predominant, G predominant, and T predominant with rows and columns appropriately permuted.

Lee A. Newberg, page 7 Statistical Phylogeny of Mixed Distributions, cont d If each of the four kinds is equally likely than the mixed joint distribution is: 0.13 0.04 0.04 0.04 0.04 0.13 0.04 0.04 0.04 0.04 0.13 0.04 0.04 0.04 0.04 0.13 This is the joint distribution one would get from: the model of Jukes & Cantor (1969); or. (4) the model of Felsenstein (1981) with uniform nucleotide distribution; or the model of Hasegawa et al. (1985) with uniform nucleotide distribution and a transition / transverion ratio of κ = 1. Regardless, the implied phylogenetic distance is 0.7662, not infinite. Based upon statistical criteria the species appear closer!

Lee A. Newberg, page 8 Conclusion? Because of a focus on mismatch counts in evolutionarily distant species; and/or a focus on mixtures of distributions folks have been misled (?!) into believing that conservation reduces the rate of mutation.

Lee A. Newberg, page 9 Where s the Math? (Newberg, 2005) Mutation Model In a short generation time ǫ the nucleotide substitution matrix won t be very different from the identity: I + ǫr. (5) For example, R AC = R AG = R AT = 10 8 /3 and ǫ = 0.02. Selection Model In a short generation time ǫ the selection model matrix won t be very different from the identity: I + ǫs. (6) For example, S CC = S GG = S TT = 10 7 and ǫ = 0.02.

Lee A. Newberg, page 10 Other Time Periods Each generation has a chance to mutate and then a chance to be selected out. Repeating for a time t gives M t = [(I + ǫr)(i + ǫs)] t/ǫ. (7) Starting with an ancestor with distribution β, the joint distribution with a descendent is given by J t = D β M t 1D β M t 1 T, (8) where β A 0 0 0 0 β D β = C 0 0, and (9) 0 0 β G 0 0 0 0 β T 1 = (1, 1, 1, 1). (10)

Lee A. Newberg, page 11 Off-Diagonal Elements of J t For closely related species, the expected number of nucleotide mismatches is proportional to the evolutionary distance. This constant of proportionality is the mutation rate relative to background. Equilibrium from Selection Pressures If, due to selection pressures, the equilibrium changes from β to θ, then we can show that ( ) Jt ods ods ( D θ R ), (11) t t=0 where ods( ) means off-diagonal sum. (Note, ǫ and S drop out.) OrthoGibbs, PhyloScan, etc. Note that even if S is not known, so long as θ is known (or estimated) we can calculate Formula 11. (Recall that R depends on only the background model.)

Lee A. Newberg, page 12 Calculating the Mutation Rate ods ( Jt t ) ods ( D θ R ), (12) t=0 If R is the model of Jukes & Cantor (1969) then this gives 1, regardless of the selection matrix S and the selection-sensitive distribution θ. With the model of Hasegawa et al. (1985), when the junk-dna equilibrium, (β A,β T,β C,β G ), equals (0.3, 0.3, 0.2, 0.2) and κ equals 3, the overall instantaneous rate of Formula 12 will fall in the interval [0.901, 1.148], (13) regardless of the selection matrix S and the selection-sensitive distribution θ. For reasonable background models, the number of mismatches between closely related species is nearly the same when considering functional vs. junk positions

Lee A. Newberg, page 13 Extreme Selection The analysis above discusses a fitness time scale of 10 7 years. Q. What if the selection time scale is one to a few generations? A. Mutation rates can go down. The formula for the mutation rate, with error term, is: ( ) Jt ods = ods ( D θ R ) [1 + O (ǫ(r + S))], (14) t t=0 Applicable to TFBSs? Extreme selection also gives an extreme distribution for θ, e.g., Do we see that? Conclusion? θ = (0.9999990, 0.0000003, 0.0000003, 0.0000003). (15) For TFBSs, selective pressures are sufficiently subtle, and conservation does not significantly affect the mutation rate.

Lee A. Newberg, page 14 References Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol, 17 (6), 368 376. PubMed 7288891. Hasegawa, M., Kishino, H. & Yano, T. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol, 22 (2), 160 174. PubMed 3934395. Jukes, T. H. & Cantor, C. (1969) Evolution of protein molecules. In Mammalian Protein Metabolism, (Munro, H. M., ed.), vol. 3,. Academic Press. New York, NY pp. 21 132. Newberg, L. A. Selection pressures do not significantly affect mutation rates. In preparation.