Sequence Divergence & The Molecular Clock


Sequence Divergence

- simple genetic distance, d = the proportion of sites that differ between two aligned, homologous sequences
- given a constant mutation/substitution rate, d should provide a measure of time since divergence
  - but this is complicated by multiple hits at the same site (homoplasy)
  - corrected distance metrics account for the fact that a sequence has a finite number of sites, so later substitutions can overwrite earlier ones
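The slide does not name a specific corrected distance. As a minimal sketch, the example below assumes the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - 4p/3), chosen here only for illustration. The function names and the toy sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ (sites with gaps or ambiguity codes are skipped)."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """JC69 corrected distance (an assumed choice of correction); defined only for p < 0.75."""
    if p >= 0.75:
        raise ValueError("distance is saturated; JC69 is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy alignment (hypothetical): one difference in ten sites.
p = p_distance("ACGTACGTAC", "ACGTACGTTC")
d = jukes_cantor(p)
print(f"p = {p:.3f}, JC69 d = {d:.4f}")
```

The corrected distance is always at least as large as the raw proportion, and the gap widens as p approaches saturation.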

Expected sequence divergence

- for neutral polymorphisms, substitution rate = mutation rate
- thus, for two diverging lineages, $k = 2T\mu$
  - where k = the number of substitutions observed between two species and T is the time since divergence
  - note that T and $\mu$ can be measured either in years or generations
- solving for T: $T = \dfrac{k}{2\mu}$
  - note that $2\mu$ is often expressed as the rate of sequence divergence (i.e., twice the per-lineage rate)

Rates and Dates: Divergence Time Estimates

- requires calibration with fossil or geological events
- typically assumes a molecular clock
  - Zuckerkandl & Pauling (1962)
- but new methods allow a relaxation of the molecular clock assumption


Fleischer et al. 1998. Evolution on a volcanic conveyor belt: using phylogeographic reconstructions and K-Ar based ages of the Hawaiian Islands to estimate molecular evolutionary rates. Mol. Ecol. 7:533-545.

1.6% divergence per MY = 0.8% per lineage per MY
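As a small worked example, the sketch below applies T = k / (2µ) using the 1.6% per MY pairwise divergence rate quoted above. The observed divergence `k` is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
def divergence_time(k, pairwise_rate):
    """Time since divergence, T = k / (2 * mu), where pairwise_rate = 2 * mu."""
    return k / pairwise_rate


PAIRWISE_RATE = 0.016                  # 1.6% divergence per MY (from the slide)
PER_LINEAGE_RATE = PAIRWISE_RATE / 2   # 0.8% per lineage per MY

k = 0.04                               # hypothetical: 4% corrected divergence between two taxa
t = divergence_time(k, PAIRWISE_RATE)
print(f"estimated divergence time: {t:.2f} MY")
```

The estimate is only as good as the calibration and the clock assumption, so it should be read as a point estimate without its uncertainty.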

suppose $T_1$ is known...

$$\mu = \frac{1}{2}\left(\frac{K_{AC}}{2T_1} + \frac{K_{BC}}{2T_1}\right), \qquad T_2 = \frac{K_{AB}}{2\mu}$$

or...

$$\mu = \frac{(d_{AC} + d_{BC})/2}{T_1}, \qquad T_2 = \frac{d_{AB}}{\mu}$$

Problems with dating

- uncertainty in calibration points
- fossil evidence provides lower bound on age only
- variance of genetic distance estimates
- saturation of genetic distances
- extrapolation outside of calibrated range
- ancestral polymorphism
- **variation in substitution rate among lineages**
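The calibration arithmetic from this slide can be sketched as follows, using the first form of the equations (per-lineage rate from the two distances to C, then T2 = K_AB / 2µ). All distances and the calibration age are hypothetical, as are the function names.

```python
def calibrate_rate(k_ac, k_bc, t1):
    """Per-lineage rate: mean of K_AC/(2*T1) and K_BC/(2*T1), with T1 the known split from C."""
    return 0.5 * (k_ac / (2.0 * t1) + k_bc / (2.0 * t1))


def date_split(k_ab, mu):
    """Age of the A-B split, T2 = K_AB / (2 * mu)."""
    return k_ab / (2.0 * mu)


# Hypothetical inputs: distances in substitutions per site, T1 in MY.
mu = calibrate_rate(k_ac=0.10, k_bc=0.12, t1=10.0)
t2 = date_split(k_ab=0.044, mu=mu)
print(f"mu = {mu:.4f} per site per MY, T2 = {t2:.1f} MY")
```

Averaging the two distances to C smooths over rate differences between A and B, but it does not remove them, which is why the last item in the list above matters.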

Relative Rates Test

- compares genetic distances between two taxa (A, B) and an outgroup (C)
- if evolutionary rate is constant, distances should be equal: $d_{AC} = d_{BC}$

[Figure: tree of Dryocopus, Indicator, Prodotiscus and Pteroglossus, with a schematic of taxa A, B and outgroup C]

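[Figure: tree of Dryocopus, Indicator, Prodotiscus and Pteroglossus]

Differences by substitution type (TVs = transversions):

| Comparison | Sites | AG | CT | AC | AT | CG | GT | All | TVs |
|---|---|---|---|---|---|---|---|---|---|
| Dryocopus vs Indicator | 8991 | 323 | 754 | 360 | 149 | 61 | 30 | 1677 | 600 |
| Dryocopus vs Prodotiscus | 8991 | 322 | 772 | 458 | 157 | 78 | 44 | 1831 | 737 |
| p-value | | | | | | | | p < 0.005 | p ~ 0.0001 |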

Likelihood ratio test for rate constancy

- compare the likelihood (probability) of the data when a molecular clock is enforced versus the likelihood when all branches are free to vary in length (product of time and mutation rate)
  - No clock: -ln L = 27859.36
  - Clock: -ln L = 27904.29
  - $2\Delta\ln L = 89.86$, p < 0.0001
- test statistic: $2\Delta\ln L$ is distributed approximately as $\chi^2$ (chi-square) with n - 2 degrees of freedom, where n = number of terminal taxa
  - unconstrained tree: 2n - 3 branch lengths
  - constrained tree: n - 1 branch lengths
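The test statistic and degrees of freedom above can be reproduced with a few lines. This is a sketch only: the slide does not state the number of taxa, so `n_taxa = 10` below is a hypothetical value used to show the degrees-of-freedom arithmetic, and the function name is an assumption.

```python
def clock_lrt(neg_lnl_clock, neg_lnl_free, n_taxa):
    """Return (2 * delta lnL, degrees of freedom) for a clock vs. no-clock comparison."""
    statistic = 2.0 * (neg_lnl_clock - neg_lnl_free)
    free_params = 2 * n_taxa - 3      # branch lengths on the unconstrained tree
    clock_params = n_taxa - 1         # node heights on the clock-constrained tree
    return statistic, free_params - clock_params


# -ln L values from the slide; n_taxa is hypothetical.
stat, df = clock_lrt(neg_lnl_clock=27904.29, neg_lnl_free=27859.36, n_taxa=10)
print(f"2*delta lnL = {stat:.2f} on {df} degrees of freedom")
```

If SciPy is available, `scipy.stats.chi2.sf(stat, df)` gives the corresponding p-value; it is left out here to keep the sketch dependency-free.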

Relative Rates Test w/ Multiple Taxa

- A = parasitic finches (Anomalospiza, Vidua); B = estrildid finches; C = ploceid finches
- if evolutionary rates are constant, $d_{AC} = d_{BC}$
- observed: $d_{AC} = 0.0925$*; $d_{BC} = 0.0738$*
  - *calculated using the method of Steel et al. 1996 Syst. Biol.

[Figure: tree with split times $T_1$ (A from B) and $T_2$ (C from A + B); rate $r_2$ on the parasitic finch branch, rate $r_1$ on the estrildid, ploceid and ancestral branches]

$$d_{AC} = T_2 r_1 + (T_2 - T_1) r_1 + T_1 r_2 = 0.0925$$
$$d_{BC} = 2 T_2 r_1 = 0.0738$$
$$d_{AB} = T_1 (r_1 + r_2) = 0.0798$$

setting $r_1 = 1$:

$$T_2 = \frac{d_{BC}}{2 r_1} = 0.0369$$
$$T_1 = \frac{d_{AB} + d_{BC} - d_{AC}}{2 r_1} = 0.0306$$
$$r_2 = \frac{d_{AB} + d_{AC} - d_{BC}}{2 T_1} = 1.61$$
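The three-taxon solution on this slide can be checked numerically. The sketch below plugs the slide's distances into the same three expressions; the function name is an assumption.

```python
def solve_rates(d_ab, d_ac, d_bc, r1=1.0):
    """Solve for T1, T2 and r2 given pairwise distances and a reference rate r1."""
    t2 = d_bc / (2.0 * r1)
    t1 = (d_ab + d_bc - d_ac) / (2.0 * r1)
    r2 = (d_ab + d_ac - d_bc) / (2.0 * t1)
    return t1, t2, r2


# Distances from the slide: A = parasitic finches, B = estrildids, C = ploceids.
t1, t2, r2 = solve_rates(d_ab=0.0798, d_ac=0.0925, d_bc=0.0738)
print(f"T1 = {t1:.4f}, T2 = {t2:.4f}, r2/r1 = {r2:.2f}")
```

With r1 fixed at 1, the times are in units of expected substitutions per site on an r1 branch, so r2 reads directly as a relative rate.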

- MrBayes: branch lengths (product of time and rate)
- BEAST: separate estimates of rate and time

Variation in Evolutionary Rate

- rates may vary among lineages due to
  - differences in life history
    - especially generation time, metabolic rate
  - diversifying natural selection
    - but likely limited to few sites in few genes
  - population history
    - the rate of neutral evolution does not depend on population size
    - the rate of nearly neutral evolution does!

Nearly Neutral Theory

- what happens in small populations when selection is weak?
  - changes in allele frequency due to drift and selection are approximately equal: $2Ns \approx 1$
- probability of fixation for a new, nearly neutral allele:

$$\Pr(A\ \text{fixed}) = \frac{2s}{1 - e^{-4Ns}}$$

with fitnesses $w_{AA} = 1 + s$, $w_{Aa} = 1 + s/2$, $w_{aa} = 1$

[Figure: probability of fixation (0 to 0.06) versus population size (10 to 10000, log scale) for s = -0.01, s = -0.001 and s = -0.0001]

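What qualifies as nearly neutral?

- Hamilton: $2s = 1/(2N_e)$ or $4N_e s = 1$
  - value at which the processes of genetic drift and selection are equal
- Hartl & Clark: $2Ns \approx 1$
- Hedrick: $s < 1/(2N)$ or $2Ns < 1$
- Ohta & Gillespie (1996): $s \approx 1/N$ or $Ns \approx 1$

$$\Pr(A\ \text{fixed}) = \frac{1 - e^{-4N_e s p}}{1 - e^{-4N_e s}}$$

if $p = 1/(2N)$:

$$\Pr(A\ \text{fixed}) = \frac{1 - e^{-2s}}{1 - e^{-4Ns}} \approx \frac{2s}{1 - e^{-4Ns}}$$

[Figure: probability of fixation as a function of s for N = 500, from s = -0.004 to s = 0.004, with $4N_e s = -1$ and $4N_e s = 1$ marked]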
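To get a feel for the fixation probability, the sketch below evaluates the slide's expression (1 - e^{-2s}) / (1 - e^{-4Ns}) for a new mutation at frequency 1/(2N). It assumes N_e = N, and the function name and the example values of s are illustrative choices within the range plotted on the slide.

```python
import math


def fixation_probability(s, n):
    """Pr(fixation) for a new mutation at p = 1/(2N), assuming N_e = N.

    Uses expm1 for accuracy when s is small. Very large values of 4*N*|s|
    with s < 0 would overflow; that range is not handled in this sketch.
    """
    if s == 0:
        return 1.0 / (2 * n)          # neutral limit
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)


N = 500
neutral = fixation_probability(0.0, N)
advantageous = fixation_probability(0.004, N)
deleterious = fixation_probability(-0.004, N)
for label, value in [("s = 0", neutral), ("s = +0.004", advantageous), ("s = -0.004", deleterious)]:
    print(f"{label:>11}: {value:.3e}")
```

Shrinking N while holding s fixed moves the deleterious case toward the neutral value 1/(2N), which is the nearly neutral effect in miniature.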

Nearly Neutral Theory - Summary ($2Ns \approx 1$)

- the rate of neutral evolution is independent of population size
  - substitution rate equals mutation rate: $2N\mu \times \dfrac{1}{2N} = \mu$
- in contrast, the fate of nearly neutral mutations depends on population size
  - when N is small, the effect of genetic drift can be comparable to that of selection, making slightly deleterious mutations effectively neutral
- thus, lineages experiencing small population size should accumulate both neutral and nearly neutral mutations, leading to a faster rate of sequence evolution

Testing the Nearly Neutral Theory

- how to distinguish neutral and nearly neutral mutations?
  - synonymous (silent) versus non-synonymous (replacement) substitutions?
    - synonymous likely to be neutral
    - non-synonymous more likely to be deleterious

- Ohta (1994) - generation time effect differs between synonymous and non-synonymous mutations
  - interpreted as consequence of nearly neutral evolution
  - inverse correlation between population size and body size/generation time

[Figure or table values, labels not recovered: 0.608, 0.817, 1.575; 0.760, 0.966, 1.274]

Lohmueller et al. 2008. Proportionally more deleterious genetic variation in European than in African populations. Nature 451: 994-997.

- used protein structure prediction to estimate the number of functionally consequential SNPs carried by each of 15 African Americans (AA) and 20 European Americans (EA)
- higher heterozygosity in AA, but
- the proportion of SNPs that are nonsynonymous is significantly higher in the EA sample (55.4%) than in the AA sample (47.0%)
- same result for SNPs that were inferred to be 'probably damaging' (15.9% in EA; 12.1% in AA)


Estimating dn and ds

- dn - non-synonymous divergence
- ds - synonymous divergence

Nei-Gojobori (1986) Method

1. calculate number of potentially synonymous and non-synonymous sites (s + n = 3 per codon), disregarding stop codons

   $$S = \sum_{j=1}^{n} \sum_{i=1}^{3} f_i, \qquad N = 3n - S$$

   where, in this formula, n is the number of codons and $f_i$ is the fraction of changes at position i of codon j that are synonymous

   - e.g., Lysine: AAA: s = 1/3; AAG: s = 1/3
   - e.g., Leucine: TTA: s = 2/3; TTG: s = 2/3; CTA: s = 4/3; CTG: s = 4/3; CTC: s = 1; CTT: s = 1

2. calculate number of synonymous ($s_d$) and non-synonymous ($n_d$) differences for each codon
   1. if one difference, then obvious
   2. if two differences

two differences

- e.g., 2 possible routes
  - (1) TTT (Phe) - GTT (Val) - GTA (Val): 1 nonsynonymous, 1 synonymous
  - (2) TTT (Phe) - TTA (Leu) - GTA (Val): 2 nonsynonymous
- averaging over routes: $s_d = 0.5$, $n_d = 1.5$

Nei-Gojobori (1986) Method, step 2 (continued)

   3. if three differences

three differences

- e.g., 6 possible routes
  - (1) TTG (Leu) - ATG (Met) - AGG (Arg) - AGA (Arg): 2 nonsynonymous, 1 synonymous
  - (2) TTG (Leu) - ATG (Met) - ATA (Ile) - AGA (Arg)
  - (3) TTG (Leu) - TGG (Trp) - AGG (Arg) - AGA (Arg)
  - (4) TTG (Leu) - TGG (Trp) - TGA (Stop) - AGA (Arg)
  - (5) TTG (Leu) - TTA (Leu) - ATA (Ile) - AGA (Arg)
  - (6) TTG (Leu) - TTA (Leu) - TGA (Stop) - AGA (Arg)
- routes (4) and (6) pass through a stop codon and are disregarded; the remaining four routes contain 3 synonymous and 9 nonsynonymous changes in total
- averaging over routes: $s_d = 0.75$, $n_d = 2.25$

Nei-Gojobori (1986) Method (continued)

3. $dn = n_d / n$
4. $ds = s_d / s$

where $s_d$ and $n_d$ are the total numbers of synonymous and non-synonymous differences across the sequence, and s and n are the average numbers of synonymous and non-synonymous sites in the two sequences
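The counting steps above can be sketched in a few dozen lines. This is an illustrative implementation, not the published software: it assumes the standard genetic code, drops changes to stop codons from each position's denominator (which reproduces the per-codon values quoted on the slides), and discards routes that pass through a stop codon. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Number of potentially synonymous sites in one sense codon (step 1)."""
    total = Fraction(0)
    for pos in range(3):
        neighbours = [codon[:pos] + b + codon[pos + 1:] for b in BASES if b != codon[pos]]
        sense = [n for n in neighbours if CODE[n] != "*"]   # disregard stop codons
        if sense:
            total += Fraction(sum(CODE[n] == CODE[codon] for n in sense), len(sense))
    return total


def count_differences(codon_a, codon_b):
    """Average (s_d, n_d) over all mutational routes that avoid stop codons (step 2)."""
    positions = [i for i in range(3) if codon_a[i] != codon_b[i]]
    syn = nonsyn = paths = 0
    for order in permutations(positions):
        current, s, n, valid = codon_a, 0, 0, True
        for pos in order:
            nxt = current[:pos] + codon_b[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            syn, nonsyn, paths = syn + s, nonsyn + n, paths + 1
    if paths == 0:
        return Fraction(0), Fraction(0)
    return Fraction(syn, paths), Fraction(nonsyn, paths)


print(synonymous_sites("AAA"), synonymous_sites("CTA"))   # lysine and leucine examples
print(count_differences("TTT", "GTA"))                    # two differences
print(count_differences("TTG", "AGA"))                    # three differences
```

Using `Fraction` keeps the per-codon values exact, which makes it easy to compare them against the thirds and halves written on the slides.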

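Worked example (* marks codons that differ between the two sequences):

| Codon | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Differs | * | | * | * | * | * | * | * | |
| Sequence 1 | ACG | TAC | GTA | CGT | TTG | CCC | AAG | GAG | |
| Amino acid | Thr | Tyr | Val | Arg | Leu | Pro | Lys | Glu | |
| s | 1 | 1 | 1 | 1 | 2/3 | 1 | 1/3 | 1/3 | 6.33 |
| Sequence 2 | ACA | TAC | GTT | TGT | CTG | CCA | AGG | GAC | |
| Amino acid | Thr | Tyr | Val | Cys | Leu | Pro | Arg | Asp | |
| s | 1 | 1 | 1 | 1/2 | 4/3 | 1 | 1/3 | 1/3 | 6.5 |
| $s_d$ | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 4 |
| $n_d$ | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 3 |

- average s = (6.33 + 6.5) / 2 = 6.415; n = 3 × 8 - 6.415 = 17.585
- $dn = n_d / n = 3 / 17.585 = 0.171$
- $ds = s_d / s = 4 / 6.415 = 0.624$
- $dn/ds = 0.274$

PAML (Phylogenetic Analysis using Maximum Likelihood)

- versatile program for modeling sequence evolution
  - basic transition (rate) matrix
  - estimates $d_N$ and $d_S$ using maximum likelihood, where $\kappa$ is the transition/transversion ratio and $\omega$ is equal to $d_N / d_S$

$$Q_{ij} = \begin{cases}
\mu \pi_j & \text{for a synonymous transversion} \\
\mu \kappa \pi_j & \text{for a synonymous transition} \\
\mu \omega \pi_j & \text{for a nonsynonymous transversion} \\
\mu \omega \kappa \pi_j & \text{for a nonsynonymous transition} \\
0 & \text{if } i \text{ and } j \text{ differ at 2 or more positions}
\end{cases}$$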
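The case structure of the rate matrix above can be made concrete with a small function that returns the off-diagonal entry for one codon pair. This is a sketch of the rule as written on the slide, not PAML's code: it assumes the standard genetic code and, as an added assumption, gives zero rate to any change involving a stop codon. The parameter values in the example are arbitrary.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
TRANSITIONS = {frozenset("AG"), frozenset("CT")}   # purine-purine, pyrimidine-pyrimidine


def codon_rate(codon_i, codon_j, kappa, omega, pi_j, mu=1.0):
    """Off-diagonal rate Q_ij from codon i to codon j (diagonal entries are not handled here)."""
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    if len(diffs) != 1 or "*" in (CODE[codon_i], CODE[codon_j]):
        return 0.0                                  # multi-site change or stop codon
    rate = mu * pi_j
    if frozenset(diffs[0]) in TRANSITIONS:
        rate *= kappa                               # transition
    if CODE[codon_i] != CODE[codon_j]:
        rate *= omega                               # nonsynonymous change
    return rate


# Arbitrary illustrative parameters: equal codon frequencies over 61 sense codons.
kappa, omega, pi = 2.0, 0.5, 1.0 / 61
print(codon_rate("CTT", "CTC", kappa, omega, pi))   # synonymous transition
print(codon_rate("TTT", "TTA", kappa, omega, pi))   # nonsynonymous transversion
```

In a full model the diagonal of Q is set so that each row sums to zero, and µ is usually fixed by a scaling constraint rather than estimated freely.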

dn/ds ratio

- when measured in relation to the number of synonymous and non-synonymous sites
  - dn/ds = 1 if all substitutions are neutral
  - dn/ds > 1 suggests positive, diversifying selection
  - dn/ds < 1 suggests purifying selection (i.e., constraints on protein evolution)

Hughes & Nei 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170.

- examined dn/ds in human MHC class I genes HLA-A, B & C (n = 12 sequences)
- dn/ds > 1: evidence of positive, diversifying selection
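The interpretation rules for dn/ds listed on this slide can be sketched with the counts from the eight-codon worked example (3 non-synonymous differences over 17.585 sites, 4 synonymous differences over 6.415 sites). The function names are hypothetical, and the labelling is deliberately naive: it compares the point estimate to 1 without any significance test.

```python
def dn_ds(n_d, s_d, n_sites, s_sites):
    """Return (dn, ds, dn/ds) from difference counts and site counts."""
    dn = n_d / n_sites
    ds = s_d / s_sites
    return dn, ds, dn / ds


def interpret(ratio):
    """Naive reading of a dn/ds point estimate (no statistical test)."""
    if ratio > 1:
        return "suggests positive, diversifying selection"
    if ratio < 1:
        return "suggests purifying selection"
    return "consistent with neutrality"


dn, ds, ratio = dn_ds(n_d=3, s_d=4, n_sites=17.585, s_sites=6.415)
print(f"dn = {dn:.3f}, ds = {ds:.3f}, dn/ds = {ratio:.3f}: {interpret(ratio)}")
```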

Positive Selection

- testing at the gene level is a dull tool
  - positive selection will usually affect one or a few codons, while the rest of the gene remains constrained (dn/ds << 1)
  - nonetheless, genes associated with immune function and reproduction (self-recognition, sperm competition, sexual conflict) often have dn/ds > 1
- more sophisticated methods are available to identify individual sites under selection

purifying selection is the norm: synonymous substitutions are more frequent than nonsynonymous substitutions in most genes

much greater variation among genes in nonsynonymous rate than in synonymous rate

The avian constraint hypothesis...

- independent pairwise comparisons of dn and ds for mtDNA protein-coding genes
- Stanley & Harrison, MBE 1999

Woolfit & Bromham 2003. Increased rates of sequence evolution in endosymbiotic bacteria and fungi with small effective population sizes. MBE 20:1545-1555.

- higher rate in endosymbiotic bacteria interpreted as a consequence of nearly neutral evolution
- if so, these organisms should also have a higher dn/ds ratio

Moran (1996) Accelerated evolution and Muller's rachet in endosymbiotic bacteria. PNAS 93: 2873-2878.

- Buchnera: endosymbiont of aphids
- higher dn/ds than free-living bacteria

[Figure: dn/ds estimates for two comparisons of brood parasites with related taxa]

- Indicator, Prodotiscus (brood parasites): dn/ds = 0.0409
- Dryocopus, Pteroglossus: dn/ds = 0.0256
- Anomalospiza, Vidua (brood parasites): dn/ds = 0.0326
- Lagonosticta, Taeniopygia: dn/ds = 0.0161

Constancy of the Molecular Clock

- data on sequence divergence suggest that evolutionary rate in many organisms is roughly constant when time is measured in years, even if generation times vary
- if nearly neutral evolution is important, then the inverse correlation between generation time and population size may help to explain the relative constancy of rate among organisms with different life histories
  - small critters have shorter generation time, resulting in a higher rate of neutral evolution
  - large critters have longer generation time but also smaller populations, resulting in a higher rate of nearly neutral evolution

Popadin et al. (2007) Accumulation of slightly deleterious mutations in mitochondrial protein-coding genes of large versus small mammals. PNAS 104: 13390-13395.

But wait, are synonymous substitutions really neutral?

- codon bias
  - favored codons (corresponding to more abundant tRNAs) are present at higher frequency in highly expressed genes than in genes with lower expression levels
- base composition bias
  - significant and sometimes substantial differences between lineages
  - e.g., birds have higher GC content than mammals

Non-independent evolution of base composition in mammalian mtDNA

[Figure: % C in protein-coding genes (variable positions only) versus % C in tRNA and rRNA genes (variable positions only)]

leads to changes in amino acid composition of mtDNA proteins.

[Figure: proportion of GC-rich codons (Ala, Arg, Gly, Pro) in protein genes versus proportion of G + C nucleotides in RNA genes]

Berglund et al. 2009. Hotspots of biased nucleotide substitutions in human genes. PLoS Biology 7: e1000026.

- the fastest-changing genes in terms of amino acid substitutions show a biased pattern of fixation for AT-to-GC mutations
- AT → GC = weak → strong