Taming the Beast Workshop


Workshop
David Rasmussen & Carsten Magnus
June 27, 2016

Outline
- Sequence evolution: rate matrices
- Markov chain model
- Variable rates amongst different sites: +Γ
- Implementation in BEAST2

- Genotype: sequence level. [Figure: two example RNA sequences.]
- Phenotype: e.g. antigenic level: antibody binding to HIV.
- Codon: three nucleotides encode one amino acid, so a single nucleotide change can already change the phenotype.
- Alphabet: 4 nucleotides (DNA: A, T, G, C; RNA: A, U, G, C) or 20 amino acids.
When comparing two nucleotide sequences we have to keep in mind that they are the result of mutation during replication (genotypic level) and selection (phenotypic level).

Sequence alignment: a way of arranging sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
- To find an alignment: concept of positional homology. Nucleotides (or amino acids) show positional homology if they exist at equivalent positions in the respective sequences.
- Programs for alignment: MUSCLE, CLUSTAL, which can be called from e.g. AliView, MegAlign, ...
- A BEAST analysis starts with aligned sequences!
- File formats: .fas, .fasta, .nexus

Models for nucleotide substitutions

The fundamental problem
[Figure: aligned sequences of taxon 1, taxon 2 and taxon 3 on a tree, with possible histories at one site: a single substitution, multiple substitutions, and a convergent substitution.]
Problem of phylogenetics: We observe sequences but not their evolutionary history. Thus we have to take all possible evolutionary trajectories into account.
The sequence evolution model appears in the posterior.
[Equation: the posterior as a product of probability terms, with alignments and trees drawn as images.]

A model for nucleotide substitutions
State space of each nucleotide position: S = {T, C, A, G}
Example: Assume the process is at state T. It moves to C at rate a, to A at rate b and to G at rate c, so it leaves T at total rate a+b+c.
Substitution rate matrix (rows and columns in the order T, C, A, G):
\[
Q = \begin{pmatrix}
-(a+b+c) & a & b & c \\
d & -(d+e+f) & e & f \\
g & h & -(g+h+i) & i \\
j & k & l & -(j+k+l)
\end{pmatrix}
\]

Site models in BEAST2

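The easiest substitution model: JC69
JC69: named after T. H. Jukes and C. R. Cantor: Evolution of protein molecules. 1969 [Jukes and Cantor, 1969].
All substitutions have the same rate, λ.
Substitution rates (rows and columns in the order T, C, A, G; each diagonal entry, marked ·, is set so that its row sums to zero):
\[
\begin{pmatrix}
\cdot & \lambda & \lambda & \lambda \\
\lambda & \cdot & \lambda & \lambda \\
\lambda & \lambda & \cdot & \lambda \\
\lambda & \lambda & \lambda & \cdot
\end{pmatrix}
\]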

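Accounting for transition/transversion: K80
K80: named after M. Kimura: A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. 1980 [Kimura, 1980].
Transitions happen at rate α, transversions at rate β.
[Figure: pyrimidines (one ring): T, C; purines (two rings): A, G. Transitions stay within a group, transversions switch between groups.]
Substitution rates (rows and columns in the order T, C, A, G):
\[
\begin{pmatrix}
\cdot & \alpha & \beta & \beta \\
\alpha & \cdot & \beta & \beta \\
\beta & \beta & \cdot & \alpha \\
\beta & \beta & \alpha & \cdot
\end{pmatrix}
\]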

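Accounting for transition/transversion: HKY
HKY: named after [Hasegawa et al., 1984, Hasegawa et al., 1985].
- accounting for transitions (rate α) and transversions (rate β)
- after a long period of evolution, the equilibrium frequencies π_T, π_C, π_A, π_G are reached
[Figure: pyrimidines (one ring): T, C; purines (two rings): A, G. Transitions stay within a group, transversions switch between groups.]
Substitution rates (rows and columns in the order T, C, A, G):
\[
\begin{pmatrix}
\cdot & \alpha\pi_C & \beta\pi_A & \beta\pi_G \\
\alpha\pi_T & \cdot & \beta\pi_A & \beta\pi_G \\
\beta\pi_T & \beta\pi_C & \cdot & \alpha\pi_G \\
\beta\pi_T & \beta\pi_C & \alpha\pi_A & \cdot
\end{pmatrix}
=
\begin{pmatrix}
\cdot & \alpha & \beta & \beta \\
\alpha & \cdot & \beta & \beta \\
\beta & \beta & \cdot & \alpha \\
\beta & \beta & \alpha & \cdot
\end{pmatrix}
\begin{pmatrix}
\pi_T & 0 & 0 & 0 \\
0 & \pi_C & 0 & 0 \\
0 & 0 & \pi_A & 0 \\
0 & 0 & 0 & \pi_G
\end{pmatrix}
\]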

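Accounting for transition/transversion: TN93
TN93: named after [Tamura and Nei, 1993].
- accounting for different transition rates between T and C (rate α_1) as well as A and G (rate α_2)
- after a long period of evolution, the equilibrium frequencies are reached
[Figure: pyrimidines (one ring): T, C; purines (two rings): A, G. Transitions (α_1, α_2) stay within a group, transversions switch between groups.]
Substitution rates (rows and columns in the order T, C, A, G):
\[
\begin{pmatrix}
\cdot & \alpha_1\pi_C & \beta\pi_A & \beta\pi_G \\
\alpha_1\pi_T & \cdot & \beta\pi_A & \beta\pi_G \\
\beta\pi_T & \beta\pi_C & \cdot & \alpha_2\pi_G \\
\beta\pi_T & \beta\pi_C & \alpha_2\pi_A & \cdot
\end{pmatrix}
\]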

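A more general substitution model: GTR
GTR (REV): generalised time-reversible model, based on three papers: [Tavaré, 1986, Yang, 1994, Zharkikh, 1994].
Substitution rates (rows and columns in the order T, C, A, G):
\[
\begin{pmatrix}
\cdot & a\pi_C & b\pi_A & c\pi_G \\
a\pi_T & \cdot & d\pi_A & e\pi_G \\
b\pi_T & d\pi_C & \cdot & f\pi_G \\
c\pi_T & e\pi_C & f\pi_A & \cdot
\end{pmatrix}
\]
+ quite flexible
+ time-reversible
- not completely general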
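The way JC69, K80, HKY and TN93 nest inside GTR, and what "time-reversible" means in practice, can be checked numerically. The following is a minimal sketch, assuming the state order T, C, A, G and purely illustrative parameter values; the helper names `gtr_rate_matrix` and `is_time_reversible` are hypothetical and are not part of BEAST2.

```python
STATES = "TCAG"  # assumed row/column order


def gtr_rate_matrix(a, b, c, d, e, f, pi):
    """Build a GTR rate matrix with q_ij = s_ij * pi_j for i != j."""
    # Symmetric exchangeabilities:
    # a: T<->C, b: T<->A, c: T<->G, d: C<->A, e: C<->G, f: A<->G
    s = [[0, a, b, c],
         [a, 0, d, e],
         [b, d, 0, f],
         [c, e, f, 0]]
    q = [[s[i][j] * pi[j] for j in range(4)] for i in range(4)]
    for i in range(4):
        # Diagonal entries make every row sum to zero.
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def is_time_reversible(q, pi, tol=1e-12):
    """Detailed balance: pi_i * q_ij == pi_j * q_ji for all i, j."""
    return all(abs(pi[i] * q[i][j] - pi[j] * q[j][i]) < tol
               for i in range(4) for j in range(4))


pi = [0.1, 0.2, 0.3, 0.4]  # illustrative equilibrium frequencies
gtr = gtr_rate_matrix(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, pi)

# HKY is the special case a = f = alpha (transitions), rest = beta.
alpha, beta = 2.0, 0.5
hky = gtr_rate_matrix(alpha, beta, beta, beta, beta, alpha, pi)

# Twelve unrelated rates (UNREST-like) generally break detailed balance.
unrest = [[-6, 1, 2, 3], [4, -15, 5, 6], [7, 8, -24, 9], [10, 11, 12, -33]]

print("GTR reversible:   ", is_time_reversible(gtr, pi))
print("HKY reversible:   ", is_time_reversible(hky, pi))
print("UNREST reversible:", is_time_reversible(unrest, pi))
```

Setting a = f = α_1, α_2 respectively gives TN93 in the same way, and equal frequencies with a single rate give JC69.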

The most general substitution model (implemented in BEAST2 but not in BEAUti)
UNREST: unrestricted model, first described in [Yang, 1994].
Each substitution has a (different) rate.
Substitution rates (rows and columns in the order T, C, A, G):
\[
\begin{pmatrix}
\cdot & a & b & c \\
d & \cdot & e & f \\
g & h & \cdot & i \\
j & k & l & \cdot
\end{pmatrix}
\]
+ most general case
+ all other models are special cases of UNREST
- mathematically very complicated and not handy to use
- not time-reversible

Substitution models in BEAUti

model    parameters   description
JC69     1            all substitutions have the same rate
K80      2            accounts for transitions and transversions; not in BEAUti
HKY      2+3          distinguishes transitions and transversions, including equilibrium frequencies
TN93     3+3          different rates for the two kinds of transitions
GTR      6+3          general, but still time-reversible
UNREST   12           most general, not time-reversible; not in BEAUti

The equilibrium frequencies (the "+3" parameters) can be empirically estimated from the alignment or inferred alongside the substitution rates.

The fundamental problem - again
[Figure: aligned sequences of taxon 1, taxon 2 and taxon 3.]
Problem of phylogenetics: We observe sequences but not their evolutionary history. Thus we have to take all possible evolutionary trajectories into account.
So far we have determined rates of nucleotide substitutions. But we need probabilities.

Nucleotide substitutions as Markov chains (MC)
Definition of a Markov chain (see also [Ross, 1996]):
- a stochastic process, i.e. a series of random experiments through time
- lives on a state space and jumps between the different states
- memorylessness: the probability of jumping to a state depends only on the current state
[Figure: nucleotide substitutions as a Markov chain; a site changes state along a time axis, each jump occurring with some probability p.]
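The three properties above can be seen in a small simulation of one site. This is a minimal sketch, assuming a continuous-time chain with the state order T, C, A, G and a JC69 rate matrix with λ = 0.015 per day; the function name `simulate_site` is hypothetical and this is not how BEAST2 evaluates the model.

```python
import random

STATES = "TCAG"  # assumed state order


def simulate_site(q, start, t_end, rng):
    """Simulate one site's substitution history on [0, t_end).

    Returns a list of (time, state) pairs, starting with (0.0, start).
    """
    history = [(0.0, start)]
    t, i = 0.0, STATES.index(start)
    while True:
        exit_rate = -q[i][i]             # total rate of leaving state i
        t += rng.expovariate(exit_rate)  # exponential waiting time
        if t >= t_end:
            return history
        # Jump to j != i with probability q_ij / exit_rate. The choice
        # depends only on the current state i (memorylessness).
        targets = [j for j in range(4) if j != i]
        weights = [q[i][j] for j in targets]
        i = rng.choices(targets, weights=weights)[0]
        history.append((t, STATES[i]))


lam = 0.015  # substitutions per site per day (illustrative)
q_jc69 = [[-3 * lam if i == j else lam for j in range(4)] for i in range(4)]

history = simulate_site(q_jc69, "A", 100.0, random.Random(1))
for time, state in history:
    print(f"{time:7.2f}  {state}")
```

Each printed line is one jump of the chain; rerunning with a different seed gives a different history for the same start state, which is exactly why all possible trajectories have to be accounted for.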

Why Markov chains are a great model for nucleotide substitutions
- memorylessness: a nucleotide substitution happens independently of the substitution history at this site
- the substitution rate matrix Q defines the transition probabilities
- applying linear algebra, we can calculate the transition probability matrix as
\[
P(t) = e^{Qt} = U \,\mathrm{diag}\!\left(e^{\epsilon_1 t}, e^{\epsilon_2 t}, e^{\epsilon_3 t}, e^{\epsilon_4 t}\right) U^{-1},
\]
where ε_1, ..., ε_4 are the eigenvalues of Q and the columns of U are the corresponding eigenvectors
- the transition probabilities take into account every possible substitution path (Chapman-Kolmogorov theorem)
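The step from rates to probabilities can be tried out without any linear algebra library. The sketch below does not use the eigendecomposition from the slide; it swaps in a truncated Taylor series with scaling and squaring, which is simpler to write in plain Python. The K80 rates are illustrative assumptions, the state order T, C, A, G is assumed, and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=20, squarings=10):
    """Approximate e^M: Taylor series of M / 2^s, then square s times."""
    n = len(m)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes small^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_matrix(q, t):
    """P(t) = e^{Qt} for rate matrix q and elapsed time t."""
    return mat_exp([[x * t for x in row] for row in q])


# K80 with illustrative rates (per day).
alpha, beta = 0.02, 0.005
diag = -(alpha + 2 * beta)
q = [[diag, alpha, beta, beta],
     [alpha, diag, beta, beta],
     [beta, beta, diag, alpha],
     [beta, beta, alpha, diag]]

p10, p30, p40 = (transition_matrix(q, t) for t in (10.0, 30.0, 40.0))
p_product = mat_mul(p10, p30)  # Chapman-Kolmogorov: P(10) P(30) = P(40)

for row_direct, row_product in zip(p40, p_product):
    print(["%.4f" % x for x in row_direct], ["%.4f" % x for x in row_product])
```

The two printed matrices agree: going from time 0 to 40 directly is the same as summing over every intermediate state at time 10.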

Example of transition probabilities: JC69
Substitution rates:
\[
Q = \begin{pmatrix}
-3\lambda & \lambda & \lambda & \lambda \\
\lambda & -3\lambda & \lambda & \lambda \\
\lambda & \lambda & -3\lambda & \lambda \\
\lambda & \lambda & \lambda & -3\lambda
\end{pmatrix}
\]
Transition probability matrix:
\[
P(t) = e^{Qt} = \begin{pmatrix}
p_0(t) & p_1(t) & p_1(t) & p_1(t) \\
p_1(t) & p_0(t) & p_1(t) & p_1(t) \\
p_1(t) & p_1(t) & p_0(t) & p_1(t) \\
p_1(t) & p_1(t) & p_1(t) & p_0(t)
\end{pmatrix}
\]
with
\[
p_0(t) = \frac{1}{4} + \frac{3}{4} e^{-4\lambda t}
\quad\text{and}\quad
p_1(t) = \frac{1}{4} - \frac{1}{4} e^{-4\lambda t}.
\]
[Figure: transition probabilities p_0(t) and p_1(t) (axis from 0 to 1) against time in days (0 to 100), for λ = 0.015 substitutions per site per day.]
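The curves in the figure can be reproduced directly from the two closed-form expressions. A minimal sketch, using the slide's value λ = 0.015 substitutions per site per day; the function name `jc69_probs` is hypothetical.

```python
import math


def jc69_probs(lam, t):
    """Return (p0, p1) under JC69.

    p0: probability that a site shows the same nucleotide after time t.
    p1: probability that it shows one specific other nucleotide.
    """
    decay = math.exp(-4.0 * lam * t)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


lam = 0.015  # substitutions per site per day
for t in (0, 20, 40, 60, 80, 100):
    p0, p1 = jc69_probs(lam, t)
    # Each row of P(t) holds one p0 and three p1, so it must sum to 1.
    print(f"t = {t:3d} days  p0 = {p0:.3f}  p1 = {p1:.3f}  "
          f"row sum = {p0 + 3 * p1:.3f}")
```

Both probabilities approach 1/4 as t grows, which is the stationary distribution of JC69.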

JC69: Stationary distribution
Suppose we have a sequence that evolves with rate λ = 2.2/3 × 10^{-9} substitutions per site per year. We follow the evolution of 4 different sites with T at site 1, C at site 2, A at site 3 and G at site 4 at time point 0. How likely is it that, after time t has passed, there is a T, C, A or G at the four different positions?
To answer this question, we follow the time evolution of the transition probability matrix P(t), which has p_0(t) on the diagonal and p_1(t) everywhere else:

time/years    p_0(t) (diagonal)   p_1(t) (off-diagonal)
0             1                   0
4.5 × 10^8    0.46                0.18
9 × 10^8      0.31                0.23
1.8 × 10^9    0.25                0.25

When t → ∞ the stationary distribution is reached.
Any long sequence (e.g. ...) at time 0 will be composed of equal amounts of T, C, A, G after time t → ∞.
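The successive matrices on this slide are linked by the Chapman-Kolmogorov property: doubling the time corresponds to squaring the matrix. The sketch below starts from the rounded slide values 0.46 and 0.18 for 4.5 × 10^8 years and squares twice; the helper names are hypothetical and the state order T, C, A, G is an assumption.

```python
def jc_matrix(p0, p1):
    """4x4 matrix with p0 on the diagonal and p1 elsewhere."""
    return [[p0 if i == j else p1 for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


p_t = jc_matrix(0.46, 0.18)   # P(4.5e8 years), rounded as on the slide
p_2t = mat_mul(p_t, p_t)      # P(9e8 years)   = P(t) P(t)
p_4t = mat_mul(p_2t, p_2t)    # P(1.8e9 years) = P(2t) P(2t)

# A site that is T at time 0, as a probability vector over T, C, A, G.
start = [1.0, 0.0, 0.0, 0.0]
after = [sum(start[i] * p_4t[i][j] for i in range(4)) for j in range(4)]

for label, m in (("t", p_t), ("2t", p_2t), ("4t", p_4t)):
    print(f"P({label}): diagonal {m[0][0]:.2f}, off-diagonal {m[0][1]:.2f}")
print("distribution of the site after 4t:", [f"{x:.2f}" for x in after])
```

After two squarings every entry is close to 1/4 regardless of the starting nucleotide, matching the last matrix on the slide.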

JC69: Time transformation
The times we look at, e.g. in species evolution, are very often very large. Thus, instead of real time, we display an evolutionary time scale in terms of sequence distances.
As a substitution happens at rate 3λ in JC69 (keep in mind that in other models the expected time to substitution is different!), we expect one substitution to happen after time 1/(3λ). This is due to the exponentially distributed waiting times for an event happening at a certain rate.
This means that we expect one substitution after 1/(2.2 × 10^{-9}) ≈ 4.5 × 10^8 years in our example.
[Figure: the matrices P(t) of the stationary-distribution example, shown against two axes: time in years (0, 4.5 × 10^8, 9 × 10^8, 1.8 × 10^9) and d = time × (3λ) (0, 1, 2, 4).]
Expected time to d substitutions in JC69:
\[
t = \frac{d}{3\lambda}
\]
Trick from physics: compare units.
\[
[t] = \text{years}, \qquad
\left[\frac{d}{3\lambda}\right] = \frac{\#\,\text{substitutions}}{\#\,\text{substitutions}/\text{year}} = \text{years}
\]
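To make the change of time scale concrete, the JC69 transition probabilities can be rewritten in terms of the distance d. This short worked derivation uses only the relation d = 3λt from this slide and the JC69 expressions for p_0 and p_1; the numerical values are rounded.

```latex
d = 3\lambda t
\;\Longrightarrow\;
4\lambda t = \tfrac{4}{3}\, d,
\qquad
p_0(d) = \tfrac{1}{4} + \tfrac{3}{4}\, e^{-4d/3},
\qquad
p_1(d) = \tfrac{1}{4} - \tfrac{1}{4}\, e^{-4d/3}.

% One expected substitution per site (d = 1, i.e. t \approx 4.5 \times 10^{8} years):
e^{-4/3} \approx 0.264,
\qquad
p_0(1) \approx 0.25 + 0.75 \times 0.264 \approx 0.45,
\qquad
p_1(1) \approx 0.25 - 0.25 \times 0.264 \approx 0.18.
```

These values are close to the entries 0.46 and 0.18 shown for 4.5 × 10^8 years, and on the distance scale the rate λ no longer appears explicitly.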


Variable rates
So far: all sites in the sequence evolve at the same rate.
But: substitution rates might differ over the genome.
- mutation rates might differ over sites
- selective pressure might be different on the phenotypic level
We extend the existing models by replacing the constant rates with Γ-distributed random variables (notation: JC69+Γ, HKY+Γ, ...).

Example: JC69+Γ
λ → λR: we replace the substitution rate λ by λR, where R is a Γ-distributed random variable with shape parameter α and mean 1.
[Figure: density g(r) of R for r from 0 to 3, with α = 0.2, α = 1, α = 2 and α = 20.]
In BEAUti: change the Gamma Category Count to allow for rate variation. 4 to 6 categories normally work well.
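The effect of the shape parameter α can be explored with a small calculation. The sketch below averages the JC69 probability p_0 over a continuous Γ distribution with mean 1, once in closed form (using the Laplace transform of the Γ distribution) and once by Monte Carlo. Note the assumption: BEAUti's Gamma Category Count discretises the distribution into a few categories, which this sketch does not do. The function names and the values λ = 0.015 and t = 40 are illustrative.

```python
import math
import random


def jc69_gamma_p0(lam, t, alpha):
    """p0(t) under JC69+Γ with continuous R ~ Gamma(shape alpha, mean 1).

    Uses E[exp(-s R)] = (1 + s / alpha) ** (-alpha) with s = 4 * lam * t.
    """
    return 0.25 + 0.75 * (1.0 + 4.0 * lam * t / alpha) ** (-alpha)


def jc69_gamma_p0_mc(lam, t, alpha, n, rng):
    """Monte Carlo estimate of the same average over n sampled rates."""
    total = 0.0
    for _ in range(n):
        r = rng.gammavariate(alpha, 1.0 / alpha)  # shape alpha, scale 1/alpha
        total += 0.25 + 0.75 * math.exp(-4.0 * lam * t * r)
    return total / n


lam, t = 0.015, 40.0
constant_rate = 0.25 + 0.75 * math.exp(-4.0 * lam * t)
print(f"constant rate: p0 = {constant_rate:.3f}")
for alpha in (0.2, 1.0, 2.0, 20.0):
    exact = jc69_gamma_p0(lam, t, alpha)
    approx = jc69_gamma_p0_mc(lam, t, alpha, 20000, random.Random(0))
    print(f"alpha = {alpha:4.1f}  closed form = {exact:.3f}  "
          f"Monte Carlo = {approx:.3f}")
```

Small α means strong rate variation (many nearly invariant sites), so more sites remain unchanged than under a constant rate; for large α the result approaches plain JC69.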


The codon sun
A codon consists of three nucleotides, translating to one of the 20 amino acids:

Amino acid                     Three-letter abbreviation   One-letter symbol   Molecular weight
Alanine                        Ala                         A                   89 Da
Arginine                       Arg                         R                   174 Da
Asparagine                     Asn                         N                   132 Da
Aspartic acid                  Asp                         D                   133 Da
Asparagine or aspartic acid    Asx                         B                   133 Da
Cysteine                       Cys                         C                   121 Da
Glutamine                      Gln                         Q                   146 Da
Glutamic acid                  Glu                         E                   147 Da
Glutamine or glutamic acid     Glx                         Z                   147 Da
Glycine                        Gly                         G                   75 Da
Histidine                      His                         H                   155 Da
Isoleucine                     Ile                         I                   131 Da
Leucine                        Leu                         L                   131 Da
Lysine                         Lys                         K                   146 Da
Methionine                     Met                         M                   149 Da
Phenylalanine                  Phe                         F                   165 Da
Proline                        Pro                         P                   115 Da
Serine                         Ser                         S                   105 Da
Threonine                      Thr                         T                   119 Da
Tryptophan                     Trp                         W                   204 Da
Tyrosine                       Tyr                         Y                   181 Da
Valine                         Val                         V                   117 Da

[Sanger, 2015] [Promega, 2015]

Example: Codon CTA
Overview of substitution rates to the same codon; the thickness of the arrows represents different rates.
[Figure: a target codon (Leu) surrounded by its single-nucleotide neighbours, coding for Ile, Val, Leu, Arg, Leu, Leu, Gln, Leu and Pro, with arrows pointing to the target codon.]
- synonymous substitutions: do not change the amino acid (AA)
- nonsynonymous substitutions: do change the amino acid (AA)
- bigger arrows: transition
- smaller arrows: transversion
Adapted from [Yang, 2014].
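The two distinctions in the figure (synonymous versus nonsynonymous, transition versus transversion) can be computed for any codon from the standard genetic code. The sketch below assumes the target codon is CTA, which is consistent with the amino acids listed on the slide but is not legible in the extracted text; the function name `classify_neighbours` is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
# Transitions stay within pyrimidines (T, C) or within purines (A, G).
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def classify_neighbours(target):
    """Classify every sense codon one nucleotide away from `target`."""
    result = {}
    for pos in range(3):
        for base in BASES:
            if base == target[pos]:
                continue
            codon = target[:pos] + base + target[pos + 1:]
            amino = CODON_TABLE[codon]
            if amino == "*":
                continue  # changes to and from stop codons are ignored
            kind = ("synonymous" if amino == CODON_TABLE[target]
                    else "nonsynonymous")
            change = ("transition" if (base, target[pos]) in TRANSITIONS
                      else "transversion")
            result[codon] = (amino, kind, change)
    return result


neighbours = classify_neighbours("CTA")  # assumed target codon
for codon, (amino, kind, change) in sorted(neighbours.items()):
    print(f"{codon} ({amino}) -> CTA (L): {kind}, {change}")
```

A codon model then assigns each of the four combinations its own rate, which is what the arrow thickness in the figure encodes.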

Varying substitution rates amongst the codon positions
[Bofkin and Goldman, 2007] have shown that in protein-encoding regions
- second codon positions evolve more slowly than first codon positions
- third codon positions evolve faster than first codon positions
Different codon positions can have different evolutionary rates. BEAST2 allows for estimating these rates separately.
Example file: BEAST2.4.x/examples/nexus/primate-mtDNA.nex
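Partitioning by codon position amounts to taking every third column of an in-frame alignment. The sketch below illustrates that split on a made-up two-taxon alignment and compares the raw proportion of differing sites per position; this is only a descriptive summary, not the rate estimation that BEAST2 performs, and the function names are hypothetical.

```python
def split_codon_positions(alignment):
    """Split each in-frame sequence into 1st, 2nd and 3rd codon positions.

    `alignment` maps taxon names to equally long sequences; the result is a
    list of three alignments, one per codon position.
    """
    return [{name: seq[offset::3] for name, seq in alignment.items()}
            for offset in range(3)]


def proportion_different(seq_a, seq_b):
    """Fraction of sites at which two equally long sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


# Toy in-frame alignment (made-up data, four codons per taxon).
alignment = {
    "taxon1": "CTAGGACCTAAA",
    "taxon2": "TTGGGACCCAAA",
}

partitions = split_codon_positions(alignment)
proportions = [proportion_different(part["taxon1"], part["taxon2"])
               for part in partitions]
for position, value in enumerate(proportions, start=1):
    print(f"codon position {position}: {value:.2f} of sites differ")
```

In a real analysis each of the three partitions would get its own relative rate, estimated jointly with the tree.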

Including the choice of substitution rate model in your BEAST analysis

Rate models in BEAST2
BEAST2 allows for including different site models in your analysis ("Site Model" tab in BEAUti).
Which site model is the best for your data?
- package bModelTest: Bayesian site model selection for nucleotide data
- package SubstBMA: modelling across-site variation in the nucleotide ...

References
- Bofkin, L. and Goldman, N. (2007). Variation in Evolutionary Processes at Different Codon Positions. Molecular Biology and Evolution, 24(2):513-521.
- Hasegawa, M., Kishino, H., and Yano, T. (1985). Dating of the Human-Ape Splitting by a Molecular Clock of Mitochondrial DNA. Journal of Molecular Evolution, 22(2):160-174.
- Hasegawa, M., Yano, T., and Kishino, H. (1984). A New Molecular Clock of Mitochondrial DNA and the Evolution of Hominoids. Proceedings of the Japan Academy Series B-Physical and Biological Sciences, 60(4):95-98.
- Jukes, T. and Cantor, C. (1969). Evolution of protein molecules. Mammalian Protein Metabolism, pages 21-123.
- Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16(2):111-120.
- Promega (2015). The amino acids: https://www.promega.com/ /media/files/resources/technical references/amino acid abbreviations and molecular weights.pdf.
- Ross, S. M. (1996). Stochastic Processes. Second edition. Wiley.
- Sanger (2015). The codon sun: ftp://ftp.sanger.ac.uk/pub/yourgenome/downloads/activities/kras-cancer-mutation/krascodonwheel.pdf.
- Tamura, K. and Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3):512-526.
- Tavaré, S. (1986). Some probabilistic and statistical problems in the analysis of DNA sequences. In Some mathematical questions in biology: DNA sequence analysis (New York, 1984), pages 57-86. Amer. Math. Soc., Providence, RI.
- Yang, Z. (1994). Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution, 39(1):105-111.
- Yang, Z. (2014). Molecular Evolution: A Statistical Approach. Oxford University Press.
- Zharkikh, A. (1994). Estimation of evolutionary distances between nucleotide sequences. Journal of Molecular Evolution, 39(3):315-329.