Natural selection on the molecular level


Fundamentals of molecular evolution: how do DNA and protein sequences evolve?

Genetic variability in evolution
- Mutations: form novel alleles
- Inversions: change gene order; may impede recombination and fix a haplotype
- Duplications: may involve genes or gene fragments, or entire chromosomes and genomes; the main source of evolutionary innovations
- Horizontal gene transfer, including symbiotic events

Substitutions
- Why are transitions more frequent, even though there are more possible transversions?
- The observed ts/tv ratio ranges from ~2 (nuclear DNA) to ~15 (human mtDNA)
- Exception: plant mtDNA
- The ts/tv ratio varies significantly between lineages

Transitions and transversions
- Why are transitions more frequent?
- Selection hypothesis: transitions are more likely to be silent and neutral
- But:
  - transitions are also more frequent in rRNA genes, pseudogenes and noncoding regions
  - transitions are also more frequent at 4-fold degenerate positions (codons like CUN for Leu)

Transitions and transversions
- Why are transitions more frequent?
- Mechanistic explanation: mutagenesis and repair mechanisms
- Transitions result from:
  - tautomeric shifts of bases
  - deamination (e.g. oxidative)
- Transitions cause less distortion of the double helix, so they are less likely to be detected and repaired by the mismatch repair (MMR) system

Modeling DNA evolution
- The ancestral sequence is usually not available
- The number of mutations has to be inferred from differences between present-day sequences
- This requires correcting for multiple hits, particularly for more distant sequences

Modeling sequence evolution
- Markov models: the state in generation n+1 depends only on the state in generation n and on the rule set (the character substitution probability matrix)
- There are many models of varying complexity
- Parameters may include:
  - multiple hits (Poisson correction)
  - different substitution probabilities for different mutations
  - different substitution probabilities for different positions in the sequence
  - nucleotide frequencies
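The Markov property and the need for a multiple-hit correction can be sketched in a few lines. This is only an illustration: the equal-rate matrix, the per-generation probability `alpha = 0.001` and the function names are assumptions chosen for the example, not values from the lecture.

```python
def step_matrix(alpha):
    """One-generation substitution matrix with equal rates (order A, C, G, T)."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def evolve(alpha, generations):
    """Substitution probabilities after the given number of generations."""
    p = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    m = step_matrix(alpha)
    for _ in range(generations):
        # Markov property: the next state depends only on the current one.
        p = mat_mul(p, m)
    return p


def prob_different(alpha, generations):
    """Probability that a site differs from its ancestral state."""
    return 1 - evolve(alpha, generations)[0][0]


if __name__ == "__main__":
    alpha = 0.001  # assumed per-generation probability of each specific change
    for g in (10, 100, 1000, 10000):
        expected_changes = 3 * alpha * g
        print(g, round(expected_changes, 2), round(prob_different(alpha, g), 4))
```

The expected number of changes keeps growing with time, while the observed proportion of differing sites levels off near 0.75; this gap is what the correction for multiple hits accounts for.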

DNA evolution: the simplest model (Jukes-Cantor)

Substitution probability matrix:

        A       C       G       T
  A   1-3α      α       α       α
  C     α     1-3α      α       α
  G     α       α     1-3α      α
  T     α       α       α     1-3α

- Equal nucleotide frequencies (f = 0.25)
- Equal substitution probabilities
- Corrected distance: $D_{JC} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$, where D is the observed proportion of differing sites
- Example: D_OBS = 0.43 => D_JC = 0.64
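The Jukes-Cantor correction can be checked numerically against the example on the slide. A minimal sketch; the function name `jc_distance` is chosen here for illustration.

```python
import math


def jc_distance(d_obs):
    """Jukes-Cantor corrected distance from the observed proportion of differences."""
    if not 0 <= d_obs < 0.75:
        # The correction is undefined once the sequences look saturated.
        raise ValueError("observed distance must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * d_obs / 3)


print(round(jc_distance(0.43), 2))  # 0.64, as in the slide example
```

The corrected distance is always at least as large as the observed one, and the difference grows quickly as the observed distance approaches 0.75.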

Other models
- Kimura (K80, two-parameter): different probabilities for transitions and transversions
- Felsenstein (F81): different nucleotide frequencies
- Hasegawa-Kishino-Yano (HKY85): different nucleotide frequencies plus different probabilities for transitions and transversions
- GTR (General Time Reversible; Tavaré, 1986)

The GTR model
- A different probability for each substitution type, but symmetrical (e.g. A→T = T→A): 6 parameters
- Different nucleotide frequencies: 4 parameters

The gamma distribution
- In simple models every position in the sequence has the same substitution probability, which is unrealistic
- Different classes of substitution probabilities across sites are modeled using the gamma distribution

On the level of the genetic code
A mutation can:
- change the codon to another codon encoding the same amino acid (synonymous)
- change the codon to a codon encoding a different amino acid (nonsynonymous)
- change the codon to a STOP codon (nonsense)
- cause a frameshift
- change gene expression

Mutations and natural selection
- We do not observe mutations directly
  - we observe differences between populations (species)
  - or intra-population variation (polymorphism)
- Alleles formed by mutation are subject to selection
- Allele frequencies can be changed by genetic drift
- We observe mutations that have been fixed in the gene pool, either entirely or partially (polymorphisms)

The fundamental question of molecular evolution
- What is the role of genetic drift and natural selection in shaping sequence variation?
  - intrapopulation (polymorphisms)
  - interspecific
- The question concerns quantitative differences
- There is no doubt that evolutionary adaptations are a result of natural selection
- But how many of the differences we observe are adaptive?
- And which ones?

Selection or drift?
- Selectionism
  - the majority of fixed mutations were positively selected
  - most polymorphisms are maintained by selection (balancing selection, overdominance, frequency-dependent selection)
- Neutralism (Kimura, 1968)
  - the majority of fixed mutations were fixed by genetic drift (by chance)
  - the majority of polymorphisms are the result of drift
  - positively selected mutations are rare and contribute little to the quantitative analysis of molecular variation

Mutations and natural selection
- Deleterious: s < 0; eliminated by (purifying/negative) selection
- Neutral: s ≈ 0 (more precisely, |s| ≪ 1/(4N)); can be fixed or lost by genetic drift
- Beneficial: s > 0; fixed by positive selection (influenced by drift for small s)

Selectionism vs neutralism
- Selectionism
  - most mutations are deleterious
  - most fixed mutations are beneficial
  - neutral mutations are rare (not more frequent than beneficial ones)
- Neutralism
  - most mutations are deleterious or neutral
  - most fixed mutations are neutral
  - beneficial mutations are rare (much less frequent than neutral ones), but still important for adaptation

Selectionism vs neutralism
[Figure: comparison of selectionism, neutralism and pan-neutralism]
Neutralism is not pan-neutralism; nobody denies the selective importance of mutations.

Neutral theory: indications
- The rate of substitution and the degree of polymorphism are too high to be explained by selection alone
- Constant rate of molecular evolution (molecular clock)
- Less important sequences (pseudogenes, less important fragments of proteins) change faster than key functional sequences

The constant rate of molecular evolution
- Many sequences evolve at a constant rate
- The rate varies between different sequences, but for the same sequence it remains constant between lineages
- The so-called molecular clock
[Figure: amino acid differences for vertebrate globins]

Drift and the rate of evolution
- Genetic drift is a random process, but its rate is constant over long time scales
- It depends only on the mutation rate (one fixed change per 1/µ generations)
- Selection gives a constant rate only if the environment changes at a constant rate (unlikely)
- The rate of adaptive changes is not constant

Molecular clock
- A consequence of the neutral model
- Sequences diverge at a constant rate for a given gene
  - different rates for different genes
- Relative rate test: K_AC - K_BC = 0, where A and B are the lineages being compared and C is an outgroup
  - tests for a constant rate between lineages, but not over time

Molecular clock: the generation problem
- In the neutral model the fixation rate is $2N\mu \times \frac{1}{2N} = \mu$
- It should therefore be constant per generation
- Different organisms have different generation times
- So the rate of evolution should not be constant over time
- But that is what was observed


The generation problem
- Generation time varies among organisms, e.g. from ~0.03 generations/year to ~3 generations/year
- Why does the rate of evolution remain constant?

The near-neutral model
- The original neutral model concerns purely neutral changes (s = 0), which are rare
- Mutations behave like neutral ones if $|s| \ll \frac{1}{4N_e}$
- Mutations with a small selection coefficient s will behave like neutral mutations in small populations
  - in large populations they will be subject to selection
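The dependence on population size can be made concrete with a small check. This is a rough sketch: it uses a hard cutoff |s| < 1/(4N_e) in place of the "much smaller than" condition, and the selection coefficient and population sizes are invented for illustration.

```python
def behaves_neutrally(s, n_e):
    """Crude near-neutrality check: is |s| below the 1/(4*N_e) threshold?

    The hard cutoff is a simplification; in reality the transition between
    drift-dominated and selection-dominated behaviour is gradual.
    """
    return abs(s) < 1 / (4 * n_e)


s = -1e-5  # hypothetical, slightly deleterious mutation
for n_e in (1_000, 1_000_000):
    print(n_e, behaves_neutrally(s, n_e))
```

With these numbers the same mutation is effectively neutral in the small population (threshold 2.5e-4) and visible to selection in the large one (threshold 2.5e-7).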

Near-neutral model
- There is a negative correlation between generation time and population size

The near-neutral model

Condition for near-neutral behaviour: $|s| \ll \frac{1}{4N_e}$

  Long generation time (~0.03 generations/year)  |  Short generation time (~3 generations/year)
  Fewer mutations per year                       |  More mutations per year
  Small population (small N_e)                   |  Large population (large N_e)
  More mutations behave like neutral ones        |  More mutations are subject to selection
  and are fixed by drift                         |

Generation time and population size cancel each other out, resulting in a constant rate (Ohta & Kimura, 1971).

The molecular clock
- The rate is constant over time for proteins and nonsynonymous substitutions
- In DNA, the rate depends on the generation time for:
  - synonymous mutations
  - pseudogenes
  - some noncoding sequences

Evolutionary rate and function
Less important sequences (pseudogenes, less important fragments of proteins) change faster than key functional sequences.

Degeneracy (redundancy) in the code
[Figure: 4-fold degenerate and 2-fold degenerate codons]


Evolutionary rate unit: PAM per 10^8 years
Unit evolutionary time: how long (in millions of years, 10^6) it takes to fix 1 mutation per 100 amino acids (1 PAM)

Evolutionary rate
- Negative (purifying) selection is the main factor influencing the rate of evolution
  - in more important sequences a mutation is more likely to be deleterious (and eliminated by selection)
  - in less important sequences a mutation is more likely to be neutral (and may be fixed by drift)
- Mutations that do not influence function will be neutral
  - pseudogenes
  - noncoding regions?
  - synonymous substitutions?

Neutralism: the current status
- A foundation that explains many observations
  - high DNA and protein polymorphism
  - the molecular clock (with many deviations: there is no global clock, but local clocks can be found)
  - slower evolution of more important sequences
- It is the null hypothesis when testing for positive selection at the molecular level

Neutralism: the current status
- Verified by sequencing, currently on the genomic scale
- Kimura: 1968, before DNA sequencing was invented

Neutralism: the current status
- Smith & Eyre-Walker 2002: 45% of amino acid substitutions in Drosophila sp. were fixed by selection
- Andolfatto 2005: between D. melanogaster and D. simulans, positive selection is responsible for:
  - 20% of DNA substitutions in introns and intergenic sequences
  - 60% of DNA substitutions in UTR sequences

Neutralism: the current status
- The main achievement of the neutral theory is the development of a mathematical framework to study the effects of selection and drift
- It made it possible to develop methods of testing for positive selection (using the neutral model as a null hypothesis)
- A significant portion of the genome evolves according to the neutral model
  - a molecular clock can be found

Testing selection

Looking for traces of selection
- Most sequences evolve in a constant, clocklike fashion
- Deviation from the constant rate in a given lineage indicates lineage-specific selection
[Figure: lineages with a constant rate, accelerated changes and slowed changes]

Testing for selection
- Assumption: synonymous mutations are neutral
- For pairwise comparisons, calculate:
  - Ka (dN): the number of nonsynonymous changes per nonsynonymous site
  - Ks (dS): the number of synonymous changes per synonymous site
- The ratio Ka/Ks (ω) reflects selection:
  - ω = 1: purely neutral
  - ω < 1: negative (purifying) selection
  - ω > 1: positive selection
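The definitions of Ka, Ks and ω translate directly into code. A minimal sketch: the counts are invented, the function names are illustrative, and raw proportions are used without any multiple-hit correction, which a real analysis would apply.

```python
def omega(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Ka/Ks from raw counts (no multiple-hit correction, for illustration only)."""
    ka = nonsyn_changes / nonsyn_sites
    ks = syn_changes / syn_sites
    return ka / ks


def interpret(w):
    """Map an omega value onto the three cases listed on the slide."""
    if w < 1:
        return "negative (purifying) selection"
    if w > 1:
        return "positive selection"
    return "purely neutral"


# Hypothetical pairwise comparison: 10 changes at 700 nonsynonymous sites,
# 30 changes at 300 synonymous sites.
w = omega(10, 700, 30, 300)
print(round(w, 3), interpret(w))
```

In practice ω is estimated with uncertainty, so a value close to 1 should be tested statistically rather than read off directly.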

Testing for selection
- The ω ratio is rarely larger than 1 for an entire gene (exceptions include e.g. MHC genes)
- The average ω between primates and rodents is 0.2; between human and chimp it is 0.4
- Deviations from the average ω for a given gene in a given lineage indicate selection
- Within a gene there can be sites with different ω, indicating selection acting on particular regions of the sequence

Testing for selection II
- Comparing synonymous and nonsynonymous rates for intraspecific and interspecific comparisons

The McDonald-Kreitman test
- Compares synonymous and nonsynonymous rates for intraspecific and interspecific comparisons
- Under a neutral model the ratio of nonsynonymous to synonymous changes should be the same within and between species

Example: the ADH gene in 3 Drosophila species

                   synonymous   nonsynonymous   ratio (nonsyn./syn.)
  intraspecific        42             2               ~0.05
  interspecific        17             7               ~0.41

Conclusion: nonsynonymous changes were favoured during speciation, so the pattern is not neutral.
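The counts in the ADH table can be tested for independence. The sketch below uses a two-sided Fisher's exact test written with the standard library only; the choice of Fisher's exact test is an assumption made here for illustration and is not necessarily the statistic used in the lecture.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Rows: intraspecific, interspecific; columns: synonymous, nonsynonymous.
intra_syn, intra_non = 42, 2
inter_syn, inter_non = 17, 7

print(round(intra_non / intra_syn, 2), round(inter_non / inter_syn, 2))
print(fisher_exact_two_sided(intra_syn, intra_non, inter_syn, inter_non))
```

A small p-value means the nonsynonymous/synonymous ratio differs between polymorphism and divergence, which is the departure from neutrality that the test looks for.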

Comparing hypotheses
- The likelihood criterion
- PAML (Phylogenetic Analysis by Maximum Likelihood), author: Ziheng Yang
- http://abacus.gene.ucl.ac.uk/software/paml.html

Likelihood
- Likelihood = the probability of obtaining the observed data under a given model and its parameters: P(data | model)
- Probability relates to future events in a known model, e.g. for a symmetrical coin toss the probability of heads is 1/2 (of two heads 1/4, etc.)
- Likelihood explains (past) observations with an unknown model, e.g. observing 50% heads and 50% tails is most likely for a symmetrical coin

The likelihood ratio test
- LRT (Likelihood Ratio Test)
- Compares nested models with different parameter counts
- For each model calculate the likelihood (L1, L2), then LR = 2 [ln(L1) - ln(L2)]
- The significance of LR is determined by comparing it to the χ² distribution with df (degrees of freedom) equal to the difference in parameter count

Example #1
- Model 0: ω is the same for the whole tree; likelihood L0
- Model 1: ω for lineage #1 is different from the rest of the tree; likelihood L1
- Is the rate of evolution different for lineage #1?
- LR = 2 [ln(L1) - ln(L0)]
- Is LR larger than the χ² critical value for df = 1?
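The test in Example #1 can be carried out with a few lines of arithmetic. A minimal sketch for the df = 1 case only; the log-likelihood values are hypothetical and the function names are chosen for this example.

```python
import math


def likelihood_ratio(lnl_null, lnl_alt):
    """LR = 2 * (lnL_alt - lnL_null) for nested models."""
    return max(0.0, 2 * (lnl_alt - lnl_null))


def chi2_pvalue_df1(x):
    """Upper-tail probability of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2))


lnl0 = -1234.56  # hypothetical lnL under Model 0 (one omega for the whole tree)
lnl1 = -1230.12  # hypothetical lnL under Model 1 (separate omega for lineage #1)

lr = likelihood_ratio(lnl0, lnl1)
p = chi2_pvalue_df1(lr)
print(round(lr, 2), p, "reject Model 0" if p < 0.05 else "keep Model 0")
```

For df = 1 the 5% critical value is about 3.84, so any LR above that rejects the simpler model at that level. Other values of df need a general χ² routine, which is outside this sketch.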

Input files for PAML
- Sequences: PHYLIP format (e.g. from Clustal)
- Tree: Newick format, with optional branch labels
- Control file: codeml.ctl

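Parentheses (Newick) tree format

Unlabelled tree:
  (Macaca,Cerco,(Pongo,(Gorilla,(Homo,Chimp))));

One labelled branch (#1 on Homo):
  (Macaca,Cerco,(Pongo,(Gorilla,(Homo#1,Chimp))));

Labelled clade ($1):
  (Macaca,Cerco,(Pongo,(Gorilla,(Homo,Chimp))$1));

Labelled clade ($1) with a separately labelled branch (#2 on Homo):
  (Macaca,Cerco,(Pongo,(Gorilla,(Homo#2,Chimp))$1));

[Figure: tree drawings showing which branches carry labels #1 and #2]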
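When preparing several labelled versions of one tree it helps to see which labels a Newick string carries and to recover the unlabelled tree. A small sketch using regular expressions; it assumes simple alphanumeric taxon names with no branch lengths, as in the trees shown here, and the function names are illustrative.

```python
import re


def labelled_tips(newick):
    """Return (taxon, label) pairs for tips that carry a #n label."""
    return re.findall(r"(\w+)(#\d+)", newick)


def labelled_clades(newick):
    """Return labels (#n or $n) placed directly after a closing parenthesis."""
    return re.findall(r"\)([#$]\d+)", newick)


def strip_labels(newick):
    """Remove all #n and $n labels, giving the plain tree."""
    return re.sub(r"[#$]\d+", "", newick)


tree = "(Macaca,Cerco,(Pongo,(Gorilla,(Homo#2,Chimp))$1));"
print(labelled_tips(tree))    # [('Homo', '#2')]
print(labelled_clades(tree))  # ['$1']
print(strip_labels(tree))     # (Macaca,Cerco,(Pongo,(Gorilla,(Homo,Chimp))));
```

Stripping the labels is a quick way to confirm that the null and test analyses use the same topology.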

Control file (codeml.ctl):

      seqfile =
     treefile =
      outfile =

        noisy = 9   * 0,1,2,3,9: how much rubbish on the screen
      verbose = 2   * 1: detailed output, 0: concise output
      runmode = 0   * 0: user tree; 1: semi-automatic; 2: automatic
                    * 3: StepwiseAddition; (4,5):PerturbationNNI

      seqtype = 1   * 1:codons; 2:AAs; 3:codons-->AAs
    CodonFreq = 2   * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
        clock = 0   * 0: no clock, unrooted tree, 1: clock, rooted tree
        model = 0   * models for codons:
                    * 0:one, 1:b, 2:2 or more dn/ds ratios for branches

      NSsites = 0   * dn/ds among sites. 0:no variation, 1:neutral, 2:positive
        icode = 0   * 0:standard genetic code; 1:mammalian mt; 2-10:see below

    fix_kappa = 0   * 1: kappa fixed, 0: kappa to be estimated
        kappa = 2   * initial or fixed kappa
    fix_omega = 0   * 1: omega or omega_1 fixed, 0: estimate
        omega = 1   * initial or fixed omega, for codons or codon-transltd AAs

    fix_alpha = 1   * 0: estimate gamma shape parameter; 1: fix it at alpha
        alpha = .0  * initial or fixed alpha, 0:infinity (constant rate)
       Malpha = 0   * different alphas for genes
        ncatG = 4   * # of categories in the dG or AdG models of rates

        getSE = 0   * 0: don't want them, 1: want S.E.s of estimates
 RateAncestor = 1   * (1/0): rates (alpha>0) or ancestral states (alpha=0)
  fix_blength = 1   * 0: ignore, -1: random, 1: initial, 2: fixed
       method = 0   * 0: simultaneous; 1: one branch at a time

Models
- Branch models: is ω for a given branch (or a group of branches) different from the rest of the tree?
- Site models: how many codon classes with respect to ω are there (is the selection similar for all sites)?
- Branch and site models: are there differently selected codon classes in a specific branch?

Branch models
- Null hypothesis: ω is the same in all branches
- Test hypothesis: there are branches where ω is significantly different from the rest of the tree
- The free model: a specific ω value for each branch
- df = difference in parameter count (each specified branch adds 1)

Branch models: control file
The control file is the same codeml.ctl listing given with the PAML input files. As shown it has model = 0 and NSsites = 0, i.e. a single ω for the whole tree (the null hypothesis). The comment on `model` lists the alternatives: 1 for a separate ω per branch (the free model) and 2 for two or more ω ratios on labelled branches.

Site models
- Assume constant ω in all branches
- Null hypothesis (M1a): two classes of codons:
  - neutral (ω = 1)
  - under purifying selection (ω < 1)
- Test hypothesis (M2a): three classes of codons:
  - neutral (ω = 1)
  - under purifying selection (ω < 1)
  - under positive selection (ω > 1)
- Other, more complex models also exist (see the PAML documentation)

Site models: control file
The control file is again the same codeml.ctl listing given with the PAML input files, with model = 0 (one ω across branches). The site model is selected with `NSsites`, whose comment lists 0: no variation, 1: neutral, 2: positive.

Branch and site models
- In specified branches there is an additional class of codons with ω > 1
- Three classes of codons:
  - neutral (ω = 1)
  - under purifying selection (ω < 1)
  - under positive selection (ω > 1)
- Null hypothesis: in the third class ω = 1 (no positive selection)

Branch and site models: control file
The control file is the same codeml.ctl listing given with the PAML input files, except for two settings:

        model = 2   * 2 or more dn/ds ratios for branches
      NSsites = 2   * dn/ds among sites: 2:positive
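Because the analyses differ only in a handful of control-file settings, the variants can be generated from one template. The sketch below is an assumption-laden illustration: the file names are hypothetical, only a subset of the variables from the listing is written, and the model/NSsites/fix_omega combinations follow the comments in the control file and the hypotheses described on the slides, so they should be checked against the PAML documentation before use.

```python
BASE = {
    "noisy": 9, "verbose": 2, "runmode": 0,
    "seqtype": 1, "CodonFreq": 2, "clock": 0,
    "model": 0, "NSsites": 0, "icode": 0,
    "fix_kappa": 0, "kappa": 2,
    "fix_omega": 0, "omega": 1,
}

# Settings that change between analyses (assumed mapping, see lead-in).
ANALYSES = {
    "one_ratio": {"model": 0, "NSsites": 0},
    "branch": {"model": 2, "NSsites": 0},
    "site_M1a": {"model": 0, "NSsites": 1},
    "site_M2a": {"model": 0, "NSsites": 2},
    "branch_site_alt": {"model": 2, "NSsites": 2, "fix_omega": 0},
    # Null: omega of the extra class fixed at 1 (no positive selection).
    "branch_site_null": {"model": 2, "NSsites": 2, "fix_omega": 1, "omega": 1},
}


def render_ctl(seqfile, treefile, outfile, analysis):
    """Return control-file text for one of the analyses defined above."""
    settings = {"seqfile": seqfile, "treefile": treefile, "outfile": outfile}
    settings.update(BASE)
    settings.update(ANALYSES[analysis])
    return "\n".join(f"{key:>12} = {value}" for key, value in settings.items()) + "\n"


# Hypothetical file names, for illustration only.
print(render_ctl("lysozyme.phy", "lysozyme.tre", "branch_site_null.out",
                 "branch_site_null"))
```

Each pair of runs (null and test) then supplies the two log-likelihoods needed for the likelihood ratio test.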