MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

Similar documents
Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Dr. Amira A. AL-Hosary

BECAUSE microsatellite DNA loci or short tandem

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Effects of Gap Open and Gap Extension Penalties

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Variances of the Average Numbers of Nucleotide Substitutions Within and Between Populations

7. Tests for selection

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Phylogenetic inference

Phylogenetic analyses. Kirsi Kostamo

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

A (short) introduction to phylogenetics

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Letter to the Editor. Department of Biology, Arizona State University

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Theory of Evolution Charles Darwin

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Variance and Covariances of the Numbers of Synonymous and Nonsynonymous Substitutions per Site

SCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION. Using Anatomy, Embryology, Biochemistry, and Paleontology

Processes of Evolution

Estimating Evolutionary Trees. Phylogenetic Methods

Concepts and Methods in Molecular Divergence Time Estimation

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Phylogenetic Tree Reconstruction

Anatomy of a species tree

Constructing Evolutionary/Phylogenetic Trees

I of a gene sampled from a randomly mating popdation,

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

A Statistical Test of Phylogenies Estimated from Sequence Data

Estimating Divergence Dates from Molecular Sequences

Sequence evolution within populations under multiple types of mutation

Phylogenetics: Building Phylogenetic Trees

FIG S1: Plot of rarefaction curves for Borrelia garinii Clp A allelic richness for questing Ixodes

reciprocal altruism by kin or group selection can be analyzed by using the same approach (6).

Compartmentalization detection

EVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS

Lecture 13: Variation Among Populations and Gene Flow. Oct 2, 2006

Incomplete Lineage Sorting: Consistent Phylogeny Estimation From Multiple Loci

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

C3020 Molecular Evolution. Exercises #3: Phylogenetics

LINKAGE DISEQUILIBRIUM, SELECTION AND RECOMBINATION AT THREE LOCI

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Conservation Genetics. Outline

Intraspecific gene genealogies: trees grafting into networks

BINF6201/8201. Molecular phylogenetic methods

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Taming the Beast Workshop

Cladistics and Bioinformatics Questions 2013

EVOLUTIONARY DISTANCES

Classification and Phylogeny

Warm-Up- Review Natural Selection and Reproduction for quiz today!!!! Notes on Evidence of Evolution Work on Vocabulary and Lab

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Expected coalescence times and segregating sites in a model of glacial cycles

POPULATION GENETICS Biology 107/207L Winter 2005 Lab 5. Testing for positive Darwinian selection

Kei Takahashi and Masatoshi Nei

Classification and Phylogeny

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

How robust are the predictions of the W-F Model?

8/23/2014. Phylogeny and the Tree of Life

C.DARWIN ( )

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Chapter 26 Phylogeny and the Tree of Life

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogenetics in the Age of Genomics: Prospects and Challenges

Population Structure

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

Algorithms in Bioinformatics

Molecular Markers, Natural History, and Evolution

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Unit 7: Evolution Guided Reading Questions (80 pts total)

Temporal Trails of Natural Selection in Human Mitogenomes. Author. Published. Journal Title DOI. Copyright Statement.

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A

Molecular Evolution, course # Final Exam, May 3, 2006

Neutral behavior of shared polymorphism

Phylogenetic methods in molecular systematics

Phylogene)cs. IMBB 2016 BecA- ILRI Hub, Nairobi May 9 20, Joyce Nzioki

The Phylogenetic Handbook

Outline. Classification of Living Things

Introduction to Bioinformatics Introduction to Bioinformatics

Self-incompatibility alleles from Physalis: Implications for historical inference from balanced genetic polymorphisms

Figure A1. Phylogenetic trees based on concatenated sequences of eight MLST loci. Phylogenetic trees were constructed based on concatenated sequences

Multiple Sequence Alignment. Sequences

Theory of Evolution. Charles Darwin

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Transcription:

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were discussed. At the present time, three different methods, i.e., neighbor-joining, parsimony, and likelihood methods, are commonly used for constructing phylogenetic trees from DNA or protein sequences. Neighbor-joining is simplest and produces reasonable trees in most cases as long as sufficient amounts of data are used. Likelihood is the most sophisticated method and requires rather stringent conditions. In practice, however, all three methods generally give similar trees particularly when bootstrap tests are used for examining the reliability of the trees obtained (Nei 1996). That is, phylogenetic clusters that are supported by bootstrap tests are almost always the same for the trees constructed by the three methods. The primary factor for determining the reliability of phylogenetic trees is not the statistical method used but the quality and amount of data used (Russo et al. 1996). It is recommended that a phylogenetic tree always be presented with proper branch lengths and bootstrap values. Linearized trees: In recent years it has become popular to estimate the time of divergence between two different groups of organisms (e.g., monocotyledons and dicotyledons). In this case it is important first to test the hypotheses of molecular clock and then eliminate the evolutionary lineages (or genes) that do not obey the molecular clock. Once these lineages (or genes) are eliminated, it is possible to construct a linearized tree under the assumption of a constant rate of evolution (Takezaki et al. 1995). Gene diversity analysis: The extent of genetic diversity within populations is usually measured by gene diversity or heterozygosity (H) for allele frequency data and by nucleotide diversity (7t) for nucleotide sequence data (Nei 1987). The statistical methods for measuring these quantities are well established. By contrast, there is a controversy about the measure of genetic diversity among subpopulations. The extent of this diversity for allele frequency data is measured by Wright's (1951) Fs-r, Cockerham's (1969) 9 ornei's (1973, 1977) G^ The first two measures depend on the assumption that all populations diverged at the same time and the effective population size is the same for all of them (Figure 1A). Pons and Petit (1995) extended Nei's mathematical formula, but conceptually their formulation depends on the same assumption as Wright's. However, the assumption used by Wright is never satisfied in nature; all natural populations have a phylogenetic structure and the effective population size varies from population to population (Figure 1B). The phylogenetic relationship is unique for each set of subpopulations. For this reason, Nei proposed a coefficient of genetic differentiation (Ggy), which is independent of the population history and measures the extent of genetic variation for a given set of existing populations. Therefore, this is applicable to any group of populations. In this approach, Gyr is defined by GsT=(H,-Hs)/H^, where H; and H-r are the average gene diversity within subpopulations and the gene diversity for the entire population, respectively. G^ is estimated by the method given by Nei (1987). When Gs-r is estimated by using data from many loci, the variance of the estimate can be obtained by the bootstrap or thejackknife method. Nucleotide diversity analysis: The extent of interpopulational nucleotide variation may be measured by Rgr (=ys-r) proposed by Nei (1982). It is defined as RST =^ ""^sv^t, ' Evan Pugh Professor of Biology, Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, PA 16802, USA 78

where 713 and n-r are the average nucleotide diversity within subpopulations and the nucleotide diversity for the entire population. Rsj can be estimated by using estimates ofiis an^ "T defined by Nei (1982, 1987). The variance of the estimate ofrsj can be obtained by the bootstrap method. Pons and Petit (1996) defined n-, somewhat differently under the assumption ofwright's model of population differentiation. However, when the number of subpopulations is equal to or greater than 5, the difference between the two formulations is negligibly small. Computer programs: Various computer programs for phylogenetic and genetic diversity analyses have been developed in our laboratory. To obtain these programs, see the website, hppt://www.bio.psu.edu/imeg/neilab.htm. (A) Wright's model of population differentiation(b) Real population differentiation Populations 1 2 Figure 1. Wright's mathematical model of population differentiation and real differentiation. In diagram (A) all populations are assumed to be equally related with one another. Diagram (B) shows one schematic example of real population differentiation, in which different populations have different genetic relationships. Cockerham, C. C. 1969. Variance of gene frequencies. Evolution 23: 72-84. LITERATURE CITED Nei, M. 1973. Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70: 3321-3323. Nei M. 1977. F-statisticsandanalysisofgenediversityinsubdividedpopulations.Ann.Hum.Genet.41: 225-233. Nei M. 1982. Evolution of human races at the gene level. In: B. Bonne-Tamir (Editor), Human Genetics, Part A: The Unfolding Genome. Alan R. Liss, Inc., New York, pp. 167-181. Nei,M. 1987. Molecular Evolutionary Genetics. Columbia University Press, New York. Nei, M. 1996. Phylogenetic analysis in molecular evolutionary genetics. Annu.Rev.Genet.30: 371-403. Pons, 0. and R. J. Petit. 1995. Estimation, variance and optimal sampling of genetic diversity. I. Haploid locus. Theo'r. Appl. Genet. 90: 462-470. Pons 0. and R. J. Petit. 1996. Measuring and testing genetic differentiation with ordered versus unordered alleles. Genetics 144: 1237-1245. Russo C. A. M., N. Takezaki, and M. Nei. 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phytogeny. Mol.Biol.Evol. 13: 525-536. 79

where 713 and n-r are the average nucleotide diversity within subpopulations and the nucleotide diversity for the entire population. Rsj can be estimated by using estimates ofiis an^ "T defined by Nei (1982, 1987). The variance of the estimate ofrsj can be obtained by the bootstrap method. Pons and Petit (1996) defined n-, somewhat differently under the assumption ofwright's model of population differentiation. However, when the number of subpopulations is equal to or greater than 5, the difference between the two formulations is negligibly small. Computer programs: Various computer programs for phylogenetic and genetic diversity analyses have been developed in our laboratory. To obtain these programs, see the website, hppt://www.bio.psu.edu/imeg/neilab.htm. (A) Wright's model of population differentiation(b) Real population differentiation Populations 1 2 Figure 1. Wright's mathematical model of population differentiation and real differentiation. In diagram (A) all populations are assumed to be equally related with one another. Diagram (B) shows one schematic example of real population differentiation, in which different populations have different genetic relationships. Cockerham, C. C. 1969. Variance of gene frequencies. Evolution 23: 72-84. LITERATURE CITED Nei, M. 1973. Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70: 3321-3323. Nei M. 1977. F-statisticsandanalysisofgenediversityinsubdividedpopulations.Ann.Hum.Genet.41: 225-233. Nei M. 1982. Evolution of human races at the gene level. In: B. Bonne-Tamir (Editor), Human Genetics, Part A: The Unfolding Genome. Alan R. Liss, Inc., New York, pp. 167-181. Nei,M. 1987. Molecular Evolutionary Genetics. Columbia University Press, New York. Nei, M. 1996. Phylogenetic analysis in molecular evolutionary genetics. Annu.Rev.Genet.30: 371-403. Pons, 0. and R. J. Petit. 1995. Estimation, variance and optimal sampling of genetic diversity. I. Haploid locus. Theo'r. Appl. Genet. 90: 462-470. Pons 0. and R. J. Petit. 1996. Measuring and testing genetic differentiation with ordered versus unordered alleles. Genetics 144: 1237-1245. Russo C. A. M., N. Takezaki, and M. Nei. 1996. Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phytogeny. Mol.Biol.Evol. 13: 525-536. 79

Takezaki, N., A. Rzhetsky and M. Nei. 1995. Phylogenetic test of the molecular clock and linearized tree. Mol. f Biol. Evol. 12: 823-833. Wright, S. 1951. The genetical structure of populations. Ann. Eugen. 15: 323-354. 80