Australian bird data set comparison between Arlequin and other programs


Peter Beerli, Kevin Rowe
March 7, 2006

1 Data set

We used a data set of Australian birds sampled from 5 populations. Kevin ran the program Arlequin over all populations and all 10 loci. The results are in the Appendix. Several interesting things popped up:

1.1 Polymorphic sites versus substitutions

Arlequin reports 57 segregating sites (variable sites), which translate into 53 polymorphic sites. Obviously only 4 mutations were singletons. This is somewhat striking given the data set. What happened to the missing sites?

1.2 Population size θ estimates

Arlequin reports several values for the population size, and it seems to use only the polymorphic sites. This is wrong: even for the Watterson estimator

$$\theta = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}} \qquad (1)$$

we should use all segregating sites. Using (1) with 57 segregating sites in 96 samples, we calculate a value of $\theta = 11.0974$, versus the reported value of 10.3186 based on 53 polymorphic sites. The results do not look very different, also when we calculate the population size per site instead of per locus:

$$\Theta_{(57,96)} = \theta_{(57,96)}/3766 = 0.00294673$$
$$\Theta_{(53,96)} = \theta_{(53,96)}/3766 = 0.00273994$$

Arlequin treated all loci as blocks in a single locus and so errs in the analysis [add here later with the per locus analysis]. Another problem seems to be the distinction between all sites (3766) and usable sites (867). I bet that several papers are out there that are unclear about how to calculate the population size per site.

Calculating population sizes under different population models is feasible if we know the variance of the number of offspring. We use the values above and the relationship that the population size in a Cannings model is $\theta = N_e \mu / \sigma^2$. If we use the Wright-Fisher population size $\theta_{WF}$ as a yardstick, we get the relationship

$$\theta_C = \frac{\theta_{WF}}{\sigma^2}$$

The Wright-Fisher model has an offspring-number variance of 1, so both sizes are the same. The population size $\theta_M$ of a Moran model is half the size of that of a Wright-Fisher model. $\theta_{WF}$ was derived under the assumption that all parents die every generation. We can express this survival into the next generation with a ratio $N_e k/(2N)$, with $k$ individuals dying every time unit. In a Moran model $k = 1$, so $N_e/(2N) = 1/2$, and we get

$$\theta_M = \frac{\theta_{WF}}{2} = \theta_{WF} \, 2/(2N)$$

An example of the influence of the model on the estimate: $\Theta_{WF} = 0.00274$; $\Theta_C = 0.274$ with $\sigma^2 = 0.01$, or $\Theta_C = 0.0000274$ with $\sigma^2 = 100$; and $\Theta_M = 0.00274/(2.919 \times 10^{-6}) = 1876.82$, with $\mu = 10^{-9}$ per site and generation. Translating all this into effective population sizes using $\mu$ results in $N_{WF} = 684,985$; $N_C = 6850$ and $N_C = 68,498,500$, respectively; and $N_M = 1,876,820,000,000$. Of course, we know that we are not waist-deep in these birds, so the Moran model is an unlikely candidate, but the Cannings model with low variance might describe the total population size of these birds better.

[I want to address differences due to estimation method and also look at the variance among loci; this will all come in a follow-up.]

2 Appendix

2.1 All populations together

RUN NUMBER 1 (07/03/06 at 15:41:17)
Project information:
NbSamples = 1
DataType = DNA
GenotypicData = 0
==============================
Settings used for Calculations

==============================
General settings:
-----------------
Deletion Weight = 1
Transition Weight Weight = 1
Tranversion Weight Weight = 1
Epsilon Value = 1e-06
Significant digits for output = 5
Use original haplotype definition
Alllowed level of missing data = 0.05
Active Tasks:
-------------
Molecular Diversity:
Molecular Distance :Pairwise difference
GammaA Value = 0
Theta estimators :
Theta(Hom)
Theta(S)
Theta(k)
Theta(Pi)
Tajima's selective neutrality test
---------------
Ewens-Watterson neutrality test
------------
No. of Simultated Samples = 10000
Fu's Neutraliy test:
No. of Simultated Samples = 10000
Warning: The locus separator has been removed
-------
==============================================================================
== ANALYSES AT THE INTRA-POPULATION LEVEL
==============================================================================
===============================================================================
== Sample : AllBirds
===============================================================================
================================
== Molecular diversity indices : (AllBirds)
================================
Reference: Tajima, F., 1983.
Tajima, F. 1993.
Nei, M., 1987.
Zouros, E., 1979.
Ewens, W.J. 1972.
Sample size : 96
No. of haplotypes : 96
Deletion weight : 1
Transition weight : 1
Transversion weight : 1
Allowed level of missing data : 5 %
Number of observed transitions : 36

Number of observed transversions : 21
Number of substitutions : 57
Number of observed indels : 0
Number of polymorphic sites : 53
Number of observed sites with transitions : 35
Number of observed sites with transversions : 21
Number of observed sites with substitutions : 53
Number of observed sites with indels : 0
Number of observed nucleotide sites : 3766
Number of usable nucleotide sites : 867
Nucleotide composition (Relative values)
C : 24.97%
T : 24.79%
A : 27.25%
G : 22.99%
Total :100.00%
Distance method : Pairwise difference (no Gamma correction)
Mean number of pairwise differences : 8.968640 +/- 4.167432
Nucleotide diversity : 0.010344 +/- 0.005324
(Standard deviations are for both the sampling and the stochastic processes)
Unable to compute Theta(Hom) when all gene copies are different
Unable to compute Theta(k) when all gene copies are different
Theta(S) : 10.318618
S.D. Theta(S) : 2.846638
Theta(Pi) : 8.968640
S.D. Theta(Pi) : 4.616216
==========================================
== Tajima's test of selective neutrality : (AllBirds)
==========================================
Reference: Tajima, F. 1989a.
Tajima, F., 1996.
Sample size : 96
No. of sites with substitutions (S) : 53
Mean No. of pairwise differences (Pi) : 8.96864
Distance method : Pairwise difference (no Gamma correction, indels not taken into account)
Tajima's D : -0.41913
P(D random < D obs) : 0.35961 (Beta distribution aproximation)
No. of simulations : 10000
Obs. Theta(S) : 10.31862
Mean Theta(S) : 10.30855
S.D. Theta(S) : 2.91849
Mean D : -0.08598
S.D. D : 0.90049
P(D simul < D obs) : 0.39370
==================================================
== Ewens-Watterson tests of selective neutrality : (AllBirds)
==================================================
Reference: Ewens, W.J. 1972.
Watterson, G., 1975.

Stewart, F. M. 1977.
Slatkin, M. 1994b.
Slatkin, M., 1996.
No. of genes in sample : 96.00000
No. of alleles in sample : 96
The test is impossible because all gene copies are different
===============================================
== Fu's Fs test of selective neutrality : (AllBirds)
===============================================
Reference: Fu, Y. X. (1996).
Original No. of alleles(k) : 96
Theta(Pi) : 8.96864
Exp. No. of alleles : 22.52870
Fs : -11.22769
No. of simulations :10000
Mean Theta(Pi) : 9.00305
S.D. Theta(Pi) : 4.64860
Mean k : 22.53300
S.D. k : 3.67288
Prob(sim_Fs <=obs_fs) : 0.00970
END OF RUN NUMBER 1 (07/03/06 at 15:45:25))
Total computing time for this run : 0h 0m 14s 941 ms
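
The Watterson estimates discussed in Section 1.2 and the Tajima's D value in the Arlequin output can be re-derived from the reported counts. The following is a minimal sketch in Python, not Arlequin's own code. The function names are hypothetical. The D statistic follows the standard formula of Tajima (1989). The conversion to an effective size assumes $\Theta = 4 N_e \mu$ per site, which is not stated in the text but appears to reproduce the quoted $N_{WF}$.

```python
import math


def harmonic(n_samples):
    """Sum of 1/i for i = 1 .. n-1, the denominator in equation (1)."""
    return sum(1.0 / i for i in range(1, n_samples))


def watterson_theta(seg_sites, n_samples):
    """Watterson estimator per locus, equation (1)."""
    return seg_sites / harmonic(n_samples)


def tajima_d(seg_sites, n_samples, pi):
    """Tajima's D from S, n and the mean number of pairwise differences."""
    n = n_samples
    a1 = harmonic(n)
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


def cannings_theta(theta_wf, offspring_variance):
    """Rescale a Wright-Fisher theta by the offspring-number variance."""
    return theta_wf / offspring_variance


def effective_size(theta_per_site, mu, scale=4.0):
    """ASSUMPTION: theta = scale * N_e * mu, with scale = 4 by default."""
    return theta_per_site / (scale * mu)


N_SAMPLES = 96   # sample size reported by Arlequin
N_SITES = 3766   # observed nucleotide sites reported by Arlequin

theta_57 = watterson_theta(57, N_SAMPLES)   # all substitutions
theta_53 = watterson_theta(53, N_SAMPLES)   # polymorphic sites only
d_obs = tajima_d(53, N_SAMPLES, 8.96864)
n_wf = effective_size(theta_53 / N_SITES, mu=1e-9)

print(f"theta(57) = {theta_57:.4f}, per site = {theta_57 / N_SITES:.8f}")
print(f"theta(53) = {theta_53:.4f}, per site = {theta_53 / N_SITES:.8f}")
print(f"Tajima's D (S = 53) = {d_obs:.5f}")
print(f"Theta_C (variance 0.01) = {cannings_theta(theta_53 / N_SITES, 0.01):.3f}")
print(f"N_WF under the 4*N*mu assumption = {n_wf:.0f}")
```

Passing 57 instead of 53 to `tajima_d` shows how the choice between substitutions and polymorphic sites shifts the test statistic as well as the θ estimate.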