Bustamante et al., Supplementary Information, Nature Manuscript #


Details of PRF Methodology

In the Poisson Random Field (PRF) model, it is assumed that non-synonymous mutations at a given gene are either very deleterious or have selective effect $s$, such that the fitness of a heterozygote is $1 + s$ relative to the wildtype and a homozygous individual has fitness $1 + 2s$ (ref. 1). As in most population genetic models, we can only estimate the product of the selection coefficient and the population size ($\gamma = 2N_e s$). The strength of purifying selection against very deleterious mutations is captured in the non-lethal non-synonymous mutation rate parameter $\theta_r = 2N_e \mu f_0$, where $N_e$ is the effective population size, $\mu$ is the mutation rate, and $f_0$ is the fraction of mutations that are non-lethal. Synonymous sites are assumed to evolve neutrally with mutation rate $\theta_s = 2N_e \mu$. We assume the mutation rate parameters among genes are independent of one another, to capture the fact that some genes are subject to very strong purifying selection while others may experience relatively weak purifying selection. Likewise, we make no assumption about how the neutral mutation rate varies from gene to gene.

Using the results of Sawyer and Hartl (1992) (ref. 1), we model the cell entries in the McDonald-Kreitman tables (ref. 2) for a given gene as Poisson random variables with the following expected values:

$$
\begin{array}{lcc}
 & \text{Fixed} & \text{Segregating} \\
\text{Silent} & \theta_s\left(\tau + \dfrac{1}{m} + \dfrac{1}{n}\right) & \theta_s \displaystyle\sum_{i=1}^{n-1} \dfrac{1}{i} \\
\text{Replacement} & \theta_r\left(\dfrac{2\gamma}{1-e^{-2\gamma}}\,\tau + G(\gamma,m) + G(\gamma,n)\right) & \theta_r\,F(\gamma,n)
\end{array}
\tag{1.1}
$$

where $F(\gamma,n)$ and $G(\gamma,n)$ are integrals over the distribution of mutation frequencies that depend on the selection coefficient for the gene and the number of sequences sampled from species 1 ($n$) and species 2 ($m$) (see Sawyer and Hartl, 1992):

$$
F(\gamma,n) = \int_0^1 \frac{1 - x^n - (1-x)^n}{x(1-x)}\,\frac{1-e^{-2\gamma(1-x)}}{1-e^{-2\gamma}}\,dx,
\qquad
G(\gamma,n) = \int_0^1 x^{n-1}(1-x)^{-1}\,\frac{1-e^{-2\gamma(1-x)}}{1-e^{-2\gamma}}\,dx
\tag{1.2}
$$

Equation (1.1) assumes that polymorphism data are modelled for only one species, as is the case for our study. The parameter $\tau$ is the number of generations since species divergence divided by twice the effective population size. We treat this quantity as constant among genes and estimate it using all of the data. Since the above equations are strictly true only under the assumption of independence among sites and a population of constant size, we use simulations to gauge the accuracy and robustness of our results to model misspecification.

We write down the joint probability of the MK data, given the selection coefficient and mutation rate parameters for each of the $G$ genes in our sample and the time since species divergence, as the product of the individual entries in the MK tables for all genes:

$$
\Pr\{P, D \mid \gamma, \theta, \tau\} = \prod_{i=1}^{G} \prod_{c \in \{N,S\}} \Pr\{D_{c,i} \mid \gamma_{c,i}, \tau, \theta_{c,i}\}\,\Pr\{P_{c,i} \mid \gamma_{c,i}, \theta_{c,i}\}
\tag{1.3}
$$

where $P_{c,i}$ is the number of SNPs and $D_{c,i}$ is the number of fixed differences of type $c$ (either synonymous or non-synonymous) in gene $i$. We treat all synonymous sites as neutral ($\gamma_{S,i} = 0$ for all $i$) and obtain the probabilities on the right-hand side of (1.3) directly from the Poisson distribution with mean given by the corresponding entry in (1.1). To obtain the posterior distribution on $\gamma_i$ conditional on the species divergence time, we use a Normal prior with mean 0 and standard deviation of $\sigma = 8$ such that

$$
\Pr\{\gamma_i \mid P_{N,i}, D_{N,i}, \tau\} \propto
\left(\frac{F(\gamma_i,n)}{S_i}\right)^{P_{N,i}}
\left(\frac{\dfrac{2\gamma_i}{1-e^{-2\gamma_i}}\,\tau + G(\gamma_i,m) + G(\gamma_i,n)}{S_i}\right)^{D_{N,i}}
\left(\frac{1}{S_i}\right)^{\alpha_i}
e^{-\gamma_i^2/(2\sigma^2)},
\qquad
S_i = \beta_i + F(\gamma_i,n) + \frac{2\gamma_i}{1-e^{-2\gamma_i}}\,\tau + G(\gamma_i,m) + G(\gamma_i,n)
\tag{1.4}
$$

where $\alpha_i$ and $\beta_i$ are parameters of a Gamma prior distribution on the mutation rate for the locus (in practice we set these to 0.01 for all genes, which makes them uninformative), and $S_i$ is shorthand for the denominator shared by the first three terms. The first two terms on the right-hand side of expression (1.4) represent the conditional probability, given $\gamma_i$, of observing $P_{N,i}$ non-synonymous polymorphisms and $D_{N,i}$ non-synonymous fixed differences, respectively; the third term comes from the prior distribution on the mutation rate, a parameter which has been integrated out of the posterior distribution; and the fourth term is the prior distribution of $\gamma_i$.

In order to classify individual loci as positively or negatively selected, we focus on the posterior probability, for a given gene, that its selection coefficient is greater (or less) than 0 given the observed data for the gene ($P_i = \Pr\{\gamma_i > 0 \mid P_{N,i}, D_{N,i}\}$). If $P_i$ is greater than 97.5%, this is mathematically equivalent to saying that the 95% highest posterior density credibility interval (Bayesian confidence interval) for the selection coefficient is above 0, and we classify such genes as positively selected. Likewise, if the complementary probability $\Pr\{\gamma_i < 0 \mid P_{N,i}, D_{N,i}\}$ is greater than 97.5% for a given locus, this is equivalent to the corresponding 95% CI being completely below 0, and we classify these as negatively selected. We estimate this quantity using the usual Monte Carlo estimator:

$$
P_i = \Pr\{\gamma_i > 0 \mid P_{N,i}, D_{N,i}\} \approx \frac{1}{M} \sum_{m=1}^{M} I\left(\gamma_i^{(m)} > 0\right)
\tag{1.5}
$$

where $I(\cdot)$ is the indicator function, which takes the value 1 if its argument is true and 0 otherwise, and $\gamma_i^{(m)}$ is the value of $\gamma_i$ at step $m$ of a Markov chain Monte Carlo (MCMC) algorithm. All posterior probabilities reported here are from 50,000 retained draws from 10 chains, each of length 50,000 steps, sampled using the MCMC algorithm and convergence criteria previously described (refs 3-5), with the modification that the genomic distribution of selective effects is not updated. This simplification is made so that the marginal posterior distributions of the selection coefficient are conditionally independent of one another and can be pooled for further analysis in terms of molecular function and biological process.

Simulations

A potential concern is the robustness of our analysis to deviations from the assumptions of the Sawyer and Hartl Poisson Random Field model used to analyze the data. That is, could nonstandard demography produce genomic patterns of variation that we may misinterpret as signatures of selection? To address this issue, we simulated data using standard coalescent algorithms, as implemented in the computer program ms (ref. 6), under complete linkage within genes and three neutral demographic scenarios. For all simulations, we used 10,000 replicates with 79 chromosomes. This mimics the sampling structure of the Celera data: 38 African-American and 40 European-American chromosomes, plus 1 chromosome representing the chimpanzee sequence (chimpanzee SNPs were excluded from the analysis). We assumed a mutation rate of $\theta = 2$, with half of the neutral mutations synonymous and half non-synonymous. This parameter was chosen because close to half of the SNPs in our data are non-synonymous and half are synonymous (see Figure 1A).
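The Monte Carlo estimator in equation (1.5) and the 97.5% classification rule described earlier can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the `cutoff` parameter are hypothetical, and `draws` stands for the retained MCMC values of the selection coefficient for a single gene, pooled across chains.

```python
from typing import Sequence


def posterior_prob_positive(draws: Sequence[float]) -> float:
    """Fraction of retained draws with gamma > 0 (the indicator average in eq. 1.5)."""
    return sum(1 for g in draws if g > 0) / len(draws)


def classify_gene(draws: Sequence[float], cutoff: float = 0.975) -> str:
    """Label a gene from its pooled posterior draws.

    'positive' if Pr(gamma > 0 | data) exceeds the cutoff,
    'negative' if Pr(gamma < 0 | data) exceeds the cutoff,
    'unclassified' otherwise. The labels are illustrative names only.
    """
    p_pos = posterior_prob_positive(draws)
    p_neg = sum(1 for g in draws if g < 0) / len(draws)
    if p_pos > cutoff:
        return "positive"
    if p_neg > cutoff:
        return "negative"
    return "unclassified"
```

Because the estimate is a simple proportion of retained draws, its resolution is limited by the number of draws; with 50,000 retained draws the smallest non-zero increment is 1/50,000.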
Our choice of mutation rate is twice the average estimate of the mutation rate and makes our results conservative, since the smaller the mutation rate, the better the Poisson approximation to the cell entries. For all models, we used a human-chimpanzee divergence of $\tau = 10$. The demographic models considered are:

a) Panmixia among humans (all 78 chromosomes from a randomly mating population) with constant size (i.e., the standard neutral model).
b) Population structure model A: 40 European-American chromosomes drawn from one population and 38 African-American chromosomes drawn from another, with a migration rate of $M = 4N_e m = 1$ per generation. The European-American population undergoes a 90% population contraction (going backwards in time) at time 0.1 in units of $2N_e$ generations (~… thousand years ago), while the African-American population undergoes a 50% reduction. The two human populations are then joined at time 0.25 in units of $2N_e$ generations (~… K years ago). The human and chimpanzee populations are joined at time 10 (~5 million years ago).
c) Population structure model B: same as above except with twice the migration rate.

In Figure 1B we report the distribution of posterior probabilities that the selection coefficient for a gene is above 0 for each of the three models considered here, as well as for the Celera data. It is important to keep in mind that posterior probabilities are not the same as P-values, so there is no theoretical reason for them to follow a uniform distribution (as would be the case for P-values if the null hypothesis is true). The Celera data have a clear excess of genes with high and low posterior probabilities (i.e., too many in the <1%, 1-5%, and >99% categories), regardless of which demographic model is used as the null. The signature is particularly strong for negative selection (this may be partly due to power). Note that in this figure we have conditioned, as in the data, on using only loci with at least 4 variable amino acids in the alignment.

Model Diagnostics

In Figure 1S, we summarize the posterior mean of the selection parameter, $\gamma = E(2Ns \mid \text{Data})$, for genes with at least two variable amino acid sites in the human-chimp alignments as a function of six aspects of the data (all correlations are based on the square-root

transformation of the raw data). We see that $d_S$, the per-synonymous-site species substitution rate, is slightly correlated with the posterior mean of the selection coefficient ($r$ = … ± 0.035; $P < 10^{-3}$). This may be explained by the fact that $\gamma$ is strongly positively correlated with $d_N$, the non-synonymous species substitution rate ($r$ = … ± 0.021; $P < 10^{-16}$), and that these quantities are themselves correlated ($r$ = … ± 0.034; $P < 10^{-7}$). The former correlation, between $\gamma$ and $d_N$, is expected, since the rate of amino acid substitution should increase with the strength of selection $\gamma$. The latter correlation, between $d_N$ and $d_S$, has been previously documented for samples of size $n = 1$ from each species across a variety of methods. We also observe a significant moderate negative correlation between $\gamma$ and $p_S$ ($r = -0.139 \pm 0.034$; $P < 10^{-14}$) and a strong negative correlation between $\gamma$ and $p_N$ ($r = -0.665 \pm 0.021$; $P < 10^{-16}$). The latter correlation is not expected, but can be explained by a consideration of power. That is, mkprf relies on the ratio of replacement divergence to replacement polymorphism to estimate $\gamma$: if a gene has low levels of amino acid polymorphism and high levels of amino acid divergence, then this is consistent with strong positive selection and a low mutation rate. This signal will be amplified if a gene has experienced very recent positive selection, since genetic hitchhiking will reduce amino acid polymorphism. Likewise, we observe a positive correlation between the $d_N / d_S$ ratio and the posterior mean of the selection coefficient ($r$ = … ± 0.028; $P < 10^{-16}$) and a negative correlation between $p_N / p_S$ and $\gamma$ ($r = -0.282 \pm 0.032$; $P < 10^{-16}$). This illustrates one important aspect of our analysis that differs from previous work (ref. 7), namely that we can detect evidence for positive selection in the presence of selective constraint.
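To make the dependence of the divergence-to-polymorphism ratio on the selection coefficient concrete, the expected replacement entries of equation (1.1) can be evaluated numerically. The sketch below is illustrative only and rests on stated assumptions: it takes F and G in the form of equation (1.2), with the factor 1/(1 − e^{−2γ}) inside the integrals, so that only the divergence time τ is multiplied by 2γ/(1 − e^{−2γ}); it uses a plain midpoint rule rather than whatever quadrature the mkprf program uses; and all function names are hypothetical.

```python
import math


def sel_ratio(gamma: float, x: float) -> float:
    """(1 - exp(-2*gamma*(1 - x))) / (1 - exp(-2*gamma)); tends to 1 - x as gamma -> 0."""
    if abs(gamma) < 1e-8:
        return 1.0 - x
    return math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)


def fixation_factor(gamma: float) -> float:
    """2*gamma / (1 - exp(-2*gamma)); tends to 1 as gamma -> 0."""
    if abs(gamma) < 1e-8:
        return 1.0
    return -2.0 * gamma / math.expm1(-2.0 * gamma)


def midpoint(f, steps: int = 20000) -> float:
    """Midpoint rule on (0, 1); the endpoints x = 0 and x = 1 are never evaluated."""
    h = 1.0 / steps
    return h * sum(f((k + 0.5) * h) for k in range(steps))


def F(gamma: float, n: int) -> float:
    """Segregating-site integral of eq. 1.2 (assumed form, see lead-in)."""
    return midpoint(
        lambda x: (1.0 - x**n - (1.0 - x) ** n) / (x * (1.0 - x)) * sel_ratio(gamma, x)
    )


def G(gamma: float, n: int) -> float:
    """Sample-fixation integral of eq. 1.2 (assumed form, see lead-in)."""
    return midpoint(lambda x: x ** (n - 1) / (1.0 - x) * sel_ratio(gamma, x))


def expected_replacement(theta_r: float, gamma: float, tau: float, n: int, m: int):
    """Expected (fixed, segregating) replacement counts under eq. 1.1."""
    fixed = theta_r * (fixation_factor(gamma) * tau + G(gamma, m) + G(gamma, n))
    segregating = theta_r * F(gamma, n)
    return fixed, segregating
```

At γ = 0 these expressions reduce to the silent-site entries of equation (1.1), which gives a simple consistency check. Evaluating `expected_replacement` over a range of γ values with θ_r held fixed shows the expected fixed-to-segregating ratio rising with γ, which is the signal the method uses.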
Our power to detect selection depends on the observed cell entries in the McDonald-Kreitman table. Since longer genes will generally have more mutations and, thus, more variation per gene, we were concerned that the effects we observe could be due to spurious correlation. This might occur, for example, if longer genes have more amino acid polymorphism

and are, thus, overrepresented in the set of negatively selected genes. To assess this issue, we plotted the distribution of the log-odds posterior of negative selection, $\log\left[\Pr(\gamma < 0 \mid \text{Data}) / \Pr(\gamma > 0 \mid \text{Data})\right]$, as a function of the length of the aligned human-chimpanzee coding regions (see Figure 2S). There appears to be little or no correlation; therefore, differences in length among genes of different molecular functions and biological processes contribute little, if anything, to the discrepancy in the proportion of genes we classify as positively or negatively selected.

Posterior Distribution of Human-Chimpanzee Species Divergence Time

As part of our analysis, we also obtain a very precise estimate of the scaled human-chimpanzee species divergence, $\tau$. Based on 50,000 retained draws of our MCMC algorithm, we obtain a posterior mean of 9.57 (in units of $2N_e$ generations, assuming the human, chimpanzee, and ancestral populations are of roughly equal size) with a 95% credibility interval of (9.37, 9.77). Using a human/chimp generation time of 25 years and a long-term effective population size of 10,000, this corresponds to 4.78 million years ago. Holding generation time and $2N_e$ fixed, our 95% interval is a narrow range of … mya to … mya. Given our uncertainty in the long-term effective population size of humans and chimpanzees, as well as variation in generation time, we have surely overestimated our confidence in the credibility interval of the divergence time. However, the likelihood function in our model depends only on the scaled time, which we have estimated with high precision.

Counting synonymous and non-synonymous sites

The numbers of synonymous and non-synonymous sites per gene have been counted using the underlying nucleotide context-dependent mutation rates found by Hwang and Green (ref. 8), which assume that the mutation rate from one nucleotide to another is dependent on the site's two

flanking nucleotides. This method is able to account for many mutation biases, such as the hypermutability of CpG dinucleotides and transition/transversion biases, as well as many other subtle effects. The total number of non-synonymous (or synonymous) sites in a gene is then calculated by summing, over sites, the mutation rates that would (or would not) result in an amino acid change, relative to the overall mutability of the site. Missing or ambiguous data in the human-chimp alignment, as well as changes to and from stop codons, were excluded.

References

1. Sawyer, S. A. & Hartl, D. L. Population genetics of polymorphism and divergence. Genetics 132.
2. McDonald, J. H. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351.
3. Barrier, M., Bustamante, C. D., Yu, J. & Purugganan, M. D. Selection on rapidly evolving proteins in the Arabidopsis genome. Genetics 163.
4. Gilad, Y., Bustamante, C. D., Lancet, D. & Paabo, S. Natural selection on the olfactory receptor gene family in humans and chimpanzees. Am J Hum Genet 73.
5. Bustamante, C. D. et al. The cost of inbreeding in Arabidopsis. Nature 416.
6. Hudson, R. R. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18.
7. Clark, A. G. et al. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302.
8. Hwang, D. G. & Green, P. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci U S A 101.


More information

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES: .5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the

More information

Inferring Speciation Times under an Episodic Molecular Clock

Inferring Speciation Times under an Episodic Molecular Clock Syst. Biol. 56(3):453 466, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701420643 Inferring Speciation Times under an Episodic Molecular

More information

Genetic Variation in Finite Populations

Genetic Variation in Finite Populations Genetic Variation in Finite Populations The amount of genetic variation found in a population is influenced by two opposing forces: mutation and genetic drift. 1 Mutation tends to increase variation. 2

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

Natural selection on the molecular level

Natural selection on the molecular level Natural selection on the molecular level Fundamentals of molecular evolution How DNA and protein sequences evolve? Genetic variability in evolution } Mutations } forming novel alleles } Inversions } change

More information

Robust demographic inference from genomic and SNP data

Robust demographic inference from genomic and SNP data Robust demographic inference from genomic and SNP data Laurent Excoffier Isabelle Duperret, Emilia Huerta-Sanchez, Matthieu Foll, Vitor Sousa, Isabel Alves Computational and Molecular Population Genetics

More information

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Methods Identification of orthologues, alignment and evolutionary distances A preliminary set of orthologues was

More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law

Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law Ze Zhang,* Z. W. Luo,* Hirohisa Kishino,à and Mike J. Kearsey *School of Biosciences, University of Birmingham,

More information

Introduction to population genetics & evolution

Introduction to population genetics & evolution Introduction to population genetics & evolution Course Organization Exam dates: Feb 19 March 1st Has everybody registered? Did you get the email with the exam schedule Summer seminar: Hot topics in Bioinformatics

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Introduction to Advanced Population Genetics

Introduction to Advanced Population Genetics Introduction to Advanced Population Genetics Learning Objectives Describe the basic model of human evolutionary history Describe the key evolutionary forces How demography can influence the site frequency

More information

The neutral theory of molecular evolution

The neutral theory of molecular evolution The neutral theory of molecular evolution Introduction I didn t make a big deal of it in what we just went over, but in deriving the Jukes-Cantor equation I used the phrase substitution rate instead of

More information

Hidden Markov models in population genetics and evolutionary biology

Hidden Markov models in population genetics and evolutionary biology Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013 Topics for today Markov chains Hidden Markov models

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics Grundlagen der Bioinformatik, SoSe 14, D. Huson, May 18, 2014 67 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte

More information

Haplotype-based variant detection from short-read sequencing

Haplotype-based variant detection from short-read sequencing Haplotype-based variant detection from short-read sequencing Erik Garrison and Gabor Marth July 16, 2012 1 Motivation While statistical phasing approaches are necessary for the determination of large-scale

More information

QTL model selection: key players

QTL model selection: key players Bayesian Interval Mapping. Bayesian strategy -9. Markov chain sampling 0-7. sampling genetic architectures 8-5 4. criteria for model selection 6-44 QTL : Bayes Seattle SISG: Yandell 008 QTL model selection:

More information

(Write your name on every page. One point will be deducted for every page without your name!)

(Write your name on every page. One point will be deducted for every page without your name!) POPULATION GENETICS AND MICROEVOLUTIONARY THEORY FINAL EXAMINATION (Write your name on every page. One point will be deducted for every page without your name!) 1. Briefly define (5 points each): a) Average

More information

The abundance of deleterious polymorphisms in humans

The abundance of deleterious polymorphisms in humans Genetics: Published Articles Ahead of Print, published on February 23, 2012 as 10.1534/genetics.111.137893 Note February 3, 2011 The abundance of deleterious polymorphisms in humans Sankar Subramanian

More information

The Structure of Genealogies in the Presence of Purifying Selection: a "Fitness-Class Coalescent"

The Structure of Genealogies in the Presence of Purifying Selection: a Fitness-Class Coalescent The Structure of Genealogies in the Presence of Purifying Selection: a "Fitness-Class Coalescent" The Harvard community has made this article openly available. Please share how this access benefits you.

More information

Population Structure

Population Structure Ch 4: Population Subdivision Population Structure v most natural populations exist across a landscape (or seascape) that is more or less divided into areas of suitable habitat v to the extent that populations

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

How robust are the predictions of the W-F Model?

How robust are the predictions of the W-F Model? How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population

More information

Bayesian Inference of Interactions and Associations

Bayesian Inference of Interactions and Associations Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,

More information

Sequence evolution within populations under multiple types of mutation

Sequence evolution within populations under multiple types of mutation Proc. Natl. Acad. Sci. USA Vol. 83, pp. 427-431, January 1986 Genetics Sequence evolution within populations under multiple types of mutation (transposable elements/deleterious selection/phylogenies) G.

More information

6 Introduction to Population Genetics

6 Introduction to Population Genetics 70 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 19, 2011 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,

More information

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8 The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as

More information

Population Genetics II (Selection + Haplotype analyses)

Population Genetics II (Selection + Haplotype analyses) 26 th Oct 2015 Poulation Genetics II (Selection + Halotye analyses) Gurinder Singh Mickey twal Center for Quantitative iology Natural Selection Model (Molecular Evolution) llele frequency Embryos Selection

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Wright-Fisher Models, Approximations, and Minimum Increments of Evolution

Wright-Fisher Models, Approximations, and Minimum Increments of Evolution Wright-Fisher Models, Approximations, and Minimum Increments of Evolution William H. Press The University of Texas at Austin January 10, 2011 1 Introduction Wright-Fisher models [1] are idealized models

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Statistical Tests for Detecting Positive Selection by Utilizing. High-Frequency Variants

Statistical Tests for Detecting Positive Selection by Utilizing. High-Frequency Variants Genetics: Published Articles Ahead of Print, published on September 1, 2006 as 10.1534/genetics.106.061432 Statistical Tests for Detecting Positive Selection by Utilizing High-Frequency Variants Kai Zeng,*

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

1 Introduction. Abstract

1 Introduction. Abstract CBS 530 Assignment No 2 SHUBHRA GUPTA shubhg@asu.edu 993755974 Review of the papers: Construction and Analysis of a Human-Chimpanzee Comparative Clone Map and Intra- and Interspecific Variation in Primate

More information

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Temporal Trails of Natural Selection in Human Mitogenomes. Author. Published. Journal Title DOI. Copyright Statement.

Temporal Trails of Natural Selection in Human Mitogenomes. Author. Published. Journal Title DOI. Copyright Statement. Temporal Trails of Natural Selection in Human Mitogenomes Author Sankarasubramanian, Sankar Published 2009 Journal Title Molecular Biology and Evolution DOI https://doi.org/10.1093/molbev/msp005 Copyright

More information

Lecture 13: Population Structure. October 8, 2012

Lecture 13: Population Structure. October 8, 2012 Lecture 13: Population Structure October 8, 2012 Last Time Effective population size calculations Historical importance of drift: shifting balance or noise? Population structure Today Course feedback The

More information

Coalescent based demographic inference. Daniel Wegmann University of Fribourg

Coalescent based demographic inference. Daniel Wegmann University of Fribourg Coalescent based demographic inference Daniel Wegmann University of Fribourg Introduction The current genetic diversity is the outcome of past evolutionary processes. Hence, we can use genetic diversity

More information

7.36/7.91 recitation CB Lecture #4

7.36/7.91 recitation CB Lecture #4 7.36/7.91 recitation 2-19-2014 CB Lecture #4 1 Announcements / Reminders Homework: - PS#1 due Feb. 20th at noon. - Late policy: ½ credit if received within 24 hrs of due date, otherwise no credit - Answer

More information

LECTURE # How does one test whether a population is in the HW equilibrium? (i) try the following example: Genotype Observed AA 50 Aa 0 aa 50

LECTURE # How does one test whether a population is in the HW equilibrium? (i) try the following example: Genotype Observed AA 50 Aa 0 aa 50 LECTURE #10 A. The Hardy-Weinberg Equilibrium 1. From the definitions of p and q, and of p 2, 2pq, and q 2, an equilibrium is indicated (p + q) 2 = p 2 + 2pq + q 2 : if p and q remain constant, and if

More information

Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis. Research article.

Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis. Research article. Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis David Bryant,*,1 Remco Bouckaert, 2 Joseph Felsenstein, 3 Noah A. Rosenberg, 4 and Arindam

More information

A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation

A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation Ann. Hum. Genet., Lond. (1975), 39, 141 Printed in Great Britain 141 A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation BY CHARLES F. SING AND EDWARD D.

More information

Inference of mutation parameters and selective constraint in mammalian. coding sequences by approximate Bayesian computation

Inference of mutation parameters and selective constraint in mammalian. coding sequences by approximate Bayesian computation Genetics: Published Articles Ahead of Print, published on February 14, 2011 as 10.1534/genetics.110.124073 Inference of mutation parameters and selective constraint in mammalian coding sequences by approximate

More information