Bustamante et al., Supplementary Information (Nature Manuscript)
Details of PRF Methodology

In the Poisson Random Field (PRF) model, it is assumed that non-synonymous mutations at a given gene are either very deleterious or have selective effect $s$ such that the fitness of a heterozygote is $1 + s$ relative to the wildtype, and a homozygous individual has fitness $1 + 2s$ [1]. As in most population genetic models, we can only estimate the product of the selection coefficient and the population size ($\gamma = 2N_e s$). The strength of purifying selection against very deleterious mutations is captured in the non-lethal non-synonymous mutation rate parameter ($\theta_r = 2N_e \mu f_0$, where $N_e$ is the effective population size, $\mu$ is the mutation rate, and $f_0$ is the fraction of mutations that are non-lethal). Synonymous sites are assumed to evolve neutrally with mutation rate $\theta_s = 2N_e \mu$. We assume the mutation rate parameters among genes are independent of one another to capture the fact that some genes are subject to very strong purifying selection while others may experience relatively weak purifying selection. Likewise, we make no assumption about how the neutral mutation rate varies from gene to gene.

Using the results of Sawyer and Hartl (1992) [1], we model the cell entries in the McDonald-Kreitman tables [2] for a given gene as Poisson random variables with the following expected values:

$$
\begin{array}{lll}
 & \text{Fixed} & \text{Segregating} \\
\text{Silent} & \theta_s \left( \tau + \dfrac{1}{m} + \dfrac{1}{n} \right) & \theta_s \displaystyle\sum_{i=1}^{n-1} \dfrac{1}{i} \\
\text{Replacement} & \theta_r \left( \dfrac{2\gamma}{1 - e^{-2\gamma}}\,\tau + G(\gamma,m) + G(\gamma,n) \right) & \theta_r\, F(\gamma,n)
\end{array}
\qquad (1.1)
$$

where $F(\gamma,n)$ and $G(\gamma,n)$ are integrals over the distribution of mutation frequencies that depend on the selection coefficient for the gene and the number of sequences sampled from species 1 ($n$) and species 2 ($m$) (see Sawyer and Hartl, 1992):
$$
F(\gamma,n) = \int_0^1 \frac{1 - x^n - (1-x)^n}{x(1-x)} \cdot \frac{1 - e^{-2\gamma(1-x)}}{1 - e^{-2\gamma}} \, dx,
\qquad
G(\gamma,n) = \int_0^1 x^{n-1}(1-x)^{-1} \cdot \frac{1 - e^{-2\gamma(1-x)}}{1 - e^{-2\gamma}} \, dx
\qquad (1.2)
$$

Equation (1.1) assumes that polymorphism data are only being modelled for one species, as is the case for our study. The parameter $\tau$ is the number of generations since species divergence divided by twice the effective population size. We treat this quantity as constant among genes and estimate it using all of the data. Since the above equations are only strictly true under the assumption of independence among sites and a population of constant size, we use simulations to gauge the accuracy and robustness of our results to model misspecification.

We write down the joint probability of the MK data given the selection coefficient and mutation rate parameters for each of the $G$ genes in our sample and time since species divergence as the product of the individual entries in the MK tables for all genes:

$$
\Pr\{P, D \mid \gamma, \theta, \tau\} = \prod_{i=1}^{G} \prod_{c \in \{N,S\}} \Pr\{D_{c,i} \mid \gamma_{c,i}, \tau, \theta_{c,i}\}\, \Pr\{P_{c,i} \mid \gamma_{c,i}, \theta_{c,i}\}
\qquad (1.3)
$$

where $P_{c,i}$ is the number of SNPs and $D_{c,i}$ is the number of fixed differences of type $c$ (either synonymous or non-synonymous) in gene $i$. We treat all synonymous sites as neutral ($\gamma_{S,i} = 0$ for all $i$) and obtain the probabilities on the right-hand side of (1.3) directly from the Poisson distribution with mean given by the corresponding entry in (1.1). To obtain the posterior distribution on $\gamma_i$ conditional on the species divergence time, we use a Normal prior with mean 0 and standard deviation of $\sigma = 8$ such that
$$
\begin{aligned}
\Pr\{\gamma_i \mid P_{N,i}, D_{N,i}, \tau\} \propto{}
& \left( \frac{F(\gamma_i,n)}{\beta_i + F(\gamma_i,n) + \frac{2\gamma_i}{1 - e^{-2\gamma_i}}\,\tau + G(\gamma_i,m) + G(\gamma_i,n)} \right)^{P_{N,i}} \\
& \times \left( \frac{\frac{2\gamma_i}{1 - e^{-2\gamma_i}}\,\tau + G(\gamma_i,m) + G(\gamma_i,n)}{\beta_i + F(\gamma_i,n) + \frac{2\gamma_i}{1 - e^{-2\gamma_i}}\,\tau + G(\gamma_i,m) + G(\gamma_i,n)} \right)^{D_{N,i}} \\
& \times \left( \frac{1}{\beta_i + F(\gamma_i,n) + \frac{2\gamma_i}{1 - e^{-2\gamma_i}}\,\tau + G(\gamma_i,m) + G(\gamma_i,n)} \right)^{\alpha_i}
\, e^{-\gamma_i^2 / (2\sigma^2)}
\end{aligned}
\qquad (1.4)
$$

where $\alpha_i$ and $\beta_i$ are parameters of a Gamma prior distribution on the mutation rate for the locus (in practice we set these to 0.01 for all genes, which makes them uninformative). The first two terms on the right-hand side of expression (1.4) represent the conditional probability given $\gamma_i$ of observing $P_{N,i}$ non-synonymous polymorphisms and $D_{N,i}$ non-synonymous fixed differences, respectively; the third term comes from the prior distribution on the mutation rate, a parameter which has been integrated out of the posterior distribution; and the fourth term is the prior distribution of $\gamma_i$.

In order to classify individual loci as positively or negatively selected, we focus on quantifying the posterior probability for a given gene that its selection coefficient is greater (or less) than 0 given the observed data for the gene ($P_i = \Pr\{\gamma_i > 0 \mid P_{N,i}, D_{N,i}\}$). If $P_i$ is greater than 97.5%, this is mathematically equivalent to saying that the 95% highest posterior density credibility interval (Bayesian confidence interval) for the selection coefficient is above 0, and we classify such genes as positively selected. Likewise, if the posterior probability that $\gamma_i < 0$ is greater than 97.5% for a given locus, this is equivalent to the corresponding 95% CI being completely below 0, and we classify these genes as negatively selected. We estimate this quantity using the usual Monte Carlo estimator:

$$
P_i = \Pr\{\gamma_i > 0 \mid P_{N,i}, D_{N,i}\} \approx \frac{1}{M} \sum_{m=1}^{M} I\!\left(\gamma_i^{(m)} > 0\right)
\qquad (1.5)
$$
where $I(\cdot)$ is the indicator function, which takes on the value 1 if the argument is true and 0 otherwise, and $\gamma_i^{(m)}$ is the value of $\gamma_i$ at step $m$ in a Markov Chain Monte Carlo algorithm. All posterior probabilities reported here are from 50,000 retained draws from 10 chains, each of length 50,000 steps, sampled using the Markov Chain Monte Carlo algorithm and convergence criteria previously described, with the modification that the genomic distribution of selective effects is not updated [3-5]. This simplification is made so that the marginal posterior distributions of the selection coefficient are conditionally independent of one another and can be pooled for further analysis in terms of molecular function and biological process.

Simulations

A potential concern is the robustness of our analysis to deviations from the assumptions of the Sawyer and Hartl Poisson Random Field model used to analyze the data. That is, could nonstandard demography produce genomic patterns of variation that we may misinterpret as signatures of selection? To address this issue, we have simulated data using standard coalescent algorithms as implemented in the computer program ms, under complete linkage within genes and three neutral demographic scenarios [6]. For all simulations, we used 10,000 replicates with 79 chromosomes. This mimics the sampling structure of the Celera data: 38 African-American and 40 European-American chromosomes, with 1 chromosome representing the chimpanzee sequences used (chimp SNPs were excluded from the analysis). We assumed a mutation rate of $\theta = 2$, with half of the neutral mutations synonymous and half non-synonymous. This parameter was chosen since close to half of the SNPs in our data are non-synonymous and half are synonymous (see Figure 1A).
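The Monte Carlo estimator in (1.5) and the 97.5% classification rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `posterior_prob_positive` and `classify_locus` are hypothetical, and the input is assumed to be a plain list of retained MCMC draws of the selection coefficient for one gene.

```python
def posterior_prob_positive(draws):
    """Estimate Pr{gamma_i > 0 | data} as the fraction of retained draws above 0.

    This is the indicator-function average of equation (1.5).
    """
    if not draws:
        raise ValueError("need at least one retained draw")
    return sum(1 for g in draws if g > 0) / len(draws)


def classify_locus(draws, threshold=0.975):
    """Label a locus using the 97.5% rule described in the text.

    The labels are illustrative names, not identifiers from the original study.
    """
    p_positive = posterior_prob_positive(draws)
    p_negative = sum(1 for g in draws if g < 0) / len(draws)
    if p_positive > threshold:
        return "positive"
    if p_negative > threshold:
        return "negative"
    return "unclassified"


# Toy draws (made up for illustration): 99 of 100 draws lie above zero.
toy_draws = [1.0] * 99 + [-1.0]
label = classify_locus(toy_draws)
```

In practice the draws pooled from all chains for a gene would be passed in as one list; the toy list above only demonstrates the call.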
Our choice of mutation rate is twice the average estimate of the mutation rate, and makes our results conservative, since the smaller the mutation rate, the better the Poisson approximation to the cell entries. For all models, we used a human-chimpanzee divergence of $\tau = 10$. The demographic models considered are:
a) Panmixia among humans (all 78 chromosomes from a randomly mating population) with constant size (i.e., the standard neutral model).
b) Population structure model A: 40 European-American chromosomes drawn from one population and 38 African-American chromosomes drawn from another, with a migration rate of $M = 4N_e m = 1$ per generation. Backwards in time, the European-American population undergoes a population contraction of 90% at time $0.1 \times 2N_e$ generations (~… thousand years ago), while the African-American population has a 50% reduction. The two human populations are then joined at time 0.25 in units of $2N_e$ generations (~… K years ago). The human and chimpanzee populations are joined at time 10 (~5 million years ago).
c) Population structure model B: same as above except with twice the migration rate.

In Figure 1B we report the distribution of posterior probabilities that the selection coefficient for a gene is above 0 for each of the three models considered here, as well as for the Celera data. It is important to keep in mind that posterior probabilities are not the same as P-values, so there is no theoretical reason for them to follow a uniform distribution (as would be the case for P-values if the null hypothesis is true). The Celera data have a clear excess of genes with high and low posterior probabilities (i.e., too many in the <1%, 1-5%, and >99% categories) regardless of which demographic model is used as the null. The signature is particularly strong for negative selection (this may be partly due to power). Note that in this figure we have conditioned, as in the data, on using only loci with at least 4 variable amino acids in the alignment.

Model Diagnostics

In Figure 1S, we summarize the posterior mean of the selection parameter ($\gamma = E(2Ns \mid \text{Data})$) for genes with at least two variable amino acid sites in the human-chimp alignments as a function of six aspects of the data (all correlations are based on the square-root
transformation of the raw data). We see that $d_S$, the per synonymous site species substitution rate, is slightly correlated with the posterior mean of the selection coefficient ($r$ = … ± 0.035; $P < 10^{-3}$). This may be explained by the fact that $\gamma$ is strongly positively correlated with $d_N$, the non-synonymous species substitution rate ($r$ = … ± 0.021; $P < 10^{-16}$), and that these quantities are themselves correlated ($r$ = … ± 0.034; $P < 10^{-7}$). The former correlation between $\gamma$ and $d_N$ is expected, since the rate of amino acid substitution should increase with the strength of selection $\gamma$. The latter correlation between $d_N$ and $d_S$ has been previously documented for samples of size $n = 1$ from each species across a variety of methods. We also observe a significant moderate negative correlation between $\gamma$ and $p_S$ ($r = -0.139 \pm 0.034$; $P < 10^{-14}$) and a strong negative correlation between $\gamma$ and $p_N$ ($r = -0.665 \pm 0.021$; $P < 10^{-16}$). The latter correlation is not expected, but can be explained by a consideration of power. That is, mkprf relies on the ratio of replacement divergence to replacement polymorphism to estimate $\gamma$: if a gene has low levels of amino acid polymorphism and high levels of amino acid divergence, then this is consistent with strong positive selection and low mutation rate. This signal will be amplified if a gene has experienced very recent positive selection, since genetic hitchhiking will reduce amino acid polymorphism. Likewise, we observe a positive correlation between the $d_N / d_S$ ratio and the posterior mean of the selection coefficient ($r$ = … ± 0.028; $P < 10^{-16}$) and a negative correlation between $p_N / p_S$ and $\gamma$ ($r = -0.282 \pm 0.032$; $P < 10^{-16}$). This illustrates one important aspect of our analysis that differs from previous work [7], namely, that we can detect evidence for positive selection in the presence of selective constraint.
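The dependence of the estimate on the ratio of replacement divergence to replacement polymorphism can be made concrete with a small numerical sketch in Python. It assumes the replacement row of (1.1) takes the form: fixed $= \theta_r\,(2\gamma/(1-e^{-2\gamma})\,\tau + G(\gamma,m) + G(\gamma,n))$ and segregating $= \theta_r\,F(\gamma,n)$, with $F$ and $G$ as in (1.2). The midpoint-rule integration, the step count, and all function names are illustrative choices rather than part of the original method; the values $\tau = 10$, $n = 78$, and $m = 1$ are borrowed from the simulation settings.

```python
import math


def _weight(gamma, x):
    """(1 - exp(-2*gamma*(1-x))) / (1 - exp(-2*gamma)); tends to (1 - x) as gamma -> 0."""
    if abs(gamma) < 1e-8:
        return 1.0 - x
    return math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)


def _midpoint(f, steps=20000):
    """Midpoint-rule integral of f over (0, 1); never evaluates the endpoints."""
    h = 1.0 / steps
    return h * sum(f((k + 0.5) * h) for k in range(steps))


def F(gamma, n):
    """Integral F(gamma, n) of equation (1.2)."""
    return _midpoint(
        lambda x: (1.0 - x**n - (1.0 - x)**n) / (x * (1.0 - x)) * _weight(gamma, x)
    )


def G(gamma, n):
    """Integral G(gamma, n) of equation (1.2)."""
    return _midpoint(lambda x: x**(n - 1) / (1.0 - x) * _weight(gamma, x))


def fixation_factor(gamma):
    """2*gamma / (1 - exp(-2*gamma)), with the neutral limit 1 at gamma = 0."""
    if abs(gamma) < 1e-8:
        return 1.0
    return -2.0 * gamma / math.expm1(-2.0 * gamma)


def divergence_to_polymorphism_ratio(gamma, tau=10.0, m=1, n=78):
    """Expected replacement fixed / segregating counts; theta_r cancels in the ratio."""
    fixed = fixation_factor(gamma) * tau + G(gamma, m) + G(gamma, n)
    return fixed / F(gamma, n)


# Ratios for negative, neutral, and positive scaled selection coefficients.
ratios = {g: divergence_to_polymorphism_ratio(g) for g in (-5.0, 0.0, 5.0)}
```

Under these assumptions the ratio rises with $\gamma$, which is the sense in which high amino acid divergence combined with low amino acid polymorphism points to positive selection. At $\gamma = 0$ the integrals reduce to $F = \sum_{i=1}^{n-1} 1/i$ and $G = 1/n$, matching the silent row of (1.1).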
Our power to detect selection is dependent on the observed cell entries in the McDonald-Kreitman table. Since genes of longer length will, generally, have more mutations and, thus, more variation per gene, we were concerned that the effects we observe could be due to spurious correlation. This might occur, for example, if longer genes have more amino acid polymorphism
and are, thus, overrepresented in the set of negatively selected genes. In order to assess this issue, we plotted the distribution of the log-odds posterior of negative selection, $\log \frac{\Pr(\gamma < 0 \mid \text{Data})}{\Pr(\gamma > 0 \mid \text{Data})}$, as a function of the length of the aligned human-chimpanzee coding regions (see Figure 2S). There appears to be little or no correlation; therefore, differences in length among genes of different molecular functions and biological processes contribute little, if anything, to the discrepancy in the proportion of genes we classify as positively or negatively selected.

Posterior Distribution of Human-Chimpanzee Species Divergence Time

As part of our analysis, we also obtain a very precise estimate of the scaled human-chimpanzee species divergence, $\tau$. Based on 50,000 retained draws of our MCMC algorithm, we obtain a posterior mean of 9.57 in units of $2N_e$ generations (assuming the human, chimpanzee, and ancestral populations are of roughly equal size) with a 95% credibility interval of (9.37, 9.77). Using a human/chimp generation time of 25 years and a long-term effective population size of 10,000, this corresponds to 4.78 million years ago. Our 95% confidence interval, holding generation time and $2N_e$ fixed, is a narrow range of … mya to … mya. Given our uncertainty in the long-term effective population size of humans and chimpanzees, as well as variation in generation time, we have surely overestimated our confidence in the credibility interval of the divergence time. However, the likelihood function in our model is only dependent on the scaled time, which we have estimated with high precision.

Counting synonymous and non-synonymous sites

The numbers of synonymous and non-synonymous sites per gene have been counted using the underlying nucleotide context-dependent mutation rates found by Hwang and Green [8], which assume that the mutation rate from one nucleotide to another is dependent on the site's two
flanking nucleotides. This method is able to account for many mutation biases, such as the hypermutability of CpG dinucleotides and transition/transversion biases, as well as many other subtle effects. Calculation of the total number of non-synonymous (or synonymous) sites in a gene is then performed by summing over the mutation rates at each site that would (or would not) result in an amino acid change, relative to the overall mutability of the site. Missing or ambiguous data in the human-chimp alignment, as well as changes to and from stop codons, were excluded.

References

1. Sawyer, S. A. & Hartl, D. L. Population genetics of polymorphism and divergence. Genetics 132 (1992).
2. McDonald, J. H. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351.
3. Barrier, M., Bustamante, C. D., Yu, J. & Purugganan, M. D. Selection on rapidly evolving proteins in the Arabidopsis genome. Genetics 163.
4. Gilad, Y., Bustamante, C. D., Lancet, D. & Paabo, S. Natural selection on the olfactory receptor gene family in humans and chimpanzees. Am J Hum Genet 73.
5. Bustamante, C. D. et al. The cost of inbreeding in Arabidopsis. Nature 416.
6. Hudson, R. R. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18.
7. Clark, A. G. et al. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302.
8. Hwang, D. G. & Green, P. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci U S A 101.
Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013 Topics for today Markov chains Hidden Markov models
More information6 Introduction to Population Genetics
Grundlagen der Bioinformatik, SoSe 14, D. Huson, May 18, 2014 67 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,
More informationA Bayesian Approach to Phylogenetics
A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte
More informationHaplotype-based variant detection from short-read sequencing
Haplotype-based variant detection from short-read sequencing Erik Garrison and Gabor Marth July 16, 2012 1 Motivation While statistical phasing approaches are necessary for the determination of large-scale
More informationQTL model selection: key players
Bayesian Interval Mapping. Bayesian strategy -9. Markov chain sampling 0-7. sampling genetic architectures 8-5 4. criteria for model selection 6-44 QTL : Bayes Seattle SISG: Yandell 008 QTL model selection:
More information(Write your name on every page. One point will be deducted for every page without your name!)
POPULATION GENETICS AND MICROEVOLUTIONARY THEORY FINAL EXAMINATION (Write your name on every page. One point will be deducted for every page without your name!) 1. Briefly define (5 points each): a) Average
More informationThe abundance of deleterious polymorphisms in humans
Genetics: Published Articles Ahead of Print, published on February 23, 2012 as 10.1534/genetics.111.137893 Note February 3, 2011 The abundance of deleterious polymorphisms in humans Sankar Subramanian
More informationThe Structure of Genealogies in the Presence of Purifying Selection: a "Fitness-Class Coalescent"
The Structure of Genealogies in the Presence of Purifying Selection: a "Fitness-Class Coalescent" The Harvard community has made this article openly available. Please share how this access benefits you.
More informationPopulation Structure
Ch 4: Population Subdivision Population Structure v most natural populations exist across a landscape (or seascape) that is more or less divided into areas of suitable habitat v to the extent that populations
More informationLecture 9. QTL Mapping 2: Outbred Populations
Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred
More informationHow robust are the predictions of the W-F Model?
How robust are the predictions of the W-F Model? As simplistic as the Wright-Fisher model may be, it accurately describes the behavior of many other models incorporating additional complexity. Many population
More informationBayesian Inference of Interactions and Associations
Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,
More informationSequence evolution within populations under multiple types of mutation
Proc. Natl. Acad. Sci. USA Vol. 83, pp. 427-431, January 1986 Genetics Sequence evolution within populations under multiple types of mutation (transposable elements/deleterious selection/phylogenies) G.
More information6 Introduction to Population Genetics
70 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 19, 2011 6 Introduction to Population Genetics This chapter is based on: J. Hein, M.H. Schierup and C. Wuif, Gene genealogies, variation and evolution,
More informationThe E-M Algorithm in Genetics. Biostatistics 666 Lecture 8
The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as
More informationPopulation Genetics II (Selection + Haplotype analyses)
26 th Oct 2015 Poulation Genetics II (Selection + Halotye analyses) Gurinder Singh Mickey twal Center for Quantitative iology Natural Selection Model (Molecular Evolution) llele frequency Embryos Selection
More informationLecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)
Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from
More informationWright-Fisher Models, Approximations, and Minimum Increments of Evolution
Wright-Fisher Models, Approximations, and Minimum Increments of Evolution William H. Press The University of Texas at Austin January 10, 2011 1 Introduction Wright-Fisher models [1] are idealized models
More informationPenalized Loss functions for Bayesian Model Choice
Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented
More informationLecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011
Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More informationPredicting Protein Functions and Domain Interactions from Protein Interactions
Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput
More informationStatistical Tests for Detecting Positive Selection by Utilizing. High-Frequency Variants
Genetics: Published Articles Ahead of Print, published on September 1, 2006 as 10.1534/genetics.106.061432 Statistical Tests for Detecting Positive Selection by Utilizing High-Frequency Variants Kai Zeng,*
More informationAmira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut
Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological
More information1 Introduction. Abstract
CBS 530 Assignment No 2 SHUBHRA GUPTA shubhg@asu.edu 993755974 Review of the papers: Construction and Analysis of a Human-Chimpanzee Comparative Clone Map and Intra- and Interspecific Variation in Primate
More informationMATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME
MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationTemporal Trails of Natural Selection in Human Mitogenomes. Author. Published. Journal Title DOI. Copyright Statement.
Temporal Trails of Natural Selection in Human Mitogenomes Author Sankarasubramanian, Sankar Published 2009 Journal Title Molecular Biology and Evolution DOI https://doi.org/10.1093/molbev/msp005 Copyright
More informationLecture 13: Population Structure. October 8, 2012
Lecture 13: Population Structure October 8, 2012 Last Time Effective population size calculations Historical importance of drift: shifting balance or noise? Population structure Today Course feedback The
More informationCoalescent based demographic inference. Daniel Wegmann University of Fribourg
Coalescent based demographic inference Daniel Wegmann University of Fribourg Introduction The current genetic diversity is the outcome of past evolutionary processes. Hence, we can use genetic diversity
More information7.36/7.91 recitation CB Lecture #4
7.36/7.91 recitation 2-19-2014 CB Lecture #4 1 Announcements / Reminders Homework: - PS#1 due Feb. 20th at noon. - Late policy: ½ credit if received within 24 hrs of due date, otherwise no credit - Answer
More informationLECTURE # How does one test whether a population is in the HW equilibrium? (i) try the following example: Genotype Observed AA 50 Aa 0 aa 50
LECTURE #10 A. The Hardy-Weinberg Equilibrium 1. From the definitions of p and q, and of p 2, 2pq, and q 2, an equilibrium is indicated (p + q) 2 = p 2 + 2pq + q 2 : if p and q remain constant, and if
More informationInferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis. Research article.
Inferring Species Trees Directly from Biallelic Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis David Bryant,*,1 Remco Bouckaert, 2 Joseph Felsenstein, 3 Noah A. Rosenberg, 4 and Arindam
More informationA consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation
Ann. Hum. Genet., Lond. (1975), 39, 141 Printed in Great Britain 141 A consideration of the chi-square test of Hardy-Weinberg equilibrium in a non-multinomial situation BY CHARLES F. SING AND EDWARD D.
More informationInference of mutation parameters and selective constraint in mammalian. coding sequences by approximate Bayesian computation
Genetics: Published Articles Ahead of Print, published on February 14, 2011 as 10.1534/genetics.110.124073 Inference of mutation parameters and selective constraint in mammalian coding sequences by approximate
More information