The Wright-Fisher Model and Genetic Drift

The Wright-Fisher Model and Genetic Drift January 22, 2015

1 Hardy-Weinberg Equilibrium

Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomly mating population satisfying Mendel's first law. Our results are due to G. Hardy and W. Weinberg, who independently discovered them in 1908. We make the following assumptions:

1. Non-overlapping generations
2. Infinite diploid population
3. Autosomal locus segregating alleles $A_1, \dots, A_k$
4. Monoecious individuals
5. Random mating
6. Fair meiosis (no meiotic drive)
7. No selection or mutation

Let $p_{ij}$ be the frequency of genotype $A_i A_j$ in the current generation. Then the frequency of allele $A_i$ in that generation is equal to

\[ p_i = p_{ii} + \frac{1}{2} \sum_{j \neq i} p_{ij}. \]

Because the population is infinite and randomly mating, it follows that the frequency of genotype $A_i A_j$ in the next generation is equal to

\[ p'_{ij} = \begin{cases} p_i^2 & \text{if } j = i \\ 2 p_i p_j & \text{if } j \neq i. \end{cases} \]

Furthermore, the frequency of allele $A_i$ in this generation is

\[ p'_i = p'_{ii} + \frac{1}{2} \sum_{j \neq i} p'_{ij} = p_i^2 + \sum_{j \neq i} p_i p_j = p_i, \]

which shows that the allele frequencies do not change between generations. Since the genotype frequencies in the next generation are calculated in exactly the same way, it follows that these also will remain unchanged in all future generations. This is why this result is described as an equilibrium.

2 The Wright-Fisher Model

2.1 Description and Motivation

In fact, allele and genotype frequencies in real populations do change over time and are affected by several processes that were neglected in our derivation of HWE. Our focus in this section is on the effects of demographic stochasticity in finite populations. To this end, we introduce the Wright-Fisher model, first for a haploid population. The assumptions of this model are as follows:

1. Non-overlapping generations
2. Constant population size: $N$ haploid adults
3. Autosomal locus segregating alleles $A$ and $a$
4. No selection or mutation
5. Each individual in generation $t + 1$ chooses its parent uniformly at random and with replacement from the $N$ adults alive in generation $t$.

The last assumption ("Wright-Fisher sampling") is made for convenience, but can also be justified biologically if we assume that each adult in generation $t$ gives birth to a large number of offspring (say $M$), but that the environment only contains enough resources for $N$ of these to survive to adulthood.

We can also modify this model so that it applies to a randomly mating population containing $N$ monoecious, diploid adults. In this case, the fifth assumption is replaced by: Each individual in generation $t + 1$ chooses two chromosomes uniformly at random and with replacement from the $2N$ chromosomes present in generation $t$. This mechanism can also be justified by supposing that each adult sheds a large number of gametes, which then combine at random to give rise to $N$ diploid offspring that survive to adulthood. In particular, selfing can occur in this model and the probability that an individual has just one parent is $1/N$.

Although simplistic, the Wright-Fisher model captures many of the key features of genetic drift and has played an important role in the development of population genetics. Here we are interested in two aspects of its behavior: short-term fluctuations in allele frequencies, and asymptotic properties of the model.

2.2 Short-term Behavior

To describe the short-term fluctuations in the allele frequencies, let us write $X_t$ for the number of copies of allele $A$ in generation $t$. Then $X_t$ is a random variable that takes values in the set $\{0, \dots, 2N\}$ and, conditional on $X_t = i$, the distribution of $X_{t+1}$ is binomial with parameters $2N$ and $i/2N$, i.e.,

\[ P(X_{t+1} = j \mid X_t = i) = \binom{2N}{j} \left( \frac{i}{2N} \right)^{j} \left( 1 - \frac{i}{2N} \right)^{2N - j}. \]

Since we are usually interested in allele frequencies rather than in counts, we will also define $p_t = X_t / 2N$ to be the frequency of $A$ in generation $t$. Using the properties of the binomial distribution, we see that

\[ E[p_{t+1} \mid p_t = p] = p, \qquad \mathrm{Var}(p_{t+1} \mid p_t = p) = \frac{p(1 - p)}{2N}. \]

In other words, while the average allele frequencies will remain constant from generation to generation (reflecting the absence of selection and mutation), the actual allele frequencies will fluctuate around this average, with a per-generation variance that is inversely proportional to population size. These unbiased, random fluctuations in the genetic composition of the population are known as genetic drift, which the Wright-Fisher model suggests is stronger in smaller populations.

The Wright-Fisher model is an example of a discrete-time Markov chain. Indeed, from the description of the model it is clear that if we know the allele frequencies in the current generation, then those in the future will depend only on the present and not on generations further in the past. More formally, for any integers $i, j, x_0, \dots, x_{t-1} \in \{0, \dots, 2N\}$, we have

\[ P(X_{t+1} = j \mid X_t = i, X_{t-1} = x_{t-1}, \dots, X_0 = x_0) = P(X_{t+1} = j \mid X_t = i), \]

which tells us that given the present, the future is conditionally independent of the past. This property, which characterizes Markov chains in general, is key to the analysis of the Wright-Fisher model.

To explore the effects of genetic drift on genetic variation, we will define the heterozygosity (or allelic diversity) of the population in generation $t$ to be the quantity

\[ H^0_t = \frac{2 X_t (2N - X_t)}{2N(2N - 1)} = \left( \frac{2N}{2N - 1} \right) 2 p_t (1 - p_t). \]

As is clear from the first expression, the heterozygosity is equal to the probability of sampling both alleles when we sample two chromosomes at random and without replacement from the population. Also, by exploiting the Markov property of the Wright-Fisher model and the moment equations given above, we obtain the following classical result.

Theorem 1. Under the Wright-Fisher model, the expected heterozygosity $h(t) = E[H^0_t]$ decreases geometrically at rate $(1 - 1/2N)$:

\[ h(t) = \left( 1 - \frac{1}{2N} \right)^{t} h(0). \]

While this result can be deduced directly from the moment equations, we provide an alternative proof that foreshadows the intimate relationship between the Wright-Fisher model and the coalescent.

Proof. By definition, $h(t)$ is the probability that two chromosomes, sampled at random and without replacement from the population in generation $t$, carry different alleles.
Since we have not incorporated mutation into the model, this event will only occur if the two chromosomes have distinct ancestors in generation $t - 1$ and if these ancestral chromosomes carried different alleles. Under the Wright-Fisher model, the probability that the two chromosomes have distinct ancestors in the previous generation is $1 - 1/2N$, in which case, because the ancestral chromosomes are chosen uniformly at random and without replacement from generation $t - 1$, the probability that they carry different alleles is $h(t - 1)$. Combining these two results, we obtain the recursion

\[ h(t) = \left( 1 - \frac{1}{2N} \right) h(t - 1), \]

and the main result follows by continuing this argument back to generation 0.

This shows that, on average, genetic variation tends to be reduced by genetic drift and that the rate of loss of variation due to drift is greater in smaller populations than in larger populations. Furthermore, notice that as $t \to \infty$, the expected heterozygosity decreases to 0. This suggests that in the long term all genetic variation will be lost under the Wright-Fisher model.
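Theorem 1 can be checked numerically for a small population by building the binomial transition matrix explicitly and pushing the distribution of $X_t$ forward in time. The following is only a sketch under stated assumptions: the function names (`wf_transition_matrix`, `heterozygosity`, `expected_heterozygosity`) and the values $2N = 20$, $X_0 = 8$ are illustrative choices and do not come from the notes.

```python
from math import comb

import numpy as np


def wf_transition_matrix(two_n):
    """Row i holds P(X_{t+1} = j | X_t = i), a Binomial(2N, i/2N) distribution."""
    P = np.zeros((two_n + 1, two_n + 1))
    for i in range(two_n + 1):
        p = i / two_n
        for j in range(two_n + 1):
            P[i, j] = comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
    return P


def heterozygosity(x, two_n):
    """H^0 = 2 X (2N - X) / (2N (2N - 1)) for a count x of allele A."""
    x = np.asarray(x, dtype=float)
    return 2 * x * (two_n - x) / (two_n * (two_n - 1))


def expected_heterozygosity(two_n, x0, t):
    """E[H^0_t] given X_0 = x0, by iterating the distribution t generations."""
    dist = np.zeros(two_n + 1)
    dist[x0] = 1.0
    P = wf_transition_matrix(two_n)
    for _ in range(t):
        dist = dist @ P
    return float(dist @ heterozygosity(np.arange(two_n + 1), two_n))


# Illustrative values (assumed): N = 10 diploid adults, 8 initial copies of A.
two_n, x0 = 20, 8
h0 = float(heterozygosity(x0, two_n))
for t in (1, 5, 25):
    exact = expected_heterozygosity(two_n, x0, t)
    predicted = (1 - 1 / two_n) ** t * h0
    print(f"t={t:2d}  matrix={exact:.6f}  theorem={predicted:.6f}")
```

The two columns should agree up to floating-point error, since the matrix calculation is exact for the model rather than a Monte Carlo estimate. The approach is only practical for small $2N$, because the matrix has $(2N + 1)^2$ entries.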

2.3 Asymptotic Behavior

To characterize the asymptotic behavior of the Wright-Fisher model, define the time to fixation $\tau$ to be the random variable

\[ \tau = \min\{t \geq 0 : p_t = 0 \text{ or } p_t = 1\}. \]

Notice that if $p_t = 1$, then the population only contains $A$ alleles, while if $p_t = 0$, then the population only contains $a$ alleles. In either case, the population has lost all genetic variation and we say that the surviving allele has been fixed in the population. Because our model lacks a mechanism to reintroduce alleles that have been lost from the population (i.e., no mutation or migration), once $p_t = 0$ or $p_t = 1$, the process remains in this state at all future times. For this reason, 0 and 1 are said to be absorbing states for the Wright-Fisher model. We would like to know whether this is certain to occur, i.e., whether $P(\tau < \infty) = 1$, and, if so, how the probability that $A$ is fixed in the population depends on its initial frequency.

To show that fixation is certain to occur in finite time, notice that no matter how many copies of $A$ are present in the population in the current generation, there is always a positive probability that either $A$ or $a$ will be fixed in the next generation. Let $\alpha$ be the minimum of these probabilities, i.e.,

\[ \alpha = \min_{0 \leq i \leq 2N} P(X_{t+1} \in \{0, 2N\} \mid X_t = i) > 0, \]

and notice that $\alpha$ does not depend on $t$. In particular, for any initial frequency $p$, we have

\[ P_p(\tau > 1) \leq (1 - \alpha), \]

since the probability that we have fixation within the first generation is at least $\alpha$. Furthermore, by the Markov property,

\[ P_p(\tau > 2) = P_p(\tau > 2 \mid \tau > 1) \, P_p(\tau > 1) \leq (1 - \alpha)(1 - \alpha) = (1 - \alpha)^2, \]

since the probability that fixation occurs during the second generation given that it didn't occur during the first generation is also at least $\alpha$. In general, for any $t > 0$ and any initial frequency $p$, we have

\[ P_p(\tau > t) \leq (1 - \alpha)^t \to 0 \quad \text{as } t \to \infty. \]

However, since $\{\tau = \infty\} \subset \{\tau > t\}$ for every $t > 0$, this shows that $P_p(\tau = \infty) = 0$ and so we can conclude that the time to fixation is almost surely finite: $P_p(\tau < \infty) = 1$.
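To make the bound concrete, $\alpha$ can be computed directly for a small population and the geometric bound $(1 - \alpha)^t$ compared with the exact tail probability $P_p(\tau > t)$. This is a rough sketch: the helper names (`one_step_fixation_prob`, `alpha`, `prob_not_fixed`) and the values $2N = 10$, $X_0 = 5$ are assumptions made for illustration only.

```python
from math import comb

import numpy as np


def one_step_fixation_prob(i, two_n):
    """P(X_{t+1} in {0, 2N} | X_t = i): every sampled copy is a, or every copy is A."""
    p = i / two_n
    return (1 - p) ** two_n + p**two_n


def alpha(two_n):
    """Minimum over 0 <= i <= 2N of the one-step fixation probability."""
    return min(one_step_fixation_prob(i, two_n) for i in range(two_n + 1))


def prob_not_fixed(two_n, x0, t):
    """Exact P(tau > t) given X_0 = x0, from t steps of the transition matrix."""
    P = np.array([[comb(two_n, j) * (i / two_n) ** j * (1 - i / two_n) ** (two_n - j)
                   for j in range(two_n + 1)]
                  for i in range(two_n + 1)])
    dist = np.zeros(two_n + 1)
    dist[x0] = 1.0
    for _ in range(t):
        dist = dist @ P
    # 0 and 2N are absorbing, so tau <= t exactly when X_t is in {0, 2N}.
    return float(1.0 - dist[0] - dist[two_n])


# Illustrative values (assumed): 2N = 10 chromosomes, 5 initial copies of A.
two_n, x0 = 10, 5
a = alpha(two_n)
print(f"alpha = {a:.6g}")
for t in (1, 10, 50):
    print(f"t={t:2d}  exact={prob_not_fixed(two_n, x0, t):.6f}  bound={(1 - a) ** t:.6f}")
```

The bound is typically very loose, because $\alpha$ is the worst-case one-step probability and is of order $2^{-2N}$. That does not matter for the argument, which only needs $\alpha > 0$.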
We next consider how the fixation probability of allele $A$ depends on its initial frequency. To this end, define

\[ u(p) = P_p(p_\tau = 1). \]

Before stating our main result, we will first prove a useful lemma concerning expectations on events.

Lemma 1. Let $X$ be a bounded random variable, i.e., for some real number $M < \infty$, $P(|X| \leq M) = 1$, and let $A$ be an event. Then

\[ |E[X; A]| \leq M \cdot P(A). \]

Proof. Let $1_A$ be the indicator variable for the event $A$, i.e., $1_A = 1$ if $A$ occurs and $1_A = 0$ if $A$ does not occur. Then $1_A$ is a Bernoulli random variable with parameter $p = P(A) = E[1_A]$ and the expectation of $X$ on the event $A$ is defined as $E[X; A] = E[X \cdot 1_A]$.

Since $|X| \leq M$ almost surely and $1_A$ is non-negative, it follows that

\[ |E[X; A]| = |E[X \cdot 1_A]| \leq E[|X \cdot 1_A|] = E[|X| \cdot 1_A] \leq M \cdot E[1_A] = M \cdot P(A). \]

Theorem 2. Under the Wright-Fisher model, the fixation probability of an allele is equal to its initial frequency, i.e., $u(p) = p$.

Proof. Since $p_\tau$ takes only the values 0 and 1, we have $E_p[p_\tau] = P_p(p_\tau = 1) = u(p)$, so it suffices to show that $E_p[p_\tau] = p$. To prove this, notice that if $\tau \leq t$, then because 0 and 1 are both absorbing states for the Wright-Fisher model, $p_t = p_\tau$. Consequently, for any fixed $t < \infty$,

\[
\begin{aligned}
E_p[p_\tau] &= E_p[p_\tau; \tau \leq t] + E_p[p_\tau; \tau > t] \\
&= E_p[p_t; \tau \leq t] + E_p[p_\tau; \tau > t] \\
&= E_p[p_t] - E_p[p_t; \tau > t] + E_p[p_\tau; \tau > t] \\
&= p - E_p[p_t; \tau > t] + E_p[p_\tau; \tau > t].
\end{aligned}
\]

However, since $p_t \in [0, 1]$, Lemma 1 (with $M = 1$) shows that the last two terms on the right-hand side can each be bounded in absolute value by the probability $P_p(\tau > t)$, which we know tends to 0 as $t \to \infty$. Taking this limit shows that the left-hand side is equal to $p$, which completes the proof.

As we will later see, this result holds quite generally in neutral population genetical models and has even been taken as an abstract definition of neutrality.

2.4 Simulation

The Wright-Fisher model can be simulated in several ways. The most straightforward (but not the most efficient) approach is to keep track of individual genotypes in each generation and to randomly sample the parent of each individual from the previous generation. Alternatively, if we are only interested in the dynamics of the allele frequencies, then it suffices to just record these and to generate a binomially-distributed random variable $X_{t+1} \sim \mathrm{Binomial}(2N, p_t)$ and set $p_{t+1} = X_{t+1}/2N$. However, when $N$ is large, say $N \geq 1000$, even this approach may be too slow for many purposes. In this case, we can replace the binomial distribution by a pair of approximations based on the central limit theorem and the law of rare events. There are three cases, depending on the product $m \equiv 2N p_t$. If $m \leq 5$, then we can use the law of rare events to approximate $X_{t+1}$ by a Poisson random variable with mean $m$.
Notice that this random variable is guaranteed to be non-negative, as is required by the exact model. Alternatively, if $m \geq 2N - 5$ (i.e., $2N(1 - p_t) \leq 5$), then we can use the law of rare events to approximate $2N - X_{t+1}$ by a Poisson random variable with mean $2N - m$. In the third case where $5 < m < 2N - 5$, we can invoke the central limit theorem and approximate $p_{t+1}$ by a normally-distributed random variable with mean $p_t$ and variance $p_t(1 - p_t)/2N$. While the resulting simulations will not exactly reproduce the Wright-Fisher model, the accuracy of the approximations based on the law of rare events and the central limit theorem for sufficiently large $N$ is quite good.
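The two frequency-based schemes described above might be sketched as follows, using NumPy's random generator. The function names (`wf_step_exact`, `wf_step_approx`, `simulate_to_fixation`), the seed, and the demonstration parameters are illustrative assumptions. Clipping the approximate draws to the valid range is also an added assumption, since the notes do not say how to handle a Poisson draw above $2N$ or a normal draw outside $[0, 1]$.

```python
import numpy as np


def wf_step_exact(p, two_n, rng):
    """One generation of exact sampling: X_{t+1} ~ Binomial(2N, p_t)."""
    return rng.binomial(two_n, p) / two_n


def wf_step_approx(p, two_n, rng, cutoff=5.0):
    """One generation using the Poisson / normal approximations (three cases)."""
    m = two_n * p
    if m <= cutoff:
        # Allele A is rare: approximate X_{t+1} by Poisson(m).
        x = rng.poisson(m)
    elif m >= two_n - cutoff:
        # Allele a is rare: approximate 2N - X_{t+1} by Poisson(2N - m).
        x = two_n - rng.poisson(two_n - m)
    else:
        # Central limit theorem: p_{t+1} ~ Normal(p, p(1 - p) / 2N).
        p_next = rng.normal(p, np.sqrt(p * (1 - p) / two_n))
        # Assumption: clip to [0, 1], since a normal draw can leave the interval.
        return float(min(max(p_next, 0.0), 1.0))
    # Assumption: clip counts to [0, 2N], since a Poisson draw is unbounded above.
    return float(min(max(x, 0), two_n) / two_n)


def simulate_to_fixation(p0, two_n, rng, step=wf_step_exact, max_gens=10**6):
    """Iterate one-generation steps until p hits 0 or 1; return (final p, generations)."""
    p, t = p0, 0
    while 0.0 < p < 1.0 and t < max_gens:
        p = step(p, two_n, rng)
        t += 1
    return p, t


# Illustrative run (assumed parameters): estimate the fixation probability of A.
rng = np.random.default_rng(1)
two_n, p0, reps = 100, 0.3, 2000
outcomes = [simulate_to_fixation(p0, two_n, rng)[0] for _ in range(reps)]
frac_fixed = sum(1 for p_final in outcomes if p_final == 1.0) / reps
print(f"fraction of runs in which A fixed: {frac_fixed:.3f} (initial frequency {p0})")
```

By Theorem 2, the fraction of runs in which $A$ fixes should be close to the initial frequency, up to Monte Carlo error. Passing `step=wf_step_approx` to `simulate_to_fixation` swaps in the approximate scheme, which is intended for large $N$. The small $2N$ above was chosen only so that the exact demonstration runs quickly.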