A coalescent model for the effect of advantageous mutations on the genealogy of a population


Rick Durrett and Jason Schweinsberg

May 13, 2005

Abstract

When an advantageous mutation occurs in a population, the favorable allele may spread to the entire population in a short time, an event known as a selective sweep. As a result, when we sample $n$ individuals from a population and trace their ancestral lines backwards in time, many lineages may coalesce almost instantaneously at the time of a selective sweep. We show that as the population size goes to infinity, this process converges to a coalescent process called a coalescent with multiple collisions. A better approximation for finite populations can be obtained using a coalescent with simultaneous multiple collisions. We also show how these coalescent approximations can be used to get insight into how beneficial mutations affect the behavior of statistics that have been used to detect departures from the usual Kingman's coalescent.

1 Introduction

Our goal in this paper is to describe the coalescent processes that arise when we consider the genealogy of a population that is affected by repeated beneficial mutations. The starting point for this analysis will be the continuous-time population model introduced by Moran (1958). In this model, the population size is fixed at $2N$. Each individual independently lives for a time that is exponentially distributed with mean 1 and then is replaced by a new individual. The parent of the new individual is chosen at random from the $2N$ individuals, including the one being replaced. Note that we can think of the population as consisting of $2N$ chromosomes of $N$ diploid individuals, so each member of the population has just one parent.

Suppose we sample $n$ individuals at random from this population at time zero. To describe the genealogy of the sample, we will define the ancestral process, which will be a continuous-time Markov process $(\Psi_N(t), t \ge 0)$ whose state space is the set $\mathcal{P}_n$ of partitions of $\{1, \dots, n\}$.
(Author footnotes: Partially supported by NSF grants from the probability program ( and 935) and from a joint DMS/NIGMS initiative to support research in mathematical biology (137). Supported by an NSF Postdoctoral Fellowship.)

The ancestral process describes the coalescence of lineages as we follow the ancestral lines of the sampled individuals backwards in time. More precisely, $\Psi_N(0)$ is the partition of $\{1, \dots, n\}$ into $n$ singletons, and $\Psi_N(t)$ is the partition of $\{1, \dots, n\}$ such that $i$ and $j$ are in the same block of $\Psi_N(t)$ if and only if the $i$th and $j$th individuals in the sample have the same ancestor at time $-Nt$. It is well known that the process $(\Psi_N(t), t \ge 0)$ is Kingman's coalescent, a coalescent

process introduced by Kingman (1982). Kingman's coalescent is a $\mathcal{P}_n$-valued Markov process that starts from the partition of $\{1, \dots, n\}$ into singletons. All transitions involve exactly two blocks of the partition merging together, and each such transition occurs at rate one.

Within the last decade, progress has been made on describing the genealogy of populations in models that allow for natural selection. Krone and Neuhauser (1997) and Neuhauser and Krone (1997) studied a model in which each individual can be of type 1 or 2. An individual of type $i$ produces offspring at rate $\lambda_i$, with $\lambda_2 > \lambda_1$ so that type 2 is advantageous. Each new offspring replaces a randomly chosen individual from the population, and is the same type as its parent with probability $1 - u_N$ and the opposite type with probability $u_N$. Under certain assumptions, they show that the genealogy of a sample from the population can be described using what they call an ancestral selection graph. Additional work of Donnelly and Kurtz (1999) and Barton, Etheridge, and Sturm (2004) has incorporated recombination as well as selection into the model.

The ancestral selection graph arises in the limit as $N \to \infty$ in the case of weak selection, where the selective advantage $\lambda_2/\lambda_1 - 1$ and the mutation rates $u_N$ are $O(1/N)$. Then, as $N \to \infty$, the fraction of individuals with the favored allele can be approximated by a diffusion process. In this paper, we consider strong selection, where the selective advantage is $O(1)$. With strong selection, when a beneficial mutation occurs, there is a positive probability that the beneficial allele will spread to the entire population, an event known as a selective sweep. At the end of a selective sweep, the entire population has the favorable allele, and every member of the population will trace that favorable allele back to the individual that had the beneficial mutation that caused the selective sweep.

However, the genealogy becomes more complicated when we consider recombination.
Diploid individuals usually do not inherit an identical copy of one of their parent's chromosomes. Instead, the inherited chromosome consists of pieces of each of a parent's two chromosomes. Since a chromosome is coming from two places, we need to consider the genealogy not of an entire chromosome but of a particular site of interest on the chromosome. When a selective sweep is caused by a beneficial mutation at a site other than the site of interest, many individuals may trace their gene at the site of interest back to the individual that had the beneficial mutation at the beginning of the selective sweep, while others may trace their gene at the site of interest to a different ancestor because of recombination between the two sites on the chromosome. This effect was first studied by Maynard Smith and Haigh (1974), who called it the hitchhiking effect.

As we will show, the typical duration of a selective sweep is only $O(\log N)$. Therefore, when we speed up time by a factor of $N$ to define the ancestral process, the selective sweep takes place almost instantaneously. Consequently, if we sample $n$ individuals some time after a selective sweep and define the ancestral process as before, the ancestral process behaves like Kingman's coalescent until we get back to the time of a selective sweep. At that time, many lineages may coalesce because they get traced back to the individual with the mutation that caused the selective sweep. This possibility was observed by Gillespie (2000), who referred to the resulting coalescent process as the pseudohitchhiking model. We will show that if selective sweeps happen repeatedly throughout the history of a population at times of a Poisson process, as proposed by Gillespie (2000), then under suitable assumptions the ancestral processes will converge as $N \to \infty$ to a coalescent with multiple collisions, which is a $\mathcal{P}_n$-valued Markov process in which many blocks of the partition can merge at once into a single block.
These coalescent processes were introduced by Pitman (1999) and Sagitov (1999).

While coalescents with multiple collisions are the limiting coalescent processes as $N \to \infty$,

an improved approximation for finite $N$ can be obtained using a coalescent with simultaneous multiple collisions. Coalescents with simultaneous multiple collisions, which were introduced by Schweinsberg (2000) and Möhle and Sagitov (2001), are coalescent processes in which many blocks can merge at once into a single block, and many such mergers can occur simultaneously. They provide a better approximation than coalescents with multiple collisions in this context because, as noted by Barton (1998), Durrett and Schweinsberg (2004a), and Schweinsberg and Durrett (2004), multiple groups of lineages can coalesce at the time of a selective sweep.

Coalescents with multiple or simultaneous multiple collisions arise as limits of ancestral processes in populations that occasionally have very large families because ancestral lines that go back to an individual with many offspring will coalesce at the same time. Coalescents with multiple collisions arise when a single large family is possible in a given generation, while coalescents with simultaneous multiple collisions arise when one generation can contain many large families. For more details, see Sagitov (1999, 2003), Möhle and Sagitov (2001), and Schweinsberg (2003). The results in this paper provide a different biological application of these coalescent processes.

The rest of this paper is organized as follows. In section 2, we describe our model for how the population evolves when there can be beneficial mutations. We state our main result, which is that the genealogy of this process converges to a coalescent with multiple collisions. In section 3, we present the improved approximation involving a coalescent with simultaneous multiple collisions. The next two sections are devoted to applications of these results. In section 4, we discuss how multiple mergers affect the number of segregating sites and pairwise differences in a sample of DNA.
These quantities are used in Tajima's D-statistic (see Tajima (1989)), which can be used to detect departures from the standard Kingman's coalescent. In section 5 we discuss how multiple mergers affect the number of mutations that appear on just a single individual in the sample, which is relevant to the test proposed by Fu and Li (1993) for detecting departures from Kingman's coalescent. Our results suggest that Fu and Li's test should have less power to detect selective sweeps, at least in large samples, than Tajima's D-statistic. Finally, in section 6, we prove the convergence and approximation theorems stated in sections 2 and 3.

2 Convergence to a coalescent with multiple collisions

In this section, we give a precise description of our model of a population that experiences beneficial mutations, and we state our main convergence theorem. We describe what happens following a single beneficial mutation in subsection 2.1, and we consider recurrent beneficial mutations in subsection 2.2. Then in subsection 2.3, we state the convergence result and give some examples.

2.1 The effect of a single beneficial mutation

In this subsection we describe how the population evolves after one of the $2N$ individuals experiences a beneficial mutation. We will denote the new favorable allele by $B$ and the other allele by $b$. We assume the relative fitnesses of the two alleles are $1$ and $1 - s$, so the $B$ alleles will tend to survive longer. Immediately after the mutation, one individual has the $B$ allele and $2N - 1$ have the $b$ allele.

Kaplan, Hudson, and Langley (1989) and Stephan, Wiehe, and Lenz (1992) proposed modeling the fraction of individuals $p(t)$ with the $B$ allele at time $t$ by using the logistic

differential equation
$$\frac{dp}{dt} = s\,p\,(1 - p).$$
This approach has been popular in simulation studies. However, Durrett and Schweinsberg (2004a) showed that this approximation is not very accurate. Consequently, we will consider instead a modification to the Moran model that was studied by Durrett and Schweinsberg (2004a) and Schweinsberg and Durrett (2004). At one site, each chromosome has a $B$ or $b$ allele, but we will be interested in the genealogy at another neutral site at which all alleles have the same fitness. As in the Moran model, each individual survives for a time that is exponentially distributed with mean 1, and then a replacement is proposed in which the parent of the proposed new individual is chosen at random from the $2N$ members of the population. However, to account for natural selection, whenever a replacement of a $B$ chromosome with a $b$ chromosome is proposed, the change is rejected with probability $s$.

Also, to incorporate recombination into the model, we say that when a new individual is born, it inherits its alleles at both sites from the same parent with probability $1 - r$. However, with probability $r$, there is recombination between the two sites, so the new individual inherits its allele at the neutral site from its parent's other chromosome. Because we are treating an individual's two chromosomes as two separate members of the population, we model this by saying that, with probability $r$, the new individual inherits the two alleles from two ancestors chosen independently at random from the population.

Suppose the beneficial mutation appears on one chromosome at time 0, and let $X(t)$ be the number of chromosomes with the favorable allele at time $t$. Let $\tau = \inf\{t : X(t) \in \{0, 2N\}\}$ be the time at which either the $B$ or $b$ allele disappears from the population. Suppose we take a random sample of $n$ individuals from the population at time $\tau$.
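As a small numerical illustration of the selection step just described, the sketch below looks only at the embedded jump chain of $X(t)$. It relies on one reading of the model that should be treated as an assumption: an increase of $X$ is proposed and accepted at rate $X(2N - X)/(2N)$, while a decrease occurs at rate $(1 - s)X(2N - X)/(2N)$, so each jump is upward with probability $1/(2 - s)$. The function names and the Monte Carlo check are illustrative and are not part of the paper.

```python
import random

def fixation_probability(two_n: int, s: float) -> float:
    """P(X reaches 2N before 0 | X(0) = 1) for the embedded jump chain.

    Down-steps are (1 - s) times as likely as up-steps, so the increments
    h(k+1) - h(k) of the hitting probability h form a geometric sequence
    with ratio (1 - s); normalising h(2N) = 1 gives h(1).
    """
    ratio = 1.0 - s
    return 1.0 / sum(ratio ** j for j in range(two_n))

def simulate_sweep(two_n: int, s: float, rng: random.Random) -> bool:
    """Run the jump chain from X = 1; return True if B fixes (X = 2N)."""
    x = 1
    p_up = 1.0 / (2.0 - s)  # assumed up-step probability, see lead-in
    while 0 < x < two_n:
        x += 1 if rng.random() < p_up else -1
    return x == two_n

def estimate_fixation(two_n: int, s: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the fixation probability."""
    rng = random.Random(seed)
    return sum(simulate_sweep(two_n, s, rng) for _ in range(trials)) / trials
```

Under this reading, `fixation_probability(two_n, s)` equals the gambler's-ruin value $s/(1 - (1 - s)^{2N})$, which is close to $s$ once $2N$ is moderately large.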
Let $\Theta$ be the partition of $\{1, \dots, n\}$ such that $i$ and $j$ are in the same block of $\Theta$ if and only if the $i$th and $j$th individuals in the sample have the same ancestor at time zero when we follow the ancestral lines associated with the neutral site of interest. The partition $\Theta$ then describes how the beneficial mutation affects the genealogy of the sample.

We have the following result concerning the distribution of $\Theta$. Here $Q_{p,n}$, for $p \in [0, 1]$, is the distribution of a random partition $\Pi$ obtained as follows. First, define a sequence of independent random variables $(\xi_i)_{i=1}^n$ such that $P(\xi_i = 1) = p$ and $P(\xi_i = 0) = 1 - p$ for $i = 1, \dots, n$. Then define $\Pi$ such that one block of $\Pi$ consists of $\{i \le n : \xi_i = 1\}$ and the remaining blocks of $\Pi$ are singletons.

Proposition 2.1. Fix $n \in \mathbb{N}$, and fix $s \in (0, 1)$. Assume there is a constant $C_0$ such that $r \le C_0/(\log N)$ for all $N$. Let $\alpha = r \log(2N)/s$, and let $p = e^{-\alpha}$.

1. There exists a positive constant $C$, depending continuously on $s$ and $\alpha$ but not depending on $N$, such that $|P(\Theta = \pi \mid X(\tau) = 2N) - Q_{p,n}(\pi)| \le C/(\log N)$ for all $\pi \in \mathcal{P}_n$.

2. Let $\kappa$ be the partition of $\{1, \dots, n\}$ into singletons. There exists a constant $C$, depending continuously on $s$ and $\alpha$ but not depending on $N$, such that $P(\Theta \ne \kappa \text{ and } X(\tau) = 0) \le C N^{-1/2}$.

Note that in this proposition, the selective advantage $s$ is assumed to be fixed, but the recombination probability $r$ depends on $N$. Part 1 of the proposition, which is a restatement of Theorem 1.1 of Schweinsberg and Durrett (2004), implies that as $N \to \infty$, the distribution of $\Theta$,

conditional on the event that a selective sweep occurs, converges to $Q_{p,n}$, where $p$ represents the approximate fraction of lineages that coalesce at the time of the selective sweep. Part 2 of the proposition, which we prove in Section 6, shows that lineages typically do not coalesce when the favorable $B$ allele dies out. The probability that a selective sweep occurs, and therefore Part 1 of the proposition applies, is $s/(1 - (1 - s)^{2N})$ (see Durrett (2002) or Schweinsberg and Durrett (2004)).

2.2 A model with recurrent beneficial mutations

To model a population in which beneficial mutations can occur repeatedly, we assume that beneficial mutations at different points on the chromosome occur at times of a Poisson process. The selective advantage that these mutations provide and the rate of recombination between the site of interest and the site of the mutation will be random. When there is a beneficial mutation in the population, the population will evolve as described in the previous subsection. Between these times, the population will follow the standard Moran model.

To be more precise, we will consider the chromosome to be the line segment $[-L, L]$. Our goal will be to describe the genealogy of the site 0. For each $N$, the beneficial mutations will be governed by a Poisson process $K_N$ on $\mathbb{R} \times [-L, L] \times [0, 1]$. If $(t, x, s)$ is a point in $K_N$, then at time $t$, a mutation, which provides a selective advantage of $s$, will appear at location $x$ on one of the $2N$ chromosomes. The intensity measure of $K_N$ will be $\lambda \times \mu_N$, where $\lambda$ denotes Lebesgue measure on $\mathbb{R}$ and $\mu_N$ is a finite measure on $[-L, L] \times [0, 1]$ which governs the rates of beneficial mutations. The recombination probabilities will be determined by a function $r_N : [-L, L] \to [0, 1]$. We assume that $r_N(0) = 0$ and $r_N$ is nonincreasing on $[-L, 0]$ and nondecreasing on $[0, L]$. Beginning at time $t$, the population will evolve according to the model described in the previous subsection of a population with a beneficial allele having selective advantage $s$ and recombination probability $r_N(x)$.
We let $\tau(t)$ denote the first time that the beneficial mutation that appears at time $t$ either disappears from the population or is present on all $2N$ chromosomes. Let $T_N = \{t : (t, x, s) \text{ is a point in } K_N \text{ for some } x \text{ and } s\}$ be the times at which beneficial mutations are proposed. Note, however, that we cannot define the evolution of the population as explained above if, for some $t_1, t_2 \in T_N$, the intervals $[t_1, \tau(t_1)]$ and $[t_2, \tau(t_2)]$ overlap. There has been some work in the biology literature on the question of how a selective sweep is affected by another selective sweep happening at the same time (see, for example, Barton (1995), Gerrish and Lenski (1998), and Kim and Stephan (2003)). However, as we will show, in our model this overlap occurs too infrequently to have any effect on our results, so we avoid the issue of defining the population during periods of overlap by allowing a new beneficial mutation to occur only when there is no other beneficial mutation currently in the population. That is, beneficial mutations will occur at the times in
$$T_N' = \{t \in T_N : \tau(u) < t \text{ for all } u \in T_N' \text{ such that } u < t\}.$$
Let
$$I_N = \bigcup_{t \in T_N'} [t, \tau(t)].$$
A beneficial mutation will be present in the population at time $u$ if and only if $u \in I_N$. For the intervals in $I_N$, the evolution of the population was defined in subsection 2.1. For the times in $\mathbb{R} \setminus I_N$, we will say that the population evolves according to the standard Moran model so that the evolution of the population is well-defined for all of $\mathbb{R}$.

To define the ancestral process $\Psi_N = (\Psi_N(t), t \ge 0)$, we sample $n$ of the $2N$ individuals at random from the population at time zero. We then define $\Psi_N(t)$ to be the partition of

$\{1, \dots, n\}$ such that $i$ and $j$ are in the same block of $\Psi_N(t)$ if and only if the $i$th and $j$th individuals in the sample got their allele at location 0 on the chromosome from the same ancestor at time $-Nt$. Note that we are again speeding up time by a factor of $N$ so that, if there are no beneficial mutations (i.e. if $\mu_N$ is the zero measure), the ancestral process $\Psi_N = (\Psi_N(t), t \ge 0)$ is Kingman's coalescent. When we do have beneficial mutations, the ancestral processes will converge as $N \to \infty$, under suitable conditions, to a coalescent with multiple collisions.

2.3 The main convergence theorem and examples

Pitman (1999) introduced coalescents with multiple collisions, in which many blocks of the partition can merge into one. These coalescent processes are in one-to-one correspondence with finite measures $\Lambda$ on $[0, 1]$, and the coalescent process associated with a particular measure $\Lambda$ is called the $\Lambda$-coalescent. We will consider here only $\mathcal{P}_n$-valued coalescents because they are what we will need to approximate the genealogy of a sample of size $n$. However, the constructions can be extended, using Kolmogorov's Extension Theorem, to yield coalescent processes that take their values in the set of partitions of $\mathbb{N} = \{1, 2, \dots\}$.

Suppose $(\Pi_n(t), t \ge 0)$ is the $\mathcal{P}_n$-valued $\Lambda$-coalescent. Then $\Pi_n(0)$ is the partition of $\{1, \dots, n\}$ into singletons. If $\Pi_n(t)$ has $b$ blocks, then every possible transition involves merging $k$ of the blocks into one, where $2 \le k \le b$. Denoting the rate of this transition by $\lambda_{b,k}$, we have
$$\lambda_{b,k} = \int_0^1 x^{k-2} (1 - x)^{b-k} \, \Lambda(dx). \tag{2.1}$$
If $\Lambda = \delta_0$, where $\delta_0$ denotes a unit mass at zero, then every transition that involves two blocks merging into one happens at rate one, and no other transitions are possible. Thus, the $\delta_0$-coalescent is Kingman's coalescent.

The theorem below states that when we do have beneficial mutations, the ancestral processes converge as $N \to \infty$, under suitable conditions, to a coalescent with multiple collisions. The multiple mergers happen at times of selective sweeps.
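To make the rate formula (2.1) concrete, here is a minimal sketch that evaluates $\lambda_{b,k} = \int_0^1 x^{k-2}(1-x)^{b-k}\,\Lambda(dx)$ when $\Lambda$ is a finite sum of point masses. Representing $\Lambda$ as a list of `(location, mass)` pairs is an assumption made for illustration, as are the function names; a measure with a density would need numerical integration instead.

```python
from math import comb

def merger_rate(b: int, k: int, atoms) -> float:
    """Rate lambda_{b,k} at which a given set of k out of b blocks merges.

    `atoms` is a list of (x, mass) pairs describing a purely atomic Lambda.
    Python evaluates 0.0 ** 0 as 1.0, so an atom at x = 0 contributes
    only when k == 2, which is the Kingman (pairwise) component.
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return sum(mass * x ** (k - 2) * (1 - x) ** (b - k) for x, mass in atoms)

def total_rate(b: int, atoms) -> float:
    """Total rate lambda_b of all mergers when there are b blocks."""
    return sum(comb(b, k) * merger_rate(b, k, atoms) for k in range(2, b + 1))

# Kingman's coalescent: Lambda is a unit mass at zero.
kingman = [(0.0, 1.0)]
# A unit mass at zero plus one extra atom (location and mass chosen arbitrarily).
with_sweeps = [(0.0, 1.0), (0.3, 0.45)]
```

For `kingman`, every pairwise rate is 1 and `total_rate(b, kingman)` is $\binom{b}{2}$. For any $\Lambda$ the rates satisfy the consistency relation $\lambda_{b,k} = \lambda_{b+1,k} + \lambda_{b+1,k+1}$, which gives a quick sanity check on an implementation.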
Note that the convergence is in the sense of finite-dimensional distributions. Convergence in the stronger Skorohod topology does not hold because, during the short time intervals when selective sweeps are taking place, $\Psi_N$ may undergo multiple transitions.

Theorem 2.2. Let $\mu$ be a finite measure on $[-L, L] \times [0, 1]$, and let $r : [-L, L] \to [0, \infty)$ be a bounded continuous function such that $r(0) = 0$ and $r$ is nonincreasing on $[-L, 0]$ and nondecreasing on $[0, L]$. Suppose that, as $N \to \infty$, the measures $N\mu_N$ converge weakly to $\mu$ and the functions $(\log N) r_N$ converge uniformly to $r$. Let $\eta$ be the measure on $(0, 1]$ such that
$$\eta([y, 1]) = \int_{-L}^{L} \int_0^1 s \, 1_{\{e^{-r(x)/s} \ge y\}} \, \mu(dx \times ds)$$
for all $y \in (0, 1]$. Let $\Lambda$ be the measure on $[0, 1]$ defined by $\Lambda = \delta_0 + \Lambda_0$, where $\Lambda_0(dx) = x^2 \eta(dx)$. Let $\Pi = (\Pi(t), t \ge 0)$ be the $\mathcal{P}_n$-valued $\Lambda$-coalescent. Then, as $N \to \infty$, the finite-dimensional distributions of $\Psi_N$ converge to the finite-dimensional distributions of $\Pi$.

Note that in Theorem 2.2, the recombination probability is $O(1/(\log N))$. The function $r$ is assumed to be monotone on $[-L, 0]$ and $[0, L]$ because the greater the distance between 0 and the

site of the mutation, the greater the likelihood of recombination between the two sites. Also, the rate of beneficial mutations is $O(1/N)$, so that the multiple mergers caused by selective sweeps and the ordinary mergers of two lineages at a time are happening on the same time scale. If the rate of selective sweeps were $o(1/N)$, then the multiple mergers would disappear in the limit. If selective sweeps occurred on a faster time scale than $O(1/N)$, then the multiple mergers would dominate for large $N$ and the limiting coalescent would have no $\delta_0$ component. Gillespie (2000) considers this possibility and proposes that it may explain why observed genetic variation does not appear to be as sensitive to population size as Kingman's coalescent model predicts. However, in this paper we focus on the case in which both types of mergers happen on the same time scale.

We now derive the limiting coalescent with multiple collisions in two natural examples.

Example 2.3. Consider the case in which we are concerned only with mutations at a single site, all of which have the same selective advantage. Fix $\alpha > 0$, and let $\mu_N = \alpha N^{-1} \delta_{(z,s)}$ for some $s \in (0, 1]$ and $z \in [-L, L]$. This means that beneficial mutations that provide selective advantage $s$ appear on the chromosome at site $z$ at times of a Poisson process. The measures $N\mu_N$ converge to $\mu = \alpha \delta_{(z,s)}$. Assume that the recombination functions $r_N$ are defined such that the sequence $(\log N) r_N$ converges uniformly to $r$, and let $\beta = r(z)$. Then, for all $y \in (0, 1]$, we have
$$\eta([y, 1]) = \int_{-L}^{L} \int_0^1 u \, 1_{\{e^{-r(x)/u} \ge y\}} \, \mu(dx \times du) = s\alpha \, 1_{\{e^{-\beta/s} \ge y\}}.$$
Therefore, $\eta$ consists of a mass $s\alpha$ at $p = e^{-\beta/s}$. It follows from Theorem 2.2 that the limiting coalescent process is the $\Lambda$-coalescent, where $\Lambda = \delta_0 + s\alpha p^2 \delta_p$. Thus, in addition to the mergers involving just two blocks, we have coalescence events at times of a Poisson process in which we flip $p$-coins for each lineage and merge the lineages whose coins come up heads.

Example 2.4.
It is also natural to consider the case in which mutations occur uniformly along the chromosome. For simplicity, we will assume that the selective advantage $s$ is fixed. Let $\lambda$ denote Lebesgue measure on $[-L, L]$. Suppose $\mu_N = N^{-1}(\alpha\lambda \times \delta_s)$, so the measures $N\mu_N$ converge to $\mu = \alpha\lambda \times \delta_s$. To model recombination occurring uniformly along the chromosome, we assume that the functions $(\log N) r_N$ converge uniformly to the function $r(x) = \beta|x|$, so the probability of recombination is proportional to the distance between the two sites on the chromosome. For all $y \in (0, 1]$, we have
$$\eta([y, 1]) = \alpha s \int_{-L}^{L} 1_{\{e^{-r(x)/s} \ge y\}} \, dx = 2\alpha s \int_0^L 1_{\{e^{-\beta x/s} \ge y\}} \, dx.$$
Since $e^{-\beta|x|/s} \ge y$ if and only if $|x| \le -(s/\beta)(\log y)$, we have
$$\eta([y, 1]) = \min\left\{ -\frac{2\alpha s^2}{\beta} \log y, \; 2\alpha s L \right\}.$$
Therefore, for $y \ge e^{-\beta L/s}$, we have
$$\frac{d}{dy} \eta([y, 1]) = -\frac{2\alpha s^2}{\beta y}.$$
Let $c = 2\alpha s^2/\beta$. It follows that $\eta$ has a density given by $g_L(y) = c/y$ for $e^{-\beta L/s} \le y \le 1$ and $g_L(y) = 0$ otherwise. By Theorem 2.2, the finite-dimensional distributions of the ancestral

processes $\Psi_N$ converge to those of the $\Lambda$-coalescent, where $\Lambda = \delta_0 + \Lambda_0$ and $\Lambda_0$ has density $h_L(y) = y^2 g_L(y)$. Note that as $L \to \infty$, the density $h_L(y)$ converges to $h(y)$, where $h(y) = cy$ for $y \in [0, 1]$ and $h(y) = 0$ otherwise. We can think of this as the limiting coalescent for an infinitely long chromosome.

Example 2.5. Finally, we show that any $\Lambda$-coalescent with a unit mass at zero can arise as a limit of ancestral processes in this model. We first show how to obtain coalescents of the form $\Lambda = \delta_0 + \Lambda_0$, where $\Lambda_0$ is a finite measure on $[\epsilon, 1]$ and $0 < \epsilon < 1$. Note that in Theorem 2.2, we have $\Lambda_0(dx) = x^2 \eta(dx)$, so it suffices to show that $\mu$ and $r$ can be chosen to make $\eta$ an arbitrary finite measure on $[\epsilon, 1]$. Let $G : [\epsilon, 1] \to [0, \infty)$ be any nonincreasing left-continuous function. We will choose $\mu$ and $r$ so that $\eta([y, 1]) = G(y)$ for $\epsilon \le y \le 1$ and $\eta([0, \epsilon)) = 0$. Let $L = -\frac{1}{2} \log \epsilon$, and let $\nu$ be the measure on $[-L, L]$ such that $\nu([-L, 0)) = 0$ and, for $\epsilon \le y \le 1$, $\nu([0, -\frac{1}{2} \log y]) = 2G(y)$. Suppose $r(x) = |x|$ and $\mu = \nu \times \delta_{1/2}$. Then, for $\epsilon \le y \le 1$,
$$\eta([y, 1]) = \int_{-L}^{L} \int_0^1 s \, 1_{\{e^{-r(x)/s} \ge y\}} \, \mu(dx \times ds) = \frac{1}{2} \int_{-L}^{L} 1_{\{e^{-2|x|} \ge y\}} \, \nu(dx) = \frac{1}{2} \nu([0, -(\log y)/2]) = G(y),$$
as claimed. Thus, we can get the $\Lambda$-coalescent in the limit if $\Lambda((0, \epsilon)) = 0$. We can obtain an arbitrary $\Lambda$-coalescent by then taking a limit as $L \to \infty$ (or $\epsilon \to 0$) as in Example 2.4.

3 Approximation by a coalescent with simultaneous multiple collisions

A key ingredient in the proof of Theorem 2.2 is part 1 of Proposition 2.1. Part 1 of Proposition 2.1 says that, up to an error of $O(1/(\log N))$, we can approximate the effect of a selective sweep on the genealogy by flipping a $p$-coin for each lineage and merging the lineages whose coins come up heads. However, Durrett and Schweinsberg (2004a) observed in simulations that for $N$ between 10,000 and 1,000,000, the approximation in Proposition 2.1 works poorly, largely because it is possible for multiple groups of lineages to coalesce at the time of a selective sweep.
By taking this into account, they were able to give a more complicated approximation that works much better in simulations and has an error of only $O(1/(\log N)^2)$.

Before stating this result, we review Kingman's (1978) paintbox construction of exchangeable random partitions of $\{1, \dots, n\}$. Let
$$\Delta = \Big\{ (x_1, x_2, \dots) : x_1 \ge x_2 \ge \dots \ge 0, \; \sum_{i=1}^{\infty} x_i \le 1 \Big\},$$
and let $G$ be a probability measure on $\Delta$. We define a $G$-partition $\Pi$ of $\{1, \dots, n\}$ as follows. Let $Y = (Y_1, Y_2, \dots)$ be a $\Delta$-valued random variable with distribution $G$. Define a sequence $(Z_i)_{i=1}^n$ to be conditionally i.i.d. given $Y$ such that $P(Z_i = j \mid Y) = Y_j$ for all positive integers $j$ and $P(Z_i = 0 \mid Y) = 1 - \sum_{j=1}^{\infty} Y_j$. Then define $\Pi$ to be the partition such that distinct integers $i$ and $j$ are in the same block if and only if $Z_i = Z_j \ge 1$. We denote the distribution of a $G$-partition of $\{1, \dots, n\}$ by $Q_{G,n}$. Note that if $G$ is a unit mass at $(p, 0, 0, \dots)$, then $Q_{G,n} = Q_{p,n}$.
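The paintbox construction translates directly into a short sampler. The sketch below handles the case where the frequency sequence is a fixed vector `x` with finitely many nonzero entries (that is, $G$ is a unit mass at that point); the function name and the list-of-blocks representation are choices made here for illustration only.

```python
import random

def paintbox_partition(n: int, x, rng: random.Random):
    """Sample a partition of {1,...,n} from the paintbox with frequencies x.

    Each i gets label j >= 1 with probability x[j-1] and label 0 with the
    leftover probability 1 - sum(x). Integers sharing a label j >= 1 form
    one block; every integer with label 0 is a singleton.
    """
    labels = []
    for _ in range(n):
        u = rng.random()
        label, acc = 0, 0.0
        for j, xj in enumerate(x, start=1):
            acc += xj
            if u < acc:
                label = j
                break
        labels.append(label)

    blocks = {}
    singletons = []
    for i, z in enumerate(labels, start=1):
        if z >= 1:
            blocks.setdefault(z, []).append(i)
        else:
            singletons.append([i])
    return sorted(list(blocks.values()) + singletons)
```

Calling `paintbox_partition(n, [p], rng)` draws from $Q_{p,n}$: each lineage independently joins the single large block with probability $p$. Passing several positive frequencies allows more than one non-singleton block, which is the situation the improved approximation is designed to capture.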

Next, we define a family of distributions $R(\theta, M)$ on $\Delta$ by using a stick-breaking construction. Let $\theta \in [0, 1]$, and let $M$ be a positive integer. Let $(W_k)_{k=2}^M$ be independent random variables such that $W_k$ has a Beta$(1, k - 1)$ distribution. Let $(\zeta_k)_{k=2}^M$ be a sequence of independent random variables such that $P(\zeta_k = 1) = \theta$ and $P(\zeta_k = 0) = 1 - \theta$ for all $k$. For $k = 2, 3, \dots, M$, let $V_k = \zeta_k W_k$. To perform the stick breaking, we first break off a fraction $W_M$ of the unit interval, then break off a fraction $W_{M-1}$ of what is left over, and so on until we get down to $W_2$. For $k = 2, \dots, M$, the length of the $k$th fragment is $\tilde{Y}_k = V_k \prod_{j=k+1}^{M} (1 - V_j)$, and the length of the first fragment is $\tilde{Y}_1 = \prod_{j=2}^{M} (1 - V_j)$. Note that $\sum_{k=1}^{M} \tilde{Y}_k = 1$. Let $Y = (Y_1, Y_2, \dots, Y_M, 0, 0, \dots)$ be the sequence obtained by ranking the interval lengths $\tilde{Y}_1, \dots, \tilde{Y}_M$ in decreasing order and then appending an infinite sequence of zeros. Finally, let $R(\theta, M)$ be the distribution of $Y$.

These distributions $R(\theta, M)$ were studied in Durrett and Schweinsberg (2004b), who used them to approximate the distribution of family sizes in a Yule process with infinitely many types. They arise in the proposition below because, after a beneficial mutation, the number of lineages with the $B$ allele that do not eventually die out can be approximated by a Yule process. The result below is Theorem 1.2 of Schweinsberg and Durrett (2004).

Proposition 3.1. Fix $n \in \mathbb{N}$, and fix $s \in (0, 1)$. Assume there is a constant $C_0$ such that $r \le C_0/(\log N)$ for all $N$. Let $\alpha = r \log(2N)/s$, and let $p = e^{-\alpha}$. Then there exists a positive constant $C$, depending continuously on $s$ and $\alpha$ but not depending on $N$, such that
$$|P(\Theta = \pi \mid X(\tau) = 2N) - Q_{R(r/s, \lfloor 2Ns \rfloor), n}(\pi)| \le C/(\log N)^2$$
for all $\pi \in \mathcal{P}_n$, where $\lfloor m \rfloor$ denotes the greatest integer less than or equal to $m$.
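The stick-breaking recipe for $R(\theta, M)$ can be sketched as follows. The code follows the fragment-length formulas as stated (with $V_k = \zeta_k W_k$ and $W_k$ drawn from Beta$(1, k-1)$) and returns only the $M$ ranked lengths, leaving the trailing zeros implicit; the function name and that return convention are assumptions made for this illustration.

```python
import random

def sample_R(theta: float, M: int, rng: random.Random):
    """Draw the ranked fragment lengths (Y_1, ..., Y_M) of R(theta, M)."""
    # V_k = zeta_k * W_k for k = 2, ..., M
    v = {}
    for k in range(2, M + 1):
        w = rng.betavariate(1, k - 1)            # W_k ~ Beta(1, k - 1)
        zeta = 1 if rng.random() < theta else 0  # zeta_k ~ Bernoulli(theta)
        v[k] = zeta * w

    lengths = []
    remaining = 1.0  # product of (1 - V_j) over the j already processed
    for k in range(M, 1, -1):
        lengths.append(v[k] * remaining)  # fragment k: V_k * prod_{j>k}(1 - V_j)
        remaining *= 1.0 - v[k]
    lengths.append(remaining)             # first fragment: prod_{j>=2}(1 - V_j)

    return sorted(lengths, reverse=True)
```

A draw from `sample_R` can be used as the frequency vector of a paintbox partition, which is how $Q_{R(\theta, M), n}$ is built: first sample the ranked lengths, then assign each of the $n$ lineages to a fragment with probability equal to its length. When `theta` is 0 no stick is ever broken and the output is a single fragment of length 1.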
Because the improved approximation allows many groups of lineages to coalesce at the time of a selective sweep, this result suggests that, for finite $N$, a coalescent with simultaneous multiple collisions should provide a better approximation of the ancestral process than a coalescent with multiple collisions. Coalescents with simultaneous multiple collisions, which were studied by Möhle and Sagitov (2001), Schweinsberg (2000), and Bertoin and Le Gall (2003), have the property that many blocks can merge at once into a single block, and many such mergers can occur simultaneously. Coalescents with simultaneous multiple collisions are in one-to-one correspondence with finite measures $\Xi$ on $\Delta$.

Suppose $\pi$ is a partition of $\{1, \dots, n\}$ whose blocks are $B_1, \dots, B_m$, and suppose $\pi'$ is a partition of $\{1, \dots, n'\}$ with $n' \ge m$ whose blocks are $B_1', \dots, B_k'$. Following Bertoin and Le Gall (2003), define the coagulation of $\pi$ by $\pi'$ to be the partition whose blocks are given by $\bigcup_{j \in B_i'} B_j$ for $i = 1, \dots, k$. Suppose $(\Pi_n(t), t \ge 0)$ is the $\mathcal{P}_n$-valued $\Xi$-coalescent. If there are $b$ blocks at time $t-$ and a merger occurs at time $t$, then there exists a unique partition $\pi \in \mathcal{P}_b$ such that $\Pi_n(t)$ is the coagulation of $\Pi_n(t-)$ by $\pi$. If $\pi$ has $r + s$ blocks, $s$ of which are singletons and the other $r$ of which have sizes $k_1, \dots, k_r$, where $b = k_1 + \dots + k_r + s$, then the rate of this transition is
$$\lambda_{b; k_1, \dots, k_r; s} = \int_{\Delta} Q_{\delta_x, b}(\pi) \Big( \sum_{j=1}^{\infty} x_j^2 \Big)^{-1} \Xi_0(dx) + a \, 1_{\{r = 1, \, k_1 = 2\}}, \tag{3.1}$$
where $\delta_x$ denotes a unit mass at $x = (x_1, x_2, \dots)$ and $\Xi$ has been written as $a\delta_{(0,0,\dots)} + \Xi_0$ with $\Xi_0(\{(0, 0, \dots)\}) = 0$. Coalescents with multiple collisions are a special case in which $\Xi$ is concentrated on points in which only the first coordinate is nonzero.

Coalescents with multiple and simultaneous multiple collisions can be constructed from Poisson point processes (see Pitman (1999) and Schweinsberg (2000)). Consider a Poisson process on $(0, \infty) \times \mathcal{P}_n$ whose intensity measure is the product of Lebesgue measure on $(0, \infty)$ and a measure $L$ on $\mathcal{P}_n$ defined as follows. Let $S \subset \mathcal{P}_n$ be the set of all partitions consisting of one block of size 2 and $n - 2$ singletons. If $\pi \in \mathcal{P}_n$, let $L(\pi) = 0$ if $\pi$ is the partition consisting of $n$ singletons. Otherwise, let
$$L(\pi) = \int_{\Delta} Q_{\delta_x, n}(\pi) \Big( \sum_{j=1}^{\infty} x_j^2 \Big)^{-1} \Xi_0(dx) + a \, 1_{\{\pi \in S\}}. \tag{3.2}$$
Since $L$ is a finite measure, it is easy to define $\Pi_n = (\Pi_n(t), t \ge 0)$ such that $\Pi_n(0)$ is the partition consisting of $n$ singletons and, at the times of points $(t, \pi)$ of the Poisson point process, the partition $\Pi_n(t)$ is the coagulation of $\Pi_n(t-)$ by $\pi$, and these are the only jump times of $\Pi_n$. This coalescent process is the $\mathcal{P}_n$-valued $\Xi$-coalescent. The construction of the $\Lambda$-coalescent is the same, except that if $\pi$ has at least one block that is not a singleton, we define
$$L(\pi) = \int_0^1 Q_{p,n}(\pi) \, p^{-2} \, \Lambda_0(dp) + a \, 1_{\{\pi \in S\}}, \tag{3.3}$$
where $\Lambda = a\delta_0 + \Lambda_0$ and $\Lambda_0(\{0\}) = 0$.

Under some additional assumptions, most significantly restricting the selective advantage resulting from each beneficial mutation to be at least $\epsilon > 0$, we are able to obtain bounds on the difference between the finite-dimensional distributions of $\Psi_N$ and the finite-dimensional distributions of the approximating coalescent process. Proposition 3.2 below shows that indeed the coalescent with simultaneous multiple collisions gives a more accurate approximation.

Proposition 3.2. Let $\mu$ be a finite measure on $[-L, L] \times [\epsilon, 1]$, where $\epsilon > 0$, and let $r : [-L, L] \to [0, 1]$ be a function such that $r(0) = 0$ and $r$ is nonincreasing on $[-L, 0]$ and nondecreasing on $[0, L]$. Suppose that, for all $N$, we have $\mu_N = N^{-1}\mu$. Also, assume that $r_N(x) = r(x)/\log(2N)$ for all $N$ and $x$. Fix times $0 < u_1 < \dots < u_m$, and let $\pi_1, \dots, \pi_m \in \mathcal{P}_n$.

1. Define $\eta$ and $\Lambda$ as in Theorem 2.2. Let $\Pi = (\Pi(t), t \ge 0)$ be the $\mathcal{P}_n$-valued $\Lambda$-coalescent.
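Then there exists a constant $C$ such that
$$|P(\Psi_N(u_i) = \pi_i \text{ for } i = 1, \dots, m) - P(\Pi(u_i) = \pi_i \text{ for } i = 1, \dots, m)| \le \frac{C}{\log N}.$$

2. Let $G_N$ be the measure on $\Delta$ such that for all measurable subsets $A$, we have
$$G_N(A) = \int_{-L}^{L} \int_0^1 s \, R(r_N(x)/s, \lfloor 2Ns \rfloor)(A) \, \mu(dx \times ds).$$
Let $\Xi_N$ be the measure on $\Delta$ given by $\Xi_N = \delta_{(0,0,\dots)} + \Xi_{N,0}$, where $\Xi_{N,0}$ is defined by $\Xi_{N,0}(dx) = \big( \sum_{j=1}^{\infty} x_j^2 \big) G_N(dx)$. Let $\Upsilon_N = (\Upsilon_N(t), t \ge 0)$ be the $\mathcal{P}_n$-valued $\Xi_N$-coalescent. Then there exists a constant $C$ such that
$$|P(\Psi_N(u_i) = \pi_i \text{ for } i = 1, \dots, m) - P(\Upsilon_N(u_i) = \pi_i \text{ for } i = 1, \dots, m)| \le \frac{C}{(\log N)^2}.$$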

4 Segregating sites and pairwise differences

One motivation for modeling a population that experiences recurrent selective sweeps by coalescents with multiple or simultaneous multiple collisions is that these coalescent models can provide insight into tests used to detect selective sweeps. In view of part 2 of Proposition 3.2 and the simulation results in Durrett and Schweinsberg (2004a), there should be little loss of accuracy in studying the behavior of these tests under the assumption that the genealogy of a sample follows a coalescent with simultaneous multiple collisions.

One commonly used test is based on Tajima's D-statistic (see Tajima (1989)). Given a sample of $n$ strands of DNA from the same region on a chromosome, let $\Delta_{ij}$ be the number of sites at which the $i$th and $j$th segments differ, and let $\Delta_n = \binom{n}{2}^{-1} \sum_{i < j} \Delta_{ij}$ be the average number of pairwise differences over the $\binom{n}{2}$ possible pairs. Let $S_n$ be the number of segregating sites in the sample, that is, the number of sites at which at least one pair of segments differs. Tajima's D-statistic compares the statistics $\Delta_n$ and $S_n$.

Suppose the ancestral history of a sample of $n$ individuals is given by a coalescent with multiple or simultaneous multiple collisions. Let $\lambda_b$ be the total rate of all mergers when the coalescent has $b$ blocks. Assume that, on the time scale of the coalescent process, mutations happen at rate $\theta/2$. Any mutation on the $i$th or $j$th lineage before these lineages coalesce will cause the $i$th and $j$th segments to differ at some site. Since the expected time for these lineages to coalesce is $\lambda_2^{-1}$, we have $E[\Delta_{ij}] = \theta \lambda_2^{-1}$. Therefore
$$E[\Delta_n] = \theta \lambda_2^{-1}. \tag{4.1}$$
Note that $\lambda_2 = \Lambda([0, 1])$ for coalescents with multiple collisions and $\lambda_2 = \Xi(\Delta)$ for coalescents with simultaneous multiple collisions.

To calculate the expected number of segregating sites, we note that any mutation in the ancestral tree before all $n$ lineages have coalesced into one adds to the number of segregating sites.
If, at some time, the coalescent has exactly locks, the expected time that the coalescent has locks is λ 1. Let G n () e the proaility that the coalescent, starting with n locks, will have exactly locks at some time. Then E[S n ]= θ λ 1 G n (). (4.) Although we do not have a closed-form expression for G n (), these quantities can e calculated recursively ecause (.1) and (3.1) allow us to express G n () in terms of G k () for k<n. As a result, it would not e difficult to evaluate the expression in (4.) numerically. Suppose the ancestral process is given y Kingman s coalescent, which would e the case if there were no selective sweeps. Then λ = ( ) for all. Also, the numer of locks never decreases y more than one at a time, so G n () = 1 whenever n. It follows that E[ n ]=θ and E[S n ]= θ ( ) 1 1 = θ 1 = θh n 1, (4.3) where h n 1 = n 1 i=1 (1/i). Thus, E[ n S n /h n 1 ] =. This oservation is the asis for Tajima s D-statistic, which is given y D = n S n /h n 1 an S n + n S n (S n 1), (4.4) 11

where $a_n$ and $b_n$ are somewhat complicated constants that are chosen to make the variance of $D$ approximately one when the ancestral tree is given by Kingman's coalescent. See section 4.1 of Durrett (2002) for details.

After a selective sweep, the new mutants will tend to have low frequency. As a result, a recent selective sweep should decrease $\Delta_n$ more than $S_n$, causing the numerator of Tajima's D-statistic to be negative. Braverman et al. (1995) found in simulations that Tajima's D-statistic indeed tends to be negative after a selective sweep. Simonsen, Churchill, and Aquadro (1995) studied this question further and argued that unless the selective sweep was recent, Tajima's D-statistic had relatively little power to detect selective sweeps. See also Przeworski (2002), who discusses the power of Tajima's D-statistic to detect selective sweeps.

Our coalescent approximation allows us to obtain the following result regarding the expected number of segregating sites when the population experiences recurrent selective sweeps.

Proposition 4.1. Consider a $\Lambda$-coalescent in which $\Lambda = \delta_0 + \Lambda_0$, where $\Lambda_0(\{0\}) = 0$, or a $\Xi$-coalescent in which $\Xi = \delta_{(0,0,\dots)} + \Xi_0$ and $\Xi_0(\{(0,0,\dots)\}) = 0$. Let $\alpha_b = \lambda_b - \binom{b}{2}$. Suppose
$$\sum_{b=2}^{\infty} \frac{\alpha_b \log b}{b^2} < \infty. \tag{4.5}$$
Then, there exists a constant $\rho$ such that
$$\lim_{n \to \infty} \left( E[S_n] - \theta h_{n-1} \right) = -\rho. \tag{4.6}$$
Furthermore, defining $G_\infty(b) = \lim_{n \to \infty} G_n(b)$, we have
$$\rho = \frac{\theta}{2} \sum_{b=2}^{\infty} b \left( \binom{b}{2}^{-1} - \lambda_b^{-1} \right) + \frac{\theta}{2} \sum_{b=2}^{\infty} b \lambda_b^{-1} (1 - G_\infty(b)). \tag{4.7}$$

The condition (4.5) prevents $\Lambda_0$ or $\Xi_0$ from having too much mass near zero. Note that (4.1) implies that $E[\Delta_n]$ decreases by a constant as a result of the beneficial mutations, while Proposition 4.1 implies that when (4.5) holds, $E[S_n/h_{n-1}]$ decreases by approximately $\rho/h_{n-1}$, which is $O(1/(\log n))$. Therefore, Proposition 4.1 shows that for sufficiently large samples we do expect Tajima's D-statistic to be negative when the population is affected by recurrent selective sweeps. Before proving this proposition, we consider some examples.

Example 4.2. Suppose, as in Example 2.3, we have a $\Lambda$-coalescent in which $\Lambda = \delta_0 + s\alpha p^2 \delta_p$. Since $p$-mergers occur at rate $s\alpha$, we have $\lambda_b \le \binom{b}{2} + s\alpha$ and thus $\alpha_b \le s\alpha$ for all $b$. Condition (4.5) follows immediately.

Suppose instead we have the $\Lambda$-coalescent of Example 2.4, where $\Lambda = \delta_0 + \Lambda_0$ and $\Lambda_0(dx) = cx \, dx$. Note that $\alpha_b$ is the same as the total merger rate of the $\Lambda_0$-coalescent when there are $b$ blocks. Using the fact that if $Z \sim \text{Binomial}(b, x)$ then $P(Z \ge 2) = 1 - (1-x)^b - bx(1-x)^{b-1}$,

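we have
$$\begin{aligned}
\alpha_b &= \int_0^1 \left(1 - (1-x)^b - bx(1-x)^{b-1}\right) x^{-2} \, \Lambda_0(dx) \\
&= c \int_0^1 \left(1 - (1-x)^b - bx(1-x)^{b-1}\right) x^{-1} \, dx \\
&\le c \int_0^1 \left(1 - (1-x)^b\right) x^{-1} \, dx \\
&= c \int_0^{1/b} \left(1 - (1-x)^b\right) x^{-1} \, dx + c \int_{1/b}^1 \left(1 - (1-x)^b\right) x^{-1} \, dx \\
&\le c \int_0^{1/b} b \, dx + c \int_{1/b}^1 x^{-1} \, dx = c(1 + \log b),
\end{aligned} \tag{4.8}$$
which implies (4.5).

Example 4.3. Although (4.5) holds in the natural cases given in Examples 2.3 and 2.4, we show here that it does not hold for all coalescents. Suppose $\Lambda = \delta_0 + \Lambda_0$, where $\Lambda_0$ is the uniform distribution on $(0, 1)$. Note that there exists a constant $C > 0$ such that if $Z \sim \text{Binomial}(b, x)$ with $x \ge 1/b$ and $b \ge 2$, then $P(Z \ge 2) \ge C$. Therefore,
$$\alpha_b = \int_0^1 \left(1 - (1-x)^b - bx(1-x)^{b-1}\right) x^{-2} \, dx \ge \int_{1/b}^1 \left(1 - (1-x)^b - bx(1-x)^{b-1}\right) x^{-2} \, dx \ge C \int_{1/b}^1 x^{-2} \, dx = C(b - 1),$$
so (4.5) does not hold in this case.

Proof of Proposition 4.1. When the coalescent has $n+1$ blocks, the probability that the next coalescence event will take the coalescent down to fewer than $n$ blocks is at most $[\lambda_{n+1} - \binom{n+1}{2}]/\lambda_{n+1}$. Therefore, if $b \le n$, then
$$|G_{n+1}(b) - G_n(b)| \le \frac{\lambda_{n+1} - \binom{n+1}{2}}{\lambda_{n+1}} = \frac{\alpha_{n+1}}{\lambda_{n+1}} \le \frac{2\alpha_{n+1}}{n(n+1)}. \tag{4.9}$$
Therefore, when (4.5) holds, the sequence $(G_n(b))_{n=b}^{\infty}$ is Cauchy and thus has a limit $G_\infty(b)$. It follows from (4.2) and (4.3) that
$$\begin{aligned}
E[S_n] - \theta h_{n-1} &= \frac{\theta}{2} \sum_{b=2}^{n} b \lambda_b^{-1} G_n(b) - \frac{\theta}{2} \sum_{b=2}^{n} b \binom{b}{2}^{-1} \\
&= -\frac{\theta}{2} \sum_{b=2}^{n} b \left( \binom{b}{2}^{-1} - \lambda_b^{-1} \right) - \frac{\theta}{2} \sum_{b=2}^{n} b \lambda_b^{-1} (1 - G_\infty(b)) + \frac{\theta}{2} \sum_{b=2}^{n} b \lambda_b^{-1} (G_n(b) - G_\infty(b)).
\end{aligned} \tag{4.10}$$
To prove Proposition 4.1, we need to take the limit as $n \to \infty$ of the three terms on the right-hand side of (4.10).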

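For the first term, we note that
$$b \left( \binom{b}{2}^{-1} - \lambda_b^{-1} \right) = \frac{b \left( \lambda_b - \binom{b}{2} \right)}{\lambda_b \binom{b}{2}} = \frac{b \alpha_b}{\lambda_b \binom{b}{2}} \le \frac{4\alpha_b}{b(b-1)^2}.$$
Therefore, when (4.5) holds, we have a summable series and
$$\lim_{n \to \infty} \frac{\theta}{2} \sum_{b=2}^{n} b \left( \binom{b}{2}^{-1} - \lambda_b^{-1} \right) = \frac{\theta}{2} \sum_{b=2}^{\infty} b \left( \binom{b}{2}^{-1} - \lambda_b^{-1} \right). \tag{4.11}$$
For the second term, note that (4.9) and the fact that $G_b(b) = 1$ imply
$$\sum_{b=2}^{\infty} b \lambda_b^{-1} (1 - G_\infty(b)) \le \sum_{b=2}^{\infty} \frac{2}{b-1} \sum_{m=b}^{\infty} \frac{2\alpha_{m+1}}{m(m+1)} = \sum_{m=2}^{\infty} \frac{4\alpha_{m+1}}{m(m+1)} \sum_{b=2}^{m} \frac{1}{b-1} \le \sum_{m=2}^{\infty} \frac{4\alpha_{m+1} (1 + \log(m-1))}{m(m+1)},$$
which is finite by (4.5). Therefore,
$$\lim_{n \to \infty} \frac{\theta}{2} \sum_{b=2}^{n} b \lambda_b^{-1} (1 - G_\infty(b)) = \frac{\theta}{2} \sum_{b=2}^{\infty} b \lambda_b^{-1} (1 - G_\infty(b)). \tag{4.12}$$
Finally, for the third term,
$$\begin{aligned}
\limsup_{n \to \infty} \sum_{b=2}^{n} b \lambda_b^{-1} |G_n(b) - G_\infty(b)| &\le \limsup_{n \to \infty} \sum_{b=2}^{n} \frac{2}{b-1} \sum_{m=n}^{\infty} \frac{2\alpha_{m+1}}{m(m+1)} \\
&\le \limsup_{n \to \infty} 4(1 + \log(n-1)) \sum_{m=n}^{\infty} \frac{\alpha_{m+1}}{m(m+1)} \\
&\le \limsup_{n \to \infty} \frac{4(1 + \log(n-1))}{\log n} \sum_{m=n}^{\infty} \frac{\alpha_{m+1} \log m}{m(m+1)} = 0
\end{aligned} \tag{4.13}$$
by (4.5). The proposition follows from (4.10), (4.11), (4.12), and (4.13).

5 The number of singletons

Fu and Li (1993) proposed another test to detect departures from Kingman's coalescent. They considered the ancestral tree in which the leaves are the $n$ individuals in the sample. They defined the branches connecting a leaf to an internal node to be external branches and the other branches to be internal branches. Let $\eta_e$ denote the number of mutations on external branches, and let $\eta_i$ be the number of mutations on internal branches. Every mutation produces a segregating site, so $\eta_e + \eta_i = S_n$. If a mutation occurs on an external branch, the mutant gene appears on just one of the $n$ individuals in the sample, while if a mutation occurs on an internal branch, the mutant gene appears on between 2 and $n - 1$ of the individuals in the sample. Therefore, to determine $\eta_e$, we simply count the number of mutations that appear on just one of the sampled chromosomes.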
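The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample is encoded as rows of 0/1 alleles with 1 marking the mutant allele (which presumes an outgroup), each mutation falls at its own site, and the function name `sample_statistics` and the dictionary keys are hypothetical, not taken from the paper.

```python
def sample_statistics(haplotypes):
    """Compute S_n, eta_e, eta_i and Delta_n from n rows of 0/1 alleles.

    Assumption: 1 marks the mutant (derived) allele, 0 the ancestral one.
    """
    n = len(haplotypes)
    # Number of sampled chromosomes carrying the mutant allele at each site.
    counts = [sum(column) for column in zip(*haplotypes)]
    # A site is segregating if at least one pair of samples differs there.
    segregating = [c for c in counts if 0 < c < n]
    s_n = len(segregating)
    # Mutations seen on exactly one chromosome lie on external branches.
    eta_e = sum(1 for c in segregating if c == 1)
    eta_i = s_n - eta_e
    # A site with c mutant copies separates c * (n - c) of the pairs.
    pairs = n * (n - 1) // 2
    delta_n = sum(c * (n - c) for c in segregating) / pairs
    h = sum(1.0 / i for i in range(1, n))  # h_{n-1}
    return {
        "S_n": s_n,
        "eta_e": eta_e,
        "eta_i": eta_i,
        "Delta_n": delta_n,
        "tajima_numerator": delta_n - s_n / h,
        "fu_li_numerator": s_n - h * eta_e,
    }


sample = [
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
stats = sample_statistics(sample)
```

The two numerators returned here are the quantities $\Delta_n - S_n/h_{n-1}$ and $S_n - h_{n-1}\eta_e$; the normalizing constants of the D-statistics are deliberately left out.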

Note that unless an outgroup is available, it will not be possible to distinguish between a mutation that appears on one of the sampled chromosomes and a mutation that appears on $n - 1$ of the sampled chromosomes. Fu and Li (1993) proposed a modification of their test for when there is no outgroup, but for the analysis in this section, we assume that we have an outgroup that enables us to make this distinction.

Let $J_n$ be the sum of the lengths of the external branches. In terms of the associated coalescent process, $J_n$ is the sum, over $i$ between 1 and $n$, of the amount of time that the integer $i$ is in a singleton block. Let $I_n$ be the sum of the lengths of the internal branches. Assuming, as before, that mutations occur at rate $\theta/2$ on the time scale of the coalescent process, we have $E[\eta_e \mid J_n] = (\theta/2) J_n$ and $E[\eta_i \mid I_n] = (\theta/2) I_n$.

Fu and Li's D-statistic is based on comparing $\eta_i$ with $(h_{n-1} - 1)\eta_e$. Note that $\eta_i - (h_{n-1} - 1)\eta_e = S_n - h_{n-1}\eta_e$. To see that this has mean zero when the ancestral tree is given by Kingman's coalescent, we follow the explanation on p. 163 of Durrett (2002). In the case of Kingman's coalescent, (4.3) gives $E[S_n] = \theta h_{n-1}$. Therefore, $E[S_n - h_{n-1}\eta_e] = \theta h_{n-1} - \theta h_{n-1} E[J_n]/2$, so it remains to show that $E[J_n] = 2$. Let $K_n$ be the amount of time that the integer 1 is in a singleton block of the partition, so $E[J_n] = nE[K_n]$. Let $T_n$ be the amount of time before the first coalescence event, and note that $E[T_n] = 2/[n(n-1)]$. The probability that 1 coalesces with another integer at time $T_n$ is $2/n$, and this event is independent of $T_n$. If 1 does not coalesce at this time, then the expected additional time that 1 is a singleton is $E[K_{n-1}]$. Therefore, we get the recursion
$$E[K_n] = \frac{2}{n} E[T_n] + \frac{n-2}{n} E[T_n + K_{n-1}] = \frac{2}{n(n-1)} + \frac{n-2}{n} E[K_{n-1}].$$
Note that $E[K_2] = 1$, and then it is easy to show by induction that $E[K_n] = 2/n$ for all $n \ge 2$, and so $E[J_n] = 2$ for all $n$, as claimed.
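The induction can be checked mechanically. The sketch below iterates the recursion for $E[K_n]$ in exact rational arithmetic and compares it with $2/n$; the function name `kingman_singleton_times` is an illustrative choice, not notation from the paper.

```python
from fractions import Fraction


def kingman_singleton_times(n_max):
    """Iterate E[K_n] = 2/(n(n-1)) + ((n-2)/n) E[K_{n-1}], starting from E[K_2] = 1."""
    expected_k = {2: Fraction(1)}
    for n in range(3, n_max + 1):
        expected_t = Fraction(2, n * (n - 1))  # E[T_n] under Kingman's coalescent
        expected_k[n] = expected_t + Fraction(n - 2, n) * expected_k[n - 1]
    return expected_k


expected_k = kingman_singleton_times(100)
# E[K_n] = 2/n, hence E[J_n] = n E[K_n] = 2 for every n in the range checked.
matches = all(value == Fraction(2, n) for n, value in expected_k.items())
```

Using `Fraction` rather than floating point makes the equality test exact, so the check is not affected by rounding.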
We can write Fu and Li's D-statistic as
$$D = \frac{S_n - h_{n-1}\eta_e}{\sqrt{c_n S_n + d_n S_n^2}}, \tag{5.1}$$
where, as in (4.4), $c_n$ and $d_n$ are constants chosen to make the variance of the statistic approximately one when the genealogy is given by Kingman's coalescent. Details of the variance computation are given in section 4.2 in Durrett (2002), where an error of Fu and Li (1993) is corrected.

When multiple mergers cause many lineages to coalesce at once, one expects $I_n$ to be reduced more than $J_n$ because there is still an external branch associated with each leaf, but there are fewer internal branches because of multiple mergers. This would cause Fu and Li's D-statistic to be negative. The next proposition shows that this is indeed the case.

Proposition 5.1. Let $(\Pi_n(t), t \ge 0)$ be a $\mathcal{P}_n$-valued $\Lambda$-coalescent in which $\Lambda = \delta_0 + \Lambda_0$, where $\Lambda_0(\{0\}) = 0$, or a $\mathcal{P}_n$-valued $\Xi$-coalescent in which $\Xi = \delta_{(0,0,\dots)} + \Xi_0$ and $\Xi_0(\{(0,0,\dots)\}) = 0$. Let $\alpha_b = \lambda_b - \binom{b}{2}$, and suppose (4.5) holds. Then
$$\lim_{n \to \infty} E[S_n - h_{n-1}\eta_e] = -\rho, \tag{5.2}$$
where $\rho$ is the constant defined in (4.7).
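For a concrete feel for the quantities in Proposition 5.1, the following sketch evaluates $E[S_n]$ from (4.2) and $E[J_n]$ for a $\Lambda$-coalescent whose non-Kingman part is a single atom, so that mergers in which each block takes part independently with probability $p$ occur at a constant rate. Several pieces are assumptions made for illustration: the parameter names `sweep_rate` and `p`, the function names, and the exact recursion used for the singleton time, which is the $\Lambda$-coalescent analogue of the Kingman recursion and is not written out in the text.

```python
from math import comb


def merger_rates(b, sweep_rate, p):
    """Rate at which exactly k of the b blocks merge into one, for k = 2..b."""
    # Sweep mergers: each block takes part independently with probability p.
    rates = {k: sweep_rate * comb(b, k) * p**k * (1 - p) ** (b - k)
             for k in range(2, b + 1)}
    rates[2] += comb(b, 2)  # Kingman part: every pair merges at rate 1
    return rates


def expected_values(n, theta, sweep_rate, p):
    """Return E[S_n], E[J_n] and the Kingman value theta * h_{n-1}."""
    lam, jump = {}, {}
    for b in range(2, n + 1):
        rates = merger_rates(b, sweep_rate, p)
        lam[b] = sum(rates.values())
        # jump[b][m]: probability that the next merger leaves m = b - k + 1 blocks
        jump[b] = {b - k + 1: r / lam[b] for k, r in rates.items()}

    # hit[b] plays the role of G_n(b): chance the block count ever equals b.
    hit = {n: 1.0}
    for m in range(n - 1, 1, -1):
        hit[m] = sum(hit[b] * jump[b].get(m, 0.0) for b in range(m + 1, n + 1))
    e_s = 0.5 * theta * sum(b * hit[b] / lam[b] for b in range(2, n + 1))

    # single[b]: expected time a tagged block stays a singleton among b blocks.
    # After a merger leaving m blocks, it was untouched with probability (m - 1) / b.
    single = {1: 0.0}
    for b in range(2, n + 1):
        single[b] = 1.0 / lam[b] + sum(
            prob * (m - 1) / b * single[m] for m, prob in jump[b].items())

    h = sum(1.0 / i for i in range(1, n))
    return {"E_S": e_s, "E_J": n * single[n], "kingman_E_S": theta * h, "h": h}


kingman = expected_values(30, theta=1.0, sweep_rate=0.0, p=0.3)
sweeps = expected_values(30, theta=1.0, sweep_rate=2.0, p=0.3)
# Mean of the numerator S_n - h_{n-1} * eta_e, using E[eta_e] = (theta / 2) E[J_n].
numerator_mean = sweeps["E_S"] - sweeps["h"] * 0.5 * 1.0 * sweeps["E_J"]
```

With `sweep_rate=0.0` the code reduces to Kingman's coalescent and should reproduce $E[S_n] = \theta h_{n-1}$ and $E[J_n] = 2$ up to rounding. The binomial coefficients are converted to floats, so this version is only suitable for moderate $n$ (a few hundred at most).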

The key to the proof of this proposition is the following lemma.

Lemma 5.2. Under the assumptions of Proposition 5.1, there is a positive constant $C$ such that
$$0 \le E[2 - J_n] \le \frac{C}{n} \sum_{b=2}^{n} \frac{\alpha_b}{b} \tag{5.3}$$
for all $n$.

The first inequality in (5.3), which does not require condition (4.5), shows that the expected sum of the lengths of the external branches is never greater than 2, which means that it is largest for Kingman's coalescent. The second inequality gives a rather sharp bound on the difference. Recall that in Example 2.3, we have $\alpha_b \le s\alpha$, so $E[2 - J_n] \le C'(\log n)/n$ for some other constant $C'$. In Example 2.4, (4.8) gives $\alpha_b \le c(1 + \log b) \le c(1 + \log n)$, which implies $E[2 - J_n] \le C''(\log n)^2/n$ for some constant $C''$. Thus, in these examples, the lengths of the external branches are affected very little by multiple mergers when the sample size is large. The reason is that, in large samples, a lot of coalescence occurs very quickly, so most ancestral lines have merged with at least one other ancestral line before the first multiple merger takes place.

Proof of Lemma 5.2. We start by proving the first inequality in (5.3) by induction. As before, let $K_n$ be the amount of time that the integer 1 is in a singleton block. We need to show that $E[K_n] \le 2/n$ for all $n$. First, note that $E[K_2] = \lambda_2^{-1} \le 1$. Now, suppose for some $n \ge 3$, we have $E[K_j] \le 2/j$ for $j = 2, \dots, n-1$, and consider $E[K_n]$. Let $T_n$ be the time of the first merger when the coalescent starts with $n$ blocks, and let $B$ be the number of blocks involved in the merger at time $T_n$. Note that $B$ is independent of $T_n$. Conditional on $B$, the probability that 1 merges with at least one other block at time $T_n$ is $B/n$. If this does not happen, then at least $n - B + 1$ blocks remain after the merger, so by the induction hypothesis, the expected time after $T_n$ that $\{1\}$ will remain a singleton is at most $2/(n - B + 1)$. Therefore,
$$E[K_n \mid T_n, B] \le \left( \frac{B}{n} \right) T_n + \left( \frac{n - B}{n} \right) \left( T_n + \frac{2}{n - B + 1} \right) = T_n + \frac{2(n - B)}{n(n - B + 1)}.$$
Since $2 \le B \le n$, we have $(n - B)/(n - B + 1) \le (n - 2)/(n - 1)$. Also, $E[T_n] = \lambda_n^{-1} \le 2/[n(n-1)]$, so
$$E[K_n] \le \frac{2}{n(n-1)} + \frac{2(n-2)}{n(n-1)} = \frac{2}{n},$$
which proves the first inequality.

The proof of the second inequality requires a coupling argument. Let $(\Pi_n(t), t \ge 0)$ be the coalescent process defined in the statement of Proposition 5.1, and let $(\Upsilon_n(t), t \ge 0)$ be Kingman's coalescent, started from the partition of $1, \dots, n$ into singletons. We may assume that the coalescent processes $\Pi_n$ and $\Upsilon_n$ are constructed from Poisson processes $N_1$ and $N_2$ respectively on $(0, \infty) \times \mathcal{P}_n$, as described in section 3. That is, whenever $(t, \pi)$ is a point of $N_1$, the partition $\Pi_n(t)$ is the coagulation of $\Pi_n(t-)$ by $\pi$, and whenever $(t, \pi)$ is a point of $N_2$, the partition $\Upsilon_n(t)$ is the coagulation of $\Upsilon_n(t-)$ by $\pi$. Furthermore, these are the only jump times of $\Pi_n$ and $\Upsilon_n$. Let $L_1$ and $L_2$ be the intensity measures of the second coordinate for the Poisson processes $N_1$ and $N_2$ respectively. Then, for $\pi \in \mathcal{P}_n$, we have $L_2(\pi) = 1$ if $\pi$ consists of one block of size 2 and $n - 2$ singletons, and $L_2(\pi) = 0$ otherwise. Also, $L_1(\pi) \ge L_2(\pi)$ for

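all $\pi \in \mathcal{P}_n$. Therefore, we may assume that the Poisson processes $N_1$ and $N_2$ are coupled such that if $(t, \pi)$ is a point of $N_2$ then $(t, \pi)$ is a point of $N_1$. The points $(t, \pi)$ in both $N_1$ and $N_2$ correspond to mergers in which two blocks coalesce at a time, while the points $(t, \pi)$ in $N_1$ but not $N_2$ correspond to multiple mergers caused by selective sweeps.

To compare the two processes, note that $K_n = \inf\{t : \{1\} \text{ is not a singleton in } \Pi_n(t)\}$, and let $K_n' = \inf\{t : \{1\} \text{ is not a singleton in } \Upsilon_n(t)\}$. We have $E[J_n] = nE[K_n]$. By our previous results for Kingman's coalescent, we have $E[K_n'] = 2/n$, and so $E[2 - J_n] = nE[K_n' - K_n]$. Let $\tau = \inf\{t : \Pi_n(t) \ne \Upsilon_n(t)\}$, where we say $\tau = \infty$ if $\Pi_n(t) = \Upsilon_n(t)$ for all $t$. For $\pi \in \mathcal{P}_n$, denote by $|\pi|$ the number of blocks in $\pi$. Since $\Pi_n(t) = \Upsilon_n(t)$ for all $t < \tau$, we have
$$E[2 - J_n] = nE[K_n' - K_n] \le nE[(K_n' - \tau) 1_{\{\tau < K_n'\}}] = n \sum_{b=2}^{n} E[(K_n' - \tau) 1_{\{\tau < K_n'\}} 1_{\{|\Upsilon_n(\tau)| = b\}}].$$
For $b = 1, 2, \dots, n$, define $T_b = \inf\{t : |\Upsilon_n(t)| = b\}$. If $\tau < K_n'$ and $|\Upsilon_n(\tau)| = b$, then $K_n' > T_b$. Therefore,
$$E[2 - J_n] \le n \sum_{b=2}^{n} E[K_n' - \tau \mid \{\tau < K_n'\} \cap \{|\Upsilon_n(\tau)| = b\}] \, P(\{K_n' > T_b\} \cap \{|\Upsilon_n(\tau)| = b\}). \tag{5.4}$$
If $\tau < K_n'$ and $|\Upsilon_n(\tau)| = b$, then $\{1\}$ is one of $b$ blocks of $\Upsilon_n(\tau)$, and by our previous results on Kingman's coalescent, the expected time before it merges with another block is $2/b$. Thus, we have
$$E[K_n' - \tau \mid \{\tau < K_n'\} \cap \{|\Upsilon_n(\tau)| = b\}] = \frac{2}{b}. \tag{5.5}$$
Note that $K_n' > T_b$ whenever $\{1\}$ remains a singleton at the time that Kingman's coalescent is down to $b$ blocks. Whenever the coalescent goes from $j$ blocks to $j - 1$, the probability that the integer 1 is involved in the merger is $2/j$, so
$$P(K_n' > T_b) = \prod_{j=b+1}^{n} \left( 1 - \frac{2}{j} \right) \le \exp\left( -\sum_{j=b+1}^{n} \frac{2}{j} \right) \le \exp\left( 2 - 2\int_b^n x^{-1} \, dx \right) = e^2 \left( \frac{b}{n} \right)^2. \tag{5.6}$$
If $|\Upsilon_n(\tau)| = b$, then both $\Pi_n$ and $\Upsilon_n$ have the same $b$ blocks at time $T_b$, but at time $\tau$ the process $\Pi_n$ has a transition but $\Upsilon_n$ does not. Since the total merger rate for $\Pi_n$ after time $T_b$ is $\lambda_b = \alpha_b + \binom{b}{2}$ and the total merger rate for $\Upsilon_n$ after time $T_b$ is $\binom{b}{2}$, we have
$$P(|\Upsilon_n(\tau)| = b \mid K_n' > T_b) \le \frac{\alpha_b}{\lambda_b} \le \frac{2\alpha_b}{b(b-1)}. \tag{5.7}$$
Combining (5.4)-(5.7), we get
$$E[2 - J_n] \le n \sum_{b=2}^{n} \frac{2}{b} \cdot e^2 \left( \frac{b}{n} \right)^2 \cdot \frac{2\alpha_b}{b(b-1)} = \sum_{b=2}^{n} \frac{4e^2 \alpha_b}{(b-1)n} \le \frac{C}{n} \sum_{b=2}^{n} \frac{\alpha_b}{b},$$
which is the second inequality in (5.3).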
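As a sanity check on (5.6), the product can be evaluated exactly for small cases. The sketch below is illustrative only, and the function name `prob_singleton_survives` is hypothetical. It also compares the product with the closed form $b(b-1)/(n(n-1))$ obtained by telescoping, which the text does not state but which follows directly from writing each factor as $(j-2)/j$.

```python
from fractions import Fraction
from math import e


def prob_singleton_survives(n, b):
    """P(K'_n > T_b): block {1} avoids every pairwise merger from n blocks down to b."""
    prob = Fraction(1)
    for j in range(b + 1, n + 1):
        prob *= Fraction(j - 2, j)  # factor 1 - 2/j
    return prob


cases = [(n, b) for n in range(2, 40) for b in range(2, n + 1)]
telescopes = all(
    prob_singleton_survives(n, b) == Fraction(b * (b - 1), n * (n - 1))
    for n, b in cases)
within_bound = all(
    float(prob_singleton_survives(n, b)) <= e**2 * (b / n) ** 2
    for n, b in cases)
```

In these cases the exact product is already at most $(b/n)^2$, so the factor $e^2$ in (5.6) is a convenient rather than a tight constant; only the order $(b/n)^2$ matters for the lemma.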

Proof of Proposition 5.1. We have
$$E[S_n - h_{n-1}\eta_e] = (E[S_n] - \theta h_{n-1}) + h_{n-1}(\theta - E[\eta_e]) = (E[S_n] - \theta h_{n-1}) + \frac{h_{n-1}\theta}{2} E[2 - J_n]. \tag{5.8}$$
By Proposition 4.1, $\lim_{n \to \infty} (E[S_n] - \theta h_{n-1}) = -\rho$. It thus remains only to show that the second term on the right-hand side of (5.8) goes to zero as $n \to \infty$. Let $\epsilon > 0$. By (4.5), there exists a positive integer $N$ such that $\sum_{b=N}^{\infty} \alpha_b b^{-2} (1 + \log b) < \epsilon$. Therefore, by Lemma 5.2,
$$\begin{aligned}
\limsup_{n \to \infty} \frac{h_{n-1}\theta}{2} E[2 - J_n] &\le \limsup_{n \to \infty} \frac{C h_{n-1} \theta}{2n} \sum_{b=2}^{n} \frac{\alpha_b}{b} \\
&\le \limsup_{n \to \infty} \frac{C h_{n-1} \theta}{2n} \sum_{b=2}^{N-1} \frac{\alpha_b}{b} + \frac{C\theta}{2} \limsup_{n \to \infty} \sum_{b=N}^{n} \frac{\alpha_b h_{n-1}}{bn} \\
&\le \frac{C\theta}{2} \sum_{b=N}^{\infty} \frac{\alpha_b (1 + \log b)}{b^2} \le \frac{C\theta\epsilon}{2}.
\end{aligned}$$
Since this is true for all $\epsilon > 0$, and since $E[2 - J_n] \ge 0$ for all $n$ by Lemma 5.2, we have
$$\lim_{n \to \infty} \frac{h_{n-1}\theta}{2} E[2 - J_n] = 0,$$
which completes the proof of the proposition.

We conclude this section with some comments about the power of Tajima's D-statistic and Fu and Li's D-statistic to detect selective sweeps. The numerators of these two statistics, which are $\Delta_n - S_n/h_{n-1}$ and $S_n - h_{n-1}\eta_e$, each have mean zero when the ancestral process is Kingman's coalescent. The expected values of these two numerators both converge to a negative constant as the sample size goes to infinity when multiple mergers can occur. These statistics are used to test for departures from Kingman's coalescent. If the goal is to test for multiple mergers caused by selective sweeps, one would reject the null hypothesis of no selective sweeps if the value of the statistic is too small (i.e. more negative than would be expected with Kingman's coalescent).

A natural question, then, is how much power these tests have to detect selective sweeps. While a full analysis of this question would require a simulation study, we can obtain some insight from the analytical results presented above. From the values of $a_n$ and $b_n$ in (4.4), which can be found in section 4.1 of Durrett (2002), we see that the standard deviation of the numerator of Tajima's D-statistic is $O(1)$ when the genealogy is given by Kingman's coalescent. However, from the values of $c_n$ and $d_n$ in (5.1), which can be found in section 4.2 of Durrett (2002), we see that the numerator of Fu and Li's D-statistic has a standard deviation which is $O(\log n)$. This means that, for large $n$, moderate negative values for the numerator of Fu and Li's D-statistic are not strong evidence against the null model of Kingman's coalescent, and thus a test based on Fu and Li's D-statistic will most likely have low power. These observations are consistent with simulation results of Simonsen, Churchill, and Aquadro (1995), who found that Tajima's D-statistic has more power to detect selective sweeps than Fu and Li's D-statistic.

Neither of these tests has the desirable feature of many tests in classical statistics, which is that for all $\alpha > 0$, the power of the level $\alpha$ test tends to 1 as the sample size $n$ tends to

