Can evolution paths be explained by chance alone? arxiv: v1 [math.pr] 12 Oct The model

Similar documents
Predicting Fitness Effects of Beneficial Mutations in Digital Organisms

Accessibility percolation with backsteps

Haploid & diploid recombination and their evolutionary impact

V. Evolutionary Computing. Read Flake, ch. 20. Assumptions. Genetic Algorithms. Fitness-Biased Selection. Outline of Simplified GA

Evolving Antibiotic Resistance in Bacteria by Controlling Mutation Rate

V. Evolutionary Computing. Read Flake, ch. 20. Genetic Algorithms. Part 5A: Genetic Algorithms 4/10/17. A. Genetic Algorithms

IV. Evolutionary Computing. Read Flake, ch. 20. Assumptions. Genetic Algorithms. Fitness-Biased Selection. Outline of Simplified GA

Evolution in Action: The Power of Mutation in E. coli

There s plenty of time for evolution

Introduction to Digital Evolution Handout Answers

Biology 11 UNIT 1: EVOLUTION LESSON 2: HOW EVOLUTION?? (MICRO-EVOLUTION AND POPULATIONS)

Non-Adaptive Evolvability. Jeff Clune

Properties of adaptive walks on uncorrelated landscapes under strong selection and weak mutation

WXML Final Report: Chinese Restaurant Process

Lecture Notes: Markov chains

Introduction to population genetics & evolution

Computational approaches for functional genomics

Local and Stochastic Search

lim Bounded above by M Converges to L M

Random Boolean Networks

SEMELPAROUS PERIODICAL INSECTS

September Math Course: First Order Derivative

WHEN MORE SELECTION IS WORSE

Chapter 13 - Inverse Functions

The Evolution of Gene Dominance through the. Baldwin Effect

The Evolution of Sex Chromosomes through the. Baldwin Effect

Local Search & Optimization

A Simple Haploid-Diploid Evolutionary Algorithm

Lectures on Probability and Statistical Models

Modeling Noise in Genetic Sequences

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

Evolutionary Computation

Year 10 Science Learning Cycle 3 Overview

Last Update: April 7, 201 0

A Simple Model of Unbounded Evolutionary Versatility as a Largest-Scale Trend in Organismal Evolution

Theorem (Special Case of Ramsey s Theorem) R(k, l) is finite. Furthermore, it satisfies,

BLAST: Target frequencies and information content Dannie Durand

MA 777: Topics in Mathematical Biology

Why EvoSysBio? Combine the rigor from two powerful quantitative modeling traditions: Molecular Systems Biology. Evolutionary Biology

Basic modeling approaches for biological systems. Mahesh Bule

Formalizing the gene centered view of evolution

Bayesian Updating with Continuous Priors Class 13, Jeremy Orloff and Jonathan Bloom

1 Sequences of events and their limits

Bacterial Genetics & Operons

The first bound is the strongest, the other two bounds are often easier to state and compute. Proof: Applying Markov's inequality, for any >0 we have

Local Search & Optimization

EXAM 3 MAT 167 Calculus I Spring is a composite function of two functions y = e u and u = 4 x + x 2. By the. dy dx = dy du = e u x + 2x.

Statistical mechanics of fitness landscapes

Lecture 11: Non-Equilibrium Statistical Physics

Experiments with. Cascading Design. Ben Kovitz Fluid Analogies Research Group Indiana University START

Math 266: Autonomous equation and population dynamics

Assignment 16 Assigned Weds Oct 11

2. Overproduction: More species are produced than can possibly survive

Citation PHYSICAL REVIEW E (2005), 72(1) RightCopyright 2005 American Physical So

5.3 METABOLIC NETWORKS 193. P (x i P a (x i )) (5.30) i=1

A.I.: Beyond Classical Search

Systems Biology: A Personal View IX. Landscapes. Sitabhra Sinha IMSc Chennai

MAT30S Grade 10 Review Mr. Morris

Introduction to course: BSCI 462 of BIOL 708 R

6.842 Randomness and Computation March 3, Lecture 8

BIOINFORMATICS: An Introduction

Processes of Evolution

Mixing Times and Hitting Times

arxiv: v1 [math.pr] 6 Jan 2014

Haploid-Diploid Algorithms

Organizing Diversity Taxonomy is the discipline of biology that identifies, names, and classifies organisms according to certain rules.

Closeness to the Diagonal for Longest Common Subsequences

6.207/14.15: Networks Lecture 12: Generalized Random Graphs

MATH 19B FINAL EXAM PROBABILITY REVIEW PROBLEMS SPRING, 2010

Predator escape: an ecologically realistic scenario for the evolutionary origins of multicellularity. Student handout

Evolving plastic responses to external and genetic environments

Chaos and Liapunov exponents

SIMPLE RANDOM WALKS: IMPROBABILITY OF PROFITABLE STOPPING

Interdisciplinary Lively Application Project Spread of Disease Activity

Some Review Problems for Exam 2: Solutions

14.30 Introduction to Statistical Methods in Economics Spring 2009

Extensive evidence indicates that life on Earth began more than 3 billion years ago.

So in terms of conditional probability densities, we have by differentiating this relationship:

Derivation of Itô SDE and Relationship to ODE and CTMC Models

Computational Biology: Basics & Interesting Problems

Logarithmic scaling of planar random walk s local times

Lecture 22. Introduction to Genetic Algorithms

The Derivative of a Function

Mechanisms of evolution in experiment and theory. Christopher Knight

Bioinformatics: Network Analysis

8 Basics of Hypothesis Testing

Statistical Models in Evolutionary Biology An Introductory Discussion

Spatial Moran model. Rick Durrett... A report on joint work with Jasmine Foo, Kevin Leder (Minnesota), and Marc Ryser (postdoc at Duke)

Lecture 4: September 19

Biological Networks Analysis

Network motifs in the transcriptional regulation network (of Escherichia coli):

Population growth: Galton-Watson process. Population growth: Galton-Watson process. A simple branching process. A simple branching process

56:198:582 Biological Networks Lecture 8

Lecture 2: Convergence of Random Variables

Name Date #92 pg. 1. Growth: Tables, Graphs & Evaluating Equations Complete the tables and graph each function, then answer the questions.

Miller & Levine Biology

Name: Firas Rassoul-Agha

Bi 8 Lecture 11. Quantitative aspects of transcription factor binding and gene regulatory circuit design. Ellen Rothenberg 9 February 2016

Markov Chains and Hidden Markov Models. = stochastic, generative models

BIOL 1010 Introduction to Biology: The Evolution and Diversity of Life. Spring 2011 Sections A & B

Transcription:

Can evolution paths be explained by chance alone? Rinaldo B. Schinazi Department of Mathematics University of Colorado at Colorado Springs rinaldo.schinazi@uccs.edu arxiv:1810.05705v1 [math.pr] 12 Oct 2018 Abstract We propose a purely probabilistic model to explain the evolution path of a population maximum fitness. We show that after n births in the population there are about ln n upwards jumps. This is true for any mutation probability and any fitness distribution and therefore suggests a general law for the number of upwards jumps. Simulations of our model show that a typical evolution path has first a steep rise followed by long plateaux. Moreover, independent runs show parallel paths. This is consistent with what was observed by Lenski and Travisano (1994) in their bacteria experiments. Keywords: evolution, probability model, adaptive walk The model Lenski and Travisano (1994) studied the evolution of the bacteria Escherichia coli for 10,000 generations. They graphed the average fitness as a function of time. We were intrigued by the regularity of their curves (steep rise at first and then long plateaux) and the fact that different lines of bacteria had distinct but somehow parallel evolution paths. Can a simple probability model account for such curves? This is what we attempt to answer in this paper. We now describe our model. Let µ be the probability distribution for individual fitness. Let 0 < s < 1 be the mutation probability per birth. We start with a single individual whose fitness is a λ 0 sampled from the distribution µ. At every discrete time unit there is exactly one birth in the population. There are two possibilities. With probability 1 s the new individual has no mutation. We assign to the new individual a fitness that has previously appeared in the population. The exact way a fitness is assigned to the new individual is not important for our purposes. We could for instance sample a random individual in the population and assign the sampled individual fitness to the new individual. With probability s the new individual has a mutation. We sample a new λ from the distribution µ. This new λ will be the fitness of the new individual. Assuming mutations appear or not independently at each birth, the number of mutations at time n has a binomial distribution b(n, s) with parameters n and s. Assuming the distribution µ is continuous every sampled fitness is distinct from all previously sampled fitnesses. Hence, at time n there is a sequence λ 0, λ 1, λ 2,..., λ b(n,s) (numbered in the order of appearance) of distinct fitnesses that have been sampled from the distribution

µ. Moreover, every individual born up to time n has been assigned one of these b(n, s) fitnesses. The number of upwards jumps Let r(n) be the number of records in the sequence λ 0, λ 1, λ 2,..., λ b(n,s). More precisely, r(n) is the number of indices k in {0, 1,..., b(n, s)} such that λ k = max{λ 0, λ 1,..., λ k }. In words, r(n) is the number of times that the sequence of fitnesses reaches a maximum. Hence, at time n there are r(n) upwards jumps in the evolution path. Assume that µ is a continuous probability distribution with support in (0, + ). Our main result is r(n) (1) lim n ln n = 1. Here are a few consequences of this result. The limit in (1) does not depend on the mutation probability s nor on the probability distribution µ! That is, in this model the number of upwards mutations follows a general law which is independent of the mutation probability and of the distribution of the fitness. By the Law Large Numbers the total number of mutations b(n, s) up to time n is of order ns for n large. Hence, (1) shows that the fraction of upwards mutations as compared to the total number of mutations is of order ln n. ns When an upwards mutation appears it need not persist in the population. By bad luck it may disappear before giving birth. Hence, ln n is really an upper bound on the number of upwards mutations. Simulation We graph the maximum fitness as a function of time for three independent runs, see the figure below. We use for µ (i.e. the fitness of the distribution) a mean 1 exponential distribution and we take the mutation probability per birth to be s = 0.02. For each run the total number of births is 10 6. The initial individual is sampled independently for each run from µ. The number of jumps per run are 11, 11 and 7, respectively. We observe a steep rise in the beginning followed by long plateaux. The three runs are somewhat parallel. Literature There is a vast literature on adaptive walks that goes back to at least Gillepsie (1983), see also Orr (2002) and (2003). Our model is somewhat related to these models. However, 2

our point of view is different. Many papers are concerned with the number of steps that a single DNA sequence makes in a certain fitness landscape before being stuck at a local maximum. On the other hand we are concerned with the global maximum of a population for which all jumps are allowed. Our main purpose is to test whether chance alone can produce an evolution path for the whole population consistent with biological observations. Next we describe an adaptive walk first studied by Kauffman and Levin (1987) for which there is a result reminiscent of ours. In that model a genome is a string of n sites. Every site has a 0 or a 1. The genome evolves by flipping one site at a time. Each of the 2 n possible configurations of the genome is assigned a fitness a priori. In the simplest variation of this model fitnesses are assigned independently to every configuration according to a uniform distribution. An accessible path is a path going from all 0 s configuration to all 1 s configuration along a sequence of configurations that increases fitness at every step. Hegarty and Martinsson (2014) proved that the probability that an accessible path exists is of order ln n as n goes to infinity. Note that the fraction of n upwards mutations in our model is of the same order. Discussion Our model shows that by chance alone one can obtain evolution paths consistent with those obtained experimentally by Lenski and Travisano (1994) for bacteria lines. Moreover, independent runs of our model have somewhat parallel trajectories, see the figure below. This is also what is observed by Lenski and Travisano (1994) for different bacteria lines. They formulate two hypotheses to explain this. The first is that the bacteria lines evolve in identical environments and the second is that large population sizes will trigger identical mutations in both lines. In our model the environment is determined by the fitness distribution µ. We also find that the specific trajectories depend on µ but the general shape of the trajectory (i.e. fast rising at first and then reaching successive plateaux) does not. As for the second hypothesis, in our model two mutations are never identical and we still get parallel trajectories. Hence, our model gives arguments against the hypotheses formulated by Lenski and Travisano (1994) but supports their general conclusion that our experiment demonstrates the crucial role of chance events. Proof of (1) The proof of (1) is a consequence of a classical result in probability theory that we now state. Let X 1, X 2,..., X n be a sequence of independent identically distributed random variables with the same continuous distribution. Let R n be the number of records in this sequence. That is, R n is the number indices k in {1, 2,..., n} such that X k = max{x 1, X 2,..., X k }. 3

Then, R n (2) lim n ln n = 1. For a proof of (2) see for instance Port (1994). We now turn to the proof of (1). Recall that the total number of mutations up to time n is b(n, s), a binomial random variable with parameters n and s. Hence, the sequence of λ s up to time n has length b(n, s) and r(n) (i.e. the number of records in that sequence) is such that r(n) = R b(n,s). Since b(n, s) goes to infinity with n (by the Law of Large Numbers) we have by (2) that lim n r(n) ln b(n, s) = 1, where the convergence above as well as all subsequent convergences occurs almost surely (i.e. with probability one). By the Law of Large Numbers lim n b(n,s) n = s > 0. Therefore, lim n ln b(n,s) ln n = 1. Observe that r(n) ln n = r(n) ln b(n, s) ln b(n, s) ln n. We have shown that both fractions on the r.h.s. converge to 1. So does the l.h.s. The proof of (1) is complete. Remark. Intuitively one may expect that a higher probability mutation will trigger more upwards mutations. Our result (1) shows that in the long run this is not so. The total number of mutations is increasing as a function of s but the number of upwards mutations is not. We explain why. As noted before the total number of mutations b(n, s) up to time n has a binomial distribution with parameters n and s. Let 0 < s 1 < s 2 < 1. It is easy to couple the models with mutation probabilities s 1 and s 2 respectively so that for every n 1, b(n, s 1 ) b(n, s 2 ) and {λ 0,..., λ b(n,s1 )} {λ 0,..., λ b(n,s2 )}. In particular, at any given time the model with s 2 has had more mutations and has a larger maximum fitness than the model with s 1. On the other hand the model with s 2 need not have more upwards mutations than the model with s 1. References J.H. Gillepsie (1983) A simple stochastic gene substitution model. Theoretical Population Biology 23, 202-215. P. Hegarty and S. Martinsson (2014) On the existence of accessible paths in various models of fitness landscapes. Annals of Applied Probability 24, 1375-1395. S. Kaufman and S. Levin (1987) Towards a general theory of adaptive walks in rugged landscapes. Journal of theoretical biology 128, 11-45. 4

R.E. Lenski and M. Travisano (1994) Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. PNAS 91, 6808-6814. H.A. Orr (2002) The population genetics of adaptation: the adaptation of DNA sequences. Evolution 56, 1317-1330. H.A. Orr (2003) A minimum on the mean number of steps taken in adaptive walks. Journal of theoretical biology 220, 241-247. S. C. Port (1994) Theoretical probability for applications. Wiley. 5

Maximum fitness 12 10 8 6 4 2 0 0 1 2 3 4 5 6 7 8 9 10 Time 10 5 6