Neutral Networks of RNA Genotypes and RNA Evolution in silico Peter Schuster Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien RNA Secondary Structures in Dijon Dijon, 24. 26.06.2002
No new principle will declare itself from below a heap of facts. Sir Peter Medawar, 1985
The genotypes or genomes of individuals and species, being reproductively related ensembles of individuals, are DNA or RNA sequences. They are changing from generation to generation through mutation and recombination. Genotypes unfold into phenotypes or organisms, which are the targets of the evolutionary selection process. Point mutations are single nucleotide exchanges. The Hamming distance of two sequences is the minimal number of single nucleotide exchanges that mutually converts the two sequence into each other.
A A A A A U U U U U U C C C C C C C C G G G G G G G G A U C G = adenylate = uridylate = cytidylate = guanylate 5 - -3 Genotype: The sequence of an RNA molecule consisting of monomers chosen from four classes.
Phenotype: Three-dimensional structure of phenylalanyl transfer-rna
Sequence 5'-End 3'-End GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYA UCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA 3'-End 5'-End 70 Secondary Structure 10 60 50 20 30 40 Symbolic Notation 5'-End 3'-End Definition and formation of the secondary structure of phenylalanyl-trna
CGTCGTTACAATTTAGGTTATGTGCGAATTCACAAATTGAAAATACAAGAG... CGTCGTTACAATTTAAGTTATGTGCGAATTCCCAAATTAAAAACACAAGAG... Hamming distance d (S,S ) = H 1 2 4 (i) (ii) (iii) d (S,S) = 0 H 1 1 d (S,S) = d (S,S) H 1 2 H 2 1 d (S,S) d (S,S) + d (S,S) H 1 3 H 1 2 H 2 3 The Hamming distance induces a metric in sequence space
Hydrogen bonds Hydrogen bonding between nucleotide bases is the principle of template action of RNA and DNA.
5' 3' Plus Strand G C C C G Synthesis 5' 3' Plus Strand G C C C G C 3' G Synthesis 5' 3' Plus Strand G C C C G Minus Strand C G G G C 3' 5' Complex Dissociation 5' 3' Plus Strand G C C C G 5' Minus Strand C G G + G C 3' Complementary replication as the simplest copying mechanism of RNA
(A) + I 1 f 1 I 1 + I 1 dx / dt = f x - x j j j j Φ =( f j - Φ) xj (A) + I 2 f 2 I 2 + I 2 Φ =Σ i f i x i ; Σ x = 1 ; i,j i i =1,2,...,n [A] = a = constant f m = max { f; j=1,2,...,n} j (A) + I j f j Ij + Ij x (t) 1 for t m (A) + I m f m I m + I m s = (f m+1 -f m )/f m succession of temporarily fittest variants: m m+1... (A) + I n f n I n + I n Selection of the fittest or fastest replicating species I m
Stock Solution [A] = a 0 Reaction Mixture: A; I, k=1,2,... k k 1 A + I 2 I 1 1 d 1 k 2 A + I 2 I 2 2 d 2 k 3 A + I 2 I 3 3 d 3 k 4 A + I 2 I 4 4 d 4 k 5 A + I 2 I 5 5 d 5 Replication in the flow reactor P.Schuster & K.Sigmund, Dynamics of evolutionary optimization, Ber.Bunsenges.Phys.Chem. 89: 668-682 (1985)
Concentration of stock solution a 0 A + I + I 1 2 A + I 1 A k 1 A + I 2 I 1 1 d 1 k 2 A + I 2 I 2 2 d 2 k 3 A + I 2 I 3 3 d 3 k 4 A + I 2 I 4 4 d 4 k 5 A + I 2 I 5 5 d 5 k > k > k > k > k 1 2 3 4 5 Flow rate r = R -1 Selection in the flow reactor: Reversible replication reactions
Concentration of stock solution a 0 A + I 1 A f 1 A + I 2 I 1 1 f 2 A + I 2 I 2 2 f 3 A + I 2 I 3 3 f 4 A + I 2 I 4 4 f 5 A + I 2 I 5 5 f > f > f > f > f 1 2 3 4 5 Flow rate r = R -1 Selection in the flow reactor: Irreversible replication reactions
1 Fraction of advantageous variant 0.8 0.6 0.4 0.2 s = 0.1 s = 0.02 s = 0.01 0 0 200 400 600 800 1000 Time [Generations] Selection of advantageous mutants in populations of N = 10 000 individuals
Σ dx / dt = f x - x j i iq ji i j Φ I 1 + Ij Φ = Σ f x i i i ; Σ x = 1 ; i i Σ Q i ij = 1 f j Q 1j I 2 + Ij Q = (1-p) p ij n-d(i,j) d(i,j) p... Error rate per digit f j Q 2j d(i,j)... Hamming distance between I i and Ij (A) + I j f j Q jj Ij + Ij [A] = a = constant f j Q nj I n + Ij Chemical kinetics of replication and mutation
Concentration Master sequence Mutant cloud Sequence space The molecular quasispecies in sequence space
The RNA model considers RNA sequences as genotypes and simplified RNA structures, called secondary structures, as phenotypes. The mapping from genotypes into phenotypes is many-to-one. Hence, it is redundant and not invertible. Genotypes, i.e. RNA sequences, which are mapped onto the same phenotype, i.e. the same RNA secondary structure, form neutral networks. Neutral networks are represented by graphs in sequence space.
S k = ψ( I. ) f k = fs ( k ) Sequence space Phenotype space Non-negative numbers Mapping from sequence space into phenotype space and into fitness values
A multi-component neutral network Giant Component
A connected neutral network
Optimization of RNA molecules in silico W.Fontana, P.Schuster, A computer model of evolutionary optimization. Biophysical Chemistry 26 (1987), 123-147 W.Fontana, W.Schnabl, P.Schuster, Physical aspects of evolutionary optimization and adaptation. Phys.Rev.A 40 (1989), 3301-3321 M.A.Huynen, W.Fontana, P.F.Stadler, Smoothness within ruggedness. The role of neutrality in adaptation. Proc.Natl.Acad.Sci.USA 93 (1996), 397-401 W.Fontana, P.Schuster, Continuity in evolution. On the nature of transitions. Science 280 (1998), 1451-1455 W.Fontana, P.Schuster, Shaping space. The possible and the attainable in RNA genotypephenotype mapping. J.Theor.Biol. 194 (1998), 491-515
Evolution in the Flow Reactor: The RNA Model Sequence-structure map Structure-function map Environment ψ : f Ω(t) h s { I; d } { S; d }. ij ij { s S; } R. : d ij + Mapping into fitness values f k ( S, Ω( t) ) = f ( ( I, Ω( t)), Ω( )) ( t) = f ψ t k k dx dt k ( Q f ( t) Φ( t) ) + Q f ( t) x + η ( x, t) ω ( t), k = 1,..., n = xk kk k n j= 1, j k kj j j k k Wiener process dw ( t) = ω ( t) dt
Genotype-Phenotype Mapping I { S { = ( I { ) S { f = ƒ( S ) { { Evaluation of the Phenotype Mutation Q {j f 2 I 2 f 1 I1 I n I 1 f n I 2 f 2 f 1 I n+1 f n+1 f { Q f 3 I 3 Q f 3 I 3 f 4 I 4 I { f { I 4 I 5 f 4 f 5 f 5 I 5 Evolutionary dynamics including molecular phenotypes
Concentration Master sequence Mutant cloud Off-the-cloud mutations Sequence space The molecular quasispecies and mutations producing new variants
Stock Solution Reaction Mixture The flowreactor as a device for studies of evolution in vitro and in silico
50 S Average structure distance to target d 40 30 20 10 Evolutionary trajectory 0 0 250 500 750 1000 1250 Time (arbitrary units) In silico optimization in the flow reactor: Trajectory
36 Relay steps Number of relay step 38 Evolutionary trajectory 1250 44 10 0 40 42 44 Time Average structure distance to target d S Endconformation of optimization
36 Relay steps Number of relay step 38 Evolutionary trajectory 1250 44 43 10 0 40 42 44 Time Average structure distance to target d S Reconstruction of the last step 43 44
36 Relay steps Number of relay step 44 43 Average structure distance to target d S 42 Evolutionary trajectory 1250 10 0 38 40 42 44 Reconstruction of last-but-one step 42 43 ( 44) Time
36 Relay steps Number of relay step 44 43 Average structure distance to target d S 41 42 Evolutionary trajectory 1250 10 0 38 40 42 44 Reconstruction of step 41 42 ( 43 44) Time
36 Relay steps Number of relay step 44 43 Average structure distance to target d S 40 41 42 Evolutionary trajectory 1250 10 0 38 40 42 44 Reconstruction of step 40 41 ( 42 43 44) Time
Average structure distance to target d S 10 Relay steps 36 38 40 42 44 Number of relay step Evolutionary trajectory 0 1250 Time Evolutionary process 39 40 41 42 43 44 Reconstruction Reconstruction of the relay series
Transition inducing point mutations Neutral point mutations Change in RNA sequences during the final five relay steps 39 44
50 Relay steps S Average structure distance to target d 40 30 20 10 Evolutionary trajectory 0 0 250 500 750 1000 1250 Time (arbitrary units) In silico optimization in the flow reactor: Trajectory and relay steps
50 Relay steps S Average structure distance to target d 40 30 20 10 Uninterrupted presence Evolutionary trajectory 0 0 250 500 750 1000 1250 Time (arbitrary units) In silico optimization in the flow reactor: Uninterrupted presence
Number of relay step Average structure distance to target d S 20 10 0 Uninterrupted presence Evolutionary trajectory 250 500 08 10 12 14 Time (arbitrary units) Transition inducing point mutations Neutral point mutations Neutral genotype evolution during phenotypic stasis
Number of relay step Average structure distance to target d S 30 20 10 Uninterrupted presence Evolutionary trajectory 750 1000 1250 20 25 30 35 Time (arbitrary units) 18 20 19 21 26 28 29 31 A random sequence of minor or continuous transitions in the relay series
18 20 19 21 26 28 29 31 A random sequence of minor or continuous transitions in the relay series
Shortening of Stacks Elongation of Stacks Multiloop Minor or continuous transitions: Occur frequently on single point mutations Opening of Constrained Stacks
50 Relay steps S Average structure distance to target d 40 30 20 10 Uninterrupted presence Evolutionary trajectory 0 0 250 500 750 1000 1250 Time (arbitrary units) In silico optimization in the flow reactor: Uninterrupted presence
Main transition leading to clover leaf 36 37 38 Average structure distance to target d S 10 Relay steps Evolutionary trajectory 36 38 40 42 44 Number of relay step 0 1250 Time Reconstruction of a main transitions 36 37 ( 38)
50 Relay steps Main transitions Average structure distance to target d S 40 30 20 10 Evolutionary trajectory 0 0 250 500 750 1000 1250 Time (arbitrary units) In silico optimization in the flow reactor: Main transitions
Shift Roll-Over α α a Flip a α a b Double Flip a α b β β Main or discontinuous transitions: Structural innovations, occur rarely on single point mutations Closing of Constrained Stacks Multiloop
50 Relay steps Main transitions Average structure distance to target d S 40 30 20 10 Uninterrupted presence Evolutionary trajectory 0 0 250 500 750 1000 1250 Time (arbitrary units) In silico optimization in the flow reactor
Variation in genotype space during optimization of phenotypes
Statistics of evolutionary trajectories Population size N Number of replications < n > rep Number of transitions < n > tr Number of main transitions < n > dtr The number of main transitions or evolutionary innovations is constant.
00 09 31 44 Three important steps in the formation of the trna clover leaf from a randomly chosen initial structure corresponding to three main transitions.
Main results of computer simulations of molecular evolution No trajectory was reproducible in detail. Sequences of target structures were different. Nevertheless solutions of comparable or the same quality are almost always achieved. Transitions between molecular phenotypes represented by RNA structures can be classified with respect to the induced structural changes. Highly probable minor transitions are opposed by main transitions with low probability of occurrence. Main transitions represent important innovations in the course of evolution. The number of minor transitions decreases with increasing population size. The number of main transitions or evolutionary innovations is approximately constant for given start and stop structures. Not all structures are accessible through evolution in the flow reactor. An example is the trna clover leaf for GC-only sequences.
...Variations neither useful not injurious would not be affected by natural selection, and would be left either a fluctuating element, as perhaps we see in certain polymorphic species, or would ultimately become fixed, owing to the nature of the organism and the nature of the conditions.... Charles Darwin, Origin of species (1859)
Adaptive Periods End of Walk Fitness Random Drift Periods Start of Walk Genotype Space Evolution in genotype space sketched as a non-descending walk in a fitness landscape
Coworkers Walter Fontana, Santa Fe Institute, NM Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM Peter Stadler, Universität Leipzig, GE Ivo L.Hofacker, Christoph Flamm, Universität Wien, AT Bärbel Stadler, Andreas Wernitznig, Universität Wien, AT Michael Kospach, Ulrike Langhammer, Ulrike Mückstein, Stefanie Widder Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler, Stefan Wuchty Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE Walter Grüner, Stefan Kopp, Jaqueline Weber