How Nature Circumvents Low Probabilities: The Molecular Basis of Information and Complexity Peter Schuster Institut für Theoretische Chemie Universität Wien, Austria Nonlinearity, Fluctuations, and Complexity Brussels, 16. 19.03.2005
Web-Page for further information: http://www.tbi.univie.ac.at/~pks
Protein folding: Levinthal s paradox How can Nature find the native conformation in the folding process? Evolution: Wigner s paradox How can Nature find the optimal sequence of a protein in the evolutionary optimization process? Lysozyme A small protein molecule n = 130 amino acid residues 6 130 = 1.44 10 101 conformations 20 130 = 1.36 10 169 sequences
1. Solutions to Levinthal s paradox 2. Selection as solution to low probabilities in evolution 3. Origin of information by mutation and selection 4. Evolution of RNA phenotypes 5. The role of neutrality in molecular evolution 6. An experiment with RNA molecules 7. Summary
1. Solutions to Levinthal s paradox 2. Selection as solution to low probabilities in evolution 3. Origin of information by mutation and selection 4. Evolution of RNA phenotypes 5. The role of neutrality in molecular evolution 6. An experiment with RNA molecules 7. Summary
The golf course landscape Levinthal s paradox K.A. Dill, H.S. Chan, Nature Struct. Biol. 4:10-19
The pathway landscape The pathway solution to Levinthal s paradox K.A. Dill, H.S. Chan, Nature Struct. Biol. 4:10-19
The folding funnel The answer to Levinthal s paradox K.A. Dill, H.S. Chan, Nature Struct. Biol. 4:10-19
A more realistic folding funnel The answer to Levinthal s paradox K.A. Dill, H.S. Chan, Nature Struct. Biol. 4:10-19
An all (or many) paths lead to Rome situation N native conformation A reconstructed free energy surface for lysozyme folding: C.M. Dobson, A. Šali, and M. Karplus, Angew.Chem.Internat.Ed. 37: 868-893, 1988
1. Solutions to Levinthal s paradox 2. Selection as solution to low probabilities in evolution 3. Origin of information by mutation and selection 4. Evolution of RNA phenotypes 5. The role of neutrality in molecular evolution 6. An experiment with RNA molecules 7. Summary
Earlier abstract of the Origin of Species Charles Robert Darwin, 1809-1882 Alfred Russell Wallace, 1823-1913 The two competitors in the formulation of evolution by natural selection
(A) + I 1 f 1 I 1 + I 1 (A) + I 2 f 2 I 2 + I 2 dx / dt = f x - x i i i i Φ = x (f i - Φ i ) Φ =Σ j f j x j ; Σ x = 1 ; i,j =1,2,...,n j j (A) + I i f i I i + I i [I i] = x i 0 ; i =1,2,...,n ; [A] = a = constant f m = max { f j ; j=1,2,...,n} (A) + I m f m I m + I m x m (t) 1 for t (A) + I n f n I n + I n Reproduction of organisms or replication of molecules as the basis of selection
Selection equation: [I i ] = x i 0, f i > 0 n n ( f φ ), i = 1,2, L, n; x = 1; φ = f x f dx i = xi i i i j j j dt = = 1 = 1 Mean fitness or dilution flux, φ (t), is a non-decreasing function of time, n dφ = dt i= 1 f i dx dt i = f 2 ( ) 2 f = var{ f } 0 Solutions are obtained by integrating factor transformation ( 0) exp( f t) xi i xi () t = ; i = 1,2, L, n n x ( ) ( f jt) j = j 0 exp 1
s = ( f 2 -f 1 ) / f 1 ; f 2 > f 1 ; x 1 (0) = 1-1/N ; x 2 (0) = 1/N 1 Fraction of advantageous variant 0.8 0.6 0.4 0.2 s = 0.1 s = 0.02 s = 0.01 0 0 200 400 600 800 1000 Time [Generations] Selection of advantageous mutants in populations of N = 10 000 individuals
1. Solutions to Levinthal s paradox 2. Selection as solution to low probabilities in evolution 3. Origin of information by mutation and selection 4. Evolution of RNA phenotypes 5. The role of neutrality in molecular evolution 6. An experiment with RNA molecules 7. Summary
I 1 + I j f j Q j1 I 2 + I j Σ dx / dt = f x - x i j jq ji j i Φ Φ = Σ f x j j i ; Σ x = 1 ; j j Σ Q i ij = 1 f j Q j2 f j Q ji I i + I j [I i ] = x i 0 ; i =1,2,...,n ; [A] = a = constant (A) + I j f j Q jj I j + I j Q = (1-) p ij l-d(i,j) p d(i,j) p... Error rate per digit f j Q jn I n + I j l... Chain length of the polynucleotide d(i,j)... Hamming distance between I i and I j Chemical kinetics of replication and mutation as parallel reactions
Mutation-selection equation: [I i ] = x i 0, f i > 0, Q ij 0 dx i n n n = f Q x x i n x f x j j ji j i φ, = 1,2, L, ; i i = 1; φ = j j j dt = = 1 = 1 = 1 f Solutions are obtained after integrating factor transformation by means of an eigenvalue problem x i () t = n 1 k n j= 1 ( 0) exp( λkt) c ( 0) exp( λ t) l = 0 ik ck ; i = 1,2, L, n; c (0) = n 1 k l k= 0 jk k k n i= 1 h ki x i (0) W 1 { f Q ; i, j= 1,2, L, n} ; L = { l ; i, j= 1,2, L, n} ; L = H = { h ; i, j= 1,2, L, n} i ij ij ij { λ ; k = 0,1,, n 1} 1 L W L = Λ = k L
e 1 x 1 l 0 e 1 e 3 x 3 x 2 e 2 e 2 e 3 l 2 l 1 { } 3 The quasispecies on the concentration simplex S 3 = x 0, i = 1, 2,3; x = 1 i i= 1 i
Quasispecies Uniform distribution 0.00 0.05 0.10 Error rate p = 1-q Quasispecies as a function of the replication accuracy q
Concentration Master sequence Mutant cloud Off-the-cloud mutations Sequence space The molecular quasispecies in sequence space
1. Solutions to Levinthal s paradox 2. Selection as solution to low probabilities in evolution 3. Origin of information by mutation and selection 4. Evolution of RNA phenotypes 5. The role of neutrality in molecular evolution 6. An experiment with RNA molecules 7. Summary
Computer simulation of RNA optimization Walter Fontana and Peter Schuster, Biophysical Chemistry 26:123-147, 1987 Walter Fontana, Wolfgang Schnabl, and Peter Schuster, Phys.Rev.A 40:3301-3321, 1989
Walter Fontana, Wolfgang Schnabl, and Peter Schuster, Phys.Rev.A 40:3301-3321, 1989
Evolution in silico W. Fontana, P. Schuster, Science 280 (1998), 1451-1455
Mapping from sequence space into structure space and into function
Neutral networks are sets of sequences forming the same object in a phenotype space. The neutral network G k is, for example, the preimage of the structure S k in sequence space: G k = -1 (S k ) π { j (I j ) = S k } The set is converted into a graph by connecting all sequences of Hamming distance one. Neutral networks of small biomolecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length. This number, N=4 n, becomes very large with increasing length, and is prohibitive for numerical computations. Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
-1 k k j j k G = ( S) U I ( I) = S λ j = 12 / 27 = 0.444, λ k = (k) j G k Connectivity threshold: / κ- λcr = 1 - κ -1 ( 1) λ λ Alphabet size : AUGC = 4 > λ k cr.... < λ k cr.... network G k is connected network G k is not connected cr 2 0.5 3 0.423 4 0.370 GC,AU GUC,AUG AUGC Mean degree of neutrality and connectivity of neutral networks
A connected neutral network formed by a common structure
Giant Component A multi-component neutral network formed by a rare structure
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures 2. Few common versus many rare structures
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures 2. Few common versus many rare structures n = 100, stem-loop structures n = 30 RNA secondary structures and Zipf s law
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures 2. Few common versus many rare structures 3. Shape space covering of common structures
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures 2. Few common versus many rare structures 3. Shape space covering of common structures
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures 2. Few common versus many rare structures 3. Shape space covering of common structures 4. Neutral networks of common structures are connected
Properties of RNA sequence to secondary structure mapping 1. More sequences than structures 2. Few common versus many rare structures 3. Shape space covering of common structures 4. Neutral networks of common structures are connected
Genotype-Phenotype Mapping I { S { = ( I { ) S { f = ƒ( S ) { { Evaluation of the Phenotype Mutation Q {j f 2 I 2 f 1 I1 I n I 1 f n I 2 f 2 f 1 I n+1 f n+1 f { Q f 3 I 3 Q f 3 I 3 f 4 I 4 I { f { I 4 I 5 f 4 f 5 f 5 I 5 Evolutionary dynamics including molecular phenotypes
Replication rate constant: f k = / [ + d S (k) ] d S (k) = d H (S k,s ) Selection constraint: Population size, N = # RNA molecules, is controlled by the flow N ( t) N ± N Mutation rate: p = 0.001 / site replication The flowreactor as a device for studies of evolution in vitro and in silico
Replication rate constant: f k = / [ + d S (k) ] d S (k) = d H (S k,s ) f 6 f 7 f5 f 0 f f 4 f 1 f 2 f 3 Evaluation of RNA secondary structures yields replication rate constants
3'-End 5'-End 70 60 10 50 20 30 40 Randomly chosen initial structure Phenylalanyl-tRNA as target structure
50 S Average structure distance to target d 40 30 20 10 Evolutionary trajectory 0 0 250 500 750 1000 1250 Time (arbitrary units) In silico optimization in the flow reactor: Evolutionary trajectory
Number of relay step 28 neutral point mutations during a long quasi-stationary epoch Average structure distance to target d S 20 10 0 Uninterrupted presence Evolutionary trajectory 250 500 08 10 12 14 Time (arbitrary units) Transition inducing point mutations Neutral point mutations Neutral genotype evolution during phenotypic stasis
1. Solutions to Levinthal s paradox 2. Selection as solution to low probabilities in evolution 3. Origin of information by mutation and selection 4. Evolution of RNA phenotypes 5. The role of neutrality in molecular evolution 6. An experiment with RNA molecules 7. Summary
Evolutionary trajectory Spreading of the population through diffusion on the neutral network Velocity of the population center in sequence space
Spread of population in sequence space during a quasistationary epoch: t = 150
Spread of population in sequence space during a quasistationary epoch: t = 170
Spread of population in sequence space during a quasistationary epoch: t = 200
Spread of population in sequence space during a quasistationary epoch: t = 350
Spread of population in sequence space during a quasistationary epoch: t = 500
Spread of population in sequence space during a quasistationary epoch: t = 650
Spread of population in sequence space during a quasistationary epoch: t = 820
Spread of population in sequence space during a quasistationary epoch: t = 825
Spread of population in sequence space during a quasistationary epoch: t = 830
Spread of population in sequence space during a quasistationary epoch: t = 835
Spread of population in sequence space during a quasistationary epoch: t = 840
Spread of population in sequence space during a quasistationary epoch: t = 845
Spread of population in sequence space during a quasistationary epoch: t = 850
Spread of population in sequence space during a quasistationary epoch: t = 855
1. Solutions to Levinthal s paradox 2. Selection as solution to low probabilities in evolution 3. Origin of information by mutation and selection 4. Evolution of RNA phenotypes 5. The role of neutrality in molecular evolution 6. An experiment with RNA molecules 7. Summary
A ribozyme switch E.A.Schultes, D.B.Bartel, Science 289 (2000), 448-452
Two ribozymes of chain lengths n = 88 nucleotides: An artificial ligase (A) and a natural cleavage ribozyme of hepatitis- -virus (B)
The sequence at the intersection: An RNA molecules which is 88 nucleotides long and can form both structures
Two neutral walks through sequence space with conservation of structure and catalytic activity
1. Solutions to Levinthal s paradox 2. Selection as solution to low probabilities in evolution 3. Origin of information by mutation and selection 4. Evolution of RNA phenotypes 5. The role of neutrality in molecular evolution 6. An experiment with RNA molecules 7. Summary
Example of a smooth landscape on Earth Mount Fuji
Dolomites Bryce Canyon Examples of rugged landscapes on Earth
Fitness End of Walk Start of Walk Genotype Space Evolutionary optimization in absence of neutral paths in sequence space
Adaptive Periods End of Walk Fitness Random Drift Periods Start of Walk Genotype Space Evolutionary optimization including neutral paths in sequence space
Example of a landscape on Earth with neutral ridges and plateaus Grand Canyon
Conformational and mutational landscapes of biomolecules as well as fitness landscapes of evolutionary biology are rugged. Adaptive or non-descending walks on rugged landscapes end commonly at one of the low lying local maxima. Fitness End of Walk Start of Walk Genotype Space End of Walk Selective neutrality in the form of neutral networks plays an active role in evolutionary optimization and enables populations to reach high local maxima or even the global optimum. Fitness Start of Walk Genotype Space
Acknowledgement of support Fonds zur Förderung der wissenschaftlichen Forschung (FWF) Projects No. 09942, 10578, 11065, 13093 13887, and 14898 Universität Wien Jubiläumsfonds der Österreichischen Nationalbank Project No. Nat-7813 European Commission: Project No. EU-980189 Austrian Genome Research Program GEN-AU Siemens AG, Austria Universität Wien and the Santa Fe Institute
Coworkers Walter Fontana, Harvard Medical School, MA Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM Universität Wien Peter Stadler, Bärbel Stadler, Universität Leipzig, GE Jord Nagel, Kees Pleij, Universiteit Leiden, NL Ivo L.Hofacker, Christoph Flamm, Universität Wien, AT Andreas Wernitznig, Michael Kospach, Universität Wien, AT Ulrike Langhammer, Ulrike Mückstein, Stefanie Widder Jan Cupal, Kurt Grünberger, Andreas Svrcek-Seiler, Stefan Wuchty Stefan Bernhart, Lukas Endler Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE Walter Grüner, Stefan Kopp, Jaqueline Weber
Web-Page for further information: http://www.tbi.univie.ac.at/~pks