RNA Bioinformatics Beyond the One Sequence-One Structure Paradigm. Peter Schuster

Similar documents
Evolution of Biomolecular Structure 2006 and RNA Secondary Structures in the Years to Come. Peter Schuster

Is the Concept of Error Catastrophy Relevant for Viruses? Peter Schuster

How Nature Circumvents Low Probabilities: The Molecular Basis of Information and Complexity. Peter Schuster

Error thresholds on realistic fitness landscapes

Designing RNA Structures

Neutral Networks of RNA Genotypes and RNA Evolution in silico

Evolution on simple and realistic landscapes

How computation has changed research in chemistry and biology

Evolution and Molecules

Mathematical Modeling of Evolution

Tracing the Sources of Complexity in Evolution. Peter Schuster

Complexity in Evolutionary Processes

RNA From Mathematical Models to Real Molecules

RNA From Mathematical Models to Real Molecules

Mechanisms of molecular cooperation

Mathematische Probleme aus den Life-Sciences

The Advantage of Using Mathematics in Biology

Evolution on Realistic Landscapes

Chemistry and Evolution at the Origin of Life. Visions and Reality

Von der Thermodynamik zu Selbstorganisation, Evolution und Information

Self-Organization and Evolution

Problem solving by inverse methods in systems biology

Different kinds of robustness in genetic and metabolic networks

RNA folding at elementary step resolution

Complete Suboptimal Folding of RNA and the Stability of Secondary Structures

COMP598: Advanced Computational Biology Methods and Research

RNA secondary structure prediction. Farhat Habib

Origin of life and early evolution in the light of present day molecular biology. Peter Schuster

Chemistry on the Early Earth

Prediction of Locally Stable RNA Secondary Structures for Genome-Wide Surveys

Systems biology and complexity research

The Role of Topology in the Study of Evolution

Algorithms in Bioinformatics

arxiv: v1 [q-bio.bm] 31 Oct 2017

Plasticity, Evolvability, and Modularity in RNA

On low energy barrier folding pathways for nucleic acid sequences

RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17

Sparse RNA Folding Revisited: Space-Efficient Minimum Free Energy Prediction

Replication and Mutation on Neutral Networks: Updated Version 2000

Vom Modell zur Steuerung

Design of Multi-Stable RNA Molecules

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

NP-completeness of the direct energy barrier problem without pseudoknots

The RNA World, Fitness Landscapes, and all that

DANNY BARASH ABSTRACT

EVOLUTIONARY DYNAMICS ON RANDOM STRUCTURES SIMON M. FRASER CHRISTIAN M, REIDYS DISCLAIMER. United States Government or any agency thereof.

Predicting RNA Secondary Structure

Quantitative modeling of RNA single-molecule experiments. Ralf Bundschuh Department of Physics, Ohio State University

Evolutionary Dynamics and Optimization. Neutral Networks as Model-Landscapes. for. RNA Secondary-Structure Folding-Landscapes

The Multistrand Simulator: Stochastic Simulation of the Kinetics of Multiple Interacting DNA Strands

SI Appendix. 1. A detailed description of the five model systems

IIASA. Continuity in Evolution: On the Nature of Transitions INTERIM REPORT. IR / April

A New Model for Approximating RNA Folding Trajectories and Population Kinetics

Dot Bracket Notation for RNA and DNA nanostructures. Slides by Reem Mokhtar

BarMap & SundialsWrapper

Structure-Based Comparison of Biomolecules

Chance and necessity in evolution: lessons from RNA

RNA Abstract Shape Analysis

Lecture 3: Markov chains.

Rapid Dynamic Programming Algorithms for RNA Secondary Structure

Elucidation of the RNA-folding mechanism at the level of both

Lab III: Computational Biology and RNA Structure Prediction. Biochemistry 208 David Mathews Department of Biochemistry & Biophysics

arxiv: v1 [q-bio.bm] 16 Aug 2015

Quantifying slow evolutionary dynamics in RNA fitness landscapes

RNA Secondary Structures: A Tractable Model of Biopolymer Folding

Computational Biology and Chemistry

Junction-Explorer Help File

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.

QUANTIFYING SLOW EVOLUTIONARY DYNAMICS IN RNA FITNESS LANDSCAPES

Neutral Evolution of Mutational Robustness

Journal of Mathematical Analysis and Applications

Networks in Molecular Evolution

RNA Basics. RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U. Bases can only pair with one other base. wobble pairing. 23 Hydrogen Bonds more stable

Lecture 8: RNA folding

Sparse RNA folding revisited: space efficient minimum free energy structure prediction

The Riboswitch is functionally separated into the ligand binding APTAMER and the decision-making EXPRESSION PLATFORM

Stochastic Pairwise Alignments. Ulrike Mückstein a, Ivo L. Hofacker a,, and Peter F. Stadler a,b. Abstract 1 INTRODUCTION 2 THEORY

Introduction to Polymer Physics

Extending the Tools of Chemical Reaction Engineering to the Molecular Scale

Computational Approaches for determination of Most Probable RNA Secondary Structure Using Different Thermodynamics Parameters

CelloS: a Multi-level Approach to Evolutionary Dynamics

RNA Folding Algorithms. Michal Ziv-Ukelson Ben Gurion University of the Negev

RNA Matrices and RNA Secondary Structures

RNA Folding Algorithms. Michal Ziv-Ukelson Ben Gurion University of the Negev

The Multistrand Simulator: Stochastic Simulation of the Kinetics of Multiple Interacting DNA Strands

Arrhenius Lifetimes of RNA Structures from Free Energy Landscapes

CS681: Advanced Topics in Computational Biology

A Note On Minimum Path Bases

Introduction to Evolutionary Concepts

When to use bit-wise neutrality

Wang-Landau sampling for Quantum Monte Carlo. Stefan Wessel Institut für Theoretische Physik III Universität Stuttgart

BIOINFORMATICS. Prediction of RNA secondary structure based on helical regions distribution

The wonderful world of RNA informatics

Monte Carlo (MC) Simulation Methods. Elisa Fadda

Lecture 8: RNA folding

Mitochondrial Genome Annotation

Beyond Energy Minimization: Approaches to the kinetic Folding of RNA

BIOINF 4120 Bioinforma2cs 2 - Structures and Systems -

COMBINATORICS OF LOCALLY OPTIMAL RNA SECONDARY STRUCTURES

Chapter 6 Cyclic urea - a new central unit in bent-core compounds

Transcription:

RNA Bioinformatics Beyond the One Sequence-One Structure Paradigm Peter Schuster Institut für Theoretische Chemie, Universität Wien, Austria and The Santa Fe Institute, Santa Fe, New Mexico, USA 2008 Molecular Informatics and Bioinformatics Collegium Budapest, 27. 29.03.2008

Web-Page for further information: http://www.tbi.univie.ac.at/~pks

1. Computation of RNA equilibrium structures 2. Inverse folding and neutral networks 3. Evolutionary optimization of structure 4. Suboptimal conformations and kinetic folding

1. Computation of RNA equilibrium structures 2. Inverse folding and neutral networks 3. Evolutionary optimization of structure 4. Suboptimal conformations and kinetic folding

O 5' - end CH 2 O N 1 5'-end GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA 3 -end O OH N k = A, U, G, C O P O CH 2 O N 2 Na O O OH O P O CH 2 O N 3 Na O Definition of RNA structure O OH O P O CH 2 O N 4 Na O O OH O P O 3' - end Na O

N = 4 n N S < 3 n Criterion: Minimum free energy (mfe) Rules: _ ( _ ) _ {AU,CG,GC,GU,UA,UG} A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs

Conventional definition of RNA secondary structures

H-type pseudoknot

S n n+ = Sn + 1 1 S j= j 1 1 S n j Counting the numbers of structures of chain length n n+1 M.S. Waterman, T.F. Smith (1978) Math.Bioscience 42:257-266

Restrictions on physically acceptable mfe-structures: 3 and 2

Size restriction of elements: (i) hairpin loop (ii) stack σ λ stack loop n n + = + + + = + + + + Ξ = Φ Φ + = Ξ + Φ Ξ = 2 1)/ ( 1 1 2 1 2 2 2 1 1 1 1 1 λ σ σ λ m k k m m m k k m k m m m m m S S S S n # structures of a sequence with chain length n Recursion formula for the number of physically acceptable stable structures I.L.Hofacker, P.Schuster, P.F. Stadler. 1998. Discr.Appl.Math. 89:177-207

RNA sequence: GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA RNA folding: Structural biology, spectroscopy of biomolecules, understanding molecular function Biophysical chemistry: thermodynamics and kinetics Empirical parameters RNA structure of minimal free energy Sequence, structure, and design

S 5 (h) S 4 (h) S 3 (h) 0 Free energy G S 6 (h) S 7 (h) S 8 (h) (h) S (h) 1 S 2 (h) S 9 Suboptimal conformations S 0 (h) Minimum of free energy The minimum free energy structures on a discrete space of conformations

Elements of RNA secondary structures as used in free energy calculations gij kl + h( nl ) + b ( nb ) + i ( ni + 300 G0 = ), stacks base pairs of hairpin loops bulges internal loops L

Maximum matching An example of a dynamic programming computation of the maximum number of base pairs Back tracking yields the structure(s). [i,k-1] [ k+1,j ] j 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 i G G C G C G C C C G G C G C C 1 G * * 1 1 1 1 2 3 3 3 4 4 5 6 6 2 G * * 0 1 1 2 2 2 3 3 4 4 5 6 3 C * * 0 1 1 1 2 3 3 3 4 5 5 4 G * * 0 1 1 2 2 2 3 4 5 5 5 C * * 0 1 1 2 2 3 4 4 4 6 G * * 1 1 1 2 3 3 3 4 7 C * * 0 1 2 2 2 2 3 8 C * * 1 1 1 2 2 2 9 C * * 1 1 2 2 2 10 G * * 1 1 1 2 11 G * * 0 1 1 12 C * * 0 1 13 G * * 1 14 C * * 15 C * i i+1 i+2 k j-1 j j+1 X i,k-1 X k+1,j X { X, max (( X + 1 X ρ )} i, j+ 1 = max i, j i k j 1 i, k 1 + k+ 1, j) k, j+ 1 Minimum free energy computations are based on empirical energies

1. Computation of RNA equilibrium structures 2. Inverse folding and neutral networks 3. Evolutionary optimization of structure 4. Suboptimal conformations and kinetic folding

RNA sequence: GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA RNA folding: Structural biology, spectroscopy of biomolecules, understanding molecular function Iterative determination of a sequence for the given secondary structure Inverse Folding Algorithm Inverse folding of RNA: Biotechnology, design of biomolecules with predefined structures and functions RNA structure of minimal free energy Sequence, structure, and design

Compatibility of sequences and structures

Compatibility of sequences and structures

Inverse folding algorithm I 0 I 1 I 2 I 3 I 4... I k I k+1... I t S 0 S 1 S 2 S 3 S 4... S k S k+1... S t I k+1 = M k (I k ) and d S (S k,s k+1 ) = d S (S k+1,s t ) - d S (S k,s t ) < 0 M... base or base pair mutation operator d S (S i,s j )... distance between the two structures S i and S j Unsuccessful trial... termination after n steps

Approach to the target structure S k in the inverse folding algorithm

The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.

Space of genotypes: I = { I, I, I, I,..., I } ; Hamming metric 1 2 3 4 N Space of phenotypes: S = { S, S, S, S,..., S } ; metric (not required) 1 2 3 4 M N M ( I) = j S k -1 G k = ( S ) U I ( I) = S k j j k A mapping and its inversion

1. Computation of RNA equilibrium structures 2. Inverse folding and neutral networks 3. Evolutionary optimization of structure 4. Suboptimal conformations and kinetic folding

Structure of andomly chosen initial sequence Phenylalanyl-tRNA as target structure

Evolution in silico W. Fontana, P. Schuster, Science 280 (1998), 1451-1455

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Evolution of RNA molecules as a Markow process and its analysis by means of the relay series

Replication rate constant: f k = / [ + d S (k) ] d S (k) = d H (S k,s ) Selection constraint: Population size, N = # RNA molecules, is controlled by the flow N ( t) N ± N Mutation rate: p = 0.001 / site replication The flowreactor as a device for studies of evolution in vitro and in silico

In silico optimization in the flow reactor: Evolutionary Trajectory

28 neutral point mutations during a long quasi-stationary epoch Transition inducing point mutations change the molecular structure Neutral point mutations leave the molecular structure unchanged Neutral genotype evolution during phenotypic stasis

A sketch of optimization on neutral networks

Randomly chosen initial structure Phenylalanyl-tRNA as target structure

Application of molecular evolution to problems in biotechnology

1. Computation of RNA equilibrium structures 2. Inverse folding and neutral networks 3. Evolutionary optimization of structure 4. Suboptimal conformations and kinetic folding

RNA secondary structures derived from a single sequence

An algorithm for the computation of all suboptimal structures of RNA molecules using the same concept for retrieval as applied in the sequence alignment algorithm by M.S. Waterman and T.F. Smith. Math.Biosci. 42:257-266, 1978.

An algorithm for the computation of RNA folding kinetics

The Folding Algorithm A sequence I specifies an energy ordered set of compatible structures S(I): S(I) = {S 0, S 1,, S m, O} A trajectory T k (I) is a time ordered series of structures in S(I). A folding trajectory is defined by starting with the open chain O and ending with the global minimum free energy structure S 0 or a metastable structure S k which represents a local energy minimum: T 0 (I) = {O, S (1),, S (t-1), S (t), S (t+1),, S 0 } T k (I) = {O, S (1),, S (t-1), S (t), S (t+1),, S k } dp dt Kinetic equation ( P ( t) P ( t) ) k = m+ 1 m+ 1 1 = m+ = 0 k P P k i ik ki i= 0 ik i k i= 0 ki k = 0,1, K, m + 1 Transition rate prameters P ij (t) are defined by P ij (t) = P i (t) k ij = P i (t) exp(- G ij /2RT) / Σ i P ji (t) = P j (t) k ji = P j (t) exp(- G ji /2RT) / Σ j Σ k = m + 2 k = 1, k i exp(- G ki /2RT) The symmetric rule for transition rate parameters is due to Kawasaki (K. Kawasaki, Diffusion constants near the critical point for time dependent Ising models. Phys.Rev. 145:224-230, 1966). Formulation of kinetic RNA folding as a stochastic process and by reaction kinetics

S 5 (h) S 2 (h) S 1 (h) 0 Free energy G S 6 (h) S 7 (h) S 9 (h) Suboptimal conformations Search for local minima in conformation space S h Local minimum

0 Free energy G Saddle point T {k S { 0 Free energy G T {k S { S k S k "Reaction coordinate" "Barrier tree" Definition of a barrier tree

CUGCGGCUUUGGCUCUAGCC...((((...)))) -4.30 (((.(((...))).))).. -3.50 (((..((...))..))).. -3.10...(((...))) -2.80..(((((...)))...)). -2.20...(((...))) -2.20 ((..(((...)))..)).. -2.00..((.((...))...)). -1.60...(((...)))... -1.60...(((...))). -1.50.((.(((...))).))... -1.40...((((..(...).)))) -1.40.((..((...))..))... -1.00 (((.(((...)).)))).. -0.90 (((.((...)).))).. -0.90...((((..(...))))) -0.80...((...))... -0.80..(.(((...))))... -0.60...(((...)).)... -0.60 (((..(...)..))).. -0.50..(((((...)).)..)). -0.50..(.(((...))).)... -0.40..((...))... -0.30...((...)) -0.30...((...)). -0.30 (((.(((...)))).)).. -0.20...(((.(...)))) -0.20...(((..((...))))) -0.20..(..((...))..)... 0.00... 0.00.(..(((...)))..)... 0.10 S 0 S 1 M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.

CUGCGGCUUUGGCUCUAGCC...((((...)))) -4.30 (((.(((...))).))).. -3.50 (((..((...))..))).. -3.10...(((...))) -2.80..(((((...)))...)). -2.20...(((...))) -2.20 ((..(((...)))..)).. -2.00..((.((...))...)). -1.60...(((...)))... -1.60...(((...))). -1.50.((.(((...))).))... -1.40...((((..(...).)))) -1.40.((..((...))..))... -1.00 (((.(((...)).)))).. -0.90 (((.((...)).))).. -0.90...((((..(...))))) -0.80...((...))... -0.80..(.(((...))))... -0.60...(((...)).)... -0.60 (((..(...)..))).. -0.50..(((((...)).)..)). -0.50..(.(((...))).)... -0.40..((...))... -0.30...((...)) -0.30...((...)). -0.30 (((.(((...)))).)).. -0.20...(((.(...)))) -0.20...(((..((...))))) -0.20..(..((...))..)... 0.00... 0.00.(..(((...)))..)... 0.10 M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.

M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741. Arrhenius kinetics

Arrhenius kinetic Exact solution of the kinetic equation M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.

Design of an RNA switch 1D R 2D GGGUGGAACCACGAGGUUCCACGAGGAACCACGAGGUUCCUCCC 3 13 G 23 33 44 1D 2D 13 33 CG CG A A A A C G C G C G C G A U A U A U A U G C G C G C G C U A/G A U 3 G C G C 44 GG R 23 CC 5' 3-28.6 kcal mol -1-28.2 kcal mol -1 JN1LH R 23 CG G/ A A C G C G U A U A G C G C A A 13 1D G C 2D C G 33 A A C G C G A U A U G C G C U U 3G CCC G G 44 5' 3-28.6 kcal mol -1-31.8 kcal mol -1

-26.0-28.0 2.89-30.0 6.8 4.88 6.13 3.04 3.04 2.97 Free energy [kcal / mole] -32.0-34.0-36.0-38.0-40.0-42.0-44.0 2.14 2.51 49 41 46 47 36 38 39 33 34 25 27 24 19 20 11 2.44 2.09 9 8 2.44 4 5 2.14 40 1.66 3.4 3 43 35 42 37 2.44 1.47 2.14 2.51 1.49 48 50 45 44 1.9 32 31 28 29 30 23 26 21 22 16 17 18 12 13 14 15 1.46 1.44 10 7 6 2.36 5.32 J.H.A. Nagel, C. Flamm, I.L. Hofacker, K. Franke, M.H. de Smit, P. Schuster, and C.W.A. Pleij. Nucleic Acids Res. 34:3568-3576 (2006) -46.0-48.0-50.0 J1LH barrier tree 2 1 2.77

A ribozyme switch E.A.Schultes, D.B.Bartel, Science 289 (2000), 448-452

Two ribozymes of chain lengths n = 88 nucleotides: An artificial ligase (A) and a natural cleavage ribozyme of hepatitis- -virus (B)

The sequence at the intersection: An RNA molecules which is 88 nucleotides long and can form both structures

Two neutral walks through sequence space with conservation of structure and catalytic activity

Acknowledgement of support Fonds zur Förderung der wissenschaftlichen Forschung (FWF) Projects No. 09942, 10578, 11065, 13093 13887, and 14898 Universität Wien Wiener Wissenschafts-, Forschungs- und Technologiefonds (WWTF) Project No. Mat05 Jubiläumsfonds der Österreichischen Nationalbank Project No. Nat-7813 European Commission: Contracts No. 98-0189, 12835 (NEST) Austrian Genome Research Program GEN-AU: Bioinformatics Network (BIN) Österreichische Akademie der Wissenschaften Siemens AG, Austria Universität Wien and the Santa Fe Institute

Coworkers Walter Fontana, Harvard Medical School, MA Christian Forst, Christian Reidys, Los Alamos National Laboratory, NM Universität Wien Peter Stadler, Bärbel Stadler, Universität Leipzig, GE Jord Nagel, Kees Pleij, Universiteit Leiden, NL Christoph Flamm, Ivo L.Hofacker, Andreas Svrček-Seiler, Universität Wien, AT Stefan Bernhart, Jan Cupal, Lukas Endler, Kurt Grünberger, Michael Kospach, Ulrike Langhammer, Rainer Machne, Ulrike Mückstein, Hakim Tafer, Andreas Wernitznig, Stefanie Widder, Michael Wolfinger, Stefan Wuchty, Dilmurat Yusuf, Universität Wien, AT Ulrike Göbel, Walter Grüner, Stefan Kopp, Jaqueline Weber, Institut für Molekulare Biotechnologie, Jena, GE

Web-Page for further information: http://www.tbi.univie.ac.at/~pks