RNA Structure Prediction. Structural Bioinformatics, winter semester 2016/17

RNA Structure Prediction. Structural Bioinformatics, winter semester 2016/17. Dr. Stefan Simm, 1 November 2016, simm@bio.uni-frankfurt.de

RNA secondary structures
a. hairpin loop
b. stem
c. bulge loop
d. interior loop
e. multi loop
f. exterior loop
g. dangling nucleotide
h. energy penalty after stem
i. coaxial stacking
Figure source: http://www.clcbio.com/scienceimages/rna_prediction/rna_structure_prediction_web.png

Most nucleotides are located in helical regions.
4 types of loops:
- H: hairpins
- I: interior loops
- B: bulges
- M: multi-loops
Example of an RNA secondary structure (figure labels: helix, hairpin loop, bulge, interior loop): RNase P from Bacillus subtilis (M. Zuker)

Timeline of secondary structure prediction. R. Lorenz et al. / Methods 103 (2016) 86-98

FIRST STEP: THERMODYNAMIC PREDICTION

Thermodynamic prediction algorithms. R. Lorenz et al. / Methods 103 (2016) 86-98

Secondary structure prediction programs for RNA
- Mfold: http://mfold.rna.albany.edu/?q=mfold
- RNAfold (Vienna RNA server): http://rna.tbi.univie.ac.at/cgi-bin/rnafold.cgi
- RNAstructure: https://rna.urmc.rochester.edu/rnastructure.html

Structure prediction methods
About 1.8^N possible structures (N = length of sequence)
- free energy minimization (the usual approach)
- combinatorial assembly of pieces of secondary structure
- recursive addition of one nucleotide at a time
- comparative sequence analysis
- context-free grammars with trained parameters (without energy calculations)
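To get a feeling for the size of this search space, the following minimal sketch counts all nested (pseudoknot-free) structures of one sequence by dynamic programming and prints the rough 1.8^N estimate next to it. The function name `count_structures`, the pairing set (A-U, G-C, G-U), the minimum hairpin loop of 3 unpaired bases and the toy sequence are assumptions made for illustration; they are not taken from the lecture.

```python
from functools import lru_cache

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_structures(seq, min_loop=3):
    """Count nested secondary structures of seq, including the open chain."""
    seq = seq.upper()

    @lru_cache(maxsize=None)
    def count(i, j):
        # Segments too short to close a hairpin admit only the unpaired state.
        if j - i <= min_loop:
            return 1
        total = count(i, j - 1)  # case: base j stays unpaired
        for k in range(i, j - min_loop):  # case: base j pairs with base k
            if (seq[k], seq[j]) in PAIRS:
                total += count(i, k - 1) * count(k + 1, j - 1)
        return total

    return count(0, len(seq) - 1)


if __name__ == "__main__":
    seq = "GGGAAAUCCCGCGAAAGCGC"  # hypothetical toy sequence
    print(len(seq), count_structures(seq), round(1.8 ** len(seq)))
```

The exact count depends on the sequence and on the loop-size rule, so it will generally differ from the 1.8^N rule of thumb; the point is only that both grow exponentially with N, which is why enumeration is replaced by dynamic programming.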

BP MAXIMIZATION: NUSSINOV (1978)

\[
E(S_{i,j}) = 0 \quad \text{for } j - i < 4 \qquad (0)
\]
\[
E(S_{i,j}) = \min
\begin{cases}
E(S_{i+1,j-1}) + \alpha(r_i, r_j) & (1) \\
E(S_{i+1,j}) & (2) \\
E(S_{i,j-1}) & (3) \\
\min_{i<k<j} \left[ E(S_{i,k}) + E(S_{k+1,j}) \right] & (4)
\end{cases}
\]

(0): rule that prevents too-sharp bends in the structure
(1): base pairing of r_i and r_j; sum of the pairing energy \alpha(r_i, r_j) and the energy of the enclosed, previously computed part of the secondary structure, E(S_{i+1,j-1})
(2): base r_i forms no base pair within R_{i,j}
(3): base r_j forms no base pair within R_{i,j}
(4): bifurcation; bases r_i and r_j are bound in two different parts of the secondary structure
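The recursion with cases (0) to (4) can be sketched in a few lines of Python. This is a toy illustration, not the lecture's implementation: the pairing energy alpha is assumed to be -1 for every canonical or wobble pair and +infinity otherwise, and the names `alpha` and `nussinov_fold` are hypothetical.

```python
import math

# Assumed toy energy model: every A-U, G-C or G-U pair contributes -1.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def alpha(a, b):
    """Pairing energy of bases a and b; +inf forbids the pair."""
    return -1.0 if (a, b) in CANONICAL else math.inf


def nussinov_fold(seq, min_span=4):
    """Return (minimum energy, dot-bracket structure) for seq."""
    seq = seq.upper()
    n = len(seq)
    # Case (0): E[i][j] stays 0 whenever j - i < min_span.
    E = [[0.0] * n for _ in range(n)]
    for span in range(min_span, n):
        for i in range(n - span):
            j = i + span
            best = min(
                E[i + 1][j - 1] + alpha(seq[i], seq[j]),  # case (1)
                E[i + 1][j],                              # case (2)
                E[i][j - 1],                              # case (3)
            )
            for k in range(i + 1, j):                     # case (4), i < k < j
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best

    # Traceback: follow one optimal decision per segment.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i < min_span:
            continue
        if E[i][j] == E[i + 1][j]:
            stack.append((i + 1, j))
        elif E[i][j] == E[i][j - 1]:
            stack.append((i, j - 1))
        elif E[i][j] == E[i + 1][j - 1] + alpha(seq[i], seq[j]):
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if E[i][j] == E[i][k] + E[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    energy = E[0][n - 1] if n else 0.0
    return energy, "".join(structure)


if __name__ == "__main__":
    print(nussinov_fold("GGGAAACCC"))
```

With this unit-energy model, minimizing the energy is the same as maximizing the number of base pairs. Filling the table takes O(N^3) time and O(N^2) memory because of the bifurcation case (4).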

Dot plot for tRNA(Phe)

Old energy rules (2.3)
- best structure: -12.1 kcal/mol
- worst structure: -11.1 kcal/mol

New energy rules (3.0)
- To change the folding temperature, use the RNA 2.3 energy rules
- Decide whether the RNA is circular or linear
- The maximum distance between paired bases is not so important
- To inspect the alternative structures, use a larger window size to increase the energy difference between them

RNAfold output
http://rna.tbi.univie.ac.at/cgi-bin/rnafold.cgi
First you get the best predicted structure in dot-bracket notation together with its minimum free energy.
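In dot-bracket notation each dot is an unpaired base and each matching pair of brackets is a base pair. A minimal sketch of how such a string can be turned into a list of base pairs is shown here; the function name `parse_dot_bracket` and the 0-based indexing are choices made for this illustration, and pseudoknot brackets such as `[` and `]` are not handled.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of base pairs (i, j), 0-based, in a dot-bracket string."""
    stack = []  # positions of opening brackets that are still unmatched
    pairs = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(parse_dot_bracket("(((...)))"))
```

A stack is enough because nested structures never cross: each closing bracket always pairs with the most recent unmatched opening bracket.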

Predicted secondary structure (drawn from 5' to 3')
- base-pairing probability and entropy can be checked with the partition function
- the wobble pair GU can be forbidden
- helices of only 1 bp (isolated base pairs) can be forbidden
- prediction for 37 °C
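The partition function weights every structure by its Boltzmann factor exp(-E/RT). As a hedged, heavily simplified illustration, the sketch below applies this to a toy ensemble of only two structures, using -12.1 and -11.1 kcal/mol as example energies; a real partition function sums over all possible structures. The function name `boltzmann_probabilities` is hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol * K)


def boltzmann_probabilities(energies, temperature_c=37.0):
    """Equilibrium probabilities of structures with the given free energies (kcal/mol)."""
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    weights = [math.exp(-energy / rt) for energy in energies]
    partition_sum = sum(weights)  # Z restricted to the listed structures
    return [weight / partition_sum for weight in weights]


if __name__ == "__main__":
    best, worst = boltzmann_probabilities([-12.1, -11.1])
    print(f"best: {best:.3f}, worst: {worst:.3f}, ratio: {best / worst:.2f}")
```

At 37 °C, RT is about 0.62 kcal/mol, so a difference of 1 kcal/mol already makes the lower-energy structure roughly five times more populated than the other one in this two-state picture.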

RNAfold example using a snoRNA: snR44
Prediction:

RNAstructure

Structure comparison using MFE. R. Lorenz et al. / Methods 103 (2016) 86-98

NEXT STEP: TERTIARY STRUCTURE PREDICTION

RNA structures are complex
- Not only two different structural elements
- Single- and double-stranded regions
- Pseudoknots between single-stranded regions

Tertiary structure elements: increasing complexity for prediction
- pseudoknot
- kissing hairpins
- hairpin loop-bulge contact

Structure prediction including pseudoknots. R. Lorenz et al. 2016

P. Zhao et al. 2004 RNA kinetic folding

Main classes of structure prediction
- Iterated loop matching algorithm: An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots (Ruan, Stormo, Zhang; 2004)
- Stochastic modelling using parallel grammars: Stochastic modeling of RNA pseudoknotted structures: a grammatical approach (Cai, Malmberg, Wu; 2003)
- Graph-theoretical / multi-sequence approaches: A graph theoretical approach to predict common RNA secondary structure motifs including pseudoknots in unaligned sequences (Ji, Xu, Stormo; 2004)

CONTRAfold
http://contra.stanford.edu/contrafold/
Do et al. 2006

Dynalign or PPfold
- Using a multiple sequence alignment (PPfold)
- Or using a template sequence for corrections (Dynalign)
- Calculation of the free energy with information about conserved regions

BENCHMARKING AND DRAWBACKS FOR SECONDARY STRUCTURE PREDICTION

Comparison of single- vs. multiple-sequence algorithms
Figure panels: single sequence; multiple sequences (medium similarity); multiple sequences (high similarity)
P. Gardner et al. 2004

Comparative RNA structure prediction strategies P. Gardner et al. 2004

Comparison of comparative structure prediction strategies
- Sensitivity: correctly predicted base pairs (true positives) vs. all true base pairs
- Selectivity: correctly predicted base pairs vs. all predicted base pairs
P. Gardner et al. 2004
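The two measures can be computed directly from sets of base pairs. The following minimal sketch follows the simple definitions on this slide (true positives over all true pairs, and true positives over all predicted pairs); the function name `sensitivity_selectivity` and the example pairs are hypothetical, and published benchmarks may apply additional corrections.

```python
def sensitivity_selectivity(predicted, reference):
    """Return (sensitivity, selectivity) for two collections of base pairs (i, j)."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Sensitivity: share of the reference pairs that were found.
    sensitivity = true_positives / len(reference) if reference else 0.0
    # Selectivity: share of the predicted pairs that are correct.
    selectivity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, selectivity


if __name__ == "__main__":
    reference = [(0, 8), (1, 7), (2, 6)]   # hypothetical known structure
    predicted = [(0, 8), (1, 7), (3, 5)]   # hypothetical prediction
    print(sensitivity_selectivity(predicted, reference))
```

Reporting both values matters: a method that predicts very few pairs can reach high selectivity with low sensitivity, and the reverse holds for a method that predicts too many.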

Prediction of single RNA secondary structure
Accuracy:
- RNAs of about 500 nt: 70% of base pairs correct
- RNAs longer than 500 nt: accuracy falls to 40%
Reasons:
- simplifications in the energy model
- inaccuracies of parameters
- ignoring the effect of binding to ions, proteins and other ligands
- non-equilibrium states of the RNA
R. Lorenz et al. / Methods 103 (2016) 86-98

Prediction of multiple RNA secondary structures
Advantages of multiple sequences:
- Additional information such as a phylogenetic tree and substitution models for unpaired/paired positions
- Energy-based and evolution-based information
Drawbacks:
- Dependent on the alignment quality
- Pairwise identity >80% for consensus structure

HOW CAN PROBING IMPROVE STRUCTURE PREDICTION?

High-throughput probing and guided structure prediction. R. Lorenz et al. / Methods 103 (2016) 86-98

Usage of experimental data
Structure probing: biochemical method to find the structure of nucleic acids at the molecular level
- Physical methods: crystal structures, NMR
- Chemical methods: modification of nucleic acids with DMS or SHAPE

SHAPE for guided structure prediction
- SHAPE = selective 2'-hydroxyl acylation analyzed by primer extension
- The 2'-hydroxyl group of the ribose is acylated by 1-methyl-7-nitroisatoic anhydride (1M7)
- The reagent reacts with all unbound riboses in the RNA molecule
- Reverse transcriptase stops and falls off at modified positions
- Comparison with a control shows the unbound regions of the RNA

Guided structure prediction. R. Lorenz et al. / Methods 103 (2016) 86-98
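One common way to feed SHAPE reactivities into an energy-based folding algorithm is a per-nucleotide pseudo-energy of the log-linear form m * ln(reactivity + 1) + b, added for nucleotides in stacked base pairs. The sketch below only evaluates that formula. The slope and intercept defaults (2.6 and -0.8 kcal/mol) are assumed example values, the clamping of negative reactivities to zero is an assumption, and the function name `shape_pseudo_energy` is hypothetical; the slides do not state which conversion is used.

```python
import math


def shape_pseudo_energy(reactivity, slope=2.6, intercept=-0.8):
    """Pseudo free energy (kcal/mol) for pairing a nucleotide with this SHAPE reactivity."""
    # Assumption: negative (noisy) reactivities are treated as zero.
    reactivity = max(reactivity, 0.0)
    return slope * math.log(reactivity + 1.0) + intercept


if __name__ == "__main__":
    for value in (0.0, 0.3, 1.0, 2.0):
        print(f"reactivity {value:.1f} -> {shape_pseudo_energy(value):+.2f} kcal/mol")
```

Unreactive nucleotides receive a small bonus for pairing and highly reactive ones a penalty, so the probing data act as a soft constraint that shifts the prediction without forbidding any structure outright.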

DMS for high-throughput sequencing
- DMS methylates nitrogen atoms in adenine and cytosine
- Unbound bases, ends of helices and GU base pairs are detectable

Genome-wide Structurome

Mod-seq for footprinting

Limitations by experimental setup
- effectiveness of the probing agent can be influenced by solvent accessibility and by tertiary and even quaternary interactions
- bulky enzymes may not be able to reach all parts of the RNA
- de- and re-naturing steps leave the RNA devoid of any RNA-binding proteins or other factors

Future perspectives
- the slightest error in hard constraints might yield an entirely wrong prediction
- secondary structure does not account for tertiary effects such as non-canonical base pairs
- extremely short hairpin loops and long interior loops are excluded