RNA Structure Prediction
Structural Bioinformatics, WS16/17
Dr. Stefan Simm, 01.11.2016
simm@bio.uni-frankfurt.de
RNA secondary structures
a. hairpin loop
b. stem
c. bulge loop
d. interior loop
e. multi loop
f. exterior loop
g. dangling nucleotide
h. energy penalty after stem
i. coaxial stacking
http://www.clcbio.com/scienceimages/rna_prediction/rna_structure_prediction_web.png
Example of secondary structure of RNA
- Most nucleotides are part of helical regions
- 4 types of loops:
  H: hairpins
  I: interior loops
  B: bulges
  M: multi-loops
Figure: RNase P from Bacillus subtilis (M. Zuker), with a helical region, a hairpin loop, a bulge, and an interior loop labeled
Timeline of secondary structure prediction
R. Lorenz et al. / Methods 103 (2016) 86-98
FIRST STEP: THERMODYNAMIC PREDICTION
Thermodynamic prediction algorithms
R. Lorenz et al. / Methods 103 (2016) 86-98
Secondary structure prediction programs for RNA
- Mfold: http://mfold.rna.albany.edu/?q=mfold
- RNAfold (Vienna RNA server): http://rna.tbi.univie.ac.at/cgi-bin/rnafold.cgi
- RNAstructure: https://rna.urmc.rochester.edu/rnastructure.html
Structure prediction methods
- About 1.8^N possible structures (N = length of sequence)
- Normally free energy minimization
- Combinatorial assembly of secondary structure pieces
- Recursive addition of one nucleotide at a time
- Comparative sequence analysis
- Context-free grammars with training (without energy calculations)
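To get a feeling for how quickly the number of candidate structures grows, the following minimal sketch counts all non-crossing secondary structures that one concrete sequence can form. The function name `count_structures`, the allowed pair set, and the minimum hairpin loop size of 3 are assumptions made for this illustration; the count for a single sequence will not reproduce the 1.8^N estimate exactly.

```python
from functools import lru_cache

# Assumption for this sketch: Watson-Crick pairs plus the GU wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_structures(seq, min_loop=3):
    """Count non-crossing secondary structures of seq.

    min_loop is the smallest number of unpaired bases allowed
    inside a hairpin loop (hypothetical default of 3).
    """

    @lru_cache(maxsize=None)
    def count(i, j):
        # Intervals too short to hold a base pair (or empty) have
        # exactly one structure: everything unpaired.
        if j - i <= min_loop:
            return 1
        # Case 1: base i stays unpaired.
        total = count(i + 1, j)
        # Case 2: base i pairs with some base k further downstream.
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)


if __name__ == "__main__":
    for s in ("GAAAC", "GGAAACC", "GGGAAAACCC"):
        print(s, count_structures(s))
```

Because every added nucleotide multiplies the number of alternatives, exhaustive enumeration is only feasible for very short sequences, which is why dynamic programming is used for energy minimization.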
BP MAXIMIZATION: NUSSINOV (1978)

\[
E(S_{i,j}) = 0 \quad \text{for } j - i < 4 \qquad (0)
\]
\[
E(S_{i,j}) = \min \left\{
\begin{array}{ll}
E(S_{i+1,j-1}) + \alpha(r_i, r_j) & (1) \\
E(S_{i+1,j}) & (2) \\
E(S_{i,j-1}) & (3) \\
\min_{i<k<j} \left[ E(S_{i,k}) + E(S_{k+1,j}) \right] & (4)
\end{array}
\right\}
\]

(0): rule that prevents bends in the structure that are too sharp
(1): base pairing of r_i and r_j; sum of the base-pairing energy \alpha(r_i, r_j) and the energy of the enclosed part of the secondary structure, E(S_{i+1,j-1})
(2): base r_i forms no base pair within R_{i,j}
(3): base r_j forms no base pair within R_{i,j}
(4): bifurcation; bases r_i and r_j are bound in two different parts of the secondary structure
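The recursion with rules (0) to (4) can be sketched as a small dynamic program. This is a minimal illustration, not the original implementation: the function names `alpha`, `fold`, and `traceback` are hypothetical, and the pairing energy of -1 per Watson-Crick or GU pair (with all other pairs forbidden) is an assumption chosen so that minimizing the energy is the same as maximizing the number of base pairs.

```python
import math

# Assumption: every allowed pair contributes -1, all other pairs are forbidden.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def alpha(a, b):
    """Hypothetical pairing energy alpha(r_i, r_j)."""
    return -1.0 if (a, b) in PAIRS else math.inf


def fold(seq, min_span=4):
    """Fill the table E[i][j] = E(S_ij) using rules (0) to (4)."""
    n = len(seq)
    # Rule (0): E = 0 whenever j - i < min_span (table starts at zero).
    E = [[0.0] * n for _ in range(n)]
    for span in range(min_span, n):
        for i in range(n - span):
            j = i + span
            best = min(
                E[i + 1][j - 1] + alpha(seq[i], seq[j]),  # rule (1)
                E[i + 1][j],                              # rule (2)
                E[i][j - 1],                              # rule (3)
            )
            for k in range(i + 1, j):                     # rule (4)
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E


def traceback(seq, E, min_span=4):
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)] if seq else []
    while stack:
        i, j = stack.pop()
        if j - i < min_span:
            continue
        if E[i][j] == E[i + 1][j]:
            stack.append((i + 1, j))
        elif E[i][j] == E[i][j - 1]:
            stack.append((i, j - 1))
        elif E[i][j] == E[i + 1][j - 1] + alpha(seq[i], seq[j]):
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if E[i][j] == E[i][k] + E[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


if __name__ == "__main__":
    seq = "GGGAAAACCC"
    table = fold(seq)
    print(table[0][len(seq) - 1], traceback(seq, table))
```

The table is filled by increasing distance j - i, so every entry on the right-hand side of the recursion is already known when it is needed. The three nested loops give a running time that grows with the cube of the sequence length.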
Dot plot for tRNA(Phe)
Old energy rules (2.3)
- best structure: -12.1 kcal/mol
- worst structure: -11.1 kcal/mol
New energy rules (3.0)
- To change the temperature, use RNA 2.3
- Decide whether the RNA is circular or linear
- The maximum distance between paired bases is not so important
- To check the possible structures, use a greater window size to increase the energy difference
Output of RNAfold
http://rna.tbi.univie.ac.at/cgi-bin/rnafold.cgi
The first result is the best predicted structure in dot-bracket notation, together with its minimum free energy.
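In dot-bracket notation, a dot marks an unpaired nucleotide and a matching pair of parentheses marks a base pair. The following minimal sketch converts such a string into a list of base pairs; the function name `parse_dot_bracket` and the 0-based positions are choices made for this illustration, and pseudoknot brackets are not handled.

```python
def parse_dot_bracket(structure):
    """Return the base pairs (i, j), 0-based with i < j, of a dot-bracket string."""
    open_positions = []  # stack of positions of unmatched "("
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(parse_dot_bracket("((...))"))
```

A stack is sufficient here because base pairs in a secondary structure without pseudoknots are always properly nested.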
Predicted secondary structure (figure, drawn from 5' to 3')
- Base-pairing probabilities and entropy can be checked with the partition function
- The GU wobble pair can be forbidden
- Helices of only 1 bp can be forbidden
- Prediction for 37 °C
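The partition function weights every structure by its Boltzmann factor exp(-E/RT). As a rough sketch of the idea, the following code computes probabilities for a hand-picked list of structure energies at 37 °C. The function name `boltzmann_probabilities` is hypothetical, and restricting the ensemble to the listed structures is a strong simplification: a real partition function sums over all possible structures. The two example energies are the best and worst values quoted for the old energy rules (2.3).

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)


def boltzmann_probabilities(energies, temperature_c=37.0):
    """Boltzmann probabilities for a small, explicitly listed set of structures.

    energies: free energies in kcal/mol.
    Assumption: the listed structures are the whole ensemble.
    """
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    e_min = min(energies)
    # Shift by the minimum energy for numerical stability;
    # the shift cancels out in the ratio.
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)  # partition function of this toy ensemble
    return [w / z for w in weights]


if __name__ == "__main__":
    for energy, p in zip((-12.1, -11.1), boltzmann_probabilities([-12.1, -11.1])):
        print(f"{energy:6.1f} kcal/mol  p = {p:.3f}")
```

Even a difference of only 1 kcal/mol shifts the equilibrium clearly toward the more stable structure, which is why suboptimal structures close to the minimum free energy are worth inspecting.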
RNAfold example using the snoRNA snR44
Prediction:
RNAstructure
Structure comparison using MFE
R. Lorenz et al. / Methods 103 (2016) 86-98
NEXT STEP: TERTIARY STRUCTURE PREDICTION
RNA structures are complex
- Not only two different structural elements
- Single- and double-stranded regions
- Pseudoknots between single-stranded regions
Tertiary structure elements
Increasing complexity for prediction:
- pseudoknot
- kissing hairpins
- hairpin loop-bulge contact
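What makes pseudoknots and kissing hairpins hard for the nested recursions of secondary structure prediction is that their base pairs cross. A minimal sketch of a crossing test on a list of base pairs follows; the function name `crossing_pairs` and the representation of pairs as 0-based tuples (i, j) with i < j are assumptions for this illustration.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return all pairs of base pairs (i, j), (k, l) with i < k < j < l.

    Such crossings cannot be written with one kind of bracket
    and indicate a pseudoknot-like interaction.
    """
    crossings = []
    # Sorting guarantees that the first pair starts no later than the second.
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


if __name__ == "__main__":
    nested = [(0, 9), (1, 8), (2, 7)]
    knotted = [(0, 5), (1, 4), (3, 8)]
    print(crossing_pairs(nested))
    print(crossing_pairs(knotted))
```

A structure with an empty result is properly nested and can be handled by the standard dynamic programming recursions; any crossing requires one of the pseudoknot-aware methods.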
Structure prediction including pseudoknots
R. Lorenz et al. 2016
P. Zhao et al. 2004 RNA kinetic folding
Main classes of structure prediction
- Iterated Loop Matching algorithm
  An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots (Ruan, Stormo, Zhang; 2004)
- Stochastic modelling using parallel grammars
  Stochastic modeling of RNA pseudoknotted structures: a grammatical approach (Cai, Russell, Wu; 2003)
- Graph-theoretical / multisequence approaches
  A graph theoretical approach to predict common RNA secondary structure motifs including pseudoknots in unaligned sequences (Ji, Xu, Stormo; 2004)
CONTRAfold
http://contra.stanford.edu/contrafold/
Do et al. 2006
Dynalign or PPfold
- Using a multiple sequence alignment (PPfold)
- Or using a template sequence for corrections (Dynalign)
- Calculation of the free energy using information about conserved regions
BENCHMARKING AND DRAWBACKS FOR SECONDARY STRUCTURE PREDICTION
Comparison of single vs. multiple sequence algorithms
Figure panels: single sequence; multiple sequences (medium similarity); multiple sequences (high similarity)
P. Gardner et al. 2004
Comparative RNA structure prediction strategies P. Gardner et al. 2004
Comparison of comparative structure prediction strategies
- Sensitivity: correctly predicted base pairs (true positives) relative to all true base pairs
- Selectivity: correctly predicted base pairs relative to all predicted base pairs
P. Gardner et al. 2004
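Both measures can be computed directly from two sets of base pairs. The sketch below follows the simplified definitions given on this slide; the function name `sensitivity_selectivity` and the tuple representation of base pairs are assumptions, and published benchmarks may use refined definitions (for example, treating compatible but unconfirmed pairs differently).

```python
def sensitivity_selectivity(predicted, reference):
    """Compare predicted base pairs against a reference structure.

    predicted, reference: iterables of base pairs (i, j).
    Returns (sensitivity, selectivity) as fractions between 0 and 1.
    """
    predicted = set(predicted)
    reference = set(reference)
    true_positives = len(predicted & reference)
    # Sensitivity: fraction of the true base pairs that were found.
    sensitivity = true_positives / len(reference) if reference else 0.0
    # Selectivity: fraction of the predicted base pairs that are true.
    selectivity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, selectivity


if __name__ == "__main__":
    reference = [(0, 9), (1, 8), (2, 7)]
    predicted = [(0, 9), (1, 8), (3, 6), (4, 5)]
    print(sensitivity_selectivity(predicted, reference))
```

Reporting both values matters: a method that predicts very few base pairs can reach high selectivity with poor sensitivity, and a method that predicts many pairs can do the opposite.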
Prediction of single RNA secondary structure
Accuracy:
- RNAs about 500 nt long: 70% correct base pairs
- RNAs >500 nt long: accuracy falls to 40%
Reasons:
- simplifications in the energy model
- inaccuracies of parameters
- ignoring the effect of binding to ions, proteins, and other ligands
- non-equilibrium states of the RNA
R. Lorenz et al. / Methods 103 (2016) 86-98
Prediction of multiple RNA secondary structures
Advantages of multiple sequences:
- Additional information such as a phylogenetic tree and substitution models for unpaired/paired positions
- Energy-based and evolution-based approaches
Drawbacks:
- Dependent on the alignment quality
- Pairwise identity >80% for consensus structure
HOW CAN PROBING IMPROVE STRUCTURE PREDICTION?
High-throughput probing and guided structure prediction
R. Lorenz et al. / Methods 103 (2016) 86-98
Usage of experimental data
- Structure probing: biochemical methods to determine the structure of nucleic acids at the molecular level
- Physical methods: crystal structures, NMR
- Chemical methods: modification of nucleic acids (DMS or SHAPE)
SHAPE for guided structure prediction
- SHAPE = selective 2'-hydroxyl acylation analyzed by primer extension
- The 2'-hydroxyl group of the ribose is modified (acylated) by 1-methyl-7-nitroisatoic anhydride (1M7)
- 1M7 reacts with all unbound (unpaired) riboses in the RNA molecule
- Reverse transcriptase stops and falls off at modified positions
- Comparison to a control shows the unbound regions of the RNA
Guided structure prediction
R. Lorenz et al. / Methods 103 (2016) 86-98
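One widely used way to feed SHAPE data into energy minimization is to convert each reactivity into a pseudo-energy that penalizes pairing of highly reactive nucleotides. The sketch below uses the form m * ln(reactivity + 1) + b with m = 2.6 and b = -0.8 kcal/mol from Deigan et al. (2009). This specific formula is not stated on the slide and is shown only as one possible approach; the function name `shape_pseudo_energy` and the clamping of negative reactivities to zero are assumptions of this illustration.

```python
import math


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-energy in kcal/mol for a paired nucleotide with the given SHAPE reactivity.

    Assumption: negative reactivities (measurement noise) are clamped to 0.
    """
    return m * math.log(max(reactivity, 0.0) + 1.0) + b


if __name__ == "__main__":
    for reactivity in (0.0, 0.3, 1.0, 2.0):
        print(f"reactivity {reactivity:3.1f} -> {shape_pseudo_energy(reactivity):+.2f} kcal/mol")
```

Low reactivities give a small bonus for pairing (negative pseudo-energy) and high reactivities give a penalty, so the experimental data act as soft constraints and do not forbid pairs outright.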
DMS for high-throughput sequencing
- DMS methylates nitrogen atoms in adenine and cytosine
- Unbound bases, ends of helices, and GU base pairs are detectable
Genome-wide Structurome
Mod-seq for footprinting
Limitations by experimental setup
- effectiveness of the probing agent can be influenced by solvent accessibility and by tertiary and even quaternary interactions
- bulky enzymes may not be able to reach all parts of the RNA
- de- and renaturing steps leave the RNA devoid of any RNA-binding proteins or other factors
Future perspectives
- the slightest error in hard constraints might yield an entirely wrong prediction
- secondary structure does not account for tertiary effects such as non-canonical base pairs
- extremely short hairpin loops and long interior loops are excluded