Algorithms in Bioinformatics

Similar documents
RNA secondary structure prediction. Farhat Habib

RNA Basics. RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U. Bases can only pair with one other base. wobble pairing. 23 Hydrogen Bonds more stable

CS681: Advanced Topics in Computational Biology

Combinatorial approaches to RNA folding Part II: Energy minimization via dynamic programming

RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

proteins are the basic building blocks and active players in the cell, and

DNA/RNA Structure Prediction

Combinatorial approaches to RNA folding Part I: Basics

RNA Secondary Structure Prediction

RecitaLon CB Lecture #10 RNA Secondary Structure

Supplementary Material

RNA Structure Prediction and Comparison. RNA folding

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Predicting RNA Secondary Structure

BIOINFORMATICS. Prediction of RNA secondary structure based on helical regions distribution

RNA Secondary Structure Prediction

BCB 444/544 Fall 07 Dobbs 1

RNA Folding and Interaction Prediction: A Survey

Lab III: Computational Biology and RNA Structure Prediction. Biochemistry 208 David Mathews Department of Biochemistry & Biophysics

Algorithms in Bioinformatics

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Lecture 12. DNA/RNA Structure Prediction. Epigenectics Epigenomics: Gene Expression

BIOINF 4120 Bioinforma2cs 2 - Structures and Systems -

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

CONTRAfold: RNA Secondary Structure Prediction without Physics-Based Models

Basics of protein structure

Bachelor Thesis. RNA Secondary Structure Prediction

Using SetPSO to determine RNA secondary structure

Structure-Based Comparison of Biomolecules

The Ensemble of RNA Structures Example: some good structures of the RNA sequence

Computational Approaches for determination of Most Probable RNA Secondary Structure Using Different Thermodynamics Parameters

RNA Folding Algorithms. Michal Ziv-Ukelson Ben Gurion University of the Negev

Algorithms in Bioinformatics

RNA Folding Algorithms. Michal Ziv-Ukelson Ben Gurion University of the Negev

The Multistrand Simulator: Stochastic Simulation of the Kinetics of Multiple Interacting DNA Strands

Rapid Dynamic Programming Algorithms for RNA Secondary Structure

BIOINFORMATICS. Fast evaluation of internal loops in RNA secondary structure prediction. Abstract. Introduction

RNA Matrices and RNA Secondary Structures

Sparse RNA Folding Revisited: Space-Efficient Minimum Free Energy Prediction

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model

RNA Abstract Shape Analysis

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained

Algorithmic Aspects of RNA Secondary Structures

Sparse RNA folding revisited: space efficient minimum free energy structure prediction

Candidates for Novel RNA Topologies

Characterising RNA secondary structure space using information entropy

Computational Biology: Basics & Interesting Problems

13 Comparative RNA analysis

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Semi-Supervised CONTRAfold for RNA Secondary Structure Prediction: A Maximum Entropy Approach

Finding Consensus Energy Folding Landscapes Between RNA Sequences

In Genomes, Two Types of Genes

RNA & PROTEIN SYNTHESIS. Making Proteins Using Directions From DNA

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

RNA$2 nd $structure$predic0on

Predicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis

BINF6201/8201. Molecular phylogenetic methods

Evolutionary Tree Analysis. Overview

Selecting protein fuzzy contact maps through information and structure measures

Grand Plan. RNA very basic structure 3D structure Secondary structure / predictions The RNA world

Graph Alignment and Biological Networks

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Sparse RNA Folding: Time and Space Efficient Algorithms

Supplementary Material

RNA and Protein Structure Prediction

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

A Combinatorial Framework for Multiple RNA Interaction Prediction

Sequence analysis and comparison

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein!

Parametrized Stochastic Grammars for RNA Secondary Structure Prediction

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

Palindromes in Viral Genomes. Ming-Ying Leung Department of Mathematical Sciences The University of Texas at El Paso (UTEP)

Supersecondary Structures (structural motifs)

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

Moments of the Boltzmann distribution for RNA secondary structures

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Complete Suboptimal Folding of RNA and the Stability of Secondary Structures

COMBINATORICS OF LOCALLY OPTIMAL RNA SECONDARY STRUCTURES

RNA SECONDARY STRUCTURES AND THEIR PREDICTION 1. Centre de Recherche de MatMmatiques Appliqu6es, Universit6 de Montr6al, Montreal, Canada H3C 3J7

Types of RNA. 1. Messenger RNA(mRNA): 1. Represents only 5% of the total RNA in the cell.

Dynamic Programming 1

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

DYNAMIC PROGRAMMING ALGORITHMS FOR RNA STRUCTURE PREDICTION WITH BINDING SITES

Structural biomathematics: an overview of molecular simulations and protein structure prediction

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

Lecture 10: December 22, 2009

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

Hairpin Database: Why and How?

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Large-Scale Genomic Surveys

DANNY BARASH ABSTRACT

Bioinformatics Chapter 1. Introduction

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

A tutorial on RNA folding methods and resources

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

A two length scale polymer theory for RNA loop free energies and helix stacking

Transcription:

Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri RNA Structure Prediction Secondary Structure Base-Pairing Stems & Loops Minimum Energy Nussinov Algorithm Covariation SCFG 5.1

RNA Structure Prediction Problem: Given a primary sequence, predict the secondary and tertiary structure. Some RNAs have a consensus structure. Example: transfer RNA Other RNAs (mrna and rrna) do not have a predefined structure It is very difficult to predict the 3 dimension folding of RNAs. Transfer RNA 5.2

Ribonucleic Acids RNA includes some of the most ancient molecules Example: Ribosomal RNAs. Many RNAs are like molecular fossils that have been handed down in evolutionary time from an extinct RNA world. Base-Pairing Patterns Sequence variations in RNA maintain base-pairing patterns that give rise to double-stranded regions (secondary structure) in the molecule. Alignments of two sequences that specify the same RNA molecules will show covariation at interacting basepair positions. 5.3

Importance of Secondary Structure RNAs and proteins are single sequences that fold into 3-D structures: Secondary structure describes how a sequence pairs with itself Tertiary structure describes the overall 3-D shape Folding maximizes RNA and Protein s chemical effect Over the history of evolution, members of many RNA families conserve their secondary structure more than they conserve their primary sequence - This shows the importance of secondary structure, and provides a basis for comparative analysis of RNA secondary structure RNA Secondary Structure RNA secondary structure is an intermediate step in the formation of a three-dimensional structure. RNA secondary structure is composed primarily of double-stranded RNA regions formed by folding the single-stranded molecule back on itself. 5.4

Secondary Structure Analysis Primary sequence poorly conserved Secondary structure highly conserved => Many RNAs or functional elements in RNAs cannot be identified by sequence comparison but only by the analysis of secondary structure Conservation of Ribonucleic Acids Structure of molecules is conserved across many species and may be used both to infer phylogenetic relationships and to determine two and three dimensional structure. 5.5

Types of Secondary Structure RNA Secondary Structure Pseudoknot Single-Stranded Stem Interior Loop Bulge Loop Hairpin loop Junction (Multiloop) 5.6

Predicting Methods Structure may be predicted from sequence by searching for regions that can potentially base pair or by examining covariation in different sequence positions in aligned sequences. The most modern methods are very accurate and will find all candidates of a class of RNA molecules with high reliability. Methods involve a combination of hidden Markov models, and new type of covariation tool called SCFGs (stochastic context free grammars). Assumptions of RNA Secondary Structure The most likely structure is similar to the energetically most stable structure. The energy associated with any position in the structure is only influenced by the local sequence and structure. 5.7

Free Energy Minimization Assumptions: Secondary structure has the lowest possible energy Free energies of stems depend only on the nearest neighbor base pairs in the sequences Stem and loop free energies are additive Free energies of stems and loops come from experimentally measured values of oligonucleotides. Energy Minimization Algorithms Input: Primary RNA sequence Output: Predicted Secondary Structure Minimizes free energy while maximizing the number of consecutive base pairing. 5.8

Dot Matrix Analysis Repeats represents regions that can potentially selfhyberdize to form doublestranded RNA. The compatible regions may be used to predict a minimum free-energy structure. A dot plot of an RNA sequence against its complementary strand scoring matches MFOLD and Energy MFOLD is commonly used to predict the energetically most stable structures of an RNA molecule. The most energetic is often the longest region in the molecule. MFOLD provides a set of possible structures within a given energy range and provides an indication in their reliability. 5.9

The Output of MFOLD MFOLD looks for the arrangement that yields the secondary structures with lowest possible energy. Thus, the result is dependent on the correctness of the energy model (such as Table 8.2, Mount). MFOLD output includes the following parts: The Energy Dot Plot The View Individual Structures The Dot Plot Folding Comparisons Obtaining Minimal Energies Plot sequences across the page and also down the left side of the page. Look for rows of complementary matches. Use the table of predicted free-energy values (kcal/mole at 37 degrees Celsius) for base pairs to add up the stacking energies. 5.10

SCFG For Modeling RNA Use both, covariational and energy minimization methods together generally yield very good results. Stochastic Context Free Grammars (SCFG) can help define base interactions in specific classes of RNA molecules and sequence variations at those positions. Nussinov s Algorithm Algorithms for Computational Biology by Manolis Kellis (MIT) 5.11

Dynamic Programming Approach Solve problem for all sub problems of size 1: The solution is zero Iteratively, knowing the solution of all problems of size less than k, compute the solution of all problems of size k. Optimally Solving Subproblems Input X = x 1,x 2,x 3,x 4,x 5,x 6,,x n Solve subproblems of size 2: x 1,x 2,x 3,x 4,x 5,x 6,,x n Solve subproblems of size 3: x 1,x 2,x 3,x 4,x 5,x 6,,x n Continue until all sizes are studied 5.12

Nussinov: Base Pair Maximization S(i,j) is the folding of the subsequence of the RNA sequence from index i to index j which results in the highest number of base pairs First Case: i and j base pair Add the i, j pair onto best structure found for subsequence i+1,j-i i and j base pair 5.13

Second Case: i is unpaired Add unpaired position i onto best structure for subsequence i+1,j i is unpaired Third Case: j is unpaired Add unpaired position j onto best structure for subsequence i,j-1 j is unpaired 5.14

Fourth Case: Bifurcation Combine two optimal substructures: i,k and k+1,j Bifurcation Nussinov Algorithm: Example Initialization: Initialize first two diagonal arrays to 0 The next diagonal gives the best score for all subsequences of length 2 5.15

Nussinov: Example (II) S(i, j 1) S(i + 1, j) We still have to consider bifurcations for k=2,3,4,5,6,7,8 S(i + 1, j 1) +1 Nussinov: Example (III) k=2: We have 2 substructures: (i,k) and (k+1,j) i.e., (1,2) and (3,9). Proceed for k=3,4,5,6,7,8 Maximum value is 2, obtained when k=2: 0 + 2 = 2. 5.16

Nussinov: Example (IV) S(1,9) = max { 2, 3, 2, 2 } = 3. Traceback to find the actual structure. Phase 2: Traceback Value at S(1,9) is the total base pair count in the maximally base-paired structure. As is usually the case with Dynamic Programming Algorithms, we have to traceback from S(1, 9) to actually construct the RNA secondary structure. 5.17

Constructing the RNA Structure Nussinov algorithm gives the structure with maximum number of base pairings, but does not always create viable secondary structures Conclusion Raw data provided by experimental methods X-ray crystallography Nuclear Magnetic Resonance Computational prediction algorithms a) Minimum energy algorithms: Dynamic programming algorithms: Nussinov algorithm, Zuker algorithm, Akustu algorithm, b) Stochastic Context Free Grammar Utilize various energy functions and covariation scores to define branch probabilities c) Maximum Weighted Matching A heuristic algorithm. Edge weight definition utilize energy functions and covariation scores. 5.18