RNA Folding Algorithms. Michal Ziv-Ukelson Ben Gurion University of the Negev



2 The RNA Folding Problem: Given an RNA sequence, predict its energetically most stable structure (minimal free energy).
Example sequence: AUCCCCGUAUCGAUC AAAAUCCAUGGGUAC CCUAGUGAAAGUGUA UAUACGUGCUCUGAU UCUUUACUGAGGAGU CAGUGAACGAACUGA

3 RNA structure prediction by base-pair maximization
Input: a string over {A, C, G, U}, where A pairs with U and C pairs with G.
Output: a subset of the possible base pairs, of maximal size, such that no two base pairs intersect.
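The input and output conditions above can be checked mechanically. The following is a minimal sketch and not part of the original slides: the names `can_pair` and `is_valid_pairing` are hypothetical, and "intersect" is interpreted here as two pairs either sharing a base or crossing each other.

```python
from itertools import combinations

# Pairs allowed by the simplified model (A-U and C-G only).
ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def can_pair(a, b):
    """Return True if bases a and b may pair under the simplified model."""
    return (a, b) in ALLOWED


def is_valid_pairing(seq, pairs):
    """Check a candidate output: complementary pairs, no shared base, no crossing.

    `pairs` is an iterable of 0-based index pairs (i, j) with i < j.
    """
    pairs = list(pairs)
    used = set()
    for i, j in pairs:
        if not (0 <= i < j < len(seq)) or not can_pair(seq[i], seq[j]):
            return False
        if i in used or j in used:  # a base takes part in at most one pair
            return False
        used.update((i, j))
    # After sorting, the first pair of each combination starts further left,
    # so the two pairs cross exactly when i < k < j < l.
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            return False
    return True
```

For example, `is_valid_pairing("AAUU", [(0, 3), (1, 2)])` accepts a nested pair set, while `[(0, 2), (1, 3)]` on the same string is rejected because the two pairs cross.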

4 Sequence alignment as a method to determine structure
Bases pair in order to form backbones and determine the secondary structure. Aligning bases according to their ability to pair with each other gives an algorithmic approach to determining the optimal structure.
Problem: the number of potential structures grows exponentially with the number of bases, n.
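The growth in the number of candidate structures can be observed on small inputs by counting every pseudoknot-free pairing of a sequence. The sketch below is an illustration only; `count_foldings` is a hypothetical name, and it assumes the simplified pairing rules (A-U, C-G) with no minimum loop length. The last base is either unpaired or paired with some earlier base q, which splits the interval into two independent parts.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def count_foldings(seq):
    """Count all sets of non-crossing, complementary base pairs (empty set included)."""

    @lru_cache(maxsize=None)
    def count(i, j):
        if j <= i:  # empty or single-base interval: only the empty folding
            return 1
        total = count(i, j - 1)  # base j is unpaired
        for q in range(i, j):  # base j pairs with base q
            if (seq[q], seq[j]) in PAIRS:
                total += count(i, q - 1) * count(q + 1, j - 1)
        return total

    return count(0, len(seq) - 1)


if __name__ == "__main__":
    # Print the count for alternating AU strings of increasing length.
    for k in range(1, 9):
        print(2 * k, count_foldings("AU" * k))
```

Enumerating these structures one by one is therefore impractical for all but very short sequences, which is the motivation for the dynamic programming approach that follows.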

5 RNA secondary structure prediction algorithms (cont'd) We must distinguish the biologically correct structure from all the incorrect structures. We need both a function that assigns the highest score to the correct structure and an algorithm for evaluating the scores of all possible structures. (December 15)

6 RNA secondary structure prediction algorithms (cont'd) One approach is to find the structure with the most base pairs. Nussinov introduced an efficient dynamic programming algorithm for this problem in 1978. Although this criterion is too simplistic, the mechanics of this algorithm are the same as those of the more sophisticated energy-minimization folding algorithms.

7 Ruth Nussinov Professor in the Department of Human Genetics, School of Medicine, Tel Aviv University. Proposed the first dynamic programming algorithm for RNA secondary structure prediction, by maximizing the number of base pairs (1978).

8 Folding an RNA sequence of length n
1. Classical, $O(n^3)$: [Nussinov et al. 1978, 1980], [Waterman and Smith 1978], [Zuker and Stiegler 1981]. MFOLD; Vienna RNA Package.
2. Complex worst-case speedups:
   Based on fast matrix multiplication: $O(n^3 \log^3 \log n / \log^2 n)$ [Akutsu 1999]
   Based on Four Russians: $O(n^3 / \log n)$ [Frid & Gusfield]; $O(n^3 / \log^2 n)$ [Pinhas*, Zakov*, Tsur and Ziv-Ukelson 2013]
3. Sparsification, a practical speedup: $O(nZ)$, where $Z$ is in $[n, n^2]$ [Wexler, Zilberstein, Ziv-Ukelson 2007], [Backofen, Tsur, Zakov, Ziv-Ukelson 2009]

9 Assumptions of the RNA secondary structure prediction algorithm, based on MFE:
1. The most likely structure of the RNA molecule is identical or similar to the energetically most stable structure.
2. The energy associated with any position in the structure is only influenced by local sequence structure.
3. The structure is assumed to be formed by folding of the chain back on itself in a manner that does not produce any pseudoknots.
[Figure: legal and illegal structural elements, drawn over positions i and j]

10 (Simplified) Problem Definition
An RNA molecule is a sequence of length n over the alphabet {A, C, G, U}. Each base (= letter) in the sequence may form a bond with at most one other base, where A can pair only with U, and C only with G. The base pairs are nested:
AAAGUUUCGUCCGGG
(((.)))((.)(.))
A set of nested base pairs is called a secondary structure, or a folding, of the sequence.
Goal: for a given RNA sequence, compute a folding with a maximum number of base pairs.
Why can this problem be solved by dynamic programming?
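The dot-bracket line in the problem definition encodes the folding: matching parentheses mark a base pair and a dot marks an unpaired base. As a small illustration (the function name `parse_dot_bracket` is hypothetical), the pairs can be recovered with a stack and checked against the pairing rules:

```python
def parse_dot_bracket(structure):
    """Return the sorted list of 0-based (i, j) pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


# The example from the problem definition.
seq = "AAAGUUUCGUCCGGG"
structure = "(((.)))((.)(.))"
pairs = parse_dot_bracket(structure)

# Every encoded pair should be A-U or C-G.
complementary = all({seq[i], seq[j]} in ({"A", "U"}, {"C", "G"}) for i, j in pairs)
```

Because the parentheses are balanced, the bases strictly inside a pair can only pair with each other. This independence of the inside from the outside is what lets the problem decompose into subproblems on subsequences.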

11-18 Co-terminus foldings and partitionable foldings
[Figure: animation frames showing co-terminus and partitionable foldings of the example sequence A U C A U G G C A U]

19 A recursive solution
Co-terminus and partitionable foldings (example sequence: A U C A U G G C A U):
$L^c(i,j)$: the maximum cardinality of a co-terminus folding of $S_{i,j}$
$L^p(i,j)$: the maximum cardinality of a partitionable folding of $S_{i,j}$
$L(i,j)$: the maximum cardinality of a folding of $S_{i,j}$ (the objective function)
$L^p(i,j) = \max_{i < q \le j} \{ L(i,q-1) + L(q,j) \}$
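The three quantities can be written as a direct top-down recursion with memoization. This is a sketch under stated assumptions rather than the slides' own code. It assumes a co-terminus folding of $S_{i,j}$ is one in which the first and last bases are paired with each other, so its score is one plus the best folding of the interior. It assumes a partitionable folding splits at some point q into two independent foldings, and that the objective is the better of the two cases. The function names are hypothetical.

```python
from functools import lru_cache

NEG_INF = float("-inf")
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_pairs_recursive(seq):
    """Maximum number of nested base pairs in seq, by memoized recursion."""

    @lru_cache(maxsize=None)
    def L(i, j):
        if j <= i:  # empty or single-base subsequence
            return 0
        return max(Lc(i, j), Lp(i, j))

    def Lc(i, j):
        # Co-terminus case (assumed): base i is paired with base j.
        if (seq[i], seq[j]) not in PAIRS:
            return NEG_INF
        return L(i + 1, j - 1) + 1

    def Lp(i, j):
        # Partitionable case: best split into S[i..q-1] and S[q..j].
        return max(L(i, q - 1) + L(q, j) for q in range(i + 1, j + 1))

    return L(0, len(seq) - 1)
```

Each call to `Lp` scans up to n split points, and there are on the order of n^2 distinct (i, j) arguments, which is where the cubic running time comes from.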

20 A recursive solution

21 The Nussinov-Jacobson Algorithm
A DP algorithm which performs a bottom-up computation of the recurrence. It uses a table M which stores solutions for subsequences: M[i,j] = L(i,j). Upon reaching M[i,j], all entries which are needed for the computation of L(i,j) have already been computed and stored in M.
[Figure: table M for the sequence A C A G U U G C A, with rows indexed by i and columns by j]
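A bottom-up table fill of this kind can be sketched as follows. The fill order (by increasing subsequence length), the 0-based indexing, and the traceback that recovers one optimal folding in dot-bracket form are choices made for this illustration, and the name `nussinov` is hypothetical. The score of pairing the two end bases is assumed to be one plus the entry for the interior.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov(seq):
    """Return (max number of base pairs, one optimal folding in dot-bracket form)."""
    n = len(seq)
    M = [[0] * n for _ in range(n)]

    def get(i, j):
        return M[i][j] if i <= j else 0  # an empty subsequence scores 0

    # Shorter subsequences first, so every entry needed below is already filled.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = max(get(i, q - 1) + get(q, j) for q in range(i + 1, j + 1))
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, get(i + 1, j - 1) + 1)
            M[i][j] = best

    # Traceback: re-derive which case produced each entry.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if (seq[i], seq[j]) in PAIRS and M[i][j] == get(i + 1, j - 1) + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
            continue
        for q in range(i + 1, j + 1):
            if M[i][j] == get(i, q - 1) + get(q, j):
                stack.append((i, q - 1))
                stack.append((q, j))
                break
    return get(0, n - 1), "".join(structure)
```

Calling `nussinov("ACAGUUGCA")` returns the score for the table's example sequence together with one folding that attains it; several optimal foldings may exist, and the traceback reports whichever it reaches first.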

22-32 The Nussinov-Jacobson Algorithm
[Figure: animation frames filling one entry of M for the sequence A C A G U U G C A, trying each split point q with i < q ≤ j in turn (q = 2, ..., 9) and combining the entries for (i, q-1) and (q, j)]

33 The Nussinov-Jacobson Algorithm
Space complexity: $O(n^2)$. Time complexity: $O(n^3)$.

34 The Nussinov-Jacobson Algorithm
Sparsification: do we really need to consider all $O(n)$ sums of pairs, over the split points q with i < q ≤ j?
Space complexity: $O(n^2)$. Time complexity: $O(n^3)$.
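One way to read the sparsification question is the following. A split point q is only worth trying for a right end j if the best folding of S[q..j] pairs q with j. Otherwise S[q..j] itself splits at some later point, and that later split point already covers the same score. The sketch below keeps, for each j, a list of such candidate split points. It is a simplified illustration in the spirit of the sparsification idea, written for the base-pair maximization model only, and is not taken from the cited papers. The name `nussinov_sparse` and the scoring of a paired end (one plus the interior) are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_sparse(seq):
    """Maximum number of nested base pairs, trying only candidate split points."""
    n = len(seq)
    if n == 0:
        return 0
    L = [[0] * n for _ in range(n)]

    def get(i, j):
        return L[i][j] if i <= j else 0  # an empty subsequence scores 0

    for j in range(n):
        # (q, score) entries where pairing q with j strictly beats every split of S[q..j].
        candidates = []
        for i in range(j, -1, -1):
            # Split at q = j: base j stays unpaired.
            best_split = get(i, j - 1)
            # Remaining splits: only candidate q > i, found while i was larger.
            for q, paired_score in candidates:
                best_split = max(best_split, get(i, q - 1) + paired_score)
            # End bases i and j paired with each other; -1 stands for "not allowed".
            if i < j and (seq[i], seq[j]) in PAIRS:
                best_paired = get(i + 1, j - 1) + 1
            else:
                best_paired = -1
            L[i][j] = max(best_split, best_paired)
            if best_paired > best_split:
                candidates.append((i, best_paired))
    return L[0][n - 1]
```

The table and the worst-case bounds are unchanged, but the inner loop now runs over the candidate list rather than over every q. How short that list is in practice depends on the sequence.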


More information

RNA Abstract Shape Analysis

RNA Abstract Shape Analysis ourse: iegerich RN bstract nalysis omplete shape iegerich enter of Biotechnology Bielefeld niversity robert@techfak.ni-bielefeld.de ourse on omputational RN Biology, Tübingen, March 2006 iegerich ourse:

More information

Computing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n )

Computing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n ) Alignment Grph Alignment Mtrix Computing the Optiml Globl Alignment Vlue An Introduction to Bioinformtics Algorithms A = n c t 2 3 c c 4 g 5 g 6 7 8 9 B = n 0 c g c g 2 3 4 5 6 7 8 t 9 0 2 3 4 5 6 7 8

More information

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally, myopically optimizing

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17 12.1 Introduction Today we re going to do a couple more examples of dynamic programming. While

More information

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction. Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal

More information

Pairwise sequence alignment

Pairwise sequence alignment Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

More information

Genetic Algorithms: Basic Principles and Applications

Genetic Algorithms: Basic Principles and Applications Genetic Algorithms: Basic Principles and Applications C. A. MURTHY MACHINE INTELLIGENCE UNIT INDIAN STATISTICAL INSTITUTE 203, B.T.ROAD KOLKATA-700108 e-mail: murthy@isical.ac.in Genetic algorithms (GAs)

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

Enumerating Binary Strings

Enumerating Binary Strings International Mathematical Forum, Vol. 7, 2012, no. 38, 1865-1876 Enumerating Binary Strings without r-runs of Ones M. A. Nyblom School of Mathematics and Geospatial Science RMIT University Melbourne,

More information

Lexical Analysis. Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University.

Lexical Analysis. Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University. Lexical Analysis Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University http://compilers.cs.uni-saarland.de Compiler Construction Core Course 2017 Saarland University Today

More information

Pairwise alignment, Gunnar Klau, November 9, 2005, 16:

Pairwise alignment, Gunnar Klau, November 9, 2005, 16: Pairwise alignment, Gunnar Klau, November 9, 2005, 16:36 2012 2.1 Growth rates For biological sequence analysis, we prefer algorithms that have time and space requirements that are linear in the length

More information

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein!

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein! The Double Helix SE 417: lgorithms and omputational omplexity! Winter 29! W. L. Ruzzo! Dynamic Programming, II" RN Folding! http://www.rcsb.org/pdb/explore.do?structureid=1t! Los lamos Science The entral

More information

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming lgorithmic Paradigms Dynamic Programming reed Build up a solution incrementally, myopically optimizing some local criterion Divide-and-conquer Break up a problem into two sub-problems, solve each sub-problem

More information

Math 8803/4803, Spring 2008: Discrete Mathematical Biology

Math 8803/4803, Spring 2008: Discrete Mathematical Biology Math 8803/4803, Spring 2008: Discrete Mathematical Biology Prof. hristine Heitsch School of Mathematics eorgia Institute of Technology Lecture 12 February 4, 2008 Levels of RN structure Selective base

More information

CSCI Final Project Report A Parallel Implementation of Viterbi s Decoding Algorithm

CSCI Final Project Report A Parallel Implementation of Viterbi s Decoding Algorithm CSCI 1760 - Final Project Report A Parallel Implementation of Viterbi s Decoding Algorithm Shay Mozes Brown University shay@cs.brown.edu Abstract. This report describes parallel Java implementations of

More information

Finding Consensus Energy Folding Landscapes Between RNA Sequences

Finding Consensus Energy Folding Landscapes Between RNA Sequences University of Central Florida Electronic Theses and Dissertations Masters Thesis (Open Access) Finding Consensus Energy Folding Landscapes Between RNA Sequences 2015 Joshua Burbridge University of Central

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model

Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model J. Waldispühl 1,3 P. Clote 1,2, 1 Department of Biology, Higgins 355, Boston

More information

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 9 Protein tertiary structure Sources for this chapter, which are all recommended reading: D.W. Mount. Bioinformatics: Sequences and Genome

More information

Analysis of tree edit distance algorithms

Analysis of tree edit distance algorithms Analysis of tree edit distance algorithms Serge Dulucq 1 and Hélène Touzet 1 LaBRI - Universit Bordeaux I 33 0 Talence cedex, France Serge.Dulucq@labri.fr LIFL - Universit Lille 1 9 6 Villeneuve d Ascq

More information

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

RNA Search and! Motif Discovery Genome 541! Intro to Computational! Molecular Biology RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure

More information