Shape Based Indexing For Faster Search Of RNA Family Databases


1 Shape Based Indexing For Faster Search Of RNA Family Databases. Stefan Janssen, Jens Reeder, Robert Giegerich. 26 April 2008

2 RNA homology. Why? Build homologous groups; find new group members. How? Sequence & structure.

3 Covariance Models: used in Rfam; model sequence & structure; analogous to HMMs, plus base pairing; expensive runtime: O(N^4), N = sequence length; R = #families (607 in Rfam). One query against Rfam = O(R · N^4), 20 min.

4 Abstract shapes approach: thermodynamic folding + abstraction; many levels of abstraction; ignores primary sequence; homologous = sharing a common shape; shapes are composed of: [ ]

5 Shape based filter: families are represented by shapes; store shapes in a data structure; calculate the shape for the query string; use the query shape to search for the correct family; fast: O(N^3). Problems: several families share the same shape; several shapes occur within one family; sometimes sequence matters...

6 Abstract shapes approach. Arrangement of helices: adjacency, embedding. As trees: adjacency = sibling nodes; embedding = parent-child relation.
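The tree view of adjacency and embedding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is a dot-bracket string, each node stands for a single base pair rather than a whole helix, and the names `Node` and `pairs_to_tree` are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One base pair (or the virtual root); children are pairs nested directly inside."""
    label: str
    children: list = field(default_factory=list)


def pairs_to_tree(dot_bracket: str) -> Node:
    """Build a tree from a dot-bracket string (hypothetical helper).

    Adjacent pairs become sibling nodes; an enclosed pair becomes a child node.
    """
    root = Node("root")
    stack = [root]
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            node = Node(f"pair@{i}")
            stack[-1].children.append(node)  # embedding: child of the enclosing pair
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure")
    return root


tree = pairs_to_tree("(...)((...))")
print(len(tree.children))              # two adjacent structures: sibling nodes
print(len(tree.children[1].children))  # one pair embedded in the second structure
```

Grouping stacked pairs into helix nodes would be a further step on top of this pair-level tree.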

7 Tree representation

8 Tree representation. HL (hairpin loop): C UAGC G

9 Tree representation. IL, HL. Internal loop: A G x G U

10 Tree representation. IL, HL. Three stacked regions: C x G, G x C and U x A

11 Tree representation. SS, SS, E, IL, HL. Single-stranded regions CC and A, plus adjacency

12 Primary sequence. SS, SS, E, IL, HL. Primary sequence: CCCGUAGCUAGCGGUACGA

13 Shape string: [[ ]]. Tree: SS, SS, E, IL, HL → [ ]. HL: base left, region, base right = [ ]

14 Shape string: [[ ]]. Tree: SS, SS, E, IL, [ ] → []. IL: base left, region left, structure, region right, base right = [ + + structure + + ]

15 Shape string: [[ ]]. Tree: SS, SS, E, [[]]. Base left, structure, base right = structure

16 Shape string: [[ ]]. Tree: SS, [[]], SS, E. SS: region = E = ɛ

17 Shape string: [[ ]]. Result: [[]]. Structure left, structure right = [ structure left + + structure right ]
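The rewriting rules on slides 13 to 17 can be approximated by a small recursive function. This is a sketch under assumptions, not the RNAshapes implementation: the function name `shape_string` is hypothetical, the input is assumed to be a well-formed dot-bracket string, and any unpaired base between two nested pairs (bulge or internal loop) is treated as starting a new bracket level. The example structure `..(((..(....)..))).` is one reading of the loop annotations on slides 8 to 12 for the 19-nt sequence; the slides do not state it in dot-bracket form.

```python
def shape_string(dot_bracket: str) -> str:
    """Abstract a dot-bracket structure to a shape string without unpaired regions."""
    # Partner table: opening position -> closing position.
    stack, partner = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            partner[stack.pop()] = i

    def fold(i: int, j: int) -> str:
        """Shape of positions i..j (inclusive)."""
        out = []
        while i <= j:
            if i in partner:
                a, b = i, partner[i]
                # Stacked pairs: base left, structure, base right = structure.
                while partner.get(a + 1) == b - 1:
                    a, b = a + 1, b - 1
                out.append("[" + fold(a + 1, b - 1) + "]")
                i = partner[i] + 1
            else:
                i += 1  # single-stranded region contributes the empty string
        return "".join(out)  # adjacency = concatenation

    return fold(0, len(dot_bracket) - 1)


# Assumed structure for CCCGUAGCUAGCGGUACGA (see lead-in).
print(shape_string("..(((..(....)..)))."))
```

For the assumed structure this yields `[[]]`, which matches the shape string shown on the slides once spacing is ignored.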

18 General workflow of an
1. compute the family shape spectrum fss(f), ∀ f ∈ Rfam
2. merge all fss(f) into a suitable data structure I_Rfam
3. compute a query shape spectrum qss(x), for a given query x
4. access I_Rfam to determine the match set: M(x) = {f | qss(x) ∩ fss(f) ≠ ∅}
5. if M(x) = ∅ end, otherwise execute cmsearch_f(x) ∀ f ∈ M(x)
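The five workflow steps can be sketched with an inverted index from shape strings to family names. Everything here is illustrative: the slides only call for "a suitable data structure", so the dictionary-based index, the function names, the toy family names and the `run_cmsearch` callback (a stand-in for invoking cmsearch with one family's model) are all assumptions.

```python
from collections import defaultdict


def build_index(family_spectra):
    """Steps 1-2: merge all fss(f) into one index, shape -> set of families."""
    index = defaultdict(set)
    for family, shapes in family_spectra.items():
        for shape in shapes:
            index[shape].add(family)
    return index


def match_set(index, query_spectrum):
    """Step 4: M(x) = all families whose spectrum shares a shape with qss(x)."""
    matches = set()
    for shape in query_spectrum:
        matches |= index.get(shape, set())
    return matches


def search(index, query_spectrum, run_cmsearch):
    """Step 5: run the expensive search only for families in M(x).

    run_cmsearch is a caller-supplied function (hypothetical stand-in).
    """
    return {f: run_cmsearch(f) for f in sorted(match_set(index, query_spectrum))}


# Toy data; family names and spectra are made up for illustration.
index = build_index({
    "famA": {"[]"},
    "famB": {"[][]", "[[][]]"},
    "famC": {"[]", "[[][]]"},
})
qss = {"[[][]]"}  # step 3 would come from a folding tool
print(sorted(match_set(index, qss)))
print(search(index, qss, run_cmsearch=lambda family: f"searched {family}"))
```

An empty match set makes `search` return an empty dictionary, which corresponds to ending without any cmsearch run.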

19 Family shape spectra fss(f)
1. 1-SS_cons-shape-index: fss(f) = {π(SS_cons)}
f = family, x = sequence, π(x) = shape

20 Family shape spectra fss(f)
1. 1-SS_cons-shape-index: fss(f) = {π(SS_cons)}
2. 1-Consensus-shape-index: fss(f) = rankmin{⋂_{x ∈ f} RNAshapes(0, π, x)}
f = family, x = sequence, π(x) = shape

21 Family shape spectra fss(f)
1. 1-SS_cons-shape-index: fss(f) = {π(SS_cons)}
2. 1-Consensus-shape-index: fss(f) = rankmin{⋂_{x ∈ f} RNAshapes(0, π, x)}
3. 1-Hybrid-shape-index: fss(f) = {fss(f)_Consensus ∪ fss(f)_SS_cons}
f = family, x = sequence, π(x) = shape

22 Family shape spectra fss(f)
Union-shape-index: fss(f) = {π(RNAfold_C(SS_cons, x)) | x ∈ f}
f = family, x = sequence, π(x) = shape

23 Family shape spectra fss(f)
1. 1-SS_cons-shape-index: fss(f) = {π(SS_cons)}
2. 1-Consensus-shape-index: fss(f) = rankmin{⋂_{x ∈ f} RNAshapes(0, π, x)}
3. 1-Hybrid-shape-index: fss(f) = {fss(f)_Consensus ∪ fss(f)_SS_cons}
4. Union-shape-index: fss(f) = {π(RNAfold_C(SS_cons, x)) | x ∈ f}
5. k-best-shape-index: fss(f) = ⋃_{x ∈ f} RNAshapes(k, π, x)
f = family, x = sequence, π(x) = shape
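Two of these index variants can be illustrated on precomputed data. The sketch below assumes that each family member already comes with a list of shape strings ordered from best to worst rank (as a folding tool might report them); it does not call RNAshapes. Reading "rankmin" as "the shared shape with the smallest sum of ranks" is an assumption, as are the function names and the toy input.

```python
def k_best_index(ranked_shapes, k):
    """k-best-shape-index: union over all members of their k best shapes."""
    spectrum = set()
    for shapes in ranked_shapes.values():
        spectrum.update(shapes[:k])
    return spectrum


def consensus_index(ranked_shapes):
    """1-Consensus-shape-index (assumed reading): the shape common to all
    members with the minimal rank sum, or the empty set if none is shared."""
    lists = list(ranked_shapes.values())
    if not lists:
        return set()
    common = set(lists[0]).intersection(*lists[1:])
    if not common:
        return set()
    # sorted() makes tie-breaking deterministic.
    best = min(sorted(common), key=lambda s: sum(l.index(s) for l in lists))
    return {best}


# Toy family: member name -> shapes ordered by rank (best first); made-up data.
family = {
    "seq1": ["[]", "[][]", "[[][]]"],
    "seq2": ["[][]", "[]"],
    "seq3": ["[][]", "[[][]]", "[]"],
}
print(sorted(k_best_index(family, 1)))
print(consensus_index(family))
```

A hybrid index in the sense of item 3 would then simply be the union of the consensus result with the shape of the annotated consensus structure.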

24 Query shape spectra qss(x)
1. 1-shape-spectrum: qss(x) = RNAshapes(1, π, x)
2. k-best-shape-spectrum: qss(x) = RNAshapes(k, π, x)
f = family, x = sequence, π(x) = shape

25 Further Improvements
1. Multilevel Abstraction: [_[_[_]]_[_[_]_]]_, [[_[]][_[]_]], [[[]][[]]], [[][[]]], [[][]]
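Part of this abstraction chain can be reproduced directly on shape strings. The sketch below is inferred from the example strings only and is not the RNAshapes implementation: dropping the `_` symbols maps the two finest strings to the third one, and collapsing every bracket that encloses exactly one other bracket maps the third and fourth strings to the coarsest one. The function names are hypothetical, and the fourth string cannot be derived this way because it depends on the loop type.

```python
def drop_unpaired(shape: str) -> str:
    """Remove the unpaired-region symbol '_' from a shape string."""
    return shape.replace("_", "")


def collapse_nesting(shape: str) -> str:
    """Collapse every bracket that encloses exactly one other bracket."""
    shape = drop_unpaired(shape)

    def parse(pos):
        """Parse sibling brackets from pos; return (children, next position)."""
        children = []
        while pos < len(shape) and shape[pos] == "[":
            inner, pos = parse(pos + 1)
            if pos >= len(shape) or shape[pos] != "]":
                raise ValueError("unbalanced shape string")
            pos += 1
            children.append(inner)
        return children, pos

    def render(children):
        out = []
        for inner in children:
            while len(inner) == 1:  # a single enclosed helix: merge with its parent
                inner = inner[0]
            out.append("[" + render(inner) + "]")
        return "".join(out)

    tree, end = parse(0)
    if end != len(shape):
        raise ValueError("unbalanced shape string")
    return render(tree)


print(drop_unpaired("[_[_[_]]_[_[_]_]]_"))
print(drop_unpaired("[[_[]][_[]_]]"))
print(collapse_nesting("[[[]][[]]]"))
print(collapse_nesting("[[][[]]]"))
```

The first two calls give `[[[]][[]]]` and the last two give `[[][]]`, in agreement with the strings listed on the slide.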

26 Further Improvements
1. Multilevel Abstraction
2. Using folding energies
Families: Coronavirus packaging signal (RF00182), UnaL2 LINE 3' element (RF00436), Hepatitis C virus stem-loop VII (RF00468). Common shape for all sequences: single hairpin = [].
(Figure: axes "number of sequences" and "length normalized energies"; legend: GC = 0.57, GC = 0.55, GC = )

27 Further Improvements
1. Multilevel Abstraction
2. Using folding energies
3. Omitting difficult families

28 k-best-shape-index, 1-SS_cons-shape-index, 1-consensus-shape-index, 1-hybrid-shape-index, 1-union-shape-index, 1-RNAalifold-shape-index, k-rnalishapes-shape-index, cmsearch --hmmfilter

29 Thanks for your attention is available at:

30 CGUCUUAAACUCAUCACCGUGUGGAGCUGCGACCCUUCCCUAGAUUCGAAGACGAG
((((((...(((..(((...))))))...(((..((...))..))))))))).. 
Shape Type 5: [[][]]
Shape Type 4: [[][[]]]
Shape Type 3: [[[]][[]]]
Shape Type 2: [[_[]][_[]_]]
Shape Type 1: [_[_[_]]_[_[_]_]]_
(Figure: secondary structure drawing of the sequence, positions 1 to 56.)

31

32 [[[[]]]][[[]]] [[][]][][] [[[]][[[]]]] [[][[][]]] 53,116 more shapes 12,156 more shapes [][[[[]]]] [] _[_[_[]]]_ >Query: hg17_ct_rnazset190_s5031 _[_[_[_[_[]]_[_[]_]_]_]_]_[] [_[_[_[_[]]_[_[]_]_]_]_]_[] _[_[_[_[_[]]_[_[]_]]_]_]_[]_ [_[_[[_[]][_[]_]]_]_][] [[_[_[[_[]][_[]_]]_]_][]] [_[]_][_[_[[_[]_][]]_]_] [[[[[]][[]]]]][] [[[[[[]][[]]]]][]] [[]][[[[[]][]]]] 112,489 more shapes [[[_[_[]_]_]_]_] [_[[_[[]_]_]_]]_ [_[_[_[]_]_]_][_[_[]_]] 93,840 more shapes [[_[_[[_[]][_[]_]]_]_][]] [_[]_][_[_[[_[]_][]]_]_] [[[[[]][[]]]]][] 59,337 more shapes [[[[[[]][[]]]]][]] [[]][[[[[]][]]]]

33 cmsearch HMM-filter BLAST-filter

34

RNA Basics. RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U. Bases can only pair with one other base. wobble pairing. 23 Hydrogen Bonds more stable

RNA Basics. RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U. Bases can only pair with one other base. wobble pairing. 23 Hydrogen Bonds more stable RNA STRUCTURE RNA Basics RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U wobble pairing Bases can only pair with one other base. 23 Hydrogen Bonds more stable RNA Basics transfer RNA (trna) messenger

More information

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 8.3.1 Simple energy minimization Maximizing the number of base pairs as described above does not lead to good structure predictions.

More information

CS681: Advanced Topics in Computational Biology

CS681: Advanced Topics in Computational Biology CS681: Advanced Topics in Computational Biology Can Alkan EA224 calkan@cs.bilkent.edu.tr Week 10 Lecture 1 http://www.cs.bilkent.edu.tr/~calkan/teaching/cs681/ RNA folding Prediction of secondary structure

More information

Bioinformatics Advance Access published July 14, Jens Reeder, Robert Giegerich

Bioinformatics Advance Access published July 14, Jens Reeder, Robert Giegerich Bioinformatics Advance Access published July 14, 2005 BIOINFORMATICS Consensus Shapes: An Alternative to the Sankoff Algorithm for RNA Consensus Structure Prediction Jens Reeder, Robert Giegerich Faculty

More information

RNA Abstract Shape Analysis

RNA Abstract Shape Analysis ourse: iegerich RN bstract nalysis omplete shape iegerich enter of Biotechnology Bielefeld niversity robert@techfak.ni-bielefeld.de ourse on omputational RN Biology, Tübingen, March 2006 iegerich ourse:

More information

RNA Structure Prediction and Comparison. RNA folding

RNA Structure Prediction and Comparison. RNA folding RNA Structure Prediction and Comparison Session 3 RNA folding Faculty of Technology robert@techfak.uni-bielefeld.de Bielefeld, WS 2013/2014 Base Pair Maximization This was the first structure prediction

More information

RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17

RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17 RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17 Dr. Stefan Simm, 01.11.2016 simm@bio.uni-frankfurt.de RNA secondary structures a. hairpin loop b. stem c. bulge loop d. interior loop e. multi

More information

BIOINF 4120 Bioinforma2cs 2 - Structures and Systems -

BIOINF 4120 Bioinforma2cs 2 - Structures and Systems - BIOINF 4120 Bioinforma2cs 2 - Structures and Systems - Oliver Kohlbacher Summer 2014 3. RNA Structure Part II Overview RNA Folding Free energy as a criterion Folding free energy of RNA Zuker- SCegler algorithm

More information

INF2220: algorithms and data structures Series 1

INF2220: algorithms and data structures Series 1 Universitetet i Oslo Institutt for Informatikk I. Yu, D. Karabeg INF2220: algorithms and data structures Series 1 Topic Function growth & estimation of running time, trees (Exercises with hints for solution)

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri RNA Structure Prediction Secondary

More information

Combinatorial approaches to RNA folding Part II: Energy minimization via dynamic programming

Combinatorial approaches to RNA folding Part II: Energy minimization via dynamic programming ombinatorial approaches to RNA folding Part II: Energy minimization via dynamic programming Matthew Macauley Department of Mathematical Sciences lemson niversity http://www.math.clemson.edu/~macaule/ Math

More information

Algebraic Dynamic Programming. Solving Satisfiability with ADP

Algebraic Dynamic Programming. Solving Satisfiability with ADP Algebraic Dynamic Programming Session 12 Solving Satisfiability with ADP Robert Giegerich (Lecture) Stefan Janssen (Exercises) Faculty of Technology Summer 2013 http://www.techfak.uni-bielefeld.de/ags/pi/lehre/adp

More information

Predicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis

Predicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis Fang XY, Luo ZG, Wang ZH. Predicting RNA secondary structure using profile stochastic context-free grammars and phylogenic analysis. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 23(4): 582 589 July 2008

More information

Predicting RNA Secondary Structure

Predicting RNA Secondary Structure 7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for

More information

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

RNA Search and! Motif Discovery Genome 541! Intro to Computational! Molecular Biology RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure

More information

A Structure-Based Flexible Search Method for Motifs in RNA

A Structure-Based Flexible Search Method for Motifs in RNA JOURNAL OF COMPUTATIONAL BIOLOGY Volume 14, Number 7, 2007 Mary Ann Liebert, Inc. Pp. 908 926 DOI: 10.1089/cmb.2007.0061 A Structure-Based Flexible Search Method for Motifs in RNA ISANA VEKSLER-LUBLINSKY,

More information

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments

More information

CONTRAfold: RNA Secondary Structure Prediction without Physics-Based Models

CONTRAfold: RNA Secondary Structure Prediction without Physics-Based Models Supplementary Material for CONTRAfold: RNA Secondary Structure Prediction without Physics-Based Models Chuong B Do, Daniel A Woods, and Serafim Batzoglou Stanford University, Stanford, CA 94305, USA, {chuongdo,danwoods,serafim}@csstanfordedu,

More information

Combinatorial approaches to RNA folding Part I: Basics

Combinatorial approaches to RNA folding Part I: Basics Combinatorial approaches to RNA folding Part I: Basics Matthew Macauley Department of Mathematical Sciences Clemson University http://www.math.clemson.edu/~macaule/ Math 4500, Spring 2015 M. Macauley (Clemson)

More information

Digital search trees JASS

Digital search trees JASS Digital search trees Analysis of different digital trees with Rice s integrals. JASS Nicolai v. Hoyningen-Huene 28.3.2004 28.3.2004 JASS 04 - Digital search trees 1 content Tree Digital search tree: Definition

More information

RNA secondary structure prediction. Farhat Habib

RNA secondary structure prediction. Farhat Habib RNA secondary structure prediction Farhat Habib RNA RNA is similar to DNA chemically. It is usually only a single strand. T(hyamine) is replaced by U(racil) Some forms of RNA can form secondary structures

More information

SnoPatrol: How many snorna genes are there? Supplementary

SnoPatrol: How many snorna genes are there? Supplementary SnoPatrol: How many snorna genes are there? Supplementary materials. Paul P. Gardner 1, Alex G. Bateman 1 and Anthony M. Poole 2,3 1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton,

More information

RNA Secondary Structure Prediction

RNA Secondary Structure Prediction RN Secondary Structure Prediction Perry Hooker S 531: dvanced lgorithms Prof. Mike Rosulek University of Montana December 10, 2010 Introduction Ribonucleic acid (RN) is a macromolecule that is essential

More information

COMBINATORICS OF LOCALLY OPTIMAL RNA SECONDARY STRUCTURES

COMBINATORICS OF LOCALLY OPTIMAL RNA SECONDARY STRUCTURES COMBINATORICS OF LOCALLY OPTIMAL RNA SECONDARY STRUCTURES ÉRIC FUSY AND PETER CLOTE Abstract. It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is 1.104366

More information

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182 CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding 10-07 CSE182 Bell Labs Honors Pattern matching 10-07 CSE182 Just the Facts Consider the set of all substrings

More information

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26

Problem. Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Binary Search Introduction Problem Problem Given a dictionary and a word. Which page (if any) contains the given word? 3 / 26 Strategy 1: Random Search Randomly select a page until the page containing

More information

Pattern Matching (Exact Matching) Overview

Pattern Matching (Exact Matching) Overview CSI/BINF 5330 Pattern Matching (Exact Matching) Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Pattern Matching Exhaustive Search DFA Algorithm KMP Algorithm

More information

Rapid Dynamic Programming Algorithms for RNA Secondary Structure

Rapid Dynamic Programming Algorithms for RNA Secondary Structure ADVANCES IN APPLIED MATHEMATICS 7,455-464 I f Rapid Dynamic Programming Algorithms for RNA Secondary Structure MICHAEL S. WATERMAN* Depurtments of Muthemutics und of Biologicul Sciences, Universitk of

More information

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options for grouping with metabolism

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options for grouping with metabolism OECD QSAR Toolbox v.4.1 Tutorial illustrating new options for grouping with metabolism Outlook Background Objectives Specific Aims The exercise Workflow 2 Background Grouping with metabolism is a procedure

More information

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

Searching genomes for non-coding RNA using FastR

Searching genomes for non-coding RNA using FastR Searching genomes for non-coding RNA using FastR Shaojie Zhang Brian Haas Eleazar Eskin Vineet Bafna Keywords: non-coding RNA, database search, filtration, riboswitch, bacterial genome. Address for correspondence:

More information

Computational Approaches for determination of Most Probable RNA Secondary Structure Using Different Thermodynamics Parameters

Computational Approaches for determination of Most Probable RNA Secondary Structure Using Different Thermodynamics Parameters Computational Approaches for determination of Most Probable RNA Secondary Structure Using Different Thermodynamics Parameters 1 Binod Kumar, Assistant Professor, Computer Sc. Dept, ISTAR, Vallabh Vidyanagar,

More information

A tutorial on RNA folding methods and resources

A tutorial on RNA folding methods and resources A tutorial on RNA folding methods and resources Alain Denise, LRI/IGM, Université Paris-Sud with invaluable help from Yann Ponty, CNRS/Ecole Polytechnique 1 Master BIBS 2014-2015 Goals To help your work

More information

Classified Dynamic Programming

Classified Dynamic Programming Bled, Feb. 2009 Motivation Our topic: Programming methodology A trade-off in dynamic programming between search space design and evaluation of candidates A trade-off between modifying your code and adding

More information

DG/UX System. MAC Regions

DG/UX System. MAC Regions DG/UX System Provides mandatory access controls MAC label identifies security level Default labels, but can define others Initially Subjects assigned MAC label of parent Initial label assigned to user,

More information

Properties of Context-Free Languages

Properties of Context-Free Languages Properties of Context-Free Languages Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Supporting Text 1. Comparison of GRoSS sequence alignment to HMM-HMM and GPCRDB

Supporting Text 1. Comparison of GRoSS sequence alignment to HMM-HMM and GPCRDB Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications, Cvicek et al. Supporting Text 1 Here we compare the GRoSS alignment

More information

Least Random Suffix/Prefix Matches in Output-Sensitive Time

Least Random Suffix/Prefix Matches in Output-Sensitive Time Least Random Suffix/Prefix Matches in Output-Sensitive Time Niko Välimäki Department of Computer Science University of Helsinki nvalimak@cs.helsinki.fi 23rd Annual Symposium on Combinatorial Pattern Matching

More information

Current; Forest Tree Theorem; Potential Functions and their Bounds

Current; Forest Tree Theorem; Potential Functions and their Bounds April 13, 2008 Franklin Kenter Current; Forest Tree Theorem; Potential Functions and their Bounds 1 Introduction In this section, we will continue our discussion on current and induced current. Review

More information

Bio nformatics. Lecture 23. Saad Mneimneh

Bio nformatics. Lecture 23. Saad Mneimneh Bio nformatics Lecture 23 Protein folding The goal is to determine the three-dimensional structure of a protein based on its amino acid sequence Assumption: amino acid sequence completely and uniquely

More information

Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Sequence Analysis and Databases 2: Sequences and Multiple Alignments 1 Sequence Analysis and Databases 2: Sequences and Multiple Alignments Jose María González-Izarzugaza Martínez CNIO Spanish National Cancer Research Centre (jmgonzalez@cnio.es) 2 Sequence Comparisons:

More information

Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model

Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model J. Waldispühl 1,3 P. Clote 1,2, 1 Department of Biology, Higgins 355, Boston

More information

Factorized Relational Databases Olteanu and Závodný, University of Oxford

Factorized Relational Databases   Olteanu and Závodný, University of Oxford November 8, 2013 Database Seminar, U Washington Factorized Relational Databases http://www.cs.ox.ac.uk/projects/fd/ Olteanu and Závodný, University of Oxford Factorized Representations of Relations Cust

More information

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix) Computat onal Biology Lecture 21 Protein folding The goal is to determine the three-dimensional structure of a protein based on its amino acid sequence Assumption: amino acid sequence completely and uniquely

More information

A Browser for Pig Genome Data

A Browser for Pig Genome Data A Browser for Pig Genome Data Thomas Mailund January 2, 2004 This report briefly describe the blast and alignment data available at http://www.daimi.au.dk/ mailund/pig-genome/ hits.html. The report describes

More information

Algebraic Dynamic Programming

Algebraic Dynamic Programming Algebraic Dynamic Programming Unit 2.b: Introduction to Bellman s GAP Robert Giegerich 1 (Lecture) Benedikt Löwes (Exercises) Faculty of Technology Bielefeld University http://www.techfak.uni-bielefeld.de/ags/pi/lehre/adp

More information

Taxonomical Classification using:

Taxonomical Classification using: Taxonomical Classification using: Extracting ecological signal from noise: introduction to tools for the analysis of NGS data from microbial communities Bergen, April 19-20 2012 INTRODUCTION Taxonomical

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

SA-REPC - Sequence Alignment with a Regular Expression Path Constraint

SA-REPC - Sequence Alignment with a Regular Expression Path Constraint SA-REPC - Sequence Alignment with a Regular Expression Path Constraint Nimrod Milo Tamar Pinhas Michal Ziv-Ukelson Ben-Gurion University of the Negev, Be er Sheva, Israel Graduate Seminar, BGU 2010 Milo,

More information

Semi-Supervised CONTRAfold for RNA Secondary Structure Prediction: A Maximum Entropy Approach

Semi-Supervised CONTRAfold for RNA Secondary Structure Prediction: A Maximum Entropy Approach Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2011 Semi-Supervised CONTRAfold for RNA Secondary Structure Prediction: A Maximum Entropy Approach Jianping

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

CSC 1700 Analysis of Algorithms: Warshall s and Floyd s algorithms

CSC 1700 Analysis of Algorithms: Warshall s and Floyd s algorithms CSC 1700 Analysis of Algorithms: Warshall s and Floyd s algorithms Professor Henry Carter Fall 2016 Recap Space-time tradeoffs allow for faster algorithms at the cost of space complexity overhead Dynamic

More information

RNA and Protein Structure Prediction

RNA and Protein Structure Prediction RNA and Protein Structure Prediction Bioinformatics: Issues and Algorithms CSE 308-408 Spring 2007 Lecture 18-1- Outline Multi-Dimensional Nature of Life RNA Secondary Structure Prediction Protein Structure

More information

DNA/RNA Structure Prediction

DNA/RNA Structure Prediction C E N T R E F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U Master Course DNA/Protein Structurefunction Analysis and Prediction Lecture 12 DNA/RNA Structure Prediction Epigenectics Epigenomics:

More information

Variable Selection and Sensitivity Analysis via Dynamic Trees with an application to Computer Code Performance Tuning

Variable Selection and Sensitivity Analysis via Dynamic Trees with an application to Computer Code Performance Tuning Variable Selection and Sensitivity Analysis via Dynamic Trees with an application to Computer Code Performance Tuning Robert B. Gramacy University of Chicago Booth School of Business faculty.chicagobooth.edu/robert.gramacy

More information

Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn

Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn Chapter 5 Data Structures Algorithm Theory WS 2017/18 Fabian Kuhn Priority Queue / Heap Stores (key,data) pairs (like dictionary) But, different set of operations: Initialize-Heap: creates new empty heap

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

Week 10: Homology Modelling (II) - HHpred

Week 10: Homology Modelling (II) - HHpred Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative

More information

Non-context-Free Languages. CS215, Lecture 5 c

Non-context-Free Languages. CS215, Lecture 5 c Non-context-Free Languages CS215, Lecture 5 c 2007 1 The Pumping Lemma Theorem. (Pumping Lemma) Let be context-free. There exists a positive integer divided into five pieces, Proof for for each, and..

More information

Introduction to Polymer Physics

Introduction to Polymer Physics Introduction to Polymer Physics Enrico Carlon, KU Leuven, Belgium February-May, 2016 Enrico Carlon, KU Leuven, Belgium Introduction to Polymer Physics February-May, 2016 1 / 28 Polymers in Chemistry and

More information

Protein Structure Prediction and Display

Protein Structure Prediction and Display Protein Structure Prediction and Display Goal Take primary structure (sequence) and, using rules derived from known structures, predict the secondary structure that is most likely to be adopted by each

More information

Detecting non-coding RNA in Genomic Sequences

Detecting non-coding RNA in Genomic Sequences Detecting non-coding RNA in Genomic Sequences I. Overview of ncrnas II. What s specific about RNA detection? III. Looking for known RNAs IV. Looking for unknown RNAs Daniel Gautheret INSERM ERM 206 & Université

More information

Protein Secondary Structure Prediction

Protein Secondary Structure Prediction part of Bioinformatik von RNA- und Proteinstrukturen Computational EvoDevo University Leipzig Leipzig, SS 2011 the goal is the prediction of the secondary structure conformation which is local each amino

More information

Fibonacci (Min-)Heap. (I draw dashed lines in place of of circular lists.) 1 / 17

Fibonacci (Min-)Heap. (I draw dashed lines in place of of circular lists.) 1 / 17 Fibonacci (Min-)Heap A forest of heap-order trees (parent priority child priority). Roots in circular doubly-linked list. Pointer to minimum-priority root. Siblings in circular doubly-linked list; parent

More information

Enhancing Active Automata Learning by a User Log Based Metric

Enhancing Active Automata Learning by a User Log Based Metric Master Thesis Computing Science Radboud University Enhancing Active Automata Learning by a User Log Based Metric Author Petra van den Bos First Supervisor prof. dr. Frits W. Vaandrager Second Supervisor

More information

Algebraic Dynamic Programming. Dynamic Programming, Old Country Style

Algebraic Dynamic Programming. Dynamic Programming, Old Country Style Algebraic Dynamic Programming Session 2 Dynamic Programming, Old Country Style Robert Giegerich (Lecture) Stefan Janssen (Exercises) Faculty of Technology Summer 2013 http://www.techfak.uni-bielefeld.de/ags/pi/lehre/adp

More information

OECD QSAR Toolbox v.4.1. Step-by-step example for predicting skin sensitization accounting for abiotic activation of chemicals

OECD QSAR Toolbox v.4.1. Step-by-step example for predicting skin sensitization accounting for abiotic activation of chemicals OECD QSAR Toolbox v.4.1 Step-by-step example for predicting skin sensitization accounting for abiotic activation of chemicals Background Outlook Objectives The exercise Workflow 2 Background This is a

More information

Speculative Parallelism in Cilk++

Speculative Parallelism in Cilk++ Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33 Parallelizing Embarrassingly Parallel

More information

De novo prediction of structural noncoding RNAs

De novo prediction of structural noncoding RNAs 1/ 38 De novo prediction of structural noncoding RNAs Stefan Washietl 18.417 - Fall 2011 2/ 38 Outline Motivation: Biological importance of (noncoding) RNAs Algorithms to predict structural noncoding RNAs

More information

Supplementary Material
