A Method for Aligning RNA Secondary Structures


A Method for Aligning RNA Secondary Structures
Jason T. L. Wang, New Jersey Institute of Technology
J. Liu, J.T.L. Wang, J. Hu and B. Tian, BMC Bioinformatics, 2005

Outline
Introduction
Structural alignment of RNA (preliminaries, RSmatch algorithm, software)
Experiments (RNA motif detection)
Multiple structural alignment (RMulti)
Combining RSmatch with RNAView
Conclusion and future work

Molecule building blocks
Protein building blocks: 20 types of amino acids
RNA building blocks:
Purines: adenine, guanine
Pyrimidines: cytosine, uracil

RNA structure elements
An RNA sequence folds to form secondary/tertiary structure.
The majority of base connections involve two bases:
Watson-Crick: A-U or G-C
Non-canonical: e.g. G-U (wobble)
(Figure: basic structure elements of RNA.)

Definition of structural components
Given an RNA sequence: r_1 r_2 r_3 ... r_n
Two types of structural components [1]:
Single bases (blue)
Bonded base pairs (red)
[1] Zuker, M. (1989) Science

Secondary structure constraint (1)
No common base can be shared by any two pairs [2].
(Figure: (a) GOOD; (b) BAD, prohibited: one base is shared by two pairs.)
[2] Hofacker, I.L. (2003) NAR

Secondary structure constraint (2)
A hairpin element must have at least 3 bases in the loop part [3].
(Figure: (a) GOOD; (b) BAD, prohibited: only two bases are present in the loop.)
[3] Zuker, M. (1991) NAR

Secondary structure constraint (3)
Pseudoknots are not included [4].
(Figure: (a) BAD, prohibited: pseudoknot; (b) GOOD: nested structure; (c) GOOD: branching.)
[4] Mathews, D.H. (1999) JMB
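The three constraints can be checked mechanically. The following is a minimal sketch, not part of RSmatch: it assumes base pairs are given as 0-based index tuples (i, j) with i < j, and the function name `find_violations` is made up for illustration.

```python
from itertools import combinations
from typing import List, Tuple


def find_violations(pairs: List[Tuple[int, int]], min_loop: int = 3) -> List[str]:
    """List violations of the three secondary structure constraints.

    pairs: base pairs as 0-based (i, j) index tuples with i < j (assumed input form).
    """
    problems = []

    # Constraint 1: no base may be shared by two pairs.
    paired = set()
    for i, j in pairs:
        for k in (i, j):
            if k in paired:
                problems.append(f"base {k} is shared by two pairs")
            paired.add(k)

    # Constraint 2: a hairpin (a pair enclosing no paired base)
    # needs at least min_loop unpaired bases in its loop.
    for i, j in pairs:
        is_hairpin = not any(i < k < j for k in paired)
        if is_hairpin and j - i - 1 < min_loop:
            problems.append(f"hairpin ({i}, {j}) has only {j - i - 1} loop bases")

    # Constraint 3: no pseudoknots, i.e. no two pairs that cross (i < k < j < l).
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            problems.append(f"pairs ({i}, {j}) and ({k}, {l}) form a pseudoknot")

    return problems
```

For example, `find_violations([(0, 8), (1, 7), (2, 6)])` returns an empty list, while `find_violations([(0, 3)])` reports a hairpin with only two loop bases.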

RNA secondary structure representation schemes
a. Bond annotation [5]
b. Arc representation [6]
c. Tree representation [7]
d. Nested parenthesis representation [8]
[5] Shapiro, B. (1990) CABIOS
[6] Zhang, K. (1999) CPM
[7] Ma, B. (2002) TCS
[8] Hofacker, I.L. (2002) JMB
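As an illustration of scheme (d), a nested parenthesis string can be turned into the two kinds of structural components with a stack. This is a minimal sketch; the function name `parse_dot_bracket` and the 0-based positions are assumptions made for the example.

```python
from typing import List, Tuple


def parse_dot_bracket(structure: str) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split a nested parenthesis string into base pairs and single bases.

    "(" and ")" mark the two ends of a base pair, "." marks a single base.
    Returns (sorted list of (i, j) pairs, list of single-base positions).
    """
    stack, pairs, singles = [], [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)            # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch == ".":
            singles.append(pos)
        else:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs), singles
```

Calling `parse_dot_bracket("..(...).")` gives one base pair, `(2, 6)`, and six single bases.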


Extended circle model
Circle model [9]: the structure is decomposed into circles (circle 0 to circle 8 in the figure), and each circle consists of a list of structural components (single bases and base pairs).
The components follow a sequential order.
[9] Liu, J. (2005) BMC Bioinformatics

Hierarchical organization
Circles are organized in a tree-like hierarchy.
(Figure: circles 0 to 8 shown on the structure and as a tree.)
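A circle hierarchy of this kind can be sketched from a nested parenthesis string. The sketch below makes assumptions the slides do not state: every base pair closes one circle, an extra outermost circle 0 collects whatever no pair encloses, and circles are numbered in reading order (so the numbering will not match the figure). The name `build_circles` is hypothetical.

```python
def build_circles(structure):
    """Group the components of a nested parenthesis string into circles.

    Assumed model (not taken from the slides): each base pair closes one
    circle; circle 0 is an outermost circle with no closing pair.
    Returns three dicts keyed by circle id: parent, closing pair, single bases.
    """
    parent = {0: None}     # circle id -> id of the enclosing circle
    closing = {0: None}    # circle id -> (i, j) base pair that closes it
    unpaired = {0: []}     # circle id -> positions of its single bases
    open_circles = [0]     # stack of circles currently being read
    open_pos = []          # stack of positions of unmatched "("
    for pos, ch in enumerate(structure):
        if ch == ".":
            unpaired[open_circles[-1]].append(pos)
        elif ch == "(":
            cid = len(parent)                  # next free circle id
            parent[cid] = open_circles[-1]
            closing[cid] = None
            unpaired[cid] = []
            open_circles.append(cid)
            open_pos.append(pos)
        elif ch == ")":
            if not open_pos:
                raise ValueError(f"unmatched ')' at position {pos}")
            closing[open_circles.pop()] = (open_pos.pop(), pos)
        else:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if open_pos:
        raise ValueError(f"unmatched '(' at position {open_pos[-1]}")
    return parent, closing, unpaired
```

For the branching structure `"(.(...).(...))"`, circles 2 and 3 (the two hairpins) both have circle 1 as their parent, which gives the tree-like hierarchy.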

Hierarchical relationship between two structural components
(1) The two components are in the same circle.
(2) The two components are in descendant/ancestor circles.
(3) The two components are in cousin circles.
(Figure: example component pairs for each of the three cases.)
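Given a circle tree, the three relationships can be told apart by walking up the parent links. A minimal sketch, assuming the tree is stored as a hypothetical `parent` dict (circle id to parent id, `None` for the root) and assuming "cousin" means neither circle is an ancestor of the other:

```python
def ancestors(circle, parent):
    """Return all circles above `circle` in the tree, nearest first."""
    chain = []
    while parent[circle] is not None:
        circle = parent[circle]
        chain.append(circle)
    return chain


def relationship(c1, c2, parent):
    """Classify the relationship between the circles of two components."""
    if c1 == c2:
        return "same circle"
    if c1 in ancestors(c2, parent) or c2 in ancestors(c1, parent):
        return "ancestor/descendant"
    # Assumption: every remaining case counts as a cousin relationship.
    return "cousin"
```

With `parent = {0: None, 1: 0, 2: 1, 3: 1}`, circles 2 and 3 are cousins, and circle 1 is an ancestor of both.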

Partial structure induced by a structural component
(Figure: parent structure and child structure, with positions 10 and 30 marked.)

Structural alignment rules (1)
A_1 precedes A_2 iff B_1 precedes B_2, where A_1, A_2, B_1, B_2 are structural components, with A_1 aligned to B_1 and A_2 aligned to B_2.
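The precedence rule amounts to requiring that the alignment keeps the sequential order of components in both structures. A minimal sketch, assuming an alignment is given as a list of (position in first structure, position in second structure) tuples; the name `preserves_order` is made up for the example.

```python
from itertools import combinations


def preserves_order(aligned):
    """Check rule (1): A_1 precedes A_2 iff B_1 precedes B_2.

    aligned: list of (a, b) tuples, where a and b are the sequential
    positions of two aligned components (assumed representation).
    """
    return all(
        (a1 < a2) == (b1 < b2)
        for (a1, b1), (a2, b2) in combinations(aligned, 2)
    )
```

An alignment such as `[(0, 0), (1, 2), (3, 5)]` passes, while `[(0, 2), (1, 1)]` fails because the order is reversed in the second structure.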

Structural alignment rules (2)
(a) Same loop relationship preserved: A_1 is in the same loop as A_2 iff B_1 is in the same loop as B_2.
(b) Ancestor/descendant relationship preserved: A_1 is an ancestor of A_2 iff B_1 is an ancestor of B_2.
(c) Cousin relationship preserved: A_1 is a cousin of A_2 iff B_1 is a cousin of B_2.
(Figure: panels (a), (b) and (c) show each case for RNA 1 and RNA 2.)

Example alignment
First RNA:  ..((...(((...)))((.(...))).))..
Second RNA: ..((..((...))(((...))).))..
All structural alignment rules must be satisfied for a valid alignment.
In addition, a single base cannot be aligned with a base pair.
Alignment result: (figure: the two structures aligned, with gaps inserted)

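Dynamic programming algorithm: overview
DP scoring table: one axis lists the structural components of the first structure, the other those of the second structure.
Each cell holds the best alignment between the partial structures of the two corresponding components.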

Case 1

Case 2

Case 3

Case 4.1

Case 4.2

Example of matching score function
Score function for matching two equal-length structural components a and b:
\[ g(a, b) = \begin{cases} 1, & \text{if both } a \text{ and } b \text{ are single bases and } a = b \\ 2, & \text{if both } a \text{ and } b \text{ are base pairs and } a = b \\ 0, & \text{otherwise} \end{cases} \]
Gap penalty equals 0.
Extending g to the whole set of matched component pairs, our goal is to maximize f(R_1, R_2):
\[ f(R_1, R_2) = \sum_i g(a_i, b_i) \]
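The example score function translates directly into code. This is a minimal sketch under an assumed representation: a single base is a one-letter string and a base pair is a tuple of two letters.

```python
def g(a, b):
    """Score for matching two structural components.

    Assumed representation: a single base is a str such as "A",
    a base pair is a tuple such as ("G", "C").
    """
    if isinstance(a, str) and isinstance(b, str):
        return 1 if a == b else 0      # two identical single bases
    if isinstance(a, tuple) and isinstance(b, tuple):
        return 2 if a == b else 0      # two identical base pairs
    return 0                           # single base vs. base pair


def f(matched):
    """Total score: the sum of g over all matched component pairs."""
    return sum(g(a, b) for a, b in matched)
```

For instance, matching `"A"` with `"A"`, `("G", "C")` with `("G", "C")` and `"U"` with `"C"` gives a total of 1 + 2 + 0 = 3.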

Cell type 1: single base vs. single base
(Figure: three candidate alignments, (A), (B) and (C), for the cell marked "?".)

Cell type 2: base pair vs. single base
Each such cell holds two scores: a first score and a second score.

Cell type 2: base pair vs. single base (first score)
(Figure: candidate alignments for the first score.)

Cell type 2: base pair vs. single base (second score)
(Figure: three candidate alignments, (A), (B) and (C), for the second score.)

Cell type 3: base pair vs. base pair
(Figure: cells (A), (B) and (C), with (b1) and (b2) marked.)

Cell type 3: base pair vs. base pair (first score)
(Figure: candidate alignments (A), (B) and (C) for the first score.)

Cell type 3: base pair vs. base pair (2nd & 3rd score)
(Figure: candidate alignments for the second and third scores.)

Cell type 3: base pair vs. base pair (final score)
(Figure: four candidate alignments, (A), (B), (C) and (D), for the final score.)

Analysis of algorithm
Time and space complexity
Each score is calculated only once.
Time is bounded by the number of score calculations needed to fill up the table.
Each base pair will contribute to two or four score calculations.
Single bases: N_s; base pairs: N_p
Total number of score calculations:
\[ N_s^2 + 4 N_s N_p + 4 N_p^2 = O(N^2) \]
N_s^2 score calculations are contributed by two single bases.
4 N_s N_p score calculations are contributed by one single base and one base pair.
4 N_p^2 score calculations are contributed by two base pairs.
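The O(N^2) bound follows from completing the square. The step below is a sketch under two assumptions not spelled out on the slide: both structures have the same counts N_s and N_p, and N denotes the number of bases, so each base pair contributes two bases.

```latex
\begin{align*}
N &= N_s + 2N_p \\
N_s^2 + 4 N_s N_p + 4 N_p^2 &= (N_s + 2N_p)^2 = N^2
\end{align*}
```

Under these assumptions the number of score calculations is exactly N^2.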

Software
RSmatch: http://aria.njit.edu/rnacenter/rsmatch/


Motif example: detection/instantiation
Motif structure is known.
IUB ambiguity symbols: N: any base (A, C, G, U); W: A or U; H: not G

Gap penalty example
(Figure: a motif structure aligned against a subject structure.)

Position independent scoring matrices
Two scoring matrices (figure).
Gap penalty: -3 for each single base and -6 for each base pair involved in the gap.
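The gap penalty stated above can be computed per gap as follows. A minimal sketch, assuming the components involved in a gap are given as a list in which a single base is a string and a base pair is a tuple; the constant and function names are made up for the example.

```python
GAP_SINGLE = -3   # penalty per single base involved in the gap
GAP_PAIR = -6     # penalty per base pair involved in the gap


def gap_penalty(gapped_components):
    """Sum the penalties of all components involved in one gap.

    Assumed representation: a single base is a str, a base pair is a tuple.
    """
    return sum(
        GAP_PAIR if isinstance(c, tuple) else GAP_SINGLE
        for c in gapped_components
    )
```

A gap covering two single bases and one base pair, `gap_penalty(["A", "U", ("G", "C")])`, costs -3 - 3 - 6 = -12.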

Motifs used in the experiments
(a) HSL3 (b) IRE
HSL3 has a typical stem-loop structure with two flanking tails.
IRE has a specific stem-loop structure for gene regulation related to cell iron metabolism.
Wildcard n is allowed to match 0 or 1 nucleotide.
IUB code: M: A, C; Y: C, T/U; H: not G; R: A, G; W: A, T/U
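Matching a motif symbol against a concrete base under the IUB code can be sketched with a lookup table. The table below uses the standard IUB meanings of the symbols named on the slides; the names `IUB` and `matches` are assumptions, and the lowercase wildcard n (0 or 1 nucleotide) is deliberately not handled here because it affects alignment length rather than a single position.

```python
# Standard IUB meanings of the symbols used on the slides (T is read as U).
IUB = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG",     # purine
    "Y": "CU",     # pyrimidine
    "M": "AC",
    "W": "AU",
    "H": "ACU",    # not G
    "N": "ACGU",   # any base
}


def matches(symbol, base):
    """Return True if an uppercase IUB motif symbol accepts the given base."""
    base = base.upper()
    if base == "T":
        base = "U"      # treat DNA-style T as U
    return base in IUB[symbol]
```

For example, `matches("R", "G")` is true and `matches("H", "G")` is false.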

Experiments
Performance measurements: sensitivity (recall) and specificity (precision).
19,986 human RefSeq mRNA sequences were obtained from NCBI; 39,972 UTR regions were extracted.
Each UTR sequence was chopped and folded into secondary structures using the Vienna RNA package, yielding ~575,000 structures.
Compare RSmatch with PatSearch [10].
[10] Pesole, G. (2000) Bioinformatics
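The two performance measures, as the slide defines them (sensitivity as recall, specificity as precision), can be computed from a set of reported hits and a set of known motif instances. A minimal sketch with made-up names and identifiers:

```python
def sensitivity_specificity(hits, true_motifs):
    """Return (sensitivity, specificity) in the slide's sense.

    sensitivity (recall)    = true positives / all known motif instances
    specificity (precision) = true positives / all reported hits
    """
    hits, true_motifs = set(hits), set(true_motifs)
    true_positives = len(hits & true_motifs)
    sensitivity = true_positives / len(true_motifs) if true_motifs else 0.0
    specificity = true_positives / len(hits) if hits else 0.0
    return sensitivity, specificity
```

With four reported hits of which three are among five known instances, the function returns a sensitivity of 0.6 and a specificity of 0.75.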

Chop and fold UTR sequences
(Figure: the UTRs on either side of the ORF, marked at 50, 100, 150 and 200.)
ORF: Open Reading Frame

Detecting HSL3 motif
PatSearch: specificity (98.2%), sensitivity (87.1%).
Several histone genes (e.g. NM_003542, NM_003548) were found by RSmatch, but not by PatSearch.

Detecting IRE motif
Use PatSearch to search 39,972 UTR sequences for the IRE motif, giving 27 hit structures belonging to 18 UTR sequences.
The 18 UTR sequences were chopped and folded into 1,196 structures.
Compare RSmatch, Rsearch [11] and stemloc [12].
A well-known IRE-containing structure (NM_000032) was used as the query (it does not have wildcard or ambiguity symbols since Rsearch and stemloc cannot handle them).
[11] Klein, R.J. (2003) BMC Bioinformatics
[12] Holmes, I. (2002) PSB

Experimental results for IRE motif

Dealing with complex structures


Extension to multiple structural alignment
(Flowchart with the steps: search small database; pairwise match; seed alignment; expand seed alignment; profile; search; best alignment; expand. Decision box: score(best alignment) < δ OR non-expandable, with YES and NO branches.)

Example
(Figure: two successive expand steps.)

RMulti Webserver
http://aria.njit.edu/rnacenter/multi.html


Conclusion
An efficient algorithm, RSmatch, to align and analyze RNA secondary structures
A multiple structural alignment tool, RMulti
A visualization tool combining RSmatch with RNAView

Future Work
Extending RSmatch to handle pseudoknots
Large-scale genome-wide motif mining
Indexing very large RNA structure databases
Improved multiple structural alignment of RNA sequences
RNA classification and clustering
RNA-RNA interactions and protein-RNA interactions
