A Method for Aligning RNA Secondary Structures
Jason T. L. Wang
New Jersey Institute of Technology
J Liu, JTL Wang, J Hu and B Tian, BMC Bioinformatics, 2005

Outline
Introduction
Structural alignment of RNA (preliminaries, RSmatch algorithm, software)
Experiments (RNA motif detection)
Multiple structural alignment (RMulti)
Combining RSmatch with RNAView
Conclusion and future work

Molecule building blocks
Protein building blocks: 20 types of amino acid
RNA building blocks:
Purine: Adenine, Guanine
Pyrimidine: Cytosine, Uracil

RNA structure elements
An RNA sequence folds to form secondary/tertiary structure
The majority of base connections involve two bases
Watson-Crick: A-U or G-C
Non-canonical: e.g. the G-U wobble pair
[Figure: basic structure elements of RNA]

Definition of structural components
Given an RNA sequence $r_1 r_2 r_3 \ldots r_n$
Two types of structural components [1]:
Single bases (blue)
Bonded base pairs (red)
[1] Zuker, M. (1989) Science

Secondary structure constraint (1)
No base can be shared by any two pairs [2].
(a) GOOD (b) BAD: one base is shared by two pairs (prohibited)
[2] Hofacker, I.L. (2003) NAR

Secondary structure constraint (2)
A hairpin element must have at least 3 bases in the loop part [3].
(a) GOOD (b) BAD: only two bases are present in the hairpin loop (prohibited)
[3] Zuker, M. (1991) NAR

Secondary structure constraint (3)
Pseudoknots are not included [4].
(a) BAD: pseudoknot (prohibited) (b) GOOD (nested structure) (c) GOOD (branching)
[4] Mathews, D.H. (1999) JMB

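The three constraints can be checked mechanically once a structure is given as a list of base pairs. The following is a minimal sketch and not part of RSmatch; the function name `check_constraints`, the 0-based `(i, j)` pair encoding and the `min_loop` parameter are assumptions made for illustration.

```python
from itertools import combinations


def check_constraints(pairs, min_loop=3):
    """Return a list of violations for base pairs given as 0-based (i, j)."""
    pairs = sorted((min(i, j), max(i, j)) for i, j in pairs)
    problems = []

    # Constraints (1) and (3): compare every two pairs.
    for (i, j), (k, l) in combinations(pairs, 2):
        if {i, j} & {k, l}:
            problems.append(("shared base", (i, j), (k, l)))
        elif i < k < j < l:
            # The pairs cross each other, which is a pseudoknot.
            problems.append(("pseudoknot", (i, j), (k, l)))

    # Constraint (2): a pair enclosing no other pair closes a hairpin loop.
    for i, j in pairs:
        encloses_pair = any(i < k and l < j for k, l in pairs)
        if not encloses_pair and j - i - 1 < min_loop:
            problems.append(("short hairpin", (i, j)))

    return problems


if __name__ == "__main__":
    print(check_constraints([(0, 8), (1, 7), (2, 6)]))  # nested stem
    print(check_constraints([(0, 6), (3, 9)]))          # crossing pairs
```

An empty list means the pair set satisfies all three constraints under the assumptions above.
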
RNA secondary structure representation schemes
a. Bond annotation [5]
b. Arc representation [6]
c. Tree representation [7]
d. Nested parenthesis representation [8]
[5] Shapiro, B. (1990) CABIOS
[6] Zhang, K. (1999) CPM
[7] Ma, B. (2002) TCS
[8] Hofacker, I.L. (2002) JMB

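To illustrate how the nested parenthesis scheme relates to an explicit list of bonded positions, the sketch below converts between the two. The function names are hypothetical, positions are 0-based, and the exact bond-annotation format of [5] may differ from the plain pair list used here.

```python
def dot_bracket_to_pairs(structure):
    """Parse a nested parenthesis string into sorted (open, close) index pairs."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def pairs_to_dot_bracket(pairs, length):
    """Rebuild the nested parenthesis string from index pairs."""
    chars = ["."] * length
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


if __name__ == "__main__":
    s = "..((...))."
    print(dot_bracket_to_pairs(s))
    print(pairs_to_dot_bracket(dot_bracket_to_pairs(s), len(s)))
```

Because pseudoknots are excluded, a single stack is enough: every closing parenthesis pairs with the most recent unmatched opening one.
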
Extended circle model
Circle model [9]
[Figure: example structure partitioned into circle 0 through circle 8, with the component lists of circles 0, 1, 7 and 8 and the sequential order between components]
[9] Liu, J. (2005) BMC Bioinformatics

Hierarchical organization
Circles are organized in a tree-like hierarchy
[Figure: the example structure with circles 0 through 8 and the corresponding circle tree]

Hierarchical relationship between two structural components
(1) The same circle
(2) Descendant/ancestor circles
(3) Cousin circles
[Figure: example component pairs for cases (1), (2) and (3)]

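The three relationships can be sketched in code for a structure written in nested parenthesis notation. This is an illustration under stated assumptions, not the data structure used by RSmatch: a circle is identified here by the chain of base pairs enclosing it, a base pair is assumed to sit in the circle of its parent, and the names `circle_paths` and `relationship` are hypothetical. The input is assumed to be balanced.

```python
def circle_paths(structure):
    """Map each position to the tuple of opening indices of the pairs that
    strictly enclose its component, outermost first. The tuple identifies
    the circle holding the component; () is the outermost circle."""
    stack, paths = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            paths.append(tuple(stack))  # the pair sits in its parent's circle
            stack.append(pos)
        elif ch == ")":
            stack.pop()
            paths.append(tuple(stack))  # same circle as its opening partner
        else:
            paths.append(tuple(stack))
    return paths


def relationship(structure, a, b):
    """Classify the components at positions a and b."""
    paths = circle_paths(structure)
    pa, pb = paths[a], paths[b]
    if pa == pb:
        return "same circle"
    shorter, longer = sorted((pa, pb), key=len)
    if longer[:len(shorter)] == shorter:
        return "ancestor/descendant"
    return "cousin"


if __name__ == "__main__":
    s = "((...)(...))"
    print(relationship(s, 1, 6))
    print(relationship(s, 0, 2))
    print(relationship(s, 2, 7))
```

Two circles are cousins when neither enclosing chain is a prefix of the other, which corresponds to sitting on different branches of the circle tree.
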
Partial structure induced by a structural component
[Figure: parent structure and child structure induced by a component, with positions 10 and 30 marked]

Structural alignment rules (1)
$A_1$ precedes $A_2$ iff $B_1$ precedes $B_2$, where $A_1$, $A_2$, $B_1$, $B_2$ are structural components ($A_i$ in the first structure is aligned with $B_i$ in the second).

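The precedence rule can be tested directly on a list of aligned component pairs. In this minimal sketch each component is represented by the 0-based position of its 5' base, which is an assumption made for illustration, and `preserves_order` is a hypothetical helper name.

```python
from itertools import combinations


def preserves_order(matches):
    """matches: list of (a, b) positions, component a aligned with component b.
    Returns True when a1 precedes a2 exactly when b1 precedes b2."""
    return all(
        (a1 < a2) == (b1 < b2)
        for (a1, b1), (a2, b2) in combinations(matches, 2)
    )


if __name__ == "__main__":
    print(preserves_order([(0, 0), (3, 2), (7, 9)]))
    print(preserves_order([(0, 5), (3, 2)]))
```
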
Structural alignment rules (2)
(a) Same-loop relationship preserved: $A_1$ is in the same loop as $A_2$ iff $B_1$ is in the same loop as $B_2$
(b) Ancestor/descendant relationship preserved: $A_1$ is an ancestor of $A_2$ iff $B_1$ is an ancestor of $B_2$
(c) Cousin relationship preserved: $A_1$ is a cousin of $A_2$ iff $B_1$ is a cousin of $B_2$
[Figure: RNA 1 and RNA 2, panels (a), (b) and (c)]

Example alignment
First RNA: ..((...(((...)))((.(...))).))..
Second RNA: ..((..((...))(((...))).))..
All structural alignment rules must be satisfied for a valid alignment
In addition, a single base cannot be aligned with a base pair
[Figure: alignment result, showing the two structures with gaps inserted]

Dynamic programming algorithm: overview
[Figure: DP scoring table with the first structure on one axis and the second structure on the other; each cell holds the best alignment between the partial structures of the two corresponding components]

Case 1 [Figure]
Case 2 [Figure]
Case 3 [Figure]
Case 4.1 [Figure]
Case 4.2 [Figure]

Example of matching score function
Score function for matching two equal-length structural components $a$ and $b$:
$$g(a, b) = \begin{cases} 1, & \text{if both } a \text{ and } b \text{ are single bases and } a = b \\ 2, & \text{if both } a \text{ and } b \text{ are base pairs and } a = b \\ 0, & \text{otherwise} \end{cases}$$
Gap penalty equals 0
Extending $g$ to the whole set of matched component pairs $(a_i, b_i)$, our goal is to maximize $f(R_1, R_2)$:
$$f(R_1, R_2) = \sum_i g(a_i, b_i)$$

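The example score function translates almost line by line into code. In this sketch a single base is encoded as a one-letter string and a base pair as a 2-tuple of letters; that encoding and the argument name `matched` are assumptions made for illustration.

```python
def g(a, b):
    """Score for matching component a with component b."""
    if isinstance(a, str) and isinstance(b, str):
        return 1 if a == b else 0      # two single bases
    if isinstance(a, tuple) and isinstance(b, tuple):
        return 2 if a == b else 0      # two base pairs
    return 0                           # mixed types never score


def f(matched):
    """Total score over matched component pairs (gap penalty is 0)."""
    return sum(g(a, b) for a, b in matched)


if __name__ == "__main__":
    matched = [
        ("A", "A"),                    # identical single bases
        (("G", "C"), ("G", "C")),      # identical base pairs
        ("U", "C"),                    # different single bases
        (("A", "U"), ("G", "C")),      # different base pairs
    ]
    print(f(matched))
```

The dynamic program searches over valid alignments for the one that maximizes this sum; only the scoring step is shown here.
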
Cell type 1: single base vs. single base
[Figure: candidate alignments (A), (B) and (C) of the two partial structures]

Cell type 2: base pair vs. single base
[Figure: the cell holds a first score and a second score]

Cell type 2: base pair vs. single base (first score)
[Figure: candidate alignments for the first score]

Cell type 2: base pair vs. single base (second score)
[Figure: candidate alignments (A), (B) and (C) for the second score]

Cell type 3: base pair vs. base pair
[Figure: cell layout with entries (A), (B), (C), (b1) and (b2)]

Cell type 3: base pair vs. base pair (first score)
[Figure: candidate alignments (A), (B) and (C) for the first score]

Cell type 3: base pair vs. base pair (2nd & 3rd score)
[Figure: candidate alignments for the second and third scores]

Cell type 3: base pair vs. base pair (final score)
[Figure: candidate alignments (A), (B), (C) and (D) for the final score]

Analysis of algorithm
Time and space complexity
Each score is calculated only once.
Time is bounded by the number of score calculations needed to fill up the table.
Each base pair will contribute to two or four score calculations.
Single bases: $N_s$; base pairs: $N_p$
Total number of score calculations: $N_s^2 + 4 N_s N_p + 4 N_p^2 = (N_s + 2 N_p)^2 = O(N^2)$, where $N = N_s + 2 N_p$ is the number of bases
$N_s^2$ score calculations are contributed by two single bases
$4 N_s N_p$ score calculations are contributed by one single base and one base pair
$4 N_p^2$ score calculations are contributed by two base pairs

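As a quick arithmetic check of the count $N_s^2 + 4 N_s N_p + 4 N_p^2$, the sketch below evaluates the three terms for a structure in nested parenthesis notation, using the same structure for both sides of the comparison as the slide's formula does. The function name and the returned dictionary keys are hypothetical.

```python
def score_calculations(structure):
    """Count score calculations when a structure is aligned with one of the
    same size: n_s single bases and n_p base pairs on each side."""
    n_s = structure.count(".")
    n_p = structure.count("(")
    return {
        "single-single": n_s * n_s,
        "single-pair": 4 * n_s * n_p,
        "pair-pair": 4 * n_p * n_p,
    }


if __name__ == "__main__":
    s = "..((...))."
    counts = score_calculations(s)
    print(counts, sum(counts.values()), len(s) ** 2)
```

The three terms always sum to the square of the sequence length, since each base pair occupies two positions.
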
Software: RSmatch
http://aria.njit.edu/rnacenter/rsmatch/

Motif example: detection/instantiation
Motif structure is known
IUB ambiguity symbols: N: any base (A, C, G or U); W: A or U; H: not G

Gap penalty example
[Figure: motif structure aligned against a subject structure]

Position-independent scoring matrices
[Figure: the two scoring matrices]
Gap penalty: -3 for each single base and -6 for each base pair involved in the gap

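A worked example of the gap penalty: the sketch below charges -3 per single base and -6 per base pair for a gapped substructure written in nested parenthesis notation. The constant and function names are hypothetical, and the scoring matrices themselves are not modeled.

```python
SINGLE_BASE_GAP = -3   # penalty per single base involved in the gap
BASE_PAIR_GAP = -6     # penalty per base pair involved in the gap


def gap_penalty(gapped_structure):
    """Penalty for leaving the given substructure unaligned."""
    single_bases = gapped_structure.count(".")
    base_pairs = gapped_structure.count("(")
    return SINGLE_BASE_GAP * single_bases + BASE_PAIR_GAP * base_pairs


if __name__ == "__main__":
    print(gap_penalty("(...)"))   # one base pair and three single bases
```
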
Motifs used in the experiments
(a) HSL3 (b) IRE
HSL3 has a typical stem-loop structure with two flanking tails
IRE has a specific stem-loop structure for gene regulation related to cellular iron metabolism
Wildcard n is allowed to match 0 or 1 nucleotide
IUB code: M: A, C; Y: C, T/U; H: not G; R: A, G; W: A, T/U

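The sequence side of motif matching with IUB codes and the wildcard n can be sketched with a regular expression. This is an illustration only: it matches an unpaired segment against a pattern and ignores structure, which RSmatch does not. The table holds the standard IUB nucleotide codes with T treated as U; the names `IUB`, `pattern_to_regex` and `segment_matches` are hypothetical.

```python
import re

# Standard IUB nucleotide codes, written for RNA (T is treated as U).
IUB = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "M": "AC", "K": "GU", "S": "CG", "W": "AU",
    "H": "ACU", "B": "CGU", "V": "ACG", "D": "AGU", "N": "ACGU",
}


def pattern_to_regex(pattern):
    """Translate a motif segment into a compiled regular expression.
    Lowercase 'n' is the wildcard that matches 0 or 1 nucleotide."""
    parts = []
    for symbol in pattern:
        if symbol == "n":
            parts.append("[ACGU]?")
        else:
            parts.append("[" + IUB[symbol.upper().replace("T", "U")] + "]")
    return re.compile("".join(parts))


def segment_matches(pattern, segment):
    """True when the whole segment is an instance of the pattern."""
    segment = segment.upper().replace("T", "U")
    return pattern_to_regex(pattern).fullmatch(segment) is not None


if __name__ == "__main__":
    print(segment_matches("CAGWGH", "CAGUGC"))
    print(segment_matches("AnC", "AC"), segment_matches("AnC", "AGGC"))
```

Keeping lowercase n distinct from uppercase N lets a pattern say "optional base" and "exactly one base of any kind" separately.
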
Experiments
Performance measurements: sensitivity (recall) and specificity (precision)
19,986 human RefSeq mRNA sequences were obtained from NCBI; 39,972 UTR regions were extracted
Each UTR sequence was chopped and folded into secondary structures using the Vienna RNA package, yielding ~575,000 structures
Compare RSmatch with PatSearch [10]
[10] Pesole, G. (2000) Bioinformatics

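The two performance measures, in the sense used on this slide (sensitivity as recall, specificity as precision), reduce to simple ratios of hit counts. The sketch below uses made-up counts purely to show the arithmetic; they are not results from the experiments.

```python
def sensitivity(true_pos, false_neg):
    """Recall: fraction of real motif instances that were found."""
    return true_pos / (true_pos + false_neg)


def specificity(true_pos, false_pos):
    """Specificity in the slide's sense (precision): fraction of reported
    hits that are real motif instances."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical counts: 9 true hits, 1 missed instance, 3 false hits.
    print(sensitivity(9, 1), specificity(9, 3))
```
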
Chop and fold UTR sequences
[Figure: an mRNA with a UTR on each side of the ORF; each UTR carries position marks at 50, 100, 150 and 200]
ORF: Open Reading Frame

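The chopping step can be sketched as a sliding window over a UTR sequence. The window length of 100 and step of 50 used as defaults below are assumptions chosen to be consistent with the position marks in the figure, not values confirmed by the slides, and `chop` is a hypothetical helper; folding each fragment is left to an external program such as the Vienna RNA package and is not shown.

```python
def chop(sequence, window=100, step=50):
    """Split a sequence into overlapping fragments.
    Returns a list of (start, fragment) tuples with 0-based starts."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(sequence) <= window:
        return [(0, sequence)]
    starts = list(range(0, len(sequence) - window + 1, step))
    last = len(sequence) - window
    if starts[-1] != last:
        starts.append(last)   # extra fragment so the 3' tail is covered
    return [(s, sequence[s:s + window]) for s in starts]


if __name__ == "__main__":
    utr = "ACGU" * 60         # toy sequence of length 240
    for start, fragment in chop(utr):
        print(start, len(fragment))
```

Overlapping windows reduce the chance that a motif is cut in half at a fragment boundary.
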
Detecting HSL3 motif
PatSearch: specificity (98.2%), sensitivity (87.1%).
Several histone genes (e.g. NM_003542, NM_003548) were found by RSmatch, but not by PatSearch.

Detecting IRE motif
PatSearch was used to search the 39,972 UTR sequences for the IRE motif, yielding 27 hit structures belonging to 18 UTR sequences
The 18 UTR sequences were chopped and folded into 1,196 structures
Compare RSmatch, Rsearch [11] and stemloc [12].
A well-known IRE-containing structure (NM_000032) was used as the query (it has no wildcard or ambiguity symbols, since Rsearch and stemloc cannot handle them)
[11] Klein, R.J. (2003) BMC Bioinformatics
[12] Holmes, I. (2002) PSB

Experimental results for IRE motif [Figure]

Dealing with complex structures [Figure]

Extension to multiple structural alignment
[Figure: flowchart with the steps pairwise match, seed alignment, profile, search small database, best alignment and expand; the loop stops (YES) when score(best alignment) < δ or the alignment is non-expandable, and otherwise (NO) expands the alignment]

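The seed-and-expand loop in the flowchart can be sketched as a greedy procedure. This is a loose illustration under stated assumptions, not the RMulti implementation: `toy_score` is a hypothetical stand-in for a pairwise structural alignment score, and the profile search is replaced by the worst pairwise score of a candidate against the current members.

```python
from itertools import combinations


def toy_score(a, b):
    """Hypothetical stand-in score: number of identical positions."""
    return sum(x == y for x, y in zip(a, b))


def seed_and_expand(structures, delta, score=toy_score):
    """Return indices of the structures in the grown alignment."""
    if len(structures) < 2:
        return list(range(len(structures)))

    # Pairwise match: the best-scoring pair becomes the seed alignment.
    seed = max(
        combinations(range(len(structures)), 2),
        key=lambda p: score(structures[p[0]], structures[p[1]]),
    )
    members = list(seed)
    remaining = [k for k in range(len(structures)) if k not in members]

    # Expand until the best candidate scores below delta or none is left.
    while remaining:
        fit = {
            k: min(score(structures[k], structures[m]) for m in members)
            for k in remaining
        }
        best = max(remaining, key=fit.get)
        if fit[best] < delta:
            break
        members.append(best)
        remaining.remove(best)
    return members


if __name__ == "__main__":
    db = ["((...))", "((...))", "(.....)", "......."]
    print(seed_and_expand(db, delta=5))
```

Raising `delta` makes the loop stop earlier, so the threshold trades the size of the alignment against its consistency.
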
Example
[Figure: two successive expand steps]

RMulti Webserver
http://aria.njit.edu/rnacenter/multi.html

Conclusion
An efficient algorithm, RSmatch, to align and analyze RNA secondary structures
A multiple structural alignment tool, RMulti
A visualization tool combining RSmatch with RNAView

Future Work
Extending RSmatch to handle pseudoknots
Large-scale genome-wide motif mining
Indexing very large RNA structure databases
Improved multiple structural alignment of RNA sequences
RNA classification and clustering
RNA-RNA interactions and protein-RNA interactions