DUPLICATED RNA GENES IN TELEOST FISH GENOMES

Dominic Rose, Julian Jöris, Jörg Hackermüller, Kristin Reiche, Qiang LI, Peter F. Stadler

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D Leipzig, Germany

Primary Affiliation: Fraunhofer Institute for Cell Therapy and Immunology, Perlickstr. 1, Leipzig, Germany
Primary Affiliation: T-Life Research Center, Fudan University, Shanghai, China
Secondary Affiliations: Fraunhofer Institute for Cell Therapy and Immunology, Perlickstr. 1, Leipzig, Germany; Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, Wien, Austria; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA

Teleost fishes share a duplication of their entire genomes. We report here on a computational survey of structured non-coding RNAs in teleost genomes, focussing on the fate of fish-specific duplicates. As in other metazoan groups we find evidence for a large number (11543) of structured RNAs, most of which (86%) are clade-specific or evolve so fast that their tetrapod homologs cannot be detected. In surprising contrast to protein-coding genes, the fish-specific genome duplication did not lead to a large number of paralogous ncRNAs: only 188 candidates, mostly microRNAs, appear in a larger copy number in teleosts than in tetrapods, suggesting that large-scale gene duplications do not play a major role in the expansion of the vertebrate ncRNA inventory.

Keywords: Non-coding RNA, ncRNA, ncRNA evolution, ncRNA gene finding, teleost fish, teleosts, Takifugu rubripes, whole-genome duplication, comparative genomics, annotation

1. Introduction

The heterogeneous class of non-protein-coding RNAs (ncRNAs) has in recent years moved from a biochemical curiosity to a main research topic in Molecular Biology. Recent results from high-throughput transcriptomics [1,2,3,4] have established that ncRNAs in fact dominate the transcriptomes of higher eukaryotes, even though the list of non-protein-coding RNAs is still largely incomplete. With ncRNAs being implicated in a plethora of regulatory roles [5,6], it becomes an interesting issue to understand their evolution in more detail and beyond the mostly anecdotal narratives available for individual ncRNA gene families, see [7] for a review.

Compared to the annotation of protein-coding genes, non-coding RNA annotation of genomic sequences is still in its infancy. This is true in particular beyond mammalian genomes, which mostly inherit the human and mouse annotation in a straightforward way. A particular problem for all large-scale studies of ncRNA evolution is the massive difference in ncRNA coverage and the biases in annotation even between fairly closely related organisms. In addition to biased coverage, the currently available ncRNA annotation procedures cannot distinguish clearly between functional ncRNAs and the sometimes huge number of associated pseudogenes. Fig. 1 summarizes the situation for the five available teleost fish genomes, making it obvious that a systematic study of ncRNA evolution cannot be meaningfully performed solely on the basis of the available annotation.

Fig. 1. Ensembl-48 annotation of human and teleost fish genomes. [Panels (A) and (B); species: Fugu, Tetraodon, Gasterosteus, Oryzias, Danio, Human; classes: misc_RNA, miRNA, snoRNA, snRNA, rRNA; axes: percentage (0% to 100%) and ncRNA genes.]

A further drawback of current annotation procedures is that they typically consider only a small subset of the ncRNA universe. Beyond the housekeeping RNAs (tRNA, snRNAs, RNase P and MRP RNA, and a few Pol III transcripts such as vault and Y RNAs), annotation is mostly restricted to microRNAs and the two classes of snoRNAs. In contrast, computational studies have provided convincing evidence for tens of thousands of RNAs whose secondary structure is under stabilizing selection [1,8,9,10], while even more functional ncRNAs without evolutionarily conserved structure were found in the large-scale transcriptomics projects mentioned above.

Here, we study the overall characteristics of ncRNA evolution in teleost genomes. From an evolutionary perspective, teleost fish genomes are of particular interest because they have undergone a complete duplication of their genomes, usually called the Fish Specific Genome Duplication (FSGD), just before the radiation of the crown-group teleosts [11,12].
This raises in particular the question to what extent duplicated ncRNAs are retained after such a large-scale duplication event. Thus, the purpose of this contribution is two-fold. First, we report an RNAz-based survey of teleost fish genomes. We demonstrate that, like many other groups of organisms (mammals [8,9], urochordates [13], nematodes [14], insects [15], and yeasts [16]), teleost genomes contain a large number of previously undescribed, clade-specific, non-coding RNAs. Secondly, we investigate from a global perspective the fate of ncRNAs in the wake of a genome duplication.

Fig. 2. Overview of the computational workflow. [Workflow boxes: Genomic sequence data; Genome-wide alignments (NcDNAlign); RNAz screen (prediction of structural ncRNAs); RNA annotation (classification of predictions). Annotation branches: Homology-based (Blast searches against RNA databases: Rfam, Noncode, miRBase, miRNAMap, ncRNAdb, known teleostean ncRNAs); Clustering (sequence-based: blastclust, SplitsTree; structure-based: LocARNA); Quality assurance (false discovery rates, coding potentials, EST overlap); Prediction tools (tRNAscan, RNAmicro, SnoReport); Orthologs and paralogs (blastn detected, RNAz detected).]

2. RNAz Screen: Unbiased Prediction of Structured RNAs

We prepared genome-wide alignments of the non-repetitive non-protein-coding DNA of the five available teleost genomes (Ensembl-48) (fugu, Takifugu rubripes, Tr; pufferfish, Tetraodon nigroviridis, Tn; stickleback, Gasterosteus aculeatus, Ga; medaka, Oryzias latipes, Ol; and zebrafish, Danio rerio, Dr) using NcDNAlign [17] with fugu serving as the reference organism. Due to the larger evolutionary distances among teleosts compared to mammals, only a relatively small fraction of the genomes can be reliably aligned. In order to improve the performance, we consider only DNA that is alignable to the fugu genome after removing all known coding sequences. Local alignments containing at least three species were then scored using the RNAz package (v1.0) [18].

RNAz is a machine learning approach that allows the prediction of structural non-coding RNAs. It classifies given alignment slices as either "ncRNA" or "other" by analysing secondary structure conservation and thermodynamic stability of the fold. Alignments are scanned by moving a 120 nt window with a step-width of 40 nt, so that consecutive slices overlap by 80 nt. This sliding-window mechanism is motivated by the fact that many structured RNAs are shorter than 100 nt. Such small RNA signals would drown in the noise of longer, mostly unstructured, alignments.
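The window arithmetic described above (120 nt windows, 40 nt step, 80 nt overlap) can be illustrated with a minimal sketch. The function names and the handling of short alignments and trailing columns are assumptions made for illustration only; they are not taken from the RNAz package.

```python
def sliding_windows(alignment_length, window=120, step=40):
    """Return half-open (start, end) column ranges for an alignment.

    Assumption of this sketch: an alignment no longer than one window is
    kept as a single slice, and columns after the last full window are
    ignored.
    """
    if alignment_length <= window:
        return [(0, alignment_length)]
    return [(s, s + window)
            for s in range(0, alignment_length - window + 1, step)]


def slice_alignment(rows, window=120, step=40):
    """Cut every aligned sequence (equal-length strings) into the same windows."""
    length = len(rows[0])
    return [[row[s:e] for row in rows]
            for s, e in sliding_windows(length, window, step)]


if __name__ == "__main__":
    # A 200-column alignment yields three windows; neighbours share 80 columns.
    for start, end in sliding_windows(200):
        print(start, end)
```

With the default parameters, consecutive windows share `window - step = 80` columns, which matches the overlap stated in the text.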
On the other hand, there is no reliable signal for secondary structure conservation in alignments that are too short. Previous RNAz-based studies discuss these issues in more detail [8,15,16]. A brief overview of our computational procedure is presented in Fig. 2.

At RNAz classification probabilities of p > 0.5 and p > 0.9 we obtained [...] and [...] structured sequence windows, which can be combined into [...] and [...] predicted structured RNA elements, respectively. In order to estimate the false discovery rate (FDR), the entire screen has been repeated using column-wise shuffled alignments as input. As described in [19] and tested in practice in [8,9,13,14,15], the shuffling procedure of the RNAz package generates randomized alignments that preserve the salient characteristics of multiple alignments: length, base composition, gap patterns, and conservation patterns. Comparing the amount of positively scored DNA from control and normal screen yields FDR estimates of 26% and 18% for the low and high confidence levels. Alternatively, we can base FDR estimates on the numbers of positively scored sequence windows and observe similar FDRs: 26% (5 131/19 916, p>0.5) and 18% (1 207/6 690, p>0.9).

These estimates might be somewhat optimistic since the shuffling algorithm of the RNAz package does not account for dinucleotide content. Preserving dinucleotide content while generating randomized alignments is essential to accurately assess the significance of the prediction [20], and previous RNAz-based studies have shown that taking this effect into account substantially increases the estimated FDR on mammalian sequences (FDR of 50% for the ENCODE regions [9]). In contrast, the impact of using dinucleotide instead of mononucleotide shuffling has been quite small on drosophilid sequences [15]. As an additional control, we therefore applied SISSIz [21], a novel approach to generate dinucleotide-controlled random alignments with the characteristics of a given input alignment. Not unexpectedly, the estimated FDR indeed increased to up to 68%. The SISSIz-randomized control screen still contained 41 known ncRNAs, indicating that SISSIz leads to a conservative (pessimistic) FDR estimate. Tab. 1 summarizes the statistics of the RNAz screen. We excluded currently annotated protein-coding sequences (Ensembl-48) from our analysis.

Table 1. Summary of the RNAz screen. [Columns: species Tr, Tn, Ga, Ol, Dr. Rows: genome size [Mb]; without CDS [Mb]; non-coding alignments; aligned DNA [Mb]; scored by RNAz [Mb]; RNAz p> [Kb]; RNAz p> [Kb]; FDR p>0.5 [%]; FDR p>0.9 [%].]
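The window-based FDR estimate quoted above is the ratio of positively scored windows in the shuffled control screen to those in the real screen. The sketch below reproduces that arithmetic and adds a deliberately naive column shuffle. Both function names are hypothetical. The shuffle is a simplification: it permutes all columns freely, so it keeps alignment length and per-sequence base composition but makes no attempt to preserve the gap and conservation patterns that, according to the text, the RNAz shuffling procedure retains.

```python
import random


def shuffle_columns(rows, seed=None):
    """Naively permute the columns of an alignment (equal-length strings).

    Simplified illustration only; this is not the RNAz shuffling algorithm.
    """
    rng = random.Random(seed)
    columns = list(zip(*rows))      # one tuple of characters per column
    rng.shuffle(columns)
    return ["".join(chars) for chars in zip(*columns)]


def estimate_fdr(control_hits, screen_hits):
    """FDR estimate: hits on randomized input divided by hits on real input."""
    if screen_hits <= 0:
        raise ValueError("screen_hits must be positive")
    return control_hits / screen_hits


if __name__ == "__main__":
    # Window counts quoted in the text.
    print(round(estimate_fdr(5131, 19916), 2))   # p > 0.5
    print(round(estimate_fdr(1207, 6690), 2))    # p > 0.9
```

Because whole columns are moved together, each sequence keeps its own nucleotide and gap counts, which is the minimum a mononucleotide control needs; a dinucleotide-aware control such as SISSIz requires a different construction.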
The coding potential scores of CPC [22] further suggest that almost all RNAz hits are indeed in non-coding regions: only 100 (<1 %) are predicted as coding, a value that is less than CPC's false positive rate, which its authors estimate at 2 %.

In order to obtain at least a rough estimate of the sensitivity, we compared the RNAz output with existing annotation. Since annotated repetitive elements as defined by RepeatMasker have been removed from our input data, several classical RNA families have also been excluded, in particular tRNAs, some of the snRNAs and most of the misc RNAs. Only 321 of 541 known fugu ncRNAs passed the repeat masking step and only 245 of these are sufficiently well conserved to be alignable with homologous sequences of the remaining teleosts. Of these, 221 ncRNAs (90 %) are recognized by RNAz. In order to obtain a more realistic sensitivity estimate, we retrieved all annotated ncRNAs from the fugu genome, added 100 nt of flanking sequence on both sides, and used NcDNAlign to retrieve their homologs and to construct multiple alignments. These were then scored with RNAz using the same parameters as the main screen. We obtained an overall sensitivity of about 85 %, see Tab. 2. As in previous work, we note that in particular for snoRNAs the sensitivity is poor, while sensitivity values for miRNAs are encouraging. A considerable number of predicted ncRNAs overlap ESTs and hence show evidence for active transcription, cp. Table 3.

Table 2. Sensitivity of RNAz on Ensembl-48-annotated fugu ncRNAs. Denoted percentages refer to the number of recovered ncRNAs using RNAz (p>0.5) over the number of ncRNAs present in the input alignments. [Columns: class; RNAz; input; Ensembl-48; sensitivity (%). Rows: rRNAs; snRNAs; snoRNAs; miRNAs; other; all.]

3. RNA Annotation

Our annotation procedure follows a recently published scheme [23]. Overall, RNAz hits can be annotated by the following protocol: Firstly, known RNA genes and homologs of known ncRNAs were identified based on sequence comparison, here using blastn searches against all major ncRNA databases: Rfam, NONCODE, miRBase, miRNAMap, ncRNAdb [28] (number of RNAz hits showing sequence conservation with entries of the respective ncRNA database: 115 Rfam, 104 NONCODE, 208 miRBase, 179 miRNAMap, 71 ncRNAdb).
In the second step, we used specialized programs to recognize novel members of three ncRNA classes. Since tRNAs were removed from the input set as multi-copy genes, we found only one tRNA and two tRNA pseudogenes with tRNAscan [29]. An experimental version of SnoReport [30] identified 885 snoRNAs (727 C/D- and 136 H/ACA-box snoRNAs, 22 are classified as both), of which 8 match previously annotated miRNAs. This is within the expected FDR of SnoReport. We used RNAmicro [31] to determine putative miRNA precursors. At a confidence level of p_RNAmicro > 0.5, we obtained 434 candidates, of which 190 have blastn matches with miRBase or miRNAMap entries. Fig. 3 summarizes the miRNA annotation in some more detail. An example of a novel fugu miRNA candidate, not yet listed in Ensembl-48, is a homolog of xtr-mir-449, Fig. 4 (top). The structure shows all hallmarks of a microRNA precursor including the characteristic conservation pattern. Another novel miRNA precursor candidate found by RNAz and RNAmicro is the intronic locus2693 (scaffold 204, pos [...]), which contains the mature sequence of dre-mir-728. It is also not yet included in Ensembl-48's annotation tracks.

Table 3. A comparison of ncRNA candidates and EST sequences provides evidence for transcriptional activity. The table lists the number of fugu loci obtained by BLAST searches (E-value < 1e-5) between fugu sequences from (1) our teleostean ncRNA candidates (p>0.5, [...] loci overall) and (2) the tetrapod-conserved subset ([...] loci overall, cp. Tab. 4) against EST sequences of four teleosts provided by the UCSC Table Browser. The small number of matches with fugu ESTs may be due to the fact that UCSC only offers access to sequences of a deprecated fugu assembly which is only partially compatible with the Ensembl data our study is based on. For tetraodon, no ESTs were available. The last column lists the hits to the medaka EST database (m-est-db). [Columns: ESTs of Tr, Ga, Dr, Ol; m-est-db. Rows: (1) annotated, unknown; (2) annotated, unknown.]

As a third step, we employed the structural clustering procedure proposed by Will et al. [32] to find possible novel structural classes. All RNAz hits with either a classification probability exceeding 0.9 or a valid annotation form the input of the clustering procedure (2 293 loci).
In brief, a modified Sankoff algorithm is used to compute local structural alignments and their consensus structure. The cluster-tree (see Fig. 6) is then obtained by agglomerative clustering based on the alignment scores. It is further processed to identify an optimal partition by evaluating the squared error of the minimum free energies of the individual sequences located within the subtree rooted at an internal node relative to the minimum free energy of their common consensus secondary structure. If the increase of the squared error is unexpectedly high when traversing the cluster-tree from the leaves to the root node, the merging of two internal nodes into one common cluster is stopped. In this way, we estimate that there are at least 106 relevant clusters (which depend on several thresholds, including the chosen cutoff to stop the merging procedure, a structure conservation index (SCI) > 0.7, and a consensus minimum free energy (MFE) < -20 kcal/mol) having an average cardinality of 2.9 sequences, an average SCI of 0.89, and an average MFE of -34.97 kcal/mol. Fig. 6 displays the resulting cluster tree and Fig. 4 (bottom) illustrates an example subtree in more detail. Besides known ncRNAs, both trees contain novel, closely related structures, demonstrating that our clustering approach indeed yields useful classifications. As in other recent work [32,33], we predict several new miRNA candidates by means of clustering, including some candidates that are not recognized by RNAmicro.

Fig. 3. MicroRNA annotation of RNAz hits. The Venn diagram contrasts the number of RNAz predictions that are positively classified by RNAmicro and hits that also have a miRBase and/or a miRNAMap BLAST hit. For each subset we record three numbers in the form y x z with y + z = x, where x is the number of microRNA precursors in the intersection of all three approaches, z is the number of corresponding miRNAs annotated in Ensembl, and y is the number of miRNAs not annotated in Ensembl (these structures are the most interesting ones because they constitute putatively novel miRNAs).

Of particular interest are sets of unannotated RNAz hits that not only fold into similar structures, and hence are identifiable in the cluster tree, but are also located in close vicinity on the genome. Such arrangements are observed e.g. for polycistronic microRNA transcripts [34], for multiple unrelated snoRNAs sharing the same host gene [35], and for several Pol III transcripts, including Y RNAs [36,37] and vault RNAs. Altuvia et al. [34] argued that miRNA precursors that are located within short chromosomal distances (<3 000 nt) from each other most likely arise from polycistronic transcripts. Interestingly, the 434 RNAmicro predictions contain 41 such clusters.
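Grouping loci that lie within a maximal chromosomal distance of each other, as in the criterion above, can be sketched as a single pass over position-sorted loci. The input tuple layout, the function name, and the choice to measure distance as the gap between the end of one locus and the start of the next are assumptions of this illustration; the text does not specify how distances were measured.

```python
from collections import defaultdict


def genomic_clusters(loci, max_distance=3000):
    """Group loci on the same scaffold into clusters of neighbouring hits.

    loci: iterable of (name, scaffold, start, end) tuples.
    Only clusters with at least two members are returned.
    """
    by_scaffold = defaultdict(list)
    for name, scaffold, start, end in loci:
        by_scaffold[scaffold].append((start, end, name))

    clusters = []
    for scaffold in sorted(by_scaffold):
        current, current_end = [], None
        for start, end, name in sorted(by_scaffold[scaffold]):
            # Close the running cluster when the gap to it is too large.
            if current and start - current_end > max_distance:
                if len(current) > 1:
                    clusters.append(current)
                current, current_end = [], None
            current.append(name)
            current_end = end if current_end is None else max(current_end, end)
        if len(current) > 1:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    example = [
        ("a", "s1", 100, 200),
        ("b", "s1", 1200, 1300),
        ("c", "s1", 9000, 9100),
        ("d", "s2", 150, 250),
    ]
    print(genomic_clusters(example, max_distance=3000))
    print(genomic_clusters(example, max_distance=10000))
```

Raising `max_distance` can only merge clusters, never split them, which is consistent with the larger clusters reported for larger cut-off distances.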
However, there are only six such clusters within the 246 novel miRNA candidates, each comprising only two or three loci. Overall, we observe [...] genomic clusters with a maximal distance of 1 kb consisting of 2 to 8 loci. Increasing the cut-off distance to 10 kb yields [...] genomic clusters, the largest comprising 110 sequences. Within the high confidence p_RNAz > 0.9 predictions, we still find 234 clusters (730 at 10 kb), each with up to 5 (20) loci. Fig. 5 shows the distribution of pairwise LocARNA distances used in structural clustering for RNAz hits that form genomic clusters. The distribution is clearly bi-modal, consisting of a random bulk and a small set of signals with very similar structures. While fairly restrictive, the data indicate that there are at least a handful of novel structural classes of RNAs that tend to cluster in the same genomic location.

Fig. 4. Novel microRNAs in teleost genomes. Top. Fugu miRNA-449 (locus 1285, scaffold 360, pos [...]) is an example of a homolog of a known miRNA not annotated in Ensembl. This RNAz prediction is classified as miRNA precursor by RNAmicro, and a BLAST search against the mature miRNAs in miRBase reveals that it contains the sequence of the mature miRNA xtr-mir-449. Below. Structure-based clustering reveals several collections of hairpin structures. These clusters contain already known miRNAs, but also closely related novel miRNA precursor candidates. Primary sequences of this example vary considerably, but secondary structure motifs are well conserved. Novel ncRNA candidates (red circle) with a positive RNAmicro classification are prime candidates for novel miRNAs.

Fig. 5. Distribution of structure distances between pairs of adjacent high confidence (p_RNAz > 0.9) RNAz hits with a maximum distance of 1 kb, 3 kb and 10 kb, respectively. The left column shows histograms for all genomic clusters of RNAz predictions, the right column is restricted to clusters containing putatively novel structures that are not included in Ensembl-48. [Panels: 1 kb, 3 kb, 10 kb, each as "all" and "only novel". x-axis: pairwise LocARNA distance (structural diversity of genomically clustered pairs); y-axis: frequency (number of genomically clustered pairs).]

4. Orthologs and Paralogs

A BLAST search (E-value < 10^-5) shows that 944 (8 %) of the fugu RNAz predictions have sequence similarity with highly conserved non-coding vertebrate elements (hCNE) [38], of which 246 candidates have some annotation. 541 loci are not only alignable with vertebrates (human, mouse, dog, chicken, shark, using NcDNAlign); the extended alignments are also still classified as structured RNAs by RNAz, see Tab. 4. Within this set of highly conserved structures, the estimated FDR is reduced to only 7-11 %, and only two sequences are classified as coding by CPC. 164 of these sequences have homologs within the set of 997 high-confidence highly conserved predictions of the prior mammalian RNAz screen [8]. Our data thus provide at least 377 additional well-conserved ncRNA candidates that were not detected by the earlier survey.

According to the UCSC Table Browser, at least 250 teleostean ncRNA candidates (2 %) have sequence similarity with untranslated regions (UTRs) of human protein-coding genes (5' UTRs: 136, 3' UTRs: 162; some match both types of UTRs). Regarding the set of tetrapod-conserved RNAz predictions, 140 (26 %) out of 541 candidates cover human intronic sequences and 181 match human UTRs (5' UTRs: 112, 3' UTRs: 116). Due to the lack of a reliable fugu UTR annotation we analysed the 1 kb flanking regions of the 5' and 3' boundaries of Ensembl's fugu protein-coding genes. Interestingly, the 5' flanking regions (3' flanking regions) contain 1505 (1448) RNAz hits, of which 94 % (93 %) are potential cis-regulatory signals of unknown function. Among the tetrapod-conserved RNAz hits, 114 candidates reside within 1 kb up- or downstream of protein-coding genes, 61 of which are not annotatable.

[Fig. 6: cluster tree of RNAz loci; leaf labels omitted. Legend: Ensembl-48 miRNAs; RNAmicro miRNA candidates. Labelled clusters: let-7a; let-7i,h cluster; let-7d cluster.]
locus2512 locus699 locus10177 locus2788 locus8105 locus2784 locus816 locus6043 locus5058 locus1887 locus9130 locus9129 locus7965 locus10452 locus4815 locus881 locus11032 locus3539 locus665 locus801 locus697 locus259 locus1351 locus2615 locus9303 locus2558 locus2241 locus157 locus10647 locus972 locus4975 locus746 locus5424 locus7222 locus5583 locus2988 locus794 locus5606 locus5478 locus271 locus5030 locus486 locus10037 locus8317 locus10231 locus9891 locus10049 locus7962 locus73 locus3602 locus403 locus8122 locus953 locus10187 locus7947 locus3137 locus10328 locus4183 locus302 locus8502 locus4429 locus530 locus3505 locus2401 locus10130 locus2790 locus2733 locus2657 locus10878 locus7760 locus5603 locus6505 locus4875 locus336 locus7796 locus4114 locus2495 locus793 locus1942 locus5835 locus7141 locus2114 locus5250 locus11013 locus9544 locus1943 locus10183 locus5328 locus2098 locus2441 locus834 locus5873 locus1227 locus5597 locus1293 locus7963 locus6499 locus10756 locus1069 locus8593 locus5983 locus980 locus7921 locus10079 locus943 locus6760 locus6733 locus3125 locus7746 locus3442 locus10883 locus10867 locus5781 locus5091 locus7662 locus4985 locus4852 locus2778 locus3859 locus4885 locus4599 locus4354 locus4259 locus2208 locus887 locus2395 locus10150 locus942 locus3511 locus824 locus8422 locus5090 locus2189 locus3393 locus9126 locus8458 locus2885 locus1969 locus5234 locus859 locus6615 locus2702 locus9048 locus141 locus4663 locus705 locus757 locus329 locus10998 locus7578 locus7158 locus3145 locus10949 locus7649 locus7436 locus1801 locus661 locus9542 locus2230 locus8129 locus6421 locus9319 locus8918 locus7016 locus2524 locus7905 locus9039 locus8561 locus3949 locus3591 locus10881 locus11259 locus3039 locus6650 locus10951 locus5609 locus10000 locus6818 locus6631 locus4878 locus10911 locus7534 locus1787 locus8941 locus3488 locus6369 locus903 locus10919 locus1315 locus869 locus864 locus196 locus9443 locus7745 locus7943 locus6471 locus8391 locus8314 locus10334 locus3765 locus6777 
locus2693 locus1238 locus7294 locus8826 locus5815 locus1195 locus10453 locus92 locus7240 locus6291 locus9151 locus7559 locus4976 locus6622 locus1603 locus10679 locus3445 locus11393 locus734 locus8157 locus4196 locus5319 locus3211 locus1846 locus8537 locus4490 locus1446 locus7925 locus7146 locus7250 locus5335 locus8135 locus5966 locus1680 locus4457 locus7797 locus1609 locus1460 locus6997 locus6469 locus2730 locus6071 locus46 locus6379 locus5308 locus1888 locus11086 locus1645 locus6249 locus10678 locus7014 locus6313 locus3744 locus5982 locus4459 locus10938 locus7801 locus11208 locus521 locus5919 locus2126 locus7844 locus5981 locus5918 locus2128 locus436 locus3582 locus2398 locus1531 locus11268 locus1459 locus7001 locus7339 locus6330 locus2914 locus10986 locus1643 locus11327 locus4847 locus5532 locus1196 locus8856 locus1202 locus736 locus7553 locus6789 locus5827 locus1285 locus5965 locus5870 locus5872 locus5433 locus10229 locus404 locus6651 locus6251 locus4503 locus2651 locus978 locus389 locus8768 locus1604 locus1237 locus639 locus8134 locus3675 locus7650 locus8366 locus5016 locus3583 locus5018 locus3742 locus7596 locus5017 locus3748 locus11466 locus2088 locus3228 locus979 locus508 locus6038 locus5980 locus11360 locus2129 locus8912 locus4977 locus4519 locus2131 locus10033 locus3767 locus2132 locus6031 locus10141 locus9840 locus5019 locus2123 locus11464 locus1055 locus1620 locus971 locus8765 locus1503 locus449 locus254 locus224 locus6119 locus1968 locus7037 locus7975 locus901 locus715 locus525 locus5841 locus134 locus571 locus620 locus286 locus7755 locus940 locus4240 locus3363 locus8059 locus3670 locus498 locus177 locus10980 locus4606 locus9486 locus2433 locus1231 locus524 locus300 locus689 locus359 locus9 locus3096 locus8905 locus6671 locus1890 locus11085 locus223 locus97 locus6586 locus5421 locus8701 locus1845 locus3051 locus1571 locus3146 locus2035 locus1570 locus7504 locus1880 locus7273 locus1937 locus3264 locus1777 locus5842 locus4993 locus7261 locus1712 locus6754 
locus11303 locus3679 locus4842 locus6 locus11224 locus4368 locus4353 locus1544 locus9607 locus1564 locus7838 locus5678 locus654 locus9963 locus1221 locus7829 locus1549 locus7817 locus3267 locus4765 locus9929 locus6858 locus8432 locus6487 locus4347 locus11278 locus3441 locus10970 locus613 locus8223 locus1363 locus6882 locus11106 locus5632 locus4947 locus8764 locus2480 locus1217 locus10553 locus5955 locus3114 locus8392 locus5128 locus4351 locus1159 locus10737 locus955 locus340 locus7984 locus8648 locus7765 locus5706 locus10850 locus10214 locus6747 locus5174 locus6125 locus3095 locus8725 locus4627 locus4909 locus4166 locus10785 locus9294 locus4671 locus10813 locus186 locus9042 locus6325 locus3409 locus6364 locus3334 locus5436 locus5373 locus8719 locus7845 locus3187 locus904 locus4569 locus2949 locus1414 locus4472 locus2161 locus3880 locus2547 locus773 locus8875 locus10038 locus4753 locus3935 locus9936 locus5353 locus1122 locus10136 locus4360 locus7172 locus8417 locus6449 locus1008 locus11454 locus4257 locus9136 locus5886 locus10069 locus7432 locus2545 locus11484 locus2554 locus9569 locus1676 locus2823 locus7408 locus6856 locus6371 locus5295 locus8867 locus4426 locus8861 locus6688 locus1905 locus10634 locus3351 locus10086 locus7564 locus4764 locus7907 locus1399 locus549 locus3155 locus8042 locus11337 locus2042 locus4427 locus2403 locus278 locus2149 locus5500 locus10215 locus9440 locus5321 locus2016 locus9872 locus7503 locus2353 locus8998 locus5374 locus2276 locus809 locus4253 locus19 locus11394 locus2689 locus5320 locus4059 locus3809 locus9269 locus4680 locus6356 locus5005 locus3154 locus11528 locus6344 locus10019 locus4129 locus538 locus10380 locus1737 locus10884 locus3086 locus5288 locus51 locus39 locus9128 locus2895 locus7732 locus4204 locus1663 locus10654 locus4167 locus970 locus8852 locus1228 locus610 locus10838 locus4938 locus7582 locus8474 locus6840 locus4626 locus6536 locus2473 locus4329 locus2220 locus4073 locus5202 locus4795 locus408 locus9530 locus5707 
locus5580 locus9210 locus5592 locus1494 locus10776 locus8222 locus1400 locus7493 locus985 locus9434 locus7685 locus6790 locus5934 locus4541 locus3516 locus2278 locus7737 locus4391 locus4908 locus2673 locus6112 locus7111 locus6221 locus3300 locus2155 locus4629 locus2865 locus2627 locus8235 locus4308 locus7130 locus1200 locus2751 locus7700 locus7083 locus1402 locus587 locus342 locus414 locus2662 locus413 locus228 locus27 locus11383 locus128 locus9327 locus1029 locus8965 locus5401 locus1061 locus9223 locus6786 locus5337 locus6646 locus628 locus11283 locus5379 locus1914 locus5347 locus3989 locus1421 locus6224 locus456 locus3581 locus1642 locus10677 locus8394 locus5431 locus2595 locus11325 locus10981 locus5488 locus2137 locus9113 locus3983 locus1423 locus1208 locus9947 locus8118 locus11113 locus574 locus1985 locus4409 locus3331 locus3369 locus1083 locus7288 locus7116 locus5691 locus1232 locus8329 locus7249 locus5268 locus3882 locus1741 locus7125 locus6694 locus7326 locus4290 locus3130 locus7442 locus2934 locus9612 locus4130 locus7362 locus5423 locus4882 locus10586 locus3210 locus9604 locus9943 locus9809 locus2811 locus8232 locus3332 locus10102 locus1230 locus522 locus335 locus299 locus10396 locus1997 locus8169 locus5961 locus10617 locus9952 locus2000 locus5276 locus1902 locus3143 locus2373 locus9172 locus7328 locus9357 locus7067 locus2156 locus1552 locus10060 locus3421 locus4657 locus7532 locus635 locus8013 locus3326 locus8775 locus7462 locus2901 locus8191 locus4584 locus4873 locus2617 locus7380 locus7260 locus9488 locus7311 locus4015 locus9184 locus7515 locus1255 locus1102 locus568 locus567 locus366 locus267 locus57 locus2696 locus3141 locus2517 locus1050 locus236 locus189 locus40 locus10975 locus327 locus8737 locus1125 locus570 locus3477 locus6268 locus5311 locus3474 locus3340 locus1593 locus987 locus482 locus10407 locus10882 locus8501 locus6148 locus11536 locus10666 locus10685 locus10681 locus10467 locus2124 locus10546 locus10045 locus3902 locus2785 Fig. 6. 
luster-tree of chosen high-scoring teleostean ncrn candidates. The tree comprises 4585 nodes, of which 2293 are leaves. Known fugu mirns as provided by Ensembl-48 are indicated in green (inner circle), RNmicro-predicted mirn candidates are marked in red (outer circle). ertain subtrees containing typical mirn clusters are drawn in blue. Encouragingly, the majority of known mirn clusters is recovered (red+green). Overall, there is substantial evidence for novel mirn(-like) classes of ncrn structures in teleostean genomes (red only).

Table 4. Phylogenetic conservation of fugu RNAz predictions. For all RNAz hits with p > 0.5 we list the number of elements with blastn-detected homologs (left part) and the number of hits that were again classified as structured by RNAz when aligned with the sequences from human, mouse, dog, chicken, or shark (right part). Combining both approaches, we obtain 1581 tetrapod-conserved ncRNA candidates overall.

Columns: genome; blastn hit: all, annotation yes (%), annotation no (%); positive RNAz predictions: all, annotation yes (%), annotation no (%).

Any non-teleost: (19) (81) (43) 311 (57)
Homo sapiens: (22) 767 (78) (44) 270 (56)
Gallus gallus: (21) 833 (79) (45) 251 (55)
Canis familiaris: (22) 785 (78) (44) 264 (56)
Mus musculus: (23) 701 (77) (45) 235 (55)
Callorhinchus milii: (25) 457 (75) (53) 167 (47)
Petromyzon marinus: (21) 148 (79)
Drosophila melanogaster: (66) 11 (34)
Caenorhabditis elegans: 8 5 (63) 3 (37)

In contrast, only a tiny fraction of our structured RNA candidates seems to be conserved in any of the available invertebrate genomes. Again, note that this observation is misleading because the best-conserved housekeeping ncRNAs, in particular tRNAs and snRNAs, were removed from the input data.

From an evolutionary perspective, an important issue is the fate of duplicated non-coding RNA signals. Since the ancestral vertebrate genome had already gone through two rounds of whole-genome duplication, we have to expect up to eight paralogs in teleost genomes. For the case of microRNAs we demonstrated in previous work that duplicated ncRNAs have frequently survived the genome duplication(s) [39]. We therefore compared the RNAz predictions at the sequence level and list them by their copy number in Tab. 5. The repeat-masked RNAs are not informative for this purpose because their copy numbers are highly variable between relatively closely related species, see supplemental data.
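The bound of eight paralogs can be written out as a short worked calculation. The symbol n_max(r) is introduced here only for illustration; it assumes that every whole-genome duplication doubles each locus and that no copy is lost or gained by local duplication.

```latex
% n_{\max}(r): largest possible number of paralogs of one ancestral locus
% after r whole-genome duplications (no gene loss, no local duplications)
n_{\max}(r) = 2^{r}, \qquad
n_{\max}(2) = 4 \ \text{(after the two vertebrate rounds)}, \qquad
n_{\max}(3) = 8 \ \text{(after the additional fish-specific duplication)}
```

Copy numbers above eight therefore cannot be explained by whole-genome duplications alone.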
The fraction of predictions with more than eight copies seems to contain repetitive or pseudogenic elements rather than correctly identified, evolutionarily duplicated ncRNAs. The fraction of unannotated hits grows with the number of copies. One explanation for this observation might be that these sequences diverged by accumulating mutations after duplication while preserving their secondary structure, so that it becomes increasingly difficult to annotate them reliably by sequence comparison, although they remain detectable by algorithms that incorporate structural features, e.g. RNAz or LocARNA. Interestingly, the mutual distance obtained by structure-based clustering between duplicated ncRNA candidates increases with the number of loci that belong to the cluster (see supplement). This could be explained by a duplication/deletion mechanism in which cluster members are destroyed at random by mutation, as the exact copy number is not under strong selection.

Table 5. Distribution of paralogous fugu RNAz hits. Paralogs are obtained by BLAST searches (E-value < 1e-3) of the fugu RNAz candidate sequences against themselves. They are compared with each other to exclude likely pseudogenes. We observe duplicated loci, 964 of which have two to eight copies. Furthermore, we provide the number of RNAz hits conserved between fugu and human that occur more often in fugu than in human (and the other way round). As an example, 135 ncRNA candidates, present as single copy in human, occur more than once in fugu. Conversely, 81 candidates, not duplicated in fugu, appear multi-copied in human.

Columns: # copies (up to > 8). Rows, for each RNAz cutoff p >: annotated, unknown, teleost-specific, in tetrapods, in human, fugu > human, human > fugu.

Comparing the number of ncRNAs that occur at most eight times in fugu with the copy number in human (cf. Tab. 5) reveals that in 188 cases teleosts contain more ncRNA copies than tetrapods, of which 74 are of unknown function. In turn, tetrapods contain the higher number of copies in only 89 cases. These data indicate that, with the most notable exception of microRNAs [39], additional copies of non-coding RNAs are rarely retained in the aftermath of the fish-specific genome duplication. The loss appears to be more extensive than for protein-coding genes, where at least about paralogs arising from the fish-specific genome duplication have been reported in the genomes of fugu and tetraodon [40]. It is conceivable, however, that duplicated ncRNAs have diverged so far that they are not recognized as paralogs by blast-based methods.
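The copy-number counts discussed above rest on grouping loci by their BLAST self-hits. The following is a minimal sketch of such a grouping, not the authors' pipeline: it assumes the hits have already been parsed into (query, subject, E-value) tuples, groups loci by single linkage (an assumption made here for simplicity), and omits the pseudogene filter mentioned in the Table 5 caption. The function names `paralog_families` and `copy_number_histogram` are hypothetical.

```python
from collections import Counter


def paralog_families(loci, hits, max_evalue=1e-3):
    """Group loci into families linked by self-search hits.

    loci: iterable of locus identifiers.
    hits: iterable of (query, subject, evalue) tuples; every identifier
          must also appear in `loci`.
    Two loci end up in the same family if they are connected by a chain
    of hits with evalue < max_evalue (single linkage; an assumption).
    """
    parent = {locus: locus for locus in loci}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if query == subject or evalue >= max_evalue:
            continue  # ignore self-hits and weak hits
        root_q, root_s = find(query), find(subject)
        if root_q != root_s:
            parent[root_s] = root_q

    families = {}
    for locus in parent:
        families.setdefault(find(locus), []).append(locus)
    return list(families.values())


def copy_number_histogram(families, cap=8):
    """Count families per copy number, pooling sizes above `cap`."""
    counts = Counter()
    for family in families:
        size = len(family)
        counts[str(size) if size <= cap else f">{cap}"] += 1
    return counts


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    loci = ["a", "b", "c", "d"]
    hits = [("a", "b", 1e-10), ("b", "a", 1e-10), ("c", "d", 0.5)]
    print(copy_number_histogram(paralog_families(loci, hits)))
```

Single linkage tends to merge families through weak intermediate hits, so a stricter E-value or a coverage requirement may be preferable on real data.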
To illustrate the fate of ncRNAs subsequent to duplication in more detail, we estimated the densities of the bivariate distribution of sequence versus structural similarity over duplicated and randomly chosen pairs of sequences (Fig. 7). Duplicated loci can be assigned to three distinct groups (Fig. 7C): those where both sequence and structure are nearly identical (p1); a set with significant sequence divergence but negligible structural differences (p2); and a set which largely overlaps with the background distribution of randomly chosen pairs (p3). The ostensible discrepancy between the E-value of the initial BLAST search to identify duplicated pairs and the mean pairwise identity of the ClustalW alignment used here is a consequence of scoring differences between local and global alignments: subsequences of both loci are maintained with high similarity (which is what blast focuses on), whereas overall the loci may be highly divergent, both in terms of sequence and structure. This might be interpreted as a process that maintains some local functional features, presumably those that are required to determine an RNA type, e.g. a protein binding site for a guide RNA, whereas the functional role can diverge quickly upon duplication. A related behavior can be observed for duplicated miRNAs, which mainly make up the p2 peak (cf. Suppl. Fig. 4). Upon duplication, miRNAs largely maintain their precursor structure but diverge at the sequence level: the affiliation to a particular functional class is maintained, whereas the functional role, the target specificity, changes. As expected, duplicated snoRNAs do not share such strict constraints; they are found at any hotspot position of the density plots. Similarly, the non-annotatable loci are widely spread throughout the landscape. No other annotated classes of ncRNAs are found in the set of duplicated pairs.

The small number of ncRNA candidates that have two paralogs produced by the fish-specific genome duplication severely limits attempts to detect structured RNAs by comparing paralogous regions of the same genome. We have tested this in a preliminary study using paralogous sequences from the fugu genome as input for RNAz. With this ansatz, we recovered only 283 of the 454 fugu ncRNAs known at the time (Ensembl-45).
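The grouping of duplicated pairs into p1, p2, and p3 described earlier in this section combines a sequence measure (pairwise identity of a global alignment) with a structure measure (a structure-based distance). The sketch below illustrates the idea only. The cutoffs `id_cut` and `dist_cut` and the assumption that the structure distance is scaled to the range 0 to 1 are hypothetical; in the paper the groups are read off as peaks of a density plot, not assigned by fixed thresholds.

```python
def pairwise_identity(row_a: str, row_b: str) -> float:
    """Fraction of identical residues over the columns of a global
    pairwise alignment; columns that are gaps in both rows are skipped."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    return matches / columns if columns else 0.0


def classify_pair(identity: float, structure_distance: float,
                  id_cut: float = 0.9, dist_cut: float = 0.2) -> str:
    """Assign a duplicated pair to one of the three groups.

    p1: sequence and structure both nearly identical
    p2: diverged sequence, structure still conserved
    p3: structure diverged (indistinguishable from random pairs)
    Both cutoffs are illustrative assumptions, not values from the paper.
    """
    structure_conserved = structure_distance <= dist_cut
    if structure_conserved and identity >= id_cut:
        return "p1"
    if structure_conserved:
        return "p2"
    return "p3"


if __name__ == "__main__":
    # Toy alignment rows, invented for illustration only.
    ident = pairwise_identity("GGCA-UGCC", "GGCAAUGCC")
    print(ident, classify_pair(ident, structure_distance=0.05))
```

Because identity is computed over the whole global alignment, a pair found by a strong local BLAST hit can still score low here, which is the local-versus-global effect discussed above.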
Despite the inherent limitations of comparative genomics with a single genome due to massive loss of duplicated genes, we feel that further methodological improvements are worthwhile; these will focus in particular on increasing the sensitivity of detecting paralogous regions.

Fig. 7. Distribution of structure distances for duplicated and all ncRNA candidates. The figure illustrates the density plot of the distribution of all pairwise LocARNA distances of putatively duplicated pairs with recognizable sequence similarity (red curve) and the background distribution of randomly selected pairs (black curve) for all reasonable LocARNA alignments, i.e. those with positive alignment scores (A), and the complete distribution excluding 25 outliers (B). Subsequent to duplication events, the majority of multi-copy genes preserve their secondary structure; however, a substantial fraction of genes displays highly diverged secondary structures, comparable to the distance of random pairs. (C): The bivariate density of mean pairwise identity versus LocARNA distance of pairs of duplicated ncRNA candidates displays three distinct peaks, corresponding to genes that are highly similar at both the sequence and the secondary-structure level (p1), genes with high structural similarity and diverged sequences (p2), and genes that show a degree of divergence at the sequence and structure level (p3) comparable to the background distribution of random pairs (p4), see (D). In contrast to (A), (B), and (C), where all possible pairs have been considered as background distribution, (D) includes only randomly chosen pairs to reduce the computation time for ClustalW alignments.

5. Discussion

We have reported here on an unbiased survey for evolutionarily conserved structured ncRNAs in the currently available genomes of teleosts. As in other metazoan animals, we find evidence for several thousand structured RNA motifs, of which only a small fraction can be annotated. Due to the large evolutionary distances among teleosts compared to mammals, our RNAz screen has a decreased sensitivity. The absolute number of structured elements thus cannot be fairly compared with the much larger number of predictions for mammalian genomes [8,9]. Furthermore, the overwhelming majority of the signals is specific to teleosts, i.e., almost no homologous sequences can be identified in invertebrates. Nevertheless, we have identified several hundred previously unannotated candidates that are shared between teleosts and tetrapods. Conversely, the majority of those RNAz hits that have homologous sequences in other vertebrates can still be recognized as structured RNAs at the expanded phylogenetic range. There is strong evidence for the existence of previously undescribed structurally defined ncRNA families from structure-based clustering, see Fig. 5.

Following the FSGD, we observe that very few ncRNAs retain recognizable duplicates. Indeed, only a small minority of structured ncRNAs appears in a copy number between 2 and 8. The overwhelming majority are single-copy loci, while at the same time about 12% of the loci are accounted for by multi-copy gene families (note that the latter number is an underestimate, since it excludes for instance tRNA genes). It appears, therefore, that with the exception of a few RNA classes, most notably microRNAs, large-scale duplication events do not lead to a corresponding increase in the ncRNA repertoire, at least as far as RNAs are concerned that depend on a well-defined structure. One immediate implication is that comparative approaches within the same genome, i.e., comparisons between paralogous regions, will have very limited sensitivity for ncRNA discovery.

6. Supplement

Machine readable sequence and annotation files of the Takifugu rubripes RNAz predictions, supplemental text, figures and tables are available online.

7. Acknowledgments

This work has been funded, in part, by the German DFG under the auspices of the Bioinformatics Initiative BIZ-6/1-2 and the SPP1174 Deep Metazoan Phylogeny and by the 6th Framework Programme of the European Union (SYNLET). Qiang LI's visit to Leipzig in fall 2006 was supported in part by the Vereinigung von Förderern und Freunden der Universität Leipzig e.V. and by grants to Prof. Bailin HAO (Fudan University, Shanghai, China). We are grateful to Tanja Gesell and Stefan Washietl for their offer to use SISSIz before it was published.

References

1. The ENCODE Project Consortium, Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, Nature 447 (2007).
2. P. Carninci, Constructing the landscape of the mammalian transcriptome, J Exp Biol 210 (2007).
3. M. Pheasant, J. S. Mattick, Raising the estimate of functional human sequences, Genome Res 17 (2007).
4. P. Kapranov, J. Cheng, S. Dike, D. Nix, R. Duttagupta, A. T. Willingham, P. F. Stadler, J. Hertel, J. Hackermüller, I. L. Hofacker, I. Bell, E. Cheung, J. Drenkow, E. Dumais, S. Patel, G. Helt, G. Madhavan, A. Piccolboni, V. Sementchenko, H. Tammana, T. R. Gingeras, RNA maps reveal new RNA classes and a possible function for pervasive transcription, Science 316 (2007).
5. N. C. Lau, E. C. Lai, Diverse roles for RNA in gene regulation, Genome Biol 6 (2005).
6. J. S. Mattick, I. V. Makunin, Non-coding RNAs, Hum Mol Genet 15 (2006).
7. A. F. Bompfünewerer, C. Flamm, C. Fried, G. Fritzsch, I. L. Hofacker, J. Lehmann, K. Missal, A. Mosig, B. Müller, S. J. Prohaska, B. M. R. Stadler, P. F. Stadler, A. Tanzer, S. Washietl, C. Witwer, Evolutionary patterns of non-coding RNAs, Th Biosci 123 (2005).
8. S. Washietl, I. L. Hofacker, M. Lukasser, A. Hüttenhofer, P. F. Stadler, Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome, Nat Biotechnol 23 (2005).
9. S. Washietl, J. S. Pedersen, J. O. Korbel, C. Stocsits, A. R. Gruber, J. H. et al., Structured RNAs in the ENCODE selected regions of the human genome, Genome Res 17 (2007).
10. E. Torarinsson, M. Sawera, J. Havgaard, M. Fredholm, J. Gorodkin, Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure, Genome Res 16 (2006).
11. A. Meyer, Y. Van de Peer, From 2R to 3R: evidence for a fish-specific genome duplication (FSGD), BioEssays 27 (2005).
12. K. D. Crow, P. F. Stadler, V. J. Lynch, C. T. Amemiya, G. P. Wagner, The fish specific Hox cluster duplication is coincident with the origin of teleosts, Mol Biol Evol 23 (2006).
13. K. Missal, D. Rose, P. F. Stadler, Non-coding RNAs in Ciona intestinalis, Bioinformatics 21 Suppl 2 (2005) ii77.
14. K. Missal, X. Zhu, D. Rose, W. Deng, G. Skogerbo, R. Chen, P. F. Stadler, Prediction of structured non-coding RNAs in the genomes of the nematodes Caenorhabditis elegans and Caenorhabditis briggsae, J Exp Zoolog B Mol Dev Evol 306B (2006).
15. D. Rose, J. Hackermüller, S. Washietl, S. Findeiß, K. Reiche, J. Hertel, P. F. Stadler, S. J. Prohaska, Computational RNomics of drosophilids, BMC Genomics 8 (2007).
16. S. Steigele, W. Huber, C. Stocsits, P. F. Stadler, K. Nieselt, Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions, BMC Biology 5 (2007).
17. D. Rose, J. Hertel, K. Reiche, P. F. Stadler, J. Hackermüller, NcDNAlign: Plausible multiple alignments of non-protein-coding genomic sequences, Genomics, in press.
18. S. Washietl, I. L. Hofacker, P. F. Stadler, Fast and reliable prediction of noncoding RNAs, Proc Natl Acad Sci USA 102 (2005).
19. S. Washietl, I. L. Hofacker, Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics, J Mol Biol 342 (2004).
20. T. Babak, B. J. Blencowe, T. R. Hughes, Considerations in the identification of functional RNA structural elements in genomic alignments, BMC Bioinformatics 8 (2007).
21. T. Gesell, S. Washietl, Dinucleotide controlled null models for comparative RNA gene prediction, BMC Bioinformatics 9 (2008).
22. L. Kong, Y. Zhang, Z. Ye, X. Liu, S. Zhao, L. Wei, G. Gao, CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine, Nucleic Acids Res 35 (2007) W345.
23. The Athanasius F. Bompfünewerer Consortium, RNAs everywhere: genome-wide annotation of structured RNAs, J Exp Zoolog B Mol Dev Evol 308 (2007).
24. S. Griffiths-Jones, S. Moxon, M. Marshall, A. Khanna, S. R. Eddy, A. Bateman, Rfam: annotating non-coding RNAs in complete genomes, Nucleic Acids Res 33 (2005) D121.
25. C. Liu, B. Bai, G. Skogerbo, L. Cai, W. Deng, Y. Zhang, D. Bu, Y. Zhao, R. Chen, NONCODE: an integrated knowledge database of non-coding RNAs, Nucleic Acids Res 33 (2005) D112.
26. S. Griffiths-Jones, miRBase: the microRNA sequence database, Methods Mol Biol 342 (2006).
27. P. W. C. Hsu, H. Huang, S. Hsu, L. Lin, A. Tsou, C. Tseng, P. F. Stadler, S. Washietl, I. L. Hofacker, miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes, Nucleic Acids Res 34 (2006) D135.
28. M. Szymanski, V. A. Erdmann, J. Barciszewski, Noncoding RNAs database (ncRNAdb), Nucleic Acids Res 35 (2007) D162.
29. T. M. Lowe, S. R. Eddy, tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence, Nucleic Acids Res 25 (1997).
30. J. Hertel, I. L. Hofacker, P. F. Stadler, snoReport: Computational identification of snoRNAs with unknown targets, Bioinformatics 24 (2008).
31. J. Hertel, P. F. Stadler, Hairpins in a haystack: Recognizing microRNA precursors in comparative genomics data, Bioinformatics 22 (2006) e197-e202, ISMB 2006 contribution.
32. S. Will, K. Reiche, I. L. Hofacker, P. F. Stadler, R. Backofen, Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering, PLoS Comput Biol 3 (2007).
33. W. Ritchie, M. Legendre, D. Gautheret, RNA stem-loops: to be or not to be cleaved by RNAse III, RNA 13 (2007).
34. Y. Altuvia, P. Landgraf, G. Lithwick, N. Elefant, S. Pfeffer, A. Aravin, M. J. Brownstein, T. Tuschl, H. Margalit, Clustering and conservation patterns of human microRNAs, Nucleic Acids Res 33 (2005).
35. C. Quezada, C. Navarro, R. San Martin, M. Alvarez, A. Molina, M. I. Vera, Genomic organization of nucleolin gene in carp fish: evidence for several genes, Biol Res 39 (2006).
36. A. Mosig, M. Guofeng, B. M. R. Stadler, P. F. Stadler, Evolution of the vertebrate Y RNA cluster, Th Biosci 126 (2007).
37. J. Perreault, J. P. Perreault, G. Boire, Ro-associated Y RNAs in metazoans: Evolution and diversification, Mol Biol Evol 24 (2007).
38. A. Woolfe, M. Goodson, D. K. Goode, P. Snell, G. K. McEwen, T. Vavouri, et al., Highly conserved non-coding sequences are associated with vertebrate development, PLoS Biol 3 (2005).
39. J. Hertel, M. Lindemeyer, K. Missal, C. Fried, A. Tanzer, C. Flamm, I. L. Hofacker, P. F. Stadler, The Students of Bioinformatics Computer Labs 2004 and 2005, The expansion of the metazoan microRNA repertoire, BMC Genomics 7 (2006).
40. O. Jaillon, J. Aury, F. Brunet, J. Petit, N. Stange-Thomann, E. Mauceli, et al., Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype, Nature 431 (2004).

Dominic Rose received his Diploma in Computer Science from U. Leipzig in 2006; he is now working on his PhD dissertation in Bioinformatics, focussing on ncRNA detection and annotation.

Julian Jöris is studying Computer Science with a focus on Bioinformatics at U. Leipzig.

Jörg Hackermüller received his PhD in Chemistry from U. Vienna. After postdoctoral positions at the Novartis Institutes for Biomedical Research, Vienna, and at the Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, he is currently leading the RNomics group at Fraunhofer IZI.

Kristin Reiche received her PhD in Computer Science in 2007, focussing on ncRNA detection and annotation. She is now a staff scientist with the RNomics group at Fraunhofer IZI.

Qiang LI is a PhD student focussing on plant ncRNA detection at the T-Life Research Center, working with Prof. Bailin HAO at Fudan University, Shanghai.

Peter F. Stadler received his PhD in Chemistry from U. Vienna in 1990 and then worked as Assistant and Associate Professor for Theoretical Chemistry at the same school. In 2002 he moved to Leipzig as Full Professor for Bioinformatics. Since 1994 he has been External Professor at the Santa Fe Institute. He helped to found the RNomics group at FHG IZI in 2005.


More information

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions? 1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section

More information

Graph Alignment and Biological Networks

Graph Alignment and Biological Networks Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale

More information

Biased amino acid composition in warm-blooded animals

Biased amino acid composition in warm-blooded animals Biased amino acid composition in warm-blooded animals Guang-Zhong Wang and Martin J. Lercher Bioinformatics group, Heinrich-Heine-University, Düsseldorf, Germany Among eubacteria and archeabacteria, amino

More information

GATA family of transcription factors of vertebrates: phylogenetics and chromosomal synteny

GATA family of transcription factors of vertebrates: phylogenetics and chromosomal synteny Phylogenetics and chromosomal synteny of the GATAs 1273 GATA family of transcription factors of vertebrates: phylogenetics and chromosomal synteny CHUNJIANG HE, HANHUA CHENG* and RONGJIA ZHOU* Department

More information

RNA Abstract Shape Analysis

RNA Abstract Shape Analysis ourse: iegerich RN bstract nalysis omplete shape iegerich enter of Biotechnology Bielefeld niversity robert@techfak.ni-bielefeld.de ourse on omputational RN Biology, Tübingen, March 2006 iegerich ourse:

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments

More information

MicroRNA innovations

MicroRNA innovations MicroRN innovations and the origin of novel cell types. Tanzer,,3, P.F. Stadler,,4,.P.Wagner 3 Bioinformatics roup, Department of ompute Science, niversity of Leipzig, ermany Institute for Theoretical

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

GEP Annotation Report

GEP Annotation Report GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre PhD defense Chromosomal rearrangements in mammalian genomes : characterising the breakpoints Claire Lemaitre Laboratoire de Biométrie et Biologie Évolutive Université Claude Bernard Lyon 1 6 novembre 2008

More information

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Molecular Biology-2018 1 Definitions: RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Heterologues: Genes or proteins that possess different sequences and activities. Homologues: Genes or proteins that

More information

Towards a Comprehensive Annotation of Structured RNAs in Drosophila

Towards a Comprehensive Annotation of Structured RNAs in Drosophila Towards a Comprehensive Annotation of Structured RNAs in Drosophila Rebecca Kirsch 31st TBI Winterseminar, Bled 20/02/2016 Studying Non-Coding RNAs in Drosophila Why Drosophila? especially for novel molecules

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information

Comparative Genomics II

Comparative Genomics II Comparative Genomics II Advances in Bioinformatics and Genomics GEN 240B Jason Stajich May 19 Comparative Genomics II Slide 1/31 Outline Introduction Gene Families Pairwise Methods Phylogenetic Methods

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13 Visit to BPRC Adres: Lange Kleiweg 161, 2288 GJ Rijswijk Utrecht CS à Den Haag CS 9:44 Spoor 9a, arrival 10:22 Den Haag CS à Delft 10:28 Spoor 1, arrival 10:44 10:48 Delft Voorzijde à Bushalte TNO/Lange

More information

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and

More information

Supporting Online Material for

Supporting Online Material for www.sciencemag.org/cgi/content/full/312/5780/1653/dc1 Supporting Online Material for The Xist RNA Gene Evolved in Eutherians by Pseudogenization of a Protein-Coding Gene Laurent Duret,* Corinne Chureau,

More information

A Method for Aligning RNA Secondary Structures

A Method for Aligning RNA Secondary Structures Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES HOW CAN BIOINFORMATICS BE USED AS A TOOL TO DETERMINE EVOLUTIONARY RELATIONSHPS AND TO BETTER UNDERSTAND PROTEIN HERITAGE?

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

The sex-determining locus in the tiger pufferfish, Takifugu rubripes. Kiyoshi Asahina and Yuzuru Suzuki

The sex-determining locus in the tiger pufferfish, Takifugu rubripes. Kiyoshi Asahina and Yuzuru Suzuki The sex-determining locus in the tiger pufferfish, Takifugu rubripes Kiyoshi Kikuchi*, Wataru Kai*, Ayumi Hosokawa*, Naoki Mizuno, Hiroaki Suetake, Kiyoshi Asahina and Yuzuru Suzuki Fisheries Laboratory,

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Comparative Network Analysis COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Biomolecular Network Components 2 Accumulation of Network Components

More information

Model Accuracy Measures

Model Accuracy Measures Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses

More information

Genomics. NcDNAlign: Plausible multiple alignments of non-protein-coding genomic sequences

Genomics. NcDNAlign: Plausible multiple alignments of non-protein-coding genomic sequences Genomics 92 (2008) 65 74 Contents lists available at ScienceDirect Genomics journal homepage: www.elsevier.com/locate/ygeno NcDNAlign: Plausible multiple alignments of non-protein-coding genomic sequences

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Evolutionary dynamics of conserved. non-coding DNA elements: Big bang. or gradual accretion? Sujai Kumar

Evolutionary dynamics of conserved. non-coding DNA elements: Big bang. or gradual accretion? Sujai Kumar Evolutionary dynamics of conserved non-coding DNA elements: Big bang or gradual accretion? Sujai Kumar Master of Science School of Informatics University of Edinburgh 2007 Abstract Background Previous

More information

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven) BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

More information

Cubic Spline Interpolation Reveals Different Evolutionary Trends of Various Species

Cubic Spline Interpolation Reveals Different Evolutionary Trends of Various Species Cubic Spline Interpolation Reveals Different Evolutionary Trends of Various Species Zhiqiang Li 1 and Peter Z. Revesz 1,a 1 Department of Computer Science, University of Nebraska-Lincoln, Lincoln, NE,

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Example of Function Prediction

Example of Function Prediction Find similar genes Example of Function Prediction Suggesting functions of newly identified genes It was known that mutations of NF1 are associated with inherited disease neurofibromatosis 1; but little

More information

Browsing Genomic Information with Ensembl Plants

Browsing Genomic Information with Ensembl Plants Browsing Genomic Information with Ensembl Plants Etienne de Villiers, PhD (Adapted from slides by Bert Overduin EMBL-EBI) Outline of workshop Brief introduction to Ensembl Plants History Content Tutorial

More information

Marine medaka ATP-binding cassette (ABC) superfamily and new insight into teleost Abch nomenclature

Marine medaka ATP-binding cassette (ABC) superfamily and new insight into teleost Abch nomenclature Supplementary file for: Marine medaka ATP-binding cassette (ABC) superfamily and new insight into teleost Abch nomenclature Chang-Bum Jeong a,b,#, Bo-Mi Kim a,#, Hye-Min Kang a, Ik-Young Choi c, Jae-Sung

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6) Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?

More information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information graph kernel approach to the identification and characterisation of structured noncoding RNs using multiple sequence alignment information Mariam lshaikh lbert Ludwigs niversity Freiburg, Department of

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

More information

Pyrobayes: an improved base caller for SNP discovery in pyrosequences

Pyrobayes: an improved base caller for SNP discovery in pyrosequences Pyrobayes: an improved base caller for SNP discovery in pyrosequences Aaron R Quinlan, Donald A Stewart, Michael P Strömberg & Gábor T Marth Supplementary figures and text: Supplementary Figure 1. The

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007 -2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open

More information

BMC Evolutionary Biology

BMC Evolutionary Biology BMC Evolutionary Biology This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon. Divergence in cis-regulatory

More information

Master Biomedizin ) UCSC & UniProt 2) Homology 3) MSA 4) Phylogeny. Pablo Mier

Master Biomedizin ) UCSC & UniProt 2) Homology 3) MSA 4) Phylogeny. Pablo Mier Master Biomedizin 2018 1) UCSC & UniProt 2) Homology 3) MSA 4) 1 12 a. All of the sequences in file1.fasta (https://cbdm.uni-mainz.de/mb18/) are homologs. How many groups of orthologs would you say there

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Small RNA in rice genome

Small RNA in rice genome Vol. 45 No. 5 SCIENCE IN CHINA (Series C) October 2002 Small RNA in rice genome WANG Kai ( 1, ZHU Xiaopeng ( 2, ZHONG Lan ( 1,3 & CHEN Runsheng ( 1,2 1. Beijing Genomics Institute/Center of Genomics and

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

Computational Structural Bioinformatics

Computational Structural Bioinformatics Computational Structural Bioinformatics ECS129 Instructor: Patrice Koehl http://koehllab.genomecenter.ucdavis.edu/teaching/ecs129 koehl@cs.ucdavis.edu Learning curve Math / CS Biology/ Chemistry Pre-requisite

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Jeremy Chang Identifying protein protein interactions with statistical coupling analysis

Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Abstract: We used an algorithm known as statistical coupling analysis (SCA) 1 to create a set of features for building

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Genome 559 Wi RNA Function, Search, Discovery

Genome 559 Wi RNA Function, Search, Discovery Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,

More information

Adaptive Evolution of Conserved Noncoding Elements in Mammals

Adaptive Evolution of Conserved Noncoding Elements in Mammals Adaptive Evolution of Conserved Noncoding Elements in Mammals Su Yeon Kim 1*, Jonathan K. Pritchard 2* 1 Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America,

More information

Information Theoretic Distance Measures in Phylogenomics

Information Theoretic Distance Measures in Phylogenomics Information Theoretic Distance Measures in Phylogenomics Pavol Hanus, Janis Dingel, Juergen Zech, Joachim Hagenauer and Jakob C. Mueller Institute for Communications Engineering Technical University, 829

More information

Procedure to Create NCBI KOGS

Procedure to Create NCBI KOGS Procedure to Create NCBI KOGS full details in: Tatusov et al (2003) BMC Bioinformatics 4:41. 1. Detect and mask typical repetitive domains Reason: masking prevents spurious lumping of non-orthologs based

More information

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and counterclockwise for the inner row, with green representing coding

More information

18.4 Embryonic development involves cell division, cell differentiation, and morphogenesis

18.4 Embryonic development involves cell division, cell differentiation, and morphogenesis 18.4 Embryonic development involves cell division, cell differentiation, and morphogenesis An organism arises from a fertilized egg cell as the result of three interrelated processes: cell division, cell

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

1 ATGGGTCTC 2 ATGAGTCTC

1 ATGGGTCTC 2 ATGAGTCTC We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that

More information

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on: 17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.

More information

Evolution by duplication

Evolution by duplication 6.095/6.895 - Computational Biology: Genomes, Networks, Evolution Lecture 18 Nov 10, 2005 Evolution by duplication Somewhere, something went wrong Challenges in Computational Biology 4 Genome Assembly

More information

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis 10 December 2012 - Corrections - Exercise 1 Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila

More information

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information
