Processed Small RNAs in Archaea and BHB Elements


University of Leipzig
Bioinformatics
Master's Program in Bioinformatics

Master's Thesis

Processed Small RNAs in Archaea and BHB Elements

submitted by Sarah Berkemer
March 16, 2015

Supervisor: Prof. Peter F. Stadler
Advisor: Dr. Christian Höner zu Siederdissen
Reviewers: Prof. Peter F. Stadler, Dr. Christian Höner zu Siederdissen


Acknowledgement

I would like to thank Peter F. Stadler for supervising this work, and also for his support, the motivating goals and for opening up opportunities to work at different places in the world. I also want to thank Christian Höner zu Siederdissen, who advised, helped and supported me a lot, especially for reviewing on very short notice and for his (almost) 24/7 Skype account, which was open for many questions. Thanks to Petra Pregel and Jens Steuck for bureaucratic and technical support. A lot of the work was done at the TBI in Vienna, where I could spend two months. Thanks a lot to Ivo L. Hofacker, who invited me to Vienna, and to the TBI table soccer team for good discussions and a great time. I want to thank Fabian Amman and Sebastian Will for support and help with topics I didn't know a lot about before and for patiently answering all my questions. I would like to thank my colleagues and friends who supported, motivated and also distracted me when I needed a break. I also want to thank Sven and my family for always being available when needed and for giving me mental and material support.


Abstract

Bulge-helix-bulge (BHB) elements guide the enzymatic splicing machinery that in Archaea excises introns from tRNAs, rRNAs from their primary precursor, and accounts for the assembly of piece-wise encoded tRNAs. This processing pathway renders the intronic sequences as circularized RNA species. Although archaeal transcriptomes harbor a large number of circular small RNAs, it remains unknown whether most or all of them are produced through BHB-dependent splicing. This thesis presents two genome-wide surveys of BHB elements in a phylogenetically diverse set of archaeal species, complemented by a search for BHB-like structures in the vicinity of circularized transcripts. The results show that, besides tRNA introns, the majority of box C/D snoRNAs is associated with BHB elements. Not all circularized sRNAs, however, can be explained by BHB elements, suggesting that there is at least one other mechanism of RNA circularization at work in Archaea. Pattern search methods were unable, however, to identify common sequence and/or secondary structure features that could be characteristic of such a mechanism.

The thesis is based on the work by Höner zu Siederdissen et al. [1] and its extension by Berkemer et al. [2]. The first study gives an overview of the topic, especially of BHB elements and of the application of programs like Infernal [3] and segemehl [4, 5] to find such secondary structure motifs. The second study refines the models so that the search can be done more specifically. Additionally, more data sets were included in the workflow to search for and evaluate further results, in particular by including both the previously considered intron-splicing BHB elements and an additional class of box C/D snoRNA-splicing elements.


Contents

1. Introduction
2. Biological Background
   Archaea
   RNA Processing
   Splicing Mechanism
   RNAs in Archaea
   Splicing in Archaea
   Circularization
   Circularized introns
   Examples of circular RNA in Archaea
   Bulge-Helix-Bulge (BHB) elements
   Structure of BHB elements
   Location of BHB elements
3. Computational Methods
   Mapping
   Covariance Models
   Pattern Search
4. Workflow
   Known and Putative BHB Elements
   Circular transcripts
   Evolutionary Conservation
   Element-based search
   Pattern Discovery
5. Results and Discussion
   Results
   Known box C/D snoRNAs with BHB elements
   Circularized RNAs with BHB elements
   Circular RNAs without BHB elements
   Discussion
6. Conclusion and Outlook
A. Appendix
List of Figures
List of Tables

1. Introduction

Since the discovery of Archaea in the 1970s and their definition as a domain by Woese et al. in 1990 [6], much work has gone into sequencing and analyzing archaeal genomes. Before that, Archaea were classified together with Bacteria in the kingdom of Prokaryotes. However, archaeal species share traits with Eukaryotes as well as with Bacteria. Archaea are therefore now understood as one of the three domains of life, clearly distinguished from Eukarya and Bacteria. Even though their genomic and molecular traits seem to place them between Eukarya and Bacteria, there are theories that cluster Archaea together with Eukarya into one domain, supporting the view that Eukarya evolved from archaeal ancestors [7].

The small non-coding RNAs (sRNAs) of Archaea are much less understood than their eubacterial counterparts. At least in part this is a consequence of the comparably much smaller number of fully sequenced genomes, since the large evolutionary distances between them hamper homology-based annotation. Archaea share ribosomal RNAs (rRNAs), transfer RNAs (tRNAs) and the RNA components of RNase P and the signal recognition particle (7S RNA) with the other two domains of life. Although a large number of other sRNAs has been described for individual species, only three RNA classes are unambiguously recognizable within this diversity: Archaea and Eubacteria share CRISPR-Cas adaptive immune systems [8], and two classes of guide RNAs direct chemical modifications of rRNAs and other non-coding RNAs in both Eukarya and Archaea [9]. Both the archaeal box C/D and the box H/ACA RNPs contain core proteins clearly homologous to those of the eukaryotic snoRNPs, reviewed in [10, 11]. Hence they are often referred to as snoRNAs even though Archaea do not have a nucleolus. This nomenclature is maintained in this work. Box C/D sRNPs have a dimeric architecture [12] throughout the Archaea.
Like their eukaryotic counterparts they catalyze 2'-O-ribose methylation at target sites determined by the box C/D snoRNA [9]. These RNAs have a length of about nucleotides and feature two functionally essential kink-turns associated with the C/D and C'/D' sequence motifs that are characteristic of the RNA class [13]. The box H/ACA snoRNAs guide the formation of pseudouridine [9, 14]. Their canonical secondary structure consists of a single stem-loop structure. In contrast, eukaryotic snoRNAs usually consist of two such stem-loops, each of which addresses an individual target. Beyond the canonical forms, several pseudouridylation guide RNAs with divergent secondary structures have recently been reported [15].

Circularized forms of small RNAs are abundant in some Archaea [16]. Some box C/D snoRNAs seem to be present predominantly or possibly exclusively as circular RNAs [17, 16, 18]. This is also true for assorted other small RNA species, among them the 5S rRNA [16]. The biosynthesis of most of these circular RNAs remains unknown. Only in a few select cases is it understood as a consequence of enzymatic splicing.

Archaea completely lack a spliceosomal splicing machinery. Enzymatic splicing, however, is a rather common feature in archaeal RNA processing. Mechanistically, it is closely related to tRNA splicing in Eukarya. A specific endonuclease recognizes and cleaves the so-called bulge-helix-bulge (BHB) structure; a specific ligase then joins the exons and circularizes the intron [19, 20, 21, 22]. The prime example of BHB splicing is the removal of tRNA introns. In contrast to Eukarya, archaeal tRNAs may have multiple introns [23, 24], and the same mechanism implements a form of trans-splicing that composes tRNAs from two or three independently encoded fragments [25, 26, 27, 28, 29, 30]. The maturation of the ribosomal RNA operon proceeds with the help of two BHB elements that form over long distances and guide formation of circularized precursors of the 16S and 23S rRNA, respectively [31, 32]. At least one pre-mRNA (CBF5, the archaeal homolog of dyskerin) contains an intron with a BHB element in many crenarchaeal species [33]. Although many of the box C/D snoRNAs in several archaeal species appear as circularized RNA molecules in the cell [34], this is known to be the consequence of BHB-guided splicing in only a single case, namely the box C/D snoRNA processed from a long intron of the tRNA-Trp precursor in Pyrococcus species [35]. The BHB elements of tRNA introns and rRNAs are well conserved across the Crenarchaeota phylum [36] and have been studied in detail [36, 37, 38, 39, 40, 41, 16, 34]. They fall into three classes originally defined in [36].

This thesis is based on two studies by Höner zu Siederdissen et al. [1] and Berkemer et al.
[2] which explore mainly two interrelated questions: (1) Can BHB-element-dependent splicing explain all or at least most of the observed circularized or permuted small RNAs, and (2) to what extent can BHB elements be used in their own right as a means of detecting novel small RNAs and/or likely sites of RNA processing? For the first question, the focus is in particular on box C/D snoRNAs because the members of this abundant RNA class are often circularized. The second question is of practical relevance since in many cases RNA-seq data do not provide decisive information: (a) Circularized RNAs are depleted in most RNA-seq protocols unless specifically enriched, e.g. by RNase R treatment [16]. (b) Spliced tRNAs cannot be detected in cases where an unspliced paralog is present in the same genome [18]. (c) Circularized introns, e.g. of tRNAs, are often too short to be detectable by sequencing in their own right. For instance, 5 out of 8 introns listed in [34] have a length of 26 nt or less.

This work consists of six chapters: Introduction (Chap. 1), Biological Background (Chap. 2), Computational Methods (Chap. 3), Workflow (Chap. 4), Results (Chap. 5) and Conclusion (Chap. 6). The following chapter, Chap. 2, describes the biological background, focusing on the kinds of RNA and the splicing mechanisms in Archaea compared to the other two domains, Eukarya and Bacteria. Furthermore, it explains the structure and location of BHB elements, which serve as a recognition motif for enzymatic splicing in Archaea. Chapter 3 then presents the computational methods used to analyze the given data.

There, the programs are explained and reasons are given for having chosen them. In the next chapter, Chap. 4, the steps of the analysis are enumerated and sketched, including analyzing the data, building the models and evaluating the results. This workflow covers the data that was used, the way the data was analyzed and a description of how the programs were applied to the data. Furthermore, the chapter describes the search for homologous sequences and for further sequence or secondary structure patterns. In Chapter 5, the results of the analysis are shown and explained. The results are separated into two parts, namely a subset of sequences showing evidence for a flanking BHB element, and sequences where no BHB element was found. The latter sequences were analyzed further and the results are shown. Additionally, a short discussion of the results is included in Chap. 5. The last chapter, Chap. 6, gives a conclusion and an outlook on future work.

The material and results of this thesis are summarized in two publications. The first paper, titled "Comparative detection of processed small RNAs in Archaea", was written in 2014 by Christian Höner zu Siederdissen, Sarah Berkemer, Fabian Amman, Axel Wintsche, Sebastian Will, Sonja J. Prohaska and Peter F. Stadler and is published in the proceedings of the IWBBio 2014 (International Work-Conference on Bioinformatics and Biomedical Engineering). The second publication, titled "Processed Small RNAs in Archaea and BHB Elements", was written in 2015 by Sarah J. Berkemer, Christian Höner zu Siederdissen, Fabian Amman, Axel Wintsche, Sebastian Will, Ivo L. Hofacker, Sonja J. Prohaska and Peter F. Stadler and is submitted to Genomics and Computational Biology.


2. Biological Background

This chapter gives an introduction to the domain Archaea in general, to the unique traits found in archaeal species, and to their similarities with the other two domains, Bacteria and Eukarya [6]. Furthermore, RNA processing, especially the splicing mechanisms, is explained. To date, the archaeal splicing mechanism is thought to always produce a circularized intron. There are different types of RNA molecules which undergo the splicing procedure. Even though circularized RNA molecules are rarely found in nature, they seem to be abundant in Archaea [42]. The second part of this chapter focuses on bulge-helix-bulge (BHB) elements, a secondary structure element closely related to the archaeal splicing mechanism. Pre-RNA molecules form this bulge-helix-bulge motif, which acts as a recognition element for the splicing enzymes [36]. So far, no typical sequence motifs related to BHB elements have been found, and BHB elements are known to be quite flexible in structure and length, which makes it more complicated to search for and locate them.

2.1. Archaea

Since the discovery of the archaeal species in the 1970s [6], numerous studies have contributed to a more detailed and deeper understanding of the archaeal domain and its role in the evolution of species. Originally, Archaea were thought to belong to the kingdom of Prokaryotes. Based on protein sequences and structural characteristics, Woese et al. proposed the three domains of life [6]. Additionally, the first archaeal species were found and classified into the evolutionary tree of life, as can be seen in Figure 2.1. Archaea share characteristics with Bacteria as well as with Eukarya, but there are also several traits that are unique to the archaeal species. Table 2.1 gives short definitions for the three domains of life as defined by Woese et al. [6].
However, their classification might change again based on current insights gained through sequencing technologies and computational methods, which show evidence for a classification of Archaea and Eukaryotes into one domain [7].

Figure 2.1: Tree of life (a) as developed by Woese et al. [6] when defining the three domains of life, Bacteria, Archaea and Eukarya, and the evolutionary relations between them. Panel (b) shows how organisms evolved. Arrows indicate the direction of evolution, whereas for Archaea and Gram-positive Bacteria a coevolution is assumed, indicated by the double-headed arrow. The dashed lines show putative symbiotic events of archaeal and bacterial cells which led to eukaryotic cells. Monoderms and Diderms indicate the number of membranes (one or two) that bound the prokaryotic cells. Here, CM stands for cell membrane, CW for cell wall, OM for outer membrane and PE for periplasm. The figure is taken from Gupta [43].

Domain Eucarya [Greek adjective εὖ for good or true and Greek noun κάρυον for nut or kernel, referring to the nucleus]: cells eukaryotic; cell membrane lipids predominantly glycerol fatty acyl diesters; ribosomes containing a eukaryotic type of rRNA.

Domain Bacteria [Greek noun βακτήριον for small rod or staff]: cells prokaryotic; membrane lipids predominantly diacyl glycerol diesters; ribosomes containing a (eu)bacterial type of rRNA.

Domain Archaea [Greek adjective ἀρχαῖος for ancient, primitive]: cells prokaryotic; membrane lipids predominantly isoprenoid glycerol diethers or diglycerol tetraethers; ribosomes containing an archaeal type of rRNA.

Kingdom Euryarchaeota (Archaea) [Greek adjective εὐρύς for broad, wide, spacious; Greek adjective ἀρχαῖος for ancient, primitive]: relatively broad spectrum of niches occupied by these organisms and their varied patterns of metabolism; ribosomes containing a euryarchaeal type of rRNA.

Kingdom Crenarchaeota (Archaea) [Greek noun κρήνη for spring, fount; ἀρχαῖος for ancient, primitive]: ostensible resemblance of this phenotype to the ancestor (source) of the domain Archaea; ribosomes containing a crenarchaeal type of rRNA.

Table 2.1: Definitions for the three domains of life and two archaeal kingdoms by Woese et al. in 1990 [6].

Common Traits of Archaea and Bacteria

Single-celled organisms without a nucleus are called prokaryotic. Before Bacteria and Archaea were distinguished based on their genetics, they were classified together into the kingdom of Prokaryotes on the basis of their phenotypes [6]. Another common trait of Archaea and Bacteria is the shape of their genome. Archaeal and bacterial genomes, apart from a few exceptions, consist of one circular chromosome inside the cell [6, 44]. The genome lengths and the number of protein-coding genes vary between species, but Bacteria and Archaea both seem to have their genes arranged into operons [44], which regulate gene transcription and expression. Like Bacteria, Archaea express so-called CRISPR RNAs, which act as an immune system in prokaryotic organisms and protect against foreign plasmids or phages [45]. CRISPR RNAs are composed of repetitive sequences with short spacer sequences in between. They are often associated with Cas proteins; together they are called the CRISPR/Cas system.

Common Traits of Archaea and Eukarya

Even though single-celled organisms can also be found within the domain of Eukarya, nearly all eukaryotic species have a nucleus, whereas archaeal species lack one. Still, the translation and transcription processes in Eukarya and Archaea are similar. The archaeal RNA polymerase and ribosomes differ from the eukaryotic ones but still resemble their eukaryotic relatives more than the ones in Bacteria [46]. Besides several genetic traits which are common or very similar between Archaea and Eukarya, it has been discovered that both archaeal and eukaryotic DNA is associated with histones. Archaeal species use just the core part of the histone complex, whereas Eukaryotes have two additional subunits bound to the outer part of the DNA-histone complex [44]. Another important common trait of Eukarya and Archaea is the existence of snoRNAs (small nucleolar RNAs) [48], a group of ncRNAs which are short in sequence length and guide RNA modifications. Two families of snoRNAs, as they are called in Eukaryotes, are box C/D and box H/ACA snoRNAs. They guide RNA methylations and other RNA modifications and are characterized by their sequence motifs.
Box C/D snoRNAs show conserved motifs named box C (5'-PuUGAUGA-3') and box D (5'-CUGA-3'), which fold into two kink-turn structural motifs. They also contain a less conserved copy of the motifs, called boxes C' and D'. Boxes C and D can be found near the start and end of the sequence, whereas boxes C' and D' form the central part of the sequence. Box H/ACA snoRNAs consist of conserved motifs called box H (ANANNA, where N stands for any nucleotide) and box ACA (ACA, always found 3 nucleotides away from the 3' end). Box C/D and H/ACA snoRNAs also contain antisense elements or guide regions which serve as the binding region to guide RNA modifications, as depicted in Figure 2.2. Target molecules can be 16S and 23S rRNA but also other types of RNAs. rRNAs in particular need RNA modifications guided by small RNAs to ensure their stability [49].
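The box motifs quoted above can be turned into a simple sequence scan. The following is a minimal sketch, not the search procedure used in the thesis (which relies on mapping and covariance models): it assumes that the consensus strings from the text are applied as plain regular expressions, and it reads "3 nucleotides away from the 3' end" as ACA being followed by exactly three nucleotides. The function names and the toy sequence are hypothetical.

```python
import re

# Motif definitions as quoted in the text (Pu = purine, N = any nucleotide).
# Treating them as plain regular expressions is an assumption of this sketch.
MOTIFS = {
    "box C": "[AG]UGAUGA",          # 5'-PuUGAUGA-3'
    "box D": "CUGA",                # 5'-CUGA-3'
    "box H": "A[ACGU]A[ACGU]{2}A",  # ANANNA
}


def to_rna(seq):
    """Upper-case a sequence and convert DNA 'T' to RNA 'U'."""
    return seq.upper().replace("T", "U")


def find_motif(name, seq):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # The lookahead makes the regex engine report overlapping occurrences.
    pattern = re.compile("(?=(" + MOTIFS[name] + "))")
    return [m.start() for m in pattern.finditer(to_rna(seq))]


def has_box_aca(seq, offset=3):
    """True if 'ACA' is followed by exactly `offset` nucleotides at the 3' end."""
    rna = to_rna(seq)
    end = len(rna) - offset
    return end >= 3 and rna[end - 3:end] == "ACA"


if __name__ == "__main__":
    toy = "GAUGAUGA" + "C" * 8 + "UGAGG"  # made-up sequence, not a real snoRNA
    print("box C starts:", find_motif("box C", toy))
    print("box D starts:", find_motif("box D", toy))
```

A hit list from such a scan is only a first filter: short motifs like CUGA occur frequently by chance, so candidates would still have to be checked for the kink-turn structure and the relative order of the boxes.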

Figure 2.2: Schematic secondary structure (a) of a box C/D snoRNA showing the locations of the box motifs, where box C (UGAUGA) and box D (CNGA) are highly conserved and box C' and box D' are less conserved. Between the box motifs lie the guide regions used to bind rRNA molecules for modification. A box H/ACA snoRNA (b) has conserved box H (ANANNA) and box ACA (ACA) motifs and guide regions to bind rRNA molecules (taken from [47]).

Unique traits of Archaea (and close relatives)

Comparing the sequences of the 16S rRNAs of diverse prokaryotic species, Woese et al. discovered that the 16S rRNA found in a subgroup of the species differed significantly from the remaining data set [50]. Based on this discovery, Woese could define the three domains of life, Archaea, Bacteria and Eukarya [6], which replaced the theory of the two kingdoms, Prokaryotes and Eukaryotes. The 16S rRNA is the RNA component of the small ribosomal subunit and is therefore required for protein synthesis in prokaryotic organisms. Consequently, a mutation in the 16S rRNA is mostly lethal. Still, Woese et al. found different kinds of 16S rRNA inside the prokaryotic kingdom, which led to the formation of two different domains of life.

Figure 2.3: Membrane lipids of Archaea and Bacteria [51]. The main difference between the lipids is the kind of bond linking the backbone with the fatty acid to form a membrane lipid. In Bacteria as well as Eukarya, this is done by ester bonds, whereas Archaea use ether bonds for reasons of stability, since many archaeal species can be found in extreme environments.

Another trait distinguishing Archaea from Bacteria and Eukaryotes is the structure of their membranes. Since many archaeal species can be found in extreme environments, the molecular structure of archaeal cell membranes differs from that of other species for reasons of stability. As depicted in Figure 2.3, archaeal membranes contain ether bonds linking the head of the lipids to the tail. In Bacteria and Eukarya, ester bonds fulfill this function [51]. Additionally, in some archaeal as well as bacterial species the membranes consist of just a single lipid layer instead of a double lipid layer. In lipid bilayers, the membrane consists of two layers of phospholipid molecules, each of which is subdivided into a hydrophilic head and a hydrophobic tail. The membrane is formed such that the tails of the two layers point to the inside of the membrane and the heads form the outer edges. In this way the inner part is protected from water, whereas the heads interact with water molecules. In many archaeal and several bacterial species such as Gram-positive Bacteria, the membrane consists of just one lipid layer. Here the phospholipids consist of one inner fatty acid chain which is hydrophobic. This inner part of the molecule is protected by two hydrophilic heads, one at each end of the fatty acid chain. In this way, the membrane can still interact with water molecules at its outer edges but is more stable under extreme conditions [51]. This common trait of several archaeal species and Gram-positive Bacteria led to the assumption of a close evolutionary relationship between them. There are a number of competing theories, and one observation is that most archaeal species are resistant to antibiotics that are produced mainly by Gram-positive Bacteria [43]. This would support the idea that archaeal ancestors evolved from Gram-positive bacterial species under selective pressure and through adaptation to extreme environments, a theory which is supported by the fact that many archaeal species live in environments that are very hot or cold, sulfurous, highly saline, acidic, or alkaline.
Extremophile Archaea belong to one of four main physiological groups: halophiles, thermophiles, alkaliphiles, and acidophiles [52]. Still, there are also some archaeal species living in mild or moderate conditions [53]. Their ability to adapt to extreme or unusual living conditions, such as hot or acidic environments but also neighbours that produce antibiotics, can also be seen in their metabolism. Some archaeal species are able to perform methanogenesis, that is, to produce methane [44].

Figure 2.4: Different hypotheses for the tree of life [7]. The three-domains theory (a) was first suggested by Woese et al. [6]. In the Eocyte hypothesis (b), Eukarya are classified within the Archaea, which again leads to just two domains of life [7], although distinct from the Prokaryotes-Eukaryotes classification. Additionally, the phyla of the TACK superphylum, which are classified as closely related, are depicted here. The TACK superphylum plays an important role in the distinct evolutionary hypotheses. Depending on the hypothesis, its members are close relatives of Eukaryotes or of Euryarchaeota.

Evolutionary Relationships

Since the discovery of Archaea by Woese et al. [6], Archaea have been assumed to play an important role in the evolutionary development of species. With the detection of more and more archaeal species and their genetic traits, their evolutionary relations to Bacteria and Eukarya have become more and more complex. In recent studies, a new superphylum called TACK was proposed by Guy et al. [54]. This superphylum includes four archaeal phyla, namely Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota (for definitions, see Table 2.2), which are closely related to each other but lie on a different branch of the phylogenetic tree than the Euryarchaeota. In fact, the Euryarchaeota are the focus of another discussion.
With the definition of new phyla within the archaeal domain and with new sequencing technologies, new archaeal species can be found and classified into the phylogenetic tree [55] as depicted in Fig In this work, small RNAs in five different archaeal species will be analyzed, namely Methanopyrus kandleri, Sulfolobus acidocaldarius, Sulfolobus solfataricus, Nanoarchaeum equitans and Ignicoccus hospitalis. As can be seen in the phylogenetic tree, Fig. 2.5, M. kandleri and N. equitans are classified to the Euryarchaeota, whereas M. kandleri belongs to the group of Methanopyrales and N. equitans to the Nanoarchaeota phylum. In vivo, N. equitans lives in symbiosis with I. hospitalis, such that collecting genomic data of one always also includes the second species. I. hospitalis as well as S. solfataricus and S. acidocaldarius belong to the Crenarchaeota, one of the phyla inside the TACK superphylum.

Thaumarchaeota: reside in both aquatic and terrestrial environments; recently proposed phylum of abundant chemolithoautotrophic ammonia oxidizers that play an important role in biogeochemical cycles such as the nitrogen cycle and the carbon cycle.

Aigarchaeota: species of the Hot Water Crenarchaeotic Group I (HWCGI); only one species identified so far (Ca. Caldiarchaeum subterraneum), which was found in a microbial mat at a geothermal stream of a subsurface gold mine; its genome contains some intriguing eukaryotic features.

Crenarchaeota: well-characterized archaeal phylum first described by Woese [6]; comprising mostly (acido)thermophilic anaerobes, although some aerobic and micro-aerophilic species exist; crenarchaeal cells come in various shapes, ranging from coccoid to rod-shaped/filamentous, and they mostly rely on respiration for energy generation.

Korarchaeota: a group of deep-branching Archaea with ultra-thin, needle-shaped small cells; geographically restricted to terrestrial and marine thermal environments.

Table 2.2: Summary of the definitions of the TACK superphylum by Guy et al. [54].

These findings also open new possibilities for redefining the relations between the three domains of life. One recently published theory is that of the two domains of life [7, 56]: here, Eukaryotes are classified within the domain of Archaea, such that Eukaryotes are closely related to the members of the TACK superphylum. The Euryarchaeota are then the next relative of the group formed by Eukaryotes and the TACK superphylum. This can be seen in Figure 2.4.
This theory is based on the assumption that current phylogenetic methods are more accurate than the ones Woese et al. used in 1990 and before [6]. At the same time, plausible evolutionary arguments have drawn attention to the two-domains theory [7, 56], even though neither theory has been shown to be significantly better supported than the other.

Figure 2.5: Phylogenetic tree of life for archaeal species in 2011 (taken from [55]).

2.2. RNA Processing

RNA processing seems to be an essential task in all known organisms. Even Bacteria process their RNAs, but in total they seem to have less RNA processing than Archaea and Eukarya [48]. Many different pathways and targets for RNA processing have been identified. RNA processing includes the synthesis and digestion of RNA molecules but also their modification. Such modifications stabilize the RNA molecule and protect it from digestion. This can be done by adding short sequences at its ends, as is done with mRNA molecules, but circularizing the molecule also leads to a more stable structure. Another important task in RNA modification is splicing. Here, one or more parts of the molecule (introns) are cut out and the remaining subsequences (exons) are ligated such that they again form a single RNA sequence. In this way the exons as well as the intron can go on to act as RNA molecules [48]. Known ways of splicing and the splicing mechanism in Archaea are explained in the following subsections.

1. Self-splicing (autocatalytic) group I and II introns: Splicing is carried out by the RNA molecules themselves through a transposition mechanism, without further help of proteins. The mechanisms are similar; the transposition is called homing for group I introns and retrohoming for group II introns, since in the latter an RNA intermediate is involved instead of just DNA molecules. Self-splicing introns act as ribozymes (RNAs with enzymatic activity). These mobile elements can be found in Eukaryotes and Bacteria as well as in viruses, whereas group II introns do not occur in eukaryotic nuclear genomes [57]. Self-splicing introns do not exist in Archaea except for a few cases of group II introns reported in Euryarchaeota [58].

2. Spliceosomal introns: The spliceosome is a highly flexible and dynamic ribonucleoprotein (RNP) complex composed of five snRNPs and several proteins. Eukaryotic pre-mRNA splicing is done by the spliceosome [59].

3. tRNA and archaeal introns (enzymatic splicing): tRNA splicing exists in all domains of life. Whereas in Archaea and Eukarya tRNAs are spliced by enzymes [60], bacterial tRNAs are self-splicing [58]. Archaeal and eukaryal tRNA endonucleases are closely related to each other [58]. At least a second enzyme, a specific ligase, is also involved in the tRNA splicing mechanism. In Archaea, not only tRNAs are spliced by this mechanism; ncRNAs and mRNAs are also known to undergo enzymatic splicing [49]. For Eukaryotes, there are a few cases where protein-coding genes are spliced enzymatically [61, 62].

Table 2.3: Summary of known splicing pathways. A classification of splicing mechanism into the domains of life is shown in Fig

Type          Eukarya  Bacteria  Archaea
Group I       y        y         n
Group II      y        y         y*
Spliceosomal  y        n         n
Enzymatic     y        n         y

Table 2.4: Splicing mechanisms in the domains of life. In Eukarya, all splicing mechanisms can be found (y). In Bacteria, just self-splicing is possible. Only enzymatic splicing is done in Archaea, except for a few species where self-splicing introns of group II were reported (*) [58].

Splicing Mechanism

Four different splicing mechanisms are known today. Whereas all mechanisms exist in Eukarya, Bacteria and Archaea are known to use just two of them. As shown in Tab. 2.3 and 2.4, only self-splicing introns (group I and II) can be found in Bacteria [57]. In Archaea, mainly enzymatic splicing is done, except in a few cases where self-splicing group II introns were reported [58]. In Eukarya, most mRNAs are spliced by the spliceosome, mainly noncoding RNAs are self-splicing, and tRNAs are subject to the enzymatic splicing mechanism. For all cases, a small number of exceptions are known [61, 62]. In contrast to Eukarya, where enzymatic splicing is mainly used for tRNAs, Archaea are known to also splice ncRNAs, rRNAs and mRNAs enzymatically.

RNAs in Archaea

Even though a plethora of different RNA molecules exists, only the ones which are important for this work are described here. As RNAs are an essential part of the molecular genetics of organisms, they also play an important role in phylogenetic studies. As described in Section 2.1, Archaea share common traits with Eukarya as well as Bacteria. Besides mRNA, tRNA and rRNA, which are part of the transcription and translation processes, snoRNAs and CRISPR RNAs were found in archaeal species. Box C/D and H/ACA snoRNAs are two families inside the class of snoRNAs and can also be found in Eukarya. So-called antisense elements inside the snoRNA sequence are responsible for the binding of box C/D or H/ACA snoRNAs to their target molecules, which are mainly other RNAs.
Box H/ACA snoRNAs are rarely found in Archaea [49]. Box C/D snoRNAs, however, are abundant in most archaeal species: in Methanopyrus kandleri, 126 different box C/D snoRNAs were found, and even in Nanoarchaeum equitans, which has a small but dense genome, 22 box C/D snoRNAs could be detected [63]. Many box C/D snoRNAs are located in intergenic regions, sometimes overlapping the 3' or 5' flanking ORFs [17]. Another important ncRNA in Archaea is the CRISPR RNA. This kind of RNA molecule is part of the prokaryotic immune system and can be found in prokaryotic organisms (see also Section 2.1).

Figure 2.6.: BHB element scheme. Splicing occurs at the splice junctions (indicated by arrows) on substrate A. The splice junctions are located within two bulges linked by a helix, which together form the BHB motif. After the splicing process, exon B and intron C are ligated such that the intron is circularized.

Splicing in Archaea

Because Bacteria and Archaea were at first grouped together in the kingdom of Prokaryotes, they were assumed to use similar splicing mechanisms such as self-splicing. By sequencing and analyzing the 16S rRNA of Archaea and Bacteria, Woese et al. showed that Archaea form a domain of life of their own [6]. Subsequently, traits that differ between Archaea and Bacteria were discovered, among them the splicing mechanisms [64]. As summarized in Table 2.3, four different splicing pathways are known. In Eukaryotes, each of them can be found. In Archaea, only one splicing mechanism, enzymatic splicing, is known today, apart from a few cases of group II introns [58]. This mechanism relies on two enzymes, a specific endonuclease and a ligase. In the known cases of splicing in Archaea, the exon as well as the intron is ligated after cleavage. This leads to circularized introns, which seem to be abundant in archaeal species [21]. The same machinery is assumed to be responsible for the processing of different kinds of RNAs, e.g. the maturation of mRNA, tRNA and rRNA, including 16S and 23S rRNA [19, 20, 32]. It could be shown that the endonuclease uses a recognition element that is defined by its secondary structure rather than by a typical sequence motif [36, 41, 1]. This so-called bulge-helix-bulge (BHB) motif is located in the region around the splice sites, as depicted in Figure 2.6. The following subsection deals with circularized introns and BHB motifs.
Another mechanism related to splicing, which exists in at least several archaeal species, is the so-called trans-splicing mechanism. Here, two or more RNA molecules are first joined to each other before the splicing mechanism is executed, as shown in Figure 2.7. In this way, different RNA halves can be joined in several combinations and so lead

to diverse RNA molecules. This seems to be common in organisms with a relatively small genome and has mostly been studied for tRNA molecules, where the phenomenon is also called split transfer RNA [25, 26, 27, 28, 29, 40]. Overall, most of what is known about the archaeal splicing mechanism was discovered by studying the splicing of tRNA molecules. A good summary and comparison of archaeal and eukaryotic tRNA splicing can be found in the work by Yoshihisa from 2014 [42].

Figure 2.7.: Cis-splicing is the process in which an intron is spliced out of an RNA molecule formed by a single continuous RNA sequence. For trans-splicing, two or more RNA (sub)sequences are joined to form a further distinct RNA structure. Then the intron is spliced out such that it contains subsequences of formerly distinct RNA sequences [25].

2.3. Circularization

Circularized RNAs can be found in all domains of life (Eukarya, Bacteria and Archaea) as well as in viruses. For the majority of circularized RNAs it is assumed that they are functional, even if their function has not been found yet [65]. However, circularized RNAs seem to occur very rarely in nature [16]. Many circularized RNAs are known to be products of splicing, but circular RNAs also appear as viral genomes or in connection with permuted tRNAs [40]. In Archaea, circularized RNA molecules were thought to be a product of every splicing process, triggered by the BHB element as a recognition element in the secondary structure of the pre-RNA [36]. Recent studies, however, found circularized RNA molecules without evidence for a BHB element [1]. Hence, BHB-dependent splicing does not seem to be the only splicing mechanism in Archaea.

Circularized introns

Even though circularization of RNA molecules also occurs in viral genomes and other contexts, this work focuses on circularized RNA molecules that are the product of a splicing procedure.
In Archaea, all splicing mechanisms are thought to yield a circularized intron, since circularization increases stability, which is essential in the extreme environments where most archaeal species live (see Section 2.1). Not much is known about the splicing process and the subsequent ligation in Archaea. Brooks et al. revealed the structures of a DNA ligase and a putative RNA ligase [66]. They could not detect the substrate of the ligase, but it seems to be able to also

ligate double-stranded structures, which occur in secondary structure conformations of RNA molecules. Circularized introns can be found in RNA-Seq data based on the splice sites and on reads that do not map linearly to the reference genome. Finding circularized introns is explained in more detail in Chapter 3.

Figure 2.8.: Circularization mechanism for ribosomal RNAs in two archaeal species (taken from [67]). Bold lines indicate the mature rRNAs and thin lines indicate precursor sequences. Red arrows indicate the splice sites inside the bulges of the BHB elements. After the BHB-dependent splicing process, the intron and the exon are further processed, as indicated by purple arrows.

Examples of circular RNA in Archaea

Based on numerous examples, it is assumed that most splicing processes in Archaea result in a circularized intron. In many cases, the function of either the exon or the intron is known. In only a single case, namely the box C/D snoRNA processed from a long intron of the tRNA-Trp precursor in Pyrococcus species [35], is the function of both the exon and the intron known. For other archaeal box C/D snoRNAs, it is known that most of them are circularized and located in an intronic region [68]. Typical examples of circularized introns in Archaea are tRNA introns, since pre-tRNAs undergo a splicing mechanism in which at least one and up to three introns are spliced out. The ends of the exon, i.e. of the tRNA molecule, and the ends of the intron are then each ligated, such that the intron is circularized and the exon can fold into the typical cloverleaf structure of a tRNA [36, 38, 39, 40, 41, 16, 69]. Figure 2.8 shows the splicing procedure of ribosomal RNAs in two archaeal species.

Here, the splicing and circularization of the pre-RNA is one step in the maturation process of the ribosomal RNA. Both introns are spliced in a BHB-dependent manner and result in circularized introns containing the RNA sequence. A further splicing step releases the mature rRNA molecules [67].

2.4. Bulge-Helix-Bulge (BHB) elements

The Bulge-Helix-Bulge (BHB) element is defined mainly by its secondary structure [36]. So far, no common sequence motifs could be found in sequences with a BHB element [1]. Recent studies showed that most tRNA sequences with a circularized intron are also spliced by the BHB-dependent mechanism [28, 42, 49]. For certain splicing events, however, no BHB elements have been found, which led to the proposal that a second splicing mechanism exists in Archaea [1].

Figure 2.9.: Different tRNAs with BHB elements in S. tokodaii (taken from [36]). The pre-tRNA-Lys (A) forms an hBHBh' motif at the canonical position. The splice sites are indicated by the black arrows inside the bulges. After splicing, a 25-nt intron and the mature tRNA can be found. In cases B and C, splicing also occurs at the canonical position, but the BHB element is present only as an hBH element. Black arrows indicate the splice sites in the bulges; the bulge next to the anticodon lies inside the hairpin loop and thus cannot be detected by looking at the secondary structure annotation of the molecule. The anticodon is indicated by a frame around its three nucleotides.

Structure of BHB elements

As depicted in Figure 2.6, the BHB motif consists of two bulges enclosing a short helix. The splice junctions between the exon and the intron lie inside the bulges. After the splicing process, both ends of the intron and both ends of the exon are ligated such that, in this case, a tRNA molecule and a circularized intron are formed [16]. Marck and Grosjean give a more detailed view of the BHB element [36]. As shown in Figure 2.9, in many cases two more helices are formed. One of them lies in the exon, the other lies entirely in the intron. For this reason, Marck and Grosjean proposed to rename the BHB element to hBHBh' element, where h and h' are the outer helices in intron and exon, B are the bulges and H is the main helix between the bulges [36]. In the same paper, however, Marck and Grosjean recognize that there are BHB elements that lack one of the outer helices, such that elements of the form hBHB or BHBh' arise [36]. Furthermore, the part of the secondary structure that belongs to a BHB element is short and very flexible in the lengths of its components. It is assumed that the bulges mostly have a length of 3 to 4 nucleotides, whereas the helix lengths may vary between 3 and 10 nucleotides [1]. Recently, BHB elements with just one bulge, i.e. either HBh' or hBH, were found, which increases their flexibility even more [28].

Figure 2.10.: Comparison of intron positions in archaeal and eukaryotic tRNAs (taken from [42]). The colors encode the positions in certain groups of organisms; the canonical position can be found in most of the archaeal and eukaryotic species. In some cases, several introns can be found in a single pre-tRNA, including the canonical position and one to three introns at non-canonical positions.

Location of BHB elements

The locations of BHB elements, and thus of the splice sites, in archaeal genomes are mainly known for tRNAs. Even though splicing and BHB elements are known to exist in other kinds of archaeal RNA [1], more effort has to be invested in characterizing these splice sites as well as the intron and exon structures. At the same time, it was proposed that there might be a recognition element or mechanism for splicing other than the BHB element [1]. It is probable that all tRNA introns are spliced by the BHB-dependent mechanism [28, 42, 49, 41]. The tRNA itself is encoded in the exon. The intron is spliced out, but nothing is known about its function yet. In just one case it was reported that a box C/D snoRNA is encoded in the tRNA intron [42]. tRNA splicing is also known in Eukaryotes. Here, most of the introns are spliced out at the canonical position, i.e. between nucleotides 37 and 38 of the tRNA sequence. Nevertheless, other, non-canonical splicing positions were found. In Archaea, splicing occurs at canonical and non-canonical positions, but many more splicing events at non-canonical positions were discovered. This is depicted in Figure. Splicing events at the canonical position favor the full BHB element, i.e. the hBHBh' form [42, 28]. At non-canonical positions all variations of BHB elements are found, including half BHB elements of the shapes hBH or HBh' [42, 28]. Further studies of tRNAs showed that there are tRNA molecules with more than one intron [39], split tRNAs and permuted tRNAs [40], and even combinations thereof [42]. Split tRNAs are tRNAs that take part in trans-splicing events. Here, two or more tRNA halves are joined together and then spliced. This process was first discovered in Nanoarchaeum equitans [25].
Permuted tRNAs, which exist with or without an intron, form a short loop bridging the 3' and 5' ends, so that the 3' half is positioned upstream of the 5' half [42]. All these types of tRNAs have been shown to form mature and functional tRNAs in archaeal organisms. Regarding the box C/D snoRNAs in Archaea, it is known that these RNAs are encoded inside the intron. In this case, the function of the exon is not known [42]. Box C/D snoRNAs are found in intergenic regions of archaeal genomes, sometimes at the 3' or 5' ends of neighboring ORFs [17]. These RNA molecules are known to be circularized in the organism [34], and several show evidence for a BHB secondary structure element being responsible for the splicing process [1].

3. Computational Methods

BHB elements serve as recognition elements for enzymatic splicing. Methods to find known and new splice sites flanked by BHB elements include mapping and analyzing RNA-Seq data using segemehl [4, 5] as well as building covariance models with Infernal [3]. This chapter describes the methods and programs that were used to search for putative BHB elements.

Figure 3.1.: The order of the start and end position of the circularized intron differs between the reads and the reference genome (taken from [16]). A read that maps to the reference genome in a non-linear way is an indication of a circularized RNA.

3.1. Mapping

Recent technologies such as RNA sequencing protocols make it possible to sequence the transcriptome of a cell in a high-throughput manner. As the sequenced transcriptome includes numerous different kinds of RNA molecules, analyzing RNA-Seq data reveals new insights into the transcriptome [18]. During the splicing process a pre-RNA molecule is cut into at least two distinct RNA

molecules. These can also be detected in the transcriptome. Thus, the data also include numerous circularized introns. Additional RNase R treatment of the sample digests linear RNA molecules and so enriches the relative amount of circularized RNAs [16]. The sample is then sequenced. The resulting data set is analyzed with mapping protocols that assemble the reads or map them directly onto a reference genome. Here, segemehl [4, 5] was used as the mapping program, since it can find split reads when run with the --splits option. The search for split reads is complicated by the fact that a read from a circular RNA has start and end points that probably differ from the start and end points of the corresponding region in the reference genome, namely the splice junctions, as depicted in Fig. Additionally, a circularized RNA is usually not covered by one single read but can be reconstructed from several reads. In this way, circularized RNAs can be found by comparing the reads to the reference genome. If the assembled reads do not map linearly to the reference genome but in a non-linear way, this might be a split read belonging to a circularized RNA molecule [16], Fig. The mapping is followed by a remapping step using the program lack [70]. The remapping is based on the splice sites. In this way, multiple splice sites can be found, and previously unmapped reads might be mapped afterwards, which increases the total number of mapped reads.

Figure 3.2.: Reads mapping onto a circularized intron can be detected if the mapping can be done in a non-linear way (taken from [16]). If several reads can be mapped in this way, a splice junction is assumed to be flanked by the reads.
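The non-linear mapping criterion described above can be illustrated with a minimal sketch. It assumes that a split read has already been reduced to two mapped segments, given in read order, with genomic coordinates on the forward strand. The `SplitSegment` type and the function name are hypothetical and are not part of segemehl.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitSegment:
    """One mapped part of a split read (hypothetical type, 1-based inclusive)."""
    start: int
    end: int


def classify_split(first: SplitSegment, second: SplitSegment) -> str:
    """Classify a two-part split read on the forward strand.

    `first` is the 5' part of the read, `second` its 3' part.
    """
    if first.end < second.start:
        # Read order equals genome order: an ordinary (linear) splice.
        return "linear"
    if second.end < first.start:
        # The 3' part of the read maps upstream of its 5' part, so the
        # read runs across the junction of a circular molecule.
        return "circular"
    # Overlapping segments give no clear evidence either way.
    return "ambiguous"
```

A read whose first part maps to positions 200-230 and whose second part maps to positions 100-120 would be reported as `"circular"` under these assumptions. Reverse-strand reads would need the comparison mirrored.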
A third program of the segemehl suite, transrealign [4, 5], is used to extract the split reads together with the length of the intron, information about circularization or multiple splice sites, and putative errors. To detect splice junctions, pairs of reads that probably flank a splice junction are connected internally by a link. The links are labeled with the corresponding read coverage. Due to multiple splicing, repeats or errors, a single read can be connected to several other reads. This is shown in Fig.

chr    Pos A    Pos B    splits:x:y:z:c:p    0    +

Table 3.1.: Reads A and B flank a splice junction on chromosome chr, where Pos A is the rightmost position in A and Pos B the leftmost position in B, i.e. the start and end position of the splice junction. The variables x, y and z stand for read coverages: transrealign outputs the number x of reads covering the splice junction in total, i.e. both splice sites, the number y of reads covering just the left splice site, and the number z of reads covering just the right splice site. After the read coverages, two letters are reported. The first indicates whether the splice junction belongs to a circularized intron (C) or is a normal one (N). The second is the letter P for passed. One read may take part in several splice junctions with distinct partners. In that case, either a multiple (M) splice junction is reported instead of a circular or normal one or, if there are too many different read pairs and no clear splice junction can be found, the letter F for failed is reported instead of the letter P. Fig. 3.3 shows several possible splice junctions between distinct pairs of reads. The number 0 usually indicates a score, which is not used here. The sign + indicates the strand on which the splice junction is located.

Assume that reads A and B flank a splice junction. However, the program also predicts reads A' and B, A and B' as well as A and B'' as potential read pairs flanking further splice junctions. The most probable splice junction can be found by taking into account the numbers x, y and z, which indicate the read coverage of the subsequences: depending on their relation and total values, the most probable splice junction can be detected. The output pattern of transrealign for each splice junction and a more detailed explanation are given in Tab. 3.1.

Figure 3.3.: Concept of segemehl and transrealign when looking at the splice junctions. A splice junction between A and B is reported (bold lines). Thus, the numbers x, y and z, indicating the coverage of both splice sites, of just the left one and of just the right one, are included in the output. The numbers x, y and z may differ strongly, depending on how many distinct read pairs flanking splice junctions were found (depicted by dashed lines).
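The output pattern of Table 3.1 can be read with a few lines of Python. This is a minimal sketch that follows the field layout as described in the table (whitespace-separated columns and a colon-separated `splits` field); it has not been checked against actual transrealign output, and the `Junction` type and both function names are hypothetical. Choosing the candidate with the largest x is a simplifying assumption, since the text only states that the relation and the total values of x, y and z are taken into account.

```python
from typing import Iterable, NamedTuple, Optional


class Junction(NamedTuple):
    chrom: str
    pos_a: int    # rightmost position of read A
    pos_b: int    # leftmost position of read B
    both: int     # x: reads covering both splice sites
    left: int     # y: reads covering only the left site
    right: int    # z: reads covering only the right site
    kind: str     # C (circular), N (normal) or M (multiple)
    status: str   # P (passed) or F (failed)
    strand: str


def parse_junction(line: str) -> Junction:
    """Parse one line of the form 'chr PosA PosB splits:x:y:z:c:p 0 +'."""
    chrom, pos_a, pos_b, info, _score, strand = line.split()
    tag, x, y, z, kind, status = info.split(":")
    if tag != "splits":
        raise ValueError(f"unexpected annotation field: {info!r}")
    return Junction(chrom, int(pos_a), int(pos_b),
                    int(x), int(y), int(z), kind, status, strand)


def best_supported(junctions: Iterable[Junction]) -> Optional[Junction]:
    """Return the passed junction with the highest coverage x (assumption)."""
    passed = [j for j in junctions if j.status == "P"]
    return max(passed, key=lambda j: j.both, default=None)
```

For competing candidates such as AB, AB' and AB'' in Fig. 3.3, `best_supported` keeps the one with the most junction-spanning reads and ignores candidates flagged F.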

Figure 3.4.: Consensus secondary structure used to build a covariance model. The guide tree is built from the consensus secondary structure and forms the underlying data structure of the covariance model (taken from [71]). The nodes of the tree stand for different states: MATP stands for a base pair, MATL and MATR for a single-stranded position on the left or right side, BIF for a bifurcation, and ROOT for the root node of the tree. BEGL and BEGR mark the beginning of the secondary structure on the left or the right side after a bifurcation node, and END indicates a terminal node of the grammar, thus the end of the subsequence parsed in this branch of the guide tree.

3.2. Covariance Models

BHB elements are quite flexible in the length and distribution of their secondary structure elements. Overall, however, their length is limited to a relatively small number of nucleotides and base pairs. Additionally, no sequence motifs were found, meaning that only the secondary structure is conserved. However, to find homologous sequences, searching for secondary structure and sequence similarities at the same time seems to be most efficient. Given the flexibility of BHB elements, the program Infernal [71] was chosen for their identification. Infernal is based on covariance models (CM), a special case of stochastic context-free grammars (SCFG). Context-free grammars (CFG) are able to recognize palindrome-like structures such as RNA secondary structures. In this way, a CFG can evaluate whether a secondary structure is valid for an RNA sequence. In a stochastic CFG, transition probabilities model how likely each state transition is, and emission probabilities model which symbols a state emits. This is similar to Hidden Markov Models (HMM).
Both probabilities are position-dependent parameters and depend on the state. To check

for validity of a secondary structure, a parse tree is built. The nodes of the parse tree model states such as pair, single strand left or right, deletion, bifurcation, start and end. A single nucleotide sequence admits an exponential number of parse trees; the parsing problem can nevertheless be solved in polynomial time via dynamic programming [72, 73]. A covariance model is built from a multiple sequence alignment with consensus secondary structure annotation. Using several additional rules, a unique parse tree for the consensus secondary structure is built. This parse tree is called a guide tree. The nodes of a guide tree belong to eight distinct types, similar to those of the general parse tree. An RNA secondary structure and the corresponding guide tree can be seen in Fig. Based on the guide tree, a covariance model is built by extending each emitting state (node) of the guide tree, thus excluding bifurcation and end states (see Fig. 3.5). Since in a multiple sequence alignment not every sequence fits the consensus perfectly, insertions and deletions are included in the model. In each node of the guide tree, several substates are modelled, such as a match state, an insertion state and a deletion state. Transitions with transition probabilities as well as emissions with emission probabilities are added, too [72, 73].

Figure 3.5.: Part of the inner structure of a CM (taken from [74]). MATP6, MATP7 and MATR8 are three nodes of the guide tree, indicating two matched pairs and one single-stranded position on the right side. Inside each node are several states, e.g. a match state, a deletion state and states for a single strand on each side; "split set" denotes that exactly one of these states has to be taken. If an insertion is possible, an insertion state is linked in between, and the insertion can happen on the left or the right side. Links between states are labeled with transition probabilities. In this way, the whole guide tree is extended to become a covariance model.
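The point that dynamic programming handles an exponential number of candidate structures in polynomial time can be illustrated with a much simpler recursion than the one used in a covariance model. The sketch below is a Nussinov-style base pair maximization. It is a stand-in chosen for brevity and is not the CYK or Inside algorithm of Infernal; the function name and the `min_loop` parameter are assumptions made for this example.

```python
def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in `seq` (Nussinov-style DP).

    `min_loop` is the minimum number of unpaired bases enclosed by a pair.
    Runs in O(n^3) time, although the number of possible nested
    structures grows exponentially with the sequence length.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the optimum for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in pairs:
                    # case 2: i pairs with k, which splits the problem
                    # into the enclosed part and the part right of k.
                    inner = dp[i + 1][k - 1]
                    outer = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + inner + outer)
            dp[i][j] = best
    return dp[0][n - 1]
```

The split into an enclosed and a trailing subproblem in case 2 plays the same role as a bifurcation in a parse tree: each subproblem is solved once and reused.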

To build a covariance model from a multiple sequence alignment, the program cmbuild is used, which is part of the Infernal suite. A CM is calibrated using cmcalibrate, which applies the CM to sampled sequences and estimates the statistics of finding a certain type of secondary structure by chance. This is required to report E-values when using Infernal on a test set [71]. The original multiple sequence alignment can be extended using cmalign. The program aligns a given sequence, based on the consensus secondary structure, to the multiple sequence alignment that was used to build the CM. To scan whole genomes or other sequence databases with a CM, cmsearch is used, whereas cmscan searches query sequences against a database of CMs [71].

3.3. Pattern Search

For BHB elements, the conservation of the secondary structure motif is known, but so far no sequence conservation patterns or other small motifs were found. In addition to circularized sequences with a BHB element, sequences were found that show no evidence for a BHB motif even though they occur next to a circularization junction [1]. To find a putative splicing recognition element, MEME [75] and MEME-SP [76] were used to search for distinct small secondary structure motifs or sequence patterns. The program MEME searches for conserved sequence motifs. The search is based on position-dependent probabilities for each letter, which are stored in a matrix. The patterns are gap-free; a pattern that contains gaps is therefore split by MEME into several separate motifs. As output, MEME shows the motifs that were found as well as the number of sequences, mismatches and locations [75]. Fig. 3.6 shows a typical box C/D snoRNA motif found with MEME.

Figure 3.6.: Typical sequence motif for a box C/D snoRNA with C box motif TGATGA and D box motif CNGA [75].
Box C' and box D' motifs can be very similar to C and D box motifs but are less conserved and thus show more variance in the sequence motifs.

To search for secondary structure motifs, MEME-SP [76] was used. MEME-SP uses the same expectation maximization framework as MEME, but learns from sequences that,

in addition, are annotated with secondary structure profiles. These specify, for each position of each input sequence, the probability that the base is contained in a stem, an interior loop, or a bulge. This secondary structure information is computed from all locally stable suboptimal hybridization structures that can be formed by the sequences surrounding the circularization sites. The suboptimal hybridization structures are computed by RNAduplex [77] with parameter -e 5, which limits the energy band of interest to 5 kcal/mol above the most stable structure.
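The matrix of position-dependent letter probabilities that underlies a gap-free sequence motif can be sketched as follows. The example only shows the representation and how a candidate site is scored against it; it does not implement the expectation maximization used by MEME or MEME-SP. The function names and the pseudocount are assumptions made for this illustration.

```python
from collections import Counter

ALPHABET = "ACGT"


def position_probability_matrix(sites, pseudocount=1.0):
    """Column-wise letter probabilities of equally long, gap-free sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        # The pseudocount keeps unseen letters from getting probability 0.
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix


def site_probability(matrix, site):
    """Probability of `site` under the motif, assuming independent columns."""
    prob = 1.0
    for column, base in zip(matrix, site):
        prob *= column[base]
    return prob
```

Built from a handful of C box occurrences such as TGATGA, the matrix assigns a higher probability to sites that resemble the box than to unrelated hexamers.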


4. Workflow

BHB elements were searched for by examining secondary structure motifs around circularized introns and by identifying new sRNAs through the detection of splice junctions flanked by BHB elements. This chapter describes the steps of the analysis. The workflow is sketched in Fig. 4.1, including data sets, methods and the evaluation step. The computational methods are described in more detail in Chap. 3. The results obtained during the evaluation can be found in the following chapter, Chap. 5. This chapter is an extended version of the methodological section in Berkemer et al. [2].

Figure 4.1.: Workflow of the genome-wide survey for BHB elements and circular or spliced RNA. From a cache of known circular RNA sequences as well as known box C/D snoRNAs, two curated multiple sequence alignments (MSA1 and MSA2) were created that serve as the basis for stochastic Infernal covariance models. These are evaluated against RNA-seq data and putative BHB elements found in several archaeal genomes. Known circularized RNAs without BHB elements were further subjected to sequence and structure motif discovery using MEME and MEME-SP.
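The covariance model part of this workflow maps onto three Infernal programs described in Chap. 3. The following sketch only assembles the command lines and assumes the command-line syntax of Infernal 1.1. All file names are hypothetical placeholders, and the thesis does not state the exact options that were used.

```python
import shlex


def infernal_commands(msa, cm, genome, hits):
    """Return the Infernal calls for model building, calibration and search."""
    return [
        ["cmbuild", cm, msa],                         # alignment -> covariance model
        ["cmcalibrate", cm],                          # fit E-value parameters
        ["cmsearch", "--tblout", hits, cm, genome],   # scan a genome with the CM
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed instead of executed.
    for cmd in infernal_commands("msa1.sto", "bhb.cm", "genome.fa", "bhb_hits.tbl"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once Infernal is installed. Keeping the commands as argument lists avoids shell quoting problems with unusual file names.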

4.1. Known and Putative BHB Elements

BHB elements have been described extensively for tRNAs [36] and rRNAs. The alignments were extended with BHB elements from ref. [36] and with tRNA-associated BHB elements from ref. [34]. Most archaeal tRNAs, including the spliced ones but with the notable exception of split and permuted tRNAs, are readily identified by tRNAscan-SE [78]. Therefore tRNAscan-SE -A was run on all genomes to complete the tRNA data. A multiple sequence alignment, MSA1, was built manually from sequences with known BHB elements. It emphasizes the well-conserved secondary structure pattern, Figure 4.2 (top). To investigate the box C/D snoRNAs, a second alignment, MSA2, was constructed from 15 Nanoarchaeum equitans snoRNA sequences with unambiguous box C and box D motifs annotated in ref. [68]; this alignment was constructed in a semi-automatic fashion. Since these short RNAs (length nt) exist as circularized molecules, putative BHB elements similar to those of tRNA introns should be detectable overlapping both circularizing junctions if they are indeed processed in the same manner as tRNAs. Therefore 15 nt of flanking sequence were included, Figure 4.2 (bottom). In addition, the known box C/D snoRNA sequences of Sulfolobus solfataricus [16, 79], Sulfolobus acidocaldarius [80, 16], Nanoarchaeum equitans [68], and Methanopyrus kandleri [34] were retrieved together with 15 nt of flanking sequence. For the construction of MSA2, LocARNA [81] was used. This tool simultaneously infers an alignment and a consensus secondary structure from a set of unaligned RNA sequences, taking into account sequence similarity, structure similarity, and thermodynamics [82].
To include further knowledge, in this case the C and D boxes as well as the kink-turn motifs of the snoRNAs, a constrained LocARNA alignment was generated, anchored at annotated columns of an initial manual alignment.

4.2. Circular transcripts

Known circular sRNAs were compiled from Sulfolobus solfataricus [16, 79], Sulfolobus acidocaldarius [80, 16], Nanoarchaeum equitans [68], and Methanopyrus kandleri [34]. Publicly available RNA-seq data of S. acidocaldarius [83, 84], Nanoarchaeum equitans and Ignicoccus hospitalis [68], Sulfolobus solfataricus [79] and M. kandleri [34] were mapped and analyzed as outlined in [18]. In brief, the read data sets for each species were pooled, and the reads were quality-trimmed and then mapped to the corresponding genome using segemehl [4, 5] with the option --splits, which forces reads that do not acceptably match as a single substring to be split and matched across splice sites. A seed of at least 11 nt was required to map to each side of the split. To find putative splice sites, a remapping was done using lack [70], another program provided by the segemehl suite. The statistics of the mappings are summarized in Table 4.1. The transrealign tool, a component of the segemehl suite [5], was used to identify the circularizing splice junctions. A circularization site was retained if transrealign reported a backsplicing junction and at least 5 reads covered the splice junction.
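The retention rule for circularization sites amounts to a simple filter. The sketch below assumes that each junction has already been reduced to a tuple of chromosome, two positions, the number of junction-spanning reads and a junction type, with "C" marking a backsplicing junction. The tuple layout and the function name are hypothetical, and the threshold is read as "at least 5 reads".

```python
def retain_circular_sites(junctions, min_reads=5):
    """Keep backsplicing junctions supported by at least `min_reads` reads.

    Each junction is a (chrom, pos_a, pos_b, reads, kind) tuple, where
    `kind` is "C" for a circularizing (backsplicing) junction.
    """
    return [j for j in junctions if j[4] == "C" and j[3] >= min_reads]
```

Applied to the pooled junction list of one species, the filter discards linear splice junctions as well as backsplicing junctions with fewer than five supporting reads.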

Figure 4.2.: Consensus multiple alignment of BHB structural elements. MSA1 (top) is built from tRNAs with introns from [34] (block A) and [36] (block B). MSA2 (bottom) comprises box C/D snoRNAs from N. equitans [68]. The two cleavage sites in the bulges (labeled B1 and B2 and marked by BBB at the bottom) are indicated by arrows. Base pairs are denoted by [,]. The three helical regions are highlighted and labelled H, h, and g, resp.


More information

Archaeal Cell Structure. Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display.

Archaeal Cell Structure. Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display. 4 Archaeal Cell Structure Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display. 1 4.1 A typical Archaeal Cell 2 Archaea Highly diverse with respect to morphology,

More information

A. Incorrect! In the binomial naming convention the Kingdom is not part of the name.

A. Incorrect! In the binomial naming convention the Kingdom is not part of the name. Microbiology Problem Drill 08: Classification of Microorganisms No. 1 of 10 1. In the binomial system of naming which term is always written in lowercase? (A) Kingdom (B) Domain (C) Genus (D) Specific

More information

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed

More information

In Genomes, Two Types of Genes

In Genomes, Two Types of Genes In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns

More information

SNORNAS HOMOLOGY SEARCH

SNORNAS HOMOLOGY SEARCH S HOMOLOGY SEARCH Stephanie Kehr Bioinformatics University of Leipzig Herbstseminar, 2009 MOTIVATION one of most abundand group of ncrna in eucaryotic cells suprisingly diverse regulating functions SNORNAS

More information

Introduction to Molecular and Cell Biology

Introduction to Molecular and Cell Biology Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the molecular basis of disease? What

More information

Archaea Ancient Oddities

Archaea Ancient Oddities Archaea Ancient Oddities Death in Yellowstone If the waters of Yellowstone are so deadly, can anything survive in them? Animal Bones in hot spring Dead Trees in Mammoth Hot Springs Yes, but how? A History

More information

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology 2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the

More information

Bio Microbiology - Spring 2010 Study Guide 18

Bio Microbiology - Spring 2010 Study Guide 18 Bio 230 - Microbiology - Spring 2010 Study Guide 18 Archaea Kingdom Crenarchaeota: mainly hyperthermophiles Kingdom Euryarchaeota: methanogens, halophiles, Thermoplasma & Archaeoglobus Kingdom Korarchaeota:

More information

Dr Mike Dyall-Smith. Archaea: Main points. Archaea: Discovery. Archaea: Discovery. Discovery of the Archaea. Lecture: Archaeal diversity

Dr Mike Dyall-Smith. Archaea: Main points. Archaea: Discovery. Archaea: Discovery. Discovery of the Archaea. Lecture: Archaeal diversity Lecture: Archaeal diversity Dr Mike Dyall-Smith Haloarchaea Research Lab., Lab 3.07 mlds@unimelb.edu.au Reference: Microbiology (Prescott et al., 6th). Chapter 20. Archaea: Main points Discovery of a third

More information

15.2 Prokaryotic Transcription *

15.2 Prokaryotic Transcription * OpenStax-CNX module: m52697 1 15.2 Prokaryotic Transcription * Shannon McDermott Based on Prokaryotic Transcription by OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons

More information

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17. Genetic Variation: The genetic substrate for natural selection What about organisms that do not have sexual reproduction? Horizontal Gene Transfer Dr. Carol E. Lee, University of Wisconsin In prokaryotes:

More information

Student Learning Outcomes: Nucleus distinguishes Eukaryotes from Prokaryotes

Student Learning Outcomes: Nucleus distinguishes Eukaryotes from Prokaryotes 9 The Nucleus Student Learning Outcomes: Nucleus distinguishes Eukaryotes from Prokaryotes Explain general structures of Nuclear Envelope, Nuclear Lamina, Nuclear Pore Complex Explain movement of proteins

More information

BIOLOGY STANDARDS BASED RUBRIC

BIOLOGY STANDARDS BASED RUBRIC BIOLOGY STANDARDS BASED RUBRIC STUDENTS WILL UNDERSTAND THAT THE FUNDAMENTAL PROCESSES OF ALL LIVING THINGS DEPEND ON A VARIETY OF SPECIALIZED CELL STRUCTURES AND CHEMICAL PROCESSES. First Semester Benchmarks:

More information

I. Archaeal cell structure. (Chap 2 pg , Supplemental notes 3, 5)

I. Archaeal cell structure. (Chap 2 pg , Supplemental notes 3, 5) Thurs, Jan 23, 2003 I. Archaeal cell structure. (Chap 2 pg. 450-453, Supplemental notes 3, 5) The Archaea are a diverse group of prokaryotic organisms that are very different from bacteria and from eucaryotes.

More information

Name: Class: Date: ID: A

Name: Class: Date: ID: A Class: _ Date: _ Ch 17 Practice test 1. A segment of DNA that stores genetic information is called a(n) a. amino acid. b. gene. c. protein. d. intron. 2. In which of the following processes does change

More information

Translation Part 2 of Protein Synthesis

Translation Part 2 of Protein Synthesis Translation Part 2 of Protein Synthesis IN: How is transcription like making a jello mold? (be specific) What process does this diagram represent? A. Mutation B. Replication C.Transcription D.Translation

More information

SPECIES OF ARCHAEA ARE MORE CLOSELY RELATED TO EUKARYOTES THAN ARE SPECIES OF PROKARYOTES.

SPECIES OF ARCHAEA ARE MORE CLOSELY RELATED TO EUKARYOTES THAN ARE SPECIES OF PROKARYOTES. THE TERMS RUN AND TUMBLE ARE GENERALLY ASSOCIATED WITH A) cell wall fluidity. B) cell membrane structures. C) taxic movements of the cell. D) clustering properties of certain rod-shaped bacteria. A MAJOR

More information

Outline. Viruses, Bacteria, and Archaea. Viruses Structure Classification Reproduction Prokaryotes Structure Reproduction Nutrition Bacteria Archaea

Outline. Viruses, Bacteria, and Archaea. Viruses Structure Classification Reproduction Prokaryotes Structure Reproduction Nutrition Bacteria Archaea Viruses, Bacteria, and Archaea Chapter 21 Viruses Structure Classification Reproduction Prokaryotes Structure Reproduction Nutrition Bacteria Archaea Outline The Viruses The Viruses Viruses are noncellular

More information

Biology I Fall Semester Exam Review 2014

Biology I Fall Semester Exam Review 2014 Biology I Fall Semester Exam Review 2014 Biomolecules and Enzymes (Chapter 2) 8 questions Macromolecules, Biomolecules, Organic Compunds Elements *From the Periodic Table of Elements Subunits Monomers,

More information

Types of RNA. 1. Messenger RNA(mRNA): 1. Represents only 5% of the total RNA in the cell.

Types of RNA. 1. Messenger RNA(mRNA): 1. Represents only 5% of the total RNA in the cell. RNAs L.Os. Know the different types of RNA & their relative concentration Know the structure of each RNA Understand their functions Know their locations in the cell Understand the differences between prokaryotic

More information

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2 Cellular Neuroanatomy I The Prototypical Neuron: Soma Reading: BCP Chapter 2 Functional Unit of the Nervous System The functional unit of the nervous system is the neuron. Neurons are cells specialized

More information

Number of questions TEK (Learning Target) Biomolecules & Enzymes

Number of questions TEK (Learning Target) Biomolecules & Enzymes Unit Biomolecules & Enzymes Number of questions TEK (Learning Target) on Exam 8 questions 9A I can compare and contrast the structure and function of biomolecules. 9C I know the role of enzymes and how

More information

Eukaryotic vs. Prokaryotic genes

Eukaryotic vs. Prokaryotic genes BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,

More information

The nature of genomes. Viral genomes. Prokaryotic genome. Nonliving particle. DNA or RNA. Compact genomes with little spacer DNA

The nature of genomes. Viral genomes. Prokaryotic genome. Nonliving particle. DNA or RNA. Compact genomes with little spacer DNA The nature of genomes Genomics: study of structure and function of genomes Genome size variable, by orders of magnitude number of genes roughly proportional to genome size Plasmids symbiotic DNA molecules,

More information

Classifying Prokaryotes: Eubacteria Plasma Membrane. Ribosomes. Plasmid (DNA) Capsule. Cytoplasm. Outer Membrane DNA. Flagellum.

Classifying Prokaryotes: Eubacteria Plasma Membrane. Ribosomes. Plasmid (DNA) Capsule. Cytoplasm. Outer Membrane DNA. Flagellum. Bacteria The yellow band surrounding this hot spring is sulfur, a waste product of extremophilic prokaryotes, probably of the Domain Archaea, Kingdom Archaebacteria. Bacteria are prokaryotic cells (no

More information

Chapter 17. From Gene to Protein. Biology Kevin Dees

Chapter 17. From Gene to Protein. Biology Kevin Dees Chapter 17 From Gene to Protein DNA The information molecule Sequences of bases is a code DNA organized in to chromosomes Chromosomes are organized into genes What do the genes actually say??? Reflecting

More information

UE Praktikum Bioinformatik

UE Praktikum Bioinformatik UE Praktikum Bioinformatik WS 08/09 University of Vienna 7SK snrna 7SK was discovered as an abundant small nuclear RNA in the mid 70s but a possible function has only recently been suggested. Two independent

More information

Microbial Diversity. Yuzhen Ye I609 Bioinformatics Seminar I (Spring 2010) School of Informatics and Computing Indiana University

Microbial Diversity. Yuzhen Ye I609 Bioinformatics Seminar I (Spring 2010) School of Informatics and Computing Indiana University Microbial Diversity Yuzhen Ye (yye@indiana.edu) I609 Bioinformatics Seminar I (Spring 2010) School of Informatics and Computing Indiana University Contents Microbial diversity Morphological, structural,

More information

Oceans: the cradle of life? Chapter 5. Cells: a sense of scale. Head of a needle

Oceans: the cradle of life? Chapter 5. Cells: a sense of scale. Head of a needle Oceans: the cradle of life? Highest diversity of life, particularly archae, bacteria, and animals Will start discussion of life in the ocean with prokaryote microorganisms Prokaryotes are also believed

More information

From Gene to Protein

From Gene to Protein From Gene to Protein Gene Expression Process by which DNA directs the synthesis of a protein 2 stages transcription translation All organisms One gene one protein 1. Transcription of DNA Gene Composed

More information

DNA Technology, Bacteria, Virus and Meiosis Test REVIEW

DNA Technology, Bacteria, Virus and Meiosis Test REVIEW Be prepared to turn in a completed test review before your test. In addition to the questions below you should be able to make and analyze a plasmid map. Prokaryotic Gene Regulation 1. What is meant by

More information

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON PROKARYOTE GENES: E. COLI LAC OPERON CHAPTER 13 CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON Figure 1. Electron micrograph of growing E. coli. Some show the constriction at the location where daughter

More information

Bacillus anthracis. Last Lecture: 1. Introduction 2. History 3. Koch s Postulates. 1. Prokaryote vs. Eukaryote 2. Classifying prokaryotes

Bacillus anthracis. Last Lecture: 1. Introduction 2. History 3. Koch s Postulates. 1. Prokaryote vs. Eukaryote 2. Classifying prokaryotes Last Lecture: Bacillus anthracis 1. Introduction 2. History 3. Koch s Postulates Today s Lecture: 1. Prokaryote vs. Eukaryote 2. Classifying prokaryotes 3. Phylogenetics I. Basic Cell structure: (Fig.

More information

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E REVIEW SESSION Wednesday, September 15 5:30 PM SHANTZ 242 E Gene Regulation Gene Regulation Gene expression can be turned on, turned off, turned up or turned down! For example, as test time approaches,

More information

Introduction. Gene expression is the combined process of :

Introduction. Gene expression is the combined process of : 1 To know and explain: Regulation of Bacterial Gene Expression Constitutive ( house keeping) vs. Controllable genes OPERON structure and its role in gene regulation Regulation of Eukaryotic Gene Expression

More information

CHAPTER 3. Cell Structure and Genetic Control. Chapter 3 Outline

CHAPTER 3. Cell Structure and Genetic Control. Chapter 3 Outline CHAPTER 3 Cell Structure and Genetic Control Chapter 3 Outline Plasma Membrane Cytoplasm and Its Organelles Cell Nucleus and Gene Expression Protein Synthesis and Secretion DNA Synthesis and Cell Division

More information

Bio 119 Bacterial Genomics 6/26/10

Bio 119 Bacterial Genomics 6/26/10 BACTERIAL GENOMICS Reading in BOM-12: Sec. 11.1 Genetic Map of the E. coli Chromosome p. 279 Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents p. 344 Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis

More information

Honors Biology Fall Final Exam Study Guide

Honors Biology Fall Final Exam Study Guide Honors Biology Fall Final Exam Study Guide Helpful Information: Exam has 100 multiple choice questions. Be ready with pencils and a four-function calculator on the day of the test. Review ALL vocabulary,

More information

UNIT 5. Protein Synthesis 11/22/16

UNIT 5. Protein Synthesis 11/22/16 UNIT 5 Protein Synthesis IV. Transcription (8.4) A. RNA carries DNA s instruction 1. Francis Crick defined the central dogma of molecular biology a. Replication copies DNA b. Transcription converts DNA

More information

Quiz answers. Allele. BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 17: The Quiz (and back to Eukaryotic DNA)

Quiz answers. Allele. BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 17: The Quiz (and back to Eukaryotic DNA) BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 17: The Quiz (and back to Eukaryotic DNA) http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Quiz answers Kinase: An enzyme

More information

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

Name: SBI 4U. Gene Expression Quiz. Overall Expectation: Gene Expression Quiz Overall Expectation: - Demonstrate an understanding of concepts related to molecular genetics, and how genetic modification is applied in industry and agriculture Specific Expectation(s):

More information

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11 The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding

More information

Microbiology - Problem Drill 04: Prokayotic & Eukaryotic Cells - Structures and Functions

Microbiology - Problem Drill 04: Prokayotic & Eukaryotic Cells - Structures and Functions Microbiology - Problem Drill 04: Prokayotic & Eukaryotic Cells - Structures and Functions No. 1 of 10 1. Eukaryote is a word that describes one of two living cell classifications. The word comes from Greek

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

Regulation of Gene Expression

Regulation of Gene Expression Chapter 18 Regulation of Gene Expression Edited by Shawn Lester PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley

More information

Section 7. Junaid Malek, M.D.

Section 7. Junaid Malek, M.D. Section 7 Junaid Malek, M.D. RNA Processing and Nomenclature For the purposes of this class, please do not refer to anything as mrna that has not been completely processed (spliced, capped, tailed) RNAs

More information

Define: Alleles. Define: Chromosome. In DNA and RNA, molecules called bases pair up in certain ways.

Define: Alleles. Define: Chromosome. In DNA and RNA, molecules called bases pair up in certain ways. Alleles Chromosome In DNA and RNA, molecules called bases pair up in certain ways. How do the bases A, C, G, T, and U match up in DNA? How about RNA? Summarize the cell process called protein synthesis!

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

Gene Control Mechanisms at Transcription and Translation Levels

Gene Control Mechanisms at Transcription and Translation Levels Gene Control Mechanisms at Transcription and Translation Levels Dr. M. Vijayalakshmi School of Chemical and Biotechnology SASTRA University Joint Initiative of IITs and IISc Funded by MHRD Page 1 of 9

More information

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin

More information

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 1 GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 2 DNA Promoter Gene A Gene B Termination Signal Transcription

More information

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution Taxonomy Content Why Taxonomy? How to determine & classify a species Domains versus Kingdoms Phylogeny and evolution Why Taxonomy? Classification Arrangement in groups or taxa (taxon = group) Nomenclature

More information

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Name Period Words to Know: nucleotides, DNA, complementary base pairing, replication, genes, proteins, mrna, rrna, trna, transcription, translation, codon,

More information

This is a repository copy of Microbiology: Mind the gaps in cellular evolution.

This is a repository copy of Microbiology: Mind the gaps in cellular evolution. This is a repository copy of Microbiology: Mind the gaps in cellular evolution. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/114978/ Version: Accepted Version Article:

More information

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11 UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11 REVIEW: Signals that Start and Stop Transcription and Translation BUT, HOW DO CELLS CONTROL WHICH GENES ARE EXPRESSED AND WHEN? First of

More information

Creating a Dichotomous Key

Creating a Dichotomous Key Dichotomous Keys A tool used that allows users to determine the identity of unknown species Keys consist of a series of choices, where the user selects from a series of connected pairs Each pair of choices

More information

Honors Biology Reading Guide Chapter 11

Honors Biology Reading Guide Chapter 11 Honors Biology Reading Guide Chapter 11 v Promoter a specific nucleotide sequence in DNA located near the start of a gene that is the binding site for RNA polymerase and the place where transcription begins

More information

2011 The Simple Homeschool Simple Days Unit Studies Cells

2011 The Simple Homeschool Simple Days Unit Studies Cells 1 We have a full line of high school biology units and courses at CurrClick and as online courses! Subscribe to our interactive unit study classroom and make science fun and exciting! 2 A cell is a small

More information

9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

Chapter 19. Microbial Taxonomy

Chapter 19. Microbial Taxonomy Chapter 19 Microbial Taxonomy 12-17-2008 Taxonomy science of biological classification consists of three separate but interrelated parts classification arrangement of organisms into groups (taxa; s.,taxon)

More information

The RNA World and the Origins of Life. B. Balen

The RNA World and the Origins of Life. B. Balen The RNA World and the Origins of Life B. Balen Cell origin and evolution Cell is structural, functional and reproduction unit of life Similarities: DNA genetic material surrounded by membranes the same

More information

Overview of Cells. Prokaryotes vs Eukaryotes The Cell Organelles The Endosymbiotic Theory

Overview of Cells. Prokaryotes vs Eukaryotes The Cell Organelles The Endosymbiotic Theory Overview of Cells Prokaryotes vs Eukaryotes The Cell Organelles The Endosymbiotic Theory Prokaryotic Cells Archaea Bacteria Come in many different shapes and sizes.5 µm 2 µm, up to 60 µm long Have large

More information

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression 13.4 Gene Regulation and Expression THINK ABOUT IT Think of a library filled with how-to books. Would you ever need to use all of those books at the same time? Of course not. Now picture a tiny bacterium

More information

Controlling Gene Expression

Controlling Gene Expression Controlling Gene Expression Control Mechanisms Gene regulation involves turning on or off specific genes as required by the cell Determine when to make more proteins and when to stop making more Housekeeping

More information

CHAPTER : Prokaryotic Genetics

CHAPTER : Prokaryotic Genetics CHAPTER 13.3 13.5: Prokaryotic Genetics 1. Most bacteria are not pathogenic. Identify several important roles they play in the ecosystem and human culture. 2. How do variations arise in bacteria considering

More information

Chapter 12. Genes: Expression and Regulation

Chapter 12. Genes: Expression and Regulation Chapter 12 Genes: Expression and Regulation 1 DNA Transcription or RNA Synthesis produces three types of RNA trna carries amino acids during protein synthesis rrna component of ribosomes mrna directs protein

More information

Flow of Genetic Information

Flow of Genetic Information presents Flow of Genetic Information A Montagud E Navarro P Fernández de Córdoba JF Urchueguía Elements Nucleic acid DNA RNA building block structure & organization genome building block types Amino acid

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Regulation of Transcription in Eukaryotes

Regulation of Transcription in Eukaryotes Regulation of Transcription in Eukaryotes Leucine zipper and helix-loop-helix proteins contain DNA-binding domains formed by dimerization of two polypeptide chains. Different members of each family can

More information

Laith AL-Mustafa. Protein synthesis. Nabil Bashir 10\28\ First

Laith AL-Mustafa. Protein synthesis. Nabil Bashir 10\28\ First Laith AL-Mustafa Protein synthesis Nabil Bashir 10\28\2015 http://1drv.ms/1gigdnv 01 First 0 Protein synthesis In previous lectures we started talking about DNA Replication (DNA synthesis) and we covered

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Eukaryotic Gene Expression

Eukaryotic Gene Expression Eukaryotic Gene Expression Lectures 22-23 Several Features Distinguish Eukaryotic Processes From Mechanisms in Bacteria 123 Eukaryotic Gene Expression Several Features Distinguish Eukaryotic Processes

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

It s a Small World After All

It s a Small World After All It s a Small World After All Engage: Cities, factories, even your own home is a network of dependent and independent parts that make the whole function properly. Think of another network that has subunits

More information

Comparing Prokaryotic and Eukaryotic Cells

Comparing Prokaryotic and Eukaryotic Cells A prokaryotic cell Basic unit of living organisms is the cell; the smallest unit capable of life. Features found in all cells: Ribosomes Cell Membrane Genetic Material Cytoplasm ATP Energy External Stimuli

More information

Procesamiento Post-transcripcional en eucariotas. Biología Molecular 2009

Procesamiento Post-transcripcional en eucariotas. Biología Molecular 2009 Procesamiento Post-transcripcional en eucariotas Biología Molecular 2009 Figure 6-21 Molecular Biology of the Cell ( Garland Science 2008) Figure 6-22a Molecular Biology of the Cell ( Garland Science 2008)

More information

I. Molecules and Cells: Cells are the structural and functional units of life; cellular processes are based on physical and chemical changes.

I. Molecules and Cells: Cells are the structural and functional units of life; cellular processes are based on physical and chemical changes. I. Molecules and Cells: Cells are the structural and functional units of life; cellular processes are based on physical and chemical changes. A. Chemistry of Life B. Cells 1. Water How do the unique chemical

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Review Article Ribonucleoproteins in Archaeal Pre-rRNA Processing and Modification

Review Article Ribonucleoproteins in Archaeal Pre-rRNA Processing and Modification Archaea Volume 2013, Article ID 614735, 14 pages http://dx.doi.org/10.1155/2013/614735 Review Article Ribonucleoproteins in Archaeal Pre-rRNA Processing and Modification W. S. Vincent Yip, 1 Nicholas G.

More information
