DOOR: a prokaryotic operon database for genome analyses and functional inference

Size: px
Start display at page:

Download "DOOR: a prokaryotic operon database for genome analyses and functional inference"

Transcription

1 Briefings in Bioinformatics, 2017, 1 8 doi: /bib/bbx088 Software Review DOOR: a prokaryotic operon database for genome analyses and functional inference Huansheng Cao*, Qin Ma*, Xin Chen and Ying Xu Corresponding author: Ying Xu, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA. Tel xyn@uga.edu *These authors contributed equally to this work. Abstract The rapid accumulation of fully sequenced prokaryotic genomes provides unprecedented information for biological studies of bacterial and archaeal organisms in a systematic manner. Operons are the basic functional units for conducting such studies. Here, we review an operon database DOOR (the Database of prokaryotic OpeRons) that we have previously developed and continue to update. Currently, the database contains computationally predicted operons in 2072 complete genomes. In addition, the database also contains the following information: (i) transcriptional units for 24 genomes derived using publicly available transcriptomic data; (ii) orthologous gene mapping across genomes; (iii) 6408 cisregulatory motifs for transcriptional factors of some operons for 203 genomes; (iv) Rho-independent terminators for 2072 genomes; as well as (v) a suite of tools in support of applications of the predicted operons. In this review, we will explain how such data are computationally derived and demonstrate how they can be used to derive a wide range of higherlevel information needed for systems biology studies to tackle complex and fundamental biology questions. Key words: DOOR; operon database; transcription unit; genomic organization; gene regulation Introduction The rapid advancement of genome-scale sequencing techniques in the past decade has made it possible to have a microbial genome sequenced in a highly efficient and affordable manner. It has now become routine to have such genomes sequenced by individual labs. To date, 6917 complete genomes and draft genomes have been sequenced and organized into the NCBI Genome database. In addition, 5415 complete prokaryotic genomes and draft genomes are stored at the JGI Integrated Microbial Genome and Microbiome Samples (JGI IMG/M) database ( [1]. Over 175 million genes have been computationally predicted in the sequenced prokaryotic genomes, with 171 million (97.7%) being protein-encoding genes and 4 million (2.3%) RNA genes. These genomic data have enabled studies of a wide range of complex and previously unsolvable questions in a systematic manner, such as inferring and modeling the dynamic behaviors of regulatory networks in prokaryotic organisms. These studies become possible because of the unique property of prokaryotes, functionally closely related genes tend to be organized into operons, i.e. clusters of genes arranged in tandem on the same strand of a genome, which share a common promoter and terminator [2]. It is noteworthy to point out that the concept of operon has evolved since the discovery and characterization of the lac operon in Escherichia coli by French scientists Jacob and Monod in 1960 [2]. Based on the original definition, an extra technical criterion has been added to operon: operons encoded in a genome do not overlap with each other [3, 4]. This simplified definition Huansheng Cao is a Postdoctoral Researcher in the Department of Biochemistry and Molecular Biology and Institute of Bioinformatics at the University of Georgia. His research interests include microbial systems biology. Qin Ma is an Assistant Professor in the Department of Agronomy, Horticulture and Plant Science at South Dakota State University. He is interested in bioinformatic algorithms/tools development in transcriptional regulation. Xin Chen is an Assistant Professor in the Center for Applied Mathematics at Tianjin University, China; his research interests include microbial and plant genome analysis. Ying Xu is the Regents-GRA Eminent Scholar Chair Professor in the Department of Biochemistry and Molecular Biology and Institute of Bioinformatics of the University of Georgia. His interests include microbial and cancer systems biology. Submitted: 2 May 2017; Received (in revised form): 13 June 2017 VC The Author Published by Oxford University Press. All rights reserved. For Permissions, please journals.permissions@oup.com 1

2 2 Cao et al. has enabled computational scientists to predict operons based on genomic sequence(s) alone [5]. Therefore, all the complete prokaryotic genomes have their operons predicted. However, the downside is: these computationally predicted operons only represent a small subset of all the operons encoded in a genome. With the transcriptomic data becoming increasingly available for prokaryotes, researchers have realized that an operon may have different variations in terms of its component genes expressed under different conditions, termed transcriptional units (TUs) [6]. Clearly, full elucidation of all the TUs of an operon is considerably more challenging than sequence-based prediction of operons, as they are condition-dependent. Currently, there are no published algorithms for TU prediction based on genomic sequences alone, as such capability needs to be able to accurately identify promoter terminator pairs that are used under varying conditions. TUs are now being identified through analyses of transcriptomic data collected under a few conditions only for a small number of genomes [7]. The current version of DOOR (the Database of prokaryotic OpeRons) consists of both types of operons: operons predicted based on sequences and limited TUs identified using transcriptomic data [8]. Because of serving as the basic functional and TUs as well as the backbones of TUs, operons are important elements for the study of gene regulation and functional adjustment to different environments. For example, some important pathways tend to be organized into a few transcriptionally co-regulated operons [9, 10]. Through these tightly co-expressed operons, the regulatory motifs can be identified, which in turn will reveal global regulatory circuits. Besides regulation of operons on a genome scale, comparative studies of these functional units can reveal pathway conservation and operon evolution among prokaryotes [11, 12], and provide more accurate prediction of orthologous genes across different prokaryotic genomes [11], compared with gene-centric methods for ortholog mapping. Furthermore, the positions of the operons in a genome and their functional association provide a unique and practical angle to demonstrate the global organizing principle in these organisms [13]. In this article, we review the status of the DOOR database in terms of its basic content, as well as a suite of tools that enable the user to fully use the predicted operons and identified TUs for functional studies of a prokaryotic organism. Some major applications through some of the third-party work that has used DOOR are summarized. On this basis of DOOR and DOORsupporting tools, we will introduce operon regulation, evolution and organization on local and global scale in prokaryotes. The entire contents covered are illustrated in Figure 1. The DOOR database Operon databases and DOOR Owing to this fundamental importance, several operon databases have been developed and deployed for public service, including OperonDB [14], ProOpDB [15], ODB [16], rrndb [17] and DOOR [8, 18]. The first DOOR database was developed in Its operon prediction algorithm looks for a maximal sequence of adjacent genes on the same DNA strand without disruption on the opposite strand such that each pair of the adjacent genes has conserved neighboring relationship in some other genomes; their intergenic distances are within some threshold; and are functionally related [8, 18]. This simple working definition of operon has made the prediction both reliable and stable. Specifically, the core of the algorithm is a classification method. The program classifies each pair of adjacent genes into two classes: in or not in the same operon, using five features: the intergenic distance, conservation level of the two genes in the same neighborhood across other genomes, functional relatedness measured using phylogenetic distances between the two genes, the ratio between the lengths of the two genes and frequencies of certain predefined DNA motifs in their intergenic region [5]. Among these features, the intergenic distance is the highest discerning feature in predicting if a pair of adjacent genes is in the same operon. The training of the classifier was done on experimentally validated operons in a few organisms, including E. coli and Bacillus subtilis. The operon prediction pipeline is given in Figure 2. Two separate classifiers are trained and used depending on whether the target genome has a substantial number of experimentally identified operons. For those with such identified operons, a nonlinear decision tree-based classifier is used to make the prediction. When tested on B. subtilis and E. coli genomes, the prediction program achieved accuracy levels at 90.2 and 93.7%, respectively [5]. For genomes without such operons, a linear logistic classifier is found to be most reliable, with the minimum false-positive rates [5]. When tested on E. coli and B. subtilis without applying their operon information, this predictor has prediction accuracies at 84.6 and 83.3%, respectively [7]. The prediction program of DOOR was ranked as the best operon predictor by an independent review in 2013 [19] in terms of prediction accuracy [18]. The first version of DOOR had operons for 675 complete prokaryotic genomes, totaling operons, predicted using the above algorithms. The DOOR database supports the following tools to enable a user to use its operons for functional studies: (i) a search tool allowing a user to find operons by gene names and operon size in desired genomes; (ii) a prediction capability for cis-regulatory binding sites using a combination of two predictors MEME [20] and CUBIC [21], which enables a user to search for potentially co-regulated operons in a genome; (iii) general statistics about operons encoded in each genome along with the relevant literature; and (iv) an operonwiki to facilitate interactions between a user and the developer [18]. DOOR2: more genomes and transcriptomic data With the rapid increase in the complete prokaryotic genomes and the growing popularity of DOOR, we developed an updated version of the database, DOOR2, in 2013 [8]. Two major changes are made in this version. First, the number of genomes is increased to 2072, three times of the original size, which consists of 1939 complete bacterial genomes and 133 complete archaeal genomes, covering 2205 chromosomes and 1645 plasmids [8]. For these genomes, multigene operons is predicted, averaging 583 such operons per chromosome and 24 operons per plasmid, along with single-gene operons. In addition, the database has 6408 verified cis-regulatory binding motifs for transcription factors spanning 203 genomes, and Rho-independent terminators for 2072 genomes. Furthermore, the database has pairs of conserved operons across different genomes. Functionally, housekeeping biological processes such as cell division, translation, electron transport chain for aerobic/anaerobic respiration, ATP production and thiamine biosynthesis have the highest percentage of conserved operons, as exemplified in Figure 3. Given the increasing availability of RNA-seq data for complete genomes, DOOR2 also includes such data from the public domain, which can be used to elucidate TUs encoded in a

3 DOOR 3 Figure 1. The DOOR operon database and information derivable from the database. (A) A schematic for consecutive operons in two genomes (blue and magenta), which are further grouped into uber-operons. (B) Predicted cis-regulatory motifs for operons. (C) TUs for DOOR operons, derived based on available transcriptomic data. (D) Co-expressed genes and operons predicted through bi-clustering analyses. (E) New knowledge regarding an organizing principle that governs the global arrangement of operons in a genome. Figure 2. The computational pipeline for operon prediction in DOOR. genome. Currently, DOOR2 contains predicted TUs for 24 genomes, which we expect to increase significantly in the next release of the database. Like the previous version, DOOR2 provides the following tools in support of basic and advanced applications of its operon information: (i) a predictor for TUs based on given operons and RNA-seq data of a specified genome, in support of inference of the dynamic variants of the operons under specified conditions; (ii) a computer program for mapping orthologous genes across genomes, in support of analyses of operon evolution and biological pathway mapping across genomes; (iii) a search capability, developed using MySQL, to enable a user to retrieve information stored in the database, including operons, TUs, cis-regulatory motifs and terminators for individual operons and conserved operons across multiple genomes using queries containing terms such as species name, operon ID and/or gene name; (iv) operon predictors as described in the previous section, which allows a user to predict operons in a genome that is not currently in the DOOR2 database; and (v) a genome browser is also developed to support visualization of selected operons along with TU structures under multiple conditions, if such data are available [8]. For operon prediction in a specified genome, DOOR2 requires three input files: a file containing gene locations in the target genome (*.gff or *.gtf file), protein sequences (*.faa file) and the nucleotide sequence (*.fna file) of the entire genome, where the three files are used to derive the genomic features needed to train our classification model (Figure 2). For output, DOOR2 automatically sends an to the user once the computing job is done, which contains a link to the results page, containing various information as shown in Figure 4. DOOR2 has been widely applied by researchers in a variety of areas. Overall, three types of technical applications have been frequently used by users of the database and tools: operon data retrieval, operon prediction for newly sequenced genomes and utility tools for new information discovery (Figure 5). These applications also span a wide range of research areas. For example, a specific application area is in biomedical research. In this regard, DOOR is usually used for prediction or retrieval for operons involved in virulence [22] or regulation [23]. Besides these individual third-party applications, we introduce the general applications of our DOOR-based tools in further deriving biology knowledge from available prokaryotic genomes in the following sections. From genomic structures to functional inference With the rapid accumulation of large quantities of gene expression data generated for various prokaryotes, researchers can study the functionalities of the predicted operons in a systematic manner. Here, we highlight a few examples to demonstrate how functional analyses can be done through integrated

4 4 Cao et al. Figure 3. The main biological functions of conserved operons in E. coli. The scatterplot shows the cluster representatives based on the GO terms functional similarities, where each bubble represents a GO Biological Function and overlapping bubbles are for overlapping GO functions (e.g. the electron transport chain for aerobic/anaerobic respiration). The size of a bubble represents the level of a term in the GO s functional hierarchy, specifically the higher a term in the hierarchy, the larger the bubble, and the color of a bubble indicates the P-value. analyses of operons and gene expression data, including TU prediction, cis-regulatory motif prediction and analysis and co-expression analyses of operons (Figure 1B E). TU prediction based on gene expression data We have developed a program, SeqTU, for predicting TUs based on provided RNA-seq data (Figure 1B and C) [6]. SeqTU predicts the genomic boundaries of TUs based on two features measuring the RNA-seq expression patterns across a target genome: expression-level continuity and variance. Intuitively, two consecutive genes in the same TU should not (i) have substantial gaps in the expression profile across the intergenic region between the two genes, based on the definition of a TU; and (ii) have large variations in their gene expression levels across the two coding regions. SeqTU predicts a pair of consecutive genes on the same genomic strand to be in the same TU if both conditions are met [6]. A total of 2590 TUs have been predicted based on available RNA-seq using SeqTU [6, 7] and stored in DOOR2. In total of 44% of these TUs have multiple genes. Figure 4B shows one example of multiple TUs of the same E. coli operon. SeqTU has a training program that allows users to train an organism-specific TU predictor based on user-provided gene expression data. The software is now part of the DOOR2 package [8]. Prediction and application of cis-regulatory motifs We developed a computational algorithm BOBRO for the prediction of cis-regulatory binding sites in prokaryotic genomes [24] and implemented it as a Web server DMINDA [25, 26]. This Webbased tool has been integrated into the DOOR2 package in support of cis-regulatory motif prediction for identified operons and TUs (Figure 1B). Such a capability has proved to be essential for elucidation of transcription regulatory networks encoded in a prokaryotic genome. The cis-motif prediction program runs as follows: it first assesses the possibility of each position in a promoter region to be the start of a cis-regulatory motif, and it then distinguishes the actual conserved motifs from the accidental ones based on the concept of motif closure [24]. These two steps make BOBRO substantially sensitive and selective in conserved motif identification against a noisy background.

5 DOOR 5 Figure 4. Displays of computational results by DOOR. (A) A screenshot of a display window. (B) A display of TUs, with the red bars representing genes, the first row of the blue bars representing multigene operons and the following rows of blue bars being TUs under different conditions. (C) A display of validated or predicted transcription factor binding sites (the left bottom) and Rho-independent terminators (on the right). (D) conserved operons. Figure 5. The usage of DOOR and DOOR2, measured by the number of citations, and the types of applications of the DOOR information by third-party publications. A key application of such a prediction capability is to identify operons or TUs that share conserved cis-regulatory binding sites in their promoter regions, for prediction of transcriptionally coregulated and possibly functionally related operons. These operons or TUs constitute a regulon regulated by a transcription factor that binds to such conserved motifs, which may be involved in functionally associated pathways in response to specific conditions. To further support such applications, we have integrated a new feature: identification of co-occurring cisregulatory motifs in the same promoter region, into the motif prediction package. This feature enables elucidation of coregulated operons/tus by multiple transcription factors [27, 28].

6 6 Cao et al. Figure 6. (A) Co-expressed genes/operons in two bi-clusters identified in E. coli under a stationary growth condition. (B) Co-expression networks among bi-clusters. Green nodes represent genes in bi-cluster #1 and red nodes for genes in bi-cluster #2. The larger a node, the higher its degree of presence in the co-expression network, and the thicker an edge, the higher the co-expression level. Co-expression analyses of operons A complementary approach to inference of transcriptionally coregulated operons and TUs to the above motif-based method is through identification of co-expressions of such operons. Knowing that transcriptional co-regulation of operons/tus are generally condition-dependent, we only aim to find operons/ TUs that are co-expressed under some but not necessarily all conditions. This requires a class of more powerful clustering techniques than the traditional clustering methods, namely, biclustering algorithms, also referred to as two-dimensional clustering [29, 30]. These techniques should be able to detect groups of operons/tus, whose correlation (co-expression) exists under (to be identified) subsets of all conditions. We have previously developed a bi-clustering algorithm QUBIC using a graph-theoretical approach [31 33] (Figure 1D). Compared with similar bi-clustering programs, QUBIC solves the general bi-clustering problems more efficiently, and is capable to solving problems with tens of thousands of genes and up to thousands of conditions in a few minutes of the CPU time on a desktop computer. This program will be integrated into the next version of the DOOR database (under development). Figure 6 shows an example of two large bi-clusters (Figure 6A) and their levels of co-expressions (Figure 6B). Operon evolution and global genomic organization Here, we show four examples to illustrate how operon data can be used to derive higher-level and more complex information encoded in prokaryotic genomes. Uber-operons: the footprint of operon evolution By studying conservation among groups of operons that are functionally related, we found that some unions of operons have conserved gene list across genomes, with each such union of operons called a uber-operon [34]. We have previously developed an algorithm for uber-operon prediction through identifying two groups of functionally related operons in a target genome versus a set of reference genomes, whose gene sets are conserved between the target and the reference genomes [12]. Using this algorithm, we have predicted 158 uber-operons containing 1830 genes (40% of all genes) in E. coli K12 against 90 reference genomes [12]. Interestingly, the level of functional relatedness among genes in the same uber-operons is higher than that among genes in the same regulons but lower than that among genes in the same operons in E. coli K12. The functional relatedness is measured based on closeness of KEGG pathways and path distance of gene ontology (GO) terms of the genes [35]. These different levels of functional relatedness between gene scopes suggest that genes in the same uber-operons are functionally related. As the component operons in the same uberoperons may not necessarily be transcriptionally co-regulated, uber-operons provide a novel way to derive functionally associated operons, such as pathways, independent of regulons [12], each of which represents a collection of transcriptionally coregulated operons. Orthologous gene identification based on operons Accurate identification of orthologous genes across genomes is the most basic step in comparative genomics but remains to be solved. We developed a computer tool GOST [11] for orthologous gene mapping across prokaryotic genomes. Compared with all the earlier methods that solely rely on gene sequence similarity, GOST identifies orthologous genes among homologous genes with an additional criterion: orthologous genes between two genomes should have homologous working partners in their respective genomes. Two genes in a genome are considered as working partners if they belong to a common operon or uberoperon [34, 35]. Figure 7 shows a few examples of accurately identified orthologous genes across two genomes [12]. Pathway mapping based on conservation of operons across genomes Accurate mapping of known biological pathways from one prokaryotic organism to another is an important issue when annotating newly sequenced prokaryotic genomes. Existing methods based on gene mapping through sequence homology often leads to incorrect results because sequence similarities alone do not contain sufficient information for correct identification of orthologs. We have developed an algorithm, P-MAP, for pathway mapping across genomes of different prokaryotic species, which integrates both sequence similarity and genomic synteny information defined by operons [36, 37]. Specifically, the algorithm identifies orthologous genes across genomes among

7 DOOR 7 Summary and outlook Operons and TUs are the basic functional units in prokaryotes. Their accurate identification serves as the basis of a wide range of biological studies of prokaryotes. We have developed and maintained an operon database DOOR in the past decade as our service to the research community of bacterial and archaeal organisms. Here, we reviewed the information stored in the DOOR database along with a suite of tools associated with the database in support of operon-based analyses and information discovery. In addition, we have showed a few examples of how operons can be used to not only derive higher-level functional information encoded in a genome but also enable computational studies of fundamental principles in terms of the global organization of operons in a genome. A major challenge to us, the developer of the DOOR database, is to keep up with the rate of prokaryotic genome sequencing. We are currently working toward the next version of the DOOR database to include 7000 complete prokaryotic genomes, along with deployment of additional tools needed to mine such large collections of operons and TUs. Key Points Figure 7. Two examples of orthologous gene prediction. (A) The two red block arrows represent two orthologous genes, which are identified based on the information that they share homologous working partners, represented by the yellow block arrows, for which the popular program RBH failed to identify. (B) Two red block arrows represent two orthologous genes, identified based on the information that they share homologous working partners (the yellow block arrows). homologous genes, which also share homologous working partners. Our analysis on a number of known homologous pathways shows that using genomic synteny information as constraints can greatly improve the pathway-mapping accuracy over methods that use sequence similarity information alone [36]. DOOR is an operon database consisting of multigene operons and single-gene operons in 2072 complete genomes. To facilitate using and mining the data in DOOR, a suite of utility tools has been developed and integrated into DOOR. With the increasing availability of genome-scale transcriptomic data for prokaryotes, increasingly more TUs have been integrated into DOOR, to facilitate systems-level studies of the biology of prokaryotes. The availability of the genome-scale operons and TUs has enabled a new class of functional analyses of prokaryotes as well as fundamental research into the general principles of prokaryotic genomes. Elucidation of the global organizing principle of operons in prokaryotic genomes The availability of genome-scale operons has also made it possible to study whether the global arrangement of operons in a prokaryotic genome is arbitrarily determined or follows some organizing rules, beyond functional studies of prokaryotic genomes. We have previously conducted a series of studies on potential determinants in the global arrangement of operons in a genome [9, 13, 38]. One observation is that operons for pathways with higher frequencies of transcriptional activation tend to be more closely clustered together in the host genome [13]. Further analyses revealed that such pathways with higher frequencies of transcriptional activation tend to have their operons arranged in a fewer supercoils, the basic folding units in a prokaryotic DNA, which are typically kbps long [10]. Our postulation is that prokaryotes have evolved to minimize the total energy needed for folding and unfolding supercoils to enable transcription of operons that need to be activated throughout the life cycle of the organism. Our computational simulation analyses through randomly reshuffling a genome strongly confirmed our hypothesis [9, 13, 38]. This represents the first determinant for the global arrangements of operons in a prokaryotic genome. Funding This work was supported by a grant from the U.S. Department of Energy (grant number #DE-PS02-06ER64304). The BioEnergy Science Center (BESC) is supported by the Office of Biological and Environmental Research in the DOE Office of Science. This work was also supported by National Science Foundation/ EPSCoR Award No. IIA , the State of South Dakota Research Innovation Center, and the Agriculture Experiment Station of South Dakota State University. References 1. Chen IMA, Markowitz VM, Chu K, et al. IMG/M: integrated genome and metagenome comparative data analysis system. Nucleic Acids Res 2016;45:D Jacob F, Perrin D, Sanchez C, et al. The operon: a group of genes with expression coordinated by an operator. C R Acad Sci Paris 1960;250: Salgado H, Moreno-Hagelsieb G, Smith TF, et al. Operons in Escherichia coli: genomic analyses and predictions. Proc Natl Acad Sci USA 2000;97: Craven M, Page D, Shavlik JW, et al. A probabilistic learning approach to whole-genome operon prediction. In: Proceedings of International Conference on Intelligent Systems for Molecular

8 8 Cao et al. Biology, 2000, p USA, AAAI Press, San Diego, California. 5. Dam P, Olman V, Harris K, et al. Operon prediction using both genome-specific and general genomic information. Nucleic Acids Res 2007;35: Chou W-C, Ma Q, Yang S, et al. Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum. Nucleic Acids Res 2015;43:e Chen X, Chou W-C, Ma Q, et al. SeqTU: a web server for identification of bacterial transcription units. Sci Rep 2017;7: Mao X, Ma Q, Zhou C, et al. DOOR 2.0: presenting operons and their functions through dynamic and integrated views. Nucleic Acids Res 2014;42:D Ma Q, Xu Y. Global genomic arrangement of bacterial genes is closely tied with the total transcriptional efficiency. Genomics Proteomics Bioinformatics 2013;11: Ma Q, Yin Y, Schell MA, et al. Computational analyses of transcriptomic data reveal the dynamic organization of the Escherichia coli chromosome under different conditions. Nucleic Acids Res 2013;41: Li G, Ma Q, Mao X, et al. Integration of sequence-similarity and functional association information can overcome intrinsic problems in orthology mapping across bacterial genomes. Nucleic Acids Res 2011;39:e Che D, Li G, Mao F, et al. Detecting uber-operons in prokaryotic genomes. Nucleic Acids Res 2006;34: Yin Y, Zhang H, Olman V, et al. Genomic arrangement of bacterial operons is constrained by biological pathways encoded in the genome. Proc Natl Acad Sci USA 2010;107: Pertea M, Ayanbule K, Smedinghoff M, et al. OperonDB: a comprehensive database of predicted operons in microbial genomes. Nucleic Acids Res 2009;37:D Taboada B, Ciria R, Martinez-Guerrero CE, et al. ProOpDB: prokaryotic operon database. Nucleic Acids Res 2012;40:D Okuda S, Katayama T, Kawashima S, et al. ODB: a database of operons accumulating known operons across multiple genomes. Nucleic Acids Res 2006;34:D Stoddard SF, Smith BJ, Hein R, et al. rrndb: improved tools for interpreting rrna gene abundance in bacteria and archaea and a new foundation for future development. Nucleic Acids Res 2015;43:D Mao F, Dam P, Chou J, et al. DOOR: a database for prokaryotic operons. Nucleic Acids Res 2009;37:D Brouwer RWW, Kuipers OP, van Hijum SAFT. The relative value of operon predictions. Brief Bioinform 2008;9: Bailey TL, Williams N, Misleh C, et al. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 2006;34:W369 W Olman V, Xu D, Xu Y. CUBIC: identification of regulatory binding sites through data clustering. J Bioinform Comput Biol 2003; 1: Arthur JC, Gharaibeh RZ, Mühlbauer M, et al. Microbial genomic analysis reveals the essential role of inflammation in bacteria-induced colorectal cancer. Nat Commun 2014;5: Sorek R, Cossart P. Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nat Rev Genet 2010;11: Li G, Liu B, Ma Q, et al. A new framework for identifying cisregulatory motifs in prokaryotes. Nucleic Acids Res 2011;39:e Ma Q, Zhang H, Mao X, et al. DMINDA: an integrated web server for DNA motif identification and analyses. Nucleic Acids Res 2014;42:W12 W Yang J, Chen X, McDermaid A, et al. DMINDA 2.0: integrated and systematic views of regulatory DNA motif identification and analyses. Bioinformatics 2017, in pres. 27. Ma Q, Liu B, Zhou C, et al. An integrated toolkit for accurate prediction and analysis of cis-regulatory motifs at a genome scale. Bioinformatics 2013;29: Liu B, Zhang H, Zhou C, et al. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes. BMC Genomics 2016;17: Prelic A, Bleuler S, Zimmermann P, et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006;22: Tanay A, Sharan R, Shamir R. Discovering statistically significant biclusters in gene expression data. Bioinformatics 2002; 18:S136 S Li G, Ma Q, Tang H, et al. QUBIC: a qualitative biclustering algorithm for analyses of gene expression data. Nucleic Acids Res 2009;37:e Zhou F, Ma Q, Li G, et al. QServer: a biclustering server for prediction and assessment of co-expressed gene clusters. PLoS One 2012;7:e Zhang Y, Xie J, Yang J, et al. QUBIC: a bioconductor package for qualitative biclustering analysis of gene co-expression data. Bioinformatics 2017;33: Lathe WC, III, Snel B, Bork P. Gene context conservation of a higher order than operons. Trends Biochem Sci 2000;25: Wu H, Su Z, Mao F, et al. Prediction of functional modules based on comparative genome analysis and Gene Ontology application. Nucleic Acids Res 2005;33: Mao F, Su Z, Olman V, et al. Mapping of orthologous genes in the context of biological pathways: an application of integer programming. Proc Natl Acad Sci USA 2006;103: Liu B, Zhou C, Li G, et al. Bacterial regulon modeling and prediction based on systematic cis regulatory motif analyses. Sci Rep 2016;6: Ma Q, Chen X, Liu C, et al. Understanding the commonalities and differences in genomic organizations across closely related bacteria from an energy perspective. Sci China Life Sci 2014;57:

DOOR: A microbial operon database for gene and genome organization and function discovery

DOOR: A microbial operon database for gene and genome organization and function discovery 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 DOOR: A microbial operon database for gene and genome organization and function discovery Huansheng Cao, Qin Ma, Xin

More information

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem University of Groningen Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA XIUFENG WAN xw6@cs.msstate.edu Department of Computer Science Box 9637 JOHN A. BOYLE jab@ra.msstate.edu Department of Biochemistry and Molecular Biology

More information

Genomic Arrangement of Regulons in Bacterial Genomes

Genomic Arrangement of Regulons in Bacterial Genomes l Genomes Han Zhang 1,2., Yanbin Yin 1,3., Victor Olman 1, Ying Xu 1,3,4 * 1 Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics,

More information

Bacterial regulon modeling and prediction based on systematic cis regulatory motif analyses

Bacterial regulon modeling and prediction based on systematic cis regulatory motif analyses University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln CSE Journal Articles Computer Science and Engineering, Department of 2016 Bacterial regulon modeling and prediction based

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

Comparative genomics: Overview & Tools + MUMmer algorithm

Comparative genomics: Overview & Tools + MUMmer algorithm Comparative genomics: Overview & Tools + MUMmer algorithm Urmila Kulkarni-Kale Bioinformatics Centre University of Pune, Pune 411 007. urmila@bioinfo.ernet.in Genome sequence: Fact file 1995: The first

More information

COMPARATIVE PATHWAY ANNOTATION WITH PROTEIN-DNA INTERACTION AND OPERON INFORMATION VIA GRAPH TREE DECOMPOSITION

COMPARATIVE PATHWAY ANNOTATION WITH PROTEIN-DNA INTERACTION AND OPERON INFORMATION VIA GRAPH TREE DECOMPOSITION COMPARATIVE PATHWAY ANNOTATION WITH PROTEIN-DNA INTERACTION AND OPERON INFORMATION VIA GRAPH TREE DECOMPOSITION JIZHEN ZHAO, DONGSHENG CHE AND LIMING CAI Department of Computer Science, University of Georgia,

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

CSCE555 Bioinformatics. Protein Function Annotation

CSCE555 Bioinformatics. Protein Function Annotation CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The

More information

Evaluation of the relative contribution of each STRING feature in the overall accuracy operon classification

Evaluation of the relative contribution of each STRING feature in the overall accuracy operon classification Evaluation of the relative contribution of each STRING feature in the overall accuracy operon classification B. Taboada *, E. Merino 2, C. Verde 3 blanca.taboada@ccadet.unam.mx Centro de Ciencias Aplicadas

More information

BMD645. Integration of Omics

BMD645. Integration of Omics BMD645 Integration of Omics Shu-Jen Chen, Chang Gung University Dec. 11, 2009 1 Traditional Biology vs. Systems Biology Traditional biology : Single genes or proteins Systems biology: Simultaneously study

More information

Computational Cell Biology Lecture 4

Computational Cell Biology Lecture 4 Computational Cell Biology Lecture 4 Case Study: Basic Modeling in Gene Expression Yang Cao Department of Computer Science DNA Structure and Base Pair Gene Expression Gene is just a small part of DNA.

More information

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17. Genetic Variation: The genetic substrate for natural selection What about organisms that do not have sexual reproduction? Horizontal Gene Transfer Dr. Carol E. Lee, University of Wisconsin In prokaryotes:

More information

Bio 119 Bacterial Genomics 6/26/10

Bio 119 Bacterial Genomics 6/26/10 BACTERIAL GENOMICS Reading in BOM-12: Sec. 11.1 Genetic Map of the E. coli Chromosome p. 279 Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents p. 344 Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis

More information

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution Taxonomy Content Why Taxonomy? How to determine & classify a species Domains versus Kingdoms Phylogeny and evolution Why Taxonomy? Classification Arrangement in groups or taxa (taxon = group) Nomenclature

More information

MiGA: The Microbial Genome Atlas

MiGA: The Microbial Genome Atlas December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

The EcoCyc Database. January 25, de Nitrógeno, UNAM,Cuernavaca, A.P. 565-A, Morelos, 62100, Mexico;

The EcoCyc Database. January 25, de Nitrógeno, UNAM,Cuernavaca, A.P. 565-A, Morelos, 62100, Mexico; The EcoCyc Database Peter D. Karp, Monica Riley, Milton Saier,IanT.Paulsen +, Julio Collado-Vides + Suzanne M. Paley, Alida Pellegrini-Toole,César Bonavides ++, and Socorro Gama-Castro ++ January 25, 2002

More information

Biological Systems: Open Access

Biological Systems: Open Access Biological Systems: Open Access Biological Systems: Open Access Liu and Zheng, 2016, 5:1 http://dx.doi.org/10.4172/2329-6577.1000153 ISSN: 2329-6577 Research Article ariant Maps to Identify Coding and

More information

Introduction to Bioinformatics Integrated Science, 11/9/05

Introduction to Bioinformatics Integrated Science, 11/9/05 1 Introduction to Bioinformatics Integrated Science, 11/9/05 Morris Levy Biological Sciences Research: Evolutionary Ecology, Plant- Fungal Pathogen Interactions Coordinator: BIOL 495S/CS490B/STAT490B Introduction

More information

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information -

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information - Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes - Supplementary Information - Martin Bartl a, Martin Kötzing a,b, Stefan Schuster c, Pu Li a, Christoph Kaleta b a

More information

Lecture 4: Transcription networks basic concepts

Lecture 4: Transcription networks basic concepts Lecture 4: Transcription networks basic concepts - Activators and repressors - Input functions; Logic input functions; Multidimensional input functions - Dynamics and response time 2.1 Introduction The

More information

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016 Boolean models of gene regulatory networks Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016 Gene expression Gene expression is a process that takes gene info and creates

More information

Microbial computational genomics of gene regulation*

Microbial computational genomics of gene regulation* Pure Appl. Chem., Vol. 74, No. 6, pp. 899 905, 2002. 2002 IUPAC Microbial computational genomics of gene regulation* Julio Collado-Vides, Gabriel Moreno-Hagelsieb, and Arturo Medrano-Soto Program of Computational

More information

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus L3.1: Circuits: Introduction to Transcription Networks Cellular Design Principles Prof. Jenna Rickus In this lecture Cognitive problem of the Cell Introduce transcription networks Key processing network

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Supplemental Materials

Supplemental Materials JOURNAL OF MICROBIOLOGY & BIOLOGY EDUCATION, May 2013, p. 107-109 DOI: http://dx.doi.org/10.1128/jmbe.v14i1.496 Supplemental Materials for Engaging Students in a Bioinformatics Activity to Introduce Gene

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Genômica comparativa. João Carlos Setubal IQ-USP outubro /5/2012 J. C. Setubal

Genômica comparativa. João Carlos Setubal IQ-USP outubro /5/2012 J. C. Setubal Genômica comparativa João Carlos Setubal IQ-USP outubro 2012 11/5/2012 J. C. Setubal 1 Comparative genomics There are currently (out/2012) 2,230 completed sequenced microbial genomes publicly available

More information

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

CISC 636 Computational Biology & Bioinformatics (Fall 2016) CISC 636 Computational Biology & Bioinformatics (Fall 2016) Predicting Protein-Protein Interactions CISC636, F16, Lec22, Liao 1 Background Proteins do not function as isolated entities. Protein-Protein

More information

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON PROKARYOTE GENES: E. COLI LAC OPERON CHAPTER 13 CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON Figure 1. Electron micrograph of growing E. coli. Some show the constriction at the location where daughter

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

56:198:582 Biological Networks Lecture 8

56:198:582 Biological Networks Lecture 8 56:198:582 Biological Networks Lecture 8 Course organization Two complementary approaches to modeling and understanding biological networks Constraint-based modeling (Palsson) System-wide Metabolism Steady-state

More information

Introduction. Gene expression is the combined process of :

Introduction. Gene expression is the combined process of : 1 To know and explain: Regulation of Bacterial Gene Expression Constitutive ( house keeping) vs. Controllable genes OPERON structure and its role in gene regulation Regulation of Eukaryotic Gene Expression

More information

Chapter 16 Lecture. Concepts Of Genetics. Tenth Edition. Regulation of Gene Expression in Prokaryotes

Chapter 16 Lecture. Concepts Of Genetics. Tenth Edition. Regulation of Gene Expression in Prokaryotes Chapter 16 Lecture Concepts Of Genetics Tenth Edition Regulation of Gene Expression in Prokaryotes Chapter Contents 16.1 Prokaryotes Regulate Gene Expression in Response to Environmental Conditions 16.2

More information

A genomic-scale search for regulatory binding sites in the integration host factor regulon of Escherichia coli K12

A genomic-scale search for regulatory binding sites in the integration host factor regulon of Escherichia coli K12 The integration host factor regulon of E. coli K12 genome 783 A genomic-scale search for regulatory binding sites in the integration host factor regulon of Escherichia coli K12 M. Trindade dos Santos and

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

15.2 Prokaryotic Transcription *

15.2 Prokaryotic Transcription * OpenStax-CNX module: m52697 1 15.2 Prokaryotic Transcription * Shannon McDermott Based on Prokaryotic Transcription by OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

86 Part 4 SUMMARY INTRODUCTION

86 Part 4 SUMMARY INTRODUCTION 86 Part 4 Chapter # AN INTEGRATION OF THE DESCRIPTIONS OF GENE NETWORKS AND THEIR MODELS PRESENTED IN SIGMOID (CELLERATOR) AND GENENET Podkolodny N.L. *1, 2, Podkolodnaya N.N. 1, Miginsky D.S. 1, Poplavsky

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Networks & pathways. Hedi Peterson MTAT Bioinformatics Networks & pathways Hedi Peterson (peterson@quretec.com) MTAT.03.239 Bioinformatics 03.11.2010 Networks are graphs Nodes Edges Edges Directed, undirected, weighted Nodes Genes Proteins Metabolites Enzymes

More information

Genome 541 Introduction to Computational Molecular Biology. Max Libbrecht

Genome 541 Introduction to Computational Molecular Biology. Max Libbrecht Genome 541 Introduction to Computational Molecular Biology Max Libbrecht Genome 541 units Max Libbrecht: Gene regulation and epigenomics Postdoc, Bill Noble s lab Yi Yin: Bayesian statistics Postdoc, Jay

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization. 3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of

More information

Self Similar (Scale Free, Power Law) Networks (I)

Self Similar (Scale Free, Power Law) Networks (I) Self Similar (Scale Free, Power Law) Networks (I) E6083: lecture 4 Prof. Predrag R. Jelenković Dept. of Electrical Engineering Columbia University, NY 10027, USA {predrag}@ee.columbia.edu February 7, 2007

More information

SUPPLEMENTARY METHODS

SUPPLEMENTARY METHODS SUPPLEMENTARY METHODS M1: ALGORITHM TO RECONSTRUCT TRANSCRIPTIONAL NETWORKS M-2 Figure 1: Procedure to reconstruct transcriptional regulatory networks M-2 M2: PROCEDURE TO IDENTIFY ORTHOLOGOUS PROTEINSM-3

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Mining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences

Mining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences Mining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences Daisuke Ikeda Department of Informatics, Kyushu University 744 Moto-oka, Fukuoka 819-0395, Japan. daisuke@inf.kyushu-u.ac.jp

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

October 08-11, Co-Organizer Dr. S D Samantaray Professor & Head,

October 08-11, Co-Organizer Dr. S D Samantaray Professor & Head, A Journey towards Systems system Biology biology :: Biocomputing of of Hi-throughput Omics omics Data data October 08-11, 2018 Coordinator Dr. Anil Kumar Gaur Professor & Head Department of Molecular Biology

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

CHAPTER : Prokaryotic Genetics

CHAPTER : Prokaryotic Genetics CHAPTER 13.3 13.5: Prokaryotic Genetics 1. Most bacteria are not pathogenic. Identify several important roles they play in the ecosystem and human culture. 2. How do variations arise in bacteria considering

More information

2 Genome evolution: gene fusion versus gene fission

2 Genome evolution: gene fusion versus gene fission 2 Genome evolution: gene fusion versus gene fission Berend Snel, Peer Bork and Martijn A. Huynen Trends in Genetics 16 (2000) 9-11 13 Chapter 2 Introduction With the advent of complete genome sequencing,

More information

Introduction to Evolutionary Concepts

Introduction to Evolutionary Concepts Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq

More information

Model Accuracy Measures

Model Accuracy Measures Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses

More information

UNIVERSITY OF YORK. BA, BSc, and MSc Degree Examinations Department : BIOLOGY. Title of Exam: Molecular microbiology

UNIVERSITY OF YORK. BA, BSc, and MSc Degree Examinations Department : BIOLOGY. Title of Exam: Molecular microbiology Examination Candidate Number: Desk Number: UNIVERSITY OF YORK BA, BSc, and MSc Degree Examinations 2017-8 Department : BIOLOGY Title of Exam: Molecular microbiology Time Allowed: 1 hour 30 minutes Marking

More information

The percentage of bacterial genes on leading versus lagging strands is influenced by multiple balancing forces

The percentage of bacterial genes on leading versus lagging strands is influenced by multiple balancing forces The percentage of bacterial genes on leading versus lagging strands is influenced by multiple balancing forces Xizeng Mao 1, Han Zhang 1,4, Yanbin Yin 1, 2 1, 2, 3, Ying Xu 1 Computational Systems Biology

More information

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data

More information

Clustering and Network

Clustering and Network Clustering and Network Jing-Dong Jackie Han jdhan@picb.ac.cn http://www.picb.ac.cn/~jdhan Copy Right: Jing-Dong Jackie Han What is clustering? A way of grouping together data samples that are similar in

More information

Evaluation of the Number of Different Genomes on Medium and Identification of Known Genomes Using Composition Spectra Approach.

Evaluation of the Number of Different Genomes on Medium and Identification of Known Genomes Using Composition Spectra Approach. Evaluation of the Number of Different Genomes on Medium and Identification of Known Genomes Using Composition Spectra Approach Valery Kirzhner *1 & Zeev Volkovich 2 1 Institute of Evolution, University

More information

the noisy gene Biology of the Universidad Autónoma de Madrid Jan 2008 Juan F. Poyatos Spanish National Biotechnology Centre (CNB)

the noisy gene Biology of the Universidad Autónoma de Madrid Jan 2008 Juan F. Poyatos Spanish National Biotechnology Centre (CNB) Biology of the the noisy gene Universidad Autónoma de Madrid Jan 2008 Juan F. Poyatos Spanish National Biotechnology Centre (CNB) day III: noisy bacteria - Regulation of noise (B. subtilis) - Intrinsic/Extrinsic

More information

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms Bioinformatics Advance Access published March 7, 2007 The Author (2007). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

More information

Microbial Taxonomy and the Evolution of Diversity

Microbial Taxonomy and the Evolution of Diversity 19 Microbial Taxonomy and the Evolution of Diversity Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display. 1 Taxonomy Introduction to Microbial Taxonomy

More information

How much non-coding DNA do eukaryotes require?

How much non-coding DNA do eukaryotes require? How much non-coding DNA do eukaryotes require? Andrei Zinovyev UMR U900 Computational Systems Biology of Cancer Institute Curie/INSERM/Ecole de Mine Paritech Dr. Sebastian Ahnert Dr. Thomas Fink Bioinformatics

More information

Lecture 8: Temporal programs and the global structure of transcription networks. Chap 5 of Alon. 5.1 Introduction

Lecture 8: Temporal programs and the global structure of transcription networks. Chap 5 of Alon. 5.1 Introduction Lecture 8: Temporal programs and the global structure of transcription networks Chap 5 of Alon 5. Introduction We will see in this chapter that sensory transcription networks are largely made of just four

More information

This document describes the process by which operons are predicted for genes within the BioHealthBase database.

This document describes the process by which operons are predicted for genes within the BioHealthBase database. 1. Purpose This document describes the process by which operons are predicted for genes within the BioHealthBase database. 2. Methods Description An operon is a coexpressed set of genes, transcribed onto

More information

Comparing whole genomes

Comparing whole genomes BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will

More information

EBI web resources II: Ensembl and InterPro

EBI web resources II: Ensembl and InterPro EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course

More information

An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes

An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln CSE Journal Articles Computer Science and Engineering, Department of 206 An integrative and applicable phylogenetic footprinting

More information

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1 Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

More information

Bi 1x Spring 2014: LacI Titration

Bi 1x Spring 2014: LacI Titration Bi 1x Spring 2014: LacI Titration 1 Overview In this experiment, you will measure the effect of various mutated LacI repressor ribosome binding sites in an E. coli cell by measuring the expression of a

More information

BIOINFORMATICS: An Introduction

BIOINFORMATICS: An Introduction BIOINFORMATICS: An Introduction What is Bioinformatics? The term was first coined in 1988 by Dr. Hwa Lim The original definition was : a collective term for data compilation, organisation, analysis and

More information

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki. Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

More information

Introduction to clustering methods for gene expression data analysis

Introduction to clustering methods for gene expression data analysis Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional

More information

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions? 1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section

More information

56:198:582 Biological Networks Lecture 10

56:198:582 Biological Networks Lecture 10 56:198:582 Biological Networks Lecture 10 Temporal Programs and the Global Structure The single-input module (SIM) network motif The network motifs we have studied so far all had a defined number of nodes.

More information

56:198:582 Biological Networks Lecture 9

56:198:582 Biological Networks Lecture 9 56:198:582 Biological Networks Lecture 9 The Feed-Forward Loop Network Motif Subgraphs in random networks We have discussed the simplest network motif, self-regulation, a pattern with one node We now consider

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Basic modeling approaches for biological systems. Mahesh Bule

Basic modeling approaches for biological systems. Mahesh Bule Basic modeling approaches for biological systems Mahesh Bule The hierarchy of life from atoms to living organisms Modeling biological processes often requires accounting for action and feedback involving

More information

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; "fast- clock" molecules for fine-structure.

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; fast- clock molecules for fine-structure. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Microbiome: 16S rrna Sequencing 3/30/2018

Microbiome: 16S rrna Sequencing 3/30/2018 Microbiome: 16S rrna Sequencing 3/30/2018 Skills from Previous Lectures Central Dogma of Biology Lecture 3: Genetics and Genomics Lecture 4: Microarrays Lecture 12: ChIP-Seq Phylogenetics Lecture 13: Phylogenetics

More information

RNA Synthesis and Processing

RNA Synthesis and Processing RNA Synthesis and Processing Introduction Regulation of gene expression allows cells to adapt to environmental changes and is responsible for the distinct activities of the differentiated cell types that

More information

Identification and annotation of promoter regions in microbial genome sequences on the basis of DNA stability

Identification and annotation of promoter regions in microbial genome sequences on the basis of DNA stability Annotation of promoter regions in microbial genomes 851 Identification and annotation of promoter regions in microbial genome sequences on the basis of DNA stability VETRISELVI RANGANNAN and MANJU BANSAL*

More information

Computational Systems Biology

Computational Systems Biology Computational Systems Biology Vasant Honavar Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Graduate Program Center for Computational Intelligence, Learning, & Discovery

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Clusters of Orthologous Groups (COG) Open Biology Ontology (OBO) gene regulatory data custom classification schemes

Clusters of Orthologous Groups (COG) Open Biology Ontology (OBO) gene regulatory data custom classification schemes Paver is a DECODON service for the analysis and visualization of large OMICs data sets, including data from transcriptome, proteome and / or metabolome analyses. A newly developed integrated visualization

More information

GEP Annotation Report

GEP Annotation Report GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:

More information

Revisiting the Central Dogma The role of Small RNA in Bacteria

Revisiting the Central Dogma The role of Small RNA in Bacteria Graduate Student Seminar Revisiting the Central Dogma The role of Small RNA in Bacteria The Chinese University of Hong Kong Supervisor : Prof. Margaret Ip Faculty of Medicine Student : Helen Ma (PhD student)

More information

OMICS Journals are welcoming Submissions

OMICS Journals are welcoming Submissions OMICS Journals are welcoming Submissions OMICS International welcomes submissions that are original and technically so as to serve both the developing world and developed countries in the best possible

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information