Ensembl Genomes (non-chordates): Quick tour. This quick tour provides a brief introduction to Ensembl Genomes [2], the non-chordate genome browser.

Similar documents
Browsing Genomic Information with Ensembl Plants

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Browsing Genes and Genomes with Ensembl

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

training workshop 2015

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Genome Browsers And Genome Databases. Andy Conley Computational Genomics 2009

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013

CS612 - Algorithms in Bioinformatics

Introduction to the EMBL-EBI Ontology Lookup Service

Genomes and Their Evolution

Introduction to Bioinformatics Integrated Science, 11/9/05

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

EBI web resources II: Ensembl and InterPro

Introduction to Bioinformatics

Computational Structural Bioinformatics

Comparative genomics: Overview & Tools + MUMmer algorithm

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Bioinformatics Exercises

Chemical Data Retrieval and Management

Araport, a community portal for Arabidopsis. Data integration, sharing and reuse. sergio contrino University of Cambridge

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

Computational Biology: Basics & Interesting Problems

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

Hands-On Nine The PAX6 Gene and Protein

COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST

11/24/13. Science, then, and now. Computational Structural Bioinformatics. Learning curve. ECS129 Instructor: Patrice Koehl

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.

Small RNA in rice genome

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

BIOINFORMATICS LAB AP BIOLOGY

Networks & pathways. Hedi Peterson MTAT Bioinformatics

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Host_microbe_PPI - R package to analyse intra-species and interspecies protein-protein interactions in the model plant Arabidopsis thaliana

Creating a Dichotomous Key

Introduction to Bioinformatics Online Course: IBT

Tandem repeat 16,225 20,284. 0kb 5kb 10kb 15kb 20kb 25kb 30kb 35kb

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

a-dB. Code assigned:

ECOL/MCB 320 and 320H Genetics

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB

Sequences, Structures, and Gene Regulatory Networks

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

SPRINGFIELD TECHNICAL COMMUNITY COLLEGE ACADEMIC AFFAIRS

OMICS Journals are welcoming Submissions

a-fB. Code assigned:

Cladistics and Bioinformatics Questions 2013

Genomics and bioinformatics summary. Finding genes -- computer searches

The World Bank and the Open Geospatial Web. Chris Holmes

Homology and Information Gathering and Domain Annotation for Proteins

Chapter 15 Active Reading Guide Regulation of Gene Expression

Prereq: Concurrent 3 CH

Integration of functional genomics data

Supplemental Materials

MiGA: The Microbial Genome Atlas

"Omics" - Experimental Approachs 11/18/05

A Browser for Pig Genome Data

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week:

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Synteny Portal Documentation

EMMA : ECDC Mapping and Multilayer Analysis A GIS enterprise solution to EU agency. Sharing experience and learning from the others

Write a report (6-7 pages, double space) on some examples of Internet Applications. You can choose only ONE of the following application areas:

CLASSIFICATION. Why Classify? 2/18/2013. History of Taxonomy Biodiversity: variety of organisms at all levels from populations to ecosystems.

Bioinformatics Chapter 1. Introduction

BIOAG'L SCI + PEST MGMT- BSPM (BSPM)

Chapter 18 Active Reading Guide Genomes and Their Evolution

2 Spial. Chapter 1. Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6. Pathway level. Atomic level. Cellular level. Proteome level.

TRANSCRIPTION VS TRANSLATION FILE

Using Bioinformatics to Study Evolutionary Relationships Instructions

Biology 105/Summer Bacterial Genetics 8/12/ Bacterial Genomes p Gene Transfer Mechanisms in Bacteria p.

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Georgia Standards of Excellence Biology

Journal of Proteomics & Bioinformatics - Open Access

BIODIVERSITY AND TAXONOMY

Introduction to Evolutionary Concepts

Student Handout Fruit Fly Ethomics & Genomics

Presentation by Julie Hudson MAT5313

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

RNA- seq read mapping

Search for a location using the location search bar:

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

CLASSIFICATION UNIT GUIDE DUE WEDNESDAY 3/1

2 Genome evolution: gene fusion versus gene fission

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

SoyBase, the USDA-ARS Soybean Genetics and Genomics Database

Title ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

SnoPatrol: How many snorna genes are there? Supplementary

Fitness constraints on horizontal gene transfer

Chapter 19 Bacteria And Viruses Prokaryotes Concept Map

Spatial Data Infrastructure Concepts and Components. Douglas Nebert U.S. Federal Geographic Data Committee Secretariat

PDF // IS BACTERIA A PROKARYOTE OR EUKARYOTE

Chapter 16: Reconstructing and Using Phylogenies

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

Chapter 19: Taxonomy, Systematics, and Phylogeny

Transcription:

Paul Kersey [1] DNA & RNA Beginner 0.5 hour This quick tour provides a brief introduction to Ensembl Genomes [2], the non-chordate genome browser. Learning objectives: Basic understanding of Ensembl Genomes and how it can help you to access and analyse genome-scale data Know where to find out more about the Ensembl Genomes resource What is Ensembl Genomes? Ensembl Bacteria [3], Protists [4], Fungi [5], Plants [6] and Metazoa [7] (collectively, Ensembl Genomes ) are five portals for genome-scale data, developed in close collaboration with scientific communities expert in the biology of individual species. Implemented using the Ensembl software suite for genome analysis and browsing, which was developed for the study of vertebrate genomes (described by Flicek et al. 2012, Ensembl 2012, Nucleic Acids Res. 40 D84-90 [8]), Ensembl Genomes [9] provides a powerful and consistent set of interactive and programmatic interfaces for nonvertebrate genomes, providing access to data including gene predictions, comparative analysis, variant annotation, and transcriptomic alignments. Since its establishment in 2009, the resource has grown rapidly and now contains 70 eukaryotic and 249 bacterial species. Moreover, recent improvements to the site make it easier than ever to upload your own data and to visualise and analyse it in the context of reference annotations (Figure 1). Figure 1 Ensembl Genomes entry page showing the five portals on the right-hand side. Data sources Page 1 of 10

Ensembl Genomes provides access to genome-scale data through a number of interfaces, including a web browser, a search-optimised data warehouse [10], bulk download and various programmatic interfaces. The data come from a variety of sources, including collaborators in the scientific community, publicly available data archives, and computational analysis pipelines run at the EBI and elsewhere. As the scientific scope of the project is exceptionally broad, our goal is to provide an upto-date view of core annotation as recognised by the relevent scientific communities, integrated with data from other species through the use of shared interfaces and comparative analysis. Wherever possible, we actively collaborate with community groups in the management of data and the development of services, including: WormBase [11] (for nematodes) VectorBase [12] (for invertebrate vectors of human pathogens) Gramene [13] (for plants) PomBase [14] (for the fission yeast Schizosaccharomyces pombe) PhytoPath [15] (for plant pathogens) CADRE [16] (for Aspergilli). We also show cannonical annotations from leading model organism databases such as: DictyBase [17] (for Dictyostelium discoideum) FlyBase [18] (for Drosophila) SGD [19] (for Saccharomyces cerevisiae). What can you find in Ensembl Genomes? The current Ensembl Genomes [9] portals (as of March 2012) are summarised below. They are: Ensembl Metazoa [20], Plants, Protists, Fungi and Bacteria. New genomes are added to the project with each release. Ensembl Metazoa [7] Contains data from non-chordate [21] metazoan species, including species annotated by WormBase [22], FlyBase [23] and VectorBase [24]: worms, flies and arthropod [25] vectors of human pathogens. Specifically, these include 12 genomes of the genus Drosophilia, four genomes of nematode worms and five genomes of insect pathogens (three mosquitoes, the body louse Pediculus humanus, and the black-legged tick Ixodes scapularis). Additionallly, a further seven species are included from outside these groups, ranging from other arthropoda (e.g. the honey bee) to basal metazoans such as Trichoplax adhaerens. Ensembl Plants [6] Developed jointly with Gramene (a resource for plant genomics based at the Cold Spring Harbor Laboratory) and contains data for ten flowering plants (six monocots and four dicots [26]) and one species of moss. Variation databases (recording genome-wide polymorphism [27] across populations) are provided for Arabidopsis, rice and grape. Ensembl Protists [4] Includes the genomes of a number of Apicomplexa species (including the causative agents of Page 2 of 10

malaria), Leishmania major, the slime mould Dictyostelium discoideum, two diatom genomes and four plant pathogens. Ensembl Fungi [5] Includes the genomes of model species such as Sacchromyces cerevisiae, Schizosaccharomyces pombe, and Neurospora crassa, twelve phytopathogens, and data from eight species of Aspergillus, prepared in collaboration with the Central Aspergillus Data REpository (CADRE [16]). Ensembl Bacteria [3] Covers a number of clades, each represented by multiple genomes. The bacterial clades are Escherichia/Shigella and Bacillus (home to the model organisms E. coli and B. subtilis), and five important clades of pathogenic bacteria: Borrellia, Mycobacterium, Neisseria, Staphylococcus and Streptococcus, and two genera of arthropod symbionts, Buchnera and Wolbachia. A further clade covers the archaeal genus Pyrococcus. The genomes of between 4 and 77 related strains are captured within each database, and access to inter- and intra-clade comparative genomics is provided. What can I do with Ensembl Genomes Using Ensembl Genomes data The genome is a natural entry point for many types of bioinformatics data, providing both a reference framework and a scientific context within which the results of transcriptomic or proteomic data can be interpreted. Using Ensembl Genomes [9], you can: Retrieve all or part of a genome sequence. Use the sequence alignment search tool BLAST [28] against any genome. Link to genome annotation [29] from microarray [30] results. Examine features (e.g. protein coding genes, non-protein coding genes and SNPs) in a chromosomal region. View all alternative transcripts (including variants caused by alternative splicing and promoter usage) for a gene. View positions and sequences of mrnas and proteins that align with a gene. Explore homologues and phylogenetic trees across, within and between clades. View sequence alignments and conserved regions across species. Export sequence, or create a table of gene or SNP information using BioMart (see Getting data from Ensembl Genomes [31]). Upload your own data, including next-generation sequencing reads (in common alignment formats such as BAM [32]) to view in the context of public annotation. Predict the effect of nucleotide sequence variants on genes and their products. The taxonomic spread of Ensembl Genomes allows you to view comparative genomic information at many levels, focusing either on a narrow taxonomic range (potentially as small as one bacterial species) or a pan-taxonomic focus (from microbe to man). Additionally, the provision of data for pathogenic species, their vectors and their hosts through a common interface provides a unique Page 3 of 10

resource for the study of pathogen-mediated diseases of medical and agricultural importance. Genome browser The genome browser is the main point of access for most users of Ensembl Genomes [9]. The browser displays gene structures, supporting evidence, cross-references, genome-wide assays and a range of other features. It is easily configurable and functions as a client for the DAS [33] protocol, allowing you to rapidly integrate additional data. The browser enables you to upload your own next generation RNA sequence data in common file formats (such as BAM [32]). You can also visualise your alignments alongside the reference annotation [29] (Figures 2 and 3). Figure 2 Visualising a BAM file in Ensembl Genomes. In this example, conventional EST data (green tracks) are visualised against the reference gene annotation [34], and a BAM file (showing RNA-seq [35] data) has also been visualised. Individual read alignments (grey tracks, with red ends on each read), and overall read coverage at each position (top track, grey), are shown. Page 4 of 10

Figure 3 Variant annotation in Location View. This example shows a simple visualisation of sequence variations identified at the displayed locus, colour coded by their functional effects on their most proximate gene. This is one of many views that provide access to variation data in Ensembl Genomes. Bacterial genomes Ensembl Bacteria [3] provides special graphical representations to support the correct depiction of bacterial genomes, including circular chromosomes, alternative translational initiation and polycistronic transcripts (Figure 4). Page 5 of 10

Figure 4 Ensembl Bacteria web page showing the circular navigation tool. Users can click on the round handles at the end of the selected sector (shown in red) to adjust the region shown in the 'Region in detail' window. Tools for comparative genomics The Ensembl [36] comparative genomics analyses (Ensembl Compara [37]) are applied to all domains within Ensembl and Ensembl Genomes [9]. The DNA comparison modules are suitable for use over a narrow taxonomic range, and can be used to support multiple sequence alignments and Page 6 of 10

to make predictions of ancestral DNA sequence (Figure 5). The protein comparison modules compute the evolutionary history of a gene familiy, and can be applied over much larger evolutionary distances. A 'gene tree [38]' provides a summary of this information (and the anchor point for outward links to pairwsie and multiple-sequence alignments, and orthologue [39]/paralogue [40] lists). Figure 5 Visualisation of homology relationships in Ensembl Genomes: viewing only the current gene CRP34 and its immediate orthologues. Using the view options, you can expand selected nodes to see more distant branches of the tree in more detail. Visualising genomic polymorphism The Ensembl variation schema is used to capture information from population-wide surveys of sequence polymorphism [27], and is currently populated for one mosquito, two fungi, one protist, and four plant species. Data are available for download or can be visualised in the genome browser (Figure 6). Figure 6 Visualising genomic variation in a genetic context. The figure shows the location and functional consequence of genomic polymorphisms in the context of a neighbouring gene, and the domain structure of the protein it encodes. Page 7 of 10

Getting data from Ensembl Genomes Ensembl Genomes [9] provides access through a variety of interfaces, including a graphical web browser [41] with text and sequence search, an FTP server [42], a Perl [43] interface for programmatic access, and a public mysql [44] server. Additionally, Ensembl Genomes data is provided in a search-optimised data warehouse [10] (BioMart) that provides facilities for fast, customisable download and analysis. You can link searches between different BioMarts in separate locations, allowing you to combine Ensembl Genomes data with information from external resources, and to construct searches across the genomes of different taxonomic groups. Your feedback Please tell us what you thought about this course. Your feedback is invaluable and helps us to improve our courses and thus enhance your learning experience. Get help and support on Ensembl Genomes Support There are separate help pages for each Ensembl Genomes [9] portal. For example, for Metazoa [20], the help pages and documentation can be found here [45]. To use help for a different Ensembl Genomes portal, simply replace the word 'metazoa' in the URL with the names of the other portals (bacteria, fungi, plants, protists). Any questions or comments can be sent to our helpdesk [46]. References Ensembl Genomes Kersey, P.J. et al. (2012) Ensembl Genomes 2016: more genomes, more complexity. [47] Nucleic Acids Research 44:D574-80. Kersey, P.J. et al. (2012) Ensembl Genomes: an integrative resource for genome-scale data from nonvertebrate species. [48] Nucleic Acids Research, Database Issue. Ensembl BioMarts Kinsella, R.J. et al. (2011) Ensembl BioMarts: a hub for data retrieval across taxonomic space. [49] Database : the Journal of Biological Databases and Curation [50] Collaborators Selected databases in Ensembl Genomes have been built by a number of collaborating resources, including the Central Aspergillus REsource [16] (CADRE), Gramene [13], PomBase [14], PhytoPath [15], VectorBase [12], and WormBase [11]. Page 8 of 10

Funding Ensembl Genomes is supported by the European Commission [51] within its 7th Framework Programme, under the thematic area "Infrastructures", contract number 283496 ("transplant"); by the award of BBSRC [52] grants BB/F019793/1, BB/1001077/1 and BH/H531519/1; by the award WT090548MA from the Wellcome Trust [53]; and by the Bill and Melinda Gates Foundation [54]. Contributors [1] Paul Kersey [1] EMBL-EBI Team leader Paul Kersey did his PhD at the University of Edinburgh on the molecular biology and genetics of the cell cycle of fission yeast, and subsequently worked at the MRC Human Genetics Unit. Since 1999, Paul has worked at EMBL-EBI on various databases, including UniProt, GOA, Integr8, IPI, and Genome Reviews. He currently leads the Ensembl Genomes team, which provides services for genome-scale data from non-vertebrate species, primarily through the use of the Ensembl software platform. The team integrates data from across the taxonomic space, and collaborates closely with particular groups involved in genome annotation and the development of domain-specific resources. All course materials in Train online are free cultural works licensed under a Creative Commons Attribution-ShareAlike 4.0 International license Source URL: http://www.ebi.ac.uk/training/online/course/ensembl-genomes-non-chordates-quicktour Links [1] http://www.ebi.ac.uk/training/online/trainers/pkersey [2] http://ensemblgenomes.org [3] http://bacteria.ensembl.org/index.html [4] http://protists.ensembl.org/index.html [5] http://fungi.ensembl.org/index.html Page 9 of 10

[6] http://plants.ensembl.org/index.html [7] http://metazoa.ensembl.org/index.html [8] http://www.ebi.ac.uk/citexplore/citationdetails.do?externalid=22086963&datasource=med [9] http://www.ebi.ac.uk/training/online/glossary/ensembl-genomes [10] http://www.ebi.ac.uk/training/online/glossary/data-warehouse [11] http://www.wormbase.org/ [12] http://www.vectorbase.org/ [13] http://www.gramene.org/ [14] http://www.pombase.org/ [15] http://www.phytopathdb.org/ [16] http://www.cadre-genomes.org.uk/index.html [17] http://dictybase.org/ [18] http://flybase.org/ [19] http://www.yeastgenome.org/ [20] http://www.ebi.ac.uk/training/online/glossary/metazoa [21] http://www.ebi.ac.uk/training/online/glossary/non-chordate [22] http://www.ebi.ac.uk/training/online/glossary/wormbase [23] http://www.ebi.ac.uk/training/online/glossary/flybase [24] http://www.ebi.ac.uk/training/online/glossary/vectorbase [25] http://www.ebi.ac.uk/training/online/glossary/arthropod [26] http://www.ebi.ac.uk/training/online/glossary/dicots [27] http://www.ebi.ac.uk/training/online/glossary/polymorphism [28] http://www.ebi.ac.uk/training/online/glossary/blast [29] http://www.ebi.ac.uk/training/online/glossary/annotation [30] http://www.ebi.ac.uk/training/online/glossary/microarray [31] http://www.ebi.ac.uk/training/online/course/ensembl-genomes-non-chordates-quick-tour/gettingdata-ensembl-genomes [32] http://www.ebi.ac.uk/training/online/glossary/bam [33] http://www.ebi.ac.uk/training/online/glossary/das [34] http://www.ebi.ac.uk/training/online/glossary/gene-annotation [35] http://www.ebi.ac.uk/training/online/glossary/rna-seq [36] http://www.ebi.ac.uk/training/online/glossary/ensembl [37] http://www.ebi.ac.uk/training/online/glossary/ensembl-compara [38] http://www.ebi.ac.uk/training/online/glossary/gene-tree [39] http://www.ebi.ac.uk/training/online/glossary/orthologue [40] http://www.ebi.ac.uk/training/online/glossary/paralogue [41] http://www.ensemblgenomes.org [42] ftp://ftp.ensemblgenomes.org/pub [43] http://www.ebi.ac.uk/training/online/glossary/perl [44] http://www.ebi.ac.uk/training/online/glossary/mysql [45] http://metazoa.ensembl.org/info/index.html [46] mailto:helpdesk@ensemblgenomes.org [47] http://europepmc.org/abstract/med/26578574 [48] http://europepmc.org/abstract/med/22067447 [49] http://europepmc.org/abstract/med/21785142 [50] http://www.ebi.ac.uk/training/online/glossary/curation [51] http://ec.europa.eu/index_en.htm [52] http://www.bbsrc.ac.uk [53] http://www.wellcome.ac.uk/ [54] http://www.gatesfoundation.org/pages/home.aspx Page 10 of 10