

Comparative genomics and proteomics

Species available

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Vertebrates: human, chimpanzee, mouse, rat, chicken, puffer fish, zebrafish, Tetraodon
Arthropods: the mosquito Anopheles gambiae, Drosophila melanogaster, honeybee
Nematodes: Caenorhabditis elegans, C. briggsae

Reach the home page for each species via the generic Ensembl home page, or bookmark a species home page directly. For species that have an assembly but are still being annotated, there is a Preview browser (Pre!) where you can look at new assemblies or new species.

[Figure: Rat home page]

Additional animal genomes will be incorporated in the future, with the emphasis on those important for biomedical research or for evolutionary comparisons. At the time of writing, sequencing of the following genomes is underway, and some of these will appear in Ensembl during 2005:

Vertebrates: Rhesus macaque, opossum and Xenopus
Arthropods: Aedes mosquito

A key element of any genome browser is the display of genes in their chromosomal locations. For most species, Ensembl runs an automated sequence annotation pipeline and gene build to provide annotation, including genome-wide gene and protein sets. Building a comprehensive gene set poses different challenges in different organisms. The Ensembl gene building process is discussed further later in the workshop.

For species where the research community is generating comprehensive manual annotation, Ensembl incorporates those gene and protein sets instead of, or in addition to, its own automated annotation. Thus, manual annotation is displayed for some human chromosomes alongside the Ensembl predictions, and the manually curated genome-wide gene sets for D. melanogaster and C. elegans are used in place of an Ensembl set.

The additional types of annotation available vary to some extent between species. However, because annotation is stored and displayed in a consistent way for all species, your experience working with one species transfers to new species, and comparisons of genomic sequence and of homologous genes and proteins between species become easier.

Orthologues

One kind of comparative analysis focuses on genes and proteins, and attempts to identify the orthologue (the equivalent gene, derived from a common ancestral gene by speciation) in different genomes. Apart from the value of such data in evolutionary studies, it is very useful to be able to identify the equivalent of important genes in one organism (for example, human disease genes) in other organisms that provide experimentally tractable models for studying that gene. The classic model animals (Drosophila, C. elegans, mouse) are now all represented in Ensembl, as is zebrafish, and it is hoped to include other useful models as they become available.
The automated identification of orthologues is made more difficult by the existence of families of closely related genes. Under such circumstances, Ensembl may show more than one potential orthologue, and the results need to be treated with caution. Of course, there may really be more than one orthologue when a lineage has generated additional family members by duplication after the divergence of the two organisms under consideration. See the section below on synteny blocks for one way that Ensembl can help you to assess the orthologue pairs.

In Ensembl, orthologues are identified starting with comparisons at the protein level. All-versus-all BLASTP+SW (SW: the Smith-Waterman algorithm) is first used to identify those protein pairs that are best reciprocal hits (BRH) between two sets of proteins that represent every gene in the two organisms. Additional putative orthologues, labelled RHS (Reciprocal Hit supported by Synteny), are then sought using synteny information together with the best reciprocal hits. Where two homologous proteins are encoded by genes each located within 1 Mb of a pair of BRH, they are good candidates for being an additional orthologous pair. Currently we divide the BRH into UBRH (Unique Best Reciprocal Hit) and MBRH (Multiple Best Reciprocal Hit). The latter applies when a protein has multiple but identical best hits, which can happen if there is perfect protein sequence duplication of translated genes within a species. The same approach permits the identification of adjacent family members that may be recently duplicated lineage-specific paralogues.

Ensembl shows the information about potential orthologues on each GeneView page (and also in SyntenyView displays of synteny blocks, where these are available). The procedure has been applied to all pairs of vertebrates within Ensembl, to the two nematodes, and to the two insects.

[Figure: Parts of a GeneView page, showing putative orthologues]

EnsMart lets you access and use this orthology information in a variety of ways. You can select a set of genes that have identified orthologues in another species, and restrict it further to those that share conserved upstream sequence. For any set of genes, output can include the details of any orthologues in other species and the locations of conserved upstream sequence.

Protein families

What if the orthologue identification procedure fails to find a pair? And what if you are interested in looking at a wider set of potential orthologues and paralogues between two species or within a wider range of organisms? One option is to look for proteins that share particular domains. Ensembl runs domain prediction programs on all its protein sets, and provides access to this information in ProteinView (for individual proteins) and in DomainView (showing all the genes in a species that share a particular InterPro domain).

However, some domains are shared by a wide range of proteins that have very different functions. Ensembl's protein families are an attempt to identify clusters of functionally related proteins, among which one might expect to find most orthologues and paralogues. The family database is generated by running the Tribe-MCL sequence clustering algorithm on a set of peptides consisting of the Ensembl predictions for each Ensembl species, together with all metazoan sequences from Swiss-Prot and SPTrEMBL. On this set of peptides, an all-against-all BLASTP is run to establish similarities.
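The orthologue procedure described under Orthologues above starts from the same kind of all-against-all protein comparison. As a minimal sketch of its reciprocal-hit step, and not Ensembl's actual pipeline code, the Python below takes two toy score tables and labels reciprocal best hits as UBRH or MBRH, then looks for synteny-supported pairs within 1 Mb of a BRH pair. The function names, the score tables, the gene identifiers and the rule that a tie on either side makes a pair "multiple" are all assumptions made for illustration.

```python
from collections import defaultdict

MAX_DIST = 1_000_000  # the 1 Mb window quoted in the text


def best_hits(scores):
    """Map each query protein to the set of targets sharing its top score."""
    by_query = defaultdict(dict)
    for (query, target), score in scores.items():
        by_query[query][target] = score
    best = {}
    for query, targets in by_query.items():
        top = max(targets.values())
        best[query] = {t for t, s in targets.items() if s == top}
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return {(a, b): label} for pairs that are each other's best hit."""
    best_a, best_b = best_hits(a_vs_b), best_hits(b_vs_a)
    pairs = {}
    for a, targets in best_a.items():
        for b in targets:
            if a in best_b.get(b, set()):
                # Assumption: a tie on either side makes the pair "multiple".
                unique = len(targets) == 1 and len(best_b[b]) == 1
                pairs[(a, b)] = "UBRH" if unique else "MBRH"
    return pairs


def near(p, q, max_dist=MAX_DIST):
    """True if two (chromosome, position) locations lie within max_dist."""
    return p[0] == q[0] and abs(p[1] - q[1]) <= max_dist


def synteny_supported(candidates, brh, loc_a, loc_b):
    """Homologous pairs (not BRH themselves) with both genes near one BRH pair."""
    return [
        (a, b)
        for a, b in candidates
        if (a, b) not in brh
        and any(near(loc_a[a], loc_a[x]) and near(loc_b[b], loc_b[y])
                for x, y in brh)
    ]


# Toy similarity scores keyed by (query, target); all identifiers are invented.
human_vs_mouse = {("h1", "m1"): 900, ("h1", "m2"): 400, ("h2", "m2"): 700,
                  ("h3", "m3"): 500, ("h3", "m4"): 500}
mouse_vs_human = {("m1", "h1"): 890, ("m2", "h2"): 690, ("m2", "h1"): 410,
                  ("m3", "h3"): 500, ("m4", "h3"): 500}

# Hypothetical gene locations as (chromosome, position in bp).
loc_h = {"h1": ("20", 1_000_000), "h2": ("20", 9_000_000),
         "h3": ("5", 2_000_000), "h4": ("20", 1_600_000)}
loc_m = {"m1": ("2", 5_000_000), "m2": ("2", 30_000_000),
         "m3": ("13", 100_000), "m4": ("13", 900_000), "m5": ("2", 5_300_000)}

brh = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
print(brh)
print(synteny_supported([("h4", "m5")], brh, loc_h, loc_m))
```

In this toy data, h3 has two identical best hits, so both of its pairs fall into the MBRH class, while the candidate pair (h4, m5) is picked up only because both genes sit close to the h1/m1 reciprocal hit.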
Using these BLASTP similarities, clusters are established with the MCL algorithm. [For more detail of the underlying methods, see Enright, A.J. et al. (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30.] The efficiency of this approach permits clustering to be done within a realistic time despite the very large numbers of proteins involved (for a recent release, around half a million protein sequences were loaded and processed in less than 24 hours, using 400 CPUs).

[Figure: Domain and Family information on a ProteinView page]

Both family and domain information are shown in Ensembl GeneView and ProteinView pages, and DomainView and FamilyView pages make it easy to examine all the identified genes and proteins within a species. For families, you can also see the family members in other Ensembl species and the UniProt entries (from all metazoans) that fall in the same cluster. In the future, we plan to introduce a similar multi-species display for protein domains.

EnsMart provides the means to download, rapidly and easily, sets of transcript or protein sequences with particular domains or from particular families, which can be very useful as starting points for alignment and phylogenetic analysis. In addition, the Ensembl database stores pre-calculated protein alignments for all members of a family, and these alignments can be displayed in JalView.
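To give a feel for how similarity scores turn into families, here is a toy Markov clustering (MCL) loop in plain Python. It is a sketch of the general expansion/inflation idea only, not the Tribe-MCL implementation used for the Ensembl families. The inflation value, the tolerance, the example weights and the final step that reads clusters off as connected groups of the converged matrix are all assumptions chosen for illustration.

```python
def build_matrix(n, hits, self_weight=1.0):
    """Symmetric similarity matrix with self-loops on the diagonal."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = self_weight
    for i, j, weight in hits:
        m[i][j] = m[j][i] = float(weight)
    return m


def normalise_columns(m):
    """Scale every column to sum to 1 (a column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def expand(m):
    """Expansion: square the matrix, i.e. take two-step random walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation: raise entries to a power, then renormalise the columns."""
    return normalise_columns([[x ** power for x in row] for row in m])


def components(m, threshold=1e-6):
    """Group nodes linked by any remaining entry above the threshold."""
    n = len(m)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


def mcl(similarity, inflation=2.0, max_iter=50, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    m = normalise_columns([row[:] for row in similarity])
    n = len(m)
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        change = max(abs(new[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = new
        if change < tol:
            break
    return components(m)


# Five hypothetical proteins: a tight trio, a tight pair, and one weak
# cross-link (2-3) standing in for a shared promiscuous domain.
hits = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0), (2, 3, 0.05)]
print(mcl(build_matrix(5, hits)))
```

The weak link between proteins 2 and 3 is the interesting part: inflation repeatedly penalises low-probability transitions, so the cross-link fades and the two groups separate, which is the behaviour that makes this style of clustering useful when unrelated proteins share a common domain.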

[Figure: Part of a family alignment in JalView]

Whole genome DNA-DNA alignments

The alignment of the whole DNA sequence from two organisms is computationally demanding, and the algorithms to carry it out are under active development [see for example Ureta-Vidal, A. et al. (2003) Comparative genomics: genome-wide analysis in metazoan eukaryotes. Nat. Rev. Genet. 4]. Such data are of great interest both in studies of the mechanisms of molecular evolution and in attempts to identify conserved functional sequences such as novel genes and regulatory regions.

Whole genome alignments become increasingly difficult as the evolutionary distance between two organisms increases. At present, Ensembl displays pair-wise alignments within mammals (human, mouse and rat) and within nematodes (C. elegans and C. briggsae); within these species groups, separation probably occurred <100 Mya.

Ensembl is experimenting with different procedures to do the alignments. At present, conserved regions are identified either by BLASTz (data obtained from the UCSC Genome Bioinformatics group) or by Phusion/BLASTn (used for C. elegans and C. briggsae). The latter first runs the two genomes through Phusion, which takes unique 17-mers from one genome and compares them to the second genome, creating clusters of contigs from both genomes. Contigs within each cluster are then compared using Washington University's version of BLASTn (wublastn), without repeat masking. The output of wublastn is post-processed to keep only high-scoring pairs and to identify diagonals of BLAST alignments. Regions of these alignments that represent highly conserved regions are then selected using a filtering method devised by Jim Kent [see Schwartz, S. et al. (2003) Human-Mouse Alignments with BLASTZ. Genome Res. 13]. A third method, translated BLAT, is used to compare genomes from more evolutionarily distant species at the amino acid level.
Thus regions of similarity will be biased towards those that code for proteins, although highly conserved non-coding regions might be detected as well.

From the Compara menu in ContigView you can show a number of tracks (e.g. human vs. chimpanzee, human vs. rat, human vs. mouse, human vs. chicken, human vs. Fugu, human vs. zebrafish, human vs. Tetraodon, and/or Caenorhabditis elegans vs. C. briggsae) displaying the conservation within each species cluster (vertebrates, arthropods and nematodes). Each comparison shows two levels of conservation, labelled "cons" for BLASTz or Phusion/BLAST comparisons and "high cons" for highly conserved regions. Links make it easy to navigate back and forth to see details of the region in the two genomes and to download the sequence of regions of interest.

[Figure: Part of a human ContigView detailed display panel, showing whole genome alignments with mouse and rat]

Further access to the conserved sequences at Ensembl is provided via DotterView displays of local alignments, while EnsMart provides a route for identifying specifically those highly conserved sequences that are located in regions upstream of pairs of orthologous genes.

[Figure: DotterView display showing two homologous exons]

Synteny blocks

The identification of segments of the genome where the order of particular genes is conserved between two species ("synteny blocks") is of interest not only for studying the evolution of chromosome structure, but also for helping to predict and identify pairs of genes between species that are (or are not!) orthologues. Where candidate orthologues in two species are found to be located within well-conserved synteny blocks, you can have more confidence that the pair has been correctly labelled as orthologues.

Ensembl finds synteny blocks by grouping the conserved regions identified from genomic alignments. To be grouped, matches must represent the same relative orientation of chromosome sequences and must be separated by less than 100 kb. Groups that make up a block of less than 100 kb are discarded (parameters may be varied for different species). The approach requires that the species are close enough for genomic alignments to be attempted.

[Figure: Part of a human CytoView display of human chromosome 11, showing synteny blocks conserved on chimpanzee, mouse, rat and chicken]

Ensembl shows the synteny blocks in ContigView (overview panel) and CytoView displays for mammalian and nematode comparisons. In CytoView only, the blocks provide links to display that region in the other species. In addition, SyntenyView shows all blocks on a whole chromosome, related to the conserved blocks in a second species. (SyntenyView is not available for the C. elegans - C. briggsae comparison, as the C. briggsae genome sequence has not yet been assembled into chromosomes.) Links provide navigation between blocks and between species, and the display also shows genes in the current block together with their putative orthologues in the second species.
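The grouping rule given above for synteny blocks (same relative orientation, gaps under 100 kb, blocks under 100 kb discarded) can be sketched as follows. This is an illustrative simplification rather than the Ensembl implementation: the Match record, the function names and the example coordinates are invented, and the gap is measured on the reference genome only.

```python
from typing import NamedTuple

MAX_GAP = 100_000    # matches must be separated by less than 100 kb
MIN_BLOCK = 100_000  # blocks spanning less than 100 kb are discarded


class Match(NamedTuple):
    chrom: str        # chromosome in the reference species
    start: int        # start on the reference chromosome (bp)
    end: int          # end on the reference chromosome (bp)
    other_chrom: str  # chromosome in the second species
    strand: int       # +1 or -1: relative orientation of the two sequences


def compatible(prev, nxt, max_gap):
    """Can nxt extend the group that currently ends with prev?"""
    return (prev.chrom == nxt.chrom
            and prev.other_chrom == nxt.other_chrom
            and prev.strand == nxt.strand
            and nxt.start - prev.end < max_gap)


def synteny_blocks(matches, max_gap=MAX_GAP, min_block=MIN_BLOCK):
    """Group neighbouring matches, then drop groups that are too short."""
    groups = []
    for match in sorted(matches, key=lambda m: (m.chrom, m.start)):
        if groups and compatible(groups[-1][-1], match, max_gap):
            groups[-1].append(match)
        else:
            groups.append([match])
    blocks = []
    for group in groups:
        start = group[0].start
        end = max(m.end for m in group)
        if end - start >= min_block:
            first = group[0]
            blocks.append((first.chrom, start, end,
                           first.other_chrom, first.strand, len(group)))
    return blocks


# Hypothetical conserved regions on a reference chromosome 11.
matches = [
    Match("11", 1_000_000, 1_020_000, "7", 1),
    Match("11", 1_060_000, 1_090_000, "7", 1),
    Match("11", 1_150_000, 1_180_000, "7", 1),
    Match("11", 1_400_000, 1_430_000, "7", 1),   # 220 kb gap: starts a new group
    Match("11", 1_450_000, 1_470_000, "7", 1),   # this group spans only 70 kb
    Match("11", 1_500_000, 1_560_000, "7", -1),  # orientation flips: new group
    Match("11", 1_600_000, 1_650_000, "7", -1),
]

for block in synteny_blocks(matches):
    print(block)
```

Passing different max_gap and min_block values mirrors the remark that the parameters may be varied for different species.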

Comparative genomics and proteomics in Ensembl - examples

a) From human gene to putative mouse orthologue and to Ensembl protein family

[Figure: Human CFTR GeneView, with callouts: link to mouse homologue; link to FamilyView (human)]

b) FamilyView

[Figure: FamilyView, with callouts: alignments in JalView; family members in human; family members in other species]

c) SyntenyView

[Figure: Human chromosome 8 surrounded by mouse chromosomes with conserved synteny blocks. Human genes in the selected block, together with their putative mouse orthologues]

Comparative genomics and proteomics in Ensembl - exercises

Main exercise: Explore a protein family in human, mouse and rat, identify putative orthologues, and explore regions of conserved synteny.

1. Find the GeneView page for human SNX5 (Ensembl gene), and scroll down to the first Transcript/Translation Summary.

2. Take the link to the associated Protein Family.
How many human genes produce proteins in this family? Are they all known genes?
Are there members of the same family in mouse, rat and zebrafish? How many? What about invertebrate species?
Click on one of the rat peptides and go to rat ProteinView. From there take the link to the corresponding rat FamilyView. How many rat genes are part of this family?
Find your way to mouse FamilyView, and follow the link to mouse Sorting Nexin 5 (GeneView). Have a look at the section Orthologue Predictions. Follow the link to human SNX5 (this takes you back to where you started).

3. Examine the genomic context of the human and mouse genes.
From human SNX5 GeneView, follow the link "View gene in genomic location" to ContigView. Which chromosomal region is the human gene located in?
Customise the display of ContigView. Select only Ensembl Trans., mouse (Mm) cons. and high cons., and rat (Rn) cons. and high cons.; deselect all other options.
Have a look at the mouse and rat conserved regions in relation to the human Ensembl transcript. Note the correspondence with exons, but note also that it is not perfect. Zoom in to examine in more detail.
The conserved regions are probably shown grouped (a red - appears to the left of the track label). Note that pointing to a region produces a pop-up with details of, and a link to, that region in the other species.
Click on the red - to the left of the Mm cons. track: ContigView will reload, a red + replaces the - and the hits are now ungrouped.
Point to a mouse match in this track, and take the link to DotterView. Note the dots on the diagonals where exons align. Zoom in to examine a smaller region.
Go back to human ContigView, point to a mouse match, and this time take the link "Jump to Mus musculus". This takes you to the corresponding display in mouse ContigView. In which chromosomal region is the gene located?
Zoom and/or customise the ContigView display to focus on the mouse Snx5 transcript, and turn on the rat and human matches tracks if necessary. Compare the amount of sequence showing as matched (the same threshold BLAST score is used).


More information

Biased amino acid composition in warm-blooded animals

Biased amino acid composition in warm-blooded animals Biased amino acid composition in warm-blooded animals Guang-Zhong Wang and Martin J. Lercher Bioinformatics group, Heinrich-Heine-University, Düsseldorf, Germany Among eubacteria and archeabacteria, amino

More information

Phylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human

Phylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human Phylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human Leo Goodstadt *, Chris P. Ponting Medical Research Council Functional Genetics Unit, University of Oxford, Department

More information

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1 Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Bioinformatics: Investigating Molecular/Biochemical Evidence for Evolution

Bioinformatics: Investigating Molecular/Biochemical Evidence for Evolution Bioinformatics: Investigating Molecular/Biochemical Evidence for Evolution Background How does an evolutionary biologist decide how closely related two different species are? The simplest way is to compare

More information

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Toni Gabaldón Contact: tgabaldon@crg.es Group website: http://gabaldonlab.crg.es Science blog: http://treevolution.blogspot.com

More information

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13 Visit to BPRC Adres: Lange Kleiweg 161, 2288 GJ Rijswijk Utrecht CS à Den Haag CS 9:44 Spoor 9a, arrival 10:22 Den Haag CS à Delft 10:28 Spoor 1, arrival 10:44 10:48 Delft Voorzijde à Bushalte TNO/Lange

More information

OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy

OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy Emms and Kelly Genome Biology (2015) 16:157 DOI 10.1186/s13059-015-0721-2 SOFTWARE OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy

More information

Browsing Genes and Genomes with Ensembl

Browsing Genes and Genomes with Ensembl Training materials Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these materials, please credit Ensembl for their creation

More information

Supporting Online Material for

Supporting Online Material for www.sciencemag.org/cgi/content/full/312/5780/1653/dc1 Supporting Online Material for The Xist RNA Gene Evolved in Eutherians by Pseudogenization of a Protein-Coding Gene Laurent Duret,* Corinne Chureau,

More information

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis 10 December 2012 - Corrections - Exercise 1 Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila

More information

Synonymous Codon Substitution Matrices

Synonymous Codon Substitution Matrices Synonymous Codon Substitution Matrices Adrian Schneider, Gaston H. Gonnet, and Gina M. Cannarozzi Computational Biology Research Group, Institute for Computational Science, ETH Zürich, Universitätstrasse

More information

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family Jieming Shen 1,2 and Hugh B. Nicholas, Jr. 3 1 Bioengineering and Bioinformatics Summer

More information

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven) BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology

More information

Emily Blanton Phylogeny Lab Report May 2009

Emily Blanton Phylogeny Lab Report May 2009 Introduction It is suggested through scientific research that all living organisms are connected- that we all share a common ancestor and that, through time, we have all evolved from the same starting

More information

Using Bioinformatics to Study Evolutionary Relationships Instructions

Using Bioinformatics to Study Evolutionary Relationships Instructions 3 Using Bioinformatics to Study Evolutionary Relationships Instructions Student Researcher Background: Making and Using Multiple Sequence Alignments One of the primary tasks of genetic researchers is comparing

More information

Alignment Strategies for Large Scale Genome Alignments

Alignment Strategies for Large Scale Genome Alignments Alignment Strategies for Large Scale Genome Alignments CSHL Computational Genomics 9 November 2003 Algorithms for Biological Sequence Comparison algorithm value scoring gap time calculated matrix penalty

More information

Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple

More information

1 ATGGGTCTC 2 ATGAGTCTC

1 ATGGGTCTC 2 ATGAGTCTC We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that

More information

Small RNA in rice genome

Small RNA in rice genome Vol. 45 No. 5 SCIENCE IN CHINA (Series C) October 2002 Small RNA in rice genome WANG Kai ( 1, ZHU Xiaopeng ( 2, ZHONG Lan ( 1,3 & CHEN Runsheng ( 1,2 1. Beijing Genomics Institute/Center of Genomics and

More information

G4120: Introduction to Computational Biology

G4120: Introduction to Computational Biology ICB Fall 2003 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics and

More information

Comparative Genomics. Chapter for Human Genetics - Principles and Approaches - 4 th Edition

Comparative Genomics. Chapter for Human Genetics - Principles and Approaches - 4 th Edition Chapter for Human Genetics - Principles and Approaches - 4 th Edition Editors: Friedrich Vogel, Arno Motulsky, Stylianos Antonarakis, and Michael Speicher Comparative Genomics Ross C. Hardison Affiliations:

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

Practical considerations of working with sequencing data

Practical considerations of working with sequencing data Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

More information

DUPLICATED RNA GENES IN TELEOST FISH GENOMES

DUPLICATED RNA GENES IN TELEOST FISH GENOMES DPLITED RN ENES IN TELEOST FISH ENOMES Dominic Rose, Julian Jöris, Jörg Hackermüller, Kristin Reiche, Qiang LI, Peter F. Stadler Bioinformatics roup, Department of omputer Science, and Interdisciplinary

More information

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions? 1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

Procedure to Create NCBI KOGS

Procedure to Create NCBI KOGS Procedure to Create NCBI KOGS full details in: Tatusov et al (2003) BMC Bioinformatics 4:41. 1. Detect and mask typical repetitive domains Reason: masking prevents spurious lumping of non-orthologs based

More information

Chapter 18 Active Reading Guide Genomes and Their Evolution

Chapter 18 Active Reading Guide Genomes and Their Evolution Name: AP Biology Mr. Croft Chapter 18 Active Reading Guide Genomes and Their Evolution Most AP Biology teachers think this chapter involves an advanced topic. The questions posed here will help you understand

More information

training workshop 2015

training workshop 2015 TransPLANT user training workshop 2015 Slides: http://tinyurl.com/transplant2015 Workshop on variation data EMBL-EBI Hinxton-UK 2nd July 2015 Ensembl Genomes Team Notes: This workshop is based on Ensembl

More information

DNA and protein databases. EMBL/GenBank/DDBJ database of nucleic acids

DNA and protein databases. EMBL/GenBank/DDBJ database of nucleic acids Database searches 1 DNA and protein databases EMBL/GenBank/DDBJ database of nucleic acids 2 DNA and protein databases EMBL/GenBank/DDBJ database of nucleic acids (cntd) 3 DNA and protein databases SWISS-PROT

More information

Introduction to Bioinformatics Integrated Science, 11/9/05

Introduction to Bioinformatics Integrated Science, 11/9/05 1 Introduction to Bioinformatics Integrated Science, 11/9/05 Morris Levy Biological Sciences Research: Evolutionary Ecology, Plant- Fungal Pathogen Interactions Coordinator: BIOL 495S/CS490B/STAT490B Introduction

More information

Sequences, Structures, and Gene Regulatory Networks

Sequences, Structures, and Gene Regulatory Networks Sequences, Structures, and Gene Regulatory Networks Learning Outcomes After this class, you will Understand gene expression and protein structure in more detail Appreciate why biologists like to align

More information

Introduction to protein alignments

Introduction to protein alignments Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

Tandem repeat 16,225 20,284. 0kb 5kb 10kb 15kb 20kb 25kb 30kb 35kb

Tandem repeat 16,225 20,284. 0kb 5kb 10kb 15kb 20kb 25kb 30kb 35kb Overview Fosmid XAAA112 consists of 34,783 nucleotides. Blat results indicate that this fosmid has significant identity to the 2R chromosome of D.melanogaster. Evidence suggests that fosmid XAAA112 contains

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 05: Index-based alignment algorithms Slides adapted from Dr. Shaojie Zhang (University of Central Florida) Real applications of alignment Database search

More information

3/8/ Complex adaptations. 2. often a novel trait

3/8/ Complex adaptations. 2. often a novel trait Chapter 10 Adaptation: from genes to traits p. 302 10.1 Cascades of Genes (p. 304) 1. Complex adaptations A. Coexpressed traits selected for a common function, 2. often a novel trait A. not inherited from

More information

Evolutionary dynamics of conserved. non-coding DNA elements: Big bang. or gradual accretion? Sujai Kumar

Evolutionary dynamics of conserved. non-coding DNA elements: Big bang. or gradual accretion? Sujai Kumar Evolutionary dynamics of conserved non-coding DNA elements: Big bang or gradual accretion? Sujai Kumar Master of Science School of Informatics University of Edinburgh 2007 Abstract Background Previous

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

Handling Rearrangements in DNA Sequence Alignment

Handling Rearrangements in DNA Sequence Alignment Handling Rearrangements in DNA Sequence Alignment Maneesh Bhand 12/5/10 1 Introduction Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome

More information

Comparative Genomics. Primer. Ross C. Hardison

Comparative Genomics. Primer. Ross C. Hardison Primer Comparative Genomics Ross C. Hardison A complete genome sequence of an organism can be considered to be the ultimate genetic map, in the sense that the heritable characteristics are encoded within

More information

Genome Sequencing & DNA Sequence Analysis

Genome Sequencing & DNA Sequence Analysis 7.91 / 7.36 / BE.490 Lecture #1 Feb. 24, 2004 Genome Sequencing & DNA Sequence Analysis Chris Burge What is a Genome? A genome is NOT a bag of proteins What s in the Human Genome? Outline of Unit II: DNA/RNA

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

Identifying Positional Homologs as Bidirectional Best Hits of Sequence and Gene Context Similarity

Identifying Positional Homologs as Bidirectional Best Hits of Sequence and Gene Context Similarity Identifying Positional Homologs as Bidirectional Best Hits of Sequence and Gene Context Similarity Melvin Zhang Department of Computer Science National University of Singapore 13 Computing Drive, Singapore

More information