Comparative genomics and proteomics

Species available

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Vertebrates: human, chimpanzee, mouse, rat, chicken, puffer fish, zebrafish, Tetraodon
Arthropods: the mosquito Anopheles gambiae, Drosophila melanogaster, honeybee
Nematodes: Caenorhabditis elegans, C. briggsae

Reach the home pages for each species via the generic Ensembl home page, or bookmark a species home page by its own URL. For those species for which we have an assembly that is still being annotated, we have a Preview browser (Pre!) where you can have a look at new assemblies or new species.

[Figure: Rat home page]

Additional animal genomes will be incorporated in the future, with the emphasis on those important for biomedical research or for evolutionary comparisons. At the time of writing, sequencing of the following genomes is underway, and some of these will appear in Ensembl during 2005:

Vertebrates: Rhesus macaque, opossum and Xenopus
Arthropods: Aedes mosquito

A key element of any genome browser is the display of genes in their chromosomal locations. For most species, Ensembl runs an automated sequence annotation pipeline and gene build to provide annotation including genome-wide gene and protein sets. Building a comprehensive gene set poses different challenges in different organisms. The Ensembl gene building process is discussed further later in the workshop.

For species where the research community is generating comprehensive manual annotation, Ensembl incorporates those gene and protein sets instead of, or in addition to, its own automated annotation. Thus, manual annotation is displayed for some human chromosomes alongside the Ensembl predictions, and the manually curated genome-wide gene sets for D. melanogaster and C. elegans are used in place of an Ensembl set.

The additional types of annotation available vary to some extent between species. But because annotation is stored and displayed in a consistent way for all species, your experience working with one species transfers to new species, and comparisons of genomic sequence and of homologous genes and proteins between species are facilitated.

Orthologues

One kind of comparative analysis focuses on genes and proteins, and attempts to identify the orthologue (the equivalent gene, derived from a common ancestral gene by speciation) in different genomes. Apart from the value of such data in evolutionary studies, it is very useful to be able to identify the equivalent of important genes in one organism (for example, human disease genes) in other organisms that provide experimentally tractable models for studying that gene. The classic model animals are now all represented in Ensembl (Drosophila, C. elegans, mouse), as well as zebrafish, and it is hoped to include other useful models as they become available.
The automated identification of orthologues is made more difficult by the existence of families of closely related genes. Under such circumstances, Ensembl may show more than one potential orthologue, and the results need to be treated with caution. Of course, there may really be more than one orthologue when a lineage has generated additional family members by duplication after the divergence of the two organisms under consideration. See the section below on synteny blocks for one way that Ensembl can help you to assess the orthologue pairs.

In Ensembl, orthologues are identified starting with comparisons at the protein level. All-versus-all BLASTP+SW (Smith-Waterman algorithm) is first used to identify those protein pairs that are reciprocal best hits between two sets of proteins that represent every gene in the two organisms. Additional putative orthologues are then sought using synteny information together with the best reciprocal hits (BRH); these are labelled RHS, for Reciprocal Hit supported by Synteny. Where two homologous proteins are encoded by genes each located within 1 Mb of a pair of BRH, they are good candidates for being an additional orthologous pair. Currently we divide the BRH into UBRH (Unique Best Reciprocal Hit) and MBRH (Multiple Best Reciprocal Hit), the latter when there are multiple but identical best hits, as can happen if there is a perfect protein sequence duplication of translated genes within a species. The same approach permits the identification of adjacent family members that may be recently duplicated lineage-specific paralogues.

Ensembl shows the information about potential orthologues on each GeneView page (and also in SyntenyView displays of synteny blocks, where these are available). The procedure has been applied to all pairs of vertebrates within Ensembl, to the two nematodes, and to the two insects.

[Figure: Parts of a GeneView page, showing putative orthologues]

EnsMart lets you access and use this orthology information in a variety of ways. A set of genes can be selected that have identified orthologues in another species, and further restricted to those that share conserved upstream sequence. For any set of genes, output can include the details of any orthologues in other species and the locations of conserved upstream sequence.

Protein families

What if the orthologue identification procedure fails to find a pair? And what if you are interested in looking at a wider set of potential orthologues and paralogues between two species or within a wider range of organisms? One option is to look for proteins that share particular domains. Ensembl runs domain prediction programs on all its protein sets, and provides access to this information in ProteinView (for individual proteins) and in DomainView (showing all the genes in a species that share a particular InterPro domain). However, some domains are shared by a wide range of proteins that have very different functions.

Ensembl's protein families are an attempt to identify clusters of functionally related proteins, among which one might expect to find most orthologues and paralogues. The family database is generated by running the Tribe-MCL sequence clustering algorithm on a set of peptides consisting of the Ensembl predictions for each Ensembl species, together with all metazoan sequences from Swiss-Prot and SPTrEMBL. On this set of peptides, an all-against-all BLASTP is run to establish similarities.
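Both the orthologue procedure and the family clustering start from an all-against-all protein comparison. As a rough illustration of the reciprocal-best-hit step described for orthologues, the sketch below takes two score tables (species A versus B, and B versus A) and reports the pairs that are each other's best hit, labelled UBRH or MBRH. The function names, the dictionary layout and the protein identifiers are hypothetical, and the scores stand in for whatever BLASTP+SW measure is actually used; this is not Ensembl's code, and it omits the synteny-supported (RHS) step.

```python
def best_hits(scores):
    """Map each query protein to the set of targets that share its top score.

    `scores` is a dict {(query, target): score}; higher scores are better.
    """
    best = {}
    for (query, target), score in scores.items():
        top_score, targets = best.get(query, (float("-inf"), set()))
        if score > top_score:
            best[query] = (score, {target})   # new best hit replaces the old ones
        elif score == top_score:
            targets.add(target)               # tie: keep every identical best hit
    return {query: targets for query, (_, targets) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (protein_a, protein_b, label) triples for reciprocal best hits.

    label is "UBRH" when both best hits are unique, otherwise "MBRH".
    """
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a, b_targets in best_ab.items():
        for b in b_targets:
            if a in best_ba.get(b, set()):
                unique = len(b_targets) == 1 and len(best_ba[b]) == 1
                pairs.append((a, b, "UBRH" if unique else "MBRH"))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    human_vs_mouse = {("hsA", "mmA"): 950, ("hsA", "mmB"): 400, ("hsB", "mmB"): 700}
    mouse_vs_human = {("mmA", "hsA"): 950, ("mmB", "hsB"): 700, ("mmB", "hsA"): 400}
    for pair in reciprocal_best_hits(human_vs_mouse, mouse_vs_human):
        print(pair)
```

Keeping tied best hits as a set, rather than picking one arbitrarily, is what lets the sketch separate unique from multiple reciprocal hits.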
Using the all-against-all BLASTP similarities, clusters are established with the MCL algorithm [for more detail of the underlying methods, see Enright, A.J. et al. (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30]. The efficiency of this approach permits clustering to be done within a realistic time despite the very large numbers of proteins involved (for a recent release, around half a million protein sequences were loaded and processed in less than 24 hours, using 400 CPUs).

[Figure: Domain and Family information on a ProteinView page]

Both family and domain information are shown in Ensembl GeneView and ProteinView pages, and DomainView and FamilyView pages make it easy to examine all the identified genes and proteins within a species. For families, you can also see the family members in other Ensembl species and the UniProt entries (from all metazoans) that fall in the same cluster. In the future, we plan to introduce a similar multi-species display for protein domains.

EnsMart provides the means to rapidly and easily download sets of transcript or protein sequences with particular domains or from particular families, which can be very useful as starting points for alignment and phylogenetic analysis. In addition, the Ensembl database stores pre-calculated protein alignments for all members of a family, and these alignments can be displayed in JalView.
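To give a feel for how clustering on a similarity matrix can work, here is a minimal pure-Python sketch of Markov clustering (alternating expansion and inflation of a column-stochastic matrix), the general idea behind MCL. It is a toy under stated assumptions, not the Tribe-MCL implementation: the function names, the inflation value of 2.0, the self-loop weight of 1.0, the thresholds and the example matrix are all illustrative choices, and real runs work on sparse matrices derived from BLASTP results.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster items from a symmetric similarity matrix (list of lists).

    Returns a sorted list of clusters, each a tuple of item indices.
    """
    n = len(similarity)
    # Assumption: every item gets a self-loop of weight 1.0 on the diagonal.
    m = [[1.0 if i == j else float(similarity[i][j]) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (simulates two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong connections and weakens weak ones.
        inflated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows with a positive diagonal act as attractors; the columns with
    # non-zero weight in such a row form one cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(tuple(j for j in range(n) if m[i][j] > 1e-6))
    return sorted(clusters)


if __name__ == "__main__":
    # Five hypothetical proteins: 0-2 are mutually similar, as are 3-4.
    toy = [
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    print(markov_cluster(toy))
```

Larger inflation values tend to give finer-grained clusters, so the parameter is the main knob to experiment with in a sketch like this.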
[Figure: Part of a family alignment in JalView]

Whole genome DNA-DNA alignments

The alignment of the whole DNA sequence from two organisms is computationally demanding, and the algorithms to carry it out are under active development [see for example Ureta-Vidal, A. et al. (2003) Comparative genomics: genome-wide analysis in metazoan eukaryotes. Nat. Rev. Genet. 4]. Such data are of great interest both in studies of the mechanisms of molecular evolution and in attempts to identify conserved functional sequences such as novel genes and regulatory regions.

Whole genome alignments become increasingly difficult as the evolutionary distance between two organisms increases. At present, Ensembl displays pair-wise alignments within mammals (human, mouse and rat) and within nematodes (C. elegans and C. briggsae); within these species groups, separation probably occurred less than 100 million years ago.

Ensembl is experimenting with different procedures to do the alignments. At present, conserved regions are identified either by BLASTz (data obtained from the UCSC Genome Bioinformatics group) or by Phusion/BLASTn (used for C. elegans and C. briggsae). The latter first runs the two genomes through Phusion, which takes unique 17-mers from one genome and compares them to the second genome, creating clusters of contigs from both genomes. Contigs within each cluster are then compared using Washington University's version of BLASTn (wublastn), without repeat masking. The output of wublastn is post-processed to keep only high-scoring pairs and to identify diagonals of BLAST alignments. Regions of these alignments that represent highly conserved regions are then selected using a filtering method devised by Jim Kent [see Schwartz, S. et al. (2003) Human-Mouse Alignments with BLASTZ. Genome Res. 13]. A third method, translated BLAT, is used to compare genomes from more evolutionarily distant species at the amino acid level.
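The seeding idea behind the Phusion step, taking k-mers that occur only once in one genome and looking them up in the other, can be sketched in a few lines. The code below is a simplified illustration, not Phusion itself: the function names are hypothetical, only the forward strand is considered, and the later stages (clustering contigs and aligning them with BLASTn) are left out. The default of k = 17 follows the text; the usage example uses a smaller k so that short strings produce hits.

```python
from collections import Counter


def unique_kmers(sequence, k):
    """Return {kmer: position} for k-mers that occur exactly once in `sequence`."""
    last_start = len(sequence) - k + 1
    counts = Counter(sequence[i:i + k] for i in range(last_start))
    return {sequence[i:i + k]: i
            for i in range(last_start)
            if counts[sequence[i:i + k]] == 1}


def seed_matches(genome_a, genome_b, k=17):
    """List (position_in_a, position_in_b) for unique k-mers of A found in B.

    Forward strand only; a real pipeline would also handle the reverse
    complement and would work on whole contigs rather than short strings.
    """
    seeds = unique_kmers(genome_a, k)
    hits = []
    for j in range(len(genome_b) - k + 1):
        kmer = genome_b[j:j + k]
        if kmer in seeds:
            hits.append((seeds[kmer], j))
    return hits


if __name__ == "__main__":
    # Toy sequences and k=4, for illustration only.
    for hit in seed_matches("ACGTTGCAAC", "GGTTGCAGG", k=4):
        print(hit)
```

Hits that share the same offset (position in A minus position in B) lie on one diagonal, which is the kind of signal the post-processing step in the text looks for.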
With this translated approach, regions of similarity will be biased towards those that code for proteins, although highly conserved non-coding regions might be detected as well.

From the Compara menu in ContigView you can show a number of tracks (e.g. human vs. chimpanzee, human vs. rat, human vs. mouse, human vs. chicken, human vs. Fugu, human vs. zebrafish, human vs. Tetraodon, and/or Caenorhabditis elegans vs. C. briggsae) displaying the conservation within each cluster (vertebrates, arthropods and nematodes). Each comparison shows two levels of conservation (labelled "cons" for BLASTz or Phusion/BLAST comparisons and "high cons" for highly conserved). Links make it easy to navigate back and forth to see details of the region in the two genomes and to download the sequence of regions of interest.

[Figure: Part of a human ContigView detailed display panel, showing whole genome alignments with mouse and rat]

Further access to the conserved sequences at Ensembl is provided via DotterView displays of local alignments, while EnsMart provides a route for identifying specifically those highly conserved sequences that are located in regions upstream of pairs of orthologous genes.

[Figure: DotterView display showing two homologous exons]

Synteny blocks

The identification of segments of the genome where the order of particular genes is conserved between two species ("synteny blocks") is of interest not only for studying the evolution of chromosome structure, but also for helping to predict and identify pairs of genes between species that are (or are not!) orthologues. Where candidate orthologues in two species are found to be located within well-conserved synteny blocks, you can have more confidence that the pair have been correctly labelled as orthologues.

Ensembl finds synteny blocks by grouping the conserved regions identified from genomic alignments. To be grouped, matches must represent the same relative orientation of chromosome sequences and must be separated by less than 100 kb. Groups that make up a block of less than 100 kb are discarded (parameters may be varied for different species). The approach requires that the species are close enough for genomic alignments to be attempted.

[Figure: Part of a human CytoView display of human chromosome 11, showing synteny blocks conserved on chimpanzee, mouse, rat and chicken]

Ensembl shows the synteny blocks in ContigView (overview panel) and CytoView displays for mammalian and nematode comparisons. In CytoView only, the blocks provide links to display that region in the other species. In addition, SyntenyView shows all blocks on a whole chromosome, related to the conserved blocks in a second species. (SyntenyView is not available for the C. elegans - C. briggsae comparison, as the C. briggsae genome sequence has not yet been assembled into chromosomes.) Links provide navigation between blocks and between species, and the display also shows genes in the current block together with their putative orthologues in the second species.
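The grouping rule for synteny blocks (same orientation, gaps under 100 kb, blocks under 100 kb discarded) can be sketched as follows. This is an illustrative reading of the rule, not Ensembl's code: the Match record, the function names and the coordinates are hypothetical, the gap test is applied on both genomes, the block length is measured on the first genome, and the order of matches on the second genome is not checked.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One conserved region from a genomic alignment (hypothetical layout)."""
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    strand: int  # +1 or -1: relative orientation of the two sequences


def interval_gap(start1, end1, start2, end2):
    """Distance between two intervals (0 if they overlap)."""
    return max(0, max(start1, start2) - min(end1, end2))


def build_synteny_blocks(matches, max_gap=100_000, min_block=100_000):
    """Group matches into blocks; return (chrom_a, start, end, chrom_b, strand, n)."""
    groups, current = [], []
    for match in sorted(matches, key=lambda m: (m.chrom_a, m.start_a)):
        if current:
            prev = current[-1]
            same_context = ((match.chrom_a, match.chrom_b, match.strand)
                            == (prev.chrom_a, prev.chrom_b, prev.strand))
            close = (interval_gap(prev.start_a, prev.end_a,
                                  match.start_a, match.end_a) < max_gap
                     and interval_gap(prev.start_b, prev.end_b,
                                      match.start_b, match.end_b) < max_gap)
            if not (same_context and close):
                groups.append(current)   # close the current group
                current = []
        current.append(match)
    if current:
        groups.append(current)

    blocks = []
    for group in groups:
        start = min(m.start_a for m in group)
        end = max(m.end_a for m in group)
        if end - start >= min_block:     # discard blocks shorter than min_block
            first = group[0]
            blocks.append((first.chrom_a, start, end,
                           first.chrom_b, first.strand, len(group)))
    return blocks


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    example = [
        Match("11", 1_000_000, 1_020_000, "7", 5_000_000, 5_020_000, 1),
        Match("11", 1_060_000, 1_090_000, "7", 5_050_000, 5_080_000, 1),
        Match("11", 1_150_000, 1_180_000, "7", 5_140_000, 5_170_000, 1),
        Match("11", 2_000_000, 2_030_000, "7", 9_000_000, 9_030_000, 1),
    ]
    print(build_synteny_blocks(example))
```

Exposing max_gap and min_block as parameters mirrors the note in the text that these values may be varied for different species.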
Comparative genomics and proteomics in Ensembl - examples

a) From human gene to putative mouse orthologue and to Ensembl protein family
[Figure: Human CFTR GeneView, with links to the mouse homologue and to FamilyView (human)]

b) FamilyView
[Figure: FamilyView, showing alignments in JalView, family members in the home species (human) and family members in other species]

c) SyntenyView
[Figure: Human chromosome 8 surrounded by mouse chromosomes with conserved synteny blocks, and human genes in the selected block together with their putative mouse orthologues]
Comparative genomics and proteomics in Ensembl - exercises

Main exercise: Explore a protein family in human, mouse and rat, identify putative orthologues, and explore regions of conserved synteny.

1. Find the GeneView page for human SNX5 (Ensembl gene), and scroll down to the first Transcript/Translation Summary.

2. Take the link to the associated Protein Family. How many human genes produce proteins in this family? Are they all known genes? Are there members of the same family in mouse, rat and zebrafish? How many? What about invertebrate species?
Click on one of the rat peptides and go to rat ProteinView. From there take the link to the corresponding rat FamilyView. How many rat genes are part of this family?
Find your way to mouse FamilyView, and follow the link to mouse Sorting Nexin 5 (GeneView). Have a look at the section Orthologue Predictions. Follow the link to human SNX5 (this takes you back to where you started).

3. Examine the genomic context of the human and mouse genes. From human SNX5 GeneView, follow the link "View gene in genomic location" to ContigView. Which chromosomal region is the human gene located in?
Customise the display of ContigView. Select only Ensembl Trans., mouse (Mm) cons. and high cons., and rat (Rn) cons. and high cons.; deselect all other options. Have a look at the mouse and rat conserved regions in relation to the human Ensembl transcript. Note the correspondence with exons, but note also that it is not perfect. Zoom in to examine in more detail.
The conserved regions are probably shown grouped (a red "-" appears to the left of the track label). Note that pointing to a region produces a pop-up with details of, and a link to, that region in the other species. Click on the red "-" to the left of the Mm cons. track: ContigView will reload, a red "+" replaces the "-", and the hits are now ungrouped.
Point to a mouse match in this track, and take the link to DotterView. Note the dots on the diagonals where exons align. Zoom in to examine a smaller region.
Go back to human ContigView, point to a mouse match, and this time take the link "Jump to Mus musculus". This takes you to the corresponding display in mouse ContigView. In which chromosomal region is the gene located? Zoom and/or customise the ContigView display to focus on the mouse Snx5 transcript, and turn on the rat and human matches tracks if necessary. Compare the amount of sequence showing as matched (the same threshold BLAST score is used).
A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family Jieming Shen 1,2 and Hugh B. Nicholas, Jr. 3 1 Bioengineering and Bioinformatics Summer
More informationBMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)
BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged
More informationTools and Algorithms in Bioinformatics
Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology
More informationEmily Blanton Phylogeny Lab Report May 2009
Introduction It is suggested through scientific research that all living organisms are connected- that we all share a common ancestor and that, through time, we have all evolved from the same starting
More informationUsing Bioinformatics to Study Evolutionary Relationships Instructions
3 Using Bioinformatics to Study Evolutionary Relationships Instructions Student Researcher Background: Making and Using Multiple Sequence Alignments One of the primary tasks of genetic researchers is comparing
More informationAlignment Strategies for Large Scale Genome Alignments
Alignment Strategies for Large Scale Genome Alignments CSHL Computational Genomics 9 November 2003 Algorithms for Biological Sequence Comparison algorithm value scoring gap time calculated matrix penalty
More informationIntroduction to Bioinformatics Online Course: IBT
Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple
More information1 ATGGGTCTC 2 ATGAGTCTC
We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that
More informationSmall RNA in rice genome
Vol. 45 No. 5 SCIENCE IN CHINA (Series C) October 2002 Small RNA in rice genome WANG Kai ( 1, ZHU Xiaopeng ( 2, ZHONG Lan ( 1,3 & CHEN Runsheng ( 1,2 1. Beijing Genomics Institute/Center of Genomics and
More informationG4120: Introduction to Computational Biology
ICB Fall 2003 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics and
More informationComparative Genomics. Chapter for Human Genetics - Principles and Approaches - 4 th Edition
Chapter for Human Genetics - Principles and Approaches - 4 th Edition Editors: Friedrich Vogel, Arno Motulsky, Stylianos Antonarakis, and Michael Speicher Comparative Genomics Ross C. Hardison Affiliations:
More informationBLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010
BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for
More informationPractical considerations of working with sequencing data
Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!
More informationDUPLICATED RNA GENES IN TELEOST FISH GENOMES
DPLITED RN ENES IN TELEOST FISH ENOMES Dominic Rose, Julian Jöris, Jörg Hackermüller, Kristin Reiche, Qiang LI, Peter F. Stadler Bioinformatics roup, Department of omputer Science, and Interdisciplinary
More informationSupplementary text for the section Interactions conserved across species: can one select the conserved interactions?
1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationPairwise & Multiple sequence alignments
Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived
More informationProcedure to Create NCBI KOGS
Procedure to Create NCBI KOGS full details in: Tatusov et al (2003) BMC Bioinformatics 4:41. 1. Detect and mask typical repetitive domains Reason: masking prevents spurious lumping of non-orthologs based
More informationChapter 18 Active Reading Guide Genomes and Their Evolution
Name: AP Biology Mr. Croft Chapter 18 Active Reading Guide Genomes and Their Evolution Most AP Biology teachers think this chapter involves an advanced topic. The questions posed here will help you understand
More informationtraining workshop 2015
TransPLANT user training workshop 2015 Slides: http://tinyurl.com/transplant2015 Workshop on variation data EMBL-EBI Hinxton-UK 2nd July 2015 Ensembl Genomes Team Notes: This workshop is based on Ensembl
More informationDNA and protein databases. EMBL/GenBank/DDBJ database of nucleic acids
Database searches 1 DNA and protein databases EMBL/GenBank/DDBJ database of nucleic acids 2 DNA and protein databases EMBL/GenBank/DDBJ database of nucleic acids (cntd) 3 DNA and protein databases SWISS-PROT
More informationIntroduction to Bioinformatics Integrated Science, 11/9/05
1 Introduction to Bioinformatics Integrated Science, 11/9/05 Morris Levy Biological Sciences Research: Evolutionary Ecology, Plant- Fungal Pathogen Interactions Coordinator: BIOL 495S/CS490B/STAT490B Introduction
More informationSequences, Structures, and Gene Regulatory Networks
Sequences, Structures, and Gene Regulatory Networks Learning Outcomes After this class, you will Understand gene expression and protein structure in more detail Appreciate why biologists like to align
More informationIntroduction to protein alignments
Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare
More informationCISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)
CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST
More informationTandem repeat 16,225 20,284. 0kb 5kb 10kb 15kb 20kb 25kb 30kb 35kb
Overview Fosmid XAAA112 consists of 34,783 nucleotides. Blat results indicate that this fosmid has significant identity to the 2R chromosome of D.melanogaster. Evidence suggests that fosmid XAAA112 contains
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 05: Index-based alignment algorithms Slides adapted from Dr. Shaojie Zhang (University of Central Florida) Real applications of alignment Database search
More information3/8/ Complex adaptations. 2. often a novel trait
Chapter 10 Adaptation: from genes to traits p. 302 10.1 Cascades of Genes (p. 304) 1. Complex adaptations A. Coexpressed traits selected for a common function, 2. often a novel trait A. not inherited from
More informationEvolutionary dynamics of conserved. non-coding DNA elements: Big bang. or gradual accretion? Sujai Kumar
Evolutionary dynamics of conserved non-coding DNA elements: Big bang or gradual accretion? Sujai Kumar Master of Science School of Informatics University of Edinburgh 2007 Abstract Background Previous
More informationUnderstanding relationship between homologous sequences
Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationHandling Rearrangements in DNA Sequence Alignment
Handling Rearrangements in DNA Sequence Alignment Maneesh Bhand 12/5/10 1 Introduction Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome
More informationComparative Genomics. Primer. Ross C. Hardison
Primer Comparative Genomics Ross C. Hardison A complete genome sequence of an organism can be considered to be the ultimate genetic map, in the sense that the heritable characteristics are encoded within
More informationGenome Sequencing & DNA Sequence Analysis
7.91 / 7.36 / BE.490 Lecture #1 Feb. 24, 2004 Genome Sequencing & DNA Sequence Analysis Chris Burge What is a Genome? A genome is NOT a bag of proteins What s in the Human Genome? Outline of Unit II: DNA/RNA
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationGenomics and bioinformatics summary. Finding genes -- computer searches
Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence
More informationIdentifying Positional Homologs as Bidirectional Best Hits of Sequence and Gene Context Similarity
Identifying Positional Homologs as Bidirectional Best Hits of Sequence and Gene Context Similarity Melvin Zhang Department of Computer Science National University of Singapore 13 Computing Drive, Singapore
More information