Comparing Genomes. Homologies and Families. Sequence Alignments.

Allows us to achieve a greater understanding of vertebrate evolution.
Tells us what is common and what is unique between different species at the genome level.
The function of human genes and other regions may be revealed by studying their counterparts in lower organisms.
Helps identify both coding and non-coding genes and regulatory elements.

Sequence edits: mutation, deletion (ACTGACATGTACCA vs. AC----CATGCACCA).
Rearrangements: inversion, translocation, duplication.

Comparative genomics predicts one long transcript.

Uses all the species.
Prediction pipeline: begins with BLAST and sequence clustering, then compares gene relationships to species relationships.

Proteins (all species) ---> BLAST ---> group similar proteins ---> alignments ---> phylogenetic trees ---> reconcile gene and species trees ---> extract ortholog and paralog relationships

(1) Load the longest translation of each gene from all species used in Ensembl.
(2) Run WUBLASTp+SW of every gene against every other (both self and non-self species) in a genome-wide manner.
(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.
(4) Extract the connected components (= single-linkage clusters), each cluster representing a gene family.
(5) For each cluster, build a multiple alignment based on the protein sequences using MUSCLE.
(6) For each aligned cluster, build a phylogenetic tree using PHYML. An unrooted tree is obtained at this stage.
(7) Reconcile each gene tree with the species tree to call duplication events on internal nodes and root the tree (TreeBeSt).
(8) From each gene tree, infer pairwise gene relations of orthology and paralogy types.
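Steps (3) and (4) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Ensembl implementation: the `species|gene` id convention, the function names, the rule "link two genes if they are best reciprocal hits or their BSR reaches a cutoff", the definition of BSR as the hit score divided by the larger of the two self-hit scores, and the cutoff value `bsr_min` are all assumptions made for the example.

```python
from collections import defaultdict

def species_of(gene_id):
    # Hypothetical id convention for this sketch: "species|gene".
    return gene_id.split("|")[0]

def best_hits(scores):
    """Best-scoring non-self target for each (query gene, target species)."""
    best = {}
    for (query, target), score in scores.items():
        if query == target:
            continue
        key = (query, species_of(target))
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: hit for key, (hit, _) in best.items()}

def gene_graph(scores, bsr_min=0.33):
    """Edges between genes that are BRH or whose BSR reaches bsr_min (assumed rule)."""
    best = best_hits(scores)
    edges = set()
    for (a, b), score in scores.items():
        if a == b:
            continue
        reciprocal = (best.get((a, species_of(b))) == b
                      and best.get((b, species_of(a))) == a)
        # Assumed BSR definition: hit score over the larger self-hit score.
        self_score = max(scores.get((a, a), 0), scores.get((b, b), 0))
        bsr = score / self_score if self_score else 0.0
        if reciprocal or bsr >= bsr_min:
            edges.add(frozenset((a, b)))
    return edges

def gene_families(scores, bsr_min=0.33):
    """Connected components of the gene graph (single-linkage clusters)."""
    adjacency = defaultdict(set)
    for edge in gene_graph(scores, bsr_min):
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    genes = sorted({gene for pair in scores for gene in pair})
    seen, families = set(), []
    for gene in genes:
        if gene in seen:
            continue
        family, stack = set(), [gene]
        while stack:  # depth-first walk over one component
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            family.add(current)
            stack.extend(adjacency[current] - seen)
        families.append(family)
    return families

# Toy all-vs-all scores (made-up numbers), including self-hits.
SCORES = {
    ("hs|A", "hs|A"): 100, ("mm|A", "mm|A"): 100,
    ("hs|B", "hs|B"): 100, ("mm|B", "mm|B"): 100,
    ("hs|A", "mm|A"): 90, ("mm|A", "hs|A"): 90,
    ("hs|B", "mm|B"): 80, ("mm|B", "hs|B"): 80,
    ("hs|A", "mm|B"): 10, ("mm|B", "hs|A"): 10,
}
FAMILIES = gene_families(SCORES)
```

In this toy data the weak hit between `hs|A` and `mm|B` is neither reciprocal-best nor above the cutoff, so the two families stay separate. Because the clusters are single-linkage, one spurious edge would be enough to merge two families, which is why the edge filter matters.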

Anopheles gambiae, Aedes aegypti, Drosophila melanogaster, Dasypus novemcinctus, Loxodonta africana, Echinops telfairi, Tupaia belangeri, Homo sapiens, Pan troglodytes, Macaca mulatta, Otolemur garnettii, Mus musculus, Rattus norvegicus, Spermophilus tridecemlineatus, Cavia porcellus, Oryctolagus cuniculus, Erinaceus europaeus, Myotis lucifugus, Canis familiaris, Felis catus, Bos taurus, Monodelphis domestica, Ornithorhynchus anatinus, Gallus gallus, Xenopus tropicalis, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, Tetraodon nigroviridis, Danio rerio, Ciona intestinalis, Ciona savignyi, Caenorhabditis elegans, Saccharomyces cerevisiae

GeneView page. GeneTreeView.

Orthologs: any pairwise gene relation where the ancestor node is a speciation event.
Paralogs: any pairwise gene relation where the ancestor node is a duplication event.
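The two definitions above translate directly into a walk over a reconciled gene tree: every pair of genes is labelled by the event at their last common ancestor. The sketch below assumes a hypothetical tree encoding (nested tuples `(event, left, right)` with gene names as leaves) and made-up gene names; it is not the TreeBeSt data model.

```python
from itertools import product

def leaves(node):
    """All gene names under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def pairwise_relations(node, relations=None):
    """Label each gene pair by the event at its last common ancestor."""
    if relations is None:
        relations = {}
    if isinstance(node, str):
        return relations
    event, left, right = node
    label = "ortholog" if event == "speciation" else "paralog"
    # Genes on opposite sides of this node have this node as their ancestor.
    for a, b in product(leaves(left), leaves(right)):
        relations[frozenset((a, b))] = label
    pairwise_relations(left, relations)
    pairwise_relations(right, relations)
    return relations

# A duplication before the human/mouse split, then one speciation per copy.
GENE_TREE = ("duplication",
             ("speciation", "human_A1", "mouse_A1"),
             ("speciation", "human_A2", "mouse_A2"))
RELATIONS = pairwise_relations(GENE_TREE)
```

Here `human_A1` and `mouse_A1` come out as orthologs, while `human_A1` and `mouse_A2` come out as paralogs even though they sit in different species, because their last common ancestor is the duplication node.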

ortholog_one2one
ortholog_one2many
ortholog_many2many
apparent_ortholog_one2one
within_species_paralog
between_species_paralog

What is 1 to 1? What is 1 to many?
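One way to make the 1-to-1 versus 1-to-many distinction concrete is to count, for each ortholog pair between two species, how many partners each gene has in the other species. The sketch below is a simplified counting rule chosen for illustration, with hypothetical gene names; the real pipeline derives these labels from the gene tree.

```python
from collections import Counter

def classify_orthologs(pairs):
    """Label each (gene_in_species_1, gene_in_species_2) ortholog pair."""
    pairs = sorted(set(pairs))
    partners_1 = Counter(a for a, _ in pairs)  # partners of each species-1 gene
    partners_2 = Counter(b for _, b in pairs)  # partners of each species-2 gene
    labels = {}
    for a, b in pairs:
        if partners_1[a] == 1 and partners_2[b] == 1:
            labels[(a, b)] = "ortholog_one2one"
        elif partners_1[a] == 1 or partners_2[b] == 1:
            labels[(a, b)] = "ortholog_one2many"
        else:
            labels[(a, b)] = "ortholog_many2many"
    return labels

PAIRS = [
    ("h1", "m1"),                  # single copy in both species
    ("h2", "m2a"), ("h2", "m2b"),  # duplicated in mouse only
    ("h3", "m3"), ("h3", "m4"),    # duplicated in both lineages
    ("h4", "m3"), ("h4", "m4"),
]
LABELS = classify_orthologs(PAIRS)
```

So 1-to-1 means each gene has exactly one ortholog in the other species, 1-to-many means a duplication happened on one side after the speciation, and many-to-many means duplications happened on both sides.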

How: cluster proteins for every isoform (transcript) in every species.
Why: predict a function for novel genes/proteins; understand gene relationships.

More than 1,800,000 proteins clustered:
All Ensembl protein predictions from all species supported: 895,070 protein predictions.
All metazoan (animal) proteins in UniProt: 96,030 UniProtKB/Swiss-Prot; 892,0208 UniProtKB/TrEMBL.

BLASTP all-versus-all comparison.
Markov clustering.
For each cluster: calculation of multiple sequence alignments with ClustalW; assignment of a consensus description.
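The Markov clustering step can be illustrated with a minimal pure-Python sketch of the general idea: alternate expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and renormalising) until the flow settles into clusters. The function names, the self-loop weight, the inflation value, the fixed iteration count and the toy similarity matrix are assumptions for the example; this is not the MCL program used in the pipeline.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (in place)."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        for i in range(n):
            matrix[i][j] /= total
    return matrix

def markov_cluster(similarity, inflation=2.0, iterations=50):
    n = len(similarity)
    # Copy the matrix and add self-loops so that every column is non-empty.
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    normalize_columns(m)
    for _ in range(iterations):
        # Expansion: one matrix squaring spreads flow along longer paths.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        m = [[value ** inflation for value in row] for row in m]
        normalize_columns(m)
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)

# Toy similarity matrix: proteins 0-2 hit each other, as do proteins 3-4.
SIMILARITY = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
CLUSTERS = markov_cluster(SIMILARITY)
```

In practice the matrix entries would be derived from the BLASTP scores rather than 0/1 values, and a higher inflation value generally gives smaller, tighter families.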

Link to FamilyView

JalView multiple alignments.
Ensembl family members within human.
Ensembl family members in other species.

Comparing Genomes. Homologies and Families. Sequence Alignments.

To identify homologous regions.
To spot problematic gene predictions.
Conserved regions could be functional.
To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved).
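The definition of a syntenic region (conserved order and orientation) can be made concrete with a small greedy sketch that groups aligned anchors into blocks. It assumes a hypothetical anchor format `(position_in_genome_1, chromosome_in_genome_2, position_in_genome_2, strand)` with strand given as +1 or -1, and an arbitrary `max_gap`; it is a simplification for illustration and not the Enredo algorithm.

```python
def syntenic_blocks(anchors, max_gap=1_000_000):
    """Group anchors into runs with conserved order and orientation."""
    blocks = []
    for anchor in sorted(anchors):  # walk along genome 1
        pos_1, chrom_2, pos_2, strand = anchor
        if blocks:
            last_pos_1, last_chrom_2, last_pos_2, last_strand = blocks[-1][-1]
            # Positive step means genome 2 advances in the block's direction.
            step = (pos_2 - last_pos_2) * strand
            same_orientation = chrom_2 == last_chrom_2 and strand == last_strand
            if (same_orientation and 0 < step <= max_gap
                    and pos_1 - last_pos_1 <= max_gap):
                blocks[-1].append(anchor)
                continue
        blocks.append([anchor])  # order or orientation broke: start a new block
    return blocks

ANCHORS = [
    (100, "chr2", 5000, 1), (200, "chr2", 5100, 1), (300, "chr2", 5200, 1),
    (400, "chr7", 900, -1), (500, "chr7", 800, -1),  # inverted segment
    (600, "chr2", 9000, 1),
]
BLOCKS = syntenic_blocks(ANCHORS, max_gap=1000)
```

With these made-up coordinates the anchors fall into three blocks: a forward run on chr2, an inverted run on chr7, and a final single anchor that returns to chr2 too far away to extend the first block.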

Should find all highly similar regions between two sequences.
Should allow for segments without similarity, rearrangements, etc.
Issues: heavy process; scalability, as more and more genomes are sequenced; time constraint.

Enredo: defines orthology map (co-linear regions); supports segmental duplications.
Pecan: consistency-based multiple aligner; optimized to cope with long DNA sequences.
Ortheus: ancestral sequence reconstructor; infers the history of insertions and deletions.

Use all coding exons.
Get sets of best reciprocal hits.
Create orthology maps.
Build multiple global alignments.

In the Detailed View Panel:

Choose Compara pairwise alignments.

Anchors: 500,000 anchors for mammals (more than 1 anchor per 10 kb).
Supports segmental duplications.
Covers 90% of the human protein-coding genes (Hsap-Mmus-Rnor-Cfam-Btau).

[Figure labels: human chromosome; orthologues; mouse chromosomes]

Syntenic blocks

View homology in pages such as GeneView, ProtView, SyntenyView, GeneTreeView, or BioMart.
View protein family information in FamilyView.
View alignments in ContigView, GeneSeqAlignView, or through BioMart.

BIOMART