Orthologs Detection and Applications


Orthologs Detection and Applications
Marcus Lechner (Bioinformatics Leipzig), 2009-10-23

Table of contents
1 Background on homology
2 Proteinortho
3 Domain wide commons
4 Annotation pipeline
5 References

Definitions
- Homologous genes have derived from a common ancestor
- Orthology: evolved by speciation; thought to have a similar function
- Paralogy: homologous genes within the same species; thought to have a related function (neo-/subfunctionalization)
- out-paralogs arose from a duplication preceding a speciation
- in-paralogs evolved by duplication subsequent to speciation
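The definitions above can be illustrated with a small sketch. It assumes a toy gene tree whose internal nodes are already labelled as speciation or duplication events, and a single reference speciation (species A versus species B). The tree, the node names and the helper functions are all hypothetical and are not taken from the talk.

```python
# Toy gene tree (hypothetical): child -> parent links.
PARENT = {
    "A1": "n1", "B1": "n1",      # n1: speciation separating species A and B
    "A2": "n2", "A3": "n2",      # n2: duplication inside species A
    "n2": "n3", "B2": "n3",      # n3: the same A/B speciation in the other subtree
    "n1": "root", "n3": "root",  # root: duplication preceding the speciation
}
# Event that created each internal node (assumed to be known in advance).
EVENT = {"n1": "speciation", "n2": "duplication",
         "n3": "speciation", "root": "duplication"}


def ancestors(node):
    """Return the nodes on the path from node up to the root (node excluded)."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lca(a, b):
    """Return the last common ancestor of two leaves."""
    anc_b = set(ancestors(b))
    for node in ancestors(a):
        if node in anc_b:
            return node
    raise ValueError("leaves are not in the same tree")


def has_speciation_below(node):
    """True if any descendant of node is a speciation event."""
    children = [c for c, p in PARENT.items() if p == node]
    return any(EVENT.get(c) == "speciation" or has_speciation_below(c)
               for c in children)


def classify(a, b):
    """Classify a gene pair by the event at its last common ancestor."""
    node = lca(a, b)
    if EVENT[node] == "speciation":
        return "orthologs"
    # Duplication before the speciation -> out-paralogs, after it -> in-paralogs.
    return "out-paralogs" if has_speciation_below(node) else "in-paralogs"


for pair in [("A1", "B1"), ("A2", "A3"), ("A1", "A2"), ("A2", "B2")]:
    print(pair, classify(*pair))
```

With more than one speciation in the tree, the in-/out- label would have to be given relative to a chosen speciation, which is the relativity problem discussed on the following slides.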

Example
Figure: Illustration of relationships: three species with orthologs, xeno-, in- and out-paralogs. Adapted from [1].

Problems: Interpretation
- original definition of homology (1843): "the same organ under every variety of form and function" [2]
- still a very good quantitative indication, but neither essential nor sufficient
- Homology of two proteins is not equivalent to a common function, sequence, or structure!

Problems: Relative definition
- the in-/out-paralog definition holds only relative to a certain species
- greatly dependent on available data
- no absolute view

Problems
Figure: Illustration of relationships: complete view needed


Problems: Information benefit
- duplications are known to be a major source of innovation in evolution
- proteins are homologs by definition if they have a common ancestor, irrespective of their actual similarity or function
- most proteins are anciently related but have diverged considerably

Problems
Figure: Multiple gene duplications: all are homologs by definition, but smaller groups may be more useful

Problems: Information benefit
- Up to which point is the homology information useful?

Conclusion: Proteinortho approach
- arose from the same ancestor + similar function, hence similar sequence
- should return a useful subset of homologs (aiming at isofunctional groups)
- reciprocal best BLAST hit(s)

Reciprocal best BLAST hit(s) for homolog detection
Figure: Homology detection using BLAST
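A minimal sketch of the reciprocal-best-hit idea may help here. It assumes that the search results have already been parsed into dictionaries mapping (query, subject) pairs to a score. The function names and the toy scores are hypothetical. The sketch keeps only the single best hit per query, whereas the "(s)" in the slide title suggests that the actual method may accept more than one hit.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject.

    scores: dict {(query, subject): score}, higher score = better hit.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores for proteins of two species (illustrative values only).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 80,
          ("a2", "b2"): 150, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 195, ("b2", "a2"): 140, ("b2", "a1"): 70}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy data a3 finds b1 as its best hit, but b1 prefers a1, so the pair (a3, b1) is not reported.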

Proteinortho: Features
- ortholog and paralog assignment for proteins/protein-coding genes
- designed for large-scale application
- modest memory consumption
- capable of distributed computing

Workflow
Figure: Proteinortho workflow:
1) Reciprocal BLAST searches
2) Transformation into graph representation
3) Coloring and decomposition
4) Reconversion and mapping to species with encoded proteins
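Steps 2 to 4 of the workflow can be sketched roughly as follows. The slide does not spell out how coloring and decomposition work, so the sketch substitutes plain connected components and treats the species as the "color" of a node. The identifier format species|protein and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes of an undirected graph into connected components."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def species_table(component):
    """Map species -> proteins for node ids of the form 'species|protein'."""
    table = defaultdict(list)
    for node in sorted(component):
        species, protein = node.split("|", 1)
        table[species].append(protein)
    return dict(table)


# Each edge stands for one reciprocal best hit between two proteins.
edges = [("A|p1", "B|p7"), ("B|p7", "C|p3"), ("A|p2", "C|p9")]

for component in connected_components(edges):
    print(species_table(component))
```

Each printed dictionary corresponds to one candidate group, mapped back to the species that encode its proteins.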

Distributed computing
Figure: a) Multiple PCs running Proteinortho, cooperating dynamically using an N-way technique; b) Workflow of synchronization

Challenge
- Application to all bacteria available on NCBI
- 710 species, 15 million proteins
- took about two weeks on 50 CPU cores (Intel Xeon, 2.33 GHz)
- peak of only 25 GB RAM, but 300 GB hard disk
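To get a feeling for the scale, the figures on this slide can be turned into two rough numbers. The sketch assumes that every unordered pair of species is compared once and that "about two weeks" means exactly 14 days. Neither assumption is stated in the talk.

```python
# Assumptions (not from the slide): one comparison per unordered species
# pair, and a run time of exactly 14 days.
species = 710
cores = 50
days = 14

species_pairs = species * (species - 1) // 2  # n choose 2
core_hours = cores * days * 24

print(species_pairs)  # 251695
print(core_hours)     # 16800
```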

Results: Coverage overview
Figure: Number of common proteins (x-axis: # of species covered, 400 to 700; y-axis: cumulative # of connected components, 0 to 300; legend text: "original blasted blasted filtered"). Sets with over 5% paralogs were filtered.
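The filtering and the cumulative count behind such a plot could look roughly like the sketch below. It assumes that every protein beyond the first one per species counts as a paralog and that the curve counts the groups covering at least k species. Both readings are guesses, and the function names and toy groups are hypothetical.

```python
def paralog_fraction(group):
    """Fraction of proteins in a non-empty group that are extra copies.

    group: dict {species: [proteins]}; every protein beyond the first
    one per species is counted as a paralog (an assumption).
    """
    total = sum(len(proteins) for proteins in group.values())
    extra = sum(len(proteins) - 1 for proteins in group.values())
    return extra / total


def filter_groups(groups, max_fraction=0.05):
    """Keep only groups with at most max_fraction paralogs."""
    return [g for g in groups if paralog_fraction(g) <= max_fraction]


def cumulative_coverage(groups, thresholds):
    """For each threshold k, count the groups covering at least k species."""
    return {k: sum(1 for g in groups if len(g) >= k) for k in thresholds}


groups = [
    {"A": ["a1"], "B": ["b1"], "C": ["c1"]},  # no paralogs, 3 species
    {"A": ["a2", "a3"], "B": ["b2"]},         # 1 of 3 proteins is a paralog
    {"A": ["a4"], "B": ["b4"]},               # no paralogs, 2 species
]

kept = filter_groups(groups)
print(len(kept))
print(cumulative_coverage(kept, [2, 3]))
```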

Results: Common proteins
- 30S ribosomal proteins S2-5, S7, S8, S10-13, S17, S19
- 50S ribosomal proteins L1-3, L5, L6, L11, L14, L22, L23
- tRNA synthetases for seryl, arginyl, phenylalanyl (alpha chain)
- preprotein translocase, SecY subunit
- peptidase M22, O-sialoglycoprotein endopeptidase
- transcription elongation/termination factor NusA

Annotation pipeline: Application for annotation
- in: newly sequenced bacterial genome
- out: annotation of protein-coding genes, candidates for non-coding genes
- no previous knowledge required
- runs in 10 to 90 minutes

Relatives discovery
Figure: Relatives detection using reference proteins and tree

Relatives discovery with colors
Figure: Advanced relatives detection using colors

Seeding
Figure: Pipeline seeding with proteins

Pipeline overview
Figure: Pipeline seeding with proteins

The end
Thank you for listening!

References
[1] W. M. Fitch. Homology: a personal view on some of the problems. Trends Genet, 16(5):227-31, May 2000.
[2] Richard Owen, Cooper, and William White. Lectures on the comparative anatomy and physiology of the invertebrate animals. London: Longman, Brown, Green, and Longmans, 1843. http://www.biodiversitylibrary.org/bibliography/6788