Comparative Genomics II

Comparative Genomics II Advances in Bioinformatics and Genomics GEN 240B Jason Stajich May 19 Comparative Genomics II Slide 1/31

Outline: Introduction, Gene Families, Pairwise Methods, Phylogenetic Methods, Gene tree reconciliation, Databases of Orthologs, References

Orthology inference and gene family identification
How should genes be clustered by similarity? The goal is to uncover paralogy and orthology relationships.
Approaches: single-linkage clustering, Markov clustering, phylogenetic approaches.
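
As a rough illustration of the single-linkage approach, the sketch below treats every pairwise hit above a score cutoff as an edge and reports the connected components as gene families. The function name, the gene identifiers and the scores are all hypothetical; real pipelines would take the scores from a similarity search.

```python
from collections import defaultdict


def single_linkage_clusters(genes, hits, min_score):
    """Group genes into connected components of the similarity graph.

    genes: iterable of gene identifiers.
    hits: {(gene_a, gene_b): score} for pairwise similarity hits.
    min_score: single cutoff; hits below it are ignored.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), score in hits.items():
        if score >= min_score:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for g in genes:
        clusters[find(g)].add(g)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical scores (illustrative values only).
example_genes = ["a1", "a2", "b1", "b2", "c1"]
example_hits = {
    ("a1", "b1"): 200,
    ("b1", "c1"): 150,
    ("a2", "b2"): 300,
    ("c1", "a2"): 40,  # weak hit, e.g. from a shared domain
}
print(single_linkage_clusters(example_genes, example_hits, min_score=100))
print(single_linkage_clusters(example_genes, example_hits, min_score=30))
```

With the stricter cutoff the two families stay separate; with the looser one the single weak hit is enough to merge them, which is why the choice of cutoff matters so much for this approach.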

Orthology prediction methods

Pairwise methods
Best Bidirectional Hits (A ↔ B)
Single linkage
COGs
InParanoid & OrthoMCL

Best Bidirectional Hits (BBH)
All pairs of proteins with reciprocal best hits are considered orthologs. Note that this method is unable to predict the orthology with the yellow protein.
Pro: intuitive and fast.
Con: promiscuous domains can lead to over-connecting.
Con: requires a single cutoff for establishing linkages.
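
The reciprocal-best-hit rule can be sketched as follows. This is a minimal sketch that assumes the best hit is simply the highest-scoring subject for each query; the function names, protein identifiers and scores are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: {(query, subject): score} from a one-directional search.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores: species B has two recent duplicates, B1 and B2.
a_vs_b = {("A1", "B1"): 500, ("A1", "B2"): 480}
b_vs_a = {("B1", "A1"): 500, ("B2", "A1"): 480}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Only the pair A1 and B1 is reported. B2 also has A1 as its best hit, but A1 does not return the favour, so the second duplicate is missed in the same way as the yellow protein on the slide.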

Clusters of Orthologous Genes (COG)
Proteins in the nodes of triangular networks of BBHs are considered as orthologs (green, red and yellow protein 1). New proteins are added to the orthologous group if they are present in BBH triangles that share an edge with a given cluster.
The COG-like approach can add additional proteins from the same genome if they are more similar to each other than to proteins in other genomes, or if they form BBH triangles with members of the cluster. This is not the case for yellow protein 2, which is, again, misclassified.
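
The triangle rule can be sketched as below: find every triangle of BBHs spanning three genomes, then merge triangles that share an edge. This covers only the triangle-merging step, not the handling of same-genome paralogs, and the function name, the `genome_of` mapping and the identifiers are hypothetical.

```python
from itertools import combinations


def cog_clusters(bbh_pairs, genome_of):
    """Merge BBH triangles that share an edge into orthologous groups.

    bbh_pairs: iterable of (gene, gene) best-bidirectional-hit pairs.
    genome_of: {gene: genome name}.
    """
    edges = {frozenset(pair) for pair in bbh_pairs}
    genes = sorted({g for edge in edges for g in edge})

    # A triangle is three genes from three genomes, all mutually BBHs.
    triangles = [
        frozenset(trio)
        for trio in combinations(genes, 3)
        if len({genome_of[g] for g in trio}) == 3
        and all(frozenset(p) in edges for p in combinations(trio, 2))
    ]

    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two triangles share an edge when they have two genes in common.
    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)

    groups = {}
    for i, triangle in enumerate(triangles):
        groups.setdefault(find(i), set()).update(triangle)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical BBH pairs across four genomes A to D.
genome_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C", "D1": "D"}
bbh_pairs = [
    ("A1", "B1"), ("A1", "C1"), ("B1", "C1"),  # triangle A1-B1-C1
    ("B1", "D1"), ("C1", "D1"),                # triangle B1-C1-D1
    ("A2", "B2"),                              # BBH pair outside any triangle
]
print(cog_clusters(bbh_pairs, genome_of))
```

The two triangles share the edge B1-C1 and are merged into one group, while the isolated pair A2 and B2 never enters a triangle and is left out.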

InParanoid approach - correct for paralogy
This is similar to BBH, but other proteins within a proteome (yellow protein 2 in this example) are included as in-paralogs if they are more similar to each other than to their corresponding hits in the other species.
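
A minimal sketch of the in-paralog rule, assuming a seed ortholog pair has already been found by BBH: a same-species protein joins the group when its score to the seed protein exceeds the score between the two seed orthologs. The function name, identifiers and scores are hypothetical, and the real program applies further steps (such as confidence values) that are not shown.

```python
def add_inparalogs(seed, seed_score, within_scores):
    """Extend a seed ortholog pair with in-paralogs.

    seed: (a, b), a BBH pair from two species.
    seed_score: similarity score between a and b.
    within_scores: {(p, q): score} for pairs from the same species.
    """
    group = set(seed)
    for (p, q), score in within_scores.items():
        # Keep a same-species protein only if it is closer to the seed
        # protein than the seed protein is to its cross-species ortholog.
        if score > seed_score:
            if p in seed:
                group.add(q)
            if q in seed:
                group.add(p)
    return sorted(group)


# Hypothetical scores: B2 is a recent duplicate of B1; B3 and A2 are distant.
within_scores = {("B1", "B2"): 650, ("B1", "B3"): 120, ("A1", "A2"): 90}
print(add_inparalogs(("A1", "B1"), 500, within_scores))
```

Here B2 is kept as an in-paralog because it is more similar to B1 than B1 is to A1, while B3 and A2 fall below the seed score and are excluded.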

OrthoMCL approach - Markov Cluster
http://www.micans.org/mcl/ani/mcl-animation.html
As with InParanoid, this is similar to BBH, but other proteins within a proteome (yellow protein 2 in this example) are included as in-paralogs if they are more similar to each other than to their corresponding hits in the other species. The resulting similarity graph is then partitioned with the Markov Cluster (MCL) algorithm.
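
The core of Markov clustering alternates two steps on a column-stochastic matrix: expansion (squaring the matrix) and inflation (raising entries to a power and renormalising). The pure-Python sketch below is a toy version for small graphs, not the MCL program itself; the function name, the default parameters, the convergence threshold and the example graph are assumptions made for illustration.

```python
def mcl(adjacency, inflation=2.0, iterations=50):
    """Toy Markov clustering on a small symmetric weight matrix.

    adjacency: square list of lists of non-negative edge weights.
    Returns clusters as sorted lists of node indices.
    """
    n = len(adjacency)

    def normalise(mat):
        # Make every column sum to 1 (column-stochastic matrix).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    # Add self-loops so that every column has non-zero mass.
    m = normalise([
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ])

    for _ in range(iterations):
        # Expansion: flow spreads along two-step random walks.
        m = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: strong flows are boosted, weak flows fade.
        m = normalise([[x ** inflation for x in row] for row in m])

    # Each row that retains flow lists the nodes attracted to that node.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical graph: two triangles joined by one weak edge (2-3).
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(bridged))
```

A higher inflation value tends to give smaller, tighter clusters, which is the main tuning knob of this family of methods.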

OrthoMCL workflow

OrthoMCL distance correction for paralog method

OrthoMCL is able to connect families left unlinked by single-linkage or COG

Tree Reconciliation
Duplication nodes (marked with a D) are defined by comparing the gene tree (small tree at the top) with the species tree (small tree at the bottom) to derive a reconciled tree (big tree on the right) in which the minimal number of duplication and gene loss (dashed lines) events necessary to explain the gene tree are included. In this case, both the yellow proteins are included in the orthologous group but the red and gray proteins are excluded.

Species overlap phylogenetic approach
All proteins that derive from a common ancestor by speciation are considered members of the same orthologous group. Duplication nodes are detected when they define partitions with at least one shared species. A one-to-many orthology relationship emerges because of a recent duplication in the lineage leading to the yellow proteome.
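
The species-overlap rule needs only the gene tree: a node is a duplication when its two child partitions share at least one species, and a speciation otherwise. The sketch below assumes a binary gene tree written as nested 2-tuples with leaves named `species_gene`; that naming scheme, the function names and the example tree are hypothetical.

```python
def species_of(leaf):
    """Species name of a leaf written as 'species_gene' (assumed format)."""
    return leaf.split("_")[0]


def species_overlap(tree):
    """Return (leaves, ortholog_pairs) for a binary gene tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    Two genes are called orthologs when the node joining them is a speciation.
    """
    if isinstance(tree, str):
        return [tree], set()
    left, right = tree
    left_leaves, left_pairs = species_overlap(left)
    right_leaves, right_pairs = species_overlap(right)
    pairs = left_pairs | right_pairs

    left_species = {species_of(x) for x in left_leaves}
    right_species = {species_of(x) for x in right_leaves}
    if not (left_species & right_species):
        # No shared species: speciation node, so cross pairs are orthologs.
        pairs |= {
            tuple(sorted((a, b))) for a in left_leaves for b in right_leaves
        }
    # Shared species: duplication node, no new ortholog pairs.
    return left_leaves + right_leaves, pairs


# Hypothetical tree: an old duplication (X vs Y) and a recent yeast duplication.
gene_tree = (
    ("human_X", ("yeast_Xa", "yeast_Xb")),
    ("human_Y", "yeast_Y"),
)
leaves, orthologs = species_overlap(gene_tree)
print(sorted(orthologs))
```

In this toy tree human_X is orthologous to both yeast_Xa and yeast_Xb, the one-to-many relationship described above, while the X and Y copies are separated by the duplication at the root.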

SYNERGY [Wapinski]
Clusters of similar genes are found and trees are inferred at once.
A phylogenetic approach that builds up a tree and breaks groups when an ancestral duplication is found that is older than the species group.
Can take into account a scoring scheme that uses synteny.
[Figure panels: SYNERGY, InParanoid]

SYNERGY

SYNERGY starts (top) with a collection of genes (A1, B1, C1 and so on), their chromosomal order (grey lines) and sequence distances (blue arrows; arrows of the same thickness have similar sequence distances). It then builds orthogroups as it climbs the species tree. First, it collects the genes in species A and B that share a common ancestor in species X (second panel, orange ovals). Then, it merges orthogroups formed in the previous stage with the genes in C, resulting in new orthogroups representing ancestral genes in species Y (third panel, yellow ovals). The orthogroups assembled at each stage are associated with gene trees reflecting divergence, duplication and loss events (bottom).
b, Gene tree reconstruction and refining orthogroup assignments. An unrooted phylogeny is reconstructed for the genes and sub-orthogroups in each putative orthogroup (dashed oval). Some rootings (purple arrow) indicate that all the genes descended from a common ancestor (for example, X3, bottom left). Others (green arrow) show that a duplication occurred at the root of the gene tree (for example, X2 and X3, bottom right). In the latter case, the orthogroup is partitioned before proceeding.

Gene tree reconciliation
Resolve duplication and speciation events on a gene tree.
Uses the known species phylogeny: walk up the gene tree and assign each node to a duplication or a speciation.
Some methods impute missing data (gene losses that are unobserved).

Speciation-Duplication Inference (SDI) [Zmasek and Eddy 2001]
A very simple recursion reconciles the gene tree with the species tree. Each node is labeled as a duplication or a speciation.
It doesn't try to infer that there is missing data.
Improved upon with Resampled Inference of Orthologs (RIO) by the same authors.
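
The recursion can be illustrated with the standard LCA-mapping rule: each gene-tree node is mapped to the smallest species-tree clade that contains the species of both children, and it is called a duplication when it maps to the same clade as one of its children. This is a compact sketch of that rule, not the published SDI implementation; the nested-tuple trees, the `species_gene` leaf names and the function names are assumptions, and no gene losses are inferred.

```python
def clades(species_tree):
    """List every clade of the species tree as a frozenset of species."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    found = []
    members = frozenset()
    for child in species_tree:
        sub = clades(child)
        found.extend(sub)
        members |= sub[-1]  # the last entry is the child's own clade
    found.append(members)
    return found


def reconcile(gene_tree, all_clades, events):
    """Map a gene-tree node to a species clade and record its event type."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree.split("_")[0]])
    left, right = gene_tree
    m_left = reconcile(left, all_clades, events)
    m_right = reconcile(right, all_clades, events)
    # LCA mapping: the smallest clade containing both children's clades.
    m_node = min((c for c in all_clades if (m_left | m_right) <= c), key=len)
    kind = "duplication" if m_node in (m_left, m_right) else "speciation"
    events.append((sorted(m_node), kind))
    return m_node


# Hypothetical species tree and gene tree with one duplication.
species_tree = (("human", "mouse"), "yeast")
gene_tree = ((("human_a", "mouse_a"), ("human_b", "mouse_b")), "yeast_a")

events = []
reconcile(gene_tree, clades(species_tree), events)
for clade, kind in events:
    print(kind, clade)
```

Events are recorded in post-order, so the two human-mouse speciations come first, followed by the duplication that created the a and b copies, and finally the speciation separating yeast.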

Notung
A very simple recursion reconciles the gene tree with the species tree. Each node is labeled.
It doesn't try to infer that there is missing data.

COGs and KOGs
Don't use this as a way to classify your orthologs. Many other more accurate methods exist.

OrthoMCL database
OrthoMCL assigns gene families by MCL-based clustering.

PhylomeDB
PhylomeDB strategy

TreeFam
Curated gene trees and gene families starting with automated clusters.

Other tools
PrIME-GSR: Bayesian gene tree inference with species tree knowledge.
OrthoStrapper: for orthology.
TreeBEST: likelihood gene tree inference which is species tree aware.

References
Frech C and Chen N. (2010) Genome-wide comparative gene family classification. PLoS One 5(10):e13409. URL http://dx.doi.org/10.1371/journal.pone.0013409
Gabaldón T. (2008) Large-scale assignment of orthology: back to phylogenetics? Genome Biol 9:235. URL http://dx.doi.org/10.1186/gb-2008-9-10-235
Zmasek CM and Eddy SR. (2001) A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17(9):821-8. URL http://www.hubmed.org/display.cgi?uids=11590098