Phylogenetic Analysis of Molecular Interaction Networks 1

Similar documents
Comparative Analysis of Modularity in Biological Systems

Functional Characterization of Molecular Interaction Networks

Functional Characterization and Topological Modularity of Molecular Interaction Networks

Network alignment and querying

Comparative Analysis of Molecular Interaction Networks

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Bioinformatics: Network Analysis

MULE: An Efficient Algorithm for Detecting Frequent Subgraphs in Biological Networks

Network Alignment 858L

EVOLUTIONARY DISTANCES

In post-genomic biology, the nature and scale of data. Algorithmic and analytical methods in network biology. Mehmet Koyutürk 1,2

Building Cellular Interaction Databases: Analysis, Synthesis, and Interfaces

Computational methods for predicting protein-protein interactions

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

COMPARATIVE ANALYSIS OF BIOLOGICAL NETWORKS. A Thesis. Submitted to the Faculty. Purdue University. Mehmet Koyutürk. In Partial Fulfillment of the

Analysis, Synthesis, and Interfaces. Building Cellular Interaction Databases:

Comparative Genomics: Sequence, Structure, and Networks. Bonnie Berger MIT

Algorithms to Detect Multiprotein Modularity Conserved During Evolution

Graph Alignment and Biological Networks

Increasing availability of experimental data relating to biological sequences, coupled with efficient

A Multiobjective GO based Approach to Protein Complex Detection

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Protein Complex Identification by Supervised Graph Clustering

EFFICIENT AND ROBUST PREDICTION ALGORITHMS FOR PROTEIN COMPLEXES USING GOMORY-HU TREES

An Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules

Predicting Protein Functions and Domain Interactions from Protein Interactions

Supplementary Information

Comparison of Protein-Protein Interaction Confidence Assignment Schemes

Cluster Analysis of Gene Expression Microarray Data. BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002

Comparative Network Analysis

Algorithms and tools for the alignment of multiple protein networks

Towards Detecting Protein Complexes from Protein Interaction Data

Dr. Amira A. AL-Hosary

Comparative Genomics II

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Bioinformatics I. CPBS 7711 October 29, 2015 Protein interaction networks. Debra Goldberg

Overview. Overview. Social networks. What is a network? 10/29/14. Bioinformatics I. Networks are everywhere! Introduction to Networks

Computational approaches for functional genomics

Detecting Conserved Interaction Patterns in Biological Networks

Discovering molecular pathways from protein interaction and ge

Analysis of Biological Networks: Network Robustness and Evolution

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

METABOLIC PATHWAY PREDICTION/ALIGNMENT

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Integration of functional genomics data

Self Similar (Scale Free, Power Law) Networks (I)

Gene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

BIOINFORMATICS. Integrative Network Alignment Reveals Large Regions of Global Network Similarity in Yeast and Human

A New Method to Build Gene Regulation Network Based on Fuzzy Hierarchical Clustering Methods

Phylogenetic trees 07/10/13

Computational Biology: Basics & Interesting Problems

V 5 Robustness and Modularity

Systems biology and biological networks

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Modular Algorithms for Biomolecular Network Alignment

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Phylogenetics: Building Phylogenetic Trees

6.207/14.15: Networks Lecture 12: Generalized Random Graphs

GRAPH-THEORETICAL COMPARISON REVEALS STRUCTURAL DIVERGENCE OF HUMAN PROTEIN INTERACTION NETWORKS

Comparing Protein Interaction Networks via a Graph Match-and-Split Algorithm

Evolutionary Tree Analysis. Overview

Protein function prediction via analysis of interactomes

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Tools and Algorithms in Bioinformatics

The Physical Language of Molecules

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Jeremy Chang Identifying protein protein interactions with statistical coupling analysis

Comparative genomics: Overview & Tools + MUMmer algorithm

How Scale-free Type-based Networks Emerge from Instance-based Dynamics

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

O 3 O 4 O 5. q 3. q 4. Transition

Assessing Significance of Connectivity and. Conservation in Protein Interaction Networks

Biological Networks Analysis

Phylogenetic Networks, Trees, and Clusters

Written Exam 15 December Course name: Introduction to Systems Biology Course no

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Biological networks CS449 BIOINFORMATICS

Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018

Hands-On Nine The PAX6 Gene and Protein

Phylogenetic Tree Reconstruction

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

Introduction to Bioinformatics

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13

Theory of Evolution Charles Darwin

A Max-Flow Based Approach to the. Identification of Protein Complexes Using Protein Interaction and Microarray Data

CSCE555 Bioinformatics. Protein Function Annotation

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Complex (Biological) Networks

Dividing Protein Interaction Networks by Growing Orthologous Articulations

Protein-protein Interaction: Network Alignment

GOSAP: Gene Ontology Based Semantic Alignment of Biological Pathways

Biclustering Gene-Feature Matrices for Statistically Significant Dense Patterns

networks in molecular biology Wolfgang Huber

Identifying Evolutionarily Conserved Protein Interaction Modules Using GraphHopper

Transcription:

Phylogenetic Analysis of Molecular Interaction Networks 1 Mehmet Koyutürk Case Western Reserve University Electrical Engineering & Computer Science 1 Joint work with Sinan Erten, Xin Li, Gurkan Bebek, and Jing Li university-logo

Outline 1 Background & Motivation Why Network Phylogenetics? 2 Approach Modularity Based Phylogenetic Analysis Identifying Modular Network Components Constructing Module Maps Reconstructing Network Phylogenies 3 Results Performance on Simulated Networks Performance on Real Data 4 Conclusion Acknowledgments university-logo

Outline Background & Motivation 1 Background & Motivation Why Network Phylogenetics? 2 Approach Modularity Based Phylogenetic Analysis Identifying Modular Network Components Constructing Module Maps Reconstructing Network Phylogenies 3 Results Performance on Simulated Networks Performance on Real Data 4 Conclusion Acknowledgments university-logo

Background & Motivation Why Network Phylogenetics? Large Scale Data on Diverse Biological Systems Protein-protein interactions (PPI): Which proteins bind to each other (possibly, to perform specific tasks together)? Interacting proteins can be identified via high-throughput screening (e.g., Y2H, TAP) What can we learn from PPI data that belong to diverse species?

Background & Motivation Comparative Network Analysis Why Network Phylogenetics? Network Alignment: Identify "common" subgraphs in interaction networks of different species These are likely to correspond to modular components of the cellular machinery Algorithms mostly focus on one-to-one matching of interactions PathBlast (Kelley et al., PNAS, 2003), NetworkBlast (Sharan et al., RECOMB, 2004), MAWISH (Koyutürk et al., RECOMB, 2005), MULE (Koyutürk et al., JCB, 2006), Graemlin (Flannick et al., Genome Research, 2007) S.Cerevisiae Rpt1 D.Melanogaster CG7257 Rpt2 Rpt5 Rpt3 Rpt6 CG12010 PA CG12010 PB A conserved subgraph identified by MAWISH on yeast and fly networks: Proteosome regulatory particle

Background & Motivation Why Network Phylogenetics? Network Evolution & Phylogenetics Can we learn more about these networks by paying more attention to their evolutionary histories? Incorporating evolutionary information enhances the performance of alignment algorithms (Hirsh & Sharan, 2007; Flannick et al., RECOMB, 2008) Network similarity based algorithms are quite successful in reconstructing evolutionary trees based on metabolic networks (Chor & Tuller, RECOMB, 2006) Phylogenetic tree reconstructed by Chor & Tuller s RDL-based algorithm

Modular Evolution Background & Motivation Why Network Phylogenetics? What are the evolutionary mechanisms that create, preserve, and diversify modularity in biological systems? Pathway alignment for glycolysis, Entner-Doudoroff pathway, and pyruvate processing (Dandekar et al., 1999) university-logo

Outline Approach 1 Background & Motivation Why Network Phylogenetics? 2 Approach Modularity Based Phylogenetic Analysis Identifying Modular Network Components Constructing Module Maps Reconstructing Network Phylogenies 3 Results Performance on Simulated Networks Performance on Real Data 4 Conclusion Acknowledgments university-logo

Our Approach Approach Modularity Based Phylogenetic Analysis Instead of comparing interactions, compare emergent properties We use conservation of modular network components as an indicator of evolutionary proximity Phylogeny reconstruction provides a testbed for hypotheses Does network information provide information beyond what is gleaned from genome? Does modularity say anything more about the evolutionary histories of these networks? Not another tool for phylogeny reconstruction!

Approach Modularity Based Phylogenetic Analysis university-logo MOPHY A framework for modularity based phylogenetic analysis Reconstruction of whole-network and module phylogenies Identification of module families Graph Clustering Projection Phylogenetic Analysis... G1 M11 M1N1.... G2... M21... M1N2.... GK Networks MK1 Modules MKNK. Module Map Network Phylogenetics

What is Modularity? Approach Identifying Modular Network Components Functional module: A group of molecules that perform a distinct function together (Hartwell, Nature, 1999) Functional modules manifest themselves as highly connected subgraphs in the network (Rives & Galitski, PNAS, 2003) Many graph clustering algorithms have been developed to identify functional modules from interaction data MCODE (Bader et al., BMC Bioinformatics, 2003), MCL (Pereira-Leal et al., Proteins, 2004), SIDES (Koyutürk et al., RECOMB, 2006) Functional coherence is often used to verify modularity VAMPIRE (Hsiao et al., NAR, 2005), Ontologizer (Grossman et al., RECOMB, 2006)

Network Connectivity Approach Identifying Modular Network Components Commonly used as an indicator of modularity Mutual Clustering Coefficient: Fraction (significance) of shared interactions between two proteins (Goldberg & Roth, PNAS, 2003) Significance can be computed using hypergeometric models χ = 3/6 Mutual Clustering Coefficient χ(p i, P j ) = {P k V : P i P k E P j P k E} {P k V : P i P k E P j P k E}

Network Proximity Approach Identifying Modular Network Components A more flexible indicator of modularity Data is incomplete: Indirect paths can account for missing interactions Projection of modularity: Indirect paths may relax evolutionary pressure on preserving direct interactions ψ = 1/2 Proximity ψ(p i, P j ) = 1/δ(P i, P j )

Approach Network Proximity and Modularity Identifying Modular Network Components Process Similarity 4.5 4 3.5 3 2.5 2 1.5 1 C.eleg PPI S.cere PPI S.pomb PPI Struct DDI HC+MC DDI Comp-2 PPI Comp-1 DDI 0.5 0 1 1.5 2 2.5 3 3.5 4 Length of Shortest Path in Network Average process similarity with respect to network distance in a variety of PPI and DDI networks Process similarity: Information content of the biological processes that describes all processes associated with the two proteins (Pandey et al., ECCB, 2008) university-logo

Approach Module Identification Algorithm Identifying Modular Network Components Bottom-up complete linkage hierarchical clustering of proteins in the network Primary similarity criterion: Proximity Secondary similarity criterion: Mutual clustering coefficient Discrete modules are obtained by putting a threshold on proximity

Projection of Proximity Approach Constructing Module Maps Sequence similarity relates proteins in different networks σ(p i, P j ) = 1 + 1/ log E ij, where E ij denotes the BLAST E-value of the best bidirectional hit between proteins P i and P j The proximity between proteins P i, P j V(G) with respect to network G G is the aggregate proximity of their orthologs in G ψ G (P i, P j ) = ˆσ G (P i, P k )ˆσ G (P j, P l )ψ(p k, P l ) P k,p l V(G ) Here, ˆσ G (P i, P k ) = σ(p i, P k ) σ(p i, P l ) P l V(G )

Approach Projection of Modularity Constructing Module Maps The modularity of a set S V(G) of proteins with respect to network G is defined as the average proximity of all pairs of proteins in S with respect to G 2 ψ G (P i, P j ) µ G (S) = P i,p j S S ( S 1)

Module Maps Approach Constructing Module Maps Once modules are projected on all networks, each network is associated with a module map A module map is a vector with each entry signifying the modularity of the corresponding module in the corresponding network Modules identified in G j : {S j 1, Sj 2,..., Sj m j } For all networks G k, the module map of G k with respect to G j is given by f ij = [µ Gi (S j 1 ) µ G i (S j 2 )... µ G i (S j m j )] If there are K networks, then the complete module map of G k is given by f i = [f i1 f i2... f ik ]

Approach Reconstructing Network Phylogenies Phylogeny Reconstruction Using Module Maps Estimate evolutionary distances based on module maps d(g i, G j ) = 1 f i f j f i f j Once evolutionary distances are available, reconstruct a phylogenetic tree using traditional algorithms Neighbor Joining (Saitou & Nei, Mol. Biol. Evol., 1987) Handling noise and missing data Estimate the evolutionary distances between two networks based on only their { module maps with respect to each other ˆd(G i, G j ) = min 1 f ii f ji f ii f ji, 1 f } jj f ij f jj f ij

Outline Results 1 Background & Motivation Why Network Phylogenetics? 2 Approach Modularity Based Phylogenetic Analysis Identifying Modular Network Components Constructing Module Maps Reconstructing Network Phylogenies 3 Results Performance on Simulated Networks Performance on Real Data 4 Conclusion Acknowledgments university-logo

Results Simulation of Network Evolution Performance on Simulated Networks Generate multiple networks based on theoretical models of network evolution Duplication/Mutation/Complementation (DMC) model (Middendorf et al., PNAS, 2005) Generalized models reconstruct the properties (degree distribution, clustering coefficient distribution) of extant networks (Bebek et al., Theo. Comp. Sci., 2006)

Results Evolving Multiple Networks Performance on Simulated Networks Generate multiple networks based on the DMC model Speciation: A network is duplicated, each copy evolves independently... Duplication Mutation Complementation Speciation... Using the proposed framework, construct a phylogenetic tree based on the generated networks Performance evaluation, comparison of different algorithms, adjustment of parameters university-logo

Performance Measures Results Performance on Simulated Networks Comparison of the topologies of underlying and reconstructed phylogenetic trees Symmetric Difference: Number of partitions that are on one tree and not the other (Robinson & Foulds, Math. Biosci., 1981) Comparison of the actual and estimated evolutionary distances Nodal Distance: Sum of distances of all node pairs in each tree (Bluis et al., IEEE BIBE, 2003) Statistical significance Random Method: Use random groups of proteins of the same size instead of identified modules Compute p-values via t-test on repeated runs

Results Reconstructing Known Phylogeny Performance on Simulated Networks Underlying Tree Reconstructed Tree MOPHY reconstructs the topology of the underlying phylogeny perfectly Nodal distance: 5.12 (p < 0.002)

Effect of Parameters Results Performance on Simulated Networks Most Specific Modules Diameter 2 3 4 Coverage MOPHY Random p< MOPHY Random p < MOPHY Random p < 20% 1.6 11.2 0.004 1.6 11.2 0.004 1.6 12.0 0.003 40% 1.6 12.0 0.001 1.6 10.8 0.009 1.6 12.0 0.001 60% 1.6 11.2 0.002 1.6 11.6 0.004 1.6 11.6 0.002 Most Comprehensive Modules Diameter 2 3 4 Coverage MOPHY Random p < MOPHY Random p < MOPHY Random p < 20% 2.4 11.6 0.005 2.8 10.8 0.009 4.4 10.8 0.012 40% 2.8 12.0 0.004 2.8 10.8 0.003 4.4 10.4 0.018 60% 1.6 10.8 0.006 2.4 11.2 0.003 3.6 10.0 0.005 Performance difference between MOPHY and random method is consistently significant Conservation of proximity between modular groups of proteins captures evolutionary histories better than that of arbitrary proteins university-logo

Effect of Noise Results Performance on Simulated Networks Once networks are generated, we add noise to each network by randomly swapping interactions 16 15 MOPHY random 14 13 Nodal Distance 12 11 10 9 p=0.0003 p=0.0027 p=0.007 p=0.0059 8 p=0.0001 7 6 0 10 20 30 40 50 Noise (%) university-logo

Results Performance on Real Data Comparison to Existing Algorithms Sequence-based Phylogenetic Tree MOPHY RDL (Chor & Tuller, 2006) university-logo

Outline Conclusion 1 Background & Motivation Why Network Phylogenetics? 2 Approach Modularity Based Phylogenetic Analysis Identifying Modular Network Components Constructing Module Maps Reconstructing Network Phylogenies 3 Results Performance on Simulated Networks Performance on Real Data 4 Conclusion Acknowledgments university-logo

Conclusion Remarks & Ongoing Work Success in phylogenetic tree reconstruction can be seen as proof of concept We have worked on "rows" of the module map so far What do we learn if we work on columns? How can we trace evolutionary histories of modules? Find modules that map to each other, co-evolve, or complement each other Evolutionary trajectories can be used to identify hierarchical module families and construct module ontologies

Thanks... Conclusion Acknowledgments CWRU Sinan Erten, Xin Li, Gurkan Bebek, Jing Li Purdue Ananth Grama UC-San Diego Shankar Subramaniam