Interaction Network Analysis


CSI/BIF 5330 Interaction Network Analysis
Young-Rae Cho, Associate Professor
Department of Computer Science, Baylor University

Biological Networks
Definition
- Maps of biochemical reactions, interactions, and regulations between genes or proteins
Importance
- Show the mechanisms of molecular function in a cell
- Key resource for functional characterization of molecules in a systematic view
Examples
- Metabolic networks
- Protein-protein or genetic interaction networks
- Gene regulatory networks
- Signal transduction networks

Overview
- Backgrounds
- Network Modeling
- Interaction / Function Prediction
- Protein Complex / Functional Module Detection
- Signaling Pathway / Functional Pathway Prediction
- Essential Protein Selection
- Conserved System Identification

Protein-Protein Interactions (1)
Protein Structures
- β-strand, α-helix
- Protein-protein interactions: monomer, dimer, trimer, ..., l-mer

Protein-Protein Interactions (2)
Protein-Protein Interactions
- Physical interactions to form protein complexes
- Functional relationships to perform the same molecular functions
Interactome
- The entire set of protein-protein interactions at the genome scale
- Problem: large scale
- Requires systematic, computational analysis

Protein Interaction Networks
Interaction Networks
- Undirected, unweighted graph representation with a set of nodes V as proteins and a set of edges E as interactions between them
- Problem: complex connectivity
Example
- [Figure: example protein interaction network]

Interaction Determination (1)
Types of Interactions
- Permanent, stable interactions
- Transient interactions
Experimental Methods
- Mass spectrometry: identification of components in protein complexes
- Two-hybrid system: determination of binary protein-protein interactions
  - Problem: a large number of false positives
- Protein microarray

Interaction Determination (2)
Computational Methods
- Sequence-based approaches
  - Phylogenetic profile analysis
  - Gene fusion analysis
  - Gene neighborhood analysis
- Structure-based approaches
  - Homology search
  - Interface similarity search
- Expression-based approaches
  - Expression profile correlation
- Interaction-based approaches
  - Interaction prediction from known interactions

Random Graph Model (1)
Erdős and Rényi, 1960 (E-R Model)
- Random graph: N nodes connected by m edges that are randomly chosen from the N(N-1)/2 possible edges
- $m = p\,[N(N-1)/2]$, where p is the probability of each pair of nodes being connected
Degree distribution
- Degree of v: the number of links from v to other nodes
- Degree distribution P(k): probability that a node has k links
- $P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}$
- Expected number of nodes with degree k: $E(X_k) = N P(k) = N \binom{N-1}{k} p^k (1-p)^{N-1-k} = \lambda_k$
- $P(X_k = r) = e^{-\lambda_k} \lambda_k^r / r!$ (Poisson distribution)
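The degree distribution of the E-R model can be checked numerically. The sketch below is only an illustration: the function names, the fixed random seed, and the use of the standard Poisson approximation with mean degree p(N-1) are assumptions, not part of the slides.

```python
import math
import random
from collections import Counter


def er_graph(n, p, seed=0):
    """Sample an E-R random graph: each of the n(n-1)/2 node pairs is linked with probability p."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def binomial_pk(n, p, k):
    """Exact degree distribution P(k) = C(n-1, k) p^k (1-p)^(n-1-k)."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pk(n, p, k):
    """Poisson approximation of P(k), using the mean degree p(n-1)."""
    mean = p * (n - 1)
    return math.exp(-mean) * mean**k / math.factorial(k)


def degree_fractions(adj):
    """Observed fraction of nodes having each degree."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}
```

Calling `degree_fractions(er_graph(1000, 0.0015))` and comparing each entry with `binomial_pk(1000, 0.0015, k)` shows how closely a single sampled graph follows the predicted distribution.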

Random Graph Model (2)
Example of E-R Model
- Poisson distribution with N = 1000 and p = 0.0015

Scale-Free Network Model (1)
Barabási and Albert, 1999 (B-A Model)
- Focused on network dynamics based on two steps:
  - Growth: networks are continuously expanded by the addition of new nodes, each with a link to nodes already present
  - Preferential attachment: new nodes are likely to be linked to high-degree nodes
- Power-law degree distribution: $P(k) \sim k^{-\gamma}$ where $2 < \gamma < 3$
Features
- A very few high-degree nodes and many low-degree nodes: hub-oriented network structure
- Very low average shortest path length: small-world network
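The two steps of the B-A model, growth and preferential attachment, can be sketched as follows. The starting core of m+1 fully connected nodes, the parameter m (links per new node), and the fixed seed are assumptions made for this illustration.

```python
import random


def ba_graph(n, m=1, seed=0):
    """Grow a graph to n nodes; each new node links to m existing nodes chosen in proportion to degree."""
    rng = random.Random(seed)
    # Assumed starting point: a small complete core of m + 1 nodes.
    adj = {v: set() for v in range(m + 1)}
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` picks a node with probability proportional to its degree.
    stubs = [v for v, neighbors in adj.items() for _ in neighbors]
    for new in range(m + 1, n):  # growth step
        targets = set()
        while len(targets) < m:  # preferential attachment step
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj
```

Sorting the node degrees of `ba_graph(2000, m=2)` in descending order typically shows a handful of hubs followed by a long tail of low-degree nodes.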

Scale-Free Network Model (2)
Example of B-A Model
- Power-law degree distribution with the best fit of γ = 2.9 (dashed line)

Modular Network Model
Modular Networks
- Verified by a high average clustering coefficient
- Density of a graph G(V,E): the number of actual edges over the number of all possible edges, $D(G) = 2|E| / (|V|(|V|-1))$
- Clustering coefficient of a node v: the density of the subgraph formed by the neighbors of v
- Dense intra-connections among the nodes in the same module
- Sparse inter-connections between nodes in different modules
- [Figure labels: bridge, peripheral nodes, modular nodes]
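Density and the clustering coefficient follow directly from these definitions. In this minimal sketch the graph is assumed to be stored as a dictionary mapping each node to the set of its neighbors, and a node with fewer than two neighbors is given a coefficient of 0 by convention.

```python
from itertools import combinations


def density(nodes, adj):
    """D = 2|E| / (|V|(|V|-1)) for the subgraph induced by `nodes`."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0  # convention chosen here for degenerate subgraphs
    links = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return 2 * links / (n * (n - 1))


def clustering_coefficient(v, adj):
    """Density of the subgraph formed by the neighbors of v."""
    return density(adj[v], adj)


def average_clustering(adj):
    """Mean clustering coefficient over all nodes of the graph."""
    return sum(clustering_coefficient(v, adj) for v in adj) / len(adj)
```

For a triangle a, b, c with one extra node d attached to c, `clustering_coefficient("a", adj)` is 1 because the two neighbors of a are linked, while the value for c is lower because d is connected to neither a nor b.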

Hierarchical Network Model
Hierarchical Networks
- Integration of scale-free topology with modular structure
- Hierarchy of modules is controlled by hubs
Clustering coefficient distribution C(k)
- Scale-free network and modular network: C(k) is independent of degree k
- Hierarchical network: $C(k) \sim k^{-1}$
Schematic View
- [Figure panels: scale-free networks, modular networks, hierarchical networks]

Research Topics
- Function Prediction
- Protein Complex Detection
- Conserved Pattern Identification
- Essential Protein Selection
- Signaling Pathway Prediction

Phylogenetic Profile Analysis (1)
Main idea
- If two proteins in one organism have orthologs in the same other organisms, they are likely to interact with each other and be functionally linked
Genomes
- E. coli (EC): P1 P2 P3 P4 P5 P6 P7
- S. cerevisiae (SC): P1 P2 P4 P5 P7
- B. subtilis (BS): P2 P3 P5 P6 P7
- H. influenzae (HI): P1 P3 P5 P6
Phylogenetic profile (presence of each EC protein in the other genomes)
  EC    SC  BS  HI
  P1     1   0   1
  P2     1   1   0
  P3     0   1   1
  P4     1   0   0
  P5     1   1   1
  P6     0   1   1
  P7     1   1   0
Profile clusters
  P4        1 0 0
  P1        1 0 1
  P3, P6    0 1 1
  P2, P7    1 1 0
  P5        1 1 1
Conclusion
- P2 and P7 are functionally linked
- P3 and P6 are functionally linked

Phylogenetic Profile Analysis (2)
Process
- Input: a pair of proteins
- Find orthologs across species
- Build scoring matrices
- Calculate the correlation between the two proteins
[Figure: sequence alignment, scoring matrix, Pearson correlation]
Scoring matrix of protein A (orthologs A1 to A5)
        A2  A3  A4  A5
  A1     3   4   6   6
  A2         4   6   6
  A3             5   6
  A4                 1
Scoring matrix of protein B (orthologs B1 to B5)
        B2  B3  B4  B5
  B1     2   3   6   7
  B2         4   5   6
  B3             6   6
  B4                 1
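Both parts of the analysis can be reproduced with a few lines. In this sketch the profiles are the SC, BS, HI presence bits from the first slide, and the two score lists are the upper-triangular entries of the scoring matrices for proteins A and B read row by row; that reading of the flattened figure, and all function names, are assumptions.

```python
import math
from collections import defaultdict

# Presence (1) or absence (0) of each E. coli protein in SC, BS, HI.
PROFILES = {
    "P1": (1, 0, 1), "P2": (1, 1, 0), "P3": (0, 1, 1), "P4": (1, 0, 0),
    "P5": (1, 1, 1), "P6": (0, 1, 1), "P7": (1, 1, 0),
}

# Upper-triangular scores of the two matrices (assumed row-by-row reading).
SCORES_A = [3, 4, 6, 6, 4, 6, 6, 5, 6, 1]
SCORES_B = [2, 3, 6, 7, 4, 5, 6, 6, 6, 1]


def linked_groups(profiles):
    """Group proteins that share an identical phylogenetic profile."""
    clusters = defaultdict(list)
    for protein, profile in profiles.items():
        clusters[profile].append(protein)
    return sorted(sorted(members) for members in clusters.values() if len(members) > 1)


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

`linked_groups(PROFILES)` recovers the two pairs named in the conclusion, and `pearson(SCORES_A, SCORES_B)` gives the correlation used to score the protein pair in the second slide.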

Interaction-based Local Analysis
Majority-Based Method
- Assigning the majority of functions of interacting partners to the unknown gene
- Called guilt-by-association
- [Figure: protein A (unknown) interacts with B, C, D and E, which are annotated with the function sets {f1, f2, f4}, {f2, f3}, {f1, f2, f4, f5} and {f2, f5, f6}]

Interaction-based Global Analysis
Extension of Majority-Based Method
- Assigning functions of all unknown genes in an interaction network
- Requires optimization
- [Figure: network of proteins A to H in which A, B and C are unknown and the other proteins carry the function sets {f2, f3}, {f1, f2, f4, f5, f6}, {f4, f5, f6}, {f5, f6, f7} and {f3, f5}]
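A minimal sketch of the local, majority-based rule is given below. The threshold (a function must be carried by more than half of the annotated partners) and the assignment of the four function sets to B, C, D and E are assumptions, since the figure does not survive in the transcript.

```python
from collections import Counter


def predict_functions(protein, adj, annotations, min_fraction=0.5):
    """Assign to `protein` every function carried by more than `min_fraction` of its annotated partners."""
    partners = [n for n in adj[protein] if annotations.get(n)]
    if not partners:
        return set()
    counts = Counter(f for n in partners for f in annotations[n])
    return {f for f, c in counts.items() if c / len(partners) > min_fraction}


# Hypothetical layout of the local-analysis example: A is the unknown protein.
ADJ = {"A": {"B", "C", "D", "E"}, "B": {"A"}, "C": {"A"}, "D": {"A"}, "E": {"A"}}
ANNOTATIONS = {
    "B": {"f1", "f2", "f4"},
    "C": {"f2", "f3"},
    "D": {"f1", "f2", "f4", "f5"},
    "E": {"f2", "f5", "f6"},
}
```

With these inputs, `predict_functions("A", ADJ, ANNOTATIONS)` keeps only the function shared by most partners. The global variant on the following slide needs an optimization over all unknown proteins at once and is not covered by this sketch.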

Common Neighbor Analysis
Ratio of Common Interacting Partners
- If two proteins have many common interacting partners, they are likely to interact with each other
- In the formulas below, N(x) denotes the set of interacting partners of protein x
- Jaccard coefficient: $S(x,y) = \frac{|N(x) \cap N(y)|}{|N(x) \cup N(y)|}$
- Geometric coefficient: $S(x,y) = \frac{|N(x) \cap N(y)|^2}{|N(x)| \cdot |N(y)|}$
- Dice coefficient: $S(x,y) = \frac{2\,|N(x) \cap N(y)|}{|N(x)| + |N(y)|}$
- Simpson coefficient: $S(x,y) = \frac{|N(x) \cap N(y)|}{\min(|N(x)|, |N(y)|)}$
- Hyper-geometric coefficient (P-value): [formula not recoverable]
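The four set-based coefficients are simple to compute from neighbor sets. The sketch below uses their standard definitions; the function names and the convention of returning 0 when a denominator is zero are assumptions.

```python
def jaccard(nx, ny):
    """|N(x) & N(y)| / |N(x) | N(y)|"""
    union = len(nx | ny)
    return len(nx & ny) / union if union else 0.0


def geometric(nx, ny):
    """|N(x) & N(y)|^2 / (|N(x)| * |N(y)|)"""
    denom = len(nx) * len(ny)
    return len(nx & ny) ** 2 / denom if denom else 0.0


def dice(nx, ny):
    """2 |N(x) & N(y)| / (|N(x)| + |N(y)|)"""
    denom = len(nx) + len(ny)
    return 2 * len(nx & ny) / denom if denom else 0.0


def simpson(nx, ny):
    """|N(x) & N(y)| / min(|N(x)|, |N(y)|)"""
    denom = min(len(nx), len(ny))
    return len(nx & ny) / denom if denom else 0.0
```

All four return 1 for identical non-empty neighbor sets and 0 for disjoint ones; they differ in how strongly they penalize a size mismatch between the two neighborhoods, with Simpson being the most lenient.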

Clustering Interaction Networks
Protein Complex
- A group of proteins having physical interactions at the same place and the same time
Functional Module
- A group of proteins having the same function
- A group of proteins having interactions, even at different places and different times
Protein Complex / Functional Module Detection
- Clustering interaction networks by graph clustering algorithms
- Detecting densely connected subgraphs
- Grouping densely connected proteins and adding peripheral proteins

Maximum Clique Algorithm
- Find all maximum-sized cliques
- Use the anti-monotonic property: if a subset of set A is not a clique, then set A is not a clique
- Example: size-2 cliques {AB}, {AC}, {AE}, ...; size-3 cliques {ABC}, {ACE}, ...; size-4 cliques {JKLM}
- [Figure: example graph with vertices A, B, C, D, E, F, G, I, J, K, L, M]
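The anti-monotonic property suggests a level-wise search: every clique of size k+1 must extend a clique of size k, so only existing cliques need to be grown. The sketch below is one way to write this, not necessarily the procedure used in the lecture; the graph is assumed to be a dictionary of neighbor sets without self-loops.

```python
def cliques_by_size(adj):
    """Return {k: set of k-cliques}, built level by level from the edges upward."""
    level = {frozenset((u, v)) for u in adj for v in adj[u]}  # size-2 cliques
    levels = {}
    k = 2
    while level:
        levels[k] = level
        larger = set()
        for clique in level:
            # A vertex extends the clique only if it is adjacent to every member.
            common = set.intersection(*(adj[v] for v in clique))
            for w in common:
                larger.add(clique | {w})
        level = larger
        k += 1
    return levels


def maximum_cliques(adj):
    """Cliques of the largest size found in the graph."""
    levels = cliques_by_size(adj)
    return levels[max(levels)] if levels else set()
```

The search stops as soon as no clique can be extended, so the last non-empty level holds the maximum-sized cliques. The number of intermediate cliques can still grow very quickly on dense graphs.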

Clique Percolation
Definitions
- Two k-cliques are adjacent if they share k-1 vertices, where k is the number of vertices in each clique
- A k-clique chain is a subgraph comprising the union of a sequence of adjacent k-cliques
Algorithm
1. Find all k-cliques
2. Find all maximal k-clique chains by iteratively merging adjacent k-cliques
Reference
- Palla, G., et al., "Uncovering the overlapping community structure of complex networks in nature and society", Nature, 2005

Hierarchical Approaches
Bottom-Up (Agglomerative) Approaches
- Start with each vertex as a cluster
- Iteratively merge the closest clusters
- Require a distance function between two clusters
Top-Down (Divisive) Approaches
- Start with the whole graph as a cluster
- Recursively divide up the clusters
- Require a cutting algorithm
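The two steps of clique percolation can be sketched as follows. This illustration enumerates k-cliques exhaustively, which is only practical for small graphs, and uses a union-find structure to merge adjacent cliques; both choices are assumptions rather than the method of the cited paper.

```python
from itertools import combinations


def k_cliques(adj, k):
    """Step 1: all k-cliques, found by testing every k-subset of nodes (small graphs only)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def clique_percolation(adj, k):
    """Step 2: merge adjacent k-cliques (sharing k-1 vertices) into maximal chains."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    chains = {}
    for i, clique in enumerate(cliques):
        chains.setdefault(find(i), set()).update(clique)
    return sorted(sorted(members) for members in chains.values())
```

Because a vertex can belong to k-cliques in different chains, the resulting communities may overlap, which is the property that makes the approach attractive for proteins participating in several complexes.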

Merging by Shortest Path Length
Main Idea
- Agglomerative approach using single-link distance
Algorithm
1. Select the two closest vertices from different clusters based on the shortest path length between them
2. Merge the two clusters that include the selected vertices
3. Repeat 1 and 2 until the shortest path length reaches a threshold

Merging by Common Neighbors (1)
Main Idea
- Agglomerative approach using similarity based on common neighbors
- The more common neighbors two vertices share, the more similar they are
Algorithm
1. Find the most similar vertices from different clusters based on a similarity function
2. Merge the two clusters if the merged cluster reaches a density threshold
3. Repeat 1 and 2 until no more clusters can be merged
Similarity Functions (N(x) denotes the set of neighbors of vertex x)
- Jaccard coefficient: $S(x,y) = \frac{|N(x) \cap N(y)|}{|N(x) \cup N(y)|}$
- Geometric coefficient: $S(x,y) = \frac{|N(x) \cap N(y)|^2}{|N(x)| \cdot |N(y)|}$

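Merging by Common Neighbors (2)
More Similarity Functions (N(x) denotes the set of neighbors of vertex x)
- Dice coefficient: $S(x,y) = \frac{2\,|N(x) \cap N(y)|}{|N(x)| + |N(y)|}$
- Simpson coefficient: $S(x,y) = \frac{|N(x) \cap N(y)|}{\min(|N(x)|, |N(y)|)}$
- Maryland bridge coefficient: $S(x,y) = \frac{1}{2}\left(\frac{|N(x) \cap N(y)|}{|N(x)|} + \frac{|N(x) \cap N(y)|}{|N(y)|}\right)$
Reference
- Brun, C., et al., "Clustering proteins from interaction networks for the prediction of cellular functions", BMC Bioinformatics, 2004

Minimum Cut
Definitions
- Cut: a set of edges whose removal disconnects the graph
- Minimum cut: a cut with the minimum number of edges
Algorithm
- Recursively find the minimum cut
Parameters
- Minimum density threshold
- Minimum size threshold
[Figure: example graph with vertices A, B, C, D, E, F, G, I, J, K, L, M]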
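Recursive minimum-cut clustering with the two thresholds can be sketched as below. To keep the example short, the minimum cut is found by exhaustive enumeration of bipartitions, which is exponential and only suitable for very small graphs; a polynomial method such as the Stoer-Wagner algorithm would replace it in practice. Function names and default thresholds are assumptions.

```python
from itertools import combinations


def density(nodes, adj):
    """2|E| / (|V|(|V|-1)) of the subgraph induced by `nodes`; 1.0 for fewer than two nodes."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 1.0
    links = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return 2 * links / (n * (n - 1))


def minimum_cut(nodes, adj):
    """Smallest edge set separating `nodes` into two non-empty sides (brute force)."""
    nodes = sorted(nodes)
    inside = set(nodes)
    first, rest = nodes[0], nodes[1:]
    best_cut, best_side = None, None
    for size in range(len(rest)):  # the side holding `first` never takes every node
        for extra in combinations(rest, size):
            side = {first, *extra}
            cut = [(u, v) for u in side for v in adj[u] if v in inside and v not in side]
            if best_cut is None or len(cut) < len(best_cut):
                best_cut, best_side = cut, side
    return best_cut, best_side


def min_cut_clusters(nodes, adj, min_density=0.7, min_size=3):
    """Split recursively until each part is dense enough; drop parts below min_size."""
    nodes = set(nodes)
    if len(nodes) < min_size:
        return []
    if density(nodes, adj) >= min_density:
        return [sorted(nodes)]
    _, side = minimum_cut(nodes, adj)
    return (min_cut_clusters(side, adj, min_density, min_size)
            + min_cut_clusters(nodes - side, adj, min_density, min_size))
```

On two triangles joined by a single edge, the first cut removes that bridging edge and both triangles are returned as clusters, since each already meets the density threshold.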

Betweenness Cut
Betweenness
- Measurement of vertices or edges located between clusters
Algorithm
1. Iteratively eliminate the vertex or edge with the highest betweenness value until the graph is separated
2. Recursively apply 1 to each subgraph
3. Repeat 1 and 2 until all subgraphs reach a density threshold
Reference
- Dunn, R., et al., "The use of edge-betweenness clustering to investigate biological function in protein interaction networks", BMC Bioinformatics, 2005

Seed Growth
Main Idea
- Search for local optimization
- Use a modularity or density function
Types of seeds
- Random seeds: selected randomly
- Core seeds: selected by degree or clustering coefficient
Algorithm
1. Select a seed as an initial cluster S
2. Add a neighbor to S repeatedly to find the maximum modularity or density
3. Return S if the modularity of S > threshold
4. Repeat 1, 2 and 3 to find a set of clusters
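A simplified variant of seed growth is sketched below. It uses density as the objective and stops as soon as the best available neighbor would push the cluster below the threshold, which is a slightly different stopping rule from step 3 of the slide; the function names and threshold value are assumptions.

```python
from itertools import combinations


def density(nodes, adj):
    """2|E| / (|V|(|V|-1)) of the subgraph induced by `nodes`; 1.0 for fewer than two nodes."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 1.0
    links = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return 2 * links / (n * (n - 1))


def grow_from_seed(seed, adj, min_density=0.8):
    """Greedily add the neighbor that keeps the cluster densest, while density stays >= min_density."""
    cluster = {seed}
    while True:
        frontier = set().union(*(adj[v] for v in cluster)) - cluster
        best, best_density = None, 0.0
        for candidate in sorted(frontier):  # sorted for a deterministic tie-break
            d = density(cluster | {candidate}, adj)
            if d > best_density:
                best, best_density = candidate, d
        if best is None or best_density < min_density:
            return cluster
        cluster.add(best)
```

Running `grow_from_seed` from several seeds (random seeds, or core seeds picked by degree or clustering coefficient) yields a set of possibly overlapping clusters.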

Graph Entropy (1)
- An example of seed-growth algorithms
General notation
- Inner links of v in G'(V',E'): edges from v to the vertices in V'
- $p_i(v)$: probability of v having inner links
- Outer links of v in G'(V',E'): edges from v to the vertices not in V'
- $p_o(v)$: probability of v having outer links
Definitions
- Vertex entropy: $e(v) = -p_i(v) \log_2 p_i(v) - p_o(v) \log_2 p_o(v)$
- Graph entropy: $e(G(V,E)) = \sum_{v \in V} e(v)$
- Find the minimum graph entropy during seed growth

Graph Entropy (2)
Example
- [Figure: worked example of graph entropy]
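Vertex and graph entropy can be computed as follows. The sketch assumes that p_i(v) is the fraction of v's links that lead into the cluster V' and that p_o(v) = 1 - p_i(v); terms with probability 0 contribute nothing to the sum.

```python
import math


def vertex_entropy(v, cluster, adj):
    """e(v) = -p_i log2 p_i - p_o log2 p_o, where p_i is the share of v's links that go into `cluster`."""
    if not adj[v]:
        return 0.0
    p_in = len(adj[v] & cluster) / len(adj[v])
    p_out = 1.0 - p_in
    return -sum(p * math.log2(p) for p in (p_in, p_out) if p > 0)


def graph_entropy(cluster, adj):
    """Sum of the vertex entropies over all vertices of the graph."""
    cluster = set(cluster)
    return sum(vertex_entropy(v, cluster, adj) for v in adj)
```

A vertex whose links all stay inside the cluster, or all stay outside it, has entropy 0, so a cluster boundary that cuts few links gives a low graph entropy. This is why the seed-growth step keeps a change only when it lowers the total.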

Graph Entropy (3)
Algorithm
1. Select a seed node, and include all neighbors of the seed node in a seed cluster
2. Iteratively remove a neighbor if its removal decreases graph entropy
3. Iteratively add a node on the outer boundary of the current cluster if its addition decreases graph entropy
4. Output the cluster with the minimal graph entropy
5. Repeat 1, 2, 3, and 4 until no seed node remains
Reference
- Kenley, E.C. and Cho, Y.-R., "Detecting protein complexes and functional modules from protein interaction networks: a graph entropy approach", Proteomics, 2011

Predicting Signaling Pathways
Signaling Pathway
- A series of proteins having signaling and response relationships
Signaling Network
- A combined form of linear signaling pathways
- A directed acyclic graph
Signaling Pathway / Signaling Network Prediction
- Given starting and ending nodes, searching for the strongest paths
- Combining the strongest paths to form a signaling network

Path Frequency-Based Approach
Main Idea
- Measuring path frequency towards the target on a PPI network
- Applying a greedy algorithm to repeatedly select the link making the most paths to the target
Frequent Path Mining
- Given a path prefix l, computes the support of the association rule $l \rightarrow v_i$

Network Motif-Based Approach
Main Idea
- Mapping network motifs into a PPI network
- Applying a greedy algorithm to repeatedly select the link that fits the most network motifs
Network Motif Mapping
- Calculates motif occurrence scores by counting the matches
- Computes motif strength by summing the weights of the matches

Measuring Essentiality of Proteins
Essential Genes / Proteins
- Functional core genes or proteins in a functional module
- Hubs in an interaction network
- Significance in biomedical applications: drug target detection
Measurement of Essentiality of Genes / Proteins
- Local centrality measures: degree, clustering coefficient
- Global centrality measures: closeness, betweenness

Local Centrality
Clustering Coefficient of $v_i$
- The density of a subgraph G'(V',E') where V' is the set of neighbors of $v_i$
- $C(v_i) = \frac{2|E'|}{|V'|(|V'|-1)}$, with $E' = \{(v_j, v_k) \in E : v_j, v_k \in V'\}$
- Measuring the effectiveness of $v_i$ on denseness
Average Clustering Coefficient of G(V,E)
- Average of the clustering coefficients of all vertices in V
- Maximum is 1
- Measuring the modularity of G

Global Centrality
Closeness, $C_c(v_i)$
- Detects the vertices located in the center of a graph
- $C_c(v_i) = \frac{1}{\sum_{v_j \in V} p_s(v_i, v_j)}$, where $p_s(v_i, v_j)$ is the shortest path length between $v_i$ and $v_j$
Betweenness, $C_B(v_i)$
- Detects the vertices located between two clusters
- $C_B(v_i) = \sum_{s \neq v_i \neq t \in V} \frac{\sigma_{st}(v_i)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths between s and t, and $\sigma_{st}(v_i)$ is the number of shortest paths between s and t that pass through the vertex $v_i$
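Both global measures can be computed with breadth-first search on an unweighted graph. The sketch below uses Brandes' accumulation scheme for betweenness, a standard technique that the slides do not name. It counts each unordered pair (s, t) once and ignores unreachable vertices in closeness; both conventions are assumptions.

```python
from collections import deque


def closeness(v, adj):
    """1 / (sum of shortest path lengths from v to every reachable vertex)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return 1 / total if total else 0.0


def betweenness(adj):
    """Shortest-path betweenness of every vertex (Brandes' algorithm, unweighted)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies from the farthest vertices back
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair was visited from both ends.
    return {v: b / 2 for v, b in score.items()}
```

On a path a-b-c the middle vertex lies on the only shortest path between the two ends, so it receives all of the betweenness, and it also has the highest closeness.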

Network Alignment
Main Idea & Goal
- Aligning two or more evolutionarily distant interaction networks to identify evolutionarily conserved connection patterns
- Measure sequence similarity between molecules (orthologs), AND topological similarity between interaction networks

Sequence Alignment vs. Network Alignment
Sequence Alignment
- Aligning two or more sequences
- Searches matches (identical letters), mismatches (non-identical letters), and gaps
- Returns an alignment in a two-row representation including gaps
Network Alignment
- Aligning two or more networks
- Searches matches (orthologs), mismatches (non-orthologs), and gaps
- Returns an alignment network having ortholog pairs as nodes and conserved interactions as edges
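The alignment network described above can be built directly once ortholog pairs are known. This minimal sketch handles only exact matches: a conserved interaction is an edge present between the corresponding proteins in both networks. Mismatches and gaps are not modeled, and the function and parameter names are assumptions.

```python
from itertools import combinations


def alignment_network(adj1, adj2, orthologs):
    """Nodes are ortholog pairs (a, b); two pairs are linked when a1-a2 and b1-b2 both interact."""
    nodes = [(a, b) for a, b in orthologs if a in adj1 and b in adj2]
    edges = set()
    for (a1, b1), (a2, b2) in combinations(nodes, 2):
        if a1 != a2 and b1 != b2 and a2 in adj1[a1] and b2 in adj2[b1]:
            edges.add(((a1, b1), (a2, b2)))
    return nodes, edges
```

Connected components or dense regions of the resulting network are candidates for conserved sub-networks. A protein with several orthologs simply appears in several nodes, which is one source of the optimization problem raised among the technical issues.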

Issues in Network Alignment
Technical Issues
- How to map two or more networks to detect a common sub-network
- How to optimize the alignment network for multiple orthologs
- How to improve the efficiency of network alignment
Network Alignment Types
- Global network alignment
  - Aligning two or more entire networks
- Local network alignment
  - Detecting maximal, strongly conserved sub-networks

Questions?
Lecture slides are found on the course website, web.ecs.baylor.edu/faculty/cho/5330