CSI/BIF 5330 Interaction etwork Analsis Young-Rae Cho Associate Professor Department of Computer Science Balor Universit Biological etworks Definition Maps of biochemical reactions, interactions, regulations between genes or proteins Importance Show the mechanisms of molecular function in a cell Ke resource for functional characterization of molecules in a sstematic view Eamples Metabolic networks Protein-protein or genetic interaction networks Gene regulator networks Signal transduction networks 1
Overview Backgrounds etwork Modeling Interaction / Function Prediction Protein Comple / Functional Module Detection Signaling Pathwa / Functional Pathwa Prediction Essential Protein Selection Conserved Sstem Identification Protein-Protein Interactions 1 Protein Structures β-strand α-heli protein-protein interactions monomer, dimer, trimer,, l-mer 2
Protein-Protein Interactions 2 Protein-Protein Interactions Phsical interactions to form protein complees Functional relationships to perform the same molecular functions Interactome The entire set of protein-protein interactions in the genome scale Problem: large scale Requires sstematic, computational analsis Protein Interaction etworks Interaction etworks Undirected, unweighted graph representation with a set of nodes V as proteins and a set of edges E as interactions between them Problem: comple connectivit Eample 3
Interaction Determination 1 Tpes of Interactions Permanent stable interactions Transient interactions Eperimental Methods Mass spectrometr Identification of components in protein complees Two-hbrid sstem Determination of binar protein-protein interactions Problem: a large number of false positives Protein microarra Interaction Determination 2 Computational Methods Sequence-based approaches Phlogenetic profile analsis Gene fusion analsis Gene neighborhood analsis Structure-based approaches Homolog search Interface similarit search Epression-based approaches Epression profile correlation Interaction-based approaches Interaction prediction from known interactions 4
Overview Backgrounds etwork Modeling Interaction / Function Prediction Protein Comple / Functional Module Detection Signaling Pathwa / Functional Pathwa Prediction Essential Protein Selection Conserved Sstem Identification Random Graph Model 1 Erdós and Réni, 1960 E-R Model Random graph as nodes connected b m edges that are randoml chosen from -1/2 possible edges m = p[-1/2] where p is the probabilit of each pair of nodes being connected -1 Degree distribution Pk = p k 1-p -1-k k Degree of v: the number of links from v to other nodes Degree distribution Pk: probabilit that a node has k links Epected number of nodes with degree k EX k = Pk = -1 k p k 1-p -1-k = λ k PX k = r = e -λk λ r k / r! Poisson distribution 5
Random Graph Model 2 Eample of E-R Model Poisson distribution with = 1000 and p = 0.0015 Scale-Free etwork Model 1 Barabasi and Albert, 1999 B-A Model Focused on network dnamics based on these two steps: Growth: networks are continuousl epanded b the addition of new nodes with a link to the nodes alread present Preferential attachment: the nodes are likel to be linked to high-degree nodes -γ Power law degree distribution: Pk ~ k where 2 < < 3 Features A ver few high-degree nodes and man low-degree nodes hub-oriented network structure Ver low average shortest path length small-world network 6
Scale-Free etwork Model 2 Eample of B-A Model Power-law degree distribution with the best fit of =2.9 on the dashed line Modular etwork Model Modular etworks Verified b high average clustering coefficient Densit of a graph GV,E: the number of actual edges over the number of all possible edges, DG = 2E / VV-1 Clustering coefficient of a node v: the densit of a subgraph with neighbors of v Dense intra-connections among the nodes in the same modules Sparse inter-connections between two nodes in different modules bridge peripheral nodes modular nodes 7
Hierarchical etwork Model Hierarchical etworks Integrated of scale-free topolog with modular structure Hierarch of modules is controlled b hubs Clustering coefficient distribution C Scale-free network & Modular network: C is independent of degree k Hierarchical network: C ~ k -1 Schematic View Scale-free etworks Modular etworks Hierarchical etworks 8
Overview Backgrounds etwork Modeling Interaction / Function Prediction Protein Comple / Functional Module Detection Signaling Pathwa / Functional Pathwa Prediction Essential Protein Selection Conserved Sstem Identification Research Topics Function Prediction Protein Comple Detection Conserved Pattern Identification Essential Protein Selection Signaling Pathwa Prediction 9
Phlogenetic Profile Analsis 1 Main idea if two proteins in one organism have orthologs in another organism, the are likel to interact with each other and be functionall linked Genome P1 P2 P3 P4 P1 P2 P4 P5 P6 P7 P5 P7 E. Coli EC S. Cerevisiae SC P2 P3 P5 P6 P7 B. Subtilis BS Conclusion P1 P3 P5 P6 H. Influenzae HI P2 and P7 are functionall linked P3 and P6 are functionall linked P4 1 0 0 P1 1 0 1 Phlogenetic Profile EC SC BS HI P1 1 0 1 P2 1 1 0 P3 0 1 1 P4 1 0 0 P5 1 1 1 P6 0 1 1 P7 1 1 0 Profile Clusters P3 0 1 1 P6 0 1 1 P2 1 1 0 P7 1 1 0 P5 1 1 1 Phlogenetic Profile Analsis 2 Process Input: a pair of proteins Find orthologs across species Build scoring matrices Calculate correlation between two proteins A1 A2 A3 A4 A5 Protein A A1 A2 A3 A4 A5 A1 3 4 6 6 A2 4 6 6 A3 5 6 A4 1 A5 Sequence Alignment Scoring Matri Pearson Correlation A Protein B B1 B2 B3 B4 B5 2 3 6 7 4 5 6 6 6 1 B B1 B2 B3 B4 B5 B1 B2 B3 B4 B5 10
Interaction-based Local Analsis Majorit-Based Method Assigning majorit of functions of interacting partners to the unknown gene Called guilt-b-association f 1, f 2, f 4 f 2, f 3 B C A unknown D E f 1, f 2, f 4, f 5 f 2, f 5, f 6 Interaction-based Global Analsis Etension of Majorit-Based Method Assigning functions of all unknown genes in an interaction network Requires optimization f 2, f 3 D A unknown B unknown E f 1, f 2, f 4, f 5, f 6 F f 4, f 5, f 6 unknown C f 5, f 6, f 7 H G f 3, f 5 11
12 Common eighbor Analsis Ratio of Common Interacting Partners If two proteins have man common interacting partners, the are likel to interact with each other Jaccard coefficient: Geometric coefficient: Dice coefficient: Simpson coefficient: Hper-geometric coefficient P-value:, S, 2 S 2, S, min, S T T Z T Z Z T Z T P Overview Backgrounds etwork Modeling Interaction / Function Prediction Protein Comple / Functional Module Detection Signaling Pathwa / Functional Pathwa Prediction Essential Protein Selection Conserved Sstem Identification
Clustering Interaction etworks Protein Comple A group of proteins having phsical interactions at the same place, same time Funtional Module A group of proteins having the same function A group of proteins having interactions even at different place, different time Protein Comple / Functional Module Detection Clustering interaction networks b graph clustering algorithms Detecting densel connected subgraphs Grouping densel connected proteins and adding peripheral proteins Maimum Clique Algorithm Find all maimum sized cliques Use antimonotonic propert If a subset of set A is not a clique, then the set A is not a clique G A B C D E F J I K size 2 cliques: {AB}, {AC}, {AE},.. size 3 cliques: {ABC}, {ACE},.. size 4 cliques: {JKLM} L M 13
Clique Percolation Definitions Two k-cliques are adjacent if the share k-1 vertices where k is the number of vertices in each clique A k-clique chain is a sub-graph comprising the union of a sequence of adjacent k cliques Algorithm 1 Find all k-cliques 2 Find all maimal k-clique chains b iterative merging adjacent k-cliques Reference Palla, G., et al., Uncovering the overlapping communit structure of comple networks in nature and societ, ature 2005 Hierarchical Approaches Bottom-Up Agglomerative Approaches Start with each verte as a cluster Iterativel merge the closest clusters Require a distance function between two clusters Top-Down Divisive Approaches Start with the whole graph as a cluster Recursivel divide up the clusters Require a cutting algorithm 14
Merging b Shortest Path Length Main Idea Agglomerative approach using single-link distance Algorithm 1 Select two closest vertices from different clusters based on the shortest path length between them 2 Merge two clusters that include the selected vertices 3 Repeat 1 and 2 until the shortest path length reaches a threshold Merging b Common eighbors 1 Main Idea Agglomerative approach using the similarit based on common neighbors More common neighbors two vertices share, more similar the are Algorithm 1 Find the most similar vertices from different clusters based on a similarit function 2 Merge the two clusters if the merged cluster reaches a densit threshold 3 Repeat 1 and 2 until no more clusters can be merged Similarit Functions Jaccard coefficient: S, Geometric coefficient: 2 S, 15
Merging b Common eighbors 2 More Similarit Functions Dice coefficient: Simpson coefficient: Marland bridge coefficient: 2 S, S, min, 1 S, 2 Reference Brun, C., et al., Clustering proteins from interaction networks for the prediction of cellular functions BMC Bioinformatics 2004 Minimum Cut Definitions Cut: a set of edges whose removal disconnects the graph Minimum cut: a cut with minimum number of edges Algorithm Recursivel find the minimum cut B D F G I A Parameter C E J K Minimum densit threshold Minimum size threshold L M 16
Betweenness Cut Betweenness Measurement of vertices or edges located between clusters Algorithm 1 Iterativel eliminate a verte or an edge with the highest Betweenness value until the graph is separated 2 Recursivel appl 1 into each subgraph 3 Repeat 1 and 2 until all subgraphs reach a densit threshold Reference Dunn, R., et al., The use of edge-betweenness clustering to investigate biological function in protein interaction networks BMC Bioinformatics 2005 Seed Growth Main Idea Search for local optimization Use a modularit densit function Tpes of seeds Random seeds: selected randoml Core seeds: selected b degree or clustering coefficient Algorithm 1 Select a seed as an initial cluster S 2 Add a neighbor to S repeatedl to find the ma modularit or densit 3 Return S if the modularit of S > threshold 4 Repeat 1, 2 and 3 to find a set of clusters 17
Graph Entrop 1 Definitions An eample of seed-growth algorithms General notation Inner links of v in G V,E : edges from v to the vertices in V p i v: probabilit of v having inner links Outer links of v in G V,E : edges from v to the vertices not in V p o v: probabilit of v having outer links Definitions Verte entrop: ev = - p i v log 2 p i v - p o v log 2 p o v Graph entrop : egv,e = Σ vv ev Find the minimum graph entrop during seed growth Graph Entrop 2 Eample 18
Graph Entrop 3 Algorithm 1 Select a seed node, and include all neighbors of the seed node into a seed cluster 2 Iterativel remove a neighbor if removal decreases graph entrop 3 Iterativel add a node on the outer boundar of a current cluster if addition decreases graph entrop 4 Output the cluster with the minimal graph entrop 5 Repeat 1, 2, 3, and 4 until no seed node remains Reference Kenle, E.C. and Cho, Y.-R., Detecting protein complees and functional modules from protein interaction networks: a graph entrop approach, Proteomics 2011 Overview Backgrounds etwork Modeling Interaction / Function Prediction Protein Comple / Functional Module Detection Signaling Pathwa / Functional Pathwa Prediction Essential Protein Selection Conserved Sstem Identification 19
Predicting Signaling Pathwas Signaling Pathwa A series of proteins having signaling and response relationship Signaling etwork A combined form of linear signaling pathwas A directed acclic graph Signaling Pathwa / Signaling etwork Prediction Given starting and ending nodes, searching the strongest paths Combining the strongest paths to form a signaling network Path Frequenc-Based Approach Main Idea Measuring path frequenc towards the target on a PPI network Appling a greed algorithm to repeatedl select a link making the most paths to the target Frequent Path Mining Given a path prefi l, computes the support of the association rule, l vi 20
etwork Motif-Based Approach Main Idea Mapping network motifs into a PPI network Appling a greed algorithm to repeatedl select a link which fits the most network motifs etwork Motif Mapping Calculates motif occurrence scores b counting the matches Computes motif strength b summing the weights of the matches Overview Backgrounds etwork Modeling Interaction / Function Prediction Protein Comple / Functional Module Detection Signaling Pathwa / Functional Pathwa Prediction Essential Protein Selection Conserved Sstem Identification 21
Measuring Essentialit of Proteins Essential Genes / Proteins Functional core genes or proteins in a functional module Hubs in an interaction network Significance in biomedical applications: drug target detection Measurement Essentialit of Genes / Proteins Local centralit measures Degree Clustering coefficient Global centralit measures Closeness Betweenness Local Centralit Clustering Coefficient of v i The densit of a sub-graph G V,E where V is the set of neighbors of v i j v, v, v v j vk k i C vi v v 1 i i Measuring the effectiveness of v i on denseness Average Clustering Coefficient of GV,E Average of the clustering coefficients of all vertices in V Maimum is 1 Measuring the modularit of G 22
Global Centralit Closeness, C c v i Detects the vertices located in the center of a graph CC vi v j V 1 p v, v s i j where p s v i,v j is the shortest path length between v i and v j Betweenness, C B v i Detects the vertices located between two clusters C B v i svi tv st vi st where σ st is the number of shortest paths between s and t, and σ st v i is the number of shortest paths between s and t, which pass through the verte v i Overview Backgrounds etwork Modeling Interaction / Function Prediction Protein Comple / Functional Module Detection Signaling Pathwa / Functional Pathwa Prediction Essential Protein Selection Conserved Sstem Identification 23
etwork Alignment Main Idea & Goal Aligning two or more evolutionar distal interaction networks to identif evolutionar conserved connection patterns Measure sequential similarit between molecules orthologs, AD topological similarit between interaction networks Sequence Alignment vs. etwork Alignment Sequence Alignment Aligning two or more sequences Searches matches identical letters, mismatches non-identical letters, and gaps Returns alignment in a two-row representation including gaps etwork Alignment Aligning two or more networks Searches matches orthologs, mismatches non-orthologs, and gaps Returns an alignment network having ortholog pairs as nodes and conserved interactions as edges 24
Issues in etwork Alignment Technical Issues How to map two or more networks to detect a common sub-network How to optimize the alignment network for multiple orthologs How to improve efficienc of network alignment etwork Alignment Tpes Global network alignment o Aligning two or more entire networks Local network alignment o Detecting maimall strongl conserved sub-networks Questions? Lecture Slides are found on the Course Website, web.ecs.balor.edu/facult/cho/5330 25