Structure and Centrality of the Largest Fully Connected Cluster in Protein-Protein Interaction Networks

Similar documents
Networks & pathways. Hedi Peterson MTAT Bioinformatics

Types of biological networks. I. Intra-cellurar networks

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Self Similar (Scale Free, Power Law) Networks (I)

Interaction Network Analysis

In-Depth Assessment of Local Sequence Alignment

Systems biology and biological networks

Metabolic Networks analysis

How Scale-free Type-based Networks Emerge from Instance-based Dynamics

A Multiobjective GO based Approach to Protein Complex Detection

networks in molecular biology Wolfgang Huber

Computational Biology: Basics & Interesting Problems

Lecture 4: Yeast as a model organism for functional and evolutionary genomics. Part II

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11

Computational approaches for functional genomics

Protein-protein interaction networks Prof. Peter Csermely

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Biological networks CS449 BIOINFORMATICS

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Preface. Contributors

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

SUPPLEMENTAL DATA - 1. This file contains: Supplemental methods. Supplemental results. Supplemental tables S1 and S2. Supplemental figures S1 to S4

Name Period The Control of Gene Expression in Prokaryotes Notes

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Introduction. Gene expression is the combined process of :

Labeling network motifs in protein interactomes for protein function prediction

V14 Graph connectivity Metabolic networks

A New Method to Build Gene Regulation Network Based on Fuzzy Hierarchical Clustering Methods

Gene Network Science Diagrammatic Cell Language and Visual Cell

Modeling (bio-)chemical reaction networks

15.2 Prokaryotic Transcription *

Topic 4 - #14 The Lactose Operon

Lecture 8: Temporal programs and the global structure of transcription networks. Chap 5 of Alon. 5.1 Introduction

AP Bio Module 16: Bacterial Genetics and Operons, Student Learning Guide

Biological Networks. Gavin Conant 163B ASRC

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Chapter 15 Active Reading Guide Regulation of Gene Expression

Biological Networks Analysis

Computational Cell Biology Lecture 4

Complex (Biological) Networks

Gene regulation II Biochemistry 302. Bob Kelm February 28, 2005

V19 Metabolic Networks - Overview

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information -

V14 extreme pathways

Bioinformatics I. CPBS 7711 October 29, 2015 Protein interaction networks. Debra Goldberg

Overview. Overview. Social networks. What is a network? 10/29/14. Bioinformatics I. Networks are everywhere! Introduction to Networks

It s a Small World After All

BME 5742 Biosystems Modeling and Control

Welcome to Class 21!

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Initiation of translation in eukaryotic cells:connecting the head and tail

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

V 6 Network analysis

86 Part 4 SUMMARY INTRODUCTION

Detecting temporal protein complexes from dynamic protein-protein interaction networks

Chapter 17. From Gene to Protein. Biology Kevin Dees

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

Complex (Biological) Networks

Network motifs in the transcriptional regulation network (of Escherichia coli):

Chapter 16 Lecture. Concepts Of Genetics. Tenth Edition. Regulation of Gene Expression in Prokaryotes

Integration of functional genomics data

56:198:582 Biological Networks Lecture 10

Networks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource

Computational methods for predicting protein-protein interactions

V 5 Robustness and Modularity

Basic modeling approaches for biological systems. Mahesh Bule

Chapter 1. Topic: Overview of basic principles

CS4220: Knowledge Discovery Methods for Bioinformatics Unit 6: Protein-Complex Prediction. Wong Limsoon

Modules in the metabolic network of E.coli with regulatory interactions

Graph Theory and Networks in Biology

Prokaryotic Gene Expression (Learning Objectives)

Translation and Operons

Identifying Signaling Pathways

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

The architecture of complexity: the structure and dynamics of complex networks.

Host-Pathogen Interaction. PN Sharma Department of Plant Pathology CSK HPKV, Palampur

Grundlagen der Systembiologie und der Modellierung epigenetischer Prozesse

GRAPH-THEORETICAL COMPARISON REVEALS STRUCTURAL DIVERGENCE OF HUMAN PROTEIN INTERACTION NETWORKS

Graph Alignment and Biological Networks

Erzsébet Ravasz Advisor: Albert-László Barabási

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Molecular Biology - Translation of RNA to make Protein *

FUNDAMENTALS of SYSTEMS BIOLOGY From Synthetic Circuits to Whole-cell Models

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Chapter 12. Genes: Expression and Regulation

56:198:582 Biological Networks Lecture 8

Why Do Hubs in the Yeast Protein Interaction Network Tend To Be Essential: Reexamining the Connection between the Network Topology and Essentiality

Chapter 8: The Topology of Biological Networks. Overview

MTopGO: a tool for module identification in PPI Networks

A Simple Protein Synthesis Model

ATLAS of Biochemistry

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2

BMD645. Integration of Omics

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

Applications of Complex Networks

Chapter 6- An Introduction to Metabolism*

PREDICTION OF HETERODIMERIC PROTEIN COMPLEXES FROM PROTEIN-PROTEIN INTERACTION NETWORKS USING DEEP LEARNING

CS4220: Knowledge Discovery Methods for Bioinformatics Unit 6: Protein-Complex Prediction. Wong Limsoon

Transcription:

22 International Conference on Environment Science and Engieering IPCEE vol.3 2(22) (22)ICSIT Press, Singapoore Structure and Centrality of the Largest Fully Connected Cluster in Protein-Protein Interaction Networks zadeh Jafarnejad and Mahmood. Mahdavi + Department of Chemical Engineering, Ferdowsi University of Mashhad, zadi Square, Pardis Campus, 9779-48944, Mashhad, Iran bstract: iological systems are composed of interacting and functional modules. Identifying these modules is essential to understand structure of biological systems. One of the essential modules in each protein-protein interaction network is ribosomal cluster. Our analysis on 2 protein-protein interaction networks of bacteria showed that this cluster is the largest and mostly connected cluster in all networks. The properties of this cluster were described using centrality parameters of the nodes in this sub network. It was found that centrality parameters are effective information to interpret biological aspects of networks with the support of biochemical information. Ribosomal cluster was analyzed in this study as a typical example. Keywords: "Centrality", "Cluster", "Protein-Protein interaction".. Introduction Protein-protein interactions (PPI) are crucial for important molecular processes in the cell and PPI networks provide insights into protein functions []. These networks are modular and clusters correspond to two types of modules: protein complexes and functional modules. Protein complexes are permanent interacting proteins at same time and places but functional modules are transient interacting proteins that participate in a particular cellular process at different time and places [2]. Proteins with high betweenness and low connectivity act as important links between these modules and are more likely to be essential [3]. ottlenecks are proteins with a high betweenness centrality [4]. In scale-free protein interaction networks, most proteins have few connections, whereas a small proportion of proteins interact with many partners. High-degree proteins are hubs and are approximately threefold more likely to be essential than non-hubs [5]. It was shown that the most central metabolites in the metabolic network have the largest closeness centrality value [6]. vertex with a high radiality value can easily reach other vertices and is generally closer to the other nodes [7]. Thus, centrality parameters in networks are meaningful biological terms and are very good indicators of essential proteins in PPI networks. In this study, PPI networks of twenty microorganisms, all bacteria, were analyzed and centrality parameters were obtained. Using the parameters the mostly populated and mostly connected cluster of all studied networks was identified as "ribosome" cluster. The features of this cluster were studied and some biological facts were confirmed based on the findings. 2. Materials and Methods 2.. PPI networks The PPI catalogues of twenty microorganisms were retrieved from STRING (version 8, 2). This version covers about 2.5 million proteins from 63 organisms, providing the most comprehensive view on protein protein interactions currently available [8]. There is a combined score between any pair of proteins in STRING. In this study, interactions with a combined score of 9 or more were used. The visualized + Corresponding author. E-mail address: mahdavi@ferdowsi.um.ac.ir. 87

network of interactions resulted in a majority of nodes connected to each other as a network and a minority of nodes disconnected from the whole network. Only the connected portions of the PPI datasets were selected for analysis. 2.2. Software Duplicated interactions were removed using a Perl script and modified data was imported to the Cytoscape 2.8, a bioinformatics package for biological network visualization and data integration [9]. Centrality analysis was performed using CentiScaPe, a Cytoscape plugin for computing network centrality parameters []. Clusters were detected using ClusterONE, a plugin to discover densely connected and possibly overlapping regions within the Cytoscape network []. Topological parameters were computed using Network nalyzer, a Cytoscape plugin for computing and displaying a comprehensive set of topological parameters of networks [2]. 3. Results and Discussion 3.. Overall structure of the networks y plotting the degree probability P(k) as a function of degree (k) for 2 datasets, it was observed that most nodes have low degrees and a few nodes have high degrees, as observed in scale free networks. However, small networks were far from scale free distribution more than the larger ones. For example in the largest studied network i.e. Escherichia coli with 263 nodes and 72 interactions the exponent of the degree (γ) was.8 while that was.3 for the smallest studied network related to Pelotomaculum thermopropionicum whit 334 nodes and 56 edges as shown in Figure. s seen in this figure there is a hump shape that demonstrates a trace of initial conditions of the degree distribution which may be found for any network size. s the size of networks grows, humps become smoother and degree distribution approaches to scale free distribution [3]. P.thermopropionicum.. P(K). P(K).... K. Fig. : probability distribution P(k) as a function of degree k... P.thermopropionicum K In disassortative networks high-degree vertices connect preferentially to vertices with low degrees. verage neighbor degree of vertices in the network is a measure to evaluate degree correlations [4]. Technological and biological networks tend to be disassortative [5]. y plotting the neighbor degree of vertices versus degree in interaction networks of 2 microorganisms no evidence of disassortativity mixing in protein level was observed. n important point about these networks is that big clusters in networks are not connected to each other directly but there are some low degree vertices whit high betweenness value acting as linkers between these modules. In other words, in modular level, networks are disassortative; but, inside the modules degree and neighbor degree are correlated. Neighbor degrees in and Shewanella oneidensis networks are shown in Figure 2. The trend indicates that in lower degrees the average value increases whereas in higher degrees the average value remains constants in a certain interval. 88

5 5 4 4 3 3 2 2 Neighbor degree ve Neighbor degree Fig. 2: Neighbor degree vs. degree in and in. Dots in blue represent neighbor degrees and dots in red represent average neighbor degrees in each degree. Neighbor degree ve. Neighbor degree Clustering coefficient is an important measure that shows internal structure of a network that is related to the local cohesiveness of a network and measures the probability that two vertices with a common neighbor are connected [4]. Clustering coefficients of vertices versus degree in PPI networks of 2 microorganisms were computed. It was observed that vertices with low degrees had a wide range of clustering coefficients and the maximum value of clustering coefficient that is was found to be abundant in low degrees. Highdegree nodes had average clustering coefficients comparable to the average of the whole network. verage clustering coefficients in and networks are shown in Figure 3..8.8.6.6.4.4.2.2 ve. Clustring Coe. ve. Clustring Coe. Fig. 3: verage clustering coefficients versus degree in and. nother important centrality index is betweenness. High betweenness centrality means that the node, for certain paths is crucial to maintain node connections. etweenness in and networks are shown in Figure 4. The highest values belong to the nodes with higher degrees. 5 5 etweenness 4 3 2 etweenness 4 3 2 Fig. 4: etweenness distribution with degree in and In these networks two important features i.e. clustering coefficient and average neighbor degree, were observed to be high in high-degree nodes. This is an indication that high-degree nodes have a high tendency to be the core of clusters and construct fully-connected components. lthough some low-degree nodes have close to maximum clustering coefficients, they can not construct major clusters because they have a few neighbors. There are high-degree proteins with low betweenness centrality in the network. These nodes are 89

the ones that aren t in the paths with high betweenness. These nodes are mostly found inside clusters and have no major role in the whole network connectivity. On the other hand, there are some low-degree nodes with high betweenness in clusters that link one cluster to the other. These nodes are crucial for the connectivity of the network. 3.2. Ribosomal cluster In all the studied networks, the first and the most important identified cluster was found to be ribosomal cluster, which is the most populated cluster of networks with ribosomes as member protein. Ribosomal functional group plays an important role in existence of the cell and acts as the synthesizer of proteins in the cell. In all networks the highest degrees are related to ribosomal proteins. These proteins are highly connected to each other and their internal connections are much more than their external connections. Most members, despite their high degrees, are only connected to the internal members of the cluster and almost all their proteins in the twenty microorganisms have a very low betweenness. There are only some proteins observed in the cluster with high betweenness playing the role of connecting bridge of this fully connected high populated group with the rest of the network. lthough the connections of this sub network are not limited to these proteins but other paths are much less important in comparison with them and the paths leading to them have a very low betweenness. For example in P.thermopropionicum network, eliminating five member of ribosomal subgroup causes the disconnection of more than 65 percent of all of the paths in this cluster with the network. 3.3. Centrality indexes and biological facts Six centrality indexes for each node including degree, betweenness, stress, closeness, eccentricity and radiality were computed for proteins of the networks. Examination of centrality indexes of ribosomal cluster in the networks shows that in most of the networks the RN polymerase enzyme and its subunits have the highest betweenness. This observation is relied on the fact that in all of the organisms, especially in bacteria, this enzyme is the main factor in transcription. Transcription will not continue without this enzyme and the next process, which is the translation and synthesizing the protein with the help of ribosome, will not occur. So the main connections of ribosomes with the network are through this enzyme. Energy is supplied for the activity of ribosome and protein synthesis through an enzyme called denylate kinase, which is one of the nodes with the most Centrality. It provides the essential energy by reverse transfer of the last phosphate from TP to MP. This small enzyme is involved in both energy metabolism and synthesizing the nucleotides and is one of the essential enzymes for growth and reconstruction of cell. Hence, all ribosomal proteins have interactions with this enzyme to gain energy. This enzyme is one of the hubs and bottlenecks of the network. RN polymerase and denylate kinase are two typical enzymes which are as nodes with the highest betweennesses in all studied networks. In ribosomal clusters other centrality parameters are of interest. Eccentricity values of the member nodes are high as these enzymes need to be in contact with the furthest nodes of the network due to their importance. These nodes also have high closeness and Radiality values because these nodes are not only bottleneck, but also hub and other nodes are in the proximity of these nodes. There are some members in ribosome cluster which have high betweenness parameter in spite of their low degrees. typical member is the sigma subunit of polymerize RN enzyme. This subunit is responsible for connecting RN polymerase to DN and start transcription. The Eccentricity of this enzyme is high as the function of all ribosomal proteins starts with this enzyme. ut the Closeness of this node, in comparison with other members with high betweenness, is low. This is due to the observation that this member is connected to ribosome through other subunits of RN polymerase. This parameter also shows that the sigma subunit is a bottleneck but not a hub. The least betweenness in the ribosome cluster is related to ribosomes and their connecting proteins to trn. These proteins are only connected with each other and don't have much connections with outside. They rather have a high Closeness parameter because of their high numbers in the cluster, but due to the lack of connection with other members of the network, their eccentricity is low. Some ribosomal proteins with unexpected high betweenness exist in the cluster. These nodes have low eccentricities despite their high betweennesses. This shows that they are not central nodes in the network. The reason of their high 9

betweennesses is that they connect members of the cluster and don't have much connection with outside of the cluster independently. 4. Conclusion The protein-protein interaction networks of twenty bacteria were analyzed in terms of centrality parameters and it was found that ribosomal cluster is the largest and mostly connected cluster in all studied networks. Centrality parameters also assist in recognizing hubs, bottlenecks, and effective members of the cluster. Combination of these parameters would have biological meanings related to the key proteins of a network. Centrality parameters are effective information for interpretation of PPI networks that requires the extensive support of biochemistry of proteins and nucleotides. 5. References [] U. Stelzl, et al. Human Protein-Protein Interaction Network: Resource for nnotating the Proteome. Cell.25, 22: 957 968. [2] C. Lin, et al. Clustering Methods in Protein-Protein Interaction Network. In: Knowledge Discovery in ioinformatics: Techniques, Methods, and pplications. John Wiley & Sons, Inc. 26, pp. 39-356. [3] M. P. Joy, et al. High-etweenness Proteins in the Yeast Protein Interaction Network. Journal of iomedicine and iotechnology. 25, 2: 96 3. [4] Yu, H, et al. The Importance of ottlenecks in Protein Networks: Correlation with Gene Essentiality and Expression Dynamics. PLoS Comput iol, 27, 3 : 73-72. [5] J. J. Han, et al. Evidence for dynamically organized modularity in the yeast protein protein interaction network. NTURE, 24, 43: 88-93. [6] H. W. Ma,. P. Zeng. The connectivity structure, giant strong component and centrality of metabolic networks. ioinformatics, 22, 9: 423 43. [7] D. Koschützki, F. Schreiber. Centrality nalysis Methods for iological Networks and Their pplication to Gene Regulatory Networks. Gene Regulation and Systems iology, 28, 2: 93 2. [8] C. V. Mering, et al. STRING: known and predicted protein protein associations, integrated and transferred across organisms. Nucleic cids Research, 25, 33: D433 D437. [9] M. E. Smoot, et al. Cytoscape 2.8: new features for data integration and network visualization. ioinformatics, 2, 27: 43-432. [] G. Scardoni, M. Petterlini, C. Lauda. nalyzing biological network parameters with CentiScaPe. ioinformatics, 29, 25: 2857-2859. [] T. Nepusz, H. Yu,. Paccanaro, Detecting overlapping protein complexes in protein-protein interaction networks. In preparation. [2] Y. ssenov, et al. Coputing topological parameters of biological networks. ioinformatics, 28, 24: 282-284. [3] S. N. Dorogovtsev, J. F. Mendes,. N. Samukhin. Size-dependent degree distribution of a scale-free growing network. Physical Review E, 2, 63: 62-624. [4]. H. JUNKER, F. SCHREIER, nalysis Of iological Networks. WileyInterscience, 28. [5] M. E. J. Newman, ssortative mixing in networks. Physical Review Letters, 22, 89: 287-2875 9