PROTEOMICS. Is the intrinsic disorder of proteins the cause of the scale-free architecture of protein-protein interaction networks?

Similar documents
Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Graph Alignment and Biological Networks

Keywords: intrinsic disorder, IDP, protein, protein structure, PTM

Analysis of Biological Networks: Network Robustness and Evolution

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

GRAPH-THEORETICAL COMPARISON REVEALS STRUCTURAL DIVERGENCE OF HUMAN PROTEIN INTERACTION NETWORKS

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law

Deterministic scale-free networks

Self Similar (Scale Free, Power Law) Networks (I)

arxiv:q-bio/ v1 [q-bio.mn] 13 Aug 2004

How Scale-free Type-based Networks Emerge from Instance-based Dynamics

Types of biological networks. I. Intra-cellurar networks

Self-organized scale-free networks

Lecture 10: May 19, High-Throughput technologies for measuring proteinprotein

TREND OF AMINO ACID COMPOSITION OF PROTEINS OF DIFFERENT TAXA

BioControl - Week 6, Lecture 1

The architecture of complexity: the structure and dynamics of complex networks.

Towards Detecting Protein Complexes from Protein Interaction Data

Systems biology and biological networks

Chapter 8: The Topology of Biological Networks. Overview

Fine-scale dissection of functional protein network. organization by dynamic neighborhood analysis

Finding important hubs in scale-free gene networks

Lecture VI Introduction to complex networks. Santo Fortunato

networks in molecular biology Wolfgang Huber

Biological Networks Analysis

Comparison of Protein-Protein Interaction Confidence Assignment Schemes

HIGH THROUGHPUT INTERACTION DATA REVEALS DEGREE CONSERVATION OF HUB PROTEINS

Basic modeling approaches for biological systems. Mahesh Bule

Emergence of complex structure through co-evolution:

Interaction Network Topologies

Biological networks CS449 BIOINFORMATICS

Bayesian Inference of Protein and Domain Interactions Using The Sum-Product Algorithm

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Lecture 4: Yeast as a model organism for functional and evolutionary genomics. Part II

Protein Interactions and Fluctuations in a Proteomic Network using an Elastic Network Model

EVOLUTION OF COMPLEX FOOD WEB STRUCTURE BASED ON MASS EXTINCTION

An introduction to SYSTEMS BIOLOGY

Topology of technology graphs: Small world patterns in electronic circuits

Procedure to Create NCBI KOGS

arxiv: v2 [q-bio.mn] 5 Sep 2016

Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018

UNIVERSITY OF CALIFORNIA SANTA BARBARA DEPARTMENT OF CHEMICAL ENGINEERING. CHE 154: Engineering Approaches to Systems Biology Spring Quarter 2004

Protein-protein interaction networks Prof. Peter Csermely

Constructing More Reliable Protein-Protein Interaction Maps

Dynamics and Inference on Biological Networks

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Hydrophobic, Hydrophilic, and Charged Amino Acid Networks within Protein

Computational Structural Bioinformatics

Cross-fertilization between Proteomics and Computational Synthesis

Evolution of Diversity and Cooperation 2 / 3. Jorge M. Pacheco. Departamento de Matemática & Aplicações Universidade do Minho Portugal

Application of random matrix theory to microarray data for discovering functional gene modules

Biological Networks. Gavin Conant 163B ASRC

Influence of temperature and diffusive entropy on the capture radius of fly-casting binding

Structure and Centrality of the Largest Fully Connected Cluster in Protein-Protein Interaction Networks

How much non-coding DNA do eukaryotes require?

Comparison of Human Protein-Protein Interaction Maps

The Role of Network Science in Biology and Medicine. Tiffany J. Callahan Computational Bioscience Program Hunter/Kahn Labs

Warm Up. What are some examples of living things? Describe the characteristics of living things

Unit # - Title Intro to Biology Unit 1 - Scientific Method Unit 2 - Chemistry

11/24/13. Science, then, and now. Computational Structural Bioinformatics. Learning curve. ECS129 Instructor: Patrice Koehl

Computational Biology: Basics & Interesting Problems

Molecular modeling. A fragment sequence of 24 residues encompassing the region of interest of WT-

ECS 253 / MAE 253 April 26, Intro to Biological Networks, Motifs, and Model selection/validation

Homolog. Orthologue. Comparative Genomics. Paralog. What is Comparative Genomics. What is Comparative Genomics

BIOINFORMATICS. Biological network comparison using graphlet degree distribution

Networks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource

Topology of Protein Interaction Network Shapes Protein Abundances and Strengths of Their Functional and Nonspecific Interactions

Unravelling the biochemical reaction kinetics from time-series data

Graph Theory Approaches to Protein Interaction Data Analysis

Bioinformatics I. CPBS 7711 October 29, 2015 Protein interaction networks. Debra Goldberg

Letter to the Editor Edge vulnerability in neural and metabolic networks

Overview. Overview. Social networks. What is a network? 10/29/14. Bioinformatics I. Networks are everywhere! Introduction to Networks

BIOINFORMATICS CS4742 BIOLOGICAL NETWORKS

In order to compare the proteins of the phylogenomic matrix, we needed a similarity

A Multiobjective GO based Approach to Protein Complex Detection

Exceptionally abundant exceptions: comprehensive characterization of intrinsic disorder in all domains of life

arxiv: v2 [physics.soc-ph] 15 Nov 2017

Supplemental Material

Protein function studies: history, current status and future trends

Integrative Protein Function Transfer using Factor Graphs and Heterogeneous Data Sources

Bioinformatics Chapter 1. Introduction

arxiv: v1 [q-bio.mn] 7 Nov 2018

Undergraduate Curriculum in Biology

Supporting Information

Erzsébet Ravasz Advisor: Albert-László Barabási

ECOL/MCB 320 and 320H Genetics

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Graph Theory and Networks in Biology

arxiv:cond-mat/ v1 [cond-mat.soft] 16 Nov 2002

TMSEG Michael Bernhofer, Jonas Reeb pp1_tmseg

Complex Graphs and Networks Lecture 3: Duplication models for biological networks

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

V 5 Robustness and Modularity

Supporting Information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Transcription:

Is the intrinsic disorder of proteins the cause of the scale-free architecture of protein-protein interaction networks? Journal: Manuscript ID: Wiley - Manuscript type: Date Submitted by the Author: Complete List of Authors: Key Words: draft Rapid Communication n/a Schnell, Santiago; Indiana University, School of Informatics Fortunato, Santo; Indiana University, School of Informatics Roy, Sourav; Indiana University, School of Informatics Bioinformatics, network architecture, protein disorder, systems biology, Protein-protein interaction

Page 1 of 15 SHORT COMMUNICATION Is the intrinsic disorder of proteins the cause of the scale-free architecture of protein-protein interaction networks? Santiago Schnell, Santo Fortunato and Sourav Roy Indiana University School of Informatics, 1900 East Tenth Street, Eigenmann Hall, Bloomington, IN 47406, USA Abstract In protein-protein interaction networks certain topological properties appear to be recurrent: networks maps are considered scale-free. It is possible that this topology is reflected in the protein structure. In this paper we investigate the role of protein disorder in the network topology. We find that the disorder of a protein (or of its neighbors) is independent of its number of protein-protein interactions. This result suggests that protein disorder does not play a role in the scale-free architecture of protein networks. Keywords: protein-protein interaction networks, protein disorder, network topology Abbreviations: PPI, protein-protein interaction Correspondence: Santiago Schnell, Indiana University School of Informatics and Biocomplexity Institute, 1900 East Tenth Street, Eigenmann Hall 906, Bloomington, IN 47406, USA. Email: schnell@indiana.edu. Fax: +1-812-856-1955. 1

Page 2 of 15 There is much current interest in the structure of biological networks, in particular focusing on their topology [1]. Certain organizational properties appear to be a recurrent feature of biological systems. A power-law probability distribution for the number of links adjacent to the same node (degree) has been reported in systems as diverse as protein folding [2], networks of metabolic reactions [3-4] and ecological food webs [5]. A power-law degree distribution indicates that connectivity is quite heterogeneous on the network, so that many nodes having just a few connections coexist with a considerable number of nodes with many connections (known as hubs). Scale-free networks are resilient to random failures but vulnerable to targeted attacks against the most connected elements or hubs [6]. Network topology is implicated in conferring robustness of network properties, for example in setting up patterns of cellular differentiation in gene regulatory networks in the early development of the fruit fly [7-8], or in providing resistance to mutation of environmental stress [9-10]. Therefore, the determination of the topology of a biological network has become important for assessing the stability, function, dynamics and design aspects of the network [11]. Network topologies are determined using a variety of sample methods, which are then used to infer the topology of the whole network [12]. The application of these methodologies to protein-protein interaction (PPI) networks has revealed that they are scale-free [13-18]. This architecture of the PPI networks must be reflected in the protein structure [19-20]: What is the structural feature of hub proteins which allows them to interact with a large number of partners? 2

Page 3 of 15 Recently, Fernández and Berry [21] made an attempt to answer this question. They examined the structure wrapping, which protects a protein from water attack, in the PPI network of Saccharomyces cerevisiae [13]. They found that relaxing the structure of the packing interface increases the number of binding interactions in a protein. However, this study did not examine the correlation between the degree of a protein (number of interactions) in a PPI network and the extent of its overall wrapping deficiency. As a consequence, the results of this study are of limited applicability for understanding the architecture of a PPI network. Fernández and Berry [21] have found that there is a direct correlation between the number of hydrogen bonds that are deficiently protected from water and the inherent structural disorder of a protein. Intrinsically disordered proteins do not have specific three-dimensional structures [22-23]. They exist in wide-motion conformation ensembles both in vitro and in vivo [24-25]. The intrinsically disordered regions of a protein can provide flexible linkers between functional domains, which facilitate binding diversity and can explain the existence of protein hubs [19-20]. Dunker and co-workers [19] have proposed two roles for intrinsic disorder in PPI: I. Intrinsic disorder can serve as the structural basis for hubs promiscuity, and II. Intrinsically disordered proteins can bind to structured hub proteins. In this short communication, we determine if there is a correlation between the number of interactions of a protein in a PPI network and the disorder of the protein itself or that of its neighbors. 3

Page 4 of 15 We started downloading the PPI networks for Homo sapiens (human), Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fly), Caenorhabditis elegans (worm) and Escherichia coli (bacterium) from the Biomolecular Interaction Network Database (BIND http://bind.ca) [26]. The information collected was cured using the lowthroughput track of the database and by filtering out redundant interactions. First, we carried out a standard analysis of the five networks under consideration. In Table 1 we list the main attributes of the networks: the number of nodes, the number of links, the average degree and the diameter. The average degree reveals that all networks are quite sparse; as a matter of fact, they are fragmented in many components, with most components consisting of a single node. Due to the limited size of the networks, a statistical analysis is meaningful only for the largest systems. Figure 1 displays the degree distribution of the human PPI network. The data points were obtained by averaging the probability within logarithmic bins of degree, to reduce fluctuations. The distribution looks quite skewed, with a tail following approximately a power law with exponent 2.5. Our aim is to address quantitatively the issue of the existence of a correlation between the topological connectivity of a protein, expressed by its degree k, and the disorder of the protein itself or that of its neighbors. We employ the VL3 model [27] to predict intrinsically disordered regions of protein sequences for the PPI networks under consideration. The VL3 predictor is a feed-forward neural network algorithm which uses attributes such as amino acid compositions, average flexibility [28] and sequence complexity [29] for calculating the disorder score. The disorder score is a value between 0 and 1. We calculate the number of disordered regions in a protein as those putative 4

Page 5 of 15 segments which were of 10 or more residues in length with a disorder score of 0.5 or greater. We estimate these segments, because the increasingly recognized mechanism of interaction between intrinsically disordered proteins and their binding partners is through molecular recognition elements. These elements are typically located within disordered regions, which provide enough physical space and flexibility to accommodate access of a variety of potential partners thereby facilitating binding with multiple and diverse targets [30-32]. We computed the standard Pearson correlation coefficient between the degree of all proteins of a network and their disorder, as expressed by the disorder score and by the number of disordered regions. The corresponding correlation coefficients for the protein (node) degree versus disorder score and protein (node) degree versus number of disordered regions are listed in Table 2. They are all close to zero. This result indicates the absence of correlations between the number of interactions of a protein and its disorder coefficient, and between the number of interactions of a protein and its number of disordered segments. We completed the analysis by calculating the Pearson correlation coefficients between the degree k of a protein and the average disorder of its neighbors. The corresponding coefficients are listed in Table 3, and they are again very close to zero, hinting at the absence of correlations between the degree of the protein and the average disorder of its neighbors. We have as well investigated the relation between degree and disorder, as the knowledge of the correlation coefficient only gives a global indication on the correlation between 5

Page 6 of 15 two variables. This type of analysis is particularly suitable for large systems, as one ideally wishes to have a significant number of data points for each degree value. Therefore we present here the result of the analysis for the largest network we have, i.e. the human PPI network. Figure 2 shows four scatter plots between degree and disorder. To show the trend of the plots, we averaged disorder within logarithmic bins of degree. The resulting curves are essentially flat, which shows that the disorder of a protein (or of its neighbors) is independent of its degree, confirming the absence of correlation we had derived from the examination of the correlation coefficients. From our analysis, we find that an increase in the disorder score or number of disordered regions of a protein does not increase its topological connectivity (nor does the disorder of its neighbors). We also learned that the average disorder of the neighbors of a hub protein does not correlate with the number of interactions of the hub protein. A limitation of our analysis is that we have calculated the disorder coefficient for the entire protein sequence, and we do not specifically study the regions involved directly in proteinprotein binding. This investigation could be refined by studying the disorder of the residuals involved in the binding of proteins. The intracellular environment is heterogeneous, structured and compartmentalized. Biochemical reactions are diffusion limited and proteins are contained in different compartments [33]. The networks considered in this communication were treated as a homogeneous landscape, because they do not reflect the underlying cellular and molecular process. Whether this would also affect the results of our analysis needs to be determined. By itself, the discovery that PPI networks have scale-free properties is of limited use to biologists [34]. An important 6

Page 7 of 15 research direction is investigating what structural or molecular properties of the proteins are the bases for the scale-free architecture of PPI networks. We are grateful for the useful suggestions made by Profs. A. Keith Dunker and Vladimir N. Uversky (Department of Biochemistry and Molecular Biology, Indiana University School of Medicine), and Profs. Predrag Radivojac (Indiana University School of Informatics) during the development of this work. We would like to thank Alessandro Vespignani (Indiana University School of Informatics) for his critical comments. The VL3 predictor for protein disorder was kindly provided Prof. Radivojac. SS would like to the FLAD Computational Biology Collaboratorium at the Gulbenkian Institute in Oeiras, Portugal, for hosting and providing facilities used to conduct part of this research between April 2nd-10th. This research has been supported by the Division of Information & Intelligent Systems, National Science Foundation, Grant No. 0513650. Any opinions, findings, conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or the United States Government. References [1] Crampin, E.J., Schnell, S., Prog. Biophys. Mol. Biol. 2004, 86, 1-4. [2] Koonin, E.V., Wolf, Y.I., Karev, G.P., Nature 2002, 420, 218 223. [3] Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., et al., Nature 2000, 407, 651 654. 7

Page 8 of 15 [4] Wagner, A., Fell, D.A., Proc. R. Soc. London Ser. B 2001, 268, 1803 1810. [5] Solé, R.V., Ferrer-Cancho, R., Montoya, J.M., Valverde, S., Complexity 2003, 8, 20 33. [6] Albert, R., Jeong, H., Barabási, A.-L., Nature 2000, 406, 378-382. [7] von Dassow, G., Meir, E., Munro, E.M., Odell, G.M., Nature 2000, 406, 188 192. [8] Albert, R., Othmer, H.G., J. Theor. Biol. 2003, 223, 1 18. [9] Wagner, A., Nat. Genet. 2000, 24, 335-361. [10] Yeong, H., Mason, S.P. Barabási, A.-L., Oltvai, Z.N., Nature 2001, 411, 41-42. [11] Barabási, A.-L., Oltvai, Z.N., Nature Rev. Genet. 2004, 5, 101 113. [12] Albert, R., Barabási, A.-L., Rev. Mod. Phys. 2002, 74, 47-97. [13] Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., et al., Nature 2000, 403, 623-627. [14] Ito, T., Chiba, T., Ozawa, R., Yoshida, M., et al., Proc. Natl. Acad. Sci. USA 2001, 98, 4569-4574. [15] Reboul, J., Vaglio, P., Rual, J.F., Lamesch, P., et al., Nat. Genet. 2003, 34, 35-41. [16] Giot, L., Bader, J.S., Brouwer, C., Chaudhuri, A., et al., Science 2003, 302, 1727-1736. [17] Li, S.M., Armstrong, C.M., Bertin, N., Ge, H., et al., Science 2004, 303, 540-543. 8

Page 9 of 15 [18] Han, J.D.J., Bertin, N., Hao, T., Goldberg, D.S., et al., Nature 2004, 430, 88-93. [19] Dunker, A.K., Cortese, M.S., Romero, P., Iakoucheva, L.M., et al., FEBS Journal 2005, 272, 5129-5148. [20] Uversky, V.N., Oldfield, C.J., Dunker, A.K., J. Mol. Recog. 2005, 18, 343-384. [21] Fernández, A., Berry, R.S., Proc. Natl. Acad. Sci. USA 2005, 101, 13460-13465. [22] Wright, P.E., Dyson, H.J., J. Mol. Biol. 1999, 293, 321-331. [23] Dunker, A.K., Lawson, J.D., Brown, C.J., Williams, R.M., et al., J. Mol. Graph. Model. 2001, 19, 26-59. [24] Uversky, V.N., Gillespie, J.R., Fink, A.L., Proteins 2000, 41, 415-427. [25] Dedmon, M.M., Patel, C.N., Young, G.B., Pielak, G.J., Proc. Natl. Acad. Sci. USA 2002, 99, 12681 12684. [26] Bader, G.D., Donaldson, I., Wolting, C., Ouellette, B.F.F., et al., Nucleic Acids Res. 2001, 29, 242-245. [27] Obradovic, Z., Peng, K., Vucetic, S., Radivojac, P., et al., Proteins, 2003, 53, 566-572. [28] Vihinen, M., Torkkila, E., Riikonen, P., Proteins 1994, 19, 141-149. [29] Wootton, J.C., Comput. Chem. 1994, 18, 269-285. 9

Page 10 of 15 [30] Fuxreiter, M., Simon, I., Friedrich, P., Tompa P., J. Mol. Biol. 2004, 338, 1015-1026. [31] Oldfield, C.J., Cheng, Y., Cortese, M.S., Romero, P., et al., Biochemistry 2005, 44, 12454-12470. [32] Roy, S., Schnell, S., Radivojac, P., Comp. Biol. Chem. 2006, DOI: 10.1016/j.compbiolchem.2006.04.005 [33] Schnell, S., Turner, T.E., Prog. Biophys. Mol. Biol. 2004, 85, 235-260. [34] Bray, D., Science 2003, 301, 1864-1865. 10

Page 11 of 15 Table 1. Protein-protein interaction network properties. Species Nodes Links Average Diameter degree Human 2,758 3,237 2.347 27 Yeast 881 1,107 2.513 21 Fly 353 280 1.586 8 Worm 244 244 2.000 7 Bacterium 166 132 1.590 4 11

Page 12 of 15 Table 2. Pearson correlation coefficients between the degree of a protein and its disorder: the latter is expressed by the disorder score or by the number of disordered regions. The degree of a node corresponds to number of interactions of a protein. The disorder coefficients are calculated using the VL3 predictor [27]. Species Correlation Coefficient k-disorder Correlation Coefficient k-disordered regions Human -8.45212E-004 1.75295E-002 Yeast 3.80361E-002 7.76235E-002 Fly 7.10621E-003 2.88277E-002 Worm -1.18723E-002-6.00492E-002 Bacterium 5.15177E-002 0.11902 12

Page 13 of 15 Table 3. Pearson correlation coefficients between the degree of a protein and the average disorder of its neighbors: the latter is expressed by the disorder score or by the number of disordered regions. The degree of a node corresponds to the number of interactions of a protein. The disorder coefficients are calculated using the VL3 predictor [27]. Species Correlation Coefficient k-disorder Correlation Coefficient k-disordered regions Human 2.27327E-002 4.00298E-002 Yeast -1.63826E-002 2.15817E-002 Fly 1.59666E-002-1.09563E-002 Worm 0.26913-1.15449E-002 Bacterium 7.39843E-002 0.11434 13

Page 14 of 15 Figure 1. Degree distribution for the human protein-protein interaction network. The degree of a node corresponds to the number of interactions of a protein. Note that the distribution is quite broad, spanning two orders of magnitude in degree. The dashed line is a tentative power law fit of the tail of the distribution: the exponent is 2.5. 14

Page 15 of 15 Figure 2. Relation between disorder and degree for the human PPI network. The two plots on the left show the variability with degree of the disorder of a protein, with the latter being expressed by the disorder score (top left) and by the number of disordered regions (bottom left). The two plots on the right show how the average disorder of the neighbors of a protein varies with the degree of the protein. Again, we use both the disorder score (top right) and the number of disordered regions (bottom right). 15