arxiv:q-bio/ v1 [q-bio.mn] 13 Aug 2004

Similar documents
A General Model for Amino Acid Interaction Networks

Node Degree Distribution in Amino Acid Interaction Networks

Motif Prediction in Amino Acid Interaction Networks

Reconstructing Amino Acid Interaction Networks by an Ant Colony Approach

Research Article A Topological Description of Hubs in Amino Acid Interaction Networks

Hydrophobic, Hydrophilic, and Charged Amino Acid Networks within Protein

Ant Colony Approach to Predict Amino Acid Interaction Networks

Small-world communication of residues and significance for protein dynamics

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

Self-organized scale-free networks

arxiv:physics/ v1 [physics.bio-ph] 18 Jan 2006

Erzsébet Ravasz Advisor: Albert-László Barabási

How Scale-free Type-based Networks Emerge from Instance-based Dynamics

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Self Similar (Scale Free, Power Law) Networks (I)

Basics of protein structure

networks in molecular biology Wolfgang Huber

arxiv:cond-mat/ v1 2 Feb 94

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 4 May 2000

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Network Structure of Protein Folding Pathways

A Study of the Protein Folding Dynamic

Deterministic scale-free networks

Comparative analysis of transport communication networks and q-type statistics

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Graph Theory and Networks in Biology arxiv:q-bio/ v1 [q-bio.mn] 6 Apr 2006

Protein structure similarity based on multi-view images generated from 3D molecular visualization

Supporting Online Material for

Protein Structure: Data Bases and Classification Ingo Ruczinski

Complex networks and evolutionary games

The architecture of complexity: the structure and dynamics of complex networks.

What is the central dogma of biology?

Announcements. Primary (1 ) Structure. Lecture 7 & 8: PROTEIN ARCHITECTURE IV: Tertiary and Quaternary Structure

Biological Networks Analysis

Introducing Hippy: A visualization tool for understanding the α-helix pair interface

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

Biological Networks. Gavin Conant 163B ASRC

Molecular dynamics simulations of anti-aggregation effect of ibuprofen. Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed.

Bioinformatics I. CPBS 7711 October 29, 2015 Protein interaction networks. Debra Goldberg

Overview. Overview. Social networks. What is a network? 10/29/14. Bioinformatics I. Networks are everywhere! Introduction to Networks

Evolving network with different edges

Protein Folding Prof. Eugene Shakhnovich

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Protein structure forces, and folding

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Introduction to" Protein Structure

Geometrical Concept-reduction in conformational space.and his Φ-ψ Map. G. N. Ramachandran

Protein folding. Today s Outline

Data Mining and Analysis: Fundamental Concepts and Algorithms

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

Cross-fertilization between Proteomics and Computational Synthesis

BioControl - Week 6, Lecture 1

Complex (Biological) Networks

CS224W: Social and Information Network Analysis

Protein Science (1997), 6: Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society

Department of Pathology, Northwestern University Medical School, Chicago, IL 60611

Understanding Protein Structure from a Percolation Perspective

UNIVERSALITIES IN PROTEIN TERTIARY STRUCTURES: SOME NEW CONCEPTS

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

arxiv: v1 [nlin.cg] 23 Sep 2010

Supplementary Information

On Resolution Limit of the Modularity in Community Detection

STRUCTURAL BIOLOGY AND PATTERN RECOGNITION

BIBC 100. Structural Biochemistry

Complex networks: an introduction

Evolution of a social network: The role of cultural diversity

arxiv: v2 [physics.soc-ph] 15 Nov 2017

Graph Theory and Networks in Biology

Structural and mechanistic insight into the substrate. binding from the conformational dynamics in apo. and substrate-bound DapE enzyme

Computer simulations of protein folding with a small number of distance restraints

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Networks as a tool for Complex systems

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Bio nformatics. Lecture 23. Saad Mneimneh

Heteropolymer. Mostly in regular secondary structure

Objective: Students will be able identify peptide bonds in proteins and describe the overall reaction between amino acids that create peptide bonds.

A Modified Earthquake Model Based on Generalized Barabási Albert Scale-Free

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

Optimization of the Sliding Window Size for Protein Structure Prediction

Advanced Certificate in Principles in Protein Structure. You will be given a start time with your exam instructions

Biomolecules: lecture 10

Supersecondary Structures (structural motifs)

Complex (Biological) Networks

Types of biological networks. I. Intra-cellurar networks

BIRKBECK COLLEGE (University of London)

A Structure-Centric View of Protein Evolution, Design and Adaptation

Analysis of Complex Systems

arxiv:cond-mat/ v1 [cond-mat.soft] 30 Dec 2003

Graph Alignment and Biological Networks

arxiv:cond-mat/ v2 [cond-mat.stat-mech] 3 Oct 2005

THRESHOLDS FOR EPIDEMIC OUTBREAKS IN FINITE SCALE-FREE NETWORKS. Dong-Uk Hwang. S. Boccaletti. Y. Moreno. (Communicated by Mingzhou Ding)

Random matrix analysis of complex networks

Topology of technology graphs: Small world patterns in electronic circuits

Better Bond Angles in the Protein Data Bank

Evolutionary Games and Computer Simulations

Transcription:

Network properties of protein structures Ganesh Bagler and Somdatta Sinha entre for ellular and Molecular Biology, Uppal Road, Hyderabad 57, India (Dated: June 8, 218) arxiv:q-bio/89v1 [q-bio.mn] 13 Aug 2 Protein structures can be studied as complex networks of interacting amino acids. We study proteins of different structural classes from the network perspective. Our results indicate that proteins, regardless of their structural class, show small-world network property. Various network parameters offer insight into the structural organisation of proteins and provide indications of modularity in protein networks. PAS numbers: 89.75.-k, 87.1.Ee, 5.5.+b Keywords: protein, network, amino acid, graph theory I. INTRODUTION Biological systems have been studied as networks at different levels: protein-protein interaction network [1], metabolic pathways network [2, 3], gene regulatory network [], and protein as a network of amino acids [5,, 7, 8]. Proteins are biological macromolecules made up of a linear chain of amino acids and are organised into threedimensional structure comprising of different secondary structural elements. They perform diverse biochemical functions and also provide structural basis in living cells. It is important to understand how proteins consistently fold into their native-state structures and the relevance of structure to their function. Network analysis of protein structures is one such attempt to understand possible relevance of various network parameters. There have been several efforts to study proteins as networks. Aszódi and Taylor [5] compared the linear chain of amino acids in a protein and its three dimensional structure with the help of two topological indices connectedness and effective chain length related to path length and degree of foldedness of the chain. Kannan and Vishveshwara[9] have used the graph spectral method to detect side-chain clusters in three-dimensional structures of proteins. In recent years, with the elaboration of network properties in a variety of real networks, Vendruscolo et al. [] showed that protein structures have small-world topology. They also studied transition state ensemble (TSE) structures to identify the key residues that play a key role of hubs in the network of interactions to stabilise the structure of the transition state. Greene and Higman [7] studied the short-range and long-range interaction networks in protein structures and showed that long-range interaction network is not small world and its degree distribution, while having an underlying scale-free behaviour, is dominated by an exponential term indicative of a single-scale system. Atilgan et al. [8] studied the network properties of the core and surface of globular protein structures, and established that, regardless of size, the cores have the same local packing arrangements. They also explained, with an example of binding of two proteins, how the small-world topology could be useful in efficient and effective dissipation of energy, generated upon binding. In this study, we model the native-state protein structure as a network made of its constituent amino-acids and their interactions. The α atom of the amino acid has been used as a node and two such nodes are said to be linked if they are less than or equal to a threshold distance apart from each other [, 8]. We use 7Å as the threshold distance. Our results show that proteins are small-world networks regardless of their structural classification across four major groups as enumerated in Structural lassification of Proteins (SOP) [1]. We also highlight the differences in some of the network properties among these classes. Our studies are indicative of the modular nature of these networks. II. METHODOOGY The four structural classes (from SOP) of proteins chosen are: α, β, α + β, and α/β. The α proteins are composed predominantly of α helices, and the β proteins of β sheets. The α+β proteins mainly have anti-parallel β sheets, whereas those in α/β consist of mainly parallel beta sheets. We consider 2 proteins from each of these classes whose sizes range from 73 to 2359 amino acids. The structural data is obtained from the Protein Data Bank (PDB) [11]. The parameters used to characterise the network are : (i) The Degree (k i ) of a node i is the number of nodes to which it is directly connected. Average degree, K, of a network with N nodes is defined as K = 1 N k i. i=1 (ii) The Average Shortest Path ength is defined as Electronic address: ganesh@ccmb.res.in Electronic address: sinha@ccmb.res.in = N 1 1 N(N 1) i=1 j=i+1 ij,

2 2 18 1 1 α+ α/ 2 15 α+ α/ (b) 12 1 1 8 2 5.2...8 1 1 1 2 log (N) 1 3 1 FIG. 1: The plot of proteins from four structural classes. (b) Increase in the of proteins with logarithmic increase in size (N). Random controls are indicated by an arrow in both the figures. where ij is the shortest path length between nodes i and j. The Diameter (D) of the network is the largest of all the shortest path lengths. (iii) The lustering oefficient ( i ) for a node i is defined as the fraction of links that exist among its nearest neighbours to the maximum number of possible links among them. The Average lustering oefficient () of the network is defined as = 1 N i. i=1 A network is a small-world network if it has high and if its scales logarithmically with N [12]. A network lacking a characteristic degree and having degree distribution of a power-law form is known as scale-free network [13]. III. RESUTS A. Network parameters for different structural classes of proteins We calculate and for each protein. As controls, we calculate the and of random graphs and onedimensional (1-d) regular graphs of the same N and K. Figure 1 shows the plot for the proteins and their random controls (indicated by an arrow). The averages of the distribution of and for the protein networks are.88 ± 2.1 and.553 ±.27, respectively. The random and random are 2.791±.38and.31±.22, and that for the comparable 1-d regular graphs are regular = 29. ± 25.97 and regular =.3 ±.. The Kolmogorov-Smirnov test [1] shows that the differences between and of the proteins and random or regular lattices (not shown in Fig. 1) are statistically significant. Thus, these protein networks have significantly high clustering coefficient than their random counterparts and the and -values fall between the random and regular networks in the plot. Fig. 1(b) shows of all proteins with different N and their random counterparts (indicated by an arrow). It can be seen that increases with logn, regardless of the structural classification of the proteins and the slope is higher than the random controls. This property, along with high, indicate that protein networks are smallworld networks [12]. B. Degree Distribution The distribution of the degrees is an important property which characterises the network topology. The degree distribution of a random network is characterised by a Poisson distribution. Figure 2 shows the degree distributions of α, β, α +β, and α/β protein networks. The shape of these distributions are bell-shaped, Poissonlike [7], and the number of nodes with very high degree falls off rapidly. This is understandable as there is a physical limit on the number of amino acids that can occupy the space within a certain distance around another amino acid. Such system-specific restrictions have been identified to be responsible for the emergence of different classes of networks with characteristic degreedistributions by Amaral et al. [1]. They observed that preferential attachment to vertices in many real scalefree networks [15] can be hindered by factors like ageing of the vertices (e.g. actors networks), cost of adding links to the vertices, or, the limited capacity of a vertex (e.g. airports network).. Fibrous proteins Most proteins are globular proteins in their threedimensional structure, where the polypeptide chain folds into a compact shape. In contrast, fibrous proteins

3 3 3 25 (b) 2 2 Number of amino acids with degree k 1 5 1 15 2 25 (c) 2 15 1 5 15 1 5 3 2 1 5 1 15 2 (d) 5 1 15 2 Degree (k) 5 1 15 2 Degree (k) FIG. 2: Degree distributions for α, (b) β, (c) α+β, and (d) α/β proteins, 2 of each class. have relatively simple, elongated three-dimensional structure suitable for their biological function (see Fig. 3(b)). The small-world nature of globular proteins was argued [8] to be required for enhancing the ease of dissipation of disturbances. We studied fibrous proteins and compared their network properties with globular proteins of comparable size. As shown in the plot in Fig. 3, fibrous proteins have larger, although the are similar to those of globular proteins. Thus, in this respect, the fibrous proteins also show small-world properties. The average diameter for the fibrous proteins (D = 15) was found to be larger than that of the globular proteins (D = 8.57). This is expected because of the elongated structure of fibrous proteins. Despite this major difference in structure, the network properties are not much different between the fibrous proteins and globular proteins. This indicates that the small-world property of proteins is ubiquitous and persists irrespective of structural differences. D. α and β proteins As seen earlier, both α and β proteins show smallworld properties. On finer analysis, we find that there is a marginal, yet consistent difference in the of α and β proteins as shown in Fig.. The mean of for α and β proteins studied are.588 and.538, respectively. According to Kolmogorov-Smirnov test, this difference is statistically significant. Owing to the helical structure of the α proteins, the amino acids are densely packed compared to that of the flat β sheets. This may contribute to the small increase in the Average lustering oefficient of the α proteins. Since α + β and α/β have a mixed composition of α helices and β sheets, they do not show any clear distinction. E. hange in with N lustering coefficient characterises local organisation. For both random as well as scale-free networks, the is expected to fall with increasing size [15]. It has been shown [8] that, regardlessof the size, the remains almost same in the core of the protein. We show the change in with increasing protein size (N) in Fig. (b). The figure shows that does not change significantly with increasing size of proteins. Similar result [3] is shown for the metabolic networks of 3 distinct organisms. This property is suggestive of potential modularity in the topology of the protein networks. IV. ONUSIONS Our results show that protein networks have smallworld property regardless of their structural classification (α, β, α+β, and α/β) and tertiary structures (glob-

.5 Fibrous Proteins () Globular Proteins (7) 5.5 5.5 3.5 3.5.52.5.5.58..2...8.7 FIG. 3: plot of Fibrous and Globular proteins. (b) Examples of three-dimensional structures of a fibrous and globular protein (not to the scale) with their PDB codes. 1 1 1. α+ α/ 12.8 1. 8. 2.5.52.5.5.58..2..2 5 1 15 2 25 N FIG. : plot for α and β proteins. Arrows indicate the means of for α and β proteins.(b) hange in of proteins with increasing N. ular and fibrous proteins), even though small but definite differences exist between α and β classes, and fibrous and globular proteins. The size independence of the Average lustering oefficient in proteins indicates toward an inherent modular organisation in the protein network. In the cell, starting from a linear chain of amino acids, the protein folds in different secondary motifs such as, the α helices and β sheets and their mixtures. These then assume three-dimensional tertiary structures with helices, sheets and random coils, folding to give the final shape that is useful to carry on the biochemical function. This structure evolves in such a way as to confer stability and also allow transmission of biochemical activity (binding of ligand, allostery, etc.) for efficient functioning. Thus the networks built from such proteins are expected to show high clustering and also reflect its modular or hierarchically folded organisation. Unlike other hierarchical networks [3] that are modelled to form by replicating a core set of nodes and links, this network primarily grows linearly first, and then this polypeptide chain organises itself in a modular manner at different levels (secondary and tertiary). Evolution of such type of network architectures demands further study. Acknowledgments GB thanks ouncil of Scientific and Industrial Research (SIR), Govt. of India, for the Senior Research Fellowship. The authors thank Bosiljka Tadic for discussions. [1] S. Y. Yook, Z. N. Oltvai, A.-. Barabási, Proteomics (2) 928 92. [2] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, A.-.

5 Barabási, Nature 7 (2) 51 5. [3] E. Ravsaz, A.. Somera, D. A. Mongru, Z. N. Oltvai, A.-. Barabási, Science 297 (22) 1551 1555. [] R.Albert, H.G. Othmer, J. Theor. Biol. 223 (23) 1 18. [5] A. Aszódi, W. R. Taylor, ABIOS 9 (1993) 523 529. [] M. Vendruscolo, N. V. Dokholyan, E. Paci, M. Karplus, Phys. Rev. E 5 (22) 191. [7]. H. Greene, V. A. Higman, J. Mol. Biol. 33 (23) 781 791. [8] A. R. Atilgan, P. Akan,. Baysal, Biophys. J. 8 (2) 85 91. [9] N. Kannan, S. Vishveshwara, J. Mol. Biol. 292 (1999) 1. [1] A. G. Murzin, S. E. Brenner, T. Hubbard,. hothia, J. Mol. Biol. 27 (1995) 53 5. [11] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourneand, W. R. Taylor, Nucleic Acids Res. 28 (2) 235 22. [12] D. J. Watts, Small Worlds The Dynamics of Networks Between Order and Randomness, Princeton University Press, 1999. [13] A.-. Barabási, R. Albert, Science 28 (1999) 59 512. [1] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical Recipes in, 2nd Edition, ambridge University Press, 1993. [15] R. Albert, A.-. Barabási, Rev. Mod. Phys. 7 (22) 7 97. [1]. A. N. Amaral, A. Scala, M. Barthélémy, H. E. Stanley, Proc. Natl. Acad. Sci. (USA) 97 (2) 1119 11152.