Discrete structure of van der Waals domains in globular proteins

Protein Engineering vol.16 no.3 pp.161-167, 2003
DOI: 10.1093/proeng/gzg026

Igor N. Berezovsky
Department of Structural Biology, The Weizmann Institute of Science, P.O.B. 26, Rehovot 76100, Israel
Present address: Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street M-105, Cambridge, MA 02138, USA
E-mail: inberez@fas.harvard.edu

Most globular proteins are divisible into domains, distinct substructures of the globule. The notion of a hierarchy of domains was introduced earlier via van der Waals energy profiles, which allow one to subdivide proteins into domains (subdomains). The question remains open as to what structural features underlie the energy profiles. The recent discovery of loop-n-lock elements in globular proteins suggests such a structural connection. A direct comparison of the segmentation by van der Waals energy criteria with the maps of the locked loops of nearly standard size reveals a striking correlation: domains in general appear to consist of one to several such loops. In addition, it is demonstrated that the various subdivisions of the same protein into domains are just regroupings of the loop-n-lock elements.

Keywords: closed loops/formation and alteration of domains/hierarchy of domain structure/loop-n-lock structure/protein folding/protein structure

Introduction

Proteins consist of distinct, semi-independent, stable structural fragments (domains) that were elucidated from the results of limited proteolysis (Porter, 1959) even before the first protein X-ray structure was determined (Kendrew et al., 1958).
A variety of computational methods for identifying domains have been developed (Berezovskii and Tumanyan, 1995; Islam et al., 1995; Siddiqui and Barton, 1995; Berezovskii et al., 1997; Berezovsky et al., 1997, 1999; Jones et al., 1998; Wernisch et al., 1999) and various factors in the formation of the domains have been discussed (Doolittle, 1995). Hierarchic organization of the protein globule was first described by Crippen (Crippen, 1978) and Rose (Rose, 1979) and recently discussed in the model of hierarchic protein folding (Baldwin and Rose, 1999a,b). Van der Waals energy calculations (Berezovskii and Tumanyan, 1995; Berezovskii et al., 1997; Berezovsky et al., 1997, 1999, 2000a) allow the detection of the domains and of changes in the domain structure at different energy levels. Domains can be further subdivided into energetically distinct segments and combinations thereof. The energy-based dissection of globular proteins into domains (Berezovskii and Tumanyan, 1995; Berezovskii et al., 1997; Berezovsky et al., 1997, 1999, 2000b) appears to be the most natural way to separate independent elements and to evaluate interactions between them.

It was recently discovered that globular proteins are universally built of nearly standard size closed loops (loop-n-lock elements), in other words, returns of the chain trajectory with tight end-to-end contacts (Berezovsky et al., 2000c; Berezovsky and Trifonov, 2001a,b). In order to establish the possible connection between energy-derived domains and subdomains on the one hand and the structurally defined loop-n-lock elements on the other, we compared the results obtained by these two independent approaches. The detailed comparisons of the energy maps with positions of the primary closed loops, as described below, show that the domains are essentially made of the loops and that the hierarchy of domain structure is defined by interactions between the loops and by the loop regrouping.
Materials and methods

Structural data

Major representatives (Orengo et al., 1994) of the protein superfolds [Globin (1thb), Trefoil (1i1b), Up-Down (256b), Immunoglobulin (2rhe), α/β Sandwich (1aps), Jelly Roll (2stv), Doubly Wound (4fxn), UB α/β Roll (1ubq), TIM-Barrel (7tim)] were analyzed. X-ray data from the Protein Data Bank were supplemented with coordinates of H-atoms (Berezovsky et al., 1999).

Hierarchy of protein domain structure by the van der Waals energy approach

The algorithm is based on the segmentation of the globule into parts with a high concentration of van der Waals energy and further detailed analysis of interactions between these segments. Van der Waals energies were calculated for all pairs of contacting atoms. Only contact distances between 2.5 and 5.0 Å were considered (Berezovsky et al., 1999). The Lennard-Jones 6-12 potential and the standard Scheraga parameters for different types of atoms were used (Dunfield et al., 1978; Nemethy et al., 1983). The van der Waals energies were calculated for atoms belonging to residues separated by at least two amino acids along the polypeptide chain (Berezovskii and Tumanyan, 1995). Figure 1A demonstrates an example of van der Waals energy walks for the TIM-Barrel fold (7tim). Every point on the curve is the interaction energy between the two parts of the globule separated by a given amino acid residue.

The procedure of domain detection and setting of the levels of energy to establish the hierarchy of the domains consists of the following steps:

(i) Calculation of the interaction energy between parts of the native globule separated by each given residue. The minimal energy value (E_0) is found on the curve of interaction energy between parts of the globule. Every local maximum on this curve corresponds to a point of separation. A null value of the interaction energy means complete energy independence of the adjacent regions from each other.
Protein Engineering 16(3), Oxford University Press 161

Figure 1. Van der Waals interaction energy curves: energy walks for triosephosphate isomerase (7tim). (A) Initial energy curve. Position 190 corresponds to the minimum of interaction energy (E_0). Positions 64 and 124 are maxima of the interaction energy that correspond to segmentation with potential barrier 0.25E_0 (see graph B). (B) Energy curves for segments 1-63, 65-123 and 125-248 (value of potential barrier is 0.25E_0 to 0.2E_0). (C) The same curve for the value of barrier 0.15E_0: segments 1-63, 65-123, 125-160 and 162-248. (D) Barrier 0.1E_0: segments 1-63, 65-91, 93-123, 125-160, 162-210 and 212-248. (E) Barrier 0.05E_0: segments 1-63, 65-91, 93-103, 105-123, 125-160, 162-173, 175-210 and 212-248.

(ii) Setting of potential barriers, e.g. 0.3E_0, 0.25E_0, 0.2E_0, 0.15E_0, 0.1E_0, 0.05E_0, and analysis of the initial curve of interactions between parts of the globule at different levels of the potential barrier. Any maximum on the initial curve is considered to be a point of structural separation if the differences between this maximum and the neighboring deep minima exceed the value of a chosen potential barrier. This generates sets of structural segments corresponding to the values of the barriers. Thus, alternative domains can be defined at different levels of the barrier. These segments are characterized as follows: internal energy of isolated segments $e_{ii}$, integral energy of external interaction of each segment with the others $\sum_{j \neq i} e_{ij}$, and interaction energies for each pair of segments $e_{ij}$ ($i, j = 1, \ldots, K$, where $K$ is the number of segments).

(iii) Analysis of the interaction energy within the structural segments separated at the previous step, and between the segments. Any isolated segment is considered as a candidate domain if $e_{ii} > 1.5 \sum_{j \neq i} e_{ij}$ (here and below, we compare absolute values of energy). Any candidate domain will be classified as a domain if $e_{ii} > 3 \sum_{j \neq i} e_{ij}$. Any two potential domains $i$ and $k$ will be combined into one independent domain if $e_{ik} > \sum_{j \neq k,i} e_{ij}$ and $e_{ki} > \sum_{l \neq k,i} e_{kl}$ simultaneously. Two potential domains will also be combined if more than 70% of the external energy of one domain pertains to the interaction with the other domain. Any isolated segment with $e_{ii} < 1.5 \sum_{j \neq i} e_{ij}$ will be joined with the segment or potential domain for which the first segment has maximal external interaction energy.

The procedure is explained in further detail by the illustration in Figure 1, which contains a full set of van der Waals curves for triosephosphate isomerase (TIM-Barrel fold, 7tim). Graph A represents the initial curve of a van der Waals energy walk and graphs B-E the energy curves at different levels of hierarchy. In Figure 1, accordingly:

(i) Position 190 of the initial curve (see graph A) gives a minimal value E_0 of interactions.

(ii) Different types of structure splitting are observed for the following levels of a potential barrier: 0.25E_0 (Figure 1B), 0.15E_0 (Figure 1C), 0.1E_0 (Figure 1D) and 0.05E_0 (Figure 1E). Figure 1A demonstrates maxima at positions 64 and 124. Each of these maxima is accompanied by two minima with respective energy differences larger than 0.25E_0. This suggests segmentation 1-63, 65-123 and 125-248 at the potential barrier 0.25E_0 (see also Figure 1B). Figure 1C-E demonstrate interaction energies within the segments (see legend to Figure 1) corresponding to the barrier values 0.15E_0, 0.1E_0 and 0.05E_0, respectively.

(iii) This analysis therefore suggests four levels of hierarchy in triosephosphate isomerase: level 1 (0.3E_0), single-domain structure; level 2 (0.25E_0 to 0.15E_0), domain 1 residues 1-63 and 125-248, domain 2 residues 65-123; level 3 (0.1E_0), domain 1 residues 1-63 and 212-248, domain 2 residues 65-123, domain 3 residues 125-210; and level 4 (0.05E_0), domain 1 residues 1-63 and 125-248, domain 2 residues 65-123.

Figure 2. Superposition of nearly standard size loops (bars) and the van der Waals walks. (A) UB α/β Roll (1ubq). (B) α/β Sandwich (1aps). (C) Up-Down (256b). (D) Immunoglobulin fold (2rhe). (E) Doubly Wound (4fxn). (F) Globin (1thb). (G) Trefoil (1i1b). (H) Jelly Roll (2stv). (I) TIM-Barrel (7tim). Maxima on the plots correspond to the points that separate energy-independent parts of the molecule and, thus, show boundaries between domains (subdomains), segments, etc. A domain is defined as a part of the molecule with strong internal van der Waals interactions between corresponding parts and weak external ones. A domain may consist of one continuous part of the globule or can be an association of several discontinuous elements suggested by the plot. According to the procedure of domain detection [see Materials and methods for explanation, and also earlier work (Berezovskii et al., 1997; Berezovsky et al., 1999)], five proteins [α/β Sandwich (1aps), Globin fold (1thb), UB α/β Roll (1ubq), Jelly Roll (2stv) and Immunoglobulin fold (2rhe)] yield a unique single-domain structure, whereas other proteins [Trefoil fold (1i1b), TIM-Barrel (7tim), Up-Down (256b) and Doubly Wound (4fxn)] reveal a hierarchy of domain structure with distinct domains in the same protein at different levels of hierarchy. Trefoil fold (1i1b) and Doubly Wound (4fxn) can have a single-domain structure as well as a two-domain structure with the formation of one complex domain [in Trefoil fold, residues 3-41 and 104-153 (domain 1); residues 43-102 (domain 2); in Doubly Wound, residues 1-33 and 122-138 (domain 1); residues 35-120 (domain 2)]. Up-Down fold (256b) yields three levels of hierarchy with a single-domain structure and distinct two-domain organizations [residues 1-75 (domain 1), residues 77-106 (domain 2) or residues 1-42 (domain 1), residues 44-106 (domain 2)]. TIM-Barrel fold reveals four levels of hierarchy with single-, two- and three-domain organizations (see legend to Figure 4 and Materials and methods).

Comparative analysis of the domains detected by different computational approaches

We compared the domain assignments by our method with other methods and authors' definitions. If the assigned domain boundary is located in the interval $l \pm 2$ (residue $l$ is the domain boundary assigned by the other method), we consider the domain boundary assignments identical. This is related to the accuracy of the method, since we take into account the interaction between atoms belonging to residues separated by at least two residues (see above). The accuracy score is calculated as follows:

$$S = \frac{\sum_{i=1}^{M} N_{cor}^{i} + m}{N_{tot}}$$

where $N_{cor}^{i}$ is the number of residues assigned to the same domain both by our program and by another method or author definition, $N_{tot}$ is the total number of residues in the protein chain, $m$ is the number of domain boundaries that were similarly assigned both by the program and by the other methods and $M$ is the number of domains under comparison. If the number of domains assigned by our method is not equal to the number of domains assigned by others, then $M$ is the maximal number of domains in the compared assignments.

Table I. Location of the closed loops and van der Waals segmentation in nine major superfolds

PDB code | Closed loops | van der Waals segments
1thb | 26-59, 65-80, 102-129 | 1-40, 42-62, 64-73, 75-78, 80-92, 94-115, 117-141
1i1b | 16-30, 42-62, 70-99, 101-115, 120-136 | 1-41, 43-64, 66-102, 104-121, 123-153
256b | 10-33, 41-62, 68-94 | 1-18, 20-24, 26-32, 34-39, 44-62, 63-65, 67-75, 77-106
2rhe | 6-22, 35-50, 62-77, 86-106 | 1-45, 47-58, 60-82, 84-109
1aps | 18-45, 47-75 | 1-52, 54-98
2stv | 29-58, 72-89, 93-124, 129-147, 154-170, 171-190 | 1-23, 25-61, 63-115, 117-155, 157-195
4fxn | 1-30, 35-66, 78-107, 112-134 | 1-37, 39-45, 47-80, 82-86, 88-138
1ubq | 2-16, 22-55, 50-67 | 9-16, 18-34, 36-40, 42-76
7tim | 9-40, 41-62, 62-90, 95-126, 128-166, 165-209, 209-231, 230-249 | 3-63, 65-91, 93-103, 105-123, 125-160, 162-173, 175-210, 212-250

Figure 3. Sum of synchronized van der Waals plots around the ends of the mapped loops. To eliminate sequence end effects, loop ends closer than 30 amino acid residues to the ends of the sequences were excluded from the calculation. A total of 54 sections of energy curves are summed together. The background of 10 kcal/mol that corresponds to the interaction energy at the loop ends is subtracted.

Detection of the closed loops and their characterization

Closed loops are defined as continuous sub-trajectories of the folded chains with a small Cα-Cα distance between their ends (up to 10 Å). These are not loops in the traditional definition as connectors between elements of secondary structure (Leszczynski and Rose, 1986; Martin et al., 1995; Kwasigroch et al., 1996; Oliva et al., 1997) or so-called U-turns (Kolinski et al., 1997), which do not include loop closure points. The closed loops (Berezovsky et al., 2000c; Berezovsky and Trifonov, 2001a) connect points distantly positioned along the polypeptide chain, providing the formation of locally compact structural subunits. The Cα-Cα contacts with immediate neighbors along the sequence are not considered. Five residues are taken as the cut-off value. For the (anti)parallel α- and β-structures forming several short Cα-Cα contacts, the shortest one is taken.
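The closed-loop definition above, together with the size window of 15 to 50 residues and the tightest-first selection used in the mapping procedure of this subsection, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published program: the function names, the input as a plain list of Cα coordinates, and the tie-breaking by chain position are all hypothetical choices made here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Loop = Tuple[float, int, int]  # (end-to-end distance, start index, end index)


def candidate_loops(ca: Sequence[Coord], max_dist: float = 10.0,
                    min_len: int = 15, max_len: int = 50) -> List[Loop]:
    """List chain segments whose end-to-end C-alpha distance is <= max_dist.

    Segment length (in residues) is restricted to min_len..max_len, which
    also excludes contacts between near neighbours along the sequence.
    """
    loops = []
    n = len(ca)
    for i in range(n):
        for j in range(i + min_len - 1, min(n, i + max_len)):
            d = math.dist(ca[i], ca[j])
            if d <= max_dist:
                loops.append((d, i, j))
    return loops


def map_loops(loops: Sequence[Loop], max_overlap: int = 4) -> List[Tuple[int, int]]:
    """Greedy mapping: accept the tightest loops first.

    A loop is rejected if it shares more than max_overlap residues with an
    already accepted (tighter) loop; sharing fewer than five is allowed.
    """
    chosen: List[Loop] = []
    for d, i, j in sorted(loops):
        if all(min(j, b) - max(i, a) + 1 <= max_overlap for _, a, b in chosen):
            chosen.append((d, i, j))
    return sorted((i, j) for _, i, j in chosen)
```

Indices here are 0-based positions in the coordinate list; converting them to PDB residue numbers is left to the caller.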
According to the loop size distribution (Berezovsky et al., 2000c; Berezovsky and Trifonov, 2001a), the loops accepted into the mapping procedure have sizes from 15 to 50 amino acid residues. The mapping procedure sequentially selects the tightest loops, and at each step the sequence region corresponding to the mapped loop is excluded from further calculations. In the case of partial overlap, the tighter of the two loops is accepted. When two loops shared fewer than five amino acid residues, both were accepted.

Van der Waals walks, i.e. interaction energy between the parts of the native globule, are plotted in Figure 2 as described previously (Berezovskii et al., 1997; Berezovsky et al., 1999). The top curves on the plots in Figure 2 correspond to the smallest value of the potential barrier [for Jelly Roll fold (2stv) only this curve is presented]. Values of the barrier are as follows: 0.01E_0 for α/β Sandwich (1aps) and 0.05E_0 for other superfolds. A total of 54 sections of the van der Waals plots around the loop ends (left and right) of the nine superfolds were aligned and summed together in Figure 3.

Results

Van der Waals segmentation and loop structure of the major superfolds

As demonstrated earlier (Berezovskii et al., 1997; Berezovsky et al., 1999), calculation of the van der Waals segments of the protein globule allows one to define boundaries and hierarchy of the domains at different energy levels. This calculation generates curves with zero levels at the start and end points. The energy profiles (see Figures 1 and 2) are rather ragged, showing numerous maxima and minima. The maxima correspond to the borders between the energy-defined independent segments. For example, in Figure 1A the profile for the whole molecule shows several maxima that split the molecule into several segments. The energy-justified splitting can be made as detailed as desired, depending on how many maxima are treated as borders.
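As a rough illustration of how maxima on an energy walk can be turned into segment borders, the following Python sketch computes a walk from a residue-residue energy matrix and accepts a maximum as a border when it rises above the deepest minimum on each side by more than a chosen fraction of |E_0|. The function names, the matrix input, the Lennard-Jones parameters `eps` and `r_min`, and the reading of "neighboring deep minima" as the lowest point on each side before the curve rises above the maximum again are assumptions made for illustration; they are not the published implementation or the Scheraga parameter set.

```python
from typing import List, Sequence, Tuple


def lj_6_12(r: float, eps: float, r_min: float) -> float:
    """Lennard-Jones 6-12 energy of one atom pair, zero outside 2.5-5.0 A.

    eps (well depth) and r_min (position of the minimum) are placeholders.
    """
    if r < 2.5 or r > 5.0:
        return 0.0
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)


def energy_walk(pair_energy: Sequence[Sequence[float]]) -> List[float]:
    """walk[k] = energy between residues before k and residues after k."""
    n = len(pair_energy)
    walk = []
    for k in range(n):
        total = 0.0
        for i in range(k):
            for j in range(k + 1, n):
                total += pair_energy[i][j]
        walk.append(total)
    return walk


def split_points(walk: Sequence[float], fraction: float) -> List[int]:
    """Interior maxima that exceed both flanking minima by fraction*|E_0|."""
    barrier = fraction * abs(min(walk))
    points = []
    for k in range(1, len(walk) - 1):
        if not (walk[k] > walk[k - 1] and walk[k] >= walk[k + 1]):
            continue  # not a local maximum
        left, i = walk[k], k - 1
        while i >= 0 and walk[i] <= walk[k]:
            left = min(left, walk[i])
            i -= 1
        right, j = walk[k], k + 1
        while j < len(walk) and walk[j] <= walk[k]:
            right = min(right, walk[j])
            j += 1
        if walk[k] - left > barrier and walk[k] - right > barrier:
            points.append(k)
    return points


def segments(n: int, points: Sequence[int]) -> List[Tuple[int, int]]:
    """Inclusive 0-based segments between split points (borders excluded)."""
    bounds = [-1] + list(points) + [n]
    return [(a + 1, b - 1) for a, b in zip(bounds, bounds[1:]) if b - a > 1]
```

Lowering `fraction` accepts shallower maxima and therefore yields a finer segmentation, which mirrors the way successive barrier levels produce the hierarchy of segments.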
The energy profiles are then calculated separately for each segment so that other parts of the molecule do not contribute to the neighboring segments (e.g. plots in Figure 1B-E). The subdivision starts with the highest maxima observed, and the procedure allows one to reveal additional maxima, which appear in the original plot as changes in the slope (shoulders) rather than maxima. Selected structural units are characterized by substantially higher internal versus external interactions. The top curves in Figure 2A-I demonstrate more detailed segmentation of the globules for the smallest values of the potential barriers (the notion of the potential barrier is explained in detail and exemplified in Materials and methods).

Inspection of Figure 2A-I shows the typical size of these segments: 10-50 amino acid residues. Similar sizes are characteristic of closed loops (Berezovsky et al., 2000c; Berezovsky and Trifonov, 2001a), that is, returning pieces of trajectory with tight Cα-Cα contacts. A comparison of maps of domain boundaries with borders between primary closed loops (indicated by bars above the energy curves in Figure 2) shows that these two maps are rather similar. That is, the majority of the loop ends correspond to the peaks on the energy curves (loop mapping error bars: ±3 amino acid residues). Table I contains closed loops mapped in nine major superfolds and van der Waals segments of the respective structures selected at the lowest level of the potential barrier (see Materials and methods). Sets of the loops and the segments contain 38 and 48 entities, respectively. Among 58 internal (not at the ends of the protein) loop ends there are 36 located at the respective borders of van der Waals segments (bold in Table I).

Quantitative agreement of the loop borders with energy plots is further demonstrated by Figure 3, where the sections of energy curves around the loop ends of the nine superfolds are aligned and summed together. Thus, Figure 3 represents an averaged interaction energy in the vicinity of the loop end (position zero, see Materials and methods). The average interaction energy shows a clear maximum at the aligned loop ends and the match between the loop ends and the borders of van der Waals segments. Together with the similarity of the sizes, this gives a basis to consider closed loops as common elementary units of protein structure and stability. In some cases (for the smallest values of the potential barrier), there are several small segments inside the closed loop. These correspond to local, rather compact pieces with high energy content.

Figure 4. Domain structure of triosephosphate isomerase (7tim). (A) Three domains of the TIM-Barrel fold: domain 1 (residues 1-63, 212-248, dark grey), domain 2 (residues 65-123, grey), domain 3 (residues 125-210, light grey). (B-D) Loop representation of the three domains of the TIM-Barrel fold. (B) Domain 1 (residues 1-63 and 212-248): loops 9-40 (black), 41-62 (light grey), 207-228 (grey) and 229-243 (light grey). (C) Domain 2 (residues 65-123): loops 62-90 (black) and 95-126 (light grey). (D) Domain 3 (residues 125-210): loops 126-166 (grey) and 177-210 (light grey).

Hierarchy of domain structure

There are many cases where a fold can be dissected in alternative ways (different techniques and/or authors). A principle of the systematic comparison of the domain assignments by different techniques (see Materials and methods) was developed earlier (Berezovsky et al., 1999). In this work, the procedure was applied to the domain assignments of the major superfolds.
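The comparison of two domain assignments by the accuracy score from Materials and methods can be sketched as below. This is a minimal Python illustration that assumes the score adds the number of matched boundaries m to the summed per-domain overlaps before dividing by N_tot (one reading of the formula, chosen because boundary residues belong to no segment), that domains in the two assignments are paired by their order in the lists, and that boundary positions are passed in explicitly. The function and argument names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Domain = Sequence[Tuple[int, int]]  # one domain = list of inclusive segments


def residues(domain: Domain) -> Set[int]:
    """Expand a (possibly discontinuous) domain into a set of residue numbers."""
    return {r for start, end in domain for r in range(start, end + 1)}


def accuracy_score(ours: Sequence[Domain], theirs: Sequence[Domain],
                   our_bounds: Sequence[int], their_bounds: Sequence[int],
                   n_tot: int, tol: int = 2) -> float:
    """Fraction of the chain on which two domain assignments agree.

    Domains are compared pairwise by position; a domain with no counterpart
    (when the two assignments differ in domain count) contributes nothing.
    A boundary of ours counts as matched if it lies within +/- tol residues
    of some boundary of the other assignment.
    """
    n_cor = 0
    for d_ours, d_theirs in zip(ours, theirs):
        n_cor += len(residues(d_ours) & residues(d_theirs))
    m = sum(any(abs(b - l) <= tol for l in their_bounds) for b in our_bounds)
    return (n_cor + m) / n_tot
```

For a hypothetical 100-residue chain split at residue 50 by one method and at residue 51 by another, the two boundaries fall within the ±2 tolerance and the score stays close to 1, whereas comparing a two-domain assignment with a single-domain one roughly halves it.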
Single-domain assignments made by the van der Waals technique are in full accordance with the conclusions of other techniques for the following superfolds: the single-domain structure in Trefoil fold (1i1b) coincides with the result of the DOMAK program (Siddiqui and Barton, 1995); single-domain structures in α/β Sandwich (1aps), Doubly Wound (4fxn) and TIM-Barrel (7tim) have also been detected (Islam et al., 1995); both the DOMAK program and the algorithm developed by Islam et al. (Islam et al., 1995) show a single domain for Up-Down (256b) and Jelly Roll (2stv) folds. The three-domain assignment for the TIM-Barrel fold coincides with that made by the DOMAK program (accuracy score 90%). In addition, the van der Waals approach demonstrates other variants of domains in these structures.

Inspection of known alternative van der Waals domain structures for the major superfolds reveals that their domains and subdomains also consist of closed loops. For example, formation of a two-domain structure in the Trefoil fold (1i1b) is achieved by the contribution of loops 43-63 and 70-99 to the two-loop domain 43-102, while the rest of the structure is a complex domain of two parts: the upstream part (residues 3-41) contains loop 19-40 and region 104-153 contains loops 101-122 and 122-144. In the Doubly Wound fold (4fxn), segment 1-33 (loop 1-30), being in strong interaction with the last loop 112-134, forms a complex domain of residues 1-30 and 122-138. At the same time, the second domain (residues 32-120) is made of two other loops (35-66 and 78-107). The Up-Down fold (256b) yields two variants of two-domain structure (in addition to a single-domain description): domain 1 (residues 1-75; loops 10-33 and 41-62) and domain 2 (residues 77-106; loop 68-94) or, alternatively, domain 1 (residues 1-42; loop 10-33) and domain 2 (residues 44-106; loops 41-62 and 68-94).

Finally, the domain structure of the TIM-Barrel fold can vary from a single-domain to a two- or three-domain organization. Loops 9-40 (black, Figure 4B) and 41-62 (light grey, Figure 4B) (segment 1-63) and loops 128-166 (grey, Figure 4D), 177-210 (light grey, Figure 4D), 207-228 (grey, Figure 4B) and 229-243 (light grey, Figure 4B) (segment 125-248) form the first domain of the two-domain structure, and loops 62-90 (black, Figure 4C) and 95-126 (light grey, Figure 4C) (segment 65-123) form the second domain. A three-domain structure is generated by strong interaction between loops in the following order: loops 9-40 (black), 41-62 (light grey), 207-228 (grey) and 229-243 (light grey) are in the first domain (segments 1-63 and 212-248; Figure 4B); loops 62-90 (black) and 95-126 (light grey) form a second domain (residues 65-123; Figure 4C), while the third domain (residues 125-210; Figure 4D) consists of the loops 128-166 (grey) and 177-210 (light grey).
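The grouping of loops into domains described above can be checked with a small Python sketch that assigns each loop to the domain with which it shares the most residues. The maximum-overlap rule, the function names and the domain labels are assumptions introduced here for illustration; the residue ranges are those quoted in the text for the three-domain structure of 7tim.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # inclusive residue range


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def assign_loops(loops: Sequence[Segment],
                 domains: Dict[str, List[Segment]]) -> Dict[Segment, str]:
    """Map each loop to the domain covering the largest part of it."""
    result = {}
    for loop in loops:
        result[loop] = max(
            domains,
            key=lambda name: sum(overlap(loop, seg) for seg in domains[name]),
        )
    return result


# Three-domain structure of triosephosphate isomerase (7tim), as in the text.
tim_domains = {
    "domain 1": [(1, 63), (212, 248)],
    "domain 2": [(65, 123)],
    "domain 3": [(125, 210)],
}
tim_loops = [(9, 40), (41, 62), (62, 90), (95, 126),
             (128, 166), (177, 210), (207, 228), (229, 243)]

for loop, name in assign_loops(tim_loops, tim_domains).items():
    print(loop, "->", name)
```

Under this rule, loops that straddle a domain border by a few residues, such as 62-90 or 207-228, still fall into a single domain, in line with the grouping given in the text.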
Hence there is an obvious correlation of maxima of van der Waals plots with positions of the loop ends. The interaction of the loops and their closure result in the formation/alteration of domains at different levels of the energy hierarchy. Whichever domain is considered, it consists of one, two or any number of nearly standard-size loop-n-lock elements.

Discussion

Hierarchical subdivisions of the van der Waals domains provide common ground for the reconciliation of traditional definitions of domains. A very important advantage of this approach is the possibility of detecting structural domains that involve any number of continuous or discontinuous segments of the polypeptide chain. Moreover, van der Waals segmentation eventually leads to the elucidation of the levels of energy hierarchy, which correspond to distinct sets of structural domains (Berezovsky et al., 1999). We demonstrated here the existence of several levels of energy hierarchy in β-lactamase (4blm, Figure 5) and triosephosphate isomerase (7tim, Figure 4) with distinct sets of domains. In both cases clear levels of hierarchy revealed by the van der Waals approach show single-, two- and three-domain structures. Where analytical approaches were used, alternative conclusions on domain structure in β-lactamase (4blm) were made (Islam et al., 1995; Siddiqui and Barton, 1995). Islam et al. suggested a single-domain assignment, whereas the technique of Siddiqui and Barton showed a two-domain structure. The van der Waals approach detects both variants of domain structure as the first and third levels of energy hierarchy (with accuracy scores equal to 100 and 67%, respectively). The latter case is an example of a multidomain protein with a discontinuous domain structure: discontinuous sections 31–60 and 216–291 make a separate domain (Siddiqui and Barton, 1995), confirmed also by our energy calculations (Berezovsky et al., 1999). In addition, another variant of two-domain structure is suggested.

Figure 5. Domain structure of β-lactamase (4blm). (A) Residues 31–60 (light grey), 62–214 (grey) and 216–291 (black). (B) Two-domain structure (variant 1): domain 1 (residues 31–214, light grey), domain 2 (residues 216–291, black). (C) Two-domain structure (variant 2): domain 1 (residues 62–214, light grey), domain 2 (residues 31–60, 216–291, black).

In the case of triosephosphate isomerase (7tim, TIM-Barrel fold), the van der Waals approach also selects single-, two- and three-domain structures at different levels of hierarchy (Figure 4). Here, a single-domain structure defined in the X-ray experiment and a three-domain structure revealed by an analytical approach (Siddiqui and Barton, 1995) belong to different levels of energy hierarchy detected by the van der Waals approach (accuracy scores are 100 and 90%, respectively). Thus, the hierarchical energy calculations allow one to single out domains of interest with regard to function, structure, energy or other properties by essentially regrouping the energy-defined segments.

Domain boundaries defined by the van der Waals energy approach match the closed loop boundaries well. Considering the inaccuracies in loop mapping and in energy calculations, this match is rather surprising. The correlation is best seen, for example, in the case of a Trefoil fold (1i1b) or a TIM-Barrel fold (7tim) in Figure 2G and I, respectively. Detailed inspection of domains and subdomains of different levels shows that it is not merely a correlation: the domains are actually made of the closed loops. Figures 4 and
5 illustrate all the variants of domain structures in β-lactamase (4blm) and triosephosphate isomerase (7tim), with the loops regrouped accordingly. In β-lactamase (4blm), for example, the alteration of domain structure depends on the status of the loop 31–60 (light grey, Figure 5A). A reassignment of this loop results in alternative two-domain structures: domains 31–214 (light grey, Figure 5B) and 216–291 (black, Figure 5B), or domains 31–60 plus 216–291 (black, Figure 5C) and 62–214 (light grey, Figure 5C). A division of the triosephosphate isomerase (7tim) into eight loops provides variants from one to three domains (Figure 4). Alteration of domains at different levels of hierarchy is also defined in this case by the regrouping of the loops and interactions between them.

Van der Waals locking of closed loops and additional secondary locks with the formation of domains (subdomains) indicate that van der Waals interactions play a major role in the formation of mature globular structures, loops and domains equally. Directed interactions (hydrogen bonds or ion pairs) play an additional role in the local stabilization of compact structural elements (e.g. α-helices) or modulate interactions between enthalpy-driven stable structures (loops, subdomains, domains). The exceptional role traditionally ascribed to directed interactions overestimates their marginal contribution to the overall globule stability. Directed interactions are always saturated, either by interaction between respective groups inside the globular structure or by contacts with water and counter-ions. Therefore, they could only provide a small advantage for a folded versus an unfolded structure. Van der Waals closure of the loop ends and additional (secondary) distant van der Waals contacts serve as major folding enthalpy contributors.
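The reassignment of loop 31–60 in β-lactamase described above can be written out as a small sketch. The segment boundaries are those of Figure 5. The function names and the convention that a single boundary residue between neighbouring segments still counts as a continuous stretch are assumptions made for this illustration, not part of the published method.

```python
# Segments of beta-lactamase (4blm) from Figure 5A, as inclusive residue ranges.
MOBILE_LOOP = (31, 60)
CORE_SEGMENTS = [(62, 214), (216, 291)]


def reassign(loop, cores):
    """Attach the loop to each core segment in turn; one domain variant per choice."""
    variants = []
    for i in range(len(cores)):
        domains = [[core] for core in cores]
        domains[i] = sorted(domains[i] + [loop])
        variants.append(domains)
    return variants


def is_discontinuous(domain, boundary=1):
    """True if consecutive segments are separated by more than `boundary` residues.

    Treating a single boundary residue as continuous is an assumption made here.
    """
    return any(nxt[0] - prev[1] - 1 > boundary
               for prev, nxt in zip(domain, domain[1:]))


VARIANTS = reassign(MOBILE_LOOP, CORE_SEGMENTS)
for number, domains in enumerate(VARIANTS, start=1):
    for domain in domains:
        label = "discontinuous" if is_discontinuous(domain) else "continuous"
        print(f"variant {number}: {domain} ({label})")
```

The first variant corresponds to the continuous domains of Figure 5B, and the second to Figure 5C, where segments 31–60 and 216–291 form one discontinuous domain.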
A closed loop can therefore be considered as an elementary unit of domain structure, and interactions between closed loops provide the diversity of domain structures in globular proteins.

Acknowledgements

Professor E.N.Trifonov's stimulating discussions and thoughtful comments and Professor M.D.Frank-Kamenetskii's critical reading of the manuscript and fruitful discussions are greatly appreciated. I am grateful to Mrs A.Weinberg for editing of the text. I.N.B. is a Post-Doctoral Fellow of the Feinberg Graduate School at the Weizmann Institute of Science.

References

Baldwin,R.L. and Rose,G.D. (1999a) Trends Biochem. Sci., 24, 26–33.
Baldwin,R.L. and Rose,G.D. (1999b) Trends Biochem. Sci., 24, 77–83.
Berezovskii,I.N. and Tumanyan,V.G. (1995) Biophysics, 40, 1181–1187.
Berezovskii,I.N., Esipova,N.G. and Tumanyan,V.G. (1997) Biophysics, 42, 557–565.
Berezovsky,I.N. and Trifonov,E.N. (2001a) Protein Eng., 14, 403–407.
Berezovsky,I.N. and Trifonov,E.N. (2001b) J. Mol. Biol., 307, 1419–1426.
Berezovsky,I.N., Tumanyan,V.G. and Esipova,N.G. (1997) FEBS Lett., 418, 43–46.
Berezovsky,I.N., Namiot,V.A., Tumanyan,V.G. and Esipova,N.G. (1999) J. Biomol. Struct. Dyn., 17, 133–155.
Berezovsky,I.N., Esipova,N.G., Tumanyan,V.G. and Namiot,V.A. (2000a) J. Biomol. Struct. Dyn., 17, 799–809.
Berezovsky,I.N., Esipova,N.G. and Tumanyan,V.G. (2000b) J. Comput. Biol., 7, 183–192.
Berezovsky,I.N., Grosberg,A.Y. and Trifonov,E.N. (2000c) FEBS Lett., 466, 283–286.
Crippen,G.M. (1978) J. Mol. Biol., 126, 315–332.
Doolittle,R.F. (1995) Annu. Rev. Biochem., 64, 287–314.
Dunfield,L.G., Burgess,A.W. and Scheraga,H.A. (1978) J. Phys. Chem., 24, 2609–2616.
Islam,S.A., Luo,J. and Sternberg,M.J.E. (1995) Protein Eng., 8, 513–525.
Jones,S., Stewart,M., Michie,A., Swindells,M.B., Orengo,C. and Thornton,J.M. (1998) Protein Sci., 7, 233–242.
Kendrew,J.C., Bodo,G., Dintzis,H.M., Parrish,R.G., Wyckoff,H. and Phillips,D.C. (1958) Nature, 181, 662–666.
Kolinski,A., Skolnick,J., Godzik,A. and Hu,W.-P. (1997) Proteins, 27, 290–308.
Kwasigroch,J.M., Chomilier,J. and Mornon,J.P. (1996) J. Mol. Biol., 259, 855–872.
Leszczynski,J.F. and Rose,G.D. (1986) Science, 234, 849–855.
Martin,A.C.R., Toda,K., Stirk,H.J. and Thornton,J.M. (1995) Protein Eng., 8, 1093–1101.
Nemethy,G., Pottle,M.S. and Scheraga,H.A. (1983) J. Phys. Chem., 87, 1883–1887.
Oliva,B., Bates,P.A., Querol,E., Aviles,F.X. and Sternberg,M.J.E. (1997) J. Mol. Biol., 259, 814–830.
Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Nature, 372, 631–634.
Porter,R.R. (1959) Biochem. J., 73, 119–126.
Rose,G.D. (1979) J. Mol. Biol., 134, 447–470.
Siddiqui,A.S. and Barton,G.J. (1995) Protein Sci., 4, 872–884.
Wernisch,L., Hunting,M. and Wodak,S. (1999) Proteins, 35, 338–352.

Received March 18, 2002; revised December 13, 2002; accepted January 24, 2003