Thermodynamics of the coil to frozen globule transition in heteropolymers

Vijay S. Pande
Department of Physics, University of California at Berkeley, Berkeley, California 94720, and the Department of Physics and Center for Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Alexander Yu. Grosberg a) and Toyoichi Tanaka
Department of Physics and Center for Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

a) On leave from the Institute of Chemical Physics, Russian Academy of Sciences, Moscow 117977.

(Received 18 December 1996; accepted 26 June 1997)

Recent analytic theories and computer simulations of heteropolymers have centered on the freezing transition of globular heteropolymers. We present a simple analytic theory to describe the coil to globule collapse in heteropolymers and compare it with the exhaustive enumeration of all 18-mer cubic lattice polymer conformations. We find that the collapse transition from coil to frozen globule can be either first or second order. The relevance to protein folding is also discussed. © 1997 American Institute of Physics. [S0021-9606(97)51237-2]

I. INTRODUCTION

In the early days of protein studies, the denaturation of globular proteins was thought of as a globule decollapse transition, and this was one of the reasons that brought much attention to the homopolymer coil to globule transition. By now, this latter phenomenon is well understood in many respects. In particular, the thermodynamics of this transition has been studied theoretically in great detail [1,2], and the transition has also been clearly observed experimentally despite the difficulties with phase segregation in poor solvent conditions [3,4]. Besides the simplest universal picture of the collapse of a flexible uncharged homopolymer, several other model systems have been examined, including annealed heteropolymers [5], charged polyelectrolytes and polyampholytes [6], and polymeric mesogens [7]. The corresponding phase diagrams are rich and in some respects indicate certain similarities with the properties of proteins. There are even suggestions in the literature [8], albeit disputable ones, that the molten globule phase of proteins [9] can be modeled at the level of a homopolymer.

Nevertheless, it is clear that the final native conformation of each protein is encoded in its sequence, and thus it is well understood that protein folding requires a more specific heteropolymer aspect, which is believed to be freezing [10-12]. We think it is fair to say that freezing in general is understood much less clearly at present than collapse, as only the simplest models of freezing have been examined. Moreover, these two aspects, collapse and freezing, are not well matched in the emerging general picture of heteropolymers.

Several works in the literature have examined phase diagrams that include the coil, the liquid-like random globule (similar to a homopolymer globule), and the frozen globule phases of a heteropolymer chain [13,14]. After the submission of the present work, a new study was published [15] which contains several important methodological innovations and provides new insights into the relationships between the three possible phases of a heteropolymer chain. One of the important achievements of Ref. 15 is that it goes beyond the uncorrelated energy landscape, or REM [16], model and thus continues the track toward understanding the kinetics of folding.

Despite the important achievements of the previous works, we would like with the present work to once again bring attention to the problem of the phase diagram of a heteropolymer chain. In doing so, our main motivation is to take a closer look at the phase transition between the frozen globule and the coil state and to give arguments showing that this transition may be, and in many realistic cases is, of the first order.
This fact seems to be missing in the previous studies, while we view it as important for the overall understanding of the possible scenarios of folding. We give simple analytical arguments to support our claim and also present solid numerical evidence for it. The latter is based on complete enumeration of all, not just maximally compact, conformations of an 18-mer on the cubic lattice. This enumeration has not been performed before, and it is clearly advantageous for our purposes compared to the previous model [14] of a two dimensional 16-mer.

The possibility of the first order phase transition is fundamentally due to the obvious fact that the nature of the phases of a heteropolymer chain must depend on its sequence. It is well understood at present that the ensembles of possible sequences can be described much like Gibbs ensembles in statistical mechanics, in terms of either a microcanonical ensemble, in which one specifies the energy $E_{NS}$ of the native state and thus knows how well this native state is optimized energetically, or a canonical ensemble, in which one controls $E_{NS}$ through some effective temperature $T_d$. There are several constructive models that show how to control the native state energy using a canonical ensemble with temperature $T_d$, including sequence selection [17] and Imprinting [12]. We shall show that phase transitions in a heteropolymer chain depend indeed on the type of sequence, that is, on $T_d$.

Our paper is organized as follows. We begin with an analytic treatment and then compare those results with the computer simulation. We conclude with a discussion of the relevance of these results to proteins and other heteropolymeric systems.

J. Chem. Phys. 107 (13), 1 October 1997, p. 5118. 0021-9606/97/107(13)/5118/7/$10.00. © 1997 American Institute of Physics.

II. ANALYTIC TREATMENT

A. The model

What are the relevant order parameters in our study? The coil to globule transition, in which the system goes from a phase of swollen, random walk type conformations perturbed only by excluded volume to one in which it is dense and collapsed, can be described by chain size or density; for each conformation $\alpha$ one defines $\rho_\alpha(x) = \sum_I \delta(x - r_I^\alpha)$, where $r_I^\alpha$ is the position vector of monomer $I$ in the conformation $\alpha$. As far as the thermodynamic limit is concerned, $\rho$ is zero in the coil phase and non-vanishing in the globule.

A more delicate order parameter is necessary to consider the freezing transition, in which the system goes from a phase consisting of an exponential number of conformations, $O(e^N)$, to one in which only $O(1)$ states dominate. As we expect the ground state conformation to be unique, a good order parameter to examine an arbitrary conformation $\alpha$ of a chain that undergoes a freezing transition is the overlap between $\alpha$ and the ground state $\star$:

$$Q_{\alpha\star} = \sum_{I \neq J} \Delta(r_I^\alpha - r_J^\alpha)\, \Delta(r_I^\star - r_J^\star), \qquad (1)$$

where $\Delta(r)$ is a function that is concentrated on nearest neighboring space points [on a lattice with lattice spacing $a$, $\Delta(a) = 1$ and $\Delta(r \neq a) = 0$], and thus $\Delta(r_I^\alpha - r_J^\alpha)$ is nonvanishing only when monomers $I$ and $J$ in conformation $\alpha$ are neighbors in contact. It is even more convenient to define, for a conformation $\alpha$, its fractional overlap with the ground state,

$$q_{\alpha\star} = Q_{\alpha\star}/Q_{\max}, \qquad (2)$$

where $Q_{\max}$ is the maximum number of contacts possible (we assume that $\star$ is maximally compact and thus $Q_{\star\star} = Q_{\max}$). Obviously, $q_{\alpha\star}$ ranges from 0 to 1.
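The fractional overlap with the ground state can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, conformations are assumed to be lists of integer lattice coordinates, each contact is counted once ($I < J$), and bonded chain neighbors are assumed not to count as contacts. Since $q$ is a ratio, counting each pair once or twice gives the same value.

```python
from itertools import combinations

def contacts(conf):
    """Set of non-bonded nearest-neighbour contacts (I, J) with I < J.

    Assumption: monomers adjacent along the chain are not counted as contacts.
    """
    pairs = set()
    for i, j in combinations(range(len(conf)), 2):
        if j - i > 1:
            # Lattice (Manhattan) distance 1 means the two sites are neighbours.
            if sum(abs(a - b) for a, b in zip(conf[i], conf[j])) == 1:
                pairs.add((i, j))
    return pairs

def fractional_overlap(conf, ground):
    """Fraction of ground-state contacts that are also present in conf."""
    native = contacts(ground)
    return len(contacts(conf) & native) / len(native)

# Toy 6-mers on the cubic lattice (illustrative only, not from the paper).
ground = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 2, 0), (1, 2, 0)]
partial = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (-1, 1, 0), (-2, 1, 0)]
straight = [(i, 0, 0) for i in range(6)]

print(fractional_overlap(ground, ground))
print(fractional_overlap(partial, ground))
print(fractional_overlap(straight, ground))
```

In this toy case the ground state has two contacts, the partially unfolded chain keeps one of them, and the straight chain keeps none.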
At first, these two order parameters, $\rho$ and $Q$, may seem to be completely unrelated, but as dense chains can form more contacts, we expect that there should be some relationship between the density and the number of contacts of a given conformation [which is the same as the overlap of a conformation with itself, $Q_{\alpha\alpha} = \sum_{I \neq J} \Delta(r_I^\alpha - r_J^\alpha)$]. To see this, one has to introduce the coarse grained density $\tilde\rho_\alpha(x) = (1/z) \sum_I \Delta(x - r_I^\alpha)$, where $z$ is the number of nearest neighbors on which $\Delta$ is localized, for each conformation $\alpha$, and define also spatial averages as $\langle \ldots \rangle = \int \ldots \, dr / V$. Noting that $Nz \simeq Q_{\max}$, one arrives at

$$q_{\alpha\alpha} = \frac{Q_{\alpha\alpha}}{Q_{\max}} \simeq \frac{\langle \tilde\rho_\alpha^2 \rangle}{\langle \tilde\rho^2 \rangle_{\max}}. \qquad (3)$$

Thus, the average density of the chain in some conformation is almost the same as the fractional overlap of this conformation with itself, and as density increases, the approximation $\rho \simeq q_{\alpha\alpha}$ improves. Note that $q_{\alpha\alpha}$ and $q_{\alpha\star}$ are proportional, respectively, to the total number of bonds between monomers in the conformation $\alpha$ and to the number of bonds that the given conformation has in common with the ground state conformation.

B. Free energy

We will next calculate the free energy in terms of the order parameters $q \equiv q_{\alpha\star}$ and $\rho \simeq q_{\alpha\alpha}$. In our calculations, we employ the most general Hamiltonian of short-range pairwise interactions [12],

$$H = \sum_{I \neq J} B_{s_I s_J}\, \Delta(r_I - r_J), \qquad (4)$$

where $s_I \in \{1, \ldots, n\}$ is the species of monomer number $I$, $n$ is the number of species, and $B_{ij}$ is the matrix of species-species interactions. We implicitly assume here that the position vectors $r_I$ are such that the condition of chain connectivity is met, but do not assume maximally compact conformations.

Formally, the free energy of interest can be written in a general form in terms of the Hamiltonian (4):

$$F_{seq}(\rho, q) = -T \ln \sum_\alpha e^{-H_\alpha/T}\, \delta(\rho - \rho_\alpha)\, \delta(q - q_{\alpha\star}), \qquad (5)$$

where the subscript seq indicates that the free energy is written for a given sequence, and summation is performed over all conformations $\alpha$. The dependence on the sequence enters both through the Hamiltonian and through the ground state conformation $\star$ for the given sequence.
It is technically cumbersome to treat the free energy (5) in the general form. Instead, we shall resort to an interpolation expression which can be derived using the following two arguments.

For a homopolymer, when the $q$-dependence is not in question, the free energy per monomer can be described in terms of a truncated virial series:

$$\frac{1}{N} F(\rho) = B\rho + C\rho^2, \qquad (6)$$

where $B$ and $C$ are, respectively, the two- and three-body virial coefficients. In the simplest case, $B$ in Eq. (6) is related to $B_{ij}$ in Eq. (4) by $B = b^3 z \langle \exp(B_{ij}) - 1 \rangle$, where $z$ and $b$ are the coordination number and the mesh size of the underlying lattice. For simplicity, we shall take $b = 1$ in what follows, such that volumes and densities are unitless.

For a heteropolymer whose density is evenly distributed in space and cannot fluctuate (when the $\rho$ dependence is trivial), the following expression has been derived elsewhere [18]:

$$\frac{1}{N} F(q) = -\rho \left[ \frac{\Delta B^2}{T_p}\, q + \frac{\Delta B^2}{T}\, (1 - q) \right] + T\rho q \left[ s - \frac{3}{2} \ln q \right] + T\rho \left[ q \ln q + (1 - q) \ln(1 - q) \right]. \qquad (7)$$

This expression is detailed in the Appendix.

We now make the assumption that the heterogeneity of monomeric interactions affects only the two-body interaction terms of Eq. (4), combine Eqs. (6) and (7), and thus arrive at the following phenomenological interpolation, which is central to the present work:

$$\frac{1}{N} F(\rho, q) = \left[ B - \frac{\Delta B^2}{T_p}\, q - \frac{\Delta B^2}{T}\, (1 - q) \right] \rho + C\rho^2 + T\rho q \left[ s - \frac{3}{2} \ln q \right] + T\rho \left[ q \ln q + (1 - q) \ln(1 - q) \right]. \qquad (8)$$

To understand the approximation behind Eq. (8), we note that we have omitted the loop-related entropy part in the homopolymeric free energy (6). This may seem strange, as we did take into account a seemingly analogous loop factor in the heteropolymeric part (7). We think this procedure is legitimate, because homopolymer loops are known to contribute only to the surface energy of the globule, i.e., to terms of order $N^{2/3}$, while all terms in Eq. (8) are of order $N$.

Before proceeding to describe the resulting phase diagram, we stop to comment on the differences between our theory and previous works [13,14]. Our model goes beyond previous models in that corrections to the Random Energy Model are included (see the Appendix for details). The reader might be concerned that our formalism does not use the complicated mathematical formalisms previously employed, such as replica field theory, and that the current theory is therefore an oversimplification of previous work and less powerful. In fact, by breaking away from the replica formalism, we are able to introduce the relevant physical ingredients and advance beyond these previous works.

C. Phase diagram

An inspection of our main equation for the free energy (8) indicates that the $q$-dependence allows for minima at either $q = 1$ or $q = 0$, i.e., with either complete dominance of the native state, or with a mixture of conformations statistically unrelated to the native state. This result coincides with that of an uncorrelated energy landscape, or Random Energy Model (REM). Thus, this theory can be considered to be a simple derivation of REM for heteropolymer freezing.
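As a rough numerical illustration, the interpolation free energy can be evaluated directly. The sketch below assumes the form $F/N = [B - (\Delta B^2/T_p) q - (\Delta B^2/T)(1-q)]\rho + C\rho^2 + T\rho q [s - \tfrac{3}{2}\ln q] + T\rho[q \ln q + (1-q)\ln(1-q)]$, which is one reading of the flattened equation; the signs of the logarithmic terms in particular are an assumption. The function and parameter names are hypothetical.

```python
import math

def xlogx(x):
    """x * ln(x), extended by continuity to 0 at x = 0."""
    return x * math.log(x) if x > 0.0 else 0.0

def free_energy_per_monomer(rho, q, B, dB2, C, T, Tp, s):
    """Assumed interpolation free energy per monomer, F(rho, q) / N.

    dB2 stands for the variance of the interaction matrix, and s for the
    entropy parameter of the text. Both names are illustrative.
    """
    two_body = (B - dB2 * q / Tp - dB2 * (1.0 - q) / T) * rho
    three_body = C * rho ** 2
    loops = T * rho * (q * s - 1.5 * xlogx(q))        # assumed sign convention
    mixing = T * rho * (xlogx(q) + xlogx(1.0 - q))
    return two_body + three_body + loops + mixing

# Illustrative parameters only (not values from the paper).
params = dict(B=-1.0, dB2=2.0, C=1.0, T=1.0, Tp=0.5, s=1.0)
f_random = free_energy_per_monomer(0.5, 0.0, **params)
f_frozen = free_energy_per_monomer(0.5, 1.0, **params)
f_coil = free_energy_per_monomer(0.0, 0.0, **params)
print(f_random, f_frozen, f_coil)
```

At $q = 0$ the expression reduces to the homopolymer form with $B$ shifted by $-\Delta B^2/T$, and at $\rho = 0$ it vanishes, which matches the limits used in the discussion of the phase boundaries.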
Note also that the $q = 1$ minimum exists if and only if the corresponding $\rho$ is not too small, meaning that this minimum can be realized only for the globular polymer; obviously, it corresponds to the state of the frozen globule. By contrast, the $q = 0$ state can be realized both for the coil ($\rho = 0$) and for the globule, which in this case is the random globule. Thus, we arrive at a three-region phase diagram which includes a coil phase and two globular phases (random and native frozen globules); we call these phases C, RG, and FG, respectively. We now comment on the nature of the boundaries between the three phases.

Random to frozen globule. When the solvent is sufficiently poor overall ($B$ is sufficiently negative), the transition in $q$ will not be accompanied by a transition in $\rho$. This occurs because, while there would be an entropic benefit to having a random globule with smaller $\rho$, the energetic penalty due to a negative $B_0 < 0$ overrules this. Thus, we compare the free energy with $\rho = \rho_{\max}$, but with $q = 0$ and $q = 1$. This yields the following condition for a transition from a random to a frozen globule:

$$\mathrm{RG} \leftrightarrow \mathrm{FG}: \quad s = \frac{\Delta B^2}{T} \left( \frac{1}{T_p} - \frac{1}{T} \right), \qquad (9)$$

which agrees, not surprisingly, with previous works describing the transition from the random to the frozen globule state [12].

Random globule to coil. As $q = 0$ in both of these phases, the only heteropolymeric complication comes from an effective second virial coefficient $B_{\rm eff} = B - \Delta B^2/T$. In terms of $B_{\rm eff}$, the transition to this order is exactly the same as the homopolymer globule to coil transition, with densities $\rho = -B_{\rm eff}/2C$ in the globular phase and $\rho = 0$ in the coil phase. To this approximation, as we neglect loop-related surface corrections to the homopolymer free energy [1], the transition happens at

$$\mathrm{C} \leftrightarrow \mathrm{RG}\ \text{(in principle)}: \quad B - \frac{\Delta B^2}{T} = 0. \qquad (10)$$

We have taken into account that the coil free energy vanishes. However, this is a second order transition (even if surface energy is taken into account, it remains a very weak transition [1]), and for this reason it has a rather large width.
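The two boundary conditions just described can be turned into small helper functions. This is a sketch under stated assumptions, not the authors' procedure: it assumes the freezing condition $s = (\Delta B^2/T)(1/T_p - 1/T)$ and the effective coefficient $B_{\rm eff} = B - \Delta B^2/T$, takes the globule density as $-B_{\rm eff}/2C$, and additionally caps that density at a hypothetical `rho_max`. All names are illustrative.

```python
def frozen_globule_favored(T, Tp, dB2, s):
    """True when, at fixed density, the q = 1 state has the lower free energy.

    Follows from comparing q = 0 and q = 1 at the same density:
    T*s < dB2/Tp - dB2/T, i.e. s < (dB2/T) * (1/Tp - 1/T).
    """
    return s < (dB2 / T) * (1.0 / Tp - 1.0 / T)

def effective_second_virial(B, dB2, T):
    """Effective two-body coefficient of the q = 0 phases."""
    return B - dB2 / T

def random_globule_density(B, dB2, T, C, rho_max=1.0):
    """Density minimising B_eff*rho + C*rho**2, clipped to [0, rho_max].

    The cap at rho_max (dense packing) is an added assumption of this sketch.
    """
    b_eff = effective_second_virial(B, dB2, T)
    return min(rho_max, max(0.0, -b_eff / (2.0 * C)))

# Illustrative values only.
print(frozen_globule_favored(T=1.0, Tp=0.5, dB2=1.0, s=0.5))
print(random_globule_density(B=0.5, dB2=1.0, T=1.0, C=1.0))
```

Note that in the second call the bare coefficient $B$ is positive, yet the chain is still globular because the heteropolymeric shift makes $B_{\rm eff}$ negative.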
In practical terms, this large width causes considerable complications for computer simulation, as it obscures the position of the transition point. Moreover, in reality the transition region borders the place where the globule reaches its maximal possible dense packing density $\rho_{\max}$. Thus, practically the transition is seen at $-B_{\rm eff}/2C = \rho_{\max}$, that is

$$\mathrm{C} \leftrightarrow \mathrm{RG}\ \text{(practically)}: \quad B - \frac{\Delta B^2}{T} = -2C\rho_{\max}. \qquad (11)$$

Frozen globule to coil. As we increase the quality of the solvent, eventually the energetic penalty for decreasing $\rho$ will not be enough, and $\rho$ changes as well as $q$ upon denaturation. Thus, the two phases in question correspond to $q = 0$, $\rho = 0$ (C) and $q = 1$, $\rho = \rho_{\max}$ (FG). The transition occurs where the free energies are equal, that is, at

$$\mathrm{FG} \leftrightarrow \mathrm{C}: \quad s\rho_{\max} = -\frac{B}{T}\rho_{\max} + \frac{\Delta B^2}{T T_p}\rho_{\max} - \frac{C}{T}\rho_{\max}^2. \qquad (12)$$

Thus, if we are interested in denaturation, that is, in the destruction of the frozen globule state, we have to ask whether the FG is transformed into C or RG. These two regimes cross into one another where the lines (9), (11), and (12) meet, that is, at the corresponding triple point.

To bring our results into a form that is more convenient for comparison with simulations, we have to choose control parameters, as generally there may be too many of them. In particular, if one wants to speak in terms of a temperature change, then one must take into account the temperature dependence of both $B$ and $\Delta B^2$. This makes the equations rather cumbersome. We choose to get rid of the temperatures $T$ and $T_p$ by the redefinitions $\delta B = \Delta B/T$, $\delta B_p = \Delta B/T_p$ and by constructing the phase diagram in terms of $B$ and $\delta B$, with $C$ and $\delta B_p$ considered as parameters. Such a diagram is displayed in Fig. 1.

FIG. 1. Phase diagram as predicted theoretically. There are three phases: (1) Coil: expanded, random walk-like configurations; this phase has maximum entropy. (2) Random Globule: conformations are globular (condensed, high density), but an exponentially large number of conformations are still relevant. (3) Frozen Globule: the target conformation dominates equilibrium. Phase transitions in density between the random globule and the coil are second order, whereas the density transition between the frozen globule and the coil is first order. Also, the transition in the overlap with the ground state is first order and occurs at the boundaries of the frozen globule phase.

This diagram can be described as follows. For $B_0 < B_0^c$, there is a line of first order transitions in $q$ at $\delta B = \delta B_{fp}$ and a second transition from maximally compact states at $\delta B = \delta B_m$. These two lines meet at a tricritical point $(B_0^c, \delta B_{fp})$. For $B_0 > B_0^c$, there is a single line of first order transitions in both $q$ and $\rho$. As the coupling constant of $\rho$ and $q$ is $\delta B_p$, better designed sequences will exhibit more pronounced jumps in $\rho$ in the frozen globule to coil transition.

Summarizing the results of our analytic model, we find three phases: coil, random globule, and frozen globule. This is not new and has been previously described [13,14]. However, we predict that well designed sequences will have a first order transition in density between the coil and frozen globule phases. This first order transition has been neither predicted nor seen in computer simulations. In the next section, we will verify these analytic predictions by exactly calculating the partition function for a short heteropolymer chain computationally.

III. COMPUTER SIMULATIONS

A. The model

We now compare the results of the theory to computer simulations.
In the study of heteropolymers, one must be careful, as a few conformations can dominate the partition function. Indeed, this phenomenon is freezing; it is one of the most interesting aspects of heteropolymers and the central aspect of this paper. To avoid computational pitfalls, the exhaustive enumeration of all conformations is performed in order to calculate the partition function exactly.

Exhaustive enumeration of all maximally compact chains can only be performed for relatively short lengths [19], and the 27-mer on the 3×3×3 cube has become a common model system. Unfortunately, the enumeration of all (not just maximally compact) 27-mer conformations is out of reach of current computational power. One approach is to consider only two dimensional conformations (for example, this approach was used in Ref. 14, in which all 16-mer conformations in two dimensions (2D) were enumerated). However, we have been able to enumerate all 5,577,317,124 18-mer conformations.

One may be suspicious that the 18-mer is simply too small to give representative information about heteropolymers. To test this, we examined several heteropolymer properties of 18-mers and compared these results to those of 27-mers. Upon examining the density of states and the nature of the freezing transition, we found that 18-mers were not qualitatively different from 27-mers.

Also, one must choose what types of interactions between monomers one wishes to model. As the theory includes terms only to lowest order, it cannot discriminate between differences in interactions and considers only the variance of the interaction matrix. Thus, differences in interactions modeled computationally cannot be described analytically. To make the best connection with the analytic theory, we designed sequences with a variety of degrees of optimization for the Independent Interaction Model (IIM). We found that the ground state energies, from the best optimized sequences to random sequences, ranged from 12 to 2.
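The idea of exhaustive enumeration can be sketched for very short chains with a depth-first search over self-avoiding walks on the cubic lattice. This is only an illustration: the authors' enumeration code and any symmetry reduction they used are not described here, so the sketch counts every walk from a fixed origin without removing rotations or reflections, and its counts are therefore not directly comparable to the 18-mer figure quoted above. The names are hypothetical.

```python
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def enumerate_conformations(n_monomers):
    """Yield every self-avoiding chain of n_monomers sites starting at the origin.

    No symmetry reduction is applied, so rotated and reflected copies of the
    same shape are all generated.
    """
    path = [(0, 0, 0)]
    occupied = {(0, 0, 0)}

    def extend():
        if len(path) == n_monomers:
            yield tuple(path)
            return
        x, y, z = path[-1]
        for dx, dy, dz in STEPS:
            site = (x + dx, y + dy, z + dz)
            if site not in occupied:
                path.append(site)
                occupied.add(site)
                yield from extend()
                # Backtrack so the next direction starts from the same prefix.
                path.pop()
                occupied.remove(site)

    yield from extend()

def count_conformations(n_monomers):
    return sum(1 for _ in enumerate_conformations(n_monomers))

for n in range(2, 6):
    print(n, count_conformations(n))
```

The cost grows roughly by a factor of about 4.7 per added monomer, which is why a plain recursive search like this one is only practical for short chains.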
Within this range of ground state energies, we have examined 6 values. For comparison, we also designed sequences with 2 and 5 letter Potts models and found no major qualitative differences.

B. Results

The result of our enumeration for a given sequence and set of interactions is the density of states $P(E, Q_{\alpha\alpha}, q_{\alpha\star})$, i.e., how many conformations have energy $E$, total number of contacts $Q_{\alpha\alpha}$, and fraction of contacts $q_{\alpha\star}$ in common with the ground state $\star$. From the density of states, we can calculate the free energy as a function of $\delta B$, $Q$, and $q$.

In Fig. 2, we examine the free energy for a poorly designed sequence in a poor solvent. We see that there is first a first order freezing transition (i.e., a jump in $q$) and then a second order globule to coil transition as we lower $\delta B$. In a good solvent, the transitions in $q$ and $Q$ go together and are both first order (Fig. 3).

In general, we find strong qualitative agreement with the theoretical prediction. First, consider the phase diagram of an optimally designed sequence (Fig. 4): we see that all elements of the analytic phase diagram are present. It is not surprising that our lowest order approximation to the free energy in the high temperature expansion does not completely capture all quantitative aspects of the curves, but certainly the nature of the transitions and the general shape of the curves agree well with theory. Unlike the 2D case [14], we computationally find a coil to frozen globule transition with one conformation dominating equilibrium for $B_0 > 0$, even for random sequences.

Note that the assumption of constant density does not seem to lead to disastrous effects in the prediction of 18-mer behavior. While it is not clear whether this approximation would hold as well for larger systems, proteins are not very long either and, based upon the number of non-local contacts present, 18-mers potentially correspond to short proteins, i.e., 40-50 residues.

In Fig. 5, we plot the phase boundaries for sequences with different ground state energies. We see that lowering $\delta B_p$ decreases the stability of the ground state, but does not have any significant effect on the globule to coil transition. Finally, in Fig. 6, we define the boundary between the coil and the globule in different ways (with different threshold $Q$); again, we find that the aspects of the coil to either random or frozen globule transition are not affected by the nature of design.

FIG. 2. Free energy for a poorly designed sequence in a poor solvent. We see a first order transition from frozen globule (a) to random globule (b) and then a second order transition to the coil state (c) (dark areas have lower free energy). Frame (d) shows $F(q)$ for these three regimes.

FIG. 3. Free energy for a well designed sequence in a good solvent. We see a first order transition from a frozen globule (a) to a random globule (c), with the coexistence between the two shown in (b) (dark areas have lower free energy). Frame (d) shows $F(q_{\alpha\star}, q_{\alpha\alpha})$ for these three regimes.

FIG. 4. The phase diagram from computer simulations. Shown are the results from enumeration of all 18-mer conformations with IIM interactions and $\delta B_p = 0.5$. All aspects of the theory are seen in the computer simulations, including the order of the transitions, etc. Note that the $B_0$ scale here is for mean interactions and does not include the excluded volume present in the nature of the conformations used. Thus, $B_0 = 0$ corresponds to a very good solvent.

FIG. 5. In addition to the data of Fig. 4, we now include several other values of $\delta B_p$ (the symbols are numbers referring to the relative ground state energy). We see that lowering $\delta B_p$ makes the frozen state less stable and thus moves the boundary between the frozen and the random globule. On the other hand, changing $\delta B_p$ does not have any effect on the homopolymer-like random globule to coil transition (a globule is here defined to be maximally compact).

FIG. 6. In addition to the data of Fig. 5, we now include curves delineating the random globule to coil phase boundary for different definitions of a globule (the number of contacts defining a globule is encircled; maximum overlap has 16 contacts). For all definitions of the phase boundary, the random globule to coil transition is independent of the nature of design.

FIG. 7. Full phase diagram of the system, including the dependencies on $B_0$, $\delta B$, and ground state energy $E$.

IV. CONCLUSIONS

We have described the phase transitions between the three phases of a single heteropolymer chain: coil, globule, and frozen globule. We found that heteropolymeric aspects have two effects: (1) due to good contacts, a heteropolymer can be collapsed even though the mean interaction is repulsive (as was shown previously [11]); (2) the freezing transition is coupled to the collapse transition and, for sufficiently good solvents, can lead to a first order transition in density (Fig. 7).

Heteropolymers are interesting both for their relevance to biological systems and for biologically inspired systems. As for the latter, heteropolymer gels have attracted great interest, as many microscopic properties of single polymer chains, such as the coil to globule transition, have macroscopic analogs in gels, which are macroscopic macromolecules. For example, in good solvents, the freezing transition will be accompanied by a first order transition in density; this may be a simple means to identify the freezing transition in these materials.
As for proteins, protein folding kinetics from the random globule to the frozen globule has recently been described using a similar non-REM theory and physical arguments which link the nature of the free energy as a function of $q$ (i.e., a single free energy barrier) with the nature of the kinetics. However, if the transition is from the coil to the frozen globule, one must consider the free energy barrier not in the one-dimensional $q$ phase space, but in the two-dimensional $(\rho, q)$ space. This adds complications to the theory, but may be important in the understanding of many proteins whose denatured states are coils, not globules.

ACKNOWLEDGMENTS

The work was supported by the National Science Foundation (DMR 90-22933). VSP acknowledges support from the Miller Institute for Basic Research. AYG acknowledges the support of a Kao Fellowship.

APPENDIX: ON THE EQUATION (8)

As we expect freezing to occur at temperatures higher than that of the phase segregation of monomers, we can use a high temperature expansion to describe freezing [12]. This yields

$$F_{\rm seq}(\rho, q) = \langle H \rangle_{\rho,q} - \frac{1}{2T}\left[\langle H^2 \rangle_{\rho,q} - \langle H \rangle_{\rho,q}^2\right] - T \ln M, \tag{A1}$$

where $\langle \ldots \rangle_{\rho,q}$ means the average over all conformations with the given density and overlap with the ground state, and $M$ is the total number of conformations. We have kept terms to $O(1/T^2)$, which is exact for the Independent Interaction Model (in which the elements of $B_{ij}$ are taken from a Gaussian distribution) [11]; higher order terms may be necessary to quantitatively model a particular set of interactions $B_{ij}$. As only a statistical analysis can be performed analytically, we have to average over the sequences. In principle, one could directly average over the ensemble of sequences with

the given value of the ground state energy. This ensemble of sequences is in a way similar to the microcanonical ensemble in regular statistical mechanics. It is, however, technically simpler to use an analog of the canonical ensemble, where the native state energy $E$ is not fixed, but rather controlled through a temperature $T_p$ (equivalently, $1/T_p$ is a Lagrange multiplier to constrain $E$), and thus it has the Gibbs probability distribution:

$$P_{\rm seq} \propto \exp\left[-E_{\rm NS}({\rm seq})/T_p\right] \simeq 1 - \frac{1}{T_p} H, \tag{A2}$$

where $H$ is evaluated in the native conformation, and in the last transformation we have resorted to a high $T_p$ expansion, which is justified for the same reason as the high temperature expansion above. Thus, we characterize a given ensemble of sequences by the value of $T_p$: for lower $T_p$, we model sequences whose native states are better optimized energetically. This prescription for energetic optimization of the native state has many incarnations, including minimal frustration [10], sequence selection [17], and Imprinting [12].

When one averages the free energy (A1) over the probability distribution (A2), the result is very simple: due to the structure of the Hamiltonian, the energy depends directly on $q$, so the condition of fixed $q$ is easy to implement, and the result can be written in the form

$$F(\rho, q) = \sum_{\rm seq} P_{\rm seq} F_{\rm seq}(\rho, q) = E(\rho, q) - T S(\rho, q), \tag{A3}$$

where the $q$-dependent part of the energy $E(\rho, q)$ for the maximally compact conformations is given by

$$E(\rho_{\max}, q) = -\frac{\delta B^2}{T_p}\, q - \frac{\delta B^2}{T}\, (1 - q), \tag{A4}$$

where $\delta B^2 = \sum_{ij} p_i p_j \left(B_{ij} - \sum_{kl} p_k p_l B_{kl}\right)^2$ is the variance of the elements of the interaction matrix [12], $p_i$ is the fraction of monomers of type $i$, and the entropy $S(q)$ of conformations with $q$ contacts in common with the native state is

$$S(q, \rho) = \ln M = -q\left[s - \tfrac{3}{2} \ln q\right] - q \ln q - (1 - q) \ln (1 - q). \tag{A5}$$

The first energy term comes from the mean of the density of states $\langle H \rangle$ and describes how the energy is pulled down due to selection; indeed, to this order, the ground state energy is $E = -\delta B^2/T_p$, and thus this term says that, on average, the energy is given by the energy of the native state times the fraction of native contacts. The second energy term comes from the width of the density of states, resulting from the $\langle H^2 \rangle - \langle H \rangle^2$ terms; the $q$ dependence of this width enters from the correlator $\langle H^2 \rangle$ [20], as two conformations with a given overlap $q$ with the native state also have $q$ contacts in common, on average. Higher order terms can modify the $q$ dependence of the width, depending on the nature of interactions. Note that the $q$ dependence of the mean and width of the density of states is a correction to REM, describing the nature of energy correlations. One can understand the energy terms in a simple manner: each native contact yields an energetic bonus of $E/Q_{\max}$ and each non-native contact yields $-\delta B^2/T$, i.e., is essentially annealed at temperature $T$.

The first two entropy terms detail how many conformations exist for a fixed set of $q$ contacts; as each common contact pins two parts of the chain together, for $q$ contacts, the number of conformations is set by the loop entropy [1] of $O(Q)$ loops of length $O(1/Q)$ [21,22]. The third entropy term results from how many ways one can choose $q$ contacts out of the total $Q_{\max}$, using Stirling's formula [23]. The parameter $s$ is roughly the entropy per contact; its relevance to freezing has been detailed extensively elsewhere [18]. Finally, note that the contribution from the loop factor term in $d$ dimensions is $(d/2)\, q \ln q$. This term cancels exactly with the $q \ln q$ term from the native contact mixing entropy at $d = 2$ and predicts very different behavior for two dimensions [24].

[1] A. Yu. Grosberg and A. R. Khokhlov, Statistical Physics of Macromolecules (American Institute of Physics, New York, 1994).
[2] A. Yu. Grosberg and D. V. Kuznetsov, Macromolecules 25, 1970 (1992).
[3] B. Chu, J. Polym. Sci., Part B: Polym. Phys. 31, 2019 (1993); B. Chu, Q. Ying, and A. Yu. Grosberg, Macromolecules 28, 180 (1995).
[4] C. Wu and S. Q. Zhou, Macromolecules 28, 5388 (1995).
[5] A. Yu. Grosberg, Biofizika 29, 569 (1984); T. Garel, L. Leibler, and H. Orland, J. Phys. II 4, 2139 (1994).
[6] Y. Kantor and M. Kardar, Phys. Rev. E 52, 835 (1995); A. V. Dobrynin and M. Rubinstein, J. Phys. II 5, 677 (1995).
[7] L. Noirez, P. Keller, and J. P. Cotton, Liq. Cryst. 18, 129 (1995).
[8] C. Wu and S. Zhou, Phys. Rev. Lett. 77, 3053 (1997).
[9] O. B. Ptitsyn, Adv. Protein Chem. 47, 83 (1995).
[10] J. D. Bryngelson and P. G. Wolynes, Proc. Natl. Acad. Sci. USA 84, 7524 (1987).
[11] E. Shakhnovich and A. Gutin, Biophys. Chem. 34, 187 (1989).
[12] V. S. Pande, A. Yu. Grosberg, and T. Tanaka, Macromolecules 28, 2218 (1995).
[13] J. D. Bryngelson and P. G. Wolynes, Biopolymers 30, 177 (1990).
[14] A. Dinner, A. Sali, M. Karplus, and E. Shakhnovich, J. Chem. Phys. 101, 1444 (1994).
[15] S. Plotkin, J. Wang, and P. G. Wolynes, J. Chem. Phys. 106, 2932 (1997).
[16] B. Derrida, Phys. Rev. Lett. 45, 79 (1980).
[17] E. I. Shakhnovich and A. M. Gutin, Proc. Natl. Acad. Sci. USA 90, 7195 (1993).
[18] V. S. Pande, A. Yu. Grosberg, and T. Tanaka, Folding & Design 2, 109 (1997).
[19] V. S. Pande, C. Joerg, A. Yu. Grosberg, and T. Tanaka, J. Phys. A 27, 6231 (1994).
[20] V. S. Pande, A. Yu. Grosberg, C. Joerg, and T. Tanaka, Phys. Rev. Lett. 77, 3565 (1996).
[21] A. M. Gutin and E. I. Shakhnovich, J. Chem. Phys. 100, 5290 (1994).
[22] Note that the validity of the loop factor relies on the assumption of Gaussian statistics of the chains. For coiled conformations, this holds to a reasonable approximation, and for globular conformations, the chains behave as chains in a melt, again with Gaussian statistics. Thus, the entropy (A5) can be considered as an interpolation between the entropies of these two regimes and is most questionable in the region between swollen and collapsed conformations; as this region is not of principal interest, we expect (A5) to be sufficient. Another approximation implicit in (A5) is that loop lengths are treated in a mean field manner and are assumed to be of the same length; this is valid for short chains, such as the 18-mers used in the computer simulation section. Inclusion of the fluctuation of loop lengths is an interesting correction to our theory, but beyond the scope of this paper.
[23] S. Plotkin, J. Wang, and P. G. Wolynes, Phys. Rev. E 53, 6271 (1996).
[24] We are indebted to E. Shakhnovich for this point.
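Supplementary numerical sketch. The q-dependent free energy of the Appendix, Eqs. (A3)-(A5), can be explored numerically with a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the signs of the energy terms, the grouping of the entropy terms, and the generalization of the loop term to (d/2) q ln q are one reading of the Appendix, and the function names, the symbol `dB2` (standing for the variance of the interaction matrix), and all parameter values are hypothetical.

```python
import math


def xlogx(x: float) -> float:
    """Return x*ln(x), using the limiting value 0 at x = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)


def entropy(q: float, s: float, d: int = 3) -> float:
    """q-dependent entropy in the form assumed here for Eq. (A5).

    Terms: -s*q (entropy cost per native contact), (d/2)*q*ln(q) (loop factor,
    3/2 in three dimensions), and the mixing entropy of choosing native contacts.
    """
    contact_and_loop = -s * q + 0.5 * d * xlogx(q)
    mixing = -xlogx(q) - xlogx(1.0 - q)
    return contact_and_loop + mixing


def energy(q: float, dB2: float, T_p: float, T: float) -> float:
    """q-dependent energy in the form assumed here for Eq. (A4)."""
    return -(dB2 / T_p) * q - (dB2 / T) * (1.0 - q)


def free_energy(q: float, dB2: float, T_p: float, T: float,
                s: float, d: int = 3) -> float:
    """F = E - T*S, following the structure of Eq. (A3)."""
    return energy(q, dB2, T_p, T) - T * entropy(q, s, d)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    params = dict(dB2=1.0, T_p=0.5, T=1.0, s=2.0)
    for i in range(11):
        q = i / 10
        print(f"q = {q:.1f}  F = {free_energy(q, **params):+.4f}")
```

Under this reading, the fully native state (q = 1) has a lower free energy than the q = 0 state when dB2/T_p - dB2/T exceeds T*s, and setting d = 2 removes the q ln q dependence altogether, which is the cancellation noted at the end of the Appendix.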