arxiv:cond-mat/ v1 [cond-mat.soft] 23 Mar 2007

Similar documents
arxiv: v1 [cond-mat.soft] 22 Oct 2007

arxiv:cond-mat/ v1 [cond-mat.soft] 19 Mar 2001

arxiv:cond-mat/ v1 [cond-mat.soft] 5 May 1998

arxiv:cond-mat/ v1 2 Feb 94

A Quasi-physical Algorithm for the Structure Optimization Off-lattice Protein Model

arxiv: v1 [physics.bio-ph] 26 Nov 2007

A new combination of replica exchange Monte Carlo and histogram analysis for protein folding and thermodynamics

Generalized Ensembles: Multicanonical Simulations

First steps toward the construction of a hyperphase diagram that covers different classes of short polymer chains

Monte Carlo simulation of proteins through a random walk in energy space

Lecture 8: Computer Simulations of Generalized Ensembles

Multicanonical parallel tempering

Exploring the energy landscape

Parallel Tempering Algorithm for. Conformational Studies of Biological Molecules

arxiv: v1 [cond-mat.soft] 15 Jan 2013

arxiv:cond-mat/ v1 [cond-mat.soft] 16 Nov 2002

arxiv:cond-mat/ v1 [cond-mat.stat-mech] 6 Aug 2003

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed.

It is not yet possible to simulate the formation of proteins

Lecture V: Multicanonical Simulations.

Simulation of mutation: Influence of a side group on global minimum structure and dynamics of a protein model

Computer simulation of polypeptides in a confinement

Computer Simulation of Peptide Adsorption

arxiv: v1 [cond-mat.soft] 24 Jun 2015

Master equation approach to finding the rate-limiting steps in biopolymer folding

Short Announcements. 1 st Quiz today: 15 minutes. Homework 3: Due next Wednesday.

3D HP Protein Folding Problem using Ant Algorithm

Computer Physics Communications

Folding of small proteins using a single continuous potential

Effective stochastic dynamics on a protein folding energy landscape

Polymer Simulations with Pruned-Enriched Rosenbluth Method I

Proteins polymer molecules, folded in complex structures. Konstantin Popov Department of Biochemistry and Biophysics

A Microscopically-Based, Global Landscape Perspective on Slow Dynamics in. Condensed-Matter

chem-ph/ Feb 95

ICCP Project 2 - Advanced Monte Carlo Methods Choose one of the three options below

Long Range Moves for High Density Polymer Simulations

Monte Carlo simulations of polyalanine using a reduced model and statistics-based interaction potentials

Molecular Interactions F14NMI. Lecture 4: worked answers to practice questions

arxiv:cond-mat/ v1 [cond-mat.stat-mech] 2 Nov 2006

Computer simulations of protein folding with a small number of distance restraints

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins

Local Interactions Dominate Folding in a Simple Protein Model

Outline. The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation. Unfolded Folded. What is protein folding?

Energy Landscape and Global Optimization for a Frustrated Model Protein

V(φ) CH 3 CH 2 CH 2 CH 3. High energy states. Low energy states. Views along the C2-C3 bond

Multicanonical jump walk annealing: An efficient method for geometric optimization

Multicanonical methods

Stretching lattice models of protein folding

Wang-Landau algorithm: A theoretical analysis of the saturation of the error

Phase Transition in a Bond. Fluctuating Lattice Polymer

Influence of temperature and diffusive entropy on the capture radius of fly-casting binding

arxiv:cond-mat/ v2 [cond-mat.soft] 29 Nov 2004


Elucidation of the RNA-folding mechanism at the level of both

Lecture 34 Protein Unfolding Thermodynamics

Random heteropolymer adsorption on disordered multifunctional surfaces: Effect of specific intersegment interactions

Analysis of the Lennard-Jones-38 stochastic network

Frontiers in Physics 27-29, Sept Self Avoiding Growth Walks and Protein Folding

FREQUENCY selected sequences Z. No.

Energy Landscape Distortions and the Mechanical Unfolding of Proteins

Temperature dependence of reactions with multiple pathways

Designing refoldable model molecules

Supporting Online Material for

The role of secondary structure in protein structure selection

arxiv:cond-mat/ v1 [cond-mat.stat-mech] 8 Nov 1996

Design of sequences with good folding properties in coarse-grained protein models Anders Irbäck*, Carsten Peterson, Frank Potthast and Erik Sandelin

Available online at ScienceDirect. Physics Procedia 57 (2014 ) 82 86

Advanced sampling. fluids of strongly orientation-dependent interactions (e.g., dipoles, hydrogen bonds)

Reliable Protein Folding on Complex Energy Landscapes: The Free Energy Reaction Path

PCCP PAPER. Interlocking order parameter fluctuations in structural transitions between adsorbed polymer phases. I. Introduction

DesignabilityofProteinStructures:ALattice-ModelStudy usingthemiyazawa-jerniganmatrix

Topological defects and its role in the phase transition of a dense defect system

Multicanonical methods, molecular dynamics, and Monte Carlo methods: Comparison for Lennard-Jones glasses

Computer simulations as concrete models for student reasoning

Markov Chain Monte Carlo Method

Simulating Folding of Helical Proteins with Coarse Grained Models

Polypeptide Folding Using Monte Carlo Sampling, Concerted Rotation, and Continuum Solvation

Autocorrelation Study for a Coarse-Grained Polymer Model

Efficient Combination of Wang-Landau and Transition Matrix Monte Carlo Methods for Protein Simulations

Why Proteins Fold. How Proteins Fold? e - ΔG/kT. Protein Folding, Nonbonding Forces, and Free Energy

arxiv:chem-ph/ v1 11 Nov 1994

Structuraltransitionsinbiomolecules-anumericalcomparison of two approaches for the study of phase transitions in small systems

Optimized statistical ensembles for slowly equilibrating classical and quantum systems

A simple theory and Monte Carlo simulations for recognition between random heteropolymers and disordered surfaces

Simulated Tempering: a New Monte Carlo Scheme.

Folding of a Small Helical Protein Using Hydrogen Bonds and Hydrophobicity Forces

Equilibrium Study of Spin-Glass Models: Monte Carlo Simulation and Multi-variate Analysis. Koji Hukushima

Protein folding. Today s Outline

Optimized multicanonical simulations: A proposal based on classical fluctuation theory

Polymer Collapse, Protein Folding, and the Percolation Threshold

SIMULATED TEMPERING: A NEW MONTECARLO SCHEME

Energy functions and their relationship to molecular conformation. CS/CME/BioE/Biophys/BMI 279 Oct. 3 and 5, 2017 Ron Dror

Predicting free energy landscapes for complexes of double-stranded chain molecules

Visualizing folding of proteins (1 3) and RNA (2) in terms of

A first-order transition in the charge-induced conformational changes of polymers

Protein Folding experiments and theory

Monte Carlo Methods for Rough Free Energy Landscapes: Population Annealing and Parallel Tempering. Abstract

Thermodynamics. Entropy and its Applications. Lecture 11. NC State University

Biopolymer structure simulation and optimization via fragment regrowth Monte Carlo

Homework 9: Protein Folding & Simulated Annealing : Programming for Scientists Due: Thursday, April 14, 2016 at 11:59 PM

Transcription:

The structure of the free energy surface of coarse-grained off-lattice protein models arxiv:cond-mat/0703606v1 [cond-mat.soft] 23 Mar 2007 Ethem Aktürk and Handan Arkin Hacettepe University, Department of Physics Engineering Beytepe, Ankara, Turkey Tarık Çelik Turkish Academy of Sciences, Piyade Sok. No: 27, Çankaya, Ankara, Turkey Abstract We have performed multicanonical simulations of hydrophobic-hydrophilic heteropolymers with a simple effective, coarse-grained off-lattice model to study the structure and the topology of the energy surface. The multicanonical method samples the whole rugged energy landscape, in particular the low-energy part, and enables one to better understand the critical behaviors and visualize the folding pathways of the considered protein model. Keywords: off-lattice protein models, Conformational Sampling, multicanonical Simulation. PACS numbers: 05.10.-a, 87.15.Aa, 87.15.Cc Electronic address: handan@hacettepe.edu.tr (corresponding-author) 1

I. INTRODUCTION Predicting the proteins structure and the folding mechanism is an important goal in structural biology. Although the physical principles are known, the complexity of proteins as being macromolecules consisting of numerous atoms makes an accurate analysis of the folding process of realistic proteins extremely difficult. The configuration space of proteins presents a complex energy profile consisting of tremendous number of local minima, barriers and further topological features. Because of energy barriers, the commonly used thermodynamic simulation techniques are not very efficient in sampling of a rugged landscape. The simulated molecule often remains in its starting wide microstate or move to a neighboring wide microstate, but in practise it will hardly reach the most stable one. The system may be trapped in a basin for a long time, which results in nonergodic behavior. On the other hand, the topography of the energy landscape, especially near the global minimum is of particular importance; this is because the potential energy surface defines the behavior of the system. Consequently, a visualization of the whole rugged landscape, covering the entire energy and temperature ranges, would be helpful in developing methods to allow one to survey the distribution of structures in conformational space and also gives the details of the folding pathway. Such a goal can be achieved with the multicanonical approach. The problem of protein folding entails the study of a nontrivial dynamics along pathways embedded in a rugged energy landscape. Therefore, one of the important aspects in this field is studying simple effective, coarse-grained models that allows a more global view on the relationship between the sequence of amino acid residues and the existence of a funnel-like structure along a pathway towards the energy minimum in a rugged free-energy landscape [1]. One of the most known examples is the HP model of lattice proteins, which has been exhaustively investigated [2, 3]. In this model, only two types of monomers are considered, with hydrophobic (H) and polar (P) character. Chains on the lattice are self-avoiding to account for the excluded volume. The only explicit interaction is between non-adjacent but next-neighbored hydrophobic monomers. This interaction of hydrophobic contacts is attractive to force the formation of a compact hydrophobic core which is screened from the (hypothetic) aqueous environment by the polar residues. Statistical mechanics aspects of this model are being subject of recent studies [4, 5, 6]. 2

Another off-lattice generalisation of the HP model is the AB model [7], where the hydrophobic monomers are labelled by A and the polar or hydrophilic ones by B. The contact interaction is replaced by a more realistic distance-dependent Lennard-Jones type of potential accounting for short-range excluded volume repulsion and long-range interaction; the latter being attractive for AA and BB pairs and repulsive for AB pairs of monomers. An additional interaction accounts for the bending energy of any pair of successive bonds. This model was first applied in two dimensions [7] and generalized to three-dimensional AB proteins [8, 9], with modifications by taking implicitly into account the additional torsional energy contributions of each bond. II. THE MODEL In this work, we study the effective off-lattice AB model of heteropolymers with N monomers. AB model as proposed in Ref. [8] has the energy function N 2 E = κ 1 k=1 N 2 4 i=1 j=i+2 N 3 b k b k+1 κ 2 N k=1 b k b k+2 + ( 1 C(σ i, σ j ) 1 rij 12 rij 6 ), (1) where b k is the bond vector between the monomers k and k+1 with length unity. In Ref. [8] different values for the parameter set (κ 1, κ 2 ) were tested and finally set to ( 1, 0.5) as this choice provide both the distributions for the angles between bond vectors b k and b k+1 and the torsion angles between the surface vectors b k b k+1 and b k+1 b k+2 giving the best agreement with the distributions obtained for selected functional proteins. Since b k b k+1 = cosϑ k, the choice κ 1 = 1 makes the coupling between successive bonds antiferromagnetic. The second term in Eq. (1) takes torsional interactions into account. The third term contains a pure Lennard-Jones potential, where the 1/r 6 ij long-range interaction is attractive whatever types of monomers interact. The monomer-specific prefactor C(σ i, σ j ) only controls the depth of the Lennard-Jones valley: +1, σ i, σ j = A, C(σ i, σ j ) = +1/2, σ i, σ j = B or σ i σ j. 3 (2)

For technical reasons, we have introduced in both models a cut-off r ij = 0.5 for the Lennard- Jones potentials, below which the potential is repulsive hard-core (i.e., the potential is infinite). For updating a conformation, as shown in Fig. 1, the length of the bonds are fixed ( b k = 1, k = 1,...,N 1). The (i+1)th monomer lies on the surface of a unit sphere centered on the ith monomer. Therefore, spherical coordinates are the natural choice for calculating the new position of the (i + 1)th monomer on this sphere. For the reason of efficiency, all the points on the sphere are not selected for updating, but restricted the choice to a spherical cap with maximum opening angle 2θ max (the dark area in Fig. 1). Thus, to change the position of the (i+1)th monomer to (i+1), we select the angles θ and ϕ randomly from the respective intervals cosθ max cosθ 1 and 0 ϕ 2π, which ensure a uniform distribution of the (i+1)th monomer s positions on the associated spherical cap. After updating the position of the (i + 1)th monomer, the following monomers in the chain are simply translated according to the corresponding bond vectors which remained unchanged in this update. This is similar to single spin updates in local-update Monte Carlo simulations of the classical Heisenberg model with the difference that, in addition to local energy changes, long-range interactions of the monomers are to be computed anew after the update,due to changing their relative position. III. THE METHOD The multicanonical ensemble is based on a probability function in which the different energies are equally probable. However, implementation of the multicanonical algorithm (MUCA) is not straightforward because the density of states n(e) is a priori unknown. In practice, one only needs to know the weights ω, w(e) 1/n(E) = exp[(e F T(E) )/k B T(E)], (3) and these weights are calculated in the first stage of simulation process by an iterative procedure in which the temperatures T(E) are built recursively together with the microcanonical free energies F T(E) /k B T(E) up to an additive constant. The iterative procedure is followed by a long production run based on the fixed w s where equilibrium configurations 4

are sampled. Re-weighting techniques (see Ferrenberg and Swendsen [10] and literature given in their second reference) enable one to obtain Boltzmann averages of various physical variables over a wide range of temperatures. As pointed out above, calculation of the a priori unknown MUCA weights is not trivial, requiring an experienced intervention. For lattice models, this problem was addressed in a sketchy way by Berg and Çelik [11] and later by Berg [12]. An alternative way is to establish an automatic process by incorporating the statistical errors within the recursion procedure. IV. RESULTS AND DISCUSSIONS We first carried out canonical (i.e., constant T) MC simulations of six different 20-monomer sequences [13] at temperatures T = 0.3, 0.6 and 2.4, as well as MUCA test runs to determine the required energy ranges for each sequence. The multicanonical weights were built after m = 500 recursions during a long single simulation, where the multicanonical parameters were iterated every 10000 sweeps. After having fixed the MUCA weight factors, a production run was carried out with 5 10 7 sweeps. Fig. 2 shows, as a sample, the histograms of the multicanonical run for the sequence AAAAB- BAAAABAABAAABBA. We can see from the figure that the whole temperature range, including the hard-to-reach low-energy region, is equally well sampled. Our simulations with these multicanonical parameters reveiled the lowest energy conformation for this sequence (our suspected GEM) at E = 59.105. Fig. 3 and Fig. 4 display the energy and the specific heat vs. temperature for the same sequence. The specific heat possess a peak about the temperature T = 0.28, which corresponds to the transition point where the system undergoes a structural change from random coil to more compact globular state. We define an order parameter (OP) [14] OP = 1 1 90 n F n F i=1 α (t) i where n F is the number of bond and torsion angles, α (RS) i α (RS) i, (4) and α (t) i are the bond and torsion angles of the considered configuration and of the choosen reference state, which is usually 5

taken as the global energy minimum (GEM) state. The difference α (t) i α (RS) i the interval [ 180, 180 ], which in turn gives is always in 1 < O > T 1 (5) The order parameter may be considered as a measure of coinsedence of the considered conformation with the reference state. The free energy of the system was calculated by the formula [15] and the entropy in the above formula is, F(T, OP) = E TS(OP), (6) S(OP) = log H(OP) (7) where H(OP) is the histogram that the system has an order parameter value OP at a fixed temperature T. The multicanonical algorithm provides a goal sampling of the entire energy (temperature) range and enables one to determine the histograms H without problem. In Fig. 5, we show the free energy surface with respect to the other parameter OP and temperature T, evaluated by utilizig the multicanonical technique. The figure displays the free energy distribution of the created fifty million conformations of the choosen model protein with respect to the order parameter and temperature. Instead of potential energy, we have calculated and displayed the free energies in order to include the entropic effects, which are more meaningfull in considerations of the folding pathways. the free energy surface displays a valley structure and clearly pictures the existing funnel towards the state of global energy minimum. This topographic picture also allows one to realize at what temperature the system go through a structural change. Fig. 5, together with the contour map given in Fig. 6, enable one to visualize where the system under consideration leaves the more structural energy landscape and reaches to a funnel towards the lower energy states. This visualization is very helpfull from point of designing more effective generic simulation algoritms specific to the system under consideration. 6

V. CONCLUSIONS We have simulated different 20-monomer sequences of the off-lattice AB model proteins by utilizig multicanonical ensemble approach and investigated the free energy surface. The obtained picture serves as a usefull tool for visualization of the behaviour of considered system in the configuration space and provide helpfull information in designing more effective simulation algorithms. VI. ACKNOWLEDGEMENTS H.A. acknowledges support by The Scientific and Tecnological Research Council of Turkey (TÜBİTAK) under the project number 104T150 and from The Turkish Academy of Sciences (TÜBA) under the Programme to Reward Successful Young Scientists. T.Ç. acknowledges the Turkish Academy of Sciences (TÜBA). Fruitful discussion with G. Gökoğlu is acknowledged. [1] J. N. Onuchic, Z. Luthey-Schulten, and P. G. Wolynes, Annu. Rev. Phys. Chem. 48, 545 (1997); J. N. Onuchic and P. G. Wolynes, Curr. Opinion in Struct. Biol. 14, 70 (2004). [2] K. A. Dill, Biochemistry 24, 1501 (1985); K. F. Lau and K. A. Dill, Macromolecules 22, 3986 (1989). [3] R. G. Larson, L. E. Scriven and H. T. Davis, J. Chem. Phys. 83, 2411 (1985). [4] G. Chikenji, M. Kikuchi, and Y. Iba, Phys. Rev. Lett. 83, 1886 (1999), and references therein. [5] H.-P. Hsu, V. Mehra, W. Nadler, and P. Grassberger, J. Chem. Phys. 118, 444 (2003); Phys. Rev. E 68, 021113 (2003). [6] M. Bachmann and W. Janke, Phys. Rev. Lett. 91, 208105 (2003); J. Chem. Phys. 120, 6779 (2004). [7] F. H. Stillinger, T. Head-Gordon, and C. L. Hirshfeld, Phys. Rev. E 48, 1469 (1993); F. H. Stillinger and T. Head-Gordon, Phys. Rev. E 52, 2872 (1995). [8] A. Irbäck, C. Peterson, F. Potthast, and O. Sommelius, J. Chem. Phys. 107, 273 (1997). [9] A. Irbäck, C. Peterson, and F. Potthast, Phys. Rev. E 55, 860 (1997). [10] A.M. Ferrenberg, R.H. Swendsen, Phys Rev Lett 61, 2635 (1988); ibid 63, 1658 (1989). 7

[11] B.A. Berg, T. Çelik, Phys Rev Lett 69, 2292 (1992). [12] B.A. Berg, Nucl Phys B (Proc. Suppl.) 63A-C, 982 (1998). [13] M. Bachmann, H. Arkin, W. Janke, Phys. Rev. E 71, 031906 (2005). [14] U.H.E. Hansmann, Y. Okamoto and J.N. Onuchic, Proteins 34, 472 (1999); U.H. E. Hansmann and J. N. Onuchic, J. Chem. Phys. 115, 1601 (2001). [15] J. N. Onuchic, P. G. Wolynes, Z. Luthey-Schulten and N. D. Socci, Proc. Natl., Acad. Sci. USA 92, 3626 (1995); S. S. Plotkin, J. Wang and P. G. Wolynes, J. Chem. Phys. 106(7), (1997). 8

FIG. 1: Spherical update of the bond vector between the ith and (i + 1)th monomer. Histogram x 1000 1000 900 800 700 600 500 400 300 200 100 0-60 -50-40 -30-20 E FIG. 2: Histogram of multicanonical simulations at T = 2.4K for the sequence AAAAB- BAAAABAABAAABBA. 9

-20-25 -30-35 <E> -40-45 -50-55 -60 0 0.2 0.4 0.6 0.8 1 T FIG. 3: The Boltzmann average energy as a function of temperature obtained from multicanonical simulation for the sequence AAAABBAAAABAABAAABBA. 70 60 50 Cv 40 30 20 10 0 0 0.2 0.4 0.6 0.8 1 T FIG. 4: Specific heat obtained from multicanonical simulation as a function of temperature. 10

20 25 30 35 Free Energy 40 45 50 55 60 0 0.2 Overlap 0.4 0.6 0.8 1 2.5 2 1.5 Temperature 1 0.5 0 FIG. 5: Free energy surface in the configuration space of sequence AAAAB- BAAAABAABAAABBA. 11

0.9 0.8 0.7 0.6 Overlap 0.5 0.4 0.3 0.2 0.1 0 0 0.5 1 1.5 Temperature FIG. 6: Free energy surface contour in the configuration space of sequence AAAAB- BAAAABAABAAABBA. 12