Simulating Folding of Helical Proteins with Coarse Grained Models

Progress of Theoretical Physics Supplement No. 138 (2000), 366

Shoji Takada
Department of Chemistry, Kobe University, Kobe 657-8501, Japan

(Received October 11, 1999)

We describe how the potential parameters of a coarse-grained protein model can be optimized using the available database of three-dimensional protein structures. With the optimized potential, we simulated a three-helix-bundle protein and found that all trajectories reach the native structure within 1 microsecond. Interestingly, a quasi-mirror image is successfully discriminated from the native topology.

1. Introduction

Protein folding has been intensively studied for about 40 years, since Anfinsen's famous experiments. 1) The problem has two aspects: physical understanding of folding mechanisms 2), 3) and prediction of the three-dimensional structure. For both aspects, it is crucial to construct a model that is realistic enough to discriminate the native structure from the many non-native ones, yet simple enough to sample a wide range of conformational space with currently available computers. Models that include all atoms and solvent molecules, such as CHARMM or AMBER, are still somewhat too demanding for this purpose. On the other hand, so-called minimal models that include one bead per amino acid seem to be too crude, at least for prediction. Recently, models that lie between these two limits have been proposed and studied. In this paper, we describe our recent work in this direction. Our model includes 4 united atoms per amino acid (3 for glycine), by which backbone dynamics is modeled quite realistically, while the side-chain atoms are grouped into a single bead and solvent effects are taken into account only indirectly. The functional form of the interactions is devised to be consistent with physico-chemical knowledge; in particular, solvent effects are carefully taken into account via the idea of a context-dependent dielectric constant.
With this model, we performed simulations of a 54-residue protein-like peptide made of three kinds of amino acids (PRO54): hydrophobic, polar, and flexible ones. 4) The sequence of PRO54 is designed to have a three-helix-bundle structure, imitating a laboratory-designed four-helix bundle of DeGrado's group. For PRO54, while restricting the parameters to ranges not too different from experimentally anticipated values, we tuned the parameters empirically so that the peptide can reach the three-helix-bundle form starting from any random-coil structure. After months of trial, we ended up with a set of parameters that indeed enables the peptide to fold within a microsecond. Despite this promising result, we found that the simulated PRO54 differs significantly from natural proteins in some properties. 4) Among them are: (i) the native-like state of PRO54 has significant residual fluctuations, in which the three-helix-bundle form is kept but the relative alignment of the helices changes; (ii) the folding-unfolding transition is much less cooperative for PRO54; and (iii) most seriously, the two quasi-mirror images of the three-helix-bundle form have almost the same stability for PRO54. The first two characteristics were actually observed in DeGrado's laboratory-designed peptide, too. The third one may need more explanation. For the three-helix-bundle topology, there are two different ways of aligning the three helices. In both, all amino acids in the core are hydrophobic and the surface amino acids of the helices are polar. Thus it is natural that there is no energy gap between the two topologies. Taking these together, we concluded that the major reason for the above-mentioned differences is the three-letter code, rather than inappropriate modeling.

Now we go forward and try to simulate a natural protein, namely one made of 20 types of amino acids (although only 17 types actually occur in the protein studied in this paper). Obviously, the model then includes many more parameters, and ad hoc determination of them is hopeless. Thus, we need a systematic way to determine them. We use an idea developed by Wolynes and his co-workers: 5) namely, we optimize the parameters so that the stability of the native structure relative to misfolded ones, normalized by the standard deviation of the energy fluctuation, is maximal. This is discussed in detail in the next section. With the optimized potential parameters, a folding simulation is performed for a three-helix-bundle protein, the albumin binding domain with 47 residues (pdb code: 1prb). Some preliminary results are reported in §3. Conclusions are given in the last section.

2. Optimization of energy parameters

Here, we start with a short summary of the model used. An amino acid is modeled as three backbone united atoms, NH, CH, and CO, and a bead for the side chain. The latter is located near the center of mass of its non-hydrogen atoms. Molecular dynamics simulation is performed with the position Langevin equation, where Stokes' law is used to determine the friction coefficients of the atoms. All chemical bond lengths (real ones for the backbone, and virtual ones between CH and the side chain) and bond angles are fixed by the LINCS algorithm, 6) which is significantly better than the well-known SHAKE algorithm. The systematic force in the Langevin equation is calculated from the derivative of the potential function, which consists of various interactions,

$$V = V_{\omega} + V_{\phi} + V_{\psi} + V_{\mathrm{Rama}} + V_{\mathrm{vdw}} + V_{\mathrm{HB}} + V_{\mathrm{HP}} + V_{\mathrm{EL}}. \tag{2.1}$$

The meaning of each term is as follows (in order): the hindered rotation around the $\omega$ dihedral angle (1st), that around $\phi$ (2nd), that around $\psi$ (3rd), the side-chain entropy effect representing the secondary-structure propensity (4th), the van der Waals potential (5th), the hydrogen-bonding interaction (6th), the hydrophobic interaction (7th), and the electrostatic interactions (8th). The explicit expressions will be described elsewhere.

The potential function includes many energetic parameters $\epsilon$ in linear form in the potential energy terms,

$$V(r, \epsilon) = \sum_i \epsilon_i u_i(r), \tag{2.2}$$

where $\epsilon_i$ is a parameter and $u_i(r)$ is a function of the protein conformation, collectively denoted as $r$. Now we introduce the so-called Z score,

$$Z(\epsilon) = \frac{V(r_{\mathrm{nat}}, \epsilon) - \langle V(r, \epsilon) \rangle_D}{\Delta V(\epsilon)}, \tag{2.3}$$

that is, the (potential) energy of the native structure, $V(r_{\mathrm{nat}})$, relative to the average energy $\langle V(r) \rangle_D$ of the denatured ensemble, divided by the standard deviation of the energy fluctuation, $\Delta V$. (Note that the opposite sign to this definition is sometimes used.) It was theoretically shown that, for a protein to fold quickly while avoiding severe trapping in misfolded states, the protein has to have a reasonably small Z score, i.e., negative and large in absolute value in the current definition. 5)

Our strategy for optimizing the parameters is as follows. We first choose a training set of proteins whose native three-dimensional structures are known from experiments. We use 39 proteins in this paper. Our goal is to find a set of potential parameters that can be used for simulation of any protein. Therefore we decided to use the maximum value of the Z score over the training set, $Z_{\max}$, as an index representing the quality of the energy function. Since a lower Z score is better, $Z_{\max}$ is the score of the worst-discriminated training protein, and every training protein has a Z score of at most $Z_{\max}$. We performed simulated annealing runs in the parameter space $\epsilon$ with $Z_{\max}$ as the scoring function. Namely, assuming some initial set of parameters $\epsilon$, we compute the Z scores of the training proteins and take their maximal value, $Z_{\max}$. We then make a small change in $\epsilon$ and recompute $Z_{\max}$. The Metropolis criterion is applied to $Z_{\max}$ to decide whether the change is accepted or rejected. The procedure is repeated with decreasing temperature until an annealed parameter set is obtained. Figure 1 shows a trajectory of an annealing run, where, in addition to $Z_{\max}$, the average of the Z scores of the 39 proteins and the Z scores of the first three proteins are plotted as functions of the Monte Carlo step.

Fig. 1. Z score optimization procedure as a function of Monte Carlo steps. The top curve is $Z_{\max}$, the dashed curve is the average of the 39 Z scores, and the other three are the Z scores of the first three of the 39 proteins. For the first 50000 steps, only the hydrophobic interaction parameters are optimized; this is followed by optimization of the remaining parameters with the hydrophobic ones fixed.

With the optimized parameter set, we computed the Z scores of several small proteins that are not in the training set. The Z values are as follows: -3.48 for 1bdd (the pdb code), -2.64 for 1r69, -0.75 for 1coa, -1.20 for 2gb1, -1.39 for 1srl, and -0.82 for 1nmg, where the first two are all-α proteins, while the others include β sheets. Namely, for all-α proteins, the current energy function seems to be useful even if they are not included in the training set. Unfortunately, this is not the case for proteins with β sheets.

3. Simulating protein folding with 20 letter code: Albumin binding domain

Since the current energy function is supposed to be good for helical proteins, we performed a folding simulation of a three-helix-bundle protein, the 47-residue albumin binding domain (6 residues are removed from the sequence in the pdb file 1prb).
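Returning briefly to the parameter optimization of Section 2, the annealing of $Z_{\max}$ can be illustrated with a minimal sketch. It is not the code used for this work: the decoy features are synthetic Gaussian numbers standing in for the $u_i(r)$ of a denatured ensemble, the geometric cooling schedule, step size and number of steps are assumptions, the two-stage schedule of Fig. 1 (hydrophobic parameters first) is not reproduced, and all function names (`energy`, `z_score`, `z_max`, `anneal`, `make_toy_protein`) are hypothetical.

```python
import math
import random


def energy(eps, u):
    """Eq. (2.2): V(r, eps) = sum_i eps_i * u_i(r)."""
    return sum(e * x for e, x in zip(eps, u))


def z_score(eps, u_nat, u_decoys):
    """Eq. (2.3): native energy minus decoy mean, divided by the decoy std."""
    e_dec = [energy(eps, u) for u in u_decoys]
    mean = sum(e_dec) / len(e_dec)
    std = math.sqrt(sum((e - mean) ** 2 for e in e_dec) / len(e_dec))
    return (energy(eps, u_nat) - mean) / std


def z_max(eps, proteins):
    """Largest (worst) Z score over the training set."""
    return max(z_score(eps, nat, dec) for nat, dec in proteins)


def anneal(proteins, eps0, n_steps=2000, t_start=1.0, t_end=0.01, step=0.1, seed=0):
    """Simulated annealing in parameter space with Z_max as the cost."""
    rng = random.Random(seed)
    eps = list(eps0)
    cost = z_max(eps, proteins)
    best_eps, best_cost = list(eps), cost
    for k in range(n_steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        # Small random change of one randomly chosen parameter.
        trial = list(eps)
        trial[rng.randrange(len(trial))] += rng.gauss(0.0, step)
        trial_cost = z_max(trial, proteins)
        # Metropolis criterion applied to the change in Z_max.
        if trial_cost <= cost or rng.random() < math.exp((cost - trial_cost) / temp):
            eps, cost = trial, trial_cost
            if cost < best_cost:
                best_eps, best_cost = list(eps), cost
    return best_eps, best_cost


def make_toy_protein(rng, n_terms=4, n_decoys=100):
    """Synthetic training protein: Gaussian decoy features, native shifted downward."""
    decoys = [[rng.gauss(0.0, 1.0) for _ in range(n_terms)] for _ in range(n_decoys)]
    native = [-1.0 + rng.gauss(0.0, 0.3) for _ in range(n_terms)]
    return native, decoys


data_rng = random.Random(1)
proteins = [make_toy_protein(data_rng) for _ in range(5)]
eps_init = [1.0, 0.0, 0.0, 0.0]
z_init = z_max(eps_init, proteins)
eps_opt, z_opt = anneal(proteins, eps_init)
print(f"Z_max before annealing: {z_init:.2f}, after: {z_opt:.2f}")
```

Because $V$ is linear in $\epsilon$, the Z score is unchanged when all parameters are multiplied by a common positive factor, so the sketch fixes only the relative weights; an overall energy scale would have to be set separately.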
We tuned one parameter that is responsible for the strength of overall collapse. After tuning this one parameter, we found that almost all trajectories (13/14 at this moment) reach the native-like structure within 1 µs, starting from random conformations. The simulated native structure, after quenching, has a root mean square deviation (RMSD) of about 3 Å from the experimental structure. Figure 2 shows snapshots of a typical folding trajectory; after several tens of nanoseconds, collapse and helix formation occurred simultaneously. After an approximately correct topology forms, it takes a somewhat long time to reach the native structure. In contrast to the simulation results for PRO54, the protein-like peptide with three types of amino acids, we found that the quasi-mirror image is seldom reached in the folding runs. Energetic analysis suggested that the quasi-mirror image has a total energy about 7 kcal/mol higher than the correct topology. This difference arises from the vdw interaction, the HB interaction, and the HP interaction.

Fig. 2. Snapshots of a typical folding trajectory for the albumin binding domain (drawn with Molscript 7)).

4. Conclusions

An automated optimization of potential parameters is proposed and tested. We found that training the parameters with a database of 39 proteins leads to better modeling not only for proteins in the database but also for many other proteins. In particular, all-α proteins have better scores with the current energy function. Langevin dynamics simulation of a three-helix-bundle protein was performed, and we found that folding runs from any random conformation always reach a native-like structure within about 3 Å RMSD. A quasi-mirror image, in which the helix alignment is opposite to that of the native structure, is discriminated by about 7 kcal/mol, and almost all trajectories fall into the correct conformation.

Acknowledgments

I would like to thank Peter G. Wolynes and Zaida Luthey-Schulten for useful discussions. This work has been supported by the JSPS Research for the Future Program "Photo Science" and by the Grant-in-Aid on Priority Areas "Molecular Physical Chemistry".
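As a supplementary note on the dynamics summarized in Section 2, the position (overdamped) Langevin equation with Stokes friction can be sketched as follows. This is only an illustration under stated assumptions: reduced units are used, a toy harmonic force replaces the derivative of Eq. (2.1), the LINCS constraints are omitted, a simple Euler-Maruyama integrator is assumed, and the names `stokes_friction`, `langevin_step` and `harmonic_force` are hypothetical.

```python
import math
import random


def stokes_friction(viscosity, radius):
    """Stokes' law for a sphere: gamma = 6 * pi * eta * a."""
    return 6.0 * math.pi * viscosity * radius


def langevin_step(x, force, gamma, kT, dt, rng):
    """One Euler-Maruyama step of the position Langevin equation."""
    f = force(x)
    new_x = []
    for xi, fi, gi in zip(x, f, gamma):
        drift = fi / gi * dt  # systematic force divided by friction
        noise = math.sqrt(2.0 * kT * dt / gi) * rng.gauss(0.0, 1.0)  # thermal kick
        new_x.append(xi + drift + noise)
    return new_x


def harmonic_force(x, k=1.0):
    """Toy systematic force F = -dV/dx for V = k * x^2 / 2 (stand-in for Eq. (2.1))."""
    return [-k * xi for xi in x]


rng = random.Random(0)
kT = 1.0  # reduced units (assumption)
gamma = [stokes_friction(viscosity=1.0, radius=0.5)] * 3  # one bead, three coordinates
dt = 0.01 * gamma[0]  # 1% of the relaxation time gamma / k, with k = 1
x = [0.0, 0.0, 0.0]

samples = []
for step in range(60000):
    x = langevin_step(x, harmonic_force, gamma, kT, dt, rng)
    if step >= 10000:  # discard the equilibration part
        samples.extend(xi * xi for xi in x)

mean_sq = sum(samples) / len(samples)
print(f"<x^2> = {mean_sq:.3f}; equipartition predicts kT/k = {kT:.3f}")
```

The harmonic well is chosen only because its equilibrium variance, $kT/k$, gives a simple check that the noise amplitude and the friction coefficient are consistent with each other.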

References

1) A. R. Fersht, Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding (W. H. Freeman and Co., New York, 1999).
2) J. D. Bryngelson, J. N. Onuchic, N. D. Socci and P. G. Wolynes, Proteins: Struct. Funct. Genet. 21 (1995), 167.
3) J. J. Portman, S. Takada and P. G. Wolynes, Phys. Rev. Lett. 81 (1998), 5237.
4) S. Takada, Z. Luthey-Schulten and P. G. Wolynes, J. Chem. Phys. 110 (1999), 11616.
5) R. Goldstein, Z. Luthey-Schulten and P. G. Wolynes, Proc. Natl. Acad. Sci. USA 89 (1992), 4918.
6) B. Hess, H. Bekker, H. J. C. Berendsen and J. G. E. M. Fraaije, J. Comp. Chem. 18 (1997), 1463.
7) P. J. Kraulis, Molscript, J. Appl. Crystallogr. 24 (1991), 946.