Limitations of temperature replica exchange (T-REMD) for protein folding simulations
  Jed W. Pitera, William C. Swope
  IBM Research
  pitera@us.ibm.com
Anomalies in protein folding
  [Panels: kinetic, thermodynamic]
  [Figures: curves labeled 322K, 305K, ~320K, ~325K; scaled SVD signal (0.0 to 1.0) vs. time (µs, 0 to 150), with fluorescence (Fl.) and IR data and fits at 63 C]
  Sources: Yang & Gruebele, Nature 2003; Ma & Gruebele, PNAS 2005; Garcia-Mira et al., Science 2002
Hunting anomalies with a computational model
  Do we have the correct/converged answer for our model?
    How do we know?
  Do we have a model that reflects reality?
    How do we compare the model against experiment?
    What is missing from our model?
Replica exchange molecular dynamics (REMD)
  Multiple simulations of the same system are run in parallel at different temperatures (T-REMD), state points, or Hamiltonians
  Monte Carlo moves periodically exchange systems between adjacent temperatures
  Allows escape from local minima
  Ideal for cases where we want temperature-dependent properties
  [Figure: replicas at 300K, 350K, and 400K; label: 350K ensemble]
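The Monte Carlo exchange move described above can be sketched as follows. This is a minimal illustration of the standard Metropolis criterion for swapping two replicas, not the code used in this work; the function names, the kcal/mol unit choice, and the alternating even/odd pairing are assumptions made for the example.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def exchange_probability(u_i, u_j, t_i, t_j, k_b=K_B):
    """Metropolis probability of swapping the configurations at t_i and t_j.

    u_i and u_j are the potential energies of the configurations currently
    held at temperatures t_i and t_j.
    """
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    # Log of the ratio of Boltzmann weights after and before the swap.
    delta = (beta_i - beta_j) * (u_i - u_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchanges(energies, temperatures, rng, offset=0):
    """Attempt swaps between adjacent temperatures (k, k+1), starting at offset.

    energies[k] is the energy of the configuration at temperatures[k] and is
    updated in place. Returns the list of index pairs that were swapped.
    """
    swapped = []
    for k in range(offset, len(temperatures) - 1, 2):
        p = exchange_probability(
            energies[k], energies[k + 1], temperatures[k], temperatures[k + 1]
        )
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            swapped.append((k, k + 1))
    return swapped
```

A swap that places the lower-energy configuration at the lower temperature is always accepted; the reverse move is accepted with a probability that falls off exponentially with the energy gap and the difference in inverse temperature.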
T-REMD motivations
  Interested in a system at a temperature/state point where sampling is slow (T_low)
    Long correlation times
    Broken ergodicity
  Assume that sampling is fast at some other temperature/state point (T_high)
  Simulate as many intermediate state points as necessary to bridge T_low and T_high
  Trajectories (MD) or Markov chains (MC) decorrelate at T_high, importance sample at T_low
  In many cases T was an interesting variable anyway
    Experiments often provide A(T) or perturb a system by T
    T-dependent phenomena of biological interest
  [Figure labels: T_high, realm of perfect sampling; T_low, broken ergodicity]
Model system T-REMD
  1-D double well potential, compare MD and T-REMD
  Energetic barrier; activated process with Arrhenius kinetics (ln(k) linear in 1/T)
  [Figure: ln k vs. 1/T; legend entries: MD, 1, 10, 100, 1000, 10000]
  [Figure: transitions vs. replica; aggregate transitions in 2.5x10^7 steps vs. MD (4.5k/4.5k)]
  Similar rates vs. 1/T
  All replicas undergo transitions
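A toy version of this comparison can be sketched in a few lines. The slide does not give the potential, integrator, or parameters, so everything below is an assumption: a quartic double well U(x) = h(x^2 - 1)^2, overdamped Langevin dynamics with k_B = 1 standing in for MD, and wells defined by x < -0.5 and x > 0.5. Setting exchange_period to 0 gives independent plain runs at each temperature.

```python
import math
import random


def potential(x, barrier=5.0):
    """Quartic double well with minima at x = -1 and x = +1 (assumed form)."""
    return barrier * (x * x - 1.0) ** 2


def force(x, barrier=5.0):
    """Negative derivative of potential()."""
    return -4.0 * barrier * x * (x * x - 1.0)


def langevin_step(x, temperature, rng, dt=0.01):
    """One overdamped Langevin step with unit friction and k_B = 1."""
    noise = math.sqrt(2.0 * temperature * dt) * rng.gauss(0.0, 1.0)
    return x + force(x) * dt + noise


def well_of(x, current):
    """Assign a well with dual thresholds; the barrier region keeps the old label."""
    if x < -0.5:
        return -1
    if x > 0.5:
        return 1
    return current


def run_tremd(temperatures, n_steps, exchange_period, seed=0):
    """Count barrier crossings observed at each temperature.

    exchange_period = 0 disables exchanges (independent runs per temperature).
    """
    rng = random.Random(seed)
    n = len(temperatures)
    x = [-1.0] * n          # configuration currently at each temperature
    state = [-1] * n        # well label carried along with each configuration
    transitions = [0] * n   # crossings recorded at the temperature where they occur
    for step in range(1, n_steps + 1):
        for k in range(n):
            x[k] = langevin_step(x[k], temperatures[k], rng)
            new = well_of(x[k], state[k])
            if new != state[k]:
                transitions[k] += 1
                state[k] = new
        if exchange_period and step % exchange_period == 0:
            offset = (step // exchange_period) % 2  # alternate even/odd pairs
            for k in range(offset, n - 1, 2):
                d_beta = 1.0 / temperatures[k] - 1.0 / temperatures[k + 1]
                delta = d_beta * (potential(x[k]) - potential(x[k + 1]))
                if delta >= 0.0 or rng.random() < math.exp(delta):
                    # Swap configurations and their labels; a swap is not a crossing.
                    x[k], x[k + 1] = x[k + 1], x[k]
                    state[k], state[k + 1] = state[k + 1], state[k]
    return transitions
```

Comparing sum(run_tremd(temps, n, 10)) against sum(run_tremd(temps, n, 0)) for the same ladder is one way to examine the aggregate-transition comparison made on this slide.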
trpzip2 β-hairpin in explicit solvent
  12 amino acid peptide trpzip2 in explicit solvent
    TIP3P water; AMBER parm96 or parm99sb
    3605 waters, 11034 atoms
    Cubic box (equilibrated 10 ns NPT at 310 K, 1 atm), edge length 48.095 Å
    PME electrostatics
    9 Å switch for vdW/direct; long-range vdW correction
  Replica exchange molecular dynamics (80 replicas at a range of temperatures)
    Exchanges every 40 ps; Andersen collisions every 10 ps
  2 independent calculations with different initial conditions
    "folded": 80 representative conformations from implicit solvent (0.68 µs/replica, aggregate 54 µs)
    "unfolded": all 80 replicas started in the same fully extended conformation (1.45 µs/replica, aggregate 116 µs)
Effect of exchange period on relaxation from the folded state
The hazards of non-thermalized initial conditions
  Rapid unfolding from folded initial conditions
  Exchange period shorter than the relaxation time of the potential energy
Stability of folded initial conditions
Convergence from unfolded initial conditions
trpzip2 thermodynamics: SASA PMF vs. T
  Continuous, weak collapse transition
  [Figure: PMF of SASA (Å²) vs. temperature (K); panels: from folded, from unfolded]
trpzip2 thermodynamics: Cα RMSD PMF vs. T
  Measure of the backbone deviation from the NMR structure
  [Figure: PMF of Cα RMSD (Å) vs. temperature (K); panels: from folded, from unfolded]
trpzip2 thermodynamics: heavy atom RMSD PMF vs. T
  Spurious absence of a barrier
  [Figure: PMF of heavy atom RMSD (Å) vs. temperature (K); panels: from folded, from unfolded]
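The PMFs on these slides are free energy profiles along a single coordinate (SASA or RMSD) at each temperature. A minimal sketch of the usual histogram estimate, W(x) = -k_B T ln P(x), is given below; the binning, the kcal/mol units, and the function name are assumptions, and the slides do not say how the plotted PMFs were actually computed.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def pmf_from_samples(samples, temperature, lo, hi, n_bins):
    """Histogram-based PMF along one coordinate at a single temperature.

    Returns (bin_centers, pmf) with the PMF shifted so its minimum is zero.
    Empty bins are reported as infinity. Samples outside [lo, hi) are ignored.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        if lo <= s < hi:
            index = min(int((s - lo) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside [lo, hi)")
    kt = K_B * temperature
    centers = [lo + (b + 0.5) * width for b in range(n_bins)]
    free = [-kt * math.log(c / total) if c > 0 else math.inf for c in counts]
    reference = min(free)
    return centers, [f - reference for f in free]
```

Applied to the samples collected at each replica temperature in turn, this gives one profile per temperature, which can then be stacked into a PMF vs. T surface.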
Comparison of cluster populations
  Approximate stochastic k-medoid clustering of merged & downsampled data set to produce a set of 40 clusters
  Metric was distance matrix error of Cα, Trp Cδ and Cζ3
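The slide does not define "distance matrix error". One common definition, assumed here, is the root-mean-square difference between corresponding interatomic distances in two conformations, evaluated over a chosen atom subset. The function name and the coordinate layout (a sequence of 3-D points per conformation) are illustrative.

```python
import math


def distance_matrix_error(coords_a, coords_b):
    """RMS difference between the pairwise distance matrices of two conformations.

    coords_a and coords_b are equal-length sequences of (x, y, z) points for
    the same atoms in the same order. No superposition is required because
    only internal distances are compared.
    """
    n = len(coords_a)
    if n != len(coords_b) or n < 2:
        raise ValueError("need two conformations with the same atoms (at least 2)")
    total = 0.0
    pairs = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            total += (d_a - d_b) ** 2
            pairs += 1
    return math.sqrt(total / pairs)
```

Because the metric depends only on internal distances, it is unchanged by rigid translation or rotation of either conformation, which makes it convenient as a clustering distance.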
The energy landscape
  Markov model of 425K kinetics (N. Singhal)
  Compact states are isolated local minima connected by unfolded state
  Kinetics in/out of these minima are slow
Apparent folding rates: RMSD criteria
  folded: < 2.5 Å Cα RMSD from NMR
  unfolded: > 6 Å Cα RMSD from NMR
  Track per-replica transitions, record T
  Exptl. k_f 5x10^-7, k_u 5x10^-8 ps^-1 @ 296 K (Snow et al., PNAS 2004)
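The dual-threshold criterion above can be sketched as a small state machine over a Cα RMSD time series. The cutoffs come from the slide; the rate estimate (events divided by time spent in the source state), the frame spacing dt_ps, and the function name are assumptions, since the slide does not say how counts were converted into rates.

```python
def count_transitions(rmsd_series, folded_cut=2.5, unfolded_cut=6.0, dt_ps=1.0):
    """Count folding and unfolding events along one trajectory.

    Frames with RMSD between the two cutoffs keep the previous state label,
    so recrossings of the intermediate region are not counted as events.
    """
    state = None
    n_fold = 0
    n_unfold = 0
    time_folded = 0.0
    time_unfolded = 0.0
    for r in rmsd_series:
        if r < folded_cut:
            new = "F"
        elif r > unfolded_cut:
            new = "U"
        else:
            new = state
        if state == "U" and new == "F":
            n_fold += 1
        elif state == "F" and new == "U":
            n_unfold += 1
        state = new
        if state == "F":
            time_folded += dt_ps
        elif state == "U":
            time_unfolded += dt_ps
    return {
        "n_fold": n_fold,
        "n_unfold": n_unfold,
        # Apparent rates in 1/ps; None when the source state was never visited.
        "k_f": n_fold / time_unfolded if time_unfolded > 0 else None,
        "k_u": n_unfold / time_folded if time_folded > 0 else None,
    }
```

For replica exchange data, the same bookkeeping would be applied along each replica's continuous trajectory, with the temperature at the moment of each event recorded so that events can be binned by T.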
Apparent folding rates: cluster membership
  Transitions to/from cluster #1
  Order of magnitude difference in rates
  Different T-dependence
Apparent folding rates: nonequilibrium data
  Successive block averages of same data set
  Started folded, parm99sb
  Systematic ~2x change in unfolding rate
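Successive block averaging, as used above to expose drift in the unfolding rate, can be sketched as follows. The function name and the choice to drop any trailing remainder are assumptions; the input can be any per-interval estimate, such as a rate computed over short windows.

```python
def block_averages(values, n_blocks):
    """Split a time-ordered series into n_blocks equal blocks and average each.

    Trailing values that do not fill a complete block are dropped.
    """
    if n_blocks < 1 or n_blocks > len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    size = len(values) // n_blocks
    return [
        sum(values[b * size:(b + 1) * size]) / size
        for b in range(n_blocks)
    ]
```

A systematic trend across successive blocks, rather than scatter around a common mean, indicates that the early part of the data has not yet relaxed from its initial conditions.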
Conclusions
  T-REMD is useful but not a panacea
    No increase in aggregate # of transitions
    Many interesting barriers entropic rather than energetic
    No T where sampling is infinitely fast
  Explicit solvent REMD of proteins has limitations
    Decoupling of degrees of freedom of interest (protein) from extended variables (T, U, etc.)
    Large N implies small ΔT, limiting replica motion in T
    Sampling limited by intra-replica correlation time
    Folding not a simple activated process
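The link between system size and temperature spacing in the conclusions can be made explicit with a standard estimate. This is a textbook-style sketch and not a derivation taken from the slides; C_V (heat capacity), \sigma_U (width of the potential energy distribution), and \Delta T (spacing between adjacent replicas) are symbols introduced here.

```latex
% Energy fluctuations at temperature T and the shift in mean energy
% between adjacent replicas:
\sigma_U^2 = k_B T^2 C_V,
\qquad
\langle U \rangle_{T+\Delta T} - \langle U \rangle_{T} \approx C_V \,\Delta T

% Exchanges are accepted at an appreciable rate only if the energy
% distributions overlap, i.e. the shift is no larger than the width:
C_V \,\Delta T \lesssim \sqrt{k_B T^2 C_V}
\quad\Longrightarrow\quad
\frac{\Delta T}{T} \lesssim \sqrt{\frac{k_B}{C_V}} \;\propto\; N^{-1/2}
```

Since C_V grows in proportion to the number of atoms N, and explicit solvent dominates N, the usable spacing shrinks as N^{-1/2} and the number of replicas needed to span a fixed temperature range grows as N^{1/2}.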
Acknowledgements
  William Swope, Hans Horn, Julia Rice (ARC)
  Robert Germain (YKT), Blue Gene Science & Application Team
  Martin Gruebele & Wei Yang (UIUC)
  Vijay Pande, Nina Singhal, Michael Shirts (Stanford)
  John Chodera & Ken Dill (UCSF)
Gordon Research Conference in Computational Chemistry
  July 27 - Aug 1, 2008, Mount Holyoke, MA
  Chair: Dr. Jed W. Pitera, IBM Research
  Vice Chair: Prof. Dr. Walter Thiel, MPI-Kohlenforschung
  Force fields, electronic structure, quantum dynamics, chemical reactions, drug design, docking, coarse-grained simulation
  http://www.grc.org
  [Figure labels: accuracy; time and space]