Slide 1: Protein Folding
Eugene Shakhnovich
Department of Chemistry and Chemical Biology, Harvard University

Slide 2: Proteins are folded on various scales
As of now we know hundreds of thousands of sequences (Swiss-Prot) and a few thousand structures (Protein Data Bank).

Slide 3: Proteins are tightly packed

The screen versions of these slides have full details of copyright and acknowledgements.

Slide 4: Proteins can fold in vivo and in vitro (Anfinsen 59): the protein folding problem

Slide 5: Protein physical properties
[Figure labels: LDAPSQIEVKDVTDTTLAI..., ~1 ms, 1 s, ~10 Å]
1. Protein sequence uniquely defines the protein's native (ground state) structure
2. Native state is thermodynamically stable
3. Native state is kinetically accessible: reachable in a biologically reasonable time

Slide 6: Calorimetry: an important experimental test of cooperativity
Calorimetric study of lysozyme heat denaturation at various pH values.
The position of the heat capacity (Cp) peak determines the transition temperature T0, the peak width gives the transition width ΔT, and the area under the peak determines the heat ΔH absorbed per gram of protein.
The values of ΔT, ΔH, the protein's molecular weight, and T0 satisfy the van 't Hoff equation, indicating that denaturation occurs as an all-or-none (first-order) transition.
The increased heat capacity of the denatured protein (ΔCp) originates from the enlarged interface between its hydrophobic groups and water after denaturation.
Adapted from P.L. Privalov & N.N. Khechinashvili, J. Mol. Biol. (1974) 86: 665-684
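
The van 't Hoff test on the calorimetry slide can be illustrated with a minimal sketch. It assumes an ideal two-state model with a temperature-independent ΔH and no baseline shift (ΔCp is ignored), works per mole rather than per gram, and uses hypothetical parameter values, not the lysozyme data. For an all-or-none transition the enthalpy inferred from the peak height should match the area under the peak.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def excess_cp(T, dH, T0):
    """Two-state excess heat capacity in J/(mol K); baseline shift ignored."""
    K = math.exp(-dH / R * (1.0 / T - 1.0 / T0))  # unfolding equilibrium constant
    f = K / (1.0 + K)                              # fraction unfolded
    return dH ** 2 / (R * T ** 2) * f * (1.0 - f)

def van_t_hoff_ratio(temps, cps):
    """Ratio of van 't Hoff to calorimetric enthalpy (about 1 if all-or-none)."""
    # Calorimetric enthalpy: area under the peak, trapezoid rule.
    dH_cal = sum(0.5 * (cps[i] + cps[i + 1]) * (temps[i + 1] - temps[i])
                 for i in range(len(temps) - 1))
    # van 't Hoff enthalpy from the peak height: Cp_max = dH^2 / (4 R T0^2).
    i_max = max(range(len(cps)), key=cps.__getitem__)
    dH_vh = 2.0 * temps[i_max] * math.sqrt(R * cps[i_max])
    return dH_vh / dH_cal

# Hypothetical parameters chosen only for illustration.
dH_true, T0_true = 500e3, 340.0                   # J/mol, K
temps = [280.0 + 0.01 * i for i in range(12001)]  # 280 K to 400 K
cps = [excess_cp(T, dH_true, T0_true) for T in temps]
ratio = van_t_hoff_ratio(temps, cps)
```

A ratio well below 1 in real data would suggest populated intermediates, which is why the slide treats agreement as evidence of cooperativity.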

Slide 7: Small proteins are cooperative two-state systems
Free energy: F = E - TS
Folded: low energy E
Unfolded: high entropy S
[Figure label: Transition state]

Slide 8: Theoretical analysis identified a single thermodynamic parameter, the energy gap, as a universal predictor of folding thermodynamics and kinetics
Major insights from theoretical studies:
It was found that only evolutionarily selected sequences that have a large energy gap can fold cooperatively.
In kinetics, theory and simulations identified nucleation as a major kinetic event in folding, consistent with the first-order, cooperative character of its thermodynamics.
Folding nuclei were found and characterized.
Review: Chemical Reviews, 106, pp. 1559-88 (2006)

Slide 9: Why is the energy gap important?
Random and evolutionarily selected sequences
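
The two-state picture F = E - TS can be made concrete with a small sketch. It assumes exactly two states, reduced units with k_B = 1, and hypothetical values for the energy and entropy differences. The folded and unfolded free energies cross at T = ΔE/ΔS, where the two populations are equal.

```python
import math

def folded_population(T, dE, dS, kB=1.0):
    """Folded-state population in a two-state model built from F = E - TS.

    dE = E_unfolded - E_folded (> 0), dS = S_unfolded - S_folded (> 0).
    """
    dF = dE - T * dS  # F_unfolded - F_folded
    return 1.0 / (1.0 + math.exp(-dF / (kB * T)))

def midpoint_temperature(dE, dS):
    """Temperature at which folded and unfolded free energies are equal."""
    return dE / dS

# Hypothetical values in reduced units.
dE, dS = 10.0, 20.0
T_mid = midpoint_temperature(dE, dS)
```

With these values the population switches from mostly folded to mostly unfolded over a narrow temperature range around `T_mid`, which is the qualitative behavior the slide attributes to small proteins.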

Slide 10: Large GAP design
Test of protein folding theory: the importance of the energy gap
[Figure axis labels: Energy, Q, Monte Carlo steps]

Slide 11: Finding the folding nucleus in simulations
[Figure axis labels: Q, free energy, MC steps (1x10^6 to 4x10^6)]
Abkevich et al., Biochemistry 33, 10026-10036 (1994)
Shakhnovich et al., Nature 379, 96-98 (1996)

Slide 12: Protein engineering: Φ-value analysis
Method: engineer a protein with an altered amino acid at a target position and test to what extent the transition state is affected compared to the native state.
Φ = 1: residue is kinetically important
Φ = 0: residue is kinetically unimportant
[Figure: free-energy diagram for mutant and wild type; labels: G_U (unfolded states), G_T (transition states), G_N (native state)]
Fersht, Curr. Opin. Struct. Biol. 7, 3-9 (1997)
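
A Φ-value can be computed from measured folding rates and stabilities. The sketch below uses the standard definition, Φ = ΔΔG(transition state - unfolded) / ΔΔG(native - unfolded), which is not written out on the slide. The function names, the cutoff `min_ddg`, and the numbers in the usage lines are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol K)

def ddg_ts_from_rates(kf_wt, kf_mut, T=298.15):
    """Change in folding barrier on mutation, kJ/mol, from folding rates."""
    return R_KJ * T * math.log(kf_wt / kf_mut)

def phi_value(ddg_ts, ddg_native, min_ddg=1.0):
    """Phi = ddG(TS - U) / ddG(N - U).

    Raises ValueError when the stability change is too small for the ratio
    to be meaningful (min_ddg is an assumed cutoff, in kJ/mol).
    """
    if abs(ddg_native) < min_ddg:
        raise ValueError("stability change too small for a reliable phi")
    return ddg_ts / ddg_native

# Hypothetical mutant: folding slowed tenfold, native state destabilized
# by the same free energy, so the residue is fully structured in the TS.
ddg_ts = ddg_ts_from_rates(kf_wt=100.0, kf_mut=10.0)
phi_full = phi_value(ddg_ts, ddg_ts)
# Hypothetical mutant with unchanged folding rate but destabilized native state.
phi_none = phi_value(0.0, 5.0)
```

Values between 0 and 1 are usually read as partial structure at that position in the transition state, although that interpretation depends on the mutation being non-disruptive.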

Slide 13: Folding nucleus in SH3 domains

Slide 14: Evolutionary control of folding rates and stability
The idea: nucleus residues may determine the folding rate.
Therefore, if evolution cared about folding kinetics, it could have exerted extra pressure on nucleus residues.
Nucleus residues can be found from the analysis of conservation in sequences of structurally aligned proteins (Mirny and EIS, J. Mol. Biol., 299, p. 177 (1999)).

Slide 15: Evolutionary analysis correctly predicts the folding nucleus in Ig-fold proteins
Prediction: Mirny and EIS, JMB, 1999
Experiment: J. Clarke and coauthors (2001)

Slide 16: An all-atom Monte Carlo folding simulation
[Figure labels: Unfolded (random coil); Folded (native state)]

Slide 17: Protein G: folding of a small protein in all-atom detail
Go model
Color key: black = first beta-hairpin; red = alpha helix; green = second beta-hairpin
J. Shimada and EIS, PNAS, 99, p. 11175 (2002)

Slide 18: Protein G folding pathways: summary
[Figure: pathway diagram from Unfolded to Native; node labels: Helix-hairpin 1 (accumulates); Helix-hairpin 2 (does not accumulate); β1-β4 sheet (does not accumulate); Helix-β1; Helix-β2; Helix-β1 or β2]
Green circle/box means native-like structure.

Slide 19: What about the transition state ensemble (TSE) in protein G?
A protocol using Pfold identifies conformations that are committed to fold very fast, downhill, in less than 10^7 steps.

Slide 20: A structure belonging to the transition state ensemble
Color key: green = important in WT; red = important also in mutant

Slide 21: How to fold a protein?
From sequence to structure (i.e. non-Go)
All atom
Low RMSD
(wishlist)
Approach: all-atom statistical potentials (2-body + hydrogen bonds)
Kussell, ES, PNAS 02; Hubner, Deeds, ES, PNAS 05
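
The Pfold idea can be sketched on a toy model. Pfold for a conformation is the fraction of short trajectories started from it that reach the folded state before the unfolded state. The example replaces the all-atom simulation with an unbiased one-dimensional random walk, which is purely an assumption for illustration; the function name, the step budget, and the trial count are hypothetical.

```python
import random

def estimate_pfold(start, n_sites=10, trials=2000, max_steps=10_000, seed=0):
    """Estimate the commitment probability on a toy 1D random walk.

    Position 0 stands for the unfolded state and n_sites for the folded
    state. Trials that reach neither within max_steps count as not folded.
    """
    rng = random.Random(seed)
    folded = 0
    for _ in range(trials):
        x = start
        for _ in range(max_steps):
            if x <= 0:           # reached the unfolded boundary first
                break
            if x >= n_sites:     # reached the folded boundary first
                folded += 1
                break
            x += rng.choice((-1, 1))
    return folded / trials

p_mid = estimate_pfold(5)
```

Conformations with Pfold near 0.5 are the usual operational definition of the transition state ensemble; in this toy walk that is the midpoint, since the exact answer is `start / n_sites`.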

Slide 22: The potential
Energy function: E_tot = E_contact + a E_h-bond
Hydrogen bonding potential working for α proteins

Slide 23: Contact term: µ-potential
Considers only side-chain to side-chain interactions; 79 different atom types in total.
A_i: atom type of atom i (79 different types)
N_AB: number of contacts between A and B in the database
Ñ_AB: number of A-B pairs not in contact
µ: chosen to make the net interaction zero
[Figure labels: E_AB, d_ij/(r_i + r_j), E_ij, 0.75, 1.35, In contact]
E. Kussell, PNAS 02; Hubner, Deeds, ES, PNAS 05

Slide 24: Methods
4000 folding runs from a fully unfolded chain, at constant T
Graph analysis of massive data
Clustering in multiple order parameters: a multidimensional, comprehensive view
Analysis of the transition state ensemble
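
The contact term can be sketched in code. The formula itself did not survive on the slide, so the functional form below, E_AB = (-µ N_AB + (1 - µ) Ñ_AB) / (µ N_AB + (1 - µ) Ñ_AB), is an assumption based on the cited Kussell and Hubner papers. Reading "net interaction zero" as an unweighted mean of zero over all type pairs is a further assumption, and the counts are invented.

```python
def mu_energy(mu, n_contact, n_no_contact):
    """Assumed mu-potential form: ranges from +1 (mu = 0) to -1 (mu = 1)."""
    num = -mu * n_contact + (1.0 - mu) * n_no_contact
    den = mu * n_contact + (1.0 - mu) * n_no_contact
    return num / den

def solve_mu(counts, iterations=60):
    """Bisect for the mu at which the mean pair energy is zero.

    counts maps an atom-type pair to (contacts, non-contacts), both > 0.
    The energy decreases monotonically with mu, so bisection converges.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        mean_e = sum(mu_energy(mid, n, m) for n, m in counts.values()) / len(counts)
        if mean_e > 0.0:
            lo = mid   # net repulsive: increase mu
        else:
            hi = mid   # net attractive or zero: decrease mu
    return 0.5 * (lo + hi)

# Invented counts for two hypothetical atom-type pairs.
counts = {("C", "C"): (30, 10), ("C", "O"): (10, 30)}
mu = solve_mu(counts)
energies = {pair: mu_energy(mu, n, m) for pair, (n, m) in counts.items()}
```

Pairs seen in contact more often than not come out attractive (negative) and the rest repulsive, which is the behavior one would expect from a statistical contact potential of this kind.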

Slide 25: Folding at physiological T, ~25 °C

Slide 26: Identifying the native state
The lowest-energy prediction is at 2.44 Å RMSD (best of 4000).
Of 4000 trajectories, 44 sampled the 2 Å range, 523 the 3 Å range, 1685 the 4 Å range, 2700 the 5 Å range, and 3331 the 6 Å range.
This is consistent with the usual exponential distribution of first-passage times (FPT).

Slide 27: A network ensemble view of folding
Construct a structural graph by clustering conformations observed in all trajectories.
This allows multiple trajectories to be combined.
Multidimensional view: cluster conformations based on various properties: RMSD, Rg, drms.
This allows the folding mechanism to be fully characterized, whereas any single order parameter may be misleading.
How to introduce ensemble kinetics into the graph description: the idea of flux!
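
The flux idea can be sketched as follows. Each trajectory is reduced to the sequence of clusters it visits, transitions are pooled over all runs, and the net flux between two clusters is the difference between forward and backward counts. The slides do not give the clustering procedure or an exact flux definition, so the cluster labels, function names, and trajectories here are hypothetical.

```python
from collections import Counter

def transition_counts(trajectories):
    """Pool directed transitions between clusters over all trajectories."""
    counts = Counter()
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            if a != b:  # ignore steps that stay within one cluster
                counts[(a, b)] += 1
    return counts

def net_flux(counts, a, b):
    """Net number of a -> b transitions (forward minus backward)."""
    return counts[(a, b)] - counts[(b, a)]

# Hypothetical runs over three clusters: U (unfolded), I (intermediate), N (native).
runs = [
    ["U", "I", "N"],
    ["U", "I", "U", "I", "N"],
    ["U", "U", "N"],
]
counts = transition_counts(runs)
flux_ui = net_flux(counts, "U", "I")
flux_in = net_flux(counts, "I", "N")
```

Edges with large net flux toward the native cluster mark the dominant routes through the graph, which is one way to combine many runs into a single folding scenario.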

Slide 28: Example: RMSD graph
Each node represents a protein conformation, colored by RMSD to the native state: from blue (closest) to red (most distant).

Slide 29: Flux: putting all runs together

Slide 30: Folding scenario: summary

Slide 31: Atomistically resolved structural intermediate
Conformations of the drms intermediate cluster perfectly with NMR-derived structures of the L16A model of the intermediate (25 structures, 1UZC, A. Fersht and coworkers).
The average RMSD between the drms cluster and the 25 1UZC structures is 4.6 Å, with some conformations as low as 1.5 Å.

Slide 32: Conclusions
Successful ab initio ensemble folding of a small alpha-helical domain at constant physiological T < Tm
Ensemble folding pathway at atomic detail
Various earlier proposed mechanisms at work: collapse, framework intermediate, nucleation (late) in a fully resolved pathway
A graph is useful to conceptualize and organize protein structural space