Dominant Paths in Protein Folding

Henri Orland, SPhT, CEA-Saclay, France. Work in collaboration with P. Faccioli, F. Pederiva, and M. Sega (University of Trento). Annecy meeting 2006.

Outline
- Basic notions on proteins
- Langevin dynamics
- Dominant paths
- Example: Villin
- The folding path problem
- Path integral representation
- Hamilton-Jacobi representation

1. What is a Protein?

Biological polymers (biopolymers): proteins, nucleic acids (DNA and RNA), polysaccharides.
- catalytic activity: enzymes
- transport of ions: hemoglobin (O2), ion channels
- motor proteins
- shells of viruses (influenza, HIV, etc.)
- prions
- food, etc.

Proteins have an active site: biological activity.

Polymers built with amino acids:
- 20 types of amino acids
- all left-handed
- Ala, Ile, Leu, Met, Phe, Pro, Trp, Val, Asn, Cys, Gln, Gly, Ser, Thr, Tyr, Arg, His, Lys, Asp, Glu
- number of monomers: between about 10 and 500

[Figure: chemical structure of an amino acid, H2N-CHR-COOH, where R is the residue.]

Among the 20 amino acids:
- 12 hydrophilic (polar): 8 uncharged, 4 charged
- 8 hydrophobic (non-polar)

A typical protein contains both polar and hydrophobic residues.

Examples of residues (side chain R): glycine (R = H), alanine (R = CH3), phenylalanine (R = CH2-C6H5).

Polymerisation (polycondensation):

NH2-CHR1-COOH + NH2-CHR2-COOH + ... → NH2-CHR1-(CO-NH)-CHR2-(CO-NH)-CHR3-(CO-NH)-... + H2O

Each (CO-NH) link is a peptide bond. The result is a weakly branched polymer.

- Hard degrees of freedom: covalent bonds, valence angles, peptide bonds, improper dihedrals.
- Soft degrees of freedom: torsion angles (φ, ψ, χ), with very small energies.

Proteins exist in two states:
- Denatured = unfolded: random coil (swollen) or molten globule (compact). No biological activity.
- Native = folded = unique compact structure. Biologically active.

The number of compact structures of a polymer grows exponentially with the number of monomers N.

Puzzle: below the folding transition temperature, the protein seems to exist in a unique conformation (zero conformational entropy).

Folding transition: depends on temperature, pH, denaturing agent, salt, etc.

Time scales:
- Microscopic time: 10^-15 s
- Folding time: 10^-2 to 1 s

Tertiary structure: the 3D structure of the folded protein, i.e. the compact packing of secondary structures.

HIV protease (199 residues)

The Chemist's Approach
1. Look for effective atom-atom interactions: a semi-empirical Hamiltonian.
2. Run molecular dynamics or Monte Carlo.

What interactions are present?
- Bonded: covalent bonds, sulfur bridges (cysteines)
- Non-bonded: Coulomb (with partial charges), Van der Waals (steric repulsion), hydrogen bonds (intra-molecular or with the solvent)

The solvent is polar (water) and induces hydrophobic interactions, which might be responsible for the collapse transition.

Energy Scales
- 1 eV = 23 kcal/mole ≈ 10000 K
- 300 K = 0.6 kcal/mole
- Covalent bond: 50-150 kcal/mole
- Sulfur bridge: 51 kcal/mole
- Hydrogen bonds: 5-8 kcal/mole (non-polar solvent), 1-2 kcal/mole (polar solvent)
- Van der Waals: 1 kcal/mole
- Coulomb: 1-2 kcal/mole

Denaturation temperature ~ 1 kcal/mole.

The chemical sequence is frozen, and only non-covalent interactions drive the folding.
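The conversions quoted on this slide can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and the constants are standard values of the gas constant and the Boltzmann constant, not numbers taken from the talk.

```python
# Unit conversions between temperature, kcal/mole and eV.
R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)
KB_EV = 8.617333e-5    # Boltzmann constant in eV/K

def thermal_energy_kcal_per_mol(temperature_k):
    """Thermal energy k_B*T expressed in kcal/mole."""
    return R_KCAL * temperature_k

def ev_to_kcal_per_mol(energy_ev):
    """Convert an energy per particle in eV to kcal/mole."""
    return energy_ev * R_KCAL / KB_EV

kT_300 = thermal_energy_kcal_per_mol(300.0)   # close to 0.6 kcal/mole
one_ev = ev_to_kcal_per_mol(1.0)              # close to 23 kcal/mole
# A 5 kcal/mole hydrogen bond expressed in units of k_B*T at 300 K:
hbond_in_kT = 5.0 / kT_300
print(kT_300, one_ev, hbond_in_kT)
```

Expressing each interaction in units of k_B T at 300 K makes it clear which ones can be broken by thermal fluctuations and which ones are effectively frozen.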

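Parametrization (CHARMM, AMBER, OPLS, ...)

$$E = \sum_{\text{bonds}} k_b (b - b_0)^2 + \sum_{\text{valence angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}} k_\phi \left(1 + \cos(n\phi - \delta)\right) + \sum_{\text{impropers}} k_\omega (\omega - \omega_0)^2 + \sum_{i<j} 4\epsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] + \sum_{i<j} \frac{332\, q_i q_j}{\epsilon\, r_{ij}}$$

Use Newton or Langevin dynamics:

$$m_i \ddot{r}_i + \gamma_i \dot{r}_i + \frac{\partial E}{\partial r_i} = \zeta_i(t)$$

where $\zeta_i(t)$ is a Gaussian noise satisfying the fluctuation-dissipation theorem:

$$\langle \zeta_i(t)\, \zeta_j(t') \rangle = 2 \gamma_i k_B T\, \delta_{ij}\, \delta(t - t')$$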
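As a minimal sketch of the non-bonded part of such a force field, the Python snippet below evaluates a 12-6 Lennard-Jones term and a Coulomb term for one pair of atoms. The function names, the parameter values, and the choice of units (kcal/mole, Angstrom, elementary charges, relative dielectric constant `eps_r`) are assumptions made for illustration; they are not taken from CHARMM, AMBER, or OPLS.

```python
COULOMB_PREFACTOR = 332.0  # kcal*Angstrom/(mol*e^2), the factor 332 quoted on the slide

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: 4*epsilon*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q_i, q_j, eps_r=1.0):
    """Coulomb energy between partial charges q_i and q_j (in units of e)."""
    return COULOMB_PREFACTOR * q_i * q_j / (eps_r * r)

def pair_energy(r, epsilon, sigma, q_i, q_j, eps_r=1.0):
    """Total non-bonded energy of one atom pair at distance r."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q_i, q_j, eps_r)

# Hypothetical parameters: epsilon = 0.1 kcal/mole, sigma = 3.5 Angstrom.
r_min = 2.0 ** (1.0 / 6.0) * 3.5          # location of the Lennard-Jones minimum
e_min = lennard_jones(r_min, 0.1, 3.5)    # equals -epsilon at the minimum
print(r_min, e_min, pair_energy(4.0, 0.1, 3.5, 0.3, -0.3))
```

The Lennard-Jones minimum sits at r = 2^(1/6) sigma with depth epsilon, which is a quick sanity check on any implementation of this term.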

Then, it is well known that

$$P(\{r_i\}, t) \to \exp\left(-\frac{E(\{r_i\})}{k_B T}\right) \quad \text{as } t \to +\infty$$

To discretize, one must use Δt ~ 10^-15 to 10^-13 s.
Number of degrees of freedom: N ~ 1000.
Longest available runs (with water): t ~ 10^-8 s.

We see that t << folding time. Reason: the system is trapped in an exponential number of metastable states.
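A back-of-the-envelope count, using only the orders of magnitude quoted on the slides (time step 10^-15 s, longest runs 10^-8 s, folding times 10^-2 to 1 s), shows the size of the gap. The variable names are illustrative.

```python
dt = 1e-15                     # integration time step in seconds
longest_run = 1e-8             # longest available simulation in seconds
folding_times = (1e-2, 1.0)    # range of folding times in seconds

steps_available = longest_run / dt
steps_needed = [t / dt for t in folding_times]
gap = [t / longest_run for t in folding_times]

print("steps in the longest run:", steps_available)
print("steps needed to reach folding:", steps_needed)
print("missing factor in simulated time:", gap)
```

With these numbers, direct simulation falls short of the folding time by six to eight orders of magnitude.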

The protein folding problem is too complicated.

Simpler problem: how do proteins go from the unfolded state to the native state?

Denaturation curves

[Figure: fraction of native protein (0 to 1) versus denaturant concentration (0 to 10).]

In given denaturant conditions, a fraction of the proteins are native, and the rest are denatured.

This means that in given denaturant conditions, a protein spends a fraction of its time in the native state and a fraction of its time in a denatured state.

The Folding Pathway Problem

The problem: assume a protein can go from state A to state B.
- Which pathway (or family of pathways) does the protein take?
- Is there a transition state ensemble?

Examples:
- from denatured to native in native conditions
- allosteric transition between A and B

Langevin dynamics

The case of one particle in a potential U(x) at temperature T. Use Langevin dynamics:

$$m \frac{d^2 x}{dt^2} + \gamma \frac{dx}{dt} + \frac{\partial U}{\partial x} = \zeta(t)$$

where γ is the friction and ζ(t) is a random noise.

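Overdamped Langevin dynamics

At large enough time scales, the mass term is negligible:

$$m\omega^2 \ll \gamma\omega \quad \Longleftrightarrow \quad \tau \gg \frac{2\pi m}{\gamma}, \qquad \gamma = \frac{k_B T}{D}$$

With D = 10^-5 cm^2/s and m ≈ 5·10^-26 kg, this gives τ ≈ 10^-13 s.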
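The estimate τ ≈ 10^-13 s can be reproduced from the Einstein relation γ = k_B T / D. The sketch below assumes T = 300 K, which this slide does not state explicitly; the function name is illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def inertial_time(mass_kg, diffusion_m2_per_s, temperature_k=300.0):
    """Time scale 2*pi*m/gamma below which the mass term matters."""
    gamma = K_B * temperature_k / diffusion_m2_per_s  # friction in kg/s
    return 2.0 * math.pi * mass_kg / gamma

D_SI = 1e-5 * 1e-4              # 1e-5 cm^2/s converted to m^2/s
tau = inertial_time(5e-26, D_SI)
print(tau)                      # of order 1e-13 s
```

For time scales much longer than this value, the inertial term can be dropped and the dynamics becomes first order in time.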

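Take overdamped Langevin (Brownian) dynamics:

$$\frac{\partial x}{\partial t} = -\frac{D}{k_B T} \frac{\partial U}{\partial x} + \eta(t)$$

where η(t) is a Gaussian noise with zero average and

$$\langle \eta(t)\, \eta(t') \rangle = 2D\, \delta(t - t')$$

D is the diffusion constant of the particle.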
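A minimal sketch of how this equation can be integrated numerically is the Euler-Maruyama scheme, x(t + dt) = x(t) - (D / k_B T) U'(x) dt + sqrt(2 D dt) ξ, with ξ a standard Gaussian random number. The scheme, the harmonic test potential U(x) = k x^2 / 2, and all parameter values below are choices made for illustration, not taken from the talk. For the harmonic potential the stationary variance should approach k_B T / k.

```python
import math
import random

def simulate_overdamped(grad_u, x0, diffusion, kT, dt, n_steps, seed=0):
    """Euler-Maruyama integration of dx/dt = -(D/kT) U'(x) + eta(t)."""
    rng = random.Random(seed)
    noise_amplitude = math.sqrt(2.0 * diffusion * dt)  # from <eta eta> = 2 D delta
    x = x0
    trajectory = []
    for _ in range(n_steps):
        drift = -(diffusion / kT) * grad_u(x) * dt
        x += drift + noise_amplitude * rng.gauss(0.0, 1.0)
        trajectory.append(x)
    return trajectory

K_SPRING = 1.0  # hypothetical spring constant of the test potential
trajectory = simulate_overdamped(lambda x: K_SPRING * x, x0=0.0,
                                 diffusion=1.0, kT=1.0, dt=0.01, n_steps=200_000)
samples = trajectory[10_000:]   # discard the initial transient
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, variance)           # variance should be close to kT / K_SPRING
```

Any other potential can be used by passing its derivative as `grad_u`; the harmonic case is used here only because its stationary distribution is known in closed form.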

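The equation of motion is a stochastic equation. The probability to find the particle at point x at time t is given by a Fokker-Planck equation:

$$\frac{\partial}{\partial t} P(x,t) = D \frac{\partial}{\partial x} \left( \frac{1}{k_B T} \frac{\partial U(x)}{\partial x} P(x,t) \right) + D \frac{\partial^2}{\partial x^2} P(x,t)$$

with the initial condition $P(x, 0) = \delta(x - x_i)$.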

The Fokker-Planck equation looks very much like a Schrödinger equation, except for the first-order derivative. Define

$$P(x,t) = e^{-\beta U(x)/2}\, Q(x,t), \qquad \beta = \frac{1}{k_B T}$$

The function Q(x, t) satisfies a Schrödinger equation, $\partial Q / \partial t = -H Q$, with a Hamiltonian H.

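Using the notations of quantum mechanics:

$$P(x_f, t_f | x_i, t_i) = e^{-\frac{U(x_f) - U(x_i)}{2 k_B T}} \langle x_f | e^{-(t_f - t_i) H} | x_i \rangle$$

where H is a quantum Hamiltonian given by

$$H = D \left( -\frac{\partial^2}{\partial x^2} + \frac{1}{4} \left( \beta \frac{\partial U(x)}{\partial x} \right)^2 - \frac{\beta}{2} \frac{\partial^2 U(x)}{\partial x^2} \right)$$

Spectral decomposition:

$$\langle x_f | e^{-(t_f - t_i) H} | x_i \rangle = \sum_\alpha e^{-(t_f - t_i) E_\alpha}\, \Psi_\alpha(x_f)\, \Psi_\alpha(x_i)$$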

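At large time, the matrix element is dominated by the ground state

$$\Psi_0(x) = \frac{e^{-\beta U(x)/2}}{\sqrt{Z}}, \qquad Z = \int dx\, e^{-\beta U(x)}, \qquad H \Psi_0 = 0$$

so that

$$P(x_f, t_f | x_i, t_i) \simeq \frac{e^{-\beta U(x_f)}}{Z} + e^{-\beta \frac{U(x_f) - U(x_i)}{2}}\, e^{-(t_f - t_i) E_1}\, \Psi_1(x_f)\, \Psi_1(x_i)$$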
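The statement H Ψ_0 = 0 can be checked numerically. The sketch below assumes the Hamiltonian obtained by substituting P = exp(-βU/2) Q into the Fokker-Planck equation, H = D(-d²/dx² + (βU')²/4 - βU''/2), and applies it to the unnormalized ground state with a central finite difference. The test potential is the double well U(x) = x²(5(x-1)² - 0.5) used as an example in the talk; the values of D and β, the step h, and the sample points are arbitrary illustrative choices.

```python
import math

D = 1.0      # diffusion constant (illustrative value)
BETA = 1.0   # inverse temperature 1/(k_B T) (illustrative value)

def u(x):
    return x * x * (5.0 * (x - 1.0) ** 2 - 0.5)   # = 5x^4 - 10x^3 + 4.5x^2

def du(x):
    return 20.0 * x ** 3 - 30.0 * x ** 2 + 9.0 * x

def d2u(x):
    return 60.0 * x ** 2 - 60.0 * x + 9.0

def psi0(x):
    """Unnormalized ground state exp(-beta U / 2)."""
    return math.exp(-BETA * u(x) / 2.0)

def apply_h(psi, x, h=1e-4):
    """(H psi)(x) with the second derivative taken by central differences."""
    laplacian = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / h ** 2
    potential = D * ((BETA * du(x)) ** 2 / 4.0 - BETA * d2u(x) / 2.0)
    return -D * laplacian + potential * psi(x)

points = (-0.2, 0.0, 0.4, 0.8, 1.1)
residuals = [abs(apply_h(psi0, x) / psi0(x)) for x in points]
print(residuals)   # all close to zero, up to finite-difference error
```

The residuals vanish up to discretization error, which confirms that the Boltzmann weight is the zero-energy eigenstate of this Hamiltonian.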

Stationary distribution: the Boltzmann distribution

$$\lim_{t \to +\infty} P(x, t) = P(x) \propto \exp\left(-U(x)/k_B T\right)$$

General form: path integral

$$P(x_f, t_f | x_i, t_i) = e^{-\frac{U(x_f) - U(x_i)}{2 k_B T}} \int_{x_i}^{x_f} \mathcal{D}x(\tau)\, e^{-S_{eff}[x]/2D}$$

with the boundary conditions $x(t_i) = x_i$ and $x(t_f) = x_f$.

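The effective action is given by

$$S_{eff}[x] = \int_{t_i}^{t_f} d\tau \left( \frac{\dot{x}^2(\tau)}{2} + V_{eff}[x(\tau)] \right)$$

and the effective potential is given by

$$V_{eff}(x) = \frac{D^2}{2} \left( \frac{1}{k_B T} \frac{\partial U(x)}{\partial x} \right)^2 - \frac{D^2}{k_B T} \frac{\partial^2 U(x)}{\partial x^2}$$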

Example: $U(x) = x^2 \left( 5 (x - 1)^2 - 0.5 \right)$, with $V_{eff}(x) = U'(x)^2/2 - T\, U''(x)$.

[Figure: U(x) for x between -0.5 and 2, and V_eff(x) at T = 0 and T = 0.5. N marks the native state.]
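The example can be reproduced with a short Python sketch. It uses the potential U(x) = x²(5(x-1)² - 0.5) and the form V_eff(x) = U'(x)²/2 - T U''(x) shown on the slide; the function names and the choice of evaluation points are illustrative.

```python
import math

def u(x):
    return x * x * (5.0 * (x - 1.0) ** 2 - 0.5)   # = 5x^4 - 10x^3 + 4.5x^2

def du(x):
    return 20.0 * x ** 3 - 30.0 * x ** 2 + 9.0 * x

def d2u(x):
    return 60.0 * x ** 2 - 60.0 * x + 9.0

def v_eff(x, temperature):
    """Effective potential U'(x)^2 / 2 - T U''(x)."""
    return 0.5 * du(x) ** 2 - temperature * d2u(x)

# Stationary points of U: x = 0 and the two roots of 20x^2 - 30x + 9 = 0.
disc = math.sqrt(30.0 ** 2 - 4.0 * 20.0 * 9.0)
stationary = [0.0, (30.0 - disc) / 40.0, (30.0 + disc) / 40.0]

for temperature in (0.0, 0.5):
    values = [v_eff(x, temperature) for x in stationary]
    print("T =", temperature, "V_eff at stationary points of U:", values)
```

At T = 0 the effective potential vanishes at every stationary point of U, minima and maximum alike. At finite T the term -T U''(x) lowers V_eff at the minima of U and raises it at the barrier top.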

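Effective Native States and Transition States

It seems natural to define the native state as the minimum of V_eff(x). Shift due to anharmonicity:

$$x_N(T) \simeq x_N(0) + T\, \frac{U'''_0}{(U''_0)^2}$$

where $U''_0$ and $U'''_0$ are the second and third derivatives of U at $x_N(0)$.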
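The first-order shift can be compared with a direct numerical minimization. The sketch below assumes the example potential U(x) = x²(5(x-1)² - 0.5) and V_eff = U'²/2 - T U'' from the talk, takes the deeper minimum of U as the native state, and solves V_eff'(x) = U'U'' - T U''' = 0 by bisection. The temperature T = 0.02 and the bracketing interval are illustrative choices.

```python
import math

def du(x):
    return 20.0 * x ** 3 - 30.0 * x ** 2 + 9.0 * x

def d2u(x):
    return 60.0 * x ** 2 - 60.0 * x + 9.0

def d3u(x):
    return 120.0 * x - 60.0

def dv_eff(x, temperature):
    """Derivative of V_eff = U'^2/2 - T U'' with respect to x."""
    return du(x) * d2u(x) - temperature * d3u(x)

def bisect(f, a, b, tol=1e-12):
    """Root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        mid = 0.5 * (a + b)
        if (f(mid) > 0.0) == (fa > 0.0):
            a, fa = mid, f(mid)
        else:
            b = mid
    return 0.5 * (a + b)

T = 0.02
x_n0 = (30.0 + math.sqrt(180.0)) / 40.0          # native minimum of U at T = 0
x_numeric = bisect(lambda x: dv_eff(x, T), x_n0, x_n0 + 0.1)
x_predicted = x_n0 + T * d3u(x_n0) / d2u(x_n0) ** 2
print(x_n0, x_numeric, x_predicted)
```

The two estimates agree up to corrections of second order in T, as expected for a first-order expansion.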

[Figure: V_eff(x) at T = 0.02 for x between -0.2 and 1.2, with the denatured state and the native state marked.]

Dominant trajectories: classical trajectories with the correct boundary conditions,

$$\frac{d^2 x}{dt^2} = \frac{\partial V_{eff}[x]}{\partial x}$$

Problem: one does not know the transition time.

Solution: go from the time-dependent Newtonian dynamics to the energy-dependent Hamilton-Jacobi description.

[Figure: the inverted effective potential -V_eff(x) at T = 0.5 and T = 0.02. N marks the native state.]

The effective energy is

$$E_{eff} = \frac{\dot{x}^2}{2} - V_{eff}(x)$$

The method: minimize the Hamilton-Jacobi action

$$S_{HJ} = \int_{x_i}^{x_f} dl\, \sqrt{2 \left( E_{eff} + V_{eff}[x(l)] \right)}$$

over all paths joining $x_i$ to $x_f$, where dl is an infinitesimal displacement along the path. $E_{eff}$ is a free parameter which determines the total time elapsed during the transition:

$$t_f - t_i = \int_{x_i}^{x_f} \frac{dl}{\sqrt{2 \left( E_{eff} + V_{eff}[x(l)] \right)}}$$

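The effective action is

$$S_{eff}[x] = \int_{t_i}^{t_f} d\tau \left( \frac{\dot{x}^2(\tau)}{2} + V_{eff}[x(\tau)] \right)$$

For classical trajectories, the effective energy

$$E_{eff} = \frac{\dot{x}^2}{2} - V_{eff}(x)$$

is conserved. One obtains

$$S_{eff}[x] = -E_{eff}\, (t_f - t_i) + \int_{x_i}^{x_f} dx\, \sqrt{2 \left( E_{eff} + V_{eff}(x) \right)}$$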
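In one dimension, the transition time and the Hamilton-Jacobi action are simple quadratures once E_eff is chosen. The sketch below evaluates both with the midpoint rule and then forms S_eff = -E_eff (t_f - t_i) + S_HJ. The function name, the quadrature rule, and the test case V_eff(x) = x²/2 with E_eff = 0.5 are assumptions made for illustration; the test case is chosen because both integrals are known in closed form.

```python
import math

def hj_time_and_action(v_eff, e_eff, x_i, x_f, n=2000):
    """Midpoint-rule estimates of t_f - t_i and S_HJ along a 1D path."""
    h = (x_f - x_i) / n
    total_time = 0.0
    s_hj = 0.0
    for k in range(n):
        x = x_i + (k + 0.5) * h
        radicand = 2.0 * (e_eff + v_eff(x))
        if radicand <= 0.0:
            raise ValueError("E_eff + V_eff must stay positive along the path")
        speed = math.sqrt(radicand)      # |dx/dt| from energy conservation
        total_time += abs(h) / speed
        s_hj += abs(h) * speed
    return total_time, s_hj

E_EFF = 0.5
transit_time, s_hj = hj_time_and_action(lambda x: 0.5 * x * x, E_EFF, 0.0, 1.0)
s_eff = -E_EFF * transit_time + s_hj

# Closed forms for this test case.
exact_time = math.asinh(1.0)
exact_s_hj = 0.5 * (math.sqrt(2.0) + math.asinh(1.0))
print(transit_time, exact_time, s_hj, exact_s_hj, s_eff)
```

Increasing E_eff shortens the transition time, which is how the free parameter E_eff replaces the unknown total time in this formulation.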

$E_{eff}$ is not the true energy of the system. If the final state is an equilibrium state, then

$$E_{eff} = -V_{eff}(x_f)$$

The HJ method is much more efficient than Newtonian mechanics because proteins spend most of their time trying to overcome energy barriers. There are no waiting times in HJ: one works with a fixed interval length dl.

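For a protein, minimize

$$S_{HJ} = \sum_{n}^{N-1} \sqrt{2 \left( E_{eff} + V_{eff}(n) \right)}\; \Delta l_{n,n+1} + \lambda P$$

where

$$P = \sum_{i}^{N-1} \left( \Delta l_{i,i+1} - \langle \Delta l \rangle \right)^2$$

and λ is a Lagrange multiplier to fix the interval length. Here

$$V_{eff}(n) = \sum_i \left[ \frac{D^2}{2 (k_B T)^2} \left( \sum_j \nabla_j u(x_i(n), x_j(n)) \right)^2 - \frac{D^2}{k_B T} \sum_j \nabla_j^2 u(x_i(n), x_j(n)) \right]$$

$$(\Delta l)^2_{n,n+1} = \sum_i \left( x_i(n+1) - x_i(n) \right)^2$$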
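The structure of this minimization can be illustrated on a toy problem. The sketch below discretizes a path of a single particle in two dimensions, evaluates the discretized action with the penalty λP, and relaxes the interior points by plain gradient descent with numerical derivatives. The optimizer, the function names, and every parameter value are assumptions made for illustration; the talk does not specify how the minimization is carried out. A flat effective potential is used because the answer is then known: a straight, evenly spaced path.

```python
import math

def hj_action(path, v_eff, e_eff, lam):
    """Discretized S_HJ plus the penalty lam * P on unequal segment lengths."""
    seg = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
    mean = sum(seg) / len(seg)
    s_hj = 0.0
    for point, dl in zip(path, seg):      # V_eff evaluated at point n of segment (n, n+1)
        radicand = 2.0 * (e_eff + v_eff(point))
        if radicand <= 0.0:
            raise ValueError("E_eff + V_eff must stay positive along the path")
        s_hj += math.sqrt(radicand) * dl
    return s_hj + lam * sum((dl - mean) ** 2 for dl in seg)

def minimize_path(path, v_eff, e_eff, lam=1.0, step=0.02, n_iter=1500, eps=1e-6):
    """Gradient descent on the interior points; the end points stay fixed."""
    path = [list(p) for p in path]
    interior = range(1, len(path) - 1)
    for _ in range(n_iter):
        grads = []
        for n in interior:
            g = []
            for k in range(2):            # central-difference derivative per coordinate
                old = path[n][k]
                path[n][k] = old + eps
                up = hj_action(path, v_eff, e_eff, lam)
                path[n][k] = old - eps
                down = hj_action(path, v_eff, e_eff, lam)
                path[n][k] = old
                g.append((up - down) / (2.0 * eps))
            grads.append(g)
        for n, g in zip(interior, grads):
            path[n][0] -= step * g[0]
            path[n][1] -= step * g[1]
    return path

def flat(point):
    return 0.0                            # toy landscape: constant V_eff

N_POINTS = 8
start = [[k / (N_POINTS - 1), 0.3 * math.sin(math.pi * k / (N_POINTS - 1))]
         for k in range(N_POINTS)]        # bent initial guess from (0, 0) to (1, 0)
initial_action = hj_action(start, flat, 0.5, 1.0)
result = minimize_path(start, flat, e_eff=0.5)
final_action = hj_action(result, flat, 0.5, 1.0)
max_height = max(abs(p[1]) for p in result)
print(initial_action, final_action, max_height)
```

Any callable taking a point and returning V_eff can replace `flat`. For a protein, the path lives in the full configuration space and V_eff(n) is the sum over residues written above, so a more efficient optimizer with analytic gradients would be needed in practice.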

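Go potential

$$U(X) = \sum_{i<j} u(x_i, x_j)$$

The pair potential u combines a harmonic bond term $\frac{1}{2} K_b \left( |x_i - x_j| - a \right)^2$ between consecutive residues (j = i + 1) with 12-6 terms in $R_0 / r_{ij}$ that depend on the pair (i, j). The exact form of the 12-6 terms is not recoverable from this transcript.

Initial conditions: 6 high-temperature denatured states from MD.

System: the Villin Headpiece Subdomain.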

Results for the Villin Headpiece Go Model

[Figure: squared gyration radius (nm^2), between about 20 and 60, versus percentage of configurational steps (0 to 100).]

[Figure: percentage of monomers in alpha helix conformation (axis from 0 to 0.8) versus percentage of configurational steps (0 to 100).]

[Figure: number of contacts (axis from 100 to 400) versus percentage of configurational steps (0 to 100).]

Conclusions
- Natural definition of folding pathways
- No need for a reaction coordinate
- Transition states
- Calculation of rates
- Working on using atomic potentials and including solvent