Int roduct ion t o Free Ene rgy Calculat ions ov. 2005 Julien Michel
Predict ing Binding Free Energies A A Recept or B B Recept or Can we predict if A or B will bind? Can we predict the stronger binder? Can we do this reliably and quickly?
W hat is in t he t oolbox? Rigorous Free Energy M e t hods Accuracy Em pirical Scoring Funct ions Approxim at e Free Energy M e t hods seconds / com pound CPU t im e days / com pound
The im port ance of t he ent ropy Better Lennard Jones interactions with the protein But configurat ionaly rest rict ed Experim ental Binding Free Energy difference : + 1.8 kcal.m ol -1 Free Energy calculations capture aspects of ligand binding that are overlooked (or crudely considered) by em pirical scoring funct ions : Ent ropic effect s Solvat ion
The concept of Phase space 1 dim ensional harm onic oscillator U ( x) U ( x,y) y x x U(x) = k 1 (x x o ) 2 2 dim ensional harm onic oscillator x y U(x,y) = k 1 (x x o ) 2 + k 2 (y y 0 ) 2 x Serious sim ulations typically requires thousands of degrees of freedom and hence the phase space is in thousands of dim ensions Mom otina I love you!
St at ist ical Physics in one equat ion P ens = P( r exp ) exp ( βu ( r )) ( ) βu ( r ) dr dr P P ens = ( r ) π ( r ) dr Boltzm ann dist ribut ion Ensem ble Average of P Value of P for one configuration r P = P (Post ulat e) macroscopic ens Point in Phase space
um erical int egrat ion : Quadrat ure ( ) Two atom s on a 7*7 checkerboard : 49*49 = 2401 different states contributes to eq (2) Lot s of states! In eq (2), the integral is in 3 dim ensions where is the num ber of atom s sim ulated. And there are thousands of atom s to sim ulate Direct Integration is not feasible Im portant rem ark : Many, m any of the possible states involve overlaps of atom s and hence are highly unlikely
um erical e int e grat ion : M ont e Carlo Generate random ly the position of the atom s on the checkerboard, calculate the property of interest I s (X i ) Repeat tim es The quantity I est is an estim ate of the integrant I I est = 1 i =1 ( X The essential advantage over quadrature techniques is that the m ethod can be applied to system s of high dim ensionality without having to build a m esh that covers space I S i )
W hat is t he volum e of a sphere? 1 2 3 k 2 V π = V R k Γ + 1 2 2 k um ber of dim ensions k k 1 2 3 5 10 50 100 V/V R 1.00E+ 00 7.85E-01 5.24E-01 1.64E-01 2.49E-03 1.54E-27 1.87E-69 As k increases, the vast m ajority of the points in the k-dim ension space lies outside of the sphere Random selection of points doesn t work Strong analogy with system s in the condensed phase. There are few low energy states that contributes m eaningfully to the integral and m any high energy states (e.g atom ic overlaps) that do not contribute.
Im port ance Sam pling Instead of drawing random points from an uniform distribution, draw points from a distribution π. The Monte Carlo integration equation becom es I est = 1 I ( X i i = 1 from π π ( X i ) ) π is selected so that points are in the region of space which contributes the m ost to the integrand (e.g, in the sphere) The bias on the selection of X i is rem oved when the contribution to the integrand is estim ated.
Im port ance Sam pling : Exam ple f ( x) = 3x I = 1 0 2 f ( x) π π ( x) k k ( x) dx π π 0 1 = = 1 4x 3 4 3.5 3 2.5 Estim ate of I after draw ing 100 sam ples w ith tw o different im portance sam pler Y 2 1.5 1 Funct ion Average St d. Dev 0.5 π 0 1.027 0.111 0 π 1 0.999.036 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 X f(x) =3x**2 importance distribution uniform distribution
Im port ance Sam pling in St at ist ical Physics P ens = P( r exp ( βu ( r )) ( βu ( r )) dr ) exp If P(r ) does not dom inate the product in the num erator, then an ideal im portance sam pling function to estim ate equation (2) is : dr π ( r ) i = exp exp ( βu ( r )) ( βu ( r ) ) dr i Problem : Im possible to draw sam ples from π(r i ) without knowing the denom inator, which we can t (as it involves solving directly a very difficult int egral)
M arkov Chains ( in a nut shell) A Markov Chain is a set of probabilistic rules which governs transitions between states and is often represented as a transition m atrix Π p11 p12 p13 Probability of m oving from state 1 Π = p21 p31 p22 p32 p23 p33 to 3 Assum ing Π obeys a num ber of m athem atical properties, then the following is true π (1) ( n) = lim n p Π p (1) represent an arbitrary initial distribution (e.g, a random s starting point) and Π (n) represents n applications of the transition m atrix Π This equations m eans : Repeated applications of the transition m atrix Πconverges any initial arbitrary distribution p (1) towards a unique lim iting distribution π If the transition m atrix is properly constructed, we can draw sam ples from p without having to specify π a priori ( e.g, not have to solve the bad integral )
M arkov Chains M ont e Carlo : M et ropolis M ont e Carlo ( 1 9 5 3 ) 1. Start in state I 2. Attem pt a m ove to state j with probabiliy pij 3. Accept this m ove with probability α ij π j exp( βu j ) / Z = = = exp( β ( U π exp( βu ) / Z i i 6. If the m ove is accepted, set i = j, otherwise i = i 7. Accum ulate any property of interest A(i) 8. Return to 1 or term inate after iterations j U i )) Perhaps the m ost im portant 20th century contribution to num erical science
( U ( r )) π ( r dr A = exp + β ) Absolut e free ene rgy calculat ion : P P ens = ( r ) π ( r ) dr ( ) U ( r ) π ( r dr A = exp + β ) A = exp( + βu ( r ) The first term in the integral is significant com pared to the second term. Regions of space that contributes to the integral are not always those that have a significant value of π (r) Very slow convergence of t he free energy Anyway What is the free energy of a cactus?
* probability would be m ore accurate Relat ive free energy calculat ion : Zwanzig equat ion (1954) 1 AA B = ln exp β β ( ( U ) B ( r ) U A( r ) A If A and B are sim ilar, then m ost configurations of the system A will have a sim ilar energy to the system B and the energy difference is 0 Much better convergence because only a sm all num ber of configurations have to be generated (those that differs in energy* between A and B) Energy Ua Ub (Ub-Ua) Phase Space
The coupling param et er λ : A and B have to be very very sim ilar for the previous equation to yield converged results in a reasonable tim e Solution, m ulti stage the calculation with as m any interm ediates potential as it takes A B A A = A = 1 k= 0 1 β ln exp ( β ( U ( r ) U ( r ))) λ k + 1 λ k λ k The interm ediates potentials U k can be anything you want. Typically taken by linear com bination of the potential for system A and B U k λ = (1 λk ) U A + λ k U B
Therm odynam ic Int egrat ion : A = λ 1 A B = λ = 0 U λ λ dλ Another way to get the free energy difference between A and B The quantity in brackets is the ensem ble average of the gradient of the energy with respect to the coupling param eter Calculated by im plem enting λ derivatives in the sim ulation code Or by finite difference approxim ation U U λ λ If the change A to B is big, the gradients varies a lot and m any points are necessary to obtain a good estim ate of A by num erical integration
Sum m ary G? 0.0 λ 1.0
Sum m ary H λ G = 1 = λ= 0 λ λ dλ λ = 0. 0 λ = 0. 5 λ = 1. 0 G
Relat ive Binding Free e Energy Calculat ions by a The rm odynam ic cycle A in water A in protein G 3 G 1 G 2 G 4 B in water B in protein G = G 4 - G 3 = G 2 - G 1
Problem s and pit falls : Forcefield The potentials U A and U B are approxim ate and this affect the quality of the results (ligand polarisation in water and binding site?) Convergence Long sim ulation tim es required to get a good estim ate of the free energy change (= m any hours of CPUs) Many interm ediate potentials have to be run to get a god estim ate of the free energy change (= m any CPUs) It is im possible to know if your results have converged! Som e calculations give the sam e answer if you start from a different point in phase space and run with a different random num ber. Other never gives the sam e answer This restrict the applicability of the m ethod to system that are very sim ilar.
Diagnosing convergence: Block averaging Cut the raw data into chunks and analyse independently each chunk to get a free energy. Measure the variance of the result ing dist ribut ion of free energies. Independent runs Run the com plete sim ulation m any tim es, from different starting points Therm odynam ic cycle closure A G A B C B G A C G C B G A B - G C B G A C = 0 Deviations are a m easure of lack of convergence one of the m ethods ensure your results are trully converged
W hy is it so difficult? U Phase space In docking, we only care about the single lowest energy point in phase space and the search algorithm s can cut any corner to get there But in statistical physics, you need to know the probability of each low energy point in phase Very difficult to explore quickly phase space and sim ultaneously obtain a good estim ate of the probability of the low energy states
Current scope of free e energy calculat ions: Biot in/ Sprept avidin COX-2. COX-1 Trypsin HIV-1 prot ease P38 ( ) We need Better algorithm s to run m ore quickly Typical error in a successful free energy sim ulation = 1 kcal.m ol -1 (e.g., a 5-6 fold difference in IC 50 at room tem perature) Better forcefields to get m ore accurate results Better sam pling m ethods to get m ore precise results Better ways to sim ulate different ligands to deal with a m ore diverse set of com pounds
Quest ions.