Simulation of molecular systems by molecular dynamics

Yohann Moreau (UJF), yohann.moreau@ujf-grenoble.fr
Molecular Dynamics, Label RFCT 2015, November 26, 2015

Introduction

MD simulates the evolution of a system of particles over time.
- Exploration of the PES: many configurations are considered (static QM: only one per calculation)
Dynamics algorithms are compatible with any method that yields the energy as a function of the geometry (non-Born-Oppenheimer dynamics is not considered here).
- MM, QM or QM/MM MD
- MM MD will be used for discussion here
MD allows the calculation of thermodynamic quantities (free energy), statistical quantities (sampling) and structural descriptors, as well as the study of the evolution of a system over time.
- Very important for biological systems (proteins)
A few numbers
- Systems containing up to $10^6$ atoms
- Simulation times from 10 ps to 1 µs (typically 100 ns)

History

- 1957-1959: work of Alder & Wainwright, simulation of hard spheres (rare gases)
- 1974: first simulation of liquid water by Stillinger & Rahman
- 1977: first simulation of a protein (BPTI) by McCammon

Classical simulation methods: fundamental principles

Basic statements (for our short lecture)

Use of classical mechanics, applied to a molecular system whose energy is a function of its geometry
- The motion of atoms is treated classically (no TD-QM!)
The principle is based on Newton's second law:
- The acceleration of a body is equal to the net force acting on the body divided by its mass, or: $\sum_i \vec{F}_i = m\,\vec{a}$
If the forces acting on all the particles are known at each time, one can calculate the accelerations and make the system evolve in time, using an integration scheme.
- The forces derive from the potential: $\vec{F} = -\nabla V$
The calculated values can be related to thermodynamics (internal energy, free energy, etc.)

Simulating the evolution of a system over time

With more than one particle, the trajectory (integration of Newton's equations) has to be built numerically.
The potential energy has to be computed at regular intervals, every $\delta t$.
- $\delta t$ is the time step or integration step
- To go from $t$ to $t + \delta t$, forces are considered constant during $\delta t$
- $\delta t$ has to be small enough to keep the total energy constant, but should be as large as possible in order to simulate a longer time
For a system with unconstrained X-H bonds, $\delta t = 10^{-15}$ s, or 1 fs.

Integrating Newton's equations of motion

All algorithms assume that the geometry at $t + \delta t$ can be expressed as a Taylor series (with $v(t) = \dot{x}(t)$ and $a(t) = \ddot{x}(t)$):

$x(t + \delta t) = x(t) + v(t)\,\delta t + \frac{1}{2} a(t)\,\delta t^2 + \dots$

Hence, at each $t$ one must know the positions $x(t)$, the velocities $v(t)$ and the forces, obtained from the potential ($\vec{F} = -\nabla V$), from the previous configuration.
Different algorithms: Verlet, Velocity Verlet, Leap-Frog, Runge-Kutta, Predictor-Corrector, etc.
Verlet and Leap-Frog are detailed hereafter.

The Verlet algorithm (1967, by Loup Verlet, born 1931)

The expansions of the position at $t + \delta t$ and $t - \delta t$, truncated at 4th order, are ($b(t)$ being the third time derivative of $x$):

$x(t + \delta t) = x(t) + v(t)\,\delta t + \frac{1}{2} a(t)\,\delta t^2 + \frac{1}{6} b(t)\,\delta t^3 + O(\delta t^4)$
$x(t - \delta t) = x(t) - v(t)\,\delta t + \frac{1}{2} a(t)\,\delta t^2 - \frac{1}{6} b(t)\,\delta t^3 + O(\delta t^4)$

in which the acceleration depends on the forces: $a(t) = F(t)/m$.
Summing the two expansions cancels the 1st- and 3rd-order terms.
- The position at $t + \delta t$ can then be deduced: $x(t + \delta t) = 2x(t) - x(t - \delta t) + a(t)\,\delta t^2 + O(\delta t^4)$
Velocities are not needed; the acceleration is known (it derives from the forces).
Strength: simple and robust calculation, although its precision is limited.
Velocities are derived from the positions: $v(t) = \frac{x(t + \delta t) - x(t - \delta t)}{2\,\delta t}$
Also: Velocity Verlet (uses the first expansion only; the velocity is computed explicitly).
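
The position update above can be tried on a toy problem. The following is a minimal sketch, assuming a 1D harmonic oscillator with force $F = -kx$ in reduced units; the function name `verlet_harmonic` and all parameter values are illustrative choices, not taken from the lecture.

```python
import math

def verlet_harmonic(x0, v0, k=1.0, m=1.0, dt=0.01, n_steps=1000):
    """Position-Verlet trajectory of a 1D harmonic oscillator (F = -k x)."""
    def accel(x):
        return -k * x / m

    # Verlet needs x(t - dt) to start: build it from a backward Taylor step.
    x_prev = x0 - v0 * dt + 0.5 * accel(x0) * dt ** 2
    x = x0
    trajectory = [x0]
    for _ in range(n_steps):
        # x(t + dt) = 2 x(t) - x(t - dt) + a(t) dt^2
        x_next = 2.0 * x - x_prev + accel(x) * dt ** 2
        x_prev, x = x, x_next
        trajectory.append(x)
    return trajectory

traj = verlet_harmonic(x0=1.0, v0=0.0)
# For k = m = 1 the exact solution is x(t) = cos(t); compare at t = 10.
print(traj[-1], math.cos(10.0))
```

The backward Taylor step used to start the recursion is one common choice; a production code would usually start from the velocities with Velocity Verlet or Leap-Frog instead.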

The Leap-Frog algorithm

Velocities at half steps (at $t + \frac{1}{2}\delta t$) are used:

$x(t + \delta t) = x(t) + v(t + \tfrac{1}{2}\delta t)\,\delta t$
$v(t + \tfrac{1}{2}\delta t) = v(t - \tfrac{1}{2}\delta t) + a(t)\,\delta t$

Velocities and positions are both computed every $\delta t$, but shifted by $\frac{1}{2}\delta t$ with respect to each other.
- The evaluation of positions leaps over that of velocities, and vice versa
Strength: explicit calculation of velocities (precise)
Drawback: velocities are not computed at the same time as positions and have to be approximated: $v(t) = \frac{1}{2}\left[v(t - \tfrac{1}{2}\delta t) + v(t + \tfrac{1}{2}\delta t)\right]$
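
The two Leap-Frog equations can be sketched on the same kind of toy system. This example assumes a 1D harmonic oscillator in reduced units; `leapfrog_harmonic` is a hypothetical name, and the on-step velocity is the half-step average given above, used here only to monitor the total energy.

```python
import math

def leapfrog_harmonic(x0, v0, k=1.0, m=1.0, dt=0.01, n_steps=1000):
    """Leap-Frog integration of a 1D harmonic oscillator (F = -k x)."""
    def accel(x):
        return -k * x / m

    x = x0
    # Start-up: estimate v(t - dt/2) from v(t) with a backward half step.
    v_half = v0 - 0.5 * accel(x0) * dt
    energies = []
    for _ in range(n_steps):
        v_new = v_half + accel(x) * dt        # v(t + dt/2)
        v_on_step = 0.5 * (v_half + v_new)    # approximate v(t)
        energies.append(0.5 * m * v_on_step ** 2 + 0.5 * k * x ** 2)
        x = x + v_new * dt                    # x(t + dt)
        v_half = v_new
    return x, energies

x_final, energies = leapfrog_harmonic(x0=1.0, v0=0.0)
print(x_final, math.cos(10.0))
print(max(energies) - min(energies))  # small: the energy oscillates but does not drift
```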

Choice of the step size $\delta t$

- Must be small enough to avoid integration instabilities and to ensure energy conservation (vide infra)
- Should be as large as possible to simulate a longer time within the same wall-clock time
- The choice depends on the algorithm chosen and on the system
- The limit is set by the highest vibrational frequency of the system
- Example: an X-H stretch has a period of about $10^{-14}$ s ($\nu_{XH} \approx 10^{14}$ s$^{-1}$); $\delta t$ should be 1/10 of that period, hence 1 fs at most
- With X-H bonds constrained (SHAKE algorithm), $\delta t$ = 2 fs is adequate
The wall-clock time of the calculation depends directly on $\delta t$.
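
The rule of thumb above is simple arithmetic, sketched below. The wavenumber of 3000 cm^-1 is an assumed, typical value for a C-H stretch (it is not given in the lecture), and `max_timestep_fs` is a hypothetical helper.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def max_timestep_fs(wavenumber_cm, fraction=0.1):
    """Time step (fs) taken as a fraction of the vibrational period."""
    frequency_hz = C_CM_PER_S * wavenumber_cm   # nu = c * wavenumber
    period_s = 1.0 / frequency_hz
    return fraction * period_s * 1e15           # seconds -> femtoseconds

dt_fs = max_timestep_fs(3000.0)     # assumed X-H stretch at 3000 cm^-1
n_steps_100ns = 100e-9 / 1e-15      # steps needed for 100 ns at 1 fs
print(round(dt_fs, 2), "fs;", int(n_steps_100ns), "steps for 100 ns")
```

With these assumptions the period is about 11 fs, which gives a time step close to 1 fs and about $10^8$ force evaluations for a 100 ns trajectory.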

A few words about thermodynamics

Total, potential and kinetic energies, velocities and T

For an isolated system, $E_{tot}(t) = E_{kin}(t) + E_{pot}(t) = \text{constant}$, with $E_{kin}(t) = \sum_i \frac{1}{2} m_i v_i^2(t)$
Equipartition of energy implies $\langle E_{kin} \rangle = \frac{3}{2} N k_B T$
- With a large enough sampling, one can estimate T
At the beginning of a simulation, each particle is given a velocity drawn from a Maxwell-Boltzmann distribution, with $p(v_{ix})$ the probability for atom $i$ to have a velocity $v_{ix}$ along x:

$p(v_{ix}) = \left(\frac{m_i}{2\pi k_B T}\right)^{1/2} \exp\left(-\frac{m_i v_{ix}^2}{2 k_B T}\right)$

Moreover, the total momentum should be zero (to avoid the "flying ice cube" artifact): $\vec{p} = \sum_i \vec{p}_i = \sum_i m_i \vec{v}_i = 0$
Temperature can be kept constant in a simulation with a thermostat:
- Berendsen (periodic rescaling of velocities), Nosé-Hoover
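
The initialization described above (Maxwell-Boltzmann velocities, zero total momentum, temperature from the kinetic energy) can be sketched as follows. This is an illustration in SI units, assuming 2000 argon-like atoms; the function names and the system are hypothetical, and the temperature estimate uses the $\frac{3}{2} N k_B T$ relation of the slide without correcting for the three degrees of freedom removed with the net momentum.

```python
import math
import random

KB = 1.380649e-23  # Boltzmann constant, J/K

def init_velocities(masses, temperature, seed=0):
    """Draw each velocity component from a Gaussian of variance kB*T/m."""
    rng = random.Random(seed)
    vel = [[rng.gauss(0.0, math.sqrt(KB * temperature / m)) for _ in range(3)]
           for m in masses]
    # Remove the net momentum so the system as a whole does not drift.
    total_mass = sum(masses)
    for axis in range(3):
        p_axis = sum(m * v[axis] for m, v in zip(masses, vel))
        for v in vel:
            v[axis] -= p_axis / total_mass
    return vel

def instantaneous_temperature(masses, vel):
    """T from E_kin = 3/2 N kB T."""
    e_kin = sum(0.5 * m * sum(c * c for c in v) for m, v in zip(masses, vel))
    return 2.0 * e_kin / (3.0 * len(masses) * KB)

ARGON_MASS = 39.948 * 1.66053907e-27  # kg, assumed test particle
masses = [ARGON_MASS] * 2000
velocities = init_velocities(masses, temperature=300.0)
t_inst = instantaneous_temperature(masses, velocities)
print(t_inst)  # close to 300 K, with statistical scatter
```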

Pressure and volume

Starting from the virial theorem, pressure and volume can be related.
The total virial is the sum of an ideal-gas part and a contribution due to the interactions between particles.
Skipping the theory, pressure is related to volume via:

$P = \frac{1}{V}\left[N k_B T + \frac{1}{3}\sum_{i=1}^{N}\sum_{j=i+1}^{N} \vec{r}_{ij}\cdot\vec{f}_{ij}\right]$

- $\vec{r}_{ij} = \vec{r}_i - \vec{r}_j$ is the separation between particles i and j (computed)
- $\vec{f}_{ij}$ is the force exerted on particle i by particle j (computed too)
Pressure is kept constant by adjusting the volume of the simulation cell.
If the volume is kept constant, the pressure has to fluctuate.
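
The pressure estimate can be illustrated with a small sketch. It uses the standard pair-virial form $P = \frac{1}{V}\left[N k_B T + \frac{1}{3}\sum_{i<j} \vec{r}_{ij}\cdot\vec{f}_{ij}\right]$, with $\vec{f}_{ij}$ the force on i due to j. Periodic images are ignored, units are reduced, and both `virial_pressure` and the toy repulsive force are assumptions made for the example.

```python
def virial_pressure(positions, pair_force, volume, kT):
    """Instantaneous pressure from the pair virial (no periodic images)."""
    n = len(positions)
    virial = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = [a - b for a, b in zip(positions[i], positions[j])]
            f_ij = pair_force(r_ij)  # force on i due to j
            virial += sum(r * f for r, f in zip(r_ij, f_ij))
    return (n * kT + virial / 3.0) / volume

def no_force(r_ij):
    return [0.0, 0.0, 0.0]

def toy_repulsion(r_ij):
    # Hypothetical force pushing i away from j, proportional to r_ij.
    return list(r_ij)

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
p_ideal = virial_pressure(atoms, no_force, volume=8.0, kT=1.0)
p_repulsive = virial_pressure(atoms, toy_repulsion, volume=8.0, kT=1.0)
print(p_ideal, p_repulsive)  # repulsion raises the pressure above N kT / V
```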

Thermodynamic ensembles

A system can be simulated under different conditions. The common ensembles are named after the quantities kept fixed:
- NVE (microcanonical): isolated system in a periodic cell with a fixed volume
- NVT (canonical): system in a periodic cell of fixed volume, coupled to a thermostat
- NPT: system coupled to a barostat and a thermostat; the volume of the simulation cell can vary
- µPT/µVT (grand canonical): the chemical potential is fixed; the number of particles can vary

The ergodic hypothesis, simulation time

Experimentally, an observable A corresponds to an average over a large number of replicas of the system taken simultaneously.
With MD, a single system is studied, which makes it impossible to calculate an ensemble average directly.
The ergodic hypothesis states that the ensemble average is equal to the time average: $\langle A \rangle_{ensemble} = \langle A \rangle_{time}$
If the system is allowed to evolve indefinitely, it will eventually pass through all the possible states.
- Experimentally relevant information (structure, thermodynamics) can then be calculated
In practice, simulations have to be long enough to generate enough representative configurations.
Typical simulation time: 1 to more than 100 ns.

Periodic Boundary Conditions and long-range interactions

Periodic Boundary Conditions: principles

Goal: simulating a system without boundary effects
The opposite sides of the simulation cell (box) are joined, as in Pac-Man.
- A particle (or an interaction!) that exits on the right enters on the left
In order to avoid the interaction of a molecule with its own image, one uses the minimum-image convention: each particle interacts only with the nearest image of every other particle.
- Interactions are truncated beyond a cutoff distance that is at most half the cell parameter (box size)
Constraints:
- All the sides must be joined (i.e. the replicated cell must fill space without gaps)
- The box must be large enough for the interactions to be truncated at a long distance
- It is better to keep the simulation box electrically neutral (compulsory for Particle-Mesh Ewald)
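
For an orthorhombic box, the wrapping of coordinates and the minimum-image convention each reduce to one line of arithmetic per axis. The sketch below assumes a rectangular box given by its three edge lengths; the function names are illustrative.

```python
def wrap(position, box):
    """Put a particle that left the box back inside (exits right, enters left)."""
    return [x % length for x, length in zip(position, box)]

def minimum_image(r_i, r_j, box):
    """Vector from j to the nearest periodic image of i."""
    d = []
    for a, b, length in zip(r_i, r_j, box):
        delta = a - b
        delta -= length * round(delta / length)  # shift by whole box lengths
        d.append(delta)
    return d

box = [10.0, 10.0, 10.0]
print(wrap([10.5, -0.5, 3.0], box))                              # [0.5, 9.5, 3.0]
print(minimum_image([0.5, 0.0, 0.0], [9.5, 0.0, 0.0], box))      # [1.0, 0.0, 0.0]
```

In the second call the two particles are 9 apart inside the box, but only 1 apart through the periodic boundary, which is the distance the interaction should use.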

PBC and minimum-image convention: illustration in 2D (figure)

Dealing with long-range interactions

Non-bonded terms: van der Waals and electrostatic
- They have to vanish at a distance compatible with PBC and the minimum-image convention
Different ways to apply a cutoff distance (picture from K. Schulten's website):
- Truncation: the potential is set to zero abruptly beyond a given distance (not shown)
- Switching: the potential is smoothly damped between a switching distance and the cutoff (preferred)
- Shifting: the whole potential is shifted so that it is zero at the cutoff distance and beyond
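
The three cutoff schemes can be compared on a Lennard-Jones potential. This is a sketch in reduced units; the polynomial switching function is one commonly used form, chosen here as an assumption, and the distances 2.0 and 2.5 are arbitrary example values.

```python
def lj(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def truncated(r, r_cut):
    """Abrupt truncation: the energy jumps to zero at r_cut."""
    return lj(r) if r < r_cut else 0.0

def shifted(r, r_cut):
    """Shift the whole curve so that it reaches zero at r_cut."""
    return lj(r) - lj(r_cut) if r < r_cut else 0.0

def switch(r, r_on, r_cut):
    """Smooth factor going from 1 at r_on to 0 at r_cut."""
    if r <= r_on:
        return 1.0
    if r >= r_cut:
        return 0.0
    num = (r_cut ** 2 - r ** 2) ** 2 * (r_cut ** 2 + 2.0 * r ** 2 - 3.0 * r_on ** 2)
    return num / (r_cut ** 2 - r_on ** 2) ** 3

def switched(r, r_on, r_cut):
    return lj(r) * switch(r, r_on, r_cut)

for r in (1.5, 2.25, 2.499, 2.6):
    print(r, truncated(r, 2.5), shifted(r, 2.5), switched(r, 2.0, 2.5))
```

Just below the cutoff, the truncated potential still has a finite value (a discontinuity in energy), whereas the shifted and switched versions go to zero continuously.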

Ewald summation for electrostatics (not detailed)

Because electrostatic interactions are long-ranged, they are likely to extend beyond the box.
One way to work around this is to split the description using Ewald summation (not detailed here). In practice:
- a short-range term for direct interactions (computed every step)
- a long-range term, computed as for periodic systems (not necessarily every step)
The long-range calculation is made in reciprocal space (as for solids).

How to conduct an MD simulation? (Example of proteins)

Step one: generate a starting system of good quality

1. Get (or build) the geometry of the system (from the PDB, e.g.)
- Rebuild the structure (some amino acids may be missing)
- Structure from X-ray: hydrogens have to be added!
- Correctly protonate the titratable residues (ASP, HIS, etc.)
- Minimization of the structure for the added atoms
2. Addition of the environment (solvent, lipid bilayer)
- Often: superimposition of a water box on the protein, deletion of the water molecules overlapping with the protein
- Minimization of the solvent molecules
- The size of the simulation box is fixed for PBC
3. Counter-ions are added to the solvent to neutralize the whole system
- Na+ or Cl- are used in general
- Minimization of the solvent and ions

Step two: dynamical treatment of the system

1. Gradual heating of the system by a slow increase of the velocities
- Example: 30 ps of heating from 0 to 300 K by steps of 10 K every 0.1 ps (every 100 steps of 1 fs)
2. After heating the system, the MD simulation can be started
Phase 1: Equilibration of the system
- Goal: assess the stability (energy, structure) of the system
- The length of the equilibration depends on the size of the system
Phase 2: Production
- After equilibration, positions and velocities are saved regularly so that quantities can be computed afterwards
Remark: since each point of the trajectory depends only on the previous one, an MD run can be restarted without redoing all the previous calculations (positions and velocities must be available).

Trajectory analysis and computable properties

All properties that do not involve chemical reactivity:
- Statistical and/or structural properties: radial distribution function (average number of first neighbors), heat of vaporization, etc.
- Dynamic properties: diffusion constants
- Evolution of the structure of a system: (un)folding of (small) proteins
- Free energy profile of a transformation
To guide the dynamics, constraints and/or restraints can be added.
See the next part for examples of computed properties.

Other methods related to Classical Molecular Dynamics

Different possibilities

Intrinsically, every system is dynamic (T > 0 K)
1. Born-Oppenheimer MD on a QM or QM/MM PES
- Allows the evolution of the electronic structure (reactivity) to be simulated
- Software: CP2K
2. Dynamics made by propagating electronic degrees of freedom
- Car-Parrinello (CPMD)
- Atom-centered Density Matrix Propagation (ADMP: Gaussian)
3. Metadynamics

Metropolis Monte Carlo: a random-based approach

Instead of building a trajectory in which each point depends on the previous one, random configurations are generated.
- Usually, configuration n is generated by a random change to configuration n-1
- Needs a smart algorithm to modify the structure of the system
1. Strength: only the energy is evaluated; there is no need to integrate equations of motion or to evaluate velocities
- Fast: many points can be generated
2. Drawback: not all the generated configurations are kept (some are discarded)
- Acceptance test: Metropolis (Boltzmann criterion)
3. Complexity: how should the way to modify the system be chosen?
- The algorithm must give an unbiased statistical sampling
4. A Metropolis MC simulation naturally generates an NVT ensemble
- But other ensembles are possible
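
The Metropolis scheme can be sketched on a one-dimensional toy system. The example assumes a harmonic potential $E(x) = x^2/2$ in reduced units with $k_B T = 1$, for which the canonical average $\langle x^2 \rangle$ equals 1; the function name, the uniform trial move and its maximum size are illustrative choices.

```python
import math
import random

def metropolis(energy, x0, kT, n_steps, max_disp, seed=0):
    """Sample configurations of a 1D system with the Metropolis criterion."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-max_disp, max_disp)  # random change
        e_trial = energy(x_trial)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = x_trial, e_trial
            accepted += 1
        # When a move is rejected, the old configuration is counted again.
        samples.append(x)
    return samples, accepted / n_steps

samples, acceptance = metropolis(lambda x: 0.5 * x * x, x0=0.0, kT=1.0,
                                 n_steps=200000, max_disp=1.0)
mean_x2 = sum(s * s for s in samples) / len(samples)
print(mean_x2, acceptance)  # mean_x2 should be close to kT/k = 1
```

Counting the old configuration again after a rejection is what keeps the sampling unbiased; simply dropping rejected steps from the averages would distort the Boltzmann weights.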

Flowchart for a Metropolis Monte Carlo simulation (figure)

Simulation results: computing properties of a system

Visual analysis of the result, with VMD for example

Helpful to understand and describe some phenomena
- How a protein unfolds, for example
Example: oil and water separation by molecular dynamics simulation
- https://www.youtube.com/watch?v=xcmshy3cqxa

Energetic properties

Energy
- Average energy: $\langle E \rangle = \frac{1}{N}\sum_{i=1}^{N} E_i$
- Total energy (and its terms) as a function of time
- With a force field, the different components can be separated: the average value of each energy term can be calculated
- Total energy, bonding energies, etc.
The heat capacity at constant volume is the variation of the internal energy U with respect to temperature: $C_v = \left(\frac{\partial U}{\partial T}\right)_V$
In an NVT ensemble, it can be computed as:

$C_v = \frac{1}{k_B T^2}\left\langle \left(U - \langle U \rangle_M\right)^2 \right\rangle_M = \frac{1}{k_B T^2}\left(\langle U^2 \rangle_M - \langle U \rangle_M^2\right)$

where $\langle \cdot \rangle_M$ denotes an average over the M sampled configurations.
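
The fluctuation formula for $C_v$ amounts to the variance of the sampled energies divided by $k_B T^2$. A minimal sketch in reduced units ($k_B = 1$ by default); `heat_capacity_nvt` and the two-point energy series are purely illustrative.

```python
def heat_capacity_nvt(energies, temperature, k_b=1.0):
    """C_v = (<U^2> - <U>^2) / (k_B T^2) from energies sampled in NVT."""
    n = len(energies)
    mean = sum(energies) / n
    mean_sq = sum(e * e for e in energies) / n
    return (mean_sq - mean * mean) / (k_b * temperature ** 2)

# Toy series: <U> = 2, <U^2> = 5, so the variance is 1.
print(heat_capacity_nvt([1.0, 3.0], temperature=1.0))  # 1.0
print(heat_capacity_nvt([1.0, 3.0], temperature=2.0))  # 0.25
```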

Structural properties: the Radial Distribution Function (RDF)

Example: the O-O RDF for three models of water (Jorgensen et al. 1979) (figure)
- The maximum of a peak gives the average distance of a solvation shell
- The integral under a peak, $\int 4\pi r^2 \rho\, g(r)\, dr$, gives the number of molecules in the associated shell
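
An RDF is a histogram of pair distances normalized by what an ideal gas of the same density would give. The sketch below assumes a single configuration in a cubic periodic box and `r_max` no larger than half the box length; the function name and the random "ideal gas" test configuration are illustrative, and a real analysis would average over many frames.

```python
import math
import random

def radial_distribution(positions, box_length, n_bins, r_max):
    """g(r) for one configuration in a cubic periodic box (r_max <= L/2)."""
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                delta = a - b
                delta -= box_length * round(delta / box_length)  # minimum image
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                k = min(int(r / dr), n_bins - 1)
                counts[k] += 2  # the pair counts for both i and j
    density = n / box_length ** 3
    g = []
    for k, c in enumerate(counts):
        shell_volume = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k ** 3) * dr ** 3
        g.append(c / (n * density * shell_volume))
    r_mid = [(k + 0.5) * dr for k in range(n_bins)]
    return r_mid, g

rng = random.Random(1)
box = 10.0
gas = [[rng.uniform(0.0, box) for _ in range(3)] for _ in range(400)]
r_mid, g = radial_distribution(gas, box, n_bins=10, r_max=5.0)
print([round(x, 2) for x in g])  # uncorrelated particles: g(r) fluctuates around 1
```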

Structural properties: how mobile a structure is

RMSD: Root Mean Square Deviation
- RMSD with respect to a given state, as a function of time
- $\mathrm{rmsd}(t) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left|\vec{r}_i(t) - \vec{r}_i(0)\right|^2}$
- Allows the variation of the global geometry along the simulation to be quantified
- Can be computed on the whole system or on a part of it (not on the solvent)
RMSF: Root Mean Square Fluctuation
- Variation of the position of a particle around a given position or in a given state
- $\mathrm{rmsf}(i) = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left|\vec{r}_i(t) - \vec{r}_i^{(av.)}\right|^2}$
- Allows the atoms that move the most to be identified
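
Both quantities are short sums over a stored trajectory. The sketch below assumes the frames have already been superimposed on the reference (no alignment step is performed) and that a trajectory is a list of frames, each a list of (x, y, z) tuples; the function names and the two-frame toy trajectory are illustrative.

```python
import math

def rmsd(frame, reference):
    """Root mean square deviation of one frame from a reference frame."""
    n = len(frame)
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q))
                for p, q in zip(frame, reference))
    return math.sqrt(total / n)

def rmsf(trajectory):
    """Per-atom root mean square fluctuation around the average position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        msf = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
                  for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result

# Toy trajectory: atom 0 is fixed, atom 1 moves from x = 1 to x = 3.
traj_xyz = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
            [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
print(rmsd(traj_xyz[1], traj_xyz[0]))  # sqrt(2)
print(rmsf(traj_xyz))                  # [0.0, 1.0]
```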

Dynamic quantities based on the time correlation function

Correlation of two quantities over time:
- $C_{AB}(t) = \langle A(t)\cdot B(0) \rangle$; the quantities can be individual or collective
- $C_{AA}(t) = \langle A(t)\cdot A(0) \rangle$: autocorrelation function of quantity A
An autocorrelation function measures how a variable is correlated with its initial value over time.
- When normalized, C = 1 at t = 0 and C = 0 as $t \to \infty$
The Fourier transform of some autocorrelation functions can be related to spectroscopic data!
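
A normalized autocorrelation function can be estimated from a single time series by averaging over all time origins. The sketch below works on the fluctuations around the mean and uses a direct sum, which is simple but slow for long series; the function name and the cosine test signal (period of 20 samples) are assumptions for the example.

```python
import math

def autocorrelation(series, max_lag):
    """Normalized C(lag) = <dA(t + lag) dA(t)> / <dA^2>, averaged over origins t."""
    n = len(series)
    mean = sum(series) / n
    fluct = [s - mean for s in series]
    c0 = sum(f * f for f in fluct) / n
    acf = []
    for lag in range(max_lag + 1):
        c = sum(fluct[t] * fluct[t + lag] for t in range(n - lag)) / (n - lag)
        acf.append(c / c0)
    return acf

signal = [math.cos(2.0 * math.pi * t / 20.0) for t in range(1000)]
acf = autocorrelation(signal, max_lag=20)
print(acf[0], acf[10], acf[20])  # about 1, -1 and 1 for this periodic signal
```

A purely periodic signal never decorrelates, which is why this test series oscillates between 1 and -1; for a fluctuating observable in a liquid the function would instead decay towards 0.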

Dynamic quantities based on the time correlation function (continued)

Diffusion coefficient as the integral of the velocity autocorrelation function:

$D = \frac{1}{3}\int_0^{\infty} \langle \vec{v}(t)\cdot\vec{v}(0) \rangle\, dt$

The Fourier transform of the autocorrelation function of the total dipole moment (with some factors) can be used to obtain the IR spectrum $I(\omega)$ of the system:

$I(\omega) = \alpha(\omega)\,n(\omega) = \frac{2\pi\omega\left(1 - e^{-\beta\hbar\omega}\right)}{3\hbar c V}\int_{-\infty}^{+\infty} dt\, \langle \vec{M}(t)\cdot\vec{M}(0) \rangle\, e^{i\omega t}$

- Good for intensities, less good for the positions of peaks
- Recent works by Gaigeot et al. using QM and QM/MM MD
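
The diffusion integral can be evaluated numerically once the velocity autocorrelation function (VACF) is tabulated. The sketch below uses the trapezoidal rule and, as a test input, an assumed model VACF $\langle \vec{v}(t)\cdot\vec{v}(0) \rangle = 3\,e^{-t}$ in reduced units (a Langevin-like exponential decay), for which the exact result is D = 1; in practice the VACF would come from the trajectory.

```python
import math

def diffusion_from_vacf(vacf, dt):
    """D = (1/3) * integral of the VACF over time, by the trapezoidal rule."""
    integral = 0.0
    for k in range(len(vacf) - 1):
        integral += 0.5 * (vacf[k] + vacf[k + 1]) * dt
    return integral / 3.0

dt = 0.001
# Assumed model VACF, tabulated up to t = 20 where it has decayed to ~0.
vacf = [3.0 * math.exp(-k * dt) for k in range(20001)]
D = diffusion_from_vacf(vacf, dt)
print(D)  # close to 1.0
```

The upper limit of the integral must be long enough for the VACF to have decayed, but in a real simulation the noisy tail of the correlation function also limits how far the integration can usefully be extended.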