Crystal Structure Prediction using CRYSTALG program

Similar documents
Crystal structure prediction is one of the most challenging and

Ab-initio protein structure prediction

Molecular Aggregation

Monte Carlo (MC) Simulation Methods. Elisa Fadda

Softwares for Molecular Docking. Lokesh P. Tripathi NCBS 17 December 2007

CE 530 Molecular Simulation

Polypeptide Folding Using Monte Carlo Sampling, Concerted Rotation, and Continuum Solvation

Crystal Structure Prediction A Decade of Blind Tests. Frank Leusen

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Molecular Interactions F14NMI. Lecture 4: worked answers to practice questions

Bioengineering 215. An Introduction to Molecular Dynamics for Biomolecules

Dihedral Angles. Homayoun Valafar. Department of Computer Science and Engineering, USC 02/03/10 CSCE 769

Polymorphism. 6 June 2012 Lecture 4 RUB

510 Subject Index. Hamiltonian 33, 86, 88, 89 Hamilton operator 34, 164, 166

Big Idea #5: The laws of thermodynamics describe the essential role of energy and explain and predict the direction of changes in matter.

CS 273 Prof. Serafim Batzoglou Prof. Jean-Claude Latombe Spring Lecture 12 : Energy maintenance (1) Lecturer: Prof. J.C.

Chapter 4. Glutamic Acid in Solution - Correlations

Computer simulation methods (2) Dr. Vania Calandrini

Supporting Online Material for


Atomic and molecular interaction forces in biology

Free energy, electrostatics, and the hydrophobic effect

Protein Structure Prediction, Engineering & Design CHEM 430

Michael W. Mahoney Department of Physics, Yale University, New Haven, Connecticut 06520

Crystal Polymorphism in Hydrophobically Nanoconfined Water. O. Vilanova, G. Franzese Universitat de Barcelona

André Schleife Department of Materials Science and Engineering

Current approaches to predicting molecular organic crystal structures

Solutions to Assignment #4 Getting Started with HyperChem

ICCP Project 2 - Advanced Monte Carlo Methods Choose one of the three options below

Lecture 11: Potential Energy Functions

What is Classical Molecular Dynamics?

Entropy, free energy and equilibrium. Spontaneity Entropy Free energy and equilibrium

Introduction to Geometry Optimization. Computational Chemistry lab 2009

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

5. Simulated Annealing 5.1 Basic Concepts. Fall 2010 Instructor: Dr. Masoud Yaghini

Molecular dynamics simulation. CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror

Joana Pereira Lamzin Group EMBL Hamburg, Germany. Small molecules How to identify and build them (with ARP/wARP)

Energy minimization using the classical density distribution: Application to sodium chloride clusters

Ab initio crystal structure analysis based on powder diffraction data using PDXL

The Effect of Model Internal Flexibility Upon NEMD Simulations of Viscosity

Kd = koff/kon = [R][L]/[RL]

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Global Optimisation of Hydrated Sulfate Clusters

Specific Ion Solvtion in Ethylene Carbonate and Propylene Carbonate

Dynamic modelling of morphology development in multiphase latex particle Elena Akhmatskaya (BCAM) and Jose M. Asua (POLYMAT) June

Exploring the energy landscape

Water models in classical simulations

Distance Constraint Model; Donald J. Jacobs, University of North Carolina at Charlotte Page 1 of 11

Improving the accuracy of lattice energy calculations in crystal structure prediction using experimental data

Generating Small Molecule Conformations from Structural Data

V(φ) CH 3 CH 2 CH 2 CH 3. High energy states. Low energy states. Views along the C2-C3 bond

Heat, Work, Internal Energy, Enthalpy, and the First Law of Thermodynamics. Internal Energy and the First Law of Thermodynamics

k θ (θ θ 0 ) 2 angles r i j r i j

Docking. GBCB 5874: Problem Solving in GBCB

Generation of crystal structures using known crystal structures as analogues

This semester. Books

Table of Contents Preface PART I. MATHEMATICS

All-atom Molecular Mechanics. Trent E. Balius AMS 535 / CHE /27/2010

Structural Bioinformatics (C3210) Molecular Mechanics

FORCE FIELDS AND CRYSTAL STRUCTURE PREDICTION

MOLECULAR RECOGNITION DOCKING ALGORITHMS. Natasja Brooijmans 1 and Irwin D. Kuntz 2

Direct Method. Very few protein diffraction data meet the 2nd condition

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria

Lecture 2+3: Simulations of Soft Matter. 1. Why Lecture 1 was irrelevant 2. Coarse graining 3. Phase equilibria 4. Applications

CHEM-UA 652: Thermodynamics and Kinetics

A) sublimation. B) liquefaction. C) evaporation. D) condensation. E) freezing. 11. Below is a phase diagram for a substance.

Subject of the Lecture:

Chem 112 Dr. Kevin Moore

Phase Equilibria and Molecular Solutions Jan G. Korvink and Evgenii Rudnyi IMTEK Albert Ludwig University Freiburg, Germany

Why Proteins Fold? (Parts of this presentation are based on work of Ashok Kolaskar) CS490B: Introduction to Bioinformatics Mar.

MatSci 331 Homework 4 Molecular Dynamics and Monte Carlo: Stress, heat capacity, quantum nuclear effects, and simulated annealing

Sample Exercise 11.1 Identifying Substances That Can Form Hydrogen Bonds

Big Idea 2: Chemical and physical properties of materials can be explained by the structure and the arrangement of atoms, ions, or molecules and the

PROTEIN-PROTEIN DOCKING REFINEMENT USING RESTRAINT MOLECULAR DYNAMICS SIMULATIONS

The Potential Energy Surface (PES) And the Basic Force Field Chem 4021/8021 Video II.iii

Optimisation: From theory to better prediction

Calculation of point-to-point short time and rare. trajectories with boundary value formulation

Soot - Developing anisotropic potentials from first principles for PAH molecules. Tim Totton, Alston Misquitta and Markus Kraft 12/11/2009

Parameterization of a reactive force field using a Monte Carlo algorithm

Wang-Landau Monte Carlo simulation. Aleš Vítek IT4I, VP3


Chemistry 123: Physical and Organic Chemistry Topic 2: Thermochemistry

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Multi-Ensemble Markov Models and TRAM. Fabian Paul 21-Feb-2018

Lecture 2-3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Molecular Dynamics. A very brief introduction


Potts And XY, Together At Last

Monte Carlo Study of the Crystalline and Amorphous NaK Alloy

ON PREDICTING THE CRYSTAL STRUCTURE OF ENERGETIC MATERIALS FROM QUANTUM MECHANICS

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Novel Monte Carlo Methods for Protein Structure Modeling. Jinfeng Zhang Department of Statistics Harvard University

Noncollinear spins in QMC: spiral Spin Density Waves in the HEG

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Physics 211B : Problem Set #0

Limitations of temperature replica exchange (T-REMD) for protein folding simulations

Molecular Mechanics. Yohann Moreau. November 26, 2015

Theoretical comparative study on hydrogen storage of BC 3 and carbon nanotubes

PH 548 Atomistic Simulation Techniques

Transcription:

Crystal Structure Prediction using CRYSTALG program Yelena Arnautova Baker Laboratory of Chemistry and Chemical Biology, Cornell University

Problem of crystal structure prediction: - theoretical importance - practical importance for design and industrial processing of molecular materials (pharmaceuticals, pigments, optical materials, explosives) Polimorphism is the ability of a particular substance to exist in several different crystal forms. Polymorphs of a compound differ in their physical and biological properties. The solubility and rate of dissolution can vary, which influences the bioavailability. Differences in density, hardness, and crystal morphology can influence the handling properties in the production process. Color is dependent on the crystal packing and is essential for pigments. The existence of different polymorphs can cause problems with patent protection.

The stability of crystal structure at a given temperature and pressure is determined by its free energy G(T,P) = U(T,P) + pv TS Thermodynamic hypothesis: The observed structure corresponds to the global minimum of the free energy (G) The thermodynamic stability is not always the decisive factor because kinetics plays an important role, influenced by crystallization conditions. We can assume that the observed structure is among structures having a ow free energy. The latter can be approximated by the potential energy U (thermal effects may contribute as much as 2 kcal/mol).

a Crystal structure prediction is search for lowest energy minima of potential energy (which correspond to possible polymorphs) given only the atomic connectivity for an organic compound. c t x, t y, t x, ϕ, θ, ψ t x, t y, t x, ϕ, θ, ψ b E=E(a,b,c,α,β,γ,t x1,t y1,t z1,φ 1,θ 1,ψ 1, 6+Z(6+N) variables, where Z is number of molecules in the unit cell, N is number of torsional angels in the case of flexible molecule. Global optimization of potential energy (E) = multiple minima problem PROBLEM: huge number of local minima = a very efficient global optimization method is needed.

A good potential function is a necessary element for any algorithm for crystal structure prediction. On the other hand, crystal structure prediction algorithm based on global optimization is a very useful tool for assessing the quality of a given potential. Good (predictive) potential function: (a) Should reproduce the experimental structure within a certain accuracy (b) If experimental value of sublimation enthalpy is known, it should be reproduced as well. (c) The structures corresponding to the lowest-energy minima found for the potential should represent plausible structures, and one of them, preferably the global minimum, should correspond to the observed most stable experimental structure Most potentials are derived based on a series of local minimizations instead of a global optimization, therefore they may not satisfy criterion (c).

Blind test of crystal structure prediction of small organic molecules (organized by the Cambridge Crystallographic Data Centre) May 1999 Acta Crystallographica, 2000, B56, 697-714 May 2001 - Acta Crystallographica, 2002, in press

Global optimization methods for crystal structure prediction Self-Consistent Basin-to-Deformed-Basin Mapping (SCBDBM) Method Deformation-based method developed in Scheraga group in 1998. It has been successfully applied to many different systems such as proteins, Lennard-Jones clusters and crystals. Conformation-Family Monte Carlo (CFMC) A variant of Monte-Carlo Minimization (MCM). It has been recently developed in our laboratory in application to proteins and crystals.

Monte Carlo-Minimization (MCM) 1. Select a conformation at random, and minimize the potential energy. 2. Select any variable at random. 3. Make a random change in the variable. 4. Minimize the energy of the new conformation. 5. Compare E new to E old by means of the Metropolis criterion. 6. Iterate.

Conformation-Family Monte Carlo (CFMC) Helvet.Chim. Acta 2000, 83 (9), 2214-2230; PNAS 2001, 98 (22),12351-12356 The central element of the CFMC method is the structure family database Uses Metropolis criterion to move between families Uses Boltzmann distribution to choose conformation from a family Does not move between structures, but between families It is equivalent to smoothed staircase deformation of a potential function It has been successfully applied for protein structure prediction and performs as well as the Conformational Space Annealing (CSA) Original function MCM CFMC

Generate and minimize N f random conformations to initialize the conformation-family database Select the lowest-energy family as the generative family Choose conformation C from generative family F Step 1 CFMC algorithm Step 1. A conformation C is chosen at random from the generative family F, with a probability proportional to its Boltzmann weight. Yes zero Step 3a. Make new family F Generate new conformation C from conformation C Is C too close to a database structure? No To how many families does C belong? Step 2 Step 3a Step 2. This conformation C is modified to yield a new conformation C. Step 3a. If the new conformation C is unconnected to any family in the database, a new family F is created whose sole member is C. If the number of families in the database exceeds the limit N f, the family with the highest energy is eliminated.

zero Step 3a. Make new family F To how many families does C belong? one Step 3b. Add C to the family F x Update database to satisfy limits N f and N c If F F, apply Metropolis test several Step 3c. Merge all families of C into one family F Steps 3a, 3b, 3c Step 3b. If the new conformation C connected to exactly one family F in t database, the conformation is added to t family. If N > N c, the conformation with t highest energy is eliminated. Step 3c. If the new conformation C connected to more than one family in t database, these families are merged into single family F. Two families are merged if has lower energy than every conformation the higher-energy family and C has a low energy than at least one conformation of t lower-energy family. Accept F? Set F=F Step 4 Step 4. If the new family F F, a Metropo criterion is applied to determine whether to make the generative family. F becomes the new generati family if it has a lower energy than F, or if Boltzmann factor exp(-β E/kT) > random generated number in the interval (0,1)

The Methods for Producing New Conformations There are two general classes of moves used in the CFMC method: - internal (or local) moves, intended to improve the low-energy conformations within a family - external (global) moves, intended to search the conformational space for new families. Families and structures are always chosen according to Boltzmann distribution In order to avoid atomic clashes all newly generated structures are expanded before local minimization. All newly generated structures are locally minimized.

CFMC moves 1. Perturbation of translations of all molecules within unit cell by 2. Perturbation of all rotations of molecules within unit cell by 3. 4. Averaging. Every variable is calculated according to: ν * = i ν(1) x + i ν(2) i (1 - x) internal moves DPERT_2 D_PERT2A Systematic search in rotations for one molecule All variables between two structures from the same family external moves D_PERT1 D_PERT1A All degrees of freedom are perturbed All variables between two structures from different families

Input data Covalent structure of target molecule Force field parameters Number of molecules per unit cell, as a parameter Variables in the calculation (all vary independently) Internal degrees of freedom (torsional angles) Intermolecular distances (translational vectors of molecules) Euler angles for molecular rotations Components a x, b x, b y, c x, c y, c z of the lattice vectors (basis vector a of unit cell coincides with the x axis, vector b lies in the (x,y) plane, and lattice vectors form a right-handed system

Potential Energy E el E tot = E el + E nb + E tor obtained from Coulomb formula with Ewald summation. Point charges on atoms and additional sites E nb calculated with Lennard-Jones 6-12 or Buckingham 6-exp potential function E tot calculated using real part of third-order Fourier expansion E = 3 tor k m (1 cos mω) i= 1 (ω, k 1, k 2, k 3 obtained by fitting E tor to difference between ab initio and molecular mechanic (E el +E nb ) profiles)

Structure comparison for identification of a family The following values are being calculated and stored for all structures from the database: total energy E tot and volume of the unit cell V unit cell parameters of the unit cell a, b, c, α, β, γ set of shortest interatomic distances within a cutoff radius r d sorted according to their value Structure comparison is carried out using the similarity measure calculated according to the ormula: R = E i -E j / E + V i -V j / V + max r ik -r jk / r, where E, V, r are preset values of the largest deviations of energies, volumes and interatomic distances, allowed for equivalent structures. n order to speed up calculations, all local minimizations (except final ones) are carried out with elatively low accuracy. The geometrical focusing is achieved by gradual lowering R cutoff for structural differences.

Application to Crystal Structure Prediction PNAS, 2001, 98 (22), 12351-12356 Two different potentials have been tested: AMBER and W99. Set of nine rigid and flexible organic molecules was used: N N N O NO 2 N N O 2 N O BENZEN PRMDIN IMAZOL06 PHYPHM DNITBZ03 OH H 2 N OH N O O N O 2 N NITPOL03 FORAMO01 BENZON LIQVUC None of the above potentials was perfect, however W99 force field performed slightly better,.e., it is closer to satisfy criteria (a)-(c) discussed earlier.

Imidazole - CFMC global optimization -76-76.5 Lowest-energy family Family 2 Family 3 Family 4 Energy (kcal/mol) -77-77.5-78 -78.5 40 1040 2040 3040 4040 5040 Number of minimizations

Imidazole - CFMC global optimization - 200 minima 18 16 14 Number of minima 12 10 8 6 4 2 0-78.6-78.1-77.6-77.1-76.6-76.1-75.6 Energy (kcal/mol)

-75 Imidazole - CFMC global optimization - 200 minima -75.5-76 Energy (kcal/mol) -76.5-77 -77.5-78 -78.5-79 0 50 100 150 200 minimum rank

W99 AMBER rank E rank E BENZEN 2 0.19 4 0.14 PRMDIN 11 0.11 83* 0.55 IMAZOL06 24 0.79 2 0.02 PHYPHM 1 0.00 1 0.00 DNITBZ03 1 0.00 1 0.00 NITPOL03 19* 1.44 9* 0.87 FORAMO01 >200* 3.82 >200* 6.68 LIQVUC 2 0.04 5 0.52 * - experimental structure was not found during the global search

Imidazole (AMBER)

PHYPHM (AMBER)

DNITBZ (W99)

LIQVUC (W99)

(a) (b) (c) Minimized experimental structure (green, no 13 above global minimum) Global minimum Global minimum (optimized) Propylene oxide example of potential optimization

Massive parallelization worker 1 worker 2 worker 3 master worker 4 worker 5 worker 6 The CFMC algorithm has been parallelized at the coarse-grain level all local minimizations are carried out in parallel by workers. The master is maintaining the database of structures and it is making all decisions within the algorithm. MPI has been used for communication.

Efficiency of parallelization of the CFMC algorithm CFMC (ethlene Z=4) 200 180 number of energy evaluations / second 160 140 120 100 80 60 40 20 0 0 10 20 30 40 50 60 70 number of processors

Efficiency of parallelization of the CFMC algorithm CFMC (ethlene Z=4) 1.0 0.8 efficiency 0.6 0.4 0.2 not counting master master included 0.0 0 10 20 30 40 50 60 70 number of processors