Current approaches to predicting molecular organic crystal structures

Size: px

Start display at page:

Download "Current approaches to predicting molecular organic crystal structures"

Morgan Watkins
6 years ago
Views:

Crystallography Reviews ISSN: 0889-311X (Print) 1476-3508

com/loi/gcry20 Current approaches to predicting molecular

crystal structures, Crystallography Reviews, 17:1, 3-52, DOI:

517526 To link to this article: https://doi.org/10.

Submit your article to this journal Article views: 1671 View

1 Crystallography Reviews ISSN: X (Print) (Online) Journal homepage: Current approaches to predicting molecular organic crystal structures Graeme M. Day To cite this article: Graeme M. Day (2011) Current approaches to predicting molecular organic crystal structures, Crystallography Reviews, 17:1, 3-52, DOI: / X To link to this article: Published online: 11 Jan Submit your article to this journal Article views: 1671 View related articles Citing articles: 100 View citing articles Full Terms & Conditions of access and use can be found at

2 Crystallography Reviews Vol. 17, No. 1, January March 2011, 3 52 Current approaches to predicting molecular organic crystal structures Graeme M. Day* Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK (Received 29 June 2010; final version received 18 August 2010) Considerable effort has been invested in developing methods for predicting the crystalline structure(s) of a given compound, ideally starting from no more than a structural formula of the molecule. Reliable computational predictions would be of great value in many areas of materials chemistry, from the design of materials with novel properties to the avoidance of an undesirable change of form in the late stages of development of an industrially important molecule. Methods used in crystal structure prediction are reviewed, with particular focus on the most common approach global lattice energy minimization. Progress and current limitations are highlighted, with reference to examples from the literature and the results of blind tests organized to objectively monitor developments in the field. Keywords: crystal structure prediction; modelling; lattice energy; polymorphism Contents page 1. Motivation for computational prediction of crystal structures 4 2. Computational methods The molecular model Crystal structure generation Size of the search space Algorithms for structure generation Clustering of structures Lattice energy calculations Atomistic methods Electrostatic models Performance of lattice energy minimization for CSP Flexible molecules Alternative methods for lattice energy calculations: solid-state DFT and semi-classical density sums Beyond lattice energy calculations Dynamic contributions to the energy Lattice dynamics 29 * gmd27@cam.ac.uk ISSN X print/issn online ß 2011 Taylor & Francis DOI: / X

3 4 G.M. Day The influence of vibrational energy contributions in CSP Molecular dynamics Crystal properties and growth Morphology prediction Prediction from known structures The blind tests of CSP Concluding remarks and the future Requirements for progress in CSP and future prospects Structure searching The thermodynamic model Kinetics 42 Acknowledgements 42 Notes on contributor 43 References 43 Subject Index Motivation for computational prediction of crystal structures The ab initio prediction of crystal structures is a rapidly growing area of research and developments are being driven by both academic interest in the problem and the potential benefits to any industry that relies on delivering a product in crystalline form. The longterm aim of computational crystal structure prediction (CSP) is to propose the crystal structure(s) that will be observed for a given molecule, given no other information than the structural formula and conditions of crystallization. The structure of a crystal is a necessary starting point for understanding and predicting its physico-chemical properties. Furthermore, the calculated properties of predicted crystal structures could be used as a screen for crystals with novel properties, such as a high non-linear optical response (which requires a non-centrosymmetric packing of the molecules). The calculations can be performed in advance of synthesizing the molecule, so they have the potential to streamline research by narrowing down experimental investigations to the most promising candidates. CSP is also of great interest to the pharmaceutical industry because of the prevalence of polymorphism (different crystal structures of a given compound) of pharmaceutical molecules. The occurrence of polymorphs is a potentially disastrous problem, should it occur without warning late in the processing cycle of a pharmaceutical molecule. The most notorious example of the consequences of unanticipated polymorphism is the temporary withdrawal from the market of the protease inhibitor Ritonavir (1, 2); the scientists dealing with the impact of the new polymorph concluded that it is highly advisable to put enough resources to carry on exhaustive research to identify the most stable and all possible polymorphs (1). Computational methods of predicting crystal structures could play a large part in this research, both in identifying all possible polymorphic forms and judging their relative stabilities an examination and evaluation of all possible crystal structures of a given compound is a step towards the understanding and, perhaps, control of polymorphism.

4 Crystallography Reviews 5 While industrial applications may be a major motivation for the development of CSP, surveys of the CSP literature up to 2004 ((3, 4); covering about 20 years since the first CSP studies were reported) showed that, up to that point, pharmaceuticals made up only 4% of systems studied, and pigments a further 3%. Hydrocarbons, on the other hand, represented 22% of molecules studied. The types of molecules that had been studied reflect a greater focus on method development, where relatively simple model systems are being examined to assess the assumptions and methodologies, than applications of CSP to industrially important molecules. However, over the past few years, the size and complexity of systems studied has increased significantly, with some impressive applications to structure prediction of larger, flexible pharmaceutical molecules (5, 6), as well as salts (7 10), cocrystals (11 13) and solvates (14 17). Over 20 years ago, Maddox (18) described our inability to predict a molecule s crystal structure as a scandal and one would have thought that, by now, it should be possible to equip a sufficiently large computer with a sufficiently large program, type in the formula of the chemical and obtain, as output, the atomic coordinates of the atoms in a unit cell. Over the next few years, programs started appearing that attempted to do just this and, in 1994, Gavezzotti (19) addressed the fundamental question Are crystal structures predictable? while reviewing the difficulties and necessary advances in methodology, his answer wavered between No and a qualified Maybe. Since then, the many approaches that are being developed have been assessed in four blind tests of CSP (20 23), where the predictions were performed in advance of the real crystal structures being made available. There were some successes under these blind test conditions during the first three of these blind tests (held between 1999 and 2004; (20 22)) but their results highlighted clearly a lack of reliability in the computational predictions. The latest blind test, held in 2007 (23) demonstrated significantly improved results, showing that great steps are being made towards an invaluable technology. 2. Computational methods Much of our understanding of the principles of close packing in molecular crystals was developed by Kitaigorodskii (24, 25) in the middle of last century. Qualitatively, the projections of a molecule fit into the hollows of its neighbours to minimize free volume in the crystal and the mutual orientation of molecules in a crystal is conditioned by the shortest distances between the atoms of adjacent molecules (24). The quantification of close packing started with the determination of volume increments for common functional groups (e.g. on average, a methyl group adds 23.5 A 3 to the volume per molecule of a crystal) and average intermolecular (or van der Waals) atomic radii in crystals. Using these parameters, the packing coefficient of the overwhelming majority of organic molecular crystals was found to fall in the range of (26). The description of molecules in crystals by their atomic radii and analysis of their packing in terms of atomic contacts naturally leads to an atom atom approach in the modelling of molecular crystals, where intermolecular interactions are described in terms of pairwise interactions between atoms. It is the suitability of the atomistic approach to computational methods that has led to a developing understanding of the energetics and, more recently, dynamics in molecular crystals. Thanks to increasing computing power available at relatively low cost, some tasks have become possible that were thought to be unmanageable in the early days of crystal

5 6 G.M. Day structure analysis. Particularly, while describing the many possibilities for close packing in molecular crystals, Kitaigorodskii (24) remarked that it would not, of course, be feasible to carry out one-by-one examination of all possible packings of molecules for unit cells of different symmetries and different dimensions. In a way, this task can now be performed on high performance, or even desktop, computers in studies aimed at predicting the crystal structure of a molecule. For most of the crystal structures in such studies, the examination is normally limited to a computation of some scoring function (usually the lattice energy) while a selection of the most promising structures is often investigated in more detail. The most common approach to the challenge of CSP has been to search for all possible crystal packings and calculate their lattice energies (3). The lowest energy structure (the global minimum) is assumed to be the most likely crystal structure and crystals with competitive energies are possible polymorphs. In this approach, kinetics of crystallization are ignored and the two major challenges are: (i) the search for all possible ways to pack the molecule in a crystal lattice (27 ) and (ii) accurately calculating the energies of these putative structures (28). Such calculations are becoming somewhat reliable for small molecules, at least in proposing the observed forms amongst a handful of possibilities (29 31). While there are many variations in methodology, the main steps in this approach are nearly always the same (Figure 1): building a molecular model from the known chemical bonding in the molecule; generating all possible ways of packing the molecule in a reasonable crystal structure and evaluating energies and properties of these putative structures to decide which are most likely. Each of these steps presents its own challenges and requires careful attention; a poor choice or implementation of methodology at any stage will seriously reduce the chances of locating all likely polymorphs. Specifics of the approaches currently being applied to CSP are described in the following sections The molecular model The starting point of energy or property calculations for a known crystal structure is the set of atomic coordinates, taken directly from that structure. Slight corrections might be made to account for errors in experimental determinations, such as normalization of bond lengths to hydrogen atoms in X-ray determined crystal structures (32). However, in a prediction study, we are interested in crystals with unknown structures no atomic coordinates are known, so our first step must be to build a three-dimensional model of the molecule. This is our building block for generating potential crystal structures. The most ambitious use of CSP is to perform the predictions in advance of even synthesizing the molecule. The computational results for a molecule might influence whether the time and expense is then invested in its synthesis. In such an application, the starting point for calculations could be as simple as the chemical diagram defining atom connectivity. Even where one crystal structure is known and CSP is part of a screen for other polymorphs, it is best practice to take a neutral starting point instead of the molecular geometry from the known crystal structure, so as not to bias calculations towards that known form. This molecular structure is normally the result of a geometry optimization of the isolated molecule. At least for small molecules, calculating the threedimensional molecular structure is fairly routine force fields can provide a starting point or quantum mechanics may be used for greater accuracy. The development of

6 Crystallography Reviews 7 Figure 1. Diagrammatic summary of the lattice energy minimization approach to CSP. Structures and data reported in (31). density functional theory (DFT) has reduced the cost of high quality electronic structure calculations and extended the size of molecule for which quantum mechanics is affordable. There are two main considerations at this stage: the number of stable conformations and, for a given conformation, the effect of packing forces on the molecular geometry. The equilibrium geometry of the molecule in a crystal is a balance of the forces within the molecule with the forces between molecules. These two influences are often on different scales the interactions between molecules are much weaker than the covalent bonds that hold molecules together. For this reason, the geometry of the isolated molecule is generally a good starting point for generating possible crystal structures. Large changes in bond lengths or angles generally require more energy than can be supplied by intermolecular forces; bond lengths in crystal and free-state conformations, determined experimentally or computationally, typically differ by hundredths of an A, while differences in bond angles are usually smaller than a few degrees (24, 33 36). The intramolecular parameters that are most significantly affected by the crystalline environment are torsion (dihedral)

7 8 G.M. Day angles usually rotations about single bonds, but also pyramidalization of atoms (e.g. the nitrogen atom in an amine group) or puckering of rings. Such changes in geometry are often possible at a small cost in conformational energy. The energy cost for such distortions, which is specific to the atoms involved and their exact bonding environment, can be calculated from a force field or quantum mechanics. The energy profile places limits on packing-induced distortions; conformational strain energy in crystals rarely exceeds 1 kcal mol 1 (4 kj mol 1 (37 )). For rotation about a single bond, this often corresponds to a possible rotation away from the equilibrium position. A search of the Cambridge Structural Database (CSD (38)) for molecules that are similar to that of interest can also highlight the range of likely conformations. Most of the development of CSP methodologies has focussed on small, fairly rigid molecules. Therefore, many software packages developed for the prediction of crystal structures constrain the molecule to being rigid, at least in the initial generation of trial structures (27). Even if the molecular structure is allowed to relax in a final lattice energy minimization, conformational energy barriers are rarely crossed at that stage and the molecule normally remains close to the initial chosen geometry. For this reason, if several stable conformations of the isolated molecule exist, then each distinct conformation must be treated separately a geometry for each conformation is optimized using a force field or quantum mechanical method and predictions must then be performed for each conformer (5, 6, 29). In many studies, the molecule is kept rigid throughout all steps of the CSP calculations, so we must recognize the influence of small changes in the assumed molecular structure on calculated structures and their energies. As an example, small differences in the assumed molecular geometry of paracetamol (acetaminophen) lead to large variations in the calculated energy difference between the two known polymorphs (39): 3.6 kj mol 1 with an ab initio optimized molecular geometry, and ranging from 1.0 to 7.2 kj mol 1 with molecular geometries taken from various structural determinations. CSP usually generates a multitude of possible crystal structures within a small energy range (sometimes 50þ in a 5 kj mol 1 window above the global minimum (29)), so changes in relative energies of a few kj mol 1 can rearrange the relative stabilities of a large number of the predicted crystal structures. The lattice energy minimization approach relies on getting these relative energies correct, so the choice of molecular geometry is a crucial step in a successful prediction. In the case of paracetamol, the ab initio (HF/6-31G**) optimized molecular structure led to successful predictions of the two common polymorphs as the two lowest energy structures (40). In many molecules, there are one or more soft degrees of freedom and a continuum of conformations is accessible perhaps 20 either side of the equilibrium for the rotation about a single bond. In such cases, the rigid molecule approach will not be able to describe all possible crystal structures. Such molecular flexibility affects both the generation of crystal structures and the refinement (energy minimization) of these trial structures the full range of conformations must be accounted for in the search and the conformation must be allowed to adjust to its environment in each individual crystal structure. The simplest solution to this problem is the molecular mechanics approach the intramolecular degrees of freedom are described by simple mathematical models, whose parameters are sometimes fitted to reproduce quantum mechanical energy calculations. A classic example of a soft molecular distortion is the twist angle between the phenyl rings in biphenyl. In the isolated molecule, the angle between the rings is approximately

8 Crystallography Reviews 9 45 (41), the molecule is planar in the room temperature crystal and only slightly twisted in the low-temperature incommensurate phases (42). For this molecule, predicting structures with a fixed gas phase optimized molecular structure would certainly fail. However, a simple description of the intramolecular potential can adequately describe both the isolated molecule and crystal phase conformations (43 45) Crystal structure generation Having decided on a starting molecular geometry, or a set of conformations, and possibly a force field to describe their flexibility, the next stage in the structure prediction is to explore the possible close packed arrays of this three-dimensional molecular model. In the early days of modelling molecular crystals, hypothetical alternatives to known crystal structures were sometimes derived from the packing of similar molecules (see Pertsin and Kitaigorodskii (26) and references therein). The development and implementation of algorithms for exploring all crystal packing possibilities of a molecule started in the mid-1980s with Dzyabchenko s (46) search for the low-energy crystal structures of benzene. Many more approaches have since been applied to the problem, starting in the early 1990s (e.g. (47 49)). Here, an attempt is made to overview the methods that are currently in use Size of the search space The generation of all plausible crystal structures means searching for all of the minima on the potential energy hypersurface, which is defined by the degrees of freedom that describe the geometry of the system. For crystal structures, these structural variables might be unit cell parameters or the positions of molecules in the unit cell. Such an energy surface could be sampled using a very simple approach of creating structures at points on a grid: M equally spaced values are chosen for each structural parameter. The M values for one variable must be combined with M values for each other variable so that a total of M N structures are generated, where N is the number of degrees of freedom. While the methods used in real studies are more advanced than such a grid-based approach, the complexity of the energy surface and the size of the search space clearly grow very rapidly with increasing dimensionality. The search for all possible crystal structures of a molecule is a challenge because of the large number of degrees of freedom. For a molecular crystal, there are six degrees of freedom in the unit cell dimensions (the three cell lengths a, b and c, and three angles, and ) and, assuming rigid molecules, a further six degrees of freedom for each molecule in the unit cell three positional and three orientational coordinates per molecule. When rotation about single bonds is possible, the packing can be strongly influenced by distortions of the molecular structure and some computational approaches relax the rigid molecule constraint. The conformational flexibility of such molecules can be treated by including a few variable torsion angles as searchable degrees of freedom, to be varied along with the unit cell parameters and molecular positions (see, e.g. (50)). Even with the dimensionality of the problem restricted by the rigid-body approximation, the search space is enormous; searching say 30 degrees of freedom (a unit cell containing four molecules) for all possible low-energy structures is an incredibly demanding task. Therefore, the problem is often simplified even further, thanks to the majority of organic molecules crystallizing in one of a small set of space groups. For example, 80% of homomolecular organic crystal structures have either one or a

9 10 G.M. Day fraction of a molecule in the asymmetric unit (Z 0 1) in one of the six space groups: P2 1 /c; P 1; P ; P2 1 ; C2/c and Pbca (51). Therefore, most of the possible crystal structures for a molecule can be found by generating structures in a limited set of space groups with one molecule in the asymmetric unit; crystals with Z are automatically considered, as molecules with internal symmetry are free to find special positions in the crystal structures (e.g. the calculations can place a centrosymmetric molecule on a crystallographic centre of inversion). Generating structures within a given space group is a much more manageable task because restrictions are imposed on the relationship between molecules in the unit cell only the position and orientation of the molecules in the asymmetric unit must be considered. The shape of the unit cell is also usually constrained by the space group. For example, the position and orientation of only one of the four molecules in a P2 1 /c, Z 0 ¼ 1 unit cell must be varied independently and two of the angles defining the unit cell are fixed at 90 : 10 degrees of freedom must be searched instead of 30 for a general unit cell containing four rigid molecules. The use of space group symmetry facilitates a search, but it means choosing a set of the most likely space groups in advance, so risks missing important structures in less common space groups molecular crystal structures are known in all 230 space groups, although for many of these only a handful of structures have been reported. Furthermore, the contents of the asymmetric unit are fixed when using space group symmetry in a crystal structure search. While most molecules crystallize with Z 0 1, about 10% of homomolecular organic crystals have more than one molecule in the asymmetric unit (Z (52)); 1 in 10 crystal structures would be missed in CSP unless searches are performed with several independent molecules in the asymmetric unit. Due to these risks, some approaches to generating crystal structures do not use symmetry to restrict the size of the search space. Instead, unconstrained unit cells are searched with a fixed total number of molecules in the unit cell, Z. Separate searches must then be performed for different Z. Such searches must use highly efficient algorithms (and powerful computers) for searching the many degrees of freedom. The space group symmetry of each predicted structure can then be identified at the end of the calculations; several programs are available to identify symmetry in crystals, for example PLATON (53) and the symmetry finder in Cerius2 (54) Algorithms for structure generation The difficulty of the search for all possible crystal structures will obviously increase with (a) less symmetry relating the molecules in the unit cell (Z 0 4 1); and (b) increasing flexibility of the molecules. However, even with symmetry constraints, Z 0 ¼ 1, and the rigid molecule assumption, a complete search of the energy surface is a huge task and many approaches have been applied to the problem. A thorough review by Verwer and Leusen (27) in 1998 examined and contrasted the methods available at the time. The performance of many of the available methods has also been evaluated in the four blind tests of CSP (20 23). The most straightforward methods use a brute-force approach for searching all possible crystal structures either systematically generating crystal structures on a grid of the multidimensional search space or generating a large number of crystals with random values for the parameters (cell lengths and angles, molecular positions and orientations). The CRYstal Structure CAlculation (CRYSCA) program, developed by Schmidt and co-workers (55, 56),

10 Crystallography Reviews 11 is one example of such a program trial structures are randomly generated and these are then energy minimized. The molecular packing analysis (MPA) program (57 ) performs a similar random search. Dzyabchenko s (50, 58) program (PMC) generates structures in a grid-based search and the program UPACK (14, 59) has capabilities for both grid searches and random generation of structures. UPACK was originally written to predict crystal structures for monosaccharides and has even been successfully applied to predicting structures of the hydrates of such flexible molecules (14) very challenging searches. In both the grid and random search, many structures will have unrealistic densities; to avoid wasting computing time on energy-minimizing structures with large voids or overlapping molecules, the programs often consider only two of the cell lengths as independent variables, generating the third from an assumed cell volume. For search problems with a fairly low dimensionality, the systematic grid search method seems slightly more effective than a random search (14 ). However, the size of a grid search can quickly become unmanageable with increasing degrees of freedom (e.g. when the asymmetric unit contains more than one molecule) and randomly generating structures might be the better approach for the higher dimensionality problems. However, for very complex problems, random searches can be unreliable at generating all low-energy structures, at least in a reasonable number of trials (van Eijck (14) found a limit of around 20 degrees of freedom with 5000 starting structures). Therefore, several variations on the random search have been developed to increase its efficiency, as well as adaptations that take advantage of parallel computing environments to significantly increase the number of trial structures that can be generated. A recent adaptation of the random search replaces the random number generator with a low discrepancy sequence; such quasi-random sequences of numbers avoid the clusters and gaps of points that are often seen in random sampling, ensuring a more uniform (and therefore efficient) coverage of the energy surface. These sequences aim to provide the most uniform sampling possible at every point in the sequence. One benefit of such methods is that, unlike a grid search, the final number of trial structures does not have to be decided in advance of starting the search; this offers advantages for making the full use of available computing power. The procedures of both Della Valle and co-workers (60), and Karamertzanis and Pantelides (61) take this approach, using a Sobol (62) sequence of quasi-random numbers to generate their starting structures. Another modification of the random search procedure is a Monte Carlo search where random moves on the potential energy surface are accepted or rejected based on a probability that is related to the change in energy associated with that move. One application of a Monte Carlo search applied to CSP is the simulated annealing algorithm (27) implemented in the Polymorph Predictor modules of the commercial programs Cerius2 (54) and Materials Studio (63). This algorithm evolved from the earlier work of Gdanitz and co-workers (48, 64, 65). At each step in this Monte Carlo search, random changes are made to the crystal s structural degrees of freedom, the change in energy (DE) is calculated and the step is accepted if its Boltzmann factor exp( DE/kT) is greater than a random number in the range 0 1 (i.e. the Metropolis algorithm (66)). The simulated annealing proceeds through a heating phase where the temperature is increased until any move is accepted, allowing all energy barriers on the potential energy surface to be crossed and the entire search space can be sampled. The simulation is then slowly cooled and, as the temperature decreases, fewer high energy steps are allowed and the search focusses in

11 12 G.M. Day on the low-energy areas. Each accepted step through the heating and cooling cycle is retained as a trial structure, to be later refined by lattice energy minimization. Polymorph Predictor uses crystal symmetry in the searches, so the search is performed separately in each space group. Due to the stochastic nature of the process, several repeats of the simulated annealing must be performed to ensure a complete sampling of packing possibilities (a common mistake is to rely on the structures generated in a single heating and cooling cycle) the convergence of the search can be assessed after each repeat of the temperature cycle. Unlike the pure random and grid searches, the set of trial structures resulting from simulated annealing depends on a model for the energy of the crystal; in the other methods, energy calculations are not necessary until the trial structures are refined during energy minimization. This energy dependence from the start allows the simulated annealing approach to preferentially sample areas of search space with relatively low energy, avoiding spending time on unfavourable, high-energy trial structures. This methodology has been successfully applied to some complex systems with many degrees of freedom, such as molecular salts (10) and structures with high Z 0 (67 ). A more recent Monte Carlo variation, the conformation-family Monte Carlo global optimization scheme, was originally developed to search the conformation space of proteins (68) and its application has been extended to molecular crystals in the program CRYSTALG (69). Conformation-family Monte Carlo differs from a classical Monte Carlo search by making steps between families, instead of between single structures. Here a family is a set of numerically different representations of a single structure; these are identical structures in different settings (in the same way that identical crystal structures can be described in space group settings P2 1 /c and P2 1 /n). Keeping such families of identical structures leads to a wider coverage of the search space and the algorithm is efficient enough to perform well without the use of space group symmetry. In other respects, the search is similar to the Polymorph Predictor simulated annealing, except that the sampling is performed at a constant temperature. Polymorph Predictor and CRYSTALG achieve efficient sampling of all important (low energy) parts of the energy surface through special techniques of guiding the random steps between structures. A different approach is to distort the energy surface itself to create a more easily searched problem. Such a procedure, termed self-consistent basinto-deformed-basin mapping, temporarily smoothes the potential energy surface to a surface with fewer minima. This deformed surface is easily searched and the small numbers of minima are traced back to the deepest minima on the original potential energy surface by removing the deformation in a stepwise fashion. The best structures from one step are carried forward and energy minimized on the new energy surface, with extra structures being generated by random perturbations of all parameters. This procedure has been implemented for crystal structure generation using either a diffusion equation or distance scaling method for the smoothing transformation (70). The methods proved to be very efficient, allowing the experimental crystal structures of benzene and S 6 to be located with no use of space group symmetry the searches were performed with up to eight independent molecules in the unit cell (51 degrees of freedom (70)). An alternative to the grid-based, random and Monte Carlo approaches is the very common global minimization methodology called a genetic algorithm, which is based on the ideas of biological evolution. The descriptors of the crystal structure (lattice dimensions and molecular positions) are stored in genomes, which undergo operations, such as crossover (mating), mutation and selection as a population of trial crystals is evolved towards the best set of structures. The modified genetic algorithm for crystals

12 Crystallography Reviews 13 and cluster structures (MGAC (71 75)) program employs such a method to locate the lowest energy crystal structures. MGAC treats the molecular positions, orientations, flexible dihedral angles and unit cell angles as parameters. Cell lengths are then chosen to define the smallest asymmetric unit that encloses the molecules. The resulting crystal is energy minimized and the genetic algorithm operations (crossover, mutation and selection) are applied to these locally minimized structures. An attractive feature of the genetic algorithm approach is the flexibility in the choice of scoring and penalty functions (which control selection). MGAC seeks to minimize the lattice energy, while another method, developed by Motherwell (76) and implemented in the RANCEL program, avoids the use of energy altogether. Instead, RANCEL uses a scoring function derived from structural information in the CSD (38). The crystal structures of molecules that are similar (in molecular formula and chemical functional groups) to the target molecule are retrieved from the CSD and intermolecular atom atom distance frequency curves are derived for each pair of atom types. The genetic algorithm then searches for the crystal structures whose atom atom distances fit these curves as well as possible. Penalty functions can be also defined to punish structures without certain structural features, such as common hydrogen-bonding motifs. The distance frequency curves seem to contain enough information about the three-dimensional packing around atoms that the correct crystal structure can sometimes be successfully predicted without recourse to any energy calculations (76). The method is particularly useful for modelling molecules containing atoms or functional groups that are considered difficult for traditional, energy-based modelling methods. In theory, such homology considerations could be combined with energy calculations in the selection of structures during a genetic algorithm search. The MOLPAK software (49) describes crystal structures by a coordination geometry around a central molecule, instead of by their unit cell dimensions and space group. The most common coordination geometries (consisting of the 14 nearest neighbours to the central molecule) have been derived from the CSD and each of these is searched separately for the densest possible packings by systematically varying the orientation of the central molecule. At each orientation of the central molecule, a line of molecules is brought into close contact, lines are then packed together to form a two-dimensional grid, and these grids are then packed along the third dimension. The close packing between molecules is controlled by a repulsive interaction between atoms, and molecules are brought together until a threshold repulsive energy is reached. The densest crystals resulting from the search are then refined through energy minimization using a more complete energy model. One of the most recently developed methods employs molecular dynamics (MD) simulations to explore packing space. MD, where atoms follow Newton s equations of motion, would normally be inefficient at sampling all possible minima on the energy surface, because high-energy barriers between structures tend to confine the simulation to the neighbourhood of one crystal structure for long simulation times. The modified MD method, which has been named metadynamics (77 ), forces a quicker evolution of the structure by dynamically adding a biasing potential to the energy surface which guides the simulation away from all structures that have already been sampled. Because of the MD basis of the search, a metadynamics simulation samples the free energy surface. The free energy surface tends to be smoother than the lattice energy surface that is sampled by other temperature-free methods because low-energy barriers between related structures are overcome. Any minimum on the lattice energy surface whose potential well is

13 14 G.M. Day shallower than about k B T will not be a stable structure at that temperature, so that a search for stable structures on the free energy surface will not locate these structures and leads to a smaller number of candidate structures than lattice energy-based methods. In the case of benzene, the result is that all known polymorphs are found as free energy minima, and no other unobserved structures remain (78). This contrasts with lattice energy searches that locate many possible crystal structures of benzene (79) and indicates that the many unobserved predicted structures lie in shallow lattice energy minima. These results are promising and show that MD simulations are an effective method of eliminating many of the unobservable structures that result from lattice energy searches. However, the effect of metadynamics on smoothing the energy surface is less effective for a system with strong hydrogen bonds and larger energy barriers between local minima on the lattice energy surface: for fluorouracil, the majority of crystal structures resulting from a lattice energy search are stable in a MD simulation at ambient conditions and only about 25% of the many lattice energy minima are eliminated by considering the free energy surface (80). Furthermore, the metadynamics simulations of fluorouracil only sampled crystal structures with the same hydrogen bond motif as the structure that was used as a starting point of the simulation. Overall, the metadynamics method could be useful for studying the stability of different structures and phase transitions between closely related structures, but does not seem to be an effective approach to producing all possible crystal structures of molecules that form strong intermolecular interactions. All the above methods assume three-dimensional translational symmetry from the start, varying the unit cell dimensions and molecular positions simultaneously. A different class of methods starts by searching for stable clusters of molecules; translational symmetry is then applied to the most promising of these nuclei to build trial crystal structures. The rationale for these methods is that certain strong interactions are crucial to the development of the crystal and that these interactions can easily be determined in small groups of molecules before worrying about how these will pack in the crystal lattice. One of the first CSP programs, Gavezzotti s (47, 81) PROMET, is based on this concept small clusters are built from the molecular structure using the most common symmetry operators in organic crystals the inversion centre, translation, 2 1 screw and glide. Translational symmetry is then imposed in one dimension, creating a string of molecules. If the string is stable, translational symmetry is imposed in a second direction to form layers, and the structure is finally expanded into a three-dimensional crystal. Many trial crystal structures can usually be generated from each cluster by systematically varying the three translation vectors (though one or more cell vectors are often fixed by one of the symmetry operators used to build the original cluster). A similar approach is taken to generating crystals in Hofmann s (82, 83) FlexCryst program. However, unlike the atom atom potentials that are used to judge the stability of clusters in PROMET, FlexCryst uses group-based interactions (e.g. phenyl ring phenyl ring, methyl phenyl or hydrogen bonds) in the initial search for clusters. Potentials that are trained by data mining (82, 84) are used to refine and judge the final crystals. A unique feature of FlexCryst is that the program discretizes configuration space all interaction sites are moved to a point on a grid, allowing the scoring function to be tabulated and simply looked up for each interaction. This approach leads to significantly faster computation times than methods that require evaluation of potential functions for all atom atom pairs. However, this comes at the expense of sacrificing accuracy in the final calculated structures.

14 Crystallography Reviews 15 In short, a host of methodologies has been thrown at the task of crystal structure generation this review summarizes many, but certainly not all, approaches that have been applied to this problem. Few head-to-head comparisons of the programs have been made to assess their relative strengths. Even in the CSP blind tests (20 23), whose results are discussed in a later section, each participant applied different computational resources to the project, so the search algorithms are not compared on a level playing field. As the blind tests continue, certain methods may emerge as the most reliable. For now, the choice of search algorithm depends on many factors: personal preference and experience; cost (in the case of commercial software); ease of implementation versus complexity of the problem; suitability to the computing setup of the research group; and the final aim of the calculations. An expensive and thorough search method would be preferred for polymorph screening applications, while faster approaches would be better suited in the context of testing hypotheses in crystal engineering and the design of targeted structures. Progress is being made in developing more efficient approaches that allow more and more difficult problems to be addressed multi-component crystals and highly flexible molecules. Furthermore, recent search algorithms are taking advantage of distributed and parallel computing, allowing energy minimization of trial crystal structures (61). These very extensive searches should give a complete coverage of the search space, preventing any structure from being overlooked. Furthermore, such computing power allows more space groups (for search methods that rely on a pre-defined space group symmetry) or more values of Z (for search methods that treat all molecules in the unit cell independently) to be considered, so that crystal structures with less common symmetry need no longer be problematic Clustering of structures The structure generation phase of CSP calculations typically creates trial structures and these structures must usually be relaxed to minimize the energy or cost function before being analysed. Therefore, clustering algorithms are used at the final or intermediate stages to remove identical structures and save computing (and human) time. The clustering method is an important part of the process it must be both quick and discriminating so that identical structures can be identified rapidly, but distinct structures are not erroneously discarded. The commonly used methods include comparison of: reduced cells (60); standardized unit cells, molecular positions and orientations (85, 86); the geometries of close atom atom contacts (87 ); and distributions of inter-atomic separations (27 ) Lattice energy calculations There is a wide choice of methods for generating a set of trial crystal structures from the molecular model and most of these rely on energy calculations at some stage to guide the crystal structure generation (e.g. during a Monte Carlo simulated annealing or genetic algorithm search), for refinement of the trial structures through energy minimization, and usually for the final ranking of the possible structures. After the final energy minimization, the results of such a search are often summarized as a plot of lattice energy against some property of the crystal structure (often the density, volume per molecule or packing coefficient), as at the bottom of Figure 1. The set of structures, and their energies, is often referred to as the crystal energy landscape. Several types of landscape are possible, as

15 16 G.M. Day Figure 2. Possible outcomes of a CSP. Each point on the graph represents a distinct crystal structure. The large square data points represent the experimentally observed crystal structures. The same energy scale is used in all three plots. Results taken from the exp-6 þ multipoles calculations in (31). Figure 3. Cumulative number of computer-generated crystal structures within DE of the global minimum in lattice energy for three small organic molecules. Data taken from (31). illustrated in Figure 2, using results taken from (31). In the simplest case, one crystal structure is thermodynamically favoured and separated from the rest of the possible structures by a large energy gap (Figure 2a). More often, many possible structures are located in a small energy window above the global minimum (Figure 2b and c). In all three of these searches shown in Figure 2, one of the predicted structures (red squares in Figure 2) does correspond to the experimentally observed crystal structure. A cumulative density of states, counting the number of distinct crystal structures from the global minimum in lattice energy up to any DE (Figure 3), highlights the differences in the results for these three molecules. The first of these outcomes (Figure 2a) is an ideal situation for prediction by lattice energy minimization. One structure is clearly the thermodynamically most stable, so that small errors and approximations in the model potential should be unimportant compared to the energy gap separating this structure from all other possibilities. Furthermore, kinetic influences during crystal nucleation and growth would have to be very strong to produce one of the much higher energy structures. Indeed, this lowest energy structure (red square in Figure 2a) does correspond to the known crystal structure for this molecule.

16 Crystallography Reviews 17 However, after studies of many model systems, we now recognize this type of situation as atypical; usually, many nearly equi-energetic possibilities are found in a structure search (e.g. 100þ different crystal structures within about 6 kj mol 1 of the global minimum, dashed and dashed dotted curves in Figure 3). This observation of multiple low-energy minima is a consistent problem in CSP many more structures are found in a reasonable energy range than we would ever find as real polymorphs. As a consequence, the energy differences between possible structures are very small and minor changes to the way the energy is calculated could lead to significant reordering of the predicted structures. For one of these examples (Figure 2b), the four lowest energy predicted crystal structures are all within 1 kj mol 1 and the known crystal structure does correspond to one of these. Energy differences on the order of 1 kj mol 1 can come and go in lattice energy calculations, due to small changes in parameters, numerical considerations (e.g. tolerances used for convergence of the minimization) or details of the algorithm used to evaluate the lattice energies (e.g. summation cut-offs). Therefore, the energy differences between the observed structure and the two lower energy structures are smaller than the combination of errors and approximations in the energy calculations, and it is possible that the observed structure is in fact the thermodynamically most stable. However, in the third example (Figure 2c), there are many predicted structures with lower energies than the known crystal structure, by up to about 4 kj mol 1 (a small amount of energy by many standards, but large in the context of CSP). This type of result makes one suspect that kinetic factors are at work; the observed crystal structure is thermodynamically metastable, but forms in preference to its competitors during nucleation and/or growth of the crystal. Such observations are important for us to develop an understanding of crystallization and polymorphism, but to be confident in the conclusions drawn from a lattice energy search, we must use reliable models to evaluate the relative energies. There are many choices for how to calculate lattice energies of molecular organic crystals. The choice of model for a given molecule is always a balance between accuracy and simplicity the ease of use and implementation, as well as computational expense. Typically, thousands of structures must be energy minimized during a prediction and the required accuracy in relative energies is usually very high. In fact, the goal of CSP has been the main motivation for the development of accurate models for intermolecular interactions in crystals; the most common models for energy calculations and advances in this area have been reviewed several times (see, e.g. (28, 88, 89)) Atomistic methods The lattice energy of a crystal structure can be conceptually divided into the interactions between molecules and the conformational (intramolecular) energies of the molecules. When the molecular geometry is kept rigid throughout the calculations, the conformational energy is a constant for all of the predicted structures, so only the intermolecular energy must be evaluated. The total intermolecular energy in an assembly of molecules is usually approximated as a sum of pairwise potentials, U MN U latt ¼ XNmol U MN M5N ignoring all many-body contributions to the lattice energy. ð1þ

17 18 G.M. Day The most commonly used approach in lattice energy studies of molecular crystals is the atom atom potential method for the pairwise potentials, where the interaction between molecules M and N is expressed as a sum over atom atom interactions: U MN ¼ X U ik : ð2þ i2m,k2n Here, the sum is over all pairs of atoms i in molecule M and k in molecule N. Each pair of atoms is assigned a potential; the functional form of this atom atom interaction potential is usually isotropic (i.e. spherically symmetric) and typically consists of van der Waals (i.e. repulsion and dispersion) and electrostatic terms. There are two commonly used forms for the repulsion and dispersion interactions: the n-6 or Lennard Jones model U ik,repulsion dispersion ¼ A R n ik C R 6 ik, where n is most often chosen as 12, and the exp-6 or Buckingham model U ik,repulsion dispersion ¼ A expð B R ik Þ C R 6 ik : ð4þ R ik is the separation between atoms i (of type ) and k (of type ) and the models depend on the parameters A, B and C to describe the interaction between each type of atom pair. These atom types are normally chosen according to atomic number and sometimes further classified by hybridization or bonding environment (e.g. aliphatic vs. aromatic carbon atoms). The n-6 model s advantage is in computational convenience and speed, while the exponential form for the repulsive wall at short contact distances is theoretically better justified (90). However, the success of these simple models in CSP seems more dependent on parametrization of the model than the exact functional form (91). For the most common elements (C, N, O, H), there is an abundance of parameter sets, differing in the process used to derive values for the As, Bs and Cs. These parameters are usually derived empirically, to give the best possible fit to known structures or properties. Therefore, an important consideration in choosing a parameter set is the similarity of the molecule being studied to the molecules that were used in the parametrization. The fitting procedure usually involves varying the parameters simultaneously to reproduce the structures and properties (e.g. sublimation enthalpies, phonon frequencies) of known crystals much of the early development of this empirical atom atom approach and its application to the organic molecular solid state has been reviewed by Pertsin and Kitaigorodskii (26). Recently, the results of CSP searches have sometimes been incorporated in the parametrization process; the parameters are refined not only to reproduce structures and properties of known crystals, but also to ensure that the known crystal structure is as close as possible to the global minimum of a set of predicted structures for that molecule (92 94). While such empirically derived atom atom models are very commonly used and do lead to successful prediction results, there are several limitations with using empirically parametrized models in crystal structure modelling. A large set of known crystal structures is needed to sample a sufficient range of possible types of interactions at various interatomic separations. There are often insufficient data to fit reliable parameters for less common atom types and functional groups. Furthermore, the model potential should represent the energy of a static, temperature-less configuration of molecules, so that a lattice energy minimized crystal structure approximates the true structure at T ¼ 0K. ð3þ

18 Crystallography Reviews 19 Temperature should only enter the calculations through dynamical simulations. However, the structural data used in empirical parametrization are often determined at a range of temperatures. Ideally, only low-temperature crystal structures are used. However, when too few low-temperature structures containing the atom types of interest exist, it is often necessary to utilize room temperature structures. Thus, calculations using empirical model potentials refer to an ill-defined temperature a clear problem where the relative energies of polymorphs and the properties of crystal structures vary significantly with temperature. Due to the problems with empirical atom atom models, methods are being developed to systematically derive atom atom model potentials for organic molecular crystals from quantum mechanical calculations on isolated molecules and small clusters (95 104). Non-empirical models can be developed specifically for a given molecule or of a small family of molecules and quantum mechanical calculations can be used to probe many more interaction orientations and separations than are sampled in the known crystal structures used for empirical parametrization. Such non-empirically parametrized models usually lead to superior modelling of crystal structures (97, 101), properties (102) and more reliable lattice energy predictions of crystal structures than empirically parametrized model potentials (98, 100, 103, ). Furthermore, if the parameters are fitted to reproduce the pure intermolecular potential energy, then the resulting model potential is a valid starting point for dynamical simulations (discussed in Section 2.4), such as lattice dynamics or MD to investigate the temperature dependence of crystal structures, energies and properties. Another attractive feature of non-empirically derived models is that sufficient relative orientations of atoms in molecules can be sampled to allow the orientational dependence of van der Waals interactions to be easily assessed and incorporated in the model. Anisotropy in atom atom repulsion is known to be particularly accentuated in interactions involving certain atoms. For example, the polar flattening of halogen atoms (108) is sufficiently important to the crystal packing, thermal expansion and vibrational properties of chlorobenzenes that the isotropic atom atom form cannot adequately describe these crystals (102, 109) Electrostatic models Besides the repulsion dispersion model described above, almost all model potentials include a description of the electrostatic interactions between molecules. The simplest model assigns charges to each atom and the most appropriate schemes for partitioning the atomic charges choose these to reproduce the electrostatic potential (ESP) around the molecule (110). Along with an isotropic n-6 or exp-6 model, the inter-atomic interaction energy still depends only on the separation of the atoms. While a set of atomic point charges can give a reasonable approximation to the molecular ESP, we know that the electron distribution around atoms in molecules is not spherically symmetric the rearrangement of valence electrons upon bond formation results in localized lone pairs and -electron density, which cannot be adequately described by the isotropic atom model. This has been recognized for some time and several improvements on the atomic charge model have been used in modelling molecular crystals. The simplest enhancement is to define off-nuclear charge sites, at bond centres and chemically sensible lone pair positions (111, 112), or at positions that are optimized to give the best fit to the molecular ESP (113). An even more elaborate representation of the molecular charge distribution consists of sets of multipoles

19 20 G.M. Day (i.e. charge þ dipole þ quadrupole, etc.) centred on each atomic site. These multipoles can be derived directly from a calculated molecular wavefunction (114, 115) or fitted to the ESP, in the same way as charges are usually derived (99, 116). Atomic multipoles offer an important improvement in reproducing the molecular ESP adding just dipoles and quadrupoles to the atomic charges typically halves root-mean-squared errors in the ESP around organic molecules (117 ). Unlike atomic charge models, such distributed multipole models have proven to be remarkably successful in predicting the hydrogen bond geometries in complexes of small molecules (118, 119). Distributed multipole models also show superior performance in reproducing the structures (32, 117 ), phonon frequencies (120) and mechanical properties (121) of molecular organic crystals. The use of anisotropic electrostatic and van der Waals models for modelling crystal structures is currently limited to a few specialized modelling packages, such as DMAREL (122) and its revised version, DMACRYS (123). The value and necessity of more accurate models than the isotropic atom atom model should increase the popularity, as well as the availability of such calculations. Extra terms in the model are sometimes included to account for the other contributions to the intermolecular interactions, such as polarization and higher order dispersion coefficients. Furthermore, flexible molecule models require additional terms to describe bond stretching, angle bending and torsions for rotations about bonds. The simplest approach is to include such intramolecular terms to the model potential via a molecular mechanics approach. At its simplest, the intramolecular potential might then take the form U bonding ¼ X bonds 1 2 K bðb b o Þ 2 þ X angles 1 2 K ð o Þ 2 þ XX ½K n cosðnþš: These three terms model bond stretching, angle bending and bond torsions; b o and o are equilibrium bond lengths and bond angles and the force constants K are often fitted to ab initio quantum mechanical calculations on isolated molecules. A major challenge in the development of energy calculations is the merging of accurate intermolecular models with highly accurate descriptions of the intramolecular energy. For example, the conformational dependence of atomic multipoles hampers their use for flexible molecules (117 ). One solution to the conformational dependence of the molecular electrostatic model is to recalculate the atom charges and multipoles as the molecular geometry adjusts to the crystal packing during lattice energy minimization (106, 124). It is also clear from many studies that the molecular mechanics description of intramolecular energies is not sufficiently accurate for the final energy ranking of crystal structures in CSP studies of flexible molecules, where the conformation can vary between the possible crystal structures. One approach to achieve sufficiently accurate total (inter þ intramolecular) energies for CSP of flexible molecules has been to couple atom atom descriptions of intermolecular interactions with quantum mechanical electronic structure theory molecular energy calculations, either for the final energy evaluation of crystal structures that have been lattice energy minimized using molecular mechanics intramolecular energy models (5, 125) or iteratively throughout the lattice energy minimization procedure (124). While this approach has been successful, the calculations normally require at least one quantum mechanical molecular energy calculation per crystal structure. The computational cost of such calculations is therefore much greater than for rigid molecules, and the scaling of computing cost with molecular size is an issue for extending studies to larger, flexible molecules. ð5þ

20 Crystallography Reviews Performance of lattice energy minimization for CSP The success of the global lattice energy minimization approach to CSP could be assessed by surveying published results to judge the rate of success and limitations of the general method. The formidable task of reviewing all published results of lattice energy minimization for CSP was performed in 2001 by Beyer et al. (3). While this survey does give a feel for the utility of global lattice energy minimization and a perspective on progress in the field up to about the year 2000, the variations in methodology and lack of details in many published studies prevent a real assessment of this approach to CSP. Another approach to assessing the approach is to perform CSP calculations on a large set of molecules using a consistent methodology. Such a study was performed on a set of 50 small, rigid organic molecules where each molecule was treated with exactly the same method (29, 31). The molecular structure for each of the 50 molecules was calculated using DFT calculations, trial crystal structures were generated using a simulated annealing approach and, because small molecules with limited flexibility were chosen for the study, the trial structures were then lattice energy minimized in the rigid-molecule approximation. Lattice energy calculations were performed using an empirically parametrized exp-6 repulsion dispersion model potential and two electrostatic models were tested: ESP fitted atomic charges and atomic multipoles from a distributed multipole analysis of the calculated wavefunction. The crystal structures were first generated and lattice energy minimized with the simpler isotropic energy model and their energies were re-evaluated using the multipoles model to investigate the sensitivity of lattice energy ranking to the electrostatic model. The first important results relate to the success of crystal structure generation, with the caveat that the chosen set of molecules was not a difficult test for the search algorithm because one of the criteria for the selection of molecules was that they crystallized in one of the nine most common space groups, with Z 0 1. Some of the 50 chosen molecules have known polymorphs and, in total, there were 62 known crystal structures that were considered as targets for the calculation. Fifty eight (94%) of the known crystal structures were easily located in the simulated annealing search, which was initially repeated four times in each space group. When found, the observed crystal structures are generally very well reproduced by the prediction, with unit cell parameters usually within 3 4% of those in the observed structure. We can conclude that this crystal structure search method works extremely well for these small rigid molecules. Of the remaining four known crystal structures that were not located in these initial searches, three were located after additional (up to 13) repeats of the simulated annealing temperature cycle. Only one of the 62 known crystal structures was not located and this failure was discovered to be a failure of the chosen energy model in reproducing that structure, rather than the search algorithm itself (29). These results demonstrate that the search problem is not a major obstacle to CSP, at least for crystal structures with Z 0 ¼ 1. Given that the search method locates the known crystal structures, the overall success of the predictions depends on how well lattice energy minimization ranks the true crystal structures amongst the lists of computer generated possibilities. Two measures of the success of the lattice energy predictions were used to summarize the results for these 50 molecules: DE, the energy difference between the experimentally observed crystal structure and the lowest energy unobserved crystal structure; and N lower, the number of unobserved crystal structures with lower energy than the observed structure. The distributions of DE are shown in Figure 4. The results were fairly encouraging,

21 22 G.M. Day Figure 4. (a) Distributions of relative lattice energies of crystal structures of 50 small molecules from lattice energy minimization with exp-6 þ atomic charge and exp-6 þ atomic multipoles model potentials. (b) Distributions of relative lattice energies for hydrogen bonding and nonhydrogen bonding molecules, from the exp-6 þ atomic multipoles calculations. Reprinted with permission from (31). Day, G.M.; Motherwell, W.D.S.; Jones, W. Beyond the Isotropic Atom Model in Crystal Structure Prediction of Rigid Molecules: Atomic Multipoles Versus Point Charges. Cryst. Growth Des. 2005, 5, Copyright 2005 American Chemical Society. Data taken from (29, 31). even using the simpler energy model that employed isotropic repulsion dispersion and electrostatic atom atom models. With this simple model, the experimentally observed crystal structures are found as either the global minimum energy structure or with DE kj mol 1 in 36% of cases. The highest DE for any of the molecules was 7.3 kj mol 1. These results were improved when the atomic partial charge model of electrostatic interactions was replaced by the more elaborate multipole expansions (charge, dipole, quadrupole, octupole and hexadecapole on each atom): the number of known crystal structures with DE kj mol 1 was increased to 50% and the worst DE was reduced to 5.1 kj mol 1 (Figure 4a). A third of the observed crystal structures for this set of molecules are perfectly predicted, that is lower in energy than any unobserved predicted structure. In 69% of cases, five or fewer unobserved structures are assigned lower lattice energies than the real crystal structure with the exp-6 þ multipoles model. With this energy model, a handful of the best structures from a global lattice energy search will normally contain the observed crystal structure. 1 These results of this study demonstrate that a lattice energy search is a useful starting point for CSP and the improvement in prediction success that was attained by improving the energy model should motivate the further development of model potentials, as well as improvements in the thermodynamic model (e.g. by including dynamic/vibrational contributions to the free energy). Further analysis of how DE was influenced by molecular properties also suggested differences between hydrogen-bonding and non-hydrogenbonding molecules (Figure 4b): non-hydrogen-bonding molecules are more often predicted at or near the global minimum in lattice energy than hydrogen-bonding molecules and almost all molecules whose known crystal structures are located with high DE are those with hydrogen-bonded functionality. The differences in results for hydrogen-bonding

22 Crystallography Reviews 23 and non-hydrogen-bonding molecules probably result from a combination of factors. There may be inadequacies using the energy models applied in these studies in accurately modelling hydrogen bond energies and the relative stabilities of crystal structures containing different arrangements of hydrogen bonds. The polar environments in crystals of hydrogen-bonded molecules will polarize the molecular charge density, strengthening electrostatic interactions between molecules. However, the effect can vary greatly depending on the relative arrangement of polar functional groups. This polarization effect, which is responsible for cooperative effects in extended hydrogen bond networks, cannot be described using the pairwise additive atom atom approach. Polarization models are now being developed to improve atom atom models for polar, hydrogen-bonding molecules (126, 127). Apart from deficiencies in the energy model, the results in Figure 4(b) also suggest that hydrogen-bonding molecules are more likely to be kinetically trapped in metastable crystal structures than molecules with no hydrogen-bonding capability. This is physically reasonable from the point of view of the energy landscape controlling nucleation and crystal growth from solution. Strong directional interactions lead to high barriers between minima on the potential energy surface. Therefore, strong intermolecular interactions and the resulting molecular arrangements that are preferred in solution and the early stages of crystal nucleation are likely to persist in the final crystal structure, even if that arrangement of strong interactions does not lead to the thermodynamic global minimum, because the barriers to rearrangement into the lowest energy crystal structure are too large. Molecules that only interact via weak interactions, on the other hand, are relatively free to explore phase space in search of the deepest minimum on their smoother lattice energy surface Flexible molecules. Much of the development of methods for CSP has focussed on fairly simple, rigid molecules. However, the development of accurate methods of treating both inter- and intramolecular energy differences between crystal structures has extended the scope of global lattice energy minimization methods to molecules with a limited amount of conformational flexibility. No large-scale evaluations have been performed for more flexible molecules in the way that has been done for rigid molecules. However, a number of individual studies seem to indicate that current methods have similar success rates for moderately flexible molecules as seen above for rigid molecules. A few examples of flexible molecules for which current methods have been successful are (Figure 5): glycol and glycerol, whose known crystal structures were predicted amongst the lowest energy calculated structures (106); aspirin, where the global lattice energy minimum corresponded to the stable polymorph (6) and another low-energy prediction was later discovered during an attempted co-crystallization (128); phenobarbital, where the global minimum in lattice energy corresponded to the thermodynamically stable polymorphic form (5); and piracetam, for which the structure of a new polymorph was predicted (129). Recently, several methods of refining the molecular geometry during lattice energy minimization were tested on a series of hydrophobic -amino acids with varying degrees of conformational flexibility (alanine, valine, leucine and isoleucine) and enantiopure and racemic crystal structures all corresponded to either the global minimum or one of the 2 5 lowest energy structures from lattice energy searches, depending on the details of the method (130). The results of these investigations demonstrate promising progress in the size and flexibility of molecules for which CSP methods can be successfully applied.

23 24 G.M. Day Figure 5. Flexible molecules studied by CSP methods Alternative methods for lattice energy calculations: solid-state DFT and semi-classical density sums The atom atom approach has greatly contributed to our understanding of the molecular solid state, but with the development of periodic (solid-state) DFT, quantum mechanical calculations on the periodic lattice are now feasible, though demanding on computational resources. Such calculations have the potential to solve many of the problems encountered with energy minimization using atom atom models. All electron density and nuclear positions are optimized together, so that the molecular conformation can naturally adjust to the crystalline environment, without having to worry about the balance of inter- and intramolecular terms in an atom atom force field. Furthermore, electrons can flow; so the electron distribution of a molecule does not need to be fixed from gas phase calculations. For example, the atomic charges (determined by Mulliken partitioning) in periodic DFT calculations on the polymorphs of glycine differ, on average, by 0.05 e from the corresponding atomic charges in the gas phase (131). This allowance for charge density

24 Crystallography Reviews 25 reorganization enables energy calculations to include cooperative effects which might, for example, stabilize one hydrogen-bonding motif over another. Naturally, periodic DFT calculations have been applied to the challenge of CSP. Rovira and Novoa (132) used periodic DFT calculations to re-explore the lowest energy acetic acid crystal structures that were generated with UPACK using an atom atom force field. They demonstrated that DFT calculated energies do predict the low-pressure acetic acid crystal structure closer in energy to the global minimum than the force field used in the initial search. However, the computational expense for even this small molecule prevented all computer-generated crystal structures from being fully optimized (including cell constants) in the DFT calculations. While their results demonstrated promise for the use of periodic DFT in a final evaluation of energy differences between predicted structures, the calculations were shown to lack an adequate description of the long-range attractive dispersion contribution to intermolecular interaction energies, requiring a correction to be added to the pure DFT energy to yield reasonable agreement with experimental interaction energies. More recently, Chisholm and co-workers (131) tested several flavours of DFT on the known and hypothetical polymorphs of glycine. DFT generally had problems reproducing the structures of the known polymorphs in this study and the errors were very sensitive to the choice of exchange-correlation functional the local density approximation (LDA) leads to a serious overbinding of the crystal structures, while generalized gradient functionals (e.g. PW91) tended to underbind the crystals. Byrd and co-workers (133) made similar observations; PW91 calculations underestimated densities of a set of known molecular crystals by 10 15%. This underbinding is a result of the poor description of dispersion interactions in DFT calculations. Despite these problems in reproducing the densities of the known crystal structures, the energy ranking of the real polymorphs in the predicted lists was found to be about as good as predictions using an empirical atom atom model potential. The LDA calculations get the correct order of stability of the polymorphs as well as assigning them lower energies than any of the other predicted structures. These results show promise for DFT calculations in CSP. However, glycine was chosen for this study as a system that is very well suited to solid-state DFT calculations the molecule is zwitterionic in the crystal and the lattice energy is dominated by electrostatic contributions, which are reproduced very accurately by the DFT methods. Results would undoubtedly deteriorate for more weakly bound van der Waals crystals, where dispersion interactions are more influential. To be generally useful for studying and predicting the crystal structures of organic molecules, DFT must overcome its poor treatment of the very important dispersion attraction between molecules. There has been significant effort in developing corrections to pure DFT to provide a better description of dispersion interactions. The most popular approach to date has been to augment the solid-state DFT energy calculation for a crystal with empirically derived C 6 /R 6 terms ( ). Neumann and Perrin (134) presented the first application of such dispersion-corrected DFT (DFT þ D) methods to CSP of molecular organic crystals, using their method to lattice energy minimize the lowest energy predicted crystal structures that were generated using an atom atom force field and simulated annealing method for six small molecules (acetylene, ethylene, ethane, methanol, acetic acid and urea). In this method, C 6 coefficients were fitted to reproduce molecular C 6 coefficients and this C 6 /R 6 term is damped at short distances, with the parameters of the damping function empirically fitted to reproduce unit cell parameters of a set of low-temperature molecular organic crystal structures.

25 26 G.M. Day The method was shown to reproduce the low-temperature experimentally observed crystal structures very accurately and finds the observed crystal structure as the lowest energy of the predictions for all but ethylene, where the observed structure corresponds to the second lowest energy predicted structure. The method has since then been further developed by fitting an atom atom force field (including both non-bonded and bonding terms) specifically for each molecule to reproduce lattice energies and ESPs that are calculated using the DFT þ D method (138). This Tailor-made force field (TMFF) is used during the crystal structure generation procedure and a set of the resulting lowest energy crystal structures are re-optimized using DFT þ D. The purpose of the TMFF is that it is sufficiently computationally efficient to be used during crystal structure generation (where the energies of a large number of structures are assessed) and, that it is sufficiently accurate to guarantee that a relatively small set of the TMFF-generated crystal structures will include the global minimum structure on the DFT þ D lattice energy surface. The TMFF/DFT þ D method for CSP has been tested in the 2007 blind test of CSP (23), whose results are discussed in Section 3. A method falling somewhere between the atom atom approach and all-electron quantum mechanics calculations is Gavezzotti s ( ) semi-classical density sums (SCDS) or Pixel method. In SCDS calculations, a molecule is represented by a large number (10 4 ) of interaction sites representing the quantum mechanically calculated monomer electron density, instead of just the 10 1 nuclear positions used by atom atom methods. Relatively, few parameters are used to calculate interaction energies between molecules electron pixels interact to give the electrostatic, exchange-repulsion, dispersion and polarization energies (139, 140). Because each energy contribution is derived separately from the interactions of monomer charge densities using theoretically sound models, the Pixel method provides a better partitioning of the total lattice energy than empirical potentials. The individual contributions to the lattice energy can therefore be meaningfully analysed. The method shows good promise SCDS was tested for CSP by recalculating the energies of the putative crystal structures of a set of six small molecules (141) that were determined using atom atom force field methods. The SCDS Pixel calculations gives as good or better results than the atom atom methods in the majority of cases, ranking the observed crystal structures of three of the six test molecules, naphthalene, naphthoquinone and pyridine, as the lowest energy of the computer-generated crystal structures. The periodic electronic structure methods, such as DFT, as well as the SCDS Pixel approaches are promising for recalculating the energies of lists of predicted crystal structures from atom atom calculations. However, the computational expense of these approaches does limit their applicability in calculations beyond lattice energy minimization, such as dynamical methods discussed in the next section. To give a rough idea, singlepoint energy calculations on a typical crystal of a small (about 10 atoms) organic molecule using exp-6 þ atomic charges, exp-6 þ multipoles (including terms up to hexadecapole), SCDS Pixel and DFT take on the order of 0.1 s, 1 s, 15 min and 1 h, respectively, on a modern workstation. Energy minimizations typically require such energy calculations; meaningful MD requires, at minimum, tens or hundreds of thousands Beyond lattice energy calculations Part of the motivation for developing CSP methods is to gain an understanding of, and to anticipate, polymorphism. Yet, the very existence of polymorphs tells us that the methods

26 Crystallography Reviews 27 described so far cannot provide a complete understanding of, nor can they always predict, polymorphs. Even with a model potential that yields perfect energy differences between all possible crystal structures, lattice energy minimization can only be a first approximation and we must look beyond such calculations to fully understand polymorphism. A pair of polymorphs can have one of two types of relationship monotropic or enantiotropic. Monotropic polymorphs maintain their order of stability at all temperatures, while the free energies of enantiotropic polymorphs cross at some transition temperature. Both types of system tell us something about the limitations of predicting crystal structures from global minimization of the lattice energy of static crystal models. With enantiotropic polymorphs, each form is stable across a certain temperature range; below a transition temperature, the low-temperature form is the thermodynamically favoured polymorph, while above the transition temperature thermodynamics favours the other polymorph the high temperature form. This behaviour tells us that temperature has an important effect on the relative stability of polymorphs. Lattice energy calculations are temperature-free, so they could never predict such phase transitions. The temperature of a lattice energy calculation is formally 0 K (and furthermore ignores zero-point vibrations of the molecules), so the relative lattice energies should be a representation of the low-temperature order of stability of the predicted crystal structures. In a monotropic pair of polymorphs, one polymorph is thermodynamically metastable at all temperatures below their melting point (i.e. the transition temperature between the polymorphs lies above their melting point). The fact that such systems are known (and are not uncommon) demonstrates that crystal structures other than that with the lowest possible free energy exist kinetics can have a role in determining which structure is formed and can lead to a higher energy crystal structure if the most easily formed or fastest growing nuclei do not lead to the global minimum crystal structure. Some progress has been made in improving the thermodynamic model to recover the temperature dependence of the relative energies (Section 2.4.1). However, the influence of kinetics on the final crystal structure is a much greater problem for molecular simulation. Ideally, we would model the crystallization process from the nucleation event to growth of the bulk crystal, all in the presence of the solvent and any surfaces that could promote heterogeneous nucleation. Unfortunately, the time scales and size of the system that must be simulated are far beyond computational capabilities. However, much simpler kinetic considerations have formed a part of some CSP studies. One approach has been to test the usefulness of simple crystal growth calculations in highlighting kinetic preferences for certain crystal structures (Section 2.4.2), while others have used analyses of the many known crystal structures in the CSD to identify recurring structural features that might indicate a kinetic preference for their formation over energetically comparable alternatives (Section 2.4.3) Dynamic contributions to the energy Phase transitions between enantiotropic pairs of polymorphs are evidence that lattice energy modelling is not always sufficient to elucidate the relative stability of putative crystal structures. The thermodynamically favoured polymorph is the structure with the lowest free energy; under normal laboratory conditions of constant pressure, we consider the Gibbs free energy differences, DG, between structures: DG ¼ DU þ PDV TDS ð6þ

27 28 G.M. Day where U is the internal energy, V is the volume and S is the entropy, and P and T are the pressure and temperature. We get an idea of the importance of each contribution to the free energy differences between structures, and so how important they will be in CSP, from what we know about existing polymorphs. While densities do vary between polymorphs, the PDV contribution to energy differences is negligible at ambient conditions. PDV favours the highest density structures and only becomes important at extreme pressures. For example, the high-pressure structures of benzene have been considered by several groups (72, 79, 143, 144) and, more recently, pressure was used to access some of the predicted crystal structures of halophenols (67 ). In both these cases, the low-pressure form has one of the lowest U of all predictions, while the high-pressure form is one of the densest predicted structures. Temperature, on the other hand, can have an important influence, even at ambient conditions. The difference in the heat capacities (and, thus, thermal contributions to U) between polymorphs is normally very small (26), so the internal energy differences (DU) between polymorphs are only weakly temperature dependent. However, the entropy is dominated by contributions from the low-energy lattice vibrations, which are strongly influenced by differences in crystal packing. These low-energy vibrations contribute most to the entropy of the crystal, so that entropy differences between polymorphs can be significant. Gavezzotti and Filippini (145) calculated vibrational contributions to entropy differences between 204 pairs of known polymorphs of organic molecules at T ¼ 300 K. The study found that DS can approach 15 J mol 1 K 1 (TDS 5 kj mol 1 at room temperature), but is usually much smaller; DS 5 6 J mol 1 K 1 (TDS kj mol 1 at room temperature) for about 80% of polymorphic pairs (Figure 6). While the entropy differences do seem generally quite small, TDS might often be greater than the lattice energy differences between predicted crystal structures (see Figures 2 4). Despite their probable importance, the vibrational contributions to the relative free energies have rarely been considered in CSP studies. One reason for ignoring the vibrational energy is the computational expense the calculations require an evaluation of the second derivatives (curvature) of the lattice energy surface with respect to atomic or Figure 6. Calculated differences in intermolecular vibrational entropy between known polymorphs. Data taken from (145).

28 Crystallography Reviews 29 molecular positions and such calculations would have to be performed for each predicted crystal structure. However, with modern computing power, calculating the second derivatives of the lattice energy is no longer a problem, even for the many structures typically generated in crystal structure searches. In fact, the lattice energy second derivatives should be calculated for each low-energy structure to ensure that a proper lattice energy minimum has been located by the energy minimization algorithm (29, 40, 60, 123) energy minimizing routines sometimes locate saddle points on the lattice energy surface, especially when space group symmetry is imposed during minimization. Such saddle points do not correspond to observable crystal structures and can only be identified through analysis of the second derivatives of the lattice energy. Another reason for ignoring vibrational energy contributions is that model potentials are often not considered to be accurate enough if we are not confident in the accuracy of the small lattice energy differences, is it worth the computational expense of correcting for missing thermal effects? Finally, errors in the actual calculated frequencies are a concern. We expect the greatest differences between polymorphs to be in the lattice vibrations, the rigid-body motions of molecules with respect to each other, as opposed to angle bending and bond stretching within the molecules. Due to the weak intermolecular interactions involved, as well as the large masses, the frequencies of lattice vibrations normally fall in the narrow range from about 10 to 150 cm 1. Errors of say cm 1, which seem typical when using simple model potentials (120, 146) are therefore relatively large and could lead to significant uncertainties in the computed vibrational partition function and hence the free energy. However, as the quality of model potentials is increasing, the accuracy of relative lattice energies and calculated frequencies is improving (31, 102, 120), and the neglect of vibrational energy contributions is now certainly as important as other errors in the calculations. As such, recent studies have examined the influence of vibrational energy contributions in CSP (21, 22, ) some of these results are discussed in the following sections Lattice dynamics. The most straightforward approach to modelling the vibrations in crystals is through the methods of lattice dynamics. The theory, which is based on a normal mode analysis of the lattice energy surface, is well-developed for atomic and ionic crystals (151) and is readily extended to crystals of rigid molecules ( ). Much of the early application of lattice dynamics to molecular organic crystals was summarized by Pertsin and Kitaigorodskii (26) and the method is useful for investigating the pressure temperature phase diagrams of simple molecular crystals (26, ). As with most crystal properties, calculated vibrational frequencies are sensitive to the form and parametrization of the model potential. The motions of molecules in crystals are largely determined by the close contacts between neighbouring molecules, so the model for short range repulsive interactions is crucial; small changes to, say, the exp-6 parameters often result in variations of up to 10% (5 10 cm 1 ) in calculated frequencies (120). For molecules with polar functional groups, the frequencies of certain motions are also especially dependent on the electrostatic model. Atomic partial charge models often underestimate the energy required for hydrogen bond stretching and bending. A more detailed description of the intermolecular electrostatic interactions (such as atomic multipoles) stabilizes these modes, generally improving the agreement between experimental observations and the calculations. The electrostatic model seems to be even more important for describing vibrations that distort weaker interactions with polar groups or

29 30 G.M. Day -electron density in aromatic rings (120). In fact, for even moderately polar molecules, atomic charge models can be inadequate for modelling the vibrations (164, 165) and a more accurate description is crucial. Here, we are interested in the contributions to the crystal free energy, which depend on the overall shape of the vibrational mode spectrum and contribute to the free energy through the sum of the frequencies,!, in the zero-point energy: ZPE vib ¼ðh=2Þ X q,i! i ðqþ, ð7þ and the logarithm of the vibrational partition function, ln Q vib : ln Q vib ¼ X q,i ½1 expð h! i ðqþ=ktþš, ð8þ where h is Planck s constant divided by 2, k is Boltzmann s constant and T is the temperature, and the summation extends over all wave vectors q and phonon branches i. For small rigid organic molecules, calculations with empirically derived model potentials can approximate the zero-point energies to within about 20% and ln Q vib to within about 10% (120). Such errors typically correspond to about 0.5 kj mol 1 in the zero-point energy and (at room temperature) 1 kj mol 1 in the entropic contribution to the free energy The influence of vibrational energy contributions in CSP. Vibrational energy contributions to the relative energies of polymorphs have been considered in several studies of CSP (105, ), so we can examine their influence on the energies of the lists of predicted crystal structures. The behaviour of hydrocarbon crystal structures is the most easily rationalized one. The crystals lattice energies are dominated by the repulsion and dispersion interactions, so that generally the denser structures are lowest in lattice energy, U latt (147). However, lower density structures allow more freedom for molecular motions and this greater freedom is reflected in lower frequency vibrations and greater vibrational entropy. Therefore, DS normally favours low density, higher lattice energy crystal structures. There is a resulting balance between the enthalpy and entropy, with increasing importance for TDS as the temperature is raised; this is supported by the observation that the low-temperature polymorph in an enantiotropic pair is usually the higher density structure (166). Dunitz et al. (147 ) found that TDS is often comparable to DU latt in the predicted structures of hydrocarbons and, because DS usually favours structures with higher U latt, the free energy differences between predicted crystal structures are even smaller than the lattice energy differences the energy spacing between structures is compressed and the multiple minima problem (see Section 2.3) is compounded. The entropy effect is less straightforward for molecules that interact through specific, directional interactions (such as the hydrogen bond). By optimizing such interactions, molecules can sacrifice some close packing, so that the structure with the best lattice energy is no longer necessarily one of those with highest density (e.g. see Figure 2c). Furthermore, the vibrational frequencies are as sensitive to the arrangement of strong interactions as to the crystal density. Now the overall effect of vibrational entropy might not be to compress the predicted structures into a smaller energy range; for the hydrogen-bonding parabanic acid, the overall range in energies of the predicted crystal structures actually expands when the vibrational energy is added (the 18 lowest energy structures cover

30 Crystallography Reviews 31 Figure 7. The resulting re-ranking when vibrational energy contributions are added to the lattice energy for the predicted pyridine crystal structures. The positions of the two known polymorphs (Z 0 ¼ 4 Form I and Z 0 ¼ 1 Form II) are indicated. Data taken from (149). a range of 6.6 kj mol 1 in lattice energy and 8.4 kj mol 1 in free energy (150)). For the moderately polar pyridine, the overall range in energy is about the same in lattice energy and free energy (149); Figure 7 shows how the ranking of predicted crystal structures of pyridine changes when vibrational contributions to the free energy are included. For both these rigid molecules, there is a large variation (up to about 2 kj mol 1 ) in vibrational contributions to the relative energies and these contributions improve the energies of the real structures relative to the rest of the predictions. van Eijck (148) reported similar experiences with flexible molecules the ranking of the real crystal structures of glycol and glycerol were both improved when the vibrational contributions to the free energy were taken into account. In this work, the individual contributions to the total energy were examined in detail, showing that differences in zero-point energy and entropy were equally important to the overall relative energies of predicted structures, each varying by up to about 3 kj mol 1 amongst the lowest energy predicted crystal structures Molecular dynamics. While lattice dynamics seems to be a useful step, many molecular crystals are quite close to their melting point at ambient conditions. At these temperatures, there may be wide-amplitude dynamical motions, which could have significant anharmonic character, perhaps even leading to transitions between the possible crystal structures. The basis of lattice dynamics is harmonic and, while anharmonic corrections are possible, a more natural approach to studying such behaviour is MD, where the positions of molecules evolve via Newton s classical equations of motion and therefore naturally include anharmonic motions. While MD calculations are more

31 32 G.M. Day computationally demanding than lattice dynamics, some MD studies of real (e.g. acetic acid (132), succinic anhydride (167), imidazole, 5-azauracil (168)) and predicted (e.g. acetic acid (169), coumarin (170) and fluorouracil (80)) molecular organic crystals have been performed. Not only do MD simulations allow an evaluation of dynamic contributions to the energies of putative crystal structure, but the temperature allows conversion between structures separated by low-energy barriers, as discussed in Section (169). The sheer number of nearly equi-energetic crystal structures that are found in lattice energy searches suggests a very bumpy potential energy surface. Visual inspection often suggests that many of these predicted structures are closely related and we can guess that some are separated by very small energy barriers (40, 149, 150, 169). As an extreme example, three very similar minima (differing by less than 0.01 A in cell parameters and by only 0.25 kj mol 1 in lattice energy) were found in a study of parabanic acid (150). The number of closely related lattice energy minima could be reality or an artefact of the atom atom approach for describing intermolecular interactions. Whether they are real or artefacts of the method, the energy barriers between these structures are undoubtedly so small that thermal motion would average over the structures. While the lattice energy search does not give any information about the energy barriers between minima, a MD simulation can be used to shake up the structures, providing enough thermal energy to allow small energy barriers to be crossed. Mooij and co-workers (169) took this approach in their study of acetic acid and, indeed, found that many of the structures interconvert during a short MD simulation at 200 K many higher energy structures convert into (already found) lower energy structures, significantly reducing the final number of possible structures. The metadynamics approach to CSP (described in Section 2.2.2) includes this smoothing out of the energy surface from the start of the calculations, sampling the free energy instead of the lattice energy surface when generating crystal structures Crystal properties and growth While structure prediction is a considerable task, the structure of a crystal is really just a starting point for understanding or predicting its properties. A few of the many properties of interest are the solubility, colour, mechanical properties (e.g. compressibility or plasticity) and morphology. When CSP is used as part of a screen for unknown polymorphs, we might not only ask if polymorphism is likely, but also whether there would be an important variation in property X amongst possible polymorphs. Variations in properties could be advantageous if the polymorph with a desirable property could be reliably produced, or disastrous if a polymorph with unwanted properties cannot be suppressed. Of course, modelling the properties of putative structures could help in the structure prediction itself; perhaps the most relevant property in this respect is the crystal morphology, which provides insight into crystal growth Morphology prediction. The morphology of a crystal reflects the relative growth rates of its faces; if the relative growth rates in all possible directions are known, then the morphology can be predicted. There is a wide range of sophistication in predicting these relative growth rates and it would not be possible to provide a thorough review of the subject here. More thorough reviews can be found elsewhere (see, e.g. (171) and references

32 Crystallography Reviews 33 therein). Instead, the most commonly used models are briefly described and some modern simulation methods are mentioned. The simplest approach to morphology prediction uses a purely geometrical model the Bravais Friedel Donnay Harker (BFDH; (172)) law, which states that the growth rate of each face is inversely proportional to the interplanar distance between layers parallel to that face. Because it only uses the geometry of the crystal, the BFDH model performs best when the strengths of interactions in the crystal are isotropic; most molecular crystals, with their hydrogen bonds and other directional interactions, do not obey this demand. Hartman and Perdok (173) took the first steps in considering the anisotropy in strengths of interactions. Invoking the idea of strong bonds between growth units (typically, the individual molecules), they defined three types of crystal faces flat (F), stepped (S) and kinked (K), with strong bonds in two directions (F), one direction (S) and no directions (K). In general, dominant crystal faces are F-faces. The strongest interactions between growth units are usually collected together in what is called a crystal graph and periodic chains of these interactions are called periodic bond chains (PBCs); stable F-faces are identified by searching for non-parallel, intersecting PBCs or connected nets. Only the growth rates of these F-faces need normally be considered for the crystal morphology. The most popular approach to predicting the relative growth rates is the attachment energy model the attachment energy, E att, is defined as the energy per molecule released when a growth slice is attached to the crystal face (173, 174). A crystal face s growth rate always increases as the attachment energy increases, although the exact relationship depends on many factors. Variations with supersaturation, temperature, solvent effect and growth mechanism are ignored in the simple attachment energy model and the growth rates of all crystal faces are assumed to be linearly proportional to their attachment energies. Therefore, prediction starts with a search for F-faces, attachment energies are calculated for each of these faces and these define the morphology. Such attachment energy calculations have been proposed as a way of distinguishing between nearly equi-energetic structures from lattice energy searches (40). As a very approximate consideration of kinetic differences between structures, the comparison of relative growth rates of different faces of a crystal is extended to the relative growth rates of faces on different, competing crystals. Initial studies have shown some promise for the attachment energy model in distinguishing real from unobserved crystal structures predicted relative growth rates (as judged by the attachment energies to the morphologically dominant faces (40, 149, 150, 175) or the total integrated volume of the predicted morphology (102, 175) of the known structures of paracetamol, pyridine and parabanic acid are amongst the highest in all of the low-energy predicted crystal structures. Important steps forward are being made towards more realistic models of crystal growth, such as Monte Carlo simulations which can incorporate growth mechanism, temperature and supersaturation (171, 176, 177 ); as methodologies of modelling growth of crystal faces become more advanced, they will surely play an increasingly important role in the ab initio predictive process from molecules to crystals to materials properties Prediction from known structures An alternative to calculating physical properties to distinguish between the low-energy predicted crystal structures of a molecule is to make use of comparisons to the wealth of information in the CSD to highlight the most promising structures. Although crystal structures themselves provide no information on the kinetics of nucleation and growth,

33 34 G.M. Day they are the end result of the crystallization process and so common patterns or interactions in the database may reflect thermodynamic as well as kinetic preferences. There are now over half a million crystal structures in the CSD and these must encode useful information for CSP. The database of known structures should be able to play an important role in the prediction of crystal structures, just as homology modelling is advancing predictive power in protein structure modelling (see Dunitz and Scheraga s (178) discussion of the similarities between protein and CSP). First, we should distinguish between objective use of the CSD and subjective attempts to assess computer-generated structures using crystallographic experience. One might expect that, to the eye of an experienced chemical crystallographer, some calculated crystal structures might look less good than others. If so, it might be possible to select the true crystal structure(s) from a list of possibilities by visual inspection. A recent study asked a large group of attendees at the 2005 congress of the IUCr to visually assess sets of predicted structures of two small molecules and select the structure that they considered to be most likely. The surprising result was that, for both molecules, the observed crystal structure received the least votes (179). Clearly, for these molecules the unobserved lowenergy predictions look just as plausible as the observed structures. Definite conclusions cannot be made from such a small test. However, it seems that a subjective assessment of computer-generated crystal structures is an unreliable method of assessing which of the low-energy possibilities will be observed; as long as a good quality model has been used during lattice energy minimization, all low-energy crystal structures will look reasonable. Therefore, assessment of predicted structures based on what has been seen before should rely on rigorous use of the CSD rather than simple intuition. Information from the CSD has been used both in the initial generation of crystal structures and in the final choice of the most likely crystal structure of a molecule. Motherwell s (76) genetic algorithm search method, which scores structures by comparison to distributions of inter-atomic separations, has already been described. Hofmann (83, 84) used data mining of the CSD to obtain atom atom potentials that can discriminate between real crystal structures and other plausible, but unobserved (wrong) structures. No a priori decision is made about the form of these trained potentials, but comparison to empirically derived models show that the data mining process does result in physically meaningful models. These trained potentials are used in Hofmann s (82, 83) FlexCryst program for CSP. Sarma and Desiraju (180) suggest re-ranking the results of a lattice energy search by consideration of the crystal structures of similar molecules chosen to have the same chemical functionality as the target molecule. In applying their method to one of the molecules in the 2001 CSP blind test (molecule IV, see Table 1), they started with lists of the lowest energy crystal structures from Polymorph Predictor simulated annealing calculations using simple atom atom model potentials. The known crystal structures of a set of 10 similar molecules were then unbuilt, the molecular structure exchanged for that of the blind test molecule and the resulting crystal structures were lattice energy minimized. The 10 resulting hypothetical crystal structures for the molecule were then compared to the Polymorph Predictor lists and those with good rankings in the lists were considered as the most probable crystal structures. One major choice for this particular molecule was whether it would crystallize with hydrogen bond dimers or chains. Of Sarma and Desiraju s (180) hypothetical structures with good lattice energy ranking, all but one formed hydrogen bond chains, which they proposed as the most likely hydrogen bond motif for molecule IV. They chose as their first prediction one of their hypothetical

34 Crystallography Reviews 35 structures that was quite well-ranked on lattice energy (third using one of the force fields they used) and was derived from a highly similar molecule (with similarity based on molecular volume and surface area). This chosen crystal structure did, in fact, correspond to the observed crystal structure for molecule IV. Overall, the approach seems useful, but only when reliable intermolecular interactions are present and the crystal structures of enough similar molecules can be found in the CSD. This approach has been further tested, with mixed success (181, 182). Such comparisons will benefit from a rapidly expanding database of crystal structures. Supplementing the physical approach (energy-based prediction methodologies) by comparisons to the CSD can offer short- and long-term benefits to crystal structure modelling. In the short term, the trends and patterns in the CSD have the potential to highlight the most likely candidates from the typically crowded list of lattice energy ranked putative crystal structures. In the longer term, these observations could guide developments of the physical approach. Take Sarma and Desiraju s (180) example of blind test molecule IV: their hypothetical structures derived from the CSD point to a preference for the hydrogen bond chain motif. Lattice energy predictions, on the other hand, consistently produced many possible crystal structures based on hydrogen bond dimers with as favourable, and often better, lattice energies (21). The fact that CSD-derived hypothetical structures show a preference for hydrogen bond chains suggests that the database holds some information that was not present in the lattice energy calculations. It is the nature of this information that is the question. Do the CSD-derived structures simply tell us that our model potentials are inadequate, that dynamical contributions to the energy are important, or that there is a kinetic preference for the formation of hydrogen bond chains during nucleation or crystal growth that defines the pattern of interactions in the final crystal structure? To discover the nature of the information that the CSD has provided, and so further our understanding of crystallization, the computational methods must be developed improved energy models and dynamic modelling of the crystal, its nucleation and growth. In the short term, the CSD can be used for empirical guidance; in the long term, it should direct and motivate the development of simulation methodologies. 3. The blind tests of CSP The blind tests of CSP have been organized every 2 3 years since 1999 and hosted by the Cambridge Crystallographic Data Centre as an objective evaluation of current methods in CSP. The results of the four blind tests (20 23) completed to date are summarized here. Many of the active researchers in CSP were invited to test their methodologies in the blind tests; the number of participating research groups grew from 11 in 1999 to 17 in 2001, and 18 in 2004, and dropped slightly to 14 in In these international collaborations, the molecular diagrams of a set of target molecules (Table 1) were circulated to the participants, while the recently determined crystal structures were withheld by an independent referee. Predicted crystal structures had to be submitted within 6 months of announcing the targets and each participant was allowed three predicted crystal structures for each molecule. Three categories of molecule have been used in each of the blind tests: small and rigid with only the elements C, N, O and H (category 1); rigid with more challenging elements or functional groups (category 2); and moderately flexible (category 3). In the first two blind tests (CSP1999 and CSP2001), the crystals were guaranteed to be in a common space group with Z 0 1, while no restriction was placed

35 36 G.M. Day on space group in the third round (CSP2004), except that Z 0 2. The rules were adjusted again in CSP2007 (23) so that the molecules in categories 1 3 were guaranteed to adopt a crystal structure with Z 0 1 (but with no restriction on space group). Instead, a new category was introduced specifically in CSP2007 to test predictions for crystal structures with more than one molecule in the asymmetric unit: in this year, a 1:1 cocrystal was issued as the challenge (target XV, Table 1). Table 1. Diagrams and reference numbers of the target molecules in the CSP blind tests (20 23). CSP1999 Category 1 Category 2 Category 3 Extra (I) (II) (III) (VII) CSP2001 (IV) (VI) (V) CSP2004 (VIII) (XI) (IX) (X) CSP2007 1:1 cocrystal (XII) (XIII) (XIV) (XV) In addition to the three official predictions that were allowed for each molecule, extended lists of predicted structures (about the 100 best structures from each participant) were analysed in CSP2004 and the authors reported enough information in the CSP2001 publication (21) to determine the success rate of the crystal structure generation step in each participant s method (Table 2). For the rigid molecules in categories 1 and 2, the structure generation was usually successful: 70 80% of participants generated the correct

36 Crystallography Reviews 37 Table 2. Summary of results of the first four blind tests of CSP (20 23). Statistics on the success of predictions are presented for each molecule and the molecules from the four blind tests are grouped by category. Molecule Number of participants Participants who generated the correct structure somewhere in their list Participants who submitted the correct structure as their first prediction Participants who submitted the correct structure anywhere in their third predictions Category 1 I (polymorph 1) 11 b 0 I (polymorph 2) a 3 IV 16 c 10 (of 14 d ) 1 c 3 VIII e XI XII Category 2 II 8 b 0 1 V (of 14 d ) 3 4 IX XIII Category 3 III 11 b 1 1 VI 11 4 (of 10 d ) 0 0 X XIV Extras VII 6 b 1 1 XV Notes: a Molecule I is polymorphic, but all successful predictions were for the metastable form. b The CSP1999 publication (20) does not report enough detail on each participant s results to know the overall success rate of the searches. c The strategy and results for one of the participants (Sarma and Desiraju (180)) are published separately from the CSP2001 publication (21). This search was the only one to rank the observed crystal structure first of the three guesses. d Some participants in CSP2002 borrowed the candidate structures from another participant, for re-ranking by another scoring function or energy model. There were, therefore, only 14 independent searches performed for molecules IV and V and 10 for molecule VI. e Molecule VIII might not be considered a true blind test, as a partial structure report was discovered part-way through the test. Most participants continued their calculations without using information in this report (22). 0 4 structure somewhere in their list for molecules IV, V, VIII and XII. The success of the crystal structure generation methods for molecules IX and XIII in category 2 was slightly lower and the problems with these molecules probably relate to shortcomings in modelling the intermolecular interactions involving halogen atoms. The problem was worst for the iodine-containing molecule IX, for which a spherical representation of the iodine atoms is particularly inadequate (22). Generating the observed crystal structure of molecule XI was problematic because the observed structure of this molecule contains two molecules in the asymmetric unit.

37 38 G.M. Day Due to time constraints (this molecule was introduced half-way through the course of CSP2004 (22)) or limitations of certain methodologies, the methods used by only eight of the 18 participants who attempted predictions could have generated crystal structures with the symmetry of the observed crystal (P2 1 /c, Z 0 ¼ 2). Only four of these eight did have the correct crystal structure somewhere in their list. With the six extra degrees of freedom that must be sampled for the second molecule in the asymmetric unit, the dimensionality of the search space is clearly a problem for the crystal structure generation step in CSP. Considering that over 10% of homomolecular organic crystals have more than one molecule in the asymmetric unit (52), this is a serious problem and the robustness of the search methods must be improved. CSP for crystal structures with more than one molecule in the asymmetric unit is also important if the methods are to be applied to multicomponent crystal structures: salts, cocrystals and solvates. The success rate of crystal structure generation for the one cocrystal (XV) that has been included in a blind test was, again, low: only five of 12 participants who attempted CSP on this target generated the known structure. For this cocrystal, the difficulty of generating structures with two molecules in the asymmetric unit was compounded by having to consider two conformations of the acid. For the flexible category three molecules in CSP2001 (VI) and CSP2004 (X), the success rate for generating the observed crystal structure somewhere in the extended list of predicted structures was below 50%. This failure to even generate the observed crystal structure guaranteed the failure of the prediction calculations for the majority of participants. A marked improvement was observed in CSP2007, when nine of 12 participants who attempted predictions on the flexible molecule XIV did generate the observed crystal structure. This encouraging improvement in search methods for the flexible molecule category probably reflects improvements in the search methodologies for flexible molecules. However, the calculations performed during the blind tests also revealed that the flexibility of the molecular structure of XIV was less important than for the molecules representing this category in the previous blind tests (23). This observation emphasizes the important point that it is not possible to guarantee the same level of difficulty from one blind test to another, because the difficulties involved in predicting crystal structures of a given molecule are often only apparent once the calculations have been performed. Therefore, while the blind tests are important, they cannot be considered a perfect measure of progress in the field of CSP. The overall results of the first three tests demonstrated that prediction is possible under blind test conditions while the overall success rate was low, eight of the 12 crystal structures were successfully predicted amongst the 3 allowed predictions by at least one participant. There were a few successes for the easiest category of molecule, but no method was consistently successful (20 22). The challenging functional groups in the category 2 set of rigid molecules (II, V and IX) lowered the success rate slightly, while the worst news is reserved for those interested in conformationally flexible molecules (III, VI and X). Only one successful prediction was made of a flexible molecule s crystal structure in these three blind tests. The results of the fourth blind test demonstrate encouraging improved success rates (23). In total, the 14 participating research groups achieved 13 successful predictions spread across the four targets, 10 of which were submitted as the participant s first choice. These correct predictions included three for the flexible molecule and two for the cocrystal. Furthermore, the successes were not restricted to a small number of groups: seven of the 14 participants correctly predicted at least one of the structures, four of whom had

38 Figure 8. The targets of the fifth blind test of CSP. Crystallography Reviews 39

39 40 G.M. Day multiple successes in CSP2007. One important observation is that five of the participating groups located the known crystal structures of all four CSP2007 targets somewhere in their lists of predictions. The methods employed by these groups were variations on random sampling of the structural variables (four groups) and Monte Carlo simulated annealing (one group) and their consistency of success is an indication that those methods are reliable for structure generation. The other main observation from the CSP2007 blind test was that the method used by one participating group successfully predicted all four crystal structures, each as the number 1 (lowest energy) structure (183). These calculations used the empirically corrected DFT þ D method parametrized by Neumann and Perrin (134) to lattice energy minimize and rank the lowest energy crystal structures for each blind test target. While the computational cost of their calculations was much greater than the other participants, the results are nonetheless impressive and demonstrate that the crystal structures of the relatively small molecules that have been studied in the blind tests are predictable by global lattice energy minimization methods. The focus of the most successful groups in CSP2007 has been on improving the description of the lattice energy surface, usually without including other contributions to the free energies of the crystal structures or any consideration of the crystal growth conditions. In these cases, it seems that the molecules find the global energy minimum structure during crystallization and that the global lattice energy minimum for each target corresponds to the free energy minimum. The results of all four blind tests are discussed in detail in the four main publications (20 23), and these studies provide a wealth of useful comparative information of the various methods employed for CSP. The blind tests should be continued as a means of objectively monitoring the progress that is made in developing methods for CSP and the fifth blind test is currently underway. Reflecting the progress that has been made in methods for CSP, the targets of the fifth blind test include a molecular salt, a hydrate of a flexible molecule and a larger, much more flexible molecule than has previously been set as a target (Figure 8). 4. Concluding remarks and the future In two and a half decades of increasing interest and effort, there have been considerable advances in methodology for the prediction of crystal structures from molecular information alone. The requirements for this goal bring together many challenges for molecular modelling: structure searching and global minimization methodologies; data mining of the CSD; development of intermolecular model potentials, force fields for molecular flexibility and first principles electronic structure mechanical methods; and development of dynamical simulations. An arsenal of methods has been developed for generating all of the possibilities for the crystal packing of a molecule. At least for small molecules, this will surely soon be a solved problem some methods already seem consistently successful (22, 23). As these search methods have been coupled to the significant progress in modelling the static energies of the crystal structures of rigid molecules, we are nearing a true representation of the energy landscape for crystals of small molecules. At least, we are piecing together a highly accurate and comprehensive static picture, where the relative depths of the valleys in the surface, and their corresponding structure, are known. A limiting factor now is our static representation of crystals, ignoring the vibrations of molecules about their equilibrium

40 Crystallography Reviews 41 positions and the corresponding contributions to the energy and entropy. In the few cases where such dynamic considerations have been included in the thermodynamic model for CSP, generally improved results have been reported (i.e. known crystal structures are more reliably found at, or closer to, the global free energy minimum). There has been recent progress in methods that can be applied to CSP of flexible molecules. Judging from the blind test results and the limited number of published studies, structure generation for flexible molecules seems to be less reliable than for rigid molecules and is hampered by the need to adequately sample all crystal packing possibilities of all relevant conformations. The need to simultaneously model intra- and intermolecular energies to the requisite accuracy for CSP (where relative energies are sometimes measured in fractions of a kj mol 1 ) is a further challenge, but promising results have been reported using accurate atom atom intermolecular models combined with quantum mechanical molecular energy calculations Requirements for progress in CSP and future prospects Structure searching To become a generally useful tool in crystal engineering and industry, it is necessary to be able to generate all possible crystal structures for rigid and flexible molecules, even in the less common space groups and with multiple independent molecules salts, cocrystals, and solvates are of obvious interest and even pure (neat) crystals have Z often enough to be a worry. The algorithms for generating structures are very advanced, so considering these complex systems is now largely a matter of computing power the parallelization of programs for crystal structure generation and the use of distributed computing now allows complete sets of structures to be compiled when an accurate enough energy model is available. (The last point is crucially important for flexible molecules; poor force fields can undermine powerful search methods.) The thermodynamic model The continued development of energy models is necessary for refinement of our thermodynamic model. Several promising approaches are being pursued: non-empirically derived, molecule-specific anisotropic atom atom model potentials and SCDS Pixel or quantum mechanical solid-state electronic structure calculations as alternatives to the atom atom approach. To make progress with flexible molecules, the increasingly accurate intermolecular models are being combined with better models for molecular flexibility than are currently provided by molecular mechanics methods. The use of quantum mechanically calculated conformational energies along with anisotropic atom atom models of intermolecular interactions has yielded promising results. A similar route has been envisioned by Gavezzotti (184) for the coupling of SCDS Pixel and quantum mechanically calculated conformational energies. One major challenge here is the scalability of the calculations to larger molecules, where molecular quantum mechanical calculations of a sufficiently high level for reliable conformational energies dominate the overall computing time needed for CSP. Inadequacies in the treatment of the long-range dispersion energy mean that periodic DFT for molecular organic crystals currently relies on empirically derived corrections.

41 42 G.M. Day The development of improved solid-state quantum mechanical methods is a very active area of development and increasingly available computing power at relatively low cost makes such calculations feasible for CSP. Dynamical simulations of molecules in crystals are a requirement for real thermodynamics and an understanding of the role of temperature and entropy, as well as low-energy transitions between structures. While most studies still focus on static lattice energies rather than free energies, methods of including the effects of dynamics in crystals should develop an increasingly prominent role in modern CSP methodologies Kinetics As we have known all along, the thermodynamic model is only a starting point a reliable method of CSP will need to reflect all factors that influence the crystallization of a molecule. We do need to be modelling the kinetics of crystal growth and nucleation. Simulations of crystal growth are advancing in sophistication, while the nucleation of molecular crystals is still a great, largely unexplored, challenge. Very simple considerations of crystal growth can give an initial indication of the relative growth rates of competing crystal structures and, until (if ever) nucleation and growth kinetics can be fully understood by molecular simulation, data mining of the CSD could help guide the final selection of the most likely structures from the crowded lists of low-energy crystal structures. The goal of CSP has required a significant level of development and careful validation of methods. While there is still much needed in terms of development of methods, the methods have reached a good level of reliability, at least for reasonably small molecules, to be a useful complementary method to experimental polymorph screening and possibly to help guide crystal engineering experiments for the design and discovery of new molecular materials. Acknowledgements There are many to thank for their contributions to the development of the field of CSP. Personally, I thank Professor William Jones, Dr W.D. Sam Motherwell and Dr Neil Feeder for many fruitful discussions and Professor Sally L. Price for introducing me to this fascinating and challenging research problem, as well as for passing on some of her vast experience in modelling intermolecular forces. I also thank the Royal Society for funding my own research via a University Research Fellowship and the Cambridge Crystallographic Data Centre for its hosting of the CSP blind tests. Note 1. Rice and Sorescu (30) presented a similar study of CSP for a large set of molecules with somewhat different results. Of the experimentally observed crystal structures 64% (111 of 174) were located as the lowest energy structure in the list of computer-generated structures. The very high success rate in this study could be due to an incomplete sampling of packing space (only 85% of observed structures were located at all) and the fact that rigid molecular models were taken directly from the experimentally determined crystal structures. This choice of molecular model, rather than an optimized molecular geometry, will bias the lattice energy ranking towards the observed crystal structures.

42 Notes on contributor Crystallography Reviews 43 Graeme M. Day obtained his BSc degree in Chemistry, Mathematics and Computing Science from Saint Mary s University (Nova, Scotia, Canada) in 1996, his MSc in Theoretical Chemistry from the University of Oxford in 1997 and PhD in 2003 from University College London with Prof Sally Price in the area of lattice dynamics and structure prediction of molecular crystals. He moved to the University of Cambridge in 2003 to work with Prof William Jones as a post-doctoral research associate in the Pfizer Institute for Pharmaceutical Materials Science. He has held a Royal Society University Research Fellowship at the University of Cambridge since 2005 to continue research in the areas of CSP and dynamics in molecular crystals. References [1] Chemburkar, S.R.; Bauer, J.; Deming, K.; Spiwek, H.; Patel, K.; Morris, J.; Henry, R.; Spanton, S.; Dziki, W.; Porter, W.; Quick, J.; Bauer, P.; Donaubauer, J.; Narayanan, B.A.; Soldani, M.; Riley, D.; McFarland, K. Dealing with the Impact of Ritonavir Polymorphs on the Late Stages of Bulk Drug Process Development. Org. Process Res. Dev. 2000, 4, [2] Bauer, J.; Spanton, S.; Henry, R.; Quick, J.; Dziki, W.; Porter, W.; Morris, J. Ritonavir: An Extraordinary Example of Conformational Polymorphism. Pharm. Res. 2001, 18, [3] Beyer, T.; Lewis, T.; Price, S.L. Which Organic Crystal Structures are Predictable by Lattice Energy Minimisation? CrystEngComm. 2001, 3, [4] Price, S.L. The Computational Prediction of Pharmaceutical Crystal Structures and Polymorphism. Adv. Drug Deliv. Rev. 2004, 56, [5] Day, G.M.; Motherwell, W.D.S.; Jones, W. A Strategy for Predicting the Crystal Structures of Flexible Molecules: The Polymorphism of Phenobarbital. Phys. Chem. Chem. Phys. 2007, 9, [6] Ouvrard, C.; Price, S.L. Toward Crystal Structure Prediction for Conformationally Flexible Molecules: The Headaches Illustrated by Aspirin. Cryst. Growth Des. 2004, 4, [7] Antoniadis, C.D.; D Oria, E.; Karamertzanis, P.G.; Tocher, D.A.; Florence, A.J.; Price, S.L.; Jones, A.G. A Computationally Inspired Investigation of the Solid Forms of (R)-1-phenylethylammonium-(S)-2-phenylbutyrate. Chirality 2010, 22, [8] D Oria, E.; Karamertzanis, P.G.; Price, S.L. Spontaneous Resolution of Enantiomers by Crystallization: Insights from Computed Crystal Energy Landscapes. Cryst. Growth Des. 2010, 10, [9] Gourlay, M.D.; Kendrick, J.; Leusen, F.J.J. Rationalization of Racemate Resolution: Predicting Spontaneous Resolution through Crystal Structure Prediction. Crys. Growth Des. 2007, 7, [10] Leusen, F.J.J. Crystal Structure Prediction of Diastereomeric Salts: A Step toward Rationalization of Racemate Resolution. Cryst. Growth Des. 2003, 3, [11] Go rbitz, C.H.; Dalhus, B.; Day, G.M. Pseudoracemic Amino Acid Complexes: Blind Predictions for Flexible Two-component Crystals. Phys. Chem. Chem. Phys. 2010, 12, [12] Karamertzanis, P.G.; Kazantsev, A.V.; Issa, N.; Welch, G.W.A.; Adjiman, C.S.; Pantelides, C.C.; Price, S.L. Can the Formation of Pharmaceutical Cocrystals be Computationally Predicted? II. Crystal Structure Prediction. J. Chem. Theory Comput. 2009, 5, [13] Cruz Cabeza, A.J.; Day, G.M.; Motherwell, W.D.S.; Jones, W. Prediction and Observation of Isostructurality Induced by Solvent Incorporation in Multicomponent Crystals. J. Am. Chem. Soc. 2006, 128,

Validation of Experimental Crystal Structures

Validation of Experimental Crystal Structures Aim This use case focuses on the subject of validating crystal structures using tools to analyse both molecular geometry and intermolecular packing. Introduction