Combining Docking and Molecular Dynamic Simulations in Drug Design


Hernán Alonso, 1 Andrey A. Bliznyuk, 2 Jill E. Gready 1

1 Computational Proteomics Group, John Curtin School of Medical Research, The Australian National University, Canberra ACT 0200, Australia
2 ANU Supercomputer Facility, The Australian National University, Canberra ACT 0200, Australia

Published online 6 June 2006 in Wiley InterScience. DOI /med.20067

Abstract: A rational approach is needed to maximize the chances of finding new drugs, and to exploit the opportunities of potential new drug targets emerging from genomic and proteomic initiatives, and from the large libraries of small compounds now readily available through combinatorial chemistry. Despite a shaky early history, computer-aided drug design techniques can now be effective in reducing costs and speeding up drug discovery. This happy outcome results from development of more accurate and reliable algorithms, use of more thoughtfully planned strategies to apply them, and greatly increased computer power to allow studies with the necessary reliability to be performed. Our review focuses on applications and protocols, with the main emphasis on critical analysis of recent studies where docking calculations and molecular dynamics (MD) simulations were combined to dock small molecules into protein receptors. We highlight successes to demonstrate what is possible now, but also point out drawbacks and future directions. The review is structured to lead the reader from the simpler to more compute-intensive methods. Thus, while inexpensive and fast docking algorithms can be used to scan large compound libraries and reduce their size, more accurate but expensive MD simulations can be applied when a few selected ligand candidates remain. MD simulations can be used: during the preparation of the protein receptor before docking, to optimize its structure and account for protein flexibility; for the refinement of docked complexes, to include solvent effects and account for induced fit; to calculate binding free energies, to provide an accurate ranking of the potential ligands; and in the latest developments, during the docking process itself to find the binding site and correctly dock the ligand a priori.

© 2006 Wiley Periodicals, Inc. Med Res Rev, 26, No. 5, 2006

Key words: docking; molecular dynamics; drug design; binding free energies; protein flexibility; ligand conformations; protein-ligand interactions; ligand-binding site; scoring function; virtual screening; rotamer library; trajectory

Correspondence to: Jill E. Gready, Computational Proteomics Group, John Curtin School of Medical Research, The Australian National University, P.O. Box 334, Canberra ACT 2601, Australia. jill.gready@anu.edu.au

Medicinal Research Reviews, Vol. 26, No. 5, 531–568, 2006. © 2006 Wiley Periodicals, Inc.

1. INTRODUCTION

The development of new drugs is undoubtedly one of the most challenging tasks of today's science. Driven by the combined efforts of the pharmaceutical industry, biotech companies, regulatory authorities, academic researchers, and other private and public sectors, the development of new drugs is a very complex and demanding interdisciplinary process. This enterprise has not only produced a general improvement in health from the discovery and manufacture of new and more effective drugs, but has also contributed to the advance of science itself, impelling the development of complex and more accurate tools and techniques for the discovery and improvement of new active compounds, and the understanding of their targets.

After the completion of the human genome project, it was expected that a large number of new drug targets would be found expeditiously. However, the 30,000 or so genes encoded within the human genome did not turn out to offer a direct source for drug development, as it is not the genes but the proteins they encode that are the usual targets of drugs. This much larger proteome is far more complex than the collection of genes, as proteins may undergo post-translational modifications, associations with other molecules and prosthetic groups, and formation of multimeric complexes. 1 Moreover, most of these proteins have unknown or poorly characterized functions, and their connection with diseases is usually complex and difficult to define. It soon became evident that the blind expression, purification, and in vitro assay of hundreds if not thousands of proteins against libraries of hundreds of thousands if not millions of compounds does not constitute a rational approach.

The approaches and methodologies used in drug design have changed over time, exploiting and driving new technological advances to solve the varied bottlenecks found along the way.
Until the 1990s, the major issues were lead discovery and chemical synthesis of drug-like molecules; the emergence of combinatorial chemistry, 2 gene technology, and high-throughput tests 3,4 then shifted the focus, with poor absorption, distribution, metabolism, and excretion (ADME) properties of the new drugs capturing more attention. 5 Today, the field of drug development may seem more fertile than ever before, with vast amounts of information from genomic and proteomic studies facilitating the finding of new targets, the usage of rational combinatorial chemistry for the production of libraries of compounds, the generation of genetically modified animal models for the development and testing of new drugs, and the possibility of using ultra-high-throughput test techniques for the screening of large libraries. However, despite all these advances, the revolutionary era of drug design has not arrived yet. 6–8

There is no unique solution to a drug design problem. The appropriate experimental techniques or computational methods to use will depend on the characteristics of the system itself and the information available. In the present review, we cover the case where both the structure of the protein receptor and the binding site are known. While it is possible to develop drugs without such information, the methods involved are quite different and are described elsewhere.

A variety of computational approaches can be applied at different stages of the drug-design process: in an early stage, these focus on reducing the number of possible ligands, while at the end, during lead-optimization stages, the emphasis is on decreasing experimental costs and reducing times. Although this is simple to articulate, it has been tried many times with only a few fruitful examples. The lack of success has led to a re-examination of the underlying principles. For example, recent publications have shown that some of the hypotheses used during the enrichment steps may need to be refined. 21,22 While some drug developers opted for alternative experimental solutions, 23,24 others focused their attention on the improvement of computational protocols. These enhancements include, among others: incorporation of protein flexibility in the docking process, extensive exploration of the ligand conformation within the binding site, refinement and stability evaluation of the final complexes, and estimation of the binding free energies. Not surprisingly, molecular dynamics (MD) simulations have played a dominant role in these attempts to improve docking procedures; they are the focus of the present review.

Our emphasis is on protocols and approaches rather than on the theory behind the methods, as our intention is to provide the reader with a practical overview of the potential of combining docking and MD simulations for the rational design of novel drugs. The first section, Rational Drug Design, presents a brief introduction to application of computational techniques in the drug-design process. The Protein Flexibility section examines various ways of including flexibility of the target receptor in the docking using both approximate and MD approaches. The Refinement of Docked Complexes section looks into the applications of MD for the optimization and validation of the final complexes. The Free Energy Calculations section briefly describes widely used approaches for the evaluation of accurate binding energies. This is followed by MD Simulations at Different Docking Stages, which reviews some published examples in which MD simulations have been used at several steps of the docking procedure. Finally, Docking with MD Simulations discusses how the docking of a small molecule into its protein target can be carried out using MD simulations exclusively. Terms which may be unfamiliar to the reader are marked in the text and defined in the Glossary before the References.

2. RATIONAL DRUG DESIGN

When the structure of the target protein is known, the drug discovery process usually follows a well-established procedure shown schematically in Figure 1. Virtual screening techniques are applied early during the docking protocol to reduce the size of large compound libraries. 10,11 Initially, libraries are pre-filtered using a series of simple physicochemical descriptors to eliminate compounds not expected to be suitable drugs. Pharmacophore analysis, neural nets, similarity analysis, scaffold analysis, Lipinski's rule of five, 25,26 and garbage filters are used to sort out molecules according to their ADME properties, among others. This procedure, which reduces the size of the library to a group of molecules more likely to bind the target receptor, is known as enrichment. It is necessary to stress that the selection criteria used during the enrichment steps need to be carefully chosen, as application of too stringent filters may lead to early exclusion of potential leads. 21,22 Similarly, drug-likeness of potential leads may be less important at the early stages than the amenability of the molecule to experimental validation by in vitro assays and X-ray crystallography.

Similar compounds can be further grouped together and arranged in smaller assemblies to assist the screening process. The use of several small libraries is not only a more cost-effective approach, but can usually provide a broader chemical diversity than a single large library. Once an optimum library has been produced, molecules are docked to the target receptor to reduce further the number of candidates. This initial screening makes use of fast, but not very accurate, ranking functions to evaluate the relative stability of the docked complexes. The selected candidates, usually a few hundred, are subject to further docking experiments using more sophisticated scoring functions.

A. Docking

Docking techniques, designed to find the correct conformation of a ligand and its receptor, have now been used for decades (for recent reviews and comparisons see References 31–36). The process of binding a small molecule to its protein target is not simple; several entropic and enthalpic factors influence the interactions between them. The mobility of both ligand and receptor, the effect of the protein environment on the charge distribution over the ligand, 37 and their interactions with the surrounding water molecules, further complicate the quantitative description of the process. The idea behind docking is to generate a comprehensive set of conformations of the receptor complex, and then to rank them according to their stability. The most popular docking programs include DOCK, 38,39 AutoDock, 40 FlexX, 41 GOLD, 42 and GLIDE, 43,44 among others.
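The staged screening described in this section (a cheap physicochemical pre-filter, a fast ranking, then a more careful rescoring of the survivors) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Compound` record, the `screen` helper, the compound names, and all descriptor and score values are hypothetical, the scoring functions are stand-ins supplied by the caller, and lower scores are assumed to mean more stable complexes. The rule-of-five thresholds are the commonly quoted ones (molecular weight 500 Da, logP 5, 5 hydrogen-bond donors, 10 acceptors), with one violation tolerated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record of precomputed descriptors for one library member."""
    name: str
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def passes_rule_of_five(c, max_violations=1):
    """Count rule-of-five violations and accept if within the tolerance."""
    violations = sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])
    return violations <= max_violations


def screen(library, fast_score, accurate_score, keep_fast=2):
    """Pre-filter, rank with the fast score, then re-rank the shortlist.

    Both scoring callables are assumptions: they take a Compound and
    return a number where lower means a more favorable complex.
    """
    enriched = [c for c in library if passes_rule_of_five(c)]
    shortlist = sorted(enriched, key=fast_score)[:keep_fast]
    return sorted(shortlist, key=accurate_score)


# Illustrative, made-up data.
library = [
    Compound("cpd_a", 320.0, 2.1, 2, 5),
    Compound("cpd_b", 710.0, 6.3, 4, 9),   # two violations, filtered out
    Compound("cpd_c", 450.0, 4.8, 1, 7),
    Compound("cpd_d", 280.0, 1.0, 3, 4),
]
fast = {"cpd_a": -6.0, "cpd_b": -9.5, "cpd_c": -7.2, "cpd_d": -4.1}
accurate = {"cpd_a": -8.4, "cpd_c": -7.9, "cpd_d": -5.0}

ranked = screen(library, lambda c: fast[c.name], lambda c: accurate[c.name])
print([c.name for c in ranked])  # ['cpd_a', 'cpd_c']
```

Note that `cpd_b` has the best fast score but never reaches the docking stage, which is the risk of overly stringent filters mentioned in the text.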

Figure 1. Schematic representation of the protocol commonly followed during a drug-design process, when the structure of the protein target is known or can be modeled. Steps within square brackets are not always performed, and those shaded in gray may incorporate MD simulations.

B. MD Simulations

Molecular dynamics simulations are one of the most versatile and widely applied computational techniques for the study of biological macromolecules. They are very valuable for understanding the dynamic behavior of proteins at different timescales, from fast internal motions to slow conformational changes or even protein folding processes. 48 It is also possible to study the effect of explicit solvent molecules on protein structure and stability, and to obtain time-averaged properties of the biomolecular system, such as density, conductivity, and dipole moment, as well as different thermodynamic parameters, including interaction energies and entropies. MD is not only useful for rationalizing experimentally measured properties at the molecular level; most structures determined by X-ray or NMR methods have also been refined using MD methods. Therefore, the interplay between computational and experimental techniques in the area of MD simulations is longstanding, with the theoretical methods assisting in understanding and analyzing experimental data. These, in turn, are vital for the validation and improvement of computational techniques and protocols.

Although the first protein MD simulation (bovine pancreatic trypsin inhibitor; 58 residues and 450 atoms) was done in vacuo and for only 8.8 psec, 49 enormous increases in computer power nowadays permit simulations of systems comprising atoms 50,51 and simulation times in the order of nsec to msec. 52 Simulations of more realistic systems, including explicit water molecules, counterions, and even a complete membrane-like environment are possible, and new properties can now be studied as they evolve in real time. This progress in system representation has been accompanied by methodological improvements: better force fields, 53,54 improved treatment of long-range electrostatic interactions and system boundary conditions, and better algorithms used to control temperature and pressure.

However, despite all these advances, the setup of an MD simulation can be far from trivial. Parameters used to describe proteins and their interactions are normally found within modern force fields, but adequate descriptors for non-standard molecules, such as ligands, might be missing. In such cases, the determination and fitting of new parameters is usually straightforward, but may be a time-consuming process if it needs to be done for many ligands, limiting the general applicability of the method. Commonly used programs for MD simulations of biomolecules include Amber, 55 CHARMM, 56 GROMOS, 57 and NAMD, 58 among others.

C. Combined Docking and MD Simulations

Fast and inexpensive docking protocols can be combined with accurate but more costly MD techniques to predict more reliable protein–ligand complexes. The value of this combination lies in the complementary strengths and weaknesses of the two techniques.
On the one hand, docking techniques are used to explore the vast conformational space of ligands in a short time, allowing the scrutiny of large libraries of drug-like compounds at a reasonable cost. The major drawbacks are the absent or limited flexibility of the protein, which is not permitted to adjust its conformation upon ligand binding, and the absence of a unique and widely applicable scoring function, necessary to generate a reliable ranking of the final complexes. On the other hand, MD simulations can treat both ligand and protein in a flexible way, allowing for an induced fit of the receptor-binding site around the newly introduced ligand. In addition, the effect of explicit water molecules can be studied directly, and very accurate binding free energies can be obtained. However, the main problems with MD simulations are that they are time-consuming and that the system can get trapped in local minima. Therefore, the combination of the two techniques in a protocol where docking is used for the fast screening of large libraries and MD simulations are then applied to explore conformations of the protein receptor, optimize the structures of the final complexes, and calculate accurate energies, is a logical approach to improving the drug-design process.

3. PROTEIN FLEXIBILITY

It is now accepted that the old idea of the key and lock interaction of a ligand and its protein receptor is not an accurate description of most biological complexes. The ligand–protein interactions more closely resemble a hand and glove association, where both parts are flexible and adjust to complement each other (induced fit). They can modify their shape and mould their complementarity so as to increase favorable contacts and reduce adverse interactions, maximizing the total binding free energy. 59 It has been found that active-site regions of enzymes appear to present areas of both low and high conformational stability. 60 Mobile loops that close over the ligand upon binding are included within the flexible parts, while catalytic residues, for example, are usually structurally stable. This dual character of the active-site environment appears important for optimum binding.

A. Receptor Conformation

The three-dimensional (3-D) structures of both ligand and protein are necessary for the application of docking techniques. While the manifold of conformational structures of small molecules may be relatively easy to predict, the lowest energy conformation obtained may not correspond to that of the bound ligand. The structures of proteins present a bigger challenge. Although experimental techniques involving X-ray and NMR analysis are now routine, inherent difficulties in the preparation of samples and data collection and interpretation mean we are still far from a completely automated and high-throughput process. 61 Many proteins targeted for drug design do not have an experimentally determined structure and, therefore, docking studies cannot be performed directly. In some cases, computational techniques can be used to predict the 3-D structure of a protein provided the structure of a closely related protein homolog is known. Homology modeling or sequence threading techniques may be used to generate models of protein structures which, although not as good as experimentally determined structures, can be used as docking targets.

Several studies have highlighted the importance of the conformation of a protein receptor for docking analysis. 68,71,72 McGovern and Shoichet 68 analyzed the influence of the receptor conformation on the final outcome of a docking screening of 95,000 small molecules. Three different conformations of 10 different target enzymes, including a holo (ligand-bound), an apo (unligated), and a homology-modeled structure, were used. The level of enrichment attained during the screening process was greatly affected by the quality of the protein structures, decreasing from the holo to the apo to the modeled structures as the conformation of the receptor is less prepared to accommodate the ligand. Despite this general trend, interesting exceptions were observed.
In a few cases, the conformation of the holo protein was such that only molecules structurally similar to that present in the original crystal-structure determination were recognized as potential ligands, missing all other molecules that exhibited a different binding mode. In the case of the apo structures, their conformations may be inadequate to accommodate a ligand, because of wrongly positioned residues or the presence of loops blocking access to the binding site. Modeled structures, even those modeled with a template of high sequence identity, can have badly placed side chains or missing loops or residues, hindering the docking process. In the latter case, it has been reported that the use of multiple homology models constructed from different crystal structures could provide a better representation of the protein receptor and improve the docking. 69

The biased selection of ligands as a result of using ligand-bound protein structures during a docking process was clearly shown in a series of cross-docking analyses performed by Murray et al. 72 Three different cases were studied: thrombin, thermolysin, and influenza virus neuraminidase. A series of crystal structures for each protein complexed with several ligands was used as docking targets. As expected, the best results were obtained when a given ligand was docked onto its own original structure, while poor placements were found when docking was done on the crystal conformation of a different complex. Most failures were because of movements of side chains, displacement of particular portions of the protein backbone, and mobility of metal atoms found within the active site. It was observed that the movements of side chains were usually linked to those of the backbone Cα atoms and, as a result, it was concluded that more than side-chain flexibility must be considered to avoid mis-docking of ligands.
In summary, it is of great importance to carefully prepare the structure of the protein target before the docking process. While structures of ligand-bound protein may provide the highest enrichments, the final results might be biased towards particular types of ligands. An example illustrating this effect for three trypsin inhibitor complexes is shown in Figure 2. On the other hand, while this could be avoided by using the structure of the unbound receptor, the conformation of the apo protein may be inadequate for accommodating the ligand (e.g., closed conformation of a loop). A desirable alternative is to treat the receptor as a flexible molecule, and to allow conformational changes during the docking process. Methods to allow this are reviewed in the next sections and summarized in Figure 3.
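A cross-docking analysis of the kind discussed above can be tabulated with a short script. The sketch below is an assumption-laden illustration, not the protocol of the cited studies: it supposes that all receptor structures have already been superposed onto a common frame, that docked and crystal ligand coordinates list the same heavy atoms in the same order, and that a pose counts as correctly placed when its RMSD from the crystal pose is at most 2.0 Å (a commonly used cutoff, chosen here for illustration). The function names, identifiers, and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def cross_docking_table(poses, references, cutoff=2.0):
    """Map (receptor_id, ligand_id) to True when the pose is within the cutoff.

    poses:      {(receptor_id, ligand_id): docked ligand coordinates}
    references: {ligand_id: crystal ligand coordinates}
    """
    return {
        key: rmsd(pose, references[key[1]]) <= cutoff
        for key, pose in poses.items()
    }


# Made-up two-atom "ligands" docked into their own and a foreign receptor.
references = {
    "lig_x": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "lig_y": [(0.0, 1.0, 0.0), (0.0, 2.5, 0.0)],
}
poses = {
    ("rec_x", "lig_x"): [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0)],  # self-docking
    ("rec_y", "lig_x"): [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)],  # cross-docking
    ("rec_y", "lig_y"): [(0.0, 1.2, 0.0), (0.0, 2.7, 0.0)],  # self-docking
    ("rec_x", "lig_y"): [(0.0, 4.0, 0.0), (0.0, 5.5, 0.0)],  # cross-docking
}

table = cross_docking_table(poses, references)
for (receptor, ligand), ok in sorted(table.items()):
    print(receptor, ligand, "placed" if ok else "mis-docked")
```

In this toy data set only the self-docking entries pass the cutoff, mirroring the bias toward the native ligand that the section describes.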

Figure 2. Several distinct binding modes for different ligands to a single protein receptor are possible. This superposition of three different complexes of the enzyme trypsin with the bis-phenylamidine inhibitor (1AZ8, blue), BX5633 (1MTV, green), and 1-(2-amidinophenyl)-3-(phenoxyphenyl)urea (1BJV, orange) shows that the ligands can adopt quite different orientations with the side chains of the protein presenting different conformations depending on which ligand is bound.

B. Approximate Methods

When receptor flexibility is included during the docking process, the risks associated with inadequate conformation of the protein target are reduced. Although originally restricted to the docking of rigid ligands into rigid receptors, recent advances in docking algorithms have allowed incorporation of ligand flexibility and, to a lesser extent, protein mobility, during the docking procedure. Most modern algorithms account for ligand flexibility; this can be addressed by systematic methods (e.g., incremental search), stochastic methods (e.g., Monte Carlo simulation), and deterministic search (e.g., MD simulation). 31 Programs that incorporate protein receptor flexibility, at least partially, began to appear more recently. 42,73–75 The size and complexity of proteins make it difficult to fully account for their mobility during a docking process and, therefore, its treatment is usually restricted to selected residues.

Figure 3. Different approaches that can be used during docking studies to incorporate protein flexibility, at least partially. [Diagram labels recovered from the figure: Protein Flexibility; Single Protein Conformation; Soft Docking: ligand "penetrates" protein; Side Chain Flexibility; Multiple Protein Conformations; Average Grid: single docking grid; United description of the protein: rigid framework, flexible regions; Individual conformations: docking into several conformers; Molecular Dynamics; Before docking: alternative receptor conformations; During docking: dynamic exploration; After docking: final optimization of contacts.]

1. Soft docking

The simpler approaches deal with protein flexibility in an indirect way. Although the receptor is treated as a rigid object, the repulsive term of the Lennard-Jones potential can be attenuated to generate a soft interaction. Thus, the ligand is allowed to penetrate the protein surface to some extent, which accounts for small and localized changes that would take place in a flexible environment. Although this approach does not increase computational costs, the changes in protein conformation that can be accounted for are minimal.

Ferrari et al. 79 performed a comparison between a soft docking and a multiple-structure docking approach (see below) for virtual screening. They concluded that while a soft scoring function performs better than a hard scoring function when a single configuration of the receptor is used, use of the hard function is recommended when multiple conformations of the receptor are considered. Overall, docking against multiple conformations of the receptor led to qualitatively different and better results than soft docking against a single structure of the protein studied.

2. Side-chain flexibility

In a different and more comprehensive approach, the mobility of some residues, particularly those within an enzyme active site, can be treated explicitly either during the docking process or after the ligand has been approximately placed. 80,81 A set of rotamer libraries can be used to explore the conformational space of selected side chains. Leach 84 was among the first to introduce receptor flexibility using rotamer libraries. One of the major limitations of this early approach was use of discrete pre-determined conformations of both the ligand and side chains.

Schnecke and Kuhn 80 presented a new docking algorithm, SLIDE, which incorporates side-chain mobility. A rigid anchor fragment of the ligand is initially positioned, followed by addition of the remaining fragments according to their database conformation. Clashes between the ligand and the receptor are finally resolved by rotations of single bonds of non-anchor regions of the ligand and protein side chains. The main drawbacks of this program are the restricted flexibility of the ligand and the post-docking treatment of side-chain flexibility.

Kallblad and Dean 83 incorporated side-chain flexibility within the binding site by pre-generating an ensemble of protein conformers using a rotamer library. A limited number of representative structures from this random ensemble was selected and used as targets for rigid docking. They found that while the synthetic inhibitor of human collagenase-1, RS , could not be properly docked into the crystal structure, some members of the ensemble could accommodate the molecule. In a similar approach, Frimurer et al. 82 generated an ensemble of tyrosine phosphatase 1B structures using a rotamer library for three selected active-site residues, and then performed docking of the flexible ligand with the program FlexX. 41 They obtained improved binding conformations and energies compared with the case where only a single conformation of the enzyme was used.

Although consideration of side-chain flexibility increases the computational cost of the docking process, it allows localized protein movement and results in improved fit of the ligand.
As only the side chains of selected residues are allowed to move, important changes in the protein backbone, such as those involved in loop movements, are not considered.
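The rotamer-library approach to side-chain flexibility amounts to enumerating combinations of discrete side-chain states for a few selected residues and treating each combination as a separate rigid receptor. The following minimal sketch shows that enumeration only; the residue labels, the torsion angles, and the function name are hypothetical, and building and scoring the actual 3-D structures is left out.

```python
from itertools import product


def enumerate_receptor_conformers(rotamer_library):
    """Yield every combination of rotamers for the selected residues.

    rotamer_library maps a residue label to a list of rotamers, each given
    as a tuple of side-chain torsion angles (chi angles, in degrees).
    """
    residues = sorted(rotamer_library)
    for combo in product(*(rotamer_library[res] for res in residues)):
        yield dict(zip(residues, combo))


# Hypothetical library for three active-site residues.
rotamer_library = {
    "RES_1": [(-60.0,), (60.0,), (180.0,)],
    "RES_2": [(-60.0, 180.0), (180.0, 60.0)],
    "RES_3": [(60.0,), (180.0,)],
}

conformers = list(enumerate_receptor_conformers(rotamer_library))
print(len(conformers))  # 3 * 2 * 2 = 12 receptor conformers
```

The number of conformers is the product of the rotamer counts, so it grows quickly with the number of flexible residues. This is one reason the published approaches restrict flexibility to a few residues or select a limited set of representative structures from the ensemble.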

3. Combined protein grid

Several alternative structures of the protein receptor can be combined into a single representation of the ensemble to account for bigger conformational changes that may be critical for the binding process. The averaging can be done over atom coordinates, to generate a final average structure, or over the grid representation of all receptor conformations, to produce an average docking grid. These grids, 85 or pre-calculated two-body potentials, are usually focused around the binding site and are used during the docking process to determine the interaction energy of different conformations of the ligand and the active site, in a fast and computationally inexpensive way. Different grids produced from several conformations of the receptor can be combined into a single global grid using a simple average-weighted scheme, or a differential weighting scheme which favors the contribution of some conformations over others.

Knegtel et al. 86 were among the first to use multiple protein structures to account for protein flexibility during docking analysis. In their original study, they evaluated two different ways of combining several experimentally determined structures into an average representation: an energy-weighted average composition, based on a weighted grid potential for each atom, and a geometry-weighted average, based on the positional variation of each atom. For five different systems analyzed, these proved to offer a better representation of the receptor than single structures. The ensemble-based grids minimized the effect of steric clashes between particular conformations of receptor and ligand, allowing the establishment of more favorable interactions. It was concluded that ensemble-based grids presented a good filter for ligand database searches, as they offer a relatively inexpensive approach for considering receptor conformational variability during the docking process.
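To make the difference between combination schemes concrete, the sketch below merges several energy grids point by point using a plain mean, a pointwise minimum, or a weighted average. The weighting shown is a Boltzmann-style factor, which is an assumption chosen for illustration and not necessarily the weighting used in the cited studies. Grids are represented as flat lists of energies (assumed to be in kcal/mol) sampled at the same points, `kT` defaults to roughly 0.593 kcal/mol (about 298 K), and all names and values are hypothetical.

```python
import math


def combine_grids(grids, scheme="mean", kT=0.593):
    """Combine per-conformation energy grids into one grid, point by point."""
    if not grids or any(len(g) != len(grids[0]) for g in grids):
        raise ValueError("need at least one grid, all of the same length")
    combined = []
    for values in zip(*grids):
        if scheme == "mean":
            combined.append(sum(values) / len(values))
        elif scheme == "min":
            combined.append(min(values))
        elif scheme == "boltzmann":
            # Shift by the minimum so every exponent is <= 0 (no overflow).
            e_min = min(values)
            weights = [math.exp(-(v - e_min) / kT) for v in values]
            total = sum(w * v for w, v in zip(weights, values))
            combined.append(total / sum(weights))
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return combined


# Two made-up grids; the second point clashes sterically in one conformation.
grids = [
    [-1.0, 50.0, 0.2],
    [-0.5, -0.8, 0.4],
]
for scheme in ("mean", "min", "boltzmann"):
    print(scheme, [round(v, 3) for v in combine_grids(grids, scheme)])
```

At the clashing point the plain mean stays strongly repulsive, the minimum ignores the clash entirely, and the weighted average lies between the two but close to the favorable value. This illustrates why the choice of combination scheme can matter as much as the decision to use several structures.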
Goodsell and co-workers 87 compared four different combination protocols to group 21 crystal structures of HIV-1 protease complexed with peptidomimetic inhibitors. All 21 crystal structures were combined into a docking grid to incorporate protein flexibility and water structure variations in a single representation of the target. Of the four combination protocols, neither the grid of mean values (too stringent), nor that of minimum energy (too permissive), led to an adequate representation of the ensemble. On the other hand, the two weighted-average grids constructed using the energy-weighted and the clamped techniques resulted in good docked conformations with accurate free energies. Broughton 88 arrived at similar conclusions in his studies of dihydrofolate reductase and cyclooxygenase-2. These studies also found that although the preparation of average grids is a time-consuming process, the overall time of the docking procedure is noticeably reduced, as the smoother surface of the average grids requires less scanning by the docking algorithm than the usual crystal-derived grids.

In summary, docking methods that have combined multiple protein structures, whether from NMR, X-ray complexes, or MD simulations, into a single grid representation of the target molecule have provided better results than grids from single structures. The choice of combination procedure, however, has an even greater impact on the final outcome, with weighted-average protocols providing better results than simple average combination of structures.

4. United description of the receptor

The docking program FlexE 74 implements a different solution to the protein flexibility problem. Instead of combining different conformations of the protein receptor into a single docking grid, a united protein description of the target is created. The alternative conformations are superimposed and a rigid average structure is constructed from the most conserved structural features.
For the variable regions, different conformations are explicitly considered and retained as an ensemble, which can be combinatorially explored during the docking process to generate novel protein structures. The ligand structure is incrementally built within the active site (as in FlexX), 41 and after placement of each new fragment, all possible interactions of the partially built ligand with the alternative protein conformations are evaluated. Those protein conformations that best accommodate the partially grown ligand are retained for further cycles of growth and optimization. The program was evaluated using 10 different proteins, with 7–16 different crystal structures each. The docking of 60 different ligands produced conformations within 2.0 Å of the crystal position in 83% of the cases. However, the scoring function was not able to rank the best complexes correctly; these had to be identified by comparison with the crystal structures. Although the final results proved to be similar to those obtained by sequential docking to each protein structure, use of a united description of the ensemble reduced the docking times significantly.

In a similar approach, Wei et al. 89 used a modified version of the program DOCK 39 to incorporate receptor flexibility during the docking process. In this case, the interactions between a given configuration of the ligand and different flexible parts of the receptor were calculated independently. Then, those flexible regions that presented the best interaction energies with the ligand were recombined into a final representation of the protein receptor. One of the most significant observations of this study was the importance of the receptor conformational energy for the final ranking. When protein flexibility is taken into account during the docking process, not only is the interaction energy between ligand and receptor important, but the internal energy of the protein also provides a major contribution. When this energy was ignored, many known ligands ranked poorly because of the presence of decoys that could better complement some high-energy conformations of the receptor.
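The effect of the receptor conformational energy on ranking can be shown with a toy calculation. In the sketch below, each ligand has been scored against several receptor conformations, and each entry stores the ligand-receptor interaction energy together with the internal (conformational) energy of that receptor conformation relative to the lowest-energy one. The simple additive total, the function names, and all numbers are assumptions made for illustration; they are not taken from the cited study.

```python
def best_total_energy(pairs, include_receptor_energy=True):
    """Best (lowest) score of a ligand over all receptor conformations.

    pairs: list of (interaction_energy, receptor_conformational_energy).
    """
    return min(
        e_int + (e_rec if include_receptor_energy else 0.0)
        for e_int, e_rec in pairs
    )


def rank(ligands, include_receptor_energy=True):
    """Return ligand names sorted from best to worst score."""
    return sorted(
        ligands,
        key=lambda name: best_total_energy(ligands[name], include_receptor_energy),
    )


# Made-up energies: the decoy fits a strained (high-energy) receptor state well.
ligands = {
    "known_ligand": [(-9.0, 0.5), (-8.0, 0.0)],
    "decoy": [(-11.0, 6.0), (-6.0, 0.0)],
}

print(rank(ligands, include_receptor_energy=False))  # decoy ranks first
print(rank(ligands, include_receptor_energy=True))   # known ligand ranks first
```

Ignoring the receptor term rewards the decoy for complementing a strained conformation, while including it restores the known ligand to the top of the list.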
Clearly, the inclusion of protein flexibility during the docking procedure using a united description of the protein with rigid conserved regions and alternative flexible parts can improve the screening process and produce new hits in a shorter time than simple sequential docking against each protein structure. To obtain an adequate ranking of the final complexes, the internal energy of the receptor must be taken into account, as high-energy conformations of the protein may lead to unrealistically low-energy positioning of a ligand.

C. MD Simulations for Receptor Flexibility

Proteins in solution are mobile molecules. They do not exist in a single conformation, but in a manifold of different conformational states separated by low- and higher-energy barriers. An example of such mobility is shown for two ternary complexes of dihydrofolate reductase in Figure 4. The distribution and stability of each conformational state will depend on the physicochemical properties of the environment and the protein itself (e.g., free or ligand-bound). 90 Moreover, not all these conformations will be equally able to bind productively with a given ligand. Some will be more likely to accommodate the ligand molecule within the binding site without having to undergo large changes, while others will be less likely, or even incapable, of accommodating the ligand due, for example, to loop conformations that block the access to the binding site. 91 The presence of the ligand itself is expected to affect the structure of the binding site and the dynamic equilibrium between different conformational states of the protein. 92 During a binding event, the protein conformer most likely to accommodate the ligand will be depleted from solution to form a ligand-bound complex, and other conformers will then adjust to fill the vacated conformational space, driving the binding process forward. 90 Therefore, an ensemble of receptor conformations and not a single structure is expected to provide a better representation of the system. Docking against several structures of the protein increases the chances of finding a receptor in the right conformational state to accommodate a particular ligand. However, it also reduces the selectivity of the docking process, as a wider variety of ligands will be able to fit in this more relaxed representation of the protein. It is important, therefore, to use accurate scoring functions during the final screening process to maximize selection of the most active ligands.

Figure 4. The flexibility of proteins is clearly shown in these two superimposed ternary complexes of the enzyme dihydrofolate reductase with folate and NADP+ (1RA2 and 1RX2). The flexible loop has been shown to adopt different conformations during the catalytic process; here it is seen frozen in the open (orange) and closed (green) states in two different crystal forms of the same complex.

1. Generation of multiple protein conformations

Multiple structures of the protein receptor could be obtained from experimental studies, such as NMR and X-ray analysis, or generated using computational tools. Philippopoulos and Lim 93 suggested that the best source of protein conformations is NMR studies. A set of 15 NMR structures of E. coli ribonuclease HI was shown to explore a bigger conformational space than that of a conventional 1.7 nsec MD simulation of the system. Although both NMR and MD sampled similar conformations, NMR conformers covered a larger space with increased side-chain and protein-backbone mobility. As a caveat on these conclusions, it should be noted that this study employed conventional MD single-trajectory simulations, which are known to be inadequate for exploring large conformational spaces. Multiple-trajectory and replica-exchange MD methods introduced more recently for protein systems, and other modified MD simulations designed to improve the conformational sampling of the system (see Docking with MD Simulations section), would likely produce better results. Also, MD simulations provide an easy practical alternative to explore the conformational space of the protein receptor in the many cases where multiple experimental conformations are not available.
Several different studies have shown that MD simulations are generally in good agreement with experimental results in reproducing the general protein structure and dynamic processes occurring on the psec timescale. In a different approach, Thorpe and co-workers 100 used graph and constraint theories to identify possible movements of a protein structure. Although several protein conformations could be easily generated, their relative energies were not computed and, therefore, it was not possible to select the most stable conformations for docking. When preparing MD simulations for exploration of the conformational space of the protein receptor and generation of a proper ensemble of conformations, it should be remembered that the dynamic behavior of the free and ligand-bound forms of the protein might be very different. Kua et al. 101 studied the binding specificity of acetylcholinesterase (AChE) using a combined MD/docking approach. A series of ligands was docked to several snapshots obtained from two different MD simulations, one for the apo-AChE and another for the acetylcholine-AChE complex. Although it was found that acetylcholine was correctly docked to more than 95% of the snapshots of both simulations, the energies of the complexes obtained from the ligand-bound trajectory were 0.7 kcal/mol more stable. The increased stability resulted from the induced fit observed during the simulation of the complex. However, as noted in a previous section, receptor conformations obtained from the simulation of a ligand-bound protein may be biased to accommodate particular types of ligands with particular binding modes. If the objective of a docking search is to find novel inhibitors with new binding modes, simulations of the apo protein may provide a more suitable variety of conformations which are not tuned to interact with a particular ligand and, thus, may offer a more versatile target for the docking protocol.

Once a set of adequate structures has been obtained, it is necessary to determine how to use this ensemble of conformations to account for protein mobility during the docking process. We introduced two alternatives earlier: combination of the structures into a single docking grid, or generation of a united description of the protein with conserved and mobile regions. While these two alternatives are the most convenient ones for virtual screening of large libraries, there is a third approach that involves docking the ligand to every single conformation. This last alternative has been applied particularly in cases where several protein conformations have been obtained from MD simulations.
Some literature examples are reviewed below, and more technical details for several studies are compiled in Table 1.

2. Docking into several individual protein conformations

Docking the ligand against each protein structure in the ensemble constitutes the most comprehensive, although expensive, approach. While this strategy is not a realistic option for the virtual screening of a large library, it is a valid approach for difficult docking problems where even minor conformational changes of the receptor are expected to have a major influence on the binding process. Carlson et al. 102 developed dynamic pharmacophore models of HIV-1 integrase using several snapshots from an MD simulation. Hundreds of probe molecules were energy minimized within the binding site of several snapshots. The probe molecules mapped the most favorable positions for certain functional groups within the receptor. Binding sites conserved during the MD simulation were combined into a dynamic pharmacophore model. While the composite model was able to accommodate known inhibitors, a model from a single crystal structure failed to do so. Even the use of just two crystal-structure models produced improved results over single-structure models. 103 McCammon and co-workers 104 introduced the so-called relaxed-complex scheme, which takes into account the possibility that a ligand may bind to only a few conformations of the receptor. A long MD simulation of the apo receptor is first conducted to sample extensively its conformational space, followed by the rapid docking of mini-libraries of candidate inhibitors against a large ensemble of snapshots. In their original work, the FK506 binding protein, FKBP, was studied. Two different compounds, trimethoxyphenyl pipecolinic acid and 4-hydroxy(1-hydroxy) benzanilide, were sequentially docked to 200 snapshots.
Although the final ternary complex was in good agreement with the experimental structure, the AutoDock 40 scoring function did not properly discriminate between different conformations of the ligands. In a second article, the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) approach was employed to re-score the docking results and the best complex was found accurately 105 (see Free Energy Calculations section). The advantage of performing MD simulations of the protein receptor prior to the docking analysis has been clearly shown in another application of the relaxed-complex scheme. Schames et al. 106 discovered a novel binding trench in HIV-1 integrase by docking the 5CITEP inhibitor to snapshots of a 2 nsec trajectory. The docking procedure revealed the existence of two different binding modes of the ligand, one of which made use of a new open space adjacent to the binding site. A series of butterfly compounds, with the ability to bind simultaneously to both the binding site and the trench, were designed and shown to dock into the open state of the protein receptor. The discovery of this new binding trench would not have been possible without the initial MD simulations of the receptor.

Table 1. Summary of Docking Studies That Made Use of MD Simulations for the Generation of Alternative Protein Receptor Conformations

Although computationally expensive, docking against individual protein structures has proven to be effective not only in finding the correct docking pose within a flexible receptor (both in evaluative and predictive contexts), but has been found useful also for discovering alternative binding modes otherwise not apparent from the rigid picture of proteins extracted from crystal structures. This method can have important applications in lead optimization and refinement, despite not being useful for the virtual screening of large libraries. Inclusion of protein flexibility does not necessarily lead to improvements in the final docking results. Increased capacity of the receptor to accommodate several ligand conformations may lead to the generation of very similar complexes not distinguishable by modern scoring functions. Therefore, the validity of the final predictions should be assessed experimentally.

3. Value of MD simulations before the docking process

In summary, the application of MD before the docking process offers a suitable approach to explore the conformational space of the protein receptor. This information can then be transferred to the docking protocol in several ways.
The simplest and most computationally expensive approaches use docking against individual snapshots of the receptor to generate a collection of docked complexes of different stabilities. To decrease computing times and make possible the virtual screening of compound libraries, the group of snapshots can be combined into a single representation of the ensemble. This combination could involve docking grids, where the grid representation of several protein structures is joined into a single average-weighted docking grid. Alternatively, a united description of the protein could be constructed, in which relatively stable regions of the protein are averaged into a single rigid framework and the flexible parts are treated as an ensemble of alternative positions, which can recombine during the docking process. Regardless of the specific approach used to deal with multiple protein structures, it is clear that the consideration of different possible receptor conformations can increase the accuracy of the docking process and open new opportunities for the discovery of novel potential drugs.

4. REFINEMENT OF DOCKED COMPLEXES

The most practical and convenient approach to address the docking problem seems to be a two-step protocol. Fast and less accurate algorithms are first used to scan large databases of molecules and reduce their size to a reasonable number of hits. This step is then followed by application of more accurate and time-consuming methods which can refine the conformation of the complexes and produce accurate free energies. 107,108 Molecular dynamics simulations present an attractive alternative for structural refinement of the final docked complexes. They incorporate flexibility of both ligand and receptor, improving interactions and enhancing complementarity between them, and thus accounting for induced fit.
Moreover, the evolution of the complexes over the simulation timecourse is an indication of their stability and reliability; incorrectly docked structures are likely to produce unstable trajectories, leading to the disruption of the complex, while realistic complexes will show stable behavior. In addition, the ability to incorporate explicit solvent molecules and their interactions in the simulations of the docked systems is very important for understanding the role of water and its effect on the stability of the ligand-protein complexes. Some literature examples in which simulations were used to optimize docked structures are reviewed below, and more technical details for several studies are compiled in Table 2.

Park et al. 109 studied the differential inhibition of two cyclin-dependent kinases (CDKs), CDK2 and CDK4, by three different selective inhibitors. The final MD simulations of the docked complexes provided molecular insight into the preferential binding of the inhibitors to CDK4. It was seen that the presence of the inhibitors reduced the mobility of a disordered loop in the case of CDK4, but did not seriously affect CDK2. Not only protein mobility but also the effect of explicit water molecules was analyzed. Tighter binding within CDK4 was reflected in a smaller number of water molecules diffusing into the active site compared with the CDK2 complexes. In the latter case, weaker hydrogen bonding with active-site residues and greater exposure to bulk solvent resulted in less stable complexes. Therefore, MD simulations of the final docked structures in an aqueous environment helped to rationalize, at the molecular level, the differential inhibition observed experimentally.

Another study that made use of MD simulations to analyze the relative stability of different docked complexes was published by Cavalli et al. 110 MD simulations of the final docked complexes of propidium within human acetylcholinesterase (HuAChE) were found to be useful not only for relaxing the protein receptor and accounting for the induced-fit effects, but also for discriminating among conformations of different stability. The dynamically most stable structures were in good agreement with the two possible binding modes found experimentally, while other intermediate configurations produced unstable trajectories.
The authors highlight the importance of their combined approach: docking calculations to provide reliable starting structures and MD simulations to incorporate protein flexibility and analyze complex stability. Karplus and co-workers 111 analyzed the binding of D-glucose onto the surface of insulin. Several possible binding sites were found after docking, and MD simulations were used to study their kinetic stabilities. It was found that the best-ranked docked conformers produced stable trajectories, and although the glucose molecule actively explored different conformations, it never left the binding pocket. The low number and unstable character of hydrogen bonds between glucose and insulin were in agreement with the experimental low binding free energy. On the other hand, MD simulations of complexes where the glucose was bound elsewhere on the surface appeared less stable, providing information otherwise unobtainable on the relevance and stability of these different binding modes. Rasteli et al. 112 performed a database screening to find novel inhibitors of aldose reductase, and then used MD to optimize the structures of selected candidates. One interesting outcome of the study was that sulfonamide derivatives, one of the chemical families predicted to have good binding, were experimentally inactive. Seeking a molecular explanation for this finding, the authors performed MD simulations on different complexes. They found that during the trajectory, a water molecule entered the active site and hydrogen bonded to a key residue, weakening the interaction with the sulfonamide inhibitor and, thus, reducing its potential activity. This type of effect could not be predicted by docking analysis, again highlighting the value of MD simulations in accounting for solvent effects. 
In our work on bacterial R67 dihydrofolate reductase (DHFR), 113 we used MD simulations to test the relative stability of several different binding modes of the ligands dihydrofolate and NADPH (see Fig. 5). After docking analysis, it was found that several complexes presented stable MD trajectories and protein-ligand interactions in good agreement with experimental data, despite having different global conformations of the ligand. We concluded that more than one possible ligand conformation was stable within the spacious and symmetric active-site pore, which is provided by the unusual tetrameric structure of the enzyme. While the reacting rings adopt a stacked conformation close to the center of the active-site cavity, where there is little if any water access, the long charged tails of the ligands extend towards opposite directions adopting multiple conformations in a solvent-rich environment.
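The use of a trajectory to separate stable from unstable docked complexes can be sketched as a simple distance check between two groups of atoms, such as the two reacting rings. This is a minimal illustration on synthetic coordinates: the function names, the 5.0 Å cutoff, and the 90% occupancy threshold are assumptions chosen for the example and are not taken from the studies reviewed here.

```python
import numpy as np


def centroid_distance(group_a, group_b):
    """Per-frame distance between the centroids of two atom groups.

    Each argument has shape (n_frames, n_atoms, 3), in angstroms.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    return np.linalg.norm(a.mean(axis=1) - b.mean(axis=1), axis=1)


def is_stable(distances, cutoff=5.0, min_fraction=0.9):
    """Flag a complex as stable if the groups stay within `cutoff`
    for at least `min_fraction` of the frames (both values are assumptions)."""
    distances = np.asarray(distances, dtype=float)
    return bool(np.mean(distances <= cutoff) >= min_fraction)


# Synthetic six-membered ring repeated over 100 frames.
n_frames = 100
angles = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
ring = np.stack([1.4 * np.cos(angles), 1.4 * np.sin(angles), np.zeros(6)], axis=1)
ring_a = np.tile(ring, (n_frames, 1, 1))

# Case 1: the second ring stays stacked 3.5 angstroms above the first.
stacked = ring_a + np.array([0.0, 0.0, 3.5])

# Case 2: the second ring drifts steadily away, from 3.5 to 12 angstroms.
drift = np.zeros((n_frames, 1, 3))
drift[:, 0, 2] = np.linspace(3.5, 12.0, n_frames)
separating = ring_a + drift

d_stacked = centroid_distance(ring_a, stacked)
d_separating = centroid_distance(ring_a, separating)
```

In practice the coordinates would come from the MD trajectory of each docked complex, and a distance trace alone would normally be complemented by other measures such as ligand RMSD or hydrogen-bond persistence.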

Table 2. Summary of Docking Studies That Made Use of MD Simulations for the Optimization of Final Docked Structures

Figure 5. Evolution during MD simulations of the distance between the reacting rings of the ligands dihydrofolate and NADPH in four different ternary complexes of R67 dihydrofolate reductase. It may be seen that while the ligands remain within reacting distance for Complexes 2 and 4, they separate from each other in Complexes 1 and 4. This differential behavior along the MD trajectories allowed properly docked complexes to be distinguished from incorrect unstable complexes.

These represent some selected examples of work where MD simulations have been applied after docking analysis to optimize the final structures, analyze the stability of different complexes, and account for solvent effects. Other studies include the work by Cannizzaro et al. 114 on the origins of the enantioselectivity of an antibody-catalyzed Diels-Alder reaction, Garcia-Nieto et al. 115 on the interaction modes of nimesulide and prostaglandin-endoperoxide synthase-2, and Hammer et al. 116 who used MD simulations to optimize the manually docked structures of several glucocorticoids within a model of the glucocorticoid receptor.

5. FREE ENERGY CALCULATIONS

For a docking process to be successful, it is necessary that both the right conformation of the ligand-receptor complex is predicted, and that the ranking of final structures is correct. The procedure needs to be able to differentiate among similar conformations of the same system, as well as to predict the relative stability of different complexes. There are several different scoring functions for this purpose (for recent comparisons of scoring functions see ). As most contain empirically fitted parameters, their performance on any particular problem will depend on the set of structures used for the calibration. So far, no scoring function has proven to be reliable for every docking case tested.
The main constraint on their improvement rests with the need for speed; when ranking hundreds, if not thousands, of complexes a compromise in accuracy must be made. Knowledge-based functions used in the ranking of molecular interactions may not be general and accurate enough, because of the limited number of interactions that can be inferred from crystal structures and the inadequate description of repulsive forces. MM-based functions, on the other hand, inherit all common problems of molecular mechanics parameters, and recent calculations have shown that they may result in large electrostatic errors. Several pilot studies on the use of semi-empirical quantum mechanical methods for a more accurate description of the interactions of proteins with small ligands have been recently published.

Taking account of these factors, the type of scoring functions currently implemented in docking programs cannot be expected to distinguish energetically between close conformations of the same molecule, or even to rank properly a group of ligands of similar activity. Although the combination of several scoring functions into a consensus score has been shown to provide better results, this merely produces a ranking of complexes without offering final energies. While knowledge of the relative stability of different complexes may be an adequate result for an initial screening protocol, estimates of the absolute binding free energy may be necessary in later stages of docking or during lead refinement, when only a few selected ligands remain. If stringent rankings or accurate energies are needed, different MD-based calculations can be carried out on the final complexes to estimate the binding free energy. 84, Thermodynamic integration (TI) and free energy perturbation (FEP) are among the most rigorous methods currently available for the calculation of free energies. Despite providing very accurate free energies, they are not widely applied as they are computationally expensive. 136,139,140 The main limitation of these approaches is the exhaustive conformational sampling required to obtain a properly averaged ensemble, and their slow convergence. Inefficiencies in configurational sampling because of the appearance/disappearance of atoms (explained in more detail below) restrict their use to small transformations, and limit analysis to a few closely related compounds. Recently developed approaches that provide relatively good energy values at a moderate cost include MD-based methods such as the linear interaction energy (LIE) method, 130, and the so-called MM-PBSA method. 145 As for previous sections, we review below some literature examples in which simulations were used to calculate binding free energies and provide more technical details in Table 3.

Table 3. Summary of Docking Studies That Made Use of MD Simulations for the Calculation of Accurate Binding Free Energies. a FEP, free energy perturbation. b LIE, linear interaction energy method. c MM-PBSA, molecular mechanics/Poisson-Boltzmann surface area method.

A. Free Energy Perturbation

Free energy perturbation methods can be applied to predict the relative binding strength of different complexes. The difference in binding free energy between two given ligands $L$ and $L'$ and the receptor $R$ is calculated using the following thermodynamic cycle:

\[
\begin{array}{ccc}
L + R & \xrightarrow{\;\Delta G_{\mathrm{bind}}(L)\;} & LR \\
\Big\downarrow \Delta G^{w}_{\mathrm{mut}} & & \Big\downarrow \Delta G^{p}_{\mathrm{mut}} \\
L' + R & \xrightarrow{\;\Delta G_{\mathrm{bind}}(L')\;} & L'R
\end{array}
\]

Instead of calculating the individual binding energies ($\Delta G_{\mathrm{bind}}$) to determine the relative binding free energy ($\Delta\Delta G_{\mathrm{bind}}$), the energies of the non-physical transformations $L \to L'$ ($\Delta G^{w}_{\mathrm{mut}}$) in solution and $LR \to L'R$ ($\Delta G^{p}_{\mathrm{mut}}$) when bound to the protein are estimated, using

\[
\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}(L') - \Delta G_{\mathrm{bind}}(L) = \Delta G^{p}_{\mathrm{mut}} - \Delta G^{w}_{\mathrm{mut}}
\]

To effect this, the states $L$ and $L'$ are linearly combined using a coupling parameter $\lambda$, and an MD simulation is used to slowly transform one ligand ($L$, $\lambda = 0$) into the other ($L'$, $\lambda = 1$) in both the free and receptor-bound forms. This type of alchemical transformation can be used to determine relative free energies, as the free energy is a state function which can be calculated by any reversible path between the initial and final states.

Park and Lee 146 combined homology modeling, docking, and free-energy calculations to optimize the activities of histone deacetylase inhibitors. A series of 12 hydroxamate inhibitors on three different scaffolds were automatically docked onto the protein. The best complexes were energy minimized and submitted to MD simulations for FEP calculations. The final relative energies of the 12 complexes were in good agreement with experimental results. As the chemical modifications of the scaffolds directed towards creating stronger enzyme-ligand interactions usually resulted in stabilization of the free ligand in solution, it was inferred that modifications needed to be carefully planned to produce a net increase in the inhibitor potency. In a similar approach, the same authors used the FEP method to study the selectivity of different cyclooxygenase-2 inhibitors. 147 In this case, two different cyclooxygenases, COX-2 and COX-1, were used as protein receptor targets for the binding of 10 structurally different inhibitors. MD simulations were used to perform a single mutation of the receptor COX-2 into a close model of COX-1 by changing a valine residue into an isoleucine. A wide range of structurally different inhibitors could be studied as it was the receptor, and not the ligands, that was involved in the non-physical transformation. The final results were in good agreement with experimentally determined IC50 values and offered a structural explanation for the selectivity of known COX inhibitors for one of the two isozymes.

Luzhkov et al. 148 analyzed the binding of three tetraalkylammonium ions to the KcsA potassium channel. The predicted binding free energies were in good agreement with experimental data; they suggested that the preferred binding of tetraethylammonium over the other two inhibitors originates from the van der Waals interactions and the steric response of the binding site, with only very small electrostatic contributions.

One of the most important limitations in free energy calculations is the sampling of the conformational space. 149 Exploration of the appropriate conformations is not guaranteed simply by longer simulations. To avoid convergence problems and inadequate sampling during the simulations, only transformations between similar molecules are feasible, constraining the type of ligands that can be compared.
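The thermodynamic cycle used in this section reduces to a small amount of arithmetic once the two mutation free energies are available. The sketch below is a hedged illustration: it uses the standard exponential-averaging (Zwanzig) estimator for each window of the coupling parameter, which is one common way of evaluating such transformations and is not necessarily the protocol used in the studies reviewed here. The function names and all numerical inputs are hypothetical.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def zwanzig(delta_u, temperature=298.15):
    """Free energy change for one coupling-parameter window.

    `delta_u` holds samples of U(next window) - U(current window), in kcal/mol,
    collected from a simulation run at the current window.
    Implements dG = -kT ln < exp(-dU / kT) > with a shift for numerical stability.
    """
    beta = 1.0 / (KB * temperature)
    shift = min(delta_u)
    total = sum(math.exp(-beta * (du - shift)) for du in delta_u)
    return shift - math.log(total / len(delta_u)) / beta


def mutation_free_energy(windows, temperature=298.15):
    """Sum the window contributions along the path from L to L'."""
    return sum(zwanzig(samples, temperature) for samples in windows)


def relative_binding_free_energy(dg_mut_protein, dg_mut_water):
    """Close the cycle: ddG_bind = dG_mut(protein) - dG_mut(water)."""
    return dg_mut_protein - dg_mut_water


# Hypothetical energy-difference samples for two windows in each environment.
water_windows = [[0.4, 0.6, 0.5], [0.2, 0.3, 0.1]]
protein_windows = [[-0.6, -0.4, -0.5], [-0.9, -1.1, -1.0]]

dg_water = mutation_free_energy(water_windows)
dg_protein = mutation_free_energy(protein_windows)
ddg_bind = relative_binding_free_energy(dg_protein, dg_water)
```

A negative `ddg_bind` in this convention means that the mutated ligand is predicted to bind more strongly than the original one. Real calculations need many more windows and samples, and the exponential average is dominated by rare low-energy samples, which is one face of the sampling problem described above.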
These sampling constraints, together with the computational cost of such approaches, have prevented the wide application of FEP for determining binding free energies, despite its accuracy.

B. Linear Interaction Energy Method

Aqvist et al. 130 introduced the LIE semi-empirical MD approach for the estimation of binding free energies. 137,150 This method assumes that the binding free energy can be extracted from simulations of the free and bound states of the ligand. The energy is divided into electrostatic and van der Waals components, and the final binding energy is calculated as

\[
\Delta G_{\mathrm{bind}} = \beta \left( \langle V^{\mathrm{elec}}_{\mathrm{bound}} \rangle - \langle V^{\mathrm{elec}}_{\mathrm{free}} \rangle \right) + \alpha \left( \langle V^{\mathrm{vdw}}_{\mathrm{bound}} \rangle - \langle V^{\mathrm{vdw}}_{\mathrm{free}} \rangle \right) + \gamma
\]

where $\langle V^{\mathrm{elec}}_{\mathrm{bound}} \rangle - \langle V^{\mathrm{elec}}_{\mathrm{free}} \rangle$ represents the averaged change in electrostatic energy and $\langle V^{\mathrm{vdw}}_{\mathrm{bound}} \rangle - \langle V^{\mathrm{vdw}}_{\mathrm{free}} \rangle$ the averaged change in van der Waals energy in going from an aqueous solution to a protein environment. $\alpha$, $\beta$, and $\gamma$ are empirically determined constants. Two different MD simulations, one for the ligand bound to the protein and another for the free ligand in water, are used to calculate the energies. During the early applications of the LIE approach, only two coefficients, $\alpha$ and $\beta$, were considered. Although the electrostatic coefficient, $\beta$, appeared to have a constant value of 0.5 for several protein systems, as predicted by the linear response approximation, 130 the van der Waals coefficient, $\alpha$, seemed to adopt various values depending on the characteristics of the protein receptor. 141,142,151,152 Kollman and co-workers 144 suggested that the value of $\alpha$ depended on the hydrophobicity of the binding site, and that it could be predicted by calculating the weighted desolvation non-polar ratio (WDNR) of the system. Jorgensen's group extended the method to calculate both the hydration and binding free energy, adding a new term to account for the solvent accessible surface and scaling it by a new empirical coefficient. 134,153,154 It was later found, however, that the non-polar component, $\gamma$, although considered zero in many cases, 130,143 could adopt different values 155 and account for the variability earlier assigned to $\alpha$. In a recent study, Aqvist and coworkers 156 performed a systematic analysis of several ligands in complex with P450cam. Using fixed values for $\alpha$ and $\beta$, while optimizing $\gamma$, not only provided the best absolute binding free energies for the ligands but also showed that the coefficients of the LIE method are independent of the force field used and that only $\gamma$ might need to be optimized to account for the hydrophobicity of the active site.

Gutierrez-de-Teran et al. 157 used a two-step approach to analyze the binding modes of different agonists on human A1 adenosine receptor (hA1AR). The natural agonist adenosine and three synthetic derivatives were docked onto a theoretical model of hA1AR. As two different binding modes were found for the ligands, binding free energies were calculated using the LIE method. The final energies permitted the selection of one preferred binding mode, which was favored by better interactions between the ligands and the protein. These results suggested that there is a single preferred binding mode for adenosine and its derivatives within hA1AR.

Osterberg et al. 158 studied the binding of several sertindole analogs, which are strong blockers of the hERG K+ channel. The different blockers were docked against a homology model of the open channel. A few highly populated clusters, representing different binding modes, were obtained for most ligands. As the scoring function of the docking program was not able to discriminate between the good and weak binders, representative conformations from the best clusters were submitted to MD simulations. Not only were protein flexibility and solvent effects studied, but the binding free energies were also estimated using the LIE approach. The final relative LIE energies were in excellent agreement with the experimental values, thus validating the model used.

Luzhkov et al. 159 studied the binding of the tetraethylammonium ion (TEA) to the KcsA potassium channel.
The inhibitor was docked automatically to the crystal structure of the channel; two major binding regions near the intracellular and extracellular entrances were found, in agreement with experiment. The final complexes were grouped and the most stable ones were selected for further analysis by LIE. It was found that binding of TEA depends on the number of Kþ ions within the channel, and that four tyrosine residues at the entrance of the pore form a hydrophobic cage that stabilizes the binding of the inhibitor. The binding free energies obtained in all these cases were in very good agreement with experimental results and the LIE approach seems to be a good alternative to the more expensive FEP calculations. The two main shortcomings of the method are the need for two different MD simulations, one of the complex structure and another for the free ligand in water, and the use of empirically derived constants which may need to be modified for each particular system. These requirements restrict the broad application of the LIE method in docking/scoring procedures. C. Molecular Mechanics/Poisson Boltzmann Surface Area Method The MM/PBSA method 132,160 was introduced by Srinivasan et al. 161 It combines molecular mechanics (MM) and continuum solvent approaches to estimate binding energies. An initial MD simulation in explicit solvent provides a thermally average ensemble of structures. Several snapshots are then processed, removing all water and counterion molecules, and used to calculate the total binding free energy of the system with the equation G bind ¼ G complex ½G protein þ G ligand Š where the average free energy G of the complex, protein, and ligand, are calculated according to the following equations: G ¼ E MM þ G solvation TS E MM ¼ E int þ E elec þ E vdw G solvation ¼ G polar þ G non polar E MM is the average MM energy in the gas phase, calculated for each desolvated snapshot with the same MM potential used during the simulation but with no cut-offs. 
$G_{\mathrm{solvation}}$, the solvation free energy, is calculated in two parts: the electrostatic component $G_{\mathrm{polar}}$ using a Poisson–Boltzmann approach, and a non-polar part using the solvent-accessible surface area (SASA) model. 162 The entropy term ($TS$) is the most difficult to evaluate; it can be estimated by quasi-harmonic analysis of the trajectory or by normal mode analysis. 161,164,165 The entropy change can be omitted if only the relative binding energies of a series of structurally similar compounds are required, but if the absolute energy is important, or if the compounds are notably different, then its contribution to the final free energy cannot be ignored.

A recent study by Kuhn et al. 166 suggests that the MM-PBSA function could be used as a post-docking filter during the virtual screening of compounds, as their use of a single relaxed structure provided better results than the usual averaging over MD simulation snapshots. However, as the simulation conditions used in this work were not optimal, improved calculations could lead to significantly different conclusions.

Commonly, only a single MD simulation of the complex is used to determine the conformational free energy, 145 with the structures of both the free ligand and the ligand-free protein extracted from the trajectory of the protein–ligand complex. This approach might not be the best. A recent study by Pearlman 167 showed that using a single simulation to generate all structures for a series of complexes of p38 MAP kinase and 16 different ligands gives final results that are significantly worse than those from separate simulations, and that the savings in computing time are minimal and do not justify the simplification.

Application of the MM-PBSA approach has produced reasonable binding energies for several systems, 161, but not for others.
167 Evaluation of the MM-PBSA method using a series of p38 MAP kinase complexes gave very poor results compared with other approaches, and at an appreciably larger computational cost. 167

In a very interesting work, von Langen et al. 171 studied the selectivity of the human glucocorticoid receptor (hGR) both experimentally and theoretically. The experimental relative binding affinities of five steroids with similar carbon skeletons showed that the natural ligand cortisol has the highest affinity (100%), followed by progesterone (22%), aldosterone (20%), testosterone (1.5%), and estradiol (0.2%). To rationalize the observed selectivity at the molecular level, several different theoretical studies were done. A homology model of the hGR ligand-binding domain was constructed and used as a target to dock the five different steroids. The ranking of the final complexes provided by FlexX 41 was not in agreement with the experimental affinities. All five complexes were submitted to MD simulations to further study their characteristics and stabilities. During the 4 ns trajectories, the complexes of cortisol and aldosterone were the most stable, while those of the other steroids showed an increased mobility of the protein and a collapse or an expansion of the active site. The binding free energy for the different complexes was calculated using the MM/PBSA method. Although the approach could properly discriminate compounds with strong affinity from those with weak binding, it could not correctly rank low-affinity ligands. Further docking of the ligands to an average structure from the MD simulations showed better results than the initial docking to an energy-minimized homology model structure, highlighting the importance of a proper conformation of the protein receptor for docking studies. Altogether, these computational techniques allowed the authors to understand the selectivity of hGR for cortisol in molecular detail.
Although similar steroids may fit within the active site, the interactions they establish with the surrounding protein environment may not be adequate to generate a stable and active receptor conformation.

Kollman and co-workers 135 presented a combined approach that implements docking, MD simulations, and MM-PBSA, and used it to predict the binding mode of the inhibitor efavirenz to HIV-1 reverse transcriptase (RT). Initially, they evaluated the capacity of combined MD simulations and MM-PBSA to reproduce binding free energies of 12 crystal structures of HIV-1 RT complexed with different TIBO-like inhibitors. They found that both relative and absolute free energies were correctly predicted with an error of 1.0 kcal/mol. For the docking of efavirenz, five different binding modes were submitted to MD simulation and further processed using the MM-PBSA approach. The most stable binding mode was clearly identified, with a binding free energy of $-13.2$ kcal/mol, in good agreement with the experimental value of $-11.6$ kcal/mol. The final structure was found to be in very good agreement with a crystal structure of the complex, not initially available to the authors. They concluded that molecular docking combined with MD simulations followed by MM-PBSA analysis presented a reasonable approach for modeling protein complexes a priori.

Other studies that employed MM-PBSA calculations include the analysis of cathepsin D inhibitors by Huo et al. 131 and the study of avidin ligands by Kuhn and Kollman. 133 In the latter case, it was found that free energy components for solute entropy were quite variable depending on the snapshots analyzed, and the authors concluded that more accurate methods to predict entropic changes may be required.

The MM-PBSA method has been shown to produce accurate free energies at a moderate computational cost. Its main advantages are the lack of adjustable parameters and the option of using a single MD simulation for the complete system to determine all energy values. Nevertheless, this approach does have drawbacks, including the difficulty of predicting the entropic component of the free energy and the fact that the changes in internal energy of the ligand and receptor upon complex formation are neglected, which would produce significant errors in flexible systems where there is an important induced-fit effect.

D. Value of MD Simulations After Docking

In the previous two sections, we have shown the advantages of applying MD simulations to the final complexes of a docking study. Such simulations can have a dual use; they can refine the final structures and also be used to predict accurate binding free energies.
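For a sense of scale on what accuracy means for a binding free energy, the standard thermodynamic relation between the binding free energy and the dissociation constant $K_d$ can be used. This is a textbook relation rather than a result of the studies reviewed here, and $T = 298$ K and a 1 M standard state ($c^\circ$) are assumed:

$$\Delta G_{\mathrm{bind}} = RT \ln\!\left(\frac{K_d}{c^\circ}\right) \quad\Longrightarrow\quad \frac{K_d^{\mathrm{calc}}}{K_d^{\mathrm{exp}}} = \exp\!\left(\frac{\Delta G_{\mathrm{bind}}^{\mathrm{calc}} - \Delta G_{\mathrm{bind}}^{\mathrm{exp}}}{RT}\right)$$

With $R \approx 1.987 \times 10^{-3}$ kcal mol$^{-1}$ K$^{-1}$, $RT \approx 0.59$ kcal/mol at 298 K, so that

$$RT \ln 10 \approx 1.36\ \text{kcal/mol}, \qquad \exp\!\left(\frac{1.6}{0.59}\right) \approx 15 .$$

Under these assumptions, each 1.36 kcal/mol of error shifts the predicted $K_d$ by about one order of magnitude, and the 1.6 kcal/mol gap between the calculated and experimental values quoted above for efavirenz corresponds to roughly a 15-fold difference in $K_d$.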
In terms of structure optimization, MD simulations allow flexibility for both the ligand and protein receptor, facilitating the relaxation of the complete system and accounting for induced-fit effects. The effect of solvent molecules can also be treated explicitly; with the incorporation of water molecules in the simulated system, important stabilizing/destabilizing effects and water-mediated interactions can be observed. Furthermore, the time-dependent evolution of the system during the simulation provides a dynamic picture of the complex and helps to discriminate the correctly docked conformations from the unstable ones.

With respect to free energy calculations, we have pointed out that scoring functions implemented within docking programs are not sufficiently accurate to identify, in every case, the most stable conformation of a given ligand or the drug with the highest binding affinity among a set of compounds. Although library-screening processes require fast and inexpensive scoring functions, more accurate and expensive calculations can be employed in the last stages of a docking process, when only a few possible candidates are left, or during lead optimization. MD-based methods are among the most accurate current techniques available for the calculation of free energies. FEP and the more recent LIE and MM-PBSA approaches have been used successfully to predict both relative and absolute binding free energies of many different complexes with errors of chemical accuracy, that is, 1–2 kcal/mol.

6. MD SIMULATIONS AT DIFFERENT DOCKING STAGES

So far we have described various studies in which authors used MD simulations at different stages of the docking process so as to improve the final results. Here we summarize two different works in which MD simulations were used both before and after the actual docking to account for protein receptor flexibility, to optimize the final complexes, and to obtain accurate free energies.
Technical details of these studies are given in Table 4.

Table 4. Summary of Docking Studies That Made Use of MD Simulations During Most Stages of the Docking Process
a MM-PBSA, molecular mechanics/Poisson–Boltzmann surface area method.
