Software for Molecular Docking. Lokesh P. Tripathi, NCBS, 17 December 2007
Molecular Docking: An attempt to predict the structure of an intermolecular complex formed between two or more molecules. Typical cases: receptor-ligand (or drug), enzyme-substrate, protein-DNA (or RNA), protein-protein.
Brief History of Docking: Crick (1953) suggested that complementarity in helical coils could be modelled as knobs fitting into holes. DOCK (Kuntz, 1982) pioneered the field of molecular docking. GRID (Goodford, 1985) also became a part of many subsequent software packages.
General Considerations: Molecular representations (abstract or atomic; fixed or flexible). Juxtaposition of molecules (interactive or automated; search algorithm to create conformations). Evaluating complementarity, i.e. ranking (scoring function; force-field energy functions).
Search Algorithms: There are potentially many ways of putting two molecules together, and the number of possibilities increases exponentially with the size of the molecules involved. The search attempts to locate the most stable state in the energy landscape. Broadly two types: 1) full search of the solution space; 2) guided search through the solution space.
Search Algorithms: Random (genetic algorithms, Monte Carlo methods, tabu search). Systematic (fragment-based methods, point complementarity methods, distance geometry methods, database methods). Simulation (molecular dynamics, energy minimisation). Multiple methods.
Docking Software: Virtual screening: AutoDock, DOCK, FlexX/E, SLIDE, Surflex, ICM, GOLD. De novo design: LUDI, GRID, MCSS, SMoG, GrowMol, SPROUT.
Random Methods: Sample the conformation space by making a single change to a ligand or to a population of ligands. An alteration is performed at each step and accepted or rejected based on a predetermined probability function. Include Monte Carlo (MC) methods, genetic algorithm (GA) methods and tabu search methods.
Monte Carlo Methods: Use a simple energy function. Make random moves and accept or reject them based on a Boltzmann probability function. More efficient at stepping over energy barriers, allowing more complete searches of conformation space. Examples: PRODOCK, MC-DOCK, ICM, DockVision, QXP, GLIDE. Too slow for extensive flexible docking.
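The accept/reject step can be sketched with the Metropolis criterion, one common form of a Boltzmann probability function. This is a minimal illustration and not the code of any program listed above: `toy_energy` is a hypothetical 1-D double-well function standing in for a docking energy, and the step size, `kT` and step count are arbitrary assumptions.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random):
    """Return True if a move with energy change delta_e is accepted."""
    if delta_e <= 0.0:
        return True  # downhill moves are always accepted
    # uphill moves are accepted with Boltzmann probability exp(-dE/kT)
    return rng.random() < math.exp(-delta_e / kT)

def toy_energy(x):
    """Hypothetical double-well energy; not a real docking score."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def monte_carlo_search(start, steps=5000, step_size=0.5, kT=0.2, seed=0):
    """Random walk over x that remembers the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = start, toy_energy(start)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)  # random move
        trial_e = toy_energy(trial)
        if metropolis_accept(trial_e - e, kT, rng):
            x, e = trial, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

Because uphill moves are sometimes accepted, the walk can leave a local minimum and cross an energy barrier, which is the property the slide refers to.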
Figure: global energy-minimum conformers generated by the Monte Carlo method.
Genetic Algorithm Methods: Apply ideas of genetics and evolution to docking. Start with an initial population of random ligand conformers with respect to the protein, each defined by a set of variables called genes. Genetic operators (mutations, crossovers) are applied to sample conformation space until an optimal population is derived. Examples: AUTODOCK, GOLD, DIVALI, DARWIN. Too slow for extensive flexible docking.
Autodock: A suite of automated docking tools, designed to predict how small molecules (ligands, drug candidates) bind to a receptor; uses the AMBER force field. Three constituent programs: Autotors (defines torsions in the ligand), Autogrid (calculates grids) and Autodock (docking tool). AutoDockTools (ADT) is a GUI that facilitates the above and other modules accompanying AutoDock.
Autodock Lamarckian GA: The LGA encompasses a genotypic and a phenotypic phase, i.e. genetic operations and the energy function to be optimised. Energy minimisation is performed after genotypic changes, and the resulting phenotypic changes are mapped back onto the genes (by changing ligand coordinates). Most efficient and reliable of the random methods.
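The Lamarckian idea can be sketched on a toy problem. Everything here is an assumption made for illustration: the three-gene representation, the `energy` function (squared distance from an arbitrary target), the simple hill climb used as the local search, and all population settings. It is not AutoDock's implementation.

```python
import random

def energy(genes):
    """Hypothetical score: squared distance from an arbitrary 'ideal' pose."""
    target = (1.0, -2.0, 0.5)
    return sum((g - t) ** 2 for g, t in zip(genes, target))

def local_search(genes, rng, tries=20, step=0.1):
    """Simple hill climb: keep random perturbations that lower the energy."""
    best, best_e = list(genes), energy(genes)
    for _ in range(tries):
        trial = [g + rng.uniform(-step, step) for g in best]
        trial_e = energy(trial)
        if trial_e < best_e:
            best, best_e = trial, trial_e
    return best

def lamarckian_ga(pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(pop_size)]
    history = []  # best energy seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=energy)
        history.append(energy(pop[0]))
        parents = pop[: pop_size // 2]   # selection: keep the fitter half
        children = [list(parents[0])]    # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, 3)    # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:       # mutation of a single gene
                i = rng.randrange(3)
                child[i] += rng.gauss(0.0, 0.5)
            # Lamarckian step: the locally optimised result is written
            # back into the genes that the next generation inherits
            children.append(local_search(child, rng))
        pop = children
    pop.sort(key=energy)
    history.append(energy(pop[0]))
    return pop[0], history
```

The only difference from a plain GA in this sketch is the last line of the inner loop: the child that enters the population is the locally minimised one, so improvements found by the local search are inherited.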
Autodock Grid Maps: A grid is pre-calculated for each atom type (e.g. C, H, O, N). It consists of a 3D lattice of regularly spaced points, surrounding and centred on the region of interest in the macromolecule. Typical spacing is 0.375 Å. A probe atom is placed at each grid point and its energy calculated.
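A minimal sketch of the grid-map idea follows, assuming a single 12-6 Lennard-Jones term with illustrative parameters (`epsilon`, `r_min`) and a nearest-point lookup with no interpolation. The function names and the dictionary layout are hypothetical and do not reflect Autogrid's file format or energy terms.

```python
import math

SPACING = 0.375  # grid spacing in Å, as quoted on the slide

def lj_energy(r, epsilon=0.15, r_min=4.0):
    """12-6 Lennard-Jones energy; epsilon and r_min are illustrative values."""
    r = max(r, 0.5)  # clamp to avoid huge values at tiny separations
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

def build_grid(receptor_atoms, center, n_points):
    """Precompute the probe energy at every point of a cubic lattice."""
    half = (n_points - 1) / 2.0
    grid = {}
    for i in range(n_points):
        for j in range(n_points):
            for k in range(n_points):
                point = tuple(c + (idx - half) * SPACING
                              for c, idx in zip(center, (i, j, k)))
                grid[(i, j, k)] = sum(lj_energy(math.dist(point, atom))
                                      for atom in receptor_atoms)
    return grid

def lookup(grid, center, n_points, xyz):
    """Energy at the grid point nearest to xyz (no interpolation)."""
    half = (n_points - 1) / 2.0
    idx = tuple(min(n_points - 1, max(0, round((x - c) / SPACING + half)))
                for x, c in zip(xyz, center))
    return grid[idx]
```

The point of the precalculation is that scoring a ligand atom during the search becomes a table lookup instead of a sum over all receptor atoms.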
GOLD: Genetic Optimisation for Ligand Docking; uses multiple subpopulations of the ligand. Force-field based scoring function with three terms: H-bonding term, intermolecular dispersion potential and intramolecular potential. 71% success in identifying the experimental binding mode in 100 protein complexes.
Tabu Search Methods: Impose restrictions that prevent the search from revisiting already explored conformations. Each new conformation is compared to the previous ones based on RMSD values, which determine acceptance. Example: PRO-LEADS.
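The RMSD-based restriction can be sketched as follows. This is an illustrative toy, not the PRO-LEADS algorithm: conformations are flat lists of numbers, and the RMSD threshold, tabu-list length, move size and the rule that a tabu move is still allowed when it beats the best energy so far are all assumptions of this sketch.

```python
import math
import random

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def is_tabu(candidate, tabu_list, threshold=0.75):
    """A candidate is tabu if it lies within `threshold` RMSD of a visited one."""
    return any(rmsd(candidate, seen) < threshold for seen in tabu_list)

def tabu_search(energy, start, steps=200, n_moves=10, tabu_size=25, seed=0):
    rng = random.Random(seed)
    current = list(start)
    best, best_e = list(current), energy(current)
    tabu = [list(current)]
    for _ in range(steps):
        moves = [[c + rng.uniform(-1.0, 1.0) for c in current]
                 for _ in range(n_moves)]
        moves.sort(key=energy)
        chosen = None
        for m in moves:
            # a tabu move is still allowed if it beats the best so far
            if energy(m) < best_e or not is_tabu(m, tabu):
                chosen = m
                break
        if chosen is None:
            chosen = moves[0]  # every move is tabu: take the best-ranked one
        current = chosen
        tabu.append(list(current))
        if len(tabu) > tabu_size:
            tabu.pop(0)  # forget the oldest entry
        e = energy(current)
        if e < best_e:
            best, best_e = list(current), e
    return best, best_e
```

For example, `tabu_search(lambda v: sum(x * x for x in v), [3.0, -2.0])` searches a simple quadratic bowl while being pushed away from regions it has recently visited.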
Systematic Search Methods: Attempt to explore all degrees of freedom in a molecule. Can be divided into three types: conformational search methods, fragmentation methods and database methods.
Conformational Search Methods: Brute-force or shotgun methods of docking. All rotatable bonds in the ligand are rotated through 360° in fixed increments until all possible combinations have been generated and evaluated. The number of structures generated increases exponentially with the number of rotatable bonds (combinatorial explosion).
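The size of the search follows directly from the description: with n rotatable bonds and a fixed increment, there are (360 / increment)^n torsion combinations. The short sketch below counts and enumerates them; the function names are illustrative, and the increment is assumed to divide 360 exactly.

```python
import itertools

def n_conformations(rotatable_bonds, increment_deg):
    """Number of torsion combinations in a full systematic search."""
    steps_per_bond = 360 // increment_deg
    return steps_per_bond ** rotatable_bonds

def torsion_grid(rotatable_bonds, increment_deg):
    """Lazily enumerate every combination of torsion angles, in degrees."""
    angles = range(0, 360, increment_deg)
    return itertools.product(angles, repeat=rotatable_bonds)

for n in (2, 5, 10):
    print(n, n_conformations(n, 30))
# 2 144
# 5 248832
# 10 61917364224
```

At a 30° increment, going from 5 to 10 rotatable bonds takes the count from about 2.5 x 10^5 to about 6.2 x 10^10, which is why exhaustive enumeration is only practical for small ligands.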
Fragmentation Search Methods: Incrementally grow the ligand into the active site by docking several fragments into the active site, followed by covalent linking to recreate the initial ligand. A rigid core fragment of the ligand is docked first, followed by addition of the flexible regions. Examples: DOCK, FlexX, LUDI, ADAM, Hammerhead.
DOCK Methodology
FlexX: A base fragment is picked and docked using a pose-clustering algorithm. A clustering algorithm is implemented to merge similar ligand transformations into the active site. Flexible fragments are added incrementally using MIMUMBA and evaluated using an overlap function, followed by energy calculations, until the ligand is completely built. Final evaluation uses Böhm's scoring function, which includes H-bond, ionic, aromatic and lipophilic terms.
Database Methods: Tackle the combinatorial explosion by using libraries of pregenerated conformations to deal with ligand flexibility. FLOG generates and docks conformational libraries called Flexibases using distance geometry. EUDOC uses conformational searches of ligands to generate different structures, which are placed into the receptor active site, followed by energy evaluation.
Scoring: Essential for ranking the ligand conformations determined by the search algorithms. The scoring function must be able to distinguish true binding modes from all others. Speed and accuracy are the most desirable properties. Three major classes: force-field based, empirical and knowledge-based.
Force-field based Scoring: Quantify the sum of two energies: the receptor-ligand interaction energy and the internal energy of the ligand. Consist of van der Waals (Lennard-Jones potential) and electrostatic (Coulombic function) energy terms. Do not include solvation and entropic terms. Examples: GoldScore, G-SCORE, D-SCORE, AMBER, CHARMM, GROMOS.
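The two terms can be sketched for the intermolecular part of the score. This is a generic illustration under stated assumptions, not the functional form of any named scoring function: one set of Lennard-Jones parameters (`epsilon`, `r_min`) is used for every atom pair, the dielectric is a constant, and atoms are hypothetical `(x, y, z, charge)` tuples. The ligand internal energy mentioned on the slide is omitted.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2), converts q*q/r to kcal/mol

def pair_energy(r, q_i, q_j, epsilon=0.15, r_min=3.8, dielectric=4.0):
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair at distance r."""
    ratio = r_min / r
    van_der_waals = epsilon * (ratio ** 12 - 2.0 * ratio ** 6)
    electrostatic = COULOMB_CONSTANT * q_i * q_j / (dielectric * r)
    return van_der_waals + electrostatic

def interaction_energy(receptor, ligand):
    """Sum pair energies over all receptor-ligand atom pairs."""
    total = 0.0
    for rx, ry, rz, rq in receptor:
        for lx, ly, lz, lq in ligand:
            r = math.dist((rx, ry, rz), (lx, ly, lz))
            total += pair_energy(r, rq, lq)
    return total
```

Nothing in this sum depends on the solvent or on conformational entropy, which is the limitation the slide points out.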
Empirical Scoring: Designed to reproduce experimental data; the binding energy is approximated by a sum of individual uncorrelated terms. Experimentally determined binding energies are used to quantify the individual terms. Easy to compute, but not versatile because of the dependence on experimental datasets. Examples: ChemScore, Böhm's scoring function, F-Score, X-Score.
Knowledge-based Scoring: Statistically derived principles that aim to replicate experimentally determined structures. Employ simple interactions to screen large databases. Dependent on the information available in pre-existing datasets. Examples: DrugScore, SMoG score, Potential of Mean Force (PMF).
Consensus Scoring: Combines information from different scoring schemes to compensate for their individual limitations. Correlation between the individual scoring systems may be a problem. X-SCORE combines functions from PMF and ChemScore with FlexX.
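One simple way to combine scoring schemes is to average each ligand's rank across them. The sketch below shows that rank-averaging scheme only; it is an assumption chosen for illustration and is not claimed to be how X-SCORE or any other named program forms its consensus. Lower scores are assumed to be better, and the ligand names and values in the usage note are made up.

```python
def ranks(scores):
    """Map each ligand to its rank (1 = best, i.e. lowest score)."""
    ordered = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(ordered)}

def consensus_rank(score_tables):
    """Average each ligand's rank across several scoring functions."""
    per_function = [ranks(table) for table in score_tables]
    names = per_function[0].keys()
    return {n: sum(r[n] for r in per_function) / len(per_function)
            for n in names}
```

With two hypothetical score tables, `consensus_rank([{"a": -9, "b": -7, "c": -5}, {"a": -6, "b": -8, "c": -2}])` gives `{"a": 1.5, "b": 1.5, "c": 3.0}`. If the individual functions are strongly correlated, averaging their ranks adds little information, which is the caveat raised on the slide.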
Protein-protein Docking: Prediction of the structure of a protein complex given the structures of its individual components. Huge number of degrees of freedom, so docking is largely performed as rigid-body docking. Z-DOCK, a Fast Fourier Transform-based rigid-body docking program, is one of the most accurate programs as rated in the Critical Assessment of Predicted Interactions (CAPRI).
Docking Strengths and Limitations: Most available software can predict known protein-bound conformations with an accuracy of 1.5-2 Å, with a 70-80% success rate. The scoring function is the major limiting factor, owing to its simplifications and assumptions. Other limitations: solvation effects and the quality of crystallographic data.
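Accuracy figures of this kind are usually expressed as the RMSD between a docked pose and the crystallographic pose, with a pose counted as a success below some cutoff. The sketch below assumes a 2 Å cutoff, matched atom ordering and no superposition; the function names are illustrative.

```python
import math

def pose_rmsd(predicted, reference):
    """RMSD in Å between matched atom coordinates; no superposition applied."""
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, reference))
    return math.sqrt(sq / len(reference))

def success_rate(rmsds, cutoff=2.0):
    """Fraction of complexes whose best pose is within `cutoff` Å."""
    return sum(r <= cutoff for r in rmsds) / len(rmsds)
```

For instance, `success_rate([0.8, 1.9, 3.5, 1.2])` is 0.75, since three of the four made-up RMSD values fall within 2 Å.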
Comparing Docking Software is Difficult: Several studies compare docking programs, but conclusions of general applicability are not evident. Minor differences in methodology can have a significant impact on the success rates of the various docking programs. Cole et al., 2005, PROTEINS 60, 325-332, provide a list of recommendations for assessing docking programs.
Docking software representation in citations
Docking Software: citations per year
Challenges: Predicting structures of multi-domain, multi-subunit protein complexes. Prediction and specificity in protein-nucleic acid interactions. Protein docking with backbone flexibility.