Schrödinger Workshop 2012 Structure Based Virtual Screening -Various Approaches Jas Bhachoo Senior Applications Scientist Structure Based Virtual Screening Virtual screening is a cost-effective early stage lead generation method How do we design a successful structure based virtual screening campaign? 1
The road to success : What we have found Careful protein preparation Know your target Careful ligand preparation Enumerate states and conformations Pilot screening Know the best combination of constraints and scores Screening Post-screening processing Purchase and assay Questions we will be Asking Today Do we have a crystal structure or an homology model? Is the target drug-able? What is the quality of the model - electron density? Understanding the binding site: big small buried pockets general-properties? Are there known ligands for this target, and how can I use that information? How to screen compounds How to improve the quality of screened outputs? How to filter and cluster the output from screen to manageable numbers for synthesis or purchase 2
Putting everything together Tailored Protocols for XYZ Protein v Databases 1. CACDB2010 lead/drug-like set. 2. Phase mining, with multiple hypotheses, of CACDB phase database using ABCDE as queries (shape similarity > 0.6 or top 0.5%). * 3. Fingerprint-based similarity search of whole CACDB using HTS hits as queries (Tanimoto>= 0.6 or top 0.5%). * v ABCDE pocket of XYZ 1. Receptor of ABCDE site with HB to Res123 (-C=O and NH) and/or Res 122 (-NH) on chain A. 2. Receptor of ABCDE site with HB to Res123 (-C=O and NH) and/or Res 122 (-NH) on chain B. 3. Receptor of ABCDE site with HB to Res123 (-C=O and NH) on both chain A and B v Glide HTVS Two conformations of ligandsfor SP docking Glide shows dependency on input conformations (25 % top scoring) v Glide SP* Three conformations of ligandsfor XP docking ConfGen/MM/multiple FFs (15 % top scoring) *: Combined structures taken directly to XP v Glide XP* (post-processing of ensemble) 1. Mine for poses with desirable interactions, i.e., hydrogen bonding. 2. Rescore with Epik state and strain penalties 3. Take top scoring 5000 to next step for visualization 4. Select molecules for sourcing with help of clustering tools Target structure preparation, prediction and characterization 3
Problems with PDB structures XP GlideScore = -6.9 kcal/mol Lys58 rotamer χ 1 = -73.5 χ 2 = +179.2 χ 3 = +154.2 χ 4 = +85.4 Extremely rare Lys rotamer Crystallographic refinement (PrimeX) XP GlideScore = -8.7 kcal/mol Lys58 rotamer χ 1 = -69.8 χ 2 = -178.4 χ 3 = +178.2 χ 4 = -179.9 Most common Lys rotamer 4
Challenges in homology modeling Accurate alignment High sequence identity is straight forward and can produce high quality structures Low sequence identity requires assistance and manual editing Depends on human intervention, experimental data Model refinement Side chain conformation Loop conformation Binding site conformation Depends strongly on the quality of the force field High Resolution Protein Structure Prediction: Comparative Modeling Query sequence Blast/PSI-Blast Align Template(s), PSI-Blast Profile (PSSM), Query Secondary Structure Predictions, Multiple Structure Alignment Profile Query/Template Alignment Build Refine Homology model Refined protein structure Validate Protein Report 5
Alignments of GPCRs: Example Sequence alignment between human β 2 -adrenergic and human Melaninconcentrating hormone-1 sequences using the Align GPCR mode Align GPCR mode correctly aligned all helices, including the challenging helix 5, without any user intervention. All gaps in the alignment are located in the intracellular and extracellular loop regions, and not in TM regions Representation of loops - Variable Dielectric Constants G es 1 qq 1 1 1 = ( ) 2 r ε 2 ε ε qq i j i j i< j ij in( ij) in( ij) sol ij GB ε = max( ε, ε ) in( ij) in() i in( j) f continuum solvent ε=80 ε=2 ε=1 ε=4 Reparametrization of internal dielectric constants for charged side chains: Lys = 4 Glu = 3 Asp = 2 Arg = 2 His = 2 Others = 1 6
Improvement of solvation model impacts on loop prediction accuracy Number of Cases Uniform Dielectric Variable Dielectric RMSD (Å) Uniform + Hydrophobic Variable + Hydrophobic 6 residue 99 0.48 0.40 0.46 0.41 8 residue 65 0.84 0.79 0.76 0.74 10 residue 41 1.27 0.73 1.05 0.76 13 residue 35 2.73 1.62 1.29 1.08 Tips and tricks Induced fit docking to generate bioactive conformations Prime side chain predictions, Macromodel side chain conformation search and hand tweaks if needed Molecular dynamics simulations to measure the stability of your protein model 7
Screening 1 LigandBased Methods Putting everything together Tailored Protocols for XYZ Protein v Databases 1. Screening_Inital_Data.SDF 2. Canvas > Filter using Properties > Filtered Set for 3D Screeing.mae 3. Convert to 3d with chemical enumeration 4. Prepare the protein chemically accurate and minimised 5. Characterise the protein 6. Dock xtaly bound ligandfor initial test 7. Dock 3d prepared ligand dataset v Glide HTVS v ABCDE pocket of XYZ 1. Create e-pharmuse for ligandbased screening of FXA_db 2. Shape based searching using xtal ligand> searching FX_db (25 % top scoring) v Glide SP* (15 % top scoring) v Glide XP* (post-processing of ensemble) 1. Mine for poses with desirable interactions, i.e., hydrogen bonding. 2. Rescore with Epik state and strain penalties 3. Take top scoring 5000 to next step for visualization 4. Select molecules for sourcing with help of clustering tools 8
Ligand preparation 2D Perspective Smart filtering of your screening deck for optimal druglike and leadlike properties (REOS, Ligparse, Ligfilter) 3D Perspective Chemically accurate ligand structures (tautomers, ionization states, stereoisomers..) Multiple input ligand conformations Confgen and MacroModel Estimate state penalties Fingerprints in Canvas Fingerprints are defined by fragments used Rules of making fragment are different how you define the path through the molecule, linearly or via torsions, or through pre-defined libraries like MACCS Available MACCS and Custom MACCS & Custom are fragment based More specific Torsion Pairwise Triplet Linear (default) Daylight Dendritic Molprint2D Radial Circular SciTegic Others are based on topology (exhaustive) 9
High Performance Chemical Spreadsheet Canvas 1.1.008 /home/dixon/project1.cnv (Master view) File Edit View Insert Chemistry 1 2 3 4 5 Structure 7-Methyltestosterone OH O Chlorpheniramine N Bupropion O Cl N Phenylpropanolamine OH NH 2 Guaifenesin O O H N OH Cl OH MW 302.4599 318.8659 239.7469 151.2099 198.2219 Data Query Applications Maestro Help ALogP98 3.4784 4.7399 3.4500 1.3586 0.7471 H-bond Acceptors 2 3 3 2 4 H-bond Donors 3 4 1 0 1 2 2 5 6 5 1 Project1 Files + Bayes Clustering PDR_LinFP PDR_RadFP MLR CCR4 Estrogen_a Estrogen_b + RP Charts LogP_HBA Rot_bonds Structures/properties retrieved from SQLite database as needed Scroll smoothly through 10 6 compounds, hundreds of columns Create custom views from sorting, filtering, and chart selections 3 4 6 Fingerprints in Canvas Linear Dendritic Radial MOLPRINT2D Pairwise Triplet Torsion MACCS keys 10
Practical 1 Fast Screening Using 2D Approaches (.../Cheminformatics) Create a Canvas project and import FXA_all_initial_data.sdf / *ligprep.out Note you may want to start from different points 2D filtering > 3D preparation > 2D Filtering... 3D preparation > Shape filtering > 2D Filtering... Generate molecular properties Applications -> Molecular properties Incorporate the results Filter by properties using Data -> Property Filter Scatter plots Similarity Searches if you have data on known ligands Clustering data with different Clustering methods Path dependent fingerprint methods Linear fingerprint codes for all linear path up to 7 bonds codes up to 14 bonds for ring closures Dendritic fingerprint codes for branches up to 5 bonds 11
Circular fingerprint methods Radial fingerprint (Extended Connectivity fingerprint) generated by fragmenting a structure into pieces that grow radially from each heavy atom over a series of iterations (4 by default) Each atom identified by its atom type and connecting bond types. MOLPRINT2D fingerprint each heavy atom in a structure is characterized by an environment that consists of all other heavy atoms within a distance of two bonds Pairwise, Triplet and Torsion fingerprints Pairwise fingerprint two atom types and the distance separating them: Type i -Type j -d ij. Triplet fingerprint three atoms and the distances separating them Torsion fingerprint every fragment consists of a linear path of four atoms that are differentiated by type 12
What are the recommendations? There is no single best setting for all targets and query molecules. Pairwise and Triplet methods exhibit size dependency. Without prior knowledge about the performance of fingerprint methods for a target, the best choice can be MOLPRINT2D, Dendritic Fingerprint combination Fingerprint averaging: modal fingerprint More specific atom typing schemes (Daylight, Mol2, Carhart) are best but probably less suited for lead hopping Practical 2 Preparing 3D Ligands for 3D Screening (.../Ligand preparation) In Maestro import a simple example of starting ligands Import 2D_variations.sdf In the first structure note, it has two ionisable groups, an ammonium counter ion and there are three chiral centres (two marked) Run LigPrep Default options Start and Append new entries as a new group Observe results in Maestro Tile and label the structures to see them individually In the first structure note, carboxylate is unprotonated, pyridine is both protonated and unprotonated, the variety of R/S chiralities In Maestro import FXA_ligprep-out.mae Only first few ligands using the Advanced options. We do not need to see the entire file as it is very large. 13
Conformer Generation Lets see the White Paper Screening 2 Structure Based Methods 14
Protein preparation wizard prepare and repair PDB structures Cleaning up raw PDB files Assign bond order Add hydrogen atoms Delete unwanted part of the system Optimize the hydrogen bond networks (flip of residues like ASN, GLN, tautomer determination: HIE, HID or protonation state HIP ) Remove putative clashes in your structure (ideally with diffraction data) Missing information Important side-chains are missing Important loops are missing Practical 3 (.../Virtual Screening) Preparing a PDB Structure for Virtual Screening Download 1FJS structures in Protein preparation wizard. Extra: go to EDS (http://eds.bmc.uu.se/eds/) and download the CNS format map (2mFo-DFc) for 1FJS; Examine the electron density Notes on electron density: do the residues ligand protein water sit in the electron density or is there an anomoloy?... Prepare 1FJS Set up Glide grids (Applications -> Glide -> Receptor grid generation) 15
Characterize the binding pocket - Sitemap Early stage analysis tool Summarises key parts of the protein structure Find potential binding sites Characterize known binding sites Is the binding site drug-able? Potential binding sites characterized by Hydrophobic, hydrophilic, hbond donor/acceptor isosurfaces, volume etc Scoring used to determine drug-ability Site points can be used to define Glide grids (treat as ligand entry) Validated for site druggability Halgren, T., "New Method for Fast and Accurate Binding-site Identification and Analysis", Chem. Biol. Drug Des., 2007, 69, 146 148. Halgren, T., "Identifying and Characterizing Binding Sites and Assessing Druggability, J. Chem. Inf. Model., 2009, 49, 377 389. SiteMap Feature Detection Thrombin (1ett) 16
Druggability Dataset Druggable Prodrug/transporUndruggable ACE-1 Acetylcholinesterase Cathepsin K Aldose reductase Thrombin PTP-1B cabl kinase Neuraminidase Caspase 1 (ICE-1) CDK2 IMPDH HIV integrase Cyclooxygenase 2 Penicillin binding protein DNA gyrase B HIV RT (nucleotide site) EGFR kinase Enoyl reductase Factor X Fungal Cyp51 HIV RT (NNRTI site) HIV-1 Protease HMG CoA reductase MDM2 P38 kinase PDE 4D PDE 5A Thrombin -diverse set of pharmaceutically relevant targets -widely used in benchmark studies to study the properties of binding sites SiteMap Druggability Results MAP POD is from Cheng et al. Undruggable Difficult 17
Practical 4 (.../Virtual Screening) Property Mapping the Xtal Structure Run Sitemap for 1FJS Try Evaluate... Tasks. Analyse the results. Where are the hydrophobic areas and polar areas? * Is the target druggable? * See next slide Rule of thumb * Things to look out for in SiteMap The active site of factor Xa is divided into four sub pockets as S1, S2, S3 and S4. The S1 subpocket determines the major component of selectivity and binding. The S2 sub-pocket is small, shallow and not well defined. It merges with the S4 subpocket. The S3 sub-pocket is located on the rim of the S1 pocket and is quite exposed to solvent. The S4 sub-pocket has 3 ligand binding domains, namely the hydrophobic box, the cationic hole and the water site. Factor Xa inhibitors generally bind in an L-shaped conformation,where one group of the ligand occupies the anionic S1 pocket lined by residues Asp189, Ser195, and Tyr228, and anothergroup of the ligand occupies the aromatic S4 pocket lined by residues Tyr99, Phe174, and Trp228. Typically, a fairly rigid linker group bridges these two interaction sites. 18
Tips and tricks Make sure your structure is chemically accurate! Use multiple structures, even different chains from same PDB Induced fit docking to generate bioactive conformations Prime side chain predictions, Macromodel side chain conformation search and hand tweaks if needed Molecular dynamics simulations to understand flexibility of the structure Molecular dynamics simulations to measure the stability of your protein model Pre-screen Pilot screens to evaluate performance of different combinations of constraints (EFs; GlideScores; Chemical matter eyeballing for a motif ). Check if co-crystallised ligand can be docked with Standard Docking. If large RMS to native. Find out why: Are there any crystal mates? Constraints, multiple protein input conformations (see VSW GUI for Glide), QPLD Screen your database! 19
Glide Overview The conformations then enter the Glide Filter This is a series of hierarchical filters which are used to rapidly eliminate poses of the ligand which cannot correspond to a welldocked solution. Protein & ligand preparation Calculate Coulomb & vdw grids Docking algorithm 3 Modes: HTVS 3-5 secs/lig SP 30-50secs/lig XP 3-5mins/lig Ligandconformations Site-point Search Diameter Test/Subset Test Greedy Score Refinement Minimisation core O H N N sidechain group S O - sidechain group center diameter O Final Score Top hits Additional Information: Glide Implementation Details The value of GlideScore is determined as follows: 0.065E GlideScore = E coul + 0.130E Metal + P BuryP + P + E + E + Site The P BuryP is a penalty term for burying polar functionality in a hydrophobic environment. The P RotB is a penalty term for freezing rotatable bonds. The Site term rewards polar, but non-hydrogen bonding interactions in the site. vdw RotB Lipo HBond + 20
Glide SP: Enrichment Summary Average enrichment in recovering actives in top 1% of decoys (Average of 65 systems) EF(1%) Glide: A New Approach for Rapid, Accurate Docking and Scoring.1. Method and Assessment of Docking Accuracy. R. A. Friesner, J. L. Banks, R. B. Murphy, T. A. Halgren, J. J. Klicic, D. T. Mainz, M. P. Repasky, E. H. Knoll, M. Shelley, J. K. Perry, D. E. Shaw, P. Francis, and P. S. Shenkin. J. Med. Chem. 2004, 47,1739-1749 Glide: A New Approach for Rapid, Accurate Docking and Scoring.2. Enrichment Factors in Database Screening. T. A. Halgren, R. B. Murphy, R. A. Friesner, H. S. Beard, L. L. Frye, W. T. Pollard, and J. L. Banks. J. Med. Chem. 2004, 47, 1750-1759 Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein-Ligand Complexes. Friesner,R.A.; Murphy, R. B.; Repasky,M. P.; Frye, L. L.; Greenwood, J. R.; Halgren,T. A.; Sanschagrin, P. C.; Mainz, D. T., J. Med. Chem., 2006, 49, 6177 6196. Comparative Performance of Several Flexible Docking Programs and Scoring Functions: Enrichment Studies for a Diverse Set of Pharmaceutically Relevant Targets. Zhou, Z.; Felts, A. K.; Friesner, R. A.; Levy, R. M., J. Chem. Inf. Model., 2007, 47, 1599 1608. Practical 5 Virtual Screening: Docking and Visualising Poses Generate the Glide Grid for 1fjs Use the fully prepared protein and co-crystralised ligand as starting point for Glide > Receptor Grid Generation Define the ligand inside the Grid panel Start the job 1fjs-grid.zip is the output Dock the 1fjs ligand using this Grid file In Glide > Ligand Docking, Settings tab > browse for the 1fjs grid file, choose SP mode In Ligands tab > choose selected entry and ensure the 1fjs ligand only is highlited in the Project Table Start the job Selfdock-1fjs-sp-pv.mae is the output Use Maestro to view the result(s)... Overlay Sitemap result! Glide Virtual Screening Workflow is powerful interface for setting up a series of screens, especially for an ensemble of receptors 21
Enjoy Lunch Thanks for you attention so far!!. What do I do with 1000s hits? Pose Filter based on known interactions (Script menu) Filter poses based on pharmacophore (Phase) 22
More Post-processing Filter based on Strain Rescore (Script menu) Calculating Prime MM-GBSA or Macromodel Embrace Selection of Hits If resources are limited, only the most diverse ligands are selected for experimental measurements Practical: Run this script on VSW results output (96 XP ligands) Use fingerprints to cluster hits and choose cluster representatives 23
Structural Interaction Fingerprints Original publication: Chuaqui et al., J. Med. Chem. 47 (2005) 121-133 (Biogen) Basic algorithm Begin with pre-docked poses For each ligand, generate fingerprint based on types of contact with the receptor Use the fingerprints for filtering, similarity searching, and clustering Structural Interaction Fingerprints compound 1 compound 2 compound 3 compound n psift Residue1 Residue 2 Residue N 1 1 0 0 1 1 0 1 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0 1 1 0 0 1 1 0 1 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 1 0 1 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 1 0 1 0 0.8 0.6 0 0.1 0.2 0.5 0 1 0.1 0.2 0.4 0.1 0.1 0.2 0.3 0.1 0.2 0 0.8 0.6 0 0.1 0.2 0.5 0 1 0.1 bit1 = contact bit2 = backbone bit3 = side chain bit4 = polar bit5 = hydrophobic bit6 = HB acceptor bit7 = HB donor bit8 = aromatic (addition to paper) bit9 = charge (addition to paper) 24
Structural Interaction Fingerprints (Demo or Practical) Calculate Fingerprints Visualize contacts with Interaction Matrix Interactive Analysis of Contacts Picking in the matrix displays the residue and the interaction 25
Interactive Distance Matrix Visual Inspection The different filtering methods reduce the number of hit The final selection should be done by human visual inspection of the protein ligand interactions. Possible selection criterias: Does the ligand have strange docked conformations? Does the protein ligand interactions fit SiteMap results? Does the ligand have good ligand efficiencies? 26
Post-screen Pose filter (Scripts -> Docking post-processing-> Pose filter) Filter based on Interaction fingerprints, ligand strain energy, ligand efficiency Reserve slots for compounds with somewhat lower GlideScores that came via ligand based methods (show high similiarity using techniques we ve seen lately) Visualize 5-10x the number of compounds to be wet-screened by several people; balance leadlike vs. druglike; tabulate votes Cluster results by chemotype (e.g. spectral clustering) and only order up to 2-4 best examples from each cluster for testing Practical 6 Post-Docking Analysis; SIFTs and Clustering Use the Workspace Style toolbar to visualize the docked poses Analyse the interaction pattern Scripts->Cheminformatics->Interaction fingerprints Cluster by chemotype Scripts->Cheminformatics-> clustering 27
Glide XP Visualiser Practical 7 Understanding Extra Precision Docking, XPVisuliaser Do Glide Score in Place with the co-crystalized ligand IFJS using XP and toggle on Write XP descriptor information Examine the XP descriptors in Applications-> Glide -> XP Visualizer (read in your *xpdes file) Use Help... To understand the terms in the scoring function... Leads to Practical 4b 28
Structure based pharmacophore Glide XP Fragment Docking Top Hits Neuraminidase 29
Which are the important features? Which are the important features? -2.2-1.2-0.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0-2.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0-0.3-1.2 30
Optimal Site Selection -2.2-1.2-2.4-1.2 Methods Protein PDB structure preparation Fragments ionization/tautomer states generated Glide XP modified settings Increase number of poses generated for initial docking stage Wider scoring window for filtration on initial poses Increase number of poses per ligand for energy minimization Write XP Descriptor Information Writes atom-level energy terms H-bond Electrostatic and vdw Hydrophobic enclosure π-π and π-cation 31
Practical 8 Generating a Structure Based Pharmacophore for Screening... Using the 1fjs-xp-pv.mae as the pose-viewer file, generate e-pharmacophore by using Scripts->Post-docking processing- > e-pharmacophore (< 1 min) Single ligand option Create hypothesis Search the Phase database using Applications -> Phase -> Find matches to hypothesis (< 1 min for search) Database: FXA_db_phasedb Choose hypothesis in workspace (or selected entry) Use existing conformations View results in Maestro Fitness is the output column Use right click fix on highlited row to fix the original pharmacophore in the workspace. Arrow-through results. Use all possible known information Use all available information about target and its ligand preferences 2D similarity Pharmacophore Shape based 32
Do still have some energy?? Practical 9 (if you have energy!) Shape Based Searching Use the VDW shape of a ligand to search for molecules of a similar shape Use Shape query from workspace Generate conformations during search Options reveal more stringency in addition to shape Shape sim is the output column View results in Maestro as before 33
Putting everything together Tailored Protocols for XYZ Protein v Databases v ABCDE pocket of XYZ 1. CACDB2010 lead/drug-like set. 2. Phase mining, with multiple hypotheses, of CACDB phase database using half and whole of ABCDE as queries (shape similarity > 0.6 or top 0.5%). * 3. Fingerprint-based similarity search of whole CACDB using HTS hits as queries (Tanimoto>= 0.6 or top 0.5%). * 1. Receptor of ABCDE site with HB to Res123 (-C=O and NH) and/or Res 122 (-NH) on chain A. 2. Receptor of ABCDE site with HB to Res123 (-C=O and NH) and/or Res 122 (-NH) on chain B. 3. Receptor of ABCDE site with HB to Res123 (-C=O and NH) on both chain A and B v Glide HTVS (25 % top scoring) v Glide SP* * Two conformations of ligandsfor XP docking *: Combined structures taken directly to XP v Glide XP* (15 % top scoring) * Three conformations of ligandsfor XP docking (post-processing of ensemble) 1. Mine for poses with desirable interactions, i.e., hydrogen bonding. 2. Rescore with Epik state and strain penalties 3. Take top scoring 5000 to next step for visualization 4. Select molecules for sourcing with help of clustering tools Summary Virtual screening needs careful planning and preparation Post-process the results using different tools and re-score, rerank Products and tools that have been discussed today: PrimeX, Prime, Macromodel, Sitemap, Glide, Epik, Canvas, Phase, Prime MM-GBSA, Interaction fingerprint, Spectral clustering, Strain rescore, Pose filter, e-pharmacophore 34
Thanks for your attention! Enjoy your next session We d appreciate your online feedback J 35