The PhilOEsophy: there are only two fundamental molecular descriptors, shape and electrostatics.
Where can we use shape? Virtual screening: more effective than 2D. Lead-hopping: shape analogues are not graph analogues. Molecular alignment: no requirement for (manual) atom matching. Pose generation/prediction: matching a binding site or matching a bound ligand.
Where can we use electrostatics? Lead-hopping: electrostatic analogues are not graph analogues. Solvent treatment: continuum and semi-continuum.
Virtual Screening. Compound collection -> database preparation -> screening database. Structure-based screening: protein preparation. Ligand-based screening (3D & 2D): ligand preparation. Hybrid & consensus.
Using a protein structure. Pose v. protein: FRED. Score v. protein: FRED/SZYBKI. Score v. ligand: ROCS/EON. Pose v. ligand: ROCS. HYBRID (VS) & POSIT (posing).
Virtual Screening with OpenEye. QUACPAC: tautomers, charges. FILTER: remove undesirables. OMEGA: conformations. FRED/HYBRID: posing and SBVS. ROCS: shape alignment & scoring.
OMEGA: Would you like to? Generate high quality conformer ensembles rapidly. Store large ensembles in very compact databases for rapid searching. Calculate useful conformer energetics in a variety of environments.
ROCS: Would you like to? Efficiently align molecules by shape and chemical features. Rapidly screen large databases for non-obvious actives. Obtain informative overlays between active and untested compounds.
The ROCS GUI: vrocs Generate custom queries
FRED: Would you like to? Perform structure-based VS rapidly. Identify binding mode(s) of molecules in an active site. Utilize more information to achieve better results (HYBRID).
POSIT: Would you like to? Produce good quality predictions of ligand poses with very high frequency. Accurately estimate the probability that a predicted pose is accurate. Automatically determine the best protein structure from a set to pose a molecule against.
OMEGA: conformation generation. Knowledge-based conformation generation for virtual screening, crystal structure reproduction and ensemble properties.
OMEGA: The best validated conformer generator. Carefully selected crystallographic structures: PDB and CSD. Multiple measures of success: closeness and coverage. Rigorous statistical analysis. DOI: 10.1021/ci100031x
OMEGA: The process. Input molecule (1D, 2D, 3D) -> find fragments (3D fragment library, built-in or custom) -> assemble fragments into a 3D structure -> torsion driving (torsion library, built-in & extensible) -> complete conformer ensemble -> pruned conformer ensemble. The fragment and torsion libraries make up the knowledge base.
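The torsion-driving and pruning steps can be pictured with a toy sketch. This is not OMEGA's algorithm or API: the torsion library, its angle values, and the pruning rule (a minimum angular difference standing in for an RMSD and energy cutoff) are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical torsion library: allowed dihedral angles (degrees) per rotor type.
TORSION_LIBRARY = {
    "sp3-sp3": [60, 180, 300],
    "sp2-sp3": [0, 90, 180, 270],
}

def drive_torsions(rotor_types):
    """Enumerate every combination of allowed angles for the given rotors."""
    choices = [TORSION_LIBRARY[r] for r in rotor_types]
    return [tuple(combo) for combo in product(*choices)]

def angle_distance(a, b):
    """Largest per-rotor angular difference between two conformers (degrees)."""
    return max(min(abs(x - y), 360 - abs(x - y)) for x, y in zip(a, b))

def prune(conformers, min_difference, max_confs):
    """Greedy pruning: keep a conformer only if it differs enough from all kept ones."""
    kept = []
    for conf in conformers:
        if all(angle_distance(conf, k) >= min_difference for k in kept):
            kept.append(conf)
        if len(kept) == max_confs:
            break
    return kept

# Toy molecule with three rotatable bonds.
ensemble = drive_torsions(["sp3-sp3", "sp2-sp3", "sp3-sp3"])
pruned = prune(ensemble, min_difference=120, max_confs=10)
print(len(ensemble), "conformers enumerated,", len(pruned), "kept after pruning")
```

The full ensemble grows as the product of the angle counts per rotor, which is why a pruning step is needed before the ensemble is stored or searched.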
The file size problem. SD/MOL2 files are too large to store large numbers of molecules or conformers. OpenEye binary (OEB) is much smaller: 10x or more. Can we do better? How is this done? [Figure: file size (MB) for 22 million conformers by file format: MOL2, SDF, OEB, ROC-OEB.]
Rotor-offset compression (ROC): store one set of coordinates; all other conformers are defined by torsion angles. Speeds up downstream tools by 10-15%.
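A back-of-the-envelope estimate shows why storing torsion angles instead of full coordinates saves space. This is only a sketch of the scaling, not the OEB file layout: it assumes 4-byte floats, one torsion value per rotor per conformer, and ignores atom types, bonds and other per-molecule data. The molecule size is hypothetical.

```python
def cartesian_bytes(n_atoms, n_confs, bytes_per_float=4):
    """Every conformer stores a full x, y, z coordinate set."""
    return n_confs * n_atoms * 3 * bytes_per_float

def rotor_offset_bytes(n_atoms, n_rotors, n_confs, bytes_per_float=4):
    """One reference coordinate set, then only torsion angles per extra conformer."""
    reference = n_atoms * 3 * bytes_per_float
    offsets = (n_confs - 1) * n_rotors * bytes_per_float
    return reference + offsets

# Hypothetical drug-like molecule: 40 atoms, 6 rotors, 200 conformers.
full = cartesian_bytes(40, 200)
compact = rotor_offset_bytes(40, 6, 200)
print(full, "bytes as coordinates,", compact, "bytes as rotor offsets")
```

The saving grows with the number of conformers per molecule, since each extra conformer costs one value per rotor rather than three values per atom.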
OMEGA: accuracy on a carefully chosen dataset. Mean RMSD: 0.67 Å (0.655, 0.688). Median RMSD: 0.53 Å. [Figure: histogram of RMSD (Å) against count.] J. Chem. Inf. Model., 50, 572 (2010).
OMEGA: relative accuracy. [Figure: results for MOE/Import, Catalyst/BEST, ConfGen/CompMin, ConfGen/CombMin and OMEGA2 at RMSD thresholds <0.5, <1, <1.5 and <2.] Watts et al. J. Chem. Inf. Model. 50, 534 (2010)
OMEGA: speed. Average OMEGA time = 2.7 secs/molecule. [Figure: NumConfs and Time (s) for MOE/Import, Catalyst/BEST, ConfGen/CompMin, ConfGen/CombMin and OMEGA2.] J. Chem. Inf. Model. 50, 822 (2010)
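For reference, the RMSD used as the accuracy measure can be computed as below. This is a minimal sketch that assumes the two structures are already superimposed and list the same atoms in the same order; it does no alignment or symmetry handling, and the coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists (Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical three-atom example: the model is shifted 0.5 Å along x.
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
model = [(x + 0.5, y, z) for x, y, z in xray]
print(rmsd(xray, model))
```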
OMEGA Summary. Speed: 0.5-2 molecules/sec; fastest of all commercial applications. Quality: excellent reproduction of X-ray poses; best overall at highly precise reproduction (< 0.5 Å). Flexibility in generation of conformers: focus/diversity of conformer sets can easily be controlled; in vacuo, solution, protein-bound.
The Shape of Ligand-based Design: ROCS. ROCS compares molecules by shape & chemistry: a rigid overlay of one or more query conformers with a set of conformers of database molecules, scored by shape similarity and chemical (color) similarity in 3D.
ROCS: Shape overlay and scoring. Effective virtual screening; identify shared features; molecular alignment; lead-hopping. [Figure: per cent actives against per cent screened for ROCS.]
Shape similarity & graph similarity are not the same. CDK2 inhibitors (10 nM and 10-32 nM): ROCS (shape) sim = 0.90; fingerprint (2D) sim = 0.40.
ROCS: Overlays + Scores Shape Tanimoto = 0.90 Color Tanimoto = 0.17 TanimotoCombo = 1.07
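The scores on this slide combine as a simple sum. The sketch below uses the standard Tanimoto form on overlap volumes and adds the shape and color terms to give TanimotoCombo; the function names and the overlap volumes in the second example are illustrative, not the ROCS API or ROCS output.

```python
def tanimoto(overlap_ab, self_a, self_b):
    """Tanimoto coefficient from the A-B overlap and the two self-overlaps."""
    return overlap_ab / (self_a + self_b - overlap_ab)

def tanimoto_combo(shape_tanimoto, color_tanimoto):
    """Sum of shape and color Tanimoto; each term lies in [0, 1], so the sum lies in [0, 2]."""
    return shape_tanimoto + color_tanimoto

# Values from the slide: 0.90 (shape) + 0.17 (color).
print(round(tanimoto_combo(0.90, 0.17), 2))  # 1.07

# Hypothetical overlap volumes for two molecules of equal self-overlap.
print(tanimoto(450.0, 500.0, 500.0))
```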
VS Comparison from Merck. Virtual screening on 11 targets: CA, CDK2, COX-2, DHFR, ER, HIV-PR, HIV-RT, NA, PTP-1B, thrombin, TS. Structure-based and ligand-based compared: ROCS and docking. Same X-ray structures; ligand as query. McGaughey et al., J. Chem. Inf. Model., 2007, 47, 1504.
ROCS is better than docking: VS by Merck. [Figure: enrichment at 1% (mean, standard deviation, median) by application: GLIDE, ROCS.]
Conclusion. The extensive Merck study shows that ROCS is the best overall VS tool available: fast, reliable, high hit rate, diverse hit structures. Merck no longer uses docking for VS.
VS against GPCRs. Evers et al., J. Med. Chem., 2005, 48, 5448. Targets: 5HT2A, A1A, D2, M1. Various 3D techniques: docking to homology models (Gold, FlexX-Pharm) and ligand-based methods (Catalyst, FlexS). Compare to ROCS.
Mean of results. [Figure: enrichment at 1%, 5% and 10% screened for GOLD, FlexX-Pharm, Catalyst, FlexS, ROCS and 2D_MACCS.] FlexS, ROCS: 1 query molecule, 1 computed conformation. Catalyst: 15-20 query molecules -> 1 pharmacophore.
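The enrichment values compared on these slides follow the usual definition: the hit rate in the top fraction of the ranked database divided by the hit rate in the whole database. The sketch below assumes a list of booleans ordered from best to worst score; the screen it is applied to is invented.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment at a given fraction of a best-first ranked database."""
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty database with at least one active")
    n_top = max(1, int(round(n_total * fraction)))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)

# Hypothetical screen: 1000 compounds, 20 actives, 5 of them ranked in the top 10.
ranked = [True] * 5 + [False] * 5 + [True] * 15 + [False] * 975
for fraction in (0.01, 0.05, 0.10):
    print(fraction, enrichment_factor(ranked, fraction))
```

An enrichment of 1 means the method does no better than random selection at that cutoff.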
ROCS Summary. Powerful VS application: frequently outperforms docking. Success does NOT require a bioactive conformation for the query. Only low database conformational sampling required: 25-50 confs/molecule. Fast: up to 40 molecules/second, 1000-2000 conformers/second.
The ROCS GUI: vrocs Generate queries from molecules Customized queries Multi-molecule queries
Why vrocs? Enhanced virtual screening. Active compound(s) -> vrocs (query creation, query editing, query validation) -> ROCS.
How does FRED work? Build/customize receptor model (GUI: fred_receptor). Input conformer database (optimized with OMEGA). FRED: exhaustive posing; structure-based & ligand-based scoring; consensus pose selection.
Global Exhaustive Search. Systematic rotations and systematic translations; filtering of clashing poses; remaining poses go forward for scoring.
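The idea of an exhaustive rigid-body search with a clash filter can be sketched in two dimensions. This is a toy illustration, not FRED's implementation: the geometry, the step sizes and the 1.0 Å clash distance are all assumptions chosen to keep the example small.

```python
import math
from itertools import product

def transform(ligand, angle_deg, dx, dy):
    """Rotate a 2D ligand about the origin, then translate it."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(x * cos_a - y * sin_a + dx, x * sin_a + y * cos_a + dy) for x, y in ligand]

def clashes(pose, protein, min_dist):
    """True if any ligand atom is closer than min_dist to any protein atom."""
    return any(math.dist(p, q) < min_dist for p in pose for q in protein)

def exhaustive_poses(ligand, protein, angles, shifts, min_dist=1.0):
    """Enumerate every rotation/translation combination and drop clashing poses."""
    poses = []
    for angle, dx, dy in product(angles, shifts, shifts):
        pose = transform(ligand, angle, dx, dy)
        if not clashes(pose, protein, min_dist):
            poses.append((angle, dx, dy))
    return poses

# Hypothetical two-atom ligand in a slot between two protein atoms.
ligand = [(-0.75, 0.0), (0.75, 0.0)]
protein = [(0.0, 2.0), (0.0, -2.0)]
angles = range(0, 180, 30)
shifts = [-1.0, 0.0, 1.0]
survivors = exhaustive_poses(ligand, protein, angles, shifts)
print(len(survivors), "of", len(angles) * len(shifts) ** 2, "poses pass the clash filter")
```

Only the surviving poses would be passed on to the scoring functions.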
Scoring. Operates on the best poses from the exhaustive search. Protein-based scoring: PLP, ChemScore, ScreenScore, ChemGauss3, ShapeGauss, PB (electrostatic interactions). Ligand-based scoring: CGO. Consensus.
FRED: Self Docking Results. [Figure: fraction of results against RMSD (Å) for the top scoring pose and the top 5, top 10 and top 20 poses.] This is a completely irrelevant problem.
Cross-docking is difficult. [Figure: average self-docking success compared with average cross-docking success.] J. Chem. Inf. Model. 50, 1432 (2010)
VS Comparison from Merck. Virtual screening on 11 targets: CA, CDK2, COX-2, DHFR, ER, HIV-PR, HIV-RT, NA, PTP-1B, thrombin, TS. Structure-based and ligand-based compared. McGaughey et al., J. Chem. Inf. Model., 2007, 47, 1504.
FRED = GLIDE for VS. FRED: lower standard deviation, higher consistency; best indicator of future performance. [Figure: enrichment at 1% (mean, standard deviation, median) by application: FRED, GLIDE.]
FRED - Summary. Does well at posing, although cross-docking is very difficult. Virtual screening performance is good and reliable. Can we do better?
Is a co-crystal structure available? Yes: use docking. No: use ligand-based methods (2D & 3D); docking to an apo structure is risky. Best answer: use BOTH.
Hybrid Docking: Using what you know. Hybrid docking combines docking (e.g. FRED) with ligand-based design (e.g. ROCS): the bound ligand structure guides docking.
FRED Hybrid vs. Standard Docking
Hybrid docking: speed. Docking time per compound (one CPU, 2.4 GHz Xeon): FRED 2.2 standard docking, 5 sec; HYBRID, 1 sec.
Hybrid docking: Posing. HYBRID: 86% @ 2 Å. FRED: 70% @ 2 Å. [Figure: success rate at 2 Å for HYBRID and FRED, for the best pose and the top 5, top 10 and top 20 poses.]
Virtual Screening. Cross et al., J. Chem. Inf. Model., 49, 1455 (2009); McGann, ibid., 51, 578 (2011). Results from the DUD dataset. Thick bars are the 95% confidence interval for the true average AUC; whiskers are the 95% confidence interval for the result of a single trial. Performance of docking tools is very variable. Is it possible to show statistically meaningful differences between tools?
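The two kinds of interval on this slide can be sketched as follows. The AUC function is the standard rank statistic (probability that a random active outscores a random decoy); the intervals use a normal approximation, which is an assumption here and not necessarily how the cited papers computed them. Higher scores are assumed to be better, and all numbers are invented.

```python
import math
import statistics

def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))

def confidence_intervals(per_target_aucs, z=1.96):
    """95% intervals for the true average AUC and for a single new trial."""
    mean = statistics.mean(per_target_aucs)
    sd = statistics.stdev(per_target_aucs)
    sem = sd / math.sqrt(len(per_target_aucs))
    return {
        "mean": mean,
        "true_average": (mean - z * sem, mean + z * sem),  # thick bar
        "single_trial": (mean - z * sd, mean + z * sd),    # whisker
    }

# Hypothetical scores for one target, then hypothetical AUCs over four targets.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.5, 0.3, 0.2]))
print(confidence_intervals([0.6, 0.7, 0.8, 0.9]))
```

The single-trial interval is wider by a factor of the square root of the number of targets, which is why two tools can differ clearly on average yet be hard to separate on any one system.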
Virtual Screening Comparison. Probability that the mean performance of HYBRID is better than FRED: 93%. Probability that HYBRID will do better than FRED on one system: 62%. Use more information -> better results.
FRED Summary. Efficient virtual screening: 3-5 sec/molecule. Good pose prediction: 70% < 2 Å RMSD. Variety of scoring: unique ligand-based. Using more information gives better results: hybrid docking.
Why structure-based design? Pose prediction: POSIT. Virtual screening: FRED/HYBRID. Binding affinity prediction.
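The gap between the two probabilities can be illustrated with a simple model. The sketch assumes the per-system AUC difference between two tools is normally distributed; the mean difference, spread and number of systems below are hypothetical and are not taken from the cited study, so the output is not expected to reproduce the percentages on the slide.

```python
import math
from statistics import NormalDist

def prob_better(mean_diff, sd_diff, n_systems):
    """Return (P(better on one system), P(better on average over n systems))."""
    if sd_diff <= 0 or n_systems < 1:
        raise ValueError("sd_diff must be positive and n_systems at least 1")
    std_normal = NormalDist()
    p_single = std_normal.cdf(mean_diff / sd_diff)
    p_mean = std_normal.cdf(mean_diff / (sd_diff / math.sqrt(n_systems)))
    return p_single, p_mean

# Hypothetical: tool A beats tool B by 0.03 AUC on average, spread 0.10, 40 systems.
p_single, p_mean = prob_better(0.03, 0.10, 40)
print(f"one system: {p_single:.2f}, mean over systems: {p_mean:.2f}")
```

A small average advantage can be well established over many systems while still being close to a coin toss on any single one.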
Structure-based design with OpenEye. POSIT: ligand-based posing. SZMAP: solvent mapping. FRED/HYBRID: posing and SBVS. SZYBKI: MMFF94 optimisation. BROOD: fragment replacement. ROCS: shape alignment & scoring. EON: electrostatic similarity.
POSIT: Accurate and reliable analogue posing. Flexibly fit a new molecule to the shape of a known ligand. > 90% within 0-0.5 Å RMSD. [Figure: histogram of count against RMSD.]
Cross-docking pose prediction. J. Chem. Inf. Model. 50, 1432 (2010). [Figure: average self-docking success compared with average cross-docking success.] How can we improve? Predict reliability? Identify likely failure cases?
POSIT: analogue posing. CDK2 inhibitors: Shape Tanimoto = 0.903; Fingerprint Tanimoto = 0.45. Shape analogues are not obvious graph analogues, BUT obvious graph analogues ARE shape analogues.
Molecular Similarity in 3D: How POSIT defines an analogue Shape Tanimoto = 0.90 Color Tanimoto = 0.17 TanimotoCombo = 1.07
How to use what you know: POSIT. X-ray structure of known ligand + new molecule -> pose for the new molecule using the known ligand. Score AND estimate quality with TanimotoCombo.
What POSIT knows.
Ligand information: symbol $E_{\mathrm{Overlay}}$; potential: TanimotoCombo; goal: match shape and chemistry of bound ligand.
Protein structure: symbol $E_{\mathrm{MMFF}}$; potential: Merck Molecular Force Field (MMFF94); goal: maximize interaction with the protein.
$E_{\mathrm{POSIT}} = (1 - \lambda)\,E_{\mathrm{Overlay}} + \lambda\,E_{\mathrm{MMFF}}$, where $\lambda$ is a scaling factor.
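Read as a weighted blend of the ligand-overlay and force-field terms, the combined POSIT score can be sketched as below. The exact weighting, with (1 - λ) on the overlay term and λ on the MMFF term, is an assumption here, as are the function name and the example energies; this is not the POSIT API.

```python
def posit_energy(e_overlay, e_mmff, lam):
    """Blend overlay and force-field terms; lam is the scaling factor in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * e_overlay + lam * e_mmff

# Hypothetical term values: lam = 0 uses only the ligand overlay,
# lam = 1 uses only the protein force-field term.
for lam in (0.0, 0.5, 1.0):
    print(lam, posit_energy(-1.5, -40.0, lam))
```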
Cross-Docking: the real problem. [Figure: percent of poses < 2.0 Å against TanimotoCombo between X-ray and fit ligands, for Posit*, Gold_PLP, Glide_PLP, AD4 and Fred.] Tuccinardi et al., J. Chem. Inf. Model., 2010, 50, 1432-1440. * Currently validating that the same data set was used.
Prospective Results: POSIT predicted crystal structures versus X-ray. Active projects at Abbott Labs: 17/20 (85%); TC > 1.4, RMSD <= 2 Å. [Figure: TanimotoCombo of pose to crystallographic ligand.]
POSIT Summary. POSIT gives a pose, a score and a CONFIDENCE. POSIT knows when to fail: pose quality can be accurately predicted in POSIT, whereas docking scores cannot predict quality. POSIT works PROSPECTIVELY. Fast: 10-20 sec/molecule. Reliable: pose + confidence. Accurate: 98% poses < 2 Å; 90% < 0.5 Å.
POSIT
The PhilOEsophy There are two fundamental molecular descriptors
>omega2 -in my_filtered_molecules.ism -out my_dbase.oeb.gz
>rocs -query my_query.oeb -dbase my_dbase.oeb.gz -besthits 500 -prefix my_rocs_results(.sdf)
(virtual screening)
>fred -rec my_receptor.oeb.gz -dbase my_dbase.oeb.gz -prefix my_fred_results -oformat sdf.gz -num_alt_poses 4
(hybrid)
>fred -rec my_receptor.oeb.gz -dbase my_dbase.oeb.gz -prefix my_hybrid_results -oformat sdf.gz -exhaustive_scoring cgo -opt Chemgauss3 -chemgauss3 true -num_alt_poses 4