The PhilOEsophy. There are only two fundamental molecular descriptors

Where can we use shape? Virtual screening: more effective than 2D. Lead-hopping: shape analogues are not graph analogues. Molecular alignment: no requirement for (manual) atom matching. Pose generation/prediction: matching a binding site or matching a bound ligand.

Where can we use electrostatics? Lead-hopping: electrostatic analogues are not graph analogues. Solvent treatment: continuum and semi-continuum.

Virtual Screening. [Workflow diagram: protein preparation; compound collection -> database preparation -> screening database; ligand preparation. Screening methods: structure-based; ligand-based (3D & 2D); hybrid & consensus.]

Using a protein structure. Pose v. protein: FRED. Score v. protein: FRED/SZYBKI. Score v. ligand: ROCS/EON. Pose v. ligand: ROCS. HYBRID (VS) & POSIT (posing).

Virtual Screening with OpenEye. QUACPAC: tautomers, charges. FILTER: remove undesirables. OMEGA: conformations. FRED/HYBRID: posing and SBVS. ROCS: shape alignment & scoring.

OMEGA: Would you like to? Generate high quality conformer ensembles rapidly. Store large ensembles in very compact databases for rapid searching. Calculate useful conformer energetics in a variety of environments.

ROCS: Would you like to? Efficiently align molecules by shape and chemical features. Rapidly screen large databases for non-obvious actives. Obtain informative overlays between active and untested compounds.

The ROCS GUI: vrocs. Generate custom queries.

FRED: Would you like to? Perform structure-based VS rapidly. Identify binding mode(s) of molecules in an active site. Utilize more information to achieve better results (HYBRID).

POSIT: Would you like to? Produce good quality predictions of ligand poses with very high frequency. Accurately estimate the probability that a predicted pose is accurate. Automatically determine the best protein structure from a set to pose a molecule against.

OMEGA: conformation generation. Knowledge-based conformation generation for virtual screening, crystal structure reproduction and ensemble properties.

OMEGA: The best validated conformer generator. Carefully selected crystallographic structures (PDB and CSD). Multiple measures of success: closeness and coverage. Rigorous statistical analysis. DOI: 10.1021/ci100031x

OMEGA: The process. Input molecule (1D, 2D, 3D) -> find fragments (3D fragment library, built-in or custom) -> assemble fragments -> 3D structure -> torsion driving (torsion library, built-in & extensible) -> complete conformer ensemble -> pruned conformer ensemble. The fragment and torsion libraries make up the knowledge base.

The file size problem. SD/MOL2 files are too large to store large numbers of molecules or conformers. OpenEye binary (OEB) is much smaller: 10x or more. Can we do better? How is this done? [Figure: bar chart of file size (MB, 0-14000) for 22 million conformers by file format: MOL2, SDF, OEB, ROC-OEB.]

Rotor-offset compression (ROC). Store one set of coordinates; all other conformers are defined by torsion angles. Speeds up downstream tools by 10-15%.
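The idea of storing torsion offsets instead of coordinates can be illustrated with a minimal sketch. It is not the OEB file format, whose internals these slides do not describe; the function names, the rotor tuple layout and the four-atom example are all hypothetical. One base set of coordinates is kept, and every further conformer is stored only as a list of torsion offsets that are applied on demand.

```python
import math


def rotate_about_axis(point, origin, axis_end, angle_deg):
    """Rotate `point` about the axis origin->axis_end (Rodrigues' formula)."""
    axis = [e - o for e, o in zip(axis_end, origin)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                    # unit vector along the bond
    v = [p - o for p, o in zip(point, origin)]      # point relative to the axis origin
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
               for i in range(3)]
    return tuple(r + o for r, o in zip(rotated, origin))


def expand_conformer(base_coords, rotors, offsets_deg):
    """Rebuild one conformer from the base coordinates and its torsion offsets.

    Each rotor is a hypothetical tuple (atom_i, atom_j, moving_atom_indices):
    the atoms listed in moving_atom_indices turn about the bond i->j.
    """
    coords = list(base_coords)
    for (i, j, moving), offset in zip(rotors, offsets_deg):
        origin, axis_end = coords[i], coords[j]
        for idx in moving:
            coords[idx] = rotate_about_axis(coords[idx], origin, axis_end, offset)
    return coords


# Four-atom chain with one rotatable bond (atoms 1-2); atom 3 moves with it.
base = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
rotors = [(1, 2, [3])]
stored_offsets = [[0.0], [120.0], [180.0]]          # one short list per conformer
conformers = [expand_conformer(base, rotors, offs) for offs in stored_offsets]
```

In this toy layout each extra conformer costs one number per rotor rather than three numbers per atom, which is where the space saving would come from.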

OMEGA: accuracy on a carefully chosen dataset. Mean RMSD: 0.67Å (0.655, 0.688). Median RMSD: 0.53Å. [Figure: histogram of RMSD (0-2.5 Å) against count (0-200).] J. Chem. Inf. Model., 50, 572 (2010).
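For readers unfamiliar with the metric, RMSD between a generated conformer and the crystal structure can be sketched as below. This is a simplified illustration: it assumes the two coordinate lists are already superimposed and list the same atoms in the same order, whereas a real evaluation would first find the best-fit alignment. The function name and the coordinates are invented for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))


# Two atoms, each displaced by 1 Å along z relative to the reference.
xray = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(xray, model)
```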

OMEGA: relative accuracy. [Figure: results (scale 0-100) at RMSD thresholds <0.5, <1, <1.5 and <2 for MOE/Import, Catalyst/BEST, ConfGen/CompMin, ConfGen/CombMin and OMEGA2.] Watts et al. J. Chem. Inf. Model. 50, 534 (2010)

OMEGA: speed. Average OMEGA time = 2.7 secs/molecule. [Figure: NumConfs and Time(s) (scale 0-150) for MOE/Import, Catalyst/BEST, ConfGen/CompMin, ConfGen/CombMin and OMEGA2.] J. Chem. Inf. Model. 50, 822 (2010)

OMEGA Summary. Speed: 0.5-2 molecules/sec; fastest of all commercial applications. Quality: excellent reproduction of X-ray poses; best overall at highly precise reproduction (< 0.5Å). Flexibility in generation of conformers: focus/diversity of conformer sets can easily be controlled; in vacuo, solution, protein-bound.

The Shape of Ligand-based Design: ROCS. ROCS compares molecules by shape & chemistry. Rigid overlay of one or more query conformers with a set of conformers of database molecules. Scoring by shape similarity and chemical (color) similarity (in 3D).

ROCS: Shape overlay and scoring. Effective virtual screening. Identify shared features. Molecular alignment. Lead-hopping. [Figure: ROCS enrichment curve, per cent actives (0-100) against per cent screened (0-35).]

Shape similarity & graph similarity are not the same. CDK2 inhibitors: 10 nM and 10-32 nM. ROCS (shape) sim = 0.90; fingerprint (2D) sim = 0.40.
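A 2D similarity of this kind is typically a Tanimoto coefficient over fingerprint bits. The sketch below shows that calculation in its simplest form. The slide does not say which fingerprint was used, so the bit sets here are invented purely to land on a value of 0.40, and the function name is an assumption.

```python
def fingerprint_tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0                      # two empty fingerprints: define as 0
    return len(a & b) / len(union)


# Hypothetical 'on' bit positions for two molecules.
mol_a_bits = {1, 2, 3, 4, 5, 6, 7}
mol_b_bits = {1, 2, 3, 4, 8, 9, 10}
similarity = fingerprint_tanimoto(mol_a_bits, mol_b_bits)   # 4 shared / 10 total
```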

ROCS: Overlays + Scores. Shape Tanimoto = 0.90; Color Tanimoto = 0.17; TanimotoCombo = 1.07.
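The three scores are related by simple arithmetic: TanimotoCombo is the sum of the shape and color Tanimoto values, so it ranges from 0 to 2. The sketch below reproduces the numbers on the slide and also shows the usual overlap-based form of a Tanimoto coefficient. The function names are assumptions, and the overlap volumes in the test values are placeholders rather than ROCS output.

```python
def overlap_tanimoto(self_overlap_a, self_overlap_b, overlap_ab):
    """Tanimoto from overlap terms: O_AB / (O_AA + O_BB - O_AB)."""
    return overlap_ab / (self_overlap_a + self_overlap_b - overlap_ab)


def tanimoto_combo(shape_tanimoto, color_tanimoto):
    """Sum of shape and color Tanimoto; each term lies between 0 and 1."""
    return shape_tanimoto + color_tanimoto


combo = tanimoto_combo(0.90, 0.17)      # the values quoted on the slide
```

Because the two terms are added rather than averaged, a pair with an excellent shape match but little shared chemistry, as here, still scores just above 1.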

VS Comparison from Merck. Virtual screening on 11 targets: CA, CDK2, COX-2, DHFR, ER, HIV-PR, HIV-RT, NA, PTP-1B, thrombin, TS. Structure-based and ligand-based compared: ROCS and docking. Same X-ray structures; ligand as query. McGaughey et al., J. Chem. Inf. Model., 2007, 47, 1504.

ROCS is better than docking: VS by Merck. [Figure: bar chart of E (1%) (0-30) showing mean, StdDev and median for GLIDE and ROCS.]
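E (1%) is an enrichment measure taken at the top 1% of the ranked database. One common definition of an enrichment factor is sketched below; the cited study may define its measure differently in detail, and the function name and the ranked list are hypothetical.

```python
def enrichment_factor(ranked_is_active, fraction=0.01):
    """Hit rate in the top `fraction` of a ranked list divided by the overall hit rate.

    `ranked_is_active` is a list of booleans ordered from best to worst score.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    n_top = max(1, int(round(n_total * fraction)))
    hits_in_top = sum(ranked_is_active[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


# 1000 compounds, 10 actives, 5 of them ranked inside the top 10 (top 1%).
ranking = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 985
ef_1pct = enrichment_factor(ranking, fraction=0.01)
```

With this definition a random ranking gives a value near 1, and the maximum possible value at 1% is 100.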

Conclusion. Extensive Merck study shows that ROCS is the best overall VS tool available: fast, reliable, high hit rate, diverse hit structures. Merck no longer uses docking for VS.

VS against GPCRs. Evers et al., J. Med. Chem., 2005, 48, 5448. Targets: 5HT2A, A1A, D2, M1. Various 3D techniques. Docking to homology models: Gold, FlexX-Pharm. Ligand-based methods: Catalyst, FlexS. Compare to ROCS.

Mean of Results. [Figure: enrichment (0-20) at 1%, 5% and 10% screened for GOLD, FlexX-Pharm, Catalyst, FlexS, ROCS and 2D_MACCS.] FlexS, ROCS: 1 query molecule, 1 computed conformation. Catalyst: 15-20 query molecules -> 1 pharmacophore.

ROCS Summary. Powerful VS application; frequently outperforms docking. Success does NOT require a bioactive conformation for the query. Only low database conformational sampling required: 25-50 confs/molecule. Fast: up to 40 molecules/second, 1000-2000 conformers/second.

The ROCS GUI: vrocs. Generate queries from molecules. Customized queries. Multi-molecule queries.

Why vrocs? Enhanced virtual screening. Active compound(s) -> vrocs (query creation, query editing, query validation) -> ROCS.

How does FRED work? Build/customize receptor model (GUI: fred_receptor). Input conformer database (optimized with OMEGA). FRED: exhaustive posing, structure-based & ligand-based scoring, consensus pose selection.

Global Exhaustive Search. Systematic rotations and systematic translations generate the poses for scoring. Clashing poses are filtered out.
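The enumerate-then-filter pattern can be sketched in a few lines. This is a toy illustration, not FRED's algorithm: rotations are restricted to the z axis for brevity, the clash test is a plain distance cutoff, and every name and number is hypothetical.

```python
import itertools
import math


def rotate_z(point, angle_deg):
    """Rotate a point about the z axis through the origin."""
    x, y, z = point
    theta = math.radians(angle_deg)
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta),
            z)


def enumerate_poses(ligand, protein, translations, angles_deg, clash_dist=2.0):
    """Try every rotation/translation pair and keep the poses that do not clash."""
    kept = []
    for (tx, ty, tz), angle in itertools.product(translations, angles_deg):
        pose = [(x + tx, y + ty, z + tz)
                for x, y, z in (rotate_z(atom, angle) for atom in ligand)]
        # Reject the pose if any ligand atom sits too close to any protein atom.
        if all(math.dist(a, p) >= clash_dist for a in pose for p in protein):
            kept.append(((tx, ty, tz), angle, pose))
    return kept


ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [(3.0, 0.0, 0.0)]
poses = enumerate_poses(ligand, protein,
                        translations=[(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
                        angles_deg=[0, 180])
```

Of the four candidate poses in this example only the one that turns the ligand away from the protein atom survives the clash filter; the survivors are what a scoring step would then rank.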

Scoring. Operates on the best poses from the exhaustive search. Protein-based scoring: PLP, ChemScore, ScreenScore, ChemGauss3, ShapeGauss, PB (electrostatic interactions). Ligand-based scoring: CGO. Consensus.

FRED: Self Docking Results. [Figure: fraction (0-1) against RMSD (0-2 Å) for the top scoring pose, top 5, top 10 and top 20.] This is a completely irrelevant problem.

Cross-docking is difficult. [Figure: average self-docking success compared with average cross-docking success.] J. Chem. Inf. Model. 50, 1432 (2010)

VS Comparison from Merck Virtual screening on 11 targets CA, CDK2, COX-2, DHFR, ER, HIV-PR, HIV-RT, NA, PTP-1B, thrombin, TS Structure-based and ligand-based compared McGaughey et al., J. Chem. Inf. Model., 2007, 47, 1504.

FRED = GLIDE for VS. [Figure: bar chart of enrichment (1%) (0-20) showing mean, StdDev and median for FRED and GLIDE.] FRED: lower standard deviation, higher consistency. Best indicator of future performance.

FRED - Summary. Does well at posing. Cross-docking is very difficult. Virtual screening performance is good and reliable. Can we do better?

Is a co-crystal structure available? Yes: use docking. No: ligand-based (2D & 3D); docking to apo structure is risky. Best answer: use BOTH.

Hybrid Docking: Using what you know. Hybrid docking combines docking (e.g. FRED) with ligand-based design (e.g. ROCS). Bound ligand structure guides docking.

FRED Hybrid vs. Standard Docking

Hybrid docking: speed. Docking time per compound (one CPU, 2.4 GHz Xeon): FRED 2.2 standard docking, 5 sec; HYBRID, 1 sec.

Hybrid docking: Posing. HYBRID: 86% @ 2Å. FRED: 70% @ 2Å. [Figure: success rate at 2Å (0-100%) for Hybrid and FRED, for the best pose and the top 5, top 10 and top 20 poses.]

Virtual Screening. Cross et al., J. Chem. Inf. Model., 49, 1455 (2009); McGann, ibid., 51, 578 (2011). Results from the DUD dataset. Thick bars are the 95% confidence interval for the true average AUC; whiskers are the 95% confidence interval for the result of a single trial. Performance of docking tools is very variable. Is it possible to show statistically meaningful differences between tools?
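The difference between the two intervals can be made concrete with a short sketch. It assumes a normal approximation, which the slides do not state was the method used, and the per-target AUC values are invented. The interval for the true average shrinks with the number of targets, while the interval for a single trial reflects the full target-to-target spread.

```python
import math
import statistics


def auc_intervals(per_target_aucs, z=1.96):
    """Approximate 95% intervals for the mean AUC and for a single new target."""
    n = len(per_target_aucs)
    mean = statistics.mean(per_target_aucs)
    sd = statistics.stdev(per_target_aucs)      # sample standard deviation
    sem = sd / math.sqrt(n)                     # standard error of the mean
    return {
        "mean": mean,
        "mean_ci": (mean - z * sem, mean + z * sem),        # the "thick bar"
        "single_trial": (mean - z * sd, mean + z * sd),     # the "whiskers"
    }


# Hypothetical AUC values for one tool on four targets.
result = auc_intervals([0.6, 0.7, 0.8, 0.9])
```

Under this approximation the single-trial interval can extend beyond the valid AUC range of 0 to 1, which is one reason a published analysis may use a different method.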

Virtual Screening Comparison. Probability that the mean performance of HYBRID is better than FRED: 93%. Probability that HYBRID will do better than FRED on one system: 62%. Use more information -> better results.

FRED Summary. Efficient virtual screening: 3-5 sec/molecule. Good pose prediction: 70% < 2Å RMSD. Variety of scoring: unique ligand-based. Using more information gives better results: hybrid docking.

Why structure-based design? Pose prediction: POSIT. Virtual screening: FRED/HYBRID. Binding affinity prediction.

Structure-based design with OpenEye. POSIT: ligand-based posing. SZMAP: solvent mapping. FRED/HYBRID: posing and SBVS. SZYBKI: MMFF94 optimisation. BROOD: fragment replacement. ROCS: shape alignment & scoring. EON: electrostatic similarity.

POSIT: Accurate and reliable analogue posing. Flexibly fit a new molecule to the shape of a known ligand. [Figure: histogram of count (0-60) against RMSD; > 90% at 0-0.5Å RMSD.]

Cross-docking pose prediction. J. Chem. Inf. Model. 50, 1432 (2010). [Figure: average self-docking success compared with average cross-docking success.] How can we improve? Predict reliability? Identify likely failure cases?

POSIT: analogue posing. CDK2 inhibitors. Shape analogues are not obvious graph analogues, BUT obvious graph analogues ARE shape analogues. Shape Tanimoto = 0.903; fingerprint Tanimoto = 0.45.

Molecular Similarity in 3D: How POSIT defines an analogue Shape Tanimoto = 0.90 Color Tanimoto = 0.17 TanimotoCombo = 1.07

How to use what you know: POSIT. Inputs: X-ray structure of a known ligand and a new molecule. Pose the new molecule using the known ligand. Score AND estimate quality with TanimotoCombo.

What POSIT knows

Source              Symbol      Potential                              Goal
Ligand information  E_Overlay   TanimotoCombo                          Match shape and chemistry of bound ligand
Protein structure   E_MMFF      Merck Molecular Force Field (MMFF94)   Maximize interaction with the protein

$E_{\mathrm{POSIT}} = (1 - \lambda)\,E_{\mathrm{Overlay}} + \lambda\,E_{\mathrm{MMFF}}$, where $\lambda$ is a scaling factor.

Cross-Docking: the real problem. [Figure: percent of poses < 2.0 Angstrom (0-100) against TanimotoCombo between X-ray and fit ligands (0-1.9) for Posit*, Gold_PLP, Glide_PLP, AD4 and Fred.] Tuccinardi, et al. J. Chem. Inf. Model., 2010, 50, 1432-1440. * Currently validating that the same data set was used.

Prospective Results. POSIT predicted crystal structures versus X-ray, for active projects at Abbott Labs: 17/20 (85%). [Figure: RMSD against TanimotoCombo of pose to crystallographic ligand, with TC > 1.4 and RMSD <= 2Å marked.]

POSIT Summary. POSIT gives a pose, a score and a CONFIDENCE. POSIT knows when to fail: pose quality can be accurately predicted in POSIT; docking scores cannot predict quality. POSIT works PROSPECTIVELY. Fast: 10-20 sec/molecule. Reliable: pose + confidence. Accurate: 98% poses < 2Å; 90% < 0.5Å.

POSIT

The PhilOEsophy There are two fundamental molecular descriptors

>omega2 -in my_filtered_molecules.ism -out my_dbase.oeb.gz
>rocs -query my_query.oeb -dbase my_dbase.oeb.gz -besthits 500 -prefix my_rocs_results(.sdf)
(virtual screening)
>fred -rec my_receptor.oeb.gz -dbase my_dbase.oeb.gz -prefix my_fred_results -oformat sdf.gz -num_alt_poses 4
(hybrid)
>fred -rec my_receptor.oeb.gz -dbase my_dbase.oeb.gz -prefix my_hybrid_results -oformat sdf.gz -exhaustive_scoring cgo -opt Chemgauss3 -chemgauss3 true -num_alt_poses 4