Pose and affinity prediction by ICM in D3R GC3
Max Totrov
Molsoft
Pose prediction method: ICM-dock
ICM-dock:
- pre-sampling of ligand conformers
- multiple-trajectory Monte Carlo with gradient minimization in internal coordinates
- receptor represented by grid potentials
- multiple receptor conformations used when needed to address flexibility
- pose re-ranking with ICM VLS score
- optional chemical biasing by APF from available experimental ligand structures/templates
Atomic Property Fields (APF)
Totrov M. Atomic Property Fields: generalized 3D pharmacophoric potential for automated ligand superposition, pharmacophore elucidation and 3D QSAR. Chem Biol Drug Des. 2008;71(1):15-27.
APF - 3D pharmacophoric potential
3D pharmacophore: arrangement of molecular properties in space that confers activity.
Generalization of the point pharmacophore concept:
- Discrete pharmacophoric points -> continuous distributions
- Moieties represented as Ph4 types -> vectors of atomic properties
$f_j^i$ - vector of properties $i$ for atom $j$
Atomic similarity measure - dot product of property vectors: $\sum_i f_j^i f_k^i$
Atomic Property Field (APF) - continuous 3D potential:
$P^i(\mathbf{r}) = \sum_j f_j^i \exp\left(-(\mathbf{r}-\mathbf{r}_j)^2 / l_{APF}^2\right)$
Pseudo-energy (score) of a compound in APF, summed over the atoms $j$ of the scored compound:
$E_{APF} = -\sum_i \sum_j f_j^i P^i(\mathbf{r}_j)$
Implementation:
- on a 3D (multi)grid
- continuous derivatives (spline)
- fast potential for molecular mechanics/optimization in combination with force-field energy
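The field and pseudo-energy definitions on this slide can be illustrated with a minimal pure-Python sketch. It evaluates the Gaussian sum directly rather than on a splined grid. The atom representation, the two-component property vectors and the default `length=1.0` are assumptions made only for illustration; they are not ICM's actual property set or parameters.

```python
import math

def apf_potential(point, template_atoms, length=1.0):
    """P^i(r) = sum_j f_j^i * exp(-|r - r_j|^2 / l^2), returned for every property i."""
    n_props = len(template_atoms[0][1])
    field = [0.0] * n_props
    for position, props in template_atoms:
        d2 = sum((a - b) ** 2 for a, b in zip(point, position))
        weight = math.exp(-d2 / length ** 2)
        for i in range(n_props):
            field[i] += props[i] * weight
    return field

def apf_score(compound_atoms, template_atoms, length=1.0):
    """E_APF = -sum_i sum_j f_j^i * P^i(r_j), summed over atoms of the scored compound."""
    energy = 0.0
    for position, props in compound_atoms:
        field = apf_potential(position, template_atoms, length)
        energy -= sum(f * p for f, p in zip(props, field))
    return energy

# Hypothetical atoms: ((x, y, z), (property 1, property 2)).
template = [((0.0, 0.0, 0.0), (1.0, 0.0)), ((2.0, 0.0, 0.0), (0.0, 1.0))]
aligned = [((0.0, 0.0, 0.0), (1.0, 0.0)), ((2.0, 0.0, 0.0), (0.0, 1.0))]
swapped = [((2.0, 0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0, 0.0), (0.0, 1.0))]

print(apf_score(aligned, template))  # matching properties overlap: lower energy
print(apf_score(swapped, template))  # mismatched properties: energy near zero
```

In this toy case the pose that places each atom on a template atom with the same property scores lower than the pose with the properties swapped, which is the behaviour a pharmacophoric bias is meant to reward.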
Accuracy of Flexible Ligand Superposition
Independent broad benchmark: ligands without X-ray structures but of similar chemotype to a solved complex.
Assessment of superposition quality: 2 / 1 / 0 = good / acceptable / poor.
11 targets from DUD (out of 40). Each cell is the percentage of ligands in that quality class.

Method       Score    ADA   CDK2   DHFR     ER    FXA  HIVRT     NA    P38    THR     TK    TRP   mean
                     (39)   (72)  (410)   (39)  (146)   (43)   (49)  (454)   (72)   (22)   (49) (1100)
Surflex-sim  2      12.82   12.5  44.39  56.41   4.11   18.6  18.37   9.69   4.17  68.18  40.82  23.15
             1       35.9  51.39  53.66  43.59  33.56  72.09  75.51   69.6  93.06  31.82  59.18  59.07
             0      51.28  36.11   1.95      0  62.33    9.3   6.12   20.7   2.78      0      0  17.78
ROCS         2      12.82  43.06  74.15  41.03  14.38  30.23  79.59   9.47   2.78  86.36   8.16  35.63
             1      20.51  36.11  14.39  56.41  28.77  34.88  14.29  41.19  69.44   9.09  81.63  32.83
             0      66.67  20.83  11.46   2.56  56.85  34.88   6.12  49.34  27.78   4.55   10.2  31.54
FlexS        2      15.38     25   56.1  48.72  35.62  16.28  36.73  14.98  30.56  81.82  18.37  33.48
             1      20.51  19.44  11.71  43.59   13.7  46.51  57.14  74.01   5.56  13.64   2.04  35.77
             0       64.1  55.56   32.2   7.69  50.68  37.21   6.12  11.01  63.89   4.55  79.59  30.75
ICM/APF      2      46.15   12.5  86.83  51.28  70.55   18.6  75.51  20.04  88.89  90.91  69.39  54.48
             1      23.08  68.06  11.95  46.15  16.44  46.51  14.29  68.28   9.72   9.09  28.57  36.49
             0      30.77  19.44   1.22   2.56  13.01  34.88   10.2  11.67   1.39      0   2.04   9.03

Giganti et al. J Chem Inf Model 2010, 50, 992-1004
Ligand-Biased Docking with APF
- MC docking simulations: APF potentials used in addition to the physical interaction term grids
- Pose ranking: composite score combining the physics-based ICM VLS score and the APF pseudo-energy
Figure: visualization of the APF used for ligand bias in Cathepsin S docking
Lam PC, Abagyan R, Totrov M. Ligand-biased ensemble receptor docking (LigBEnD): a hybrid ligand/receptor structure-based approach. J Comput Aided Mol Des. 2018;32(1):187-198.
D3R Cathepsin S: Pose prediction
                 Top pose   Best pose of 5
Average RMSD     2.82 Å     1.31 Å
Median RMSD      1.7 Å      1.06 Å
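The four statistics on this slide can be computed from per-ligand RMSD lists as in the sketch below. The input layout (ligand id mapped to RMSDs in pose rank order) and the example numbers are hypothetical and are not challenge data.

```python
from statistics import mean, median

def summarize_pose_rmsd(rmsd_by_ligand, n_poses=5):
    """rmsd_by_ligand maps ligand id -> RMSDs (in Å) of its poses, best-ranked first."""
    top = [poses[0] for poses in rmsd_by_ligand.values()]
    best = [min(poses[:n_poses]) for poses in rmsd_by_ligand.values()]
    return {
        "mean_top": mean(top),
        "median_top": median(top),
        "mean_best": mean(best),
        "median_best": median(best),
    }

# Hypothetical RMSD values for three ligands, five ranked poses each.
example = {
    "lig_a": [0.8, 2.5, 3.1, 4.0, 1.9],
    "lig_b": [5.0, 1.0, 6.2, 5.5, 7.1],
    "lig_c": [2.0, 2.4, 1.5, 3.3, 2.9],
}
summary = summarize_pose_rmsd(example)
print(summary)
```

A large gap between the top-pose and best-of-5 averages, as on the slide, indicates that accurate poses are often sampled but not always ranked first.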
CatS Ligands: RMSD for top 5 poses - Most accurate pose of 5
Apparent crystal contact effects
Figure labels: primary receptor cathepsin, crystal-neighbor cathepsin, ligands.
Panel: superimposed answer X-ray structures nmxm (CatS_7), rpwj (CatS_9) and gabj (CatS_14). Extensive contacts between the ligands and the crystallographic neighbor (~150 Å²) are visible.
Panel: superimposed answer X-ray structure yrpk (CatS_16) and its top predicted pose. Top poses for CatS_7, CatS_9 and CatS_14 are also shown (thin wires).
Kinases and FXR: flexibility ensembles
Ensemble construction:
- PDB structures collected/aligned via the Pocketome database
- Up to 10 representative X-ray structures selected by an iterative procedure to maximize the number of compatible ligands
- For each receptor conformation, compatible ligands are used as APF templates in docking
Figure: ligand/receptor conformation compatibility matrix heatmap for VEGFR2 (axes: X-ray Receptor Conformation, X-ray Ligand Bound)
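The slide states only that representatives are chosen by an iterative procedure that maximizes the number of compatible ligands. One plausible reading, assumed here and not confirmed by the source, is greedy maximum coverage over the compatibility matrix. The conformation names and ligand ids below are hypothetical.

```python
def select_ensemble(compatible, max_size=10):
    """Greedy coverage: compatible maps conformation id -> set of compatible ligand ids.

    Assumption: at each step, add the conformation covering the most ligands
    that are not yet covered; stop at max_size or when nothing new is gained.
    """
    chosen, covered = [], set()
    remaining = dict(compatible)
    while remaining and len(chosen) < max_size:
        best = max(remaining, key=lambda conf: len(remaining[conf] - covered))
        gain = remaining[best] - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered

# Hypothetical compatibility data for three receptor conformations.
compat = {"conf_A": {1, 2, 3}, "conf_B": {3, 4}, "conf_C": {1, 2}}
ensemble, covered_ligands = select_ensemble(compat)
print(ensemble, sorted(covered_ligands))
```

Greedy selection is not guaranteed to find the optimal ensemble, but it is a common, cheap approximation for coverage problems of this kind.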
Pocketome: comprehensive collection of ligand-binding pockets from PDB
Instant access to all relevant PDB X-ray structures, optimally pre-aligned around the binding pocket.
Kufareva I, Ilatovskiy AV, Abagyan R. Pocketome: an encyclopedia of small-molecule binding sites in 4D. Nucleic Acids Res. 2012;40:D535-40.
FXR (GC2) pose prediction results
                 Top pose   Best pose of 5
Average RMSD     1.95 Å     1.69 Å
Median RMSD      1.95 Å     1.95 Å
Affinity prediction approaches
Docking to generate aligned poses
Receptor/physics-based approach, ICM VLS score:
$\Delta G = \alpha_1 \Delta E_{FF} + \alpha_2 \Delta E_{GB} + \alpha_3 \Delta E_{HP} + \alpha_4 \Delta E_{HB} + \alpha_5 \Delta E_{PD} + \alpha_6 T\Delta S_{TO}$
Ligand/APF-based approach:
$\Delta G \approx \sum_i \sum_m f_m^i P^i_{APF\text{-}QSAR}(\mathbf{r}_m)$
$P^i_{APF\text{-}QSAR}(\mathbf{r}) = \sum_k \sum_j w_k^i f_j^i \exp\left(-(\mathbf{r}-\mathbf{r}_j)^2 / l^2\right)$
$7 N_{train}$ weights $w_k^i$ for the contributions of each molecule $k$ in the training set to each APF component $i$:
$\Delta G_l \approx \sum_k \sum_i w_k^i E_{APF}^{kl,i}$
$E_{APF}^{kl,i} = \sum_m \sum_j f_m^i f_j^i \exp\left(-(\mathbf{r}_m-\mathbf{r}_j)^2 / l^2\right)$
Partial Least Squares (PLS) is used to determine the weights $w_k^i$.
Totrov M. Atomic Property Fields: generalized 3D pharmacophoric potential for automated ligand superposition, pharmacophore elucidation and 3D QSAR. Chem Biol Drug Des. 2008;71(1):15-27.
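The ligand-based prediction reduces to a weighted sum of per-component APF overlaps between the query molecule and each training molecule. The sketch below computes those overlaps and applies a given weight table; fitting the weights by PLS is not shown. The atom layout, the two-component property vectors, `length=1.0` and the weight values are all hypothetical.

```python
import math

def apf_overlap(mol_l, mol_k, n_props, length=1.0):
    """E_APF^{kl,i} = sum_m sum_j f_m^i f_j^i exp(-|r_m - r_j|^2 / l^2), one value per i."""
    overlap = [0.0] * n_props
    for r_m, f_m in mol_l:
        for r_j, f_j in mol_k:
            d2 = sum((a - b) ** 2 for a, b in zip(r_m, r_j))
            gauss = math.exp(-d2 / length ** 2)
            for i in range(n_props):
                overlap[i] += f_m[i] * f_j[i] * gauss
    return overlap

def predict_dg(mol_l, training_mols, weights, n_props, length=1.0):
    """dG_l ~ sum_k sum_i w_k^i * E_APF^{kl,i}; weights[k][i] is assumed pre-fitted."""
    total = 0.0
    for mol_k, w_k in zip(training_mols, weights):
        overlap = apf_overlap(mol_l, mol_k, n_props, length)
        total += sum(w * e for w, e in zip(w_k, overlap))
    return total

# Hypothetical one-atom molecules: ((x, y, z), (property 1, property 2)).
query = [((0.0, 0.0, 0.0), (1.0, 2.0))]
training = [[((0.0, 0.0, 0.0), (1.0, 2.0))]]
weights = [[0.5, -0.25]]  # illustrative values, not fitted weights

print(apf_overlap(query, training[0], n_props=2))
print(predict_dg(query, training, weights, n_props=2))
```

Because the overlaps depend only on the aligned poses, they can be precomputed once as a feature matrix with one column per (training molecule, component) pair and then passed to any linear regression method.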
Training APF 3D QSAR: Cathepsin S
302 related compounds from ChEMBL v2.3 were docked, and the resulting 3D poses were used to build the APF 3D QSAR model.
Figure: visualization of the APF fields of the pKd model for Cathepsin S
Training sets of activity data
Source: ChEMBL v2.3
Varying number and relevance:
Target    N of data points
Cath S    1754
VEGFR2    5733
JAK2      1618
p38a      4183
Figure: distributions of Tanimoto distances to the closest training set compound for each challenge compound (panels: Cath. S, VEGFR2, JAK2, p38a)
Training/Testing Set Generation
LOO cross-validation or simple N-fold random test subsets do not adequately reflect a realistic challenge.
Stringent 3-fold cluster cross-validation:
- Cluster the full training set (APF 3D chemical distance, 0.25 cutoff)
- Randomly assign clusters to three groups
- Use any 2 groups to train and the 3rd group to test (Q2/RMSE)
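The fold construction can be sketched as follows, assuming the clustering step has already produced a cluster id for every compound (the APF 3D distance clustering itself is not reproduced here). Shuffling the clusters and dealing them round-robin into three groups is one assumed way to implement the random assignment; the compound and cluster ids are hypothetical.

```python
import random

def cluster_three_fold(cluster_of, seed=0):
    """cluster_of maps compound id -> cluster id; returns three (train, test) splits.

    Whole clusters are assigned to groups, so no cluster is ever split
    between the training and test sides of a fold.
    """
    clusters = sorted(set(cluster_of.values()))
    random.Random(seed).shuffle(clusters)
    group_of = {cluster: n % 3 for n, cluster in enumerate(clusters)}
    folds = []
    for held_out in range(3):
        test = [c for c, cl in cluster_of.items() if group_of[cl] == held_out]
        train = [c for c, cl in cluster_of.items() if group_of[cl] != held_out]
        folds.append((train, test))
    return folds

# Hypothetical data: 12 compounds in 6 clusters of 2.
cluster_of = {f"cmpd_{i}": i // 2 for i in range(12)}
for train_ids, test_ids in cluster_three_fold(cluster_of, seed=1):
    print(len(train_ids), len(test_ids))
```

Holding out whole clusters forces each test compound to be predicted from chemically more distant training data, which is closer to the situation in a blind challenge than random splitting.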
Improving upon APF 3D QSAR
Combining APF and physics-based terms:
- APF produces better models provided sufficient training data. Physics-based terms are typically noisier, but more general.
- Can the two combine for better performance?
- Investigate single and staged models combining chemical and physical terms, using PLS and/or RFR.
Dynamic focused models:
- Some evidence that a large training set dilutes local activity trends
- Investigate focused models trained on subsets of data related to challenge molecules
Dynamic/Focused Model Training
1. Dock ligands
2. Cluster by 3D poses in APF
3. For each cluster: find ~300 nearest known ligands as training set
4. Train a model for each cluster
Figure labels: challenge ligands, ChEMBL ligands
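Step 3 of the procedure above can be sketched as a nearest-neighbour selection. The slide does not say how the distance from a known ligand to a cluster is defined; the minimum distance to any cluster member is assumed here, and the `distance` callable stands in for the APF 3D distance. The numeric "ligands" in the example are placeholders.

```python
import heapq

def focused_training_set(cluster_members, known_ligands, distance, size=300):
    """Return the `size` known ligands closest to a cluster of challenge ligands.

    Assumption: distance to the cluster = minimum distance to any member.
    """
    def dist_to_cluster(ligand):
        return min(distance(ligand, member) for member in cluster_members)
    return heapq.nsmallest(size, known_ligands, key=dist_to_cluster)

# Placeholder example: ligands are numbers, distance is the absolute difference.
challenge_cluster = [10, 12]
known = [0, 9, 11, 30, 13, 50]
selected = focused_training_set(challenge_cluster, known,
                                distance=lambda a, b: abs(a - b), size=3)
print(selected)
```

A separate model would then be trained on each selected subset, one per cluster of challenge ligands.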
Kinase models cross-validation

Training set     Terms              Method          VEGFR2         JAK2           P38a
                                                   Q²    RMSE     Q²    RMSE     Q²    RMSE
-                Physics/VLS score  No training    0.12   NA     0.23   NA     0.13   NA
Full/Static      Physics only       PLS            0.13   1.2    0.30   1.1    0.25   1.0
Full/Static      Physics only       RFR            0.12   1.2    0.23   1.2    0.18   1.1
Full/Static      APF only           PLS            0.22   1.4    0.30   1.2    0.29   1.1
Full/Static      Physics + APF      1 Stage PLS    0.26   1.3    0.36   1.2    0.29   1.1
Full/Static      Physics + APF      2 Stage PLS    0.22   1.4    0.32   1.2    0.30   1.1
Full/Static      Physics + APF      PLS/RFR        0.25   1.2    0.33   1.1    0.33   1.0
Focused/Dynamic  APF only           PLS            0.26   1.2    0.35   1.2    0.32   1.0
Focused/Dynamic  Physics + APF      PLS/RFR        0.28   1.1    0.40   1.1    0.33   1.0

RMSE is shown in pKd units.
Challenge Set Performance

Training set     Terms          Method       VEGFR2           JAK2             P38a
                                           Corr R   RMSE    Corr R   RMSE    Corr R   RMSE
Static           APF            PLS         0.54    1.4      0.55    1.2      0.55    1.1
Static           Physics + APF  PLS         0.61    1.2      0.65    1.0      0.56    1.1
Static           Physics + APF  PLS/RFR     0.68    1.0      0.61    1.0      0.63    1.0
Focused/Dynamic  APF            PLS         0.53    1.5      0.53    1.2      0.51    1.3
Focused/Dynamic  Physics + APF  PLS/RFR     0.67    1.0      0.59    1.0      0.56    1.3

Annotations on the Focused/Dynamic Physics + APF PLS/RFR row:
- VEGFR2: Q_C3F = 0.53, <d_min> = 0.2
- JAK2: Q_C3F = 0.63, <d_min> = 0.3
- P38a: Q_C3F = 0.57, <d_min> = 0.27
D3R affinity prediction: ligand ranking, Kendall τ
- Cathepsin S stage 1: τ = 0.45
- VEGFR2: τ = 0.45
- JAK2_SC2: τ = 0.47
- p38a: τ = 0.41
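Kendall τ measures how well the predicted ordering of ligands agrees with the experimental one. The sketch below implements the tie-corrected τ-b variant in pure Python; the slide does not state which variant the challenge evaluation used, so that choice is an assumption, and the example values are hypothetical.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2)).

    C/D are concordant/discordant pair counts, n0 is the number of pairs,
    n1 and n2 are the numbers of pairs tied in x and in y respectively.
    """
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom

# Hypothetical predicted and experimental affinities for five ligands.
predicted = [1, 2, 3, 4, 5]
experimental = [1, 3, 2, 4, 5]
print(kendall_tau_b(predicted, experimental))
```

With no ties, τ-b reduces to the plain difference between concordant and discordant pair fractions, so a value near 0.45 means roughly 72% of ligand pairs are ordered correctly.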
Conclusions
- Ligand-biased docking (ICM-dock + APF) consistently produces good pose accuracy
- Atomic Property Field-based 3D QSAR activity models outperform physical-term-based models
- Cluster cross-validation is adequate to assess model quality
- Using dynamic/focused training sets did not result in consistently better predictions
- Composite models, and in particular PLS(APF)/RFR(Phys), are consistently most predictive
Acknowledgments
- Polo Lam
- Eugene Rausch
- Ruben Abagyan
- D3R organizers