Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Iván Solt
Solutions for Cheminformatics

Drug Discovery Strategies for known targets

High-Throughput Screening (HTS)
  - Cells or recombinant protein
  - Fluorescent or luminescent readout
  - Automated, miniaturized
  - Thousands of samples / day
  - Number of primary actives: ~1%

Virtual Screening (VS)
  - Ligand or structure based
  - Virtual or real libraries
  - Similarity search, 2D or 3D
  - Can lead to thousands of possible actives: further processing needed
  - Measurement: enrichment ratio, ROC curves for known actives
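The two measurements named on this slide can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: it assumes each compound has a screening score (higher means more likely active) and a 0/1 label for known actives, and the function names are hypothetical.

```python
def enrichment_factor(scores, labels, fraction):
    """Enrichment in the top `fraction` of the ranked list.

    Ratio of the active rate among the top-ranked compounds to the
    active rate in the whole library (1.0 means no better than random).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_total = len(ranked)
    n_actives = sum(labels)
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen active is scored
    above a randomly chosen inactive; ties count as one half.
    """
    actives = [s for s, l in zip(scores, labels) if l]
    decoys = [s for s, l in zip(scores, labels) if not l]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))
```

The pairwise AUC is quadratic in library size, which is acceptable for a sketch; a rank-based formula would be preferable for large screens.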

Virtual Library Design Workflow

Databases
  - Reactions
  - Molecules
  - Queries

Fragmentation
  - R-group decomposition
  - Fragmentation
  - Reagent clipping

Compound selection
  - Similarity searches
  - Substructure searches

Enumeration
  - Fuse fragments
  - R-group composition
  - Reaction enumeration

Library analysis
  - Clustering
  - 2D similarity screen
  - 3D shape similarity screen

Find or Virtually Create Candidates

Virtual screening of existing compounds
  Pros:
    - Fast
    - Hits are readily available for in vitro experiments
  Cons:
    - Limitation on available compounds

De novo design
  Pros:
    - No limitation on virtual compound space
    - Structural novelty
  Cons:
    - Are hits synthetically available?

Virtual Screening Workflow

Molecules (in-house or commercially available)
  1. Reactions (virtual synthetic path): synthetically accessible compounds
  2. Filtering
  3. Similarity search
  4. 3D alignment
  5. Clustering
In vivo experiment?

Step 1: Reaction Enumeration

  - Reaction schema for accessible syntheses
  - Combinatorial or sequential enumeration
  - Reaction rules: phrase and apply public and in-house chemical knowledge
      - Selectivity with tolerance
      - Reactivity
      - Exclusion rules

  EXCLUDE:
    match(reactant(1), "[Cl,Br,I]C(=[O,S])C=C") or
    match(reactant(0), "[H][O,S]C=[O,S]") or
    match(reactant(0), "[P][H]") or
    (max(pka(reactant(0), filter(reactant(0), "match('[o,s;h1]')"), "acidic")) > 14.5) or
    (max(pka(reactant(0), filter(reactant(0), "match('[#7:1][h]', 1)"), "basic")) > 0)


Step 1: Reaction Enumeration

Reaction rules ON
  - Fewer results than theoretical
  - Unfeasible starting materials eliminated
  - Feasible products only
  - Custom rules can be added to increase selectivity

Reaction rules OFF
  - More results
  - Best for debugging purposes
  - Products may be incorrect because chemical rules are neglected
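The effect of switching rules on and off can be illustrated with a toy combinatorial enumeration. This is a sketch under strong assumptions: reactants are plain dictionaries with pre-annotated group tags (hypothetical), and the exclusion rule is an ordinary Python predicate, not a Chemical Terms expression evaluated on real structures.

```python
from itertools import product


def enumerate_products(reactants_a, reactants_b, exclude_rules=(), rules_on=True):
    """Combine every reactant pair, skipping pairs that hit an exclusion rule.

    Each rule is a callable (a, b) -> bool; True means "exclude this pair".
    With rules_on=False all pairs are enumerated, as in a debugging run.
    """
    results = []
    for a, b in product(reactants_a, reactants_b):
        if rules_on and any(rule(a, b) for rule in exclude_rules):
            continue
        # Placeholder product label; a real engine would build a structure.
        results.append(f"{a['name']}+{b['name']}")
    return results


# Hypothetical reactants with hand-assigned group tags.
acids = [
    {"name": "acid-1", "groups": set()},
    {"name": "acid-2", "groups": {"free_thiol"}},
]
amines = [
    {"name": "amine-1", "groups": set()},
    {"name": "amine-2", "groups": set()},
]

# Hypothetical exclusion rule: drop pairs whose first reactant has a free thiol.
rules = [lambda a, b: "free_thiol" in a["groups"]]

with_rules = enumerate_products(acids, amines, rules, rules_on=True)
without_rules = enumerate_products(acids, amines, rules, rules_on=False)
```

Here `with_rules` holds two products and `without_rules` holds all four pairs, mirroring the "fewer results than theoretical" behaviour described above.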

Step 2: Filtering

Lead likeness, drug likeness (Chemical Terms)
  - Could it fit to the active centre?
      Basic analysis: size, mass...
  - Could it get to the active centre?
      ADME properties: solubility, pKa, polar surface, partition coefficients...
  - Structural filtering, e.g. reactive groups
  - Toxicity, environmental concerns, etc.

Calculator plugins
  - Elemental Analysis: Elemental Analysis
  - IUPAC Name, Structure to Name
  - Protonation: pKa, Microspecies, Isoelectric Point
  - Partitioning: logP, logD
  - Charge: Charge, Polarizability, Orbital Electronegativity
  - Isomers: Tautomerization, Stereoisomer
  - Conformation: Conformer, Flexible 3D Alignment, Molecular Dynamics
  - Geometry: Topology Analysis, Geometry, Polar Surface Area (2D), Molecular Surface Area (3D)
  - Markush: Markush Enumeration
  - Other: Hydrogen Bond Donor-Acceptor, Huckel Analysis, Refractivity, Structural Framework, Resonance
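A property filter of this kind can be sketched as a set of upper limits applied to precomputed descriptors. The thresholds below are the widely used rule-of-five limits, an assumption here rather than values from the slides, and the compound names and property values are made up for illustration.

```python
# Upper limits per property (rule-of-five style; assumed, not from the talk).
RULE_OF_FIVE = {"mass": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}


def passes_filter(props, limits=RULE_OF_FIVE):
    """True if every limited property is at or below its upper limit."""
    return all(props[key] <= limit for key, limit in limits.items())


def filter_library(library, limits=RULE_OF_FIVE):
    """Return the names of compounds that pass, in library order."""
    return [name for name, props in library.items() if passes_filter(props, limits)]


# Hypothetical compounds with hypothetical precomputed properties.
library = {
    "cpd-1": {"mass": 320.4, "logp": 2.1, "hbd": 2, "hba": 5},
    "cpd-2": {"mass": 612.7, "logp": 4.0, "hbd": 3, "hba": 9},   # too heavy
    "cpd-3": {"mass": 410.0, "logp": 6.3, "hbd": 1, "hba": 4},   # too lipophilic
}

kept = filter_library(library)
```

Passing a different `limits` dictionary gives a stricter lead-likeness filter without changing the code.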

Step 3: Similarity search

Screen 2D + Descriptor package
  - Screen against known bioactives
  - Chemical Fingerprints
      - Topology
  - Pharmacophore Fingerprints: custom atomic properties + their topological relationship
      - H-bond donors / acceptors
      - Cationic / anionic groups
      - Hydrophobic groups
      - Aromatic groups, etc.
  - ECFP/FCFP
  - Similarity searches
      - Tanimoto, Euclidean, Tversky metrics
      - Metrics optimization

[Figure: example similarity values under regular Tanimoto and optimized Tanimoto: 0.57, 0.47, 0.55; 0.20, 0.28, 0.06]
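The three metrics named above have simple definitions on binary fingerprints. The sketch below assumes a fingerprint is represented as a Python set of "on" bit positions; it uses the textbook formulas and does not reproduce the optimized (weighted) Tanimoto mentioned on the slide.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity: shared bits / bits set in either fingerprint."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 1.0


def tversky(a, b, alpha, beta):
    """Tversky similarity; alpha and beta weight the bits unique to a and to b."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 1.0


def euclidean(a, b):
    """Euclidean distance between two binary fingerprints (a dissimilarity)."""
    return math.sqrt(len(a ^ b))


query = {1, 2, 3, 4}
candidate = {3, 4, 5}
sim = tanimoto(query, candidate)  # 2 shared bits out of 5 distinct bits
```

With alpha = beta = 1 the Tversky index reduces to Tanimoto; unequal weights make the comparison asymmetric, which is useful when the query is a substructure-like fragment of the candidate.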

Step 4: Screen 3D

  - Align the candidates to the known active in 3D
  - Treat the candidate as flexible!
  - Consider pharmacophore atom types (align cationic to cationic, etc.)!
  - Problem: complicated conformational space

Step 4: Screen 3D

Simple sampling of the conformational space:
minimum and maximum distance between atom pairs in the full torsion space

Select atoms
  - Colors (e.g. pharmacophore types)
  - Topological features (e.g. longest chain start/end/center)
  - Ring centers (aromatic, aliphatic)

Calculate
  - Min/max internal distance ranges
  - Distance histograms for selected atoms
  - Only once for each molecule
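The min/max distance ranges can be approximated from a set of sampled conformers. This sketch assumes the conformers are already available as lists of (x, y, z) coordinates indexed by atom; the slides describe ranges over the full torsion space, so sampling is a simplification introduced here.

```python
import math
from itertools import combinations


def distance_ranges(conformers, atoms):
    """Min and max distance for each pair of selected atoms over all conformers.

    conformers: list of conformers, each a list of (x, y, z) tuples.
    atoms: indices of the selected atoms (e.g. pharmacophore points).
    Returns {(i, j): (min_distance, max_distance)} with i before j in `atoms`.
    """
    ranges = {}
    for i, j in combinations(atoms, 2):
        dists = [math.dist(conf[i], conf[j]) for conf in conformers]
        ranges[(i, j)] = (min(dists), max(dists))
    return ranges


# Two hypothetical conformers of a three-atom chain: extended and bent.
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

ranges = distance_ranges([extended, bent], atoms=[0, 1, 2])
```

Bonded pairs keep a fixed distance, while the 0-2 pair spans a range; because the result depends only on the molecule, it can be computed once and reused for every alignment.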

Step 4: Screen 3D

Hybrid alignment: separate translation and rotation from torsions
  - Robust and fast
  - Needs a good guess for the atom-atom mapping:
      - Same colors
      - Distance ranges must be allowed for all mapped pairs
      - Triangle inequality must be fulfilled for any atom triplet
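The three mapping conditions can be expressed as a quick feasibility test. The following is a sketch of one possible reading, not the Screen3D implementation: colors are plain labels, distance ranges are (min, max) tuples keyed by atom pair, "allowed" is taken to mean that the two molecules' ranges overlap, and the triangle test is a necessary condition on the overlapped bounds.

```python
from itertools import combinations


def _range(ranges, i, j):
    """Look up a (min, max) range regardless of pair order."""
    return ranges[(i, j)] if (i, j) in ranges else ranges[(j, i)]


def mapping_is_feasible(mapping, colors_a, colors_b, ranges_a, ranges_b):
    """Check a candidate atom-atom mapping [(atom_in_a, atom_in_b), ...]."""
    # 1. Mapped atoms must carry the same color.
    if any(colors_a[a] != colors_b[b] for a, b in mapping):
        return False

    # 2. For every mapped pair, the distance ranges must overlap.
    allowed = {}
    for (p, (a1, b1)), (q, (a2, b2)) in combinations(enumerate(mapping), 2):
        lo_a, hi_a = _range(ranges_a, a1, a2)
        lo_b, hi_b = _range(ranges_b, b1, b2)
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo > hi:
            return False
        allowed[(p, q)] = (lo, hi)

    # 3. Triangle inequality on the overlapped bounds: no side may be
    #    forced to exceed the largest possible sum of the other two.
    for p, q, r in combinations(range(len(mapping)), 3):
        (lo_pq, hi_pq) = allowed[(p, q)]
        (lo_pr, hi_pr) = allowed[(p, r)]
        (lo_qr, hi_qr) = allowed[(q, r)]
        if (lo_pq > hi_pr + hi_qr
                or lo_pr > hi_pq + hi_qr
                or lo_qr > hi_pq + hi_pr):
            return False
    return True
```

A mapping that fails any of these cheap checks can be discarded before the more expensive translation, rotation, and torsion optimization is attempted.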

Screen 3D: Test on DUD

[Figure: average of 1% enrichments; y-axis: % of the actives retrieved]

Giganti et al. J. Chem. Inf. Model. 2010, 50, 992

Screen 3D: Test on DUD

[Figure: average of 10% enrichments; y-axis: % of the actives retrieved]

Giganti et al. J. Chem. Inf. Model. 2010, 50, 992

Screen 3D: Test on DUD

Average time per compound (without precalculations)

  Method              Time
  ChemAxon Screen3D   0.07
  ROCS                0.5
  FRED                1.0
  ICMsim              2.4
  Surflex-sim         6.7
  FlexS               6.9
  Surflex-dock        14.6
  FLEXX               15.6
  ICM                 17.7

Speed: Intel Q6600 2.4 GHz, Intel Xeon 2.4 GHz

Giganti et al. J. Chem. Inf. Model. 2010, 50, 992

Step 5: Clustering, library analysis

JKlustor
  - Wide range of methods
      - Unsupervised, agglomerative clustering
      - Hierarchical and non-hierarchical methods
      - Similarity based and structure based techniques
  - Flexible search options
      - Tanimoto and Euclidean metrics, weighting
      - Maximum common substructure identification
      - Chemical property matching including atom type, bond type, hybridization, charge
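Agglomerative, similarity-based clustering can be illustrated with a small single-linkage example. This is a generic textbook sketch, not JKlustor's algorithm: fingerprints are sets of "on" bits, the linkage rule and the similarity threshold are assumptions chosen for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of bits."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 1.0


def single_linkage_clusters(fingerprints, threshold):
    """Merge the two most similar clusters until none reach the threshold.

    Cluster similarity is the best Tanimoto value between any two members
    (single linkage). Returns a list of clusters of fingerprint indices.
    """
    clusters = [[i] for i in range(len(fingerprints))]

    def linkage(c1, c2):
        return max(tanimoto(fingerprints[i], fingerprints[j])
                   for i in c1 for j in c2)

    while len(clusters) > 1:
        # Most similar pair of clusters (x < y so deleting y keeps x valid).
        sim, x, y = max(
            (linkage(c1, c2), x, y)
            for x, c1 in enumerate(clusters)
            for y, c2 in enumerate(clusters)
            if x < y
        )
        if sim < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical fingerprints: two similar pairs and one singleton.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8, 9, 10}, {20}]
clusters = single_linkage_clusters(fps, threshold=0.5)
```

Recording the merge order instead of stopping at a threshold would give the full hierarchy; the threshold cut shown here yields a flat partition.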

JChem Extensions in KNIME

  - Workflow management in KNIME
  - JChem extension nodes developed by InfoCom, Japan
  - Constantly developing palette of available JChem tools

JChem Extensions in KNIME

  - IO: molecule and reaction import, export, drawing
  - Visualization
  - Manipulators
      - Calculator plugins
      - Reactor
      - Similarity and structure-based search
      - Fingerprint calculation
      - Fragmentation
      - Clustering
      - R-group composition, decomposition
      - Standardization...
  - Database management
  - Molecular format conversion
  - Web search services

Step 1: Reaction Enumeration

Step 2: Filtering

Step 3: Similarity search

JChem Extensions in KNIME

Screening workflow:
  1. Reactions (virtual synthetic path): synthetically accessible compounds
  2. Filtering
  3. Similarity search
  4. 3D alignment
  In vivo experiment?

KNIME workflow steps:
  1. Import reactants
  2. Enumerate reaction; carry out topology analysis
  3. Calculate properties; filter
  4. Screen for similarity against known active
  5. Export results

Conclusions

  - Virtual libraries and virtual screening are essential tools in modern drug discovery
      - No special hardware, short experiment cycles, variety of approaches
  - A database of synthetically accessible compounds can be designed with reaction libraries and custom in-house synthetic knowledge
  - Powerful 3D alignment techniques allow high-throughput conformational screening with great efficiency
  - Straightforward integration into KNIME

Contributors

  - Tímea Polgár
  - Attila Tajti