Lecture for Molekylär bioinformatik X3, Feb 24, 2004
Computational Chemistry in Drug Discovery
Mats Kihlén, Head of Research Informatics, Biovitrum AB

Overview
» The role of computational chemistry
» Basic concepts
  - The pharmacophore concept, conformational analysis
  - Database searching
  - Virtual combinatorial chemistry
  - Molecular Dynamics
  - Structure based drug design
  - Predicting ADME properties
  - Protein modeling
» Trends and future directions

The Drug Development Process
[Figure: timeline, 0 to 10 years. Pre-clinical (medicinal chemistry, pharmacology, drug design, mol biology, tox, ADME), Investigational New Drug Application, Clinical (Phase I - III), New Drug Application, Market]

The role of computational chemistry
» Aid chemists in the design of compounds
  - Improve affinity
  - Find SAR (Structure Activity Relationship)
  - Select building blocks for combinatorial libraries
  - Predict permeability & solubility
» Provide protein models to biologists
  - Target identification
  - Genetic constructs
  - Specificity guidance
» Analyse biological data
  - Make sure data is captured and stored
  - Coordinate data flow in projects

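The pharmacophore concept
» Superimpose features from several compounds
» Use the features as search pattern
[Figure: superimposed compounds with the features hydroxyl, amine, aromatic ring and lipophilic group marked, and the distances D1, D2 and D3 between hydroxyl, amine and aromatic ring]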
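Using a set of features as a search pattern comes down to comparing distances between features. The sketch below is a minimal illustration, not the search engine used in the lecture. The query distances, the 1 Å tolerance, the coordinates and the function name are all hypothetical, and a real search would also have to handle conformational flexibility.

```python
import math

# Hypothetical three-point query: required distances (in Å) between feature pairs.
QUERY = {
    ("hydroxyl", "amine"): 5.0,
    ("hydroxyl", "aromatic"): 6.0,
    ("amine", "aromatic"): 4.0,
}

def matches_pharmacophore(features, query, tolerance=1.0):
    """Return True if every query distance is met within the tolerance.

    features: dict mapping feature name -> (x, y, z) coordinates of one conformer.
    query:    dict mapping (feature_a, feature_b) -> required distance.
    """
    for (a, b), required in query.items():
        if a not in features or b not in features:
            return False  # the compound lacks one of the required features
        if abs(math.dist(features[a], features[b]) - required) > tolerance:
            return False
    return True

# Hypothetical conformers: one fits the query triangle, one does not.
candidate = {"hydroxyl": (0.0, 0.0, 0.0), "amine": (5.0, 0.0, 0.0), "aromatic": (4.5, 3.97, 0.0)}
decoy = {"hydroxyl": (0.0, 0.0, 0.0), "amine": (5.0, 0.0, 0.0), "aromatic": (4.5, 0.5, 0.0)}

print(matches_pharmacophore(candidate, QUERY))
print(matches_pharmacophore(decoy, QUERY))
```

A rigid search runs this check on a single stored conformer per compound. A flexible search repeats it over many conformers, which is slower but finds more hits.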

Conformational analysis
» Identify the bioactive conformation
» Vary all torsion angles and calculate lowest internal energy
» Typically done with MacroModel
... but the free energy of binding depends on:
» Solvent
» Protein ligand interactions
[Figure: free ligand in water versus ligand bound to protein in water; relative permittivity ε_r = 80 for water and ε_r = 4 for protein]
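The "vary all torsion angles" step can be pictured as a grid search. The sketch below is a toy illustration only: the torsional energy function (a one-fold plus a three-fold cosine term with made-up barrier heights) stands in for a real force field, and it does not represent how MacroModel works. It also covers only the internal energy, not the solvent and protein contributions mentioned above.

```python
import itertools
import math

def torsion_energy(phi_deg, v1=1.0, v3=2.0):
    """Toy torsional energy (arbitrary units) for one rotatable bond.

    v1 and v3 are hypothetical barrier heights, not force-field parameters.
    """
    phi = math.radians(phi_deg)
    return 0.5 * v1 * (1 + math.cos(phi)) + 0.5 * v3 * (1 + math.cos(3 * phi))

def scan_torsions(n_torsions, step_deg=30):
    """Systematically try every combination of torsion angles on a grid
    and return the lowest-energy combination with its energy."""
    grid = range(0, 360, step_deg)

    def total_energy(angles):
        return sum(torsion_energy(a) for a in angles)

    best = min(itertools.product(grid, repeat=n_torsions), key=total_energy)
    return best, total_energy(best)

best_angles, best_energy = scan_torsions(n_torsions=2)
print(best_angles, round(best_energy, 6))
```

The number of grid points grows as (360 / step) to the power of the number of torsions. That is why flexible molecules are usually sampled with Monte Carlo or similar methods rather than exhaustively.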

Structure Searching
» Pharmacophores
  - Finding novel scaffolds
  - Refining substituents
  - Typically ACD (Available Chemicals Directory), in-house databases or virtual libraries (100k - 1M compounds)
  - Rigid or flexible search - fast
» High throughput docking
  - Protein structure required!
  - Time consuming, despite crude model

The structure based drug design cycle
[Figure: cycle linking Design, Synthesis, Activity measurement and Structure determination of complex]

A project screening funnel
[Figure: screening funnel. Stages shown: Design; Synthesis; Selectivity; Protein X assay (Ki < 5 µM); Cell based assay; Caco-2 (Papp > 1 × 10^-6 cm/s); Co-crystallisation with Protein X; ADME; Mouse model]
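The two numeric gates in the funnel (Ki < 5 µM in the Protein X assay, Caco-2 Papp > 1 × 10^-6 cm/s) can be written as a simple filter. This is a sketch with hypothetical compound records. The remaining funnel stages have no stated thresholds and are left out.

```python
from dataclasses import dataclass

KI_MAX_UM = 5.0          # Protein X assay: Ki must be below 5 µM
PAPP_MIN_CM_S = 1e-6     # Caco-2: Papp must exceed 1e-6 cm/s

@dataclass
class Compound:
    name: str
    ki_um: float         # measured Ki in µM
    papp_cm_s: float     # measured Caco-2 apparent permeability in cm/s

def passes_numeric_gates(compound):
    """Apply the two thresholds quoted on the slide."""
    return compound.ki_um < KI_MAX_UM and compound.papp_cm_s > PAPP_MIN_CM_S

# Hypothetical measurements, for illustration only.
compounds = [
    Compound("cpd-1", ki_um=0.8, papp_cm_s=3e-6),   # potent and permeable
    Compound("cpd-2", ki_um=12.0, papp_cm_s=5e-6),  # too weak
    Compound("cpd-3", ki_um=1.5, papp_cm_s=2e-7),   # potent but poorly permeable
]

survivors = [c.name for c in compounds if passes_numeric_gates(c)]
print(survivors)
```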

Virtual PPARγ Libraries
[Figure: reaction scheme with substituents R1 and R2]
All reagents selected from ACD:
» Purity: > 95%
» Price: < $50
» MW: < 250
» Quantity: > 1 g
Full expansion
» R1: 95 acid chlorides, R2: 647 alcohols: 61 465 compounds
» R1: 13 acid chlorides, R2: 128 alcohols: 1 664 compounds
Iterative design
» R1: 1 acid chloride, R2: 647 alcohols, then R1: 95 acid chlorides, R2: 1 alcohol: 712 compounds
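The reagent selection and the size of a fully expanded library follow directly from the numbers on the slide. The sketch below uses hypothetical reagent records and field names. Only the four cut-offs and the reagent counts come from the lecture.

```python
from dataclasses import dataclass

@dataclass
class Reagent:
    name: str
    purity_pct: float
    price_usd: float
    mw: float
    quantity_g: float

def is_acceptable(reagent):
    """Selection criteria quoted on the slide for ACD reagents."""
    return (
        reagent.purity_pct > 95
        and reagent.price_usd < 50
        and reagent.mw < 250
        and reagent.quantity_g > 1
    )

def full_expansion_size(n_r1, n_r2):
    """Every R1 reagent combined with every R2 reagent."""
    return n_r1 * n_r2

# Hypothetical catalogue entries, for illustration only.
ok = Reagent("reagent-A", purity_pct=98, price_usd=20, mw=140.6, quantity_g=5)
too_heavy = Reagent("reagent-B", purity_pct=99, price_usd=10, mw=310.2, quantity_g=25)

print(is_acceptable(ok), is_acceptable(too_heavy))
print(full_expansion_size(95, 647))   # 61465
print(full_expansion_size(13, 128))   # 1664
```

The multiplicative growth is the reason for iterating one position at a time instead of enumerating every combination.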

Building libraries in Afferent

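Binding energy prediction
Molecular Dynamics Simulations
[Figure: two simulations, ligand in water and ligand in protein and water]
- 10-15 CPU hours per compound
- Full flexibility and solvent within simulation sphere
As close to reality as we can get today
The Åqvist & Medina equation:
$$\Delta G_{\text{binding}} = \beta \langle \Delta V^{\text{el}} \rangle + \alpha \langle \Delta V^{\text{vdw}} \rangle$$
where each $\langle \Delta V \rangle$ is the difference in the averaged ligand-surroundings interaction energy (electrostatic or van der Waals) between the bound and the free simulation.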
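Once the two simulations have been run, evaluating the Åqvist & Medina equation is simple arithmetic. In the sketch below the averaged energies are hypothetical numbers in kcal/mol. The default coefficients (α ≈ 0.16, β = 0.5) are values commonly quoted for the original parametrisation and should be treated as assumptions, since the lecture does not state which values were used.

```python
def lie_binding_free_energy(v_el_bound, v_el_free, v_vdw_bound, v_vdw_free,
                            alpha=0.16, beta=0.5):
    """Linear interaction energy estimate of the binding free energy.

    Inputs are simulation averages of the ligand-surroundings interaction
    energies for the bound (protein + water) and free (water only) states.
    alpha and beta are assumed default coefficients, not lecture values.
    """
    delta_el = v_el_bound - v_el_free
    delta_vdw = v_vdw_bound - v_vdw_free
    return beta * delta_el + alpha * delta_vdw

# Hypothetical averaged energies (kcal/mol), for illustration only.
dg = lie_binding_free_energy(
    v_el_bound=-60.0, v_el_free=-50.0,
    v_vdw_bound=-40.0, v_vdw_free=-20.0,
)
print(round(dg, 2))
```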

High Throughput Docking
Applications:
» Selection of compounds for screening
  - Smaller number of compounds to test
  - Possible to cover compounds not in the compound collection
» Selection of reagents for focussed libraries
  - Make large virtual libraries, but synthesise only the most promising compounds
» Virtual SAR by NMR
  - Identification of small binding fragments which could be joined to create potent compounds

Several weak binders can be turned into one strong
» Linking two weak binders may result in a ligand whose binding free energy is the sum of theirs, i.e. whose binding constant is the product of their binding constants.
» Case study: Combinatorial linking of two weak c-src tyrosine kinase ligands gave a 64 nM binder. Each fragment showed approx. 70% inhibition at 500 µM.
Maly D, Choong I, Ellman J, Combinatorial target-guided ligand assembly: Identification of potent subtype-selective c-src inhibitors, PNAS 97 (2000)
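In the idealised case, the binding free energies of two linked fragments add, which means their dissociation constants multiply. The sketch below works through that arithmetic for two hypothetical 1 mM fragments. It ignores linker strain and entropy changes, so it is an upper bound on the gain and does not try to reproduce the 64 nM result of the case study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)

def dg_from_kd(kd_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)

def kd_from_dg(dg_kcal, temperature=298.15):
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(dg_kcal / (R_KCAL * temperature))

def ideal_linked_kd(kd_a, kd_b, temperature=298.15):
    """Kd of the linked ligand if the two free energies simply add."""
    total_dg = dg_from_kd(kd_a, temperature) + dg_from_kd(kd_b, temperature)
    return kd_from_dg(total_dg, temperature)

# Two hypothetical millimolar fragments.
kd_linked = ideal_linked_kd(1e-3, 1e-3)
print(f"{dg_from_kd(1e-3):.2f} kcal/mol per fragment")
print(f"{kd_linked:.1e} M for the ideal linked ligand")
```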

Selection for screening
» Dock public compound databases as starting point for compound acquisition or screening
  - Example: ACDscreen - 1.2M compounds
» Pre-filtering necessary
  - Reduce computational needs
  - Remove junk, e.g. SLN-based filters
  - Require known features
» Good for small compound collections

Pre-filtering
Implemented as Sybyl substructure filters
» Definition of unwanted groups as SLNs
» Versatile syntax, high capacity filters
# Sulphonyl halides
S(=O)(=O)Hal
# Acyl halides
C(=O)Hal
# Perhalo ketones
CC(=O)C(Hal)(Hal)Hal
# Sulphonate esters
O=S(=O)OC
# Phosphonate esters
O=P(=O)OC
# Alpha halo carbonyl compounds
O=CCAny[is=Cl,Br,I]
# Het-het single bond but not N-5ring-heterocycles or sulphonamides
Any-S[not=S=O]-Het-Any
Any[is=N,O,P;not=N*[1]~Any~Any~Any~Any~@1]-Any[is=S,N,O,P;not=S*=O]-Any

Docking method used at Biovitrum
» Fixed protein, fully flexible ligands
  - Fails if induced fit
  - MC generation of conformers and positions
  - No initial bias (positions, restraints etc)
» Fully automated procedure using ICM or GLIDE
» Capacity ~40k compounds per day using 30 CPUs
» Docks e.g. PTP1B and PPARγ binders close to crystal structures, but...
» Docking and scoring are different things!
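The quoted capacity translates into a per-compound cost and a turnaround time for a whole database. The sketch below only rearranges the lecture's own figures (about 40k compounds per day on 30 CPUs, and the 1.2M compounds of ACDscreen). The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400

def cpu_seconds_per_compound(compounds_per_day, n_cpus):
    """Average CPU time spent on one compound at the stated throughput."""
    return n_cpus * SECONDS_PER_DAY / compounds_per_day

def days_to_dock(n_compounds, compounds_per_day):
    """Wall-clock days needed to dock a database at the stated throughput."""
    return n_compounds / compounds_per_day

print(cpu_seconds_per_compound(40_000, 30))  # 64.8 CPU seconds per compound
print(days_to_dock(1_200_000, 40_000))       # 30.0 days for an unfiltered ACDscreen
```

At roughly a month per unfiltered database, pre-filtering is what makes routine use practical.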

ICM docking of troglitazone, rosiglitazone, pioglitazone and PNU91325 into PPARγ ligand binding domain.

Docking test - PTP1B inhibitors
Prediction of actives vs inactives
Activity threshold: < 50 µM
Conservative score threshold (-40), n = 107
[Figure: bar chart of true and false predictions (0-100%) for active and inactive compounds]
» Correct prediction of actives: 82%
» Correct prediction of inactives: 26%
» 55 random drugs: 100% predicted inactive
Relatively close analogues. Hard to explain from structure why some are inactive.
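The two percentages above are the fractions of actives and of inactives that fall on the right side of the score threshold. The sketch below shows how such rates are computed. The scores and labels are hypothetical, and it assumes that a more negative docking score is better, so a compound counts as predicted active when its score is below -40.

```python
def prediction_rates(scores, is_active, threshold=-40.0):
    """Return (fraction of actives predicted active,
               fraction of inactives predicted inactive).

    Assumes lower (more negative) scores indicate better predicted binding.
    """
    tp = fn = tn = fp = 0
    for score, active in zip(scores, is_active):
        predicted_active = score < threshold
        if active and predicted_active:
            tp += 1
        elif active:
            fn += 1
        elif predicted_active:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one active and one inactive compound")
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical docking scores and measured activity labels.
scores = [-45.0, -42.0, -38.0, -50.0, -41.0, -30.0, -44.0, -20.0]
labels = [True, True, True, True, False, False, False, False]

actives_correct, inactives_correct = prediction_rates(scores, labels)
print(actives_correct, inactives_correct)
```

For sets of close analogues like this one, a high rate for actives together with a low rate for inactives means the score cannot separate similar compounds, even if it rejects unrelated ones.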

[Figure: green compound from crystal complex vs white docked analogue. Labels: phosphate mimetic; greasy C-term patch; crucial interaction with Asp48]

NovoNordisk Xtal vs PNU Xtal
Structure-Based Design of a Low Molecular Weight, Nonphosphorus, Nonpeptide, and Highly Selective Inhibitor of Protein-tyrosine Phosphatase 1B. Iversen et al, J Biol Chem. 2000
PDB code: 1ECV
[Figure: chemical structure drawings]

Docked Novo #5 vs PNU Asp48

Novo #5 vs 1ECV
[Figure: chemical structure drawings]
Created virtual library from 250 aldehydes to explore nearby pocket

Virtual library hits

Reaching the 2nd pTyr site

Structure Based Focussing
Combine the pharmacophore concept with high throughput docking:
» Align with a docked pharmacophore
» Score against the surface
[Figure: protein and ligand]

Virtual screening summary
» ICM & GLIDE are fast and robust enough to be used as standard docking tools
» Can clearly enrich screening sets of diverse compounds
» Not reliable enough to predict small differences in binding affinity
» More work should be done to improve scoring

Predicting ADME properties
Typical aspects:
» Cell permeability
» Aqueous solubility
» Liver enzymes: inhibitors or substrates?
» Protein binding
» Physical properties vs. biological interactions

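A model of passive diffusion
[Figure: water / lipids / water compartments. In the first water phase, pKa and pH determine [C charged] and [C neutral]; labels ΔG0 and Size mark the transfer into the lipids; concentration [C] on one water side and [C] = 0 on the other]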
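The split between charged and neutral species in the water phase follows from pKa and pH via the Henderson-Hasselbalch relation. The sketch below computes the neutral fraction for a compound with a single ionisable group. The example pKa values are hypothetical, and the assumption that only the neutral form partitions into the lipid phase is a common simplification rather than a statement from the lecture.

```python
def fraction_neutral(pka, ph, kind):
    """Fraction of a monoprotic compound present in its neutral form.

    kind: "acid" (neutral when protonated) or "base" (neutral when deprotonated).
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10 ** (ph - pka))
    if kind == "base":
        return 1.0 / (1.0 + 10 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")

# Hypothetical compounds at pH 7.4.
print(fraction_neutral(pka=4.4, ph=7.4, kind="acid"))   # about 0.1% neutral
print(fraction_neutral(pka=9.4, ph=7.4, kind="base"))   # about 1% neutral
print(fraction_neutral(pka=7.4, ph=7.4, kind="acid"))   # 0.5 when pH equals pKa
```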

Predicting absorption
20 diverse compounds with known absorption in humans (from Palm et al 1997)
$$\ln \ln \frac{100}{100 - \mathrm{FA\%}} = \alpha\,\mathrm{PSA} + \beta\,\mathrm{ASA} + \gamma$$
[Figure: predicted FA% (FA pred %) plotted against observed FA%, both axes 0-100]

Predicting aqueous solubility
Experimental vs predicted solubility for 833 mixed compounds
[Figure: Simca-P 8.0 (Umetrics AB) PLS plot of experimental logS against predicted logS]
N training = 91, N test = 833, Npc = 4, RMSEP = 0.87
PLS model based on 3D molecular descriptors calculated by Cerius2
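RMSEP is the root mean square error of prediction on the test set, in the same log units as the solubility itself. The sketch below shows the calculation on a few hypothetical logS values. It is not the Simca-P implementation.

```python
import math

def rmsep(observed, predicted):
    """Root mean square error of prediction over a test set."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equally long, non-empty sequences")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical experimental and predicted logS values.
observed_logs = [-3.0, -1.0, 0.0]
predicted_logs = [-2.0, -1.0, -2.0]

print(round(rmsep(observed_logs, predicted_logs), 3))
```

An RMSEP of 0.87 log units corresponds to a typical error of a factor of about seven in solubility, since 10^0.87 ≈ 7.4.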

Blood Brain Barrier model
» In-house data from 75 compounds
» High level descriptors from ab initio calculations
» PLS statistics
[Figure: Simca-P 8.0 (Umetrics AB) PLS work set plot of observed log(b/p) against predicted log(b/p), points labelled by compound number]
Npc = 2, n = 50, R² = 0.73, Q² = 0.61, RMSEE = 0.51
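R² measures how well the model fits the compounds it was trained on, while Q² measures how well it predicts compounds that were left out, so Q² is the more honest figure. The sketch below illustrates the difference using leave-one-out cross-validation. To stay self-contained it swaps in a one-descriptor least-squares line for the PLS model, the data are hypothetical, and Simca-P's own cross-validation scheme may differ.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)

def r2(xs, ys):
    """Goodness of fit on the training data."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / total_sum_of_squares(ys)

def q2_leave_one_out(xs, ys):
    """Predictive ability: each point is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1 - press / total_sum_of_squares(ys)

# Hypothetical descriptor values and observed log(b/p).
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
log_bp = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]

print(round(r2(descriptor, log_bp), 3), round(q2_leave_one_out(descriptor, log_bp), 3))
```

Q² cannot exceed R² in this setup, and a large gap between the two is a warning sign of overfitting.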

Pharmacophore model for 2D6 inhibitors
» 3D QSAR model built in Catalyst
» 36 compounds from Lilly paper
» Correlation fitted vs. observed Km: 0.93
» Correctly predicted 82% of P&U compounds to within 1 log unit
» Activities 0.0046 to 1000 µM

Protein Modeling
» Models usually too poor for SBDD
» Sufficient for selectivity guidance
» Increasing demand due to Bioinformatics revolution
  - Auto-building and classification of structural domains
  - If family identified: select initial compound set for testing
» Major tool: ICM
  - Multiple sequence alignment
  - Structure optimisation with Monte Carlo
ZACRP7   QGDPGLPGVCRCGSIVLKSAFSVGITTSYPEER--LPI
ZACRP2   KGEPGLPGPCSCGSGHTKSAFSVAVTKSYPRER--LPI
1c28a_a  ---------------MYRSAFSVGLETRVTVPN--VPI
huzsig39 RSESRVP----------------------PPSD--APL

Trends & Guesses
» More parallel synthesis and combi.chem.
  - Larger data volumes for theoretical evaluation
» Earlier ADME studies
  - Prediction of physical properties
  - Metabolism models
» Faster project turnover
» Need for efficient data management
  - New tasks for computational chemistry!
» Novel targets
» Specialisation on target classes vs. therapeutic areas
» Virtual screening as primary source for hits
» Small companies without large compound collections