OntoChem Software. Chemoinformatic Solutions for Life Sciences Problems

OntoChem GmbH, H.-Damerow-Str. 4, 06120 Halle

Short Company Overview
Founded in 2005 by Dr. Lutz Weber (Roche, Morphochem) and Prof. Ludger Wessjohann (IPB Halle, Monaco)
2006: implementing new software and algorithms
2007: first clients (Pharma, Agro, Biotech, Food & Fragrances); 8 projects; cash-flow positive
2008: first financing round; expansion (new space); investments; head count from 5 to 12

Mission: Our knowledge discovery is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data. The knowledge discovery process takes data mining results (patterns extracted from data) and transforms them into useful and understandable information. This information is typically not retrievable by standard techniques; it is uncovered through artificial intelligence (AI) techniques.

Technology Idea: Automation of Association Discovery
Morphochem/Migragen example, 2003. Therapeutic goal: spinal cord injury. Solution/patent: treatment with fasudil.
Chain of associations: nerve cell growth is inhibited in spinal cord injury; Rho-kinase inhibits nerve cell growth; Rho-kinase inhibitors are known, e.g. fasudil; fasudil is in Phase II clinical development; fasudil (patent) is claimed for cardiovascular indications; S Pharma buys the patent. Patent: fasudil as a treatment of spinal cord injuries.
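The association chain above follows the pattern of Swanson-style literature-based discovery. Spinal cord injury is linked to nerve cell growth inhibition and Rho-kinase, Rho-kinase is linked to fasudil, and yet spinal cord injury and fasudil are not directly linked. The sketch below finds such indirect chains with a bounded path search. It assumes the associations have already been extracted as term pairs. The `FACTS` list and the helper names are hypothetical illustrations, not OntoChem's system.

```python
from collections import defaultdict

# Hypothetical, hand-entered associations from the fasudil example; a real
# system would mine such pairs from literature and patents.
FACTS = [
    ("spinal cord injury", "nerve cell growth inhibition"),
    ("nerve cell growth inhibition", "Rho-kinase"),
    ("Rho-kinase", "fasudil"),
    ("fasudil", "cardiovascular disease"),
]

def build_graph(facts):
    """Undirected association graph: term -> set of associated terms."""
    graph = defaultdict(set)
    for a, b in facts:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def find_chains(graph, start, goal, max_len=4):
    """All simple paths from start to goal with at most max_len terms."""
    chains, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            chains.append(path)
            continue
        if len(path) >= max_len:
            continue
        for nxt in sorted(graph[path[-1]]):
            if nxt not in path:  # keep paths simple (no repeated terms)
                stack.append(path + [nxt])
    return chains

graph = build_graph(FACTS)
start, goal = "spinal cord injury", "fasudil"
print("direct link known:", goal in graph[start])
for chain in find_chains(graph, start, goal):
    print(" -> ".join(chain))
```

An indirect chain between two terms that have no direct link is the kind of candidate hypothesis ("known molecule, new application") that the slide describes.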

OntoChem Search Space
Patent space; described targets and diseases; described molecules (39 Mio., 6,000 drugs); OntoChem's virtual compound library of druglike molecules with synthesis procedures (1,000,000,000)
Opportunities: known molecule, new application; new molecule, known application

Intelligent Product Generation
Priato reaction library (reactants to products): Baeyer-Villiger ketone oxidation, Baylis-Hillman vinyl alkylation, Beckmann rearrangement, Bischler-Napieralski isoquinoline synthesis, Friedel-Crafts reaction, Friedländer quinoline synthesis, Gabriel synthesis, Grignard reaction, Hell-Volhard-Zelinsky halogenation...
Products: REACTOR, ChemAxon Reactor...
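Reaction-based product generation multiplies reactant pools. Each template is applied to every combination of reactants, so the library size is the product of the pool sizes. The toy sketch below illustrates this with a hypothetical amide-coupling template implemented by SMILES string concatenation. This is a deliberate shortcut: real tools such as ChemAxon Reactor match and rewrite the molecular graph rather than strings.

```python
from itertools import product

# Hypothetical reactant pools as SMILES fragments: acids are written as their
# acyl part "R-C(=O)" and amines start with N, so concatenation yields a valid
# amide SMILES "R-C(=O)N...". This only works for this toy template.
ACYL_GROUPS = ["CC(=O)", "c1ccccc1C(=O)"]
AMINES = ["NCc1ccccc1", "N1CCOCC1", "NC"]

def amide_coupling(acyl: str, amine: str) -> str:
    """Hypothetical template: acyl fragment + amine fragment -> amide SMILES."""
    return acyl + amine

def enumerate_library(reaction, *pools):
    """Apply one reaction template to every combination of reactants."""
    return [reaction(*combo) for combo in product(*pools)]

products = enumerate_library(amide_coupling, ACYL_GROUPS, AMINES)
print(len(products))  # 2 acyl groups x 3 amines = 6 products
for smi in products:
    print(smi)
```

With a few hundred reactants per pool and dozens of named reactions, the same multiplication quickly reaches the billion-compound scale discussed on the next slides.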

ChemAxon-Related Questions
Large chemical databases (>1 billion compounds), non-combinatorial, non-Markush. Is it technically feasible? Upload and search speed... How to generate them...? Which software is best...? Is it useful?
Chemical similarity concepts: SSS = screening with fingerprints + atom-by-atom search (ABAS). SSS fingerprints are tuned to provide fast screening. Will they work for large chemical databases? Will they work with many similar compounds generated via reactions?
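The two-stage SSS described above can be sketched as a toy model; this is not ChemAxon's implementation. Molecules are heavy-atom graphs without bond orders. The 64-bit fingerprint hashes only atom symbols and bonded symbol pairs with `zlib.crc32`. The atom-by-atom search is a plain backtracking subgraph match. Every feature of a true substructure also occurs in the target, so the screen never drops a real hit. ABAS then removes the screen's false positives.

```python
import zlib

NBITS = 64  # toy fingerprint length

def fingerprint(atoms, bonds):
    """Hashed screen: one bit per atom symbol and per bonded symbol pair."""
    features = set(atoms)
    for i, j in bonds:
        features.add("-".join(sorted((atoms[i], atoms[j]))))
    fp = 0
    for f in features:
        fp |= 1 << (zlib.crc32(f.encode()) % NBITS)
    return fp

def atom_by_atom(q_atoms, q_bonds, t_atoms, t_bonds):
    """Backtracking subgraph match of query atoms/bonds onto the target."""
    q_bond_set = {frozenset(b) for b in q_bonds}
    t_adj = {i: set() for i in range(len(t_atoms))}
    for i, j in t_bonds:
        t_adj[i].add(j)
        t_adj[j].add(i)
    mapping = {}  # query atom index -> target atom index

    def extend(qi):
        if qi == len(q_atoms):
            return True
        for ti, symbol in enumerate(t_atoms):
            if symbol != q_atoms[qi] or ti in mapping.values():
                continue
            # every query bond to an already-mapped atom must exist in the target
            if all(mapping[qj] in t_adj[ti] for qj in mapping
                   if frozenset((qi, qj)) in q_bond_set):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]
        return False

    return extend(0)

def substructure_search(query, database):
    """Stage 1: cheap fingerprint screen; stage 2: ABAS on the survivors."""
    q_fp = fingerprint(*query)
    hits, screened = [], 0
    for name, mol in database.items():
        if (fingerprint(*mol) & q_fp) != q_fp:
            continue  # a query bit is missing, so it cannot be a substructure
        screened += 1
        if atom_by_atom(*query, *mol):
            hits.append(name)
    return hits, screened

DATABASE = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "acetic acid": (["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)]),
    "propane": (["C", "C", "C"], [(0, 1), (1, 2)]),
    "dimethyl ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
}
QUERY = (["C", "C", "O"], [(0, 1), (1, 2)])  # C-C-O fragment

hits, screened = substructure_search(QUERY, DATABASE)
print(hits, f"({screened} of {len(DATABASE)} passed the screen)")
```

The question on the slide maps directly onto `screened`. When many near-identical reaction products share the query's features, most of them pass the screen, and the cost shifts to the ABAS stage.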

Large Databases
For fast searching we need the molecules in cache (?). Index for 1 bln compounds: approx. 1 GB memory (ChemAxon). Can the index be optimized, i.e. made smaller?
Disks are becoming competitive, i.e. NAS and solid state drives: 15,000 rpm SCSI disk arrays; solid state drives offer the same speed for random access as for sequential access.
Hardware comparison: PC, 2 cores AMD64, 4 GB RAM, Linux 9.2; Silicon Graphics Altix 3, 4 cores Itanium2, 12 GB RAM, Linux 9.2, InfiniteStorage S3, 2.8 TB, NUMAlink 6 GB/sec
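Whether a fingerprint index fits in RAM can be estimated from the fingerprint length alone. The back-of-the-envelope sketch below assumes one fixed-length 1024-bit fingerprint per compound. It ignores IDs, structures and index overhead, and the actual JChem index layout may differ.

```python
def fingerprint_index_bytes(n_compounds: int, bits_per_fp: int) -> int:
    """Raw storage for one fixed-length fingerprint per compound (no overhead)."""
    return n_compounds * bits_per_fp // 8

size = fingerprint_index_bytes(1_000_000_000, 1024)
print(size)                       # 128000000000 bytes
print(round(size / 1024**3, 1))   # 119.2 (GiB)
```

Under these assumptions, the raw fingerprints for 1 billion compounds alone would exceed the RAM of both machines in the comparison. That is why disk speed (NAS, SSD) enters the discussion.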

Search Speed
Test database with 11 million compounds (PubChem); Oracle 10.2 Enterprise; JChem 3.2; PC vs. SGI
Query structures: c1cncc2c(cnnc12)3cc3; C1C1c2cnnc3c(cncc23)C4=CSC=C4; CC(=)CC(C(C(=)Cc1ccccc1)c2c[nH]nc2c3cccs3)c(=)c; C(C(=)Cc1ccccc1)c2c[nH]nc2c3cccs3; c1ncc2ncnc2n1; c1c()cccc1; =Cc1ccccc1; =C1C(1c2ccccc2)c3ccccc3; plus the similarity operation jc_tanimoto(>0.9)
[Table: Number Of Hits, SSS Time (ms), Screened Count and Screening Time (ms) for each query on PC and SGI]

Search Speed
Test database with 4 million compounds (own compounds); Oracle 10.2 Enterprise; JChem 3.1; PC vs. SGI; same query structures and jc_tanimoto(>0.9) operation as in the PubChem test
[Table: Number Of Hits, SSS Time (ms), Screened Count and Screening Time (ms) for each query on PC and SGI]

Search Speed Observations
Oracle 10.2 Enterprise; JChem 3.2: easy to set up and integrate, works out of the box.
Switching from Java 1.4 to 1.6 gives approx. 1% speed increase on the PC. Java 1.6 is not available for pure 64-bit Itanium2, but 1.5 with JRockit performs similarly.
Loading of data: 6 days for 4 million compounds with standard Oracle and JChem settings (duplicates allowed...).
Tuning needed: database (Oracle) commit transactions are slow; screening uses 1 core, while ABAS uses all available cores.
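A common remedy for slow commits during a bulk upload is to commit once per batch of rows instead of once per row. The sketch below uses Python's built-in `sqlite3` as a stand-in for Oracle or PostgreSQL. The table name, column and batch size are hypothetical. In JDBC the equivalent pattern is `Connection.setAutoCommit(false)` with `Statement.addBatch()` / `executeBatch()` and one `commit()` per batch.

```python
import sqlite3

def load_structures(conn, smiles_iter, batch_size=10_000):
    """Insert structures, committing once per batch instead of once per row."""
    cur = conn.cursor()
    batch, total = [], 0
    for smi in smiles_iter:
        batch.append((smi,))
        if len(batch) >= batch_size:
            cur.executemany("INSERT INTO structures (smiles) VALUES (?)", batch)
            conn.commit()  # one transaction per batch
            total += len(batch)
            batch.clear()
    if batch:  # flush the last, partial batch
        cur.executemany("INSERT INTO structures (smiles) VALUES (?)", batch)
        conn.commit()
        total += len(batch)
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE structures (id INTEGER PRIMARY KEY, smiles TEXT)")
n = load_structures(conn, ("C" * (i % 5 + 1) for i in range(25_000)))
print(n)  # 25000
```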

Tuning Hard- and Software for Large DBs
Sun 4600 server, 16 cores, 128 GB RAM, Solaris 10
StorageTek 2540, 16 TB, two 4 Gb/sec Fibre Channel host ports, SATA-II, 500 GB, 7,200 rpm
ZFS (Zettabyte File System)
JVM: Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM 1.6.0_05; OS: amd64 SunOS 5.10
JChem 5.2; PostgreSQL 8.1
Works out of the box! (exception: IJC, because of special Solaris libraries)

Why PostgreSQL? (pgsql-performance@postgresql.org)
Tables of up to 18 quadrillion rows, with up to 1 gigabyte of data per row; up to 5 TB in one table; can utilize up to 128 GB RAM with large-database applications; 5 to 1, concurrent active connections; up to 5, concurrent application users.
Includes C and standards-compliant JDBC drivers; drivers for ODBC, PHP, Perl, C++, Python, Ruby, .NET and other languages are available from the PostgreSQL community.
Broad public / company support

Tuning
Test database with 2 million compounds
Tuning PostgreSQL via the postgresql.conf file; synchronization is switched off for the massive DB upload (dangerous: a crash during the upload can corrupt the database):
shared_buffers = 2
temp_buffers = 1
work_mem = 124
maintenance_work_mem = 16384
max_fsm_pages = 2
max_fsm_relations = 1
fsync = off
full_page_writes = off
Java 1.6: -d64 -server -Xmx4M
Programs: make sure you are using 64-bit file I/O, e.g. fopen64() etc.
Loading 2 million compounds takes 4 h instead of 24 h and needs 26 GB RAM.

Tuning
Test database with 2 million compounds on the Sun server; same query structures and jc_tanimoto(>0.9) operation as in the previous tests; speed difference to SGI: 1x
[Table: Number Of Hits, SSS Time (ms), Screened Count and Screening Time (ms) for each query]

Chemical Similarity
Basic assumption in chemistry for life sciences: similar chemical structures have similar biological activities.
Empirical: taste, flavor; physicochemical properties; biological activity.
Prediction based on chemical structures: semiempirical and ab initio calculations (quantum chemistry); docking into 3D structures (modelling); structural similarity based on atom connectivities (chemoinformatics).

Chemical Similarity: Today's Method
Pre-screening for substructure and similarity: similarity methods are based on substructure searching methods; typically a bitstring (e.g. with length 1024) is calculated. Each bit (e.g. for a benzene ring) is set only once, even if the molecule contains more rings. The string is hashed, so one bit may have several meanings...
[Figure: example bitstring with one bit set for a halogen 7-bond path and one bit set for benzene]
Software: Isis-Base & Isis-Host (MDL), Daylight, Tripos, ChemFinder, ChemAxon, InfoChem
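The sketch below shows a hashed path fingerprint and the Tanimoto coefficient. It assumes a 1024-bit string, linear atom-symbol paths of up to 3 bonds, and `zlib.crc32` as the hash. Real products such as Daylight or JChem use their own path definitions and hash functions. The sketch shows both properties described above: a path that occurs several times sets its bit only once, and unrelated paths may collide on the same bit.

```python
import zlib

NBITS = 1024  # bitstring length, as in the example above

def linear_paths(atoms, bonds, max_bonds=3):
    """All linear atom paths with up to max_bonds bonds, direction-independent."""
    adj = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    found = set()

    def walk(path):
        symbols = tuple(atoms[k] for k in path)
        found.add("-".join(min(symbols, symbols[::-1])))  # canonical direction
        if len(path) - 1 < max_bonds:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return found

def fingerprint(atoms, bonds):
    """Hash every path to one bit; repeated paths still set their bit once."""
    fp = 0
    for p in linear_paths(atoms, bonds):
        fp |= 1 << (zlib.crc32(p.encode()) % NBITS)
    return fp

def tanimoto(a, b):
    """Tanimoto coefficient of two bitstrings: |A and B| / |A or B|."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

ethanol = fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = fingerprint(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
print(round(tanimoto(ethanol, propanol), 2))
```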

OntoChem Topological Torsions
OntoChem has developed and validated a better similarity search method based on topological torsions (ToTo). A ToTo is a sequence of 4 topologically connected atoms: atom(1)-atom(2)-atom(3)-atom(4). Atom properties are then added (atom type, charge, π-electrons, attached hydrogens), and subsequently the multiplicity of each ToTo is counted.
ToTo_MACPH, e.g. benzene ToTo: 12 611 611 611 611; pyrazine ToTo: 6 611 71 611 71, 6 71 611 71 611
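The torsion enumeration can be sketched in a few lines. The sketch makes several assumptions. Atoms are already annotated with (atomic number, charge, π-electrons, attached hydrogens), and these values are concatenated into a code string. Every directed 4-atom path is counted. Whether ToTo_MACPH uses exactly this encoding, or canonicalizes path direction, is not stated above.

```python
from collections import Counter

def atom_code(atom):
    """Assumed encoding: atomic number, charge, pi electrons, H count as digits."""
    z, charge, pi, h = atom
    return f"{z}{charge}{pi}{h}"

def topological_torsions(atoms, bonds):
    """Count every directed path a-b-c-d of 4 distinct, bonded atoms."""
    adj = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    counts = Counter()
    for a in adj:
        for b in adj[a]:
            for c in adj[b] - {a}:
                for d in adj[c] - {a, b}:
                    counts[tuple(atom_code(atoms[k]) for k in (a, b, c, d))] += 1
    return counts

# Benzene: six aromatic CH atoms (Z=6, charge 0, 1 pi electron, 1 H) in a ring.
benzene_atoms = [(6, 0, 1, 1)] * 6
benzene_bonds = [(i, (i + 1) % 6) for i in range(6)]
print(topological_torsions(benzene_atoms, benzene_bonds))
```

Under this counting, benzene yields a single torsion type with multiplicity 12: 6 starting atoms times 2 directions around the ring.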

Topological Torsion Example
A small molecule typically has up to 1 ToTos, calculated by the smi2 program:
[Listing: ToTo codes and multiplicities calculated by smi2 for an example molecule]

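Topological Torsions: Chemical Similarity Validation
ToTo similarity allows better classification of compounds than other known 2D methods (see also Nilakantan 1987 to Sheridan 2004).
Compounds: 1a, 1b: dopamine D4 antagonists; 2a, 2b: histamine H3 receptor ligands.

ToTo - Tanimoto
      1a    1b    2a    2b
1a   1.00  0.37  0.19  0.2
1b   0.37  1.00  0.16  0.15
2a   0.19  0.16  1.00  0.3
2b   0.2   0.15  0.3   1.00

JChem - Tanimoto
      1a    1b    2a    2b
1a   1.00  0.38  0.44  0.45
1b   0.38  1.00  0.43  0.33
2a   0.44  0.43  1.00  0.55
2b   0.45  0.33  0.55  1.00

With ToTo, all cross-class values (0.15 to 0.2) lie below the within-class values. With JChem, the cross-class values (0.33 to 0.45) are of the same order as the within-class 1a/1b value (0.38).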
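ToTos carry multiplicities, so similarity can be computed on count vectors rather than on bitstrings. The sketch below uses the continuous (count-based) Tanimoto coefficient, a·b / (a·a + b·b - a·b), which reduces to the usual bitstring Tanimoto for 0/1 vectors. Whether OntoChem uses exactly this formula is an assumption. The three count profiles are hypothetical and are not the compounds 1a to 2b.

```python
from collections import Counter

def tanimoto_counts(a, b):
    """Continuous Tanimoto on count vectors: a.b / (a.a + b.b - a.b)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    denom = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    return dot / denom if denom else 0.0

def similarity_matrix(fps):
    """Full pairwise similarity matrix for a dict of name -> torsion counts."""
    names = list(fps)
    return names, [[tanimoto_counts(fps[r], fps[c]) for c in names] for r in names]

# Hypothetical ToTo count profiles (torsion code -> multiplicity), not real data.
FPS = {
    "A": Counter({"t1": 4, "t2": 2, "t3": 1}),
    "B": Counter({"t1": 3, "t2": 2, "t4": 1}),
    "C": Counter({"t5": 2, "t6": 2, "t3": 1}),
}
names, matrix = similarity_matrix(FPS)
print(" " * 4 + "  ".join(f"{n:>4}" for n in names))
for n, row in zip(names, matrix):
    print(f"{n:>4}" + "  ".join(f"{v:4.2f}" for v in row))
```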

Topological Torsions: Search Speed
Similarity searching in a 2 Mio. database: JChem similarity (PostgreSQL): 4 sec; OntoChem ToTo similarity (Sun disk array, file divided into 16 parts, one per core): 12 sec
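The partitioned scan (one part of the fingerprint file per core, with results merged afterwards) can be sketched with `concurrent.futures`. The sketch assumes that fingerprints are Python ints and that the threshold is 0.9, as in the jc_tanimoto tests. The default `ThreadPoolExecutor` keeps the example portable. Because of CPython's GIL, however, real multi-core speedup for pure-Python scoring needs `ProcessPoolExecutor`, which in turn requires top-level worker functions and a `__main__` guard.

```python
from concurrent.futures import ThreadPoolExecutor

def split(records, n_parts):
    """Split records into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(records), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks

def tanimoto(a, b):
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

def scan_chunk(task):
    """Worker: score one chunk and keep hits at or above the threshold."""
    query_fp, chunk, threshold = task
    hits = []
    for cid, fp in chunk:
        sim = tanimoto(query_fp, fp)
        if sim >= threshold:
            hits.append((cid, sim))
    return hits

def parallel_search(query_fp, records, threshold=0.9, n_parts=16,
                    executor_cls=ThreadPoolExecutor):
    """Scan n_parts chunks concurrently, then merge and sort hits by similarity."""
    tasks = [(query_fp, chunk, threshold) for chunk in split(records, n_parts)]
    with executor_cls(max_workers=n_parts) as pool:
        parts = list(pool.map(scan_chunk, tasks))
    hits = [hit for part in parts for hit in part]
    return sorted(hits, key=lambda h: h[1], reverse=True)

records = [(f"cpd{i}", i) for i in range(1, 200)]  # toy integer fingerprints
print(parallel_search(0b1111, records))
```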

Application Example: MDM2-p53 Inhibitors
Project task: propose new compounds that are similar to known inhibitors but have a different scaffold, and that are patentable, easy to synthesize and water soluble.
[Structures: 1 (Nutlin-3a), 2 (TDP222669)]

Application Example
Step 1: ToTo search in a vendors' database. Step 2: generate 3D structures and align them: 54 compounds aligned with Nutlin.
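Step 1 amounts to ranking vendor compounds by ToTo similarity to a known inhibitor and keeping the best few for the 3D steps. The sketch below assumes count-based Tanimoto on hypothetical torsion count profiles and uses `heapq.nlargest` for the top-k selection. The vendor IDs, the threshold and k are illustrative.

```python
import heapq
from collections import Counter

def tanimoto_counts(a, b):
    """Continuous Tanimoto on count vectors: a.b / (a.a + b.b - a.b)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    denom = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    return dot / denom if denom else 0.0

def top_k_similar(query, library, k=3, min_sim=0.5):
    """Up to k (similarity, id) pairs with similarity >= min_sim, best first."""
    scored = ((tanimoto_counts(query, fp), cid) for cid, fp in library.items())
    return heapq.nlargest(k, (pair for pair in scored if pair[0] >= min_sim))

known_inhibitor = Counter({"t1": 4, "t2": 2, "t3": 1})  # hypothetical profile
vendor_db = {
    "V1": Counter({"t1": 4, "t2": 2, "t3": 1}),
    "V2": Counter({"t1": 3, "t2": 2, "t4": 1}),
    "V3": Counter({"t5": 2}),
    "V4": Counter({"t1": 4, "t2": 2}),
}
for sim, cid in top_k_similar(known_inhibitor, vendor_db, k=2):
    print(cid, round(sim, 2))
```

A heap-based top-k keeps memory bounded by k. This matters when the candidate stream comes from a multi-million-compound vendor file rather than a small dict.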

Application Example
Extract the protein pocket; compare 3D similarity: M3dsml program with moloc.ch

Application Example: Step 2, 3D Filtering
[Structures: 1 (R = F), 2 (R = Br), 3, 4, 5]
Result: hitlist

Application Example: Step 3, Search in Reaction Database & Synthesis
[Reaction scheme: synthesis of the hits under reflux]
Result: compounds are active & selective; patents filed; publication
[Structures: X-6, X-7, X-552, X-561]

Inhibitors: NMR Binding Studies
Mdm2 protein: NMR and Biacore binding studies (T. Holak, Biochemistry, 21). X522 binds reversibly to the Mdm2 p53/Nutlin binding site, disrupts a preformed p53-Mdm2 complex, and behaves well: no protein precipitation or unfolding. Of 13 known inhibitors, only the X compounds and the Nutlins behave well.