ESPRESSO (Extremely Speedy PRE-Screening method with Segmented compounds)

Vol.2016-MPS-108 No.18
Vol.2016-BIO-46 No.18

ESPRESSO: An ultrafast compound pre-screening method based on compound decomposition

Keisuke Yanagisawa 1,4,a), Shunta Komine 2,4, Shogo D. Suzuki 2,4, Masahito Ohue 1,3, Takashi Ishida 1,3,4, Yutaka Akiyama 1,3,4

Abstract: Recently, the number of available protein tertiary structures and compounds has increased. However, structure-based virtual screening remains computationally expensive because it relies on docking simulations. Methods have therefore been proposed that filter out obviously unnecessary compounds before the expensive docking simulations are run, but these methods are not fast enough to evaluate more than 10 million compounds. In this study, we propose a novel docking-based pre-screening protocol named ESPRESSO (Extremely Speedy PRE-Screening method with Segmented cOmpounds). Partial structures (fragments) are often shared among several compounds; therefore, the number of distinct fragments that must be evaluated is smaller than the number of compounds. Our method is approximately 200 times faster than conventional methods.

Keywords: computational drug discovery, virtual screening, compound decomposition, pre-screening

1 Department of Computer Science, School of Computing, Tokyo Institute of Technology
2 Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
3 Advanced Computational Drug Discovery Unit (ACDD), Institute of Innovative Research, Tokyo Institute of Technology
4 Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology
a) yanagisawa@bi.cs.titech.ac.jp

© 2016 Information Processing Society of Japan

1.
(Finding a needle in a haystack)

[1] in vitro; virtual screening (VS) [2]; structure-based virtual screening (SBVS); SBVS 3; SBVS; Protein Data Bank (PDB) 2015 11.4; 2014 2015 20% [3]; ZINC [4] 3,400; SBVS [5]; AutoDock Vina [6] 1-500 CPU [7]; 1 ZINC 3,400 500 CPU; pre-screening [8]; ligand-based, structure-based 2 [7], [9]; [10]; [11]; [7]; Glide HTVS (high-throughput virtual screening) [12]; Panther [13]; 3,400 ZINC 1 CPU

[Figure 1]

ESPRESSO; Glide SP, AutoDock Vina; ESPRESSO (Extremely Speedy PRE-Screening method with Segmented cOmpounds)

2. ESPRESSO

2.1 ESPRESSO
1; 3

2.1.1
[14]
( 1 )

[Figure 2: exception case, merging of isolated atoms] 2 4 /
( 2 ) 3 / 2 1

2.1.2
AutoDock Vina [6], Glide [12], GOLD [15]

2.1.3
4
( 1 ) SUM

    $SUM = \sum_f score_f$    (1)

( 2 ) MAX

    $MAX = \max_f score_f$    (2)

( 3 ) Generalized Sum, GS

    $GS_x = \sqrt[x]{\sum_f (score_f)^x}$    (3)

Here f ranges over the fragments of a compound and score_f is the score of fragment f. GS_1 equals SUM, and GS_x approaches MAX as x grows, so GS lies between SUM and MAX. GS_2, GS_3; GS: score_f, 0; Section 3.2: GS_3; ESPRESSO: GS_3

2.2
DUD-E (Directory of Useful Decoys, Enhanced) [16]; DUD-E 102; ZINC [4]: all purchasable, all boutique; 28,629,602

2.3
ESPRESSO; Section 2.1.1: C++; Section 2.1.3: Python
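The three aggregation rules of Section 2.1.3, Eqs. (1) to (3), can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that fragment scores have been converted to non-negative values where higher is better (for example, negated docking scores clipped at 0), which is one way to read the surviving "score_f, 0" fragment of the text.

```python
from typing import Sequence


def _check(scores: Sequence[float]) -> None:
    """Validate the assumption used in this sketch: non-empty, non-negative scores."""
    if len(scores) == 0:
        raise ValueError("at least one fragment score is required")
    if any(s < 0 for s in scores):
        raise ValueError("scores are assumed non-negative (higher is better)")


def sum_score(scores: Sequence[float]) -> float:
    """Eq. (1): SUM, the sum of the fragment scores of one compound."""
    _check(scores)
    return float(sum(scores))


def max_score(scores: Sequence[float]) -> float:
    """Eq. (2): MAX, the best single fragment score of one compound."""
    _check(scores)
    return float(max(scores))


def generalized_sum(scores: Sequence[float], x: float) -> float:
    """Eq. (3): GS_x, the x-th root of the sum of x-th powers of the scores."""
    _check(scores)
    if x < 1:
        raise ValueError("x must be at least 1")
    return sum(s ** x for s in scores) ** (1.0 / x)


if __name__ == "__main__":
    fragment_scores = [1.0, 2.0, 3.0]  # hypothetical scores of one compound
    for x in (1, 2, 3):
        print(f"GS_{x} = {generalized_sum(fragment_scores, x):.3f}")
    print("SUM =", sum_score(fragment_scores), "MAX =", max_score(fragment_scores))
```

With x = 1 the function reproduces SUM, and for large x it approaches MAX, so x controls how strongly the best fragment dominates the compound score.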

GPLv3; http://www.bi.cs.titech.ac.jp/espresso/
Glide SP, Glide HTVS; Glide HTVS

2.4
TSUBAME 2.5 Thin; Thin: 6-core Intel Xeon X5670 CPU x 2, 54 GB, 12 CPU

2.5
Glide: 1, 1 CPU; CPU
Enrichment Factor (EF) [17]:

    $EF_{x\%} = \frac{Pos_{x\%} / All_{x\%}}{Pos_{100\%} / All_{100\%}}$    (4)

Here All_{x%} is the number of compounds in the top x% of the ranking and Pos_{x%} is the number of positives among them. EF_{1%}, EF_{2%}

2.6
ESPRESSO
( 1 ) 2%, 5%, 10%
( 2 ) Glide SP; 1%, 2%; EF_{1%}, EF_{2%}

3.
ESPRESSO; 2; Glide HTVS

3.1
Table 1; ZINC 28,629,602; Glide SP

Table 1: Calculation time [CPU hours] for the 28,629,602 ZINC compounds on three DUD-E targets. Values in parentheses are speed-up ratios relative to Glide HTVS.

Target   ESPRESSO-SP      ESPRESSO-HTVS    Glide HTVS
ACES     42.6 (×76.8)     22.8 (×143.1)    3268.8
EGFR     38.9 (×126.4)    21.5 (×229.3)    4925.1
PGH1     41.8 (×88.0)     20.9 (×175.4)    3674.5

Table 2: Enrichment Factors (EF) for the 102 DUD-E targets. In a column "a% → b%", pre-screening keeps the top a%, Glide SP is then applied, and EF_{b%} is reported.

Method          Score   2%→1%   5%→1%   10%→1%   5%→2%   10%→2%
ESPRESSO-SP     SUM      4.63    6.73     8.79    3.96     5.43
ESPRESSO-SP     MAX      9.09   10.93    11.85    7.49     8.24
ESPRESSO-SP     GS_2     7.32   10.12    11.89    6.22     7.71
ESPRESSO-SP     GS_3     9.61   12.78    15.03    8.05     9.89
ESPRESSO-HTVS   SUM      4.53    6.85     8.80    4.05     5.35
ESPRESSO-HTVS   MAX      9.30    9.91    12.25    6.40     8.20
ESPRESSO-HTVS   GS_2     7.08    9.68    11.70    5.77     7.21
ESPRESSO-HTVS   GS_3     9.07   12.10    14.43    7.38     9.26
Glide HTVS              17.85   18.97    19.60   12.50    12.92

ESPRESSO-SP: 2 CPU; Glide HTVS; ESPRESSO-HTVS: 1 CPU; Glide HTVS: 140 CPU; ESPRESSO: 200

3.2
Table 2; DUD-E 102; EF; GS_3; SUM, MAX; ESPRESSO-SP, ESPRESSO-HTVS; GS_2, GS_3; ESPRESSO-SP; ESPRESSO; Glide HTVS

4.

4.1
Section 3.1; ESPRESSO: 200; ZINC: 28,629,602; 263,319
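The Enrichment Factor of Eq. (4) in Section 2.5 can be computed directly from a ranked list of labels. The sketch below is illustrative only: the function name is hypothetical, and rounding the size of the top x% to the nearest integer (with a minimum of one compound) is an assumption, since the original text does not say how the cut-off is rounded.

```python
from typing import Sequence


def enrichment_factor(ranked_labels: Sequence[int], top_fraction: float) -> float:
    """Compute EF for the top `top_fraction` of a ranking, following Eq. (4).

    ranked_labels: 1 for a positive (active) compound, 0 otherwise,
                   ordered from best-ranked to worst-ranked.
    top_fraction:  x% expressed as a fraction, e.g. 0.01 for EF_1%.
    """
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    n_all = len(ranked_labels)          # All_100%
    n_pos = sum(ranked_labels)          # Pos_100%
    if n_all == 0 or n_pos == 0:
        raise ValueError("need a non-empty ranking with at least one positive")

    # All_x%: size of the top x% (rounding rule is an assumption of this sketch).
    n_top = max(1, round(n_all * top_fraction))
    # Pos_x%: positives found inside the top x%.
    pos_top = sum(ranked_labels[:n_top])

    return (pos_top / n_top) / (n_pos / n_all)


if __name__ == "__main__":
    # Hypothetical ranking of 10 compounds with 2 positives, both ranked first.
    labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(labels, 0.2))
```

An EF of 1 corresponds to a ranking that is no better than random selection, and the maximum attainable value is All_{100%} / Pos_{100%} when the top x% contains only positives.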

[Figure 4: DUD-E ACES, ZINC, logP; panels (A), (B), (C); (A) ZINC 0.1%, ESPRESSO-SP 0.1%; (B); (A); (C) DUD-E ACES; (A), (B)]

GS_3; MAX; GS_3

Table 3; ZINC drugs now: 10,639,555; 100, 1, 1; ChEMBL (version 21) [18]: 1,583,897; 127,360; PubChem [19]: 1000; 88,527,810; 2,082,185; Table 3

4.2
Section 3.2; GS_3; SUM; 4

4.3
Drwal [7]; 3; ESPRESSO; DUD-E ACES; ZINC; ESPRESSO-SP, SVM, Glide HTVS: 3; 0.1%; ESPRESSO-SP; DUD-E; 4; Tanimoto; 5; 4; ESPRESSO [20]
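The number pairs quoted in Section 4.1 can be turned into a rough measure of how much work decomposition saves. The arithmetic below assumes that, in each pair, the first number is the count of compounds in the library and the second is the count of distinct fragments obtained from it; the surrounding text is too damaged to confirm this reading, so the sketch should be taken as an interpretation rather than as the authors' analysis. The dictionary and function names are hypothetical.

```python
# (compounds, fragments) pairs as quoted in Section 4.1.
# ASSUMPTION: the second number of each pair is the number of distinct fragments.
LIBRARIES = {
    "ZINC": (28_629_602, 263_319),
    "ChEMBL (version 21)": (1_583_897, 127_360),
    "PubChem": (88_527_810, 2_082_185),
}


def compounds_per_fragment(n_compounds: int, n_fragments: int) -> float:
    """Average number of compounds represented by one distinct fragment."""
    if n_fragments <= 0:
        raise ValueError("n_fragments must be positive")
    return n_compounds / n_fragments


if __name__ == "__main__":
    for name, (n_compounds, n_fragments) in LIBRARIES.items():
        ratio = compounds_per_fragment(n_compounds, n_fragments)
        print(f"{name}: {ratio:.1f} compounds per fragment")
```

Under this reading, docking each distinct fragment once replaces on the order of a hundred compound dockings for ZINC, which is consistent in magnitude with the reported speed-up, although fragment docking and compound docking do not cost the same per item.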

[Figure 5: DUD-E ACES, Tanimoto; series: ZINC, SVM (0.1%), Glide HTVS (0.1%), ESPRESSO-SP (0.1%)]

Figure 5; SVM; Tanimoto

4.4
Figure 4; ESPRESSO
( 1 ) V_c; Zhao [21]:

    $V_c = \sum_{a \in c} V_a - 5.92 N_B - 14.7 R_A - 3.8 R_{NA}$    (5)

Here a ranges over the atoms of compound c, V_a is the volume contribution of atom a, N_B is the number of bonds, R_A is the number of aromatic rings, and R_{NA} is the number of non-aromatic rings.
( 2 ) V_p; SiteMap [22]
( 3 ) $V_c > k V_p$; k, 1; k = 1; k = 1.5; 750; 600
ESPRESSO; ligand efficiency [23]

4.5
ESPRESSO; DUD-E ACES; Section 4.4; ESPRESSO-SP; ZINC; ZINC12181222; Figure 6(A); 395.5; logP 1.84 [10]; Figure 6(B); Figure 6(C); ESPRESSO; Figure 6(C)

5.
ESPRESSO; Glide HTVS; 2,900; 200; GS_3; Sections 4.3, 4.4
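The volume comparison outlined in Section 4.4 can be sketched as follows. The first function evaluates Eq. (5) from per-atom volume contributions and ring and bond counts; the second applies the test of item (3), V_c > k V_p. Both function names are hypothetical. The per-atom contributions must be taken from Zhao et al. [21] and are passed in by the caller; the values used in the example are placeholders, not the published ones. The default k = 1.5 is one of the values that survives in the text, and treating a compound with V_c > k V_p as too large for the pocket is an assumed reading of item (3).

```python
from typing import Sequence


def vdw_volume(
    atom_volumes: Sequence[float],
    n_bonds: int,
    n_aromatic_rings: int,
    n_nonaromatic_rings: int,
) -> float:
    """Eq. (5): V_c = sum_a V_a - 5.92 N_B - 14.7 R_A - 3.8 R_NA.

    atom_volumes holds one volume contribution V_a per atom of the compound;
    the caller supplies these values (see Zhao et al. [21]).
    """
    return (
        sum(atom_volumes)
        - 5.92 * n_bonds
        - 14.7 * n_aromatic_rings
        - 3.8 * n_nonaromatic_rings
    )


def exceeds_pocket(v_compound: float, v_pocket: float, k: float = 1.5) -> bool:
    """Item (3) of Section 4.4: True when V_c > k * V_p.

    ASSUMPTION: a True result is read as "compound too large for the pocket".
    """
    if k <= 0 or v_pocket <= 0:
        raise ValueError("k and v_pocket must be positive")
    return v_compound > k * v_pocket


if __name__ == "__main__":
    # Placeholder contributions for a 12-atom, single-aromatic-ring compound.
    # These numbers are NOT the published atomic volumes.
    placeholder_atoms = [20.0] * 6 + [7.0] * 6
    v_c = vdw_volume(placeholder_atoms, n_bonds=12, n_aromatic_rings=1,
                     n_nonaromatic_rings=0)
    print(f"V_c = {v_c:.2f}")
    print("too large:", exceeds_pocket(v_c, v_pocket=40.0))
```

Keeping the atomic contributions as an input keeps the sketch independent of any particular parameter table, and the pocket volume V_p would come from a binding-site tool such as SiteMap [22].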

[Figure 6: (A) ESPRESSO-SP, ACES, ZINC12181222; (B) ZINC12181222; (C) Glide SP; (A), (B), (C)]

JSPS (A) (24240044); JST CREST; EBD:

[1] Klon A.E. et al.: Finding more needles in the haystack: a simple and efficient method for improving high-throughput docking results, J. Med. Chem. 47, 2743-2749, 2004.
[2] Sliwoski G. et al.: Computational methods in drug discovery, Pharmacol. Rev. 66, 334-395, 2014.
[3] Rose P.W. et al.: The RCSB Protein Data Bank: views of structural biology for basic and applied research and education, Nucleic Acids Res. 43, D345-D356, 2015.
[4] Irwin J.J. et al.: ZINC: a free tool to discover chemistry for biology, J. Chem. Inf. Model. 52, 1757-1768, 2012.
[5] Meng X. et al.: Molecular docking: a powerful approach for structure-based drug discovery, Curr. Comput. Aided Drug Des. 7, 146-157, 2011.
[6] Trott O. et al.: AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading, J. Comput. Chem. 31, 455-461, 2010.
[7] Drwal M.N. et al.: Combination of ligand- and structure-based methods in virtual screening, Drug Discov. Today Technol. 10, e395-e401, 2013.
[8] Kumar A. et al.: Hierarchical virtual screening approaches in small-molecule drug discovery, Methods 71, 26-37, 2015.
[9] Ferreira L.G. et al.: Molecular docking and structure-based drug design strategies, Molecules 20, 13384-13421, 2015.
[10] Lipinski C.A. et al.: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Deliv. Rev. 23, 3-25, 2001.
[11] Ripphausen P. et al.: State-of-the-art in ligand-based virtual screening, Drug Discov. Today 16, 372-376, 2011.
[12] Friesner R.A. et al.: Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy, J. Med. Chem. 47, 1739-1749, 2004.
[13] Niinivehmas S.P. et al.: Ultrafast protein structure-based virtual screening with Panther, J. Comput. Aided Mol. Des. 29, 989-1006, 2015.
[14] 2015-BIO-42, 1-8, 2015.
[15] Verdonk M.L. et al.: Virtual screening using protein-ligand docking: avoiding artificial enrichment, J. Chem. Inf. Comput. Sci. 44, 793-806, 2003.
[16] Mysinger M.M. et al.: Directory of Useful Decoys, Enhanced (DUD-E): better ligands and decoys for better benchmarking, J. Med. Chem. 55, 6582-6594, 2012.
[17] Hamza A. et al.: Ligand-based virtual screening approach using a new scoring function, J. Chem. Inf. Model. 52, 963-974, 2012.
[18] Bento A.P. et al.: The ChEMBL bioactivity database: an update, Nucleic Acids Res. 42, 1083-1090, 2014.
[19] Kim S. et al.: PubChem substance and compound databases, Nucleic Acids Res. 44, D1202-1213, 2016.
[20] Verdonk M.L. et al.: Virtual screening using protein-ligand docking: avoiding artificial enrichment, J. Chem. Inf. Comput. Sci. 44, 793-806, 2004.
[21] Zhao Y.H. et al.: Fast calculation of van der Waals volume as a sum of atomic and bond contributions and its application to drug compounds, J. Org. Chem. 68, 7368-7373, 2003.
[22] Halgren T.A.: Identifying and characterizing binding sites and assessing druggability, J. Chem. Inf. Model. 49, 377-389, 2009.
[23] Shultz M.D.: Setting expectations in molecular optimizations: Strengths and limitations of commonly used composite parameters, Bioorg. Med. Chem. Lett. 23, 5980-5991, 2013.