Bisphosphonate Inhibition of a Plasmodium Farnesyl Diphosphate Synthase and a General Method for Predicting Cell-Based Activity


Bisphosphonate Inhibition of a Plasmodium Farnesyl Diphosphate Synthase and a General Method for Predicting Cell-Based Activity from Enzyme Data

Dushyant Mukkamala #,1, Joo Hwan No #,1, Lauren M. Cass 2, Ting-Kai Chang 3, and Eric Oldfield 1,3

Supporting Information

1 Center for Biophysics & Computational Biology, University of Illinois at Urbana-Champaign, 600 South Mathews Avenue, Urbana, IL 61801, USA; 2 School of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign, 505 South Goodwin Avenue, Urbana, IL 61801; 3 Department of Chemistry, University of Illinois at Urbana-Champaign, 600 S. Mathews Avenue, Urbana, IL 61801, USA

Table of Contents
Supporting Tables
Supporting Figures
References

Table S1. List of descriptors used in the Combinatorial Descriptor Search (a)

Code: Description

log(pampa): Log of parallel artificial membrane permeability. Number of occurrences (b): 10.
log(caco-2): Log of permeability coefficient for transport into Caco-2 cells.
log P-neutral: Log of the ratio of the concentrations of un-ionized compound between water and octanol.
log D-pH 7.4: Log of the ratio of the sum of concentrations of the solute's various forms between water and octanol.
clogp: Log P calculated by Sybyl 7.3.
mlogp: Measured log P calculated by Sybyl 7.3.
diameter: Largest value in the distance matrix.
petitjean: Value of (diameter - radius) / diameter.
petitjeansc: Petitjean graph shape coefficient [2]: (diameter - radius) / radius.
VDistEq: If m is the sum of the distance matrix entries, then VDistEq is defined to be the sum of log2 m - p_i log2 p_i / m, where p_i is the number of distance matrix entries equal to i.
VDistMa: If m is the sum of the distance matrix entries, then VDistMa is defined to be the sum of log2 m - D_ij log2 D_ij / m over all i and j.
weinerpath: Wiener path number: half the sum of all the distance matrix entries [3, 4].
weinerpol: Wiener polarity number: half the sum of all the distance matrix entries with a value of 3 [3].
BCUT_PEOE_0, BCUT_PEOE_1, BCUT_PEOE_2, BCUT_PEOE_3: The BCUT descriptors [5] are calculated from the eigenvalues of a modified adjacency matrix. Each ij entry of the adjacency matrix takes the value 1/sqrt(b_ij), where b_ij is the formal bond order between bonded atoms i and j. The diagonal takes the value of the PEOE partial charges. The resulting eigenvalues are sorted and the smallest, 1/3-ile, 2/3-ile and largest eigenvalues are reported.
BCUT_SLOGP_0, BCUT_SLOGP_1, BCUT_SLOGP_2, BCUT_SLOGP_3: The BCUT descriptors using atomic contribution to logP (using the Wildman and Crippen SlogP method) instead of partial charge [6].
BCUT_SMR_0, BCUT_SMR_1, BCUT_SMR_2, BCUT_SMR_3: The BCUT descriptors using atomic contribution to molar refractivity (using the Wildman and Crippen SMR method) instead of partial charge [6].
GCUT_PEOE_0, GCUT_PEOE_1, GCUT_PEOE_2, GCUT_PEOE_3: The GCUT descriptors are calculated from the eigenvalues of a modified graph distance adjacency matrix. Each ij entry of the adjacency matrix takes the value 1/sqr(d_ij), where d_ij is the (modified) graph distance between atoms i and j. The diagonal takes the value of the PEOE partial charges. The resulting eigenvalues are sorted and the smallest, 1/3-ile, 2/3-ile and largest eigenvalues are reported.

GCUT_SLOGP_0, GCUT_SLOGP_1, GCUT_SLOGP_2, GCUT_SLOGP_3: The GCUT descriptors using atomic contribution to logP (using the Wildman and Crippen SlogP method) instead of partial charge [6].
GCUT_SMR_0, GCUT_SMR_1, GCUT_SMR_2, GCUT_SMR_3: The GCUT descriptors using atomic contribution to molar refractivity (using the Wildman and Crippen SMR method) instead of partial charge [6].
a_count: Number of atoms (including implicit hydrogens). This is calculated as the sum of (1 + h_i) over all non-trivial atoms i.
a_ic: Atom information content (total). This is calculated to be a_icm times n.
a_icm: Atom information content (mean). This is the entropy of the element distribution in the molecule (including implicit hydrogens but not lone pair pseudo-atoms). Let n_i be the number of occurrences of atomic number i in the molecule. Let p_i = n_i / n, where n is the sum of the n_i. The value of a_icm is the negative of the sum over all i of p_i log p_i.
b_1rotn: Number of rotatable single bonds. Conjugated single bonds are not included (e.g., ester and peptide bonds).
b_1rotr: Fraction of rotatable single bonds: b_1rotn divided by b_heavy.
b_count: Number of bonds (including implicit hydrogens). This is calculated as the sum of (d_i/2 + h_i) over all non-trivial atoms i.
b_rotn: Number of rotatable bonds. A bond is rotatable if it has order 1, is not in a ring, and has at least two heavy neighbors.
b_rotr: Fraction of rotatable bonds: b_rotn divided by b_heavy.
b_single: Number of single bonds (including implicit hydrogens). Aromatic bonds are not considered to be single bonds.
chi0v: Atomic valence connectivity index (order 0) [7, 8]. This is calculated as the sum of 1/sqrt(v_i) over all heavy atoms i with v_i > 0.
chi0v_c: Carbon valence connectivity index (order 0). This is calculated as the sum of 1/sqrt(v_i) over all carbon atoms i with v_i > 0.
chi1v: Atomic valence connectivity index (order 1) [7, 8]. This is calculated as the sum of 1/sqrt(v_i v_j) over all bonds between heavy atoms i and j where i < j.
chi1v_c: Carbon valence connectivity index (order 1). This is calculated as the sum of 1/sqrt(v_i v_j) over all bonds between carbon atoms i and j where i < j.
Number of Occurrences (b): 3

Weight: Molecular weight (including implicit hydrogens) with atomic weights taken from CRC Handbook of Chemistry and Physics, CRC Press (1994).
b_heavy: Number of bonds between heavy atoms.
a_nc: Number of carbon atoms: #{Z_i | Z_i = 6}.
chi0: Atomic connectivity index (order 0) [7, 8]. This is calculated as the sum of 1/sqrt(d_i) over all heavy atoms i with d_i > 0.
chi0_c: Carbon connectivity index (order 0). This is calculated as the sum of 1/sqrt(d_i) over all carbon atoms i with d_i > 0.
chi1: Atomic connectivity index (order 1) [7, 8]. This is calculated as the sum of 1/sqrt(d_i d_j) over all bonds between heavy atoms i and j where i < j.
chi1_c: Carbon connectivity index (order 1). This is calculated as the sum of 1/sqrt(d_i d_j) over all bonds between carbon atoms i and j where i < j.
VAdjEq: Vertex adjacency information (equality): -(1-f) log2(1-f) - f log2 f, where f = (n^2 - m) / n^2, n is the number of heavy atoms and m is the number of heavy-heavy bonds. If f is not in the open interval (0,1), then 0 is returned.
VAdjMa: Vertex adjacency information (magnitude): 1 + log2 m, where m is the number of heavy-heavy bonds. If m is zero, then zero is returned.
zagreb: Zagreb index: the sum of d_i^2 over all heavy atoms i.
balabanj: Balaban's connectivity topological index [9].
PC+, Q_PC+, PEOE_PC+: Total positive partial charge: the sum of the positive q_i. Q_PC+ is identical to PC+, which has been retained for compatibility.
PC-, Q_PC-, PEOE_PC-: Total negative partial charge: the sum of the negative q_i. Q_PC- is identical to PC-, which has been retained for compatibility.
RPC+, Q_RPC+, PEOE_RPC+: Relative positive partial charge: the largest positive q_i divided by the sum of the positive q_i. Q_RPC+ is identical to RPC+, which has been retained for compatibility.
RPC-, Q_RPC-, PEOE_RPC-: Relative negative partial charge: the smallest negative q_i divided by the sum of the negative q_i. Q_RPC- is identical to RPC-, which has been retained for compatibility.
PEOE_VSA+0: Sum of v_i where q_i is in the range [0.00,0.05). Number of occurrences (b): 2.
PEOE_VSA+1: Sum of v_i where q_i is in the range [0.05,0.10).
PEOE_VSA-0: Sum of v_i where q_i is in the range [-0.05,0.00).
PEOE_VSA-1: Sum of v_i where q_i is in the range [-0.10,-0.05).
Number of Occurrences (b): 2 5

Q_VSA_FHYD, PEOE_VSA_FHYD: Fractional hydrophobic van der Waals surface area. This is the sum of the v_i such that |q_i| is less than or equal to 0.2 divided by the total surface area. The v_i are calculated using a connection table approximation.
Q_VSA_FNEG, PEOE_VSA_FNEG: Fractional negative van der Waals surface area. This is the sum of the v_i such that q_i is negative divided by the total surface area. The v_i are calculated using a connection table approximation.
Q_VSA_FPNEG, PEOE_VSA_FPNEG: Fractional negative polar van der Waals surface area. This is the sum of the v_i such that q_i is less than -0.2 divided by the total surface area. The v_i are calculated using a connection table approximation.
Q_VSA_FPOL, PEOE_VSA_FPOL: Fractional polar van der Waals surface area. This is the sum of the v_i such that |q_i| is greater than 0.2 divided by the total surface area. The v_i are calculated using a connection table approximation.
Q_VSA_FPOS, PEOE_VSA_FPOS: Fractional positive van der Waals surface area. This is the sum of the v_i such that q_i is non-negative divided by the total surface area. The v_i are calculated using a connection table approximation.
Q_VSA_FPPOS, PEOE_VSA_FPPOS: Fractional positive polar van der Waals surface area. This is the sum of the v_i such that q_i is greater than 0.2 divided by the total surface area. The v_i are calculated using a connection table approximation.
Q_VSA_HYD, PEOE_VSA_HYD: Total hydrophobic van der Waals surface area. This is the sum of the v_i such that |q_i| is less than or equal to 0.2. The v_i are calculated using a connection table approximation.
Q_VSA_NEG, PEOE_VSA_NEG: Total negative van der Waals surface area. This is the sum of the v_i such that q_i is negative. The v_i are calculated using a connection table approximation.
Q_VSA_PNEG, PEOE_VSA_PNEG: Total negative polar van der Waals surface area. This is the sum of the v_i such that q_i is less than -0.2. The v_i are calculated using a connection table approximation.
Q_VSA_POL, PEOE_VSA_POL: Total polar van der Waals surface area. This is the sum of the v_i such that |q_i| is greater than 0.2. The v_i are calculated using a connection table approximation.
Q_VSA_POS, PEOE_VSA_POS: Total positive van der Waals surface area. This is the sum of the v_i such that q_i is non-negative. The v_i are calculated using a connection table approximation.
Q_VSA_PPOS: Total positive polar van der Waals surface area. This is the sum of the v_i such that q_i is greater than 0.2. The v_i are calculated using a connection table approximation.
Number of Occurrences (b): 3 3 4 3 2

lip_acc: The number of O and N atoms.
opr_nrot: The number of rotatable bonds [10].
E: Value of the potential energy. The state of all term enable flags will be honored (in addition to the term weights). This means that the current potential setup accurately reflects what will be calculated.
E_ang: Angle bend potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied.
E_ele: Electrostatic component of the potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. Number of occurrences (b): 2.
E_nb: Value of the potential energy with all bonded terms disabled. The state of the non-bonded term enable flags will be honored (in addition to the term weights). Number of occurrences (b): 3.
E_oop: Out-of-plane potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. Number of occurrences (b): 5.
E_sol: Solvation energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied.
E_stb: Bond stretch-bend cross-term potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied.
E_str: Bond stretch potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied.
E_strain: Local strain energy: the current energy minus the value of the energy at a near local minimum. The current energy is calculated as for the E descriptor. The local minimum energy is the value of the E descriptor after first performing an energy minimization. Current chirality is preserved and charges are left undisturbed during minimization. The structure in the database is not modified (results of the minimization are discarded).
E_tor: Torsion (proper and improper) potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. Number of occurrences (b): 3.
E_vdw: van der Waals component of the potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied.
Kier1: First kappa shape index: (n-1)^2 / m^2 [8].
Kier2: Second kappa shape index: (n-1)^2 / m^2 [8].
Kier3: Third kappa shape index: (n-1)(n-3)^2 / p_3^2 for odd n, and (n-3)(n-2)^2 / p_3^2 for even n [8].

KierA1: First alpha modified shape index: s (s-1)^2 / m^2, where s = n + a [8].
KierA2: Second alpha modified shape index: s (s-1)^2 / m^2, where s = n + a [8].
KierA3: Third alpha modified shape index: (n-1)(n-3)^2 / p_3^2 for odd n, and (n-3)(n-2)^2 / p_3^2 for even n, where s = n + a [8].
KierFlex: Kier molecular flexibility index: (KierA1)(KierA2) / n [8].
logs: Log of the aqueous solubility. This property is calculated from an atom contribution linear atom type model with r^2 = 0.90, ~1,200 molecules.
apol: Sum of the atomic polarizabilities (including implicit hydrogens) with polarizabilities taken from CRC Handbook of Chemistry and Physics, CRC Press (1994).
bpol: Sum of the absolute value of the difference between atomic polarizabilities of all bonded atoms in the molecule (including implicit hydrogens) with polarizabilities taken from CRC Handbook of Chemistry and Physics, CRC Press (1994).
mr: Molecular refractivity (including implicit hydrogens). This property is calculated from an 11 descriptor linear model with r^2 = 0.997, RMSE = 0.168 on 1,947 small molecules.
Number of Occurrences (b): 2 7
dipole: Dipole moment calculated from the partial charges of the molecule.
dipolex: The x component of the dipole moment (external coordinates).
dipoley: The y component of the dipole moment (external coordinates). Number of occurrences (b): 2.
dipolez: The z component of the dipole moment (external coordinates).
pmi: Principal moment of inertia. Number of occurrences (b): 5.
pmix: x component of the principal moment of inertia (external coordinates). Number of occurrences (b): 5.
pmiy: y component of the principal moment of inertia (external coordinates). Number of occurrences (b): 3.
pmiz: z component of the principal moment of inertia (external coordinates). Number of occurrences (b): 3.
rgyr: Radius of gyration.
vsa_acc: Approximation to the sum of VDW surface areas of pure hydrogen bond acceptors (not counting acidic atoms and atoms that are both hydrogen bond donors and acceptors such as -OH).
vsa_hyd: Approximation to the sum of VDW surface areas of hydrophobic atoms.

vsa_pol: Approximation to the sum of VDW surface areas of polar atoms (atoms that are both hydrogen bond donors and acceptors), such as -OH.
SlogP: Log of the octanol/water partition coefficient (including implicit hydrogens). This property is an atomic contribution model [6] that calculates logP from the given structure; i.e., the correct protonation state (washed structures). Results may vary from the logp(o/w) descriptor. The training set for SlogP was ~7000 structures.
SlogP_VSA0: Sum of v_i such that L_i <= -0.4.
SlogP_VSA2: Sum of v_i such that L_i is in (-0.2,0].
SlogP_VSA7: Sum of v_i such that L_i is in (0.25,0.30].
SlogP_VSA8: Sum of v_i such that L_i is in (0.30,0.40].
SlogP_VSA9: Sum of v_i such that L_i > 0.40.
SMR: Molecular refractivity (including implicit hydrogens). This property is an atomic contribution model [6] that assumes the correct protonation state (washed structures). The model was trained on ~7000 structures and results may vary from the mr descriptor.
SMR_VSA2: Sum of v_i such that R_i is in (0.26,0.35].
SMR_VSA3: Sum of v_i such that R_i is in (0.35,0.39].
SMR_VSA5: Sum of v_i such that R_i is in (0.44,0.485].
SMR_VSA7: Sum of v_i such that R_i > 0.56.
ASA: Water accessible surface area calculated using a radius of 1.4 Å for the water molecule. A polyhedral representation is used for each atom in calculating the surface area.
ASA+: Water accessible surface area of all atoms with positive partial charge (strictly greater than 0).
ASA-: Water accessible surface area of all atoms with negative partial charge (strictly less than 0).
ASA_H: Water accessible surface area of all hydrophobic (|q_i| < 0.2) atoms.
ASA_P: Water accessible surface area of all polar (|q_i| >= 0.2) atoms.
CASA+: Positive charge weighted surface area, ASA+ times max { q_i > 0 } [2].
CASA-: Negative charge weighted surface area, ASA- times max { q_i < 0 } [2].
DASA: Absolute value of the difference between ASA+ and ASA-.
DCASA: Absolute value of the difference between CASA+ and CASA- [2].
Number of Occurrences (b): 4 2

FASA+: Fractional ASA+ calculated as ASA+ / ASA.
FASA-: Fractional ASA- calculated as ASA- / ASA.
FASA_H: Fractional ASA_H calculated as ASA_H / ASA. Number of occurrences (b): 4.
FASA_P: Fractional ASA_P calculated as ASA_P / ASA. Number of occurrences (b): 4.
FCASA+: Fractional CASA+ calculated as CASA+ / ASA.
FCASA-: Fractional CASA- calculated as CASA- / ASA. Number of occurrences (b): 2.
VSA: van der Waals surface area. A polyhedral representation is used for each atom in calculating the surface area. Number of occurrences (b): 2.
TPSA: Polar surface area calculated using group contributions to approximate the polar surface area from connection table information only. The parameterization is that of Ertl et al. [3].
density: Molecular mass density: Weight divided by vdw_vol.
vdw_area: Area of van der Waals surface calculated using a connection table approximation.
vdw_vol: van der Waals volume calculated using a connection table approximation.
dens: Mass density: molecular weight divided by van der Waals volume as calculated in the vol descriptor.
glob: Globularity, or inverse condition number (smallest eigenvalue divided by the largest eigenvalue) of the covariance matrix of atomic coordinates. A value of 1 indicates a perfect sphere while a value of 0 indicates a two- or one-dimensional object.
std_dim1: Standard dimension 1: the square root of the largest eigenvalue of the covariance matrix of the atomic coordinates. A standard dimension is equivalent to the standard deviation along a principal component axis.
std_dim2: Standard dimension 2: the square root of the second largest eigenvalue of the covariance matrix of the atomic coordinates. A standard dimension is equivalent to the standard deviation along a principal component axis.
std_dim3: Standard dimension 3: the square root of the third largest eigenvalue of the covariance matrix of the atomic coordinates. A standard dimension is equivalent to the standard deviation along a principal component axis.
vol: van der Waals volume calculated using a grid approximation (spacing 0.75 Å).
logp(o/w): Log of the octanol/water partition coefficient (including implicit hydrogens). This property is calculated from a linear atom type model [4] [LOGP 1998] with r^2 = 0.931, RMSE = 0.393 on 1,827 molecules.

(a) Descriptors searched and their descriptions are generally taken from MOE [4].
(b) Number of times that the descriptor occurred in the top-ranked descriptor combinations.
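Several of the graph-based entries in Table S1 (diameter, petitjean, petitjeansc, weinerpath, zagreb, and the order-0 and order-1 connectivity indices) can be computed directly from a hydrogen-suppressed molecular graph. The following is a minimal Python sketch, not the MOE implementation. It assumes that the radius is the smallest row maximum of the distance matrix (the table does not define it), that d_i is the number of heavy-atom neighbors of atom i, and that the molecule is connected. The adjacency-list input format and the function names are hypothetical.

```python
import math
from collections import deque


def distance_matrix(adjacency):
    """All-pairs shortest-path lengths (in bonds) by breadth-first search."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nbr in adjacency[atom]:
                if nbr not in seen:
                    seen[nbr] = seen[atom] + 1
                    queue.append(nbr)
        for atom, d in seen.items():
            dist[start][atom] = d
    return dist


def topological_descriptors(adjacency):
    """Graph descriptors for a connected heavy-atom graph.

    adjacency[i] lists the heavy-atom neighbors of atom i (assumed format).
    """
    dist = distance_matrix(adjacency)
    eccentricity = [max(row) for row in dist]
    diameter = max(eccentricity)   # largest value in the distance matrix
    radius = min(eccentricity)     # assumption: smallest row maximum
    degree = [len(nbrs) for nbrs in adjacency]
    bonds = [(i, j) for i, nbrs in enumerate(adjacency) for j in nbrs if i < j]
    return {
        "diameter": diameter,
        "petitjean": (diameter - radius) / diameter if diameter else 0.0,
        "petitjeansc": (diameter - radius) / radius if radius else 0.0,
        # half the sum of all distance matrix entries
        "weinerpath": sum(map(sum, dist)) // 2,
        # sum of squared heavy-atom degrees
        "zagreb": sum(d * d for d in degree),
        "chi0": sum(1 / math.sqrt(d) for d in degree if d > 0),
        "chi1": sum(1 / math.sqrt(degree[i] * degree[j]) for i, j in bonds),
    }


if __name__ == "__main__":
    # n-butane carbon skeleton: C0-C1-C2-C3
    butane = [[1], [0, 2], [1, 3], [2]]
    for name, value in topological_descriptors(butane).items():
        print(name, value)
```

For the four-carbon chain, the distance matrix has a largest entry of 3 and a smallest row maximum of 2, so petitjean evaluates to 1/3 and petitjeansc to 1/2.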

Table S2. Top ten enzyme (FPPS) plus 2-descriptor combinations with their coefficients and relative contributions for the D. discoideum cell pIC50 predictions.

Dictyostelium discoideum (FPPS) pIC50(cell) = | R² | Relative importance of pIC50(enzyme) | Relative importance of Descriptor B | Relative importance of Descriptor C
-20.76307 +0.4834 * pIC50(enzyme) -4.8055 * GCUT_PEOE_0 +3.20333 * BCUT_PEOE_3 | 0.69692 | 1.000000 | 0.48425 | 0.7295
-2.3273 +0.57298 * pIC50(enzyme) +4.56066 * BCUT_SMR_3 +0.0669 * E_vdw | 0.68450 | 1.000000 | 0.46094 | 0.38890
-6.5020 +0.50365 * pIC50(enzyme) +3.0276 * BCUT_PEOE_3 +0.000724775 * CASA+ | 0.6745 | 1.000000 | 0.66355 | 0.393880
-6.9459 +0.49467 * pIC50(enzyme) +3.3292 * BCUT_PEOE_3 +0.00390 * ASA+ | 0.67085 | 1.000000 | 0.738924 | 0.4732
-3.8907 +0.52832 * pIC50(enzyme) +2.0550 * BCUT_PEOE_3 +0.040 * E_vdw | 0.6695 | 1.000000 | 0.427270 | 0.353525
2.940 +0.4604 * pIC50(enzyme) -0.00635 * ASA_P +0.6382 * PEOE_PC+ | 0.66825 | 1.000000 | 0.39290 | 0.638878
-8.7972 +0.50884 * pIC50(enzyme) +2.97502 * BCUT_PEOE_3 -0.62648 * PEOE_PC- | 0.66574 | 1.000000 | 0.643330 | 0.362862
-7.05598 +0.56036 * pIC50(enzyme) +6.2728 * BCUT_SMR_3 +0.000757926 * CASA+ | 0.665 | 1.000000 | 0.585055 | 0.37024
-8.7530 +0.48545 * pIC50(enzyme) +3.48930 * BCUT_SMR_3 +0.4559 * PEOE_PC+ | 0.66286 | 1.000000 | 0.375754 | 0.428755
-6.09568 +0.4920 * pIC50(enzyme) +3.03006 * BCUT_PEOE_3 +0.35326 * FCASA+ | 0.66042 | 1.000000 | 0.678777 | 0.36338
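Each row of Table S2 has the form pIC50(cell) = a + b * pIC50(enzyme) + c * B + d * C, with the rows ranked by R². One way such a ranking could be produced is sketched below in Python: every pair of descriptors is combined with the enzyme pIC50, fitted by ordinary least squares, and the fits are sorted by R². This is an illustration under stated assumptions, not the authors' code. The table does not specify the fitting procedure, so ordinary least squares is an assumption, and the function names, data layout, and singularity tolerance are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system (linearly dependent columns)")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(columns, y):
    """Least-squares fit of y on an intercept plus the given columns.

    Returns ([intercept, coefficient per column, ...], R^2).
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


def search_descriptor_pairs(enzyme, descriptors, cell, top=10):
    """Rank all enzyme-plus-two-descriptor models by R^2 (highest first)."""
    results = []
    for name_b, name_c in combinations(sorted(descriptors), 2):
        try:
            coef, r2 = fit([enzyme, descriptors[name_b], descriptors[name_c]], cell)
        except ValueError:
            continue  # skip pairs that make the normal equations singular
        results.append((r2, name_b, name_c, coef))
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top]
```

Solving the normal equations directly keeps the sketch dependency-free; for a descriptor pool of the size listed in Table S1, a numerically more robust least-squares routine would be a reasonable substitute.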

Table S3. Top ten enzyme (FPPS) plus 2-descriptor combinations with their coefficients and relative contributions for the L. donovani cell pIC50 predictions.

Leishmania donovani (FPPS) pIC50(cell) = | R² (a) | Relative importance of pIC50(enzyme) (a) | Relative importance of Descriptor B (a) | Relative importance of Descriptor C (a)
2.8234 +0.74773 * pIC50(enzyme) -1.23653 * FCASA- -4.34579 * Q_VSA_FPOS | 0.86952 | 0.269777 | 0.95872 | 1.000000
-1.52445 +0.74773 * pIC50(enzyme) -1.23653 * FCASA- +4.34579 * Q_VSA_FNEG | 0.86952 | 0.269777 | 0.95872 | 1.000000
2.63995 +0.5684 * pIC50(enzyme) +0.0038 * SlogP -2.7284 * b_rotr | 0.80829 | 1.000000 | 0.6505 | 0.987653
2.63995 +0.0038 * SlogP -2.7284 * b_rotr +0.5684 * pIC50(enzyme) | 0.80829 | 0.6505 | 0.987653 | 1.000000
2.4368 +0.67597 * pIC50(enzyme) +0.0062 * PEOE_VSA+0 -4.4094 * Q_VSA_FPOS | 0.80745 | 1.000000 | 0.80696 | 0.793464
-1.97773 +0.67597 * pIC50(enzyme) +0.0062 * PEOE_VSA+0 +4.40940 * Q_VSA_FNEG | 0.80745 | 0.793464 | 0.80696 | 1.000000
2.78569 +0.67570 * pIC50(enzyme) -3.0289 * Q_VSA_FPOS +0.4382 * SlogP | 0.80578 | 1.000000 | 0.86479 | 0.732693
-0.22720 +0.67570 * pIC50(enzyme) +0.4382 * SlogP +3.0289 * Q_VSA_FNEG | 0.80578 | 1.000000 | 0.732693 | 0.86479
5.2373 +0.35909 * pIC50(enzyme) +0.00796 * Q_VSA_HYD -0.9768 * KierA1 | 0.80375 | 0.39957 | 0.726844 | 1.000000
4.83355 +0.30356 * pIC50(enzyme) -0.22903 * KierA1 +0.00805 * VSA | 0.80300 | 0.233458 | 1.000000 | 0.738323

(a) Identical values are obtained when descriptors are linearly dependent.
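The footnote to Table S3 can be made concrete with a short derivation. By the definitions in Table S1, Q_VSA_FNEG covers atoms with negative q_i and Q_VSA_FPOS covers atoms with non-negative q_i, so the two fractions sum to 1. Writing D for one descriptor and D' for its complement (symbols introduced here for illustration only), any model containing D can be rewritten in terms of D':

```latex
\begin{align}
D' &= 1 - D \\
a + b\,\mathrm{pIC}_{50}^{\mathrm{enzyme}} + c\,D + d\,C
  &= a + b\,\mathrm{pIC}_{50}^{\mathrm{enzyme}} + c\,(1 - D') + d\,C \\
  &= (a + c) + b\,\mathrm{pIC}_{50}^{\mathrm{enzyme}} - c\,D' + d\,C
\end{align}
```

The two forms give the same prediction for every compound, so they share the same R². Only the intercept shifts by c and the sign of the coefficient on the swapped descriptor flips, which is the pattern seen in the paired Q_VSA_FPOS and Q_VSA_FNEG rows.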

Table S4. Top ten enzyme (UPPS) plus 2-descriptor combinations with their coefficients and relative contributions for the S. pneumoniae cell pIC50 predictions.

Streptococcus pneumoniae (UPPS) pIC50(cell) = | R² | Relative importance of pIC50(enzyme) | Relative importance of Descriptor B | Relative importance of Descriptor C
2.36039 +0.0952 * pIC50(enzyme) +5.5767 * E_oop +0.3064 * logs | 0.70859 | 0.206740 | 0.770366 | 1.000000
2.7396 -0.02058 * pIC50(enzyme) -0.2565 * dipoley -0.50252 * SlogP | 0.65294 | 0.036773 | 0.664638 | 1.000000
4.02259 -0.0469 * pIC50(enzyme) +0.4493 * logs -0.00 * pmiz | 0.63628 | 0.55024 | 1.000000 | 0.38049
1.93265 +0.0258 * pIC50(enzyme) +5.82244 * E_oop -0.38762 * SlogP | 0.63029 | 0.059775 | 0.858685 | 1.000000
2.997 +0.0565 * pIC50(enzyme) -0.67 * dipoley +0.33943 * logs | 0.6395 | 0.0774 | 0.460388 | 1.000000
3.29294 +0.608 * pIC50(enzyme) +8.55584 * E_oop -0.099 * vsa_hyd | 0.60705 | 0.23066 | 1.000000 | 0.682444
3.00970 -0.02670 * pIC50(enzyme) +0.30685 * logs -0.00045748 * pmiy | 0.59586 | 0.05789 | 1.000000 | 0.47237
2.72405 +0.0829 * pIC50(enzyme) +0.0022 * E_nb +0.37898 * logs | 0.59540 | 0.03209 | 0.334330 | 1.000000
0.97384 +0.2900 * pIC50(enzyme) -0.00022370 * pmiy +6.34230 * E_oop | 0.5946 | 0.542273 | 0.827566 | 1.000000
2.4664 +0.03924 * pIC50(enzyme) +0.2363 * E_str +0.40087 * logs | 0.59423 | 0.06525 | 0.332503 | 1.000000

Table S5. Top ten enzyme (MurI) plus 2-descriptor combinations with their coefficients and relative contributions for the S. pneumoniae cell pIC50 predictions.

Streptococcus pneumoniae (MurI) pIC50(cell) = | R² (a) | Relative importance of pIC50(enzyme) (a) | Relative importance of Descriptor B (a) | Relative importance of Descriptor C (a)
0.52552 +0.67868 * pIC50(enzyme) -8.00526 * FASA_H -9.5894 * Q_VSA_FPOL | 0.72033 | 0.760086 | 0.93032 | 1.000000
2.52026 +0.67868 * pIC50(enzyme) -9.5894 * Q_VSA_FPOL +8.00526 * FASA_P | 0.72033 | 0.760086 | 1.000000 | 0.93032
0.936 +0.67868 * pIC50(enzyme) -8.00526 * FASA_H +9.5894 * Q_VSA_FHYD | 0.72033 | 0.760086 | 0.93032 | 1.000000
-7.0695 +0.67868 * pIC50(enzyme) +9.5894 * Q_VSA_FHYD +8.00526 * FASA_P | 0.72033 | 0.760086 | 1.000000 | 0.93032
1.064 +0.73485 * pIC50(enzyme) +0.006 * pmiz -0.0089 * DCASA | 0.70598 | 1.000000 | 0.35278 | 0.42965
0.9666 +0.7428 * pIC50(enzyme) -0.0333 * ASA_H +0.02650 * Q_VSA_HYD | 0.7004 | 0.68356 | 1.000000 | 0.97895
1.0705 +0.73955 * pIC50(enzyme) +0.002 * pmiz -0.00932 * DASA | 0.69820 | 1.000000 | 0.36724 | 0.40603
5.68232 +0.6840 * pIC50(enzyme) -5.36646 * Q_VSA_FPOL -0.00565709 * ASA_H | 0.69080 | 1.000000 | 0.730430 | 0.673985
0.3586 +0.6840 * pIC50(enzyme) -0.00565708 * ASA_H +5.36646 * Q_VSA_FHYD | 0.69080 | 1.000000 | 0.673985 | 0.730430
-0.2623 +0.77505 * pIC50(enzyme) -0.0054943 * DCASA +0.54584 * PEOE_RPC+ | 0.68937 | 1.000000 | 0.26402 | 0.32590

(a) Identical values are obtained when descriptors are linearly dependent.

Table S6. Top ten enzyme (NS3 protease) plus 2-descriptor combinations with their coefficients and relative contributions for the HCV replicon cell pIC50 predictions.

Hepatitis C Virus (NS3 protease) pIC50(cell) = | R² | Relative importance of pIC50(enzyme) | Relative importance of Descriptor B | Relative importance of Descriptor C
-1.29239 +0.930 * pIC50(enzyme) +0.58847 * log(pampa) -0.0007357 * pmiy | 0.77744 | 0.730339 | 1.000000 | 0.305259
-2.44335 +0.92235 * pIC50(enzyme) +0.55294 * log(pampa) +0.25772 * log(caco-2) | 0.740 | 0.785205 | 1.000000 | 0.265
0.56 +0.84966 * pIC50(enzyme) +0.5598 * log(pampa) -0.56438 * std_dim2 | 0.73967 | 0.74448 | 1.000000 | 0.23869
-2.082 +0.9600 * pIC50(enzyme) +0.5898 * log(pampa) -0.6375 * dipolex | 0.7389 | 0.740897 | 1.000000 | 0.222535
-1.0766 +0.9280 * pIC50(enzyme) +0.62367 * log(pampa) -0.24639 * PEOE_PC+ | 0.73569 | 0.700433 | 1.000000 | 0.9823
-1.43622 +0.92293 * pIC50(enzyme) +0.65322 * log(pampa) -0.06097 * KierA2 | 0.73564 | 0.665088 | 1.000000 | 0.97280
-1.52542 +0.926 * pIC50(enzyme) +0.64820 * log(pampa) -0.04420 * Kier2 | 0.7358 | 0.672545 | 1.000000 | 0.95686
-1.58484 +0.9365 * pIC50(enzyme) +0.65058 * log(pampa) -0.05945 * Kier3 | 0.73454 | 0.6607 | 1.000000 | 0.93573
-1.85996 +0.96397 * pIC50(enzyme) +0.6988 * log(pampa) -0.00374 * Q_VSA_PPOS | 0.7345 | 0.73203 | 1.000000 | 0.203505
-1.6354 +0.95796 * pIC50(enzyme) +0.652 * log(pampa) -0.0030 * Q_VSA_POL | 0.73394 | 0.732983 | 1.000000 | 0.203058

Table S7. Top ten enzyme (HIV-1 integrase) plus 2-descriptor combinations with their coefficients and relative contributions for the HIV-1 integrase replicon cell pIC50 predictions.

MT4 cells (HIV-1 integrase)

pIC50 (cell) = | R² | Relative importance of pIC50(enzyme) | Relative importance of descriptor B | Relative importance of descriptor C
-4.8373 +0.98896 * pIC50(enzyme) +.36334 * pCC50 -0.00697 * Weight | 0.68808 | 0.892564 | 1.000000 | 0.526629
-3.90054 +0.79998 * pIC50(enzyme) +.24225 * pCC50 -0.00990 * vsa_hyd | 0.68552 | 0.79238 | 1.000000 | 0.520038
-5.92926 +0.899 * pIC50(enzyme) +.37520 * pCC50 -0.00047 * pmi | 0.68063 | 0.797385 | 1.000000 | 0.489095
-5.67093 +0.84883 * pIC50(enzyme) +.3573 * pCC50 -0.00066 * pmix | 0.67775 | 0.769595 | 1.000000 | 0.482305
-2.5905 +0.7970 * pIC50(enzyme) +.22060 * pCC50 -0.30594 * chiv | 0.6765 | 0.725509 | 1.000000 | 0.509833
-5.5367 +0.8507 * pIC50(enzyme) +.3449 * pCC50 -0.987 * dipole | 0.6724 | 0.779798 | 1.000000 | 0.477547
-5.45844 +0.84242 * pIC50(enzyme) +.3334 * pCC50 -0.9829 * dipolex | 0.67078 | 0.777374 | 1.000000 | 0.475734
-2.07656 +0.67785 * pIC50(enzyme) +.6006 * pCC50 -0.00792 * vol | 0.66766 | 0.7898 | 1.000000 | 0.59278
-2.88642 +0.78860 * pIC50(enzyme) +.2043 * pCC50 -0.850 * chi0v | 0.6669 | 0.80639 | 1.000000 | 0.497274
-6.535 +0.9486 * pIC50(enzyme) +.35583 * pCC50 +0.2086 * logs | 0.66278 | 0.83025 | 1.000000 | 0.47726

Table S8. Top ten enzyme (hlgp) plus 2-descriptor combinations with their coefficients and relative contributions for the rat hepatocytes pIC50 predictions.

Rat Hepatocytes (hlgp)

pIC50 (cell) = | R² | Relative importance of pIC50(enzyme) | Relative importance of descriptor B | Relative importance of descriptor C
-2.48325 +0.74485 * pIC50(enzyme) +0.022 * PEOE_VSA_PNEG +0.00797 * Q_VSA_HYD | 0.732 | 1.000000 | 0.67022 | 0.45600
-2.6987 +0.8722 * pIC50(enzyme) +0.0567 * PEOE_VSA_PNEG +0.38040 * E_str | 0.70675 | 1.000000 | 0.453849 | 0.36004
-2.47800 +0.7733 * pIC50(enzyme) +0.0257 * PEOE_VSA_PNEG +0.00356 * ASA_H | 0.70330 | 1.000000 | 0.66853 | 0.44954
-3.30393 +0.7724 * pIC50(enzyme) +0.02442 * PEOE_VSA_PNEG +3.06864 * FASA_H | 0.7098 | 1.000000 | 0.748269 | 0.45509
-0.23529 +0.7724 * pIC50(enzyme) +0.02442 * PEOE_VSA_PNEG -3.06864 * FASA_P | 0.7098 | 1.000000 | 0.748269 | 0.45509
-7.578 +0.89039 * pIC50(enzyme) +0.0755 * PEOE_VSA_PNEG -2.55393 * BCUT_PEOE_0 | 0.7009 | 1.000000 | 0.4664 | 0.329457
-8.83556 +0.83245 * pIC50(enzyme) +0.0546 * PEOE_VSA_PNEG +2.90552 * GCUT_SMR_3 | 0.69768 | 1.000000 | 0.439538 | 0.327327
-7.3859 +2.6295 * GCUT_SLOGP_3 +0.7988 * pIC50(enzyme) +0.0647 * PEOE_VSA_PNEG | 0.69753 | 1.000000 | 0.345396 | 0.49233
-7.3859 +0.7988 * pIC50(enzyme) +0.0647 * PEOE_VSA_PNEG +2.6295 * GCUT_SLOGP_3 | 0.69753 | 1.000000 | 0.49233 | 0.345396
-2.00842 +0.735 * pIC50(enzyme) +0.0897 * PEOE_VSA_PNEG +0.00634 * vsa_hyd | 0.69664 | 1.000000 | 0.64097 | 0.397957

Table S9. Top ten enzyme (KDR kinase) plus 2-descriptor combinations with their coefficients and relative contributions for the NIH3T3 cell pIC50 predictions.

NIH 3T3 (KDR, type III receptor tyrosine kinase)

pIC50 (cell) = | R² | Relative importance of pIC50(enzyme) | Relative importance of descriptor B | Relative importance of descriptor C
0.82478 +0.8533 * pIC50(enzyme) +6.56543 * PEOE_VSA_FPOL +0.05504 * E_tor | 0.60006 | 1.000000 | 0.4949 | 0.78790
0.665 +0.8624 * pIC50(enzyme) +0.05066 * E_tor +2.60955 * PEOE_VSA_FPNEG | 0.59907 | 1.000000 | 0.655445 | 0.485462
9.32007 +0.79676 * pIC50(enzyme) -8.85532 * PEOE_RPC- -0.06827 * apol | 0.59487 | 0.766096 | 0.76070 | 1.000000
0.7667 +0.82694 * pIC50(enzyme) -9.03382 * PEOE_RPC- -0.008527 * ASA | 0.59352 | 0.795265 | 0.768062 | 1.000000
0.23883 +0.8099 * pIC50(enzyme) -20.8045 * PEOE_RPC- -0.00895460 * vdw_vol | 0.59328 | 0.730389 | 0.794900 | 1.000000
0.3493 +0.793 * pIC50(enzyme) -20.464 * PEOE_RPC- -0.0264080 * vol | 0.59276 | 0.729847 | 0.788320 | 1.000000
0.3493 +0.793 * pIC50(enzyme) -20.464 * PEOE_RPC- -0.0264080 * vol | 0.58789 | 0.708545 | 0.798883 | 1.000000
-0.2236 +0.959 * pIC50(enzyme) +0.03452 * E -0.05235 * E_nb | 0.58380 | 0.500955 | 0.743845 | 1.000000
.32645 +0.89890 * pIC50(enzyme) -0.024 * E_nb +0.0499 * E_tor | 0.58330 | 1.000000 | 0.434552 | 0.520573
2.55668 +0.7228 * pIC50(enzyme) -0.75267 * KierA3 +.9503 * PEOE_PC+ | 0.584 | 0.78385 | 1.000000 | 0.77589

Table S10. Top ten enzyme (Akt kinase) plus 2-descriptor combinations with their coefficients and relative contributions for the MiaPaCa-2 cell pIC50 predictions.

MiaPaCa-2 human pancreatic cancer cells (Akt kinase)

pIC50 (cell) = | R² | Relative importance of pIC50(enzyme) | Relative importance of descriptor B | Relative importance of descriptor C
4.3372 +0.45265 * pIC50(enzyme) -0.0000929650 * pmix -4.23429 * GCUT_SLOGP_2 | 0.68326 | 1.000000 | 0.556839 | 0.42486
2.5569 +0.44882 * pIC50(enzyme) -0.00009939 * pmi +0.00253 * ASA_H | 0.68020 | 1.000000 | 0.7354 | 0.499224
4.4455 +0.39949 * pIC50(enzyme) -0.0000873893 * pmix -0.02504 * E_ele | 0.6797 | 1.000000 | 0.593093 | 0.458662
4.23090 +0.4269 * pIC50(enzyme) -0.000087904 * pmix -.87949 * FASA_P | 0.67557 | 1.000000 | 0.559220 | 0.42329
2.354 +0.4269 * pIC50(enzyme) -0.000087904 * pmix +.87949 * FASA_H | 0.67557 | 1.000000 | 0.559220 | 0.42329
4.65687 +0.4086 * pIC50(enzyme) -0.0000778479 * pmi -4.34328 * GCUT_SLOGP_2 | 0.6753 | 1.000000 | 0.608637 | 0.479355
4.54500 +0.38640 * pIC50(enzyme) -.96877 * FASA_P -0.0000739953 * pmi | 0.675 | 1.000000 | 0.488967 | 0.6543
2.57623 +0.38640 * pIC50(enzyme) -0.0000739953 * pmi +.96877 * FASA_H | 0.675 | 1.000000 | 0.6543 | 0.488967
4.34247 +0.40352 * pIC50(enzyme) -0.0025494 * ASA_P -0.000083233 * pmix | 0.67007 | 1.000000 | 0.435308 | 0.558505
4.7925 +0.35970 * pIC50(enzyme) -0.000079880 * pmi -0.02523 * E_ele | 0.66794 | 1.000000 | 0.642875 | 0.53363
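Each of the tables above ranks models of the form pIC50 (cell) = a + b * pIC50(enzyme) + c * B + d * C over pairs of descriptors (B, C). A minimal sketch of such a ranking is given below, assuming an exhaustive search over all descriptor pairs and ranking by training-set R². The data are synthetic, the descriptor names (`desc_A` ... `desc_F`) and the helper `fit_ols` are hypothetical, and nothing here reproduces the software or settings actually used for the tables.

```python
import itertools
import random


def fit_ols(rows, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented normal equations [X^T X | X^T y]
    M = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    for k in range(p):  # forward elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, p):
            f = M[r][k] / M[k][k]
            for j in range(k, p + 1):
                M[r][j] -= f * M[k][j]
    b = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        b[k] = (M[k][p] - sum(M[k][j] * b[j] for j in range(k + 1, p))) / M[k][k]
    pred = [sum(bi * xi for bi, xi in zip(b, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b, 1.0 - ss_res / ss_tot


# Synthetic training set: cell activity depends on enzyme activity plus
# two of six made-up descriptors (desc_B and desc_E).
random.seed(1)
n = 40
names = ["desc_A", "desc_B", "desc_C", "desc_D", "desc_E", "desc_F"]
enzyme = [random.uniform(5.0, 9.0) for _ in range(n)]
desc = {name: [random.uniform(0.0, 1.0) for _ in range(n)] for name in names}
cell = [0.5 + 0.8 * e + 1.5 * b - 1.0 * d + random.gauss(0.0, 0.1)
        for e, b, d in zip(enzyme, desc["desc_B"], desc["desc_E"])]

# Fit "enzyme + descriptor B + descriptor C" for every descriptor pair.
results = []
for name_b, name_c in itertools.combinations(names, 2):
    rows = list(zip(enzyme, desc[name_b], desc[name_c]))
    coeffs, r2 = fit_ols(rows, cell)
    results.append((r2, name_b, name_c, coeffs))

# Keep the ten combinations with the highest R^2.
results.sort(key=lambda t: t[0], reverse=True)
top_ten = results[:10]

for r2, name_b, name_c, coeffs in top_ten:
    a, b_enz, c_b, c_c = coeffs
    print(f"{a:+.3f} {b_enz:+.3f}*pIC50(enzyme) {c_b:+.3f}*{name_b} "
          f"{c_c:+.3f}*{name_c}  R2={r2:.4f}")
```

With a real descriptor set, linearly dependent descriptors would produce duplicated entries in such a ranking, so it may be worth filtering them before the search.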

Figure S1. Correlation plots for the cell and enzyme assays, and predicted cell activities from the training and test set data in Streptococcus pneumoniae (MurI) (a-c), Hu-7 (HCV NS3 protease replication) (d-f) and MT4 (HIV-1 integrase) (g-i). (a) Plot showing the correlation between cell (S. pneumoniae) pIC50 and enzyme (MurI) pIC50 values. (b) Correlation between predicted cell pIC50 values (from the best combination of enzyme plus two molecular descriptors) for the training set, with the experimental pIC50. (c) Correlation between test set pIC50 predictions obtained from a leave-two-out analysis. (d) Plot showing the correlation between cell (HCV/Hu-7) pIC50 and enzyme (HCV NS3 protease) pIC50 values. (e) Correlation between predicted cell pIC50 values (from the best combination of enzyme plus two molecular descriptors) for the training set, with the experimental pIC50. (f) Correlation between test set pIC50 predictions obtained from a leave-two-out analysis. (g) Plot showing the correlation between cell (HIV/MT4) pIC50 and enzyme (HIV-1 integrase) pIC50 values. (h) Correlation between predicted cell pIC50 values (from the best combination of enzyme plus two molecular descriptors) for the training set, with the experimental pIC50. (i) Correlation between test set pIC50 predictions obtained from a leave-two-out analysis.

Figure S2. Correlation plots for the cell and enzyme assays, and predicted cell activities from the training and test set data in rat hepatocytes (hlgp) (a-c), NIH3T3 (KDR kinase) (d-f) and MiaPaCa-2 (Akt kinase) (g-i). (a) Plot showing the correlation between cell (rat hepatocytes) pIC50 and enzyme (hlgp) pIC50 values. (b) Correlation between predicted cell pIC50 values (from the best combination of enzyme plus two molecular descriptors) for the training set, with the experimental pIC50. (c) Correlation between test set pIC50 predictions obtained from a leave-two-out analysis. (d) Plot showing the correlation between cell (NIH 3T3) pIC50 and enzyme (KDR kinase) pIC50 values. (e) Correlation between predicted cell pIC50 values (from the best combination of enzyme plus two molecular descriptors) for the training set, with the experimental pIC50. (f) Correlation between test set pIC50 predictions obtained from a leave-two-out analysis. (g) Plot showing the correlation between cell (MiaPaCa-2) pIC50 and enzyme (Akt kinase) pIC50 values. (h) Correlation between predicted cell pIC50 values (from the best combination of enzyme plus two molecular descriptors) for the training set, with the experimental pIC50. (i) Correlation between test set pIC50 predictions obtained from a leave-two-out analysis.
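Panels (c), (f) and (i) of both figures report test-set predictions from a leave-two-out analysis. The sketch below shows one way such an analysis could be run; it is an assumption that every possible pair of compounds is held out in turn (the captions do not say how pairs were chosen), the data are synthetic, and the helper names `fit_coeffs` and `predict` are hypothetical. For each held-out pair the model is refitted on the remaining compounds and used to predict the two omitted values; the pooled predictions are then summarised as a cross-validated q².

```python
import itertools
import random


def fit_coeffs(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    M = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    for k in range(p):  # forward elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, p):
            f = M[r][k] / M[k][k]
            for j in range(k, p + 1):
                M[r][j] -= f * M[k][j]
    b = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        b[k] = (M[k][p] - sum(M[k][j] * b[j] for j in range(k + 1, p))) / M[k][k]
    return b


def predict(b, row):
    """Evaluate the fitted linear model for one compound."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))


# Synthetic compounds: (enzyme pIC50, descriptor B, descriptor C) -> cell pIC50.
random.seed(2)
n = 20
rows = [(random.uniform(5.0, 9.0), random.uniform(0.0, 1.0), random.uniform(0.0, 1.0))
        for _ in range(n)]
cell = [1.0 + 0.8 * e + 1.0 * b - 0.5 * c + random.gauss(0.0, 0.1)
        for e, b, c in rows]

# Leave-two-out: hold out each pair (assumed exhaustive), refit, predict the pair.
observed, predicted = [], []
for i, j in itertools.combinations(range(n), 2):
    train = [k for k in range(n) if k not in (i, j)]
    b = fit_coeffs([rows[k] for k in train], [cell[k] for k in train])
    for k in (i, j):
        observed.append(cell[k])
        predicted.append(predict(b, rows[k]))

# Cross-validated q^2 = 1 - PRESS / SS_tot over all held-out predictions.
mean_obs = sum(observed) / len(observed)
press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
ss_tot = sum((o - mean_obs) ** 2 for o in observed)
q2 = 1.0 - press / ss_tot
print("held-out predictions:", len(predicted), " q2 =", round(q2, 4))
```

Plotting `predicted` against `observed` would give the kind of test-set correlation shown in the figure panels, although the values here carry no experimental meaning.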

References

1. Sybyl 7.3, Tripos, Inc.: St. Louis, MO.
2. Petitjean, M., Applications of the Radius-Diameter Diagram to the Classification of Topological and Geometrical Shapes of Chemical Compounds. J. Chem. Inf. Comput. Sci. 1992, 32, (4), 331-337.
3. Balaban, A. T., Five New Topological Indices for the Branching of Tree-Like Graphs. Theoretica Chimica Acta 1979, 53, 355-375.
4. Wiener, H., Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17-20.
5. Pearlman, R. S.; Smith, K. M., Novel Software Tools for Chemical Diversity. Persp. Drug. Disc. Des. 1998, 9/10/11, 339-353.
6. Wildman, S. A.; Crippen, G. M., Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 1999, 39, (5), 868-873.
7. Hall, L. H.; Kier, L. B., The Nature of Structure-Activity Relationships and Their Relation to Molecular Connectivity. Eur. J. Med. Chem. 1977, 12, 307.
8. Hall, L. H.; Kier, L. B., The Molecular Connectivity Chi Indices and Kappa Shape Indices in Structure-Property Modeling. Rev. Comp. Chem. 1991, 2.
9. Balaban, A. T., Highly Discriminating Distance-Based Topological Index. Chem. Phys. Lett. 1982, 89, (5), 399-404.
10. Oprea, T. I., Property Distribution of Drug-Related Chemical Databases. J. Comp. Aid. Mol. Des. 2000, 14, 251-264.
11. Hou, T. J.; Xia, K.; Zhang, W.; Xu, X. J., ADME Evaluation in Drug Discovery. 4. Prediction of Aqueous Solubility Based on Atom Contribution Approach. J. Chem. Inf. Comput. Sci. 2004, 44, 266-275.
12. Stanton, D.; Jurs, P., Development and Use of Charged Partial Surface Area Structural Descriptors in Computer-Assisted Quantitative Structure-Property Relationship Studies. Anal. Chem. 1990, 62, 2323-2329.
13. Ertl, P.; Rohde, B.; Selzer, P., Fast Calculation of Molecular Polar Surface Area as a Sum of Fragment-Based Contributions and Its Application to the Prediction of Drug Transport Properties. J. Med. Chem. 2000, 43, 3714-3717.
14. Molecular Operating Environment (MOE), Chemical Computing Group, Inc.: Montreal, Quebec, 2006.