CHAPTER-2. Drug discovery is a comprehensive approach wherein several disciplines


CHAPTER-2 Molecular Modeling: Analog Based Studies

Drug discovery is a comprehensive approach in which several disciplines are used to design or discover drugs. The R&D expenditure incurred in bringing a new chemical entity (NCE) to the market is estimated at around $1.3 billion. The principal reasons for this rise in cost are the decline in the efficiency of converting leads into preclinical candidates, from 75% to 50%, and the drop in the proportion of compounds progressing from phase 2 to phase 3 clinical trials, from 50% to 30% [68]. In spite of new technology and a better understanding of biological systems, drug discovery is still time consuming (~15 years) and has a low success rate. This situation calls for alternative methods and techniques that reduce both cost and time while increasing the success rate.

2.1 Pharmacophore studies

Pharmacophore modeling is one of the analog-based methods. The word pharmacophore was coined by Paul Ehrlich in the early 1900s to describe a molecular framework that carries (phoros) the essential features responsible for the biological activity of a drug (pharmacon). In 1977 Peter Gund defined a pharmacophore as a set of structural features in a molecule that is recognized at a receptor site and is responsible for that molecule's biological activity.

Pharmacophore features include:
1. Hydrogen bond acceptor
2. Hydrogen bond donor
3. Hydrophobic
4. Hydrophobic aliphatic
5. Hydrophobic aromatic
6. Positive ionizable
7. Negative ionizable
8. Ring aromatic

Manual Pharmacophore Generation: The steps involved in designing a pharmacophore model are listed below [47].
1. Visual recognition of the structural and chemical features that are common among the active molecules and missing in the inactive molecules.
2. Measurement of the 3D aspects of the common features among the set of compounds with respect to each other.

3. Refinement of the outlined pharmacophore and validation of the generated model, such that the model accurately fits the active compounds and fails to fit the inactive compounds.
4. Improvement of the pharmacophore model by validating it against compounds with known activity in databases until the desired result is achieved.

A compound may be inactive because:
i. It does not contain the chemical groups in the geometry necessary for recognition by the target protein, i.e., it does not match the requisite pharmacophore.
ii. Although it contains the pharmacophore, it also contains groups that interfere with recognition, which can be detected by subsequent quantitative structure-activity relationship (QSAR) studies.
iii. It contains groups that prevent interaction with the target protein, another potency-reducing property that can be detected by QSAR studies.

In the absence of a crystal structure of the target in which the ligand-binding site is clearly identified, one can rely on structure-activity studies for a given set of ligands. If these ligands are known to bind to the same target, the features they have in common can be derived. The generated pharmacophore model can be used in the following scenarios:

- Identifying new candidate molecules
- Prioritizing lead molecules
- Generating compound libraries
- Predicting the activity of new compounds
- Serving as a search query for mining databases

The Accelrys Catalyst [48] program generates two types of chemical-feature-based hypotheses, depending on whether biological activity data are available for the compounds. When activity data are included, the Catalyst HypoGen module is used. Feature-based models derived by HypoGen have been used successfully to suggest new directions in lead generation and lead discovery, and to search chemical databases for new structural classes of possible lead candidates. When no biological activity data are available for the compounds, the Catalyst HipHop module is used for hypothesis building, and only common chemical features are identified.

HipHop: Common feature based alignments

A pharmacophore model, or hypothesis, consists of a three-dimensional configuration of chemical functions surrounded by tolerance spheres. A

tolerance sphere defines the region in space that should be occupied by a specific type of chemical functionality. Each chemical function is assigned a weight that describes its relative importance within the hypothesis. A larger weight indicates that the feature is more significant in describing the activity than the other parts of the hypothesis.

When the compound set contains fewer than 15 compounds and biological data are absent, a hypothetical model can be generated from common feature alignments using the Catalyst/HipHop module [48-50]. HipHop aligns a set of compounds based on their common chemical features. It identifies all configurations, or three-dimensional spatial arrangements, of chemical features that are common to the molecules in the training set. The configurations are identified by a pruned exhaustive search, beginning with small sets of features and extending them until no larger common configuration is found. The user defines the number of compounds that should map completely or partially to the generated hypothesis. This option allows broader and more diverse hypotheses to be generated. If a pharmacophore model is less likely to map an inactive compound, it is given a higher rank, and vice versa.

Principal specifies the reference molecule(s); reference configuration models are potential centers for hypotheses.

Each compound is marked with one of the following values:
0: do not consider this molecule
1: consider configurations of this molecule
2: use this compound as a reference molecule
This setting is used only for HipHop hypothesis generation.

MaxOmitFeat specifies how many features of each compound may be omitted:
0: all features must map to the generated hypotheses
1: all but one feature must map to the generated hypotheses
2: no features need to map to the generated hypotheses
This setting is used only for HipHop hypothesis generation.

HypoGen: Quantitative Pharmacophore Models

HypoGen creates SAR hypothesis models [48-50] from a set of molecules for which activity values are known. It selects pharmacophores that are common among the active compounds but not among the inactive compounds, and then optimizes them using simulated annealing. The top pharmacophores can be used to predict the activity of unknown compounds or to search 3D chemical databases for new possible leads.

HypoGen generates hypotheses that are sets of features in 3D space, each with a tolerance and a weight, that fit the features of the training set and correlate with the activity data. The hypotheses are created in three phases: constructive, subtractive and optimization. The constructive phase identifies hypotheses that are common among the active compounds, the subtractive phase removes hypotheses that are common among the inactive compounds, and the optimization phase attempts to improve the initial hypotheses. The resulting hypothesis models can be used as search queries to mine a three-dimensional database for potential leads (Figure 2.1), or in the form of an equation to predict the activity of a potential lead.

Figure 2.1: Lead optimization using Pharmacophores
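The constructive and subtractive phases can be illustrated with a deliberately simplified sketch. It assumes each compound is reduced to a plain set of feature labels, which ignores the 3D geometry, tolerances and weights that Catalyst actually works with. The function names, the 50% cut-off used for "common among the inactive compounds" and the example compounds are all hypothetical, and the optimization phase is not modelled.

```python
from itertools import combinations


def constructive(actives, max_features=5):
    """Enumerate feature combinations shared by the two most active compounds
    and keep those also present in every remaining active compound.

    `actives` is a list of feature-label sets, most active first (assumption).
    """
    shared = sorted(actives[0] & actives[1])
    candidates = set()
    for size in range(1, min(max_features, len(shared)) + 1):
        for combo in combinations(shared, size):
            candidates.add(frozenset(combo))
    return {h for h in candidates if all(h <= mol for mol in actives[2:])}


def subtractive(hypotheses, inactives, max_fraction=0.5):
    """Drop hypotheses matched by more than `max_fraction` of the inactives.

    The 0.5 threshold is an illustrative stand-in for "most inactives".
    """
    kept = set()
    for h in hypotheses:
        matched = sum(1 for mol in inactives if h <= mol)
        if not inactives or matched / len(inactives) <= max_fraction:
            kept.add(h)
    return kept


# Hypothetical toy data: HBA/HBD/RA/HY are feature labels only.
actives = [{"HBA", "HBD", "RA"}, {"HBA", "HBD", "RA", "HY"}, {"HBA", "HBD", "RA"}]
inactives = [{"HBA"}, {"HBA", "RA"}, {"HY"}]

built = constructive(actives)
kept = subtractive(built, inactives)
print(len(built), len(kept))
```

In this toy data the single-feature hypothesis {HBA} is discarded because two of the three inactive compounds also contain it, while the full three-feature hypothesis survives.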

Running HypoGen

An ideal training set:
- Should contain at least 16 compounds to assure statistical power
- Should have activities spanning 4 orders of magnitude
- Should contain 3-4 compounds in each order of magnitude
- Should contain no redundant information
- Should have no excluded volume problems

HypoGen runs in three phases: constructive, subtractive and optimization (Figure 2.2).

Figure 2.2: Hypotheses generation in Catalyst
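The numeric training set guidelines above (compound count, activity span, compounds per order of magnitude) can be checked with a few lines of code. This is a minimal sketch: the function name and the example IC50 values are hypothetical, activities are assumed to be in a single unit such as µM, and "order of magnitude" is interpreted as a base-10 decade.

```python
import math


def check_training_set(ic50_values, min_compounds=16, min_orders=4.0):
    """Summarise how a list of activities compares with the guidelines."""
    logs = [math.log10(v) for v in ic50_values]
    span = max(logs) - min(logs)
    per_order = {}
    for lg in logs:
        decade = math.floor(lg)  # e.g. 0.02-0.08 fall in decade -2
        per_order[decade] = per_order.get(decade, 0) + 1
    return {
        "n_compounds": len(ic50_values),
        "enough_compounds": len(ic50_values) >= min_compounds,
        "orders_spanned": span,
        "enough_span": span >= min_orders,
        "per_order": per_order,
    }


# Hypothetical activities, not taken from the study.
example = [0.02, 0.05, 0.08, 0.2, 0.5, 0.8, 2, 5, 8,
           20, 50, 80, 200, 500, 800, 900]
report = check_training_set(example)
print(report)
```

The per-decade counts make it easy to spot an order of magnitude that is over- or under-represented before a run is started.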

HypoGen calculates the cost of two theoretical hypotheses: one in which the cost is minimal (fixed cost) and one in which the cost is high (null cost). The cost of each optimized hypothesis should lie between these two values and should be closer to the fixed cost than to the null cost. Randomized studies have found that if a returned hypothesis has a cost that differs from that of the null hypothesis by 40–60 bits, it has a 75–90% chance of representing a true correlation in the data. Another useful number is the entropy of the hypothesis space. If this is less than 17, a thorough analysis of all the models will be carried out.

Constructive phase

The constructive phase is very similar to the HipHop algorithm. It is done in several steps:
1) All active compounds are identified.
2) All hypotheses (maximum 5 features) among the two most active compounds are identified and stored.
3) Those that fit the remaining active compounds are kept.

Subtractive phase

In this phase, the program removes hypotheses from the data structure that are not likely to be useful. The hypotheses that were created in the

constructive phase are inspected, and those that are common to most of the inactive compounds are removed from consideration.

Optimization phase

The optimization is done using the well-known simulated annealing algorithm. The algorithm applies small perturbations to the hypotheses that survive the constructive and subtractive phases in an attempt to improve the score.

HypoRefine

The HypoRefine algorithm is an extension of the Catalyst HypoGen algorithm for generating SAR-based pharmacophore models that can be used to estimate the activities of new compounds. HypoRefine helps to improve the predictive models generated from a dataset by better correlating the hypothesis with the steric properties that contribute to biological activity. In addition, HypoRefine can help overcome over-prediction of inactive compounds that share pharmacophore features with active compounds in the dataset, where the inactivity is due to steric clashes with the target.

Interpreting the cost parameters in the output files

During an automated hypothesis generation run, Catalyst considers and discards many thousands of models. It distinguishes between alternatives by applying a cost analysis. The overall assumption is based on Occam's razor: between equivalent alternatives, the simplest model is best. In general, if the difference between the null cost and the cost of a returned hypothesis is greater than 60 bits, there is an excellent chance of the model

representing a true correlation. Since most returned hypotheses are higher in cost than the fixed cost model, a difference between the fixed cost and the null cost of 70 bits or more is necessary to achieve the 60-bit difference [48-50].

Fixed cost
The cost of the simplest possible hypothesis (initial).

Null cost
The cost when the activity of each molecule is estimated as the mean activity; it acts like a hypothesis with no features.

Weight cost
A value that increases in a Gaussian form as the feature weight in a model deviates from an idealized value of 2.0. This cost factor favors hypotheses in which the feature weights are close to 2. The standard deviation of this parameter is given by the weight variation parameter.

Error cost
A value that increases as the rms difference between estimated and measured activities for the training set molecules increases. This cost factor is designed to favor models for which the correlation between estimated and measured activities is better. The standard deviation of this parameter is given by the uncertainty parameter.

Configuration cost

A fixed cost that depends on the complexity of the hypothesis space being optimized. It is equal to the entropy of the hypothesis space. This parameter is constant among all the hypotheses.

The main assumption made by HypoGen is that an active molecule should map more features than an inactive molecule. In other words, a molecule is inactive because (a) it lacks an important feature, or (b) the feature is present but cannot be oriented correctly in space. Based on this assumption, the most active molecule in the dataset should map to all features of the generated hypotheses.

The validity of a pharmacophore model is determined by its ability to retrieve known active molecules from databases (Figure 2.3) [47].

Figure 2.3: Database searching using pharmacophore models (panel labels: Database molecules, Hits, Actives, Ht, Ha)

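Pharmacophore Validation

In the following, D is the number of molecules in the database, A the number of actives in the database, Ht the number of hits retrieved, and Ha the number of actives in the hit list.

Percent yield of actives:
\[ \%Y = \frac{H_a}{H_t} \times 100 \]

Percent ratio of actives in the hit list:
\[ \%A = \frac{H_a}{A} \times 100 \]

Enrichment (enhancement):
\[ E = \frac{H_a / H_t}{A / D} = \frac{H_a \times D}{H_t \times A} \]

False negatives:
\[ A - H_a \]

False positives:
\[ H_t - H_a \]

Goodness of hit:
\[ GH = \left( \frac{H_a (3A + H_t)}{4 H_t A} \right) \times \left( 1 - \frac{H_t - H_a}{D - A} \right) \]

The best hit list is obtained when the hit list overlaps perfectly with the known active compounds in the database. This occurs when both Ha = Ht and Ha = A are satisfied, hence Ha = Ht = A, which is nearly impossible to achieve in a real-life situation.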

In reality, there may be many compounds in the database that are active but either have not been listed as active or have not been tested for the specific activity. In either case, these compounds end up in the false positives list. Hence we consider the list of false positives as a source of potential leads. The objective is to improve the hit list in such a manner that the false positives contain a large number of potential leads. The false negatives list contains the active molecules that the search failed to retrieve from the database. The best hit list is the one that retrieves all the actives and nothing else (i.e., Ht = Ha = A): false negatives = 0, false positives = 0. The worst list is the one that retrieves everything but the known actives in the database (i.e., Ha = 0, Ht = D − A): false negatives = A, false positives = D − A. The GH score gives a good indication of how good the hit list is with respect to a compromise between maximum yield and maximum percentage of actives retrieved. Table 2.1 provides an acceptable sorting of hit lists, from best to worst, by GH score. The goodness of hit formula is a convenient way to quantify hit lists obtained from searches with various queries.
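The validation quantities defined above translate directly into code. The following is a minimal sketch rather than a reference implementation: the function and key names are hypothetical, and the example assumes a database of D = 50,000 molecules containing A = 100 actives, which is consistent with the false positive counts quoted for the limiting cases. Only the best and worst cases are evaluated here, since intermediate GH values depend on how yield and coverage are weighted.

```python
def hit_list_metrics(d, a, ht, ha):
    """Compute hit list statistics.

    d  : molecules in the database (D)
    a  : actives in the database (A)
    ht : hits retrieved (Ht)
    ha : actives among the hits (Ha)
    """
    if ht <= 0 or a <= 0 or d <= a:
        raise ValueError("need ht > 0, a > 0 and d > a")
    return {
        "percent_yield": 100 * ha / ht,        # %Y
        "percent_actives": 100 * ha / a,       # %A
        "enrichment": (ha * d) / (ht * a),     # E
        "false_negatives": a - ha,
        "false_positives": ht - ha,
        # GH = [Ha(3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)]
        "gh": (ha * (3 * a + ht)) / (4 * ht * a) * (1 - (ht - ha) / (d - a)),
    }


# Assumed database size and active count (see lead-in).
best = hit_list_metrics(d=50_000, a=100, ht=100, ha=100)
worst = hit_list_metrics(d=50_000, a=100, ht=49_900, ha=0)
print(best)
print(worst)
```

The best case returns GH = 1 with no false negatives or false positives, and the worst case returns GH = 0 with every active missed.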

Table 2.1: Goodness of hit score values

Case           %Y    %A    Enrichment   False negatives   False positives   GH
Best           100   100   500          0                 0                 1
Typical Good   40    80    200          20                120               0.60
Extreme Y      100   1     500          99                0                 0.50
Extreme A      0.2   100   1            0                 49,900            0.50
Typical Bad    5     50    25           50                950               0.26
Worst          0     0     0            100               49,900            0

2.2 Introduction to MAO Isoforms

Monoamine oxidase (MAO) is a flavin adenine dinucleotide (FAD)-containing enzyme located in the outer membranes of mitochondria in the brain, liver, intestinal mucosa, and other organs. It catalyzes the oxidative deamination of biogenic amines (neuroamines, vasoactive and exogenous amines), including dopamine, serotonin, norepinephrine, tyramine, tryptamine, and the neurotoxin MPTP. The end products are aldehydes and H2O2, which are involved in oxidative cellular processes [25,51]. MAO exists in two isoforms in humans, MAO-A and MAO-B; both are 60 kDa outer-mitochondrial membrane-bound flavoenzymes that

share 70% sequence identity [52,23]. Because MAO-A and MAO-B have distinct but overlapping specificities in the oxidative deamination of neurotransmitters and dietary amines, the development of specific reversible inhibitors has been a long-sought goal. Expression levels of MAO-B in neuronal tissue increase 4-fold with age [45,53], resulting in increased dopamine metabolism and the production of higher levels of hydrogen peroxide, which are thought to play a role in the etiology of neurodegenerative diseases such as Parkinson's and Alzheimer's diseases [45,54]. MAO inhibitors demonstrated remarkable antidepressant action, but their clinical value was seriously compromised by side effects such as the "cheese reaction" [18,55]. These serious side effects stimulated a search for antidepressants that are not MAO inhibitors and led to their eventual replacement by uptake inhibitors: the tricyclic antidepressants and, more recently, the selective serotonin reuptake inhibitors (e.g., Prozac). Despite the general lack of interest, Knoll & Magyar persisted in their study of an irreversible MAO inhibitor, l-deprenyl, derived from propargylamine [37-39,45,56]. It is a selective MAO-B inhibitor at low doses and inhibits the oxidative deamination of dopamine, phenylethylamine and benzylamine, but at higher doses its selectivity is lost. The compound was also evaluated as an antidepressant and found to be devoid of the cheese reaction [18,45]. Thus, the development of specific, reversible MAO-B inhibitors could lead to clinically useful neuroprotective agents. Compounds with similar structural motifs can lead to the same biological effect. It is also well accepted that bioactive ligands that bind to a common receptor must fulfill certain chemical and geometric

criteria. For this purpose, a knowledge-based approach builds on another popular technique in drug discovery known as scaffold hopping [57,58], where the goal is to jump in chemistry space, i.e., to discover a new structure starting from a known active compound by modifying the central core of the molecule [59]. In previous studies, a pharmacophore model was generated from thiazole and thiosemicarbazide derivatives, with three hydrogen bond acceptors, one hydrophobic feature and one aromatic ring as features [60]. In the present study, ligand- and structure-based design studies were carried out using Catalyst and Glide to design selective MAO-B inhibitors [61-64]. We considered a diverse set of structural motifs, including thio- and semicarbazides, beta-carbolines, cis- and trans-resveratrol derivatives, eugenol derivatives, pyrazoles, oxazolidinones, phenylcyclopropylamines, and indan derivatives, to describe the knowledge-based design, identification and optimization of a new lead [65-75]. Most of the rationally designed and clinically useful inhibitors (reversible or irreversible) are competitive inhibitors. We included both reversible and irreversible inhibitors in the training set to obtain the optimized features necessary for enzyme inhibition.

Plan of work

This work shows how the chemical features of selective MAO-B inhibitors, together with activities ranging over several orders of magnitude, can be used to generate pharmacophore hypotheses that successfully

predict the activity. The validated pharmacophore model was used to retrieve molecules from a virtual library, and the retrieved hits/leads were further refined using docking studies to reduce the number of false positives and false negatives. This virtual screening approach can be used to identify and design inhibitors with greater selectivity. Catalyst 4.11 software was used to generate the pharmacophore models. The GLIDE docking program (Schrodinger, L.L.C., New York) was used for the structure-based studies.

2.3 Database Mining and Training Set

The compound selection process is shown as a flowchart in Figure 2.4. The pharmacophore model (Hypo2) was used as a query to screen an in-house database of 80,000 compounds that had passed drug-likeness filters. The 5500 compounds retrieved by the search were selected for cluster analysis, and a set of 530 cluster-representative hits with a predicted IC50 value of less than 10 µM was chosen for docking into the active site of MAO-B. Highly scored compounds were then selected for docking analysis in the active site of MAO-A.

Figure 2.4: Schematic representation of the in silico screening protocol implemented in the identification of MAO-B inhibitors.

2.4 Pharmacophore Model Generation and Validation

Pharmacophore modeling correlates activities with the spatial arrangement of various chemical features in a set of active analogues. A set of 70 human MAO-B inhibitors [65-73] with an activity range (IC50) spanning 5 orders of magnitude, i.e., 0.014–980 µM, was selected. Molecules were chosen on the basis of MAO-B inhibitory assays performed under similar experimental conditions. This initial group was then divided into training and test sets. The training set of 22 molecules was designed to be structurally diverse with a wide activity range. The training set molecules play a critical role in the pharmacophore generation process, and the quality of the resulting pharmacophore models depends on them. The test set of the remaining 48 molecules was designed to evaluate the predictive ability of the resulting pharmacophore. Highly active (+++, <20 µM), moderately active (++, 20–50 µM) and inactive (+, >50 µM) compounds were added to the training set to obtain critical information on the pharmacophore requirements for MAO-B inhibition. The molecules selected as the training set are given in Figure 2.5. This training set was then used to generate quantitative pharmacophore models. Qualitative pharmacophore models were generated using a set of highly active molecules. To confirm the essential features prevailing among the MAO-B inhibitors, 10 common feature hypotheses were generated using the most active molecules 1–6 (Figure 2.5). The common features for all 10 hypotheses

are hydrogen bond donor, hydrogen bond acceptor and ring aromatic features. However, these models cannot be used directly to predict the biological activity of compounds retrieved from a database. We therefore generated quantitative pharmacophore models to predict the biological activities of novel compounds. While generating the quantitative model, a minimum of 0 and a maximum of 5 features of the HBA, HBD, and RA types were selected and used to build a series of hypotheses with a default uncertainty value of 3. The quality of HypoGen models is best described by Debnath and Vadivelan [74-77] in terms of fixed cost, null cost, total cost and other statistical parameters. According to them, a large difference between the fixed cost and the null cost, together with a cost difference of 40–60 bits, implies a 75–90% probability of correlation between experimental and predicted activity. In general, pharmacophore models should be statistically significant, predict the activity of molecules accurately, and retrieve active compounds from a database. The derived pharmacophore models were validated using a set of parameters including cost analysis, test set prediction, enrichment factor, and goodness of hit. The HipHop and HypoGen modules within Catalyst were used to generate the qualitative and quantitative pharmacophore models, respectively.
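The cost-based rules of thumb used in this chapter (a 40–60 bit difference from the null cost for a 75–90% chance of true correlation, more than 60 bits for an excellent chance, and a total cost closer to the fixed cost than to the null cost) can be sketched as follows. The function names and the numeric costs in the example are hypothetical; differences below 40 bits are not characterised in the text, so the sketch reports them as such rather than assigning a probability.

```python
def interpret_cost_difference(null_cost, total_cost):
    """Map the (null cost - total cost) difference in bits to the
    qualitative statements quoted in the text."""
    diff = null_cost - total_cost
    if diff > 60:
        return diff, "excellent chance of a true correlation"
    if diff >= 40:
        return diff, "75-90% chance of a true correlation"
    return diff, "below the ranges quoted in the text"


def closer_to_fixed(fixed_cost, null_cost, total_cost):
    """True if the total cost lies nearer the fixed cost than the null cost."""
    return (total_cost - fixed_cost) < (null_cost - total_cost)


# Hypothetical cost values in bits, for illustration only.
fixed, null, total = 80.0, 160.0, 95.0
diff, verdict = interpret_cost_difference(null, total)
print(diff, verdict, closer_to_fixed(fixed, null, total))
```

With these illustrative numbers the difference is 65 bits and the total cost sits 15 bits above the fixed cost, so both criteria are met.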

Figure 2.5: Training set molecules for MAO-B inhibitors. IC50 values of each molecule are given in parentheses: 1 (0.06 µM), 2 (0.08 µM), 3 (2.6 µM), 4 (3.0 µM), 5 (5.7 µM), 6 (8.9 µM), 7 (13.0 µM), 8 (17.0 µM), 9 (19.0 µM), 10 (20.0 µM), 11 (22.0 µM), 12 (25.0 µM), 13 (30.0 µM), 14 (38.0 µM), 15 (40.0 µM), 16 (50.0 µM), 17 (61.0 µM), 18 (65.0 µM), 19 (88.0 µM), 20 (92.0 µM), 21 (140.0 µM), 22 (240.0 µM).

2.5 Results and Discussion

The best common feature pharmacophore model [68] indicated the importance of H-bond acceptor (HBA), H-bond donor (HBD) and ring aromatic (RA) features, which were further confirmed in the quantitative models. Several quantitative models were generated using the training set (1–22) along with the MAO-B inhibitory activities (Figure 2.5 and Table 2.2). The top ten hypotheses were composed of HBA, HBD, and RA features. The values of the ten hypotheses, such as cost, correlation (r), and root-mean-square deviation (rmsd), are statistically significant (Table 2.3). Because the error, weight and configuration components are very low and do not dominate the model, the total pharmacophore cost is also low and close to the fixed cost. In addition, since the total cost is less than the null cost, this model accounts for all the pharmacophore features and has good predictive ability. Besides estimating the activity of the training set molecules, the pharmacophore model should also accurately predict the activity of the test set molecules. Two statistical methods were employed to rank the ten resulting hypotheses. In the first method, all ten hypotheses were evaluated using a test set of 48 known MAO-B inhibitors that are not included in the training set. Predicted activities of the test set were calculated using all ten hypotheses and correlated with the experimental activities. Of the ten hypotheses, Hypo2 showed a better correlation coefficient (0.945) than the other nine. The second statistical test includes the calculation of false positives, false negatives, enrichment, and

goodness of hit to determine the robustness of the hypotheses. Under all validation conditions, Hypo2 performed better than the other nine hypotheses. Hypo2 demonstrated excellent prediction of the MAO-B inhibitory activities of the training set compounds (Table 2.2). All 9 highly active molecules were correctly predicted as highly active. Among the 6 moderately active molecules, one was predicted as highly active and the rest were predicted correctly. Out of the 6 low active molecules, 3 were predicted as moderately active and the rest were predicted as low active. Not only were the activities of the compounds correctly predicted, but the fit values also give a good measure of how well the pharmacophoric features of Hypo2 map onto the chemical features of the compounds. Figure 2.6A shows the HypoGen pharmacophore features with their geometric parameters. All features of Hypo2 (HBA, HBD and RA) were mapped onto the highly active compound 2 of the training set as well as the moderately active compound 8, shown in Figure 2.6B and Figure 2.6C respectively. Many of the low active compounds in the training set (e.g., 22) were mapped only partially by the features of Hypo2 (Figure 2.6D). The correlation values, together with the predictions above, make the pharmacophore model suitable for activity prediction. The plot showing the correlation between the actual and predicted activities for the test set and the training set molecules is given in Figure 2.7.
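The comparison of experimental and predicted activity classes described above can be sketched in a few lines. The thresholds follow the classification used in this chapter (<20 µM highly active, 20–50 µM moderately active, >50 µM inactive); treating 20 µM and 50 µM themselves as moderately active is an assumption about the boundaries. The function name is hypothetical, and the four example rows are values for molecules 10, 11, 17 and 19 as they appear in Table 2.2.

```python
def activity_class(ic50_um):
    """Assign an activity class from an IC50 value in µM."""
    if ic50_um < 20:
        return "highly active"
    if ic50_um <= 50:  # boundary handling is an assumption
        return "moderately active"
    return "inactive"


# (molecule, experimental IC50, predicted IC50) in µM, from Table 2.2.
rows = [(10, 20, 13), (11, 22, 24), (17, 61, 38), (19, 88, 190)]

for molecule, experimental, predicted in rows:
    exp_class = activity_class(experimental)
    pred_class = activity_class(predicted)
    status = "match" if exp_class == pred_class else "mismatch"
    print(molecule, exp_class, pred_class, status)
```

Molecule 10 illustrates a moderately active compound predicted as highly active, and molecule 17 an inactive compound predicted as moderately active.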

The purpose of pharmacophore model generation is not only to predict the activity of the training set compounds accurately but also to verify whether the models can predict the activities of external test set compounds and classify them correctly as active or inactive. The molecules were classified as highly active (IC50 < 20 µM), moderately active (IC50 20–50 µM), and inactive (+, or IC50 > 50 µM). Hypo2 was used to search the test set of known MAO-B inhibitors. Database mining was performed using the BEST flexible searching technique. The results were analyzed using a set of parameters: hit list (Ht), percent yield of actives (%Y), percent ratio of actives in the hit list (%A), enrichment factor (E), false negatives, false positives, and goodness of hit score (GH) (Table 2.4) [78].

Of the 38 hits retrieved by Hypo2, 32 (84%) were active, which corresponds to 94% of the 34 active compounds in the database. The pharmacophore also retrieved 6 inactive compounds (false positives) and predicted 2 active compounds as inactive (false negatives). An enrichment factor of 1.76 and a GH score of 0.73 indicate the quality of the model. Overall, a strong correlation was observed between the Hypo2-predicted activity and the experimental MAO-B inhibitory activity (IC50) of the training and test set compounds (Figure 2.7). However, the Hypo2 model produces more false positives than false negatives (6 versus 2). This could be attributed to the high structural similarity between active and inactive MAO-B inhibitors, which the pharmacophore model is unable to discriminate. We therefore extended this study to structure-based design, both to limit the number of false positive and false negative hits and to further understand the binding of inhibitors to the active site of the MAO-B complex.

Table 2.2: Experimental and predicted IC50 data of the 22 training set molecules against the Hypo2 model.

Molecule | Exp. IC50, µM | Predicted IC50, µM | Error a | Fit value b | Experimental scale c | Predicted scale c
5 | 6.7 | 9.8 | 1.5 | 3.73 | |
6 | 8.9 | 19 | 2.7 | 3.34 | |
7 | 13 | 14 | 1.1 | 3.57 | |
8 | 17 | 8.2 | -6 | 4.27 | |
9 | 19 | 14 | -1.3 | 3.57 | |
10 | 20 | 13 | -1.6 | 3.62 | |
11 | 22 | 24 | 1.1 | 3.35 | |
12 | 25 | 21 | -1.2 | 3.4 | |
13 | 30 | 25 | -1.2 | 3.32 | |
14 | 38 | 21 | -1.8 | 3.41 | |
15 | 40 | 35 | -2 | 3.43 | |
16 | 50 | 21 | -2.4 | 3.41 | |
17 | 61 | 38 | -1.6 | 3.14 | + |
18 | 65 | 34 | -1.9 | 3.2 | + |
19 | 88 | 190 | 2.15 | 2.26 | + | +
20 | 92 | 39 | -2.4 | 3.14 | + |
21 | 140 | 210 | 1.5 | 2.24 | + | +
22 | 240 | 148 | -5 | 3.05 | + | +

Entries for molecules 1–4 (column assignment unclear in the source): 1 0.06 4.81 2 2.6 5.41 7.2 5.9 3.54 3 11.2 6.8 3.41; 0.114 2.4 0.08 0.11 3 2.6 4

a A positive (+) error indicates that the predicted IC50 is higher than the experimental IC50; a negative (−) error indicates that the predicted IC50 is lower than the experimental IC50; a value of 1 indicates that the predicted IC50 is equal to the experimental IC50.
b The fit value indicates how well the features in the pharmacophore overlap the chemical features in the molecule. Fit = weight × max(0, 1 − SSE), where SSE = (D/T)^2, D = displacement of the feature from the center of the location constraint, and T = radius of the location constraint sphere for the feature (tolerance).
c Activity scale: IC50 < 20 µM = highly active; IC50 20–50 µM = moderately active; IC50 > 50 µM = + (low active).

Table 2.3: Ten pharmacophore models generated by HypoGen for MAO-B inhibitors.

Hypo No. | Total cost | Cost difference a | Error cost | RMS deviation | Training set (r) | Features b
1 | 100.46 | 55.42 | 83.92 | 0.941 | 0.772 | HBA, HBD, RA
2 | 99.04 | 56.84 | 82.42 | 0.875 | 0.961 | HBA, HBD, RA
3 | 99.58 | 56.3 | 82.8 | 0.894 | 0.892 | HBA, HBD, RA
4 | 99.64 | 56.24 | 82.87 | 0.898 | 0.851 | HBA, HBD, RA
5 | 100.06 | 55.82 | 83.23 | 0.916 | 0.812 | HBA, HBD, RA
6 | 100.11 | 55.77 | 83.54 | 0.931 | 0.783 | HBA, HBD, RA
7 | 100.77 | 55.11 | 84.11 | 0.958 | 0.769 | HBA, HBD, RA
8 | 100.99 | 54.89 | 83.45 | 0.927 | 0.79 | HBA, HBD, RA
9 | 101.12 | 54.76 | 84.36 | 0.97 | 0.763 | HBA, HBD, RA
10 | 101.19 | 54.69 | 83.35 | 0.922 | 0.793 | HBA, HBD, RA

Hypo2 showed a better correlation coefficient (0.945) compared to the other nine hypotheses.
a Cost difference = null cost − total cost. Null cost = 155.88; fixed cost = 90.53. For Hypo2, weight = 1.28 and configuration = 15.41. All cost units are in bits.
b HBA, hydrogen bond acceptor; HBD, hydrogen bond donor; RA, ring aromatic.
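The cost criteria used to compare the hypotheses in Table 2.3 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the null cost, fixed cost, and total costs from Table 2.3, and the function names (cost_difference, gap_to_fixed, summarize) are hypothetical helpers, not part of HypoGen.

```python
# Costs in bits, copied from Table 2.3.
NULL_COST = 155.88
FIXED_COST = 90.53
TOTAL_COSTS = {
    1: 100.46, 2: 99.04, 3: 99.58, 4: 99.64, 5: 100.06,
    6: 100.11, 7: 100.77, 8: 100.99, 9: 101.12, 10: 101.19,
}


def cost_difference(total_cost, null_cost=NULL_COST):
    """Null cost minus total cost; a larger value favours the hypothesis."""
    return round(null_cost - total_cost, 2)


def gap_to_fixed(total_cost, fixed_cost=FIXED_COST):
    """Total cost minus fixed cost; a smaller value is closer to the ideal."""
    return round(total_cost - fixed_cost, 2)


def summarize(total_costs=TOTAL_COSTS):
    """Return (hypothesis, cost difference, gap to fixed cost), lowest total cost first."""
    rows = [(h, cost_difference(c), gap_to_fixed(c)) for h, c in total_costs.items()]
    return sorted(rows, key=lambda row: total_costs[row[0]])


if __name__ == "__main__":
    for hypo, diff, gap in summarize():
        print(f"Hypo{hypo}: cost difference = {diff:.2f}, above fixed cost = {gap:.2f}")
```

Sorting by total cost places Hypo2 first, with the largest cost difference (56.84 bits) and the smallest gap to the fixed cost, in line with the tabulated values.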

Figure 2.6: Pharmacophore model for MAO-B inhibitors. (A) Three-dimensional arrangement of pharmacophore features in the quantitative pharmacophore model (Hypo2). Pharmacophore features are: H-bond acceptor (HBA) in green, ring aromatic (RA1) in orange, and H-bond donor (HBD) in magenta. (B) Mapping of Hypo2 onto the highly active compound 2. (C) Mapping of Hypo2 onto the moderately active compound 8. (D) Mapping of Hypo2 onto the inactive compound 22.
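The fit value that quantifies these mappings follows the expression given in the footnote to Table 2.2. The sketch below is a minimal illustration under stated assumptions: the per-feature term is taken directly from that footnote, while summing the terms over all mapped features is an assumption about how the per-feature values combine. The weights, displacements, and tolerances in the example are hypothetical and are not taken from Hypo2.

```python
def feature_fit(weight, displacement, tolerance):
    """Per-feature fit: weight * max(0, 1 - SSE), with SSE = (D / T) ** 2."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    sse = (displacement / tolerance) ** 2
    return weight * max(0.0, 1.0 - sse)


def fit_value(mapped_features):
    """Sum the per-feature fits (assumed combination rule).

    mapped_features is an iterable of (weight, displacement, tolerance) tuples.
    """
    return sum(feature_fit(w, d, t) for w, d, t in mapped_features)


if __name__ == "__main__":
    # Hypothetical example: three features with weight 1.5 and tolerance 1.6.
    example = [
        (1.5, 0.0, 1.6),  # perfectly centred feature contributes its full weight
        (1.5, 0.8, 1.6),  # displaced by half the tolerance contributes 75% of its weight
        (1.5, 2.0, 1.6),  # outside the tolerance sphere contributes nothing
    ]
    print(f"fit = {fit_value(example):.3f}")
```

A feature lying at the center of its location constraint contributes its full weight, and the contribution falls to zero once the displacement reaches the tolerance radius, which is why partially mapped compounds receive lower fit values.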

Figure 2.7: Scatter plot showing the correlation between experimental and Hypo2-predicted activities of known MAO-B inhibitors. Axes: experimental pIC50 (x) versus predicted pIC50 (y), both spanning 3 to 9. Legend: training set (22 molecules, r = 0.954); test set (48 molecules, r = 0.945).
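The correlation shown in Figure 2.7 is computed on the pIC50 scale. The sketch below illustrates the conversion and a Pearson correlation, assuming pIC50 = −log10(IC50 in mol/L), which gives 6 − log10(IC50 in µM). The data are only a small subset of Table 2.2 (molecules 5, 6, 7, 11, 19, and 21), so the result is not expected to reproduce the r values reported in the figure. The function names are hypothetical.

```python
import math

# (experimental IC50, predicted IC50) in µM for molecules 5, 6, 7, 11, 19, 21 of Table 2.2.
SUBSET_UM = [
    (6.7, 9.8), (8.9, 19.0), (13.0, 14.0),
    (22.0, 24.0), (88.0, 190.0), (140.0, 210.0),
]


def pic50_from_micromolar(ic50_um):
    """Convert an IC50 in µM to pIC50, assuming pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_um)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    experimental = [pic50_from_micromolar(e) for e, _ in SUBSET_UM]
    predicted = [pic50_from_micromolar(p) for _, p in SUBSET_UM]
    print(f"r (subset of 6 molecules) = {pearson_r(experimental, predicted):.3f}")
```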

Table 2.4: Statistical parameters from screening of training and test set molecules.

S. No. | Parameter | MAO-B
1 | Total molecules in database (D) | 70
2 | Total number of actives in database (A) | 34
3 | Total hits (Ht) | 38
4 | Active hits (Ha) | 32
5 | % Yield of actives [(Ha/Ht) × 100] | 84.21
6 | % Ratio of actives [(Ha/A) × 100] | 94.12
7 | Enrichment factor (E) [(Ha × D)/(Ht × A)] | 1.76
8 | False negatives [A − Ha] | 2
9 | False positives [Ht − Ha] | 6
10 | Goodness of hit score (GH) $ | 0.73

$ GH = [Ha(3A + Ht)/(4 × Ht × A)] × [1 − (Ht − Ha)/(D − A)]; a GH score of 0.7–0.8 indicates a very good model.
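The screening statistics in Table 2.4 follow directly from four counts: D (molecules in the database), A (actives in the database), Ht (total hits), and Ha (active hits). The sketch below evaluates the bracketed formulas from the table; the function name screening_metrics and the dictionary keys are hypothetical. With the tabulated counts, the formulas as written give E of about 1.73 and GH of about 0.72, close to but not identical to the tabulated 1.76 and 0.73; the small differences may stem from rounding or from inputs not shown here.

```python
def screening_metrics(D, A, Ht, Ha):
    """Evaluate the hit-list statistics of Table 2.4 from raw counts.

    D  = total molecules in the database
    A  = total actives in the database
    Ht = total hits retrieved
    Ha = active hits retrieved
    """
    if min(D, A, Ht) <= 0 or D <= A:
        raise ValueError("need D > A > 0 and Ht > 0")
    return {
        "yield_pct": Ha / Ht * 100,           # % yield of actives
        "ratio_pct": Ha / A * 100,            # % ratio of actives
        "enrichment": (Ha * D) / (Ht * A),    # enrichment factor E
        "false_negatives": A - Ha,
        "false_positives": Ht - Ha,
        # Goodness of hit: [Ha(3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)]
        "gh": (Ha * (3 * A + Ht)) / (4 * Ht * A) * (1 - (Ht - Ha) / (D - A)),
    }


if __name__ == "__main__":
    metrics = screening_metrics(D=70, A=34, Ht=38, Ha=32)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```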