Evaluation of Different Methods for Identification of Structural Alerts Using Chemical Ames Mutagenicity Data Set as a Benchmark

This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

pubs.acs.org/crt

Hongbin Yang, Jie Li, Zengrui Wu, Weihua Li, Guixia Liu, and Yun Tang*
Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai , China

Supporting Information

ABSTRACT: Identification of structural alerts for toxicity is useful in drug discovery and other fields such as environmental protection. With structural alerts, researchers can quickly identify potentially toxic compounds and learn how to modify them. Hence, it is important to determine structural alerts from a large number of compounds quickly and accurately. Many methods have already been reported for identification of structural alerts. However, how to evaluate those methods remains a problem. In this paper, we evaluated four methods for monosubstructure identification using three indices (accuracy rate, coverage rate, and information gain) to compare their advantages and disadvantages. The Kazius Ames mutagenicity data set was used as the benchmark, and the four methods were MoSS (graph-based), SARpy (fragment-based), and two fingerprint-based methods, Bioalerts and the fingerprint (FP) method we previously used. The results showed that Bioalerts and FP could detect key substructures with high accuracy and coverage rates because they allowed unclosed rings and wildcard atom or bond types. However, they also produced redundant substructures, so their predictive performance was not as good as that of SARpy.
SARpy was competitive in predictive performance in both training set and external validation set. These results might be helpful for users to select appropriate methods and further development of methods for identification of structural alerts.

Received: March 27, 2017 Published: May 9, American Chemical Society

INTRODUCTION

Chemical toxicity is an important issue in drug discovery and environmental risk assessment. In silico prediction of chemical toxicity is less expensive and time-saving compared to experimental in vitro or in vivo assays. 1 Although machine learning methods, such as support vector machine (SVM) and random forest (RF), have obtained high predictive accuracy, they are considered as black boxes. Interpretable models are more significant, friendly, and acceptable for researchers in medicinal chemistry. 2 Structural alerts (SA), like an expert system, can help identify key substructures to certain toxicity and avoid potential toxic compounds in very early stages. It has been widely applied not only in drug discovery, 3,4 but also in other fields such as cosmetic research and environmental protection. 5

Structural alerts are defined as chemical substructures, whose presence may be related to the capability of a substance to cause certain adverse effects to organs. 6 Chemical mutagenicity is one of the most widely studied end points for structural alerts. It is usually determined via the famous Ames test, named after the inventor Prof. Bruce Ames, to detect potential mutagens. 7 The qualitative reproducibility of the Ames test among laboratories is generally good. 8 As early as in 1988, Ashby found strong associations between chemical structures and their mutagenicity to Salmonella and suggested 11 substructures for alerts. 9 Although only a few substructures were identified from a small data set by Ashby, it was widely accepted and developed. 10,11 Afterward, several toxicity end points and structural types were intensively explored by experts, and their limitations such as applicability domain were also studied. 12,13 Therefore, detection of potential substructures for alert is highly desirable, which could quickly identify toxic compounds from a large data set and guide drug design.

Initially, experts identified structural alerts through exploring structure activity relationships (SARs) of toxic compounds 14 to create an expert system. Derek Nexus 15 and Genetox Expert Alerts from LeadScope 16 are two representatives of commercial expert systems to predict toxicity. After that, automated detection of structural alerts has become a hotspot in computational toxicology, and there are several excellent methods and toolkits available. 17,18

In general, methodologies implemented to determine structural alerts may be roughly classified into fragment-based, graph-based, and fingerprint-based approaches (Figure 1). The fragment-based approach first cuts the bonds of each compound to get all possible fragments and then calculates the frequency of the fragments occurring in toxic and nontoxic

compound sets. For example, CASE 19 uses KLN 20 code, a linear expression for a compound, as descriptors to detect SAs. CatSAR 21 uses HQSAR, a module implemented in the commercial software SYBYL from Tripos Inc., to generate all possible substructures. Ferrari et al. developed a software tool named SARpy that can extract structural alerts from a data set by fragmentation of all the compounds. 22 Golbamaki et al. explored new clues on carcinogenicity-related substructures via SARpy. 23,24

Figure 1. Three categories of approaches implemented to determine structural alerts. (A) Fragment-based methods that cut all possible bonds to obtain substructures. (B) Graph-based methods that search subgraphs by a levelwise algorithm. (C) Fingerprint-based methods that regard the predefined substructures as potential structural alerts.

The graph-based approach defines chemical molecules as mathematical graphs that consist of a set of vertices and edges, which respectively represent the atoms and bonds of compounds. The goal of detecting substructures is then converted into searching subgraphs. 25 MolFea, 26 a linear substructure mining program, utilizes the Levelwise Version Space algorithm 27 to handle minimum and maximum frequency constraints. Nevertheless, the shortcoming of MolFea is obvious because it only mines linear substructures. This weakness was overcome by a popular algorithm named MoSS, 28 which utilizes depth-first search association rules to mine substructures, taking side chains and cycles into account. Gaston 29 is another graph-based mining algorithm, which divides substructures into three subsets, namely paths, trees, and cycles. It allows user-defined node types such as [N,O], which indicates nitrogen or oxygen. Ahlberg et al. 30 proposed a new graph-based mining method that uses Atom Signature as descriptor. Atom Signature, suggested by Faulon et al., 31 is a linear expression of a compound and can be calculated via the Chemistry Development Kit. 32

Molecular fingerprints can be viewed as a set of predefined fragments. 33 For example, the MDL MACCS (Molecular ACCess System) 34 fingerprint contains 166 public keys that represent the most common substructures. Thus, they can be used to detect potential structural alerts via calculation of their frequencies in a data set.

All these methodologies have their advantages and defects in computing performance, predictive accuracy, or chemical space. They have been widely used in toxicological research and drug discovery. However, to the best of our knowledge, no study has yet compared and evaluated these identification methods with a benchmark data set. In this paper, we compared four typical methods among the three types of approaches, namely MoSS (graph-based), SARpy (fragment-based), Bioalerts, and FP (both fingerprint-based), using a chemical Ames mutagenicity data set as the benchmark. We evaluated them from three aspects: (1) to assess the substructures identified by each method and analyze their rankings; (2) to compare the substructures with previous findings; and (3) to package the substructure sets detected by each method as rule-based prediction models to evaluate their predictive performance. We then analyzed the process of the detection systems in depth and summarized their defects, which might be helpful for users to select appropriate methods and for further development of methods for identification of structural alerts.

MATERIALS AND METHODS

Data Preparation. The most popular data set for chemical Ames mutagenicity is Hansen's benchmark, which contains 6512 compounds. 39 The Kazius data set, 40 a subset of Hansen's benchmark, is also commonly used. To ensure compatibility, the Kazius data set was used as the benchmark here to compare different mining methods. The preparation of the data set was done by Kazius. 41 To validate the robustness of the methods, the rest of Hansen's data set (after removing the compounds in the Kazius data set) was used as the external validation set to further evaluate the mining methods. OpenBabel 42 was used to convert all the compounds into canonical SMILES format, which can be directly used to mine structural alerts.

Methods for Identification of Structural Alerts. Four detecting methods were used in this study, namely MoSS, SARpy, Bioalerts, and FP (fingerprint). Mostly the substructures were described as SMILES, graphs, or fragments that only contain basic information on atoms and bonds. In FP, the substructures were described as SMARTS, an extension of SMILES, which contains more information about the atoms and bonds and allows the use of wildcard atom or bond types. SMARTS-based structural alerts are more diverse but more complicated.

MoSS. MoSS is a graph-based method for structural alerts that generates fragments by embedding them in all molecules in parallel throughout the growth process. 28 This search strategy allows for a restricted depth-first search algorithm, which results in excellent

computing performance and theoretically covers all chemical space. MoSS was implemented in KNIME (V2.10.4). 43 We set the fragment size between 2 and 15, the minimum focus support (true positive rate) as 2%, and the maximum complement support (false positive rate) as 10%; other parameters were set as default.

Figure 2. (A) Ideal substructure that separates the compounds into two subsets with lower entropy. The red part indicates the number of active (toxic) compounds, whereas the green part represents the inactive compounds. (B) Entropy as a function of p, the proportion of active (or inactive) compounds. (C) Less desirable substructure: although the active rate in the positive set is high, the matching compounds are too rare, so the overall entropy changes little; that is, the IG is low.

SARpy. SARpy is a free standalone tool that uses string mining to create substructures from the SMILES notation of training compounds. 22,23 Different from other methods, only entire branches or entire cycles are considered as potential structural alerts. SARpy evaluates existing substructures by likelihood ratio, a measure of precision intrinsic to the test. In the implementation of SARpy, the atom number was set between 2 and 15, the precision was set to OPTIMAL, and only positive rules (mutagens) were extracted.

Bioalerts. Bioalerts is a Python library for the derivation of structural alerts from toxicity data sets. It is a fingerprint-based method, which mainly relies on the RDKit 44 implementation of Morgan fingerprints. 45 P-value is used in Bioalerts to evaluate the significance of a substructure. In Bioalerts, threshold_frequency was set to 0.75, and other parameters were set to default (p-value 0.05, nb 5). The structural alerts mined by Bioalerts were converted into SMARTS by RDKit.

FP. The FP method was composed of three steps: (1) calculation of fingerprints; (2) removal of redundant substructures; and (3) assessment of potential structural alerts and removal of redundant ones. PaDEL-Descriptor (V2.20) 46 was used to calculate the fingerprints of MACCS (166 bits), PubChem (881 bits), 47 Klekota-Roth (4860 bits), 48 and Function Group Substructure (307 bits, classified by Christian Laggner and defined in OpenBabel). After the fingerprints of training compounds were calculated, we could get a matrix that shows the relationships between the compounds and the substructures defined in the fingerprint dictionary. Direct deletion of the duplicated substructures is impractical since we could not automatically determine whether two substructures represented by SMARTS are equivalent. Therefore, we first verified whether the sets of compounds matching the two substructures were the same, then manually judged whether they were the same substructure. Subtle differences between two substructures were ignored, and we regarded them as the same to avoid redundancy. For example, MACCS63 - [#7]=[#8], KR [!#1]N O, and KR N O all indicate a double bond between a nitrogen and an oxygen. Though the meanings of the three SMARTS were different, they were considered to be the same, and the latter two were removed. We assessed and refined them through a threshold of ACC (accuracy rate) >

Evaluation of Substructures and Predictive Models. Indices of Substructures for Assessment. Suppose that molecules can be categorized as mutagen or nonmutagen based on a binary property X, which has two possible values (1 means mutagen and 0 means nonmutagen). For a potential structural alert T (a substructure identified by one of the four methods), we used condition $t$ to denote compounds that contain the substructure and $\bar{t}$ to denote compounds that do not contain the substructure.
ACC is the ratio of the number of mutagens that contain a certain substructure to the number of all compounds that contain the substructure (eq 1). Positive rate, also called coverage rate, is the probability that a compound contains the substructure (eq 2):

$$\mathrm{ACC\ (accuracy\ rate)} = P(X = 1 \mid t) \tag{1}$$

$$\mathrm{PR\ (positive\ rate)} = P(t) \tag{2}$$

where $P(A)$ means the probability of observing A, and $P(A \mid B)$ is the probability of observing event A given the condition B.

Besides the assessment methods mentioned earlier, another more complicated method called information gain (IG) was also used to assess a substructure. It is used in decision tree classification models to measure the relevance of an attribute to a category in data mining. 49 Shen et al. used this method to evaluate the importance of the substructures in the MACCS fingerprint and labeled the highest-scoring substructures as structural alerts. 33 IG is defined as the difference between the information entropy of the original data set and the weighted average information entropies of the two data sets separated

by a substructure. 50 Eq 3 shows the definition of information entropy, in which p means the probability of molecules in one category. Eq 4 shows the formulation of IG used in this study. An ideal substructure whose IG is very high can separate the data set into two: one with a high positive rate and one with a high negative rate (Figure 2A), in which the two separated data sets have a very low information entropy compared to the original data set. Information entropy as a function depends on the positive rate of a data set. When the data set is the most disordered, that is, positive rate equals 0.5, the entropy equals 1. When the data set is the most regular, that is, positive rate equals 0 or 1, the entropy equals 0. The function tendency is shown in Figure 2B. A substructure with high IG should not only have high ACC, but also be common enough (high PR). A substructure that has low PR with high ACC (Figure 2C) will get a low IG score:

$$E(p) = -p \log_2(p) - (1 - p) \log_2(1 - p) \tag{3}$$

$$\mathrm{IG} = E(P(X = 1)) - P(t)\,E(P(X = 1 \mid t)) - P(\bar{t})\,E(P(X = 0 \mid \bar{t})) \tag{4}$$

We calculated the three indices, IG, ACC, and PR, of all the potential structural alerts detected by the mentioned methods. OpenBabel and its Python wrapper Pybel 51 were used to match SMARTS with compounds and verify equality of two compounds.

Performance Evaluation of Predictive Models. For each predictive model, a sorted list of its structural alerts was generated. The structural alerts were sorted by either ACC or IG. Two indicators, positive coverage rate (PCR) and negative coverage rate (NCR), were calculated to show the accuracy and robustness of the models:

$$\mathrm{PCR}(L) = P(X = 1 \mid C_L) \tag{5}$$

$$\mathrm{NCR}(L) = P(X = 0 \mid \bar{C}_L) \tag{6}$$

where $C_L$ means the compounds that contain at least one substructure in the top L places of the potential structural alert list, and $\bar{C}_L$ means the compounds that contain none of the top L structural alerts.

Comparison of Methods with Previous Work. The substructures identified by the four methods were then compared with the structural alerts reported in previous publications. ToxAlerts 52 collected several structural alerts for genotoxic carcinogenicity and mutagenicity from publications including Kazius et al., 40 Benigni et al., 53 Bailey et al., 54 and Ashby. 7 We then collected these structural alerts and manually discarded the redundant patterns. If two alerts were similar, we selected the more general one. For example, there were three aromatic nitro alerts in ToxAlerts. Two of them were detected by Kazius, and the other was defined by Benigni et al. 53 We selected the most general pattern, which is [a!r0][$([nx3+](=[ox1])([o-]))]. In addition, Kazius previously detected structural alerts with Gaston from the same data set as we used. 41 Therefore, we compared the results from Kazius with all the substructures identified by the four methods here and ranked these substructures by IG values.

RESULTS

Data Sets and Substructures Obtained. In this study, to evaluate and compare the methods for identification of structural alerts, a chemical Ames mutagenicity data set was first selected as the benchmark. The key substructures for chemical mutagenicity were then obtained by those methods and used for evaluation and comparison of those methods. The Kazius data set of chemical Ames mutagenicity, used as the training set, contained 4069 distinct compounds, of which 2294 were labeled as mutagens and 1775 as nonmutagens. The external validation set, the rest of Hansen's data set (after removing the compounds in the Kazius data set), was used to evaluate the performance of the methods; it consisted of 3363 compounds, including 1736 mutagens and 1627 nonmutagens. On the basis of the Kazius data set, 129 frequent substructures were obtained from MoSS, while 130 potential structural alerts were found from SARpy.
Bioalerts detected 1094 patterns, from which we selected 395 unique substructures whose IG values were higher than In FP method, the fingerprints output 6214 bits in total. Then we retained the fragments whose accuracy was higher than 0.75 and discarded the nonsubstructure patterns (e.g., MACCS49 is [!+0]). Finally, 173 patterns were left after manual screening because many patterns were similar (Table S1).

Distribution of Substructures with Three Indices. ACC, PR, and IG are the three most important indices that determine the capability of a substructure to predict the toxicity of a compound. These three indices were hence used to evaluate the performance of the four methods. After calculations, the three indices of each substructure were plotted in Figure 3. Although both the ACC and PR values of a substructure were expected to be close to 1, it was nearly impossible to get a substructure for which the ACC and PR values were both very high. As the ACC value approached 1, the maximum PR value decreased. The border of PR (regarding ACC as the independent variable) formed a curved virtual boundary. The closer a substructure was to the virtual boundary, the higher its IG value. The highest IG value was and the corresponding ACC = 0.809, PR = As the ACC value became higher or lower (while the PR value became lower or higher), the IG value decreased.

Figure 3. Distribution of the substructures with their ACC and PR and colored by IG values.

Figure 4 shows the distribution of the substructures detected by each method. The FP method had the highest average IG value, , and the highest average PR value. The ACC values of most of its substructures distributed between The average ACC value from FP was not as high as those from Bioalerts and SARpy. Bioalerts had the highest average ACC value, 0.941, and its IG value was also high.
The average PR value of the substructures detected by Bioalerts was not high because most of its substructures distributed PR between 0 and However, considering the absolute number of substructures with high PR values (>0.1), Bioalerts was still outstanding. As for SARpy, though it did not remove substructures with low ACC values (<0.75), most of the substructures it detected had high ACC values, while the IG

values were very low because of the lower PR values. Substructures detected by MoSS were the lowest in both ACC and IG values, though the average PR value was high. From a general view, Bioalerts was the best detecting method, and the FP method focused more on PR, while SARpy preferred ACC. The average IG value of FP was higher than that of SARpy.

Figure 4. Statistics of the substructures detected by each method with their ACC, PR, and IG values. The legend in each subgraph displays the average index of each method.

Figure 5. Performance curves of the four models in training set and comparison with other models in external validation set. The three commercial models (PipelinePilot, MultiCASE, and DEREK) were evaluated in five-fold cross-validation by Hansen et al. 39

Capability and Performance of the Four Methods. To evaluate the capability of the methods, three aspects were investigated. At first, we inspected the performance of the rule-based models using the substructures detected by the four methods. Figure 5 shows the performance of the four models in both the training set and the external validation set. SARpy achieved the highest accuracy of the four methods in both the training set and the validation set, which illustrated that the structural alerts identified by SARpy could help predict the toxicity of a compound due to their high accuracy and interpretability. The performances of the FP method and Bioalerts were similar, and the FP method had a higher maximum coverage rate (89.4% for the training set and 86.2% for the validation set). This implied that the substructures obtained by the FP method were more common than the others. As for MoSS, its accuracy was the lowest among the methods in both the training set and the validation set. The performances of the four methods in the validation set were compared with the previous models made by Hansen et al. (Figure 5).
The results showed that PipelinePilot, based on molecular fingerprints and machine learning methods, performed the best. MultiCASE was similar in performance to the four methods we used, and all these methods were rule-based methods that automatically detect structural alerts in a data set. The expert system DEREK performed the worst compared to other methods. Among the five structural alerts detection methods, SARpy performed the best in prediction.

Then we compared the substructures identified by the four methods with those in publications. From ToxAlerts, we collected 46 patterns represented in SMARTS format to compare with the structural alerts we obtained. All the patterns are listed as SMARTS in Table S1. The comparison results of the four methods with patterns from ToxAlerts as references are shown in Table 1. Twenty patterns were matched by at least one method. SARpy matched the most patterns compared to the other three methods, and plus score (indicating the substructures were more specific than

ToxAlerts) was also the highest, which could explain why SARpy obtained the highest performance in prediction. The FP method had the highest minus score (indicating the substructures were more common than those in ToxAlerts). The substructures identified by FP were common enough that the maximal positive coverage rate was higher, and the accuracy was a little lower in comparison with Bioalerts. Though MoSS could detect the azo group and fully matched 10 patterns from ToxAlerts, which was similar to other methods, the number of missing structural alerts was also high, so the accuracy of this method was unsatisfactory.

Table 1. Comparison of the Four Methods with the Patterns from ToxAlerts a

a V means completely matched or almost the same. X means entirely missing. + means the substructures mined by the method were more specific compared to ToxAlerts. ++ or +++ indicates that the number of the relevant substructures was high or higher. − means the substructures were more common compared to ToxAlerts, and −− means much more common substructures. For example, Cl * is much more common than C C Cl.

In terms of the time consumed by each method, we made a simple comparison. As illustrated in Table 2, the graph-based detecting method MoSS ran the fastest (only a few seconds, which might be ignored), whereas the fingerprint-based method FP performed the slowest among the four methods. The results illustrated that graph-based methods have advantages in computational performance and may be better suited to detecting alerts in a big data set. Nevertheless, in this study, all these methods were acceptable in performance since the slowest one took only about 40 min. Therefore, when alerts are detected in a small data set with fewer compounds, all four methods could be used.

Table 2. Time Consuming of Different Methods
method; substructure mining (sec): fragmentation or FP calculation, evaluation (sec); sum (sec)
MoSS <3
SARpy
FP 2337 (about 39 min)
Bioalerts

Key Structural Alerts for Chemical Mutagenicity. Since the chemical Ames mutagenicity data set was used as the benchmark to evaluate the four methods, some key structural alerts for chemical mutagenicity were obtained. Ranked by IG values, the top substructures identified by the four methods we used and by Kazius are shown in Table 3. These patterns were mainly divided into two categories, nitrogenous functional groups and fragments in polyaromatic compounds. Among them, N O was shared by all four methods and the Kazius method. Others were mostly detected by the FP method and Bioalerts, and they were very similar, with only subtle differences. For example, the third pattern obtained by Bioalerts, cc(c)c, allows both biphenyl and condensed rings (naphthalene derivatives). The 11th pattern, [#6](:c)(:c)(:c), defined in the PubChem fingerprint, requires that compounds contain condensed rings.

DISCUSSION

Analysis of Different Methodologies for Structural Alerts. All four methods we compared in this study assess a potential structural alert based on the occurrence of the substructure in a data set. From the aspect of statistics, a large data set (hundreds or thousands of compounds) is required to make the frequency of the substructures significant. In contrast, it is difficult to learn the rules from a small data set by computational methods, and empirical rules and expert systems may be more helpful. Most detecting methods for structural alerts regarded ACC and its analogs as the most important factor. For example, SARpy sorted the substructure candidates based on likelihood ratio, which is monotonically increasing as ACC increases. Another similar assessment method called enrichment factor 55 was also used in this field to rank the substructures. 1,38,56 Other studies implemented accuracy or p-value, which is also similar; 30 that is, all these evaluation indicators are monotonically correlated. MoSS uses two thresholds, min

supporting rate and max complemental rate, to extract the potential structural alerts. If the thresholds are too narrow, very few substructures will be obtained; if they are too wide, there will be too many, resulting in false positive substructures. It is more sensible to widen the thresholds and use an additional assessment process to enrich the structural alerts.

Table 3. Key Structural Alerts Detected by Different Methods and Ordered by IG

Using ACC as the main index for substructure detection may lead to overfitting, in which the substructures identified are so specific that they lack representativeness. Overly specific structural alerts may have higher accuracy but are less significant, which is of little benefit to researchers. For example, in Figure S1a, although the three substructures mined by SARpy had an accuracy of 100%, each occurred only about 10 times, and the constructs were not common enough. Comparatively, a pattern mined by Kazius could match a large number of compounds in the data sets, while the accuracy of the prediction was still high (Figure S1b). This substructure indicated that a condensed ring as skeleton may result in genotoxicity. We designed two similar SMARTS, shown in Figure S1c, to repeat the result of Kazius, and got a similar result. Therefore, PR is also critical in the assessment of a substructure to avoid overly specific structural alerts.

According to the distribution of structural alerts shown in Figure 4, we cannot expect a substructure with both high ACC and high PR. A compromise index is necessary to evaluate the significance of a substructure. IG is preferable: it is widely applied in feature selection in machine learning methods and is an important index in decision trees. The distribution in Figure 4 also suggested that a substructure with an extremely high ACC value might not be as useful as one with a lower ACC but higher PR value, according to the IG values.
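The trade-off between ACC and PR can be made concrete with a small numerical sketch of eqs 1 to 4. The code below is only an illustration, not the authors' implementation: the function names are ours, the logarithm is taken as base 2 (an inference from the statement that entropy equals 1 at a positive rate of 0.5), and the two substructure counts are invented. Only the training-set totals (4069 compounds, 2294 mutagens) come from the text.

```python
import math


def entropy(p):
    """Binary information entropy E(p) in bits (eq 3); taken as 0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def alert_indices(n_mutagens_with, n_with, n_mutagens, n_total):
    """Return (ACC, PR, IG) for one substructure from four compound counts.

    Assumes 0 < n_with <= n_total.
    """
    n_without = n_total - n_with
    acc = n_mutagens_with / n_with   # eq 1: P(X = 1 | t)
    pr = n_with / n_total            # eq 2: P(t)
    # P(X = 1 | not t). E is symmetric about 0.5, so this yields the same
    # entropy as the P(X = 0 | not t) term written in eq 4.
    p_rest = (n_mutagens - n_mutagens_with) / n_without if n_without else 0.0
    ig = (entropy(n_mutagens / n_total)
          - pr * entropy(acc)
          - (1.0 - pr) * entropy(p_rest))   # eq 4
    return acc, pr, ig


# Training-set totals reported in the text.
N_TOTAL, N_MUTAGENS = 4069, 2294

# Hypothetical substructures; these counts are invented for illustration.
rare = alert_indices(10, 10, N_MUTAGENS, N_TOTAL)      # ACC = 1.0, very low PR
common = alert_indices(640, 800, N_MUTAGENS, N_TOTAL)  # ACC = 0.8, PR near 0.2

for name, (acc, pr, ig) in (("rare", rare), ("common", common)):
    print(f"{name:>6}: ACC={acc:.3f} PR={pr:.3f} IG={ig:.4f}")
```

Under these assumed counts, the perfectly accurate but rare substructure receives a much lower IG than the more common one with ACC = 0.8, which mirrors the pattern described for Figure 2C and Figure 4.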
Although the IG value can be regarded as a comprehensive score in the evaluation of a substructure, it still has some shortcomings. If a substructure has a low ACC value but a high PR value, its IG value will still be very high because of its commonness. However, if its ACC is too low, researchers may lose interest in it or doubt its potency. For example, the substructure allylamine ( N C C C ) had a comparatively high IG score, but its ACC was only On one hand, the high occurrence represented the significance of the pattern, which deserved attention; on the other hand, the accuracy was too low, and the indices illustrated that more than 600 compounds containing the substructure were inactive. It is still hard to balance the contradiction between the accuracy and the positive rate even though the IG score can comprehensively evaluate the performance.

Implication for Method Development of Structural Alerts. Many identified substructures for alert were similar, especially with fingerprint-based methods. A good example is the structural alerts shown in Table 3. From the view of chemistry or chemoinformatics, N O, N O, cn O, N =* are different ( * indicates any atom, c means an aromatic carbon, and ~ indicates any bond type). Each of them had high IG and ACC values, and they were correlated. Meanwhile, cc(c)c differed from cccc(c)c as a fragment, but they may both indicate condensed rings, so their indices were the same. However, it is difficult to judge whether they are redundant. Current detecting methods for structural alerts do not consider this problem because they only consider whether a substructure's ACC (or other index) reaches the threshold. These methods can generate hundreds of fragments that are possibly related and even hierarchical (Figure S2 illustrates one example). A lot of work needs to be done to eliminate the redundant fragments. Another kind of doubtful structural alert was caused by the co-occurrence of two substructures.
A typical example is o-n-bromobenzene. By counting the compounds that contained o-n-bromobenzene, we found that all six compounds were positive. However, after we analyzed the six compounds, we assumed that the toxicity might be caused not by o-n-bromobenzene but by other structural alerts such as a nitro group, an anthracene analogue, or benzidine (Figure S3). All six compounds have other structural alerts that may be the primary cause of their toxicities, and o-n-bromobenzene was probably a false positive structural alert. Most identification methods for structural alerts did not consider the effect of existing structural alerts in a compound, so false positives might occur when the supporting compounds are limited. Therefore, in the future development of identification methods for structural alerts, it is necessary to consider how to generalize the specific structural alerts, which will also avoid redundancy.
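A simple way to flag this kind of confounded alert is to ask whether a candidate has any supporting mutagen that is not already explained by an established alert. The sketch below assumes that substructure matching has been done elsewhere and works only on sets of compound identifiers. The function name, the identifiers, and the hit sets are hypothetical and are not taken from the data set.

```python
def unexplained_support(candidate_hits, known_alert_hits, mutagens):
    """Return mutagens that match the candidate alert but none of the known alerts.

    candidate_hits: set of compound IDs matching the candidate substructure.
    known_alert_hits: dict mapping alert name -> set of matching compound IDs.
    mutagens: set of compound IDs labeled as mutagens.
    """
    covered = set().union(*known_alert_hits.values())
    return (candidate_hits & mutagens) - covered


# Hypothetical example shaped like the case above: six compounds match the
# candidate, all are mutagens, and each also carries an established alert.
candidate_hits = {"c1", "c2", "c3", "c4", "c5", "c6"}
mutagens = candidate_hits | {"c7", "c8"}
known_alert_hits = {
    "nitro group": {"c1", "c2", "c7"},
    "anthracene analogue": {"c3", "c4"},
    "benzidine": {"c5", "c6", "c8"},
}

independent = unexplained_support(candidate_hits, known_alert_hits, mutagens)
print(sorted(independent))  # an empty list means no independent support
```

An empty result does not prove that the candidate is inactive; it only shows that the data set contains no compound in which the candidate is the sole explanation, so the alert deserves manual review before it is accepted.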

Some other methods that consider the occurrence of two or more substructures, such as emerging patterns, 57 are excellent and deserve to be more widely applied and cited. A molecular pattern is defined as a set of molecular fragments. 58 Gillet et al. proposed a jumping emerging pattern (JEP) 59 mining algorithm that used atom pairs 60 as descriptors to find the patterns. Afterward, emerging pattern mining 61 was proposed to mine toxic features using the contrast pattern tree algorithm. 62 However, patterns may be identified by chance if the supporting data set is limited, so avoiding false positives may remain challenging.

It should be noted that the hypothesis that electrophilic reactivity leads to chemical mutagenicity is widely accepted; 14 hence, mutagenicity is amenable to the search for causal relationships between toxicity and discrete substructures. In contrast, many other toxicological end points, in particular toxicities associated with pharmacophoric toxicophores, may not be suited to this sort of analysis. Therefore, new methods need to be developed to identify structural alerts for those end points.

CONCLUSIONS

In this paper, four methods were compared and evaluated in terms of their capability to identify structural alerts, using the chemical Ames mutagenicity data set as a benchmark. All the methods are freely available and easy to use. Three parameters, namely IG, ACC, and PR, were used as the main evaluation indices. By comparing the performance of the four methods, we found that the fingerprint-based methods demonstrated strong capability in identifying significant structural alerts. They could cover most of the positive set with acceptable accuracy; however, they produced too many redundant patterns and false positives.
MoSS was the fastest and could also identify many significant structural alerts, although its assessment method may produce many false positives and false negatives. SARpy could detect highly accurate and specific substructures owing to its likelihood-ratio assessment, and its structural alerts are preferable for building rule-based predictive models. At the end of the paper, we showed that current methods for structural alerts might identify redundant and overspecific substructures. Therefore, it remains challenging to automatically detect structural alerts that are more convincing and more useful to researchers.

ASSOCIATED CONTENT
*S Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at.
Structural alerts detected by SARpy, MoSS, FP, and Bioalerts; structural alerts collected from ToxAlerts (XLSX)
Supporting figures (PDF)

AUTHOR INFORMATION
Corresponding Author
*Phone: ytang234@ecust.edu.cn.
ORCID
Hongbin Yang:
Weihua Li:
Yun Tang:
Funding
This work was supported by the National Key Research and Development Program (Grant No. 2016YFA ), the National Natural Science Foundation of China (Grant Nos and ), and the 111 Project (Grant No. B07023).
Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS
The authors would like to thank Dr. Zhengyu Yin of Dupont Shanghai Innovation Center for his valuable comments during the preparation of this manuscript.

ABBREVIATIONS
ACC, accuracy rate; IG, information gain; JEP, jumping emerging pattern; PR, positive rate; RF, random forest; SA, structural alerts; SAR, structure-activity relationships; SVM, support vector machine

REFERENCES
(1) Xu, C., Cheng, F., Chen, L., Du, Z., Li, W., Liu, G., Lee, P. W., and Tang, Y. (2012) In silico prediction of chemical Ames mutagenicity. J. Chem. Inf. Model. 52,
(2) Webb, S. J., Hanser, T., Howlin, B., Krause, P., and Vessey, J. D. (2014) Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity. J. Cheminf. 6, 8.
(3) Stepan, A. F., Walker, D. P., Bauman, J., Price, D. A., Baillie, T. A., Kalgutkar, A. S., and Aleo, M. D. (2011) Structural alert/reactive metabolite concept as applied in medicinal chemistry to mitigate the risk of idiosyncratic drug toxicity: a perspective based on the critical examination of trends in the top 200 drugs marketed in the United States. Chem. Res. Toxicol. 24,
(4) Liu, R., Yu, X., and Wallqvist, A. (2015) Data-driven identification of structural alerts for mitigating the risk of drug-induced human liver injuries. J. Cheminf. 7, 4.
(5) Benigni, R., Bossa, C., Alivernini, S., and Colafranceschi, M. (2012) Assessment and validation of US EPA's OncoLogic(R) expert system and analysis of its modulating factors for structural alerts. J. Environ. Sci. Health C Environ. Carcinog. Ecotoxicol. Rev. 30,
(6) Pizzo, F., Gadaleta, D., Lombardo, A., Nicolotti, O., and Benfenati, E. (2015) Identification of structural alerts for liver and kidney toxicity using repeated dose toxicity data. Chem. Cent. J. 9, 62.
(7) Maron, D. M., and Ames, B. N. (1983) Revised methods for the Salmonella mutagenicity test. Mutat. Res. 113,
(8) Piegorsch, W. W., and Zeiger, E. (1991) Measuring Intra-Assay Agreement for the Ames Salmonella Assay. In Statistical Methods in Toxicology: Proceedings of a Workshop during EUROTOX '90 Leipzig, Germany, September 12–14, 1990 (Hothorn, L., Ed.) pp 35–41, Springer, Berlin Heidelberg.
(9) Ashby, J., and Tennant, R. W. (1988) Chemical structure, Salmonella mutagenicity and extent of carcinogenicity as indicators of genotoxic carcinogenesis among 222 chemicals tested in rodents by the U.S. NCI/NTP. Mutat. Res., Genet. Toxicol. Test. 204,
(10) Shelby, M. D. (1988) The genetic toxicity of human carcinogens and its implications. Mutat. Res., Genet. Toxicol. Test. 204,
(11) Williams, G. M., Mori, H., and McQueen, C. A. (1989) Structure-activity relationships in the rat hepatocyte DNA-repair test for 300 chemicals. Mutat. Res., Rev. Genet. Toxicol. 221,
(12) Enoch, S. J., Ellison, C. M., Schultz, T. W., and Cronin, M. T. D. (2011) A review of the electrophilic reaction chemistry involved in covalent protein binding relevant to toxicity. Crit. Rev. Toxicol. 41,
(13) Li, H. Q., Qi, F. H., and Wang, S. Y. (2005) A comparison of model selection methods for multi-class support vector machines. Computational Science and Its Applications - ICCSA ,
(14) Benigni, R., and Bossa, C. (2006) Structural Alerts of Mutagens and Carcinogens. Curr. Comput.-Aided Drug Des. 2,
(15) Ridings, J. E., Barratt, M. D., Cary, R., Earnshaw, C. G., Eggington, C. E., Ellis, M. K., Judson, P. N., Langowski, J. J., Marchant, C. A., Payne, M. P., Watson, W. P., and Yih, T. D. (1996) Computer prediction of possible toxic action from chemical structure: an update on the DEREK system. Toxicology 106,
(16) (2012) Rules-based System Designed to Support the ICH M7 Guidelines on Impurities, Leadscope, Inc. genetox_expert_alerts/ (accessed April 24, 2017).
(17) Lepailleur, A., Poezevara, G., and Bureau, R. (2013) Automated detection of structural alerts (chemical fragments) in (eco)toxicology. Comput. Struct. Biotechnol. J. 5, e
(18) Floris, M., Raitano, G., Medda, R., and Benfenati, E. (2016) Fragment Prioritization on a Large Mutagenicity Dataset. Mol. Inf /minf
(19) Klopman, G. (1984) Artificial intelligence approach to structure-activity studies. Computer automated structure evaluation of biological activity of organic molecules. J. Am. Chem. Soc. 106,
(20) Klopman, G., and McGonigal, M. (1981) Computer simulation of physical-chemical properties of organic molecules. 1. Molecular system identification. J. Chem. Inf. Model. 21,
(21) Cunningham, A. R., Moss, S. T., Iype, S. A., Qian, G., Qamar, S., and Cunningham, S. L. (2008) Structure-Activity Relationship Analysis of Rat Mammary Carcinogens. Chem. Res. Toxicol. 21,
(22) Ferrari, T., Cattaneo, D., Gini, G., Golbamaki Bakhtyari, N., Manganaro, A., and Benfenati, E. (2013) Automatic knowledge extraction from chemical structures: the case of mutagenicity prediction. SAR QSAR Environ. Res. 24,
(23) Golbamaki, A., Benfenati, E., Golbamaki, N., Manganaro, A., Merdivan, E., Roncaglioni, A., and Gini, G. (2016) New clues on carcinogenicity-related substructures derived from mining two large datasets of chemical compounds. J. Environ. Sci. Health C Environ. Carcinog. Ecotoxicol. Rev. 34,
(24) Golbamaki, A., and Benfenati, E. (2016) In Silico Methods for Carcinogenicity Assessment. Methods Mol. Biol. 1425,
(25) Liu, P., Agrafiotis, D. K., and Rassokhin, D. N. (2011) Power keys: a novel class of topological descriptors based on exhaustive subgraph enumeration and their application in substructure searching. J. Chem. Inf. Model. 51,
(26) Kramer, S., De Raedt, L., and Helma, C. (2001) Molecular Feature Mining in HIV Data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp , ACM.
(27) Raedt, L. D., and Kramer, S. (2001) The levelwise version space algorithm and its application to molecular fragment finding. In Proceedings of the 17th international joint conference on artificial intelligence, Vol. 2, pp , Morgan Kaufmann Publishers Inc., Seattle, WA.
(28) Borgelt, C., and Berthold, M. R. (2002) Mining molecular fragments: Finding relevant substructures of molecules. In ICDM 2003 Proceedings, 2002 IEEE International Conference on Data Mining, pp 51–58, IEEE.
(29) Nijssen, S., and Kok, J. N. (2004) A quickstart in frequent structure mining can make a difference. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp , ACM.
(30) Ahlberg, E., Carlsson, L., and Boyer, S. (2014) Computational Derivation of Structural Alerts from Large Toxicology Data Sets. J. Chem. Inf. Model. 54,
(31) Faulon, J. L., Visco, D. P., and Pophale, R. S. (2003) The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J. Chem. Inf. Comput. Sci. 43,
(32) Steinbeck, C., Han, Y. Q., Kuhn, S., Horlacher, O., Luttmann, E., and Willighagen, E. (2003) The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci. 43,
(33) Shen, J., Cheng, F., Xu, Y., Li, W., and Tang, Y. (2010) Estimation of ADME properties with substructure pattern recognition. J. Chem. Inf. Model. 50,
(34) Durant, J. L., Leland, B. A., Henry, D. R., and Nourse, J. G. (2002) Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42,
(35) Sun, L., Zhang, C., Chen, Y. J., Li, X., Zhuang, S. L., Li, W. H., Liu, G. X., Lee, P. W., and Tang, Y. (2015) In silico prediction of chemical aquatic toxicity with chemical category approaches and substructural alerts. Toxicol. Res. 4,
(36) Chen, Y., Cheng, F., Sun, L., Li, W., Liu, G., and Tang, Y. (2014) Computational models to predict endocrine-disrupting chemical binding with androgen or oestrogen receptors. Ecotoxicol. Environ. Saf. 110,
(37) Yang, H., Li, X., Cai, Y., Wang, Q., Li, W., Liu, G., and Tang, Y. (2017) In silico prediction of chemical subcellular localization via multi-classification methods. MedChemComm in press. DOI: / C7MD00074J.
(38) Li, X., Chen, L., Cheng, F., Wu, Z., Bian, H., Xu, C., Li, W., Liu, G., Shen, X., and Tang, Y. (2014) In silico prediction of chemical acute oral toxicity using multi-classification methods. J. Chem. Inf. Model. 54,
(39) Hansen, K., Mika, S., Schroeter, T., Sutter, A., ter Laak, A., Steger-Hartmann, T., Heinrich, N., and Muller, K. R. (2009) Benchmark Data Set for in Silico Prediction of Ames Mutagenicity. J. Chem. Inf. Model. 49,
(40) Kazius, J., McGuire, R., and Bursi, R. (2005) Derivation and validation of toxicophores for mutagenicity prediction. J. Med. Chem. 48,
(41) Kazius, J., Nijssen, S., Kok, J., Back, T., and Ijzerman, A. P. (2006) Substructure mining using elaborate chemical representation. J. Chem. Inf. Model. 46,
(42) O'Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., and Hutchison, G. R. (2011) Open Babel: An open chemical toolbox. J. Cheminf. 3, 33.
(43) (2007) KNIME: The Konstanz Information Miner, Springer. (accessed Jan 10, 2017).
(44) Landrum, G. (2016) RDKit. (accessed Jan 10, 2017).
(45) Cortes-Ciriano, I. (2016) Bioalerts: a python library for the derivation of structural alerts from bioactivity and toxicity data sets. J. Cheminf. 8, 13.
(46) Yap, C. W. (2011) PaDEL-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J. Comput. Chem. 32,
(47) (2009) PubChem Substructure Fingerprint. ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt (accessed Jan 10, 2017).
(48) Klekota, J., and Roth, F. P. (2008) Chemical substructures that enrich for biological activity. Bioinformatics 24,
(49) Quinlan, J. R. (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc.
(50) Sokolova, M., and Szpakowicz, S. (2010) In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques.
(51) O'Boyle, N. M., Morley, C., and Hutchison, G. R. (2008) Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem. Cent. J. 2, 5.
(52) Sushko, I., Salmina, E., Potemkin, V. A., Poda, G., and Tetko, I. V. (2012) ToxAlerts: A Web Server of Structural Alerts for Toxic Chemicals and Compounds with Potential Adverse Reactions. J. Chem. Inf. Model. 52,
(53) Benigni, R., and Bossa, C. (2008) Structure alerts for carcinogenicity, and the Salmonella assay system: a novel insight through the chemical relational databases technology. Mutat. Res., Rev. Mutat. Res. 659,
(54) Bailey, A. B., Chanderbhan, R., Collazo-Braier, N., Cheeseman, M. A., and Twaroski, M. L. (2005) The use of structure-activity relationship analysis in the food contact notification program. Regul. Toxicol. Pharmacol. 42,
(55) Jensen, B. F., Vind, C., Padkjaer, S. B., Brockhoff, P. B., and Refsgaard, H. H. F. (2007) In silico prediction of cytochrome P450 2D6 and 3A4 inhibition using Gaussian kernel weighted k-nearest neighbor and extended connectivity fingerprints, including structural fragment analysis of inhibitors versus noninhibitors. J. Med. Chem. 50,
(56) Chen, Y. J., Cheng, F. X., Sun, L., Li, W. H., Liu, G. X., and Tang, Y. (2014) Computational models to predict endocrine-disrupting chemical binding with androgen or oestrogen receptors. Ecotoxicol. Environ. Saf. 110,
(57) Dong, G., and Li, J. (1999) Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 43–52, ACM.
(58) Bertrand, C., Guillaume, P., Bruno, C., Alban, L., and Ronan, B. (2012) Emerging Patterns as Structural Alerts for Computational Toxicology. In Contrast Data Mining, Chapman and Hall/CRC.
(59) Sherhod, R., Gillet, V. J., Judson, P. N., and Vessey, J. D. (2012) Automating Knowledge Discovery for Toxicity Prediction Using Jumping Emerging Pattern Mining. J. Chem. Inf. Model. 52,
(60) Carhart, R. E., Smith, D. H., and Venkataraghavan, R. (1985) Atom pairs as molecular features in structure-activity studies: definition and applications. J. Chem. Inf. Model. 25,
(61) Sherhod, R., Judson, P. N., Hanser, T., Vessey, J. D., Webb, S. J., and Gillet, V. J. (2014) Emerging pattern mining to aid toxicological knowledge discovery. J. Chem. Inf. Model. 54,
(62) Fan, H., and Ramamohanarao, K. (2006) Fast discovery and the generalization of strong jumping emerging patterns for building compact and accurate classifiers. IEEE Trans. Knowl. Data Eng. 18,


More information

Metabolite Identification and Characterization by Mining Mass Spectrometry Data with SAS and Python

Metabolite Identification and Characterization by Mining Mass Spectrometry Data with SAS and Python PharmaSUG 2018 - Paper AD34 Metabolite Identification and Characterization by Mining Mass Spectrometry Data with SAS and Python Kristen Cardinal, Colorado Springs, Colorado, United States Hao Sun, Sun

More information

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options of the structure similarity

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options of the structure similarity OECD QSAR Toolbox v.4.1 Tutorial illustrating new options of the structure similarity Outlook Background Aims PubChem features The exercise Workflow 2 Background This presentation is designed to familiarize

More information

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics... 1 1.1 Chemoinformatics... 2 1.1.1 Open-Source Tools... 2 1.1.2 Introduction to Programming Languages... 3 1.2 Chemical Structure

More information

Similarity Search. Uwe Koch

Similarity Search. Uwe Koch Similarity Search Uwe Koch Similarity Search The similar property principle: strurally similar molecules tend to have similar properties. However, structure property discontinuities occur frequently. Relevance

More information

MultiCASE CASE Ultra model for severe skin irritation in vivo

MultiCASE CASE Ultra model for severe skin irritation in vivo MultiCASE CASE Ultra model for severe skin irritation in vivo 1. QSAR identifier 1.1 QSAR identifier (title) MultiCASE CASE Ultra model for severe skin irritation in vivo, Danish QSAR Group at DTU Food.

More information

Data Mining in the Chemical Industry. Overview of presentation

Data Mining in the Chemical Industry. Overview of presentation Data Mining in the Chemical Industry Glenn J. Myatt, Ph.D. Partner, Myatt & Johnson, Inc. glenn.myatt@gmail.com verview of presentation verview of the chemical industry Example of the pharmaceutical industry

More information

Acyclic Subgraph based Descriptor Spaces for Chemical Compound Retrieval and Classification. Technical Report

Acyclic Subgraph based Descriptor Spaces for Chemical Compound Retrieval and Classification. Technical Report Acyclic Subgraph based Descriptor Spaces for Chemical Compound Retrieval and Classification Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200

More information

Practical QSAR and Library Design: Advanced tools for research teams

Practical QSAR and Library Design: Advanced tools for research teams DS QSAR and Library Design Webinar Practical QSAR and Library Design: Advanced tools for research teams Reservationless-Plus Dial-In Number (US): (866) 519-8942 Reservationless-Plus International Dial-In

More information

KATE2017 on NET beta version https://kate2.nies.go.jp/nies/ Operating manual

KATE2017 on NET beta version  https://kate2.nies.go.jp/nies/ Operating manual KATE2017 on NET beta version http://kate.nies.go.jp https://kate2.nies.go.jp/nies/ Operating manual 2018.03.29 KATE2017 on NET was developed to predict the following ecotoxicity values: 50% effective concentration

More information

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors Rajarshi Guha, Debojyoti Dutta, Ting Chen and David J. Wild School of Informatics Indiana University and Dept.

More information

Merging Applicability Domains for in Silico Assessment of Chemical Mutagenicity

Merging Applicability Domains for in Silico Assessment of Chemical Mutagenicity pubs.acs.org/jcim Merging Applicability Domains for in Silico Assessment of Chemical Mutagenicity Ruifeng Liu* and Anders Wallqvist* DoD Biotechnology High Performance Computing Software Applications Institute,

More information

Case study: Category consistency assessment in Toolbox for a list of Cyclic unsaturated hydrocarbons with respect to repeated dose toxicity.

Case study: Category consistency assessment in Toolbox for a list of Cyclic unsaturated hydrocarbons with respect to repeated dose toxicity. Case study: Category consistency assessment in Toolbox for a list of Cyclic unsaturated hydrocarbons with respect to repeated dose toxicity. 1. Introduction The aim of this case study is to demonstrate

More information

Introduction to Chemoinformatics and Drug Discovery

Introduction to Chemoinformatics and Drug Discovery Introduction to Chemoinformatics and Drug Discovery Irene Kouskoumvekaki Associate Professor February 15 th, 2013 The Chemical Space There are atoms and space. Everything else is opinion. Democritus (ca.

More information

RSC Publishing. Principles and Applications. In Silico Toxicology. Liverpool John Moores University, Liverpool, Edited by

RSC Publishing. Principles and Applications. In Silico Toxicology. Liverpool John Moores University, Liverpool, Edited by In Silico Toxicology Principles and Applications Edited by Mark T. D. Cronin and Judith C. Madden Liverpool John Moores University, Liverpool, UK RSC Publishing Contents Chapter 1 In Silico Toxicology

More information

Applications of multi-class machine

Applications of multi-class machine Applications of multi-class machine learning models to drug design Marvin Waldman, Michael Lawless, Pankaj R. Daga, Robert D. Clark Simulations Plus, Inc. Lancaster CA, USA Overview Applications of multi-class

More information

Grouping and Read-Across for Respiratory Sensitisation. Dr Steve Enoch School of Pharmacy and Biomolecular Sciences Liverpool John Moores University

Grouping and Read-Across for Respiratory Sensitisation. Dr Steve Enoch School of Pharmacy and Biomolecular Sciences Liverpool John Moores University Grouping and Read-Across for Respiratory Sensitisation Dr Steve Enoch School of Pharmacy and Biomolecular Sciences Liverpool John Moores University Chemicals are grouped into a category Toxicity data from

More information

Virtual affinity fingerprints in drug discovery: The Drug Profile Matching method

Virtual affinity fingerprints in drug discovery: The Drug Profile Matching method Ágnes Peragovics Virtual affinity fingerprints in drug discovery: The Drug Profile Matching method PhD Theses Supervisor: András Málnási-Csizmadia DSc. Associate Professor Structural Biochemistry Doctoral

More information

Biological Read-Across: Species-Species and Endpoint- Endpoint Extrapolation

Biological Read-Across: Species-Species and Endpoint- Endpoint Extrapolation Biological Read-Across: Species-Species and Endpoint- Endpoint Extrapolation Mark Cronin School of Pharmacy and Chemistry Liverpool John Moores University England m.t.cronin@ljmu.ac.uk Integrated Testing

More information

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015 Ignasi Belda, PhD CEO HPC Advisory Council Spain Conference 2015 Business lines Molecular Modeling Services We carry out computational chemistry projects using our selfdeveloped and third party technologies

More information

Xia Ning,*, Huzefa Rangwala, and George Karypis

Xia Ning,*, Huzefa Rangwala, and George Karypis J. Chem. Inf. Model. XXXX, xxx, 000 A Multi-Assay-Based Structure-Activity Relationship Models: Improving Structure-Activity Relationship Models by Incorporating Activity Information from Related Targets

More information

ADMET property estimation, oral bioavailability predictions, SAR elucidation, & QSAR model building software www.simulations-plus.com +1-661-723-7723 What is? is an advanced computer program that enables

More information

Computational Methods and Drug-Likeness. Benjamin Georgi und Philip Groth Pharmakokinetik WS 2003/2004

Computational Methods and Drug-Likeness. Benjamin Georgi und Philip Groth Pharmakokinetik WS 2003/2004 Computational Methods and Drug-Likeness Benjamin Georgi und Philip Groth Pharmakokinetik WS 2003/2004 The Problem Drug development in pharmaceutical industry: >8-12 years time ~$800m costs >90% failure

More information

QSAR in Green Chemistry

QSAR in Green Chemistry QSAR in Green Chemistry Activity Relationship QSAR is the acronym for Quantitative Structure-Activity Relationship Chemistry is based on the premise that similar chemicals will behave similarly The behavior/activity

More information

OECD QSAR Toolbox v.3.4

OECD QSAR Toolbox v.3.4 OECD QSAR Toolbox v.3.4 Predicting developmental and reproductive toxicity of Diuron (CAS 330-54-1) based on DART categorization tool and DART SAR model Outlook Background Objectives The exercise Workflow

More information

Introduction to Chemoinformatics

Introduction to Chemoinformatics Introduction to Chemoinformatics Dr. Igor V. Tetko Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH) Institute of Bioinformatics & Systems Biology (HMGU) Kyiv, 10 August

More information

Receptor Based Drug Design (1)

Receptor Based Drug Design (1) Induced Fit Model For more than 100 years, the behaviour of enzymes had been explained by the "lock-and-key" mechanism developed by pioneering German chemist Emil Fischer. Fischer thought that the chemicals

More information

Machine learning for ligand-based virtual screening and chemogenomics!

Machine learning for ligand-based virtual screening and chemogenomics! Machine learning for ligand-based virtual screening and chemogenomics! Jean-Philippe Vert Institut Curie - INSERM U900 - Mines ParisTech In silico discovery of molecular probes and drug-like compounds:

More information

Constraint-Based Data Mining and an Application in Molecular Feature Mining

Constraint-Based Data Mining and an Application in Molecular Feature Mining Constraint-Based Data Mining and an Application in Molecular Feature Mining Luc De Raedt Chair of Machine Learning and Natural Language Processing Albert-Ludwigs-University Freiburg Joint work with Lee

More information

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK Chemoinformatics and information management Peter Willett, University of Sheffield, UK verview What is chemoinformatics and why is it necessary Managing structural information Typical facilities in chemoinformatics

More information

Development of a Structure Generator to Explore Target Areas on Chemical Space

Development of a Structure Generator to Explore Target Areas on Chemical Space Development of a Structure Generator to Explore Target Areas on Chemical Space Kimito Funatsu Department of Chemical System Engineering, This materials will be published on Molecular Informatics Drug Development

More information

Applying Bioisosteric Transformations to Predict Novel, High Quality Compounds

Applying Bioisosteric Transformations to Predict Novel, High Quality Compounds Applying Bioisosteric Transformations to Predict Novel, High Quality Compounds Dr James Chisholm,* Dr John Barnard, Dr Julian Hayward, Dr Matthew Segall*, Mr Edmund Champness*, Dr Chris Leeding,* Mr Hector

More information

JCICS Major Research Areas

JCICS Major Research Areas JCICS Major Research Areas Chemical Information Text Searching Structure and Substructure Searching Databases Patents George W.A. Milne C571 Lecture Fall 2002 1 JCICS Major Research Areas Chemical Computation

More information

Solved and Unsolved Problems in Chemoinformatics

Solved and Unsolved Problems in Chemoinformatics Solved and Unsolved Problems in Chemoinformatics Johann Gasteiger Computer-Chemie-Centrum University of Erlangen-Nürnberg D-91052 Erlangen, Germany Johann.Gasteiger@fau.de Overview objectives of lecture

More information

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Computational chemical biology to address non-traditional drug targets. John Karanicolas Computational chemical biology to address non-traditional drug targets John Karanicolas Our computational toolbox Structure-based approaches Ligand-based approaches Detailed MD simulations 2D fingerprints

More information

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference CS 229 Project Report (TR# MSB2010) Submitted 12/10/2010 hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference Muhammad Shoaib Sehgal Computer Science

More information

OECD QSAR Toolbox v.3.4. Example for predicting Repeated dose toxicity of 2,3-dimethylaniline

OECD QSAR Toolbox v.3.4. Example for predicting Repeated dose toxicity of 2,3-dimethylaniline OECD QSAR Toolbox v.3.4 Example for predicting Repeated dose toxicity of 2,3-dimethylaniline Outlook Background Objectives The exercise Workflow Save prediction 2 Background This is a step-by-step presentation

More information

Induction of Decision Trees

Induction of Decision Trees Induction of Decision Trees Peter Waiganjo Wagacha This notes are for ICS320 Foundations of Learning and Adaptive Systems Institute of Computer Science University of Nairobi PO Box 30197, 00200 Nairobi.

More information

Drug Informatics for Chemical Genomics...

Drug Informatics for Chemical Genomics... Drug Informatics for Chemical Genomics... An Overview First Annual ChemGen IGERT Retreat Sept 2005 Drug Informatics for Chemical Genomics... p. Topics ChemGen Informatics The ChemMine Project Library Comparison

More information

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME Iván Solt Solutions for Cheminformatics Drug Discovery Strategies for known targets High-Throughput Screening (HTS) Cells

More information

Exploring the black box: structural and functional interpretation of QSAR models.

Exploring the black box: structural and functional interpretation of QSAR models. EMBL-EBI Industry workshop: In Silico ADMET prediction 4-5 December 2014, Hinxton, UK Exploring the black box: structural and functional interpretation of QSAR models. (Automatic exploration of datasets

More information

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007 Computational Chemistry in Drug Design Xavier Fradera Barcelona, 17/4/2007 verview Introduction and background Drug Design Cycle Computational methods Chemoinformatics Ligand Based Methods Structure Based

More information

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre Dr. Sander B. Nabuurs Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre The road to new drugs. How to find new hits? High Throughput

More information

Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines

Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines Article Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines Yun-Fei Wang, Huan Chen, and Yan-Hong Zhou* Hubei Bioinformatics and Molecular Imaging Key Laboratory,

More information

Building blocks for automated elucidation of metabolites: Machine learning methods for NMR prediction

Building blocks for automated elucidation of metabolites: Machine learning methods for NMR prediction Building blocks for automated elucidation of metabolites: Machine learning methods for NMR prediction Stefan Kuhn 1, Björn Egert 2, Steffen Neumann 2, Christoph Steinbeck 1European Bioinformatics Institute

More information

Notes of Dr. Anil Mishra at 1

Notes of Dr. Anil Mishra at   1 Introduction Quantitative Structure-Activity Relationships QSPR Quantitative Structure-Property Relationships What is? is a mathematical relationship between a biological activity of a molecular system

More information

Biologically Relevant Molecular Comparisons. Mark Mackey

Biologically Relevant Molecular Comparisons. Mark Mackey Biologically Relevant Molecular Comparisons Mark Mackey Agenda > Cresset Technology > Cresset Products > FieldStere > FieldScreen > FieldAlign > FieldTemplater > Cresset and Knime About Cresset > Specialist

More information

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery AtomNet A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery Izhar Wallach, Michael Dzamba, Abraham Heifets Victor Storchan, Institute for Computational and

More information

Keywords: anti-coagulants, factor Xa, QSAR, Thrombosis. Introduction

Keywords: anti-coagulants, factor Xa, QSAR, Thrombosis. Introduction PostDoc Journal Vol. 2, No. 3, March 2014 Journal of Postdoctoral Research www.postdocjournal.com QSAR Study of Thiophene-Anthranilamides Based Factor Xa Direct Inhibitors Preetpal S. Sidhu Department

More information

Structural biology and drug design: An overview

Structural biology and drug design: An overview Structural biology and drug design: An overview livier Taboureau Assitant professor Chemoinformatics group-cbs-dtu otab@cbs.dtu.dk Drug discovery Drug and drug design A drug is a key molecule involved

More information

OECD QSAR Toolbox v.4.1. Step-by-step example for building QSAR model

OECD QSAR Toolbox v.4.1. Step-by-step example for building QSAR model OECD QSAR Toolbox v.4.1 Step-by-step example for building QSAR model Background Objectives The exercise Workflow of the exercise Outlook 2 Background This is a step-by-step presentation designed to take

More information

Introduction. OntoChem

Introduction. OntoChem Introduction ntochem Providing drug discovery knowledge & small molecules... Supporting the task of medicinal chemistry Allows selecting best possible small molecule starting point From target to leads

More information