Evaluation of Different Methods for Identification of Structural Alerts Using Chemical Ames Mutagenicity Data Set as a Benchmark




This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

pubs.acs.org/crt

Hongbin Yang, Jie Li, Zengrui Wu, Weihua Li, Guixia Liu, and Yun Tang*

Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China

Supporting Information

ABSTRACT: Identification of structural alerts for toxicity is useful in drug discovery and other fields such as environmental protection. With structural alerts, researchers can quickly identify potentially toxic compounds and learn how to modify them. Hence, it is important to determine structural alerts from a large number of compounds quickly and accurately. Many methods have already been reported for identification of structural alerts. However, how to evaluate those methods remains a problem. In this paper, we evaluated four methods for monosubstructure identification with three indices (accuracy rate, coverage rate, and information gain) to compare their advantages and disadvantages. The Kazius Ames mutagenicity data set was used as the benchmark, and the four methods were MoSS (graph-based), SARpy (fragment-based), and two fingerprint-based methods, namely Bioalerts and the fingerprint (FP) method we previously used. The results showed that Bioalerts and FP could detect key substructures with high accuracy and coverage rates because they allowed unclosed rings and wildcard atom or bond types. However, they also resulted in redundancy, so their predictive performance was not as good as that of SARpy.
SARpy was competitive in predictive performance in both the training set and the external validation set. These results might help users select appropriate methods and might guide further development of methods for identification of structural alerts.

Received: March 27, 2017
Published: May 9
American Chemical Society

INTRODUCTION

Chemical toxicity is an important issue in drug discovery and environmental risk assessment. In silico prediction of chemical toxicity is less expensive and less time-consuming than experimental in vitro or in vivo assays. 1 Although machine learning methods, such as support vector machine (SVM) and random forest (RF), have obtained high predictive accuracy, they are considered black boxes. Interpretable models are more meaningful, friendly, and acceptable for researchers in medicinal chemistry. 2 Structural alerts (SA), like an expert system, can help identify key substructures for a certain toxicity and avoid potentially toxic compounds at very early stages. They have been widely applied not only in drug discovery, 3,4 but also in other fields such as cosmetic research and environmental protection. 5

Structural alerts are defined as chemical substructures whose presence may be related to the capability of a substance to cause certain adverse effects to organs. 6 Chemical mutagenicity is one of the most widely studied end points for structural alerts. It is usually determined via the famous Ames test, named after its inventor Prof. Bruce Ames, to detect potential mutagens. 7 The qualitative reproducibility of the Ames test among laboratories is generally good. 8 As early as 1988, Ashby found strong associations between chemical structures and their mutagenicity to Salmonella and suggested 11 substructures for alerts. 9 Although only a few substructures were identified from a small data set by Ashby, the approach was widely accepted and developed. 10,11 Afterward, several toxicity end points and structural types were intensively explored by experts, and their limitations, such as applicability domain, were also studied. 12,13 Therefore, detection of potential substructures for alerts is highly desirable, because it could quickly identify toxic compounds from a large data set and guide drug design.

Initially, experts identified structural alerts by exploring structure-activity relationships (SARs) of toxic compounds 14 to create an expert system. Derek Nexus 15 and Genetox Expert Alerts from LeadScope 16 are two representative commercial expert systems for predicting toxicity. After that, automated detection of structural alerts became a hotspot in computational toxicology, and several excellent methods and toolkits are available. 17,18

In general, methodologies implemented to determine structural alerts may be roughly classified into fragment-based, graph-based, and fingerprint-based approaches (Figure 1). The fragment-based approach first cuts the bonds of each compound to get all possible fragments and then calculates the frequency of the fragments occurring in toxic and nontoxic compound sets. For example, CASE 19 uses KLN 20 code, a linear expression for a compound, as descriptors to detect SAs. CatSAR 21 uses HQSAR, a module implemented in the commercial software SYBYL from Tripos Inc., to generate all possible substructures. Ferrari et al. developed software named SARpy that extracts structural alerts from a data set by fragmentation of all the compounds. 22 Golbamaki et al. explored new clues on carcinogenicity-related substructures via SARpy. 23,24

Figure 1. Three categories of approaches implemented to determine structural alerts. (A) Fragment-based methods that cut all possible bonds to obtain substructures. (B) Graph-based methods that search subgraphs by a levelwise algorithm. (C) Fingerprint-based methods that regard the predefined substructures as potential structural alerts.

The graph-based approach defines chemical molecules as mathematical graphs that consist of a set of vertices and edges, which respectively represent the atoms and bonds of compounds. The goal of detecting substructures is then converted into searching subgraphs. 25 MolFea, 26 a linear substructure mining program, utilizes the Levelwise Version Space algorithm 27 to handle minimum and maximum frequency constraints. Nevertheless, the shortcoming of MolFea is obvious because it only mines linear substructures. This weakness was overcome by a popular algorithm named MoSS, 28 which utilizes depth-first search association rules to mine substructures that include side chains and cycles. Gaston 29 is another graph-based mining algorithm, which divides substructures into three subsets, namely paths, trees, and cycles. It allows user-defined node types like [N,O], which indicates nitrogen or oxygen. Ahlberg et al. 30 proposed a new graph-based mining method that uses Atom Signature as the descriptor. Atom Signature, suggested by Faulon et al., 31 is a linear expression of a compound and can be calculated via the Chemistry Development Kit. 32

Molecular fingerprints can be viewed as a set of predefined fragments. 33 For example, the MDL MACCS (Molecular ACCess System) 34 fingerprint contains 166 public keys that represent the most common substructures. Thus, they can be used to detect potential structural alerts via calculation of their frequencies in a data set.

All these methodologies have their advantages and defects in computing performance, predictive accuracy, or chemical space. They have been widely used in toxicological research and drug discovery. However, to the best of our knowledge, no study has yet compared and evaluated these identification methods with a benchmark data set. In this paper, we compared four typical methods among the three types of approaches, namely MoSS (graph-based), SARpy (fragment-based), Bioalerts, and FP (both fingerprint-based), using a chemical Ames mutagenicity data set as the benchmark. We evaluated them from three aspects: (1) to assess the substructures identified by each method and analyze their rankings; (2) to compare the substructures with previous findings; and (3) to package the substructure sets detected by each method as rule-based prediction models to evaluate their predictive performance. Then we analyzed the process of the detection systems in depth and summarized their defects, which might help users select appropriate methods and might guide further development of methods for identification of structural alerts.

MATERIALS AND METHODS

Data Preparation. The most popular data set for chemical Ames mutagenicity is Hansen's benchmark, which contains 6512 compounds. 39 The Kazius data set, 40 a subset of Hansen's benchmark, is also commonly used. To ensure compatibility, the Kazius data set was used as the benchmark here to compare different mining methods. The preparation of the data set was done by Kazius.
41 To validate the robustness of the methods, the rest of Hansen's data set (after removing the compounds in the Kazius data set) was used as the external validation set to further evaluate the mining methods. OpenBabel 42 was used to convert all the compounds into canonical SMILES format, which can be directly used to mine structural alerts.

Methods for Identification of Structural Alerts. Four detecting methods were used in this study, namely MoSS, SARpy, Bioalerts, and FP (fingerprint). Mostly the substructures were described as SMILES, graphs, or fragments that only contain basic information on atoms and bonds. In FP, the substructures were described as SMARTS, an extension of SMILES, which contains more information about the atoms and bonds and allows the use of wildcard atom or bond types. SMARTS-based structural alerts are more diverse but more complicated.

MoSS. MoSS is a graph-based method for structural alerts that generates fragments by embedding them in all molecules in parallel throughout the growth process. 28 This search strategy allows for a restricted depth-first search algorithm, which results in excellent computing performance and theoretically covers all chemical space. MoSS was implemented in KNIME (V2.10.4). 43 We set the fragment size between 2 and 15, the minimum focus support (true positive rate) to 2%, and the maximum complement support (false positive rate) to 10%; other parameters were left at their defaults.

Figure 2. (A) Ideal substructure that separates the compounds into two subsets with lower entropy. The red part indicates the number of active (toxic) compounds, whereas the green part represents the inactive compounds. (B) Function curve of entropy against the variable p, the proportion of active (or inactive) compounds. (C) Less desirable substructure: although the active rate in the positive set is high, the corresponding compounds are too rare, and the overall entropy changes little, that is, the IG is low.

SARpy. SARpy is a free standalone tool that uses string mining to create substructures from the SMILES notation of training compounds. 22,23 Unlike other methods, it considers only entire branches or entire cycles as potential structural alerts. SARpy evaluates existing substructures by likelihood ratio, a measure of precision intrinsic to the test. In the implementation of SARpy, the atom number was set between 2 and 15, the precision was set to OPTIMAL, and only positive rules (mutagens) were extracted.

Bioalerts. Bioalerts is a Python library for the derivation of structural alerts from toxicity data sets. It is a fingerprint-based method, which mainly relies on the RDKit 44 implementation of Morgan fingerprints. 45 The p-value is used in Bioalerts to evaluate the significance of a substructure. In Bioalerts, threshold_frequency was set to 0.75, and other parameters were set to default (p-value 0.05, nb 5). The structural alerts mined by Bioalerts were converted into SMARTS by RDKit.

FP.
The FP method was composed of three steps: (1) calculation of fingerprints; (2) removal of redundant substructures; and (3) assessment of potential structural alerts and removal of redundant ones. PaDEL-Descriptor (V2.20) 46 was used to calculate the fingerprints of MACCS (166 bits), PubChem (881 bits), 47 Klekota-Roth (4860 bits), 48 and Function Group Substructure (307 bits, classified by Christian Laggner and defined in OpenBabel). After the fingerprints of the training compounds were calculated, we obtained a matrix that shows the relationships between the compounds and the substructures defined in the fingerprint dictionary. Direct deletion of the duplicated substructures is impractical since we could not automatically determine whether two substructures represented by SMARTS are equivalent. Therefore, we first verified whether the sets of compounds matching the two substructures were the same and then manually judged whether they were the same substructure. Subtle differences between two substructures were ignored, and we regarded them as the same to avoid redundancy. For example, MACCS63 - [#7]=[#8], KR [!#1]N=O, and KR N=O all indicate a double bond between a nitrogen and an oxygen. Though the meanings of the three SMARTS were different, they were considered to be the same, and the latter two were removed. We assessed and refined them through a threshold of ACC (accuracy rate) > 0.75.

Evaluation of Substructures and Predictive Models. Indices of Substructures for Assessment. Suppose that molecules can be categorized as mutagen or nonmutagen based on a binary property X, so that X has two possible values (1 means mutagen and 0 means nonmutagen). For a potential structural alert T (a substructure identified by the four methods), we used condition $t$ to define compounds that contain the substructure and $\bar{t}$ to define compounds that do not contain the substructure.
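The redundancy check in step 2 of the FP method, as described above, first compares the sets of compounds matched by two fingerprint bits and only then relies on manual judgment. A minimal sketch of the automatic part is given here, assuming the compound/substructure matrix is held as a dictionary from bit name to a set of compound IDs; the function name, the bit names other than MACCS63, and the compound IDs are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict


def group_bits_by_matches(fingerprint_matrix):
    """Group fingerprint bits whose sets of matching compounds are identical.

    fingerprint_matrix maps a bit name to the set of compound IDs it matches
    (a hypothetical in-memory form of the compound/substructure matrix).
    """
    groups = defaultdict(list)
    for bit, compounds in fingerprint_matrix.items():
        groups[frozenset(compounds)].append(bit)
    # Only groups with more than one bit need a manual equivalence check.
    return [sorted(bits) for bits in groups.values() if len(bits) > 1]


# Toy data: compound IDs and the KR_/PubChem_ bit names are invented.
matrix = {
    "MACCS63": {"c1", "c2", "c5"},
    "KR_a": {"c1", "c2", "c5"},
    "KR_b": {"c1", "c2", "c5"},
    "PubChem_x": {"c3"},
}
candidates = group_bits_by_matches(matrix)
```

Each returned group is only a candidate for merging: identical match sets on one data set do not prove that two SMARTS are chemically equivalent, which is why the text keeps a manual review after this step.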
ACC is the ratio of the number of mutagens that contain a certain substructure to the number of all compounds that contain the substructure (eq 1). Positive rate, also called coverage rate, is the probability that a compound contains the substructure (eq 2):

$$\mathrm{ACC\ (accuracy\ rate)} = P(X = 1 \mid t) \tag{1}$$

$$\mathrm{PR\ (positive\ rate)} = P(t) \tag{2}$$

where $P(A)$ means the probability of observing A, and $P(A \mid B)$ is the probability of observing event A given the condition B.

Besides the assessment methods mentioned above, another more complicated method called information gain (IG) was also used to assess a substructure. It is used in decision tree classification models to measure the relevance of an attribute to a category in data mining. 49 Shen et al. used this method to evaluate the importance of the substructures in the MACCS fingerprint and labeled the highest-ranked substructures as structural alerts. 33 IG is defined as the difference between the information entropy of the original data set and the weighted average information entropies of the two data sets separated by a substructure. 50 Eq 3 shows the definition of information entropy, in which p means the probability of molecules in one category. Eq 4 shows the form of IG used in this study. An ideal substructure whose IG is very high can separate the data set into two: one with a high positive rate and one with a high negative rate (Figure 2a), so that the two separated data sets have a very low information entropy compared to the original data set. Information entropy as a function depends on the positive rate of a data set. When the data set is the most disordered, that is, the positive rate equals 0.5, the entropy equals 1. When the data set is the most regular, that is, the positive rate equals 0 or 1, the entropy equals 0. The function tendency is shown in Figure 2b. A substructure with high IG should not only have high ACC but also be common enough (high PR). A substructure that has low PR with high ACC (Figure 2c) will get a low IG score:

$$E(p) = -p \log(p) - (1 - p) \log(1 - p) \tag{3}$$

$$\mathrm{IG} = E(P(X = 1)) - P(t)\,E(P(X = 1 \mid t)) - P(\bar{t})\,E(P(X = 0 \mid \bar{t})) \tag{4}$$

We calculated the three indices, IG, ACC, and PR, of all the potential structural alerts detected by the mentioned methods. OpenBabel and its Python wrapper Pybel 51 were used to match SMARTS with compounds and verify equality of two compounds.

Performance Evaluation of Predictive Models. For each predictive model, a sorted list of its structural alerts was generated. The structural alerts were sorted by either ACC or IG. Two indicators, positive coverage rate and negative coverage rate, were calculated to show the accuracy and robustness of the models:

$$\mathrm{PCR}(L) = P(X = 1 \mid C_L) \tag{5}$$

$$\mathrm{NCR}(L) = P(X = 0 \mid \bar{C}_L) \tag{6}$$

where $C_L$ means the compounds that contain at least one substructure in the top L places of the potential structural alert list, and $\bar{C}_L$ means the compounds that contain none of the top L structural alerts.

Comparison of Methods with Previous Work.
The substructures identified by the four methods were then compared with the structural alerts reported in previous publications. ToxAlerts 52 collected several structural alerts for genotoxic carcinogenicity and mutagenicity from publications including Kazius et al., 40 Benigni et al., 53 Bailey et al., 54 and Ashby. 7 We collected these structural alerts and manually discarded the redundant patterns. If two alerts were similar, we selected the more general one. For example, there were three aromatic nitro alerts in ToxAlerts. Two of them were detected by Kazius, and the other was defined by Benigni et al. 53 We selected the most general pattern, which is [a!r0][$([nx3+](=[ox1])([o-]))]. In addition, Kazius previously detected structural alerts with Gaston from the same data set as we used. 41 Therefore, we compared the results from Kazius with all the substructures identified by the four methods here and ranked these substructures by IG values.

RESULTS

Data Sets and Substructures Obtained. In this study, to evaluate and compare the methods for identification of structural alerts, a chemical Ames mutagenicity data set was first selected as the benchmark. The key substructures for chemical mutagenicity were then obtained by those methods and used for evaluation and comparison of those methods. The Kazius data set of chemical Ames mutagenicity, used as the training set, contained 4069 distinct compounds, of which 2294 were labeled as mutagens and 1775 as nonmutagens. The external validation set, the rest of Hansen's data set (after removing the compounds in the Kazius data set), was used to evaluate the performance of the methods; it consisted of 3363 compounds, including 1736 mutagens and 1627 nonmutagens. On the basis of the Kazius data set, 129 frequent substructures were obtained from MoSS, while 130 potential structural alerts were found by SARpy. Bioalerts detected 1094 patterns, from which we selected 395 unique substructures whose IG values were higher than [...]. In the FP method, the fingerprints output 6214 bits in total. We then retained the fragments whose accuracy was higher than 0.75 and discarded the nonsubstructure patterns (e.g., MACCS49 is [!+0]). Finally, 173 patterns were left after manual screening, because many patterns were similar (Table S1).

Distribution of Substructures with Three Indices. ACC, PR, and IG are the three most important indices that determine the capability of a substructure to predict the toxicity of a compound. These three indices were hence used to evaluate the performance of the four methods. After calculation, the three indices of each substructure were plotted in Figure 3. Although both the ACC and PR values of a substructure were expected to be close to 1, it was nearly impossible to get a substructure whose ACC and PR values were both very high. As the ACC value approached 1, the maximum PR value decreased. The border of PR (regarding ACC as the independent variable) formed a curved virtual boundary. The closer a substructure was to this virtual boundary, the higher its IG value. The highest IG value was [...], and the corresponding ACC = 0.809, PR = [...]. As the ACC value became higher or lower (while the PR value became lower or higher), the IG value decreased.

Figure 3. Distribution of the substructures with their ACC and PR, colored by IG values.

Figure 4 shows the distribution of the substructures detected by each method. The FP method had the highest average IG value and the highest average PR value. The ACC values of most of its substructures were distributed between [...]. The average ACC value from FP was not as high as those from Bioalerts and SARpy. Bioalerts had the highest average ACC value, 0.941, and its IG value was also high.
The average PR value of the substructures detected by Bioalerts was not high because most of its substructures had PR values between 0 and [...]. However, considering the absolute number of substructures with high PR values (>0.1), Bioalerts was still outstanding. As for SARpy, though it did not remove substructures with low ACC values (<0.75), most of the substructures it detected had high ACC values, while the IG values were very low because of the lower PR values. Substructures detected by MoSS were the lowest in both ACC and IG values, though the average PR value was high. Overall, Bioalerts was the best detecting method; the FP method focused more on PR, while SARpy favored ACC. The average IG value of FP was higher than that of SARpy.

Figure 4. Statistics of the substructures detected by each method with their ACC, PR, and IG values. The legend in each subgraph displays the average index of each method.

Figure 5. Performance curves of the four models in the training set and comparison with other models in the external validation set. The three commercial models (PipelinePilot, MultiCASE, and DEREK) were evaluated in five-fold cross-validation by Hansen et al. 39

Capability and Performance of the Four Methods. To evaluate the capability of the methods, three aspects were investigated. First, we inspected the performance of the rule-based models using the substructures detected by the four methods. Figure 5 shows the performance of the four models in both the training set and the external validation set. SARpy achieved the highest accuracy among the four methods in both the training set and the validation set, which illustrated that the structural alerts identified by SARpy could help predict the toxicity of a compound owing to their high accuracy and interpretability. The performances of the FP method and Bioalerts were similar, and the FP method had a higher maximum coverage rate (89.4% for the training set and 86.2% for the validation set). This implied that the substructures obtained by the FP method were more common than the others. As for MoSS, its accuracy was the lowest of the four methods in both the training set and the validation set. The performances of the four methods in the validation set were compared with the previous models made by Hansen et al. (Figure 5).
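The rule-based models behind these curves can be illustrated with a small sketch of eqs 5 and 6: for each prefix of a ranked alert list, collect the compounds hit by at least one of the top L alerts and compute PCR(L), NCR(L), and the coverage. The function name, the input layout (each alert given as the set of compound IDs it matches), and the toy data are assumptions made for illustration, not the authors' implementation.

```python
def coverage_curve(ranked_alerts, labels):
    """Return (L, coverage, PCR, NCR) for every prefix of a ranked alert list.

    ranked_alerts: list of sets of compound IDs matched by each alert, best first.
    labels: dict mapping compound ID to 1 (mutagen) or 0 (nonmutagen).
    PCR or NCR is None when the corresponding compound set is empty.
    """
    all_ids = set(labels)
    covered = set()
    rows = []
    for top_l, matched in enumerate(ranked_alerts, start=1):
        covered |= matched & all_ids          # C_L: hit by any of the top L alerts
        uncovered = all_ids - covered         # complement of C_L
        pcr = sum(labels[c] for c in covered) / len(covered) if covered else None
        ncr = (
            sum(1 - labels[c] for c in uncovered) / len(uncovered)
            if uncovered
            else None
        )
        rows.append((top_l, len(covered) / len(all_ids), pcr, ncr))
    return rows


# Toy data with invented compound IDs.
labels = {"a": 1, "b": 1, "c": 0, "d": 0, "e": 1}
ranked_alerts = [{"a", "b"}, {"c", "e"}]
rows = coverage_curve(ranked_alerts, labels)
```

In this toy case the first alert alone gives a PCR of 1.0 at 40% coverage, and adding the second alert raises coverage to 80% while PCR drops to 0.75, which is the kind of accuracy versus coverage trade-off such curves display.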
The results showed that PipelinePilot, based on molecular fingerprints and machine learning methods, performed the best. MultiCASE was similar in performance to the four methods we used, and all of these were rule-based methods that automatically detect structural alerts in a data set. The expert system DEREK performed the worst. Among the five structural alert detection methods, SARpy performed the best in prediction.

Then we compared the substructures identified by the four methods with those in publications. From ToxAlerts, we collected 46 patterns in SMARTS format to compare with the structural alerts we obtained. All the patterns, represented as SMARTS, are listed in Table S1. The comparison of the four methods with the patterns from ToxAlerts as references is shown in Table 1. Twenty patterns were matched by at least one method. SARpy matched the most patterns of the four methods, and its plus score (indicating the substructures were more specific than ToxAlerts) was also the highest, which could explain why SARpy obtained the highest performance in prediction. The FP method had the highest minus score (indicating the substructures were more common than ToxAlerts). The substructures identified by FP were common enough that the maximal positive coverage rate was higher, and the accuracy was a little lower in comparison with Bioalerts. Though MoSS could detect the azo group and fully matched 10 patterns from ToxAlerts, which was similar to other methods, the number of missing structural alerts was also high, so the accuracy of this method was unsatisfactory.

Table 1. Comparison of the Four Methods with the Patterns from ToxAlerts a
a V means completely matched or almost the same. X means entirely missing. + means the substructures mined by the method were more specific compared to ToxAlerts. ++ or +++ indicates the number of the relevant substructures was high and higher. − means the substructures were more common compared to ToxAlerts, and −− means much more common substructures. For example, Cl * is much more common than C C Cl.

In terms of the time consumed by each method, we made a simple comparison. As illustrated in Table 2, the graph-based detecting method MoSS ran the fastest (only a few seconds, which might be ignored), whereas the fingerprint-based method FP was the slowest of the four methods. The results illustrated that graph-based methods have advantages in computational performance and may be better suited to detecting alerts in a big data set. Nevertheless, in this study, all these methods were acceptable in performance since the slowest one took only about 40 min. Therefore, when alerts are detected in a small data set with fewer compounds, all four methods could be used.

Table 2. Time Consuming of Different Methods
Columns: method; substructure mining (sec); fragmentation or FP calculation; evaluation (sec); sum (sec)
Entries: MoSS <3; SARpy; FP 2337 (about 39 min); Bioalerts

Key Structural Alerts for Chemical Mutagenicity.
Since the chemical Ames mutagenicity data set was used as the benchmark to evaluate the four methods, some key structural alerts for chemical mutagenicity were obtained. Ranked by IG values, the top substructures identified by the four methods we used and by Kazius are shown in Table 3. These patterns fell mainly into two categories: nitrogenous functional groups and fragments in polyaromatic compounds. Among them, N O was shared by all four methods and the Kazius method. Others were mostly detected by the FP method and Bioalerts, and they were very similar, with only subtle differences. For example, the third pattern obtained by Bioalerts, cc(c)c, allows both biphenyl and condensed rings (naphthalene derivatives). The 11th pattern, [#6](:c)(:c)(:c), defined in the PubChem fingerprint, requires that compounds contain condensed rings.

Table 3. Key Structural Alerts Detected by Different Methods and Ordered by IG

DISCUSSION

Analysis of Different Methodologies for Structural Alerts. All four methods we compared in this study assess a potential structural alert based on the occurrence of the substructure in a data set. From a statistical perspective, a large data set (hundreds or thousands of compounds) is required to make the frequency of the substructures significant. In contrast, it is difficult to learn the rules from a small data set by computational methods, and empirical rules and expert systems may be more helpful.

Most detecting methods for structural alerts regarded ACC and its analogs as the most important factor. For example, SARpy sorted the substructure candidates by likelihood ratio, which increases monotonically as ACC increases. Another similar assessment method called enrichment factor 55 was also used in this field to rank the substructures. 1,38,56 Other studies implemented accuracy or p-value, which is also similar; 30 that is, all these evaluation indicators are monotonically correlated. MoSS uses two thresholds, min supporting rate and max complemental rate, to extract the potential structural alerts. If the thresholds are too narrow, very few substructures will be obtained, and if they are too wide, there will be too many, resulting in false positive substructures. It is more sensible to widen the thresholds and use an additional assessment process to enrich the structural alerts.

Using ACC as the main index for substructure detection may lead to overfitting, in which the substructures identified are so specific that they lack representativeness. Overly specific structural alerts may have higher accuracy but are less significant, which is of little benefit to researchers. For example, in Figure S1a, although the three substructures mined by SARpy had an accuracy of 100%, each occurred only about 10 times, and the constructs were not common enough. By comparison, a pattern mined by Kazius could match a large number of compounds in the data sets, while the accuracy of the prediction was still high (Figure S1b). This substructure indicated that a condensed ring as skeleton may result in genotoxicity. We designed two similar SMARTS, shown in Figure S1c, to reproduce the result of Kazius, and got a similar result. Therefore, PR is also critical in the assessment of a substructure to avoid overly specific structural alerts.

According to the distribution of structural alerts shown in Figure 4, we cannot expect a substructure with both high ACC and high PR. A compromise index is necessary to evaluate the significance of a substructure. IG is preferable: it is widely applied in feature selection in machine learning methods and is an important index in decision trees. The distribution in Figure 4 also suggested that a substructure with an extremely high ACC value might not be as useful as one with a lower ACC but higher PR value, according to the IG values.
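To make this trade-off concrete, the sketch below computes ACC, PR, and IG (eqs 1 to 4) from four counts. The training-set sizes (4069 compounds, 2294 mutagens) come from the text, but the two substructures and their counts are invented for illustration, and a base-2 logarithm is assumed because the text states that the entropy equals 1 at a positive rate of 0.5. The function names are hypothetical.

```python
import math


def entropy(p):
    """Binary information entropy (eq 3), base 2 so that entropy(0.5) == 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def indices(n_total, n_mutagens, n_matched, n_matched_mutagens):
    """Return (ACC, PR, IG) for one substructure; assumes n_matched > 0."""
    n_rest = n_total - n_matched
    acc = n_matched_mutagens / n_matched          # eq 1
    pr = n_matched / n_total                      # eq 2
    # E(P(X=0 | not t)) equals E(P(X=1 | not t)) because entropy is symmetric.
    p_rest = (n_mutagens - n_matched_mutagens) / n_rest if n_rest else 0.0
    ig = entropy(n_mutagens / n_total) - pr * entropy(acc) - (1 - pr) * entropy(p_rest)
    return acc, pr, ig                            # ig follows eq 4


# Hypothetical substructures: one rare but exact, one common and fairly accurate.
rare_exact = indices(4069, 2294, 10, 10)
common_good = indices(4069, 2294, 1000, 809)
```

Under these invented counts, the rare substructure has an ACC of 1.0 but a far smaller IG than the common one, which mirrors the argument that PR must be weighed alongside ACC.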
Although the IG value can be regarded as a comprehensive score in the evaluation of a substructure, it still has some shortcomings. If a substructure has a low ACC value but a high PR value, its IG value will still be very high because of its commonness. However, if its ACC is too low, researchers may lose interest in it or doubt its potency. For example, the substructure allylamine ( N C C C ) had a comparatively high IG score, but its ACC was only [...]. On one hand, the high occurrence indicated that the pattern was significant and deserved attention; on the other hand, the accuracy was too low, and the indices showed that more than 600 compounds containing the substructure were inactive. It is still hard to balance accuracy against positive rate, even though the IG score can comprehensively evaluate the performance.

Implication for Method Development of Structural Alerts. Many identified substructures for alerts were similar, especially with fingerprint-based methods. A good example is the structural alerts shown in Table 3. From the view of chemistry or chemoinformatics, N O, N O, cn O, N =* are different (* indicates any atom, c means an aromatic carbon, and ~ indicates any bond type). Each of them had high IG and ACC values, and they were correlated. Meanwhile, cc(c)c differed from cccc(c)c as a fragment, but both may indicate condensed rings, so their indices were the same. However, it is difficult to judge whether they are redundant. Current detecting methods for structural alerts do not consider this problem because they only consider whether a substructure's ACC (or other index) reaches the threshold. These methods can generate hundreds of fragments that are possibly related and even hierarchical (Figure S2 illustrates one example). A lot of work needs to be done to eliminate the redundant fragments. Another kind of doubtful structural alert was caused by the concurrence of two substructures.
A typical example is o-n-bromobenzene. By counting the compounds that contained o-n-bromobenzene, we found that all six were positive. However, after analyzing the six compounds, we assumed that the toxicity might be caused not by o-n-bromobenzene but by other structural alerts such as a nitro group, an anthracene analogue, or benzidine (Figure S3). All six compounds have other structural alerts that may be the primary cause of their toxicity, and o-n-bromobenzene was probably a false positive structural alert. Most identification methods for structural alerts do not consider the effect of existing structural alerts in a compound, so false positives may arise when the supporting compounds are limited. Therefore, in the future development of identification methods for structural alerts, it is necessary to consider how to generalize specific structural alerts, which will also avoid redundancy.
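The two problems discussed above, redundant alerts and alerts explained by co-occurring known alerts, can each be screened with simple set operations on the compounds that match every alert. The following is only a sketch under stated assumptions: the compound IDs, the assignment of known alerts to compounds, and the helper names `group_redundant` and `residual_support` are hypothetical, and treating identical match sets as a sign of redundancy is one possible proxy, not a method from any of the four tools.

```python
from collections import defaultdict


def group_redundant(matches):
    """Group alerts whose sets of matched compound IDs are identical.

    matches -- dict mapping an alert name to the set of compound IDs it hits.
    Returns a list of groups (sorted lists of names) with more than one alert.
    """
    groups = defaultdict(list)
    for alert, ids in matches.items():
        groups[frozenset(ids)].append(alert)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def residual_support(candidate_ids, known_matches, labels):
    """Count supporting compounds of a candidate alert that contain no known alert.

    Returns (number of residual compounds, number of those that are positive).
    A candidate with zero residual support has no independent evidence.
    """
    explained = set()
    for ids in known_matches.values():
        explained |= set(ids)
    residual = set(candidate_ids) - explained
    positives = sum(1 for cid in residual if labels[cid])
    return len(residual), positives


# Hypothetical match sets: two fragments hit exactly the same compounds.
matches = {
    "cc(c)c": {1, 2, 3, 4},
    "cccc(c)c": {1, 2, 3, 4},
    "nitro": {5, 6, 7},
}
redundant_groups = group_redundant(matches)

# Hypothetical candidate supported by six positive compounds (IDs 10-15),
# each of which also contains an already known alert.
candidate_ids = {10, 11, 12, 13, 14, 15}
known_matches = {
    "nitro": {5, 10, 11},
    "anthracene analogue": {12, 13},
    "benzidine": {14, 15},
}
labels = {cid: True for cid in candidate_ids}
n_residual, n_residual_pos = residual_support(candidate_ids, known_matches, labels)

print(redundant_groups)
print(n_residual, n_residual_pos)
```

In this toy case the two ring fragments are grouped as candidates for merging, and the six-compound candidate is left with no supporting compound once known alerts are accounted for, which is the situation described for o-n-bromobenzene.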

Some other methods that consider the co-occurrence of two or more substructures, such as emerging patterns,57 are excellent and deserve to be more widely applied and referenced. A molecular pattern is defined as a set of molecular fragments.58 Gillet et al. proposed a jumping emerging pattern (JEP)59 mining algorithm that used atom pairs60 as descriptors to find the patterns. Afterward, emerging pattern mining61 was proposed to mine toxic features using the contrast pattern tree algorithm.62 However, the patterns may be identified by chance if the supporting data set is limited, and it may still be challenging to avoid false positives.

It should be noted that the hypothesis that electrophilic reactivity leads to chemical mutagenicity is widely accepted;14 hence, chemical mutagenicity is amenable to the search for causal relationships with discrete substructures. In contrast, many other toxicological end points, in particular toxicities associated with pharmacophoric toxicophores, may not be suited to this sort of analysis. Therefore, it is necessary to develop new methods for identifying structural alerts for those toxicological end points.

CONCLUSIONS

In this paper, four methodologies were compared and evaluated in terms of their capability to identify structural alerts, using a chemical Ames mutagenicity data set as a benchmark. All the methods are freely available and easy to use. Three parameters, namely IG, ACC, and PR, were used as the main evaluation indices. By comparing the performance of the four methods, we found that fingerprint-based methods demonstrated strong capability in identifying significant structural alerts. They could cover most of the positive sets with acceptable accuracy; however, they produced too many redundant patterns and false positives.
MoSS was the fastest and could also identify many significant structural alerts, although its assessment method may result in high false positive and false negative rates. SARpy could detect highly accurate and specific substructures owing to its assessment by likelihood ratio, and its structural alerts are preferable for building rule-based predictive models. At the end of the paper, we showed that current methods for structural alerts might identify redundant and overspecific substructures. Therefore, it remains challenging to automatically detect structural alerts that are more convincing and beneficial to researchers.

ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at.
Structural alerts detected by SARpy, MoSS, FP, and Bioalerts; structural alerts collected from ToxAlerts (XLSX)
Supporting figures (PDF)

AUTHOR INFORMATION

Corresponding Author
*E-mail: ytang234@ecust.edu.cn.

ORCID
Hongbin Yang: Weihua Li: Yun Tang:

Funding
This work was supported by the National Key Research and Development Program (Grant No. 2016YFA ), the National Natural Science Foundation of China (Grant Nos and ), and the 111 Project (Grant No. B07023).

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Zhengyu Yin of Dupont Shanghai Innovation Center for the valuable comments in preparation of this manuscript.

ABBREVIATIONS

ACC, accuracy rate; IG, information gain; JEP, jumping emerging pattern; PR, positive rate; RF, random forest; SA, structural alerts; SAR, structure-activity relationships; SVM, support vector machine

REFERENCES

(1) Xu, C., Cheng, F., Chen, L., Du, Z., Li, W., Liu, G., Lee, P. W., and Tang, Y. (2012) In silico prediction of chemical Ames mutagenicity. J. Chem. Inf. Model. 52.
(2) Webb, S. J., Hanser, T., Howlin, B., Krause, P., and Vessey, J. D. (2014) Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity. J. Cheminf. 6, 8.
(3) Stepan, A. F., Walker, D. P., Bauman, J., Price, D. A., Baillie, T. A., Kalgutkar, A. S., and Aleo, M. D. (2011) Structural alert/reactive metabolite concept as applied in medicinal chemistry to mitigate the risk of idiosyncratic drug toxicity: a perspective based on the critical examination of trends in the top 200 drugs marketed in the United States. Chem. Res. Toxicol. 24.
(4) Liu, R., Yu, X., and Wallqvist, A. (2015) Data-driven identification of structural alerts for mitigating the risk of drug-induced human liver injuries. J. Cheminf. 7, 4.
(5) Benigni, R., Bossa, C., Alivernini, S., and Colafranceschi, M. (2012) Assessment and validation of US EPA's OncoLogic(R) expert system and analysis of its modulating factors for structural alerts. J. Environ. Sci. Health C Environ. Carcinog. Ecotoxicol. Rev. 30.
(6) Pizzo, F., Gadaleta, D., Lombardo, A., Nicolotti, O., and Benfenati, E. (2015) Identification of structural alerts for liver and kidney toxicity using repeated dose toxicity data. Chem. Cent. J. 9, 62.
(7) Maron, D. M., and Ames, B. N. (1983) Revised methods for the Salmonella mutagenicity test. Mutat. Res. 113.
(8) Piegorsch, W. W., and Zeiger, E. (1991) Measuring Intra-Assay Agreement for the Ames Salmonella Assay. In Statistical Methods in Toxicology: Proceedings of a Workshop during EUROTOX '90, Leipzig, Germany, September 12-14, 1990 (Hothorn, L., Ed.) pp 35-41, Springer, Berlin Heidelberg.
(9) Ashby, J., and Tennant, R. W. (1988) Chemical structure, Salmonella mutagenicity and extent of carcinogenicity as indicators of genotoxic carcinogenesis among 222 chemicals tested in rodents by the U.S. NCI/NTP. Mutat. Res., Genet. Toxicol. Test. 204.
(10) Shelby, M. D. (1988) The genetic toxicity of human carcinogens and its implications. Mutat. Res., Genet. Toxicol. Test. 204.
(11) Williams, G. M., Mori, H., and McQueen, C. A. (1989) Structure-activity relationships in the rat hepatocyte DNA-repair test for 300 chemicals. Mutat. Res., Rev. Genet. Toxicol. 221.
(12) Enoch, S. J., Ellison, C. M., Schultz, T. W., and Cronin, M. T. D. (2011) A review of the electrophilic reaction chemistry involved in covalent protein binding relevant to toxicity. Crit. Rev. Toxicol. 41.
(13) Li, H. Q., Qi, F. H., and Wang, S. Y. (2005) A comparison of model selection methods for multi-class support vector machines. Computational Science and Its Applications - ICCSA.
(14) Benigni, R., and Bossa, C. (2006) Structural Alerts of Mutagens and Carcinogens. Curr. Comput.-Aided Drug Des. 2.
(15) Ridings, J. E., Barratt, M. D., Cary, R., Earnshaw, C. G., Eggington, C. E., Ellis, M. K., Judson, P. N., Langowski, J. J., Marchant, C. A., Payne, M. P., Watson, W. P., and Yih, T. D. (1996) Computer prediction of possible toxic action from chemical structure: an update on the DEREK system. Toxicology 106.
(16) (2012) Rules-based System Designed to Support the ICH M7 Guidelines on Impurities, Leadscope, Inc. genetox_expert_alerts/ (accessed April 24, 2017).
(17) Lepailleur, A., Poezevara, G., and Bureau, R. (2013) Automated detection of structural alerts (chemical fragments) in (eco)toxicology. Comput. Struct. Biotechnol. J. 5, e
(18) Floris, M., Raitano, G., Medda, R., and Benfenati, E. (2016) Fragment Prioritization on a Large Mutagenicity Dataset. Mol. Inf /minf
(19) Klopman, G. (1984) Artificial intelligence approach to structure-activity studies. Computer automated structure evaluation of biological activity of organic molecules. J. Am. Chem. Soc. 106.
(20) Klopman, G., and McGonigal, M. (1981) Computer simulation of physical-chemical properties of organic molecules. 1. Molecular system identification. J. Chem. Inf. Model. 21.
(21) Cunningham, A. R., Moss, S. T., Iype, S. A., Qian, G., Qamar, S., and Cunningham, S. L. (2008) Structure-Activity Relationship Analysis of Rat Mammary Carcinogens. Chem. Res. Toxicol. 21.
(22) Ferrari, T., Cattaneo, D., Gini, G., Golbamaki Bakhtyari, N., Manganaro, A., and Benfenati, E. (2013) Automatic knowledge extraction from chemical structures: the case of mutagenicity prediction. SAR QSAR Environ. Res. 24.
(23) Golbamaki, A., Benfenati, E., Golbamaki, N., Manganaro, A., Merdivan, E., Roncaglioni, A., and Gini, G. (2016) New clues on carcinogenicity-related substructures derived from mining two large datasets of chemical compounds. J. Environ. Sci. Health C Environ. Carcinog. Ecotoxicol. Rev. 34.
(24) Golbamaki, A., and Benfenati, E. (2016) In Silico Methods for Carcinogenicity Assessment. Methods Mol. Biol. 1425.
(25) Liu, P., Agrafiotis, D. K., and Rassokhin, D. N. (2011) Power keys: a novel class of topological descriptors based on exhaustive subgraph enumeration and their application in substructure searching. J. Chem. Inf. Model. 51.
(26) Kramer, S., De Raedt, L., and Helma, C. (2001) Molecular Feature Mining in HIV Data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM.
(27) Raedt, L. D., and Kramer, S. (2001) The levelwise version space algorithm and its application to molecular fragment finding. In Proceedings of the 17th international joint conference on artificial intelligence, Vol. 2, Morgan Kaufmann Publishers Inc., Seattle, WA.
(28) Borgelt, C., and Berthold, M. R. (2002) Mining molecular fragments: Finding relevant substructures of molecules. In ICDM 2003 Proceedings, 2002 IEEE International Conference on Data Mining, pp 51-58, IEEE.
(29) Nijssen, S., and Kok, J. N. (2004) A quickstart in frequent structure mining can make a difference. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM.
(30) Ahlberg, E., Carlsson, L., and Boyer, S. (2014) Computational Derivation of Structural Alerts from Large Toxicology Data Sets. J. Chem. Inf. Model. 54.
(31) Faulon, J. L., Visco, D. P., and Pophale, R. S. (2003) The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J. Chem. Inf. Comput. Sci. 43.
(32) Steinbeck, C., Han, Y. Q., Kuhn, S., Horlacher, O., Luttmann, E., and Willighagen, E. (2003) The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci. 43.
(33) Shen, J., Cheng, F., Xu, Y., Li, W., and Tang, Y. (2010) Estimation of ADME properties with substructure pattern recognition. J. Chem. Inf. Model. 50.
(34) Durant, J. L., Leland, B. A., Henry, D. R., and Nourse, J. G. (2002) Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42.
(35) Sun, L., Zhang, C., Chen, Y. J., Li, X., Zhuang, S. L., Li, W. H., Liu, G. X., Lee, P. W., and Tang, Y. (2015) In silico prediction of chemical aquatic toxicity with chemical category approaches and substructural alerts. Toxicol. Res. 4.
(36) Chen, Y., Cheng, F., Sun, L., Li, W., Liu, G., and Tang, Y. (2014) Computational models to predict endocrine-disrupting chemical binding with androgen or oestrogen receptors. Ecotoxicol. Environ. Saf. 110.
(37) Yang, H., Li, X., Cai, Y., Wang, Q., Li, W., Liu, G., and Tang, Y. (2017) In silico prediction of chemical subcellular localization via multi-classification methods. MedChemComm in press. DOI: / C7MD00074J.
(38) Li, X., Chen, L., Cheng, F., Wu, Z., Bian, H., Xu, C., Li, W., Liu, G., Shen, X., and Tang, Y. (2014) In silico prediction of chemical acute oral toxicity using multi-classification methods. J. Chem. Inf. Model. 54.
(39) Hansen, K., Mika, S., Schroeter, T., Sutter, A., ter Laak, A., Steger-Hartmann, T., Heinrich, N., and Muller, K. R. (2009) Benchmark Data Set for in Silico Prediction of Ames Mutagenicity. J. Chem. Inf. Model. 49.
(40) Kazius, J., McGuire, R., and Bursi, R. (2005) Derivation and validation of toxicophores for mutagenicity prediction. J. Med. Chem. 48.
(41) Kazius, J., Nijssen, S., Kok, J., Back, T., and Ijzerman, A. P. (2006) Substructure mining using elaborate chemical representation. J. Chem. Inf. Model. 46.
(42) O'Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., and Hutchison, G. R. (2011) Open Babel: An open chemical toolbox. J. Cheminf. 3, 33.
(43) (2007) KNIME: The Konstanz Information Miner, Springer. (accessed Jan 10, 2017).
(44) Landrum, G. (2016) RDKit. (accessed Jan 10, 2017).
(45) Cortes-Ciriano, I. (2016) Bioalerts: a python library for the derivation of structural alerts from bioactivity and toxicity data sets. J. Cheminf. 8, 13.
(46) Yap, C. W. (2011) PaDEL-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J. Comput. Chem. 32.
(47) (2009) PubChem Substructure Fingerprint. ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt (accessed Jan 10, 2017).
(48) Klekota, J., and Roth, F. P. (2008) Chemical substructures that enrich for biological activity. Bioinformatics 24.
(49) Quinlan, J. R. (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc.
(50) Sokolova, M., and Szpakowicz, S. (2010) In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques.
(51) O'Boyle, N. M., Morley, C., and Hutchison, G. R. (2008) Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem. Cent. J. 2, 5.
(52) Sushko, I., Salmina, E., Potemkin, V. A., Poda, G., and Tetko, I. V. (2012) ToxAlerts: A Web Server of Structural Alerts for Toxic Chemicals and Compounds with Potential Adverse Reactions. J. Chem. Inf. Model. 52.
(53) Benigni, R., and Bossa, C. (2008) Structure alerts for carcinogenicity, and the Salmonella assay system: a novel insight through the chemical relational databases technology. Mutat. Res., Rev. Mutat. Res. 659.
(54) Bailey, A. B., Chanderbhan, R., Collazo-Braier, N., Cheeseman, M. A., and Twaroski, M. L. (2005) The use of structure-activity relationship analysis in the food contact notification program. Regul. Toxicol. Pharmacol. 42.
(55) Jensen, B. F., Vind, C., Padkjaer, S. B., Brockhoff, P. B., and Refsgaard, H. H. F. (2007) In silico prediction of cytochrome P450 2D6 and 3A4 inhibition using Gaussian kernel weighted k-nearest neighbor and extended connectivity fingerprints, including structural fragment analysis of inhibitors versus noninhibitors. J. Med. Chem. 50.
(56) Chen, Y. J., Cheng, F. X., Sun, L., Li, W. H., Liu, G. X., and Tang, Y. (2014) Computational models to predict endocrine-disrupting chemical binding with androgen or oestrogen receptors. Ecotoxicol. Environ. Saf. 110.
(57) Dong, G., and Li, J. (1999) Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 43-52, ACM.
(58) Bertrand, C., Guillaume, P., Bruno, C., Alban, L., and Ronan, B. (2012) Emerging Patterns as Structural Alerts for Computational Toxicology. In Contrast Data Mining, Chapman and Hall/CRC.
(59) Sherhod, R., Gillet, V. J., Judson, P. N., and Vessey, J. D. (2012) Automating Knowledge Discovery for Toxicity Prediction Using Jumping Emerging Pattern Mining. J. Chem. Inf. Model. 52.
(60) Carhart, R. E., Smith, D. H., and Venkataraghavan, R. (1985) Atom pairs as molecular features in structure-activity studies: definition and applications. J. Chem. Inf. Model. 25.
(61) Sherhod, R., Judson, P. N., Hanser, T., Vessey, J. D., Webb, S. J., and Gillet, V. J. (2014) Emerging pattern mining to aid toxicological knowledge discovery. J. Chem. Inf. Model. 54.
(62) Fan, H., and Ramamohanarao, K. (2006) Fast discovery and the generalization of strong jumping emerging patterns for building compact and accurate classifiers. IEEE Trans. Knowl. Data Eng. 18.

Emerging patterns mining and automated detection of contrasting chemical features

Alban Lepailleur, Centre d'Etudes et de Recherche sur le Médicament de Normandie (CERMN), UNICAEN EA 4258 - FR CNRS 3038