Validation of QSAR Models - Strategies and Importance


International Journal of Drug Design and Discovery, Volume 2, Issue 3, July-September 2011, pp. 511-519

Ravichandran Veerasamy 1*, Harish Rajak 2, Abhishek Jain 3, Shalini Sivadasan 1, Christapher P. Varghese 1 and Ram Kishore Agrawal 3

1 Faculty of Pharmacy, AIMST University, Semeling 08100, Kedah, Malaysia.
2 Institute of Pharmaceutical Sciences, Guru Ghasidas University, Bilaspur (CG), India.
3 Department of Pharmaceutical Sciences, Dr. H. S. Gour University, Sagar, India.

ABSTRACT: Quantitative Structure-Activity Relationship (QSAR) modeling is based on the hypothesis that changes in molecular structure are reflected in changes in the observed response or biological activity. The success of any QSAR model depends on the accuracy of the input data, the selection of appropriate descriptors and statistical tools, and the validation of the developed model. Validation is a crucial aspect of QSAR modeling: it is the process by which the reliability and significance of a procedure are established for a specific purpose. In this review we therefore focus on the importance of validating QSAR models and on the different methods of validation.

KEYWORDS: QSAR; Internal validation; External validation; Randomization.

Introduction

A variety of in vitro and computational methods are being proposed for the toxicological assessment of chemicals, owing to the increasing demand for alternatives to in vivo toxicity testing. Greater efforts will be made to include methods such as in vitro techniques and QSAR models in the regulatory decision-making process in Europe [1] under the new Registration, Evaluation and Authorisation of Chemicals (REACH) system [3].
For reasons of practicality, cost-effectiveness and animal welfare, QSAR is expected to play an important role in the assessment of existing chemicals for which further information may be required under REACH. It will therefore be essential that the QSAR models used produce reliable estimates. QSAR is a widely accepted predictive and diagnostic process for finding associations between chemical structures and biological activity. It emerged and evolved to meet the medicinal chemist's need and desire to predict biological response [4]. The QSAR paradigm first found its way into the practice of agrochemistry, pharmaceutical chemistry and toxicology, and eventually into most facets of chemistry. The overall goals of QSAR retain their original essence and remain focused on the predictive ability of the approach and its receptiveness to mechanistic interpretation. QSAR encompasses all statistical methods by which biological activities (most often expressed as logarithms of equipotent molar activities) are related to structural elements (Free-Wilson analysis), physicochemical properties (Hansch analysis) or fields (three-dimensional QSAR). It quantifies the relationship between structure and biological data and is useful for optimizing the groups that modulate the potency of a molecule [5]. Depending on the complexity of the data, QSAR methods are two-dimensional (2D), three-dimensional (3D) or higher-dimensional [6]. 2D-QSAR is insensitive to the conformational arrangement of atoms in space, whereas 3D-QSAR needs information on the positions of the atoms in three spatial dimensions [7]. In 4D-QSAR, a set of automatically docked orientations and conformations is generated for each molecule by genetic algorithms [8-12].

*For correspondence: Tel.: Ext. 109; E-mail: phravi75@rediffmail.com
Induced-fit scenarios of ligands binding to the active site, and solvation models, can be thought of as the fifth (protein flexibility) and sixth (entropy) dimensions in 5D- and 6D-QSAR, respectively.

The process of QSAR model development can generally be divided into three stages: data preparation, data analysis and model validation (Scheme 1). These steps represent standard practice in QSAR modeling, and their implementation is often determined by the researcher's interests, experience and available software. Obtaining a good-quality QSAR model depends on many factors, such as the quality of the biological data, the choice of descriptors, variable selection, the statistical methods and validation. Validation is a crucial aspect of any QSAR modeling. It is the process by which the reliability and relevance of a procedure are established for a specific purpose [19]. Formal validation is nevertheless often one of the most overlooked steps in model development. In the QSAR community, validation of a model has often been little more than an assessment of statistical fit and, occasionally, of predictivity using cross-validation techniques. However, it is now accepted that validation is a broader process that, in addition to statistical assessment, includes issues such as data quality, the applicability of the model and its mechanistic interpretability [20]. In this review, we discuss the validation parameters for QSAR models developed by multiple linear regression (MLR) and partial least squares regression (PLS).

Scheme 1. Various stages of QSAR model development

Importance of validation of QSAR models

QSAR models are useful for various purposes, including predicting the activities of untested chemicals. The success of drug discovery efforts depends heavily on the use of structure-activity relationship techniques [21]. Over the years, many methods, algorithms and techniques have been developed and applied in QSAR studies. The main challenge in QSAR is to select the group of descriptors that captures the most critical structural and physicochemical features associated with activity. The quality of the biological data and effective descriptor (variable) selection are the essential first steps of the QSAR modeling process [23]. The application of QSAR models depends on their statistical significance and predictive ability. Regulatory decisions about, and the justification for, using a particular QSAR model depend on the model's ability to predict unknown chemicals with a known degree of certainty [18,24]. A QSAR model that has not been validated can lead to false predictions of biological activity, so validation after model development is the most important part of a QSAR study. QSAR has clearly matured, although it still has a way to go. Estimating the accuracy of predictions is a critical problem in QSAR modeling [25]. Only in this decade has the validation of QSAR models received considerable attention [14-17]. Four tools for assessing the validity of QSAR models [35] are (i) cross-validation, (ii) bootstrapping, (iii) randomization of the response data and (iv) external validation. Several principles for assessing the validity of QSAR models were proposed at an international workshop held in Setubal (Portugal) and were subsequently modified in 2004 by the OECD Work Programme on QSARs [1,3].
Against this background, in this review we discuss the different methods for validating QSAR models.

Validation Methods for QSAR Models

Validation methods are needed to establish the predictiveness of a model on unseen data and to help determine the complexity of equation that the amount of data justifies. A QSAR model can be validated using the data that created it (an internal method) or using a separate data set (an external method) (Scheme 2). Least-squares fit (R²), cross-validation (Q²) [36-38], adjusted R² (R²adj), the chi-squared test (χ²), root-mean-squared error (RMSE), bootstrapping and scrambling (Y-randomization) [39,40] are internal methods of validating a model. The best method of validating a model is an external one, such as evaluating the QSAR model on a test set of compounds. These statistical methodologies are used to ensure that the models created are sound and unbiased ("good models"). A poor model can do more harm than good, so confirming that a model is a good model is of utmost importance.

Scheme 2. Various stages of QSAR model validation

Internal Validation

Least Squares Fit

The most common internal method of validating a model is the least-squares fit. It is similar to linear regression and yields R² (the squared correlation coefficient) for the comparison between predicted and experimental activities. An improved way of determining R² is the robust straight-line fit, in which data points far from the central data points (essentially, points more than a specified number of standard deviations away from the model) are given less weight in the calculation of R². An alternative is to remove outliers (compounds from the training set) from the data set in an attempt to optimize the QSAR model; this is valid only if strict statistical rules are followed. A difference between R² and R²adj of less than 0.3 indicates that the number of descriptors in the QSAR model is acceptable; if the difference is more than 0.3, the number of descriptors is not acceptable. The fit of the model is given by

$$R^2 = \frac{\left(N\sum XY - \sum X \sum Y\right)^2}{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}$$

where X and Y are the predicted and experimental activities of the N compounds.

The fit of QSAR models can also be assessed with the chi-squared (χ²) and root-mean-squared error (RMSE) methods, which are used to decide whether the model possesses the predictive quality reflected in R². RMSE shows the error between the mean of the experimental values and the predicted activities, whereas χ² shows the difference between the experimental and predicted bioactivities:

$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{y}_i)^2}{\hat{y}_i}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_m)^2}{n - 1}}$$

where y_i and ŷ_i are the experimental and predicted bioactivities of an individual compound in the training set, y_m is the mean of the experimental bioactivities, and n is the number of molecules in the data set being examined.
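The fit statistics of this subsection can be computed directly from experimental and predicted activities. The Python sketch below is illustrative rather than the authors' procedure: the function names and data are hypothetical, R²adj uses the standard adjustment 1 − (1 − R²)(n − 1)/(n − p − 1) for p descriptors (an assumption, since the text gives no formula for it), and the RMSE follows the formula as written in this subsection, that is, the spread of the predictions around the experimental mean; many texts instead compute RMSE from the residuals y_i − ŷ_i.

```python
import math


def r_squared(y_obs, y_pred):
    """Squared correlation between predicted (X) and experimental (Y) activities."""
    n = len(y_obs)
    sx, sy = sum(y_pred), sum(y_obs)
    sxy = sum(x * y for x, y in zip(y_pred, y_obs))
    sxx = sum(x * x for x in y_pred)
    syy = sum(y * y for y in y_obs)
    return (n * sxy - sx * sy) ** 2 / ((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def adjusted_r_squared(r2, n, p):
    """Standard adjustment of R^2 for n compounds and p descriptors (assumed form)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def chi_squared(y_obs, y_pred):
    """Sum of (y_i - yhat_i)^2 / yhat_i; assumes all predictions are non-zero."""
    return sum((y - yh) ** 2 / yh for y, yh in zip(y_obs, y_pred))


def rmse_as_defined(y_obs, y_pred):
    """RMSE exactly as written in the text: predictions vs. the experimental mean."""
    y_m = sum(y_obs) / len(y_obs)
    return math.sqrt(sum((yh - y_m) ** 2 for yh in y_pred) / (len(y_obs) - 1))


def descriptor_count_ok(r2, r2_adj):
    """Rule of thumb from the text: R^2 - R^2adj should be below 0.3."""
    return r2 - r2_adj < 0.3


# Hypothetical example: experimental and fitted activities, model with two descriptors
y_obs = [4.2, 5.1, 5.9, 6.3, 7.0, 7.8]
y_fit = [4.4, 4.9, 5.8, 6.5, 7.1, 7.6]
r2 = r_squared(y_obs, y_fit)
r2_adj = adjusted_r_squared(r2, n=len(y_obs), p=2)
print(f"R2={r2:.3f} R2adj={r2_adj:.3f} chi2={chi_squared(y_obs, y_fit):.3f} "
      f"RMSE={rmse_as_defined(y_obs, y_fit):.3f} ok={descriptor_count_ok(r2, r2_adj)}")
```

The thresholds quoted in the text (for example χ² < 0.5) can then be applied to the printed values.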
Large χ² or RMSE values (> 0.5 and > 1.0, respectively) reflect the model's poor ability to predict bioactivities accurately, even when the model has a large R² (> 0.7). For a good predictive model, χ² and RMSE should be low (< 0.5 and < 0.3, respectively). These error measures can also aid model building and are especially useful in creating and validating models for nonlinear data sets, such as those built with artificial neural networks (ANN) [41]. However, excellent values of R² and RMSE are not sufficient indicators of model validity, so additional parameters must be reported to indicate the predictive ability of models. In principle, two reasonable approaches to validation can be envisaged: one based on prediction, and the other based on the fit of the predictor variables to rearranged response variables.

Cross-validation

A common method for internally validating a QSAR model is cross-validation (CV, Q², q², or jack-knifing). CV repeats the regression many times on subsets of the data. Usually each molecule is left out once, in turn, and R² is computed from the predicted values of the left-out molecules. Sometimes more than one molecule is left out at a time (leave-many-out, LMO). CV is often used to determine how large a model a given data set can support. The cross-validated R² is usually smaller than the overall R² of a QSAR equation. It is used as a diagnostic tool to evaluate the predictive power of an equation: CV measures a model's predictive ability and draws attention to the possibility that the model has been overfitted. Overfitting refers to the phenomenon in which a model describes the relationship between predictors and response well but subsequently fails to provide valid predictions for new compounds.
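The leave-one-out and leave-many-out procedures described in this subsection can be sketched in dependency-free Python, using the usual PRESS-based definition Q² = 1 − PRESS/Σ(y_i − y_m)². This is a minimal sketch under stated assumptions: the model is an ordinary least-squares MLR with an intercept, solved through the normal equations; LMO groups are formed by index (compound i goes to group i mod g) rather than at random; and all function names and data are hypothetical.

```python
def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [list(row) + [ci] for row, ci in zip(A, c)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_mlr(X, y):
    """OLS coefficients [b0, b1, ...] for y = b0 + b1*x1 + ... (normal equations)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(A, c)


def predict(b, X):
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


def fit_r2(X, y):
    """Conventional R^2 = 1 - RSS/SST of the model fitted to all compounds."""
    y_hat = predict(fit_mlr(X, y), X)
    y_m = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return 1 - rss / sum((a - y_m) ** 2 for a in y)


def cv_q2(X, y, n_groups=None):
    """Cross-validated Q^2 and PRESS; n_groups=None (one group per compound) is LOO."""
    n = len(y)
    g = n if n_groups is None else n_groups
    y_m = sum(y) / n
    press = 0.0
    for k in range(g):
        held_out = [i for i in range(n) if i % g == k]   # temporary test set
        kept = [i for i in range(n) if i % g != k]       # compounds used for refitting
        b = fit_mlr([X[i] for i in kept], [y[i] for i in kept])  # same descriptors
        preds = predict(b, [X[i] for i in held_out])
        press += sum((y[i] - yp) ** 2 for i, yp in zip(held_out, preds))
    return 1 - press / sum((yi - y_m) ** 2 for yi in y), press


# Hypothetical one-descriptor data set with small deterministic noise
X = [[x] for x in range(1, 9)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3]
y = [1 + 2 * x[0] + e for x, e in zip(X, noise)]
q2_loo, press = cv_q2(X, y)
q2_lmo, _ = cv_q2(X, y, n_groups=4)
r2 = fit_r2(X, y)
print(f"R2={r2:.4f} Q2(LOO)={q2_loo:.4f} Q2(LMO)={q2_lmo:.4f} R2-Q2={r2 - q2_loo:.4f}")
```

Leaving n_groups at its default gives LOO; smaller values give LMO. Comparing R² with Q² in this way is the overfitting check described in the text.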
Overfitting is usually suspected when the R² of the original model is significantly larger (5%) than Q²; the difference between R² and Q² should not be more than 0.3 [36]. CV values are considered more characteristic of the predictive ability of a model. Thus Q² is a measure of goodness of prediction, whereas R² is a measure of goodness of fit. The CV process begins with the removal of one compound or a group of compounds from the training set; the removed compounds become a temporary test set. A CV model is built from the remaining data points using the descriptors of the original model and is tested on the removed molecules for its ability to predict their bioactivities correctly. In the leave-one-out (LOO) method, this process of removing a molecule and building and validating a model against it is performed for every molecule in the training set. Once this is complete, the squared prediction errors of all left-out molecules are combined (as PRESS, below) into a single Q², which is reported. The data used to obtain Q² form an augmented training set of the compounds (data points) used to determine R². Removing only one molecule at a time from the training set is considered an inconsistent method [37,38]. A more correct method is leave-many-out (LMO), in which a group of compounds is set aside to validate each CV model. This method of cross-validation is especially useful if the training set used to create the model is small (< 20 compounds) or if there is no test set. For good predictability, R² − Q² should not exceed 0.3. Q² is calculated as

$$Q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{N} (y_i - y_m)^2}, \qquad \mathrm{PRESS} = \sum_{i=1}^{N} (y_{\mathrm{pred},i} - y_i)^2$$

where y_i is the activity of a compound not used to construct the CV model, y_pred,i is its predicted value, y_m is the mean experimental activity and PRESS is the predictive residual sum of squares.

Beware of Q²

To validate a QSAR model, most researchers apply the LOO or LMO CV procedure. The outcome of this procedure is a cross-validated correlation coefficient (Q²). Q² is frequently used as a criterion of both the robustness and the predictive ability of a model. Many authors consider a high Q² (for instance, Q² > 0.5) an indicator, or even the ultimate proof, of the high predictive power of a QSAR model, and do not test their models for the ability to predict the activities of compounds in an external test set (i.e., compounds that were not used in model development). There are several recent publications in which the authors claim that their models have high predictive ability without validating them on an external test set. Some authors validate their models with only one or two compounds that were not used in model development [47,48] and still claim that their models are highly predictive. However, it has been found that when a test set with known biological activities is available for prediction, there may be no correlation between the predicted and observed activities of the test set [49,14].

Bootstrapping

Bootstrapping is another method of internal validation, in which samples are selected randomly from the data set. Instead of repeatedly analyzing subsets of the data, as in cross-validation, the simplest form of bootstrapping repeatedly analyzes subsamples of the data, each of which is a random sample drawn with replacement from the full sample.
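The bootstrap validation described here can be sketched as follows. Assumptions of this minimal sketch: an OLS MLR model with an intercept; for each resample, Q² is computed on the objects that were never drawn (the out-of-bag objects) relative to the mean activity of the resample; resamples with nothing left to predict, or with too few distinct compounds to fit the model, are skipped. The function names, data and fixed random seed are hypothetical choices made for reproducibility.

```python
import random


def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [list(row) + [ci] for row, ci in zip(A, c)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_mlr(X, y):
    """OLS coefficients [b0, b1, ...] for y = b0 + b1*x1 + ... (normal equations)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(A, c)


def predict(b, X):
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


def bootstrap_q2(X, y, n_boot=200, seed=0):
    """Average out-of-bag Q^2 over bootstrap resamples of size n (and how many were used)."""
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(n_boot):
        drawn = [rng.randrange(n) for _ in range(n)]   # n draws with replacement
        oob = sorted(set(range(n)) - set(drawn))       # objects never selected
        if not oob or len({tuple(X[i]) for i in drawn}) <= len(X[0]):
            continue  # nothing to predict, or too few distinct compounds to fit
        b = fit_mlr([X[i] for i in drawn], [y[i] for i in drawn])
        y_m = sum(y[i] for i in drawn) / n
        preds = predict(b, [X[i] for i in oob])
        press = sum((y[i] - yp) ** 2 for i, yp in zip(oob, preds))
        scores.append(1 - press / sum((y[i] - y_m) ** 2 for i in oob))
    return sum(scores) / len(scores), len(scores)


# Hypothetical one-descriptor data set of 20 compounds with deterministic noise
X = [[x] for x in range(1, 21)]
noise = ([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3] * 3)[:20]
y = [1 + 2 * x[0] + e for x, e in zip(X, noise)]
mean_q2, n_used = bootstrap_q2(X, y)
print(f"mean bootstrap Q2 = {mean_q2:.3f} over {n_used} resamples")
```

Some compounds appear several times in a resample while others never do, which is exactly what supplies the out-of-bag objects to predict.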
In a typical bootstrap validation, K groups of size n are generated by repeated random selection of n objects from the original data set. Some objects can appear in the same random sample several times, whereas others may never be selected. The model obtained from the n randomly selected objects is used to predict the target property for the excluded objects. A high average Q² in bootstrap validation demonstrates the robustness of the model.

Randomization test (scrambling)

The predictive power of an equation is poor when the observations are not sufficiently independent of each other. One way to test for this is randomization of the dependent variable, which ensures that the model is not the result of chance correlation. The activity values are reassigned randomly to different molecules and the entire modeling procedure is repeated; this is done many times. If the activity predictions of the random models are comparable to those of the original equation, the set of observations is not sufficient to support the model. Creating scrambled models [39,40] is a unique way of checking the descriptors used in a model: because the bioactivities are randomized, each new model is built from a bogus data set. The purpose is to test the validity of the original QSAR model and to ensure that the selected descriptors are appropriate. These new models (Scram-models) are built with the same descriptors as the original model, but with shuffled bioactivities. After each Scram-model is created, it is validated with the methods described earlier. To make sure the Scram-models are truly random, the shuffling of bioactivities can be repeated, and the R² and Q² values of each new Scram-model are recorded. Each time the R² and Q² values of the Scram-models turn out substantially lower than those of the original model, this further confirms that the true QSAR model is sound.
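The scrambling procedure can be sketched as follows: the activity vector is shuffled, a model with the same descriptors is refitted, and the R² values of the Scram-models are compared with that of the original model. This minimal sketch assumes an OLS MLR model with an intercept, reports only R² (the Q² of each Scram-model could be added with any cross-validation routine), and uses hypothetical function names, data and a fixed seed; the 0.5 cut-off in the final check corresponds to the R² > 0.50 criterion cited for this test.

```python
import random


def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [list(row) + [ci] for row, ci in zip(A, c)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_mlr(X, y):
    """OLS coefficients [b0, b1, ...] for y = b0 + b1*x1 + ... (normal equations)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(A, c)


def predict(b, X):
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


def fit_r2(X, y):
    """Conventional R^2 = 1 - RSS/SST of the model fitted to all compounds."""
    y_hat = predict(fit_mlr(X, y), X)
    y_m = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return 1 - rss / sum((a - y_m) ** 2 for a in y)


def y_scrambling(X, y, n_rounds=100, seed=0):
    """R^2 of the original model plus mean and max R^2 of the Scram-models."""
    rng = random.Random(seed)
    r2_scram = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)                  # reassign activities to molecules at random
        r2_scram.append(fit_r2(X, y_perm))   # same descriptors, bogus activities
    return fit_r2(X, y), sum(r2_scram) / n_rounds, max(r2_scram)


# Hypothetical one-descriptor data set of 20 compounds
X = [[x] for x in range(1, 21)]
noise = ([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3] * 3)[:20]
y = [1 + 2 * x[0] + e for x, e in zip(X, noise)]
r2, r2_mean_scram, r2_max_scram = y_scrambling(X, y)
print(f"R2={r2:.3f} mean scrambled R2={r2_mean_scram:.3f} max={r2_max_scram:.3f}")
print("model in question" if r2_max_scram > 0.5 else "no chance correlation detected")
```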
The purpose of this method is to validate the original QSAR model, because the Scram-models are built with the original descriptors and bogus bioactivities. The model would be in question if there were a strong correlation (R² > 0.50) [50] between the randomized bioactivities and the predicted bioactivities, which would indicate that the model is not responsive to the bioactivities.

External validation

Several authors have suggested that the only way to estimate the true predictive power of a QSAR model is to compare the predicted and observed activities of a (sufficiently large) external test set of compounds that were not used in model development [49,50]. The problem in external validation is how to select the training and test sets; Roy et al. have discussed how this problem can be solved in one of their articles. To estimate the predictive power of a QSAR model, Golbraikh and Tropsha recommended the following statistical characteristics of the test set [14]: (i) the correlation coefficient R between the predicted and observed activities; (ii) the coefficients of determination [54] for regressions through the origin (predicted vs. observed activities, r0², and observed vs. predicted activities, r0'²); and (iii) the slopes k and k' of the regression lines through the origin. They consider a QSAR model predictive if the following conditions are satisfied [14]:

$$R^2_{\mathrm{pred}} > 0.6, \qquad \frac{r^2 - r_0^2}{r^2} < 0.1 \ \text{or} \ \frac{r^2 - r_0'^2}{r^2} < 0.1, \qquad 0.85 \le k \le 1.15 \ \text{or} \ 0.85 \le k' \le 1.15$$

The predictive ability of a selected model can also be confirmed by the external R²pred:

$$R^2_{\mathrm{pred}} = 1 - \frac{\sum_{\mathrm{test}} (y_{\mathrm{exp}} - y_{\mathrm{pred}})^2}{\sum_{\mathrm{test}} (y_{\mathrm{exp}} - \bar{y}_{\mathrm{tr}})^2}$$

where ȳ_tr is the mean value of the dependent variable for the training set. An R²pred value greater than 0.6 may be taken as an indicator of good external predictability. The lack of correlation between Q² and the predictive R² of the test set was noted by Kubinyi et al. [49], Novellino et al. [51], Norinder [52] and Golbraikh and Tropsha [14], who demonstrated that all of the above criteria are necessary to assess the predictive ability of a QSAR model adequately. Norinder [52] suggested that the external test set must contain at least five compounds, representing the whole range of both the descriptors and the activities of the compounds in the training set.

Recently, the use of internal versus external validation has been a matter of great debate [55]. One group of QSAR workers supports internal validation, while the other considers internal validation an insufficient test of model robustness and insists that external validation must be done. Hawkins et al., the major supporters of internal validation, are of the opinion that cross-validation can assess model fit and check whether predictions will carry over to fresh data not used in fitting the model. They argue that when the sample size is small, holding a portion of it back for testing is wasteful, and that it is much better to use the computationally more burdensome leave-one-out cross-validation [56,57].
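The external test-set statistics discussed in this section can be computed as in the following minimal Python sketch. It follows one common form of the regression-through-origin quantities (k = Σyŷ/Σŷ² and k' = Σyŷ/Σy², with r0² and r0'² computed from those slopes); conventions for which variable goes on which axis vary between publications, so these definitions are assumptions here. The acceptance check uses R²pred > 0.6 together with the paired form [(r² − r0²)/r² < 0.1 and 0.85 ≤ k ≤ 1.15] or [(r² − r0'²)/r² < 0.1 and 0.85 ≤ k' ≤ 1.15], and all function names and data are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def squared_corr(a, b):
    """Squared Pearson correlation r^2 between two lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab ** 2 / (saa * sbb)


def through_origin(target, regressor):
    """Slope and coefficient of determination for target = slope * regressor (no intercept)."""
    slope = sum(t * r for t, r in zip(target, regressor)) / sum(r * r for r in regressor)
    m = mean(target)
    ss_res = sum((t - slope * r) ** 2 for t, r in zip(target, regressor))
    ss_tot = sum((t - m) ** 2 for t in target)
    return slope, 1 - ss_res / ss_tot


def external_validation(y_obs, y_pred, y_train_mean):
    """Golbraikh-Tropsha-style test-set statistics plus R^2_pred."""
    r2 = squared_corr(y_obs, y_pred)
    k, r0_2 = through_origin(y_obs, y_pred)       # observed regressed on predicted
    k_p, r0p_2 = through_origin(y_pred, y_obs)    # predicted regressed on observed
    r2_pred = 1 - (sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred))
                   / sum((yo - y_train_mean) ** 2 for yo in y_obs))
    passes = r2_pred > 0.6 and (
        ((r2 - r0_2) / r2 < 0.1 and 0.85 <= k <= 1.15)
        or ((r2 - r0p_2) / r2 < 0.1 and 0.85 <= k_p <= 1.15)
    )
    return {"r2": r2, "r0_2": r0_2, "r0p_2": r0p_2, "k": k, "k_prime": k_p,
            "r2_pred": r2_pred, "passes": passes}


# Hypothetical test set: observed and predicted activities, training-set mean 6.0
y_obs = [4.8, 5.5, 6.1, 6.9, 7.4, 8.0]
y_pred = [5.0, 5.3, 6.3, 6.7, 7.6, 7.7]
stats = external_validation(y_obs, y_pred, y_train_mean=6.0)
print({name: round(v, 3) if isinstance(v, float) else v for name, v in stats.items()})
```

A model whose predictions are perfectly correlated with the observations but systematically scaled (for example all 30% too high) still has r² = 1, yet fails the slope conditions; that is the kind of case the k and k' criteria are meant to catch.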
A more recent measure of the external predictability of a selected model is rm², proposed by Roy and Paul (2008) [58] and calculated as

$$r_m^2 = r^2 \left(1 - \sqrt{r^2 - r_0^2}\right)$$

where r² is the squared correlation coefficient between observed and predicted values and r0² is the squared correlation coefficient between observed and predicted values with the intercept set to zero. An rm² value greater than 0.5 may be taken as an indicator of good external predictability. Unlike external validation parameters such as R²pred, the rm²(overall) statistic is not based only on a limited number of test-set compounds: it includes predictions for both test-set and training-set compounds (using LOO predictions for the latter), and is thus based on the predictions of a comparatively large number of compounds. The rm²(overall) statistic may be advantageous when the test set is considerably small, so that regression-based external validation parameters may be less reliable and highly dependent on individual test-set observations. It may also be used to select the best predictive model from among comparable models, when some models show better internal validation parameters and others show superior external validation parameters.

Another parameter for checking the acceptability of a selected model, Rp², has been reported by Roy (2007) [55]. Rp² penalizes the model R² for the difference between the squared mean correlation coefficient (Rr²) of the randomized models and the squared correlation coefficient (R²) of the non-randomized model:

$$R_p^2 = R^2 \sqrt{R^2 - R_r^2}$$

An Rp² value greater than 0.5 may be taken as an indicator of model acceptability. The value of rm²(overall) determines whether the predicted activities for the whole data set of molecules are really close to the observed activities (i.e., whether the model is the best predictive one).
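Both metrics take only a few lines to compute. In this minimal sketch, r0² is obtained from the regression of observed on predicted activities through the origin (one common convention), the absolute value of r² − r0² is taken so that the square root is always defined (a choice made for this sketch), and the mean R² of the randomized models (Rr²) is supplied by the caller, for example from a Y-scrambling run. Function names and example values are hypothetical.

```python
import math


def squared_corr(a, b):
    """Squared Pearson correlation r^2 between two lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab ** 2 / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def r0_squared(y_obs, y_pred):
    """Coefficient of determination of y_obs = k * y_pred (intercept set to zero)."""
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    m = sum(y_obs) / len(y_obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred))
    return 1 - ss_res / sum((o - m) ** 2 for o in y_obs)


def rm2(y_obs, y_pred):
    """r_m^2 = r^2 * (1 - sqrt(|r^2 - r0^2|))."""
    r2 = squared_corr(y_obs, y_pred)
    return r2 * (1 - math.sqrt(abs(r2 - r0_squared(y_obs, y_pred))))


def rp2(r2_model, r2_random_mean):
    """R_p^2 = R^2 * sqrt(R^2 - R_r^2); assumes the model beats the randomized mean."""
    return r2_model * math.sqrt(r2_model - r2_random_mean)


# Hypothetical pooled observed activities and predictions (LOO for training, test set)
y_obs = [4.8, 5.5, 6.1, 6.9, 7.4, 8.0, 5.2, 6.6]
y_pred = [5.0, 5.3, 6.3, 6.7, 7.6, 7.7, 5.5, 6.4]
score_rm2 = rm2(y_obs, y_pred)
score_rp2 = rp2(0.85, 0.06)   # hypothetical R^2 and mean randomized R^2
print(f"rm2(overall)={score_rm2:.3f} Rp2={score_rp2:.3f}",
      "acceptable" if min(score_rm2, score_rp2) >= 0.5 else "questionable")
```

Pooling LOO and test-set predictions, as in the example lists, reflects how the text describes rm²(overall); passing only test-set values would give a test-set rm² instead.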
The value of Rp², on the other hand, determines whether the model obtained is really robust or was obtained merely by chance. Hence a QSAR model can be considered acceptable if the values of rm²(overall) and Rp² are equal to or above 0.5 (or at least near 0.5). Selecting robust, predictive QSAR models on the basis of R², Q² and R²pred alone may mislead the search for the ideal predictive model, so selection should also take into account other parameters, such as R²CVext, (r² − r0²)/r², (r² − r0'²)/r², k, k', rm²(overall) and Rp².

When can we accept the developed QSAR model as reliable and predictive?

In QSAR (MLR and PLS) studies, a developed model can generally be accepted when it satisfies the following criteria. These are the minimum recommended values for a significant QSAR model; note that the evaluation measures depend on the scale or unit of the response.

- Correlation coefficient R ≥ 0.8 (for in vivo data).
- Coefficient of determination R² ≥ 0.6.
- The standard deviation s is not much larger than the standard deviation of the biological data.
- The F value indicates an overall significance level better than 95%.
- The confidence intervals of all individual regression coefficients show that they are justified at the 95% significance level.
- Cross-validated R² (Q²) > 0.5.
- R² for the external test set (R²pred) > 0.6.
- The R² values of randomized models are low compared with R².
- The Q² values of randomized models are low compared with Q².
- For the test set, (r² − r0²)/r² < 0.1 and 0.85 ≤ k ≤ 1.15, or (r² − r0'²)/r² < 0.1 and 0.85 ≤ k' ≤ 1.15.
- rm²(overall) and Rp² are ≥ 0.5 (or at least near 0.5).

In addition, the biological data should span at least one, two or even more logarithmic units and be well distributed over the whole range. Likewise, the physicochemical parameters should span a certain range and be more or less evenly distributed.

An equation has to be rejected if:

- the statistical measures above are not satisfied;
- the number of variables in the regression equation is unreasonably large;
- the standard deviation is smaller than the error in the biological data.

Applicability domain

Even a robust and validated QSAR model cannot predict the activity of the entire universe of chemicals. A QSAR prediction of a modeled response is valid only if the compound being predicted lies within the applicability domain of the model. The applicability domain is a theoretical region of chemical space defined by the model descriptors and the modeled response, and thus by the nature of the training-set molecules [59]. Whether a new chemical lies within the applicability domain can be checked using the leverage approach.
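The leverage check can be sketched as follows. For a compound with descriptor vector x (with a leading 1 for the intercept), the leverage is h = xᵀ(XᵀX)⁻¹x, where X is the training-set descriptor matrix; the compound is flagged as outside the domain when h exceeds the critical value h* = 3p/n, with p the number of model variables plus one and n the number of training compounds. This minimal sketch assumes an MLR model with an intercept, solves (XᵀX)z = x instead of inverting the matrix, and uses hypothetical function names and data.

```python
def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [list(row) + [ci] for row, ci in zip(A, c)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def leverage(X_train, x_new):
    """h = x^T (X^T X)^-1 x, with an intercept column prepended to every row."""
    rows = [[1.0] + list(r) for r in X_train]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    x = [1.0] + list(x_new)
    z = solve(xtx, x)                      # z = (X^T X)^-1 x without explicit inversion
    return sum(xi * zi for xi, zi in zip(x, z))


def in_applicability_domain(X_train, x_new):
    """Inside the domain when h <= h* = 3p/n (p = model variables + 1)."""
    p = len(X_train[0]) + 1
    return leverage(X_train, x_new) <= 3 * p / len(X_train)


X_train = [[x] for x in range(1, 11)]        # hypothetical one-descriptor training set
for query in ([5.5], [12.0], [30.0]):
    print(query, round(leverage(X_train, query), 3), in_applicability_domain(X_train, query))
```

A compound at the centre of the training data has the minimum leverage 1/n, and leverage grows with distance from that centre in descriptor space, which is why extrapolated compounds are the ones flagged.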
A compound is considered outside the applicability domain when its leverage exceeds the critical value 3p/n, where p is the number of model variables plus one and n is the number of objects used to develop the model. Jaworska et al. [60] recently reviewed methods and criteria for estimating the applicability domain through training-set interpolation, based on range, distance, geometrical and probability-density-distribution approaches. Stanforth et al. [61] proposed a cluster-based approach to modeling the domain of applicability that applies an intelligent version of the K-means clustering algorithm: the training set is modeled as a collection of clusters in descriptor space, and a test compound is assigned a fuzzy membership of each cluster, from which an overall distance can be calculated. Guha and Jurs [50] used a classification method that divides the regression residuals of a previously generated model into a good class and a bad class and builds a classifier based on that division; the trained classifier is then used to determine the residual class of a new compound.

Dispute on QSAR Validation

An inconsistency between internal and external predictivity has been reported in a few QSAR studies [51,52,62]. It has been reported that, in general, there is no relationship between internal and external predictivity [63,64]: high internal predictivity may result in low external predictivity, and vice versa.
In many cases, comparable models are obtained in which some show better internal validation parameters and others show superior external validation parameters, which makes selecting the final model difficult. Good validation techniques therefore need to be developed to resolve all of the disputes mentioned above.

Conclusion

Validation of QSAR models is essential for understanding how reliably a model predicts a new compound not present in the data set. If we consider 1000 reported QSAR models, only 50 to 60 of them may be really predictive, and it is not certain that even these 60 models satisfy all the conditions and validation parameters discussed in this article. In our opinion, both internal and external validation strategies are important, and one should in fact adopt all available validation strategies to check the robustness of a model. Only a few reported QSAR models follow all the validation characteristics mentioned in this article [65,66]. In conclusion, not only should the above recommendations of different scientists and researchers for the validation of QSAR models be followed, but the chemical space of the training and test sets also has to be analyzed, and real outliers, with respect to congeneric character and structural similarity, have to be discovered and eliminated. Even then, prediction by QSAR models remains a risky procedure, so proper validation techniques are still needed to select good predictive QSAR models.

References

[1] Von, P.C.; Kuhne, R.; Ebert, R.U.; Altenburger, R.; Liess, M.; Schuurmann, G. Chem. Res. Toxicol. 2005, 18.
[2] enterprise/reach/overview.htm
[3] Combes, R.; Balls, M.; Bansil, L. ATLA 1991, 30.
[4] Seidel, J.K.; Schaper, K.J. Chemical Structure and Biological Activity; Verlag Chemie: Weinheim, New York, 1979.
[5] Hansch, C. In Comprehensive Medicinal Chemistry; Ramsden, C.A., Ed.; Pergamon Press: New York, 1990; Vol. 4, pp 5-8.
[6] Livingstone, D.J. Predicting Chemical Toxicity and Fate; CRC Press LLC: Boca Raton, FL, 2004.
[7] Winkler, D.A. Briefing Bioinformat. 2002, 3.
[8] Albuquerque, M.G.; Hopfinger, A.J.; Barreiro, E.J.; De Alencastro, R.B. J. Chem. Inf. Comput. Sci. 1998, 38.
[9] Santos-Filho, O.; Hopfinger, A. J. Comput.-Aided Mol. Des. 2001, 15, 1-12.
[10] Ravi, M.; Hopfinger, A.; Hormann, R.; Dinan, L. J. Chem. Inf. Comput. Sci. 2001, 41.
[11] Krasowski; Hong, X.; Hopfinger, A.; Harrison, N. J. Med. Chem. 2002, 45.
[12] Hong, M.X.; Hopfinger, A. J. Chem. Inf. Comput. Sci. 2003, 43.
[13] Tetko, I.V.; Sushko, I.; Pandey, A.K.; Zhu, H.; Tropsha, A.; Papa, E.; Oberg, T.; Todeschini, R.; Fourches, D.; Varnek, A. J. Chem. Inf. Model. 2008, 48.
[14] Golbraikh, A.; Tropsha, A. J. Mol. Graphics Mod. 2002, 20.
[15] Tropsha, A.; Gramatica, P.; Gombar, V.K. QSAR Comb. Sci. 2003, 22.
[16] Tong, W.; Xie, Q.; Hong, H.; Shi, L.; Fang, H.; Perkins, R. Environ. Health Perspect. 2004, 112.
[17] Aptula, A.O.; Jeliazkova, N.G.; Schultz, T.W.; Cronin, M.T.D. QSAR Comb. Sci. 2005, 24.
[18] He, L.; Jurs, P.C. J. Mol. Graphics Mod. 2005, 23.
[19] Ghafourian, T.; Cronin, M.T.D. SAR QSAR Environ. Res. 2005, 16.
[20] Roy, K.; Leonard, J.T. QSAR Comb. Sci. 2006, 25.
[21] Kolossov, E.; Stanforth, R. SAR QSAR Environ. Res. 2007, 18.
[22] Roy, P.P.; Roy, K. QSAR Comb. Sci. 2008, 27.
[23] Walker, J.D.; Jaworska, J.; Comber, J.H.I.; Schultz, T.W.; Dearden, J.C. Environ. Toxicol. Chem. 2003, 22.
[24] Roy, P.P.; Leonard, J.T.; Roy, K. Chemom. Intell. Lab. Syst. 2008, 90.
[25] Tong, W.; Hong, H.; Xie, Q.; Shi, L.; Fang, H.; Perkins, R. Curr. Comput.-Aided Drug Des. 2005, 1.
[26] He, L.; Jurs, P.C. J. Mol. Graphics Mod. 2005, 23.
[27] Ghafourian, T.; Cronin, M.T.D. SAR QSAR Environ. Res. 2005, 16.
[28] Balls, M.; Blaauboer, B.J.; Fentem, J.H. ATLA 1995, 23.
[29] McKinney, J.D.; Richard, A.; Waller, C.; Newman, M.C.; Gerberick, F. Toxicol. Sci. 2000, 56.
[30] Tong, W.; Welsh, W.J.; Shi, L.; Fang, H.; Perkins, R. Environ. Toxicol. Chem. 2003, 22.
[31] Worth, A.P.; Cronin, M.T.D. ATLA 2004, 32.
[32] Cronin, M.T.D.; Jaworska, J.S.; Walker, J.D.; Comber, M.H.I.; Watts, C.D.; Worth, A.P. Environ. Health Perspect. 2003, 111.
[33] Cronin, M.T.D.; Jaworska, J.S.; Walker, J.D.; Comber, M.H.I.; Watts, C.D.; Worth, A.P. Environ. Health Perspect. 2003, 111.
[34] Worth, A.P.; Leeuwen, C.J.; Hartung, T. SAR QSAR Environ. Res. 2004, 15.
[35] Worth, A.P.; Hartung, T.; Leeuwen, C.J. SAR QSAR Environ. Res. 2004, 15.
[36] Leach, A.R. Molecular Modeling: Principles and Applications; Pearson Education Ltd.: Harlow, England, 2001.
[37] Shao, J. J. Am. Stat. Assoc. 1993, 88.
[38] Besalú, E. J. Math. Chem. 2001, 29.
[39] Wold, S.; Ericksson, L. Partial least squares projections to latent structures (PLS) in chemistry. In Encyclopedia of Computational Chemistry; Ragu & Schleyer, P., Eds.; John Wiley & Sons: Chichester, 1998; Vol. 3.
[40] Yasri, A.; Hartsough, D. J. Chem. Inf. Comput. Sci. 2001, 41.
[41] Schneider, G.; Wrede, P. Prog. Biophys. Mol. Biol. 1998, 70, 175.
[42] Gironés, X.; Gallegos, A.; Ramon, C.D. J. Chem. Inf. Comput. Sci. 2000, 46.
[43] Bordás, B.; Kömíves, T.; Szántó, Z.; Lopata, A. J. Agric. Food Chem. 2000, 48.
[44] Fan, Y.; Shi, L.M.; Kohn, K.W.; Pommier, Y.; Weinstein, J.N. J. Med. Chem. 2001, 44.
[45] Randic, M.; Basak, S.C. J. Chem. Inf. Comput. Sci. 2000, 40.
[46] Suzuki, T.; Ide, K.; Ishida, M.; Shapiro, S. J. Chem. Inf. Comput. Sci. 2001, 41.
[47] Recanatini, M.; Cavalli, A.; Belluti, F.; Piazzi, L.; Rampa, A.; Bisi, A.; Gobbi, S.; Valenti, P.; Andrisano, V.; Bartolini, M.; Cavrini, V. J. Med. Chem. 2000, 43.
[48] Morón, J.A.; Campillo, M.; Perez, V.; Unzeta, M.; Pardo, L. J. Med. Chem. 2000, 43.
[49] Kubinyi, H.; Hamprecht, F.A.; Mietzner, T. J. Med. Chem. 1998, 41.
[50] Guha, R.; Jurs, P.C. J. Chem. Inf. Model. 2005, 45.
[51] Novellino, E.; Fattorusso, C.; Greco, G. Pharm. Acta Helv. 1995, 70.
[52] Norinder, U. J. Chemom. 1996, 10.
[53] Zefirov, N.S.; Palyulin, V.A. J. Chem. Inf. Comput. Sci. 2001, 41.
[54] Sachs, L. Applied Statistics: A Handbook of Techniques; Springer-Verlag: Berlin/New York.
[55] Roy, K. Expert Opin. Drug Discov. 2007, 2.
[56] Hawkins, D.M.; Basak, S.C.; Mills, D. J. Chem. Inf. Comput. Sci. 2003, 43.
[57] Hawkins, D.M. J. Chem. Inf. Comput. Sci. 2004, 44, 1-12.
[58] Roy, K.; Paul, S. QSAR Comb. Sci. 2008, 28.
[59] Atkinson, A.C. Plots, Transformations and Regression; Clarendon Press: Oxford, UK, 1985.
[60] Jaworska, J.; Nikolova-Jeliazkova, N.; Aldenberg, T. Altern. Lab. Anim. 2005, 33.
[61] Stanforth, R.W.; Kolossov, E.; Mirkin, B. QSAR Comb. Sci. 2007, 26.
[62] Kubinyi, H. A general view on similarity and QSAR studies. In Computer-Assisted Lead Finding and Optimization; van de Waterbeemd, H., Testa, B., Folkers, G., Eds.; VHCA and VCH: Basel, Weinheim, 1997; pp 9-28.
[64] Kubinyi, H.; Hamprecht, F.A.; Mietzner, T. J. Med. Chem. 1998, 41.
[65] Ravichandran, V.; Shalini, S.; Sundram, K.M.; Dhanaraj, S.A. Eur. J. Med. Chem. 2010, 45.
[66] Roy, P.P.; Paul, S.; Indrani, M.; Roy, K. Molecules 2009, 14.

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

The use of Design of Experiments to develop Efficient Arrays for SAR and Property Exploration

The use of Design of Experiments to develop Efficient Arrays for SAR and Property Exploration The use of Design of Experiments to develop Efficient Arrays for SAR and Property Exploration Chris Luscombe, Computational Chemistry GlaxoSmithKline Summary of Talk Traditional approaches SAR Free-Wilson

More information

MultiCASE CASE Ultra model for severe skin irritation in vivo

MultiCASE CASE Ultra model for severe skin irritation in vivo MultiCASE CASE Ultra model for severe skin irritation in vivo 1. QSAR identifier 1.1 QSAR identifier (title) MultiCASE CASE Ultra model for severe skin irritation in vivo, Danish QSAR Group at DTU Food.

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

10. Alternative case influence statistics

10. Alternative case influence statistics 10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the

More information

MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS. Richard Brereton

MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS. Richard Brereton MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS Richard Brereton r.g.brereton@bris.ac.uk Pattern Recognition Book Chemometrics for Pattern Recognition, Wiley, 2009 Pattern Recognition Pattern Recognition

More information

CHAPTER 6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR) ANALYSIS

CHAPTER 6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR) ANALYSIS 159 CHAPTER 6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR) ANALYSIS 6.1 INTRODUCTION The purpose of this study is to gain on insight into structural features related the anticancer, antioxidant

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Applying Bioisosteric Transformations to Predict Novel, High Quality Compounds

Applying Bioisosteric Transformations to Predict Novel, High Quality Compounds Applying Bioisosteric Transformations to Predict Novel, High Quality Compounds Dr James Chisholm,* Dr John Barnard, Dr Julian Hayward, Dr Matthew Segall*, Mr Edmund Champness*, Dr Chris Leeding,* Mr Hector

More information

Uncertainty, Error, and Precision in Quantitative Measurements an Introduction 4.4 cm Experimental error

Uncertainty, Error, and Precision in Quantitative Measurements an Introduction 4.4 cm Experimental error Uncertainty, Error, and Precision in Quantitative Measurements an Introduction Much of the work in any chemistry laboratory involves the measurement of numerical quantities. A quantitative measurement

More information

Relations in epidemiology-- the need for models

Relations in epidemiology-- the need for models Plant Disease Epidemiology REVIEW: Terminology & history Monitoring epidemics: Disease measurement Disease intensity: severity, incidence,... Types of variables, etc. Measurement (assessment) of severity

More information