In Search of Desirable Compounds

Adrijo Chakraborty, University of Georgia
Abhyuday Mandal, University of Georgia
Kjell Johnson, Arbor Analytics, LLC

Abstract

Drug discovery scientific teams evaluate a compound's performance across a number of endpoints, often including its ability to bind to target(s), as well as its absorption, distribution, metabolism, excretion, and safety characteristics. A popular way to prioritize compounds is through desirability scoring across the selected endpoints. In addition to desirability scores, drug discovery teams can measure or compute many other compound descriptors that are thought to be related to the endpoints of interest. Scientists would like to identify a small number of such descriptors which can be tweaked to improve the performance of a compound. To that end, one primary objective of these teams is to identify the most crucial descriptors, and the relationships among descriptors, that impact compound desirability. Finding these relationships can help teams understand how the predictors relate to compound desirability, and can suggest molecular alterations that could improve it. In this work, we propose a novel method based on the desirability scoring framework for identifying important descriptors. We also explore ways to visualize high-dimensional predictor spaces that can effectively inform teams' decisions in synthesizing new compounds.

Keywords: Importance Frequency, Desirability Scores, Drug Discovery, Pharmaceutical Chemistry, Variable Prioritization.

1 Introduction

Due to a number of technological advances, drug discovery research has rapidly changed over the past few decades.

Traditionally, new molecular entities were painstakingly synthesized one at a time, and small collections of molecules were analyzed across a handful of in-vivo and in-vitro screens. Using this relatively small amount of information, teams would identify a few compounds with optimal characteristics and move them towards development. The advent of combinatorial chemistry and automated methods for synthesizing compounds enabled companies to efficiently create hundreds to thousands of molecules, thus greatly expanding companies' compound libraries. Advances in robotic technology and biological nanotechnology also enabled companies to conduct in-vitro screening of large libraries across a host of biological targets for efficacy, pharmacokinetic and pharmacodynamic properties, and safety characteristics. Furthermore, the field of computational chemistry now has the ability to numerically describe compounds' molecular structure and physical characteristics. Hence, the number of compounds to evaluate, as well as the number of descriptors about those compounds, has greatly expanded, leaving discovery teams with the difficult chore of sifting through this information to identify only the most promising compounds to push forward.

Humans are not well equipped to compare and contrast objects across an array of information. Instead, it is much easier to prioritize objects across just one, or perhaps two, variables. Because of this aspect of human nature, discovery teams have used desirability functions (Harrington, 1965) to generate one-number summaries when trying to optimize across multiple responses (Mandal et al., 2007). There are various choices of desirability functions, one of the most common being the weighted geometric mean. In a global sense, these scores rank compounds from best to worst, where the higher the score, the better the compound. Prior knowledge can often help to determine the desirability threshold above which teams seek compounds. For some projects, a candidate set of desirable molecules emerges. For other projects, however, the set of compounds that currently exist may not achieve high enough desirability scores to merit moving any forward. In cases like these, teams would like to better understand the characteristics of the compounds that are related to the desirability score and synthesize new compounds that improve upon those characteristics. Hence the compound characteristics that are thought to be related to the desirability score are also evaluated. This can be viewed as a general regression set-up where the response $Y$ is the compound desirability score and the predictor variables $X_1, \ldots, X_p$ represent the compound characteristics.

Predictive performance and parsimony are the two main concerns when a regression model is constructed. Removing irrelevant variables not only makes the model simpler and more interpretable but usually also improves prediction accuracy. Variable selection methods such as stepwise and best subsets are not as popular as they once were, since they are time consuming (especially as the number of predictors becomes large) and can sometimes eliminate useful predictors. The recently introduced Least Angle Regression (LAR) algorithm (Efron et al., 2004), motivated by the forward stagewise method, is computationally very efficient.

More recently, penalized least squares based variable selection methods have become popular. The LASSO (Tibshirani, 1996) shrinks all of the least squares regression coefficients towards zero. In some cases the LASSO sets some of the coefficients exactly to zero, which leads to a simpler model. With a suitable modification, the LAR algorithm computes LASSO solutions. The LASSO does not perform well when the variables are highly correlated. The elastic net (Zou and Hastie, 2005) is another method, which outperforms the LASSO in the presence of highly correlated predictors. The LASSO is based on an $L_1$ penalty, whereas the elastic net is based on a linear combination of $L_1$ and $L_2$ penalties. Fan and Li (2001) introduced another penalty function, SCAD, whose estimators satisfy oracle properties. Fan and Lv (2008) introduced the sure independence screening (SIS) method, which puts importance on the variables with high marginal correlation with the response. Although powerful, these methods do not solve the problem of efficiently identifying a handful of variables that can be tweaked in order to improve the efficacy of a compound. In this sense our problem is different from the standard variable selection problem, as we discuss later. In this paper we propose an elegant method that can easily be used to identify important descriptors. In this problem, along with the values of the response for each compound, we have some additional information: the scientists also know whether each compound is desirable, moderately desirable or undesirable. Naturally, we use both the descriptor information and the desirability information to improve the performance of a compound.

As mentioned before, the primary focus of the paper is to develop a simple and elegant method to prioritize some variables, because the drug discovery team's objective is to identify a few characteristics of a compound to be modified in order to improve its desirability. To this end, the paper is organized as follows. In Section 2 we discuss the data set provided by a pharmaceutical company. Given this context, we propose our method in Section 3. In Section 4 we illustrate this method on the pharmaceutical data, and in Section 5 we perform a cross validation study in order to investigate the performance of the method. We conclude the paper with some discussion in Section 6.

2 Pharmaceutical Chemistry Example

Early in the adjudication process, a pharmaceutical discovery team had obtained a desirability score for 10,640 compounds. In addition, the team possessed values of 28 molecular descriptors for all of the compounds. Each of these descriptors is chemically interpretable, and changes in each of them can be translated into practical alterations of the molecule. For proprietary reasons, specific details about the data used in this work cannot be disclosed. However, we will provide general descriptions of the predictors and the desirability scores. Based on the desirability scores, the compounds are classified into three groups: desirable, moderately desirable, and undesirable.

For this particular problem, a compound is desirable if its score is greater than 5 and undesirable if its score is less than 3. The distribution of the scores for this set of compounds is presented in Figure 1.

Figure 1: Distribution of desirability scores (frequency versus desirability score, with the undesirable and desirable regions marked).

From Figure 1 we see that the distribution of the scores is more or less symmetric. In the data set, we have 2289 desirable compounds and 3323 undesirable compounds. One of the team's primary objectives was to understand the predictors in such a way that they could make alterations to undesirable compounds to make them moderately desirable or better.

3 Proposed Variable Prioritization Methodology

Let $Y$ be the desirability score of a compound. We assume that a compound will be considered desirable if $Y > C_2$, undesirable if $Y < C_1$, and moderately desirable otherwise. Here the cut-offs $C_1$ and $C_2$ depend on the nature of the compound library at hand and are chosen by the scientists.
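As a concrete illustration of the scoring and grouping described above, the following Python sketch computes a weighted geometric mean desirability score (one of the common choices mentioned in Section 1) and classifies compounds using the cut-offs $C_1 = 3$ and $C_2 = 5$. The endpoint values and weights are invented for illustration; they are not the proprietary data of Section 2, and the paper itself does not prescribe this particular implementation.

```python
import numpy as np

# Hypothetical per-endpoint desirabilities for 4 compounds (rows) and
# 3 endpoints (columns), each already mapped to a 0-10 scale.
endpoint_scores = np.array([
    [8.0, 6.5, 7.0],   # compound A
    [2.0, 4.0, 3.5],   # compound B
    [5.5, 5.0, 4.5],   # compound C
    [9.0, 1.0, 6.0],   # compound D
])
weights = np.array([0.5, 0.3, 0.2])   # assumed endpoint weights, summing to 1

# Weighted geometric mean: a one-number summary per compound.
desirability = np.exp((weights * np.log(endpoint_scores)).sum(axis=1))

# Three-group classification with the illustrative cut-offs.
C1, C2 = 3.0, 5.0
group = np.where(desirability > C2, "desirable",
                 np.where(desirability < C1, "undesirable", "moderately desirable"))

for score, g in zip(desirability, group):
    print(f"score = {score:4.2f} -> {g}")
```

With equal weights the score reduces to the ordinary geometric mean; unequal weights let a team emphasize some endpoints over others.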

In a data set of $N$ compounds, suppose $D$ are desirable, $MD$ are moderately desirable and $UD$ are undesirable. Also assume that the data set has $p$ continuous descriptors denoted by $X_1, X_2, \ldots, X_p$. Let $\hat{Y}$ be the predicted desirability score of a compound based on these predictors, that is, $\hat{Y} = \hat{f}(X_1, \ldots, X_p)$ for some unknown function $f$. Let $\hat{D}$ be the number of compounds which are predicted to be desirable, that is, $\hat{D} = \sum_{i=1}^{N} I\{\hat{Y}_i > C_2\}$, and similarly define $\widehat{MD}$ and $\widehat{UD}$.

Suppose we remove a predictor, say $X_j$, from the set of predictors and predict $Y$ based on the remaining predictors. Let $\hat{Y}_i^{(j)}$ be the predicted desirability score of the $i$th compound after removal of the $j$th predictor from the data set. Now, let $\hat{D}^{(j)}_{new}$ denote the number of compounds which were originally either undesirable or moderately desirable but whose predicted value, with the exclusion of the $j$th predictor, falls in the desirable range. In other words, $\hat{D}^{(j)}_{new} = \sum_{i=1}^{N} I\{\hat{Y}_i^{(j)} > C_2 \text{ and } Y_i < C_2\}$. Similarly, $\widehat{UD}^{(j)}_{new}$ denotes the number of compounds that move from the desirable group to the moderately desirable or undesirable group. Finally, let $\widehat{Ud}^{(j)}_{new}$ be the number of compounds which move from the moderately desirable group to the undesirable group with the exclusion of $X_j$.

Let $P_j$ be the proportion of compounds which are predicted to be desirable based on the set of predictors excluding $X_j$, among the compounds which are observed as moderately desirable or undesirable, and let $Q_j$ be the proportion of compounds which are predicted to be undesirable based on the set of predictors excluding $X_j$, among the compounds which are observed as desirable. Similarly, let $R_j$ be the proportion of compounds which are predicted to be undesirable based on the set of predictors excluding $X_j$, among the compounds which are observed as moderately desirable. That is,

$$P_j = \frac{\hat{D}^{(j)}_{new}}{MD + UD}, \qquad Q_j = \frac{\widehat{UD}^{(j)}_{new}}{D}, \qquad R_j = \frac{\widehat{Ud}^{(j)}_{new}}{MD}.$$

We can compute $P$, $Q$ and $R$ for each variable in the set of predictors and judge whether or not to use that variable to modify the molecule. Removal of the $j$th variable can also be seen as setting $X_j = 0$, which is equivalent to tweaking a molecule such that the value of the $j$th predictor becomes 0.
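The following is a minimal Python sketch of these quantities for a single left-out predictor. It uses ordinary least squares as the prediction function $\hat{f}$ (the choice noted in Section 6) and synthetic data with the illustrative cut-offs $C_1 = 3$ and $C_2 = 5$; the data, dimensions and seed are assumptions for the sketch, not the study data.

```python
import numpy as np

def pqr_for_predictor(X, y, j, C1, C2):
    """P_j, Q_j, R_j after dropping column j and refitting a linear model."""
    n = len(y)
    X_minus_j = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(n), X_minus_j])      # add an intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)      # least squares fit
    y_hat = Z @ beta                                  # predictions without X_j

    # Observed group sizes.
    D = np.sum(y > C2)                  # desirable
    UD = np.sum(y < C1)                 # undesirable
    MD = n - D - UD                     # moderately desirable

    # Compounds that move across categories when X_j is excluded.
    D_new = np.sum((y_hat > C2) & (y < C2))                  # up into the desirable range
    UD_new = np.sum((y_hat < C2) & (y > C2))                 # down out of the desirable range
    Ud_new = np.sum((y_hat < C1) & (y >= C1) & (y <= C2))    # moderate -> undesirable

    return D_new / (MD + UD), UD_new / D, Ud_new / MD

# Small synthetic example (5 predictors, n = 200).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 4 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
print(pqr_for_predictor(X, y, j=0, C1=3.0, C2=5.0))
```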

Suppose that a variable $X_j$ has a high value of $P$. This would indicate that the corresponding alteration of $X_j$ would shift a considerably large proportion of compounds into the desirable range. Hence, this result would suggest changing the molecule based on $X_j$. On the other hand, if $Q_j$ and/or $R_j$ is high, altering $X_j$ in this way would cause a large proportion of compounds to move from a more desirable to a less desirable category, and hence we should not consider alterations of the predictors that would raise these $Q$ and $R$ scores. While deciding which predictors should be considered for altering the molecule, we need to consider $P$, $Q$, and $R$ together. Given these considerations, we propose the following variable prioritization method:

For each predictor $X_j$:
(1) Remove $X_j$.
(2) Predict $Y$ based on $X_1, X_2, \ldots, X_{j-1}, X_{j+1}, \ldots, X_p$.
(3) Compute $P_j$, $Q_j$, $R_j$.
Then select important variables based on the $P$, $Q$ and $R$ plots.

Note that a variable with a higher $P$ score may not be included in the set of important predictors; on the other hand, a variable having a higher $Q$ or $R$ score should not be excluded. We recommend plotting the $P$, $Q$ and $R$ scores for each variable to understand the relative scoring within the problem. This proposed visualization can help to pinpoint the important variables from a vast set of variables which might have either a good or an adverse effect on the desirability score of a compound. Having identified a variable as important, we can then use that information as feedback into the scientific process. In this case we could use the information for future molecular synthesis.

Now, let $X_{Important}$ be the set of important predictors. Since we suggest choosing the important variables from the $P$, $Q$ and $R$ plots, the choice of the members and the cardinality of $X_{Important}$ is subjective. A practical approach is to rank the predictors according to their $Q$ and $R$ scores and choose a few of them based on their ranks. We illustrate the method by analyzing a data set and also perform a cross validation study in Section 5. There we rank the variables based on their $P$, $Q$ and $R$ values and choose the top three variables based on the $Q$ and $R$ scores to construct $X^{QR}_{High}$. We also construct $X^{P}_{High}$, which consists of the variables that rank among the top three based on their $P$ scores. Finally, we construct $X_{Important}$ from the variables which are in $X^{QR}_{High}$ but not in $X^{P}_{High}$; a code sketch of this construction is given below.
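The sketch below runs the full prioritization loop and then applies the selection rule just described (top three variables by $Q$ and by $R$, minus the top three by $P$), mirroring the construction used later in the cross validation study. The synthetic data and the use of ordinary least squares as the prediction function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # illustrative descriptors
y = 4 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
C1, C2 = 3.0, 5.0
n, p = X.shape

D, UD = np.sum(y > C2), np.sum(y < C1)              # observed group sizes
MD = n - D - UD

# Steps (1)-(3): leave each predictor out, refit by least squares, score it.
P, Q, R = np.empty(p), np.empty(p), np.empty(p)
for j in range(p):
    Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    y_hat = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    P[j] = np.sum((y_hat > C2) & (y < C2)) / (MD + UD)
    Q[j] = np.sum((y_hat < C2) & (y > C2)) / D
    R[j] = np.sum((y_hat < C1) & (y >= C1) & (y <= C2)) / MD

def top3(scores):
    """Indices of the three largest scores."""
    return set(np.argsort(scores)[-3:])

# Keep variables that rank high on Q or R but are not among the highest P scores.
X_QR_high = top3(Q) | top3(R)
X_P_high = top3(P)
X_important = sorted(X_QR_high - X_P_high)
print("P:", np.round(P, 3))
print("Q:", np.round(Q, 3))
print("R:", np.round(R, 3))
print("X_Important (0-based indices):", X_important)
```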

Scientists might also be interested in the relationship between the desirability score and the predictors, that is, in estimating the unknown function $\eta$ in $Y = \eta(X_1, \ldots, X_p) + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$, where $Y$ is the desirability score and $(X_1, \ldots, X_p)$ are the $p$ predictors. We consider a second order polynomial for $\eta$. This is a common choice of response function in response surface methodology (Wu and Hamada, 2009). In order to reduce the number of parameters and avoid over-fitting, we consider the second order polynomial only for the important variables. Hence the model becomes

$$Y = g(X_{Important}) + h(X_{Remaining}) + \varepsilon, \qquad (1)$$

where $X_{Important}$ is the set of important variables chosen using our proposed variable prioritization methodology and $X_{Remaining}$ denotes the others. Here we approximate $g(\cdot)$ by a second order polynomial which contains all the linear, quadratic and interaction terms of the important variables, and approximate $h(\cdot)$ by a linear combination of the other variables; a small code sketch of this construction is given after Figure 2. The choice of the model is subjective in this scenario, and there is scope for model selection for this kind of data.

Figure 2: PQR plots used to identify the important predictors (P plot, Q plot and R plot; the $P$, $Q$ and $R$ scores are plotted against the variables, with notable variables labeled in each panel).
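Referring back to Model (1), the sketch below builds the implied design matrix, a full second order polynomial (linear, quadratic and interaction terms) in the important variables plus linear terms in the remaining variables, and fits it by ordinary least squares. The data and the particular index set treated as important are placeholders for illustration.

```python
import numpy as np
from itertools import combinations

def model1_design(X, important):
    """Design matrix for Model (1): full second order terms in the important
    variables plus linear terms in the remaining variables, plus an intercept."""
    n, p = X.shape
    remaining = [j for j in range(p) if j not in important]
    cols = [np.ones(n)]
    cols += [X[:, j] for j in important]                                 # linear (important)
    cols += [X[:, j] ** 2 for j in important]                            # quadratic
    cols += [X[:, j] * X[:, k] for j, k in combinations(important, 2)]   # interactions
    cols += [X[:, j] for j in remaining]                                 # linear (remaining)
    return np.column_stack(cols)

# Illustrative data and an assumed set of important predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
y = (4 + X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.8 * X[:, 0] * X[:, 2]
     + rng.normal(scale=0.5, size=300))
important = [0, 1, 2]

Z = model1_design(X, important)
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)       # ordinary least squares fit
y_hat = Z @ beta
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"Model (1) design has {Z.shape[1]} columns; in-sample R^2 = {r2:.3f}")
```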

4 Pharmaceutical Chemistry Example Revisited

In this section we apply the proposed methodology to the data set at hand. As mentioned in Section 2, a compound is considered desirable if its desirability score ($Y$) is greater than 5, moderately desirable if $Y$ is between 3 and 5, and undesirable if $Y$ is less than 3. In this data set we have 28 predictors, and we follow the steps described in Section 3. The $P$, $Q$ and $R$ values are illustrated in Figure 2.

In this example the $P$ scores are comparatively higher for $X_{15}$, $X_{10}$, $X_{22}$, $X_{21}$ and $X_{23}$, and significantly lower for $X_{16}$. This means that if a compound from the undesirable or moderately desirable group is chosen and reconstructed with $X_j = 0$, $j \in \{10, 15, 21, 22, 23\}$, then the chance of the new compound falling in the desirable group is higher. We may consider removing $X_{10}$, $X_{15}$, $X_{21}$, $X_{22}$ and $X_{23}$ from the set of predictors, but before we move on, we need to check the Q and R plots as well. From the Q plot for all the variables, we see that the variables $X_1$, $X_{10}$, $X_{11}$, $X_{15}$, $X_{19}$, $X_{23}$ and $X_{26}$ have a higher $Q$ score. This implies that the chance of a desirable compound becoming undesirable is high if we tweak the compound such that $X_j = 0$, $j \in \{1, 10, 11, 15, 19, 23, 26\}$. Hence we should not set them to 0, and they should be considered as members of $X_{Important}$. Similarly, the R plot indicates that there is little chance that a moderately desirable compound will become undesirable if the scientists tweak the compound such that $X_{16} = 0$, whereas the chances are higher if they set $X_{11}$, $X_5$, $X_8$, $X_{25}$ and $X_{17}$ to 0.

While building a model as suggested in Equation (1), we need to choose the variables to keep in $X_{Important}$. As we mentioned in Section 3, the choice of the members of $X_{Important}$ is subjective. From Figure 2, we see that $X_1$, $X_5$, $X_8$, $X_{10}$, $X_{11}$, $X_{15}$, $X_{19}$, $X_{23}$, $X_{25}$ and $X_{26}$ have either a higher $Q$ or a higher $R$ score. Among them, we exclude $X_{10}$ and $X_{15}$ since both of them have higher $P$ scores. Hence $X_{Important}$ consists of $X_1$, $X_5$, $X_8$, $X_{11}$, $X_{19}$, $X_{23}$, $X_{25}$ and $X_{26}$. Once the $X_{Important}$ set is built, we fit Model (1) based on these variables and compute the adjusted $R^2$ of the fitted model. Fitted Model (1) is a second order model and consists of 65 parameters. From a parsimony perspective, we could fit a LASSO (Tibshirani, 1996) in order to remove irrelevant predictors. If there are correlated variables in the predictor set, the elastic net (Zou and Hastie, 2005) could also be used.

Figure 3: Importance frequency and second order frequency plots for all the variables: (a) importance frequencies with respect to the P scores, (b) importance frequencies with respect to the Q scores, (c) importance frequencies with respect to the R scores, and (d) second order frequencies.

5 A Cross Validation Study and Model Comparison

In this section we evaluate the performance of our method through a cross validation study. The cross validation scheme is as follows (a compact code sketch of this scheme is given below):

1. We construct a training data set by randomly selecting 1000 observations from the entire data set, and we also randomly select a test data set of size 500 from the remaining observations.

2. We apply the variable prioritization method proposed in Section 3 to the training data. First we rank the variables according to their $P$, $Q$ and $R$ scores. We choose the top three variables based on the $Q$ and $R$ scores and construct $X^{QR}_{High}$. We also construct another set consisting of the top three variables based on the $P$ scores, namely $X^{P}_{High}$. Then $X_{Important}$ consists of the variables which belong to $X^{QR}_{High}$ but not to $X^{P}_{High}$, that is, $X_{Important} = X^{QR}_{High} \setminus X^{P}_{High}$.

3. We fit the model (1) described in Section 3 using this $X_{Important}$.

4. We predict the desirability scores of the observations in the test data using the fitted model from Step 3, and we calculate $R^2$ from the correlation between $y_i^T$ and $\hat{y}_i^T$, the observed and predicted desirability scores of the $i$th compound in the test data.

5. We repeat the steps above (1 through 4) 100 times.

We define the importance frequency of a variable as the number of times the variable appears among the top three variables in terms of a performance measure. As described in the simulation scheme, for each training data set we calculate the $P$, $Q$ and $R$ score of every variable. Once we have the $P$, $Q$ and $R$ scores for all the variables and all the training data sets, we compute the importance frequency of each variable in terms of the $P$, $Q$ and $R$ scores. In panels (a), (b) and (c) of Figure 3, we plot the $P$, $Q$ and $R$ importance frequencies and point out the variables which have considerably higher or lower importance frequencies. In Table 1 we show the importance frequency distribution of the variables with respect to the $P$ and $R$ scores. From panel (b) of Figure 3 we see that $X_{11}$ has a slightly higher frequency than the rest of the variables, which have almost the same frequencies. Hence we do not draw an importance frequency table with respect to the $Q$ scores.
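A compact sketch of the scheme, covering the repeated training splits, the PQR computation on the training data, and the importance frequency counts (Steps 1, 2 and 5), is given below. Fitting Model (1) and computing the test set $R^2$ (Steps 3 and 4) are omitted for brevity. The synthetic data, the problem dimensions, and the use of ordinary least squares as the prediction function are assumptions made only for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, n_rep = 3000, 10, 100
X = rng.normal(size=(n, p))                          # stand-in descriptor matrix
y = (4 + 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2] * X[:, 3]
     + rng.normal(scale=0.5, size=n))
C1, C2 = 3.0, 5.0

def pqr_scores(X_tr, y_tr):
    """P, Q, R scores for every predictor on one training set (OLS predictions)."""
    n_tr, p_tr = X_tr.shape
    D, UD = np.sum(y_tr > C2), np.sum(y_tr < C1)
    MD = n_tr - D - UD
    P, Q, R = np.empty(p_tr), np.empty(p_tr), np.empty(p_tr)
    for j in range(p_tr):
        Z = np.column_stack([np.ones(n_tr), np.delete(X_tr, j, axis=1)])
        y_hat = Z @ np.linalg.lstsq(Z, y_tr, rcond=None)[0]
        P[j] = np.sum((y_hat > C2) & (y_tr < C2)) / (MD + UD)
        Q[j] = np.sum((y_hat < C2) & (y_tr > C2)) / max(D, 1)
        R[j] = np.sum((y_hat < C1) & (y_tr >= C1) & (y_tr <= C2)) / max(MD, 1)
    return P, Q, R

# Importance frequency: how often a variable lands in the top three by P, Q or R.
freq = {name: np.zeros(p, dtype=int) for name in ("P", "Q", "R")}
for _ in range(n_rep):
    train = rng.choice(n, size=1000, replace=False)        # Step 1: training set
    for name, s in zip(("P", "Q", "R"), pqr_scores(X[train], y[train])):
        freq[name][np.argsort(s)[-3:]] += 1                # top three for this split

for name in ("P", "Q", "R"):
    print(name, "importance frequencies:", freq[name])
```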

Table 1: Importance frequencies and second order frequencies of the descriptors

Frequency | Importance frequencies (P) | Importance frequencies (R) | Second order frequencies, Model (2)+LASSO
< 20 | other variables | None | X25
... | X23, X.. | X22 | X5
... | X.. | X15 | X11
... | X1, X18, X21 | ... | ...
> 80 | None | None | X2, X9, X15, X16, X22

Now, without prioritizing the variables, if we consider a second order response surface model for all the variables, we have

$$Y = \eta(X_1, \ldots, X_p) + \varepsilon = Z\gamma + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2), \qquad (2)$$

where $Y_{n \times 1}$ is the vector of desirability scores, $\gamma_{v \times 1}$ is the coefficient vector, $Z = (X_1, X_2, \ldots, X_p, X_1^2, \ldots, X_p^2, X_1 X_2, \ldots, X_{p-1} X_p)_{n \times v}$, and $v$ is the total number of linear and second order terms in the model. We estimate the parameters of Model (2) by the ordinary least squares method. In this kind of problem $p$ is typically large, which forces a large value of $v$. One might be tempted to use some well known variable selection technique in order to fit a parsimonious model, but we need to keep in mind that this would defeat our purpose of identifying a handful of predictors to be tweaked in order to make the compounds more desirable. However, for the sake of comparison, we choose the LASSO (Tibshirani, 1996) as our variable selection technique. Hence $\gamma$ is estimated as $\hat{\gamma} = \arg\min_{\gamma} \|Y - Z\gamma\|^2$, subject to $\sum_{j} |\gamma_j| \le t$, where $t$ is the tuning parameter. The tuning parameter is chosen by optimizing a generalized cross validation style statistic (Tibshirani, 1996). In order to fit Model (2) with the LASSO penalty, we use the LAR algorithm (Efron et al., 2004) with the LASSO modification. We apply this model to the same training data obtained in Step 1 of the previously described simulation scheme and check the predictive performance by applying the fitted model to the test data. We repeat this 100 times for the same training and test data sets used before. Finally, we compute $R^2$ for each of the 100 simulations by calculating the correlation coefficient between the predicted and observed desirability scores of the test data.

We also count the number of times a variable is selected as a second order term in the LASSO model fitted to the training data. By a second order term we mean either a quadratic main effect ($x_i^2$) or an interaction effect ($x_i x_j$). We call this count the second order frequency. Table 1 and panel (d) of Figure 3 show the second order frequencies of each variable. One may argue that the importance frequencies and second order frequencies are not directly comparable, but they can be used to illustrate our point that the proposed PQR method more efficiently identifies a handful of predictors that can be tweaked to improve the performance of the compound. This happens because our Model (1) follows the effect heredity principle (Chipman, 1996), also called the marginality principle by McCullagh and Nelder (1989).
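The paper fits the full second order model (2) with the LASSO penalty via the LAR algorithm, tuning by a generalized cross validation style statistic. The sketch below substitutes readily available stand-ins: scikit-learn's PolynomialFeatures for the expansion and LassoLarsCV (which tunes by ordinary cross validation rather than a GCV-style statistic) for the penalized fit, and then counts which variables appear in selected second order terms for a single synthetic fit.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoLarsCV

# Synthetic data standing in for the compound library.
rng = np.random.default_rng(3)
n, p = 1000, 10
X = rng.normal(size=(n, p))
y = (4 + 1.2 * X[:, 0] - 0.8 * X[:, 1] ** 2 + 0.5 * X[:, 2] * X[:, 3]
     + rng.normal(scale=0.5, size=n))

# Full second order expansion of the predictors, as in Model (2).
poly = PolynomialFeatures(degree=2, include_bias=False)
Z = poly.fit_transform(X)                      # linear, quadratic and interaction terms

# LASSO fit along the LARS path; the penalty is chosen by cross validation here.
fit = LassoLarsCV(cv=5).fit(Z, y)

# Second order count for this single fit: a variable counts if it appears in a
# selected quadratic or interaction term.
selected = np.flatnonzero(fit.coef_)
second_order = np.zeros(p, dtype=int)
for k in selected:
    powers = poly.powers_[k]                   # exponent of each original variable
    if powers.sum() == 2:                      # x_i^2 or x_i * x_j
        second_order[powers > 0] += 1

print("second order term counts per variable:", second_order)
```

Repeating this over the 100 training sets of the cross validation scheme and accumulating the counts would give the second order frequencies reported in Table 1 and Figure 3(d).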

In Figure 4 we plot the $R^2$ values for Models (1) and (2), with and without the LASSO penalty. Model (1) outperforms Model (2) by a wide margin. We can also see that the predictive performance of Model (2)+LASSO (i.e., the full second order model with the LASSO penalty) is only marginally better than that of the proposed model. This establishes the usefulness of the proposed method, which is very simple for the scientists to use.

Figure 4: Comparison of the predictive performances ($R^2$) of Model (1), Model (2) and Model (2)+LASSO, with training data size 1000 and test data size 500. The right panel takes a closer look at the boxplots in the left panel.

6 Summary and Concluding Remarks

The proposed variable prioritization method uses both qualitative information (i.e., whether the response falls in the desirable, undesirable or moderately desirable group) and the quantitative values of the response. Once the method is applied, it is easier for the scientists to visualize the importance of the predictors and come up with a smaller set of predictors on which to conduct further research. We proposed a model considering the second order terms of the prioritized variables and studied its performance through a cross validation study.

In order to compute the $P$, $Q$ and $R$ scores, we need to choose a prediction function $f$, as mentioned in Section 3. In this paper we chose multiple linear regression to predict $Y$ for this purpose. One could also use regression trees, for instance random forest regression (Breiman, 2001). However, using random forests could increase the computation time. As illustrated by Figures 3 and 4, our proposed method performs almost as well as sophisticated variable selection techniques such as the LASSO, but it is much simpler for the scientists to use, and it identifies a smaller number of descriptors of the kind the scientists are interested in.

The focus of the paper is variable prioritization: identifying a small number of predictors that can be modified to improve the performance of the compound. One can plot the $P$, $Q$ and $R$ scores and identify or rank the predictors based on their respective scores. Once the set of important predictors is built, one might then fit a model in order to explore the relationship between the predictors and the response variable. The proposed model, motivated by second order response surface modeling, fits the data reasonably well. There is scope for further research in model selection for this kind of data. Here we compute marginal $P$, $Q$ and $R$ scores, i.e., we remove a single variable from the predictor set at a time. Group $P$, $Q$ and $R$ scores could also be computed by removing a group of variables from the predictor set.
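As a final illustration of swapping the prediction function, the sketch below recomputes $P_j$, $Q_j$ and $R_j$ for one left-out predictor using a random forest instead of linear regression. The use of scikit-learn's RandomForestRegressor, the synthetic data, and the in-sample prediction are assumptions of the sketch, not specifics from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 6))
y = 4 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=500)
C1, C2, j = 3.0, 5.0, 0                        # cut-offs and the left-out predictor

# Predict the desirability score without X_j, using a random forest as f-hat.
X_minus_j = np.delete(X, j, axis=1)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_minus_j, y)
y_hat = forest.predict(X_minus_j)

n = len(y)
D, UD = np.sum(y > C2), np.sum(y < C1)
MD = n - D - UD
P_j = np.sum((y_hat > C2) & (y < C2)) / (MD + UD)
Q_j = np.sum((y_hat < C2) & (y > C2)) / D
R_j = np.sum((y_hat < C1) & (y >= C1) & (y <= C2)) / MD
print(f"P_{j} = {P_j:.3f}, Q_{j} = {Q_j:.3f}, R_{j} = {R_j:.3f}")
```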

References

1. Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
2. Chen, H.-W., Wong, W. K. and Xu, H. (2012). An augmented approach to the desirability function. Journal of Applied Statistics, in press.
3. Chipman, H. (1996). Bayesian variable selection with related predictors. Canadian Journal of Statistics, 24.
4. Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Annals of Statistics, 32, 407-499.
5. Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
6. Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society, Series B, 70, 849-911.
7. Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20.
8. Harrington, E. C. (1965). The desirability function. Industrial Quality Control, 4.
9. Hastie, T., Tibshirani, R. and Friedman, J. (2005). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition). Springer-Verlag, New York.
10. McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd edition). Chapman and Hall, London.
11. Mandal, A., Johnson, K. and Wu, C. F. J. (2007). Identifying promising compounds in drug discovery: genetic algorithms and some new statistical techniques. Journal of Chemical Information and Modeling, 47, 981-988.
12. Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58, 267-288.
13. Wu, C. F. J. and Hamada, M. (2000). Experiments: Planning, Analysis, and Parameter Design Optimization. New York: Wiley.
14. Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67, 301-320.
