Elastic Net Grouping Variable Selection Combined with Partial Least Squares Regression (EN-PLSR) for the Analysis of Strongly Multi-collinear Spectroscopic Data

GUANG-HUI FU, QING-SONG XU,* HONG-DONG LI, DONG-SHENG CAO, and YI-ZENG LIANG
School of Mathematical Sciences and Computing Technology, Central South University, Changsha, P.R. China (G.-H.F., Q.-S.X.); and Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha, P.R. China (H.-D.L., D.-S.C., Y.-Z.L.)

(Received 19 July 2010; accepted 12 January. * Author to whom correspondence should be sent. E-mail: qsxu@mail.csu.edu.cn.)

In this paper a novel wavelength region selection algorithm, called elastic net grouping variable selection combined with partial least squares regression (EN-PLSR), is proposed for multi-component spectral data analysis. The EN-PLSR algorithm automatically selects successive, strongly correlated predictor groups related to the response variable in two steps. First, a portion of the correlated predictors are selected and divided into subgroups by means of the grouping effect of elastic net estimation. Then, a recursive leave-one-group-out strategy is employed to further shrink the variable groups in terms of the root mean square error of cross-validation (RMSECV) criterion. The performance of the algorithm on real near-infrared (NIR) spectroscopic data sets shows that EN-PLSR is competitive with full-spectrum PLS and moving window partial least squares (MWPLS) regression and is suitable for strongly correlated spectroscopic data.

Index Headings: Elastic net; Grouping variable selection; Partial least squares regression; PLSR; Multi-collinear analysis; Near-infrared spectroscopy; NIR spectroscopic data; EN-PLSR; Wavelength region selection algorithm.

INTRODUCTION

In chemistry, multivariate calibration methods are widely applied for investigating multi-component spectroscopic data. So far, a number of popular methods have been utilized to achieve this goal: (1) principal component regression (PCR),1,2 (2) partial least squares (PLS),3-10 (3) ridge regression,11 (4) artificial neural networks (ANN),12,13 and (5) the least absolute shrinkage and selection operator (LASSO).14 In addition, a large number of generalized methods or algorithms based on the above models have been proposed recently, such as the elastic net estimator,15 moving window partial least squares regression (MWPLS),16 Bayesian linear regression with variable selection (BLR-VS),17 and the competitive adaptive reweighted sampling (CARS) method for multivariate calibration.18

The LASSO method finds a least-squares solution subject to a constraint on the L1-norm of the regression coefficient vector. The key variables for LASSO can be selected step by step according to the LARS algorithm.19 Elastic net is a generalization of LASSO and can be used more widely than LASSO in multi-component analysis because of its advantages over LASSO, for example the grouping effect, which is introduced in the following section.

One of the challenges in the analysis of near-infrared spectral data in chemometrics is the "large p, small n" problem, where n is the number of observations and p is the number of variables (i.e., factors or predictors). Generally speaking, grouped variables should be of concern in large p and small n problems.20 Chemists often take many factors into consideration in order to measure properties with precision, whereas only a few observations can be obtained owing to experimental difficulties.
Statisticians advise employing methods such as cross-validation (CV) and group variable methods when the sample size is small. Techniques for grouping variables have been studied extensively for large-scale sets of predictors.

Another challenge is multi-collinearity, or correlation among the predictors, because traditional multiple linear regression (MLR) fails in such situations. Thus, novel data analysis methods should be explored for this special case. In this paper we focus on near-infrared (NIR) spectroscopic data, which are generally multi-component as well as multi-collinear. By combining elastic net with partial least squares regression, we propose a new algorithm, called EN-PLSR, to cope with this case. Following the data-driven principle in statistics, the EN-PLSR algorithm automatically obtains a number of strongly correlated variable groups through elastic net estimation, and recursive PLS calibration models are subsequently built on these variable groups. The performance on real data sets shows that the EN-PLSR algorithm has good prediction accuracy and is particularly suitable for strongly correlated data.

The remainder of the paper is organized as follows: the Theory section offers the basic theory, including variable selection and the grouping effect of the elastic net. The Algorithm section gives the details of the EN-PLSR algorithm. The Experimental and Results sections describe the data sets used and give the experimental results and discussion, respectively. Finally, we give a summary and further discussion.

THEORY

Variable Selection. Let y = (y_1, y_2, ..., y_n)^T be the response vector and X = (x_1, x_2, ..., x_p) be the predictor matrix, where x_j = (x_{1j}, x_{2j}, ..., x_{nj})^T (j = 1, 2, ..., p) are the p predictors. For simplicity, we also suppose that the data are standardized to have zero mean and unit length for every variable, namely,

\sum_{i=1}^{n} y_i = 0, \qquad \sum_{i=1}^{n} x_{ij} = 0, \qquad \sum_{i=1}^{n} x_{ij}^{2} = 1, \qquad j = 1, 2, \ldots, p    (1)
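As a concrete illustration of the standardization in Eq. 1, the short Python sketch below (our own, not part of the original paper; the function name standardize and the simulated data are ours) centers the response and each predictor column and scales every predictor column to unit Euclidean length:

```python
import numpy as np

def standardize(X, y):
    """Center y and the columns of X, and scale each column of X to unit L2 norm (Eq. 1)."""
    y_c = y - y.mean()                      # sum_i y_i = 0
    X_c = X - X.mean(axis=0)                # sum_i x_ij = 0 for every j
    norms = np.linalg.norm(X_c, axis=0)     # column lengths ||x_j||_2
    X_s = X_c / norms                       # sum_i x_ij^2 = 1 for every j
    return X_s, y_c, norms

# Example with random data standing in for an NIR predictor matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 700))   # n = 80 samples, p = 700 wavelength channels
y = rng.normal(size=80)
X_s, y_c, _ = standardize(X, y)
print(np.allclose(X_s.sum(axis=0), 0), np.allclose((X_s**2).sum(axis=0), 1))
```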

In multivariate calibration, one usually considers the following linear regression model:

y = X\beta + e = x_1 \beta_1 + x_2 \beta_2 + \cdots + x_p \beta_p + e    (2)

where \beta = (\beta_1, \beta_2, ..., \beta_p)^T is the regression coefficient vector and e is usually Gaussian noise, namely e ~ N(0, \sigma^2 I), where I is the identity matrix. Generally, the number of predictors p is very large compared with the number of observations n. Thus, chemists pay particular attention to seeking a sparse solution, for the sake of scientific insight into the predictor-response relationship. Sparse means that a large proportion of the elements of \beta are zero. The traditional way of solving the model in Eq. 2 is to minimize the residual sum of squares by ordinary least squares (OLS). However, the OLS solution is generally not sparse. Tibshirani proposed the LASSO14 by imposing an L1-norm penalty on ordinary least squares. The LASSO estimator solves the following optimization problem:

\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2    (3a)

subject to

\sum_{j=1}^{p} |\beta_j| \le t    (3b)

where t is a non-negative tuning parameter. The LASSO can automatically perform regression and variable selection simultaneously. It shrinks the regression coefficients of the unimportant predictors to exactly zero by controlling the parameter t. This is because the L1-norm constraint \sum_{j=1}^{p} |\beta_j| \le t is convex and singular at the origin25 (see Fig. 1), and the optimal point of LASSO is often located exactly on a coordinate axis of the variable space (for example, point A in Fig. 1).

FIG. 1. Two-dimensional LASSO penalty (dashed) and elastic net penalty (solid). The LASSO penalty region is convex and the elastic net penalty region is strictly convex.

The Grouping Effect of Elastic Net. Elastic net,15 an improved version of LASSO for analyzing high-dimensional data, is defined as follows:

\hat{\beta} = (1 + \lambda_2) \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1 \right\}    (4)

where \lambda_1 and \lambda_2 are two non-negative parameters. If \lambda_2 = 0, elastic net is exactly equivalent to LASSO. The scale factor (1 + \lambda_2) should be changed to [1 + (\lambda_2/n)] when the variables are not standardized to have mean zero and L2-norm one. As shown in Eq. 4, elastic net estimation can be viewed as a penalized least squares method. The penalty \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1 is a convex combination of the LASSO and ridge penalties. Elastic net is a generalization of LASSO, so it can automatically perform variable selection. Moreover, elastic net estimation can handle data with large p and small n, which often arise in multi-component spectral analysis. In the extreme case it can even select all p variables, whereas LASSO can select at most n predictors.

Another important property of elastic net is the grouping effect. Generally speaking, if the regression coefficients of strongly correlated variables tend to be equal (in the absolute value sense), the regression model exhibits a grouping effect. In particular, when two variables are exactly identical, their coefficients in the regression model should be identical. Hui Zou et al. have pointed out that two completely identical predictors have identical coefficients in the model of Eq. 4 when \lambda_2 > 0, owing to the strictly convex elastic net penalty \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1 (see Fig. 1). Suppose that the predictors V_i and V_j are strongly correlated. According to the grouping effect of elastic net, the regression coefficients \beta_i and \beta_j tend to be similar, that is, they are zero or non-zero together. In other words, the predictors V_i and V_j are selected or eliminated together. Assuming further that V_j is the variable most correlated with V_i among all predictors, and that in the current step the predictor added to the model of Eq. 2 is V_i (i.e., its regression coefficient \beta_i changes from zero to non-zero), then theoretically the variable added to the model in the next step tends to be V_j. This is the basis of the grouping strategy of the EN-PLSR algorithm for spectroscopic data. Generally speaking, over the full wavelength range of spectroscopic data, two successive channels (predictors) are strongly correlated. Thus, they should enter or leave the model (Eq. 2) together. Furthermore, they remain successive in the solution sequence of the elastic net.

FIG. 2. The variable shrinkage by two steps with multi-component and strongly correlated spectroscopic data.
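To make the grouping effect described above concrete, the following sketch (a minimal illustration of ours, not code from the paper) fits scikit-learn's ElasticNet and Lasso estimators, whose penalties are in the same spirit as Eq. 4 and Eqs. 3a-3b, to a pair of nearly identical predictors. With a non-zero ridge part the two correlated predictors tend to receive similar coefficients, whereas the pure L1 penalty typically keeps only one of them:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(1)
n = 50
z = rng.normal(size=n)
# Two almost identical (strongly correlated) predictors plus one noise predictor
x1 = z + 0.01 * rng.normal(size=n)
x2 = z + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * z + 0.1 * rng.normal(size=n)

# Elastic net (L1 + L2 penalty): the correlated pair tends to get similar coefficients
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
# Pure LASSO (L1 only): typically keeps just one of the correlated pair
lasso = Lasso(alpha=0.1).fit(X, y)

print("elastic net coefficients:", np.round(enet.coef_, 3))
print("lasso coefficients:      ", np.round(lasso.coef_, 3))
```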

FIG. 3. The curves of RMSECV for the full-spectrum PLS, MWPLS, and EN-PLSR methods for m5spec combined with the four responses of the corn data.

EN-PLSR ALGORITHM

Motivation and Aim. Generally speaking, prediction accuracy is used to judge whether a regression model is a good one. But in the large p, small n case, parsimony should also be considered: simpler models are preferred for the sake of scientific insight into the data structure and to alleviate the computational burden. In addition, shrinking the model can often improve the prediction accuracy.26 A method for reducing model complexity is therefore extremely important. Single-variable forward stepwise selection or backward stepwise elimination is not a good strategy when the number of predictors is ultra-high.27 Group variable selection arises naturally in this situation. In this work, we propose a group variable selection method that combines elastic net and PLSR. First, the predictors are preliminarily shrunk, and the highly collinear predictors that are kept are automatically grouped according to the grouping effect of elastic net. Then, a leave-one-group-out strategy is followed to further shrink the variables. More details are given below.

TABLE I. Results for m5spec with the four responses (moisture, oil, protein, and starch) of the corn data set, comparing full-spectrum PLS, MWPLS, and EN-PLSR. No. LVs and No. Var are the number of PLS components and the number of selected variables, respectively. For the EN-PLSR models, No. Var was 40, 69, 90, and 80 for moisture, oil, protein, and starch, obtained with lambda_2 = 0.5, 3000, 200, and 100, respectively.

TABLE II. The selected key predictors and the corresponding wavelength regions (nm) for the PLS, MWPLS, and EN-PLSR methods and the four corn responses.

Algorithm Description. The EN-PLSR algorithm contains two main steps. First, elastic net estimation is employed to shrink the predictors. The predictors are ranked automatically by elastic net estimation, giving a variable sequence, and a portion of the variables (e.g., 40%, which can be decided by cross-validation) are selected as key variables. We have noted that, for collinear spectroscopic data, the key variables selected by elastic net are in most cases piecewise successive. Under the constraint that the number of elements in any variable group cannot exceed some constant m (e.g., m = 30), the EN-PLSR algorithm treats every continuous predictor segment as a subgroup whose predictors are highly correlated, thanks to the grouping effect of elastic net. EN-PLSR then builds a PLSR model using the selected variable groups and computes its root mean square error of cross-validation (RMSECV_0). A recursive leave-one-group-out strategy is employed to further shrink the predictors in this step. In other words, a series of PLSR models are built using all the variable groups but one. The corresponding RMSECV_h is calculated, and if the minimum of RMSECV_h is less than RMSECV_0, the corresponding variable group is left out and the procedure is repeated. It is worth noting that the three tuning parameters lambda_1, lambda_2, and A should be validated beforehand. Pseudo-code for the EN-PLSR algorithm is supplied in the Appendix.

The EN-PLSR algorithm can thus be seen as a two-step variable shrinkage (see Fig. 2). First, some uninformative variables are automatically eliminated by elastic net. Second, the recursive leave-one-group-out strategy further shrinks the variables by PLSR in the RMSECV sense.

Tuning Parameters. Three tuning parameters, the number of latent variables A, the L1-norm penalty lambda_1, and the L2-norm penalty lambda_2 (see Eq. 4), must be tuned for prediction accuracy. In practice, lambda_1 can be replaced by the fraction of the L1-norm (s), which always lies within [0, 1]. In this paper we choose lambda_2, s, and A as tuning parameters and estimate them by the traditional ten-fold cross-validation method. In practice, we set a grid for the three tuning parameters; for example, lambda_2 takes the ten values {0.01, 0.1, 0.5, 1, 2, 5, 10, 100, 1000, 10000}, the range [0, 1] of s is divided equally into 100 intervals, and the number of latent variables A runs from 1 to 15. Thus, we search a grid of 10 x 100 x 15 combinations for the optimal model parameters in terms of RMSECV, which is defined as

RMSECV = \left\{ \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in \text{fold } k} \left[ y_i - \hat{y}_i^{(-k)} \right]^2 \right\}^{1/2}    (5)

where y_i is the measured value of the i-th (i = 1, 2, ..., n) sample and \hat{y}_i^{(-k)} is the predicted value obtained by leaving the k-th fold of samples out.
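The RMSECV of Eq. 5 can be computed with a standard K-fold loop. The sketch below (our own minimal illustration using scikit-learn's PLSRegression and KFold; the helper name rmsecv_pls and the simulated data are ours) evaluates a PLSR model with a given number of latent variables, which is the inner building block of the grid search over lambda_2, s, and A described above:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def rmsecv_pls(X, y, n_lv, n_folds=10, seed=0):
    """Root mean square error of cross-validation (Eq. 5) for a PLSR model with n_lv latent variables."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    sq_err = 0.0
    for train_idx, test_idx in kf.split(X):
        pls = PLSRegression(n_components=n_lv)
        pls.fit(X[train_idx], y[train_idx])
        y_hat = pls.predict(X[test_idx]).ravel()   # predictions with the k-th fold left out
        sq_err += np.sum((y[test_idx] - y_hat) ** 2)
    return np.sqrt(sq_err / len(y))

# Example: RMSECV curve over 1..15 latent variables on simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 120))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=80)
curve = [rmsecv_pls(X, y, a) for a in range(1, 16)]
print(np.round(curve, 3))
```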
EXPERIMENTAL

Data Set A: Corn. Data set A is taken from Ref. 28 and consists of 80 corn samples measured on three different NIR spectrometers. The spectra are recorded at 2 nm intervals and contain 700 predictors (variables) per instrument; the three instruments, called m5, mp5, and mp6, give three predictor matrices called m5spec, mp5spec, and mp6spec, respectively. The predictors are generally strongly correlated: taking m5spec as an example, 93.4% of the variables have correlation coefficients above 0.92, and 49.4% exceed an even higher threshold. The moisture, oil, protein, and starch values of each sample are included as response variables and stored in the response matrix propvals. In this study, we employ m5spec as the predictor matrix together with the four responses to investigate the performance of the EN-PLSR algorithm.

Data Set B: Gasoline. The second data set, from Ref. 29, is another NIR spectral data set consisting of NIR spectra and octane numbers of 60 gasoline samples. The NIR spectra were measured in diffuse reflection as log(1/R) from 900 nm to 1700 nm at 2 nm intervals, giving 401 wavelengths (variables).
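Both data sets are strongly collinear in this sense. A quick way to check such correlation structure on any n x p predictor matrix is the sketch below (ours, not from the paper; the function name and the simulated spectra-like matrix are our own assumptions):

```python
import numpy as np

def fraction_correlated(X, threshold):
    """Fraction of distinct predictor pairs with |correlation| above the threshold."""
    corr = np.corrcoef(X, rowvar=False)          # p x p correlation matrix
    iu = np.triu_indices_from(corr, k=1)         # upper triangle, excluding the diagonal
    return np.mean(np.abs(corr[iu]) > threshold)

# Example with a smooth, spectra-like simulated matrix (neighbouring channels correlate strongly)
rng = np.random.default_rng(0)
base = np.cumsum(rng.normal(size=(60, 200)), axis=1)   # random-walk "spectra"
X = base + 0.05 * rng.normal(size=base.shape)
print(fraction_correlated(X, 0.92))
```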

RESULTS AND DISCUSSION

Data Set A: Corn. In this experiment, we used m5spec as the predictor matrix together with the four responses to build four models for investigating the EN-PLSR algorithm. Each predictor and response was first standardized to zero mean and unit length. The maximal size of each variable subgroup was set to 30. The results of full-spectrum PLS and MWPLS16 are also shown for comparison.

Figure 3 and Table I describe the RMSECV for the four responses. We set the upper limit of the number of latent variables (LVs) to 15. The RMSECV decreases as the number of LVs increases, and a clear minimum for choosing the number of LVs is not apparent (or does not exist) for the four responses. Since a large number of LVs is not a wise choice, we use the criterion introduced in Ref. 8: a component is judged significant if the ratio RMSECV_a / RMSECV_{a-1} is smaller than 0.95 in this experiment, where RMSECV_a is the RMSECV obtained with a (a = 2, ..., 15) LVs. According to this criterion, the numbers of LVs for the four response models are 14, 9, 12, and 10, respectively. The EN-PLSR algorithm clearly has the lowest RMSECV of the three methods (see Table I). For the responses moisture and protein, the RMSECV also decreases faster with increasing LVs than for the other two methods (see Fig. 3).

The selected wavelength regions and the selected predictors for the four responses are shown in Table II and Fig. 4. Compared with MWPLS, the number of variables selected by EN-PLSR is much smaller for each response. This is very important in practice because the final regression model is more parsimonious. It is also interesting to note that some of the wavelength bands selected by the EN-PLSR algorithm are chemically meaningful; for example, one selected band can be ascribed to water absorption30 and the combination band of the O-H bond.16 As is well known, band assignment is difficult because, to the best of our knowledge, each wavelength of an NIR spectrum is influenced by a number of compounds and/or different vibration modes. Nevertheless, some of the wavelength bands selected by EN-PLSR are chemically meaningful according to Table 1 in the appendix of Ref. 31.

FIG. 4. EN-PLSR algorithm on m5spec of corn data set A with the four responses, showing the finally selected variable groups (selected wavelength regions) for each response.
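The LV-selection rule from Ref. 8 applied above is easy to express in code. The sketch below is one straightforward reading of that criterion (our own illustration; the function name and the hypothetical rmsecv_curve input are ours): components are added while each new one still reduces RMSECV by at least 5%.

```python
def select_n_latent_variables(rmsecv_curve, ratio=0.95):
    """Choose the number of LVs: component a is significant if RMSECV_a / RMSECV_(a-1) < ratio.

    rmsecv_curve[a-1] holds the RMSECV obtained with a latent variables (a = 1, 2, ...).
    """
    n_lv = 1
    for a in range(2, len(rmsecv_curve) + 1):
        if rmsecv_curve[a - 1] / rmsecv_curve[a - 2] < ratio:
            n_lv = a          # component a brings a >5% improvement, keep it
        else:
            break             # first non-significant component stops the search
    return n_lv

# Example: a curve that flattens out after a few components
curve = [1.00, 0.70, 0.52, 0.45, 0.44, 0.43]
print(select_n_latent_variables(curve))   # -> 4
```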

Data Set B: Gasoline. We employed the same data pretreatment and criterion for data set B as for data set A, except that we chose the optimal number of LVs corresponding to the minimum RMSECV. Seven variable groups, with 69 predictors in total, were finally selected as key variables (see Fig. 5). The RMSECV values of the full-spectrum PLS, MWPLS, and EN-PLSR methods are shown in Fig. 6 and Table III. The RMSECV values of MWPLS and EN-PLSR are close; however, the number of key variables selected by EN-PLSR is much smaller than the number selected by MWPLS. We note that the variables in each group are strongly correlated, which indirectly supports our algorithm.

TABLE III. RMSECV, selected key predictors, and corresponding wavelength regions of the full-spectrum PLS, MWPLSR, and EN-PLSR methods for the gasoline data set.

FIG. 5. The EN-PLSR algorithm on gasoline data set B, showing the seven finally selected variable groups.

FIG. 6. The curves of RMSECV for the full-spectrum PLS, MWPLS, and EN-PLSR methods on gasoline data set B.

CONCLUSIONS

Variable selection for multi-collinear, strongly correlated data with few observations and many predictors is an important issue in chemometrics, especially in spectroscopic analysis. We propose in this paper a new method, called EN-PLSR, to deal with multi-collinear NIR spectroscopic data. The EN-PLSR algorithm is a variable grouping (or interval) selection method that automatically subgroups the candidate variables and yields a parsimonious model while retaining as much of the relevant information as possible. The predictors in every group selected by the EN-PLSR algorithm are successive and strongly correlated. In the Experimental section we used NIR spectroscopic data to evaluate the algorithm, but the EN-PLSR method can be applied in a wide variety of fields, for example ultraviolet-visible (UV-Vis) and mid-infrared (MIR) spectroscopic data analysis. In order to obtain the optimal solution, three tuning parameters, the L1- and L2-norm penalty coefficients of the elastic net and the number of components in PLSR, should be validated carefully.

ACKNOWLEDGMENTS

I would like to express my gratitude to Prof. Rolf Manne, University of Bergen, Norway, for his advice on this paper. This work was financially supported by the National Natural Science Foundation of P.R. China and the international cooperation project on traditional Chinese medicines of the Ministry of Science and Technology of China (Grant No. 2007DFA40680). The studies meet with the approval of the university's review board.

1. T. Næs and H. Martens, J. Chemom. 2, 155 (1988).
2. M. K. Hartnett, G. Lightbody, and G. W. Irwin, Chemom. Intell. Lab. Syst. 40, 215 (1998).
3. P. Geladi and B. R. Kowalski, Anal. Chim. Acta 185, 1 (1986).
4. A. Höskuldsson, J. Chemom. 2, 211 (1988).
5. F. Lindgren, P. Geladi, and S. Wold, J. Chemom. 7, 45 (1993).
6. S. de Jong and C. J. F. ter Braak, J. Chemom. 8, 169 (1994).
7. B. S. Dayal and J. F. MacGregor, J. Chemom. 11, 73 (1997).
8. S. Wold, M. Sjöström, and L. Eriksson, Chemom. Intell. Lab. Syst. 58, 109 (2001).
9. Q.-S. Xu, Y.-Z. Liang, and H.-L. Shen, J. Chemom. 15, 135 (2001).
10. H. Abdi, Wiley Interdiscip. Rev.: Comput. Stat. 2, 97 (2010).
11. E. Vigneau, M. F. Devaux, E. M. Qannari, and P. Robert, J. Chemom. 11, 239 (1997).

12. F. Despagne, D.-L. Massart, and P. Chabot, Anal. Chem. 72, 1657 (2000).
13. H. H. Thodberg, IEEE Trans. Neural Networks 7, 56 (1996).
14. R. Tibshirani, J. R. Statist. Soc. B 58, 267 (1996).
15. H. Zou and T. Hastie, J. R. Statist. Soc. B 67, 301 (2005).
16. J.-H. Jiang, R. J. Berry, H. W. Siesler, and Y. Ozaki, Anal. Chem. 74, 3555 (2002).
17. T. Chen and E. Martin, Anal. Chim. Acta 631, 13 (2009).
18. H.-D. Li, Y.-Z. Liang, Q.-S. Xu, and D.-S. Cao, Anal. Chim. Acta 648, 77 (2009).
19. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Ann. Statist. 32, 407 (2004).
20. M. West, C. Blanchette, H. Dressman, E. Huang, S. Ishida, R. Spang, H. Zuzan, J. A. Olson, Jr., J. R. Marks, and J. R. Nevins, Proc. Natl. Acad. Sci. U.S.A. 98 (2001).
21. T. Hastie, R. Tibshirani, M. Eisen, A. Alizadeh, D. Ross, U. Scherf, J. Weinstein, R. Levy, L. Staudt, W. C. Chan, D. Botstein, and P. Brown, Genome Biol. 1, 1 (2000).
22. R. Díaz-Uriarte, Technical Report, Spanish National Cancer Center (2003).
23. M. Dettling and P. Bühlmann, J. Multivariate Anal. 90, 106 (2004).
24. M. R. Segal, K. D. Dahlquist, and B. R. Conklin, J. Comput. Biol. 10, 961 (2003).
25. J. Fan and R. Li, J. Am. Statist. Assoc. 96, 1348 (2001).
26. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning (Springer-Verlag, New York, 2001).
27. J. Fan and J. Lv, J. R. Statist. Soc. B (Statistical Methodology) 70, 849 (2008).
29. J. H. Kalivas, Chemom. Intell. Lab. Syst. 37, 255 (1997).
30. D. Jouan-Rimbaud, D.-L. Massart, R. Leardi, and O. E. de Noord, Anal. Chem. 67, 4295 (1995).
31. H. W. Siesler, Y. Ozaki, S. Kawata, and H. M. Heise, Eds., Near-Infrared Spectroscopy: Principles, Instruments, Applications (Wiley-VCH, Weinheim, Germany, 2002).

APPENDIX

The Pseudo-code of the EN-PLSR Algorithm.

Input: predictor matrix X = (x_1, x_2, ..., x_p), response vector y in R^{n x 1}, d = 1, m = 30.
Use elastic net to select a variable sequence; denote the indices of the selected variables, in order, by a_1, a_2, ..., a_{p1} (p1 < p);
G_d = {x_{a_1}}
for i = 2 to p1 do
    if (a_i - a_{i-1} = 1 and card(G_d) < m) then
        G_d = {G_d, x_{a_i}}
    else
        d = d + 1; G_d = {x_{a_i}}
    end if
end for
Compute RMSECV_0 of the PLSR model built on all d groups;
for j = 1 to d do
    Compute RMSECV_j of the PLSR model built on G_1, ..., G_{j-1}, G_{j+1}, ..., G_d;
end for
RMSECV_h = min{RMSECV_1, ..., RMSECV_d}, where h is the index attaining the minimum;
while RMSECV_h < RMSECV_0 do
    Leave G_h out of the groups; d = d - 1; RMSECV_0 = RMSECV_h;
    for j = 1 to d do
        Compute RMSECV_j of the PLSR model built on G_1, ..., G_{j-1}, G_{j+1}, ..., G_d;
    end for
    RMSECV_h = min{RMSECV_1, ..., RMSECV_d}, where h is the index attaining the minimum;
end while
Output: RMSECV_0, {G_1, G_2, ..., G_d}.

Note: card(G_d) is the number of elements of the set G_d.
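For readers who want to experiment with the overall procedure, the sketch below is our own compact Python approximation of the two EN-PLSR steps (elastic-net screening, contiguous grouping with a maximum group size m, and recursive leave-one-group-out PLSR). All names (en_plsr, contiguous_groups, rmsecv) are ours, the number of latent variables is held fixed rather than tuned, the elastic net is assumed to keep at least one channel, and scikit-learn estimators stand in for the authors' implementation, so this illustrates the structure of the algorithm under those assumptions rather than reproducing the original code:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

def rmsecv(X, y, n_lv, n_folds=10):
    """RMSECV of a PLSR model with n_lv latent variables (cf. Eq. 5)."""
    pls = PLSRegression(n_components=min(n_lv, X.shape[1]))
    y_hat = cross_val_predict(pls, X, y, cv=KFold(n_folds, shuffle=True, random_state=0)).ravel()
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def contiguous_groups(selected_idx, m=30):
    """Split the sorted channel indices kept by elastic net into runs of consecutive channels of size <= m."""
    groups, current = [], [int(selected_idx[0])]
    for idx in map(int, selected_idx[1:]):
        if idx - current[-1] == 1 and len(current) < m:
            current.append(idx)
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)
    return groups

def en_plsr(X, y, alpha=0.1, l1_ratio=0.5, m=30, n_lv=5):
    """Two-step EN-PLSR-style selection: elastic-net grouping, then leave-one-group-out PLSR."""
    # Step 1: elastic-net screening; surviving consecutive channels form the candidate groups
    kept = np.flatnonzero(ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=5000).fit(X, y).coef_)
    groups = contiguous_groups(kept, m)

    # Step 2: recursively drop the group whose removal most reduces RMSECV, while it keeps improving
    cols = lambda gs: np.concatenate(gs)
    best = rmsecv(X[:, cols(groups)], y, n_lv)
    while len(groups) > 1:
        scores = [rmsecv(X[:, cols(groups[:h] + groups[h + 1:])], y, n_lv) for h in range(len(groups))]
        h = int(np.argmin(scores))
        if scores[h] >= best:          # no left-out group improves RMSECV_0, so stop
            break
        best, _ = scores[h], groups.pop(h)
    return groups, best
```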
