On High-Dimensional Cross-Validation


On High-Dimensional Cross-Validation

BY WEI-CHENG HSIAO
Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan

WEI-YING WU
Department of Applied Mathematics, National Dong Hwa University, Hualien 97401, Taiwan

AND CHING-KANG ING
Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan

SUMMARY

Cross-validation (CV) is one of the most popular methods for model selection. By splitting n data points into a training sample of size n_c and a validation sample of size n_v with n_v/n → 1 and n_c → ∞, Shao (1993) showed that subset selection based on CV is consistent in a regression model with p candidate variables when p << n. In the case of p >> n, however, not only does the consistency of CV remain undeveloped, but all-subset selection is also practically infeasible. In this paper, we fill this gap by using CV as a backward-elimination tool for eliminating variables included by high-dimensional variable screening methods possessing the sure screening property. By choosing an n_v such that n_v/n converges to 1 at a rate faster than the one in Shao (1993), we establish the consistency of our selection procedure. We also illustrate the finite-sample performance of the proposed procedure using Monte Carlo simulation.

Some key words:

1. CROSS-VALIDATION

Consider the linear model
$$y = X\beta + e,$$
where $y = (y_1, \ldots, y_n)^T$ is the $n \times 1$ response vector, $X = (x_1, \ldots, x_n)^T$ with $x_t$ a $p \times 1$ vector of covariates (predictors) for $1 \le t \le n$, $\beta$ is the $p \times 1$ vector of unknown regression coefficients, and $e = (e_1, \ldots, e_n)^T$ consists of uncorrelated random disturbances with mean zero and variance $\sigma^2$. In our setting, p >> n and many of the parameters in β may be zero, which is the sparsity condition.

Cross-validation (CV) is a popular method for comparing models according to their prediction performance. The data set is usually split into two parts:

{(y_t, x_t) : t ∈ s} and {(y_t, x_t) : t ∈ s^c}, where s, a subset of {1, ..., n} of size n_v, indexes the validation (testing) set and its complement s^c indexes the training set of size n_c = n − n_v. For the well-known leave-one-out cross-validation (n_v = 1), the CV criterion for the model indexed by a subset J_α of {1, ..., p} with size d_{J_α} is
$$\hat\Gamma_{J_\alpha, n} = n^{-1} \sum_{s=1}^{n} \bigl(y_s - X_{J_\alpha, s}\, \hat\beta_{J_\alpha, s^c}\bigr)^2,$$
where X_{J_α} is the n × d_{J_α} matrix consisting of the columns of X indexed by J_α, X_{J_α,s} is the n_v × d_{J_α} matrix containing the rows of X_{J_α} indexed by s, and β̂_{J_α,s^c} is the least squares estimate obtained from the remaining data after the s-th observation is removed. However, leave-one-out cross-validation is not variable-selection consistent; see Efron (1986) and Shao (1993). To overcome this drawback, Shao (1993) proposed, for the fixed-p case, a model selection method based on multifold cross-validation and verified its consistency (i.e., selecting all relevant variables and no irrelevant variables with probability approaching 1).

A brief review of Shao (1993) is given for balanced incomplete sampling and Monte Carlo sampling. For the balanced incomplete case, let B be a collection of b subsets of {1, ..., n}, each of size n_v, chosen according to the following balance conditions: (i) every t ∈ {1, ..., n} appears in the same number of subsets in B; and (ii) every pair (t, t') with 1 ≤ t < t' ≤ n appears in the same number of subsets in B. All nonempty subsets J_α of {1, ..., p} are compared through the average squared prediction error over the subsets s in B, abbreviated BICV(n_v), and a model is chosen by minimizing
$$\hat\Gamma^{BICV}_{J_\alpha, n} = \frac{1}{n_v b} \sum_{s \in B} \bigl\| y_s - \hat y_{J_\alpha, s^c} \bigr\|^2, \qquad (1)$$
where ŷ_{J_α,s^c} = X_{J_α,s} β̂_{J_α,s^c} and β̂_{J_α,s^c} is the least squares estimator of β_{J_α} computed from the training data. For practical implementation, Shao (1993) further suggested Monte Carlo sampling, abbreviated MCCV(n_v). MCCV(n_v) selects a model by minimizing
$$\hat\Gamma^{MCCV}_{J_\alpha, n} = \frac{1}{n_v b} \sum_{s \in R} \bigl\| y_s - \hat y_{J_\alpha, s^c} \bigr\|^2, \qquad (2)$$
where R is a collection of b randomly drawn subsets of {1, ..., n} of size n_v. For the high-dimensional problem considered here, however, model selection through the above cross-validation criteria is infeasible because of the heavy combinatorial search over all subsets.

2. CROSS-VALIDATION IN THE HIGH-DIMENSIONAL PROBLEM

For the high-dimensional problem, many popular variable screening methods, such as Lasso (Tibshirani, 1996), LARS (Efron et al., 2004), Adaptive Lasso (Zou, 2006), SIS (Fan and Lv, 2008), and OGA (Ing and Lai, 2011), can search for relevant variables. After screening, however, the selected submodel may contain not only the relevant variables but also many irrelevant ones. To remove the included irrelevant variables, different model selection criteria can be applied, for example AIC (Akaike, 1974), BIC (Schwarz, 1978), the extended BIC (Chen and Chen, 2008), and HDIC (Ing and Lai, 2011).
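The MCCV criterion in (2) requires only repeated least squares fits on randomly drawn training sets, so it is inexpensive to evaluate for any fixed submodel. The following sketch (illustrative Python of our own, not the authors' code; the function name mccv_score and its arguments are assumptions) computes the criterion in (2) for a submodel given by a set of column indices.

```python
import numpy as np

def mccv_score(y, X, cols, n_v, b, rng=None):
    """Monte Carlo CV criterion (2) for the submodel indexed by `cols`.

    Averages the squared prediction error over b random validation sets of
    size n_v; the submodel is refitted by least squares on the complementary
    training set of size n_c = n - n_v each time.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    total = 0.0
    for _ in range(b):
        s = rng.choice(n, size=n_v, replace=False)        # validation indices
        sc = np.setdiff1d(np.arange(n), s)                 # training indices
        X_tr, X_va = X[np.ix_(sc, cols)], X[np.ix_(s, cols)]
        beta_hat, *_ = np.linalg.lstsq(X_tr, y[sc], rcond=None)
        total += np.sum((y[s] - X_va @ beta_hat) ** 2)
    return total / (n_v * b)
```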

In this paper, the cross-validation idea of Shao (1993) is applied along the screening path to search for the true model, which contains all relevant variables and only those. The proposed procedure consists of the following steps.

Step 1 (Screening). Apply a screening method to the full model {1, ..., p} to obtain a reduced submodel, denoted by Ĵ_{m_n} = {ĵ_1, ..., ĵ_{m_n}}, where m_n = #(Ĵ_{m_n}).

Step 2 (Trim). Apply backward stepwise elimination to exclude the irrelevant variables in the truncated submodel Ĵ_{k̂_n} of size k̂_n (obtained from Ĵ_{m_n}, possibly after a further truncation along the screening path, for instance by HDBIC; if no truncation is used, k̂_n = m_n), by comparing the corresponding MCCV values in (2) computed with a common choice of n_v and b. The trimmed submodel is
$$\hat N_n = \bigl\{ \hat j_l : \hat\Gamma^{MCCV}_n(\hat J_{\hat k_n} - \{\hat j_l\}) > \hat\Gamma^{MCCV}_n(\hat J_{\hat k_n}),\; 1 \le l \le \hat k_n \bigr\}.$$

For Step 1, many well-known screening methods can be used, for example Lasso (Tibshirani, 1996), LARS (Efron et al., 2004), Adaptive Lasso (Zou, 2006), SIS (Fan and Lv, 2008), and OGA (Ing and Lai, 2011). Step 2 uses cross-validation to trim off the irrelevant variables that were included, and the combination of Steps 1 and 2 is abbreviated HDCV. In applications, several screening methods M_1, M_2, ..., M_s may be considered in the HDCV procedure, yielding trimmed submodels N̂_{n,M_1}, ..., N̂_{n,M_s}. This information is integrated in the following Step 3 by comparing the corresponding MCCV values, and an appropriate model is then suggested.

Step 3 (Combination). The final model, selected by comparing the s trimmed models N̂_{n,M_1}, ..., N̂_{n,M_s}, where M denotes the screening method used in Step 1, is
$$N_f = \arg\min_{\hat J_\alpha \in \{\hat N_{n,M_1}, \ldots, \hat N_{n,M_s}\}} \hat\Gamma^{MCCV}_{\hat J_\alpha, n},$$
with the criterion defined by (2).

Through the HDCV procedure, the heavy combinatorial search is avoided. Moreover, when the screening step possesses the sure screening property (i.e., it includes all relevant variables with probability approaching 1), the consistency of the overall variable selection can be established under some regularity conditions.

2.1. The High-Dimensional Balanced Incomplete CV(n_v) Method

As in Shao (1993), the variable-selection consistency of the proposed procedure is verified first under balanced sampling and then under random sampling. Let m_n be the number of variables retained after sure screening, and let R_J = E(X_J^T X_J) denote the covariance matrix of the covariates in J. Assume the following conditions:

(C1) E|ε_t|^q < ∞ and max_{1≤j≤p_n} E|x_{tj}|^q < ∞ for some large q;
(C2) p_n = O(n^{s_1}) for some s_1 > 0;
(C3) m_n = O(n^{s_2}) = o(n_c^{1/2−ξ}) for some 0 < s_2 < 1/3 and any ξ > 0;
(C4) n_c = O(n^{s_3}) with 0 < s_3 < 1 − s_2;
(C5) max_{1≤#(J)≤m_n} ||R_J^{-1}||_1 = O(1); note that (C5) implies max_{1≤#(J)≤m_n} ||R_J^{-1}||_2 = O(1);
(C6) #(B) = O(n^{s_4}) for some s_4 > 0.

In (C1), only q-th moment bounds on the measurement error and on the covariates are required. Although (C2) does not allow p_n to grow exponentially with the sample size n, this assumption still covers many practical settings. (C3) restricts the number of covariates surviving the screening step, and (C4) restricts the training sample size. (C5) requires the minimum eigenvalue of the covariance matrix of the covariates to be bounded away from zero. (C6) is a condition on the number of replications used in the cross-validation procedure.
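To make Steps 2 and 3 concrete, the sketch below (again our own illustrative code, reusing the hypothetical mccv_score helper defined above) trims a screened submodel exactly as in the display defining N̂_n, keeping a variable only when its deletion increases the MCCV value, and then combines several trimmed submodels by picking the one with the smallest MCCV value.

```python
def hdcv_trim(y, X, screened, n_v, b, seed=0):
    """Step 2 (Trim): keep only those screened variables whose removal
    would increase the MCCV criterion of the screened submodel."""
    full_score = mccv_score(y, X, screened, n_v, b, rng=seed)
    kept = []
    for j in screened:
        reduced = [k for k in screened if k != j]
        # Same seed so all comparisons use the same random splits.
        if mccv_score(y, X, reduced, n_v, b, rng=seed) > full_score:
            kept.append(j)          # removing j hurts prediction, so j stays
    return kept

def hdcv_combine(y, X, trimmed_models, n_v, b, seed=0):
    """Step 3 (Combination): among trimmed submodels from different
    screening methods, return the one with the smallest MCCV value."""
    return min(trimmed_models,
               key=lambda cols: mccv_score(y, X, cols, n_v, b, rng=seed))
```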

The following theorem shows that the proposed procedure, sure screening followed by HDCV, is variable-selection consistent when (C1)-(C6) hold. Suppose that m_n variables are selected by the screening method, giving the submodel Ĵ_{m_n}, and let J_{α_0} denote the model that includes all relevant variables and no irrelevant variables.

THEOREM 1. Suppose that (C1)-(C6) hold, and let Ĵ_α ⊆ Ĵ_{m_n} be any submodel with #(Ĵ_α) = d_{α̂}.
(a) If J_{α_0} ⊆ Ĵ_α, then
$$\hat\Gamma^{BICV}_{\hat\alpha, n} = n^{-1} e^T e + n_c^{-1} d_{\hat\alpha} \sigma^2 + o_p(n_c^{-1}).$$
(b) If J_{α_0} is not contained in Ĵ_α, then there exists a positive constant φ such that
$$\hat\Gamma^{BICV}_{\hat\alpha, n} \ge n^{-1} e^T e + \Delta_{\hat\alpha, n} + o_p(1),$$
where P(Δ_{α̂,n} > φ) → 1 as n → ∞.

2.2. The High-Dimensional Monte Carlo CV(n_v) Method

As mentioned previously, the balance conditions on the collection B are sometimes hard to satisfy. A collection of subsets obtained by random drawing, denoted by R, is more feasible in practice. The following theorem establishes the consistency of the proposed variable selection procedure for the random collection under conditions (C1)-(C6), together with an additional condition on the number of repetitions b, analogous to that in Theorem 2 of Shao (1993).

THEOREM 2. Suppose the conditions of Theorem 1 hold and R is a collection of b randomly selected subsets of {1, 2, ..., n}, each of size n_v. Assume b satisfies
$$b^{-1} n_c^{-2} n^{2+\xi} m_n^2 \to 0$$
for an arbitrarily small ξ > 0, and let Ĵ_α ⊆ Ĵ_{m_n} with #(Ĵ_α) = d_{α̂}.
(a) If J_{α_0} ⊆ Ĵ_α, then
$$\hat\Gamma^{MCCV}_{\hat\alpha, n} = \frac{1}{n_v b} \sum_{s \in R} e_s^T e_s + n_c^{-1} d_{\hat\alpha} \sigma^2 + o_p(n_c^{-1}).$$
(b) If J_{α_0} is not contained in Ĵ_α, then there exists a positive constant φ such that
$$\hat\Gamma^{MCCV}_{\hat\alpha, n} \ge \frac{1}{n_v b} \sum_{s \in R} e_s^T e_s + \Delta_{\hat\alpha, n} + o_p(1),$$
where P(Δ_{α̂,n} > φ) → 1.

3. SIMULATION STUDIES

In this section, four simulation examples are used to examine the performance of HDCV based on various screening methods: OGA, ISIS-SCAD, LARS, and the adaptive Lasso with three k-fold sizes (5, 10, and 15), abbreviated OGA, ISIS-SCAD, LARS, and Adaptive Lasso (k), respectively. To implement ISIS-SCAD, LARS, and the adaptive Lasso, we use the SIS (Fan et al., 2010), glmnet (Friedman, Hastie, and Tibshirani, 2010), lars (Hastie and Efron, 2007), and parcor (Kraemer and Schaefer, 2010) packages in R. Because OGA and LARS select variables stepwise, the high-dimensional information criterion of Ing and Lai (2011) is employed to improve the screening performance; the penalty w_n in our simulation studies is chosen as log(n), which corresponds to HDBIC in Ing and Lai (2011). For Step 2 of HDCV (Trim), in the cross-validation setting the training-set size is n_c = n^{0.6} and the number of randomly drawn replications is b = n^{1.5}, where n is the sample size; these choices guarantee conditions (C3) and (C6). Each result is based on 1000 simulated data sets.
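The tuning choices just described are simple functions of the sample size. The small sketch below (our own; the floor/ceiling rounding is an assumption, since the text does not specify how non-integer values are rounded) computes the training size, validation size, and replication count used in the trimming step.

```python
import math

def hdcv_tuning(n):
    """Tuning choices used in the simulations: n_c = n^0.6 training points,
    n_v = n - n_c validation points, and b = n^1.5 random splits."""
    n_c = max(2, math.floor(n ** 0.6))   # training-set size
    n_v = n - n_c                        # validation-set size (n_v / n -> 1)
    b = math.ceil(n ** 1.5)              # number of Monte Carlo replications
    return n_c, n_v, b

# e.g. for n = 100: n_c = 15, n_v = 85, b = 1000
print(hdcv_tuning(100))
```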

Table 1. Frequency, over 1000 simulations, of selecting exactly the 3 relevant variables (E), of selecting all relevant variables plus i irrelevant variables (E+i, with a pooled column for i > 5), and of including all 3 relevant variables (Correct), together with the MSPE. Methods compared at ρ = 0.5 and ρ = 0.9: OGA, OGA+Trim, OGA+HDBIC+Trim, ISIS-SCAD, ISIS-SCAD+Trim, LARS, LARS+Trim, LARS+HDBIC+Trim, and Adaptive Lasso (k) and Adaptive Lasso (k)+Trim for k = 5, 10, 15. [The numerical entries of Table 1 are not recoverable from the extracted text.]

Define the mean squared prediction error
$$MSPE = \frac{1}{1000} \sum_{l=1}^{1000} \Bigl( \sum_{j=1}^{p} \beta_j x^{(l)}_{n+1,j} - \hat y^{(l)}_{n+1} \Bigr)^2,$$
in which x^{(l)}_{n+1,1}, ..., x^{(l)}_{n+1,p} are the regressors associated with y^{(l)}_{n+1}, the new outcome in the l-th simulation run, and ŷ^{(l)}_{n+1} denotes the prediction of y^{(l)}_{n+1}. The first example comes from Fan and Lv (2008), and the second and third examples are taken from Ing and Lai (2011). In the final example, we generate the data from a location and dispersion model using covariates from a real data set provided by a semiconductor manufacturing company.

Example 1. The following model is employed:
$$y_t = \sum_{j=1}^{q} \beta_j x_{tj} + \sum_{j=q+1}^{p} \beta_j x_{tj} + \varepsilon_t, \qquad t = 1, \ldots, n, \qquad (3)$$
where β_{q+1} = ... = β_p = 0 and the ε_t are i.i.d. N(0, σ²), independent of the x_{tj}. We consider p = 1000, q = 3, n = 100, and β_1 = β_2 = β_3 = 5 with standard Gaussian noise. The sample of (x_1, ..., x_p) of size n was drawn from a multivariate normal distribution N(0, Σ) whose covariance matrix Σ = (σ_ij)_{p×p} has entries σ_ii = 1 and σ_ij = ρ for i ≠ j; two designs are obtained by taking ρ = 0.5 and 0.9.
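A minimal data-generating sketch for Example 1 (our own illustrative code, not the authors'; the one-factor representation of the equicorrelated design is an implementation choice) is given below.

```python
import numpy as np

def generate_example1(n=100, p=1000, q=3, rho=0.5, beta_value=5.0,
                      sigma=1.0, seed=0):
    """Equicorrelated Gaussian design: unit variances, off-diagonal
    correlation rho; only the first q coefficients are nonzero."""
    rng = np.random.default_rng(seed)
    # x_tj = sqrt(rho)*w_t + sqrt(1-rho)*z_tj gives Var = 1, Cov = rho.
    w = rng.standard_normal((n, 1))
    z = rng.standard_normal((n, p))
    X = np.sqrt(rho) * w + np.sqrt(1.0 - rho) * z
    beta = np.zeros(p)
    beta[:q] = beta_value
    y = X @ beta + sigma * rng.standard_normal(n)
    return y, X, beta
```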

Table 2. Frequency, over 1000 simulations, of selecting exactly the 5 relevant variables (E), of selecting all relevant variables plus i irrelevant variables (E+i, with a pooled column for i > 5), and of including all 5 relevant variables (Correct), together with the MSPE, for η = 0, 2 and (n, p) = (50, 1000), (100, 2000). Methods compared: OGA, OGA+HDBIC+Trim, Adaptive Lasso (k) and Adaptive Lasso (k)+Trim for k = 5, 10, 15, and the combined selections N_f(n_c) and N_f(0.8 n_c). [The numerical entries of Table 2 are not recoverable from the extracted text.]

As shown in Table 1, adding Trim to ISIS-SCAD and LARS removes not only irrelevant variables but also some relevant ones. OGA+Trim fails to delete any variables, whereas OGA+HDBIC+Trim removes all irrelevant variables, and Adaptive Lasso (k)+Trim keeps all relevant variables in all simulations. In addition, the MSPE of OGA+HDBIC+Trim and of Adaptive Lasso (k)+Trim is close to the oracle value qσ²/n = 0.03. In the following simulation studies we therefore focus on OGA+HDBIC+Trim and Adaptive Lasso (k)+Trim because of their superior performance.

Example 2. In this example, we consider covariates x_tj = d_tj + η w_t, 1 ≤ j ≤ p, in which η ≥ 0 and (d_t1, ..., d_tp, w_t), 1 ≤ t ≤ n, are i.i.d. normal with mean (1, ..., 1, 0)^T and identity covariance matrix I. Hence Corr(x_tj, x_tk) = η²/(1 + η²), which increases with η > 0. We choose q = 5, (β_1, ..., β_5) = (3, 3.5, 4, 2.8, 3.2), assume σ = 1, and take model (3) to hold. The cases η = 0, 2 and (n, p) = (50, 1000), (100, 2000) are considered here to accommodate a much larger number of candidate variables and to allow substantial correlations among them. To combine the information from OGA and Adaptive Lasso, Step 3 (Combination) is further applied to the trimmed submodels; the model finally chosen by minimizing MCCV is denoted by N_f(n_c), where n_c is the training size used in Step 2. Since the irrelevant variables remaining in a trimmed model may be highly correlated with the relevant ones, we also consider a smaller training size, 0.8 n_c, in Step 3; the corresponding final model is abbreviated N_f(0.8 n_c).
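Example 2's design is a one-factor construction, so every pair of covariates shares the correlation η²/(1 + η²). The following sketch (our own illustrative code; names are assumptions) generates one such data set.

```python
import numpy as np

def generate_example2(n=50, p=1000, q=5, eta=2.0, sigma=1.0, seed=0):
    """Example 2 design: x_tj = d_tj + eta * w_t with a shared latent factor
    w_t, so Corr(x_tj, x_tk) = eta^2 / (1 + eta^2) for j != k."""
    rng = np.random.default_rng(seed)
    d = 1.0 + rng.standard_normal((n, p))    # d_tj ~ N(1, 1), independent
    w = rng.standard_normal((n, 1))          # shared factor w_t ~ N(0, 1)
    X = d + eta * w
    beta = np.zeros(p)
    beta[:q] = [3.0, 3.5, 4.0, 2.8, 3.2]
    y = X @ beta + sigma * rng.standard_normal(n)
    return y, X, beta
```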

Table 3. Frequency, over 1000 simulations, of selecting exactly the 10 relevant variables (E), of selecting all relevant variables plus i irrelevant variables (E+i, with a pooled column for i > 5), and of including all 10 relevant variables (Correct), together with the MSPE. Methods compared: OGA, OGA+HDBIC+Trim, Adaptive Lasso (k) and Adaptive Lasso (k)+Trim for k = 5, 10, 15, and the combined selections N_f(n_c) and N_f(0.8 n_c). [The numerical entries of Table 3 are not recoverable from the extracted text.]

Table 2 shows that, for sample size 50, applying Step 3 increases the frequency of identifying the parsimonious model from 863 for Adaptive Lasso (15) to 955 for N_f(0.8 n_c) (and 894 for N_f(n_c)). When the sample size is 100, the frequency with which N_f(0.8 n_c) identifies the parsimonious model is similar to that of OGA+HDBIC+Trim and Adaptive Lasso (k)+Trim.

Example 3. In this example, we set σ = 1 and let x_t1, ..., x_tq be i.i.d. standard normal, with
$$x_{tj} = d_{tj} + b_x \sum_{l=1}^{q} x_{tl}, \qquad q + 1 \le j \le p,$$
where b_x = {3/(4q)}^{0.5} and (d_{t,q+1}, ..., d_{tp})^T is multivariate normal with mean 0 and covariance matrix (1/4)I, independent across t and independent of the x_{tj} for 1 ≤ j ≤ q. We set (n, p) = (400, 4000) with randomly generated covariates, and let q = 10 and (β_1, ..., β_10) = (3, 3.75, 4.5, 5.25, 6, 6.75, 7.5, 8.25, 9, 9.75). This example is considered in Ing and Lai (2011) to illustrate an inherent difficulty in analyzing high-dimensional data when irrelevant variables are substantially correlated with relevant ones. As noted by Ing and Lai (2011), under this setting the Adaptive Lasso loses the sure screening property, since it relies on an initial Lasso estimate and may actually perform worse; this phenomenon can also be observed in Table 3. Ing and Lai (2011) also show that the first iteration of OGA selects an irrelevant variable, which remains in the OGA path until the last iteration. OGA+HDBIC+Trim chooses all relevant variables without including any irrelevant ones in all 1000 simulations, and its MSPE is close to the oracle value qσ²/n = 0.025. Similarly, N_f(n_c) and N_f(0.8 n_c) choose the smallest correct model in all simulations.

Example 4. The covariates are taken from a real data set provided by United Microelectronics Corporation, a semiconductor manufacturing company in Taiwan. The original data were collected from over 900 pieces of equipment and 300 defective rates. After deleting replicated and extreme variables, 405 covariates are used, and the response values are generated from the following location and dispersion model:
$$y_t = \beta_0 + \sum_{j=1}^{405} \beta_j x_{tj} + \Bigl( \alpha_0 + \sum_{j=1}^{405} \alpha_j x_{tj} \Bigr)^{1/2} \varepsilon_t, \qquad t = 1, \ldots, 300.$$

Table 4. Frequency, over 1000 simulations, of selecting exactly the 2 relevant variables of the location model (E), of selecting all relevant variables plus i irrelevant variables (E+i, with a pooled column for i > 5), and of including both relevant variables (Correct). Methods compared: OGA, OGA+HDBIC+Trim, Adaptive Lasso (k) and Adaptive Lasso (k)+Trim for k = 5, 10, 15, and N_f(0.8 n_c). [The numerical entries of Table 4 are not recoverable from the extracted text.]

Table 5. Frequency, over 1000 simulations, of selecting exactly the unique relevant variable of the dispersion model (E), of selecting the relevant variable plus i irrelevant variables (E+i, with a pooled column for i > 5), and of including the relevant variable (Correct). Residuals are based on N_f(0.8 n_c) at the first stage. Methods compared: OGA, OGA+HDBIC+Trim, Adaptive Lasso (k) and Adaptive Lasso (k)+Trim for k = 5, 10, 15, and N_f(0.8 n_c). [The numerical entries of Table 5 are not recoverable from the extracted text.]

We set (β_0, β_4, β_159, α_0, α_160) = (1, 2.3, 4.5, 3, 2.5), with all other coefficients equal to zero, and the ε_t are i.i.d. standard normal, independent of the covariates x_tj. A two-stage model selection procedure is implemented for this location and dispersion model. At the first stage we select the relevant variables of the location model and compute the fitted values ŷ_{t,Ĵ}, t = 1, ..., 300, where Ĵ is the set of selected variables. The responses at the second stage are the log-transformed squared residuals, log{(y_t − ŷ_{t,Ĵ})²}, t = 1, ..., 300, and the relevant variables of the dispersion model are then searched for on these transformed data. In the two-stage procedure, N_f(0.8 n_c) is applied in the combination step.

In the 1000 simulations summarized in Table 4, OGA+HDBIC+Trim identifies the sparse location model more often than Adaptive Lasso (k)+Trim, and N_f(0.8 n_c) selects the parsimonious location model 93.5% of the time. In Table 5, N_f(0.8 n_c) correctly catches all relevant variables of the dispersion model 89.9% of the time (about 80% for OGA+HDBIC+Trim and Adaptive Lasso (10)+Trim). Combining Tables 4 and 5, the percentage with which N_f(0.8 n_c) identifies both the parsimonious location model and the parsimonious dispersion model exceeds 84% (0.935 × 0.899 ≈ 0.84). Our recommended procedure can therefore be flexibly applied to select exactly the correct location and dispersion models.
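The two-stage procedure can be summarized in a few lines of code. The sketch below (our own illustration, not the authors' implementation) assumes a generic variable-selection routine select(y, X), for example the HDCV procedure sketched earlier, and applies it first to the responses and then to the log squared residuals.

```python
import numpy as np

def two_stage_location_dispersion(y, X, select):
    """Stage 1: select location-model variables and compute fitted values.
    Stage 2: regress log squared residuals on the covariates to select
    the dispersion-model variables."""
    loc_cols = select(y, X)                                  # location model
    beta_hat, *_ = np.linalg.lstsq(X[:, loc_cols], y, rcond=None)
    resid = y - X[:, loc_cols] @ beta_hat
    z = np.log(resid ** 2 + 1e-12)                           # guard against log(0)
    disp_cols = select(z, X)                                 # dispersion model
    return loc_cols, disp_cols
```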

REFERENCES

AKAIKE, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716-723.
CHEN, J. & CHEN, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95, 759-771.
EFRON, B. (1986). How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association 81, 461-470.
EFRON, B., HASTIE, T., JOHNSTONE, I. & TIBSHIRANI, R. (2004). Least angle regression (with discussion). The Annals of Statistics 32, 407-499.
FAN, J. & LV, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society: Series B 70, 849-911.
FRIEDMAN, J., HASTIE, T. & TIBSHIRANI, R. (2010). glmnet: Lasso and elastic-net regularized generalized linear models. R package.
HASTIE, T. & EFRON, B. (2007). lars: Least Angle Regression, Lasso and Forward Stagewise. R package.
ING, C. K. & LAI, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica 21, 1473-1513.
KRAEMER, N. & SCHAEFER, J. (2010). parcor: Regularized estimation of partial correlation matrices. R package.
SCHWARZ, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461-464.
SHAO, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association 88, 486-494.
TIBSHIRANI, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B 58, 267-288.
ZOU, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association 101, 1418-1429.
