The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
1 The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010
2 Outline: Introduction; Definition; Oracle Properties; Computations; Relationship: Nonnegative Garrote; Extensions: GLM. Main results: Theorem 2 (Oracle Properties of the Adaptive Lasso); Corollary 2 (Consistency of the Nonnegative Garrote); Theorem 4 (Oracle Properties of the Adaptive Lasso for GLM).
3 Setting: $y_i = x_i^T \beta^* + \varepsilon_i$, where $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. with mean 0 and variance $\sigma^2$. Let $A = \{ j : \beta^*_j \neq 0 \}$ and $|A| = p_0 < p$. Assume $\frac{1}{n} X^T X \to C$, where $C$ is a positive definite matrix, partitioned as $C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}$ with $C_{11}$ a $p_0 \times p_0$ matrix.
4 Definition of Oracle Procedures: We call $\delta$ an oracle procedure if $\hat\beta(\delta)$ (asymptotically) has the following oracle properties: (1) it identifies the right subset model, $\{ j : \hat\beta_j \neq 0 \} = A$; (2) $\sqrt{n}\,(\hat\beta(\delta)_A - \beta^*_A) \to_d N(0, \Sigma^*)$, where $\Sigma^*$ is the covariance matrix knowing the true subset model.
5 Definition of LASSO (Tibshirani, 1996): $\hat\beta^{(n)} = \arg\min_\beta \| y - \sum_{j=1}^p x_j \beta_j \|^2 + \lambda_n \sum_{j=1}^p |\beta_j|$, where $\lambda_n$ varies with $n$. Let $A_n = \{ j : \hat\beta^{(n)}_j \neq 0 \}$. LASSO variable selection is consistent iff $\lim_n P(A_n = A) = 1$.
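To make the criterion above concrete, here is a minimal coordinate-descent lasso solver in plain NumPy. This is an illustrative sketch, not the LARS algorithm used later in the slides; the data and the value of $\lambda$ are invented for the demo.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for: min_b ||y - X b||^2 + lam * sum_j |b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)      # ||x_j||^2 for each column
    r = y.copy()                       # residual y - X b (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]        # add back x_j's contribution
            zj = X[:, j] @ r           # partial correlation with x_j
            # soft-thresholding update; lam/2 because the squared loss is unscaled
            b[j] = np.sign(zj) * max(abs(zj) - lam / 2.0, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

# Toy demo: 3 relevant predictors out of 5
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=200)
b_hat = lasso_cd(X, y, lam=40.0)
```

With a large enough $\lambda_n$ the soft-thresholding step sets some coefficients exactly to zero, which is what makes the selection question ($A_n = A$?) meaningful.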
6 Proposition 1: If $\lambda_n / \sqrt{n} \to \lambda_0 \geq 0$, then $\limsup_n P(A_n = A) \leq c < 1$, where $c$ is a constant depending on the true model.
7 Theorem 1 (Necessary Condition for Consistency of LASSO): Suppose that $\lim_n P(A_n = A) = 1$. Then there exists some sign vector $s = (s_1, \dots, s_{p_0})^T$, $s_j = 1$ or $-1$, such that $|C_{21} C_{11}^{-1} s| \leq 1$ componentwise. (1)
8 Corollary 1 (an interesting case): Suppose that $p_0 = 2m + 1$ and $p = p_0 + 1$, so there is one irrelevant predictor. Let $C_{11} = (1 - \rho_1) I + \rho_1 J_1$, where $J_1$ is the matrix of 1's, $C_{12} = \rho_2 \mathbf{1}$, and $C_{22} = 1$. If $-\frac{1}{p_0 - 1} < \rho_1 < -\frac{1}{p_0}$ and $1 + (p_0 - 1)\rho_1 < \rho_2 < \sqrt{(1 + (p_0 - 1)\rho_1)/p_0}$, then condition (1) cannot be satisfied. Thus LASSO variable selection is inconsistent.
9 Corollary 1, interesting case with $m = 1$, $p_0 = 3$. [Figure: the region of $(\rho_1, \rho_2)$ values for which LASSO selection is inconsistent; axes $\rho_2$ versus $\rho_1$.]
10 Definition of the Adaptive Lasso: $\hat\beta^{*(n)} = \arg\min_\beta \| y - \sum_{j=1}^p x_j \beta_j \|^2 + \lambda_n \sum_{j=1}^p \hat w_j |\beta_j|$, with weight vector $\hat w_j = 1 / |\hat\beta_j|^\gamma$ (data-dependent) and $\gamma > 0$. Here $\hat\beta$ is a root-$n$-consistent estimator of $\beta^*$, e.g. $\hat\beta = \hat\beta(\mathrm{ols})$. Let $A^*_n = \{ j : \hat\beta^{*(n)}_j \neq 0 \}$.
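A common way to compute the adaptive lasso is to fold the weights into the design: rescale each column as $x_j / \hat w_j$, solve an ordinary lasso, then rescale the solution back. A minimal NumPy sketch of this trick (the solver, data, and $\lambda$ are illustrative assumptions, not the paper's LARS-based implementation):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Plain lasso by coordinate descent (the slide-5 criterion)."""
    n, p = X.shape
    b, r = np.zeros(p), y.copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]
            zj = X[:, j] @ r
            b[j] = np.sign(zj) * max(abs(zj) - lam / 2.0, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive lasso via column rescaling, with an OLS pilot estimate."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # root-n-consistent pilot
    w = 1.0 / np.abs(b_ols) ** gamma                # hat-w_j = 1 / |bhat_j|^gamma
    X_scaled = X / w                                # column j divided by w_j
    b_scaled = lasso_cd(X_scaled, y, lam)
    return b_scaled / w                             # undo the rescaling

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
beta_true = np.array([4.0, 0.0, 0.0, 2.0])
y = X @ beta_true + rng.normal(size=300)
b_ada = adaptive_lasso(X, y, lam=50.0, gamma=1.0)
```

The rescaling works because substituting $u_j = \hat w_j \beta_j$ turns the weighted penalty $\lambda_n \sum_j \hat w_j |\beta_j|$ into an ordinary $\ell_1$ penalty on $u$.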
11 Penalty and Thresholding Functions of LASSO, SCAD, and the Adaptive Lasso. [Figure 1 of Zou (2006): plot of thresholding functions with $\lambda = 2$ for (a) the hard rule; (b) bridge $L_{0.5}$; (c) the lasso; (d) the SCAD; (e) and (f) the adaptive lasso for two values of $\gamma$.]
12 Remarks: The data-dependent weight $\hat w$ is the key to the oracle properties. As $n$ grows, the weights for zero-coefficient predictors get inflated (to infinity), while the weights for nonzero-coefficient predictors converge to a finite constant. In the view of Fan and Li (2001) (presented by Yang Zhao), the adaptive lasso satisfies the three properties of a good penalty function: unbiasedness, sparsity, and continuity.
13 Theorem 2 (Oracle Properties of the Adaptive Lasso): Suppose that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n n^{(\gamma - 1)/2} \to \infty$. Then the adaptive lasso estimates must satisfy: (1) consistency in variable selection, $\lim_n P(A^*_n = A) = 1$; (2) asymptotic normality, $\sqrt{n}\,(\hat\beta^{*(n)}_A - \beta^*_A) \to_d N(0, \sigma^2 C_{11}^{-1})$.
14 Computations: The adaptive lasso estimates can be solved by the LARS algorithm (Efron et al., 2004); the entire solution path can be computed at the same order of computation as a single OLS fit. Tuning: if we use $\hat\beta(\mathrm{ols})$, use two-dimensional cross-validation to find an optimal pair $(\gamma, \lambda_n)$; or use three-dimensional cross-validation to find an optimal triple $(\hat\beta, \gamma, \lambda_n)$. $\hat\beta(\mathrm{ridge})$ from the best ridge regression fit may be used instead when collinearity is a concern.
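The two-dimensional tuning described above can be sketched as a small grid search over $(\gamma, \lambda)$ with K-fold cross-validation. Everything here (the solver, the grid values, the data) is an illustrative stand-in, not the paper's exact procedure:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=300):
    """Plain lasso by coordinate descent."""
    n, p = X.shape
    b, r = np.zeros(p), y.copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]
            zj = X[:, j] @ r
            b[j] = np.sign(zj) * max(abs(zj) - lam / 2.0, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def adaptive_lasso(X, y, lam, gamma):
    """Adaptive lasso via column rescaling with OLS pilot weights."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = 1.0 / np.abs(b_ols) ** gamma
    return lasso_cd(X / w, y, lam) / w

def cv_mse(X, y, lam, gamma, k=5):
    """Mean validation MSE of the adaptive lasso over k folds."""
    n = X.shape[0]
    errs = []
    for val in np.array_split(np.arange(n), k):
        train = np.setdiff1d(np.arange(n), val)
        b = adaptive_lasso(X[train], y[train], lam, gamma)
        errs.append(np.mean((y[val] - X[val] @ b) ** 2))
    return np.mean(errs)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=200)

# 2-D grid search over (gamma, lambda), as the slide suggests for an OLS pilot
grid = [(g, l) for g in (0.5, 1.0, 2.0) for l in (1.0, 10.0, 50.0)]
best_gamma, best_lam = min(grid, key=lambda gl: cv_mse(X, y, gl[1], gl[0]))
b_best = adaptive_lasso(X, y, best_lam, best_gamma)
```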
15 Definition of the Nonnegative Garrote (Breiman, 1995): $\hat\beta_j(\mathrm{garrote}) = c_j \hat\beta_j(\mathrm{ols})$, where the nonnegative scaling factors $\{c_j\}$ minimize $\| y - \sum_{j=1}^p c_j \hat\beta_j(\mathrm{ols})\, x_j \|^2 + \lambda_n \sum_{j=1}^p c_j$ subject to $c_j \geq 0$ for all $j$. A sufficiently large $\lambda_n$ shrinks some $c_j$ to exactly 0, i.e. $\hat\beta_j(\mathrm{garrote}) = 0$. Yuan and Lin (2007) also studied the consistency of the nonnegative garrote.
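Breiman's criterion is a nonnegatively constrained lasso in the scaled design $z_j = \hat\beta_j(\mathrm{ols})\, x_j$, so coordinate descent works with a one-sided soft-thresholding update. A minimal NumPy sketch (the data and $\lambda$ are invented for the demo):

```python
import numpy as np

def nn_garrote(X, y, lam, n_iter=500):
    """Nonnegative garrote: find c_j >= 0 minimizing
    ||y - sum_j c_j * bols_j * x_j||^2 + lam * sum_j c_j,
    then return the shrunken coefficients c_j * bols_j."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    Z = X * b_ols                          # column j is bols_j * x_j
    col_sq = (Z ** 2).sum(axis=0)
    p = Z.shape[1]
    c, r = np.zeros(p), y.copy()
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * c[j]
            zj = Z[:, j] @ r
            # one-sided soft threshold: the constraint c_j >= 0 clips at zero
            c[j] = max(0.0, (zj - lam / 2.0) / col_sq[j])
            r -= Z[:, j] * c[j]
    return c * b_ols

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
beta_true = np.array([2.0, 0.0, -3.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=200)
b_gar = nn_garrote(X, y, lam=40.0)
```

Note that the sign of each garrote coefficient is inherited from OLS; only the magnitude is shrunk, which is exactly the sign-constraint view on the next slide.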
16 Garrote: Formulation and Consistency. Formulation: $\hat\beta(\mathrm{garrote}) = \arg\min_\beta \| y - \sum_{j=1}^p x_j \beta_j \|^2 + \lambda_n \sum_{j=1}^p \hat w_j |\beta_j|$ subject to $\beta_j \hat\beta_j(\mathrm{ols}) \geq 0$ for all $j$, where $\gamma = 1$ and $\hat w_j = 1 / |\hat\beta_j(\mathrm{ols})|$. Corollary 2 (Consistency of the Nonnegative Garrote): If we choose a $\lambda_n$ such that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n \to \infty$, then the nonnegative garrote is consistent for variable selection.
17 The Adaptive Lasso for GLM: $\hat\beta^{*(n)}(\mathrm{glm}) = \arg\min_\beta \sum_{i=1}^n \left( -y_i x_i^T \beta + \phi(x_i^T \beta) \right) + \lambda_n \sum_{j=1}^p \hat w_j |\beta_j|$, with weight vector $\hat w_j = 1 / |\hat\beta_j(\mathrm{mle})|^\gamma$ for some $\gamma > 0$. The model is $f(y \mid x, \theta) = h(y) \exp(y\theta - \phi(\theta))$, where $\theta = x^T \beta$. The Fisher information matrix is $I(\beta^*) = \begin{bmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{bmatrix}$, where $I_{11}$ is a $p_0 \times p_0$ matrix; then $I_{11}$ is the Fisher information matrix with the true submodel known.
18 Theorem 4 (Oracle Properties of the Adaptive Lasso for GLM): Let $A^*_n = \{ j : \hat\beta_j(\mathrm{glm}) \neq 0 \}$. Suppose that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n n^{(\gamma - 1)/2} \to \infty$. Then, under some mild regularity conditions, the adaptive lasso estimate $\hat\beta^{*(n)}(\mathrm{glm})$ must satisfy: (1) consistency in variable selection, $\lim_n P(A^*_n = A) = 1$; (2) asymptotic normality, $\sqrt{n}\,(\hat\beta^{*(n)}_A(\mathrm{glm}) - \beta^*_A) \to_d N(0, I_{11}^{-1})$.
19 Experimental Setting: We let $y = x^T \beta^* + N(0, \sigma^2)$, where the true regression coefficients are $\beta^* = (5.6, 5.6, 5.6, 0)$. The predictors $x_i$ ($i = 1, \dots, n$) are i.i.d. $N(0, C)$, where $C$ is the matrix in Corollary 1 with $\rho_1 = -0.39$ and $\rho_2 = 0.23$ (the red point in the figure on slide 9).
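The claim that this $(\rho_1, \rho_2)$ pair defeats the lasso can be checked directly: condition (1) asks for some sign vector $s$ with $|C_{21} C_{11}^{-1} s| \leq 1$, so we enumerate every sign vector and verify the bound fails for all of them, while $C$ stays positive definite. A small NumPy check of that arithmetic:

```python
import itertools
import numpy as np

p0, rho1, rho2 = 3, -0.39, 0.23
C11 = (1 - rho1) * np.eye(p0) + rho1 * np.ones((p0, p0))
C12 = rho2 * np.ones((p0, 1))
C = np.block([[C11, C12], [C12.T, np.ones((1, 1))]])

# C must be a valid (positive definite) limiting design matrix
assert np.all(np.linalg.eigvalsh(C) > 0)

# |C21 C11^{-1} s| for every sign vector s in {-1, +1}^{p0}
vals = [abs(C12.T.flatten() @ np.linalg.solve(C11, np.array(s)))
        for s in itertools.product((-1.0, 1.0), repeat=p0)]
min_val = min(vals)   # condition (1) fails iff this minimum exceeds 1
```

Because the minimum over all sign vectors exceeds 1, no $s$ satisfies condition (1), so by Theorem 1 the lasso cannot be selection-consistent in this design.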
20 Numerical Experiments: In both models we simulated 100 training datasets for each combination of $(n, \sigma^2)$. All of the training and tuning were done on the training set. We also collected independent test datasets of 10,000 observations to compute the relative prediction error (RPE). To estimate the standard error of the RPE, we generated bootstrapped samples from the 100 RPEs and calculated the bootstrapped standard error. Table 1 (Simulation Model 0) reports the probability of containing the true model in the solution path, for the lasso and the adaptive lasso ($\gamma = 0.5$, $\gamma = 1$, $\gamma = 2$, and $\gamma$ selected by five-fold cross-validation from those three choices) at $(n = 60, \sigma = 9)$, $(n = 120, \sigma = 5)$, and $(n = 300, \sigma = 3)$. [The table's numerical entries are not recoverable from this transcription.]
21 General Observations: Comparison of LASSO, the adaptive lasso, SCAD, and the nonnegative garrote, with $p = 8$ and $p_0 = 3$. We consider a few large effects ($n = 20, 60$) and many small effects ($n = 40, 80$). LASSO performs best when the SNR is low. The adaptive lasso, SCAD, and the nonnegative garrote outperform LASSO with a medium or high level of SNR. The adaptive lasso tends to be more stable than SCAD. LASSO tends to select noise variables more often than the other methods.
22 Theorem 2 (Oracle Properties of the Adaptive Lasso, restated): Suppose that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n n^{(\gamma - 1)/2} \to \infty$. Then the adaptive lasso estimates must satisfy: (1) consistency in variable selection, $\lim_n P(A^*_n = A) = 1$; (2) asymptotic normality, $\sqrt{n}\,(\hat\beta^{*(n)}_A - \beta^*_A) \to_d N(0, \sigma^2 C_{11}^{-1})$.
23 Proof of Theorem 2, Asymptotic Normality: Let $\beta = \beta^* + u/\sqrt{n}$ and $\Psi_n(u) = \| y - \sum_{j=1}^p x_j (\beta^*_j + u_j/\sqrt{n}) \|^2 + \lambda_n \sum_{j=1}^p \hat w_j |\beta^*_j + u_j/\sqrt{n}|$. Let $\hat u^{(n)} = \arg\min_u \Psi_n(u)$; then $\hat u^{(n)} = \sqrt{n}\,(\hat\beta^{*(n)} - \beta^*)$. Now $\Psi_n(u) - \Psi_n(0) = V_4^{(n)}(u)$, where $V_4^{(n)}(u) = u^T \left( \tfrac{1}{n} X^T X \right) u - 2 \tfrac{\varepsilon^T X}{\sqrt{n}} u + \tfrac{\lambda_n}{\sqrt{n}} \sum_{j=1}^p \hat w_j \sqrt{n} \left( |\beta^*_j + u_j/\sqrt{n}| - |\beta^*_j| \right)$.
24 Proof of Theorem 2, Asymptotic Normality (cont.): Then $V_4^{(n)}(u) \to_d V_4(u)$ for every $u$, where $V_4(u) = u_A^T C_{11} u_A - 2 u_A^T W_A$ if $u_j = 0$ for all $j \notin A$, and $V_4(u) = \infty$ otherwise, with $W_A \sim N(0, \sigma^2 C_{11})$. $V_4^{(n)}$ is convex, and the unique minimum of $V_4$ is $(C_{11}^{-1} W_A, 0)^T$. Following the epi-convergence results of Geyer (1994), we have $\hat u^{(n)}_A \to_d C_{11}^{-1} W_A$ and $\hat u^{(n)}_{A^c} \to_d 0$. Hence we have proven the asymptotic normality part.
25 Proof of Theorem 2, Consistency: The asymptotic normality result indicates that for all $j \in A$, $\hat\beta^{*(n)}_j \to_p \beta^*_j$; thus $P(j \in A^*_n) \to 1$. Then it suffices to show that for all $j \notin A$, $P(j \in A^*_n) \to 0$. Consider the event $j \in A^*_n$. By the KKT optimality conditions, $2 x_j^T ( y - X \hat\beta^{*(n)} ) = \lambda_n \hat w_j$. Note that $\lambda_n \hat w_j / \sqrt{n} = \lambda_n n^{(\gamma - 1)/2} / |\sqrt{n}\, \hat\beta_j|^\gamma \to_p \infty$, whereas $2 x_j^T (y - X \hat\beta^{*(n)}) / \sqrt{n} = 2 \tfrac{x_j^T X}{n} \sqrt{n}\,(\beta^* - \hat\beta^{*(n)}) + 2 \tfrac{x_j^T \varepsilon}{\sqrt{n}}$, and each of these two terms converges to some normal distribution. Thus $P(j \in A^*_n) \leq P\left( 2 x_j^T ( y - X \hat\beta^{*(n)} ) = \lambda_n \hat w_j \right) \to 0$.
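The KKT identity this step relies on (equality $2x_j^T(y - X\hat\beta) = \lambda \hat w_j$ in magnitude on the active set, and an inequality off it) can be verified numerically for a fitted lasso. Shown here for the plain lasso ($\hat w_j \equiv 1$) with an illustrative coordinate-descent solver and made-up data:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=1000):
    """Plain lasso by coordinate descent."""
    n, p = X.shape
    b, r = np.zeros(p), y.copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]
            zj = X[:, j] @ r
            b[j] = np.sign(zj) * max(abs(zj) - lam / 2.0, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, 0.0, 2.0, 0.0]) + rng.normal(scale=0.5, size=200)
lam = 40.0
b_hat = lasso_cd(X, y, lam)

g = 2.0 * X.T @ (y - X @ b_hat)     # the KKT quantity 2 x_j^T (y - X beta_hat)
active = b_hat != 0
# Stationarity: |g_j| equals lam on the active set ...
assert np.allclose(np.abs(g[active]), lam, atol=1e-6)
# ... and |g_j| <= lam on the inactive set
assert np.all(np.abs(g[~active]) <= lam + 1e-6)
```

The proof's argument is that on the event $j \in A^*_n$ this equality must hold, yet its two sides live on incompatible scales as $n \to \infty$.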
26 Corollary 2 (Consistency of the Nonnegative Garrote, restated): Formulation: $\hat\beta(\mathrm{garrote}) = \arg\min_\beta \| y - \sum_{j=1}^p x_j \beta_j \|^2 + \lambda_n \sum_{j=1}^p \hat w_j |\beta_j|$ subject to $\beta_j \hat\beta_j(\mathrm{ols}) \geq 0$ for all $j$, where $\gamma = 1$ and $\hat w_j = 1 / |\hat\beta_j(\mathrm{ols})|$. If we choose a $\lambda_n$ such that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n \to \infty$, then the nonnegative garrote is consistent for variable selection.
27 Proof of Corollary 2: Let $\hat\beta^{*(n)}$ be the adaptive lasso estimates with $\gamma = 1$. By Theorem 2, $\hat\beta^{*(n)}$ is an oracle estimator if $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n \to \infty$ (for $\gamma = 1$, $\lambda_n n^{(\gamma-1)/2} = \lambda_n$). To show the consistency, it suffices to show that $\hat\beta^{*(n)}$ satisfies the sign constraint with probability tending to 1. Pick any $j$. If $j \in A$, then $\hat\beta^{*(n)}(\gamma = 1)_j\, \hat\beta(\mathrm{ols})_j \to_p (\beta^*_j)^2 > 0$. If $j \notin A$, then $P\left( \hat\beta^{*(n)}(\gamma = 1)_j\, \hat\beta(\mathrm{ols})_j \geq 0 \right) \geq P\left( \hat\beta^{*(n)}(\gamma = 1)_j = 0 \right) \to 1$. In either case, $P\left( \hat\beta^{*(n)}(\gamma = 1)_j\, \hat\beta(\mathrm{ols})_j \geq 0 \right) \to 1$ for every $j = 1, 2, \dots, p$.
28 Theorem 4 (Oracle Properties of the Adaptive Lasso for GLM, restated): Let $A^*_n = \{ j : \hat\beta_j(\mathrm{glm}) \neq 0 \}$. Suppose that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n n^{(\gamma - 1)/2} \to \infty$. Then, under some mild regularity conditions, the adaptive lasso estimate $\hat\beta^{*(n)}(\mathrm{glm})$ must satisfy: (1) consistency in variable selection, $\lim_n P(A^*_n = A) = 1$; (2) asymptotic normality, $\sqrt{n}\,(\hat\beta^{*(n)}_A(\mathrm{glm}) - \beta^*_A) \to_d N(0, I_{11}^{-1})$. Recall $f(y \mid x, \theta) = h(y) \exp(y\theta - \phi(\theta))$, where $\theta = x^T \beta$.
29 Theorem 4, Regularity Conditions: (1) The Fisher information matrix is finite and positive definite, $I(\beta^*) = E\left[ \phi''(x^T \beta^*)\, x x^T \right]$. (2) There is a sufficiently large open set $O$ containing $\beta^*$ such that for all $\beta \in O$, $|\phi'''(x^T \beta)| \leq M(x) < \infty$ and $E[M(x)\, |x_j x_k x_l|] < \infty$ for all $1 \leq j, k, l \leq p$.
30 Proof of Theorem 4, Asymptotic Normality: Let $\beta = \beta^* + u/\sqrt{n}$. Define $\Gamma_n(u) = \sum_{i=1}^n \left\{ -y_i x_i^T (\beta^* + u/\sqrt{n}) + \phi\left( x_i^T (\beta^* + u/\sqrt{n}) \right) \right\} + \lambda_n \sum_{j=1}^p \hat w_j |\beta^*_j + u_j/\sqrt{n}|$. Let $\hat u^{(n)} = \arg\min_u \Gamma_n(u)$; then $\hat u^{(n)} = \sqrt{n}\,(\hat\beta^{*(n)}(\mathrm{glm}) - \beta^*)$. Using the Taylor expansion, we have $\Gamma_n(u) - \Gamma_n(0) = H^{(n)}(u)$, where $H^{(n)}(u) = A_1^{(n)} + A_2^{(n)} + A_3^{(n)} + A_4^{(n)}$, with $A_1^{(n)} = -\sum_{i=1}^n \left[ y_i - \phi'(x_i^T \beta^*) \right] \tfrac{x_i^T u}{\sqrt{n}}$, $A_2^{(n)} = \sum_{i=1}^n \tfrac{1}{2} \phi''(x_i^T \beta^*)\, u^T \tfrac{x_i x_i^T}{n} u$, $A_3^{(n)} = \tfrac{\lambda_n}{\sqrt{n}} \sum_{j=1}^p \hat w_j \sqrt{n} \left( |\beta^*_j + u_j/\sqrt{n}| - |\beta^*_j| \right)$,
31 Proof of Theorem 4, Asymptotic Normality (cont.): and $A_4^{(n)} = n^{-3/2} \sum_{i=1}^n \tfrac{1}{6} \phi'''(x_i^T \tilde\beta)\, (x_i^T u)^3$, where $\tilde\beta$ is between $\beta^*$ and $\beta^* + u/\sqrt{n}$. Then, by regularity conditions 1 and 2, $H^{(n)}(u) \to_d H(u)$ for every $u$, where $H(u) = u_A^T I_{11} u_A - 2 u_A^T W_A$ if $u_j = 0$ for all $j \notin A$, and $H(u) = \infty$ otherwise, with $W_A \sim N(0, I_{11})$. $H^{(n)}$ is convex, and the unique minimum of $H$ is $(I_{11}^{-1} W_A, 0)^T$. Following the epi-convergence results of Geyer (1994), we have $\hat u^{(n)}_A \to_d I_{11}^{-1} W_A$ and $\hat u^{(n)}_{A^c} \to_d 0$, and the asymptotic normality part is proven.
32 Proof of Theorem 4, Consistency: The asymptotic normality result indicates that for all $j \in A$, $P(j \in A^*_n) \to 1$. Then it suffices to show that for all $j \notin A$, $P(j \in A^*_n) \to 0$. Consider the event $j \in A^*_n$. By the KKT optimality conditions, $\sum_{i=1}^n x_{ij} \left( y_i - \phi'\left( x_i^T \hat\beta^{*(n)}(\mathrm{glm}) \right) \right) = \lambda_n \hat w_j$. Expanding, $\sum_{i=1}^n x_{ij} \left( y_i - \phi'\left( x_i^T \hat\beta^{*(n)}(\mathrm{glm}) \right) \right) / \sqrt{n} = B_1^{(n)} + B_2^{(n)} + B_3^{(n)}$, with $B_1^{(n)} = \sum_{i=1}^n x_{ij} \left( y_i - \phi'(x_i^T \beta^*) \right) / \sqrt{n}$, $B_2^{(n)} = \left( \tfrac{1}{n} \sum_{i=1}^n x_{ij}\, \phi''(x_i^T \beta^*)\, x_i^T \right) \sqrt{n}\,( \beta^* - \hat\beta^{*(n)}(\mathrm{glm}) )$, and $B_3^{(n)} = -\tfrac{1}{2\sqrt{n}} \left( \tfrac{1}{n} \sum_{i=1}^n x_{ij}\, \phi'''(x_i^T \tilde\beta) \left( x_i^T \sqrt{n}\,( \beta^* - \hat\beta^{*(n)}(\mathrm{glm}) ) \right)^2 \right)$, where $\tilde\beta$ is between $\hat\beta^{*(n)}(\mathrm{glm})$ and $\beta^*$.
33 Proof of Theorem 4, Consistency (cont.): $B_1^{(n)}$ and $B_2^{(n)}$ converge to some normal distributions, and $B_3^{(n)} = O_p(1/\sqrt{n})$. Meanwhile, $\lambda_n \hat w_j / \sqrt{n} = \lambda_n n^{(\gamma - 1)/2} / |\sqrt{n}\, \hat\beta_j(\mathrm{glm})|^\gamma \to_p \infty$. Thus $P(j \in A^*_n) \leq P\left( \sum_{i=1}^n x_{ij} \left( y_i - \phi'\left( x_i^T \hat\beta^{*(n)}(\mathrm{glm}) \right) \right) = \lambda_n \hat w_j \right) \to 0$, and this completes the proof.
ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Jian Huang 1, Joel L. Horowitz 2, and Shuangge Ma 3 1 Department of Statistics and Actuarial Science, University
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR
More informationOn High-Dimensional Cross-Validation
On High-Dimensional Cross-Validation BY WEI-CHENG HSIAO Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan hsiaowc@stat.sinica.edu.tw 5 WEI-YING
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationConfidence Intervals for Low-dimensional Parameters with High-dimensional Data
Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationGenomics, Transcriptomics and Proteomics in Clinical Research. Statistical Learning for Analyzing Functional Genomic Data. Explanation vs.
Genomics, Transcriptomics and Proteomics in Clinical Research Statistical Learning for Analyzing Functional Genomic Data German Cancer Research Center, Heidelberg, Germany June 16, 6 Diagnostics signatures
More informationSimultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR
Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR Howard D. Bondell and Brian J. Reich Department of Statistics, North Carolina State University,
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation
More informationPENALIZED PRINCIPAL COMPONENT REGRESSION. Ayanna Byrd. (Under the direction of Cheolwoo Park) Abstract
PENALIZED PRINCIPAL COMPONENT REGRESSION by Ayanna Byrd (Under the direction of Cheolwoo Park) Abstract When using linear regression problems, an unbiased estimate is produced by the Ordinary Least Squares.
More informationNonnegative Garrote Component Selection in Functional ANOVA Models
Nonnegative Garrote Component Selection in Functional ANOVA Models Ming Yuan School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 3033-005 Email: myuan@isye.gatech.edu
More informationSparse survival regression
Sparse survival regression Anders Gorst-Rasmussen gorst@math.aau.dk Department of Mathematics Aalborg University November 2010 1 / 27 Outline Penalized survival regression The semiparametric additive risk
More informationSpatial Lasso with Application to GIS Model Selection. F. Jay Breidt Colorado State University
Spatial Lasso with Application to GIS Model Selection F. Jay Breidt Colorado State University with Hsin-Cheng Huang, Nan-Jung Hsu, and Dave Theobald September 25 The work reported here was developed under
More informationLASSO-type penalties for covariate selection and forecasting in time series
LASSO-type penalties for covariate selection and forecasting in time series Evandro Konzen 1 Flavio A. Ziegelmann 2 Abstract This paper studies some forms of LASSO-type penalties in time series to reduce
More informationSaharon Rosset 1 and Ji Zhu 2
Aust. N. Z. J. Stat. 46(3), 2004, 505 510 CORRECTED PROOF OF THE RESULT OF A PREDICTION ERROR PROPERTY OF THE LASSO ESTIMATOR AND ITS GENERALIZATION BY HUANG (2003) Saharon Rosset 1 and Ji Zhu 2 IBM T.J.
More informationTwo Tales of Variable Selection for High Dimensional Regression: Screening and Model Building
Two Tales of Variable Selection for High Dimensional Regression: Screening and Model Building Cong Liu, Tao Shi and Yoonkyung Lee Department of Statistics, The Ohio State University Abstract Variable selection
More informationVariable Selection and Parameter Estimation Using a Continuous and Differentiable Approximation to the L0 Penalty Function
Brigham Young University BYU ScholarsArchive All Theses and Dissertations 2011-03-10 Variable Selection and Parameter Estimation Using a Continuous and Differentiable Approximation to the L0 Penalty Function
More informationarxiv: v2 [math.st] 9 Feb 2017
Submitted to the Annals of Statistics PREDICTION ERROR AFTER MODEL SEARCH By Xiaoying Tian Harris, Department of Statistics, Stanford University arxiv:1610.06107v math.st 9 Feb 017 Estimation of the prediction
More informationarxiv: v2 [stat.ml] 22 Feb 2008
arxiv:0710.0508v2 [stat.ml] 22 Feb 2008 Electronic Journal of Statistics Vol. 2 (2008) 103 117 ISSN: 1935-7524 DOI: 10.1214/07-EJS125 Structured variable selection in support vector machines Seongho Wu
More informationSparsity and the Lasso
Sparsity and the Lasso Statistical Machine Learning, Spring 205 Ryan Tibshirani (with Larry Wasserman Regularization and the lasso. A bit of background If l 2 was the norm of the 20th century, then l is
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationLeast Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding
Least Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding Kun Yang Trevor Hastie April 0, 202 Abstract Variable selection in linear models plays a pivotal role in modern statistics.
More informationCOMS 4771 Lecture Fixed-design linear regression 2. Ridge and principal components regression 3. Sparse regression and Lasso
COMS 477 Lecture 6. Fixed-design linear regression 2. Ridge and principal components regression 3. Sparse regression and Lasso / 2 Fixed-design linear regression Fixed-design linear regression A simplified
More informationAnalysis Methods for Supersaturated Design: Some Comparisons
Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs
More informationInstitute of Statistics Mimeo Series No Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR
DEPARTMENT OF STATISTICS North Carolina State University 2501 Founders Drive, Campus Box 8203 Raleigh, NC 27695-8203 Institute of Statistics Mimeo Series No. 2583 Simultaneous regression shrinkage, variable
More informationTuning Parameter Selection in Regularized Estimations of Large Covariance Matrices
Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices arxiv:1308.3416v1 [stat.me] 15 Aug 2013 Yixin Fang 1, Binhuan Wang 1, and Yang Feng 2 1 New York University and 2 Columbia
More informationLUO, BIN, M.A. Robust High-dimensional Data Analysis Using A Weight Shrinkage Rule. (2016) Directed by Dr. Xiaoli Gao. 73 pp.
LUO, BIN, M.A. Robust High-dimensional Data Analysis Using A Weight Shrinkage Rule. (2016) Directed by Dr. Xiaoli Gao. 73 pp. In high-dimensional settings, a penalized least squares approach may lose its
More informationASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS
The Annals of Statistics 2008, Vol. 36, No. 2, 587 613 DOI: 10.1214/009053607000000875 Institute of Mathematical Statistics, 2008 ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION
More informationConsistent Selection of Tuning Parameters via Variable Selection Stability
Journal of Machine Learning Research 14 2013 3419-3440 Submitted 8/12; Revised 7/13; Published 11/13 Consistent Selection of Tuning Parameters via Variable Selection Stability Wei Sun Department of Statistics
More informationLarge Sample Theory For OLS Variable Selection Estimators
Large Sample Theory For OLS Variable Selection Estimators Lasanthi C. R. Pelawa Watagoda and David J. Olive Southern Illinois University June 18, 2018 Abstract This paper gives large sample theory for
More information