The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA


1 The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010

2 Outline Introduction; Definition of the Adaptive Lasso; Oracle Properties; Computations; Relationship with the Nonnegative Garrote; Extensions to GLMs. Key results: Theorem 2 (Oracle Properties of the Adaptive Lasso), Corollary 2 (Consistency of the Nonnegative Garrote), and Theorem 4 (Oracle Properties of the Adaptive Lasso for GLMs).

3 Setting $y_i = x_i^T \beta^* + \epsilon_i$, where $\epsilon_1, \ldots, \epsilon_n$ are i.i.d. with mean 0 and variance $\sigma^2$. $\mathcal{A} = \{ j : \beta^*_j \neq 0 \}$ and $|\mathcal{A}| = p_0 < p$. $\frac{1}{n} X^T X \to C$, where $C$ is a positive definite matrix, partitioned as $C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}$ with $C_{11}$ a $p_0 \times p_0$ matrix.

4 Definition of Oracle Procedures We call $\delta$ an oracle procedure if $\hat\beta(\delta)$ (asymptotically) has the following oracle properties: 1. Identifies the right subset model, $\{ j : \hat\beta_j \neq 0 \} = \mathcal{A}$. 2. $\sqrt{n}\,(\hat\beta(\delta)_{\mathcal{A}} - \beta^*_{\mathcal{A}}) \to_d N(0, \Sigma^*)$, where $\Sigma^*$ is the covariance matrix knowing the true subset model.

5 Definition of LASSO (Tibshirani, 1996) $\hat\beta^{(n)} = \arg\min_\beta \| y - \sum_{j=1}^p x_j \beta_j \|^2 + \lambda_n \sum_{j=1}^p |\beta_j|$, where $\lambda_n$ varies with $n$. $\mathcal{A}_n = \{ j : \hat\beta^{(n)}_j \neq 0 \}$. LASSO variable selection is consistent iff $\lim_n P(\mathcal{A}_n = \mathcal{A}) = 1$.
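As a quick illustration (my addition, not from the slides), the selected set $\mathcal{A}_n$ at each $\lambda$ can be read off a fitted lasso path; the sketch below assumes scikit-learn and a generic design $(X, y)$.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def selected_sets(X, y):
    """Return the active set {j : betahat_j != 0} at each lambda on a lasso path."""
    alphas, coefs, _ = lasso_path(X, y)          # coefs has shape (p, n_alphas)
    return alphas, [set(np.flatnonzero(c)) for c in coefs.T]
```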

6 Proposition 1 If $\lambda_n / n \to \lambda_0 \geq 0$, then $\limsup_n P(\mathcal{A}_n = \mathcal{A}) \leq c < 1$, where $c$ is a constant depending on the true model.

7 Theorem 1: Necessary Condition for Consistency of LASSO Suppose that $\lim_n P(\mathcal{A}_n = \mathcal{A}) = 1$. Then there exists some sign vector $s = (s_1, \ldots, s_{p_0})^T$, $s_j = 1$ or $-1$, such that $|C_{21} C_{11}^{-1} s| \leq \mathbf{1}$ elementwise. (1)

8 Corollary 1: An Interesting Case Suppose that $p_0 = 2m + 1$ and $p = p_0 + 1$, so there is one irrelevant predictor. Let $C_{11} = (1 - \rho_1) I + \rho_1 J_1$, where $J_1$ is the matrix of 1's, $C_{12} = \rho_2 \mathbf{1}$, and $C_{22} = 1$. If $-\frac{1}{p_0 - 1} < \rho_1 < -\frac{1}{p_0}$ and $1 + (p_0 - 1)\rho_1 < |\rho_2| < \sqrt{(1 + (p_0 - 1)\rho_1)/p_0}$, then condition (1) cannot be satisfied. Thus LASSO variable selection is inconsistent.
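A small numeric check (my addition, assuming the inequalities as reconstructed above) confirms that the design used later in the experiments lies in the inconsistency region for $p_0 = 3$:

```python
import numpy as np

# Corollary 1's inconsistency region for p0 = 3 (m = 1), checked at the
# (rho1, rho2) point used in the Model 0 experiments.
p0 = 3
rho1, rho2 = -0.39, 0.23

lo = 1 + (p0 - 1) * rho1                      # 0.22
hi = np.sqrt((1 + (p0 - 1) * rho1) / p0)      # ~0.2708
ok = (-1 / (p0 - 1) < rho1 < -1 / p0) and (lo < abs(rho2) < hi)
print(ok)  # True: condition (1) fails, so lasso selection is inconsistent here
```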

9 Corollary 1: An Interesting Case ($m = 1$, $p_0 = 3$) [Figure: the region of $(\rho_1, \rho_2)$ pairs for which LASSO selection is inconsistent; horizontal axis $\rho_1$, vertical axis $\rho_2$.]

10 Definition of the Adaptive Lasso $\hat\beta^{*(n)} = \arg\min_\beta \| y - \sum_{j=1}^p x_j \beta_j \|^2 + \lambda_n \sum_{j=1}^p \hat w_j |\beta_j|$, with weight vector $\hat w = 1/|\hat\beta|^\gamma$ (data-dependent) and $\gamma > 0$, where $\hat\beta$ is a root-n-consistent estimator of $\beta^*$, e.g., $\hat\beta(\text{ols})$. $\mathcal{A}^*_n = \{ j : \hat\beta^{*(n)}_j \neq 0 \}$.
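This criterion reduces to a plain lasso after rescaling columns: setting $b_j = \hat w_j \beta_j$ turns the weighted penalty into an ordinary $L_1$ penalty. Below is a minimal sketch of that trick (my addition, not the paper's LARS implementation), assuming scikit-learn; the function name and the $\lambda_n$-to-alpha conversion are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

def adaptive_lasso(X, y, gamma=1.0, lam=1.0):
    """Adaptive lasso via column rescaling: lasso on x_j / w_j, then scale back."""
    n = X.shape[0]
    beta_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_  # root-n-consistent pilot
    w = 1.0 / np.abs(beta_ols) ** gamma        # data-dependent weights w_j
    # sklearn's Lasso minimizes ||y - Xb||^2 / (2n) + alpha * ||b||_1,
    # so Zou's lambda_n maps to alpha = lambda_n / (2n)
    fit = Lasso(alpha=lam / (2 * n), fit_intercept=False).fit(X / w, y)
    return fit.coef_ / w                       # betahat_j = b_j / w_j
```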

11 Penalty Functions of LASSO, SCAD, and the Adaptive Lasso [Figure 1 of Zou (2006): thresholding functions with $\lambda = 2$ for (a) the hard; (b) bridge $L_{.5}$; (c) the lasso; (d) the SCAD; (e) the adaptive lasso, $\gamma = .5$; and (f) the adaptive lasso, $\gamma = 2$.]

12 Remarks The data-dependent $\hat w$ is the key to the oracle properties. As $n$ grows, the weights for zero-coefficient predictors blow up to infinity, while the weights for nonzero-coefficient predictors converge to a finite constant. In the view of Fan and Li (2001) (presented by Yang Zhao), the adaptive lasso satisfies the three properties of a good penalty function: unbiasedness, sparsity, and continuity.

13 Theorem 2: Oracle Properties of the Adaptive Lasso Suppose that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n n^{(\gamma - 1)/2} \to \infty$. Then the adaptive lasso estimates must satisfy: 1. Consistency in variable selection: $\lim_n P(\mathcal{A}^*_n = \mathcal{A}) = 1$. 2. Asymptotic normality: $\sqrt{n}\,(\hat\beta^{*(n)}_{\mathcal{A}} - \beta^*_{\mathcal{A}}) \to_d N(0, \sigma^2 C_{11}^{-1})$.

14 Computations The adaptive lasso estimates can be solved by the LARS algorithm (Efron et al., 2004); the entire solution path can be computed at the same order of computation as a single OLS fit. Tuning: if we use $\hat\beta(\text{ols})$, use two-dimensional CV to find an optimal pair $(\gamma, \lambda_n)$; or use three-dimensional CV to find an optimal triple $(\hat\beta, \gamma, \lambda_n)$. $\hat\beta(\text{ridge})$ from the best ridge regression fit may be used when collinearity is a concern.
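A sketch of the two-dimensional tuning described above (my addition), reusing the illustrative adaptive_lasso function from the earlier slide:

```python
import numpy as np
from sklearn.model_selection import KFold

def tune_adaptive_lasso(X, y, gammas=(0.5, 1.0, 2.0), lams=np.logspace(-2, 3, 30)):
    """Grid-search (gamma, lambda) by 5-fold CV on prediction error."""
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    best, best_err = None, np.inf
    for g in gammas:
        for lam in lams:
            err = 0.0
            for tr, te in kf.split(X):
                beta = adaptive_lasso(X[tr], y[tr], gamma=g, lam=lam)
                err += np.mean((y[te] - X[te] @ beta) ** 2)
            if err < best_err:
                best, best_err = (g, lam), err
    return best  # the CV-optimal (gamma, lambda_n) pair
```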

15 Definition of the Nonnegative Garrote (Breiman, 1995) $\hat\beta_j(\text{garrote}) = c_j \hat\beta_j(\text{ols})$, where the nonnegative scaling factors $\{c_j\}$ minimize $\| y - \sum_{j=1}^p x_j \hat\beta_j(\text{ols})\, c_j \|^2 + \lambda_n \sum_{j=1}^p c_j$ subject to $c_j \geq 0\ \forall j$. A sufficiently large $\lambda_n$ shrinks some $c_j$ to exactly 0, i.e., $\hat\beta_j(\text{garrote}) = 0$. Yuan and Lin (2007) also studied the consistency of the nonnegative garrote.
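Because the scaling factors are constrained to be nonnegative, the garrote criterion is a nonnegative lasso on the rescaled columns $z_j = x_j \hat\beta_j(\text{ols})$. A minimal sketch (my addition), again assuming scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

def nonnegative_garrote(X, y, lam=1.0):
    """Breiman's garrote: shrink nonnegative scalings c_j of the OLS fit."""
    n = X.shape[0]
    beta_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
    Z = X * beta_ols                           # z_j = x_j * betahat_j(ols)
    # with c >= 0, the L1 penalty sum |c_j| equals sum c_j, matching the garrote
    fit = Lasso(alpha=lam / (2 * n), positive=True, fit_intercept=False).fit(Z, y)
    return fit.coef_ * beta_ols                # betahat_j(garrote) = c_j * betahat_j(ols)
```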

16 Garrote: Formulation and Consistency Formulation: $\hat\beta(\text{garrote}) = \arg\min_\beta \| y - \sum_{j=1}^p x_j \beta_j \|^2 + \lambda_n \sum_{j=1}^p \hat w_j |\beta_j|$ subject to $\beta_j \hat\beta_j(\text{ols}) \geq 0\ \forall j$, where $\gamma = 1$ and $\hat w = 1/|\hat\beta(\text{ols})|$. Corollary 2 (Consistency of the Nonnegative Garrote): If we choose $\lambda_n$ such that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n \to \infty$, then the nonnegative garrote is consistent for variable selection.

17 The Adaptive Lasso for GLMs $\hat\beta^{*(n)}(\text{glm}) = \arg\min_\beta \sum_{i=1}^n \left( -y_i x_i^T \beta + \phi(x_i^T \beta) \right) + \lambda_n \sum_{j=1}^p \hat w_j |\beta_j|$, with weight vector $\hat w = 1/|\hat\beta(\text{mle})|^\gamma$ for some $\gamma > 0$. The GLM density is $f(y \mid x, \theta) = h(y) \exp(y\theta - \phi(\theta))$, where $\theta = x^T \beta$. The Fisher information matrix is $I(\beta^*) = \begin{bmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{bmatrix}$, where $I_{11}$ is a $p_0 \times p_0$ matrix; then $I_{11}$ is the Fisher information with the true submodel known.
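The same column-rescaling trick extends to GLMs; here is an illustrative sketch for logistic regression (my addition, not the paper's algorithm; note that sklearn's C is an inverse regularization strength, roughly $C \propto 1/\lambda_n$):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def adaptive_lasso_logistic(X, y, gamma=1.0, C=1.0):
    """Adaptive lasso for a logistic GLM via column rescaling around the MLE."""
    # unpenalized MLE as the pilot estimator (penalty=None needs scikit-learn >= 1.2)
    mle = LogisticRegression(penalty=None, fit_intercept=False).fit(X, y).coef_.ravel()
    w = 1.0 / np.abs(mle) ** gamma             # weights w_j = 1 / |betahat_mle_j|^gamma
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=C,
                             fit_intercept=False).fit(X / w, y)
    return fit.coef_.ravel() / w               # undo the column scaling
```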

18 Theorem 4: Oracle Properties of the Adaptive Lasso for GLMs Let $\mathcal{A}^*_n = \{ j : \hat\beta^{*(n)}_j(\text{glm}) \neq 0 \}$. Suppose that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n n^{(\gamma - 1)/2} \to \infty$. Then, under some mild regularity conditions, the adaptive lasso estimate $\hat\beta^{*(n)}(\text{glm})$ must satisfy: 1. Consistency in variable selection: $\lim_n P(\mathcal{A}^*_n = \mathcal{A}) = 1$. 2. Asymptotic normality: $\sqrt{n}\,(\hat\beta^{*(n)}_{\mathcal{A}}(\text{glm}) - \beta^*_{\mathcal{A}}) \to_d N(0, I_{11}^{-1})$.

19 Experiments: Setting We let $y = x^T \beta^* + N(0, \sigma^2)$, where the true regression coefficients are $\beta^* = (5.6, 5.6, 5.6, 0)$. The predictors $x_i$ ($i = 1, \ldots, n$) are i.i.d. $N(0, C)$, where $C$ is the matrix in Corollary 1 with $\rho_1 = -.39$ and $\rho_2 = .23$ (the red point in the slide-9 figure).
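A sketch of this data-generating process (my addition, using the reconstructed $\rho_1 = -.39$):

```python
import numpy as np

def simulate_model0(n, sigma, rho1=-0.39, rho2=0.23, rng=None):
    """One Model-0 dataset: beta* = (5.6, 5.6, 5.6, 0), x ~ N(0, C) from Corollary 1."""
    rng = rng or np.random.default_rng(0)
    beta = np.array([5.6, 5.6, 5.6, 0.0])
    C = np.empty((4, 4))
    C[:3, :3] = (1 - rho1) * np.eye(3) + rho1   # C11 = (1 - rho1) I + rho1 J
    C[3, :3] = C[:3, 3] = rho2                  # C12 = rho2 * 1
    C[3, 3] = 1.0                               # C22 = 1
    X = rng.multivariate_normal(np.zeros(4), C, size=n)
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y
```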

20 Experiments: Model 0 In both models we simulated 100 training datasets for each combination of $(n, \sigma^2)$. All of the training and tuning were done on the training set. We also collected independent test datasets of 10,000 observations to compute the relative prediction error (RPE). To estimate the standard error of the RPE, we generated a bootstrapped sample from the 100 RPEs, then calculated the bootstrapped standard error. [Table 1, Simulation Model 0: the probability of containing the true model in the solution path, for lasso, adalasso($\gamma = .5$), adalasso($\gamma = 1$), adalasso($\gamma = 2$), and adalasso($\gamma$ by cv), at $(n = 60, \sigma = 9)$, $(n = 120, \sigma = 5)$, and $(n = 300, \sigma = 3)$; cell values not shown.] NOTE: In this table adalasso is the adaptive lasso, and "$\gamma$ by cv" means that $\gamma$ was selected by five-fold cross-validation from three choices: $\gamma = .5$, $\gamma = 1$, and $\gamma = 2$.

21 General Observations Comparison: LASSO, the adaptive lasso, SCAD, and the nonnegative garrote, with $p = 8$ and $p_0 = 3$. Consider a few large effects ($n = 20, 60$) and many small effects ($n = 40, 80$). LASSO performs best when the SNR is low. The adaptive lasso, SCAD, and the nonnegative garrote outperform LASSO with a medium or high level of SNR. The adaptive lasso tends to be more stable than SCAD. LASSO tends to select noise variables more often than the other methods.

22 Theorem 2: Oracle Properties of the Adaptive Lasso (restated) Suppose that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n n^{(\gamma - 1)/2} \to \infty$. Then the adaptive lasso estimates must satisfy: 1. Consistency in variable selection: $\lim_n P(\mathcal{A}^*_n = \mathcal{A}) = 1$. 2. Asymptotic normality: $\sqrt{n}\,(\hat\beta^{*(n)}_{\mathcal{A}} - \beta^*_{\mathcal{A}}) \to_d N(0, \sigma^2 C_{11}^{-1})$.

23 Proof of Theorem 2: Asymptotic Normality Let $\beta = \beta^* + u/\sqrt{n}$ and $\Psi_n(u) = \| y - \sum_{j=1}^p x_j (\beta^*_j + u_j/\sqrt{n}) \|^2 + \lambda_n \sum_{j=1}^p \hat w_j |\beta^*_j + u_j/\sqrt{n}|$. Let $\hat u^{(n)} = \arg\min \Psi_n(u)$; then $\hat u^{(n)} = \sqrt{n}\,(\hat\beta^{*(n)} - \beta^*)$. $\Psi_n(u) - \Psi_n(0) = V_4^{(n)}(u)$, where $V_4^{(n)}(u) = u^T \left( \frac{1}{n} X^T X \right) u - 2 \frac{\varepsilon^T X}{\sqrt{n}} u + \frac{\lambda_n}{\sqrt{n}} \sum_{j=1}^p \hat w_j \sqrt{n} \left( \left| \beta^*_j + \frac{u_j}{\sqrt{n}} \right| - |\beta^*_j| \right)$.

24 Proof of Theorem 2: Asymptotic Normality (cont.) Then $V_4^{(n)}(u) \to_d V_4(u)$ for every $u$, where $V_4(u) = u_{\mathcal{A}}^T C_{11} u_{\mathcal{A}} - 2 u_{\mathcal{A}}^T W_{\mathcal{A}}$ if $u_j = 0\ \forall j \notin \mathcal{A}$, and $\infty$ otherwise, with $W_{\mathcal{A}} \sim N(0, \sigma^2 C_{11})$. $V_4^{(n)}$ is convex, and the unique minimum of $V_4$ is $(C_{11}^{-1} W_{\mathcal{A}}, 0)^T$. Following the epi-convergence results of Geyer (1994), we have $\hat u^{(n)}_{\mathcal{A}} \to_d C_{11}^{-1} W_{\mathcal{A}}$ and $\hat u^{(n)}_{\mathcal{A}^c} \to_d 0$. Hence we have proved the asymptotic normality part.

25 Proof of Theorem 2: Consistency The asymptotic normality result indicates that $\forall j \in \mathcal{A}$, $\hat\beta^{*(n)}_j \to_p \beta^*_j$; thus $P(j \in \mathcal{A}^*_n) \to 1$. It then suffices to show that $\forall j \notin \mathcal{A}$, $P(j \in \mathcal{A}^*_n) \to 0$. Consider the event $j \in \mathcal{A}^*_n$. By the KKT optimality conditions, $2 x_j^T (y - X \hat\beta^{*(n)}) = \lambda_n \hat w_j$. Now $\lambda_n \hat w_j / \sqrt{n} = \lambda_n n^{(\gamma - 1)/2} / |\sqrt{n}\,\hat\beta_j|^\gamma \to_p \infty$, whereas $2 x_j^T (y - X \hat\beta^{*(n)}) / \sqrt{n} = 2 \frac{x_j^T X}{n} \sqrt{n}\,(\beta^* - \hat\beta^{*(n)}) + 2 \frac{x_j^T \varepsilon}{\sqrt{n}}$, and each of these two terms converges to some normal distribution. Thus $P(j \in \mathcal{A}^*_n) \leq P\left( 2 x_j^T (y - X \hat\beta^{*(n)}) = \lambda_n \hat w_j \right) \to 0$.

26 Corollary 2: Consistency of the Nonnegative Garrote (restated) Formulation: $\hat\beta(\text{garrote}) = \arg\min_\beta \| y - \sum_{j=1}^p x_j \beta_j \|^2 + \lambda_n \sum_{j=1}^p \hat w_j |\beta_j|$ subject to $\beta_j \hat\beta_j(\text{ols}) \geq 0\ \forall j$, where $\gamma = 1$ and $\hat w = 1/|\hat\beta(\text{ols})|$. Corollary 2: If we choose $\lambda_n$ such that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n \to \infty$, then the nonnegative garrote is consistent for variable selection.

27 Proof of Corollary 2 Let $\hat\beta^{*(n)}$ be the adaptive lasso estimates with $\gamma = 1$. By Theorem 2, $\hat\beta^{*(n)}$ is an oracle estimator if $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n \to \infty$. To show consistency, it suffices to show that $\hat\beta^{*(n)}$ satisfies the sign constraint with probability tending to 1. Pick any $j$. If $j \in \mathcal{A}$, then $\hat\beta^{*(n)}(\gamma = 1)_j\,\hat\beta(\text{ols})_j \to_p (\beta^*_j)^2 > 0$. If $j \notin \mathcal{A}$, then $P\left( \hat\beta^{*(n)}(\gamma = 1)_j\,\hat\beta(\text{ols})_j \geq 0 \right) \geq P\left( \hat\beta^{*(n)}(\gamma = 1)_j = 0 \right) \to 1$. In either case, $P\left( \hat\beta^{*(n)}(\gamma = 1)_j\,\hat\beta(\text{ols})_j \geq 0 \right) \to 1$ for any $j = 1, 2, \ldots, p$.

28 Theorem 4: Oracle Properties of the Adaptive Lasso for GLMs (restated) Let $\mathcal{A}^*_n = \{ j : \hat\beta^{*(n)}_j(\text{glm}) \neq 0 \}$. Suppose that $\lambda_n / \sqrt{n} \to 0$ and $\lambda_n n^{(\gamma - 1)/2} \to \infty$. Then, under some mild regularity conditions, the adaptive lasso estimate $\hat\beta^{*(n)}(\text{glm})$ must satisfy: 1. Consistency in variable selection: $\lim_n P(\mathcal{A}^*_n = \mathcal{A}) = 1$. 2. Asymptotic normality: $\sqrt{n}\,(\hat\beta^{*(n)}_{\mathcal{A}}(\text{glm}) - \beta^*_{\mathcal{A}}) \to_d N(0, I_{11}^{-1})$. Here $f(y \mid x, \theta) = h(y) \exp(y\theta - \phi(\theta))$ with $\theta = x^T \beta$.

29 Theorem 4: Regularity Conditions 1. The Fisher information matrix is finite and positive definite: $I(\beta^*) = E\left[ \phi''(x^T \beta^*)\, x x^T \right]$. 2. There is a sufficiently large open set $\mathcal{O}$ containing $\beta^*$ such that $\forall \beta \in \mathcal{O}$, $|\phi'''(x^T \beta)| \leq M(x) < \infty$ and $E[M(x)\,|x_j x_k x_l|] < \infty$ for all $1 \leq j, k, l \leq p$.

30 Proof of Theorem 4: Asymptotic Normality Let $\beta = \beta^* + u/\sqrt{n}$. Define $\Gamma_n(u) = \sum_{i=1}^n \left\{ -y_i x_i^T (\beta^* + u/\sqrt{n}) + \phi\left( x_i^T (\beta^* + u/\sqrt{n}) \right) \right\} + \lambda_n \sum_{j=1}^p \hat w_j |\beta^*_j + u_j/\sqrt{n}|$. Let $\hat u^{(n)} = \arg\min_u \Gamma_n(u)$; then $\hat u^{(n)} = \sqrt{n}\,(\hat\beta^{*(n)}(\text{glm}) - \beta^*)$. Using the Taylor expansion, we have $\Gamma_n(u) - \Gamma_n(0) = H^{(n)}(u)$, where $H^{(n)}(u) = A_1^{(n)} + A_2^{(n)} + A_3^{(n)} + A_4^{(n)}$, with $A_1^{(n)} = -\sum_{i=1}^n \left[ y_i - \phi'(x_i^T \beta^*) \right] \frac{x_i^T u}{\sqrt{n}}$, $A_2^{(n)} = \sum_{i=1}^n \frac{1}{2} \phi''(x_i^T \beta^*)\, u^T \frac{x_i x_i^T}{n} u$, $A_3^{(n)} = \frac{\lambda_n}{\sqrt{n}} \sum_{j=1}^p \hat w_j \sqrt{n} \left( \left| \beta^*_j + \frac{u_j}{\sqrt{n}} \right| - |\beta^*_j| \right)$,

31 Proof of Theorem 4: Asymptotic Normality (cont.) and $A_4^{(n)} = n^{-3/2} \sum_{i=1}^n \frac{1}{6} \phi'''(x_i^T \beta^{**}) (x_i^T u)^3$, where $\beta^{**}$ is between $\beta^*$ and $\beta^* + u/\sqrt{n}$. Then, by regularity conditions 1 and 2, $H^{(n)}(u) \to_d H(u)$ for every $u$, where $H(u) = u_{\mathcal{A}}^T I_{11} u_{\mathcal{A}} - 2 u_{\mathcal{A}}^T W_{\mathcal{A}}$ if $u_j = 0\ \forall j \notin \mathcal{A}$, and $\infty$ otherwise, with $W_{\mathcal{A}} \sim N(0, I_{11})$. $H^{(n)}$ is convex, and the unique minimum of $H$ is $(I_{11}^{-1} W_{\mathcal{A}}, 0)^T$. Following the epi-convergence results of Geyer (1994), we have $\hat u^{(n)}_{\mathcal{A}} \to_d I_{11}^{-1} W_{\mathcal{A}}$ and $\hat u^{(n)}_{\mathcal{A}^c} \to_d 0$, and the asymptotic normality part is proven.

32 Proof of Theorem 4: Consistency The asymptotic normality result indicates that $\forall j \in \mathcal{A}$, $P(j \in \mathcal{A}^*_n) \to 1$. It then suffices to show that $\forall j \notin \mathcal{A}$, $P(j \in \mathcal{A}^*_n) \to 0$. Consider the event $j \in \mathcal{A}^*_n$. By the KKT optimality conditions, $\sum_{i=1}^n x_{ij} \left( y_i - \phi'(x_i^T \hat\beta^{*(n)}(\text{glm})) \right) = \lambda_n \hat w_j$. Write $\sum_{i=1}^n x_{ij} \left( y_i - \phi'(x_i^T \hat\beta^{*(n)}(\text{glm})) \right) / \sqrt{n} = B_1^{(n)} + B_2^{(n)} + B_3^{(n)}$, with $B_1^{(n)} = \sum_{i=1}^n x_{ij} \left( y_i - \phi'(x_i^T \beta^*) \right) / \sqrt{n}$, $B_2^{(n)} = \left( \frac{1}{n} \sum_{i=1}^n x_{ij}\, \phi''(x_i^T \beta^*)\, x_i^T \right) \sqrt{n}\,(\beta^* - \hat\beta^{*(n)}(\text{glm}))$, $B_3^{(n)} = \left( \frac{1}{n} \sum_{i=1}^n x_{ij}\, \phi'''(x_i^T \beta^{**}) \left( x_i^T \sqrt{n}\,(\beta^* - \hat\beta^{*(n)}(\text{glm})) \right)^2 \right) / \sqrt{n}$, where $\beta^{**}$ is between $\hat\beta^{*(n)}(\text{glm})$ and $\beta^*$.

33 Proof of Theorem 4: Consistency (cont.) $B_1^{(n)}$ and $B_2^{(n)}$ converge to some normal distributions, and $B_3^{(n)} = O_p(1/\sqrt{n})$. Meanwhile, $\lambda_n \hat w_j / \sqrt{n} = \lambda_n n^{(\gamma - 1)/2} / |\sqrt{n}\,\hat\beta_j(\text{mle})|^\gamma \to_p \infty$. Thus $P(j \in \mathcal{A}^*_n) \leq P\left( \sum_{i=1}^n x_{ij} \left( y_i - \phi'(x_i^T \hat\beta^{*(n)}(\text{glm})) \right) = \lambda_n \hat w_j \right) \to 0$, and this completes the proof.
