Variable selection via generalized SELO-penalized linear regression models

Appl. Math. J. Chinese Univ. 2018, 33(2)

SHI Yue-yong (1,3)   CAO Yong-xiu (2)   YU Ji-chang (2)   JIAO Yu-ling (2,*)

Abstract. The seamless-L_0 (SELO) penalty is a smooth function on [0, ∞) that very closely resembles the L_0 penalty, and it has been demonstrated theoretically and practically to be effective in nonconvex penalization for variable selection. In this paper, we first generalize SELO to a class of penalties retaining the good features of SELO, and then propose variable selection and estimation in linear models using the proposed generalized SELO (GSELO) penalized least squares (PLS) approach. We show that the GSELO-PLS procedure possesses the oracle property and consistently selects the true model under some regularity conditions in the presence of a diverging number of variables. The entire path of GSELO-PLS estimates can be efficiently computed through a smoothing quasi-Newton (SQN) method. A modified BIC coupled with a continuation strategy is developed to select the optimal tuning parameter. Simulation studies and the analysis of a clinical data set are carried out to evaluate the finite sample performance of the proposed method. In addition, numerical experiments involving simulation studies and the analysis of a microarray data set are also conducted for GSELO-PLS in high-dimensional settings.

MR Subject Classification: 62F12, 62J05, 62J07.
Keywords: continuation, coordinate descent, BIC, LLA, oracle property, SELO, smoothing quasi-Newton.
Supported by the National Natural Science Foundation of China and the Fundamental Research Funds for the Central Universities (CUGW150809).
* Corresponding author.

1 Introduction

Consider the linear regression model
$$ y = X\beta^* + \epsilon, \qquad (1) $$
where $y = (y_1, y_2, \ldots, y_n)^T \in \mathbb{R}^n$ is a response vector, $X = (x_{ij}) \in \mathbb{R}^{n \times d}$ is a design matrix, $\beta^* = (\beta_1^*, \beta_2^*, \ldots, \beta_d^*)^T \in \mathbb{R}^d$ is the vector of underlying regression coefficients, and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)^T \in \mathbb{R}^n$ is a vector of random errors. We assume without loss of generality that y is centered and the columns of X are centered and normalized, i.e., $\sum_{i=1}^n y_i = 0$, $\sum_{i=1}^n x_{ij} = 0$ and $n^{-1} \sum_{i=1}^n x_{ij}^2 = 1$.

We also assume that β* is sparse in the sense that only a relatively small portion of the components of β* are nonzero, and our goal is to reconstruct the unknown vector β*. Let A = {j : β*_j ≠ 0} be the true model and suppose that s = |A| is the size of the true model (i.e., the sparsity level of β*), where |A| denotes the cardinality of A.

To achieve sparsity in linear models, the penalization (or regularization) method, which optimizes a loss function term plus a penalty function term, has been widely used in the literature (cf., e.g., [27, 8, 10, 31-32, 29]). In this paper, we consider the following so-called SELO-penalized least squares (PLS) problem:
$$ \hat{\beta} := \hat{\beta}(\lambda, \tau) = \arg\min_{\beta \in \mathbb{R}^d} \Big\{ Q_n(\beta) = \frac{1}{2n} \|y - X\beta\|^2 + \sum_{j=1}^{d} p_{\lambda,\tau}(\beta_j) \Big\}, \qquad (2) $$
where ‖·‖ denotes the L_2 norm on the Euclidean space and
$$ p_{\lambda,\tau}(\beta_j) = \frac{\lambda}{\log 2} \log\Big( \frac{|\beta_j|}{|\beta_j| + \tau} + 1 \Big) $$
is the SELO penalty proposed by Dicker et al. [7]. Here λ and τ are two positive tuning (or regularization) parameters. In particular, λ is the sparsity tuning parameter that yields sparse solutions and τ is the shape (or concavity) tuning parameter that makes SELO approach L_0 as τ → 0+, where the L_0 penalty is p_λ(β_j) = λ I(β_j ≠ 0). The estimator β̂ = β̂(λ, τ) in (2) is called a SELO-PLS (SPLS) estimator.

L_0 regularization [9] directly penalizes the number of variables in the regression model, so it enjoys a nice interpretation as best subset selection, but it is not continuous at 0 and is computationally infeasible when d is moderately large. SELO is a good surrogate for L_0 since it can explicitly mimic L_0 via small τ values, and it is more stable than L_0 due to the continuity of its penalty function. Figure 1 depicts SELO penalties for a few values of τ while fixing λ = 1.

Figure 1: Plot of SELO penalty functions. τ = 0 (L_0, thick solid), τ = 0.1 (dotdash), τ = 1 (dashed), τ = 10 (dotted), and τ = (thin solid).
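
To make the L_0-mimicking behavior of SELO concrete, the following MATLAB snippet (an illustrative sketch, not the authors' code) evaluates the SELO penalty from (2) for several values of tau; as tau shrinks, the penalty of any nonzero coefficient approaches lambda, while the penalty at 0 stays 0.

% Illustrative sketch: the SELO penalty p_{lambda,tau}(b) from equation (2).
lambda = 1;
selo = @(b, tau) (lambda/log(2)) * log(abs(b)./(abs(b) + tau) + 1);

b = 0.5;                            % a nonzero coefficient
for tau = [1, 0.1, 0.01, 0.001]
    fprintf('tau = %6.3f   p(b) = %.4f\n', tau, selo(b, tau));
end
% As tau -> 0+, selo(b, tau) -> lambda for any b ~= 0 and selo(0, tau) = 0,
% which is exactly the L0 penalty lambda * I(b ~= 0).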

Since the introduction of SELO for linear models (LM) [7], the methodology has been extended to generalized linear models (GLM) [16], multivariate panel count data [30] and quantile regression [6], among others. Under LM, Dicker et al. [7] show that SELO-LM estimators enjoy the oracle property [8] when both d and n tend to infinity with d/n → 0, and outperform other penalized estimators by various metrics in numerical simulations. They propose a SELO-LM-BIC procedure to select the tuning parameters and show that it is consistent for model selection. Under GLM, Li et al. [16] show that the SELO-GLM procedure enjoys the oracle property when both d and n tend to infinity with d^5/n → 0. They also establish model selection consistency via a SELO-GLM-BIC procedure. It is noteworthy that both SELO-LM and SELO-GLM estimators can be efficiently calculated by coordinate descent (CD) algorithms. Zhang et al. [30] develop a SELO-penalized estimating equation approach for the regression analysis of multivariate panel count data, focusing on variable selection and estimation of significant covariate effects, where the dimensionality d is assumed to be fixed. They use a BIC procedure to select tuning parameters and apply the classical Newton-Raphson algorithm in their numerical experiments. Ciuperca [6] introduces and studies the SELO quantile estimator in a linear model when both d and n tend to infinity with d/n → 0, and derives the convergence rate, oracle properties and BIC model selection consistency of the corresponding estimators.

In this paper, we propose a generalized SELO (GSELO) penalized method for variable selection and parameter estimation in linear models. First, we generalize the SELO penalty to a class of penalties (the GSELO penalties) that closely resemble the L_0 penalty and retain the good features of SELO. Second, based on the proposed GSELO penalties, we develop the GSELO-PLS procedure for variable selection and parameter estimation in linear models. We establish consistency and asymptotic normality for GSELO-PLS and show that it performs as well as an oracle estimator when both d and n tend to infinity with d/n → 0. Third, we implement a smoothing quasi-Newton (SQN) method with a backtracking line search to compute the proposed GSELO-PLS estimates; the method has a superlinear convergence rate, is insensitive to the choice of initial values, and, compared with the modified Newton-Raphson algorithm of [8], avoids computing the sequence of inverse Hessian matrices. In particular, we couple our algorithm with a continuation strategy on the regularization parameter, i.e., given a decreasing sequence of parameters {λ_g}_g, we apply the algorithm to solve the λ_{g+1}-problem with the initial guess taken from the λ_g-problem. The idea of continuation is well established for iterative algorithms as a means of warm starting and globalizing convergence. We adopt a modified BIC (MBIC) to select a suitable tuning parameter during the continuation process. Finally, we conduct numerical experiments to evaluate the performance of GSELO-PLS in high dimensions. To deal with the high-dimensional setting, we first employ a local linear approximation (LLA) [33] to the nonconvex GSELO penalties and then resort to an existing Gauss-Seidel type coordinate descent algorithm [2] to obtain the solution path. Numerically, when coupled with the continuation strategy and a high-dimensional BIC (HBIC), the overall GSELO-PLS-HBIC procedure for high-dimensional data is very efficient.
The remainder of this paper is organized as follows. In Section 2, we first describe our proposed GSELO method and then establish asymptotic theoretical results for the GSELO-PLS procedure.

In Section 3, we present the algorithm for computing the GSELO-PLS estimator, the standard error formulae for the estimated coefficients, and a modified BIC coupled with a continuation strategy to select the optimal tuning parameter. Simulation studies are conducted in Section 4 to evaluate the finite sample performance of the proposed method, which is further illustrated with a real clinical data set. In Section 5, we numerically study the behavior of the proposed GSELO-PLS estimators in high dimensions, including the computational issues, the choice of the tuning parameter, simulation studies and the analysis of a microarray data set. We conclude the paper in Section 6. Proofs of the theorems are provided in the Appendix.

2 Generalized SELO-penalized linear regression models

2.1 Methodology

Let P denote the class of all GSELO penalties, and let f be an arbitrary function that satisfies the following two hypotheses:

(H1) f(x) is a continuous function of x and has first and second derivatives on [0, 1];
(H2) f'(x) ≥ 0 on the interval [0, 1] and lim_{x→0} f(x)/x = 1.

Then a GSELO penalty p_{λ,τ}(·) ∈ P is given by
$$ p_{\lambda,\tau}(\beta_j) = \frac{\lambda}{f(1)} f\Big( \frac{|\beta_j|}{|\beta_j| + \tau} \Big), $$
where λ (sparsity) and τ (concavity) are two positive tuning parameters.

Remark 1. (H1) is needed to guarantee the continuity of the penalty functions and (H2) is used to make the penalties in P resemble the L_0 penalty. Obviously, SELO is a member of P: simply take f(x) = log(x + 1). Table 1 lists some representatives of P, and Figure 2 displays them with τ = 1 and τ = 0.01, respectively.

Table 1: Representatives of GSELO.
  Name   Type of function         f(x)           p_{λ,τ}(β_j)
  LIN    linear                   x              λ |β_j| / (|β_j| + τ)
  SELO   logarithmic              log(x + 1)     (λ / log 2) log(|β_j| / (|β_j| + τ) + 1)
  EXP    exponential              1 − exp(−x)    (λ / (1 − exp(−1))) [1 − exp(−|β_j| / (|β_j| + τ))]
  SIN    trigonometric            sin(x)         (λ / sin 1) sin(|β_j| / (|β_j| + τ))
  ATN    inverse trigonometric    arctan(x)      (λ / arctan 1) arctan(|β_j| / (|β_j| + τ))
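
As an illustration of the construction in Table 1, the following MATLAB sketch (illustrative only, not the authors' code) builds each member of P from its generating function f through the common formula p_{λ,τ}(β) = λ f(|β|/(|β| + τ))/f(1).

% Illustrative sketch: GSELO penalties p(b) = lambda * f(|b|/(|b|+tau)) / f(1).
lambda = 1;  tau = 0.01;
gselo = @(f, b) (lambda / f(1)) * f(abs(b) ./ (abs(b) + tau));

f_lin  = @(x) x;                 % LIN
f_selo = @(x) log(x + 1);        % SELO
f_exp  = @(x) 1 - exp(-x);       % EXP
f_sin  = @(x) sin(x);            % SIN
f_atn  = @(x) atan(x);           % ATN

b = linspace(-2, 2, 401);
P = [gselo(f_lin, b); gselo(f_selo, b); gselo(f_exp, b); ...
     gselo(f_sin, b); gselo(f_atn, b)];
plot(b, P');  legend('LIN', 'SELO', 'EXP', 'SIN', 'ATN');
xlabel('\beta_j');  ylabel('p_{\lambda,\tau}(\beta_j)');   % cf. Figure 2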

Remark 2. It is noteworthy that LIN is actually the transformed L_1 penalty studied by Nikolova [21], which enlightened Lv and Fan [18] in proposing the SICA approach for sparse recovery and model selection.

Figure 2: Plots of GSELO penalty functions with λ = 1. Left panel: τ = 1; right panel: τ = 0.01. L_0 (thick solid), L_1 (solid), LIN (dashed), SELO (dotted), EXP (dotdash), SIN (longdash) and ATN (twodash).

Based on the proposed GSELO penalty, the corresponding GSELO-PLS estimator is given by
$$ \hat{\beta} := \hat{\beta}(\lambda, \tau) = \arg\min_{\beta \in \mathbb{R}^d} \Big\{ Q_n(\beta) = \frac{1}{2n} \|y - X\beta\|^2 + \sum_{j=1}^{d} p_{\lambda,\tau}(\beta_j) \Big\}, \qquad (3) $$
where p_{λ,τ}(·) ∈ P.

2.2 Theoretical results

We establish theoretical results for the GSELO-PLS estimator under the following regularity conditions.

(C1) n → ∞ and dσ²/n → 0.
(C2) There exist positive constants r and R such that r < γ_min(n⁻¹XᵀX) < γ_max(n⁻¹XᵀX) < R, where γ_min(n⁻¹XᵀX) and γ_max(n⁻¹XᵀX) are the smallest and largest eigenvalues of n⁻¹XᵀX, respectively.
(C3) τ = O(√(σ²/(dn))) and λτ[n/(dσ²)]^{3/2} → ∞.
(C4) ρ√(n/(dσ²)) → ∞ and λ/ρ² → 0, where ρ = min_{j∈A} |β*_j|.

(C5) lim_{n→∞} max_{1≤i≤n} n⁻¹ ∑_{j=1}^d x_{ij}² = 0.
(C6) E(|ϵ_i/σ|^{2+δ}) < M for some δ > 0 and M < ∞.

Remark 3. Conditions (C1)-(C6) coincide with the conditions in [7]; please see that paper for more details.

Theorem 1 (Existence of the GSELO-PLS estimator). Under hypotheses (H1)-(H2) and conditions (C1)-(C6), with probability tending to one, there exists a local minimizer β̂ of Q_n(β), defined in (3), such that ‖β̂ − β*‖ = O_p(√(dσ²/n)), where ‖·‖ denotes the Euclidean norm of a vector.

Theorem 2 (Oracle property). Under hypotheses (H1)-(H2) and conditions (C1)-(C6), with probability tending to one, the √(n/(dσ²))-consistent local minimizer β̂ in Theorem 1 must be such that
(i) lim_{n→∞} P({j : β̂_j ≠ 0} = A) = 1;
(ii) √n B_n (n⁻¹X_A^T X_A/σ²)^{1/2} (β̂_A − β*_A) → N(0, G) in distribution, where B_n is an arbitrary q × |A| matrix such that B_n B_n^T → G.

To save space, we only state the main results here and relegate the proofs to the Appendix. Interested readers can refer to [10, 7] for more details.

3 Computation

3.1 Algorithm

Dicker et al. [7] use a coordinate descent (CD) algorithm, which amounts to finding the roots of certain cubic equations, to obtain SELO estimates. However, among the GSELO penalties considered in this paper (i.e., LIN, SELO, EXP, SIN and ATN in Table 1), only LIN and SELO can be implemented with the CD algorithm of [7]. To illustrate this point, we consider the one-dimensional PLS problem
$$ \hat{\beta} = \arg\min_{\beta \in \mathbb{R}} \Big\{ Q(\beta) = \frac{1}{2} (\beta - \beta_0)^2 + p_{\lambda,\tau}(\beta) \Big\}, \qquad (4) $$
where β_0 ∈ ℝ is a constant and p_{λ,τ}(β) is a penalty from Table 1. CD procedures require finding the nonzero stationary points (or critical points) of the objective function Q(β). Direct computation of Q′(β) = 0 gives
$$ \text{(LIN)} \quad \beta - \beta_0 + \lambda \frac{\tau \,\mathrm{sgn}(\beta)}{(|\beta| + \tau)^2} = 0, $$
$$ \text{(SELO)} \quad \beta - \beta_0 + \frac{\lambda}{\log 2} \frac{\tau \,\mathrm{sgn}(\beta)}{(2|\beta| + \tau)(|\beta| + \tau)} = 0, \quad \text{or} $$
$$ \text{(SIN)} \quad \beta - \beta_0 + \frac{\lambda}{\sin 1} \cos\Big( \frac{|\beta|}{|\beta| + \tau} \Big) \frac{\tau \,\mathrm{sgn}(\beta)}{(|\beta| + \tau)^2} = 0. $$
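
To see why the SELO case reduces to a cubic equation, multiply the SELO stationarity condition by (2|β| + τ)(|β| + τ): on the branch β > 0 (assuming β_0 > 0) this gives 2β³ + (3τ − 2β_0)β² + (τ² − 3τβ_0)β + (λτ/log 2 − β_0 τ²) = 0. The MATLAB sketch below (illustrative only) solves the one-dimensional SELO problem (4) by comparing Q at β = 0 with Q at the admissible real roots of this cubic.

% Illustrative sketch: one-dimensional SELO-PLS (4) via the cubic equation
% obtained from the SELO stationarity condition on the branch beta > 0.
beta0 = 1.2;  lambda = 0.5;  tau = 0.01;           % assumes beta0 > 0
Q = @(b) 0.5*(b - beta0).^2 + ...
         (lambda/log(2)) * log(abs(b)./(abs(b) + tau) + 1);

% cubic coefficients: 2 b^3 + (3*tau - 2*beta0) b^2 + (tau^2 - 3*tau*beta0) b
%                     + (lambda*tau/log(2) - beta0*tau^2) = 0
c = [2, 3*tau - 2*beta0, tau^2 - 3*tau*beta0, lambda*tau/log(2) - beta0*tau^2];
r = roots(c);
r = real(r(abs(imag(r)) < 1e-10 & real(r) > 0));   % admissible stationary points
cand = [0; r];                                     % always compare with beta = 0
[~, k] = min(Q(cand));
fprintf('betahat = %.4f\n', cand(k));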

It follows that the LIN and SELO conditions can be transformed into cubic equations, while the SIN condition cannot, and EXP and ATN behave like SIN in this respect. Thus, for the sake of a uniform computational treatment, we use the smoothing quasi-Newton (SQN) method [22, 19, 24] to optimize Q_n(β) in (3). Since the GSELO penalty functions are singular at the origin, we first smooth the penalty functions by replacing |β_j| with √(β_j² + ε), where ε is a small positive quantity; clearly √(β_j² + ε) → |β_j| as ε → 0. We then solve
$$ \hat{\beta} = \hat{\beta}(\lambda, \tau, \varepsilon) = \arg\min_{\beta \in \mathbb{R}^d} \Big\{ Q_n^\varepsilon(\beta) = \frac{1}{2n} \|y - X\beta\|^2 + \sum_{j=1}^{d} p_{\lambda,\tau,\varepsilon}(\beta_j) \Big\} \qquad (5) $$
instead of (3), using the DFP quasi-Newton method with a backtracking line search, where p_{λ,τ,ε}(β_j) = p_{λ,τ}(√(β_j² + ε)). In practice, taking ε = 0.01 gives good results. We summarize the SQN-DFP procedure in Algorithm 1. More theoretical results on smoothing methods for nonsmooth and nonconvex minimization can be found in [4, 5].

Remark 4. Like the local quadratic approximation (LQA) algorithm in Fan and Li [8], the sequence β^k obtained from SQN-DFP may not be sparse for any fixed k and hence is not directly suitable for variable selection. In practice, we set β̂_j^k = 0 if |β_j^k| < ε_0 for some sufficiently small tolerance level ε_0.

Algorithm 1 SQN-DFP
Input: initial values β^0 ∈ ℝ^d and H_0 = I_d (the d × d identity matrix); line search parameters ρ ∈ (0, 1) and η ∈ (0, 1/2); stopping tolerance δ.
1: for k = 0, 1, 2, ..., k_max do
2:   compute g_k = ∇Q_n^ε(β^k),
3:   if ‖g_k‖ ≤ δ then
4:     stop, output β^k as the estimate of β in (5),
5:   else
6:     compute the direction d_k = −H_k g_k.
7:   end if
8:   for m = 0, 1, 2, ..., m_max do
9:     compute β_m^k = β^k + ρ^m d_k,
10:    if Q_n^ε(β_m^k) ≤ Q_n^ε(β^k) + η ρ^m g_k^T d_k then
11:      stop, output α_k = ρ^m.
12:    end if
13:   end for
14:   compute β^{k+1} = β^k + α_k d_k, g_{k+1} = ∇Q_n^ε(β^{k+1}), Δg_k = g_{k+1} − g_k, Δβ_k = β^{k+1} − β^k.
15:   if (Δβ_k)^T Δg_k ≤ 0 then
16:     H_{k+1} = H_k;
17:   else
18:     H_{k+1} = H_k − (H_k Δg_k Δg_k^T H_k)/(Δg_k^T H_k Δg_k) + (Δβ_k Δβ_k^T)/((Δβ_k)^T Δg_k).
19:   end if
20: end for
Output: β̂, the estimate of β in equation (5).
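
The key ingredients of the SQN approach are the smoothed objective Q_n^ε and its gradient; these are all that Algorithm 1 needs besides the DFP update. The following MATLAB sketch (illustrative, using the SELO member; X, y, lambda and tau are assumed to be in the workspace) evaluates both.

% Illustrative sketch: smoothed SELO objective Q_n^eps and its gradient for Algorithm 1.
eps_s = 0.01;                                      % smoothing parameter epsilon in (5)
sab  = @(b) sqrt(b.^2 + eps_s);                    % smoothed |b|
pen  = @(b) (lambda/log(2)) * log(sab(b)./(sab(b) + tau) + 1);
dpen = @(b) (lambda/log(2)) * tau ./ ...
            ((2*sab(b) + tau).*(sab(b) + tau)) .* (b./sab(b));

n = size(X, 1);
Qfun = @(b) 0.5/n * norm(y - X*b)^2 + sum(pen(b)); % Q_n^eps(beta)
grad = @(b) -X'*(y - X*b)/n + dpen(b);             % gradient of Q_n^eps(beta)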

3.2 Covariance estimation

Following [7], we estimate the covariance matrix (i.e., the standard errors) of β̂ by using a sandwich formula
$$ \widehat{\mathrm{cov}}(\hat{\beta}_{\hat{A}}) = \hat{\sigma}^2 \{ X_{\hat{A}}^T X_{\hat{A}} + n \Sigma^{\varepsilon}_{\hat{A},\hat{A}}(\hat{\beta}) \}^{-1} X_{\hat{A}}^T X_{\hat{A}} \{ X_{\hat{A}}^T X_{\hat{A}} + n \Sigma^{\varepsilon}_{\hat{A},\hat{A}}(\hat{\beta}) \}^{-1}, \qquad (6) $$
where σ̂² = (n − ŝ)⁻¹ ‖y − Xβ̂‖², ŝ = |Â|, Â = {j : β̂_j ≠ 0} and
$$ \Sigma^{\varepsilon}(\beta) = \mathrm{diag}\{ p'_{\lambda,\tau,\varepsilon}(\beta_1)/|\beta_1|, \ldots, p'_{\lambda,\tau,\varepsilon}(\beta_d)/|\beta_d| \}. \qquad (7) $$
For variables with β̂_j = 0, the estimated standard errors are 0.

3.3 Tuning parameter selection

As suggested in [7], we fix τ = 0.01 and use a modified BIC (MBIC) procedure to tune λ via
$$ \hat{\lambda} = \arg\min_{\lambda} \Big\{ \mathrm{MBIC}(\hat{\beta}) = \log(\hat{\sigma}^2) + \frac{k_n}{n} \hat{s} \Big\}, \qquad (8) $$
where β̂ = β̂(λ, τ), σ̂² = (n − ŝ)⁻¹ ‖y − Xβ̂‖², ŝ = |Â| and k_n is a positive number that depends on the sample size n and satisfies k_n ≥ log(n). In our numerical experiments, we set k_n = log(n).

Since solving (5) is a nonconvex optimization problem, we couple SQN-DFP with a continuation strategy on the tuning parameter for efficient computation. To be precise, one chooses a starting value λ_0 for the parameter λ and a decreasing factor μ ∈ (0, 1) to obtain a decreasing sequence {λ_g}_g, where λ_g = λ_0 μ^g, and then runs Algorithm 1 to solve the λ_{g+1}-problem initialized with the solution of the λ_g-problem. Summarizing this idea leads to Algorithm 2; see [14] and the references therein for more details. In practice, we use λ_0 = λ_max, where λ_max is an initial guess of λ, supposedly large, that shrinks all β_j's to zero, set λ_min = 10⁻⁵ λ_max, and then divide the interval [λ_min, λ_max] into G (the number of grid points) equally spaced subintervals on the logarithmic scale. Numerically, μ is determined by G; clearly, a large G value implies a large decreasing factor μ. For sufficient resolution of the solution path, one usually takes G ≥ 50 (e.g., G = 100 or 200). Running Algorithm 1 for each value of τ and for the sequence λ_max = λ_0 > λ_1 > ... > λ_G = λ_min under consideration gives the entire GSELO-PLS solution path. We then select the optimal λ from the candidate set Λ = {λ_1, λ_2, ..., λ_G} using MBIC (8).

Remark 5. Dicker et al. [7] show that the SELO-PLS-MBIC procedure consistently identifies the true model with a diverging number of parameters under some regularity conditions. It can be proved that the GSELO-PLS-MBIC procedure is also model selection consistent by using arguments similar to those in the proof of Theorem 2 of Dicker et al. [7]; the detailed proof is thus omitted here.

Remark 6. In Algorithm 1, we set (ρ, η, m_max) = (0.55, 0.4, 20) following [19]. Thanks to the continuation strategy, the maximum number of outer iterations k_max need not be large in practice, so we set k_max = 50 in Algorithm 1. This makes it possible to substantially reduce the computational cost without noticeable loss of accuracy in the solutions.
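
A minimal MATLAB sketch of the λ grid, warm-started continuation and MBIC selection described above (illustrative only; X and y are assumed to be in the workspace, gselo_pls_fit is a hypothetical stand-in for one run of Algorithm 1, and the choice of lam_max below is an assumed heuristic, not the authors' rule):

% Illustrative sketch of the continuation path and MBIC selection (8).
[n, d] = size(X);
tau = 0.01;  G = 100;  kn = log(n);
lam_max = max(abs(X'*y))/n;                 % assumed guess, large enough to kill all coefficients
lam_min = 1e-5*lam_max;
lams = exp(linspace(log(lam_max), log(lam_min), G));   % log-spaced grid

beta = zeros(d, 1);  mbic = zeros(G, 1);  betas = zeros(d, G);
for g = 1:G
    beta = gselo_pls_fit(X, y, lams(g), tau, beta);    % warm start from previous solution
    shat = nnz(beta);
    sig2 = norm(y - X*beta)^2/(n - shat);
    mbic(g) = log(sig2) + kn/n*shat;
    betas(:, g) = beta;
end
[~, gbest] = min(mbic);
beta_hat = betas(:, gbest);                 % GSELO-PLS-MBIC estimate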

Algorithm 2 Continuation strategy
Input: λ_0 and μ ∈ (0, 1). Let β(λ_0) = 0.
1: for g = 1, 2, 3, ..., G do
2:   Apply Algorithm 1 to problem (5) with λ_g = λ_0 μ^g, initialized with β^0 = β(λ_{g−1}).
3:   Compute the MBIC value.
4: end for
Output: Select λ by (8).

4 Numerical experiments

4.1 Simulation studies

We present simulation studies to examine the finite sample properties of GSELO-PLS-MBIC. All codes, available from the authors, are written in MATLAB, and all experiments are performed in MATLAB R2010b on a quad-core laptop with an Intel Core i5 CPU (2.60 GHz) and 8 GB RAM running Windows 8.1 (64-bit).

We simulate N = 1000 data sets from the linear model (1). β* is a d × 1 vector with β*_1 = 3, β*_2 = 1.5, β*_3 = 2 and the other β*_j's equal to 0; thus s = 3. The rows of the n × d matrix X are sampled as i.i.d. copies from N(0, Σ) with Σ = (0.5^{|j−k|}) for 1 ≤ j, k ≤ d. The components of the n × 1 vector ϵ are sampled from N(0, 1). In order to emphasize the dependence of the number of parameters on the sample size, we consider two sample sizes, n = 100 and n = 200, with d = ⌊n/(2 log n)⌋, where ⌊x⌋ denotes the integer part of x for x ≥ 0.

To evaluate the variable selection performance of the proposed method, we consider the average model size N⁻¹ ∑_{s=1}^N |Â^{(s)}| (MS), the proportion of correct models N⁻¹ ∑_{s=1}^N I{Â^{(s)} = A} (CM), the average l_∞ absolute error N⁻¹ ∑_{s=1}^N ‖β̂^{(s)} − β*‖_∞ (AE), the average l_2 relative error N⁻¹ ∑_{s=1}^N (‖β̂^{(s)} − β*‖_2 / ‖β*‖_2) (RE) and the average model error N⁻¹ ∑_{s=1}^N (β̂^{(s)} − β*)^T Σ (β̂^{(s)} − β*) (ME).

Simulation results for variable selection are summarized in Table 2. Since LIN, SELO, EXP, SIN and ATN all belong to the GSELO penalty family, it can be seen from Table 2 that the five penalties behave quite similarly to each other on all considered criteria. With respect to MS, although all methods tend to slightly overestimate the true model, they select the true model quite well, with reasonably small errors in terms of CM, AE, RE and ME. The results given in Table 3 are obtained under the same settings as in Table 2 but concern the estimation of the regression parameter β*. With respect to parameter estimation, Table 3 presents the average of the estimated nonzero coefficients (Mean), the average of the estimated standard errors (ESE) and the sample standard deviations (SSD). From Table 3, we can see that the Means are close to the corresponding true values, and the ESEs agree well with the SSDs, which indicates that the proposed covariance estimation formula is reasonable and reliable.
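
A MATLAB sketch (illustrative) of the simulation design just described: AR(1)-type correlation Σ_{jk} = 0.5^{|j−k|}, β* = (3, 1.5, 2, 0, ..., 0)^T and standard normal noise.

% Illustrative sketch: generate one data set from model (1) under the design of Section 4.1.
n = 100;  d = floor(n/(2*log(n)));            % d = 10 when n = 100
Sigma = toeplitz(0.5.^(0:d-1));               % Sigma_{jk} = 0.5^{|j-k|}
betastar = zeros(d, 1);  betastar(1:3) = [3; 1.5; 2];   % s = 3

X = randn(n, d) * chol(Sigma);                % rows are i.i.d. N(0, Sigma)
y = X*betastar + randn(n, 1);                 % eps_i ~ N(0, 1)

% centering and scaling as assumed in Section 1
y = y - mean(y);
X = bsxfun(@minus, X, mean(X));
X = bsxfun(@rdivide, X, sqrt(mean(X.^2)));    % n^{-1} * sum_i x_ij^2 = 1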

Table 2: Simulation results for variable selection, d = ⌊n/(2 log n)⌋. For each setting (n, d) = (100, 10) and (200, 18), the table reports MS, CM, AE, RE and ME for the LIN, SELO, EXP, SIN and ATN penalties.

Table 3: Simulation results for parameter estimation, d = ⌊n/(2 log n)⌋. For each setting (n, d) = (100, 10) and (200, 18), the table reports the Mean, ESE and SSD of the estimates of β*_1 = 3, β*_2 = 1.5 and β*_3 = 2 for the LIN, SELO, EXP, SIN and ATN penalties.

4.2 Analysis of clinical data

We illustrate GSELO-PLS-MBIC through an analysis of a prostate cancer data set from [26]. This data set examines the correlation between the level of prostate specific antigen and a number of clinical measures in 97 men with prostate cancer who were about to receive a radical prostatectomy. It has been analyzed in many texts on data mining (cf., e.g., [32, 13]) and is publicly available from the R package ElemStatLearn [13]. There are 97 observations (n = 97) and 9 variables (one quantitative response and d = 8 predictors) in the prostate cancer data set. The goal is to predict the response lpsa (log of prostate specific antigen) from the predictors lcavol (log cancer volume), lweight (log prostate weight), age, lbph (log of benign prostatic hyperplasia amount), svi (seminal vesicle invasion), lcp (log of capsular penetration), gleason (Gleason score) and pgg45 (percent of Gleason scores 4 or 5).

Many different model fitting and tuning parameter selection procedures have been applied to the prostate cancer data, which makes it challenging to decide which one to use, as the underlying true model is generally unknown in real data analyses. However, as previously stated, the minimizer of the L_0 procedure (i.e., best subset selection) is the optimal solution and, if available, can be used as a gold standard for the evaluation of other approaches.

Hereafter, we regard the best subset model with the lowest BIC as the true model in order to assess the approaches. The five GSELO procedures (i.e., LIN, SELO, EXP, SIN and ATN) proposed in the previous sections are applied to the prostate cancer data. Additionally, the LS solution (computed by the R built-in function lm) and the LASSO solution (computed by the R function cv.glmnet with the lambda.1se rule and set.seed(0) from the R package glmnet [11]) are provided for comparison purposes. The estimated regression parameters and the predictive mean squared errors (PMSE), calculated as n⁻¹ ∑_{i=1}^n (ŷ_i − y_i)², are reported in Table 4. One can see that the five GSELO penalties behave similarly, and they all select lcavol and lweight. In particular, SIN and ATN recover exactly the best subset selection result, which shows the good performance of the proposed GSELO-PLS-MBIC procedure.

Table 4: Analysis of the prostate cancer data set. Estimated coefficients and PMSE of the different methods (LS, Best Subset, LASSO, LIN, SELO, EXP, SIN and ATN) for the terms Intercept, lcavol, lweight, age, lbph, svi, lcp, gleason and pgg45. The zero entries correspond to variables omitted.

5 High dimensional case

In this section, we discuss how GSELO-PLS can be applied to high-dimensional data in which d > n. To solve (3) in high dimensions, we first employ the local linear approximation (LLA) [33] to p_{λ,τ}(·) ∈ P:
$$ p_{\lambda,\tau}(\beta_j) \approx p_{\lambda,\tau}(\beta_j^k) + p'_{\lambda,\tau}(\beta_j^k)(|\beta_j| - |\beta_j^k|), \qquad (9) $$
where β_j^k is the kth estimate of β_j, j = 1, 2, ..., d, and p'_{λ,τ}(β_j) denotes the derivative of p_{λ,τ}(β_j) with respect to |β_j|. Given the current estimate β^k of β, we find the next estimate via
$$ \beta^{k+1} = \arg\min_{\beta} \Big\{ \frac{1}{2n} \|y - X\beta\|^2 + \sum_{j=1}^{d} \omega_j^{k+1} |\beta_j| \Big\}, \qquad (10) $$
where ω_j^{k+1} = p'_{λ,τ}(β_j^k). We then use a Gauss-Seidel type coordinate descent (CD) algorithm [2] to solve (10). We summarize the LLA-CD procedure in Algorithm 3. For LLA-CD, we also couple it with the continuation strategy on the regularization parameter in order to obtain accurate solutions.
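
A MATLAB sketch (illustrative) of a single LLA reweighting step (9)-(10): compute the weights ω_j^{k+1} = p'_{λ,τ}(β_j^k) for the SELO member and run a few Gauss-Seidel sweeps of coordinate descent with soft-thresholding on the resulting weighted lasso problem. It assumes the columns of X are centered and scaled so that n⁻¹ ∑_i x_{ij}² = 1, and that X, y, beta_k, lambda and tau are in the workspace.

% Illustrative sketch: one LLA step (9)-(10) for SELO, solved by cyclic
% coordinate descent with soft-thresholding (cf. Algorithm 3).
S = @(t, a) sign(t) .* max(abs(t) - a, 0);             % soft-thresholding operator
dselo = @(b) (lambda/log(2)) * tau ./ ((2*abs(b) + tau).*(abs(b) + tau));

[n, d] = size(X);
w = dselo(beta_k);                 % weights omega_j^{k+1} = p'(beta_j^k)
beta = beta_k;                     % warm start the weighted lasso at beta^k
r = y - X*beta;
for pass = 1:10                    % a few Gauss-Seidel sweeps
    for j = 1:d
        zj = X(:, j)'*r/n + beta(j);                   % partial-residual coordinate
        bnew = S(zj, w(j));
        r = r - (bnew - beta(j))*X(:, j);
        beta(j) = bnew;
    end
end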

Algorithm 3 LLA-CD
Input: X ∈ ℝ^{n×d}, y ∈ ℝ^n, β^0 ∈ ℝ^d, τ, λ, δ (tolerance) and k_max (the maximum number of iterations).
1: for k = 0, 1, 2, ... do
2:   while k < k_max do
3:     for j = 1, 2, ..., d do
4:       Calculate z_j = n⁻¹ x_j^T r_{−j} = n⁻¹ x_j^T r + β_j^k, where r = y − Xβ^k, r_{−j} = y − X_{−j}β_{−j}^k, the subscript −j refers to the portion that remains after the jth column or element is removed, and r_{−j} is the partial residual for x_j.
5:       Update β_j^{k+1} ← S(z_j, ω_j^{k+1}), where ω_j^{k+1} = p'_{λ,τ}(β_j^k) and S(t, λ) = sgn(t)(|t| − λ)_+ is the soft-thresholding operator.
6:       Update r ← r − (β_j^{k+1} − β_j^k) x_j.
7:     end for
8:     if ‖β^{k+1} − β^k‖ < δ then
9:       break, β̂ = β^{k+1}.
10:    else
11:      Update k ← k + 1.
12:    end if
13:   end while
14: end for
Output: β̂, the estimate of β in equation (10).

We use the sandwich formula (6) to estimate the covariance matrix of the LLA-CD estimates β̂, replacing Σ^ε(β) in (7) with Σ(β) = diag{p'_{λ,τ}(|β_1|)/|β_1|, ..., p'_{λ,τ}(|β_d|)/|β_d|}. Since d is larger than n, the MBIC in (8) breaks down for the tuning of λ. We therefore adopt the high-dimensional BIC (HBIC) proposed by Wang et al. [28] to select the optimal tuning parameter λ̂ during the continuation process, which reads
$$ \hat{\lambda} = \arg\min_{\lambda \in \Lambda} \Big\{ \mathrm{HBIC}(\lambda) = \log(\|y - X\hat{\beta}(\lambda)\|^2/n) + \frac{C_n \log(d)}{n} |M(\lambda)| \Big\}, \qquad (11) $$
where Λ is a subset of (0, +∞), M(λ) = {j : β̂_j(λ) ≠ 0}, |M(λ)| denotes the cardinality of M(λ), and C_n = log(log n).

5.1 Simulation studies in high dimensions

We illustrate the finite sample properties of GSELO-PLS-HBIC in high dimensions with simulation studies. The implementation settings are the same as in Section 4.1 except for the two sample sizes n = 100 and n = 200 with d = ⌊n log(n)/2⌋. The results for variable selection and parameter estimation are reported in Table 5 and Table 6, respectively. It can be seen from the tables that the five GSELO penalties still perform reasonably well in terms of both variable selection and parameter estimation when d is larger than n. Since the sparsity level of β* is fixed in our simulations (i.e., s = 3), better performance appears to be associated with larger sample sizes.
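
For completeness, a MATLAB sketch (illustrative; lla_cd_fit is a hypothetical stand-in for Algorithm 3, and the λ grid lams, the value of G and tau reuse the construction from the earlier MBIC sketch) of HBIC selection (11) along a continuation path:

% Illustrative sketch: HBIC (11) along a continuation path in the d > n case.
[n, d] = size(X);
Cn = log(log(n));
beta = zeros(d, 1);  hbic = zeros(G, 1);  betas = zeros(d, G);
for g = 1:G
    beta = lla_cd_fit(X, y, lams(g), tau, beta);   % warm start (continuation)
    m = nnz(beta);
    hbic(g) = log(norm(y - X*beta)^2/n) + Cn*log(d)/n*m;
    betas(:, g) = beta;
end
[~, gbest] = min(hbic);
beta_hat = betas(:, gbest);                        % GSELO-PLS-HBIC estimate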

Table 5: Simulation results for variable selection, d = ⌊n log(n)/2⌋. For each setting (n, d) = (100, 230) and (200, 529), the table reports MS, CM, AE, RE and ME for the LIN, SELO, EXP, SIN and ATN penalties.

Table 6: Simulation results for parameter estimation, d = ⌊n log(n)/2⌋. For each setting (n, d) = (100, 230) and (200, 529), the table reports the Mean, ESE and SSD of the estimates of β*_1 = 3, β*_2 = 1.5 and β*_3 = 2 for the LIN, SELO, EXP, SIN and ATN penalties.

5.2 Analysis of microarray data

We analyze the eyedata set, which is publicly available in the R package flare [15], to illustrate the application of GSELO-PLS-HBIC in high-dimensional settings. This data set is a gene expression data set from the microarray experiments on mammalian eye tissue samples of [23]. The response variable y is a numeric vector of length 120 giving the expression level of gene TRIM32, which causes Bardet-Biedl syndrome (BBS). The design matrix X represents the data of 120 rats with 200 gene probes. We want to find the gene probes that are most related to TRIM32 in sparse high-dimensional regression models. For this data set, we consider ncvreg [2] (10-fold cv.ncvreg with seed 0) as the gold standard for comparison purposes. Table 7 lists the results of GSELO (LIN, SELO, EXP, SIN and ATN) and ncvreg. From Table 7, the six methods identify 5, 3, 6, 4, 3 and 4 probes, respectively, and have 3 probes in common. Notably, for those common probes, although the magnitudes of the estimates are not equal, they have the same signs, which suggests similar biological conclusions. In addition, the methods have similar PMSEs, which implies that they give results of comparable accuracy.

Table 7: Analysis of the eyedata set. Estimated coefficients (Intercept and the selected probes) and PMSE of the different methods (ncvreg, LIN, SELO, EXP, SIN and ATN). The zero entries correspond to variables omitted.

6 Concluding remarks

In this paper, we propose the GSELO-PLS procedure for variable selection and parameter estimation in linear models. We generalize the SELO penalty to the GSELO class and thus place the popular SELO penalization method within a more general framework. We establish the asymptotic properties of the proposed GSELO-PLS estimator in a setting where the dimension of the covariates grows with the sample size: the consistency and the oracle property of the proposed estimators are proved under some regularity conditions. When coupled with a continuation strategy and a modified BIC tuning parameter selector, our overall proposed procedure is very efficient and accurate. In addition, when d is larger than n, we use an LLA-CD algorithm and a high-dimensional BIC, combined with a continuation strategy on the regularization parameter, to compute the GSELO solution paths in high dimensions. The results of simulation studies and real data examples demonstrate the effectiveness of our proposed approach.

As a natural extension of the SELO, the proposed GSELO method automatically inherits all the merits of SELO and can directly draw upon existing results in the SELO-based literature, i.e., linear models [7], generalized linear models [16], multivariate panel count data [30] and quantile regression [6]. Given the connection between SICA and GSELO, it is heuristically attractive to consider using GSELO for variable selection in other settings in the future, such as Cox models [24, 25] and additive hazards models [17]. Moreover, in regression problems, variables can often be thought of as grouped. Following the group exponential LASSO of [1] for bi-level variable selection, it would be interesting to extend the GSELO results to structured-sparsity penalized models, which we also leave for future research.

Appendix

We follow steps similar to the proofs of Dicker et al. [7]. Hereafter, we write p(β_j) instead of p_{λ,τ}(β_j) for the sake of simplicity in notation.

Proof of Theorem 1. Let α_n = √(dσ²/n). It is sufficient to show that, for any given ε > 0, there exists a large constant C such that
$$ P\Big\{ \inf_{\|u\|=1} Q_n(\beta^* + C\alpha_n u) > Q_n(\beta^*) \Big\} \ge 1 - \varepsilon. \qquad (12) $$
Define D_n(u) = Q_n(β* + Cα_n u) − Q_n(β*). We have
$$ D_n(u) \ge \frac{1}{2n} C^2 \alpha_n^2 \|Xu\|^2 - \frac{1}{n} C\alpha_n \epsilon^T X u + \sum_{j \in K(u)} [p(\beta_j^* + C\alpha_n u_j) - p(\beta_j^*)] =: I_1 + I_2 + I_3, $$
where K(u) = {j : p(β*_j + Cα_n u_j) − p(β*_j) < 0}. By (C2),
$$ I_1 = \frac{1}{2n} C^2 \alpha_n^2 \|Xu\|^2 \ge \frac{\gamma_{\min}(n^{-1}X^TX)}{2} C^2 \alpha_n^2 = O_p(C^2 \alpha_n^2), \qquad (13) $$
$$ |I_2| = \Big| \frac{1}{n} C\alpha_n \epsilon^T X u \Big| \le \frac{C\alpha_n}{n} \|X^T \epsilon\| = \frac{C\alpha_n}{n} O_p\big(\sqrt{n d \sigma^2 \gamma_{\max}(n^{-1}X^TX)}\big) = O_p(C\alpha_n^2). \qquad (14) $$
Condition (C4) implies ρ/α_n → ∞. This and the fact that p(·) is concave on [0, ∞) imply that, when n is sufficiently large,
$$ p(\beta_j^* + C\alpha_n u_j) - p(\beta_j^*) \ge -C\alpha_n |u_j| \, p'(|\beta_j^* + C\alpha_n u_j|). $$
It follows that
$$ |I_3| \le \sum_{j \in K(u)} C\alpha_n |u_j| \, p'(|\beta_j^* + C\alpha_n u_j|) \quad \text{(here } p'(t) \text{ denotes the derivative with respect to } t\text{)} $$
$$ = \sum_{j \in K(u)} C\alpha_n |u_j| \frac{\lambda}{f(1)} f'\Big( \frac{|\beta_j^* + C\alpha_n u_j|}{|\beta_j^* + C\alpha_n u_j| + \tau} \Big) \frac{\tau}{(|\beta_j^* + C\alpha_n u_j| + \tau)^2} \le \sum_{j \in K(u)} C\alpha_n |u_j| \, O(1) \lambda \frac{\tau}{(\rho + \tau)^2} $$
$$ \le C\alpha_n O(1) \frac{\lambda \tau}{\rho^2} \sqrt{d} \, \|u\| = O(C\alpha_n) \frac{\lambda}{\rho^2} (\tau \sqrt{d}) = O(C\alpha_n) \, o(1) \, O(\alpha_n) = o(C\alpha_n^2) \qquad (15) $$
by (C3)-(C4) and (H1)-(H2). From (13), (14) and (15), if C > 0 is large enough, I_2 and I_3 are dominated by I_1, which is positive. This proves (12).

Proof of Theorem 2 (i). Consider β ∈ ℝ^d with ‖β − β*‖ ≤ Cα_n, where C is any positive constant. For ε_n = Cα_n > 0, it suffices to show that, for all j ∈ A^c, with probability tending to one as n → ∞,
$$ \frac{\partial Q_n(\beta)}{\partial \beta_j} > 0 \quad \text{for } 0 < \beta_j < \varepsilon_n, \qquad (16) $$
$$ \frac{\partial Q_n(\beta)}{\partial \beta_j} < 0 \quad \text{for } -\varepsilon_n < \beta_j < 0. \qquad (17) $$
By some algebra,
$$ \frac{\partial Q_n(\beta)}{\partial \beta_j} = -\frac{1}{n} x_j^T (y - X\beta) + p'(|\beta_j|) \,\mathrm{sgn}(\beta_j) =: I_1 + I_2. $$
Note that E(X^Tϵ/n) = 0 and
$$ E\Big\| \frac{X^T \epsilon}{n} \Big\|^2 = \frac{1}{n^2} \mathrm{tr}\{ E(X^T \epsilon \epsilon^T X) \} = \frac{\sigma^2}{n} \mathrm{tr}(n^{-1} X^T X) = \frac{\sigma^2}{n} O(d) = O\Big( \frac{d\sigma^2}{n} \Big) $$
by (C2). It follows that ‖n⁻¹ X^T(y − Xβ)‖ = O_p(√(dσ²/n)) = O_p(α_n), and hence I_1 = O_p(α_n).

On the other hand,
$$ \frac{p'(|\beta_j|)}{\alpha_n} = \frac{1}{f(1)} f'\Big( \frac{|\beta_j|}{|\beta_j| + \tau} \Big) \frac{\lambda\tau/\alpha_n}{(|\beta_j| + \tau)^2}. $$
Since |β_j| ≤ Cα_n for j ∈ A^c, and (C3) implies α_n/τ → ∞ and
$$ \frac{\lambda\tau/\alpha_n}{(|\beta_j| + \tau)^2} \ge \frac{\lambda\tau/\alpha_n}{(C\alpha_n + \tau)^2}, \quad \text{which is of order } \frac{\lambda\tau}{\alpha_n^3} \to \infty, $$
we have p'(|β_j|)/α_n → ∞ together with (H1) and (H2). Thus, the sign of
$$ \frac{\partial Q_n(\beta)}{\partial \beta_j} = \alpha_n \Big\{ O_p(1) + \frac{p'(|\beta_j|)}{\alpha_n} \mathrm{sgn}(\beta_j) \Big\} $$
is completely determined by the sign of β_j when n is large, and the two always have the same sign. Hence, (16) and (17) follow.

Proof of Theorem 2 (ii). By part (i) of Theorem 2, {j : β̂_j ≠ 0} = A. As a local minimizer of Q_n(β), β̂ must satisfy
$$ \frac{\partial Q_n(\beta)}{\partial \beta_A} \Big|_{\beta = \hat{\beta}} = 0, \quad \text{i.e.,} \quad -\frac{1}{n} X_A^T (y - X\hat{\beta}) + p'_A(\hat{\beta}) = 0, $$
which implies
$$ \hat{\beta}_A - \beta_A^* = (X_A^T X_A)^{-1} X_A^T \epsilon - (n^{-1} X_A^T X_A)^{-1} p'_A(\hat{\beta}). $$
It follows that
$$ \sqrt{n}\, B_n (n^{-1} X_A^T X_A / \sigma^2)^{1/2} (\hat{\beta}_A - \beta_A^*) = B_n (\sigma^2 X_A^T X_A)^{-1/2} X_A^T \epsilon - \sqrt{n/\sigma^2}\, B_n (X_A^T X_A / n)^{-1/2} p'_A(\hat{\beta}) =: I_1 - I_2, $$
where
$$ I_2 = \sqrt{n/\sigma^2}\, B_n (X_A^T X_A / n)^{-1/2} p'_A(\hat{\beta}). \qquad (18) $$
By (C2), we have
$$ \| B_n (X_A^T X_A / n)^{-1/2} p'_A(\hat{\beta}) \| = O_p( \| p'_A(\hat{\beta}) \| ). \qquad (19) $$
On the other hand, for j ∈ A = {j : β̂_j ≠ 0},
$$ p'(\hat{\beta}_j) = \frac{\lambda}{f(1)} f'\Big( \frac{|\hat{\beta}_j|}{|\hat{\beta}_j| + \tau} \Big) \frac{\tau \,\mathrm{sgn}(\hat{\beta}_j)}{(|\hat{\beta}_j| + \tau)^2}, $$
and it follows that |p'(β̂_j)| = O_p(λτ/ρ²) according to (H1)-(H2) and (C3)-(C4). It is noteworthy that ‖v‖ ≤ √d ‖v‖_∞ holds for all v ∈ ℝ^d, and then we have ‖p'_A(β̂)‖ ≤ √d ‖p'(β̂)‖_∞ = O_p(√d λτ/ρ²). Then (18), (19) and (C3)-(C4) imply
$$ \|I_2\| = \sqrt{n/\sigma^2}\, O_p\Big( \sqrt{d}\, \frac{\lambda\tau}{\rho^2} \Big) = O_p\Big( \sqrt{nd/\sigma^2}\, \frac{\lambda\tau}{\rho^2} \Big) = O_p\Big( \frac{\lambda}{\rho^2} \Big) = o_p(1). $$
Thus, to complete the proof of (ii), it suffices to show that
$$ I_1 \to N(0, G) \text{ in distribution}, \qquad (20) $$
by Slutsky's theorem. Write I_1 = ∑_{i=1}^n w_{i,n}, where w_{i,n} = B_n (σ² X_A^T X_A)^{-1/2} x_{i,A} ϵ_i. Fix δ_0 > 0 and let η_{i,n} = x_{i,A}^T (X_A^T X_A)^{-1/2} B_n^T B_n (X_A^T X_A)^{-1/2} x_{i,A}. Then, using procedures similar to those in [7], we can show that
$$ \sum_{i=1}^n E( \|w_{i,n}\|^2 ; \|w_{i,n}\|^2 > \delta_0 ) \to 0 $$

by (C5) and the fact that ∑_{i=1}^n η_{i,n} = tr(B_n^T B_n) → tr(G) < ∞. Thus, the Lindeberg condition is satisfied and (20) holds.

References

[1] P Breheny. The group exponential lasso for bi-level variable selection, Biometrics, 2015, 71(3).
[2] P Breheny, J Huang. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection, Ann Appl Stat, 2011, 5(1).
[3] E J Candes, M B Wakin, S P Boyd. Enhancing sparsity by reweighted l1 minimization, J Fourier Anal Appl, 2008, 14(5).
[4] X Chen. Superlinear convergence of smoothing quasi-Newton methods for nonsmooth equations, J Comput Appl Math, 1997, 80(1).
[5] X Chen. Smoothing methods for nonsmooth, nonconvex minimization, Math Program, 2012, 134(1).
[6] G Ciuperca. Model selection in high-dimensional quantile regression with seamless L0 penalty, Statist Probab Lett, 2015, 107.
[7] L Dicker, B Huang, X Lin. Variable selection and estimation with the seamless-L0 penalty, Statist Sinica, 2013, 23.
[8] J Fan, R Li. Variable selection via nonconcave penalized likelihood and its oracle properties, J Amer Statist Assoc, 2001, 96(456).
[9] J Fan, J Lv. A selective overview of variable selection in high dimensional feature space, Statist Sinica, 2010, 20(1).
[10] J Fan, H Peng. Nonconcave penalized likelihood with a diverging number of parameters, Ann Statist, 2004, 32(3).
[11] J Friedman, T Hastie, R Tibshirani. Regularization paths for generalized linear models via coordinate descent, J Stat Softw, 2010, 33(1).
[12] C Gao, N Wang, Q Yu, Z Zhang. A feasible nonconvex relaxation approach to feature selection, In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence.
[13] T Hastie, R Tibshirani, J Friedman. The Elements of Statistical Learning, Springer, Berlin.
[14] Y Jiao, B Jin, X Lu. A primal dual active set with continuation algorithm for the l0-regularized optimization problem, Appl Comput Harmon Anal, 2015, 39.
[15] X Li, T Zhao, X Yuan, H Liu. The flare package for high dimensional linear regression and precision matrix estimation in R, J Mach Learn Res, 2015, 16.
[16] Z Li, S Wang, X Lin. Variable selection and estimation in generalized linear models with the seamless L0 penalty, Canad J Statist, 2012, 40(4).
[17] W Lin, J Lv. High-dimensional sparse additive hazards regression, J Amer Statist Assoc, 2013, 108(501).

[18] J Lv, Y Fan. A unified approach to model selection and sparse recovery using regularized least squares, Ann Statist, 2009, 37(6A).
[19] C F Ma. Optimization method and its Matlab program design, Science Press, Beijing.
[20] R Mazumder, J Friedman, T Hastie. SparseNet: coordinate descent with nonconvex penalties, J Amer Statist Assoc, 2011, 106(495).
[21] M Nikolova. Local strong homogeneity of a regularized estimator, SIAM J Appl Math, 2000, 61(2).
[22] J Nocedal, S Wright. Numerical optimization, 2nd ed, Springer, New York.
[23] T Scheetz, K Kim, R Swiderski, A Philp, T Braun, K Knudtson, A Dorrance, G DiBona, J Huang, T Casavant, V Sheffield, E Stone. Regulation of gene expression in the mammalian eye and its relevance to eye disease, Proc Natl Acad Sci USA, 2006, 103(39).
[24] Y Y Shi, Y X Cao, Y L Jiao, Y Y Liu. SICA for Cox's proportional hazards model with a diverging number of parameters, Acta Math Appl Sin Engl Ser, 2014, 30(4).
[25] Y Y Shi, Y L Jiao, L Yan, Y X Cao. A modified BIC tuning parameter selector for SICA-penalized Cox regression models with diverging dimensionality, J Math, 2017, 37(4).
[26] T A Stamey, J N Kabalin, J E McNeal, I M Johnstone, F Freiha, E A Redwine, N Yang. Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients, J Urol, 1989, 141(5).
[27] R Tibshirani. Regression shrinkage and selection via the lasso, J R Stat Soc Ser B Stat Methodol, 1996, 58(1).
[28] L Wang, Y Kim, R Li. Calibrating nonconvex penalized regression in ultra-high dimension, Ann Statist, 2013, 41(5).
[29] C H Zhang. Nearly unbiased variable selection under minimax concave penalty, Ann Statist, 2010, 38(2).
[30] H Zhang, J Sun, D Wang. Variable selection and estimation for multivariate panel count data via the seamless-L0 penalty, Canad J Statist, 2013, 41(2).
[31] H Zou. The adaptive lasso and its oracle properties, J Amer Statist Assoc, 2006, 101(476).
[32] H Zou, T Hastie. Regularization and variable selection via the elastic net, J R Stat Soc Ser B Stat Methodol, 2005, 67(2).
[33] H Zou, R Li. One-step sparse estimates in nonconcave penalized likelihood models, Ann Statist, 2008, 36(4).

1 School of Economics and Management, China University of Geosciences, Wuhan, China.
2 School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China.
3 Center for Resources and Environmental Economic Research, China University of Geosciences, Wuhan, China.
Email: yulingjiaomath@whu.edu.cn


More information

A Unified Primal Dual Active Set Algorithm for Nonconvex Sparse Recovery

A Unified Primal Dual Active Set Algorithm for Nonconvex Sparse Recovery A Unified Primal Dual Active Set Algorithm for Nonconvex Sparse Recovery Jian Huang Yuling Jiao Bangti Jin Jin Liu Xiliang Lu Can Yang January 5, 2018 Abstract In this paper, we consider the problem of

More information

Learning with Sparsity Constraints

Learning with Sparsity Constraints Stanford 2010 Trevor Hastie, Stanford Statistics 1 Learning with Sparsity Constraints Trevor Hastie Stanford University recent joint work with Rahul Mazumder, Jerome Friedman and Rob Tibshirani earlier

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne

More information

A Constructive Approach to L 0 Penalized Regression

A Constructive Approach to L 0 Penalized Regression Journal of Machine Learning Research 9 (208) -37 Submitted 4/7; Revised 6/8; Published 8/8 A Constructive Approach to L 0 Penalized Regression Jian Huang Department of Applied Mathematics The Hong Kong

More information

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless

More information

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

More information

Shrinkage Tuning Parameter Selection in Precision Matrices Estimation

Shrinkage Tuning Parameter Selection in Precision Matrices Estimation arxiv:0909.1123v1 [stat.me] 7 Sep 2009 Shrinkage Tuning Parameter Selection in Precision Matrices Estimation Heng Lian Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang

More information

Consistent Model Selection Criteria on High Dimensions

Consistent Model Selection Criteria on High Dimensions Journal of Machine Learning Research 13 (2012) 1037-1057 Submitted 6/11; Revised 1/12; Published 4/12 Consistent Model Selection Criteria on High Dimensions Yongdai Kim Department of Statistics Seoul National

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Jian Huang 1, Joel L. Horowitz 2, and Shuangge Ma 3 1 Department of Statistics and Actuarial Science, University

More information

In Search of Desirable Compounds

In Search of Desirable Compounds In Search of Desirable Compounds Adrijo Chakraborty University of Georgia Email: adrijoc@uga.edu Abhyuday Mandal University of Georgia Email: amandal@stat.uga.edu Kjell Johnson Arbor Analytics, LLC Email:

More information

STK Statistical Learning: Advanced Regression and Classification

STK Statistical Learning: Advanced Regression and Classification STK4030 - Statistical Learning: Advanced Regression and Classification Riccardo De Bin debin@math.uio.no STK4030: lecture 1 1/ 42 Outline of the lecture Introduction Overview of supervised learning Variable

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Variable Selection and Parameter Estimation Using a Continuous and Differentiable Approximation to the L0 Penalty Function

Variable Selection and Parameter Estimation Using a Continuous and Differentiable Approximation to the L0 Penalty Function Brigham Young University BYU ScholarsArchive All Theses and Dissertations 2011-03-10 Variable Selection and Parameter Estimation Using a Continuous and Differentiable Approximation to the L0 Penalty Function

More information

Generalized Linear Models and Its Asymptotic Properties

Generalized Linear Models and Its Asymptotic Properties for High Dimensional Generalized Linear Models and Its Asymptotic Properties April 21, 2012 for High Dimensional Generalized L Abstract Literature Review In this talk, we present a new prior setting for

More information

Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors

Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors Patrick Breheny Department of Biostatistics University of Iowa Jian Huang Department of Statistics

More information

Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space

Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space Jinchi Lv Data Sciences and Operations Department Marshall School of Business University of Southern California http://bcf.usc.edu/

More information

Least Angle Regression, Forward Stagewise and the Lasso

Least Angle Regression, Forward Stagewise and the Lasso January 2005 Rob Tibshirani, Stanford 1 Least Angle Regression, Forward Stagewise and the Lasso Brad Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani Stanford University Annals of Statistics,

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

Fast Regularization Paths via Coordinate Descent

Fast Regularization Paths via Coordinate Descent user! 2009 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerome Friedman and Rob Tibshirani. user! 2009 Trevor

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

Sparse survival regression

Sparse survival regression Sparse survival regression Anders Gorst-Rasmussen gorst@math.aau.dk Department of Mathematics Aalborg University November 2010 1 / 27 Outline Penalized survival regression The semiparametric additive risk

More information

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem Kangkang Deng, Zheng Peng Abstract: The main task of genetic regulatory networks is to construct a

More information

Self-adaptive Lasso and its Bayesian Estimation

Self-adaptive Lasso and its Bayesian Estimation Self-adaptive Lasso and its Bayesian Estimation Jian Kang 1 and Jian Guo 2 1. Department of Biostatistics, University of Michigan 2. Department of Statistics, University of Michigan Abstract In this paper,

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

M-estimation in high-dimensional linear model

M-estimation in high-dimensional linear model Wang and Zhu Journal of Inequalities and Applications 208 208:225 https://doi.org/0.86/s3660-08-89-3 R E S E A R C H Open Access M-estimation in high-dimensional linear model Kai Wang and Yanling Zhu *

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

High-dimensional regression with unknown variance

High-dimensional regression with unknown variance High-dimensional regression with unknown variance Christophe Giraud Ecole Polytechnique march 2012 Setting Gaussian regression with unknown variance: Y i = f i + ε i with ε i i.i.d. N (0, σ 2 ) f = (f

More information

Inference After Variable Selection

Inference After Variable Selection Department of Mathematics, SIU Carbondale Inference After Variable Selection Lasanthi Pelawa Watagoda lasanthi@siu.edu June 12, 2017 Outline 1 Introduction 2 Inference For Ridge and Lasso 3 Variable Selection

More information

Identify Relative importance of covariates in Bayesian lasso quantile regression via new algorithm in statistical program R

Identify Relative importance of covariates in Bayesian lasso quantile regression via new algorithm in statistical program R Identify Relative importance of covariates in Bayesian lasso quantile regression via new algorithm in statistical program R Fadel Hamid Hadi Alhusseini Department of Statistics and Informatics, University

More information

Statistica Sinica Preprint No: SS R3

Statistica Sinica Preprint No: SS R3 Statistica Sinica Preprint No: SS-2015-0413.R3 Title Regularization after retention in ultrahigh dimensional linear regression models Manuscript ID SS-2015-0413.R3 URL http://www.stat.sinica.edu.tw/statistica/

More information

Lecture 25: November 27

Lecture 25: November 27 10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

Exploiting Covariate Similarity in Sparse Regression via the Pairwise Elastic Net

Exploiting Covariate Similarity in Sparse Regression via the Pairwise Elastic Net Exploiting Covariate Similarity in Sparse Regression via the Pairwise Elastic Net Alexander Lorbert, David Eis, Victoria Kostina, David M. Blei, Peter J. Ramadge Dept. of Electrical Engineering, Dept.

More information

Sparse Learning and Distributed PCA. Jianqing Fan

Sparse Learning and Distributed PCA. Jianqing Fan w/ control of statistical errors and computing resources Jianqing Fan Princeton University Coauthors Han Liu Qiang Sun Tong Zhang Dong Wang Kaizheng Wang Ziwei Zhu Outline Computational Resources and Statistical

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

Fast Regularization Paths via Coordinate Descent

Fast Regularization Paths via Coordinate Descent KDD August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. KDD August 2008

More information

Regularization and Variable Selection via the Elastic Net

Regularization and Variable Selection via the Elastic Net p. 1/1 Regularization and Variable Selection via the Elastic Net Hui Zou and Trevor Hastie Journal of Royal Statistical Society, B, 2005 Presenter: Minhua Chen, Nov. 07, 2008 p. 2/1 Agenda Introduction

More information

A Significance Test for the Lasso

A Significance Test for the Lasso A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen June 6, 2013 1 Motivation Problem: Many clinical covariates which are important to a certain medical

More information

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable

More information