The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010


1 Department of Biostatistics Department of Statistics University of Kentucky August 2, 2010 Joint work with Jian Huang, Shuangge Ma, and Cun-Hui Zhang

2 Penalized regression methods
Penalized methods have emerged as attractive approaches for high-dimensional regression problems. Many types of penalties have been proposed; most relevant to this talk are the lasso, ridge (L2), and minimax concave penalty (MCP). These penalties all introduce stability into high-dimensional models by penalizing large values of regression coefficients. Lasso and MCP conduct variable selection as well, by shrinking some of the coefficients all the way to zero.

3 Elastic net
These methods have drawbacks as well: lasso and MCP behave erratically in the presence of highly correlated variables, and ridge introduces considerable bias toward 0. To mitigate these drawbacks, Zou and Hastie (2005) proposed the elastic net (ENet), which combines the lasso and ridge penalties. If the balance between the two penalties is chosen well, the elastic net combines the strengths of the two approaches while minimizing their drawbacks.

4 MNet
However, several authors (Zhang 2010, Breheny 2009, Mazumder 2009) have demonstrated that the MCP has many advantages over the lasso; most importantly, it achieves the oracle property, unlike the lasso. Other authors (Zou and Zhang 2009) have shown that the shortcomings of the lasso prevent the elastic net from achieving the oracle property. Thus, we propose combining the MCP and ridge penalties, and call the resulting estimator MNet.

5 Summary
Our main findings are as follows:
Asymptotically, the MNet estimator is selection consistent and equivalent to the oracle ridge regression estimator.
Simulations and real data applications indicate that, for finite sample sizes, MNet outperforms ENet in terms of variable selection, prediction, and estimation accuracy, while producing sparser models.
MNet is available via our R package, ncvreg.

6 Lasso vs. MCP
[Figure: the lasso and MCP penalties P and their derivatives P', plotted against β.]

7 Lasso vs. MCP
[Figure: the same plots of P and P' against β, with axis ticks at λ and γλ.]
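The two penalties compared on these slides can be written out directly. The sketch below uses the standard definition of the MCP (Zhang 2010), which the slides plot but do not state, so treat the formula as an assumption about what the figure shows. The function names are hypothetical and are not part of ncvreg.

```python
# Illustrative sketch only: function names are hypothetical, and the MCP
# formula is the standard one from the literature, not quoted from the slides.

def lasso_penalty(beta, lam):
    """Lasso penalty P(beta) = lam * |beta|."""
    return lam * abs(beta)


def lasso_derivative(beta, lam):
    """Derivative of the lasso penalty with respect to |beta|: constant lam."""
    return lam


def mcp_penalty(beta, lam, gamma):
    """MCP: lam*|b| - b^2/(2*gamma) for |b| <= gamma*lam, flat at gamma*lam^2/2 beyond."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def mcp_derivative(beta, lam, gamma):
    """Derivative of the MCP with respect to |beta|: falls linearly from lam to 0 at gamma*lam."""
    return max(lam - abs(beta) / gamma, 0.0)
```

Both penalties apply the same rate of penalization λ at β = 0, but the MCP relaxes that rate to zero once |β| reaches γλ, which is why large coefficients are left unshrunk.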

8 MNet
We define the MNet estimator as the value $\hat{\beta}$ which minimizes
$$Q(\beta) = L(y, X\beta) + \sum_{j=1}^{p} M(\beta_j \mid \lambda_1, \gamma) + \lambda_2 \lVert \beta \rVert^2,$$
where $M$ is the MCP penalty. In this talk, we take $L$ to be the least squares loss function, but the method is easily extended to other loss functions.

9 Orthonormal design
The nature of this estimator can be most clearly seen in the special case of an orthonormal design matrix, where it can be expressed in closed form. Letting $z_j = x_j^\top y / n$ denote the unpenalized solution,
$$\hat{\beta}_j = \begin{cases} \dfrac{S(z_j, \lambda_1)}{1 + \lambda_2 - 1/\gamma} & \text{if } |z_j| \le \gamma\lambda_1(1 + \lambda_2), \\[1ex] \dfrac{z_j}{1 + \lambda_2} & \text{if } |z_j| > \gamma\lambda_1(1 + \lambda_2), \end{cases}$$
where $S(z_j, \lambda_1)$ is the soft-thresholding operator. Because $S(z_j, \lambda_1)$ is also the lasso solution to this problem, we can see the effect of the MCP and ridge modifications, which rescale the lasso solution in different directions.
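The closed form for the orthonormal case translates into a few lines of code. This is a minimal sketch with hypothetical function names. It assumes the scaling under which the closed form above arises, namely a loss of (1/2n)‖y − Xβ‖² and a ridge term of (λ₂/2)‖β‖², and it assumes 1 + λ₂ − 1/γ > 0 so that the denominator is positive.

```python
# Illustrative sketch only: names are hypothetical, and the positivity check
# on 1 + lam2 - 1/gamma is an added assumption, not stated on the slide.

def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mnet_orthonormal(z, lam1, lam2, gamma):
    """Closed-form MNet solution for one coefficient under an orthonormal design."""
    denom = 1.0 + lam2 - 1.0 / gamma
    if denom <= 0.0:
        raise ValueError("requires 1 + lam2 - 1/gamma > 0")
    if abs(z) <= gamma * lam1 * (1.0 + lam2):
        # MCP region: the lasso solution, rescaled by 1 / (1 + lam2 - 1/gamma)
        return soft_threshold(z, lam1) / denom
    # Beyond the MCP region: pure ridge shrinkage of the unpenalized solution
    return z / (1.0 + lam2)
```

The two branches agree at the boundary |z| = γλ₁(1 + λ₂), so the solution is continuous in z. Setting λ₂ = 0 recovers the usual MCP thresholding rule.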

10 Grouping effect
Like the ENet, the MNet produces a grouping effect, meaning that it tends to select (or drop) strongly correlated features together as a group. This can be expressed formally as follows.
Proposition. Letting $\rho_{jk}$ be the sample correlation between $x_j$ and $x_k$,
$$|\hat{\beta}_j - \hat{\beta}_k| \le \zeta \sqrt{1 - \rho_{jk}} \quad \text{for } \rho_{jk} \ge 0,$$
$$|\hat{\beta}_j + \hat{\beta}_k| \le \zeta \sqrt{1 + \rho_{jk}} \quad \text{for } \rho_{jk} < 0,$$
where $\zeta$ is an expression involving $\lambda_2$, $\gamma$, and RSS/n. Thus, the difference between two coefficients is bounded by a quantity determined by the correlation between their respective features.

11 Computation
MNet models can be fit efficiently using coordinate descent algorithms, which minimize the objective function with respect to one covariate at a time until convergence. The MNet objective function is convex in each coordinate direction, allowing the following proposition to be established.
Proposition. Let $\{\beta^{(m)}\}$ denote the sequence of coefficients produced at each iteration. For all m = 0, 1, 2, ...,
$$Q(\beta^{(m+1)}) \le Q(\beta^{(m)}).$$
Furthermore, the sequence is guaranteed to converge to a point that is both a local minimum and a global coordinate-wise minimum of Q.
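A coordinate descent pass of the kind described here can be sketched in plain Python. This is not the ncvreg implementation. It is an illustration under stated assumptions: the columns of X are standardized so that x_j'x_j / n = 1, the loss is (1/2n)‖y − Xβ‖², the ridge term is (λ₂/2)‖β‖², and 1 + λ₂ − 1/γ > 0 so that each one-dimensional problem is convex. All function names are hypothetical.

```python
# Illustrative sketch only, not the ncvreg implementation. Assumes standardized
# columns (sum of squares equal to n) and 1 + lam2 - 1/gamma > 0.

def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mnet_update(z, lam1, lam2, gamma):
    """Exact minimizer of the one-dimensional MNet problem given z."""
    denom = 1.0 + lam2 - 1.0 / gamma
    if denom <= 0.0:
        raise ValueError("requires 1 + lam2 - 1/gamma > 0")
    if abs(z) <= gamma * lam1 * (1.0 + lam2):
        return soft_threshold(z, lam1) / denom
    return z / (1.0 + lam2)


def objective(X, y, beta, lam1, lam2, gamma):
    """Q(beta) = (1/2n)||y - X beta||^2 + sum_j MCP(beta_j) + (lam2/2)||beta||^2."""
    n = len(y)
    loss = 0.0
    for i in range(n):
        r = y[i] - sum(X[i][j] * beta[j] for j in range(len(beta)))
        loss += r * r
    total = loss / (2.0 * n)
    for b in beta:
        a = abs(b)
        if a <= gamma * lam1:
            total += lam1 * a - a * a / (2.0 * gamma)
        else:
            total += 0.5 * gamma * lam1 * lam1
        total += 0.5 * lam2 * b * b
    return total


def mnet_coordinate_descent(X, y, lam1, lam2, gamma, max_iter=1000, tol=1e-8):
    """Cycle through coordinates until the largest coefficient change falls below tol."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta; beta starts at zero
    history = [objective(X, y, beta, lam1, lam2, gamma)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # z_j is the unpenalized solution for coordinate j given the others
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = mnet_update(z, lam1, lam2, gamma)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        history.append(objective(X, y, beta, lam1, lam2, gamma))
        if max_change < tol:
            break
    return beta, history
```

The returned `history` holds the objective value after each full sweep, so the monotone decrease stated in the proposition can be checked directly on any data set that satisfies the assumptions above.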

12 Conditions
MNet possesses attractive theoretical properties under suitable regularity conditions. The most important of these conditions are:
Error terms $\{\varepsilon_i\}$ are iid with $E\varepsilon = 0$, $V\varepsilon < \infty$, and $P(|\varepsilon| > x) \le K \exp\{-C x^{\alpha}\}$, with $\alpha \ge 1$.
The sparse Riesz condition: letting $X_A = (x_j : j \in A)$, we require that the eigenvalues of $n^{-1} X_A^\top X_A$ are bounded away from zero on a lower dimensional subset of the parameter space with dimensionality $d > d_0$, the dimensionality of the true model.

13 Main theoretical result
Let $\hat{\beta}^{O}$ denote the oracle ridge estimator, and let $\beta_0$ denote the true value of the regression coefficients.
Theorem. Suppose the aforementioned regularity conditions are satisfied; then as $n \to \infty$, the following two results hold:
(i) $P\{\operatorname{sign}(\hat{\beta}) = \operatorname{sign}(\beta_0)\} \to 1$
(ii) $P\{\hat{\beta} = \hat{\beta}^{O}\} \to 1$
Note that this theorem does not require that p < n.

14 Simulation results: Estimation
[Figure: relative MSE of MNet and ENet plotted against β, in panels by p_1 (8 and 32) and by ρ.]

15 Simulation results: Selection
[Figure: FDR of MNet and ENet plotted against β, in panels by p_1 (8 and 32) and by ρ (0, 0.3, and 0.8).]

16 Rat eye expression data
We applied MNet and ENet to an eQTL study of the gene TRIM32, the expression of which has been linked to Bardet-Biedl syndrome.
[Table: model size, CV error, and average magnitude for ENet and MNet; values not recoverable.]
[...] genes were selected in common between the two methods.

17 Conclusions
The elastic net and MCP have both been shown to be improvements over the lasso; the MNet unifies these two advances, and has many attractive properties:
Selection consistency
Oracle estimation
On real and simulated data sets, MNet produces models that are more sparse and less biased toward zero
Computationally efficient, even for large p
Software available (ncvreg)
