Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables

1 Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables
LIB-MA, FSSM, Cadi Ayyad University (Morocco)
COMPSTAT 2010, Paris, August 22-27, 2010

2 Motivations
Work of Fan and Li (2001) and Zou and Li (2008).
Convex penalties (e.g. quadratic penalties): they make a trade-off between bias and variance, can create unnecessary bias when the true parameters are large, and cannot produce parsimonious models.
Nonconcave penalties (e.g. the SCAD penalty, Fan 1997, and the hard thresholding penalty, Antoniadis 1997).
Variable selection in high dimension (correlated variables).
Penalized likelihood framework.

3 Ideal procedure for variable selection
Unbiasedness: the resulting estimator is nearly unbiased when the true unknown parameter is large, to avoid excessive estimation bias.
Sparsity: a small coefficient is estimated as zero, to reduce model complexity.
Continuity: the resulting estimator is continuous in the data, to avoid instability in model prediction.

4 The Smoothly Clipped Absolute Deviation (SCAD) Penalty
The SCAD penalty, denoted $J_\lambda(\cdot)$, satisfies all three requirements (unbiasedness, sparsity, continuity). It is defined by $J_\lambda(0) = 0$ and, for $|\beta_j| > 0$, by its derivative
$$J'_\lambda(|\beta_j|) = \lambda\, I(|\beta_j| \le \lambda) + \frac{(a\lambda - |\beta_j|)_+}{a - 1}\, I(|\beta_j| > \lambda), \tag{1}$$
where $(z)_+ = \max(z, 0)$, $a > 2$ and $\lambda > 0$. SCAD possesses the oracle properties.
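As a minimal sketch of equation (1), the derivative of the SCAD penalty and the penalty obtained by integrating it from zero can be written in NumPy as follows. The function names and the default a = 3.7 (the value suggested by Fan and Li, 2001) are assumptions for illustration and do not appear on the slides.

```python
import numpy as np

def scad_derivative(beta, lam, a=3.7):
    """J'_lambda(|beta|) from equation (1); a = 3.7 is an assumed default."""
    t = np.abs(np.asarray(beta, dtype=float))
    flat_part = lam * (t <= lam)                                  # lambda on [0, lambda]
    tail_part = np.maximum(a * lam - t, 0.0) / (a - 1.0) * (t > lam)  # decays to 0 at a*lambda
    return flat_part + tail_part

def scad_penalty(beta, lam, a=3.7):
    """J_lambda(|beta|), the integral of the derivative above with J_lambda(0) = 0."""
    t = np.abs(np.asarray(beta, dtype=float))
    small = lam * t                                               # |beta| <= lambda
    middle = (2.0 * a * lam * t - t**2 - lam**2) / (2.0 * (a - 1.0))  # lambda < |beta| <= a*lambda
    large = lam**2 * (a + 1.0) / 2.0                              # |beta| > a*lambda (constant)
    return np.where(t <= lam, small, np.where(t <= a * lam, middle, large))
```

With lam = 1, the derivative equals 1 at |beta| = 0.5, about 0.63 at |beta| = 2, and 0 at |beta| = 5, which is how SCAD behaves like the lasso near zero and leaves large coefficients unpenalized at the margin.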

5 Generalities
Let $(x_i, y_i)$, $i = 1, \dots, n$, be an i.i.d. sample with $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$. The conditional log-likelihood given $x_i$ is
$$\ell_i(\beta) = \ell_i(\beta, \phi) = \ell_i(x_i^T \beta, y_i, \phi), \tag{2}$$
where $\phi$ is the dispersion parameter, assumed known. We want to estimate $\beta$ by maximizing
$$\mathrm{Pl}(\beta) = \sum_{i=1}^{n} \ell_i(\beta) - n \sum_{j=1}^{p} J_\lambda(|\beta_j|). \tag{3}$$

6 The penalized likelihood is nonconcave and nondifferentiable, which complicates the maximization problem.
Alternative: approximate the SCAD penalty by convex functions, which leads to iterative algorithms.
LQA Algorithm: Fan and Li (2001)
$$\beta^{(k+1)} = \arg\max_{\beta} \left\{ \sum_{i=1}^{n} \ell_i(\beta) - n \sum_{j=1}^{p} \frac{J'_\lambda(|\beta_j^{(k)}|)}{2\,|\beta_j^{(k)}|}\, \beta_j^2 \right\}. \tag{4}$$
When $|\beta_j^{(k)}| < \epsilon_0$, set $\hat\beta_j = 0$.
Two drawbacks: the choice of $\epsilon_0$ and the definitive exclusion of variables.
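The LQA iteration (4) can be sketched for one special case. Assuming a Gaussian linear model with known unit dispersion, so that the log-likelihood is -0.5 * ||y - X beta||^2 up to a constant, each step reduces to a ridge-type linear solve. The function names, the least-squares starting value, the default a = 3.7 and the stopping rule are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def scad_derivative(beta, lam, a=3.7):
    """J'_lambda(|beta|) from equation (1)."""
    t = np.abs(beta)
    return lam * (t <= lam) + np.maximum(a * lam - t, 0.0) / (a - 1.0) * (t > lam)

def lqa_least_squares(X, y, lam, a=3.7, eps0=1e-4, max_iter=100, tol=1e-8):
    """LQA iterations (4) for the Gaussian linear model (assumed special case)."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares start (the MLE here)
    active = np.ones(p, dtype=bool)
    for _ in range(max_iter):
        # Coefficients below eps0 are set to zero and never re-enter the model.
        active &= np.abs(beta) >= eps0
        beta[~active] = 0.0
        if not active.any():
            break
        Xa = X[:, active]
        # Quadratic weights J'(|b_j|) / |b_j| evaluated at the current iterate.
        d = scad_derivative(beta[active], lam, a) / np.abs(beta[active])
        new = np.linalg.solve(Xa.T @ Xa + n * np.diag(d), Xa.T @ y)
        change = np.max(np.abs(new - beta[active]))
        beta[active] = new
        if change < tol:
            break
    beta[np.abs(beta) < eps0] = 0.0               # final thresholding
    return beta
```

The `active` mask makes the second drawback visible: once a coefficient falls below `eps0` it is excluded for all later iterations.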

7 LLA Algorithm: Zou and Li (2008)
$$\beta^{(k+1)} = \arg\max_{\beta} \left\{ \sum_{i=1}^{n} \ell_i(\beta) - n \sum_{j=1}^{p} J'_\lambda(|\beta_j^{(k)}|)\, |\beta_j| \right\}. \tag{5}$$
The one-step LLA estimates are as good as those obtained from the fully iterative LLA.
The well-known LARS algorithm is used to compute the solution. Therefore, as with the LASSO (Tibshirani, 1996), there is a selection problem in the case p >> n.
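A single LLA step (5) is a weighted lasso problem. The sketch below again assumes the Gaussian linear model with unit dispersion and starts from the least-squares estimate. The slides compute the solution with LARS; here a plain coordinate-descent solver is swapped in to keep the example self-contained, and all function names are hypothetical.

```python
import numpy as np

def scad_derivative(beta, lam, a=3.7):
    """J'_lambda(|beta|) from equation (1)."""
    t = np.abs(beta)
    return lam * (t <= lam) + np.maximum(a * lam - t, 0.0) / (a - 1.0) * (t > lam)

def soft_threshold(z, t):
    """Soft-thresholding operator sign(z) * (|z| - t)_+."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso_cd(X, y, thresholds, n_sweeps=200, tol=1e-10):
    """Minimize 0.5 * ||y - X b||^2 + sum_j thresholds[j] * |b_j| by coordinate descent.

    Assumes every column of X is nonzero.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = np.asarray(y, dtype=float).copy()     # residual y - X beta for beta = 0
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            z = X[:, j] @ resid + col_sq[j] * old  # correlation with the partial residual
            new = soft_threshold(z, thresholds[j]) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta

def one_step_lla(X, y, lam, a=3.7):
    """One LLA step (5): lasso with weights J'_lambda(|beta0_j|), beta0 = least squares."""
    n = X.shape[0]
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
    weights = scad_derivative(beta0, lam, a)
    return weighted_lasso_cd(X, y, n * weights)
```

Coefficients whose initial estimate exceeds a * lam receive weight zero and are left unpenalized, while small initial estimates receive the full weight lam and are typically set exactly to zero.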

8 Our contribution: MLLQA Algorithm
$$\beta^{(k+1)} = \arg\max_{\beta} \left\{ \sum_{i=1}^{n} \ell_i(\beta) - n \sum_{j=1}^{p} \omega_j^{1}\, |\beta_j| - \frac{n}{2} \sum_{j=1}^{p} \omega_{j,\tau}^{2}\, \beta_j^2 \right\}. \tag{6}$$
Here $\omega_j^{1}$ and $\omega_{j,\tau}^{2}$ depend on $J'_\lambda(|\beta_j^{(0)}|)$, $|\beta_j^{(0)}|$ and possibly $\tau > 0$, and $\beta^{(0)}$ is the maximum likelihood estimator.
The second term performs selection. The third one guarantees a grouping effect, as with the elastic net (Zou and Hastie, 2005).
For convergence, we prove that MLLQA is an instance of the MM algorithms (Hunter and Li, 2005).

9 Augmented data problem
We show that solving problem (6) is equivalent to finding
$$\hat\beta = \arg\min_{\beta} \left\{ \frac{1}{2}\, \| Y^* - X^* \beta \|^2 + n \sum_{j=1}^{p} \omega_j^{1}\, |\beta_j| \right\}, \tag{7}$$
where $Y^* \in \mathbb{R}^{n+p}$, $X^*$ is of dimension $(n + p) \times p$, and $(Y^*, X^*)$ depend on the data $(Y, X)$.
Proposition: Solving problem (3) via the one-step MLLQA algorithm is equivalent to one-step LLA on augmented data.
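The slides do not spell out how the augmented data are built. One construction that yields the stated dimensions, assumed here for the least-squares case and modelled on the elastic net augmentation of Zou and Hastie (2005), stacks a diagonal block under X and p zeros under Y so that the quadratic penalty is absorbed into the residual sum of squares. The function name `augment_data` and the argument `omega2` (the quadratic weights) are hypothetical.

```python
import numpy as np

def augment_data(X, y, omega2):
    """Build (X_aug, y_aug) with n + p rows for quadratic weights omega2 (length p).

    Assumed construction for least squares:
    0.5 * ||y_aug - X_aug b||^2 = 0.5 * ||y - X b||^2 + (n / 2) * sum_j omega2[j] * b_j^2
    """
    n, p = X.shape
    X_aug = np.vstack([X, np.diag(np.sqrt(n * np.asarray(omega2, dtype=float)))])
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug
```

Because the augmented design has n + p rows and rank p whenever all quadratic weights are positive, a lasso-type solver applied to it is no longer limited to selecting at most n variables.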

10 Oracle and Statistical Properties of the one-step MLLQA estimator
Let $\hat\beta(\mathrm{ose})$ be the one-step estimator $\beta^{(1)}$ and $\beta_0$ the true model parameter. Assume $\beta_0 = (\beta_{01}, \dots, \beta_{0p})^T = (\beta_{10}^T, \beta_{20}^T)^T$ and $\beta_{20} = 0$. Under some regularity conditions we have the following theorem.
Theorem: If $\sqrt{n}\,\lambda_n \to \infty$ and $\lambda_n \to 0$, then $\hat\beta(\mathrm{ose})$ is
Sparse: with probability tending to 1, $\hat\beta(\mathrm{ose})_2 = 0$.
Asymptotically normal: $\sqrt{n}\,(\hat\beta(\mathrm{ose})_1 - \beta_{10}) \to N(0, I_1^{-1}(\beta_{10}))$ in distribution.
Continuity: the minimum of the function $|\beta| + J'_\lambda(|\beta|)$ must be attained at zero (Fan and Li, 2001). In the one-step case, it suffices that $J'_\lambda(|\beta|)$ be continuous for $|\beta| > 0$ to obtain the continuity of $\hat\beta(\mathrm{ose})$.

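11 Grouping effect: case of correlated variables
Assume that the response variable is centered and the predictors are standardized. If $\beta_i^{(0)} = \beta_j^{(0)} \neq 0$ for $i, j \in \{1, \dots, p\}$, we then have:
1. $D_{\lambda,\tau,\beta^{(0)}}(i, j) \le \dfrac{|\beta_j^{(0)}| + \tau}{n\, J'_\lambda(|\beta_j^{(0)}|)}\, \sqrt{2(1 - \rho)}$
2. $x_i = x_j \Rightarrow \hat\beta_i = \hat\beta_j$
where $\rho = x_i^T x_j$ and $D_{\lambda,\tau,\beta^{(0)}}(i, j) = \dfrac{|\hat\beta_i - \hat\beta_j|}{\|Y\|_1}$.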

12 Linear Model
In this example, simulation data were generated from the linear regression model $y = x^T \beta + \varepsilon$, where $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0)^T$, $\varepsilon \sim N(0, 1)$, and $x$ follows a multivariate normal distribution with zero mean and covariance $\rho^{|i-j|}$ between its $i$th and $j$th elements, with $\rho \in \{0.5, 0.7, 0.9\}$. The sample size is set to 50 and 100. For each case the simulation was repeated 500 times.
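The data-generating process described above can be sketched as follows. The function name `simulate_linear_model` and the use of a seeded NumPy generator are assumptions made for illustration. The slides do not say how the random numbers were produced.

```python
import numpy as np

def simulate_linear_model(n, rho, rng):
    """Draw one data set (X, y) from the linear model of the simulation study."""
    beta = np.array([3, 1.5, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0], dtype=float)
    p = beta.size
    idx = np.arange(p)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])    # Cov(x_i, x_j) = rho^|i-j|
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = X @ beta + rng.standard_normal(n)               # noise epsilon ~ N(0, 1)
    return X, y, beta

# Example: one replication for n = 50 and rho = 0.5.
rng = np.random.default_rng(2010)
X, y, beta = simulate_linear_model(50, 0.5, rng)
```

Repeating the call 500 times for each pair (n, rho) with n in {50, 100} and rho in {0.5, 0.7, 0.9} would reproduce the design of the study, though not its exact random draws.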

13 n = 50
[Table: LLA and MLLQA compared for ρ = 0.5, 0.7, 0.9. Columns: Method; MRME; No. of Zeros (C, IC); Proportion of Underfit, Correctfit, Overfit. Numerical entries not recoverable.]

14 n = 100
[Table: LLA and MLLQA compared for ρ = 0.5, 0.7, 0.9. Columns: Method; MRME; No. of Zeros (C, IC); Proportion of Underfit, Correctfit, Overfit. Numerical entries not recoverable.]

15 Using a convex approximation of the SCAD penalty, we have transformed our initial problem into one-step LLA on augmented data.
This approach is suited to the high-dimensional setting (p >> n), so it allows the selection of more than n variables.
We took the one-step estimator as the final estimate because it naturally yields a sparse representation and has oracle properties.
Our approach improves on the one-step LLA results in the case p < n.

16 EFRON, B., HASTIE, T., JOHNSTONE, I. and TIBSHIRANI, R. (2004): Least angle regression. The Annals of Statistics (32).
FAN, J. and LI, R. (2001): Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association (96).
HUNTER, D. and LI, R. (2005): Variable selection using MM algorithms. The Annals of Statistics (33).
TIBSHIRANI, R. (1996): Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society (58).
ZOU, H. and HASTIE, T. (2005): Regularization and variable selection via the elastic-net. Journal of the Royal Statistical Society (67).
ZOU, H. and LI, R. (2008): One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics.
Co-authors: Mkhadri Abdallah and N'Guessan Assi

17 Thank you for your attention!
