SUPPLEMENTARY MATERIAL FOR: ADAPTIVE ROBUST VARIABLE SELECTION


By Jianqing Fan, Yingying Fan and Emre Barut

Princeton University, University of Southern California and IBM T.J. Watson Research Center

APPENDIX A: ADDITIONAL PROOFS

A.1. Proof of Theorem 3. This proof is motivated by Pollard (1991). Throughout the proof we use $C > 0$ to denote a generic positive constant. Let $\theta = V_n^{-1}(\beta_1 - \beta_1^*)$, so that $\beta_1 = \beta_1^* + V_n\theta$. Letting $G_n(\theta) = L_n(\beta_1, 0) - L_n(\beta_1^*, 0)$, we have

(A.1)  $G_n(\theta) = \|\rho_\tau(\varepsilon - Z_n\theta)\|_1 - \|\rho_\tau(\varepsilon)\|_1 + n\lambda_n\big(\|d_0 \circ (\beta_1^* + V_n\theta)\|_1 - \|d_0 \circ \beta_1^*\|_1\big),$

where we have used the shorthand notation $\|\rho_\tau(u)\|_1 = \sum_{i=1}^n \rho_\tau(u_i)$ for any vector $u = (u_1, \ldots, u_n)^T$. Since $L_n(\beta_1, 0)$ is minimized at $\beta_1 = \hat\beta_1^o$, it follows that $G_n(\theta)$ is minimized at $\hat\theta_n = V_n^{-1}(\hat\beta_1^o - \beta_1^*)$. We consider $\theta$ over the convex open set $B_n^0 = \{\theta \in \mathbb{R}^s : \|\theta\|_2 < c_6\sqrt{s}\}$, with some constant $c_6 > 0$ independent of $s$. The idea of the proof is to approximate the stochastic function $G_n(\theta)$ by a quadratic function whose minimizer possesses the desired asymptotic normality. Since $G_n(\theta)$ and the quadratic approximation are close, the minimizer of $G_n(\theta)$ enjoys the same asymptotic normality.

Now we proceed to prove Theorem 3. Decompose $G_n(\theta)$ into its mean and its centered stochastic component:

(A.2)  $G_n(\theta) = Q_n(\theta) + T_n(\theta),$

where $Q_n(\theta) = E[G_n(\theta)]$ and

(A.3)  $T_n(\theta) = \|\rho_\tau(\varepsilon - Z_n\theta)\|_1 - \|\rho_\tau(\varepsilon)\|_1 - E\big[\|\rho_\tau(\varepsilon - Z_n\theta)\|_1 - \|\rho_\tau(\varepsilon)\|_1\big].$
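For the reader's convenience we recall the check function behind $L_n$ and the identity that drives the expansion (A.4) below. This display is an editorial restatement of standard facts about quantile loss, not part of the original supplement.

```latex
% Check function and its a.e. derivative:
\rho_\tau(u) = u\bigl(\tau - \mathbf{1}\{u < 0\}\bigr), \qquad
\rho_\tau'(u) = \tau - \mathbf{1}\{u < 0\} \quad (u \neq 0),
% so E[\rho_\tau'(\varepsilon_i)] = 0 and Var[\rho_\tau'(\varepsilon_i)] = \tau(1-\tau)
% when \varepsilon_i has \tau-quantile zero. Knight's identity: for all u, v,
\rho_\tau(u - v) - \rho_\tau(u)
  = -v\,\rho_\tau'(u) + \int_0^v \bigl(\mathbf{1}\{u \le t\} - \mathbf{1}\{u \le 0\}\bigr)\,dt,
% whose expectation \int_0^v (F_i(t) - F_i(0))\,dt = (1/2) f_i(0) v^2 + O(|v|^3)
% yields the quadratic mean behavior used in (A.4).
```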

We first deal with the mean component $Q_n(\theta)$. Since $\|\theta\|_2 < c_6\sqrt{s}$ over the set $B_n^0$, it follows from the Cauchy-Schwarz inequality and the assumption of the theorem that

$\|H^{1/2}Z_n\theta\|_\infty \le \|\theta\|_2 \max_i \|f_i(0)^{1/2}Z_{n,i}\|_2 = o\big((s^3\log s)^{-1}\big).$

Then, since $f_i(u)$ is bounded in a small neighborhood of 0, by using an argument similar to that for equation (7.3) of the main paper and noting that $\sum_{i=1}^n f_i(0)(Z_{n,i}^T\theta)^2 = \theta^T Z_n^T H Z_n\theta = \|\theta\|_2^2$, we can show that

(A.4)  $E\big[\|\rho_\tau(\varepsilon - Z_n\theta)\|_1 - \|\rho_\tau(\varepsilon)\|_1\big] = \frac{1}{2}\|\theta\|_2^2 + O\Big(\sum_{i=1}^n f_i(0)|Z_{n,i}^T\theta|^3\Big).$

Furthermore, since $\sum_{i=1}^n f_i(0)(Z_{n,i}^T\theta)^2 = \|\theta\|_2^2 < c_6^2 s$, it follows that

(A.5)  $\sum_{i=1}^n f_i(0)|Z_{n,i}^T\theta|^3 \le C\|H^{1/2}Z_n\theta\|_\infty \sum_{i=1}^n f_i(0)(Z_{n,i}^T\theta)^2 = o\big((s^2\log s)^{-1}\big).$

Next, we deal with the penalty term in the expected value $Q_n(\theta)$. Since $\frac{1}{n}S^THS$ has bounded eigenvalues by Condition 2, it follows from the assumption of the theorem that, for any $\theta \in B_n^0$,

$\|V_n\theta\|_\infty \le \|V_n\theta\|_2 \le Cn^{-1/2}\|\theta\|_2 = o\big(\min_{1\le j\le s}|\beta_j^*|\big).$

Hence $\mathrm{sgn}(\beta_1^* + V_n\theta) = \mathrm{sgn}(\beta_1^*)$ and

(A.6)  $\|d_0 \circ (\beta_1^* + V_n\theta)\|_1 - \|d_0 \circ \beta_1^*\|_1 = \tilde d_0^T V_n\theta,$

where $\tilde d_0$ is the $s$-vector with $j$th component $d_j\,\mathrm{sgn}(\beta_j^*)$. Combining (A.4)-(A.6) yields

(A.7)  $Q_n(\theta) = \frac{1}{2}\|\theta\|_2^2 + n\lambda_n \tilde d_0^T V_n\theta + o(1),$

where the $o(1)$ term is uniform over all $\theta \in B_n^0$.

We now deal with the stochastic part $T_n(\theta)$. Define $D = (\rho_\tau'(\varepsilon_1), \ldots, \rho_\tau'(\varepsilon_n))^T$, $W_n = -Z_n^T D$, and

$R_n(\theta) = \|\rho_\tau(\varepsilon - Z_n\theta)\|_1 - \|\rho_\tau(\varepsilon)\|_1 - W_n^T\theta.$
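As an editorial aside, the definitions of $W_n$ and $R_n(\theta)$ just given encode the following pointwise expansion, with $R_{n,i}(\theta)$ as in Lemma 3 below:

```latex
% For each i, with R_{n,i}(\theta) as in Lemma 3:
\rho_\tau(\varepsilon_i - Z_{n,i}^T\theta)
  = \rho_\tau(\varepsilon_i) - \rho_\tau'(\varepsilon_i)\,Z_{n,i}^T\theta + R_{n,i}(\theta).
% Summing over i and using W_n = -Z_n^T D gives
\|\rho_\tau(\varepsilon - Z_n\theta)\|_1 - \|\rho_\tau(\varepsilon)\|_1
  = W_n^T\theta + R_n(\theta),
% i.e., the linear (first-order) term plus the remainder controlled in Lemma 3.
```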

Then $E[W_n^T\theta] = 0$ and

(A.8)  $T_n(\theta) = W_n^T\theta + r_n(\theta)$, where $r_n(\theta) = R_n(\theta) - E[R_n(\theta)]$.

Here $W_n^T\theta$ can be regarded as the first-order approximation of $\|\rho_\tau(\varepsilon - Z_n\theta)\|_1 - \|\rho_\tau(\varepsilon)\|_1$. We next show that $r_n(\theta)$ is uniformly small. By Lemma 3, there exists a diverging sequence $b_n$ such that for any $\epsilon > 0$,

(A.9)  $P\big(|r_n(\theta)| \ge \epsilon\big) \le \exp(-C\epsilon b_n s^2\log s).$

Define $\lambda_n(\theta) = G_n(\theta) - n\lambda_n \tilde d_0^T V_n\theta - W_n^T\theta$ and $\lambda(\theta) = \frac{1}{2}\|\theta\|_2^2$. Then by definition (A.1), $\lambda_n(\theta)$ and $\lambda(\theta)$ are both convex functions on the set $B_n^0$. Furthermore, by definition we can write $r_n(\theta) = \lambda_n(\theta) - \lambda(\theta) - o(1)$. Since $\|\theta\|_2 < c_6\sqrt{s}$ for all $\theta \in B_n^0$, we have that for any $\theta_1, \theta_2 \in B_n^0$,

$|\lambda(\theta_1) - \lambda(\theta_2)| = \tfrac12\big|(\theta_1 + \theta_2)^T(\theta_1 - \theta_2)\big| \le \tfrac12\|\theta_1 + \theta_2\|_2\|\theta_1 - \theta_2\|_2 \le C\sqrt{s}\,\|\theta_1 - \theta_2\|_2.$

Thus the above result and (A.9) show that the conditions of Lemma 4 are satisfied. Then, for any compact set $K_s = \{\|\theta\|_2 \le c_4\sqrt{s}\} \subset B_n^0$ with some constant $0 < c_4 < c_6$,

(A.10)  $\sup_{\theta \in K_s} |r_n(\theta)| = o_p(1).$

Combining (A.2), (A.7) and (A.8), we can write

(A.11)  $G_n(\theta) = \frac{1}{2}\|\theta\|_2^2 + n\lambda_n \tilde d_0^T V_n\theta + W_n^T\theta + r_n(\theta) + o(1)$

(A.12)  $= \frac{1}{2}\|\theta - \eta_n\|_2^2 - \frac{1}{2}\|\eta_n\|_2^2 + r_n(\theta) + o(1),$

where $\eta_n = -(n\lambda_n V_n \tilde d_0 + W_n)$. By a classical weak convergence result, it is easy to see that $c^T(Z_n^TZ_n)^{-1/2}W_n \xrightarrow{D} N(0, \tau(1-\tau))$ for any $c \in \mathbb{R}^s$ satisfying $c^Tc = 1$. It follows immediately that

(A.13)  $c^T(Z_n^TZ_n)^{-1/2}\big(\eta_n + n\lambda_n V_n \tilde d_0\big) \xrightarrow{D} N(0, \tau(1-\tau)).$
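The passage from (A.11) to (A.12) is completing the square; spelled out for convenience (editorial):

```latex
% Write c = n\lambda_n V_n \tilde d_0 + W_n, so that \eta_n = -c. Then
\tfrac12\|\theta\|_2^2 + c^T\theta
  = \tfrac12\|\theta + c\|_2^2 - \tfrac12\|c\|_2^2
  = \tfrac12\|\theta - \eta_n\|_2^2 - \tfrac12\|\eta_n\|_2^2,
% so, up to r_n(\theta) + o(1), G_n is a quadratic minimized exactly at \theta = \eta_n.
```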

It remains to show that the minimizer $\hat\theta_n$ of $G_n(\theta)$ is close to $\eta_n$; that is, for any $\epsilon > 0$,

(A.14)  $P\big(\|\hat\theta_n - \eta_n\|_2 \ge \epsilon\big) \to 0.$

Theorem 3 will then follow from (A.13) and Slutsky's lemma.

We now proceed to prove (A.14). First, let $B_n^1$ be the ball with center $\eta_n$ and radius $\epsilon$. Since $c^T(Z_n^TZ_n)^{-1/2}W_n$ is asymptotically $N(0, \tau(1-\tau))$ for any $c \in \mathbb{R}^s$ with $c^Tc = 1$, and $nV_nV_n^T = (\frac{1}{n}S^THS)^{-1}$ has bounded eigenvalues, we can bound $\eta_n$ as

(A.15)  $\|\eta_n\|_2 \le \|W_n\|_2 + n\lambda_n\|V_n \tilde d_0\|_2 \le O_p(\sqrt{s}) + C\lambda_n\sqrt{n}\,\|\tilde d_0\|_2 = C\sqrt{s}\,O_p(1),$

where the last step is by the assumption $\lambda_n\sqrt{n}\,\|\tilde d_0\|_2 = O(\sqrt{s})$ of the theorem. Since $c_6$ in the definition of $B_n^0$ can be chosen much larger than $C$, it follows that the compact set $K_s = \{\|\theta\|_2 \le c_4\sqrt{s}\} \subset B_n^0$, with $c_4$ large enough, covers the ball $B_n^1$ with probability arbitrarily close to 1. Therefore, by (A.10),

(A.16)  $\sup_{\theta \in B_n^1} |r_n(\theta)| \le \sup_{\theta \in K_s} |r_n(\theta)| = o_p(1).$

Now we are ready to prove (A.14). Consider the behavior of $G_n(\theta)$ outside the ball $B_n^1$. Let $\theta = \eta_n + \kappa u \in \mathbb{R}^s$ be a vector outside $B_n^1$, where $u \in \mathbb{R}^s$ is a unit vector and $\kappa > \epsilon$ is a constant, with $\epsilon$ the radius of $B_n^1$. Define $\theta^*$ as the boundary point of $B_n^1$ that lies on the line segment connecting $\eta_n$ and $\theta$, so that $\theta^* = \eta_n + \epsilon u = (1 - \epsilon/\kappa)\eta_n + (\epsilon/\kappa)\theta$. By the convexity of $G_n$, (A.12) and (A.16),

$\frac{\epsilon}{\kappa}G_n(\theta) + \Big(1 - \frac{\epsilon}{\kappa}\Big)G_n(\eta_n) \ge G_n(\theta^*) \ge \frac{\epsilon^2}{2} + G_n(\eta_n) - 2\Delta_n,$

where $\Delta_n = \sup_{\theta \in B_n^1}|r_n(\theta)| + o(1) = o_p(1)$. Since $\epsilon < \kappa$, it follows that for large enough $n$,

(A.17)  $\inf_{\{\|\theta - \eta_n\|_2 > \epsilon\}} G_n(\theta) \ge G_n(\eta_n) + \frac{\kappa}{\epsilon}\Big[\frac{\epsilon^2}{2} - o_p(1)\Big] > G_n(\eta_n)$

with probability tending to 1. Since $\hat\theta_n$ minimizes $G_n(\theta)$, this establishes (A.14) and proves Theorem 3.
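The conclusion of Theorem 3 can be checked numerically in a toy setting. The sketch below is editorial, not the authors' code: it takes $s$ fixed, $\lambda_n = 0$ (so $\eta_n = -W_n$), i.i.d. standard normal errors (hence $\tau = 0.5$ and $f_i(0) = 1/\sqrt{2\pi}$), and uses statsmodels' QuantReg for the oracle median regression fit.

```python
# Editorial toy check of Theorem 3: the standardized oracle median-regression
# estimator c^T (Z_n^T Z_n)^{-1/2} V_n^{-1} (bhat - beta*) should have variance
# close to tau * (1 - tau), as in (A.13) with lambda_n = 0.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def inv_sqrt(M):
    """Symmetric inverse square root via an eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return U @ np.diag(w ** -0.5) @ U.T

rng = np.random.default_rng(0)
n, s, tau, reps = 400, 3, 0.5, 500
beta_star = np.array([1.0, -2.0, 0.5])
S = rng.normal(size=(n, s))
f0 = 1.0 / np.sqrt(2.0 * np.pi)      # N(0,1) density at its median
Vn = inv_sqrt(f0 * (S.T @ S))        # V_n = (S^T H S)^{-1/2} with H = f0 * I_n
Zn = S @ Vn                          # then Z_n^T H Z_n = I_s
A = inv_sqrt(Zn.T @ Zn)              # (Z_n^T Z_n)^{-1/2}, as in (A.13)
c = np.ones(s) / np.sqrt(s)          # a fixed unit vector c

tstats = []
for _ in range(reps):
    y = S @ beta_star + rng.normal(size=n)   # errors with tau-quantile 0
    bhat = QuantReg(y, S).fit(q=tau).params  # oracle fit on the true support
    theta_hat = np.linalg.solve(Vn, bhat - beta_star)
    tstats.append(c @ A @ theta_hat)

# Empirical variance should be comparable to tau * (1 - tau) = 0.25:
print(np.var(tstats), tau * (1 - tau))
```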

A.2. Proof of Theorem 5. The proof of Theorem 5 follows from those of Theorems 3 and 4. We use $C$ to denote a generic constant throughout. By Theorem 4, with asymptotic probability one there exists a global minimizer $\hat\beta = (\hat\beta_1^T, 0^T)^T$ of $\tilde L_n(\beta)$ with $\|\hat\beta_1 - \beta_1^*\|_2 \le a_n$. Next we study the asymptotic normality of $\hat\beta_1$.

Following (A.1) in the proof of Theorem 3, define $\tilde G_n(\theta) = \tilde L_n(\beta_1^* + V_n\theta, 0) - \tilde L_n(\beta_1^*, 0)$, where $V_n$ and $\theta$ are the same as in the proof of Theorem 3. Then $\tilde\theta_n = V_n^{-1}(\hat\beta_1 - \beta_1^*)$ is a global minimizer of $\tilde G_n(\theta)$. The idea of the proof is to show that $G_n(\theta)$ in (A.1) and $\tilde G_n(\theta)$ are uniformly close to each other. Since $G_n(\theta)$ can be well approximated by a sequence of quadratic functions, $\tilde G_n(\theta)$ can be well approximated by the same sequence, and thus the minimizer of $\tilde G_n(\theta)$ enjoys the same asymptotic properties as that of $G_n(\theta)$.

We now proceed to prove that $G_n(\theta)$ and $\tilde G_n(\theta)$ are uniformly close. To this end, first note that for any $\beta_1$ with $\|\beta_1\|_2 < C\sqrt{s}$,

(A.18)  $|\tilde L_n(\beta_1, 0) - L_n(\beta_1, 0)| \le n\lambda_n\|\beta_1\|_2\,\|\hat d_0 - d_0\|_2 \le Cn\lambda_n\sqrt{s}\,\|\hat d_0 - d_0\|_2.$

For $1 \le j \le s$, by the mean-value theorem,

(A.19)  $d_j - \hat d_j = p_{\lambda_n}'(|\beta_j^*|) - p_{\lambda_n}'(|\hat\beta_j^{ini}|) = p_{\lambda_n}''(|\tilde\beta_j|)\big(|\beta_j^*| - |\hat\beta_j^{ini}|\big),$

where $|\tilde\beta_j|$ lies on the segment connecting $|\beta_j^*|$ and $|\hat\beta_j^{ini}|$. By Condition 5 and the triangle inequality, with asymptotic probability one,

$|\tilde\beta_j| \ge |\beta_j^*| - \big|\hat\beta_j^{ini} - \beta_j^*\big| > |\beta_j^*| - C_2\sqrt{s(\log p)/n} > \tfrac12\min_{1\le j\le s}|\beta_j^*|.$

This together with Condition 6 ensures that $p_{\lambda_n}''(|\tilde\beta_j|) = o_p\big(s^{-1}\lambda_n^{-1}(n\log p)^{-1/2}\big)$. Thus, in view of (A.19),

$\|\hat d_0 - d_0\|_2 \le o_p\big(s^{-1}\lambda_n^{-1}(n\log p)^{-1/2}\big)\,\|\hat\beta_1^{ini} - \beta_1^*\|_2 = o_p\big(s^{-1/2}\lambda_n^{-1}n^{-1}\big).$

Since $\|\theta\|_2 \le C\sqrt{s}$ ensures $\|\beta_1\|_2 \le C\sqrt{s}$ for $\beta_1 = \beta_1^* + V_n\theta$, the above inequality combined with (A.18) entails that

$\sup_{\theta \in B_n^0} |\tilde G_n(\theta) - G_n(\theta)| \le 2\sup_{\|\beta_1\|_2 \le C\sqrt{s}} |\tilde L_n(\beta_1, 0) - L_n(\beta_1, 0)| = o_p(1),$

where $B_n^0$ is defined in the proof of Theorem 3.
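For completeness, the rate bookkeeping behind the last two displays, assuming (as in the proof) that the initial estimator satisfies $\|\hat\beta_1^{ini} - \beta_1^*\|_2 = O_p(\sqrt{s\log p/n})$ (editorial):

```latex
\|\hat d_0 - d_0\|_2
  \le o_p\bigl(s^{-1}\lambda_n^{-1}(n\log p)^{-1/2}\bigr)\cdot O_p\bigl(\sqrt{s\log p/n}\bigr)
  = o_p\bigl(s^{-1/2}\lambda_n^{-1}n^{-1}\bigr),
% so the penalty discrepancy in (A.18) satisfies
C n\lambda_n\sqrt{s}\,\|\hat d_0 - d_0\|_2
  = C n\lambda_n\sqrt{s}\cdot o_p\bigl(s^{-1/2}\lambda_n^{-1}n^{-1}\bigr)
  = o_p(1).
```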

Therefore, by the above result and (A.17), for any $\epsilon > 0$ arbitrarily small,

$\inf_{\{\|\theta - \eta_n\|_2 > \epsilon\}} \tilde G_n(\theta) \ge \inf_{\{\|\theta - \eta_n\|_2 > \epsilon\}} G_n(\theta) - \sup_{\theta \in B_n^0}|\tilde G_n(\theta) - G_n(\theta)| \ge G_n(\eta_n) + \frac{\kappa}{\epsilon}\Big[\frac{\epsilon^2}{2} - o_p(1)\Big] - o_p(1) \ge \tilde G_n(\eta_n) + \frac{\kappa}{\epsilon}\Big[\frac{\epsilon^2}{2} - o_p(1)\Big] - o_p(1).$

Then it follows immediately that the minimizer satisfies $\|\tilde\theta_n - \eta_n\|_2 \le \epsilon$ with asymptotic probability one. Thus $\tilde\theta_n - \eta_n = o_p(1)$, which completes the proof of Theorem 5.

A.3. Lemmas.

Lemma 3. Assume that the conditions of Theorem 3 hold. Let $R_{n,i}(\theta) = \rho_\tau(\varepsilon_i - Z_{n,i}^T\theta) - \rho_\tau(\varepsilon_i) + \rho_\tau'(\varepsilon_i)Z_{n,i}^T\theta$ and $R_n(\theta) = \sum_{i=1}^n R_{n,i}(\theta)$. Then for any $\epsilon > 0$,

$P\big(|R_n(\theta) - E[R_n(\theta)]| \ge \epsilon\big) \le \exp(-C\epsilon b_n s^2 \log s),$

where $b_n$ is some diverging sequence such that $b_n s^{7/2}\log s \max_i \|Z_{n,i}\|_2 \to 0$, and $C > 0$ is some constant.

Proof. Let $\xi_i = R_{n,i}(\theta) - E[R_{n,i}(\theta)]$, so that $R_n(\theta) - E[R_n(\theta)] = \sum_{i=1}^n \xi_i$. Since the $R_{n,i}(\theta)$'s are independent, by Markov's inequality we obtain that for any $\epsilon > 0$ and $t > 0$,

(A.20)  $P\big(R_n(\theta) - E[R_n(\theta)] \ge \epsilon\big) \le e^{-t\epsilon} E\Big[\exp\Big(t\sum_{i=1}^n \xi_i\Big)\Big] = \exp\Big(-t\epsilon - t\sum_{i=1}^n E[R_{n,i}(\theta)]\Big)\prod_{i=1}^n E\big[\exp(tR_{n,i}(\theta))\big].$

We next study $E[R_{n,i}(\theta)]$ and $E[\exp(tR_{n,i}(\theta))]$ in (A.20). Using an argument similar to that for (A.4), we can prove that

$E[R_{n,i}(\theta)] = E\big[\rho_\tau(\varepsilon_i - Z_{n,i}^T\theta) - \rho_\tau(\varepsilon_i)\big] = \tfrac12 f_i(0)(Z_{n,i}^T\theta)^2 + O\big(|Z_{n,i}^T\theta|^3\big),$

where the $O(\cdot)$ term is uniform over all $i$. Thus it follows from the definition of $Z_{n,i}$ that

(A.21)  $t\sum_{i=1}^n E[R_{n,i}(\theta)] = \frac{t}{2}\|\theta\|_2^2 + O\Big(t\sum_{i=1}^n |Z_{n,i}^T\theta|^3\Big).$
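Editorial note: the quadratic term in (A.21) is cancelled by the matching term in the moment generating function bound (A.22) derived next, so that only the cubic remainders survive in (A.23):

```latex
% Exponent after combining (A.20), (A.21) and (A.22):
-t\epsilon
  - \tfrac{t}{2}\|\theta\|_2^2   % from -t \sum_i E[R_{n,i}(\theta)], i.e. (A.21)
  + \tfrac{t}{2}\|\theta\|_2^2   % from \prod_i E[\exp(t R_{n,i}(\theta))], i.e. (A.22)
  + O\Bigl(t^2\sum_{i=1}^n |Z_{n,i}^T\theta|^3\Bigr)
  = -t\epsilon + O\Bigl(t^2\sum_{i=1}^n |Z_{n,i}^T\theta|^3\Bigr).
```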

Now we consider $E[\exp(tR_{n,i}(\theta))]$. If $Z_{n,i}^T\theta > 0$, then by definition

$R_{n,i}(\theta) = \big(Z_{n,i}^T\theta - \varepsilon_i\big)\,\mathbf{1}\{0 \le \varepsilon_i \le Z_{n,i}^T\theta\},$

so that $0 \le R_{n,i}(\theta) \le Z_{n,i}^T\theta$ and, for the choice of $t$ made below, $t|Z_{n,i}^T\theta| = o(1)$. By Condition 1 and a Taylor expansion of the exponential, it follows that

$E\big[\exp(tR_{n,i}(\theta))\big] \le 1 + tE[R_{n,i}(\theta)] + Ct^2 E\big[R_{n,i}(\theta)^2\big] \le 1 + \tfrac12 f_i(0)t(Z_{n,i}^T\theta)^2 + O\big(t^2|Z_{n,i}^T\theta|^3\big).$

When $Z_{n,i}^T\theta < 0$, we get the same bound using a similar argument. Since $\prod_{i=1}^n(1 + x_i) \le \exp(\sum_{i=1}^n x_i)$ for $x_i > 0$, in view of (A.5) and the above inequality we obtain

(A.22)  $\prod_{i=1}^n E\big[\exp(tR_{n,i}(\theta))\big] \le \exp\Big(\frac{t}{2}\|\theta\|_2^2 + O\Big(t^2\sum_{i=1}^n |Z_{n,i}^T\theta|^3\Big)\Big).$

Substituting (A.21) and (A.22) into (A.20) gives

(A.23)  $P\big(R_n(\theta) - E[R_n(\theta)] \ge \epsilon\big) \le \exp\Big(-t\epsilon + O\Big(t^2\sum_{i=1}^n |Z_{n,i}^T\theta|^3\Big)\Big).$

Choosing $t = 2s^2\log s\, b_n$ with $b_n$ such that $b_n s^{7/2}\log s\max_i\|Z_{n,i}\|_2 \to 0$, and using an idea similar to that for (A.5), we obtain

$t\sum_{i=1}^n |Z_{n,i}^T\theta|^3 \le Ct\max_i|Z_{n,i}^T\theta| \sum_{i=1}^n f_i(0)(Z_{n,i}^T\theta)^2 \le Cts^{3/2}\max_i\|Z_{n,i}\|_2 \to 0.$

Plugging this into (A.23) yields

$P\big(R_n(\theta) - E[R_n(\theta)] \ge \epsilon\big) \le \exp(-C\epsilon b_n s^2\log s).$

Repeating the same argument for $P\big(R_n(\theta) - E[R_n(\theta)] \le -\epsilon\big)$ completes the proof.

Lemma 4. Let $\lambda(\theta)$ be a positive function defined on the convex, open set $\Theta_s = \{\theta \in \mathbb{R}^s : \|\theta\|_2 < c_6\sqrt{s}\}$, and let $\{\lambda_n(\theta) : \theta \in \Theta_s\}$ be a sequence of random convex functions defined on $\Theta_s$, where $c_6 > 0$ is some constant. Suppose that there exists some diverging sequence $b_n$ such that for every $\theta \in \Theta_s$ the following holds for all $\epsilon > 0$:

$P\big(|\lambda_n(\theta) - \lambda(\theta)| \ge \epsilon\big) \le c_9\exp(-c_7 s^2\log s\, b_n\epsilon),$

where $c_7, c_9$ are two positive constants. Let $K_s$ be a compact set in $\mathbb{R}^s$ of the form $K_s = \{\|\theta\|_2 \le c_4\sqrt{s}\} \subset \Theta_s$, where $0 < c_4 < c_6$ is some constant. If, for some constant $c_8 > 0$, $|\lambda(\theta_1) - \lambda(\theta_2)| \le c_8\sqrt{s}\,\|\theta_1 - \theta_2\|_2$ for any $\theta_1, \theta_2 \in \Theta_s$, then

$\sup_{\theta \in K_s} |\lambda_n(\theta) - \lambda(\theta)| = o_p(1).$

Proof. The proof is an extension of the convexity lemma in Pollard (1991). The basic idea is to show that $K_s$ can be covered by a controlled number of cubes, and that $\lambda_n(\theta)$ and $\lambda(\theta)$ are uniformly close over the set of vertices of these cubes. Within each cube, the values of both $\lambda_n(\theta)$ and $\lambda(\theta)$ do not change significantly, so $\lambda_n(\theta)$ and $\lambda(\theta)$ are uniformly close over $K_s$. In this proof we use $C$ to denote a generic positive constant.

We proceed to prove the lemma. Since $|\lambda(\theta_1) - \lambda(\theta_2)| \le c_8\sqrt{s}\,\|\theta_1 - \theta_2\|_2$ for any $\theta_1, \theta_2 \in \Theta_s$, it follows that for a fixed $\epsilon > 0$, the function $\lambda(\theta)$ varies by at most $\epsilon/s$ over each cube of side $\delta = \epsilon/(c_8 s^2)$ that intersects $K_s$. Note that $K_s$ can be covered by fewer than $(2c_4\sqrt{s}/\delta)^s = (2c_4c_8 s^{5/2}/\epsilon)^s$ such cubes, so in total there are fewer than $2^s(2c_4c_8 s^{5/2}/\epsilon)^s$ vertices. Denote by $V_s$ the set of all such vertices whose cubes intersect $K_s$. Since $c_6$ can be chosen much larger than $c_4$, and the edge of each cube, $\delta = \epsilon/(c_8 s^2)$, is small and decreases with $s$, all vertices in $V_s$ fall in $\Theta_s$ as well. Thus, by the pointwise convergence assumption of the lemma and the union bound, for any $\epsilon > 0$, as $b_n \to \infty$,

$P\Big(\max_{\theta \in V_s} |\lambda_n(\theta) - \lambda(\theta)| \ge \epsilon/s\Big) \le c_9\exp\big(Cs\log(Cs) - c_7 s\log s\, b_n\epsilon\big) \to 0.$

Therefore,

(A.24)  $M_n \equiv \max_{\theta \in V_s} |\lambda_n(\theta) - \lambda(\theta)| = o_p(\epsilon/s).$

Any $\theta \in K_s$ falls into one of the cubes and can thus be written as a convex combination of that cube's vertices $\{\theta_i\} \subset V_s$; that is, $\theta = \sum_i \alpha_i\theta_i$ with $\alpha_i \in [0, 1]$ and $\sum_i \alpha_i = 1$. Then, by the convexity of $\lambda_n(\theta)$ and (A.24),

$\lambda_n(\theta) \le \sum_i \alpha_i\lambda_n(\theta_i) = \sum_i \alpha_i\big\{\lambda_n(\theta_i) - \lambda(\theta_i) + \lambda(\theta_i) - \lambda(\theta) + \lambda(\theta)\big\} \le \max_i|\lambda_n(\theta_i) - \lambda(\theta_i)| + \epsilon/s + \lambda(\theta) \le M_n + \epsilon/s + \lambda(\theta).$

Thus,

(A.25)  $P\Big(\sup_{\theta \in K_s}\big(\lambda_n(\theta) - \lambda(\theta)\big) > 2\epsilon/s\Big) \to 0.$
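The vertex count behind the union bound leading to (A.24), spelled out (editorial):

```latex
% With cube side \delta = \epsilon/(c_8 s^2) and K_s of radius c_4\sqrt{s},
|V_s| \le 2^s\Bigl(\tfrac{2c_4\sqrt{s}}{\delta}\Bigr)^{s}
       = 2^s\Bigl(\tfrac{2c_4c_8 s^{5/2}}{\epsilon}\Bigr)^{s}
       = \exp\bigl(O(s\log(Cs))\bigr) \quad \text{for fixed } \epsilon,
% so the pointwise tail bound applied at accuracy \epsilon/s, plus a union bound, give
P\Bigl(\max_{\theta\in V_s}|\lambda_n(\theta)-\lambda(\theta)| \ge \epsilon/s\Bigr)
  \le |V_s|\,c_9\exp\bigl(-c_7 s\log s\, b_n \epsilon\bigr) \to 0
  \quad\text{as } b_n \to \infty.
```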

Next we prove the lower bound. Each $\theta \in K_s$ lies within a $\delta$-cube with a vertex $\theta_0 \in V_s$; thus $\theta = \theta_0 + \sum_{i=1}^s \delta_i e_i$ with $|\delta_i| < \delta$, where $e_1, \ldots, e_s$ denote the $s$ coordinate directions. Without loss of generality, suppose $0 \le \delta_i < \delta$ for each $i$. Define $\theta_i$ to be the vertex $\theta_0 - \delta e_i \in V_s$. Then $\theta_0$ can be written as a convex combination of $\theta$ and the $\theta_i$:

$\theta_0 = \frac{\delta}{\delta + \sum_j \delta_j}\,\theta + \sum_i \frac{\delta_i}{\delta + \sum_j \delta_j}\,\theta_i.$

Denote $a = \delta/(\delta + \sum_j \delta_j)$ and $a_i = \delta_i/(\delta + \sum_j \delta_j)$. Since $0 \le \delta_j < \delta$, the coefficient of $\theta$ can be bounded as $a \ge \frac{1}{1+s}$. Since $\lambda_n$ is convex, by (A.24) we obtain

$a\lambda_n(\theta) \ge \lambda_n(\theta_0) - \sum_i a_i\lambda_n(\theta_i) \ge \lambda(\theta_0) - \sum_i a_i\lambda(\theta_i) - 2M_n \ge \lambda(\theta) - \frac{\epsilon}{s} - \sum_i a_i\Big(\lambda(\theta) + \frac{\epsilon}{s}\Big) - 2M_n \ge a\lambda(\theta) - \frac{2\epsilon}{s} - 2M_n.$

Since $a \ge (1+s)^{-1}$, it follows from the above inequality and (A.24) that

$\lambda_n(\theta) \ge \lambda(\theta) - \frac{2\epsilon(s+1)}{s} - 2(s+1)M_n = \lambda(\theta) - \frac{2\epsilon(s+1)}{s} - o_p(\epsilon).$

Therefore,

(A.26)  $P\Big(\inf_{\theta \in K_s}\big(\lambda_n(\theta) - \lambda(\theta)\big) < -\frac{3(1+s)\epsilon}{s}\Big) \to 0.$

Since $\epsilon$ can be chosen arbitrarily small, the uniform convergence result follows by combining (A.25) with (A.26).

A.4. Proof of Proposition 1. Since $s$ is finite, the sum of the probabilities below is of the same order as their maximum. The distribution result, equation (3.1) in the main paper, entails that

$P\Big(\big\|\tfrac{1}{n}S^T\varepsilon\big\|_\infty > z\Big) = P\Big(\max_{1\le j\le s}|x_j^T\varepsilon| > nz\Big) \le \sum_{j=1}^s P\big(|x_j^T\varepsilon| > nz\big) \le \sum_{j=1}^s \|x_j\|_\alpha^\alpha\, c_\alpha (Cnz)^{-\alpha},$

where $C > 0$ is some generic constant. For any sequence $b_n$ with $b_n \to \infty$, letting $z = n^{-1}b_n\tilde b_n$ with $\tilde b_n = \big(\sum_{j=1}^s \|x_j\|_\alpha^\alpha\big)^{1/\alpha}$, we have

$P\big(\|n^{-1}S^T\varepsilon\|_\infty > n^{-1}b_n\tilde b_n\big) \to 0.$

In other words, $\|n^{-1}S^T\varepsilon\|_\infty = O_p(n^{-1}\tilde b_n)$. Hence, for the condition

$\hat\beta_M + \lambda_n\,\mathrm{sgn}(\hat\beta_M) = \beta_M + n^{-1}S^T\varepsilon$

to have a solution $\hat\beta_M = (\hat\beta_1, \ldots, \hat\beta_s)^T$ with $\hat\beta_j > 0$ for all $j = 1, \ldots, s$, the necessary conditions are $n\beta_0\tilde b_n^{-1} \to \infty$ and $\lambda_n < \beta_0$. Combining these conditions, we have

(A.27)  $\lambda_n < \beta_0 = n^{-1}b_n\tilde b_n,$ with some diverging sequence $b_n$.

We next check whether the condition $\|Q^T\varepsilon\|_\infty \le n\lambda_n$ is satisfied. Combining (3.1) and (A.27) ensures that for $j > s$,

$P\big(|x_j^T\varepsilon| > n\lambda_n\big) \ge P\big(|x_j^T\varepsilon| > b_n\tilde b_n\big) \ge C\|x_j\|_\alpha^\alpha (b_n\tilde b_n)^{-\alpha}.$

Since we assumed $\mathrm{supp}(x_j) \cap \mathrm{supp}(x_i) = \emptyset$ for any $i \ne j \in \{s+1, \ldots, p\}$, it follows that $Q^T\varepsilon$ is a vector of independent random variables with components $x_j^T\varepsilon$. Then,

$P\big(\|Q^T\varepsilon\|_\infty > n\lambda_n\big) = 1 - P\big(\|Q^T\varepsilon\|_\infty \le n\lambda_n\big) = 1 - \prod_{j>s}\big(1 - P(|x_j^T\varepsilon| > n\lambda_n)\big) \ge 1 - \exp\Big(\sum_{j>s}\log\big(1 - C\|x_j\|_\alpha^\alpha(b_n\tilde b_n)^{-\alpha}\big)\Big).$

Since $\log(1+x) \le x$ for all $x > -1$, we have

$\sum_{j>s}\log\big(1 - C\|x_j\|_\alpha^\alpha(b_n\tilde b_n)^{-\alpha}\big) \le -C\sum_{j>s}\|x_j\|_\alpha^\alpha(b_n\tilde b_n)^{-\alpha}.$

Combining the above two inequalities, if $\sum_{j>s}\|x_j\|_\alpha^\alpha(b_n\tilde b_n)^{-\alpha} \ge c_0/C$ for some $c_0 \in (0, \infty]$, then $P\big(\|Q^T\varepsilon\|_\infty > n\lambda_n\big) \ge 1 - e^{-c_0}$. That is, with probability at least $1 - e^{-c_0}$, the condition $\|Q^T\varepsilon\|_\infty \le n\lambda_n$ fails to hold, and Lasso does not have the model selection oracle property. In fact, if $b_n \le O\big(\{\sum_{j>s}\|x_j\|_\alpha^\alpha\}^{1/\alpha}\tilde b_n^{-1}\big)$, or equivalently by (A.27),

$\beta_0 \le O\Big(n^{-1}\Big\{\sum_{j>s}\|x_j\|_\alpha^\alpha\Big\}^{1/\alpha}\Big),$

then $c_0/C \in (0, \infty]$. In other words, unless we have

(A.28)  $n\beta_0 \gg \Big\{\sum_{j>s}\|x_j\|_\alpha^\alpha\Big\}^{1/\alpha},$

Lasso is not able to recover the true model and the correct signs. Finally, since $\|x_j\|_2^2 = n$, $|\mathrm{supp}(x_j)| = O(n^{1/2})$ and $\max_{i,j}|x_{ij}| = O(n^{1/4})$, the nonzero components of $Q$ are all of the same order $O(n^{1/4})$. Consequently, the $\|x_j\|_\alpha^\alpha$ must all be of the same order $O(n^{(2+\alpha)/4})$. Then $n^{-1}\{\sum_{j>s}\|x_j\|_\alpha^\alpha\}^{1/\alpha} = O(n^{1/\alpha - 3/4})$, and the condition (A.28) above becomes $n^{3/4 - 1/\alpha}\beta_0 \to \infty$.

APPENDIX B: REAL DATA EXAMPLE

In this section, we use expression quantitative trait loci (eQTL) mapping to illustrate the performance of R-Lasso and AR-Lasso. eQTL studies aim at finding the variations in genotype in a certain region of a chromosome that are associated with gene expression levels. In this study, we conducted a cis-eQTL mapping for the gene CHRNA6 (cholinergic receptor, nicotinic, alpha 6). CHRNA6 is located on chromosome 8, at cytogenetic location 8p11. CHRNA6 is thought to be related to the activation of dopamine-releasing neurons by nicotine (Thorgeirsson et al., 2010), and it has therefore been the subject of many nicotine addiction studies on people of western European heritage (Saccone et al., 2009; Thorgeirsson et al., 2010).

The data are from 90 individuals in the international HapMap project (The International HapMap Consortium, 2005), all of western European ancestry, and are available at ftp://ftp.sanger.ac.uk/pub/genevar/. The normalized expression data were generated with an Illumina Sentrix Human-6 Expression Bead Chip (Stranger et al., 2007). The SNPs under investigation are located within 1 megabase upstream and downstream of the transcription start site (TSS) of CHRNA6; in this range, there were 554 SNPs. The additive coding for SNPs was employed, with 0, 1 and 2 representing the major, heterozygous, and minor populations, respectively. We further screened the SNPs using a variant of the sure independence screening (SIS) method of Fan and Lv (2008), keeping the 100 SNPs with the highest marginal correlation with the gene expression levels. Finally, we applied Lasso, SCAD, R-Lasso and AR-Lasso to the screened variables. The quantile parameter $\tau$ was set to 0.5 for R-Lasso and AR-Lasso, corresponding to median regression. The tuning parameter for all methods was chosen using five-fold cross-validation. The selected SNPs, as well as their regression coefficients and distances from the transcription start site, are given in Table 1.
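The screening-plus-regression pipeline described above can be sketched in code. The sketch below is editorial only: geno (an n x 554 additive-coded SNP matrix) and expr (the n normalized CHRNA6 expression values) are hypothetical stand-ins for the HapMap data, scikit-learn's QuantileRegressor (L1-penalized median regression) plays the role of R-Lasso, and the SCAD and AR-Lasso fits with data-driven weights are omitted.

```python
# Editorial sketch of the eQTL pipeline, not the authors' implementation.
# Assumed inputs: geno, shape (n, 554), additive SNP coding in {0, 1, 2};
#                 expr, shape (n,), normalized CHRNA6 expression levels.
import numpy as np
from sklearn.linear_model import LassoCV, QuantileRegressor
from sklearn.model_selection import GridSearchCV

def screen_top_k(X, y, k=100):
    """SIS-style screening: keep the k SNPs with the largest |marginal correlation|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(corr)[::-1][:k]

def fit_models(geno, expr):
    keep = screen_top_k(geno, expr, k=100)      # keep the top 100 screened SNPs
    X = geno[:, keep]
    lasso = LassoCV(cv=5).fit(X, expr)          # Lasso with five-fold CV
    rlasso = GridSearchCV(                      # L1-penalized median regression
        QuantileRegressor(quantile=0.5, solver="highs"),
        {"alpha": np.logspace(-3, 0, 10)},
        cv=5,
    ).fit(X, expr)
    return keep, lasso.coef_, rlasso.best_estimator_.coef_
```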

Table 1: Selected SNPs for the eQTL study. Columns: SNP identifier; regression coefficients under Lasso, SCAD, R-Lasso and AR-Lasso; distance from the TSS in kb. [The individual SNP identifiers and coefficient values are not recoverable from this transcription.]

It is seen that the robust regression methods R-Lasso and AR-Lasso found more of the variables to be significant: R-Lasso and AR-Lasso selected 21 and 15 variables, respectively, whereas Lasso and SCAD found only 15 and 5 variables to be significant. Only 4 SNPs were included in all of the models. Furthermore, none of these SNPs was covered in the previous study of Saccone et al. (2009). We speculate that the difference arises because the previous studies focused on SNPs only 50 kb upstream and downstream of the gene, and because those studies did not consider multiple regression, which makes significant use of the correlation structure in the data. In addition, among all the SNPs

that are chosen by these four methods, only one appears in the paper by Saccone et al. (2009), and only R-Lasso and AR-Lasso found this SNP to be important. As was observed in the finite-sample simulations, SCAD and AR-Lasso consistently chose a smaller set of variables than their counterparts. Furthermore, almost two thirds of the selected SNPs lie to the left of the transcription site. We would also like to note that the residuals from the fitted regressions had very heavy right tails, which suggests that, at least for this particular eQTL study, it is much more reasonable to use methods based on quantile regression. The QQ-plots of the residuals from the different regression methods are shown in Figure 1.

[Fig 1: QQ-plots of residuals for different methods in the eQTL study: (a) residuals from Lasso; (b) residuals from SCAD; (c) residuals from R-Lasso; (d) residuals from AR-Lasso. Each panel plots the quantiles of the residuals against standard normal quantiles.]
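Plots like those in Figure 1 can be produced generically as follows. This is an editorial sketch; residuals stands for the residual vector from any of the four fits.

```python
# Editorial sketch: normal QQ-plot of regression residuals, as in Figure 1.
import matplotlib.pyplot as plt
from scipy import stats

def qq_plot(residuals, title):
    """Plot sample quantiles of `residuals` against standard normal quantiles."""
    fig, ax = plt.subplots()
    stats.probplot(residuals, dist="norm", plot=ax)
    ax.set_title(title)           # e.g. "Residuals from AR-Lasso"
    fig.tight_layout()
    return fig
```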

REFERENCES

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). J. Roy. Statist. Soc. Ser. B 70, 849-911.

The International HapMap Consortium (2005). A haplotype map of the human genome. Nature 437, 1299-1320.

Nolan, J. P. Stable Distributions: Models for Heavy Tailed Data. Birkhäuser. In progress; Chapter 1 online at academic2.american.edu/~jpnolan.

Pollard, D. (1991). Asymptotics for least absolute deviation regression estimators. Econometric Theory 7, 186-199.

Saccone, N. L., Saccone, S. F., Hinrichs, A. L., Stitzel, J. A., Duan, W., Pergadia, M. L., Agrawal, A., Breslau, N., Grucza, R. A., Hatsukami, D., Johnson, E. O., Madden, P. A. F., Swan, G. E., Wang, J. C., Goate, A. M., Rice, J. P. and Bierut, L. J. (2009). Multiple distinct risk loci for nicotine dependence identified by dense coverage of the complete family of nicotinic receptor subunit (CHRN) genes. Am. J. Med. Genet. Part B 150B.

Stranger, B., Forrest, S., Dunning, M., Ingle, C., Beazley, C., Thorne, N., Redon, R., Bird, C., de Grassi, A., Lee, C., Tyler-Smith, C., Carter, N., Scherer, S., Tavaré, S., Deloukas, P., Hurles, M. and Dermitzakis, E. (2007). Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315, 848-853.

Thorgeirsson, T. et al. (2010). Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior. Nat. Genet. 42, 448-453.

J. Fan
Department of Operations Research and Financial Engineering
Princeton University
Princeton, New Jersey
USA
E-mail: jqfan@princeton.edu

Y. Fan
Data Sciences and Operations Department
Marshall School of Business
University of Southern California
Los Angeles, California
USA
E-mail: fanyingy@marshall.usc.edu

E. Barut
IBM T.J. Watson Research Center
Yorktown Heights, New York
USA
E-mail: abarut@us.ibm.com
