A class of generalized ridge estimator for high-dimensional linear regression

Size: px

Start display at page:

Download "A class of generalized ridge estimator for high-dimensional linear regression"

Katherine Hines
5 years ago
Views:

1 A class of generalized ridge estimator for high-dimensional linear regression Advisor: akeshi Emura Presenter: Szu-Peng Yang June 3, 04 Graduate Institute of Statistics, NCU

2 Outline Introduction Methodology Numerical analysis Conclusion

3 Introduction Introduction Methodology Numerical analysis Conclusion 3

4 Background Model y X β ε n n p p n n I, ε ~ N 0, Least square estimator LSE X X X y Advantage: unbiasedness, minimum variance Disadvantage: high-dimensionality p n Performance in terms of MSE? 4

5 Background High-dimensionality Microarray data p 394 n gene signatures EGFR index from 4 patients 5

6 Ridge regression Ridge estimator Hoerl and Kennard 970 X X I X y, 0. Shrinkage parameter nearest point to 0 Shrink toward 0 0 β ˆ Infinitely many solution in LSE 6

7 Ridge regression heorem Existence heorem, Hoerl and Kennard, 970 here always exists a 0 MSE{ } MSE. such that Note: ˆ ˆ MSE β E{ β β β } Let P be orthogonal matrix such that X X P P where is a diagonal matrix. hen α,..., p P, and max β max{,..., }. p Figure he plot of the MSE of the ridge estimator. 7

8 Ridge regression Is the ridge estimator adapt to p n case? - Whittaker, hompson and Denham Zhao, Rødland and Sørlie, et al. 0 - Cule, Vieis and De Iorio 0 SNP data, Significance testing 8

9 Generalized ridge regression Generalized ridge estimator Hoerl and Kennard 970 W X X W X y, W is a diagonal matrix. - Allen McLachlan Loesgen 990 9

10 Generalized ridge regression Benefit. different weights for different regressors. Performance in terms of MSE McLachlan, 980 Problem: No application in p n case. 0

11 Methodology Introduction Methodology Numerical analysis Conclusion

12 Baysian interpretation Good Bayesian interpretation Loesgen, 990: Prior β ~ 0, W N p y Xβ ε, ε ~ Nn 0, I Bayes estimator posterior mean E[ β X, y ] X X W X y

13 Baysian interpretation β,..., p,,..., p are independent W is a diagonal matrix and W diag w,..., wp. W W precision of prior Precision of prior knowledge 0 w 3

14 Baysian interpretation How to choose shrinkage parameters?. More likely 0 w small w large. More likely 0 w large w small 4

15 Proposed method Idea Define W diag w,..., wp, w if if 0 0, [ 0,], 0,,..., p. Problem:,..., p unknown estimate W from data. 5

16 Proposed method Estimator Initial estimate β,..., p, ˆ x y 0 x x ˆ for,..., p where, for,..., p, are the columns of X. x ˆ 0 p ˆ 0 p se 0 p ˆ 0 0 p Under the sparse model β 0, ˆ 0 / ˆ 0 se β ~ N 0, 6

17 Proposed method Estimator Proposed estimator, { X X Wˆ } X y, 0, where Wˆ diag{ wˆ,..., wˆ p } and wˆ / if ˆ 0 otherwise, / se 0, for,..., p. hresholding parameter 7

18 Proposed method Parameters estimation Allen s PRESS 974 generalized cross-validation GCV Golub, Heath and Wahba, 979 Minimizer of GCV ˆ 0 / ˆ 0 se β ~ N 0, Choose among D { 0, 3/00,..., 300 /00 } ˆ able Formula of GCV function Ridge estimator V n { I A A } y r{ I n X X X I X A } Proposed estimator V, { I A, } y n A r{ I n, X{ X X W } X ˆ A, } 8

19 MSE comparison Mean square error matrix MSE r{ MSEM } - renkler Loesgen 990 MSEM E{ -β -β } C dd Mean square error C Cov MSE E{ β β } E{ E }{ E } d Bias E β 9

20 MSE comparison Lemma renkler, 985 Suppose A is a symmetric matrix, is an vector and is a positive real number. hen Aaa is n.n.d. if and only if p A is n.n.d. a Av for some v R 3 where A is the generalized inverse of n n a A. n a A a Note: the diagonals of a nonnegative definite n.n.d. matrix are nonnegative. 0

21 MSE comparison i X MSEM X W i X y MSEM for i,. C d C d d d heorem MSEM ˆ MSEM β conditions hold: C C is.n.n.d. / is n.n.d. if all the three C C d is full rank d 3 d C C dd d

22 MSE comparison Example β, 0, 0 W I diag, W diag.5,.5, 0 X X I For simplicity 33 C / C diag 4, 96.5 C C d d {.5 4 }.5 33 diag, d Result: C C d d d 5 {.5 4 }.5 Figure he plot of MSE ˆ MSE β against. not too large, /5 MSEM ˆ MSEM β d C C d d d n.n.d.

23 Significance testing Cule, Vineis and De Iorio, 0 Hypothesis H : 0 v.s. H : 0,,..., p 0 Wald test statistics Z ˆ ˆ, ˆ se{ ˆ ˆ, ˆ } wo-sided P-value H 0 ~ N 0,,,..., p P Pr Z Z or Z Z Pr Z Z 3

24 Numerical analysis Introduction Methodology Numerical analysis Conclusion 4

25 Simulation Model setting Sparse model - Emura, Chen and Chen 0 - Ing and Lai 0 - Binder, et al. 009 Correlated regressors y Xβ ε, ε ~ Nn 0, I q r β b / q,..., b / q, d / r,..., d / r, p qr 0,..., 0 Corr 0.5 Corr 0.5 5

26 Simulation Model setting n 00, p { 50,00,50, 00 }, q r 0 Case I Case II b d b d 5 0 Case III Case IV b 5, d 5 b 0, d 0 6

27 Simulation MSE comparison MSE E{ β β } MSE, * E{, * β, * β } Figure he plot of the MSE curve with b=d=5. 7

28 Simulation MSE comparison MSE ˆ E{ ˆ β ˆ β } MSE ˆ, ˆ E{ ˆ, ˆ β ˆ, ˆ β } MSE E β,,, p able Simulation results based on 500 replicates with b=d=5. setting estimate E ˆ E ˆ MSE ˆ MSE ˆ MSE p p 50 p 00 p 50 p 00 Ridge Proposed Ridge Proposed Ridge Proposed Ridge Proposed

29 Simulation Significance testing Hypothesis Wald test statistic H 0 v.s. H 0 0, 50 : 50, 50 : 50 Note: H 0, , is true Z ˆ ˆ 50 ˆ, se{ ˆ ˆ, ˆ } H 50 N 50 0, ~ 50 0, ype I error Z 500 s I Z s Z / the upper / percent point of / N 0, 9

30 Simulation Significance testing 0.05 able 3 Simulation results for significance testing based on 500 replicates with b=d=5. E ˆ sd ˆ E Z sd Z ype I error p p p p Note: ˆ ˆ ˆ s E / s 500. ˆ ˆ s ˆ sd / 499 s ˆ s E Z Z Z / s 500 s 4. ˆ sd Z50 Z50 Z50 / 499 s 50 30

31 Simulation Significance testing Hypothesis Wald test statistic H 0 v.s. H 0 0, :, : Note: H, 0, is true Z ˆ ˆ ˆ, se{ ˆ ˆ, ˆ } H N 0, ~ 0, Power 500 s I Z s 500 Z / Z the upper / percent point of / N 0, 3

32 Simulation Significance testing 0.05 able 4 Simulation results for significance testing based on 500 replicates with b=d=5. Note: E ˆ sd ˆ E Z sd Z Power p p p p ˆ ˆ ˆ s E / 500 s 500. ˆ ˆ s ˆ sd / 499 s ˆ s E Z Z Z / 500 s 500 s 4. ˆ sd Z Z Z / 499 s 3

33 NSCLC data Epithelial lung cancer Small cell lung cancer Non-small cell lung cancer: Squamous cell carcinoma Large cell carcinoma adenocarcinoma y Xβ ε 394 selected gene signatures EGFR index from 4 patients n 4, p

34 NSCLC data 4-fold cross validation 3 4 rain rain est 3 rain 4 3 patients 3 patients 3 patients 3 patients x i ith row of X ˆ k β estimator based on all the data not in k Prediction error PE 4 4 k i k y i x i k 34

35 NSCLC data 4-fold cross validation able 5 PE comparison of the ridge regression and the proposed method over 00 random cross validation. No. of replicate proposed ˆ proposed ˆ ridge PE ˆridge proposed PE Average

36 NSCLC data Prediction Estimation Regressors selection Prediction 6 patients raining sample 6 patients esting sample 36

37 NSCLC data Prediction able 6 he 0 most strongly associated genes Figure 3 he plots of y i against x est rain i Ridge Proposed method No. Gene symbol Coefficient P-value Gene symbol Coefficient P-value FGA E-07 FGA E-07 AKRB E-07 AKRB E-06 3 CPS E-05 CPS E-05 4 KR6A E-05 FGG E-05 5 MSMB MSMB FGG KR6A CYPB7P CYPB7P SERPINB FGB FGB CYPB CYPB SERPINB LOC GPR SERPINB LOC GPR CRP HSD7B SERPINB CRP SLC6A DKK HSD7B SLC34A DKK MUC MUC CPN CPN SLC6A PE

38 Conclusion Introduction Methodology Numerical analysis Conclusion 38

39 Conclusion Proposed estimator under high-dimensionality Able to utilize prior knowledge Good Baysian interpretation and theoretical calculation Reduction of the MSE Significance testing Apply to microarray data 39

40 Reference Allen, D. M he relationship between variable selection and data augmentation and a method for prediction. echnometrics 6, 5-7. Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics 5, Cule, E., Vineis, P. and De Iorio, M. 0. Significance testing in ridge regression for genetic data. BMC Bioinformatics, 37. Emura,., Chen, Y. H., and Chen, H. Y. 0. Survival prediction based on compound covariate under cox proportional hazard models. PLoS ONE 7, e4767. Golub, G. H., Heath, M. and Wahba, G Generalized cross-validation as a method for choosing a good ridge parameter. echnometrics, 5-3. Hoerl, A. E. and Kennard, R. W Ridge regression: biased estimation for nonorthogonal problems. echnometrics, Ing, C.-K. and Lai,.-L. 0. A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, Loesgen, K.-H A generalization and Bayesian interpretation of ridge-type estimators with good prior means. Statistical Papers 3, McLachlan, G. J On the mean square error associated with adaptive generalized ridge regression. Biometrical Journal, 5-9. rekler, G Mean square error matrix comparisons of estimators in linear regression. Communications in Statistics A4, Whittaker, J. C., hompson, R., and Denham, M. C., 000. Marker-assisted selection using ridge regression. Genetical Research 75, Zhao, X., Rødland, E. A., and Sørlie,., et al. 0. Combining gene signatures improves prediction of breast cancer survival. PLoS ONE 6, e

41 hank you for your listening. 4

EFFICIENCY of the PRINCIPAL COMPONENT LIU- TYPE ESTIMATOR in LOGISTIC REGRESSION

EFFICIENCY of the PRINCIPAL COMPONEN LIU- YPE ESIMAOR in LOGISIC REGRESSION Authors: Jibo Wu School of Mathematics and Finance, Chongqing University of Arts and Sciences, Chongqing, China, linfen52@126.com