A class of generalized ridge estimator for high-dimensional linear regression
1 A class of generalized ridge estimator for high-dimensional linear regression. Advisor: Takeshi Emura. Presenter: Szu-Peng Yang. June 3, 2014. Graduate Institute of Statistics, NCU.
2 Outline Introduction Methodology Numerical analysis Conclusion
3 Introduction Introduction Methodology Numerical analysis Conclusion
4 Background. Model: y = Xβ + ε, where y is n×1, X is n×p, β is p×1, and ε ~ N_n(0, σ²I_n). Least squares estimator (LSE): β̂ = (XᵀX)⁻¹Xᵀy. Advantages: unbiasedness and minimum variance. Disadvantage: under high-dimensionality (p > n), XᵀX is singular and the LSE is not unique. How does it perform in terms of MSE?
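A minimal numerical sketch (using NumPy; not from the talk) of why the LSE breaks down when p > n: with fewer observations than regressors, XᵀX has rank at most n and cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Least squares estimator: beta_hat = (X'X)^{-1} X'y
def lse(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# n > p: X'X is invertible and the LSE is well defined.
X = rng.normal(size=(50, 5))
beta = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
y = X @ beta + rng.normal(size=50)
beta_hat = lse(X, y)

# p > n: X'X is p x p but has rank at most n, hence is singular,
# and the normal equations no longer have a unique solution.
X_hd = rng.normal(size=(10, 30))
rank = np.linalg.matrix_rank(X_hd.T @ X_hd)  # at most n = 10
```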
5 Background. High-dimensionality: microarray data with p = 394 selected gene signatures and the EGFR index measured on n = 124 patients (p > n).
6 Ridge regression. Ridge estimator (Hoerl and Kennard, 1970): β̂(k) = (XᵀX + kI)⁻¹Xᵀy, k ≥ 0, where k is the shrinkage parameter. As k grows, β̂(k) is shrunk toward 0; as k → 0, it tends to the LSE solution nearest to 0, which matters because the LSE has infinitely many solutions when p > n.
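The ridge estimator above can be sketched as follows; the simulated data and the two values of k are illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def ridge(X, y, k):
    """Ridge estimator (Hoerl and Kennard, 1970): (X'X + kI)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Works even when p > n, because X'X + kI is positive definite for k > 0.
n, p = 20, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = 2.0
y = X @ beta + rng.normal(size=n)

b_small = ridge(X, y, 0.1)
b_large = ridge(X, y, 1e6)   # heavy shrinkage pulls the estimate toward 0
```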
7 Ridge regression. Existence Theorem (Hoerl and Kennard, 1970): there always exists a k > 0 such that MSE{β̂(k)} < MSE(β̂). Note: MSE(β̂) = E{(β̂ − β)ᵀ(β̂ − β)}. Let P be an orthogonal matrix such that XᵀX = PᵀΛP, where Λ = diag(λ₁, ..., λ_p), and set α = Pβ. Then the MSE decreases for 0 < k < σ²/α²_max, where α²_max = max{α₁², ..., α_p²}. Figure 1: the plot of the MSE of the ridge estimator against k.
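The Existence Theorem can be checked numerically from the closed-form MSE in the canonical (orthogonalized) model, MSE(k) = σ² Σ_j λ_j/(λ_j + k)² + k² Σ_j α_j²/(λ_j + k)². The eigenvalues and α below are made-up illustrative values.

```python
import numpy as np

def ridge_mse(k, lam, alpha, sigma2=1.0):
    """Closed-form MSE of the ridge estimator when X'X = diag(lam), alpha = P beta:
    variance term + squared-bias term."""
    return sigma2 * np.sum(lam / (lam + k) ** 2) + k**2 * np.sum(alpha**2 / (lam + k) ** 2)

lam = np.array([5.0, 1.0, 0.2])     # eigenvalues of X'X (ill-conditioned design)
alpha = np.array([1.0, 0.5, 0.3])

mse_lse = ridge_mse(0.0, lam, alpha)   # k = 0 recovers the LSE
# Hoerl-Kennard: MSE(k) is decreasing on 0 < k < sigma^2 / max(alpha_j^2),
# so any k in that interval beats the LSE.
k_small = 0.5 * 1.0 / np.max(alpha**2)
mse_ridge = ridge_mse(k_small, lam, alpha)
```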
8 Ridge regression. Does the ridge estimator adapt to the p > n case? - Whittaker, Thompson and Denham (2000): marker-assisted selection. - Zhao, Rødland, Sørlie, et al. (2011): combining gene signatures. - Cule, Vineis and De Iorio (2011): SNP data, significance testing.
9 Generalized ridge regression. Generalized ridge estimator (Hoerl and Kennard, 1970): β̂(W) = (XᵀX + W)⁻¹Xᵀy, where W is a diagonal matrix. Related work: - Allen (1974) - McLachlan (1980) - Loesgen (1990).
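A short sketch of the generalized ridge estimator with a diagonal W; setting all weights equal recovers ordinary ridge, which gives a built-in sanity check. The data are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def generalized_ridge(X, y, w_diag):
    """Generalized ridge: (X'X + W)^{-1} X'y with W = diag(w_1, ..., w_p)."""
    return np.linalg.solve(X.T @ X + np.diag(w_diag), X.T @ y)

n, p = 30, 10
X = rng.normal(size=(n, p))
beta = np.concatenate([np.full(2, 3.0), np.zeros(8)])
y = X @ beta + rng.normal(size=n)

# Equal weights recover the ordinary ridge estimator with k = 5.
b_ridge = generalized_ridge(X, y, np.full(p, 5.0))
b_ridge_check = np.linalg.solve(X.T @ X + 5.0 * np.eye(p), X.T @ y)

# Unequal weights: shrink the (believed-zero) coefficients harder.
w_uneven = np.concatenate([np.full(2, 0.5), np.full(8, 50.0)])
b_grr = generalized_ridge(X, y, w_uneven)
```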
10 Generalized ridge regression. Benefits: 1. different weights for different regressors; 2. improved performance in terms of MSE (McLachlan, 1980). Problem: no application in the p > n case.
11 Methodology Introduction Methodology Numerical analysis Conclusion
12 Bayesian interpretation. The generalized ridge estimator has a good Bayesian interpretation (Loesgen, 1990). Prior: β ~ N_p(0, σ²W⁻¹); model: y = Xβ + ε, ε ~ N_n(0, σ²I). The Bayes estimator (posterior mean) is E[β | X, y] = (XᵀX + W)⁻¹Xᵀy.
13 Bayesian interpretation. A priori, β₁, ..., β_p are independent and W is a diagonal matrix, W = diag(w₁, ..., w_p), with 0 < w_j < ∞. Each w_j is the precision of the prior on β_j: it encodes the precision of the prior knowledge that β_j is near 0.
14 Bayesian interpretation. How to choose the shrinkage parameters? 1. If β_j is more likely ≠ 0: choose w_j small (weak shrinkage). 2. If β_j is more likely = 0: choose w_j large (strong shrinkage).
15 Proposed method. Idea: define W(Δ) = diag(w₁(Δ), ..., w_p(Δ)), where w_j(Δ) = Δ if β_j ≠ 0 and w_j(Δ) = 1 if β_j = 0, with Δ ∈ [0, 1], for j = 1, ..., p; a small Δ relaxes the shrinkage on the truly nonzero coefficients. Problem: β₁, ..., β_p are unknown, so W must be estimated from the data.
16 Proposed method. Initial estimates: β̂_j⁰ = x_jᵀy / x_jᵀx_j for j = 1, ..., p, where x_j, j = 1, ..., p, are the columns of X, together with standard errors se(β̂_j⁰). Under the sparse model with β_j = 0, β̂_j⁰ / se(β̂_j⁰) ~ N(0, 1) approximately.
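The initial marginal estimates can be sketched as below; since the slide's standard-error formula did not survive extraction, this block uses a crude plug-in variance estimate (an assumption) just to form the t-statistics.

```python
import numpy as np

rng = np.random.default_rng(3)

def marginal_estimates(X, y):
    """Component-wise initial estimates beta0_j = x_j'y / x_j'x_j.
    Under the sparse model, t_j = beta0_j / se_j is approximately N(0, 1)
    when beta_j = 0. The variance plug-in var(y) is a crude assumption."""
    xtx = np.sum(X * X, axis=0)        # x_j'x_j for each column j
    beta0 = X.T @ y / xtx
    se = np.sqrt(np.var(y) / xtx)
    return beta0, se, beta0 / se

n, p = 200, 500
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[0] = 5.0      # sparse truth: only beta_1 is nonzero
y = X @ beta + rng.normal(size=n)
beta0, se, t = marginal_estimates(X, y)
```

With this design the largest |t_j| should single out the one truly nonzero coefficient.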
17 Proposed method. Proposed estimator: β̂(λ, Δ) = {XᵀX + λŴ(Δ)}⁻¹Xᵀy, λ ≥ 0, where Ŵ(Δ) = diag{ŵ₁(Δ), ..., ŵ_p(Δ)}, with ŵ_j(Δ) = Δ if the initial t-statistic |β̂_j⁰ / se(β̂_j⁰)| exceeds its threshold and ŵ_j(Δ) = 1 otherwise, for j = 1, ..., p. Δ is the thresholding parameter.
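A hedged sketch of the weighting scheme as read from this slide: coefficients whose initial t-statistic clears a threshold get the light weight Δ, the rest get weight 1. The threshold c = 1.96 and the standard-error formula are hypothetical choices here, since the exact rule is not recoverable from the transcription.

```python
import numpy as np

rng = np.random.default_rng(4)

def proposed_estimator(X, y, lam, delta, c=1.96):
    """Generalized ridge with data-driven diagonal weights (sketch).
    c is a hypothetical threshold on the initial t-statistics."""
    xtx = np.sum(X * X, axis=0)
    beta0 = X.T @ y / xtx                    # marginal initial estimates
    se = np.sqrt(np.var(y) / xtx)            # crude standard errors (assumption)
    w = np.where(np.abs(beta0 / se) > c, delta, 1.0)  # light shrinkage if "significant"
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.diag(w), X.T @ y), w

n, p = 40, 100
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:2] = 4.0
y = X @ beta + rng.normal(size=n)
b_hat, w_hat = proposed_estimator(X, y, lam=5.0, delta=0.1)
```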
18 Proposed method. Parameter estimation: Allen's PRESS (1974) and generalized cross-validation, GCV (Golub, Heath and Wahba, 1979). Using β̂_j⁰ / se(β̂_j⁰) ~ N(0, 1) under β_j = 0, choose the minimizer of the GCV function over the grid D = {0, 3/100, ..., 300/100}. Table 1: formulas of the GCV functions. Ridge estimator: V(λ) = n⁻¹‖(I − A_λ)y‖² / [n⁻¹tr(I − A_λ)]², with A_λ = X(XᵀX + λI)⁻¹Xᵀ. Proposed estimator: V(λ, Δ) = n⁻¹‖(I − A_{λ,Δ})y‖² / [n⁻¹tr(I − A_{λ,Δ})]², with A_{λ,Δ} = X{XᵀX + λŴ(Δ)}⁻¹Xᵀ.
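The ridge GCV function from Table 1 can be implemented directly; the grid below is illustrative rather than the talk's D.

```python
import numpy as np

rng = np.random.default_rng(5)

def gcv_ridge(X, y, lam):
    """GCV score V(lam) = (1/n)||(I - A)y||^2 / [(1/n) tr(I - A)]^2,
    with A = X (X'X + lam I)^{-1} X'  (Golub, Heath and Wahba, 1979)."""
    n, p = X.shape
    A = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - A @ y
    return (resid @ resid / n) / (np.trace(np.eye(n) - A) / n) ** 2

n, p = 50, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = 2.0
y = X @ beta + rng.normal(size=n)

# Minimize GCV over a grid of candidate shrinkage parameters.
grid = np.linspace(0.01, 3.0, 300)
scores = [gcv_ridge(X, y, lam) for lam in grid]
lam_hat = grid[int(np.argmin(scores))]
```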
19 MSE comparison. Mean square error matrix (Trenkler, 1985; Loesgen, 1990): MSEM(β̂) = E{(β̂ − β)(β̂ − β)ᵀ} = C + ddᵀ, where C = Cov(β̂) is the covariance matrix and d = E(β̂) − β is the bias. The mean square error is its trace: MSE(β̂) = E{(β̂ − β)ᵀ(β̂ − β)} = tr{MSEM(β̂)}.
20 MSE comparison. Lemma (Trenkler, 1985): suppose A is a symmetric n×n matrix, a is an n×1 vector, and α is a positive real number. Then αA − aaᵀ is n.n.d. if and only if (1) A is n.n.d., (2) a = Av for some v ∈ Rⁿ, and (3) aᵀA⁻a ≤ α, where A⁻ is a generalized inverse of A. Note: the diagonal entries of a nonnegative definite (n.n.d.) matrix are nonnegative.
21 MSE comparison. Write β̂_i = (XᵀX + λW_i)⁻¹Xᵀy with MSEM(β̂_i) = C_i + d_id_iᵀ for i = 1, 2. Theorem: MSEM(β̂₁) − MSEM(β̂₂) is n.n.d. if all three conditions hold: (1) C₁ − C₂ is n.n.d.; (2) C₁ − C₂ + d₁d₁ᵀ is full rank; (3) d₂ᵀ(C₁ − C₂ + d₁d₁ᵀ)⁻¹d₂ ≤ 1.
22 MSE comparison. Example (with XᵀX = I₃ for simplicity): a sparse β = (β₁, 0, 0)ᵀ, W₁ = I₃ (ordinary ridge), and a diagonal W₂ that shrinks the zero coefficients harder. Direct calculation of C₁ − C₂ and d₂ᵀ(C₁ − C₂ + d₁d₁ᵀ)⁻¹d₂ shows that the three conditions of the Theorem hold whenever λ is not too large, so MSEM(β̂₁) − MSEM(β̂₂) is n.n.d. Figure 2: the plot of MSE(β̂₁) − MSE(β̂₂) against λ.
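Since the example's exact numbers did not survive extraction, here is an analogous numerical check with illustrative β, λ, and weights: the MSEM difference between ordinary ridge (W₁ = I) and a weighted alternative is n.n.d. when XᵀX = I and λ is small.

```python
import numpy as np

# beta_hat_i = (X'X + lam W_i)^{-1} X'y with X'X = I, so the shrinkage
# matrix is S_i = (I + lam W_i)^{-1}, C_i = sigma^2 S_i^2, d_i = (S_i - I) beta.
sigma2 = 1.0
lam = 0.3
beta = np.array([3.0, 0.0, 0.0])     # sparse truth (illustrative)
W1 = np.eye(3)                        # ordinary ridge weights
W2 = np.diag([0.5, 2.0, 2.0])         # harder shrinkage on the zero coefficients

def mse_matrix(W, beta, lam, sigma2):
    """MSEM = C + d d' for the generalized ridge estimator when X'X = I."""
    S = np.linalg.inv(np.eye(3) + lam * W)   # shrinkage matrix
    C = sigma2 * S @ S                       # covariance
    d = (S - np.eye(3)) @ beta               # bias
    return C + np.outer(d, d)

diff = mse_matrix(W1, beta, lam, sigma2) - mse_matrix(W2, beta, lam, sigma2)
eigs = np.linalg.eigvalsh(diff)   # n.n.d. iff all eigenvalues >= 0
```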
23 Significance testing (Cule, Vineis and De Iorio, 2011). Hypotheses: H₀: β_j = 0 vs. H₁: β_j ≠ 0, j = 1, ..., p. Wald test statistic: Z_j = β̂_j(λ̂, Δ̂) / se{β̂_j(λ̂, Δ̂)}, which is approximately N(0, 1) under H₀, j = 1, ..., p. Two-sided P-value: P_j = Pr(|Z| ≥ |Z_j|) = 2{1 − Φ(|Z_j|)}, Z ~ N(0, 1).
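The two-sided Wald P-value P_j = 2{1 − Φ(|Z_j|)} can be computed with the standard-normal CDF via the error function:

```python
import math
import numpy as np

def wald_pvalue(z):
    """Two-sided P-value for a Wald statistic z, using
    Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# z = 0 gives P = 1; |z| = 1.96 gives P close to 0.05; large |z| gives tiny P.
z_values = np.array([0.0, 1.96, 3.5])
p_values = np.array([wald_pvalue(z) for z in z_values])
```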
24 Numerical analysis Introduction Methodology Numerical analysis Conclusion
25 Simulation. Model setting: a sparse model (Emura, Chen and Chen, 2012; Ing and Lai, 2011; Binder et al., 2009) with correlated regressors. Model: y = Xβ + ε, ε ~ N_n(0, σ²I). Coefficients: β = (b/q, ..., b/q, d/r, ..., d/r, 0, ..., 0)ᵀ, with q components equal to b/q, r components equal to d/r, and p − q − r zeros. Regressors within each of the two nonzero blocks have pairwise correlation 0.5.
26 Simulation. Model setting: n = 100, p ∈ {50, 100, 150, 200}, q = r = 10. Case I: b = d = 5. Case II: b = d = 10. Case III: b = 5, d = −5. Case IV: b = 10, d = −10.
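The simulation design can be sketched as follows; the within-block correlation construction (a shared common factor giving pairwise correlation 0.5) is an assumption about the garbled "Corr 0.5" lines.

```python
import numpy as np

rng = np.random.default_rng(6)

def make_design(n, p, q, r, b, d, rho=0.5):
    """Sparse coefficient vector and block-correlated regressors:
    the first q coefficients are b/q, the next r are d/r, the rest are 0."""
    beta = np.concatenate([np.full(q, b / q), np.full(r, d / r), np.zeros(p - q - r)])
    X = rng.normal(size=(n, p))
    # Induce pairwise correlation rho within each nonzero block
    # via a shared latent factor (each column keeps unit variance).
    for block in (slice(0, q), slice(q, q + r)):
        common = rng.normal(size=(n, 1))
        X[:, block] = np.sqrt(rho) * common + np.sqrt(1 - rho) * X[:, block]
    y = X @ beta + rng.normal(size=n)
    return X, y, beta

X, y, beta = make_design(n=100, p=200, q=10, r=10, b=5.0, d=5.0)
corr01 = float(np.corrcoef(X[:, 0], X[:, 1])[0, 1])  # should be near 0.5
```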
27 Simulation. MSE comparison: MSE(λ) = E{(β̂(λ) − β)ᵀ(β̂(λ) − β)} for the ridge estimator and MSE(λ, Δ) = E{(β̂(λ, Δ) − β)ᵀ(β̂(λ, Δ) − β)} for the proposed estimator. Figure: the plot of the MSE curves with b = d = 5.
28 Simulation. MSE comparison at the estimated tuning parameters: MSE(λ̂) = E{(β̂(λ̂) − β)ᵀ(β̂(λ̂) − β)} and MSE(λ̂, Δ̂) = E{(β̂(λ̂, Δ̂) − β)ᵀ(β̂(λ̂, Δ̂) − β)}. Table 2: simulation results based on 500 replicates with b = d = 5; for each p ∈ {50, 100, 150, 200}, the table reports E(λ̂), E(Δ̂), and the MSE of the ridge and proposed estimators [table values lost in transcription].
29 Simulation. Significance testing under the null: H₀: β₅₀ = 0 vs. H₁: β₅₀ ≠ 0. Note: H₀ is true here, since β₅₀ = 0 in the sparse model. Wald test statistic: Z₅₀ = β̂₅₀(λ̂, Δ̂) / se{β̂₅₀(λ̂, Δ̂)} ~ N(0, 1) under H₀. Type I error rate: (1/500) Σ_{s=1}^{500} I(|Z₅₀⁽ˢ⁾| > z_{α/2}), where z_{α/2} is the upper α/2 percent point of N(0, 1).
30 Simulation. Significance testing with α = 0.05. Table 3: simulation results for significance testing based on 500 replicates with b = d = 5, reporting the mean and standard deviation of the estimated tuning parameter and of Z₅₀, and the type I error rate, for each p [table values lost in transcription]. Note: means are averages over the s = 1, ..., 500 replicates, and standard deviations use the divisor 499.
31 Simulation. Significance testing under the alternative: H₀: β₁ = 0 vs. H₁: β₁ ≠ 0. Note: H₁ is true here, since β₁ = b/q ≠ 0. Wald test statistic: Z₁ = β̂₁(λ̂, Δ̂) / se{β̂₁(λ̂, Δ̂)}, approximately N(0, 1) under H₀. Power: (1/500) Σ_{s=1}^{500} I(|Z₁⁽ˢ⁾| > z_{α/2}), where z_{α/2} is the upper α/2 percent point of N(0, 1).
32 Simulation. Significance testing with α = 0.05. Table 4: simulation results for significance testing based on 500 replicates with b = d = 5, reporting the mean and standard deviation of the estimated tuning parameter and of Z₁, and the power, for each p [table values lost in transcription]. Note: means and standard deviations are computed over the 500 replicates as in Table 3.
33 NSCLC data. Epithelial lung cancers: small cell lung cancer and non-small cell lung cancer (squamous cell carcinoma, large cell carcinoma, adenocarcinoma). Model: y = Xβ + ε, with p = 394 selected gene signatures as regressors and the EGFR index as the response, from n = 124 patients.
34 NSCLC data. 4-fold cross-validation: the 124 patients are split into 4 folds (Test / Train) of 31 patients each. Let x_i be the i-th row of X and β̂⁽⁻ᵏ⁾ the estimator based on all the data not in fold k. Prediction error: PE = (1/124) Σ_{k=1}^{4} Σ_{i ∈ fold k} (y_i − x_iᵀβ̂⁽⁻ᵏ⁾)².
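The 4-fold prediction-error criterion can be sketched as below, with ordinary ridge standing in for the fitted estimator and simulated data in place of the NSCLC data.

```python
import numpy as np

rng = np.random.default_rng(7)

def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_prediction_error(X, y, lam, n_folds=4):
    """PE = (1/n) sum_k sum_{i in fold k} (y_i - x_i' beta_hat^(-k))^2,
    where beta_hat^(-k) is fitted on all data outside fold k."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    sse = 0.0
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        b = ridge(X[train], y[train], lam)
        sse += np.sum((y[fold] - X[fold] @ b) ** 2)
    return sse / n

n, p = 124, 394   # dimensions of the NSCLC data as read from the slides
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)   # toy response in place of the EGFR index
pe = cv_prediction_error(X, y, lam=10.0)
```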
35 NSCLC data. 4-fold cross-validation. Table 5: PE comparison of ridge regression and the proposed method over 100 random cross-validation splits, reporting λ̂ and Δ̂ for the proposed method, λ̂ for ridge, PE(ridge), PE(proposed), and their averages over replicates [table values lost in transcription].
36 NSCLC data. Prediction: estimation, regressor selection, and then prediction, using 62 patients as the training sample and 62 patients as the testing sample.
37 NSCLC data. Prediction. Table 6: the most strongly associated genes (gene symbol, coefficient, P-value) for ridge regression and the proposed method; the top-ranked genes (FGA, AKRB, CPS, KR6A, FGG, MSMB, ...) agree closely between the two methods, with P-values of order 1e-07 for the top gene (FGA) [full table entries lost in transcription]. Figure 3: the plots of y_i against x_iᵀβ̂ for the training and testing samples, with the prediction error PE.
38 Conclusion Introduction Methodology Numerical analysis Conclusion
39 Conclusion. We proposed a generalized ridge estimator that works under high-dimensionality and is able to utilize prior knowledge. It has a good Bayesian interpretation and tractable theoretical calculations, reduces the MSE, supports significance testing, and applies to microarray data.
40 References
- Allen, D. M. (1974). The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16, 125-127.
- Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J. (2009). Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics 25, 890-896.
- Cule, E., Vineis, P. and De Iorio, M. (2011). Significance testing in ridge regression for genetic data. BMC Bioinformatics 12, 372.
- Emura, T., Chen, Y.-H., and Chen, H.-Y. (2012). Survival prediction based on compound covariate under Cox proportional hazard models. PLoS ONE 7, e47627.
- Golub, G. H., Heath, M. and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21, 215-223.
- Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55-67.
- Ing, C.-K. and Lai, T.-L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica 21, 1473-1513.
- Loesgen, K.-H. (1990). A generalization and Bayesian interpretation of ridge-type estimators with good prior means. Statistical Papers 31, 147-154.
- McLachlan, G. J. (1980). On the mean square error associated with adaptive generalized ridge regression. Biometrical Journal 22.
- Trenkler, G. (1985). Mean square error matrix comparisons of estimators in linear regression. Communications in Statistics A14.
- Whittaker, J. C., Thompson, R., and Denham, M. C. (2000). Marker-assisted selection using ridge regression. Genetical Research 75, 249-252.
- Zhao, X., Rødland, E. A., Sørlie, T., et al. (2011). Combining gene signatures improves prediction of breast cancer survival. PLoS ONE 6.
41 Thank you for listening.
More informationInstitute of Statistics Mimeo Series No Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR
DEPARTMENT OF STATISTICS North Carolina State University 2501 Founders Drive, Campus Box 8203 Raleigh, NC 27695-8203 Institute of Statistics Mimeo Series No. 2583 Simultaneous regression shrinkage, variable
More informationOther likelihoods. Patrick Breheny. April 25. Multinomial regression Robust regression Cox regression
Other likelihoods Patrick Breheny April 25 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/29 Introduction In principle, the idea of penalized regression can be extended to any sort of regression
More informationSmoothed and Corrected Score Approach to Censored Quantile Regression With Measurement Errors
Smoothed and Corrected Score Approach to Censored Quantile Regression With Measurement Errors Yuanshan Wu, Yanyuan Ma, and Guosheng Yin Abstract Censored quantile regression is an important alternative
More informationLinear Models 1. Isfahan University of Technology Fall Semester, 2014
Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and
More informationDr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)
Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems
More informationPractical considerations for survival models
Including historical data in the analysis of clinical trials using the modified power prior Practical considerations for survival models David Dejardin 1 2, Joost van Rosmalen 3 and Emmanuel Lesaffre 1
More informationBayesian Estimation of a Possibly Mis-Specified Linear Regression Model
Econometrics Working Paper EWP14 ISSN 1485-6441 Department of Economics Bayesian Estimation of a Possibly Mis-Specified Linear Regression Model David E. Giles Department of Economics, University of Victoria
More informationProperties of the least squares estimates
Properties of the least squares estimates 2019-01-18 Warmup Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks E ax + b) = Var ax + b) = Goal Recall that the least squares
More informationThe prediction of house price
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationMulti Omics Clustering. ABDBM Ron Shamir
Multi Omics Clustering ABDBM Ron Shamir 1 Outline Introduction Cluster of Clusters (COCA) icluster Nonnegative Matrix Factorization (NMF) Similarity Network Fusion (SNF) Multiple Kernel Learning (MKL)
More informationGsslasso Cox: a Bayesian hierarchical model for predicting survival and detecting associated genes by incorporating pathway information
Tang et al. BMC Bioinformatics (2019) 20:94 https://doi.org/10.1186/s12859-019-2656-1 METHODOLOGY ARTICLE Open Access Gsslasso Cox: a Bayesian hierarchical model for predicting survival and detecting associated
More informationRidge Estimator in Logistic Regression under Stochastic Linear Restrictions
British Journal of Mathematics & Computer Science 15(3): 1-14, 2016, Article no.bjmcs.24585 ISSN: 2231-0851 SCIENCEDOMAIN international www.sciencedomain.org Ridge Estimator in Logistic Regression under
More informationA New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables
A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,
More informationApproaches for Multiple Disease Mapping: MCAR and SANOVA
Approaches for Multiple Disease Mapping: MCAR and SANOVA Dipankar Bandyopadhyay Division of Biostatistics, University of Minnesota SPH April 22, 2015 1 Adapted from Sudipto Banerjee s notes SANOVA vs MCAR
More information