High-dimensional Ordinary Least-squares Projection for Screening Variables


1 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables. Chenlei Leng, joint work with Xiangyu Wang (Duke). Conference on Nonparametric Statistics for Big Data and Celebration to Honor Professor Grace Wahba, 4-6 June, 2014.

2 2 / 38 The Story Started working as a project assistant for Grace; Intended to work on spline density estimation (but never completed); Thursday group meetings; Fortunately graduated!!

3 Back to the future 3 / 38

4 4 / 38 Outline Introduction and motivation Theory Simulation and application Conclusion and future research An open question

5 5 / 38 The setup Consider the linear regression model y = β_1 x_1 + β_2 x_2 + ... + β_p x_p + ε. With data we write Y = Xβ + ɛ, where Y ∈ R^n, X ∈ R^{n×p}, and ɛ ∈ R^n consists of i.i.d. errors. Notation: M = {x_1, ..., x_p} is the full model; M_S is the true model, where S = {j : β_j ≠ 0, j = 1, ..., p} and s = |S|.

6 6 / 38 Introduction In high-dimensional data analysis: the dimension p is much larger than the sample size n; the number of important variables s is often much smaller still, s ≪ n; the goal is to identify these important variables. Two approaches: One-stage: selection and estimation, often optimisation based; Two-stage: screening followed by some one-stage approach.

7 7 / 38 One-stage methods Penalised likelihood with a sparsity-inducing penalty: Lasso (Tibshirani, 1996), SCAD (Fan and Li, 2001), elastic net (Zou and Hastie, 2005), group Lasso (Yuan and Lin, 2006), COSSO (Lin and Zhang, 2006), Dantzig selector (Candès and Tao, 2007) and so on. Convex and non-convex optimisation. Different conditions for estimation and selection consistency: difficult to achieve both (Leng, Lin and Wahba, 2006). Subsampling approaches for consistency: computationally intensive.

8 8 / 38 Two-stage methods Screen first, refine next. Intuition: choosing a superset is easier than estimating the exact set. Already widely used: in cancer classification, for example, marginal t tests are used to screen genes. Fan and Lv (2008) put forward a theory for marginal screening in linear regression by retaining variables with large marginal correlations with the response: sure independence screening (SIS). Marginal: generalised to many models, including GLMs, Cox's model, GAMs, varying-coefficient models, etc. Correlation: generalised notions of correlation. Alternative iterative procedures: forward regression (Wang, 2009) and tilting (Cho and Fryzlewicz, 2012).

9 9 / 38 Elements of screening Two elements: Computational: key; otherwise we could use optimisation-based approaches such as the Lasso for screening too! Theoretical: the screening property; the selected superset must contain all the important variables with probability tending to one (the sure screening property). Remark: ideally the sure screening property should hold under general conditions.

10 10 / 38 Motivation Let's look at a class of estimates of β of the form β̂ = AY, where A ∈ R^{p×n}. Screening procedure: choose a submodel M_d that retains the d ≪ p largest entries of β̂: M_d = {x_j : |β̂_j| is among the largest d of all |β̂_j|'s}. Ideally, β̂ maintains the rank order of the entries of β: the nonzero entries of β are relatively large in β̂, and the zero entries of β are relatively small in β̂.

11 11 / 38 Signal-noise analysis Note β̂ = AY = A(Xβ + ɛ) = (AX)β + Aɛ: signal (AX)β + noise Aɛ. The noise part is stochastically small. In order for β̂ to preserve the rank order of β, ideally AX = I, or at least AX ≈ I. This motivated us to use some inverse of X. The SIS of Fan and Lv (2008) sets A = X^T and thus β̂ = X^T Y.

12 12 / 38 Inverse of X Look for A such that AX ≈ I. When p < n, A = (X^T X)^{-1} X^T gives rise to the OLS estimator. When p > n, use the Moore-Penrose inverse of X, A = X^T (XX^T)^{-1}, which is unique to high-dimensional data. We use β̂ = X^T (XX^T)^{-1} Y, named the High-dimensional Ordinary Least-squares Projector (HOLP): high-d OLS.
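A quick numerical sanity check of this definition (my own sketch, not from the talk): for p > n with XX^T invertible, the HOLP estimator coincides with the Moore-Penrose least-squares solution pinv(X) @ Y, so it can be computed by solving a single n x n linear system.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200                      # p > n, so XX^T is n x n and (generically) invertible
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)

# HOLP: solve the n x n system (X X^T) a = Y, then back-project onto R^p.
beta_holp = X.T @ np.linalg.solve(X @ X.T, Y)

# Moore-Penrose (minimum-norm least squares) solution for comparison.
beta_pinv = np.linalg.pinv(X) @ Y

print(np.allclose(beta_holp, beta_pinv))  # True: HOLP is the minimum-norm interpolating solution
```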

13 13 / 38 Remarks Write β̂ = X^T (XX^T)^{-1} Xβ + X^T (XX^T)^{-1} ɛ: HOLP projects β onto the row space of X, whereas OLS projects Y onto the column space of X. Straightforward to implement. Can be computed efficiently in O(n^2 p), as opposed to O(np) for SIS.

14 14 / 38 A comparison of the screening matrices The screening matrix AX in AY = (AX)β + Aɛ: HOLP: X^T (XX^T)^{-1} X; SIS: X^T X. A quick simulation: n = 50, p = 1000, x ∼ N(0, Σ), with three setups. Independent: Σ = I; CS: σ_jk = 0.6 for j ≠ k; AR(1): σ_jk = ρ^{|j-k|}.
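The comparison can be reproduced in a few lines; the sketch below is my own illustration (with a smaller p than on the slide, for speed), and simply measures how far each screening matrix AX is from a diagonal matrix under the three designs.

```python
import numpy as np

def screening_matrices(n=50, p=200, rho=0.6, seed=0):
    """Compare the HOLP and SIS screening matrices AX under three covariance setups."""
    rng = np.random.default_rng(seed)
    setups = {
        "Ind": np.eye(p),
        "CS": np.full((p, p), rho) + (1 - rho) * np.eye(p),
        "AR(1)": rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p))),
    }
    for name, Sigma in setups.items():
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
        holp = X.T @ np.linalg.solve(X @ X.T, X)   # X^T (X X^T)^{-1} X
        sis = X.T @ X / n                          # X^T X (scaled by n)
        off = lambda M: np.abs(M - np.diag(np.diag(M))).mean()
        print(f"{name:6s} mean |off-diagonal|: HOLP {off(holp):.3f}  SIS {off(sis):.3f}")

screening_matrices()
```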

15 15 / 38 Screening matrices [Figure: heat maps of the screening matrix AX; panels: SIS, Ind; SIS, CS; SIS, AR(1); HOLP, Ind; HOLP, CS; HOLP, AR(1).]

16 16 / 38 Theory Assumptions: p > n and log p = O(n^γ) for some γ > 0. Conditions on the eigenvalues of XΣ^{-1}X^T/p and on the distribution of Σ^{-1/2}x, where Σ = var(x). Conditions on the magnitude of the smallest |β_j| for j ∈ S. Conditions on s and on the condition number of Σ. However, we do not need the marginal correlation assumption, which requires y and the important x_j with j ∈ S to satisfy min_{j ∈ S} cov(β_j^{-1} y, x_j) ≥ c.

17 17 / 38 Marginal screening The marginal correlation assumption is vital to all marginal screening approaches. In SIS, AY = X^T Y = X^T Xβ + X^T ɛ. The SIS signal X^T Xβ ≈ Σβ: β_j nonzero doesn't imply (Σβ)_j nonzero. For HOLP, X^T (XX^T)^{-1} Xβ ≈ Iβ = β.
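To see concretely why a nonzero β_j need not give a nonzero marginal signal, here is a small numerical illustration of my own (not from the talk), using a compound-symmetry Σ with signed coefficients that partially cancel.

```python
import numpy as np

p, rho = 10, 0.5
Sigma = np.full((p, p), rho) + (1 - rho) * np.eye(p)   # compound-symmetry covariance

beta = np.zeros(p)
beta[0], beta[1] = 1.0, -3.0          # two active variables with opposite signs

marginal = Sigma @ beta               # what X^T Y (the SIS signal) targets, up to scaling
print(np.round(marginal, 2))
# (Sigma beta)_0 = 1 + 0.5*(-3) = -0.5, while every inactive coordinate gets
# 0.5*(1 - 3) = -1.0: the active x_0 looks *weaker* marginally than the noise
# variables, so ranking by the marginal signal would drop it. HOLP instead
# targets beta itself, whose largest entries are exactly the active ones.
```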

18 18 / 38 Theorem 1 (Screening property of HOLP) Under mild conditions, if we choose the submodel size d ≪ p properly, the M_d chosen by HOLP satisfies P(M_S ⊂ M_d) = 1 - O(exp(-c_1 n / log n)). Theorem 2 (Screening consistency of HOLP) Under mild conditions, the HOLP estimator satisfies P(min_{j ∈ S} |β̂_j| > max_{j ∉ S} |β̂_j|) = 1 - O(exp(-c_2 n / log n)).

19 19 / 38 Another motivation for HOLP The ridge regression estimator is β̂(r) = (rI + X^T X)^{-1} X^T Y, where r is the ridge parameter. Letting r → ∞ gives r β̂(r) → X^T Y, the SIS. Letting r → 0 gives β̂(r) → (X^T X)^+ X^T Y. An application of the Sherman-Morrison-Woodbury formula gives (rI + X^T X)^{-1} X^T Y = X^T (rI + XX^T)^{-1} Y. Then letting r → 0 gives (X^T X)^+ X^T Y = X^T (XX^T)^{-1} Y, which is HOLP.
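These identities and limits are easy to check numerically; the following is my own sketch (the tolerances and the value r = 1e8 standing in for r → ∞ are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 150
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)

def ridge_primal(r):   # (r I_p + X^T X)^{-1} X^T Y
    return np.linalg.solve(r * np.eye(p) + X.T @ X, X.T @ Y)

def ridge_dual(r):     # X^T (r I_n + X X^T)^{-1} Y  (Woodbury form)
    return X.T @ np.linalg.solve(r * np.eye(n) + X @ X.T, Y)

holp = X.T @ np.linalg.solve(X @ X.T, Y)

print(np.allclose(ridge_primal(1.0), ridge_dual(1.0)))        # Sherman-Morrison-Woodbury identity
print(np.allclose(ridge_dual(1e-8), holp, atol=1e-4))         # r -> 0 recovers HOLP
r = 1e8
print(np.allclose(r * ridge_dual(r), X.T @ Y, rtol=1e-4))     # r -> infinity: r*beta(r) -> X^T Y (SIS)
```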

20 20 / 38 Ridge regression Theorem 3 (Screening consistency of ridge regression) Under mild conditions, with a proper ridge parameter r, the ridge regression estimator satisfies P(min_{j ∈ S} |β̂_j(r)| > max_{j ∉ S} |β̂_j(r)|) = 1 - O(exp(-c_3 n / log n)). Remark: the theorem holds in particular when the ridge parameter r is fixed. Potential to generalise to GLMs, Cox's model, etc. (ongoing).

21 21 / 38 Simulation (p, n) = (1000, 100) or (10000, 200). Signal-to-noise ratio R^2 = 0.5 or 0.9. Σ and β: (i) Independent predictors: β_i = (-1)^{u_i}(|N(0, 1)| + 4 log n/√n), where u_i ∼ Ber(0.4), for i ∈ S and β_i = 0 for i ∉ S. (ii) Compound symmetry: β_i = 5 for i = 1, ..., 5 and β_i = 0 otherwise; ρ = 0.3, 0.6, 0.9. (iii) Autoregressive correlation: β_1 = 3, β_4 = 1.5, β_7 = 2, and β_i = 0 otherwise.

22 22 / 38 More setups (iv) Factor model: x_i = Σ_{j=1}^k φ_j f_{ij} + η_i, where the f_{ij}, η_i and φ_j are i.i.d. normal; coefficients as in CS. (v) Group structure: 15 true variables in three groups, x_{j+3m} = z_j + N(0, δ^2) for m = 0, ..., 4 and j = 1, 2, 3; β_i = 3 for i ≤ 15 and β_i = 0 for i > 15; δ^2 is 0.01, 0.05 or 0.1. (vi) Extreme correlation: x_i = (z_i + w_i)/√2 for i = 1, ..., 5 and x_i = (z_i + Σ_{j=1}^5 w_j)/2 for i = 16, ..., p; coefficients as in (ii). The response variable is more correlated with a large number of unimportant variables. To make it even harder, x_{i+s}, x_{i+2s} = x_i + N(0, 0.01) for i = 1, ..., 5.
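As an illustration, here is a sketch of how the group-structure design (v) can be generated, under my reading of the setup above (the paper's exact construction may differ in details such as how the variables outside the 15 true ones are drawn).

```python
import numpy as np

def group_design(n=100, p=1000, delta2=0.05, seed=0):
    """Sketch of simulation setup (v): 15 true variables in three groups of five.

    Each group shares a latent z_j plus independent N(0, delta2) noise, so the
    five variables within a group are nearly perfect copies of one another.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))          # remaining variables: independent noise
    z = rng.standard_normal((n, 3))
    for j in range(3):                        # groups built on z_1, z_2, z_3
        for m in range(5):
            X[:, j + 3 * m] = z[:, j] + np.sqrt(delta2) * rng.standard_normal(n)
    beta = np.zeros(p)
    beta[:15] = 3.0
    y = X @ beta + rng.standard_normal(n)
    return X, y, beta

X, y, beta = group_design()
print(X.shape, int((beta != 0).sum()))        # (100, 1000) 15
```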

23 23 / 38 (p, n) = (1000, 100), R^2 = 0.5. [Table comparing HOLP, SIS, ISIS, FR and Tilting on Examples (i)-(vi), with their ρ, k and δ^2 variants; the numerical entries were not preserved in the transcription.]

24 24 / 38 (p, n) = (1000, 100), R^2 = 0.9. [Same table layout as the previous slide; numerical entries not preserved in the transcription.]

25 25 / 38 (p, n) = (10000, 200), R^2 = 0.5. [Same table layout as the previous slides; numerical entries not preserved in the transcription.]

26 26 / 38 (p, n) = (10000, 200), R^2 = 0.9. [Same table layout as the previous slides; numerical entries not preserved in the transcription.]

27 27 / 38 A demonstration of Theorems 2 and 3 We set p = 4[exp(n^{1/3})] for all examples except Example (vi) and p = 20[exp(n^{1/4})] for Example (vi); s = 1.5[n^{1/4}] for R^2 = 90% and s = [n^{1/4}] for R^2 = 50%.

28 28 / 38 Theorem 2 [Figure 1: HOLP: P(min_{i ∈ S} |β̂_i| > max_{i ∉ S} |β̂_i|) versus the sample size n, for R^2 = 90% and R^2 = 50%, Examples (i)-(vi).]

29 29 / 38 Theorem 3 [Figure 2: ridge-HOLP (r = 10): P(min_{i ∈ S} |β̂_i(r)| > max_{i ∉ S} |β̂_i(r)|) versus the sample size n, for R^2 = 90% and R^2 = 50%, Examples (i)-(vi).]

30 30 / 38 Computational efficiency: varying the submodel size [Figure 3: computational time (sec) against the submodel size when (p, n) = (1000, 100), comparing Tilting, Forward regression, ISIS, HOLP and SIS; a second panel excludes tilting.]

31 31 / 38 Computational efficiency: varying p [Figure 4: computational time (sec) against the total number of covariates when (d, n) = (50, 100), comparing Tilting, Forward regression, ISIS, HOLP and SIS; a second panel excludes tilting.]

32 32 / 38 A data analysis The mammalian eye disease data (Scheetz et al., 2006): gene expressions on eye tissues from 120 twelve-week-old male F2 rats; the gene TRIM32, responsible for causing Bardet-Biedl syndrome, is of interest; we focus on the 5000 genes (out of about 19K) with the highest sample variance.

33 33 / 38 Table 1: the 10-fold cross-validation error for nine different methods. Columns: mean error, standard error, final model size. Rows: Lasso, SCAD, ISIS-SCAD, SIS-SCAD, FR-Lasso, FR-SCAD, HOLP-Lasso, HOLP-SCAD, tilting, NULL. [Numerical entries not preserved in the transcription.]

34 34 / 38 Table 2: commonly selected genes for the different methods. Three probes appear as columns (probe IDs truncated in the transcription; one corresponds to the gene Zfp292). Number of these genes selected by each method: Lasso 3, SCAD 3, ISIS-SCAD 1, SIS-SCAD 2, FR-Lasso 2, FR-SCAD 3, HOLP-Lasso 2, HOLP-SCAD 2, Tilting 0.

35 35 / 38 Take-home message For a linear model Y = Xβ + ɛ with normalized data, variable screening takes three steps: 1. Compute the HOLP estimate β̂ = X^T (XX^T)^{-1} Y. 2. Retain the d variables (usually one can take d = n) corresponding to the d largest entries of |β̂|. 3. Screening is done! Start thinking about building a refined model based on the retained d variables.
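A minimal NumPy sketch of these three steps (my own illustration; holp_screen is a hypothetical helper name, and X is assumed to be standardized with Y centred):

```python
import numpy as np

def holp_screen(X, y, d=None):
    """Screen variables with HOLP: beta_hat = X^T (X X^T)^{-1} y.

    X is an n x p matrix of standardized predictors, y the centred response.
    Returns the indices of the d variables with the largest |beta_hat_j|.
    """
    n, p = X.shape
    if d is None:
        d = n                                   # the talk suggests d = n as a default
    a = np.linalg.solve(X @ X.T, y)             # solve the n x n system instead of inverting
    beta_hat = X.T @ a
    return np.argsort(-np.abs(beta_hat))[:d]

# Toy usage: p >> n with 5 active variables.
rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:5] = 5.0
y = X @ beta + rng.standard_normal(n)
keep = holp_screen(X, y)
print(set(range(5)) <= set(keep.tolist()))      # typically True: the active set is retained
```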

36 36 / 38 Conclusion HOLP is computationally efficient, theoretically appealing, methodologically simple, and generalisable via its ridge version. Future work (ongoing): GLMs, Cox's model, screening for compressed sensing, grouped variable screening, GAMs, ...

37 An open question 37 / 38

38 38 / 38 He who teaches me for one day is my father for life. Thank you!
