Some Sparse Methods for High Dimensional Data. Gilbert Saporta CEDRIC, Conservatoire National des Arts et Métiers, Paris

1 Some Sparse Methods for High Dimensional Data Gilbert Saporta CEDRIC, Conservatoire National des Arts et Métiers, Paris

2 A joint work with Anne Bernard (QFAB Bioinformatics), Stéphanie Bougeard (ANSES), Ndèye Niang (CNAM) and Cristian Preda (Lille).

3 Outline
1. Introduction
2. Sparse regression
3. Sparse PCA
4. Group sparse PCA and sparse MCA
5. Clusterwise sparse PLS regression
6. Conclusion and perspectives

4 1. Introduction
High-dimensional data (p >> n): gene expression data, chemometrics, etc.
Several solutions exist for regression problems that keep all the variables (PLS, ridge, PCR), but interpretation is difficult.
Sparse methods provide combinations of few variables.

5 This talk: a survey of sparse methods for supervised (regression) and unsupervised (PCA) problems, and new proposals for large n and large p when unobserved heterogeneity occurs among the observations: clusterwise sparse PLS regression.

6 2. Sparse regression
Keeping all predictors is a drawback for high-dimensional data: combinations of too many variables cannot be interpreted.
Sparse methods simultaneously shrink coefficients and select variables, hence better predictions.

7 2.1 Lasso and elastic net
The Lasso (Tibshirani, 1996) is a shrinkage and selection method. It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients (L1 penalty):
$\min_\beta \|y - X\beta\|_2^2$ subject to $\sum_{j=1}^p |\beta_j| \le c$
or, in Lagrangian form,
$\hat\beta_{lasso} = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^p |\beta_j|$
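
As a concrete illustration (not from the talk), a minimal scikit-learn sketch of the Lasso in its penalized form on simulated data; the parameter alpha plays the role of λ, up to scikit-learn's 1/(2n) scaling of the squared error:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200                        # high-dimensional setting: p >> n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3, -2, 1.5, 1, -1]        # only 5 truly active predictors
y = X @ beta + 0.5 * rng.standard_normal(n)

# scikit-learn minimizes ||y - Xb||^2 / (2n) + alpha * ||b||_1
lasso = Lasso(alpha=0.5).fit(X, y)
print("nonzero coefficients:", np.sum(lasso.coef_ != 0))
```

Increasing alpha shrinks more coefficients exactly to zero, mirroring the effect of decreasing the bound c.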

8 This looks similar to ridge regression (L2 penalty):
$\min_\beta \|y - X\beta\|_2^2$ subject to $\sum_{j=1}^p \beta_j^2 \le c$, i.e. $\hat\beta_{ridge} = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$
The Lasso continuously shrinks the coefficients towards zero as c decreases. Convex optimisation; no explicit solution.
If $c \ge \sum_{j=1}^p |b_j^{OLS}|$, the Lasso is identical to OLS.

9 Constraints and log-priors
Like ridge regression, the Lasso is a Bayesian regression, but with a double-exponential (Laplace) prior:
$f(\beta_j) = \frac{\lambda}{2} \exp(-\lambda |\beta_j|)$
The L1 penalty is proportional to the (negative) log-prior.

10 Finding the optimal parameter
Cross-validation if optimal prediction is needed; BIC when sparsity is the main concern:
$\lambda_{opt} = \arg\min_\lambda \; \frac{\|y - X\hat\beta(\lambda)\|^2}{n \hat\sigma^2} + \frac{\log(n)}{n} \, df(\lambda)$
A good unbiased estimate of df is the number of nonzero coefficients (Zou et al., 2007).
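
In scikit-learn this information-criterion rule is available as LassoLarsIC, which searches the LARS path and uses the number of nonzero coefficients as the df estimate; a short sketch on simulated data:

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(1)
n, p = 100, 30
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)

# BIC-based choice of the penalty along the LARS path
model = LassoLarsIC(criterion="bic").fit(X, y)
print("chosen penalty:", model.alpha_)
print("selected variables:", np.flatnonzero(model.coef_))
```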

11 A few drawbacks
The Lasso produces a sparse model, but the number of selected variables cannot exceed the number of units: unfitted to DNA arrays, where n (arrays) << p (genes).
It selects only one variable among a set of correlated predictors.

12 Elastic net
Combines the ridge and lasso penalties to select more predictors than the number of observations (Zou & Hastie, 2005):
$\hat\beta_{en} = \arg\min_\beta \|y - X\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$
The L1 part leads to a sparse model; the L2 part removes the limitation on the number of retained predictors and favours grouped selection.
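
A minimal scikit-learn sketch on simulated data; note that scikit-learn reparametrizes (λ1, λ2) as an overall penalty strength alpha and a mixing weight l1_ratio:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
n, p = 40, 120                        # more predictors than observations
X = rng.standard_normal((n, p))
y = X[:, :3].sum(axis=1) + 0.3 * rng.standard_normal(n)

# alpha sets the overall penalty; l1_ratio balances the L1 and L2 parts
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)
print("selected predictors:", np.sum(enet.coef_ != 0), "of", p)
```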

13 2.2 Group Lasso
The X matrix is divided into J sub-matrices X_j of p_j variables. The group Lasso is an extension of the Lasso for selecting groups of variables (Yuan & Lin, 2006):
$\hat\beta_{GL} = \arg\min_\beta \left\| y - \sum_{j=1}^J X_j \beta_j \right\|^2 + \lambda \sum_{j=1}^J \sqrt{p_j} \, \|\beta_j\|_2$
If p_j = 1 for all j, the group Lasso reduces to the Lasso.
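
scikit-learn has no group Lasso, so here is a minimal numpy sketch of the standard proximal-gradient solver for this criterion: a gradient step on the least-squares term followed by block soft-thresholding with the √p_j weights (the data and group structure are illustrative):

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient descent for the group Lasso.
    groups: list of index arrays, one per block X_j."""
    beta = np.zeros(X.shape[1])
    lr = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = beta - lr * (X.T @ (X @ beta - y))  # gradient step
        for g in groups:                        # block soft-thresholding
            t = lam * lr * np.sqrt(len(g))      # sqrt(p_j) weighting
            nz = np.linalg.norm(z[g])
            z[g] = 0.0 if nz <= t else (1 - t / nz) * z[g]
        beta = z
    return beta

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 12))
y = X[:, :3] @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.standard_normal(60)
groups = [np.arange(j, j + 3) for j in range(0, 12, 3)]
beta = group_lasso(X, y, groups, lam=5.0)
print([round(float(np.linalg.norm(beta[g])), 2) for g in groups])
```

With a suitable lam, entire blocks of coefficients are zeroed out, while the coefficients inside a retained block generally stay nonzero; this motivates the sparse group lasso of the next slide.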

14 Drawback: no sparsity within groups. A solution: the sparse group lasso (Simon et al., 2012):
$\min_\beta \left\| y - \sum_{j=1}^J X_j \beta_j \right\|^2 + \lambda_1 \sum_{j=1}^J \|\beta_j\|_2 + \lambda_2 \sum_{j=1}^J \sum_{i=1}^{p_j} |\beta_{ij}|$
Two tuning parameters: grid search.

15 2.3 Sparse PLS
Several solutions: Le Cao et al. (2008); Chun & Keles (2010).
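
For a single response and one component, the flavour of these proposals can be sketched in a few lines: the first PLS weight vector is proportional to X'y, and sparsity is obtained by soft-thresholding it before normalization (a simplified illustration, not the full algorithms of the cited papers):

```python
import numpy as np

def soft(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def sparse_pls_one_component(X, y, lam):
    """One sparse PLS component: soft-threshold the weight w ∝ X'y."""
    w = soft(X.T @ y, lam)
    if not w.any():
        raise ValueError("lam too large: all weights are zero")
    w /= np.linalg.norm(w)
    t = X @ w                            # sparse latent component
    c = (t @ y) / (t @ t)                # regression coefficient of y on t
    return w, t, c

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 100))
y = X[:, 0] + X[:, 1] + 0.2 * rng.standard_normal(50)
w, t, c = sparse_pls_one_component(X, y, lam=30.0)
print("nonzero weights:", np.flatnonzero(w))
```

Further components would be extracted the same way after deflating X on t.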

16 3. Sparse PCA
In PCA, each PC is a linear combination of all the original variables, which makes the results difficult to interpret.
Challenge of SPCA: obtain easily interpretable components (many zero loadings in the principal factors).
Principle of SPCA: modify PCA by imposing lasso/elastic-net constraints, to construct modified PCs with sparse loadings.
Warning: sparse PCA does not provide a global selection of variables but a selection dimension by dimension; this is different from the regression context (Lasso, elastic net, ...).

17 3.1 First attempts
Simple PCA by Vines (2000): integer loadings.
Rousson and Gasser (2004): loadings restricted to (+, 0, -).
SCoTLASS (Simplified Component Technique LASSO) by Jolliffe et al. (2003): an extra L1 constraint:
$\max_u u' V u$ subject to $u'u = 1$ and $\sum_{j=1}^p |u_j| \le t$

18 SCoTLASS properties:
$t \ge \sqrt{p}$: usual PCA
$t < 1$: no solution
$t = 1$: only one nonzero coefficient
$1 < t < \sqrt{p}$: sparse loadings
Non-convex problem.

19 3.2 S-PCA via regression/SVD, by Zou et al. (2006)
Let the SVD of X be $X = UDV'$, with $Z = UD$ the principal components.
Ridge regression: $\hat\beta_{ridge} = \arg\min_\beta \|Z_i - X\beta\|^2 + \lambda \|\beta\|^2$
Since $X'X = VD^2V'$ with $V'V = I$:
$\hat\beta_{i,ridge} = (X'X + \lambda I)^{-1} X'X v_i = \frac{d_{ii}^2}{d_{ii}^2 + \lambda} v_i \propto v_i$
Loadings can thus be recovered by ridge-regressing the PCs on the p variables: PCA can be written as a regression-type optimization problem.
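
A quick numerical check of this identity (a sketch on simulated data): ridge-regressing the first PC on X returns, after normalization, the first loading vector up to sign:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 8))
X -= X.mean(axis=0)                       # centered data

U, d, Vt = np.linalg.svd(X, full_matrices=False)
z1 = U[:, 0] * d[0]                       # first principal component (Z = UD)

lam = 10.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ z1)
beta /= np.linalg.norm(beta)              # the shrinkage only rescales beta

print(np.allclose(np.abs(beta), np.abs(Vt[0])))   # True: beta ∝ v_1
```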

20 Sparse PCA adds a Lasso penalty to produce sparse loadings:
$\hat\beta = \arg\min_\beta \|Z_i - X\beta\|^2 + \lambda \|\beta\|^2 + \lambda_1 \|\beta\|_1$
$\hat v_i = \hat\beta / \|\hat\beta\|$ is an approximation of $v_i$, and $X \hat v_i$ is the i-th approximated component.
This produces sparse loadings, with zero coefficients that facilitate interpretation.
Alternating algorithm between elastic net and SVD.
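
scikit-learn ships a SparsePCA estimator; it solves a related L1-penalized matrix-factorization problem rather than the exact elastic-net/SVD criterion of Zou et al., but it illustrates the same idea of sparse loadings:

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 20))
X -= X.mean(axis=0)

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(X)
for i, comp in enumerate(spca.components_):
    print(f"component {i}: {np.sum(comp != 0)} nonzero loadings out of 20")
```

Larger alpha yields sparser loading vectors, at the price of a lower proportion of explained variance.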

21 Loss of orthogonality
SCoTLASS: orthogonal loadings but correlated components.
S-PCA: neither the loadings nor the components are orthogonal.
Hence the necessity of adjusting the % of explained variance.
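
One common adjustment, proposed by Zou et al. (2006), uses the QR decomposition of the matrix of sparse components T = XV, so that each component is only credited with the variance not already explained by the previous ones; a minimal sketch:

```python
import numpy as np

def adjusted_explained_variance(X, V):
    """Adjusted explained variance of possibly correlated sparse
    components T = XV: with T = QR, the adjusted contribution of
    component j is R[j, j]**2 (Zou et al., 2006)."""
    T = X @ V
    _, R = np.linalg.qr(T)
    return np.diag(R) ** 2 / np.trace(X.T @ X)
```

Here V stacks the sparse loading vectors column-wise, and the result is expressed as a proportion of the total variance of X.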

22 4. Group sparse PCA and sparse MCA (Bernard, Guinot, Saporta, 2012)
Industrial context and motivation: work funded by the R&D department of the Chanel cosmetics company, relating gene expression data to skin aging measures.
n = 500, p = … SNPs, … genes.

23 4.1 Group Sparse PCA
Data matrix X divided into J groups X_j of p_j variables.
Group Sparse PCA: a compromise between SPCA and the group Lasso.
Goal: select groups of continuous variables (zero coefficients for entire blocks of variables).
Principle: replace the penalty function of the SPCA criterion
$\hat\beta = \arg\min_\beta \|Z - X\beta\|^2 + \lambda \|\beta\|^2 + \lambda_1 \|\beta\|_1$
by that of the group Lasso:
$\hat\beta_{GL} = \arg\min_\beta \left\| Z - \sum_{j=1}^J X_j \beta_j \right\|^2 + \lambda \sum_{j=1}^J \sqrt{p_j} \, \|\beta_j\|_2$

24 4.2 Sparse MCA
In MCA, the original table of J categorical variables is replaced by the complete disjunctive table: selecting one column of the original table (categorical variable X_j) amounts to selecting a block of p_j indicator variables in the complete disjunctive table.
Challenge of sparse MCA: select categorical variables, not categories.
Principle: a straightforward extension of group sparse PCA to groups of indicator variables, with the chi-square metric; uses the sPCA-rSVD algorithm.
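
Building the complete disjunctive table from a table of categorical variables is a one-liner, e.g. with pandas (illustrative toy SNP data):

```python
import pandas as pd

X = pd.DataFrame({"SNP1": ["AA", "AG", "GG", "AG"],
                  "SNP2": ["CC", "CT", "CC", "TT"]})
K = pd.get_dummies(X)            # complete disjunctive (indicator) table
print(K.columns.tolist())        # one block of indicator columns per SNP
```

Each block K_j of indicator columns then plays the role of a group in the group sparse PCA criterion.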

25 Application on genetic data: Single Nucleotide Polymorphisms
Data: n = 502 individuals; p = 537 SNPs (among more than … of the original data base, … genes); q = 1554 (total number of columns).
X: 502 × 537 matrix of qualitative variables.
K: 502 × 1554 complete disjunctive table, K = (K_1, …, K_1554); 1 block = 1 SNP = 1 K_j matrix.

26 Application on genetic data: Single Nucleotide Polymorphisms (figure)

27 Application on genetic data: Single Nucleotide Polymorphisms
λ = 0.005: CPEV (cumulative percentage of explained variance) = 0.32%, with 174 columns selected on Comp 1.

28 Application on genetic data: comparison of the loadings (figure)

29 Application on genetic data: Single Nucleotide Polymorphisms
Comparison between MCA and sparse MCA on the first principal plane (figure)

30 Properties

Property                    MCA                                        Sparse MCA
Uncorrelated components     TRUE                                       FALSE
Orthogonal loadings         TRUE                                       FALSE
Barycentric property        TRUE                                       Partly true
% of inertia                $\frac{\lambda_j}{\lambda_{tot}} \times 100$    $\|Z_{j \cdot 1,\dots,j-1}\|^2$ / total inertia
Total inertia               $\frac{1}{p} \sum_{j=1}^p (p_j - 1) = \frac{k}{p} - 1$    $\sum_j \|Z_{j \cdot 1,\dots,j-1}\|^2$

where $Z_{j \cdot 1,\dots,j-1}$ are the residuals of $Z_j$ after adjusting for (regression projection on) $Z_1, \dots, Z_{j-1}$.

31 5. Clusterwise sparse PLS regression
Context: high-dimensional explanatory data X (with p >> n); one or several variables Y to explain; unknown clusters of the n observations.
Twofold aim:
Clustering: get the optimal clustering of the observations.
Sparse regression: compute the sparse regression coefficients within each cluster, to improve the prediction of Y and the interpretation of the high-dimensional data.

32 Clusterwise (typological) regression
Diday (1974), Charles (1977), Späth (1979). Optimizes simultaneously the partition and the local models:
$\min \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}_{k(i)=k} \left( y_i - (\beta_{0k} + x_i' \beta_k) \right)^2$
Does not work when p > n_k; our proposal: local sparse PLS regressions.
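
The classical alternating scheme can be sketched in a few lines: fit one local model per cluster, reassign each observation to the cluster whose model predicts it best, and repeat (a simplified OLS sketch; empty clusters are not handled):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def clusterwise_ols(X, y, K, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(K, size=len(y))          # random initial partition
    for _ in range(n_iter):
        models = [LinearRegression().fit(X[labels == k], y[labels == k])
                  for k in range(K)]
        sq_err = np.column_stack([(y - m.predict(X)) ** 2 for m in models])
        new = sq_err.argmin(axis=1)                # reassign observations
        if np.array_equal(new, labels):
            break
        labels = new
    return labels, models
```

When p > n_k this local OLS step breaks down, hence the proposal to replace it with a local sparse PLS fit.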

33 Criterion (for a single dependent variable and G clusters):
$\arg\min \sum_{g=1}^G \left( \|X_g' y_g - w_g v_g\|^2 + \lambda_g \|w_g\|_1 + \|y_g - X_g w_g c_g\|^2 \right)$, with $\|w_g\| = 1$
The first terms yield the (cluster-specific) sparse components; the last term gives the (cluster-specific) residuals.
w: loading weights of the PLS component t = Xw
v: loading weights of u = Yv
c: regression coefficients of Y on t: $c = (T'T)^{-1} T' y$

34 A simulation study
n = 500, p = 100
X is a random vector, normally distributed with mean 0 and covariance matrix S with elements $s_{ij} = \mathrm{cov}(X_i, X_j) = \rho^{|i-j|}$, $\rho \in [0,1]$.
The response variable Y is defined as follows: there exists a latent categorical variable G, G ∈ {1, …, K}, such that, conditionally on G = i and X = x:
$Y \mid G=i, X=x \sim \mathcal{N}\left( \beta_1^{(i)} x_1 + \dots + \beta_p^{(i)} x_p, \ \sigma_i^2 \right)$
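
A numpy sketch of this simulation design, under simplifying assumptions (a common noise level calibrated on the overall signal variance rather than per cluster):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, K, rho = 500, 100, 2, 0.5

# Covariance matrix with entries s_ij = rho**|i - j|
S = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), S, size=n)

G = rng.integers(K, size=n)               # equiprobable latent clusters
B = rng.standard_normal((K, p))           # cluster-specific coefficients
signal = np.einsum("ij,ij->i", X, B[G])   # beta_G(i)' x_i for each i
sigma = np.sqrt(0.1 * signal.var())       # noise at 10% of signal variance
y = signal + sigma * rng.standard_normal(n)
```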

35 Signal/noise ratio: $\sigma_i^2 / \mathrm{Var}(Y \mid G = i) = 0.1$
The n subjects are equiprobably distributed into the K clusters. For K = 2 we have chosen …

36 (figure: simulation results)

37 A wrong number of groups (figure)

38 Application to real data
Data features (liver toxicity data from the mixOmics R package):
X: 3116 genes (mRNA extracted from the liver)
y: a clinical chemistry marker for liver injury (serum enzyme level)
Observations: 64 rats. Each rat is subject to a more or less toxic acetaminophen dose (50, 150, 1500 or 2000); necropsy is performed 6, 18, 24 or 48 hours after exposure (i.e., the time when the mRNA is extracted from the liver).
Data pre-processing: y and X are centered and scaled.
Twofold aim: improve the sparse PLS link between the genes (X) and the liver injury marker (y), while seeking G unknown clusters of the 64 rats.

39 Batch clusterwise algorithm
For several random allocations of the n observations into G clusters:
- compute G sparse PLS regressions, one within each cluster (the sparsity parameter and the dimension are selected through 10-fold CV);
- assign each observation to the cluster with the minimum criterion value;
- iterate until convergence.
Select the best initialization (i.e., the one which minimizes the criterion) and get:
- the allocation of the n observations into G clusters,
- the sparse regression coefficients within each cluster.
A simplified sketch of this scheme follows.
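
The sketch below adds the multi-start logic to the alternating scheme of slide 32, with an L1-penalized regression standing in for the within-cluster sparse PLS (and without the cross-validated tuning of the real algorithm):

```python
import numpy as np
from sklearn.linear_model import Lasso

def clusterwise_sparse(X, y, G, n_starts=5, n_iter=20, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):                 # several random initializations
        labels = rng.integers(G, size=len(y))
        for _ in range(n_iter):
            models = [Lasso(alpha=alpha).fit(X[labels == g], y[labels == g])
                      for g in range(G)]
            sq_err = np.column_stack([(y - m.predict(X)) ** 2
                                      for m in models])
            new = sq_err.argmin(axis=1)
            if np.array_equal(new, labels):
                break
            labels = new
        crit = sq_err[np.arange(len(y)), labels].sum()
        if best is None or crit < best[0]:    # keep the best initialization
            best = (crit, labels, models)
    return best
```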

40 Clusterwise sparse PLS: performance vs. standard sPLS
No cluster: N = 64, R² = 0.968, RMSE = 0.313, λ = 0.9 (sparsity parameter), H = 3 dimensions.
G = 3 clusters of rats: N = (12; 42; 10), R² = (0.99; 0.928; 0.995), RMSE = (0.0272; …; …), λ = (0.9; 0.8; 0.9), H = (3; 3; 3).

41 Clusterwise sparse PLS: regression coefficients
Interpretation: the links between the genes (X) and the liver injury marker (y) are specific to each cluster.
Cluster 1: 12 selected genes; Cluster 2: 86 selected genes; Cluster 3: 15 selected genes (no cluster: 31 selected genes).
The cluster structure is significantly associated with the acetaminophen dose (p-value, chi-square/Fisher test: …).
Cluster 1: all the doses; Cluster 2: mainly doses 50 and 150; Cluster 3: mainly doses 1500 and 2000.

42 6. Conclusions and perspectives
Sparse techniques provide elegant and efficient solutions to the problems posed by high-dimensional data: a new generation of data analysis methods with few restrictive hypotheses.
Sparse PCA and the group Lasso have been extended towards sparse MCA.
Clusterwise extensions handle large p and large n.

43 Research in progress:
Criteria for choosing the tuning parameter λ.
Extension of sparse MCA: a compromise between sparse MCA and the sparse group lasso developed by Simon et al. (2012), to select both groups and predictors within a group, producing sparsity at both levels.
Extension to the case where variables have a block structure.
Writing an R package.

44 References
Bernard, A., Guinot, C. and Saporta, G. (2012). Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis. In: Proceedings of Compstat 2012, Limassol, Cyprus.
Charles, C. (1977). Régression typologique et reconnaissance des formes. Thèse de doctorat, Université Paris IX.
Chun, H. and Keles, S. (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society, Series B, 72.
Diday, E. (1974). Introduction à l'analyse factorielle typologique. Revue de Statistique Appliquée, 22(4).
Jolliffe, I.T., Trendafilov, N.T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12.
Rousson, V. and Gasser, T. (2004). Simple component analysis. Journal of the Royal Statistical Society, Series C (Applied Statistics), 53.
Shen, H. and Huang, J.Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99, 1015-1034.

45 References (continued)
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2012). A sparse-group lasso. Journal of Computational and Graphical Statistics.
Späth, H. (1979). Clusterwise linear regression. Computing, 22.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58.
Vines, S.K. (2000). Simple principal components. Journal of the Royal Statistical Society, Series C (Applied Statistics), 49.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67.
Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15.
Zou, H., Hastie, T. and Tibshirani, R. (2007). On the degrees of freedom of the lasso. The Annals of Statistics, 35(5).
