Some Sparse Methods for High Dimensional Data. Gilbert Saporta CEDRIC, Conservatoire National des Arts et Métiers, Paris
2 A joint work with Anne Bernard (QFAB Bioinformatics), Stéphanie Bougeard (ANSES), Ndeye Niang (CNAM), and Cristian Preda (Lille). H2DM, Napoli, June 7,
3 Outline
1. Introduction
2. Sparse regression
3. Sparse PCA
4. Group sparse PCA and sparse MCA
5. Clusterwise sparse PLS regression
6. Conclusion and perspectives
4 1. Introduction
High dimensional data (p >> n): gene expression data, chemometrics, etc.
Several solutions exist for regression problems using all variables (PLS, ridge, PCR), but interpretation is difficult.
Sparse methods provide combinations of few variables.
5 This talk: a survey of sparse methods for supervised (regression) and unsupervised (PCA) problems, and new propositions for large n and large p when unobserved heterogeneity occurs among the observations: clusterwise sparse PLS regression.
6 2. Sparse regression
Keeping all predictors is a drawback for high dimensional data: combinations of too many variables cannot be interpreted.
Sparse methods simultaneously shrink coefficients and select variables, hence better predictions.
7 2.1 Lasso and elastic net
The Lasso (Tibshirani, 1996) is a shrinkage and selection method. It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients (L1 penalty):
$\hat\beta_{lasso} = \arg\min_\beta \|y - X\beta\|^2$ subject to $\sum_{j=1}^p |\beta_j| \le c$
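In practice the constrained form above is solved through its equivalent Lagrangian form; scikit-learn's `Lasso` takes the multiplier `alpha` directly. A minimal sketch on synthetic data (all names and values here are illustrative, not from the talk):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
# only the first two predictors carry signal
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

# alpha is the Lagrange multiplier associated with the L1 bound c:
# larger alpha <=> smaller c <=> more coefficients shrunk to exactly zero
model = Lasso(alpha=0.5).fit(X, y)
print(model.coef_)
```

With this amount of shrinkage the noise coefficients are set exactly to zero, illustrating the simultaneous shrinkage and selection described above.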
8 Looks similar to ridge regression (L2 penalty):
$\hat\beta_{ridge} = \arg\min_\beta \|y - X\beta\|^2$ subject to $\sum_{j=1}^p \beta_j^2 \le c$
The Lasso continuously shrinks the coefficients towards zero as c decreases.
Convex optimisation; no explicit solution.
If $c \ge \sum_{j=1}^p |\hat b_j^{OLS}|$, the Lasso is identical to OLS.
9 Constraints and log-priors
Like ridge regression, the Lasso is a Bayesian regression, but with a double-exponential (Laplace) prior:
$f(\beta_j) = \frac{\lambda}{2} \exp(-\lambda |\beta_j|)$
so the L1 penalty is proportional to the (negative) log-prior.
10 Finding the optimal parameter
Cross-validation if optimal prediction is needed; BIC when sparsity is the main concern:
$\lambda_{opt} = \arg\min_\lambda \frac{\|y - X\hat\beta(\lambda)\|^2}{n\hat\sigma^2} + \frac{\log(n)}{n}\, df(\lambda)$
A good unbiased estimate of df is the number of nonzero coefficients (Zou et al., 2007).
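scikit-learn's `LassoLarsIC` implements exactly this style of criterion-based tuning, taking df as the number of nonzero coefficients along the LARS path. A small sketch (synthetic data, illustrative settings):

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] + X[:, 3] + 0.5 * rng.standard_normal(n)

# BIC-based selection of lambda along the lasso path;
# df(lambda) is estimated by the number of nonzero coefficients
model = LassoLarsIC(criterion="bic").fit(X, y)
print(model.alpha_, (model.coef_ != 0).sum())
```

The BIC-selected model is typically much sparser than a cross-validated one, which matches the slide's advice when sparsity rather than prediction is the main concern.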
11 A few drawbacks
The Lasso produces a sparse model, but the number of selected variables cannot exceed the number of units: unfitted to DNA arrays, where n (arrays) << p (genes).
It selects only one variable among a set of correlated predictors.
12 Elastic net
Combines ridge and lasso penalties to select more predictors than the number of observations (Zou & Hastie, 2005):
$\hat\beta_{en} = \arg\min_\beta \|y - X\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|^2$
The L1 part leads to a sparse model; the L2 part removes the limitation on the number of retained predictors and favours group choice.
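The grouping effect is easy to see numerically: on a trio of nearly identical predictors, the L2 part of scikit-learn's `ElasticNet` spreads weight across the whole group, where the plain lasso concentrates it. A sketch on made-up data:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(2)
n = 40
z = rng.standard_normal(n)
# three nearly identical predictors followed by five noise predictors
X = np.column_stack(
    [z + 0.01 * rng.standard_normal(n) for _ in range(3)]
    + [rng.standard_normal(n) for _ in range(5)]
)
y = z + 0.1 * rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
# the elastic net spreads weight over the correlated trio,
# whereas the lasso tends to concentrate it on fewer of them
print((lasso.coef_[:3] != 0).sum(), (enet.coef_[:3] != 0).sum())
```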
13 2.2 Group lasso
The X matrix is divided into J sub-matrices $X_j$ of $p_j$ variables.
The group lasso extends the Lasso to the selection of groups of variables (Yuan & Lin, 2006):
$\hat\beta_{GL} = \arg\min_\beta \left\|y - \sum_{j=1}^J X_j \beta_j\right\|^2 + \lambda \sum_{j=1}^J \sqrt{p_j}\, \|\beta_j\|$
If $p_j = 1$ for all j, the group lasso reduces to the Lasso.
14 Drawback: no sparsity within groups.
A solution: the sparse group lasso (Simon et al., 2012):
$\min_\beta \left\|y - \sum_{j=1}^J X_j \beta_j\right\|^2 + \lambda_1 \sum_{j=1}^J \|\beta_j\|_2 + \lambda_2 \sum_{j=1}^J \sum_{i=1}^{p_j} |\beta_{ij}|$
Two tuning parameters: grid search.
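Neither the group lasso nor its sparse variant ships with scikit-learn, but the plain group-lasso criterion can be sketched with proximal gradient descent: block soft-thresholding is the proximal operator of the group penalty, so inactive groups are zeroed as whole blocks. All data and settings below are illustrative:

```python
import numpy as np

def block_soft_threshold(b, t):
    """Proximal operator of t * ||b||_2: shrink the whole block, or zero it."""
    norm = np.linalg.norm(b)
    return np.zeros_like(b) if norm <= t else (1.0 - t / norm) * b

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for min 0.5*||y - Xb||^2 + lam * sum_j sqrt(p_j)*||b_j||_2."""
    groups = np.asarray(groups)
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y))   # gradient step
        for g in np.unique(groups):
            idx = groups == g
            w = np.sqrt(idx.sum())                 # sqrt(p_j) group weight
            beta[idx] = block_soft_threshold(z[idx], step * lam * w)
    return beta

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 6))
y = X[:, :2] @ np.array([2.0, -1.5]) + 0.1 * rng.standard_normal(60)
beta = group_lasso(X, y, groups=[0, 0, 1, 1, 2, 2], lam=5.0)
print(beta)  # the two inactive groups are zeroed as whole blocks
```

Replacing the block threshold by a composition of block and coordinatewise soft-thresholding would give the sparse group lasso of Simon et al.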
15 2.3 Sparse PLS
Several solutions: Lê Cao et al. (2008), Chun & Keleş (2010).
16 3. Sparse PCA
In PCA, each PC is a linear combination of all the original variables, which makes the results difficult to interpret.
Challenge of SPCA: obtain easily interpretable components (many zero loadings in the principal factors).
Principle of SPCA: modify PCA by imposing lasso/elastic-net constraints to construct modified PCs with sparse loadings.
Warning: sparse PCA does not provide a global selection of variables but a selection dimension by dimension; this differs from the regression context (Lasso, Elastic Net, ...).
17 3.1 First attempts
Simple PCA by Vines (2000): integer loadings.
Rousson and Gasser (2004): loadings in {+, 0, -}.
SCoTLASS (Simplified Component Technique-LASSO) by Jolliffe et al. (2003): extra L1 constraints:
$\max u' X'X u$ subject to $u'u = 1$ and $\sum_{j=1}^p |u_j| \le t$
18 SCoTLASS properties:
$t \ge \sqrt{p}$: usual PCA
$t < 1$: no solution
$t = 1$: only one nonzero coefficient
Interesting solutions for $1 < t < \sqrt{p}$.
Non-convex problem.
19 3.2 S-PCA and R-SVD (Zou et al., 2006)
Let the SVD of X be $X = UDV'$, with $Z = UD$ the principal components.
Ridge regression of a component on the p variables: $\hat\beta_{ridge} = \arg\min_\beta \|Z_i - X\beta\|^2 + \lambda \|\beta\|^2$
Since $X'X = VD^2V'$ with $V'V = I$:
$\hat\beta_{i,ridge} = (X'X + \lambda I)^{-1} X'X v_i = \frac{d_{ii}^2}{d_{ii}^2 + \lambda}\, v_i \propto v_i$
Loadings can thus be recovered by (ridge) regressing the PCs on the p variables: PCA can be written as a regression-type optimization problem.
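The ridge identity on this slide is easy to check numerically: ridge-regressing a principal component on X returns a vector exactly proportional to the corresponding singular vector, so normalizing it removes the shrinkage factor. A sketch on synthetic data (λ = 10 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 5))
X -= X.mean(axis=0)                       # centered data matrix

U, d, Vt = np.linalg.svd(X, full_matrices=False)
z1 = X @ Vt[0]                            # first principal component z1 = X v1
lam = 10.0
# ridge solution (X'X + lam I)^{-1} X' z1 = v1 * d1^2 / (d1^2 + lam)
beta = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ z1)
v_hat = beta / np.linalg.norm(beta)       # normalizing removes the shrinkage factor
print(abs(v_hat @ Vt[0]))                 # ≈ 1: the loading v1 is recovered
```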
20 Sparse PCA adds a Lasso penalty to this regression problem to produce sparse loadings:
$\hat\beta = \arg\min_\beta \|Z_i - X\beta\|^2 + \lambda \|\beta\|^2 + \lambda_1 \|\beta\|_1$
$\hat v_i = \hat\beta / \|\hat\beta\|$ is an approximation to $v_i$, and $X\hat v_i$ is the i-th approximated component.
Produces sparse loadings with zero coefficients, facilitating interpretation.
Alternated algorithm between elastic net and SVD.
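scikit-learn's `SparsePCA` implements this idea (it optimizes a closely related ℓ1-penalized reconstruction criterion rather than Zou's exact elastic-net/SVD alternation). On a two-block toy dataset it zeroes the cross-block loadings that plain PCA leaves dense:

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(4)
# two independent latent factors, each driving its own block of 4 variables
f = rng.standard_normal((100, 2))
X = np.hstack([np.outer(f[:, 0], np.ones(4)), np.outer(f[:, 1], np.ones(4))])
X += 0.1 * rng.standard_normal(X.shape)

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)

dense = int((np.abs(pca.components_) > 1e-3).sum())
sparse = int((np.abs(spca.components_) > 1e-3).sum())
print(dense, sparse)   # the sparse loadings contain many exact zeros
```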
21 Loss of orthogonality
SCoTLASS: orthogonal loadings but correlated components.
S-PCA: neither the loadings nor the components are orthogonal.
Hence the necessity of adjusting the % of explained variance.
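One standard adjustment is the QR-based variance of Zou, Hastie & Tibshirani (2006), which credits each correlated component only with the variance not already explained by the previous ones. A minimal sketch, with a sanity check that exact PCA loadings are left unchanged:

```python
import numpy as np

def adjusted_explained_variance(X, loadings):
    """QR-based adjusted variance for possibly correlated sparse components.
    `loadings` has one component per row; Z = X L' are the components."""
    Z = X @ loadings.T
    _, R = np.linalg.qr(Z)          # R keeps only the 'new' part of each column
    return np.diag(R) ** 2 / X.shape[0]

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 5))
X -= X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
adj = adjusted_explained_variance(X, Vt)
print(adj)   # with exact PCA loadings this equals the usual eigenvalues d^2 / n
```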
22 4. Group sparse PCA and sparse MCA (Bernard, Guinot, Saporta, 2012)
Industrial context and motivation: funded by the R&D department of the Chanel cosmetics company.
Relate gene expression data to skin aging measures (n = 500; SNPs, genes).
23 4.1 Group Sparse PCA
Data matrix X divided into J groups $X_j$ of $p_j$ variables.
Group Sparse PCA: a compromise between SPCA and the group lasso.
Goal: select groups of continuous variables (zero coefficients for entire blocks of variables).
Principle: replace the penalty function in the SPCA algorithm,
$\hat\beta = \arg\min_\beta \|Z - X\beta\|^2 + \lambda \|\beta\|^2 + \lambda_1 \|\beta\|_1,$
by the one defined in the group lasso:
$\hat\beta_{GL} = \arg\min_\beta \left\|Z - \sum_{j=1}^J X_j \beta_j\right\|^2 + \lambda \sum_{j=1}^J \sqrt{p_j}\, \|\beta_j\|$
24 4.2 Sparse MCA
In MCA, the original table of J categorical variables is replaced by the complete disjunctive table of indicator variables.
Selecting one column of the original table (categorical variable $X_j$) = selecting the block of $p_j$ indicator variables $X_{j1}, \dots, X_{jp_j}$ in the complete disjunctive table.
Challenge of Sparse MCA: select categorical variables, not categories.
Principle: a straightforward extension of Group Sparse PCA to groups of indicator variables, with the chi-square metric. Uses the sPCA-rSVD algorithm.
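The complete disjunctive table itself is just the one-hot coding of each categorical variable; with pandas, `get_dummies` builds it, one block of indicator columns per variable (the SNP genotypes below are made up for illustration; the chi-square metric of MCA is a separate step not shown here):

```python
import pandas as pd

# hypothetical genotypes for 4 individuals and 2 SNPs
df = pd.DataFrame({
    "snp1": ["AA", "AG", "GG", "AA"],
    "snp2": ["CC", "CT", "CC", "TT"],
})
K = pd.get_dummies(df)      # complete disjunctive table: one block per SNP
print(K.columns.tolist())   # snp1_AA, snp1_AG, snp1_GG, snp2_CC, snp2_CT, snp2_TT
```

Each row of K sums to the number of original variables (one indicator lit per block), which is what makes block selection in K equivalent to variable selection in the original table.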
25 Application on genetic data: Single Nucleotide Polymorphisms
Data: n = 502 individuals; p = 537 SNPs (a subset of the genes of the original data base); q = 1554 (total number of columns).
X: 502 x 537 matrix of qualitative variables.
K: 502 x 1554 complete disjunctive table, K = (K_1, ..., K_537); 1 block = 1 SNP = 1 $K_j$ matrix.
26 Application on genetic data: Single Nucleotide Polymorphisms [figure]
27 Application on genetic data: λ = 0.005: CPEV = 0.32%, with 174 columns selected on Comp 1.
28 Application on genetic data: comparison of the loadings [figure]
29 Application on genetic data: comparison between MCA and Sparse MCA on the first principal plane [figure]
30 Properties
Property                   MCA                                  Sparse MCA
Uncorrelated components    TRUE                                 FALSE
Orthogonal loadings        TRUE                                 FALSE
Barycentric property       TRUE                                 Partly true
% of inertia               $100\,\lambda_j/\lambda_{tot}$       $100\,\|Z_{j.1,\dots,j-1}\|^2 / \text{total inertia}$
Total inertia              $\frac{1}{p}\sum_{j=1}^{p}(p_j-1)$   $\sum_{j=1}^{k}\|Z_{j.1,\dots,j-1}\|^2$
where the $Z_{j.1,\dots,j-1}$ are the residuals of $Z_j$ after adjusting for (regression projection on) $Z_1, \dots, Z_{j-1}$.
31 5. Clusterwise sparse PLS regression
Context: high dimensional explanatory data X (with p >> n); one or several variables Y to explain; unknown clusters of the n observations.
Twofold aim:
Clustering: get the optimal clustering of the observations.
Sparse regression: compute the sparse regression coefficients within each cluster, to improve the prediction of Y and the interpretation of high-dimensional data.
32 Clusterwise (typological) regression: Diday (1974), Charles (1977), Späth (1979).
Optimizes simultaneously the partition $\sigma$ and the local models:
$\min \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}_{\sigma(i)=k}\, (y_i - \beta_k' x_i)^2$
With $p > n_k$, our proposal: local sparse PLS regressions.
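A minimal sketch of the alternating scheme behind this criterion, with ordinary least squares standing in for the local sparse PLS fits (synthetic two-cluster data; every name and setting here is illustrative, not from the talk):

```python
import numpy as np

def clusterwise_ols(X, y, K, n_iter=20, seed=0):
    """Alternate between fitting one linear model per cluster and reassigning
    each observation to the model with the smallest squared residual
    (Diday / Spaeth style clusterwise regression)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.column_stack([np.ones(n), X])          # add an intercept
    labels = rng.integers(K, size=n)               # random initial partition
    for _ in range(n_iter):
        betas = []
        for k in range(K):
            idx = labels == k
            if idx.sum() <= Xb.shape[1]:           # guard against tiny clusters
                betas.append(np.zeros(Xb.shape[1]))
                continue
            b, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
            betas.append(b)
        resid = np.column_stack([(y - Xb @ b) ** 2 for b in betas])
        labels = resid.argmin(axis=1)
    return labels, betas

def best_clusterwise(X, y, K, n_starts=5):
    """Keep the partition with the smallest criterion over random starts."""
    Xb = np.column_stack([np.ones(len(y)), X])
    best = None
    for s in range(n_starts):
        labels, betas = clusterwise_ols(X, y, K, seed=s)
        crit = sum(((y - Xb @ betas[k]) ** 2)[labels == k].sum() for k in range(K))
        if best is None or crit < best[0]:
            best = (crit, labels, betas)
    return best[1], best[2]

rng = np.random.default_rng(6)
n = 200
X = rng.standard_normal((n, 2))
g = rng.integers(2, size=n)
# two latent clusters with opposite slopes on the first predictor
y = np.where(g == 0, 2.0, -2.0) * X[:, 0] + 0.05 * rng.standard_normal(n)
labels, betas = best_clusterwise(X, y, K=2)
```

Swapping the per-cluster OLS fit for a penalized (sparse PLS) fit gives the clusterwise sparse PLS scheme of the following slides.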
33 Criterion (for a single dependent variable y and G clusters):
$\arg\min \sum_{g=1}^G \left( \|X_g' y_g - w_g v_g'\|^2 + \lambda_g \|w_g\|_1 + \|y_g - X_g w_g c_g\|^2 \right)$ with $\|w_g\| = 1$,
combining (cluster) sparse components and (cluster) residuals, where
w: loading weights of the PLS component t = Xw;
v: loading weights of u = Yv;
c: regression coefficients of Y on t: $c = (T'T)^{-1} T'y$.
34 A simulation study
n = 500, p = 100.
The X vector is normally distributed with mean 0 and covariance matrix S with elements $s_{ij} = cov(X_i, X_j) = \rho^{|i-j|}$, $\rho \in [0,1]$.
The response variable Y is defined as follows: there exists a latent categorical variable G, $G \in \{1, \dots, K\}$, such that, conditionally on G = i and X = x:
$Y \mid G=i, X=x \;\sim\; N\!\left(\beta_1^{(i)} x_1 + \dots + \beta_p^{(i)} x_p,\ \sigma_i^2\right)$
35 Signal/noise: $\sigma_i^2 / Var(Y \mid G=i) = 0.1$.
The n subjects are equiprobably distributed into the K clusters.
For K = 2 we have chosen: [coefficients shown in figure]
37 A wrong number of groups [figure]
38 Application to real data
Data features (liver toxicity data from the mixOmics R package):
X: 3116 genes (mRNA extracted from the liver).
y: a clinical chemistry marker for liver injury (serum enzyme level).
Observations: 64 rats. Each rat is subject to a more or less toxic acetaminophen dose (50, 150, 1500, 2000); necropsy is performed at 6, 18, 24 or 48 hours after exposure (i.e., the time when the mRNA is extracted from the liver).
Data pre-processing: y and X are centered and scaled.
Twofold aim: improve the sparse PLS link between the genes (X) and the liver injury marker (y) while seeking G unknown clusters of the 64 rats.
39 Batch clusterwise algorithm
For several random allocations of the n observations into G clusters:
Compute G sparse PLS regressions, one within each cluster (the sparsity parameter and the number of dimensions are selected through 10-fold CV).
Assign each observation to the cluster with the minimum criterion value.
Iterate until convergence.
Select the best initialization (i.e., the one which minimizes the criterion) and get: the allocation of the n observations into G clusters, and the sparse regression coefficients within each cluster.
40 Clusterwise sparse PLS: performance vs. standard sparse PLS
No cluster: N = 64, R² = 0.968, RMSE = 0.313, λ = 0.9 (sparsity parameter), H = 3 dimensions.
G = 3 clusters of rats: N = (12; 42; 10), R² = (0.99; 0.928; 0.995), RMSE = (0.0272; ; ), λ = (0.9; 0.8; 0.9), H = (3; 3; 3).
41 Clusterwise sparse PLS: regression coefficients
Interpretation: the links between the genes (X) and the liver injury marker (y) are specific to each cluster.
Cluster 1: 12 selected genes; Cluster 2: 86 selected genes; Cluster 3: 15 selected genes (no cluster: 31 selected genes).
The cluster structure is significantly associated with the acetaminophen dose (p-value of the chi-square/Fisher test = ).
Cluster 1: all the doses; Cluster 2: mainly doses 50 and 150; Cluster 3: mainly doses 1500 and 2000.
42 6. Conclusions and perspectives
Sparse techniques provide elegant and efficient solutions to problems posed by high-dimensional data: a new generation of data analysis methods with few restrictive hypotheses.
Sparse PCA and the group lasso have been extended towards sparse MCA.
Clusterwise extensions handle large p and large n.
43 Research in progress:
Criteria for choosing the tuning parameter λ.
Extension of Sparse MCA: a compromise between Sparse MCA and the sparse group lasso developed by Simon et al. (2012), selecting groups and predictors within a group, in order to produce sparsity at both levels.
Extension to the case where variables have a block structure.
Writing an R package.
44 References
Bernard, A., Guinot, C., Saporta, G. (2012) Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis, Compstat 2012, Limassol, Cyprus.
Charles, C. (1977) Régression typologique et reconnaissance des formes. Thèse de doctorat, Université Paris IX.
Chun, H. and Keleş, S. (2010) Sparse partial least squares regression for simultaneous dimension reduction and variable selection, Journal of the Royal Statistical Society, Series B, 72.
Diday, E. (1974) Introduction à l'analyse factorielle typologique, Revue de Statistique Appliquée, 22 (4).
Jolliffe, I.T., Trendafilov, N.T. and Uddin, M. (2003) A modified principal component technique based on the LASSO, Journal of Computational and Graphical Statistics, 12.
Rousson, V. and Gasser, T. (2004) Simple component analysis, Journal of the Royal Statistical Society, Series C (Applied Statistics), 53.
Shen, H. and Huang, J.Z. (2008) Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, 99.
45 Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2012) A sparse-group lasso, Journal of Computational and Graphical Statistics.
Späth, H. (1979) Clusterwise linear regression, Computing, 22.
Tibshirani, R. (1996) Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Series B, 58.
Vines, S.K. (2000) Simple principal components, Journal of the Royal Statistical Society, Series C (Applied Statistics), 49.
Yuan, M. and Lin, Y. (2006) Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society, Series B, 68.
Zou, H. and Hastie, T. (2005) Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B, 67.
Zou, H., Hastie, T. and Tibshirani, R. (2006) Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15.
Zou, H., Hastie, T. and Tibshirani, R. (2007) On the degrees of freedom of the lasso, The Annals of Statistics, 35 (5).
Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationSparse representation classification and positive L1 minimization
Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationHigh-dimensional regression modeling
High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making
More informationAn efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss
An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao
More informationLecture VIII Dim. Reduction (I)
Lecture VIII Dim. Reduction (I) Contents: Subset Selection & Shrinkage Ridge regression, Lasso PCA, PCR, PLS Lecture VIII: MLSC - Dr. Sethu Viayakumar Data From Human Movement Measure arm movement and
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More information1 Data Arrays and Decompositions
1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is
More informationEXTENDING PARTIAL LEAST SQUARES REGRESSION
EXTENDING PARTIAL LEAST SQUARES REGRESSION ATHANASSIOS KONDYLIS UNIVERSITY OF NEUCHÂTEL 1 Outline Multivariate Calibration in Chemometrics PLS regression (PLSR) and the PLS1 algorithm PLS1 from a statistical
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationPrincipal Component Analysis
I.T. Jolliffe Principal Component Analysis Second Edition With 28 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition Acknowledgments List of Figures List of Tables
More informationGeneralized Power Method for Sparse Principal Component Analysis
Generalized Power Method for Sparse Principal Component Analysis Peter Richtárik CORE/INMA Catholic University of Louvain Belgium VOCAL 2008, Veszprém, Hungary CORE Discussion Paper #2008/70 joint work
More informationFast Regularization Paths via Coordinate Descent
August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. August 2008 Trevor
More informationReview of Sparse Methods in Regression and Classification with Application to Chemometrics
Review of Sparse Methods in Regression and Classification with Application to Chemometrics Peter Filzmoser, Moritz Gschwandtner and Valentin Todorov Sparse statistical methods lead to parameter estimates
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu Department of Statistics, University of Illinois at Urbana-Champaign WHOA-PSI, Aug, 2017 St. Louis, Missouri 1 / 30 Background Variable
More informationPrincipal component analysis (PCA) for clustering gene expression data
Principal component analysis (PCA) for clustering gene expression data Ka Yee Yeung Walter L. Ruzzo Bioinformatics, v17 #9 (2001) pp 763-774 1 Outline of talk Background and motivation Design of our empirical
More informationGraphical Model Selection
May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor
More informationPrediction of Stress Increase in Unbonded Tendons using Sparse Principal Component Analysis
Utah State University DigitalCommons@USU All Graduate Plan B and other Reports Graduate Studies 8-2017 Prediction of Stress Increase in Unbonded Tendons using Sparse Principal Component Analysis Eric Mckinney
More informationLasso applications: regularisation and homotopy
Lasso applications: regularisation and homotopy M.R. Osborne 1 mailto:mike.osborne@anu.edu.au 1 Mathematical Sciences Institute, Australian National University Abstract Ti suggested the use of an l 1 norm
More informationFunctional SVD for Big Data
Functional SVD for Big Data Pan Chao April 23, 2014 Pan Chao Functional SVD for Big Data April 23, 2014 1 / 24 Outline 1 One-Way Functional SVD a) Interpretation b) Robustness c) CV/GCV 2 Two-Way Problem
More informationRegularization Path Algorithms for Detecting Gene Interactions
Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable
More informationThe lasso: some novel algorithms and applications
1 The lasso: some novel algorithms and applications Robert Tibshirani Stanford University ASA Bay Area chapter meeting Collaborations with Trevor Hastie, Jerome Friedman, Ryan Tibshirani, Daniela Witten,
More informationLASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape
LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape Nikolaus Umlauf https://eeecon.uibk.ac.at/~umlauf/ Overview Joint work with Andreas Groll, Julien Hambuckers
More informationThe lasso, persistence, and cross-validation
The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University
More informationAn algorithm for the multivariate group lasso with covariance estimation
An algorithm for the multivariate group lasso with covariance estimation arxiv:1512.05153v1 [stat.co] 16 Dec 2015 Ines Wilms and Christophe Croux Leuven Statistics Research Centre, KU Leuven, Belgium Abstract
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationA significance test for the lasso
1 Gold medal address, SSC 2013 Joint work with Richard Lockhart (SFU), Jonathan Taylor (Stanford), and Ryan Tibshirani (Carnegie-Mellon Univ.) Reaping the benefits of LARS: A special thanks to Brad Efron,
More information