Sparse Principal Component Analysis for multiblocks data and its extension to Sparse Multiple Correspondence Analysis

Anne Bernard (1,5), Hervé Abdi (2), Arthur Tenenhaus (3), Christiane Guinot (4), Gilbert Saporta (5)
(1) CE.R.I.E.S, Neuilly-sur-Seine, France
(2) Department of Brain and Behavioral Sciences, The University of Texas at Dallas, Richardson, TX, USA
(3) SUPELEC, Gif-sur-Yvette, France
(4) Université François Rabelais, Département d'Informatique, Tours, France
(5) Laboratoire Cedric, CNAM, Paris
June 13, 2013


Background

Variable selection and dimensionality reduction are necessary in many application domains, and more specifically in genetics.
Quantitative data analysis: PCA. Categorical data analysis: MCA.
With high-dimensional data (n ≪ p), the principal components (PCs) are difficult to interpret.

Two ways of obtaining simple structures:
1. Sparse methods such as sparse PCA for quantitative data: PCs are combinations of only a few original variables; extended here to a multiblock data structure (GSPCA).
2. Extension to categorical variables: development of a new sparse method (Sparse MCA).

Data and methods


Uniblock data structure

Principal Component Analysis (PCA): each PC is a linear combination of all the original variables, so all loadings are nonzero and the results are difficult to interpret.
Multiple Correspondence Analysis (MCA): the counterpart of PCA for categorical data.
Sparse PCA (SPCA): modified PCs with sparse loadings, obtained through a regularized SVD.


Principal Component Analysis (PCA)

X: matrix of quantitative variables. PCA can be computed via the SVD of X.

Singular Value Decomposition (SVD) of X:
X = U D V^T  with  U^T U = V^T V = I    (1)
where Z = U D are the PCs and V the corresponding (nonzero) loadings; PCA results are difficult to interpret.

SVD as low-rank approximation of matrices:
min_{X^(1)} ||X - X^(1)||_2^2 = min_{ũ,ṽ} ||X - ũ ṽ^T||_2^2    (2)
X^(1) = ũ ṽ^T is the best rank-one approximation of X, with ũ = δ_1 u_1 the first component and ṽ = v_1 the first loading vector.
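
To make the connection concrete, here is a minimal numpy sketch (not part of the original slides; data and names are illustrative) of PCA via the SVD and of the best rank-one approximation in equations (1)-(2):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
X = X - X.mean(axis=0)                             # column-centered data matrix

U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U D V^T with U^T U = V^T V = I
Z = U * d                                          # principal components Z = U D
V = Vt.T                                           # loadings: in general all entries are nonzero

u_tilde = d[0] * U[:, 0]                           # u~ = delta_1 u_1, the first component
v_tilde = V[:, 0]                                  # v~ = v_1, the first loading vector
X1 = np.outer(u_tilde, v_tilde)                    # best rank-one approximation of X
print(np.linalg.norm(X - X1, "fro"))               # minimal over all rank-one matrices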

Sparse Principal Component Analysis (SPCA)

Challenge: facilitate the interpretation of PCA results. How? PCs with many zero loadings (sparse loadings).
Several approaches: SCoTLASS by Jolliffe et al. (2003), SPCA by Zou et al. (2006), SPCA-rSVD by Shen and Huang (2008).
Principle: use the connection of PCA with the SVD, i.e., a low-rank matrix approximation problem, with a regularization penalty on the loadings.

Regularization in regression

Lasso penalty (Tibshirani, 1996): a variable selection technique that produces sparse models by imposing the L1 norm on the linear regression coefficients.
Regression with L1 penalization:
β̂_lasso = argmin_β ||y - X β||^2 + λ Σ_{j=1}^J |β_j|,  with  P_λ(β) = λ Σ_{j=1}^J |β_j|
If λ = 0: usual regression (all coefficients nonzero). As λ increases: some coefficients are set to zero, hence variable selection, but the number of selected variables is bounded by the number of units.
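
As an illustration of the lasso criterion above, a small proximal-gradient (ISTA) sketch in numpy; this is illustrative code under the penalty written above, not the estimator or software used in the talk:

import numpy as np

def soft_threshold(z, t):
    # componentwise soft thresholding: sign(z) * (|z| - t)_+
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # minimizes ||y - X beta||^2 + lam * sum_j |beta_j| by proximal gradient descent
    n, p = X.shape
    beta = np.zeros(p)
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y)            # gradient of the squared loss
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta                                      # larger lam -> more coefficients exactly zero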

Regularization in regression

Other kinds of penalties:
Hard thresholding penalty (Antoniadis, 1997):
P_λ(|β|) = λ^2 - (|β| - λ)^2 I(|β| < λ)
SCAD penalty (Smoothly Clipped Absolute Deviation; Fan and Li, 2001), defined through its derivative:
P'_λ(β) = λ [ I(β ≤ λ) + ((aλ - β)_+ / ((a - 1)λ)) I(β > λ) ],  with a > 2 and β > 0
Group Lasso penalty (Yuan and Lin, 2006):
P_λ(β) = λ Σ_{k=1}^K √(J_[k]) ||β_k||_2    (3)
with J_[k] the number of variables in group k.
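
For concreteness, the penalties above can be written as small functions; a sketch under the notation of this slide (the default a = 3.7 is the value suggested by Fan and Li; all names are illustrative):

import numpy as np

def hard_threshold_penalty(beta, lam):
    # Antoniadis (1997): lam^2 - (|beta| - lam)^2 * I(|beta| < lam)
    b = np.abs(beta)
    return lam**2 - (b - lam)**2 * (b < lam)

def scad_derivative(beta, lam, a=3.7):
    # Fan and Li (2001) SCAD, given through its derivative, with a > 2 and beta > 0
    b = np.abs(beta)
    return lam * ((b <= lam) + np.maximum(a * lam - b, 0.0) / ((a - 1) * lam) * (b > lam))

def group_lasso_penalty(beta, groups, lam):
    # Yuan and Lin (2006): lam * sum_k sqrt(J_k) * ||beta_k||_2, groups = list of index arrays
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)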


Sparse PCA via regularized SVD (SPCA-rSVD)

Low-rank matrix approximation problem + regularization penalty function applied to ṽ:
min_{ũ,ṽ} ||X - ũ ṽ^T||_2^2 + P_λ(ṽ)    (4)
with
||X - ũ ṽ^T||_2^2 + P_λ(ṽ) = tr( (X - ũ ṽ^T)^T (X - ũ ṽ^T) ) + P_λ(ṽ)    (5)
  = ||X||_2^2 + tr(ṽ^T ṽ) - 2 tr(X^T ũ ṽ^T) + P_λ(ṽ)
  = ||X||_2^2 + Σ_{j=1}^J ( ṽ_j^2 - 2 (X^T ũ)_j ṽ_j ) + Σ_{j=1}^J P_λ(ṽ_j)
  = ||X||_2^2 + Σ_{j=1}^J ( ṽ_j^2 - 2 (X^T ũ)_j ṽ_j + P_λ(ṽ_j) )

Sparse PCA via regularized SVD (SPCA-rSVD)

Iterative procedure:
1. ṽ fixed: the minimizer of (4) is ũ = X ṽ / ||X ṽ||
2. ũ fixed: the minimizer of (4) is ṽ = h_λ(X^T ũ), with
h_λ(X^T ũ)_j = argmin_{ṽ_j} ( ṽ_j^2 - 2 (X^T ũ)_j ṽ_j + P_λ(ṽ_j) )    (6)
For the lasso penalty: h_λ(X^T ũ) = sign(X^T ũ) ( |X^T ũ| - λ )_+
For the hard thresholding penalty: h_λ(X^T ũ) = I( |X^T ũ| > λ ) X^T ũ
The minimum is obtained by applying h_λ to the vector X^T ũ componentwise, which yields v_1. Subsequently, sparse loadings v_i (i > 1) are obtained via rank-one approximations of residual matrices.
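
A compact numpy sketch of this alternating procedure for the first sparse loading vector, using the lasso (soft) thresholding rule above; a minimal illustration, not a reference implementation:

import numpy as np

def soft(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def spca_rsvd_rank_one(X, lam, n_iter=200):
    # start from the ordinary first left singular vector of X
    u = np.linalg.svd(X, full_matrices=False)[0][:, 0]
    v = None
    for _ in range(n_iter):
        v = soft(X.T @ u, lam)        # step 2: u fixed -> threshold X^T u componentwise
        Xv = X @ v
        nrm = np.linalg.norm(Xv)
        if nrm == 0:
            break                     # lambda too large: every loading was set to zero
        u = Xv / nrm                  # step 1: v fixed -> u = Xv / ||Xv||
    return u, v                       # v is the sparse loading vector; further components come from residual matrices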

Multiblock data structure

Group Sparse PCA (GSPCA):
Challenge: create sparseness in a multiblock data structure.
Principle: modified PCs with sparse loadings, using the group lasso penalty.
Result: all variables of a block are selected or removed together.

Group Sparse PCA

Group Sparse PCA: extension of SPCA-rSVD to blocks of quantitative variables.
Challenge: select groups of continuous variables, i.e., assign zero coefficients to entire blocks of variables.
How? A compromise between SPCA-rSVD and the group lasso.

Group Sparse PCA

X: matrix of quantitative variables divided into K sub-matrices (blocks). SVD of X:
X = [ X_[1] ... X_[k] ... X_[K] ] = U D V^T = U D [ V_[1]^T ... V_[k]^T ... V_[K]^T ]
where each row of V^T is a vector divided into K blocks.

Group Sparse PCA

Recall SPCA-rSVD:
min_{ũ,ṽ} ||X - ũ ṽ^T||_2^2 + P_λ(ṽ)    (7)
ũ fixed: ṽ_j = h_λ(X^T ũ)_j = argmin_{ṽ_j} ( ṽ_j^2 - 2 (X^T ũ)_j ṽ_j + P_λ(ṽ_j) )
In GSPCA, the update is block-wise:
ṽ_[k] = h_λ(X_[k]^T ũ) = argmin_{ṽ_[k]} ( tr(ṽ_[k] ṽ_[k]^T) - 2 tr(ṽ_[k] ũ^T X_[k]) + P_λ(ṽ_[k]) )
Here P_λ is the group lasso regularizer:
P_λ(ṽ) = λ Σ_{k=1}^K √(J_[k]) ||ṽ_[k]||_2    (8)
The solution is
h_λ(X_[k]^T ũ) = ( 1 - λ √(J_[k]) / (2 ||X_[k]^T ũ||_2) )_+ X_[k]^T ũ    (9)


Group Sparse PCA

Group Sparse PCA Algorithm
1. Apply the SVD to X and obtain the best rank-one approximation of X as δ ũ ṽ^T. Set ṽ_old = ṽ = [ṽ_[1], ..., ṽ_[K]] and ũ_old = δ ũ.
2. Update:
   a) ṽ_new = [ h_λ(X_[1]^T ũ_old), ..., h_λ(X_[K]^T ũ_old) ]
   b) ũ_new = X ṽ_new / ||X ṽ_new||
3. Repeat Step 2, replacing ũ_old and ṽ_old by ũ_new and ṽ_new, until convergence.
Sparse loadings v_i (i > 1) are obtained via rank-one approximations of residual matrices.
λ is chosen by cross-validation or an ad hoc approach.
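
The algorithm translates almost line by line into code. A sketch in numpy, assuming `blocks` is a list of column-index arrays describing the K groups; names are illustrative and this is not the authors' implementation:

import numpy as np

def group_threshold(z_block, lam):
    # h_lambda of equation (9): (1 - lam*sqrt(J_k) / (2*||z_block||_2))_+ * z_block
    norm = np.linalg.norm(z_block)
    if norm == 0:
        return z_block
    scale = max(0.0, 1.0 - lam * np.sqrt(z_block.size) / (2.0 * norm))
    return scale * z_block            # the whole block is shrunk or set to zero together

def gspca_rank_one(X, blocks, lam, n_iter=200):
    # step 1: initialize from the best rank-one approximation of X
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    u = d[0] * U[:, 0]
    v = Vt[0]
    for _ in range(n_iter):
        z = X.T @ u
        v = np.zeros_like(z)
        for g in blocks:              # step 2a: block-wise thresholding
            v[g] = group_threshold(z[g], lam)
        Xv = X @ v
        nrm = np.linalg.norm(Xv)
        if nrm == 0:
            break                     # lambda too large: all blocks removed
        u = Xv / nrm                  # step 2b: u = Xv / ||Xv||
    return u, v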

Multiblock data structure

Selecting one column of the original table (a categorical variable) amounts to selecting the corresponding block of indicator variables in the complete disjunctive table.
Sparse MCA (SMCA): extension of GSPCA to blocks of indicator variables.

MCA

Generalized SVD (GSVD) of F, with
r = F 1, c = F^T 1, D_r = diag(r), D_c = diag(c), and p_[j] = number of modalities of variable j:
F = P Δ Q^T  with  P^T M P = Q^T W Q = I
where F = [ F_[1] ... F_[j] ... F_[J] ] and Q = [ Q_[1] ... Q_[j] ... Q_[J] ].
In the case of PCA: M = W = I. In the case of MCA: M = D_r, W = D_c.
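
One way to compute such a GSVD is through an ordinary SVD of the weighted matrix M^(1/2) F W^(1/2). A small numpy sketch for diagonal metrics given as weight vectors; names are illustrative, not a reference implementation:

import numpy as np

def gsvd(F, m, w):
    # m, w: 1-D arrays of positive row and column weights (diagonals of M and W)
    Ft = np.sqrt(m)[:, None] * F * np.sqrt(w)[None, :]    # M^(1/2) F W^(1/2)
    P_, delta, Qt_ = np.linalg.svd(Ft, full_matrices=False)
    P = P_ / np.sqrt(m)[:, None]                          # back-transform so that P^T M P = I
    Q = Qt_.T / np.sqrt(w)[:, None]                       # and Q^T W Q = I
    return P, delta, Q                                    # F = P diag(delta) Q^T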


From MCA to SMCA

GSVD as low-rank approximation of matrices:
min_{F^(1)} ||F - F^(1)||_W^2 = min_{p̃,q̃} ||F - p̃ q̃^T||_W^2    (10)
where ||F||_W^2 = tr( M^(1/2) F W F^T M^(1/2) ) is the generalized norm associated with the metrics M and W, so that
||F - p̃ q̃^T||_W^2 = tr( M^(1/2) (F - p̃ q̃^T) W (F - p̃ q̃^T)^T M^(1/2) )    (11)

Sparse MCA problem:
min_{p̃,q̃} ||F - p̃ q̃^T||_W^2 + P_λ(q̃)  subject to  p̃^T M p̃ = q̃^T W q̃ = 1    (12)
= min_{p̃,q̃} tr( M^(1/2) (F - p̃ q̃^T) W (F - p̃ q̃^T)^T M^(1/2) ) + P_λ(q̃)    (13)

From MCA to SMCA

Recall the GSPCA solution:
min_{ṽ_[k]} ( tr(ṽ_[k] ṽ_[k]^T) - 2 tr(ṽ_[k] ũ^T X_[k]) + P_λ(ṽ_[k]) )
ṽ_[k] = h_λ(X_[k]^T ũ) = ( 1 - λ √(J_[k]) / (2 ||X_[k]^T ũ||_2) )_+ X_[k]^T ũ

Sparse MCA solution:
min_{q̃_[k]} ( tr(q̃_[k]^T W_[k] q̃_[k]) - 2 tr(M F_[k] W_[k] q̃_[k] p̃^T) + P_λ(q̃_[k]) )
q̃_[k] = h_λ(F_[k]^T M p̃) = ( 1 - λ √(J_[k]) / (2 ||F_[k]^T M p̃||_W) )_+ F_[k]^T M p̃

Sparse MCA

Sparse MCA Algorithm
1. Apply the GSVD to F and obtain the best rank-one approximation of F as δ p̃ q̃^T. Set p̃_old = δ p̃ and q̃_old = q̃.
2. Update:
   a) q̃_new = [ h_λ(F_[1]^T M p̃_old), ..., h_λ(F_[J]^T M p̃_old) ]
   b) p̃_new = F q̃_new / ||F q̃_new||
3. Repeat Step 2, replacing p̃_old and q̃_old by p̃_new and q̃_new, until convergence.
Sparse loadings q_i (i > 1) are obtained via rank-one approximations of residual matrices.
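
A self-contained numpy sketch of this rank-one update, assuming diagonal metrics given as weight vectors m and w and `blocks` given as lists of column indices (one block per categorical variable); an illustration of the formulas above, not the authors' code:

import numpy as np

def smca_rank_one(F, m, w, blocks, lam, n_iter=200):
    # step 1: best rank-one GSVD approximation via the weighted-SVD trick
    Ft = np.sqrt(m)[:, None] * F * np.sqrt(w)[None, :]
    P_, delta, Qt_ = np.linalg.svd(Ft, full_matrices=False)
    p = delta[0] * (P_[:, 0] / np.sqrt(m))            # p_old = delta * p~
    q = Qt_[0] / np.sqrt(w)                           # q_old = q~
    for _ in range(n_iter):
        z = F.T @ (m * p)                             # F^T M p, with M = diag(m)
        q = np.zeros_like(z)
        for g in blocks:                              # step 2a: block-wise thresholding h_lambda
            norm_g = np.sqrt(np.sum(w[g] * z[g] ** 2))    # W-norm of the block
            if norm_g > 0:
                q[g] = max(0.0, 1.0 - lam * np.sqrt(len(g)) / (2.0 * norm_g)) * z[g]
        Fq = F @ q
        nrm = np.linalg.norm(Fq)
        if nrm == 0:
            break                                     # lambda too large: all blocks removed
        p = Fq / nrm                                  # step 2b: p = F q / ||F q||
    return p, q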

Properties of the SMCA

Property                    MCA                    Sparse MCA
Barycentric properties      TRUE                   TRUE
Non-correlated components   TRUE                   FALSE
Orthogonal loadings         TRUE                   FALSE
Inertia                     δ_j / total × 100      Adjusted variance

Sparse MCA: example on breeds of dogs

Data (from Tenenhaus, M., 2007):
I = 27 breeds of dogs, J = 6 variables, Q = 16 (total number of modalities)
X: 27 × 6 matrix of categorical variables
D: 27 × 16 complete disjunctive table, D = (D_[1], ..., D_[6])
1 block = 1 descriptive variable = 1 sub-matrix D_[j]
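
For such data, the complete disjunctive table and its blocks can be built directly; a small pandas sketch with hypothetical column names and data, purely for illustration:

import pandas as pd

X = pd.DataFrame({"size": ["large", "small", "medium"],
                  "weight": ["heavy", "lightweight", "heavy"]})   # illustrative toy data
D = pd.get_dummies(X)      # indicator columns grouped per variable: size_large, size_small, ...
blocks = {var: [c for c in D.columns if c.startswith(var + "_")] for var in X.columns}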

Sparse MCA: example on breeds of dogs

Result of the MCA (figure).

Sparse MCA: example on breeds of dogs

Result of the Sparse MCA: from λ = 0.25, the CPEV (cumulative percentage of explained variance) drops rapidly. We choose λ = 0.25 as a compromise between the number of variables selected and the percentage of variance lost.

Sparse MCA: example on breeds of dogs

Result of the Sparse MCA (figure).

Sparse MCA: example on breeds of dogs

Comparison of the loadings
                                 MCA                                    Sparse MCA
Variable              Comp1   Comp2   Comp3   Comp4        Comp1   Comp2   Comp3   Comp4
large                 -0.361   0.071  -0.005   0.060       -0.389   0.000   0.000   0.000
medium                 0.280   0.287   0.300  -0.055        0.226   0.000   0.000   0.000
small                  0.291  -0.400  -0.293  -0.041        0.390   0.000   0.000   0.000
lightweight            0.316  -0.389  -0.193  -0.081        0.368  -0.256   0.000   0.000
heavy                 -0.047   0.390  -0.133   0.088       -0.075   0.451   0.000   0.000
very heavy            -0.294  -0.215   0.458  -0.055       -0.305  -0.479   0.000   0.000
slow                   0.059  -0.383   0.296   0.133        0.000  -0.561   0.000   0.000
fast                   0.224   0.256   0.057  -0.299        0.000   0.282   0.000   0.000
very fast             -0.303   0.156  -0.391   0.168        0.000   0.328   0.000   0.000
unintelligent          0.173   0.157   0.356   0.236        0.000   0.000   0.693  -0.693
avg intelligent       -0.145  -0.309  -0.168   0.125        0.000   0.000  -0.327   0.327
very intelligent      -0.086   0.125  -0.330  -0.491        0.000   0.000  -0.642   0.642
unloving              -0.366  -0.084   0.030   0.087       -0.462   0.000   0.000   0.000
very affectionate      0.353   0.081  -0.029  -0.084        0.445   0.000   0.000   0.000
aggressive            -0.170  -0.096   0.162  -0.515        0.000   0.000   0.000   0.000
non aggressive         0.164   0.093  -0.156   0.497        0.000   0.000   0.000   0.000
Nb non-zero loadings      16      16      16      16            8       6       3       3
Adjusted variance (%)  28.19   22.80   13.45    9.55        23.03   17.40   10.20    9.50

Sparse MCA: application on SNPs data

Single nucleotide polymorphism (SNP): a single base-pair mutation at a specific locus; the most frequent type of polymorphism.
Data:
I = 502 women, J = 537 SNPs, Q = 1554 (2 or 3 modalities per SNP)
X: 502 × 537 matrix of categorical variables (SNPs)
D: 502 × 1554 complete disjunctive matrix, D = (D_[1], ..., D_[537])
1 block = 1 SNP = 1 sub-matrix D_[j]

Sparse MCA: application on SNPs data

λ = 0.0045: CPEV = 0.36% and 300 columns selected on Comp 1.
λ = 0.005: CPEV = 0.32% and 174 columns selected on Comp 1.
We choose λ = 0.005.

Sparse MCA: application on SNPs data

Comparison of the loadings
                              MCA                    SMCA
SNPs                     Comp1    Comp2         Comp1    Comp2
SNP1.AA                 -0.078    0.040        -0.092    0.102
SNP1.AG                 -0.014   -0.027        -0.022   -0.053
SNP1.GG                  0.150   -0.002         0.132   -0.003
SNP2.AA                 -0.082    0.041        -0.118    0.000
SNP2.AG                 -0.021   -0.025        -0.020    0.000
SNP2.GG                 -0.081    0.040        -0.001    0.000
SNP3.CC                 -0.004    0.050         0.000    0.000
SNP3.CG                  0.016    0.021         0.000    0.000
SNP3.GG                 -0.037   -0.325         0.000    0.000
SNP4.AA                  0.149   -0.003         0.050    0.000
SNP4.AG                 -0.016   -0.025        -0.002    0.000
SNP4.GG                 -0.081    0.040        -0.100    0.000
...
Nb non-zero loadings      1554     1554           172      108
Variance (%)              1.14     0.63          0.32     0.16
Cumulative variance (%)   1.14     1.77          0.32     0.48

Conclusions

In an unsupervised multiblock data context: GSPCA for continuous variables, SMCA for categorical variables.
Both produce sparse loading structures (with a limited loss of explained variance), allowing easier interpretation and comprehension of the results.
They are very useful for variable selection in high-dimensional problems, reducing noise as well as computation time.
Research in progress: extension of Sparse MCA to select both groups and predictors within a group, i.e., sparsity at both the group and individual feature levels, as a compromise between Sparse MCA and the sparse-group lasso developed by Simon et al. (2012).

References
[1] Jolliffe, I. T. et al. (2003), A modified principal component technique based on the LASSO, Journal of Computational and Graphical Statistics, 12, 531-547.
[2] Shen, H. and Huang, J. (2008), Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, 99, 1015-1034.
[3] Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2012), A Sparse-Group Lasso, Journal of Computational and Graphical Statistics, DOI:10.1080/10618600.2012.681250.
[4] Tenenhaus, M. (2007), Statistique : méthodes pour décrire, expliquer et prévoir, Dunod, 252-254.
[5] Yuan, M. and Lin, Y. (2006), Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B, 68, 49-67.
[6] Zou, H., Hastie, T. and Tibshirani, R. (2006), Sparse Principal Component Analysis, Journal of Computational and Graphical Statistics, 15, 265-286.