Kernel Generalized Canonical Correlation Analysis. Arthur Tenenhaus. JdS 10, May 2010, Marseille, France. CD-ROM Proceedings (6 p.), 2010. HAL Id: hal-00553602, https://hal-supelec.archives-ouvertes.fr/hal-00553602, submitted on 7 Jan 2011.
Kernel Generalized Canonical Correlation Analysis

Arthur Tenenhaus
SUPELEC, Sciences des Systèmes (E3S), Département Signaux et Systèmes Electroniques, 3 rue Joliot Curie, Plateau de Moulon, 91192 Gif-sur-Yvette Cedex

Résumé (translated): A classical problem in statistics is to study the relationships between several blocks of variables. The objective is to find the variables of one block that influence the variables of the other blocks. Regularized Generalized Canonical Correlation Analysis (RGCCA) is a very attractive framework for dealing with this kind of problem. However, RGCCA captures only linear relations between blocks; to reach nonlinear relations, we propose a kernel extension of RGCCA.

Mots-clés: Kernel methods, Multiblock data analysis, Regularized Generalized Canonical Correlation Analysis, PLS approach

Abstract: A classical problem in statistics is to study relationships between several blocks of variables. The goal is to find variables of one block directly related to variables of other blocks. Regularized Generalized Canonical Correlation Analysis (RGCCA) is a very attractive framework for studying such relationships between blocks. However, RGCCA captures only linear relations between blocks; to assess nonlinear relations, we propose a kernel extension of RGCCA.

Keywords: Kernel Methods, Multiblock Data Analysis, Regularized Generalized Canonical Correlation Analysis, PLS approach

1 Introduction

A common problem in applied statistics is to relate several blocks of variables to each other in order to find the variables of one block directly related to the variables of other blocks. Typical examples are found in a large variety of fields such as bioinformatics, sensory analysis, marketing, and food research. To study such relationships between blocks, the starting point of the paper is the generalized canonical correlation analysis (gcca) proposed in [4].
This version of gcca captures linear relationships between blocks; to assess nonlinear relations, we propose in this paper a kernel extension of gcca. The paper is organized as follows: the first part presents the initial formulation of gcca and the second part is devoted to its nonlinear version.
2 Population generalized canonical correlation analysis

Let us consider J random centered column vectors x_j of dimension p_j and J non-random column vectors ξ_j of dimension p_j, j = 1, ..., J. We also consider a network of connections between the random vectors by defining a design matrix C = (c_{jk}): c_{jk} = 1 if x_j and x_k are connected, and 0 otherwise. Now consider two linear combinations η_j = ξ_j^t x_j and η_k = ξ_k^t x_k. The correlation between η_j and η_k is:

    ρ(ξ_j^t x_j, ξ_k^t x_k) = ξ_j^t Σ_{jk} ξ_k / [ (ξ_j^t Σ_{jj} ξ_j)^{1/2} (ξ_k^t Σ_{kk} ξ_k)^{1/2} ]    (1)

where Σ_{jk} = E(x_j x_k^t) and Σ_{jj} = E(x_j x_j^t). Population generalized canonical correlation analysis is defined as the following optimization problem [4]:

    max_{ξ_1, ξ_2, ..., ξ_J}  Σ_{j<k} c_{jk} g( ρ(ξ_j^t x_j, ξ_k^t x_k) )    (2)
    s.c.  var(ξ_j^t x_j) = 1,  j = 1, ..., J

where g is the identity, the absolute value or the square function. Problem (2) is equivalent to the following optimization problem:

    max_{ξ_1, ξ_2, ..., ξ_J}  Σ_{j<k} c_{jk} g( ξ_j^t Σ_{jk} ξ_k )    (3)
    s.c.  ξ_j^t Σ_{jj} ξ_j = 1,  j = 1, ..., J

By setting to zero the derivatives with respect to ξ_j and λ_j of the Lagrangian function associated with optimization problem (3), we obtain J stationary equations:

    (1/ϕ) Σ_{jj}^{-1} Σ_{k=1, k≠j}^{J} c_{jk} g'(ξ_j^t Σ_{jk} ξ_k) Σ_{jk} ξ_k = λ_j ξ_j,  j = 1, ..., J    (4)

subject to the constraints:

    ξ_j^t Σ_{jj} ξ_j = 1,  j = 1, ..., J    (5)

2.1 Regularized Generalized Canonical Correlation Analysis

In practice, we have to estimate the ξ_j's from a finite sample. Let us consider J blocks X_1, ..., X_J of centered variables measured on a set of n individuals (a row of X_j represents a realization of the row random vector x_j^t). C = (c_{jk}) is now a design matrix describing a network of relationships between blocks: c_{jk} = 1 for two connected blocks, and 0 otherwise. In the case of high multicollinearity or when the number of observations is
smaller than the number of variables, the sample covariance matrix S_{jj} = (1/n) X_j^t X_j is a bad estimate of the true covariance matrix Σ_{jj}. A suggestion for finding a better estimate of the true covariance matrix is to consider the class of linear combinations Ŝ_{jj} = τ_j I + (1 - τ_j) S_{jj} of the identity matrix I and the sample covariance matrix S_{jj} [3]. We then consider a sample version of the stationary equations (4) with the constraints (5), replacing Σ_{jj} by Ŝ_{jj} and Σ_{jk} by S_{jk} = (1/n) X_j^t X_k. This leads to J sample stationary equations, which are the stationary equations associated with the regularized generalized canonical correlation analysis (RGCCA) defined below:

    max_{a_1, a_2, ..., a_J}  Σ_{j<k} c_{jk} g( côv(X_j a_j, X_k a_k) )    (6)
    s.c.  (1 - τ_j) vâr(X_j a_j) + τ_j ||a_j||^2 = 1,  j = 1, ..., J

The regularisation parameters τ_j ∈ [0, 1], j = 1, ..., J, interpolate smoothly between the maximisation of the covariance (all τ_j's equal to 1) and the maximisation of the correlation (all τ_j's equal to 0). More details on RGCCA can be found in [4]. This formulation of RGCCA detects only linear relations between blocks; in the next section, we introduce a kernel extension of RGCCA allowing the extraction of nonlinear relations between blocks.

3 Kernel Generalized Canonical Correlation Analysis

For each j = 1, ..., J, let H_j be a Reproducing Kernel Hilbert Space (RKHS) with associated kernel k_j(·, ·) and feature map Φ_j(x) = k_j(·, x). We define Kernel Generalized Canonical Correlation Analysis (KGCCA) as the following optimization problem:

    argmax_{f_1, ..., f_J ∈ H_1 × ... × H_J}  Σ_{j<k} c_{jk} g( ρ(f_j(x_j), f_k(x_k)) )    (7)
    s.c.  var(f_j(x_j)) = 1,  j = 1, ..., J

where

    ρ(f_j(x_j), f_k(x_k)) = cov(f_j(x_j), f_k(x_k)) / [ var(f_j(x_j))^{1/2} var(f_k(x_k))^{1/2} ]    (8)

is the correlation between the random variables f_j(x_j) and f_k(x_k). The functions f_j and f_k are defined up to scale. From the reproducing property of the RKHS, ρ(f_j(x_j), f_k(x_k)) = cor(⟨Φ_j(x_j), f_j⟩, ⟨Φ_k(x_k), f_k⟩). Therefore, ρ(f_j(x_j), f_k(x_k)) is the correlation between the one-dimensional linear projections of Φ_j(x_j) onto f_j and of Φ_k(x_k) onto f_k.
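To make the sample quantities of Section 2.1 concrete, the RGCCA criterion (6) and its regularised normalisation constraint can be sketched as follows. This is an illustrative NumPy helper, not the author's code; the function names are ours.

```python
import numpy as np

def rgcca_criterion(X, a, C, g=np.square):
    """Evaluate sum_{j<k} c_jk g(cov(X_j a_j, X_k a_k)) for centered
    n x p_j blocks X[j], weight vectors a[j], and 0/1 design matrix C.
    Illustrative sketch of criterion (6)."""
    n = X[0].shape[0]
    y = [Xj @ aj for Xj, aj in zip(X, a)]     # block components X_j a_j
    J = len(X)
    return sum(g((y[j] @ y[k]) / n)           # empirical covariance, then g
               for j in range(J) for k in range(j + 1, J) if C[j][k])

def normalise_weight(Xj, aj, tau_j):
    """Rescale a_j so that (1 - tau_j) var(X_j a_j) + tau_j ||a_j||^2 = 1,
    the RGCCA constraint interpolating between correlation and covariance."""
    n = Xj.shape[0]
    yj = Xj @ aj
    scale = (1 - tau_j) * (yj @ yj) / n + tau_j * (aj @ aj)
    return aj / np.sqrt(scale)
```

With τ_j = 1 the constraint reduces to ||a_j||^2 = 1 (covariance maximisation); with τ_j = 0 it reduces to vâr(X_j a_j) = 1 (correlation maximisation), matching the interpolation described above.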
The canonical correlation between Φ_j(x_j) and Φ_k(x_k) is exactly the maximal possible correlation between these two projections. Now, the main objective is to estimate f_1, ..., f_J from a finite sample. For each j = 1, ..., J, let {x_{j1}, ..., x_{jn}} be a set of n observations of x_j, K_j be the associated Gram matrix defined as (K_j)_{rs} = k_j(x_{jr}, x_{js}), and K̃_j be the Gram matrix of the centered data points in the
feature space, defined as K̃_j = P K_j P where P = I - (1/n) 𝟙, with 𝟙 the n × n matrix of ones. For fixed f_j and f_k, it can be shown [1] that the empirical covariance of the projections in the feature space can be written as:

    côv( ⟨Φ_j(x_j), f_j⟩, ⟨Φ_k(x_k), f_k⟩ ) = (1/n) α_j^t K̃_j K̃_k α_k    (9)

where K̃_j and K̃_k are the centered Gram matrices associated with the data sets {x_{ji}}_{i=1}^{n} and {x_{ki}}_{i=1}^{n} respectively. Combining optimization problem (7) with equation (9) and adding a constraint on the smoothness of the f_j's through ||f_j||^2_{H_j} = α_j^t K̃_j α_j, the empirical counterpart of KGCCA becomes that of performing the following maximization:

    max_{α_1, ..., α_J}  Σ_{j<k} c_{jk} g( côv(K̃_j α_j, K̃_k α_k) )    (10)
    s.c.  α_j^t ( (1 - τ_j) (1/n) K̃_j^2 + τ_j K̃_j ) α_j = 1,  j = 1, ..., J

The regularisation parameters τ_j not only make this optimization problem well posed numerically and statistically ([1]; [2]) but also provide control over the capacity of the function space in which the solution is sought. The larger the values of the τ_j's, the less sensitive the method is to the input data and the more stable (less prone to finding spurious relations) the solution becomes. Since the Gram matrices K̃_j, j = 1, ..., J, are symmetric positive semidefinite, an (incomplete) Cholesky decomposition is applied to solve optimization problem (10) efficiently. Therefore, K̃_j = R_j^t R_j, where R_j is an m_j × n upper triangular matrix with m_j = rank(K̃_j). Then, writing w_j = R_j α_j, optimization problem (10) becomes:

    max_{w_1, ..., w_J}  Σ_{j<k} c_{jk} g( côv(R_j^t w_j, R_k^t w_k) )    (11)
    s.c.  w_j^t ( (1 - τ_j) (1/n) R_j R_j^t + τ_j I ) w_j = 1,  j = 1, ..., J

A new procedure is proposed to solve (11). The first step of the procedure is to cancel the derivatives of the Lagrangian function related to the maximization problem (11) with respect to w_j and λ_j. This yields the following J stationary equations:

    (1/ϕ) N_j^{-1} Σ_{k=1, k≠j}^{J} c_{jk} g'( (1/n) w_j^t R_j R_k^t w_k ) R_j R_k^t w_k = λ_j w_j,  j = 1, ..., J    (12)

subject to the constraints:

    w_j^t N_j w_j = 1,  j = 1, ..., J    (13)

where N_j = (1 - τ_j) (1/n) R_j R_j^t + τ_j I.
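The ingredients above can be sketched numerically as follows: building a Gram matrix (here with a Gaussian kernel, one common choice not prescribed by the paper), centering it in feature space, and factoring K̃_j = R_j^t R_j. As an assumption of ours, an eigendecomposition stands in for the incomplete Cholesky decomposition, so R is not triangular here, but it satisfies the same factorisation.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Gram matrix (K)_rs = k(x_r, x_s) for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); purely illustrative choice."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def centered_gram(K):
    """K_tilde = P K P with P = I - (1/n) * ones(n, n): Gram matrix of the
    data points centered in feature space."""
    n = K.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return P @ K @ P

def gram_factor(K_tilde, tol=1e-10):
    """Factor K_tilde = R^t R with R having rank(K_tilde) rows.  An
    eigendecomposition is used as a simple stand-in for the incomplete
    Cholesky decomposition mentioned in the text."""
    vals, vecs = np.linalg.eigh(K_tilde)
    keep = vals > tol
    return np.sqrt(vals[keep])[:, None] * vecs[:, keep].T
```

Centering removes one direction from the span of the Gram matrix, which is why the factor R generally has fewer rows than n.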
Following PLS terminology, we introduce for each block an outer component y_j = R_j^t w_j and an inner component z_j defined as follows:

    z_j = (1/ϕ) Σ_{k=1, k≠j}^{J} c_{jk} g'( côv(y_j, y_k) ) y_k    (14)
Then, combining equations (12), (14) and (13), we obtain the outer weights:

    w_j = [ z_j^t R_j^t N_j^{-1} R_j z_j ]^{-1/2} N_j^{-1} R_j z_j,  j = 1, ..., J    (15)

We then propose an iterative PLS-style algorithm for finding a solution of the J stationary equations (see Table 1). This algorithm is monotonically convergent, which means that the bounded criterion to be maximized increases at each step of the procedure (the proof of this result is beyond the scope of the paper).

Table 1: Kernel Generalized Canonical Correlation Analysis Algorithm

A. Initialisation
A.1. Choose J arbitrary vectors w_1^(0), ..., w_J^(0).
A.2. Compute vectors w_1^(0), ..., w_J^(0) verifying the constraints as:
    w_j^(0) = [ (w_j^(0))^t N_j^{-1} w_j^(0) ]^{-1/2} N_j^{-1} w_j^(0)
A.3. Compute the outer components: y_1^(0) = R_1^t w_1^(0), ..., y_J^(0) = R_J^t w_J^(0).

B. Inner component for the block X_j
Compute the inner component according to the choice of g:
    z_j^(s) = (1/ϕ) [ Σ_{k=1}^{j-1} c_{jk} g'( côv(y_j^(s), y_k^(s+1)) ) y_k^(s+1) + Σ_{k=j+1}^{J} c_{jk} g'( côv(y_j^(s), y_k^(s)) ) y_k^(s) ]
where g'(x)/ϕ = 1 for g equal to the identity, x for g equal to the square, and sign(x) for g equal to the absolute value.

C. Outer component for the block X_j
C.1. Compute the outer weight:
    w_j^(s+1) = [ (z_j^(s))^t R_j^t N_j^{-1} R_j z_j^(s) ]^{-1/2} N_j^{-1} R_j z_j^(s)
C.2. Compute the outer component: y_j^(s+1) = R_j^t w_j^(s+1)

The procedure begins with an arbitrary choice of initialisation (A). Suppose the outer components y_1^(s+1), y_2^(s+1), ..., y_{j-1}^(s+1) have been constructed for the blocks 1, ..., j-1. The outer component y_j^(s+1) is computed by first considering the inner component z_j^(s) for the block j (B), then the outer one (C). The procedure is iterated until convergence.
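A minimal NumPy sketch of the algorithm of Table 1, for the choice g(x) = x^2 (so that g'(x)/ϕ = x), may look as follows. This is an illustrative reimplementation under our own naming, not the author's code, and it omits a convergence test in favour of a fixed number of sweeps.

```python
import numpy as np

def kgcca(R, C, tau, n_iter=100, seed=0):
    """PLS-style KGCCA iterations (Table 1) for g(x) = x^2, i.e. g'(x)/phi = x.
    R[j]: m_j x n factor with K_tilde_j = R_j^t R_j; C: J x J design matrix;
    tau[j]: regularisation parameters.  Illustrative sketch only."""
    J, n = len(R), R[0].shape[1]
    rng = np.random.default_rng(seed)
    N = [(1 - tau[j]) * (R[j] @ R[j].T) / n + tau[j] * np.eye(R[j].shape[0])
         for j in range(J)]
    Ninv = [np.linalg.inv(Nj) for Nj in N]

    def outer_weight(j, v):
        # w_j = [v^t N_j^{-1} v]^{-1/2} N_j^{-1} v, so that w_j^t N_j w_j = 1
        u = Ninv[j] @ v
        return u / np.sqrt(v @ u)

    # A. initialisation: random weights, normalised to satisfy the constraints
    w = [outer_weight(j, rng.standard_normal(R[j].shape[0])) for j in range(J)]
    y = [R[j].T @ w[j] for j in range(J)]
    for _ in range(n_iter):
        for j in range(J):
            # B. inner component z_j: already-updated y_k for k < j, old ones
            # for k > j (the Gauss-Seidel ordering of the algorithm)
            z = np.zeros(n)
            for k in range(J):
                if k != j and C[j, k]:
                    z += ((y[j] @ y[k]) / n) * y[k]   # g'(cov)/phi = cov
            # C. outer weight (15) and outer component
            w[j] = outer_weight(j, R[j] @ z)
            y[j] = R[j].T @ w[j]
    return w, y
```

Because each update applies formula (15) exactly, every iterate satisfies the constraint w_j^t N_j w_j = 1 regardless of how many sweeps are run.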
The essential feature of this algorithm is that each replacement is optimal and sequential, that is to say that w_j^(s) must be replaced by w_j^(s+1) before replacing w_{j+1}^(s). This is the essence of the Gauss-Seidel algorithm; this sequential approach leads to the monotone convergence of the algorithm.

4 Conclusion

KGCCA is a very attractive framework for nonlinear multiblock data analysis and covers a large spectrum of existing methods depending on the values of the regularisation parameters. Moreover, the proposed algorithm is computationally efficient but has some limitations: (1) there is no guarantee that the algorithm converges towards a fixed point of the stationary equations; (2) there is no guarantee that the algorithm converges towards a global optimum of the criterion.

References

[1] F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1-48, 2003.

[2] K. Fukumizu, F. R. Bach, and A. Gretton. Statistical consistency of kernel canonical correlation analysis. Journal of Machine Learning Research, 8:361-383, 2007.

[3] O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365-411, 2004.

[4] A. Tenenhaus and M. Tenenhaus. Regularized Generalized Canonical Correlation Analysis. Psychometrika, in revision.