Canonical Correlation Analysis with Kernels


1 Canonical Correlation Analysis with Kernels
Florian Markowetz
Max-Planck-Institute for Molecular Genetics, Computational Molecular Biology, Berlin
Computational Diagnostics Group Seminar, 2003 Mar 10

2 Overview
Applied Kernel Generalized Canonical Correlation Analysis, explained:
1. CCA: Canonical Correlation Analysis
2. GCCA: Generalized Canonical Correlation Analysis
3. KGCCA: Kernel Generalized Canonical Correlation Analysis
Florian Markowetz, Kernel Generalized Canonical Correlation Analysis, 2003 Mar 10

3 Canonical Correlation Analysis - in words
CCA seeks to identify and quantify the associations between two sets of variables. It searches for linear combinations of the original variables having maximal correlation. Further pairs of maximally correlated linear combinations are chosen such that they are orthogonal to those already identified.
The pairs of linear combinations are called canonical variables and their correlations the canonical correlations. The canonical correlations measure the strength of association between the two sets of variables.
CCA is closely related to other linear subspace methods like Principal Component Analysis, Partial Least Squares and Multivariate Linear Regression.

4 Canonical Correlation Analysis - in formulas
Data: $m$ microarrays measuring $N$ genes, organized in an $N \times m$ matrix $Z$:
$$Z = [\, X \;\; Y \,], \qquad X : N \times p \text{ matrix}, \quad Y : N \times q \text{ matrix}$$
Linear combinations of the variables $X_i$ and $Y_i$:
$$U_a = a^\top X = \sum_{i=1}^{p} a_i X_i, \qquad V_b = b^\top Y = \sum_{i=1}^{q} b_i Y_i$$
Correlation is defined as:
$$\mathrm{corr}(U, V) = \frac{\mathrm{cov}(U, V)}{\sqrt{\mathrm{var}(U)\,\mathrm{var}(V)}}$$

5 Stating the CCA problem
CCA is the solution of the optimization problem:
$$\max_{a,b} \; \mathrm{corr}(U_a, V_b) \quad \text{subject to} \quad \mathrm{var}(U_a) = \mathrm{var}(V_b) = 1$$
The maximal value $\rho$ is the first canonical correlation. The canonical variates $U_\alpha$, $V_\beta$ are defined by:
$$(\alpha, \beta) = \operatorname*{argmax}_{a,b} \; \mathrm{corr}(a^\top X, b^\top Y)$$

6 Solving the optimization problem
First we decompose $\mathrm{cov}(Z)$:
$$\mathrm{cov}(Z) = Z^\top Z \equiv R_Z = \begin{pmatrix} R_{XX} & R_{XY} \\ R_{YX} & R_{YY} \end{pmatrix}$$
Solving the optimization problem by the Lagrange method leads to an eigenvalue equation:
$$B^{-1} A w = \rho w$$
$$A = \begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} R_{XX} & 0 \\ 0 & R_{YY} \end{pmatrix}, \qquad w = \begin{pmatrix} a \\ b \end{pmatrix}$$
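The eigenvalue equation above can be sketched numerically. This is a minimal illustration, not the speaker's implementation: it assumes one observation per row, centered data, sample covariances scaled by 1/(n-1), and a Cholesky factorization of $B$ to keep the problem symmetric. The function name `linear_cca` and the synthetic data are hypothetical.

```python
import numpy as np


def linear_cca(X, Y):
    """Return (rho, a, b): first canonical correlation and weight vectors.

    X is (n, p) and Y is (n, q), one row per observation (an assumption).
    """
    X = X - X.mean(axis=0)  # center each variable
    Y = Y - Y.mean(axis=0)
    n, p = X.shape
    q = Y.shape[1]

    Rxx = X.T @ X / (n - 1)
    Ryy = Y.T @ Y / (n - 1)
    Rxy = X.T @ Y / (n - 1)

    # Block matrices A and B from the eigenvalue equation B^{-1} A w = rho w.
    A = np.block([[np.zeros((p, p)), Rxy],
                  [Rxy.T, np.zeros((q, q))]])
    B = np.block([[Rxx, np.zeros((p, q))],
                  [np.zeros((q, p)), Ryy]])

    # With B = L L^T the problem becomes symmetric:
    # (L^{-1} A L^{-T}) v = rho v, and w = L^{-T} v.
    L = np.linalg.cholesky(B)
    M = np.linalg.solve(L, np.linalg.solve(L, A).T).T
    M = (M + M.T) / 2  # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    w = np.linalg.solve(L.T, vecs[:, -1])
    return vals[-1], w[:p], w[p:]


# Synthetic example: both sets share one latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([z + 0.1 * rng.normal(size=200), rng.normal(size=200)])
Y = np.column_stack([rng.normal(size=200), z + 0.1 * rng.normal(size=200)])

rho, a, b = linear_cca(X, Y)
u, v = X @ a, Y @ b  # first pair of canonical variables
print(rho, np.corrcoef(u, v)[0, 1])
```

The largest eigenvalue should agree with the sample correlation of the two canonical variables, which gives a quick sanity check on the solver.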

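7 Relation to other linear subspace methods

Method   A                                                              B
PCA      $R_{XX}$                                                       $I$
CCA      $\begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}$       $\begin{pmatrix} R_{XX} & 0 \\ 0 & R_{YY} \end{pmatrix}$
PLS      $\begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}$       $\begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}$
MLR      $\begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}$       $\begin{pmatrix} R_{XX} & 0 \\ 0 & I \end{pmatrix}$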
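As a small check of the PCA and PLS rows, the sketch below plugs $B = I$ into $B^{-1} A w = \rho w$. It assumes centered data with one observation per row and sample covariances scaled by 1/(n-1); the synthetic data and variable names are hypothetical. With $B = I$, the PLS choice of $A$ has the singular values of $R_{XY}$ as its positive eigenvalues, and the PCA choice returns the largest variance along any unit direction.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
mix = np.array([[1.0, 0.5], [0.0, 2.0]])  # arbitrary mixing matrix
Y = X[:, :2] @ mix + 0.1 * rng.normal(size=(100, 2))

X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)
n, p = X.shape
q = Y.shape[1]
Rxx = X.T @ X / (n - 1)
Rxy = X.T @ Y / (n - 1)

# PCA row: A = R_XX, B = I, so the equation is the ordinary eigenproblem of R_XX.
pca_vals, pca_vecs = np.linalg.eigh(Rxx)
pca_top = pca_vals[-1]
pc1 = pca_vecs[:, -1]  # first principal direction

# PLS row: A = [[0, R_XY], [R_YX, 0]], B = I.
A = np.block([[np.zeros((p, p)), Rxy],
              [Rxy.T, np.zeros((q, q))]])
pls_top = np.linalg.eigvalsh(A)[-1]

# The largest eigenvalue of A equals the largest singular value of R_XY.
top_sv = np.linalg.svd(Rxy, compute_uv=False)[0]
print(pca_top, pls_top, top_sv)
```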

8 Generalized Canonical Correlation Analysis
How to deal with more than two sets?
Straightforward: maximize the sum of all pairwise correlations.
Using kernels: combine several datasets by summing up their kernel matrices.

9 Kernel Canonical Correlation Analysis
What is a kernel function?
A similarity measure like the inner product $\langle x, y \rangle = \sum_i x_i y_i$:
$$k : S \times S \to \mathbb{R}$$
Given a (nonlinear) map $\Phi$ into a (high-dimensional) feature space, we can define a kernel function by:
$$k(x_i, x_j) := \langle \Phi(x_i), \Phi(x_j) \rangle$$
What is a kernel matrix?
A positive definite matrix which summarizes the similarities of all members of a set:
$$K_{ij} = k(x_i, x_j)$$
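To make the definition concrete, here is a small sketch using a degree-2 polynomial kernel on 2-D inputs. This kernel and its feature map `phi` are illustrative choices, not taken from the slides. It shows that evaluating $k$ directly gives the same kernel matrix as taking inner products of the explicitly mapped points.

```python
import numpy as np


def phi(x):
    """Explicit feature map for the illustrative kernel k(x, y) = <x, y>^2 in 2-D."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


def k(x, y):
    """Degree-2 polynomial kernel, evaluated without building phi(x)."""
    return float(np.dot(x, y)) ** 2


points = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 0.5])]

# Kernel matrix K_ij = k(x_i, x_j), computed two ways.
K = np.array([[k(xi, xj) for xj in points] for xi in points])
K_phi = np.array([[phi(xi) @ phi(xj) for xj in points] for xi in points])

print(np.allclose(K, K_phi))
print(np.linalg.eigvalsh(K))  # eigenvalues of the kernel matrix
```

The eigenvalues of `K` should be non-negative up to round-off, which is the sense in which a kernel matrix is positive (semi)definite.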

10 The Kernel Trick
Given an algorithm which is formulated in terms of an inner product, one can construct an alternative algorithm by replacing the inner product with a kernel function $k$.
This is used in SVMs and can also be applied to CCA.
It is useful because it makes linear algorithms nonlinear and allows heterogeneous datasets to be combined.

11 Examples of kernels
Radial basis function kernels - a kernel for vectorial data:
$$k(x_i, x_j) = \exp\left(-\|x_i - x_j\| / \sigma\right)$$
The diffusion kernel - a kernel for graphs.
Given an undirected graph $\Gamma = (V, E)$ with adjacency matrix $A$, let $A_{i+}$ be the sum over the $i$-th row of $A$. We define the matrix $H$ by
$$H = A - \mathrm{diag}(A_{1+}, \ldots, A_{n+})$$
The diffusion kernel is defined by the positive definite matrix $K_{\mathrm{diff}}$:
$$K_{\mathrm{diff}} = \exp(cH) = \sum_{k=0}^{\infty} \frac{c^k}{k!} H^k$$
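Both kernels can be sketched in a few lines of NumPy. The function names, the `squared` flag (which switches to the common Gaussian variant with a squared distance), and the 4-node path graph are assumptions made for illustration. The matrix exponential is computed through the eigendecomposition of the symmetric matrix $H$ rather than by summing the series.

```python
import numpy as np


def rbf_kernel_matrix(X, sigma, squared=False):
    """K_ij = exp(-||x_i - x_j|| / sigma) for the rows of X.

    squared=True (an added option, not from the slides) uses the squared
    distance, which gives the common Gaussian variant.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if squared:
        dist = dist ** 2
    return np.exp(-dist / sigma)


def diffusion_kernel(A, c):
    """K_diff = exp(c H) with H = A - diag(row sums of A), A symmetric."""
    H = A - np.diag(A.sum(axis=1))
    lam, Q = np.linalg.eigh(H)  # H = Q diag(lam) Q^T
    return (Q * np.exp(c * lam)) @ Q.T  # Q diag(exp(c lam)) Q^T


# Vectorial data: three points in the plane.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_rbf = rbf_kernel_matrix(pts, sigma=1.5)

# Graph data: path graph 0 - 1 - 2 - 3.
A_path = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
K_diff = diffusion_kernel(A_path, c=0.5)

print(K_rbf)
print(K_diff)
```

Because every row of $H$ sums to zero, each row of `K_diff` sums to one, which is a convenient check on the implementation.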

12 Kernel Canonical Correlation Analysis
Both ordinary and kernel CCA can be written as the solution of an eigenvalue equation of the form $B^{-1} A w = \rho w$.
Ordinary CCA:
$$A = \begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} R_{XX} & 0 \\ 0 & R_{YY} \end{pmatrix}, \qquad w = \begin{pmatrix} a \\ b \end{pmatrix}$$
Kernel CCA:
$$A = \begin{pmatrix} 0 & K_X K_Y \\ K_Y K_X & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} K_X K_X & 0 \\ 0 & K_Y K_Y \end{pmatrix}, \qquad w = \begin{pmatrix} a \\ b \end{pmatrix}$$
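A hedged sketch of the kernel CCA eigenproblem follows. It departs from the slide in one labeled way: the blocks $K_X K_X$ and $K_Y K_Y$ are replaced by $(K_X + \kappa I)^2$ and $(K_Y + \kappa I)^2$, a common regularization that keeps $B$ invertible. The parameter `kappa`, the kernel centering step, the Gaussian kernel, the function names, and the synthetic data are all assumptions, not part of the original talk.

```python
import numpy as np


def gaussian_kernel(x, sigma):
    """Gaussian kernel matrix for a 1-D sample (illustrative choice)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / sigma)


def center(K):
    """Center a kernel matrix in feature space."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J


def kernel_cca(Kx, Ky, kappa):
    """First regularized kernel canonical correlation and dual weights."""
    n = Kx.shape[0]
    Kx, Ky = center(Kx), center(Ky)
    Z = np.zeros((n, n))
    A = np.block([[Z, Kx @ Ky],
                  [Ky @ Kx, Z]])
    # Assumption: regularized blocks (K + kappa I)^2 instead of K K.
    Rx = Kx + kappa * np.eye(n)
    Ry = Ky + kappa * np.eye(n)
    B = np.block([[Rx @ Rx, Z],
                  [Z, Ry @ Ry]])
    B = (B + B.T) / 2  # remove round-off asymmetry

    # Same symmetric reduction as for ordinary CCA: B = L L^T.
    L = np.linalg.cholesky(B)
    M = np.linalg.solve(L, np.linalg.solve(L, A).T).T
    M = (M + M.T) / 2
    vals, vecs = np.linalg.eigh(M)
    w = np.linalg.solve(L.T, vecs[:, -1])
    return vals[-1], w[:n], w[n:]


# Synthetic data with a nonlinear dependence between x and y.
rng = np.random.default_rng(2)
n = 60
x = rng.uniform(-3.0, 3.0, size=n)
y = x ** 2 + 0.3 * rng.normal(size=n)

Kx = gaussian_kernel(x, sigma=1.0)
Ky = gaussian_kernel(y, sigma=1.0)
rho, a, b = kernel_cca(Kx, Ky, kappa=1.0)
print(rho)
```

The value of `rho` depends strongly on `kappa`: small values push it toward 1 regardless of the data, so in practice the regularization has to be chosen with care.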
