Canonical Correlation Analysis with Kernels

Canonical Correlation Analysis with Kernels
Florian Markowetz
Max-Planck-Institute for Molecular Genetics, Computational Molecular Biology, Berlin
Computational Diagnostics Group Seminar, 2003 Mar 10

Overview: Applied Kernel Generalized Canonical Correlation Analysis Explained
1. CCA: Canonical Correlation Analysis
2. GCCA: Generalized Canonical Correlation Analysis
3. KGCCA: Kernel Generalized Canonical Correlation Analysis

Canonical Correlation Analysis - in words
CCA seeks to identify and quantify the associations between two sets of variables. It searches for linear combinations of the original variables having maximal correlation. Further pairs of maximally correlated linear combinations are chosen such that they are orthogonal to those already identified. The pairs of linear combinations are called canonical variables and their correlations the canonical correlations. The canonical correlations measure the strength of association between the two sets of variables. CCA is closely related to other linear subspace methods like Principal Component Analysis, Partial Least Squares and Multivariate Linear Regression.

Canonical Correlation Analysis - in formulas
Data: m microarrays measuring N genes, organized in an $N \times m$ matrix Z:

$Z = [X\; Y]$, where $X$ is an $N \times p$ matrix and $Y$ is an $N \times q$ matrix.

Linear combinations of the variables $X_i$ and $Y_i$:

$U_a = a^\top X = \sum_{i=1}^{p} a_i X_i, \qquad V_b = b^\top Y = \sum_{i=1}^{q} b_i Y_i$

Correlation is defined as:

$\mathrm{corr}(U, V) = \frac{\mathrm{cov}(U, V)}{\sqrt{\mathrm{var}(U)\,\mathrm{var}(V)}}$
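
To make the notation concrete, here is a small NumPy sketch (a toy example of mine, not from the slides) that draws random X and Y, forms $U_a = Xa$ and $V_b = Yb$ for arbitrary coefficient vectors, and evaluates the correlation defined above; the sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data mirroring the slide's notation: N samples, p + q variables.
N, p, q = 100, 4, 3
X = rng.standard_normal((N, p))   # first set of variables
Y = rng.standard_normal((N, q))   # second set of variables

# Arbitrary (not yet optimal) coefficient vectors a and b.
a = rng.standard_normal(p)
b = rng.standard_normal(q)

# Linear combinations U_a = X a and V_b = Y b.
U = X @ a
V = Y @ b

# corr(U, V) = cov(U, V) / sqrt(var(U) var(V))
corr = np.cov(U, V)[0, 1] / np.sqrt(U.var(ddof=1) * V.var(ddof=1))
print(f"corr(U_a, V_b) = {corr:.3f}")
```

CCA then asks for the a and b that make this correlation as large as possible, as stated next.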

Stating the CCA problem
CCA is the solution of the optimization problem:

$\max_{a,b}\ \mathrm{corr}(U_a, V_b) \quad \text{subject to} \quad \mathrm{var}(U_a) = \mathrm{var}(V_b) = 1$

The maximal value $\rho$ is the first canonical correlation. The canonical variates $U_\alpha$, $V_\beta$ are defined by:

$(\alpha, \beta) = \underset{a,b}{\mathrm{argmax}}\ \mathrm{corr}(a^\top X, b^\top Y)$

Solving the optimization problem
First we decompose $\mathrm{cov}(Z)$:

$\mathrm{cov}(Z) = Z^\top Z =: R_Z = \begin{pmatrix} R_{XX} & R_{XY} \\ R_{YX} & R_{YY} \end{pmatrix}$

Solving the optimization problem by the Lagrange method leads to an eigenvalue equation:

$B^{-1} A\, w = \rho\, w, \qquad A = \begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}, \quad B = \begin{pmatrix} R_{XX} & 0 \\ 0 & R_{YY} \end{pmatrix}, \quad w = \begin{pmatrix} a \\ b \end{pmatrix}$
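
A minimal sketch of this construction, assuming centered toy data and invertible $R_{XX}$, $R_{YY}$ (data and names are my own illustration): it assembles the blocks of $A$ and $B$ and solves the symmetric generalized eigenproblem $A w = \rho B w$ with scipy.linalg.eigh.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)

# Toy data: Z = [X Y] with X (N x p) and Y (N x q); Y partly depends on X.
N, p, q = 200, 4, 3
X = rng.standard_normal((N, p))
Y = 0.5 * X[:, :q] + 0.5 * rng.standard_normal((N, q))
X -= X.mean(axis=0)              # center the columns
Y -= Y.mean(axis=0)

# Covariance blocks of Z = [X Y].
R_XX = X.T @ X
R_XY = X.T @ Y
R_YY = Y.T @ Y

# Assemble A and B as on the slide and solve A w = rho B w.
A = np.block([[np.zeros((p, p)), R_XY],
              [R_XY.T,           np.zeros((q, q))]])
B = np.block([[R_XX,             np.zeros((p, q))],
              [np.zeros((q, p)), R_YY]])

rho, W = eigh(A, B)              # eigenvalues ascending, eigenvectors in columns
w = W[:, -1]                     # eigenvector of the largest eigenvalue
a, b = w[:p], w[p:]              # split w = (a, b)
print("first canonical correlation:", rho[-1])
print("check:", np.corrcoef(X @ a, Y @ b)[0, 1])
```

The largest eigenvalue is the first canonical correlation, and splitting the corresponding eigenvector $w$ recovers the coefficient vectors $a$ and $b$.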

Relation to other linear subspace methods
Each method solves an eigenvalue equation $B^{-1} A\, w = \rho\, w$ with a different choice of $A$ and $B$:

PCA: $A = R_{XX}$, $B = I$
CCA: $A = \begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}$, $B = \begin{pmatrix} R_{XX} & 0 \\ 0 & R_{YY} \end{pmatrix}$
PLS: $A = \begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}$, $B = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}$
MLR: $A = \begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}$, $B = \begin{pmatrix} R_{XX} & 0 \\ 0 & I \end{pmatrix}$

Generalized Canonical Correlation Analysis
How to deal with more than two sets of variables?
Straightforward: maximize the sum of all pairwise correlations.
Using kernels: combine several datasets by summing up their kernel matrices.

Kernel Canonical Correlation Analysis
What is a kernel function? A similarity measure like the inner product $\langle x, y \rangle = \sum_i x_i y_i$:

$k : S \times S \to \mathbb{R}$

Given a (nonlinear) map $\Phi$ into a (high-dimensional) feature space, we can define a kernel function by:

$k(x_i, x_j) := \langle \Phi(x_i), \Phi(x_j) \rangle$

What is a kernel matrix? A positive definite matrix which summarizes the similarities of all members of a set:

$K_{ij} = k(x_i, x_j)$
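
As an illustration of the feature-map definition, the sketch below uses a simple explicit quadratic map $\Phi$ (my own choice, not from the slides) and checks that the resulting kernel matrix agrees with computing $k$ directly as a squared inner product.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 2))   # 5 points in R^2

def phi(x):
    """Explicit quadratic feature map Phi: R^2 -> R^3 (illustrative choice)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    """Kernel defined via the feature map: k(x, y) = <Phi(x), Phi(y)>."""
    return phi(x) @ phi(y)

# Kernel matrix K_ij = k(x_i, x_j): a positive (semi-)definite similarity matrix.
K = np.array([[k(xi, xj) for xj in X] for xi in X])

# For this Phi the kernel equals the squared linear kernel: k(x, y) = <x, y>^2.
assert np.allclose(K, (X @ X.T) ** 2)
```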

The Kernel Trick
Given an algorithm which is formulated in terms of an inner product, one can construct an alternative algorithm by replacing the inner product with a kernel function k. This is used in SVMs and can also be applied to CCA. It is useful because it makes linear algorithms nonlinear, and heterogeneous datasets can be combined.

Examples of kernels
Radial basis function kernels - a kernel for vectorial data:

$k(x_i, x_j) = \exp\!\left(-\|x_i - x_j\|^2 / \sigma\right)$

The diffusion kernel - a kernel for graphs. Given an undirected graph $\Gamma = (V, E)$ with adjacency matrix $A$, let $A_{i+}$ be the sum over the i-th row of $A$. We define the matrix $H$ by

$H = A - \mathrm{diag}(A_{1+}, \ldots, A_{n+})$

The diffusion kernel is defined by the positive definite matrix $K_{\mathrm{diff}}$:

$K_{\mathrm{diff}} = \exp(cH) = \sum_{k} \frac{c^k}{k!} H^k$
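
The diffusion kernel can be evaluated directly through a matrix exponential; below is a sketch for a small toy graph (the graph and the value of $c$ are my own choices), using scipy.linalg.expm for $\exp(cH)$.

```python
import numpy as np
from scipy.linalg import expm

# Adjacency matrix of a small undirected path graph 0 - 1 - 2 - 3 (toy example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# H = A - diag(row sums of A): the negative graph Laplacian.
H = A - np.diag(A.sum(axis=1))

# Diffusion kernel K_diff = exp(cH), with an arbitrary diffusion parameter c.
c = 1.0
K_diff = expm(c * H)

# K_diff is symmetric positive definite; its entries measure similarity on the graph.
print("smallest eigenvalue:", np.linalg.eigvalsh(K_diff).min())   # > 0
```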

Kernel Canonical Correlation Analysis
Both ordinary and kernel CCA can be written as the solution of an eigenvalue equation of the form $B^{-1} A\, w = \rho\, w$.

Ordinary CCA:
$A = \begin{pmatrix} 0 & R_{XY} \\ R_{YX} & 0 \end{pmatrix}, \quad B = \begin{pmatrix} R_{XX} & 0 \\ 0 & R_{YY} \end{pmatrix}, \quad w = \begin{pmatrix} a \\ b \end{pmatrix}$

Kernel CCA:
$A = \begin{pmatrix} 0 & K_X K_Y \\ K_Y K_X & 0 \end{pmatrix}, \quad B = \begin{pmatrix} K_X K_X & 0 \\ 0 & K_Y K_Y \end{pmatrix}, \quad w = \begin{pmatrix} a \\ b \end{pmatrix}$
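
A rough sketch of the kernel CCA eigenproblem on toy two-view data, assuming RBF kernel matrices $K_X$, $K_Y$; a small ridge term is added to $B$ for numerical stability, which is common practice but not part of the slide's formulation. All data and parameter choices are illustrative.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)

# Toy paired samples for the two views (names and sizes are illustrative).
N = 60
X = rng.standard_normal((N, 5))
Y = np.tanh(X[:, :3]) + 0.1 * rng.standard_normal((N, 3))

def rbf_kernel(Z, sigma=1.0):
    """K_ij = exp(-||z_i - z_j||^2 / sigma), as in the RBF example above."""
    return np.exp(-cdist(Z, Z, "sqeuclidean") / sigma)

K_X = rbf_kernel(X)
K_Y = rbf_kernel(Y)

# Assemble A and B as on the slide; eps*I regularizes B (practical addition).
eps = 1e-3
A = np.block([[np.zeros((N, N)), K_X @ K_Y],
              [K_Y @ K_X,        np.zeros((N, N))]])
B = np.block([[K_X @ K_X,        np.zeros((N, N))],
              [np.zeros((N, N)), K_Y @ K_Y]]) + eps * np.eye(2 * N)

rho, W = eigh(A, B)                       # generalized eigenproblem B^{-1} A w = rho w
alpha, beta = W[:, -1][:N], W[:, -1][N:]  # dual coefficients for the two views
print("first kernel canonical correlation (regularized):", rho[-1])
```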