Semi-Supervised Laplacian Regularization of Kernel Canonical Correlation Analysis

Size: px

Start display at page:

Download "Semi-Supervised Laplacian Regularization of Kernel Canonical Correlation Analysis"

Kory Candace Fields
5 years ago
Views:

1 Semi-Supervised Laplacian Regularization of Kernel Canonical Correlation Analsis Matthew B. Blaschko, Christoph H. Lampert, & Arthur Gretton Ma Planck Institute for Biological Cbernetics Tübingen, German ECML KPDD, Antwerp, Belgium, September 16, 2008

2 Overview Paired Data Kernel Canonical Correlation Analsis Semi-Supervised Laplacian Regularization Model Selection Results Summar and Outlook

3 Paired Datasets Realistic data comes in man different modalities: tet, images... We call data paired, if the samples come in more than one such representation at the same time, e.g. images + captions, audio + transscript, multi-language tet documents MRI + CT scans We assume a latent aspect that relates the representations. Z X Y

4 Paired Datasets A full paired dataset has correspondences for all samples: X = { 1, 2,..., n }, Y = { 1, 2,..., n }.

5 Paired Datasets However, data is often onl partiall paired: ˆX = { 1, 2,..., n, n+1,..., n+p },?? Ŷ = { 1, 2,..., n, n+1,..., n+p }.

6 Overview Paired Data Kernel Canonical Correlation Analsis Semi-Supervised Laplacian Regularization Model Selection Results Summar and Outlook

7 Review of Canonical Correlation Analsis Principal Component Analsis (PCA) Single dataset 1,..., n. Find projections that maimize the variance of the projected data. simple eigenvalue problem. Canonical Correlation Analsis (CCA) Full paired data 1 1,..., n n. Find projection directions w and w that maimize the correlation between the projected data. ma w,w w T C w w T C w w T C w C /C /C : (cross) correlation matrices generalized eigenvalue problem. supervised situation: { 1, 1} ground truth labels CCA LDA!

8 Wh is Correlation better than Variance? To dataset: 1 signal direction, 1 (high variance) noise direction. X original data maimal variance maimal correlation Y CCA can ignore noise that is uncorrelated between X and Y.

9 Kernelization Kernel Canonical Correlation Analsis Kernelize CCA, to use it for arbitrar input domains, and (latent) data embeddings φ : X H X, φ : Y H Y. k ( i, j ) = φ ( i ), φ ( j ), k ( i, j ) = φ ( i ), φ ( j ) w = i α iφ ( i ), w = i β iφ ( i ) KCCA CCA in H X /H Y : Solve ma α,β α T K K β α T K 2 α β T K 2 β K, K : kernel matrices of X, Y, equivalent to the constrained optimization problem ma α,β α T K K β sb.t. α T K 2 α = 1 and β T K 2 β = 1.

10 Need for Regularization Problem: Maimization is degenerate for invertible K or K. ma α,β α T K K β sb.t. α T K 2 α = 1 and β T K 2 β = 1 allows α, β with perfect correlation but without learning anthing! We need to regularize!

11 Need for Regularization Problem: Maimization is degenerate for invertible K or K. ma α,β α T K K β sb.t. α T K 2 α = 1 and β T K 2 β = 1 allows α, β with perfect correlation but without learning anthing! Tikhonov regularization: ma w,w = ma α,β w T C w (w T C w + ε w 2 ) ( w T C w + ε w 2) α T K K β α T (K 2 + ε K ) α β ( ) T K 2 + ε K β Regularization parameters ε, ε need to be model selected.

12 Overview Paired Data Kernel Canonical Correlation Analsis Semi-Supervised Laplacian Regularization Model Selection Results Summar and Outlook

13 Proposed Laplacian Regularization Manifold assumption: The data lie on a lower-dimensional manifold M H. Use the Laplace operator M to measure smoothness along M. Laplacian regularization: Approimate M b Graph Laplacian L = D 1/2 (D W )D 1/2 (W similarit matri, D diagonal matri with D ii = j W ij). ma α,β α T K K β ( sb.t. α T K K + ε K + γ K L K )α = 1, ) β = 1 β T ( K K + ε K }{{} Tikhonov + γ K L K }{{} Laplacian Favour projections that var smoothl wrt. the manifold structure.

14 Partiall paired data revisited Making use of unpaired data: We don t need correspondences to compute Laplacian. More data allows a better estimate of the manifold structure.

15 Partiall paired data revisited Making use of unpaired data: We don t need correspondences to compute Laplacian. More data allows a better estimate of the manifold structure. Notation: Paired training data { 1,..., n } and { 1,..., n }, Additional unpaired data { n+1,..., n+p } and { n+1,..., n+p }. Data matri X = ( 1,..., n ) T HX n, Etended data matri ˆX = ( 1,..., n+p ) T H n+p X, Kernel matri K = XX T R n n, Etended kernel matrices Kˆ = ˆX X T R (n+p ) n, Kˆˆ = ˆX ˆX T R (n+p ) (n+p ), etc.

16 Semi-Supervised Laplacian Regularization Semi-Supervised KCCA with Laplacian Regularization: ma α,β α T Kˆ K ŷ β ( sb.t. α T γ ) Kˆ K ˆ + ε Kˆˆ + (n + p ) 2 KˆˆLˆ Kˆˆ α = 1 ( β T γ Kŷ K ŷ + ε Kŷŷ + }{{} (n + p ) 2 K ŷŷ Lŷ Kŷŷ Tikhonov }{{} semi-supervised Laplacian Finds data projections that achieve high correlation when applied to X Y var smoothl on manifolds defined b ˆX, Ŷ. ) β = 1 Four regularization parameters: ε, ε, γ, γ model selection

17 Overview Paired Data Kernel Canonical Correlation Analsis Semi-Supervised Laplacian Regularization Model Selection Results Summar and Outlook

18 Model Selection Supervised szenario: cross-validation or similar. Unsupervised szenario: dependence maimization HSNIC HSNIC: kernel measure of dependence between random variables (Fukumizu, Bach, & Gretton 2007) Hilbert-Schmidt norm of normalized cross-covariance operator V Closel related to KCCA HSNIC(, ) V 2 HS Regularization with Laplace-Beltrami operators on data manifolds V =(Σ + ε I }{{} Tikhonov + γ }{{ M ) 1 2 Σ } Laplacian }{{} covariance operator ( Σ + ε I + γ M ) 1 2 Finite sample set: closed form epression ˆV (see paper or poster)

19 Model Selection: Concluded ˆV 2 HS estimates correlation achievable b KCCA. But beware: overfitting! trivial maimum at ε = ε = γ = γ = 0. Better model selection criterion: Maimize increase in correlation onl due to the data pairing ρ(ε, ε, γ, γ ) = ˆV 2 HS ˆV Π() 2 HS with Π() randoml reshuffled version(s) of.

20 Overview Paired Data Kernel Canonical Correlation Analsis Semi-Supervised Laplacian Regularization Model Selection Results Summar and Outlook

21 Eample Results l 2 norm of cross correlations Four datasets of images + associated tet Multiple train / test splits Var percentage of correspondences Eample results: UIUC-ISD Bass, 6 semantic classes Perform (semi-supervised) KCCA on training set Measure correlations on test set Measure scatter ratio on test set (larger is better) Cross correlation Image scatter ratio Tet scatter ratio 2 Laplacian regularization no Laplacian Regularization image scatter ratio 1.6 Laplacian regularization no Laplacian Regularization tet scatter ratio 2.3 Laplacian regularization 2.2 no Laplacian Regularization % data w/ correspondences % data w/ correspondences % data w/ correspondences More results in paper.

22 Summar & Future Work Summar: Laplacian regularization for KCCA Allows straight-forward semi-supervised etension Model selection: closed form epression for Laplacian regularized kernel independence measure (HSNIC) Eperimental results: projections better respect latent aspect Future work: Improved model selection criterion ratio used somewhat heuristic Other application of Laplacian regularized HSNIC Causalit inference ICA

23 Summar & Future Work Summar: Laplacian regularization for KCCA Allows straight-forward semi-supervised etension Model selection: closed form epression for Laplacian regularized kernel independence measure (HSNIC) Eperimental results: projections better respect latent aspect Future work: Improved model selection criterion ratio used somewhat heuristic Other application of Laplacian regularized HSNIC Causalit inference ICA Thank ou.

24 More about V Covariance operator: Σ : H Y H X defined b f, Σ g HX = E, [f ()g()] E [f ()]E [g()] for all f H X, g H g.

25 More about V Covariance operator: Σ : H Y H X defined b f, Σ g HX = E, [f ()g()] E [f ()]E [g()] for all f H X, g H g. Normalized Cross-covariance operator, V, such that Σ = Σ 1 2 V Σ 1 2

26 Model Selection: Laplacian Regularized HSNIC Empirical estimate of V : ( 1 ˆV = n X T X + ε I + γ m 2 ) 1 ˆX T 2 1 Lˆ ˆX n X T Y ( 1 n Y T Y + ε I + γ m 2 Ŷ T Lŷ Ŷ Closed form solution [ ] ˆV 2 HS = Tr ˆV ˆV T = Tr [M M ] with M = I n ( ) 1 2 ni + 1 K 1 ( m 2 K ε ˆ I + Lˆ Kˆˆ ε ε γ ) 1 LˆKˆ) 1

Semi-supervised Laplacian Regularization of Kernel Canonical Correlation Analysis

Semi-supervised Laplacian Regularization of Kernel Canonical Correlation Analysis Matthew B Blaschko, Christoph H Lampert, and Arthur Gretton Max Planck Institute for Biological Cybernetics Department