Chapter 2 Canonical Correlation Analysis

Canonical correlation analysis (CCA), which is a multivariate analysis method, tries to quantify the amount of linear relationship between two sets of random variables, leading to different modes of maximum correlation [1]. In this chapter, we explain how CCA works from a signal processing perspective.

2.1 Preliminaries

Let $\mathbf{a}$ and $\mathbf{b}$ be two zero-mean complex-valued circular random vectors of lengths $L_a$ and $L_b$, respectively, where, without loss of generality, it is assumed that $L_a \le L_b$. Let us define the longer random vector:

$$\mathbf{c} = \begin{bmatrix} \mathbf{a}^T & \mathbf{b}^T \end{bmatrix}^T, \qquad (2.1)$$

where the superscript $^T$ denotes the transpose of a vector or a matrix. The covariance matrix of $\mathbf{c}$ is then

$$\mathbf{\Phi}_c = E\left(\mathbf{c}\mathbf{c}^H\right) = \begin{bmatrix} \mathbf{\Phi}_a & \mathbf{\Phi}_{ab} \\ \mathbf{\Phi}_{ba} & \mathbf{\Phi}_b \end{bmatrix}, \qquad (2.2)$$

where $E(\cdot)$ denotes mathematical expectation, the superscript $^H$ is the conjugate-transpose operator, $\mathbf{\Phi}_a = E\left(\mathbf{a}\mathbf{a}^H\right)$ is the covariance matrix of $\mathbf{a}$, $\mathbf{\Phi}_b = E\left(\mathbf{b}\mathbf{b}^H\right)$ is the covariance matrix of $\mathbf{b}$, $\mathbf{\Phi}_{ab} = E\left(\mathbf{a}\mathbf{b}^H\right)$ is the cross-covariance matrix between $\mathbf{a}$ and $\mathbf{b}$, and $\mathbf{\Phi}_{ba} = \mathbf{\Phi}_{ab}^H$. It is assumed that $\mathrm{rank}\left(\mathbf{\Phi}_{ab}\right) = L_a$ and, unless stated otherwise, $\mathbf{\Phi}_a$ and $\mathbf{\Phi}_b$ are assumed to have full rank.
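To make the setup concrete, the blocks of (2.2) can be estimated from sample data. The following is a minimal NumPy sketch; the synthetic data generation, the dimensions, and all variable names (Phi_a, Phi_ab, and so on) are illustrative assumptions of ours, not part of the original development:

```python
import numpy as np

rng = np.random.default_rng(0)
La, Lb, N = 3, 5, 100_000  # illustrative dimensions (La <= Lb) and sample count

# Zero-mean circular complex data: mix i.i.d. circular noise to correlate a and b.
mix = rng.standard_normal((La + Lb, La + Lb))
w = (rng.standard_normal((La + Lb, N))
     + 1j * rng.standard_normal((La + Lb, N))) / np.sqrt(2.0)
c = mix @ w                    # stacked vector c = [a^T b^T]^T, cf. (2.1)
a, b = c[:La], c[La:]

# Sample estimates of the covariance blocks in (2.2).
Phi_a = (a @ a.conj().T) / N   # E(a a^H)
Phi_b = (b @ b.conj().T) / N   # E(b b^H)
Phi_ab = (a @ b.conj().T) / N  # E(a b^H), the cross-covariance
Phi_ba = Phi_ab.conj().T       # Phi_ba = Phi_ab^H
```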

The Schur complement of $\mathbf{\Phi}_a$ in $\mathbf{\Phi}_c$ is defined as [2]

$$\mathbf{\Phi}_c / \mathbf{\Phi}_a = \mathbf{\Phi}_b - \mathbf{\Phi}_{ba} \mathbf{\Phi}_a^{-1} \mathbf{\Phi}_{ab} \qquad (2.3)$$

and the Schur complement of $\mathbf{\Phi}_b$ in $\mathbf{\Phi}_c$ is

$$\mathbf{\Phi}_c / \mathbf{\Phi}_b = \mathbf{\Phi}_a - \mathbf{\Phi}_{ab} \mathbf{\Phi}_b^{-1} \mathbf{\Phi}_{ba}. \qquad (2.4)$$

Clearly, both matrices $\mathbf{\Phi}_c / \mathbf{\Phi}_a$ and $\mathbf{\Phi}_c / \mathbf{\Phi}_b$ are nonsingular. An equivalent and more interesting way to express the two previous expressions is

$$\mathbf{\Phi}_b^{-1/2} \left( \mathbf{\Phi}_c / \mathbf{\Phi}_a \right) \mathbf{\Phi}_b^{-1/2} = \mathbf{I}_{L_b} - \mathbf{\Phi}_b^{-1/2} \mathbf{\Phi}_{ba} \mathbf{\Phi}_a^{-1} \mathbf{\Phi}_{ab} \mathbf{\Phi}_b^{-1/2} = \mathbf{I}_{L_b} - \mathbf{C} \mathbf{C}^H = \mathbf{I}_{L_b} - \mathbf{C}_b \qquad (2.5)$$

and

$$\mathbf{\Phi}_a^{-1/2} \left( \mathbf{\Phi}_c / \mathbf{\Phi}_b \right) \mathbf{\Phi}_a^{-1/2} = \mathbf{I}_{L_a} - \mathbf{\Phi}_a^{-1/2} \mathbf{\Phi}_{ab} \mathbf{\Phi}_b^{-1} \mathbf{\Phi}_{ba} \mathbf{\Phi}_a^{-1/2} = \mathbf{I}_{L_a} - \mathbf{C}^H \mathbf{C} = \mathbf{I}_{L_a} - \mathbf{C}_a, \qquad (2.6)$$

where $\mathbf{I}_{L_b}$ and $\mathbf{I}_{L_a}$ are the identity matrices of sizes $L_b \times L_b$ and $L_a \times L_a$, respectively, $\mathbf{C} = \mathbf{\Phi}_b^{-1/2} \mathbf{\Phi}_{ba} \mathbf{\Phi}_a^{-1/2}$, $\mathbf{C}_b = \mathbf{C}\mathbf{C}^H$, and $\mathbf{C}_a = \mathbf{C}^H \mathbf{C}$. From (2.5) and (2.6), it can be observed that the eigenvalues of $\mathbf{C}_b$ and $\mathbf{C}_a$ are always smaller than 1 and, of course, positive or null. Furthermore, the matrices $\mathbf{C}_b$ and $\mathbf{C}_a$ have the same nonzero eigenvalues [2]. Indeed, since

$$\det\left( \mathbf{I}_{L_a} - \mathbf{C}_a \right) = \det\left( \mathbf{I}_{L_b} - \mathbf{C}_b \right), \qquad (2.7)$$

$$\lambda^{L_b} \det\left( \lambda \mathbf{I}_{L_a} - \mathbf{C}_a \right) = \lambda^{L_a} \det\left( \lambda \mathbf{I}_{L_b} - \mathbf{C}_b \right), \qquad (2.8)$$

where $\det(\cdot)$ stands for determinant, it follows that $\mathbf{C}_a$ and $\mathbf{C}_b$ have the same nonzero characteristic roots. Obviously, $\mathbf{C}_a$ is nonsingular while $\mathbf{C}_b$ is singular.
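These properties are easy to confirm numerically. The following sketch (ours, with a generic synthetic covariance matrix) forms $\mathbf{C}$, $\mathbf{C}_a$, and $\mathbf{C}_b$ and checks that the eigenvalues lie in $[0, 1)$ and coincide on the nonzero part, as stated after (2.6) and in (2.7)-(2.8):

```python
import numpy as np

def inv_sqrt(M):
    # Inverse square root of a Hermitian positive-definite matrix.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T

rng = np.random.default_rng(1)
La, Lb = 3, 5
X = rng.standard_normal((La + Lb, 40)) + 1j * rng.standard_normal((La + Lb, 40))
Phi_c = X @ X.conj().T / 40.0            # a generic positive-definite covariance
Phi_a, Phi_ab = Phi_c[:La, :La], Phi_c[:La, La:]
Phi_ba, Phi_b = Phi_c[La:, :La], Phi_c[La:, La:]

C = inv_sqrt(Phi_b) @ Phi_ba @ inv_sqrt(Phi_a)   # C = Phi_b^{-1/2} Phi_ba Phi_a^{-1/2}
Ca, Cb = C.conj().T @ C, C @ C.conj().T          # C_a = C^H C, C_b = C C^H

ev_a = np.sort(np.linalg.eigvalsh(Ca))[::-1]
ev_b = np.sort(np.linalg.eigvalsh(Cb))[::-1]
print(np.allclose(ev_a, ev_b[:La]))              # shared nonzero eigenvalues, cf. (2.8)
print(0.0 <= ev_a.min() and ev_a.max() < 1.0)    # eigenvalues in [0, 1), cf. (2.5)-(2.6)
```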

Using the well-known eigenvalue decomposition [3], the Hermitian matrix $\mathbf{C}_a$ can be diagonalized as

$$\mathbf{U}_a^H \mathbf{C}_a \mathbf{U}_a = \mathbf{\Lambda}_a, \qquad (2.9)$$

where

$$\mathbf{U}_a = \begin{bmatrix} \mathbf{u}_{a,1} & \mathbf{u}_{a,2} & \cdots & \mathbf{u}_{a,L_a} \end{bmatrix} \qquad (2.10)$$

is a unitary matrix, i.e., $\mathbf{U}_a^H \mathbf{U}_a = \mathbf{U}_a \mathbf{U}_a^H = \mathbf{I}_{L_a}$, and

$$\mathbf{\Lambda}_a = \mathrm{diag}\left( \lambda_1, \lambda_2, \ldots, \lambda_{L_a} \right) \qquad (2.11)$$

is a diagonal matrix. The orthonormal vectors $\mathbf{u}_{a,1}, \mathbf{u}_{a,2}, \ldots, \mathbf{u}_{a,L_a}$ are the eigenvectors corresponding, respectively, to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_{L_a}$ of the matrix $\mathbf{C}_a$, where $1 > \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{L_a} > 0$. In the same way, the Hermitian matrix $\mathbf{C}_b$ can be diagonalized as

$$\mathbf{U}_b^H \mathbf{C}_b \mathbf{U}_b = \mathbf{\Lambda}_b, \qquad (2.12)$$

where

$$\mathbf{U}_b = \begin{bmatrix} \mathbf{u}_{b,1} & \mathbf{u}_{b,2} & \cdots & \mathbf{u}_{b,L_b} \end{bmatrix} \qquad (2.13)$$

is a unitary matrix and

$$\mathbf{\Lambda}_b = \mathrm{diag}\left( \lambda_1, \lambda_2, \ldots, \lambda_{L_a}, 0, 0, \ldots, 0 \right) = \mathrm{diag}\left( \mathbf{\Lambda}_a, \mathbf{0} \right) \qquad (2.14)$$

is a diagonal matrix. For $l = 1, 2, \ldots, L_a$, we have

$$\mathbf{C}_b \mathbf{u}_{b,l} = \mathbf{C} \mathbf{C}^H \mathbf{u}_{b,l} = \lambda_l \mathbf{u}_{b,l}. \qquad (2.15)$$

Left multiplying both sides of the previous equation by $\mathbf{C}^H / \sqrt{\lambda_l}$, we get

$$\mathbf{C}^H \mathbf{C} \frac{\mathbf{C}^H \mathbf{u}_{b,l}}{\sqrt{\lambda_l}} = \mathbf{C}_a \frac{\mathbf{C}^H \mathbf{u}_{b,l}}{\sqrt{\lambda_l}} = \lambda_l \frac{\mathbf{C}^H \mathbf{u}_{b,l}}{\sqrt{\lambda_l}}. \qquad (2.16)$$

We deduce that

$$\mathbf{u}_{a,l} = \frac{\mathbf{C}^H \mathbf{u}_{b,l}}{\sqrt{\lambda_l}}. \qquad (2.17)$$

Similarly, for $l = 1, 2, \ldots, L_a$, we have

$$\mathbf{C}_a \mathbf{u}_{a,l} = \mathbf{C}^H \mathbf{C} \mathbf{u}_{a,l} = \lambda_l \mathbf{u}_{a,l}. \qquad (2.18)$$

Left multiplying both sides of the previous expression by $\mathbf{C} / \sqrt{\lambda_l}$, we get

$$\mathbf{C} \mathbf{C}^H \frac{\mathbf{C} \mathbf{u}_{a,l}}{\sqrt{\lambda_l}} = \mathbf{C}_b \frac{\mathbf{C} \mathbf{u}_{a,l}}{\sqrt{\lambda_l}} = \lambda_l \frac{\mathbf{C} \mathbf{u}_{a,l}}{\sqrt{\lambda_l}}, \qquad (2.19)$$

from which we find that

$$\mathbf{u}_{b,l} = \frac{\mathbf{C} \mathbf{u}_{a,l}}{\sqrt{\lambda_l}}. \qquad (2.20)$$

Relations (2.17) and (2.20) show how the first $L_a$ eigenvectors of $\mathbf{C}_a = \mathbf{C}^H \mathbf{C}$ and $\mathbf{C}_b = \mathbf{C} \mathbf{C}^H$ are related. From the above, we also have

$$\mathbf{C} = \mathbf{U}_b' \mathbf{\Lambda}_a^{1/2} \mathbf{U}_a^H, \qquad (2.21)$$

where

$$\mathbf{U}_b' = \begin{bmatrix} \mathbf{u}_{b,1} & \mathbf{u}_{b,2} & \cdots & \mathbf{u}_{b,L_a} \end{bmatrix} = \mathbf{C} \mathbf{U}_a \mathbf{\Lambda}_a^{-1/2} \qquad (2.22)$$

is a semi-unitary matrix of size $L_b \times L_a$. In (2.21), we recognize the singular value decomposition (SVD) of $\mathbf{C}$. In fact, from a practical point of view, this is all we need for CCA.
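In code, this last remark amounts to a single SVD call. The following sketch (again with a synthetic covariance and variable names of our own choosing) verifies the factorization (2.21) and the semi-unitarity of $\mathbf{U}_b'$ from (2.22):

```python
import numpy as np

def inv_sqrt(M):
    # Inverse square root of a Hermitian positive-definite matrix.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T

rng = np.random.default_rng(2)
La, Lb = 3, 5
X = rng.standard_normal((La + Lb, 40)) + 1j * rng.standard_normal((La + Lb, 40))
Phi_c = X @ X.conj().T / 40.0
Phi_a, Phi_ba = Phi_c[:La, :La], Phi_c[La:, :La]
Phi_b = Phi_c[La:, La:]

C = inv_sqrt(Phi_b) @ Phi_ba @ inv_sqrt(Phi_a)

# Thin SVD: C = Ub' diag(s) Ua^H with s_l = sqrt(lambda_l), cf. (2.21)-(2.22).
Ub_p, s, UaH = np.linalg.svd(C, full_matrices=False)
lam = s**2                                            # eigenvalues of C_a and C_b

print(np.allclose(Ub_p.conj().T @ Ub_p, np.eye(La)))  # Ub' is semi-unitary
print(np.allclose(C, Ub_p @ np.diag(s) @ UaH))        # reconstruction of C
```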

2.2 How CCA Works

Let $\mathbf{g}$ and $\mathbf{h}$ be two complex-valued filters of lengths $L_a$ and $L_b$, respectively. Applying these filters to the two random vectors $\mathbf{a}$ and $\mathbf{b}$, we obtain the two random signals:

$$Z_a = \mathbf{g}^H \mathbf{a}, \qquad (2.23)$$

$$Z_b = \mathbf{h}^H \mathbf{b}. \qquad (2.24)$$

The Pearson correlation coefficient (PCC) between $Z_a$ and $Z_b$ is then [4]

$$\rho\left( \mathbf{g}, \mathbf{h} \right) = \frac{E\left( Z_a Z_b^* \right)}{\sqrt{E\left( |Z_a|^2 \right) E\left( |Z_b|^2 \right)}} = \frac{\mathbf{g}^H \mathbf{\Phi}_{ab} \mathbf{h}}{\sqrt{\mathbf{g}^H \mathbf{\Phi}_a \mathbf{g}} \sqrt{\mathbf{h}^H \mathbf{\Phi}_b \mathbf{h}}}, \qquad (2.25)$$

where the superscript $^*$ is the complex conjugate. The objective of CCA [1, 5, 6] is to find the two filters $\mathbf{g}$ and $\mathbf{h}$ in such a way that $\rho\left( \mathbf{g}, \mathbf{h} \right) + \rho^*\left( \mathbf{g}, \mathbf{h} \right)$, i.e., the real part of the PCC, is maximized. This is equivalent to maximizing $\mathbf{g}^H \mathbf{\Phi}_{ab} \mathbf{h} + \mathbf{h}^H \mathbf{\Phi}_{ba} \mathbf{g}$ subject to the constraints $\mathbf{g}^H \mathbf{\Phi}_a \mathbf{g} = \mathbf{h}^H \mathbf{\Phi}_b \mathbf{h} = 1$, i.e.,

$$\max_{\mathbf{g}, \mathbf{h}} \; \mathbf{g}^H \mathbf{\Phi}_{ab} \mathbf{h} + \mathbf{h}^H \mathbf{\Phi}_{ba} \mathbf{g} \quad \text{s.t.} \quad \begin{cases} \mathbf{g}^H \mathbf{\Phi}_a \mathbf{g} = 1 \\ \mathbf{h}^H \mathbf{\Phi}_b \mathbf{h} = 1 \end{cases}. \qquad (2.26)$$

The Lagrange function associated with the previous criterion is

$$\mathcal{L}\left( \mathbf{g}, \mathbf{h} \right) = \mathbf{g}^H \mathbf{\Phi}_{ab} \mathbf{h} + \mathbf{h}^H \mathbf{\Phi}_{ba} \mathbf{g} + \mu_g \left( \mathbf{g}^H \mathbf{\Phi}_a \mathbf{g} - 1 \right) + \mu_h \left( \mathbf{h}^H \mathbf{\Phi}_b \mathbf{h} - 1 \right), \qquad (2.27)$$

where $\mu_g \neq 0$ and $\mu_h \neq 0$ are two real-valued Lagrange multipliers. Taking the gradient of $\mathcal{L}\left( \mathbf{g}, \mathbf{h} \right)$ with respect to $\mathbf{g}$ and $\mathbf{h}$ and equating the results to zero, we get the two equations:

$$\mathbf{\Phi}_{ab} \mathbf{h} + \mu_g \mathbf{\Phi}_a \mathbf{g} = \mathbf{0}, \qquad (2.28)$$

$$\mathbf{\Phi}_{ba} \mathbf{g} + \mu_h \mathbf{\Phi}_b \mathbf{h} = \mathbf{0}, \qquad (2.29)$$

which can be rewritten as

$$\mathbf{g} = -\frac{1}{\mu_g} \mathbf{\Phi}_a^{-1} \mathbf{\Phi}_{ab} \mathbf{h}, \qquad (2.30)$$

$$\mathbf{h} = -\frac{1}{\mu_h} \mathbf{\Phi}_b^{-1} \mathbf{\Phi}_{ba} \mathbf{g}. \qquad (2.31)$$

Left multiplying both sides of (2.28) and (2.29) by $\mathbf{g}^H$ and $\mathbf{h}^H$, respectively, and using the constraints, we easily find that

$$\mu_g = -\frac{\mathbf{g}^H \mathbf{\Phi}_{ab} \mathbf{h}}{\mathbf{g}^H \mathbf{\Phi}_a \mathbf{g}} = -\mathbf{g}^H \mathbf{\Phi}_{ab} \mathbf{h}, \qquad (2.32)$$

$$\mu_h = -\frac{\mathbf{h}^H \mathbf{\Phi}_{ba} \mathbf{g}}{\mathbf{h}^H \mathbf{\Phi}_b \mathbf{h}} = -\mathbf{h}^H \mathbf{\Phi}_{ba} \mathbf{g}. \qquad (2.33)$$

As a result,

$$\left| \rho\left( \mathbf{g}, \mathbf{h} \right) \right|^2 = \mu_g \mu_h \qquad (2.34)$$

and $\rho\left( \mathbf{g}, \mathbf{h} \right)$ must be a real-valued number.

From all previous equations, we deduce that

$$\left[ \mathbf{\Phi}_a^{-1/2} \mathbf{\Phi}_{ab} \mathbf{\Phi}_b^{-1} \mathbf{\Phi}_{ba} \mathbf{\Phi}_a^{-1/2} - \left| \rho\left( \mathbf{g}, \mathbf{h} \right) \right|^2 \mathbf{I}_{L_a} \right] \mathbf{\Phi}_a^{1/2} \mathbf{g} = \mathbf{0}, \qquad (2.35)$$

$$\left[ \mathbf{\Phi}_b^{-1/2} \mathbf{\Phi}_{ba} \mathbf{\Phi}_a^{-1} \mathbf{\Phi}_{ab} \mathbf{\Phi}_b^{-1/2} - \left| \rho\left( \mathbf{g}, \mathbf{h} \right) \right|^2 \mathbf{I}_{L_b} \right] \mathbf{\Phi}_b^{1/2} \mathbf{h} = \mathbf{0}, \qquad (2.36)$$

or, using the notation from the previous section,

$$\left[ \mathbf{C}_a - \left| \rho\left( \mathbf{g}, \mathbf{h} \right) \right|^2 \mathbf{I}_{L_a} \right] \mathbf{\Phi}_a^{1/2} \mathbf{g} = \mathbf{0}, \qquad (2.37)$$

$$\left[ \mathbf{C}_b - \left| \rho\left( \mathbf{g}, \mathbf{h} \right) \right|^2 \mathbf{I}_{L_b} \right] \mathbf{\Phi}_b^{1/2} \mathbf{h} = \mathbf{0}, \qquad (2.38)$$

where we recognize the eigenvalue problem studied in the previous section. From the results of Sect. 2.1, it is clear that the solution to our optimization problem in (2.26) is

$$\mathbf{g}_1 = \mathbf{\Phi}_a^{-1/2} \mathbf{u}_{a,1}, \qquad (2.39)$$

$$\mathbf{h}_1 = \mathbf{\Phi}_b^{-1/2} \mathbf{u}_{b,1}, \qquad (2.40)$$

which are the first canonical filters. They lead to the first canonical correlation:

$$\rho\left( \mathbf{g}_1, \mathbf{h}_1 \right) = \sqrt{\lambda_1} \qquad (2.41)$$

and to the first canonical variates:

$$Z_{a,1} = \mathbf{g}_1^H \mathbf{a} = \mathbf{u}_{a,1}^H \mathbf{\Phi}_a^{-1/2} \mathbf{a}, \qquad (2.42)$$

$$Z_{b,1} = \mathbf{h}_1^H \mathbf{b} = \mathbf{u}_{b,1}^H \mathbf{\Phi}_b^{-1/2} \mathbf{b}. \qquad (2.43)$$
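The following sketch (ours; synthetic covariances, illustrative names) builds $\mathbf{g}_1$ and $\mathbf{h}_1$ from (2.39)-(2.40) via the SVD and checks that the PCC (2.25) equals $\sqrt{\lambda_1}$ as in (2.41), and that the stationarity condition (2.28) holds with $\mu_g = -\rho$:

```python
import numpy as np

def inv_sqrt(M):
    # Inverse square root of a Hermitian positive-definite matrix.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T

rng = np.random.default_rng(3)
La, Lb = 3, 5
X = rng.standard_normal((La + Lb, 40)) + 1j * rng.standard_normal((La + Lb, 40))
Phi_c = X @ X.conj().T / 40.0
Phi_a, Phi_ab = Phi_c[:La, :La], Phi_c[:La, La:]
Phi_ba, Phi_b = Phi_c[La:, :La], Phi_c[La:, La:]

Pa, Pb = inv_sqrt(Phi_a), inv_sqrt(Phi_b)
Ub_p, s, UaH = np.linalg.svd(Pb @ Phi_ba @ Pa, full_matrices=False)

g1 = Pa @ UaH[0].conj()        # g_1 = Phi_a^{-1/2} u_{a,1}, cf. (2.39)
h1 = Pb @ Ub_p[:, 0]           # h_1 = Phi_b^{-1/2} u_{b,1}, cf. (2.40)

# First canonical correlation rho(g_1, h_1) = sqrt(lambda_1) = s[0], cf. (2.41).
rho = (g1.conj() @ Phi_ab @ h1) / np.sqrt((g1.conj() @ Phi_a @ g1).real
                                          * (h1.conj() @ Phi_b @ h1).real)
print(np.isclose(rho, s[0]))

# Stationarity (2.28) with mu_g = -rho: Phi_ab h_1 = rho Phi_a g_1.
print(np.allclose(Phi_ab @ h1, rho * (Phi_a @ g1)))
```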

The second canonical filters are obtained from the criterion:

$$\max_{\mathbf{g}, \mathbf{h}} \; \mathbf{g}^H \mathbf{\Phi}_{ab} \mathbf{h} + \mathbf{h}^H \mathbf{\Phi}_{ba} \mathbf{g} \quad \text{s.t.} \quad \begin{cases} \mathbf{g}^H \mathbf{\Phi}_a \mathbf{g} = 1 \\ \mathbf{h}^H \mathbf{\Phi}_b \mathbf{h} = 1 \\ \mathbf{g}^H \mathbf{\Phi}_a \mathbf{g}_1 = 0 \\ \mathbf{h}^H \mathbf{\Phi}_b \mathbf{h}_1 = 0 \end{cases}. \qquad (2.44)$$

It is not hard to see that the optimization of (2.44) gives the second canonical filters:

$$\mathbf{g}_2 = \mathbf{\Phi}_a^{-1/2} \mathbf{u}_{a,2}, \qquad (2.45)$$

$$\mathbf{h}_2 = \mathbf{\Phi}_b^{-1/2} \mathbf{u}_{b,2}, \qquad (2.46)$$

the second canonical correlation:

$$\rho\left( \mathbf{g}_2, \mathbf{h}_2 \right) = \sqrt{\lambda_2}, \qquad (2.47)$$

and the second canonical variates:

$$Z_{a,2} = \mathbf{g}_2^H \mathbf{a} = \mathbf{u}_{a,2}^H \mathbf{\Phi}_a^{-1/2} \mathbf{a}, \qquad (2.48)$$

$$Z_{b,2} = \mathbf{h}_2^H \mathbf{b} = \mathbf{u}_{b,2}^H \mathbf{\Phi}_b^{-1/2} \mathbf{b}. \qquad (2.49)$$

Obviously, following the same procedure, we can derive all the $L_a$ canonical modes ($l = 1, 2, \ldots, L_a$). The $l$th canonical filters are:

$$\mathbf{g}_l = \mathbf{\Phi}_a^{-1/2} \mathbf{u}_{a,l}, \qquad (2.50)$$

$$\mathbf{h}_l = \mathbf{\Phi}_b^{-1/2} \mathbf{u}_{b,l}. \qquad (2.51)$$

It is important to notice that instead of $\mathbf{g}_l$ and $\mathbf{h}_l$, we can use the filters $\varsigma_{g_l} \mathbf{g}_l$ and $\varsigma_{h_l} \mathbf{h}_l$, where $\varsigma_{g_l} \neq 0$ and $\varsigma_{h_l} \neq 0$ are arbitrary real numbers, since the canonical correlations will not be affected. The $l$th canonical correlation is:

$$\rho\left( \mathbf{g}_l, \mathbf{h}_l \right) = \sqrt{\lambda_l}. \qquad (2.52)$$

The $l$th canonical variates are:

$$Z_{a,l} = \mathbf{g}_l^H \mathbf{a} = \mathbf{u}_{a,l}^H \mathbf{\Phi}_a^{-1/2} \mathbf{a}, \qquad (2.53)$$

$$Z_{b,l} = \mathbf{h}_l^H \mathbf{b} = \mathbf{u}_{b,l}^H \mathbf{\Phi}_b^{-1/2} \mathbf{b}. \qquad (2.54)$$

Considering all the modes, the canonical filters can be combined in matrix form as

$$\mathbf{G} = \begin{bmatrix} \mathbf{g}_1 & \mathbf{g}_2 & \cdots & \mathbf{g}_{L_a} \end{bmatrix} = \mathbf{\Phi}_a^{-1/2} \mathbf{U}_a, \qquad (2.55)$$

$$\mathbf{H} = \begin{bmatrix} \mathbf{h}_1 & \mathbf{h}_2 & \cdots & \mathbf{h}_{L_a} \end{bmatrix} = \mathbf{\Phi}_b^{-1/2} \mathbf{U}_b'. \qquad (2.56)$$

Then, it is easy to check that $\mathbf{G}^H \mathbf{\Phi}_a \mathbf{G} = \mathbf{H}^H \mathbf{\Phi}_b \mathbf{H} = \mathbf{I}_{L_a}$ and $\mathbf{G}^H \mathbf{\Phi}_{ab} \mathbf{H} = \mathbf{H}^H \mathbf{\Phi}_{ba} \mathbf{G} = \mathbf{\Lambda}_a^{1/2}$. It can also be verified that $\mathbf{H} \mathbf{\Lambda}_a^{1/2} \mathbf{G}^H = \mathbf{\Phi}_b^{-1} \mathbf{\Phi}_{ba} \mathbf{\Phi}_a^{-1}$.
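These matrix identities are straightforward to confirm numerically; a sketch with our synthetic covariances:

```python
import numpy as np

def inv_sqrt(M):
    # Inverse square root of a Hermitian positive-definite matrix.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T

rng = np.random.default_rng(4)
La, Lb = 3, 5
X = rng.standard_normal((La + Lb, 40)) + 1j * rng.standard_normal((La + Lb, 40))
Phi_c = X @ X.conj().T / 40.0
Phi_a, Phi_ab = Phi_c[:La, :La], Phi_c[:La, La:]
Phi_ba, Phi_b = Phi_c[La:, :La], Phi_c[La:, La:]

Pa, Pb = inv_sqrt(Phi_a), inv_sqrt(Phi_b)
Ub_p, s, UaH = np.linalg.svd(Pb @ Phi_ba @ Pa, full_matrices=False)

G = Pa @ UaH.conj().T          # G = Phi_a^{-1/2} U_a, cf. (2.55)
H = Pb @ Ub_p                  # H = Phi_b^{-1/2} Ub', cf. (2.56)

print(np.allclose(G.conj().T @ Phi_a @ G, np.eye(La)))   # G^H Phi_a G = I
print(np.allclose(H.conj().T @ Phi_b @ H, np.eye(La)))   # H^H Phi_b H = I
print(np.allclose(G.conj().T @ Phi_ab @ H, np.diag(s)))  # G^H Phi_ab H = Lambda^{1/2}
print(np.allclose(H @ np.diag(s) @ G.conj().T,           # H Lambda^{1/2} G^H
                  np.linalg.inv(Phi_b) @ Phi_ba @ np.linalg.inv(Phi_a)))
```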

In the same way, we can combine all the modes of the canonical variates:

$$\mathbf{z}_a = \begin{bmatrix} Z_{a,1} & Z_{a,2} & \cdots & Z_{a,L_a} \end{bmatrix}^T = \mathbf{G}^H \mathbf{a}, \qquad (2.57)$$

$$\mathbf{z}_b = \begin{bmatrix} Z_{b,1} & Z_{b,2} & \cdots & Z_{b,L_a} \end{bmatrix}^T = \mathbf{H}^H \mathbf{b}. \qquad (2.58)$$

Then, it can be verified that $E\left( \mathbf{z}_a \mathbf{z}_a^H \right) = E\left( \mathbf{z}_b \mathbf{z}_b^H \right) = \mathbf{I}_{L_a}$ and $E\left( \mathbf{z}_a \mathbf{z}_b^H \right) = E\left( \mathbf{z}_b \mathbf{z}_a^H \right) = \mathbf{\Lambda}_a^{1/2}$.

Let us now briefly discuss two particular cases: $L_a = L_b = 1$ and $L_a = 1$, $L_b > 1$. When $L_a = L_b = 1$, the two random vectors $\mathbf{a}$ and $\mathbf{b}$ become the two random variables $A$ and $B$. In this case, CCA simplifies to the classical PCC between $Z_A = A$ and $Z_B = B$, i.e.,

$$\rho_{AB} = \frac{E\left( A B^* \right)}{\sqrt{E\left( |A|^2 \right) E\left( |B|^2 \right)}}. \qquad (2.59)$$

In the second case ($L_a = 1$, $L_b > 1$), we only have one canonical mode. We find that the canonical filters are

$$G = 1 \qquad (2.60)$$

and

$$\mathbf{h} = \mathbf{\Phi}_b^{-1/2} \mathbf{u}_b, \qquad (2.61)$$

where

$$\mathbf{u}_b = \mathbf{\Phi}_b^{-1/2} \boldsymbol{\phi}_{bA} \qquad (2.62)$$

is the unnormalized eigenvector corresponding to the only nonzero eigenvalue,

$$\lambda = \frac{\boldsymbol{\phi}_{bA}^H \mathbf{\Phi}_b^{-1} \boldsymbol{\phi}_{bA}}{\phi_A}, \qquad (2.63)$$

of the rank-1 matrix $\mathbf{C} \mathbf{C}^H = \mathbf{\Phi}_b^{-1/2} \boldsymbol{\phi}_{bA} \boldsymbol{\phi}_{bA}^H \mathbf{\Phi}_b^{-1/2} / \phi_A$, where $\boldsymbol{\phi}_{bA} = E\left( A^* \mathbf{b} \right)$ and $\phi_A = E\left( |A|^2 \right)$. As a result, the canonical correlation is

$$\rho\left( G, \mathbf{h} \right) = \frac{\boldsymbol{\phi}_{bA}^H \mathbf{h}}{\sqrt{\phi_A \, \mathbf{h}^H \mathbf{\Phi}_b \mathbf{h}}} = \sqrt{\lambda} \qquad (2.64)$$

and the canonical variates are

$$Z_A = A, \qquad (2.65)$$

$$Z_b = \boldsymbol{\phi}_{bA}^H \mathbf{\Phi}_b^{-1} \mathbf{b}. \qquad (2.66)$$
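In this single-mode case the analysis collapses to a Wiener-like closed form. A brief sketch (synthetic data and names of our own choosing) confirms (2.63)-(2.64):

```python
import numpy as np

rng = np.random.default_rng(5)
Lb = 5
X = rng.standard_normal((1 + Lb, 40)) + 1j * rng.standard_normal((1 + Lb, 40))
Phi_c = X @ X.conj().T / 40.0
phi_A = Phi_c[0, 0].real       # E(|A|^2)
phi_bA = Phi_c[1:, 0]          # E(A^* b)
Phi_b = Phi_c[1:, 1:]

h = np.linalg.solve(Phi_b, phi_bA)       # h = Phi_b^{-1} phi_bA, cf. (2.61)-(2.62)
lam = (phi_bA.conj() @ h).real / phi_A   # only nonzero eigenvalue, cf. (2.63)

# Canonical correlation (2.64): rho(G, h) = sqrt(lambda), with G = 1.
rho = (phi_bA.conj() @ h).real / np.sqrt(phi_A * (h.conj() @ Phi_b @ h).real)
print(np.isclose(rho, np.sqrt(lam)))
```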

2.3 The Singular Case

Now, assume that the covariance matrix $\mathbf{\Phi}_a$ is singular but the covariance matrix $\mathbf{\Phi}_b$ is still nonsingular. (The same approach discussed in this section can be applied when $\mathbf{\Phi}_b$ is singular or when both $\mathbf{\Phi}_a$ and $\mathbf{\Phi}_b$ are singular.) In this case, it is clear that CCA as explained in the previous section cannot work, since the existence of the inverse of $\mathbf{\Phi}_a$ is required. One way to circumvent this problem is derived next.

Let us assume that $\mathrm{rank}\left( \mathbf{\Phi}_a \right) = \mathrm{rank}\left( \mathbf{\Phi}_{ab} \right) = P < L_a$. We can diagonalize $\mathbf{\Phi}_a$ as

$$\mathbf{Q}^H \mathbf{\Phi}_a \mathbf{Q} = \mathbf{\Lambda}, \qquad (2.67)$$

where

$$\mathbf{Q} = \begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_{L_a} \end{bmatrix} \qquad (2.68)$$

is a unitary matrix and

$$\mathbf{\Lambda} = \mathrm{diag}\left( \lambda_1, \lambda_2, \ldots, \lambda_{L_a} \right) \qquad (2.69)$$

is a diagonal matrix. The orthonormal vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_{L_a}$ are the eigenvectors corresponding, respectively, to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_{L_a}$ of the matrix $\mathbf{\Phi}_a$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_P > 0$ and $\lambda_{P+1} = \lambda_{P+2} = \cdots = \lambda_{L_a} = 0$. Let

$$\mathbf{Q} = \begin{bmatrix} \mathbf{T} & \mathbf{\Xi} \end{bmatrix}, \qquad (2.70)$$

where the $L_a \times P$ matrix $\mathbf{T}$ contains the eigenvectors corresponding to the nonzero eigenvalues of $\mathbf{\Phi}_a$ and the $L_a \times (L_a - P)$ matrix $\mathbf{\Xi}$ contains the eigenvectors corresponding to the null eigenvalues of $\mathbf{\Phi}_a$. It can be verified that

$$\mathbf{I}_{L_a} = \mathbf{T} \mathbf{T}^H + \mathbf{\Xi} \mathbf{\Xi}^H. \qquad (2.71)$$

Notice that $\mathbf{T} \mathbf{T}^H$ and $\mathbf{\Xi} \mathbf{\Xi}^H$ are two orthogonal projection matrices of rank $P$ and $L_a - P$, respectively. Hence, $\mathbf{T} \mathbf{T}^H$ is the orthogonal projector onto the signal subspace (where all the energy of the signal is concentrated), or the range of $\mathbf{\Phi}_a$, and $\mathbf{\Xi} \mathbf{\Xi}^H$ is the orthogonal projector onto the null subspace of $\mathbf{\Phi}_a$. Using (2.71), we can write the random vector $\mathbf{a}$ of length $L_a$ as

$$\mathbf{a} = \mathbf{Q} \mathbf{Q}^H \mathbf{a} = \mathbf{T} \mathbf{T}^H \mathbf{a} = \mathbf{T} \tilde{\mathbf{a}}, \qquad (2.72)$$

where the second equality holds because $E\left( \mathbf{\Xi}^H \mathbf{a} \mathbf{a}^H \mathbf{\Xi} \right) = \mathbf{\Xi}^H \mathbf{\Phi}_a \mathbf{\Xi} = \mathbf{0}$, so that $\mathbf{\Xi}^H \mathbf{a} = \mathbf{0}$ almost surely, and

$$\tilde{\mathbf{a}} = \mathbf{T}^H \mathbf{a} \qquad (2.73)$$

is the transformed random signal vector of length $P$. Therefore, instead of working with the pair of random vectors $\mathbf{a}$ and $\mathbf{b}$ as we did in Sect. 2.2, we propose to handle the pair of random vectors $\tilde{\mathbf{a}}$ and $\mathbf{b}$, since now the covariance matrix of $\tilde{\mathbf{a}}$, denoted $\mathbf{\Phi}_{\tilde{a}} = \mathbf{T}^H \mathbf{\Phi}_a \mathbf{T}$, has full rank. As a result, CCA with $P$ different canonical modes can be performed by simply replacing, in the previous derivations, $\mathbf{\Phi}_a$ by $\mathbf{\Phi}_{\tilde{a}}$ and $\mathbf{\Phi}_{ab}$ by $\mathbf{\Phi}_{\tilde{a}b} = \mathbf{T}^H \mathbf{\Phi}_{ab}$.
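A sketch of this subspace workaround (synthetic rank-deficient data; all names and dimensions are illustrative assumptions of ours): project $\mathbf{a}$ onto the $P$ eigenvectors associated with nonzero eigenvalues and run CCA on the reduced pair.

```python
import numpy as np

rng = np.random.default_rng(6)
La, Lb, P, N = 4, 5, 2, 40

# a is confined to a P-dimensional subspace, so Phi_a is singular.
S = rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))
a = (rng.standard_normal((La, P)) + 1j * rng.standard_normal((La, P))) @ S
b = rng.standard_normal((Lb, N)) + 1j * rng.standard_normal((Lb, N)) \
    + np.vstack([S, S, S[:1]])          # b correlated with the same subspace

Phi_a = a @ a.conj().T / N
Phi_ab = a @ b.conj().T / N

w, Q = np.linalg.eigh(Phi_a)            # eigenvalues in ascending order
T = Q[:, ::-1][:, :P]                   # eigenvectors of the P nonzero eigenvalues, cf. (2.70)

a_tilde = T.conj().T @ a                # a~ = T^H a, cf. (2.73)
Phi_at = T.conj().T @ Phi_a @ T         # full-rank P x P covariance of a~
Phi_atb = T.conj().T @ Phi_ab           # T^H Phi_ab, used in place of Phi_ab

print(np.linalg.matrix_rank(Phi_a), np.linalg.matrix_rank(Phi_at))   # -> P, P
print(np.allclose(a_tilde @ a_tilde.conj().T / N, Phi_at))           # consistency check
```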

References

1. H. Hotelling, Relations between two sets of variates. Biometrika 28, 321-377 (1936)
2. D.V. Ouellette, Schur complements and statistics. Linear Algebra Appl. 36, 187-295 (1981)
3. G.H. Golub, C.F. Van Loan, Matrix Computations, 3rd edn. (The Johns Hopkins University Press, Baltimore, Maryland, 1996)
4. J. Benesty, J. Chen, Y. Huang, On the importance of the Pearson correlation coefficient in noise reduction. IEEE Trans. Audio Speech Lang. Process. 16, 757-765 (2008)
5. J.R. Kettenring, Canonical analysis of several sets of variables. Biometrika 58, 433-451 (1971)
6. T.W. Anderson, Asymptotic theory for canonical correlation analysis. J. Multivar. Anal. 79, 1-29 (1999)
