Chapter 2
Canonical Correlation Analysis

Canonical correlation analysis (CCA), which is a multivariate analysis method, tries to quantify the amount of linear relationship between two sets of random variables, leading to different modes of maximum correlation [1]. In this chapter, we explain how CCA works from a signal processing perspective.

2.1 Preliminaries

Let a and b be two zero-mean complex-valued circular random vectors of lengths L_a and L_b, respectively, where, without loss of generality, it is assumed that L_a ≤ L_b. Let us define the longer random vector:

c = [ a^T  b^T ]^T,  (2.1)

where the superscript ^T denotes the transpose of a vector or a matrix. The covariance matrix of c is then

Σ_c = E(c c^H)
    = [ Σ_a     Σ_{ab}
        Σ_{ba}  Σ_b   ],  (2.2)

where E(·) denotes mathematical expectation, the superscript ^H is the conjugate-transpose operator, Σ_a = E(a a^H) is the covariance matrix of a, Σ_b = E(b b^H) is the covariance matrix of b, Σ_{ab} = E(a b^H) is the cross-covariance matrix between a and b, and Σ_{ba} = Σ_{ab}^H. It is assumed that rank(Σ_{ab}) = L_a and, unless stated otherwise, Σ_a and Σ_b are assumed to have full rank.

© The Authors 2018
J. Benesty and I. Cohen, Canonical Correlation Analysis in Speech Enhancement, SpringerBriefs in Electrical and Computer Engineering, DOI 10.1007/978-3-319-67020-1_2
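As a small numerical illustration (our own Python/NumPy sketch, not part of the book; the dimensions, sample size, and mixing matrix are arbitrary choices), the covariance matrix in (2.2) and its four blocks can be estimated from samples of the stacked vector c:

```python
import numpy as np

rng = np.random.default_rng(0)
La, Lb, N = 2, 4, 100_000  # arbitrary lengths and sample size

# Draw N samples of a zero-mean circular complex vector c = [a^T b^T]^T;
# an arbitrary real mixing matrix correlates the entries (and thus a and b).
w = rng.standard_normal((La + Lb, N)) + 1j * rng.standard_normal((La + Lb, N))
c = rng.standard_normal((La + Lb, La + Lb)) @ w

Sigma_c = (c @ c.conj().T) / N    # sample estimate of E(c c^H)
Sigma_a = Sigma_c[:La, :La]       # E(a a^H)
Sigma_b = Sigma_c[La:, La:]       # E(b b^H)
Sigma_ab = Sigma_c[:La, La:]      # E(a b^H)
Sigma_ba = Sigma_c[La:, :La]      # equals Sigma_ab^H
assert np.allclose(Sigma_ba, Sigma_ab.conj().T)
```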
The Schur complement of Σ_a in Σ_c is defined as [2]

Σ_{b/a} = Σ_b − Σ_{ba} Σ_a^{-1} Σ_{ab}  (2.3)

and the Schur complement of Σ_b in Σ_c is

Σ_{a/b} = Σ_a − Σ_{ab} Σ_b^{-1} Σ_{ba}.  (2.4)

Clearly, both matrices Σ_{b/a} and Σ_{a/b} are nonsingular. An equivalent and more interesting way to express the two previous expressions is

Σ_b^{-1/2} Σ_{b/a} Σ_b^{-1/2} = I_{L_b} − Σ_b^{-1/2} Σ_{ba} Σ_a^{-1} Σ_{ab} Σ_b^{-1/2}
                              = I_{L_b} − Σ_b^{-1/2} Σ_{ba} Σ_a^{-1/2} Σ_a^{-1/2} Σ_{ab} Σ_b^{-1/2}
                              = I_{L_b} − Γ Γ^H
                              = I_{L_b} − Φ_b  (2.5)

and

Σ_a^{-1/2} Σ_{a/b} Σ_a^{-1/2} = I_{L_a} − Σ_a^{-1/2} Σ_{ab} Σ_b^{-1} Σ_{ba} Σ_a^{-1/2}
                              = I_{L_a} − Σ_a^{-1/2} Σ_{ab} Σ_b^{-1/2} Σ_b^{-1/2} Σ_{ba} Σ_a^{-1/2}
                              = I_{L_a} − Γ^H Γ
                              = I_{L_a} − Φ_a,  (2.6)

where I_{L_b} and I_{L_a} are the identity matrices of sizes L_b × L_b and L_a × L_a, respectively, Γ = Σ_b^{-1/2} Σ_{ba} Σ_a^{-1/2}, Φ_b = Γ Γ^H, and Φ_a = Γ^H Γ. From (2.5) and (2.6), it can be observed that the eigenvalues of Φ_b and Φ_a are always smaller than 1 and, of course, positive or null. Furthermore, the matrices Φ_b and Φ_a have the same nonzero eigenvalues [2]. Indeed, it follows that

det( I_{L_a} − Φ_a ) = det( I_{L_b} − Φ_b ),  (2.7)

λ^{L_b} det( λ I_{L_a} − Φ_a ) = λ^{L_a} det( λ I_{L_b} − Φ_b ),  (2.8)

where det(·) stands for determinant, which shows that Φ_a and Φ_b have the same nonzero characteristic roots. Obviously, Φ_a is nonsingular while Φ_b is singular, since rank(Σ_{ab}) = L_a < L_b. Using the well-known eigenvalue decomposition [3], the Hermitian matrix Φ_a can be diagonalized as

U_a^H Φ_a U_a = Λ_a,  (2.9)
where

U_a = [ u_{a,1}  u_{a,2}  ···  u_{a,L_a} ]  (2.10)

is a unitary matrix, i.e., U_a^H U_a = U_a U_a^H = I_{L_a}, and

Λ_a = diag( λ_1, λ_2, ..., λ_{L_a} )  (2.11)

is a diagonal matrix. The orthonormal vectors u_{a,1}, u_{a,2}, ..., u_{a,L_a} are the eigenvectors corresponding, respectively, to the eigenvalues λ_1, λ_2, ..., λ_{L_a} of the matrix Φ_a, where 1 > λ_1 ≥ λ_2 ≥ ··· ≥ λ_{L_a} > 0. In the same way, the Hermitian matrix Φ_b can be diagonalized as

U_b^H Φ_b U_b = Λ_b,  (2.12)

where

U_b = [ u_{b,1}  u_{b,2}  ···  u_{b,L_b} ]  (2.13)

is a unitary matrix and

Λ_b = diag( λ_1, λ_2, ..., λ_{L_a}, 0, 0, ..., 0 ) = diag( Λ_a, 0 )  (2.14)

is a diagonal matrix. For l = 1, 2, ..., L_a, we have

Φ_b u_{b,l} = Γ Γ^H u_{b,l} = λ_l u_{b,l}.  (2.15)

Left multiplying both sides of the previous equation by Γ^H / √λ_l, we get

Γ^H Γ ( Γ^H u_{b,l} / √λ_l ) = Φ_a ( Γ^H u_{b,l} / √λ_l ) = λ_l ( Γ^H u_{b,l} / √λ_l ).  (2.16)

We deduce that

u_{a,l} = Γ^H u_{b,l} / √λ_l.  (2.17)

Similarly, for l = 1, 2, ..., L_a, we have

Φ_a u_{a,l} = Γ^H Γ u_{a,l} = λ_l u_{a,l}.  (2.18)
Left multiplying both sides of the previous expression by Γ / √λ_l, we get

Γ Γ^H ( Γ u_{a,l} / √λ_l ) = Φ_b ( Γ u_{a,l} / √λ_l ) = λ_l ( Γ u_{a,l} / √λ_l ),  (2.19)

from which we find that

u_{b,l} = Γ u_{a,l} / √λ_l.  (2.20)

Relations (2.17) and (2.20) show how the first L_a eigenvectors of Φ_a = Γ^H Γ and Φ_b = Γ Γ^H are related. From the above, we also have

Γ = U_b' Λ_a^{1/2} U_a^H,  (2.21)

where

U_b' = [ u_{b,1}  u_{b,2}  ···  u_{b,L_a} ] = Γ U_a Λ_a^{-1/2}  (2.22)

is a semi-unitary matrix of size L_b × L_a. In (2.21), we recognize the singular value decomposition (SVD) of Γ. In fact, from a practical point of view, this is all we need for CCA.

2.2 How CCA Works

Let g and h be two complex-valued filters of lengths L_a and L_b, respectively. Applying these filters to the two random vectors a and b, we obtain the two random signals:

Z_a = g^H a,  (2.23)

Z_b = h^H b.  (2.24)

The Pearson correlation coefficient (PCC) between Z_a and Z_b is then [4]

ρ(g, h) = E( Z_a Z_b^* ) / √( E(|Z_a|²) E(|Z_b|²) )
        = g^H Σ_{ab} h / ( √(g^H Σ_a g) √(h^H Σ_b h) ),  (2.25)
where the superscript ^* denotes the complex conjugate. The objective of CCA [1, 5, 6] is to find the two filters g and h in such a way that ρ(g, h) + ρ^*(g, h), i.e., the real part of the PCC, is maximized. This is equivalent to maximizing g^H Σ_{ab} h + h^H Σ_{ba} g subject to the constraints g^H Σ_a g = h^H Σ_b h = 1, i.e.,

max_{g,h}  g^H Σ_{ab} h + h^H Σ_{ba} g   s.t.   g^H Σ_a g = 1,  h^H Σ_b h = 1.  (2.26)

The Lagrange function associated with the previous criterion is

L(g, h) = g^H Σ_{ab} h + h^H Σ_{ba} g + μ_g ( g^H Σ_a g − 1 ) + μ_h ( h^H Σ_b h − 1 ),  (2.27)

where μ_g ≠ 0 and μ_h ≠ 0 are two real-valued Lagrange multipliers. Taking the gradient of L(g, h) with respect to g and h and equating the results to zero, we get the two equations:

Σ_{ab} h + μ_g Σ_a g = 0,  (2.28)

Σ_{ba} g + μ_h Σ_b h = 0,  (2.29)

which can be rewritten as

g = −(1/μ_g) Σ_a^{-1} Σ_{ab} h,  (2.30)

h = −(1/μ_h) Σ_b^{-1} Σ_{ba} g.  (2.31)

Left multiplying both sides of (2.28) and (2.29) by g^H and h^H, respectively, and using the constraints, we easily find that

μ_g = −g^H Σ_{ab} h / ( g^H Σ_a g ) = −g^H Σ_{ab} h,  (2.32)

μ_h = −h^H Σ_{ba} g / ( h^H Σ_b h ) = −h^H Σ_{ba} g.  (2.33)

As a result,

ρ²(g, h) = μ_g μ_h  (2.34)

and ρ(g, h) must be a real-valued number. From all previous equations, we deduce that
[ Σ_a^{-1/2} Σ_{ab} Σ_b^{-1} Σ_{ba} Σ_a^{-1/2} − ρ²(g, h) I_{L_a} ] Σ_a^{1/2} g = 0,  (2.35)

[ Σ_b^{-1/2} Σ_{ba} Σ_a^{-1} Σ_{ab} Σ_b^{-1/2} − ρ²(g, h) I_{L_b} ] Σ_b^{1/2} h = 0,  (2.36)

or, using the notation from the previous section,

[ Φ_a − ρ²(g, h) I_{L_a} ] Σ_a^{1/2} g = 0,  (2.37)

[ Φ_b − ρ²(g, h) I_{L_b} ] Σ_b^{1/2} h = 0,  (2.38)

where we recognize the studied eigenvalue problem. From the results of Sect. 2.1, it is clear that the solution to our optimization problem in (2.26) is

g_1 = Σ_a^{-1/2} u_{a,1},  (2.39)

h_1 = Σ_b^{-1/2} u_{b,1},  (2.40)

which are the first canonical filters. They lead to the first canonical correlation:

ρ( g_1, h_1 ) = √λ_1  (2.41)

and to the first canonical variates:

Z_{a,1} = g_1^H a = u_{a,1}^H Σ_a^{-1/2} a,  (2.42)

Z_{b,1} = h_1^H b = u_{b,1}^H Σ_b^{-1/2} b.  (2.43)

The second canonical filters are obtained from the criterion:

max_{g,h}  g^H Σ_{ab} h + h^H Σ_{ba} g   s.t.   g^H Σ_a g = 1,  h^H Σ_b h = 1,  g^H Σ_a g_1 = 0,  h^H Σ_b h_1 = 0.  (2.44)

It is not hard to see that the optimization of (2.44) gives the second canonical filters:

g_2 = Σ_a^{-1/2} u_{a,2},  (2.45)

h_2 = Σ_b^{-1/2} u_{b,2},  (2.46)

the second canonical correlation:

ρ( g_2, h_2 ) = √λ_2,  (2.47)
and the second canonical variates:

Z_{a,2} = g_2^H a = u_{a,2}^H Σ_a^{-1/2} a,  (2.48)

Z_{b,2} = h_2^H b = u_{b,2}^H Σ_b^{-1/2} b.  (2.49)

Obviously, following the same procedure, we can derive all the L_a canonical modes (l = 1, 2, ..., L_a). The lth canonical filters are:

g_l = Σ_a^{-1/2} u_{a,l},  (2.50)

h_l = Σ_b^{-1/2} u_{b,l}.  (2.51)

It is important to notice that, instead of g_l and h_l, we can use the filters ς_{g_l} g_l and ς_{h_l} h_l, where ς_{g_l} ≠ 0 and ς_{h_l} ≠ 0 are arbitrary real numbers, since the canonical correlations will not be affected. The lth canonical correlation is:

ρ( g_l, h_l ) = √λ_l.  (2.52)

The lth canonical variates are:

Z_{a,l} = g_l^H a = u_{a,l}^H Σ_a^{-1/2} a,  (2.53)

Z_{b,l} = h_l^H b = u_{b,l}^H Σ_b^{-1/2} b.  (2.54)

Considering all the modes of the canonical filters, they can be combined in matrix form as

G = [ g_1  g_2  ···  g_{L_a} ] = Σ_a^{-1/2} U_a,  (2.55)

H = [ h_1  h_2  ···  h_{L_a} ] = Σ_b^{-1/2} U_b'.  (2.56)

Then, it is easy to check that G^H Σ_a G = H^H Σ_b H = I_{L_a} and G^H Σ_{ab} H = H^H Σ_{ba} G = Λ_a^{1/2}. It can also be verified that H Λ_a^{1/2} G^H = Σ_b^{-1} Σ_{ba} Σ_a^{-1}. In the same way, we can combine all the modes of the canonical variates:
z_a = [ Z_{a,1}  Z_{a,2}  ···  Z_{a,L_a} ]^T = G^H a,  (2.57)

z_b = [ Z_{b,1}  Z_{b,2}  ···  Z_{b,L_a} ]^T = H^H b.  (2.58)

Then, it can be verified that E( z_a z_a^H ) = E( z_b z_b^H ) = I_{L_a} and E( z_a z_b^H ) = E( z_b z_a^H ) = Λ_a^{1/2}.

Let us now briefly discuss two particular cases: L_a = L_b = 1 and L_a = 1, L_b > 1. When L_a = L_b = 1, the two random vectors a and b become the two random variables A and B. In this case, CCA simplifies to the classical PCC between Z_A = A and Z_B = B, i.e.,

ρ_{AB} = E( A B^* ) / √( E(|A|²) E(|B|²) ).  (2.59)

In the second case (L_a = 1, L_b > 1), we only have one canonical mode. We find that the canonical filters are

G = 1,  (2.60)

h = Σ_b^{-1/2} u_b,  (2.61)

where

u_b = Σ_b^{-1/2} φ_{bA}  (2.62)

is the unnormalized eigenvector corresponding to the only nonzero eigenvalue,

λ = φ_{bA}^H Σ_b^{-1} φ_{bA} / φ_A,  (2.63)

of the rank-1 matrix Φ_b = Γ Γ^H = Σ_b^{-1/2} φ_{bA} φ_{bA}^H Σ_b^{-1/2} / φ_A, with φ_{bA} = E( b A^* ) and φ_A = E( |A|² ). As a result, the canonical correlation is

ρ( G, h ) = φ_{bA}^H h / √( φ_A h^H Σ_b h ) = √λ  (2.64)

and the canonical variates are
Z_A = A,  (2.65)

Z_b = φ_{bA}^H Σ_b^{-1} b.  (2.66)

2.3 The Singular Case

Now, assume that the covariance matrix Σ_a is singular but the covariance matrix Σ_b is still nonsingular.¹ In this case, it is clear that CCA as explained in the previous section cannot work, since the existence of the inverse of Σ_a is required. One way to circumvent this problem is derived next.

Let us assume that rank(Σ_a) = rank(Σ_{ab}) = P < L_a. We can diagonalize Σ_a as

Q^H Σ_a Q = Λ,  (2.67)

where

Q = [ q_1  q_2  ···  q_{L_a} ]  (2.68)

is a unitary matrix and

Λ = diag( λ_1, λ_2, ..., λ_{L_a} )  (2.69)

is a diagonal matrix. The orthonormal vectors q_1, q_2, ..., q_{L_a} are the eigenvectors corresponding, respectively, to the eigenvalues λ_1, λ_2, ..., λ_{L_a} of the matrix Σ_a, where λ_1 ≥ λ_2 ≥ ··· ≥ λ_P > 0 and λ_{P+1} = λ_{P+2} = ··· = λ_{L_a} = 0. Let

Q = [ T  Π ],  (2.70)

where the L_a × P matrix T contains the eigenvectors corresponding to the nonzero eigenvalues of Σ_a and the L_a × (L_a − P) matrix Π contains the eigenvectors corresponding to the null eigenvalues of Σ_a. It can be verified that

I_{L_a} = T T^H + Π Π^H.  (2.71)

Notice that T T^H and Π Π^H are two orthogonal projection matrices of rank P and L_a − P, respectively. Hence, T T^H is the orthogonal projector onto the signal subspace (where all the energy of the signal is concentrated), i.e., the range of Σ_a, and Π Π^H is the orthogonal projector onto the null subspace of Σ_a. Using (2.71), we can write the random vector a of length L_a as

a = Q Q^H a = T T^H a = T ã,  (2.72)

since Π Π^H a vanishes in the mean-square sense (Π^H Σ_a Π = 0), where

ã = T^H a  (2.73)

is the transformed random signal vector of length P. Therefore, instead of working with the pair of random vectors a and b as we did in Sect. 2.2, we propose to handle the pair of random vectors ã and b, since now the covariance matrix of ã, denoted Σ_ã = T^H Σ_a T, has full rank. As a result, CCA with P different canonical modes can be performed by simply replacing, in the previous derivations, a by ã and Σ_{ab} by Σ_ãb = T^H Σ_{ab}.

¹ The same approach discussed in this section can be applied when Σ_b is singular or when both Σ_a and Σ_b are singular.

References

1. H. Hotelling, Relations between two sets of variables. Biometrika 28, 321–377 (1936)
2. D.V. Ouellette, Schur complements and statistics. Linear Algebra Appl. 36, 187–295 (1981)
3. G.H. Golub, C.F. Van Loan, Matrix Computations, 3rd edn. (The Johns Hopkins University Press, Baltimore, MD, 1996)
4. J. Benesty, J. Chen, Y. Huang, On the importance of the Pearson correlation coefficient in noise reduction. IEEE Trans. Audio Speech Lang. Process. 16, 757–765 (2008)
5. J.R. Kettenring, Canonical analysis of several sets of variables. Biometrika 58, 433–451 (1971)
6. T.W. Anderson, Asymptotic theory for canonical correlation analysis. J. Multivar. Anal. 70, 1–29 (1999)
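In practice, the route suggested at the end of Sect. 2.1 can be followed directly: form Γ = Σ_b^{-1/2} Σ_{ba} Σ_a^{-1/2}, take its SVD, and read off the canonical filters (2.55)–(2.56) and correlations √λ_l. A minimal Python/NumPy sketch (ours, not from the book; function and variable names are arbitrary):

```python
import numpy as np

def inv_sqrtm(S):
    """Hermitian inverse square root S^{-1/2} via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T

def cca(Sigma_a, Sigma_b, Sigma_ab):
    """Canonical filters G, H and correlations sqrt(lambda_l) from the
    covariance blocks, via the SVD of Gamma = Sb^{-1/2} Sba Sa^{-1/2}."""
    Sa, Sb = inv_sqrtm(Sigma_a), inv_sqrtm(Sigma_b)
    Gamma = Sb @ Sigma_ab.conj().T @ Sa
    Ub, s, UaH = np.linalg.svd(Gamma, full_matrices=False)  # Gamma = Ub' diag(s) Ua^H
    G = Sa @ UaH.conj().T   # G = Sigma_a^{-1/2} Ua, columns g_l
    H = Sb @ Ub             # H = Sigma_b^{-1/2} Ub', columns h_l
    return G, H, s          # s[l] = canonical correlation sqrt(lambda_l)

# Synthetic full-rank example with arbitrary dimensions and mixing.
rng = np.random.default_rng(1)
La, Lb, N = 2, 3, 200_000
M = rng.standard_normal((La + Lb, La + Lb))
c = M @ (rng.standard_normal((La + Lb, N)) + 1j * rng.standard_normal((La + Lb, N)))
S = (c @ c.conj().T) / N
G, H, rho = cca(S[:La, :La], S[La:, La:], S[:La, La:])

# Checks from Sect. 2.2: G^H Sigma_a G = I, G^H Sigma_ab H = Lambda_a^{1/2},
# and the canonical correlations lie in (0, 1).
assert np.allclose(G.conj().T @ S[:La, :La] @ G, np.eye(La), atol=1e-6)
assert np.allclose(G.conj().T @ S[:La, La:] @ H, np.diag(rho), atol=1e-6)
assert np.all((rho > 0) & (rho < 1))
```

The assertions mirror the identities G^H Σ_a G = I_{L_a} and G^H Σ_{ab} H = Λ_a^{1/2} derived in Sect. 2.2.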
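Similarly, the projection step of Sect. 2.3 can be illustrated numerically (again our own sketch, with arbitrary names and dimensions): when Σ_a has rank P < L_a, the matrix T of eigenvectors associated with its nonzero eigenvalues yields a full-rank P × P covariance for ã = T^H a.

```python
import numpy as np

rng = np.random.default_rng(2)
La, Lb, P, N = 4, 5, 2, 100_000

# a is confined to a P-dimensional subspace of C^La, so Sigma_a is singular.
s = rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))
a = rng.standard_normal((La, P)) @ s
b = (rng.standard_normal((Lb, P)) @ s
     + 0.5 * (rng.standard_normal((Lb, N)) + 1j * rng.standard_normal((Lb, N))))

Sigma_a = (a @ a.conj().T) / N          # rank P < La (up to round-off)
Sigma_ab = (a @ b.conj().T) / N

w, Q = np.linalg.eigh(Sigma_a)          # eigenvalues in ascending order
T = Q[:, -P:]                           # eigenvectors of the P nonzero eigenvalues

# a_tilde = T^H a has a full-rank P x P covariance, and CCA then
# proceeds as in Sect. 2.2 with P canonical modes.
Sigma_at = T.conj().T @ Sigma_a @ T     # covariance of a_tilde
Sigma_atb = T.conj().T @ Sigma_ab       # cross-covariance of a_tilde and b
assert np.linalg.matrix_rank(Sigma_a) == P
assert np.all(np.linalg.eigvalsh(Sigma_at) > 1e-6)
```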