Robust canonical correlation analysis: a predictive approach.


Robust canonical correlation analysis: a predictive approach.

Nadia L. Kudraszow (CONICET and University of La Plata, Argentina)
Ricardo A. Maronna (University of La Plata, Argentina)

E-mail addresses: nkudraszow@mate.unlp.edu.ar (Nadia L. Kudraszow), rmaronna@retina.ar (Ricardo A. Maronna)

Preprint submitted to Elsevier, June 1, 2011

Abstract

We present a method for robust Canonical Correlation Analysis based on a prediction approach. The robust canonical coordinates are obtained using robust estimators for the multivariate linear model. Two different methods are proposed to robustly estimate the canonical correlations. The performance of the proposed method is compared with those of the classical approach and of other robust estimators of Canonical Correlation Analysis through real and simulated data sets.

Keywords: Robust methods; MM-estimator; Multivariate linear model.

1. Introduction

Canonical correlation analysis (CCA) is a standard tool, introduced by Hotelling (1936), to detect and quantify the linear association between two sets of variables. Let $x = (x_1,\ldots,x_p)'$ and $y = (y_1,\ldots,y_q)'$ be random vectors with joint distribution $H(y,x)$. Henceforth $\mathrm{Cov}(x)$ and $\mathrm{Corr}(y,x)$ will respectively denote the covariance matrix of the random vector $x$ and the correlation of the random variables $x$ and $y$. We will use the following notation for the expectation vectors and covariance matrices: $E(x) = \mu_x$ and $E(y) = \mu_y$, $\mathrm{Cov}(x) = V$, $\mathrm{Cov}(y) = W$ and $\mathrm{Cov}(y,x) = C$, and we will denote the joint covariance matrix of $z = (y', x')'$ by
$$\mathrm{Cov}(z) = \begin{pmatrix} W & C \\ C' & V \end{pmatrix}. \qquad (1.1)$$
The aim of CCA is to find vectors $t \in \mathbb{R}^q$ and $v \in \mathbb{R}^p$ such that the correlation between the linear combinations $t'y$ and $v'x$ is maximized. More generally,

canonical analysis aims at finding $K = \mathrm{rank}(C)$ pairs of linear combinations such that each successive pair maximizes the correlation under the restriction of being uncorrelated with the previous pairs; i.e., we are looking for a set of vectors $\{(t_k, v_k)\}_{k=1}^K$ such that
$$(t_k, v_k) = \arg\max_{(t,v)} \mathrm{Corr}(t'y, v'x), \qquad (1.2)$$
where $(t,v)$ in (1.2) satisfy, for $k = 2,\ldots,K$,
$$\mathrm{Corr}(t_j'y, t'y) = \mathrm{Corr}(v_j'x, v'x) = 0, \quad j = 1,\ldots,k-1. \qquad (1.3)$$
The vectors $\{t_k\}_1^K$ and $\{v_k\}_1^K$ are called the canonical coordinates of $y$ and $x$ respectively, and $(t_1, v_1)$ is the first pair of canonical coordinates. For $k = 1,\ldots,K$, the linear combinations $t_k'y$ and $v_k'x$ are called the $k$-th pair of canonical variates, and their correlation $c_k = \mathrm{Corr}(t_k'y, v_k'x)$ is called the $k$-th canonical correlation of $H(y,x)$.

The criterion (1.2) is invariant with respect to the scale of the linear combinations, and therefore there are infinitely many solutions. We can transform the optimization problem in CCA into one with a unique solution (up to a postfactor; see Appendix A3 of Seber (1984)) by assuming that $\mathrm{Cov}(y)$ and $\mathrm{Cov}(x)$ are nonsingular and standardizing all the linear combinations so that
$$\mathrm{Var}(t_k'y) = \mathrm{Var}(v_k'x) = 1, \quad k = 1,\ldots,K. \qquad (1.4)$$
It is well known (see for instance Seber (1984), Chapter 5) that the canonical coordinates $\{t_k\}_{k=1}^K$ and $\{v_k\}_{k=1}^K$, solutions of the problem (1.2) subject to (1.3) and (1.4), are respectively the eigenvectors corresponding to the eigenvalues $c_1^2 \geq \cdots \geq c_K^2 > 0$ of the matrices
$$W^{-1}CV^{-1}C' \quad \text{and} \quad V^{-1}C'W^{-1}C. \qquad (1.5)$$
Both matrices have the same eigenvalues $c_k^2$ ($k = 1,\ldots,K$), which are the squared canonical correlations.

CCA is a very common procedure, included in most textbooks of multivariate statistics. The traditional approach to CCA is to consider it as a method of dimension reduction (see Das and Sen (1998) for a review). Another approach to CCA is based on the concept of prediction.
In this case (in which $x$ and $y$ do not play symmetrical roles) we are interested in using linear combinations of $x$ to predict $y$. For example, Brillinger (1975) was interested in finding linear combinations $u = Ax$ and $w = By$, where $A$ and $B$ each have $r$ ($1 \leq r \leq K$) linearly independent rows satisfying $AVA' = I_r$ and $BWB' = I_r$, such that $u$ and $w$ are close to each other. It can be shown that the distance measure $E[\|u - w\|^2]$ is minimized when $u$ and $w$ are the canonical variates. Another interesting derivation of a similar nature was proposed by Yohai and García Ben (1980).

The classic method to estimate the canonical coordinates and correlations is based on computing the eigenvectors and eigenvalues of the matrices in (1.5), with $W$, $V$ and $C$ replaced by their sample versions. It is well known that the sample covariances are not resistant to outliers; Romanazzi (1992) proved that the classic canonical coordinates and correlations are also sensitive to outlying observations. Karnel (1991) proposed a robust CCA using an M-estimator of multivariate location and scatter to estimate the covariances in (1.5) and then proceeding as in the classical approach; the disadvantage of this method is that the robustness properties of M-estimators are poor in higher dimensions (for further details see Maronna et al. (2006), page 186). Instead, Croux and Dehon (2002) used the minimum covariance determinant estimator (MCD, proposed by Rousseeuw (1985)), which has a high breakdown point, to estimate the same covariances. Taskinen et al. (2006) established properties of CCA based on robust estimators of the covariance matrix. Filzmoser et al. (2000) derived a robust method for obtaining the first canonical variates using robust alternating regressions (RAR), following the approach suggested by Wold (1966). Later, Branco et al. (2005) extended the method of Filzmoser et al. (2000) and proposed a robust method for obtaining all the canonical variates using RAR.

CCA is employed in methods of prediction for multivariate regression such as Curds and Whey (C&W), introduced by Breiman and Friedman (1997), and Reduced Rank Regression (Izenman, 1975). Developing robust versions of these methods requires replacing the least squares (LS) estimator by a robust multivariate regression estimator, and replacing classical by robust CCA. Since we are particularly interested in the application of CCA to these situations, we describe an approach to CCA based on the concept of prediction in multivariate regression, and then develop robust versions thereof.
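As a minimal numerical sketch of this classical plug-in estimate (plain NumPy; the function name and test data are ours, not from the paper), one replaces $W$, $V$ and $C$ in (1.5) by their sample versions and extracts the eigenvalues:

```python
import numpy as np

def classical_cca(Y, X):
    """Classical plug-in CCA: eigenvalues of W^{-1} C V^{-1} C' from (1.5),
    with W, V, C replaced by their sample versions."""
    q = Y.shape[1]
    S = np.cov(np.hstack([Y, X]), rowvar=False)           # sample Cov(z), as in (1.1)
    W, C, V = S[:q, :q], S[:q, q:], S[q:, q:]
    M = np.linalg.solve(W, C) @ np.linalg.solve(V, C.T)   # W^{-1} C V^{-1} C'
    eigvals = np.sort(np.linalg.eigvals(M).real)[::-1]    # squared canonical correlations
    return np.sqrt(np.clip(eigvals, 0.0, 1.0))
```

The eigenvectors of the same matrix (and of $V^{-1}C'W^{-1}C$) would give the canonical coordinates of $y$ and $x$, respectively.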
The paper is organized as follows. In Section 2 we describe an approach to CCA based on the concept of prediction, and in Section 3 we propose a robust version thereof. In Section 4 we present the results of a simulation study, and an example with real data is given in Section 5. Section 6 summarizes our conclusions.

2. CCA as a prediction tool

In this Section we describe a method to derive the canonical coordinates and correlations from a regression setting. Let $A$ be the matrix of coefficients of the linear regression of $y$ with respect to $x$, i.e. $A = V^{-1}C'$. Let $\hat{y} = \mu_y + A'(x - \mu_x)$. Then $\hat{y}$ is the best linear predictor (in the $L_2$ sense) of $y$ with respect to $x$, and we may write $y = \hat{y} + u$, where $u$ is uncorrelated with $x$. We denote $\Sigma = \mathrm{Cov}(u)$. Note that, given $t_k$, (1.2) is equivalent to
$$v_k = \arg\min_v \left\{E\left(t_k'(y - \mu_y) - v'(x - \mu_x)\right)^2\right\}. \qquad (2.1)$$

An easy calculation shows that
$$v_k = V^{-1}C't_k = At_k, \qquad (2.2)$$
and note that the minimum value of (2.1) is
$$E\left(t_k'(y-\mu_y) - t_k'A'(x-\mu_x)\right)^2 = E\left(t_k'(y-\hat{y})\right)^2 = E(t_k'u)^2 = t_k'\Sigma t_k.$$
Then the vectors $\{t_k\}_{k=1}^K$ satisfy
$$t_k = \arg\min_t t'\Sigma t \qquad (2.3)$$
where
$$t_j'Wt_k = \delta_{jk} \quad \text{for all } j = 1,\ldots,k, \qquad (2.4)$$
and therefore the vectors $\{t_k\}_{k=1}^K$ can be obtained by computing the eigenvectors of
$$\Sigma t = \lambda W t, \qquad (2.5)$$
associated with the eigenvalues $0 < \lambda_1 \leq \cdots \leq \lambda_K$ and normalized such that
$$t'Wt = 1. \qquad (2.6)$$
Combining (2.2) and (1.4), we have that
$$v_k = At_k / b_k, \qquad (2.7)$$
where
$$b_k^2 = (At_k)'V(At_k). \qquad (2.8)$$
Using the equality $W = \Sigma + A'VA$, it is easily seen that: (i) $v_j'Vv_k = 0$ for all $j \neq k$; (ii) the canonical coordinates $t_k$ and $v_k$ obtained in (2.3)-(2.4) and (2.7) are the same as those obtained with the classical method, i.e. computing the eigenvectors of the matrices in (1.5); and (iii) the canonical correlations satisfy $c_k^2 = 1 - \lambda_k$ for all $k = 1,\ldots,K$, where $\lambda_k$ are the eigenvalues of the eigen-problem (2.5).

In this approach we can estimate the canonical coordinates and correlations by replacing the covariance matrices in (2.5), (2.6) and (2.8) by their respective sample covariances, and replacing the matrix of regression coefficients $A$ in (2.7) and (2.8) by its least squares estimator. As we mentioned before, the sample covariances and the least squares estimator of the matrix of regression coefficients are not resistant to outliers and, in consequence, the canonical coordinates and correlations obtained with this method are also sensitive to those observations. Note that in this approach to CCA, the vectors $x$ and $y$ do not play symmetrical roles.
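This prediction route can be sketched numerically as follows (an illustrative, non-robust version with names of our choosing): estimate $A$ by least squares, form $\Sigma$ from the residuals, solve the generalized eigenproblem (2.5), and recover the correlations via $c_k^2 = 1 - \lambda_k$ and the coordinates of $x$ via (2.7)-(2.8).

```python
import numpy as np
from scipy.linalg import eigh

def predictive_cca(Y, X):
    """CCA through the regression route of Section 2: least squares A,
    residual covariance Sigma, generalized eigenproblem (2.5), c_k^2 = 1 - lambda_k."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    A = np.linalg.lstsq(Xc, Yc, rcond=None)[0]     # regression coefficients (p x q)
    Sigma = np.cov(Yc - Xc @ A, rowvar=False)      # Cov(u): residual covariance
    W = np.cov(Yc, rowvar=False)
    V = np.cov(Xc, rowvar=False)
    lam, T = eigh(Sigma, W)                        # Sigma t = lambda W t, with t'Wt = 1
    K = min(Y.shape[1], X.shape[1])
    lam, T = lam[:K], T[:, :K]                     # smallest lambda <-> largest c^2
    c = np.sqrt(np.clip(1.0 - lam, 0.0, 1.0))      # canonical correlations, point (iii)
    AT = A @ T                                     # columns A t_k
    b = np.sqrt(np.sum(AT * (V @ AT), axis=0))     # b_k from (2.8)
    return c, T, AT / b                            # correlations, t_k's, v_k's from (2.7)
```

Note that `scipy.linalg.eigh` with a second matrix argument normalizes the eigenvectors exactly as in (2.6).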

3. A robust predictive approach to CCA

To obtain the robust canonical coordinates in the approach to CCA presented in (2.5)-(2.8), we propose to estimate $\Sigma$ and $A$ through robust regression estimators, and $V$ and $W$ through robust estimators of multivariate location and scatter. Any robust estimator for the multivariate linear model (resp. location and scatter) can be used to estimate $\Sigma$ and $A$ (resp. $V$ and $W$). It is desirable that the robust estimators employed have high breakdown point and high Gaussian efficiency. Examples of estimators of this kind are τ-estimators, proposed by García Ben et al. (2006) for the multivariate linear model and by Lopuhaä (1991) for multivariate location and scatter; constrained M (CM) estimators, proposed by Kent and Tyler (1996) for multivariate location and scatter and extended to the multivariate linear model by Bai et al. (2008); and MM-estimators, studied by Lopuhaä (1992), Tatsuoka and Tyler (2000) and Salibián-Barrera et al. (2006) for the multivariate location and scatter model, and recently extended by Kudraszow and Maronna (2011) to the multivariate linear model. In this article we choose MM-estimators because we consider that they possess certain comparative advantages but, as said, this is not the only possible choice. In the following subsection we give a brief description of these estimators.

3.1. MM-estimates for the multivariate linear model

Analogously to MM-estimators of regression, MM-estimators for the multivariate linear model are based on two loss functions called bounded ρ-functions. A bounded ρ-function will denote a continuous nondecreasing function $\rho(u)$ such that $\rho(0) = 0$, $\sup_u \rho(u) = 1$, and $\rho(u)$ is increasing for nonnegative $u$ such that $\rho(u) < 1$. A popular family of bounded ρ-functions is given by the bisquare functions:
$$\rho_B(u; c) = 1 - \left(1 - (u/c)^2\right)^3 I(|u| \leq c), \qquad (3.1)$$
where $I(\cdot)$ is the indicator function and $c > 0$.
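The bisquare family (3.1) and its derivative (the ψ-function that reappears in Section 3.2, Method 2) can be written down directly; this is a plain transcription of the formulas, with function names of our choosing:

```python
import numpy as np

def rho_bisquare(u, c):
    """Bisquare rho-function (3.1): rho(0) = 0, nondecreasing, equal to 1 for |u| >= c."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, 1.0 - (1.0 - (u / c) ** 2) ** 3, 1.0)

def psi_bisquare(u, c):
    """Its derivative, the bisquare psi-function: psi_B(u; c) = rho_B'(u; c)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, 6.0 * u / c ** 2 * (1.0 - (u / c) ** 2) ** 2, 0.0)
```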
Before defining MM-estimators for the multivariate linear model, we will define a robust estimator of scale. Given a sample of size $n$, $v = (v_1,\ldots,v_n)$, an M-estimator of scale $s(v)$ is the value of $s$ that solves
$$\frac{1}{n}\sum_{i=1}^n \rho_0\left(\frac{v_i}{s}\right) = b, \qquad (3.2)$$
where $\rho_0$ is a bounded ρ-function and $b \in (0,1)$; or $s = 0$ if $\#(v_i = 0) \geq n(1-b)$, where $\#$ denotes cardinality. The MM-estimators for the multivariate linear model can now be defined as follows.
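A sketch of the M-scale defined by (3.2), solved by root bracketing; the bisquare $\rho_0$ with $c_0 = 1.56$ and $b = 0.5$ (the choices mentioned later in Section 4.3) are used as defaults, and the bracketing constants are ours:

```python
import numpy as np
from scipy.optimize import brentq

def m_scale(v, c0=1.56, b=0.5):
    """M-estimator of scale: the solution s of (1/n) sum_i rho0(v_i/s) = b,
    i.e. equation (3.2), with the bisquare rho0."""
    v = np.asarray(v, dtype=float)
    if np.mean(v == 0) >= 1.0 - b:              # degenerate case in the definition
        return 0.0
    def rho0(u):
        u = np.minimum(np.abs(u) / c0, 1.0)
        return 1.0 - (1.0 - u ** 2) ** 3
    g = lambda s: np.mean(rho0(v / s)) - b      # decreasing in s; g(0+) > 0 > g(inf)
    vmax = np.max(np.abs(v))
    return brentq(g, 1e-10 * vmax, 100.0 * vmax / c0 + 1.0)
```

With $c_0 \approx 1.55$ and $b = 0.5$ this scale is approximately consistent for the standard deviation under normality, which the test below checks roughly.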

Let $z_i = (y_i', x_i')'$ with $1 \leq i \leq n$, where $y_i = (y_{i1},\ldots,y_{iq})' \in \mathbb{R}^q$, $x_i = (1, x_{i1},\ldots,x_{ip})' \in \mathbb{R}^{p+1}$, and let $A$ be the matrix of coefficients (the first row of $A$ contains the intercepts). Let $(\tilde{A}, \tilde{\Sigma})$ be an initial estimator of $(A, \Sigma)$ with high breakdown point and such that $|\tilde{\Sigma}| = 1$, where $|\tilde{\Sigma}|$ is the determinant of $\tilde{\Sigma}$. Then compute the M-estimator of scale $\hat{\sigma} := s(d(\tilde{A}, \tilde{\Sigma}))$ of the Mahalanobis norms of the residuals,
$$d_i(\tilde{A}, \tilde{\Sigma}) = \left[(y_i - \tilde{A}'x_i)'\tilde{\Sigma}^{-1}(y_i - \tilde{A}'x_i)\right]^{1/2}.$$
Let $\rho_1$ be another bounded ρ-function such that $\rho_1 \leq \rho_0$, and let $S_q$ be the set of all positive definite symmetric $q \times q$ matrices. Let $(\hat{A}, \hat{\Gamma})$ be any local minimum of
$$S(B, \Gamma) = \sum_{i=1}^n \rho_1\left(\frac{d_i(B, \Gamma)}{\hat{\sigma}}\right)$$
in $\mathbb{R}^{(p+1) \times q} \times S_q$ which satisfies $S(\hat{A}, \hat{\Gamma}) \leq S(\tilde{A}, \tilde{\Sigma})$ and $|\hat{\Gamma}| = 1$. Then the MM-estimator of $A$ is defined as $\hat{A}$, and the respective estimator of $\Sigma$ is
$$\hat{\Sigma} = \hat{\sigma}^2 \hat{\Gamma}. \qquad (3.3)$$
To obtain an MM-estimator which simultaneously has high breakdown point and high efficiency under normal errors, it suffices to choose $\rho_0$ and $\rho_1$ appropriately. If $\rho_0$ and $\rho_1$ are ρ-functions of the bisquare family defined in (3.1), $\rho_0(u) = \rho_B(u; c_0)$ and $\rho_1(u) = \rho_B(u; c_1)$, then the breakdown point and the efficiency with respect to the LS estimator can be set through suitable values of $c_0$ and $c_1$ (see Tables 1 and 2 of Kudraszow and Maronna (2011)).

The proposed robust approach to CCA requires computing a robust multivariate regression estimator; if one wants robust versions of methods such as C&W or Reduced Rank Regression, such an estimator must be computed anyway, and therefore our robust CCA is obtained almost for free.

3.2. Robust Correlations

Let $(y_i, x_i)$, $i = 1,\ldots,n$, be a random sample of real variables. To estimate the correlation in a robust manner we propose two different methods.

Method 1: Let $s$ be a robust scale. Consider the regression estimators $\hat{\beta}_0$ and $\hat{\beta}$ defined by
$$(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta} s(y_i - \beta_0 - \beta x_i),$$
and the location estimator $\hat{\mu}$ defined by $\hat{\mu} = \arg\min_\mu s(y_i - \mu)$.
Then, Croux and Dehon (2003) define the squared correlation between $y$ and $x$ as
$$R^2_{CD}(y,x) = 1 - \frac{s(y_i - \hat{\beta}_0 - \hat{\beta}x_i)^2}{s(y_i - \hat{\mu})^2}. \qquad (3.4)$$
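A minimal sketch of (3.4). As an assumption of the sketch (not the authors' choice), the robust scale $s$ is taken to be the normalized median absolute residual, and the scale-minimizing fits are found with a Nelder-Mead search started at the least squares solution; function names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def med_scale(v):
    """A simple robust scale: normalized median absolute value
    (0.675 is approximately the normal consistency constant)."""
    return np.median(np.abs(v)) / 0.675

def r2_cd(y, x):
    """Croux-Dehon squared correlation (3.4), with s = med_scale."""
    slope, intercept = np.polyfit(x, y, 1)                 # LS start for the search
    reg = minimize(lambda p: med_scale(y - p[0] - p[1] * x),
                   x0=[intercept, slope], method='Nelder-Mead')
    loc = minimize(lambda m: med_scale(y - m[0]),
                   x0=[np.median(y)], method='Nelder-Mead')
    return 1.0 - reg.fun ** 2 / loc.fun ** 2               # (3.4)
```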

Table 1: Values of the correction factor $a$ appearing in (3.5) for different choices of efficiency of the MM-estimator using the bisquare function.

Let $S$ be the scale functional associated with the robust scale $s$. If $y = \beta_0 + \beta_1 x + u$, where the error $u$ is independent of $x$ and follows a distribution $F_\sigma(u) = F_0(u/\sigma)$, where $\sigma$ is the residual scale parameter and $F_0$ is symmetric and unimodal around zero with a strictly positive density, it can be shown that $R^2_{CD}$ is Fisher consistent for $\mathrm{Corr}^2(y,x)$ under the following assumptions:

- $S$ is Fisher consistent at the error model distribution, i.e. $S(F_\sigma) = \sigma$;
- the regression functionals associated with $\hat{\beta}_0$, $\hat{\beta}$ and $\hat{\mu}$ are Fisher consistent; and
- the distribution of the response variable is a location-scale transformation of the error distribution $F_0$.

An example of this situation is when the joint distribution of $(y,x)$ is bivariate normal (for further details see Section 3 of Croux and Dehon (2003)).

Method 2: Consider the MM-estimator of linear regression $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1)'$ defined by Yohai (1987), which coincides with the MM-estimator defined in Section 3.1 taking $y_i = y_i$ and $x_i = (1, x_i)'$, and let $\hat{\sigma}$ be an M-estimator of the residual scale. Let
$$w_i = W\left(\frac{y_i - \hat{y}_i}{\hat{\sigma}}\right),$$
where $W(u) = \psi_B(u; c_1)/u$, $\psi_B(u;c)$ is the derivative of $\rho_B(u;c)$ with respect to $u$ (called the bisquare ψ-function), $\hat{y}_i = \hat{\beta}'x_i$, and
$$\hat{y}_w = \frac{\sum_{i=1}^n w_i \hat{y}_i}{\sum_{i=1}^n w_i}.$$
Then the estimator proposed by Renaud and Victoria-Feser (2010) is defined by
$$R^2_{RV} = \frac{\sum_{i=1}^n w_i(\hat{y}_i - \hat{y}_w)^2}{\sum_{i=1}^n w_i(\hat{y}_i - \hat{y}_w)^2 + a\sum_{i=1}^n w_i(y_i - \hat{y}_i)^2}, \qquad (3.5)$$
where $a$ is a correction factor that ensures consistency of $R^2_{RV}$ for the desired efficiency of the estimator $\hat{\beta}$.
Assuming $y_i = \beta'x_i + u_i$ with normal errors $u_i$, with zero mean and variance $\sigma^2$, independent of $x_i$, and the consistency of the M-estimator of the residual scale $\hat{\sigma}$, it can be proved (see Theorem 1 in Renaud and Victoria-Feser (2010)) that $R^2_{RV}$ is consistent for $\mathrm{Corr}^2(y,x)$ when
$$a = \frac{E\left(W(u/\sigma)\right)}{E\left(u\,\psi_B(u/\sigma; c_1)/\sigma\right)}.$$
Table 1 shows the values of $a$ for different efficiencies of the MM-estimators computed with bisquare ρ-functions.

Then, to estimate the squared canonical correlation $c_k^2$, we take $y_i = \hat{t}_k'y_i$ and $x_i = \hat{v}_k'x_i$ and apply (3.4) or (3.5).
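A sketch of Method 2 under these assumptions. The correction factor $a$ is evaluated numerically from the formula above with standard normal errors (using $z\,\psi_B(z) = z^2 W(z)$), rather than read off Table 1. As an assumption of the sketch, the fitted values are supplied externally; in the test an ordinary LS fit stands in for the MM-estimator, which is not the authors' procedure:

```python
import numpy as np
from scipy.integrate import quad

def weight(u, c):
    """W(u) = psi_B(u; c)/u for the bisquare, with its limit 6/c^2 at u = 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, 6.0 / c**2 * (1.0 - (u / c)**2)**2, 0.0)

def correction_a(c1):
    """a = E[W(Z)] / E[Z psi_B(Z; c1)] for Z ~ N(0,1), by numerical quadrature."""
    phi = lambda z: np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi)
    num = quad(lambda z: weight(z, c1) * phi(z), -c1, c1)[0]
    den = quad(lambda z: z**2 * weight(z, c1) * phi(z), -c1, c1)[0]
    return num / den

def r2_rv(y, yhat, sigma, c1=3.43):
    """The R^2 of (3.5), given fitted values and a robust residual scale."""
    w = weight((y - yhat) / sigma, c1)
    yw = np.sum(w * yhat) / np.sum(w)
    expl = np.sum(w * (yhat - yw)**2)
    return expl / (expl + correction_a(c1) * np.sum(w * (y - yhat)**2))
```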

4. Simulation Study

To compare the performances of the estimators of the canonical coordinates and correlations proposed in this paper, we performed a simulation study.

4.1. The model

We chose $y_i \in \mathbb{R}^4$ and $x_i \in \mathbb{R}^5$, $i = 1,\ldots,n$, satisfying $y_i = A'x_i + u_i$ with a fixed matrix of coefficients $A$. The errors $u_i$ are generated from a $N_4(0, I)$ distribution (where $N_p(\mu, \Sigma)$ denotes the $p$-variate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$) and the predictors $x_i$ from a $N_5(0, I)$ distribution. Due to the equivariance of MM-estimators, these choices imply no loss of generality for the proposed estimators. It follows from the model that the true values of the two largest squared canonical correlations, $c_1^2$ and $c_2^2$, are determined by $A$, and that $c_3^2 = c_4^2 = 0.5$.

We generated 300 samples of size 100. We consider uncontaminated samples and samples containing 10% of identical outliers of the form $(x_0', y_0')$ with $x_0 = (0,0,0,0,1)'$ and $y_0 = (0,0,0,m)'$. We took a grid of values of $m$, starting at 0; the grid was chosen so that all robust estimators attain the maximum values of their error measure. For each replication $j$ ($j = 1,\ldots,300$) we obtained the estimates of the canonical correlations, denoted by $\{\hat{c}_k^{(j)}\}_{k=1}^4$, and the canonical coordinates of $x$, represented by the columns of the matrix $\hat{\Delta}^{(j)}$.

4.2. Performance measures

As a measure of performance for each canonical correlation we choose the mean squared relative error,
$$\mathrm{MSE}(\hat{c}_k) = \frac{1}{300}\sum_{j=1}^{300} \frac{(\hat{c}_k^{(j)} - c_k)^2}{c_k^2}, \quad k = 1,\ldots,4, \qquad (4.1)$$
and for the canonical coordinates we choose a measure based on prediction error:
$$\mathrm{DE}(\hat{\Delta}) = \frac{1}{300}\sum_{j=1}^{300} \left| E(y - \hat{y}_{z_j})(y - \hat{y}_{z_j})' \right|,$$

where $\hat{y}_{z_j}$ is the best linear predictor of $y$ based on $z_j = \hat{\Delta}^{(j)\prime}x$. The reason for choosing DE as a performance measure for estimating the canonical coordinates is that Yohai and García Ben (1980) proved that the matrix $\Delta \in \mathbb{R}^{p \times K}$ such that $z = \Delta'x$ minimizes
$$\left| E(y - \hat{y}_z)(y - \hat{y}_z)' \right| \qquad (4.2)$$
among all vectors of dimension $K$ of the form $z = \Delta'x$, is the matrix whose columns are the canonical coordinates of $x$. It is easy to check that for $y = A'x + u$, where $x \in \mathbb{R}^p$ and $u \in \mathbb{R}^q$ have zero mean, $\mathrm{Cov}(x) = I_p$ and $\mathrm{Cov}(u) = I_q$, the determinant in (4.2) can be rewritten as
$$\left| I_q + A'(I_p - \Delta\Delta')A \right|. \qquad (4.3)$$
For the case $K = q = p - 1$, as in this simulation study, we have that
$$\Delta\Delta' = I_p - v_p v_p', \qquad (4.4)$$
where $v_p$ is the vector that together with the column vectors of $\Delta$ forms an orthonormal base of $\mathbb{R}^p$. It follows from (4.4) that (4.3) equals
$$\left| I_q + A'v_p(A'v_p)' \right| = 1 + \mathrm{tr}\left(A'v_p(A'v_p)'\right) = 1 + \|A'v_p\|^2. \qquad (4.5)$$
From (4.3) it can be seen that in this case $z = \Delta'x$ also minimizes the expression $E\|y - \hat{y}_z\|^2 = \mathrm{tr}\left(E(y - \hat{y}_z)(y - \hat{y}_z)'\right)$ among all $z = \Delta'x$ with $\Delta \in \mathbb{R}^{p \times K}$. Similarly, under the same conditions used to derive (4.5), we can prove that
$$\mathrm{DE}(\hat{\Delta}) = \frac{1}{300}\sum_{j=1}^{300} \|A'v_p^{(j)}\|^2, \qquad (4.6)$$
where $v_p^{(j)}$ is the vector that together with the columns of $\hat{\Delta}^{(j)}(\hat{\Delta}^{(j)\prime}\hat{\Delta}^{(j)})^{-1/2}$ forms an orthonormal base of $\mathbb{R}^p$.

It must be recalled that the distributions of robust estimators under contamination are themselves heavy-tailed, and it is therefore prudent to evaluate their performance through robust measures (see Huber (1981), Sec. 1.4, p. 12, and Huber (1964), p. 75). For this reason we employed the measures $\mathrm{TMSE}(\hat{c}_k)$ and $\mathrm{TDE}(\hat{\Delta})$, computed by replacing the averages in (4.1) and (4.6) by averages with the upper 10% trimmed off.
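The trimmed measures just described can be sketched as a plain upper-trimmed mean (function names are ours):

```python
import numpy as np

def upper_trimmed_mean(values, trim=0.10):
    """Mean with the largest `trim` fraction of the values removed,
    as used for the TMSE and TDE measures."""
    v = np.sort(np.asarray(values, dtype=float))
    keep = len(v) - int(np.floor(trim * len(v)))
    return v[:keep].mean()

def tmse(c_hat, c_true, trim=0.10):
    """Trimmed mean squared relative error: the trimmed version of (4.1)."""
    rel_sq_err = (np.asarray(c_hat, dtype=float) - c_true) ** 2 / c_true ** 2
    return upper_trimmed_mean(rel_sq_err, trim)
```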

Table 2: Trimmed mean of the determinants of the estimation error matrices (TDE) obtained from uncontaminated samples, for each method (Classic, MM, MCD, PP-MCD, PP-SPM, RAR).

4.3. The estimators

We shall compare the behavior of the proposed robust estimators of CCA with that of the classic one, based on the sample covariance matrices, and with that of the estimators included in the extensive simulations by Branco et al. (2005), which they call MCD, PP-MCD, PP-SPM and RAR (M and PP-M were not included in the simulation because they are based on M-estimators, which have low breakdown point). A detailed description of these estimators can be found in Branco et al. (2005). In this Section, MM will denote the proposed estimator of the canonical coordinates, described in Section 3. To compute the MM canonical coordinates we employed ρ-functions of the bisquare family, and as initial estimator for the MM-estimator of multivariate regression we used an S-estimator (see Bilodeau and Duchesne (2000) and Van Aelst and Willems (2005) for the definition of S-estimators for the multivariate linear model). The constants for the bisquare functions (see the last paragraph of Section 3.1) were chosen as $c_0 = 4.10$ and $c_1 = 4.41$ (see Kudraszow and Maronna (2011) for the corresponding asymptotic efficiency).

For the estimation of the correlations we proposed three estimators, which we call MM-S, MM-τ and MM-RV, defined as follows. MM-S and MM-τ compute the squared correlations using Method 1, choosing the robust scale $s$ in (3.4) as an M-scale and a τ-scale, respectively. The τ-scale is defined by
$$\tau^2(v) = \frac{s_M(v)^2}{n}\sum_{i=1}^n \rho_2\left(\frac{v_i}{s_M(v)}\right),$$
where $v = (v_1,\ldots,v_n)$, $\rho_2$ is a bounded ρ-function and $s_M$ is an M-estimator of scale. To compute the M-scale, for both estimators, we used a bisquare function $\rho_0(u) = \rho_B(u; c_0)$ with $c_0 = 1.56$ (see Table 1 in Kudraszow and Maronna (2011)). For the τ-scale we used $\rho_2(u) = \rho_B(u; c_2)$ with $c_2 = 4.39$ (which ensures an asymptotic relative efficiency of 0.85).
The canonical correlations for MM-RV were computed by (3.5), using $c_1 = 3.43$ and the corresponding value of $a$ from Table 1.

4.4. Simulation results

We deal first with the case of no contamination. Tables 2 and 3 show the measures TDE and the square root of TMSE (RTMSE), which are the trimmed versions of the measures defined in (4.6) and (4.1), for the different methods compared in the simulation study.

In Table 2 we can see that, as expected, the classic estimator has the smallest value of $\mathrm{TDE}(\hat{\Delta})$, and the values for MM, MCD, PP-SPM and RAR are close to those of the classical method. PP-MCD shows the largest value of $\mathrm{TDE}(\hat{\Delta})$,

Table 3: Square root of the trimmed mean squared error of each canonical correlation (RTMSE) obtained from uncontaminated samples, for each method (Classic, MM-S, MM-τ, MM-RV, MCD, PP-MCD, PP-SPM, RAR).

Figure 1: Trimmed mean of the determinants of the error estimation matrices (TDE) for samples of size n = 100 with 10% contamination.

which is much larger than that of the classical estimator. The MM method is the robust method with the lowest value of $\mathrm{TDE}(\hat{\Delta})$.

Table 3 shows the square roots of the trimmed mean squared errors (RTMSE) for each of the estimators of the canonical correlations. Again, the best performance corresponds to the classic method. MM-τ outperforms all other robust estimators for all correlations, with values very close to those of the classical one. MCD has an acceptable overall performance, followed by MM-RV, MM-S and RAR. PP-MCD and PP-SPM have in general values of RTMSE remarkably larger than those of the other estimators.

We now deal with the case of contaminated samples. Figures 1, 2, 3 and 4 show the values of the performance measures for the different methods under 10% contamination. Tables 4 and 5 show the maximum values of these measures, corresponding to Figure 1 and to Figures 2(b) and 3, respectively.

In Figure 1, which shows the values of the performance measure for the

Figure 2: Trimmed mean squared errors (TMSE) of the estimation of the maximum canonical correlation, $\hat{c}_1$, for samples of size n = 100 with 10% contamination. (a) shows the results for all methods and (b) for the methods with the five smallest maximum TMSE.

estimates of the canonical coordinates (TDE), we observe that MM has the smallest maximum value of TDE. Moreover, for each value of $m$, the values of TDE corresponding to MM are smaller than those corresponding to the other robust methods. As would be expected, the values of TDE for the classic method increase with $m$, attaining very large values. The maximum values of TDE for each robust method in Figure 1 are summarized in Table 4.

Figure 3: Trimmed mean squared errors (TMSE) of the estimation of the second canonical correlation, $\hat{c}_2$, for samples of size n = 100 with 10% contamination, with the PP-methods omitted for reasons of scale.

Table 4: Maximum TDEs of the robust estimators (MM, MCD, PP-MCD, PP-SPM, RAR) in Figure 1.

In Figure 2(a), corresponding to the TMSEs of the estimators of the first correlation, we can see that PP-MCD, PP-SPM and the classic method have the largest maximum values of $\mathrm{TMSE}(\hat{c}_1)$. Figure 2(b) shows the same values, with the classic estimator, PP-MCD and PP-SPM omitted to improve viewing. The respective maximum values are given in the first row of Table 5. MM-S and MM-RV have similar performances, with the former slightly better; both perform better than MM-τ. The maximum TMSE of RAR is only slightly larger than those of any of the MMs, but its curve shows substantially larger values for $m > 4$. MCD shows a sharply redescending behavior, but its maximum TMSE is the largest of this subset of estimators.

Figure 3 shows the TMSEs corresponding to the estimation of the second canonical correlation; the maximum values for the robust estimators are given in the second row of Table 5. PP-MCD and PP-SPM were omitted from the figure for reasons of scale; their values of $\mathrm{TMSE}(\hat{c}_2)$ are even larger than those of the classic method. The values for the classic and RAR methods clearly increase with $m$. MM-τ shows by far the best performance, followed by MM-RV and MM-S. As in the former figure, MCD is competitive with the other estimators

Figure 4: Trimmed mean squared errors (TMSE) of the estimation of (a) the third canonical correlation, $\hat{c}_3$, and (b) the fourth canonical correlation, $\hat{c}_4$, for samples of size n = 100 with 10% contamination. In (b) PP-SPM was omitted for reasons of scale.

Table 5: Maximum TMSEs (×10⁴) of the robust estimators (MM-S, MM-τ, MM-RV, MCD, PP-MCD, PP-SPM, RAR) in Figures 2(a) and 3.

for some values of $m$, but its maximum TMSE is large. The classic and RAR methods attain very large values of $\mathrm{TMSE}(\hat{c}_2)$ for large $m$. MM-τ has by far the best overall performance, followed by MM-RV, MM-S and MCD. The second row of Table 5 shows the maximum $\mathrm{TMSE}(\hat{c}_2)$'s of the robust methods in Figure 3. It is seen that the smallest maximum $\mathrm{TMSE}(\hat{c}_2)$ is attained by MM-τ, and the largest one by RAR.

Figure 4 shows the TMSE of the estimated canonical correlations $\hat{c}_3$ (a) and $\hat{c}_4$ (b). In the plot corresponding to $\mathrm{TMSE}(\hat{c}_4)$, the results of PP-SPM were omitted since all of them are far larger than those shown. For both correlations, MM-τ and the classic method have the best performances. This unexpectedly good behavior of the classic method occurs because, for reasonable values of $m$, the contamination introduced in the simulation affects only the estimation of the two largest canonical correlations (compare Figures 4(a) and 4(b) with the third and fourth rows of Table 3).

5. An example with real data

In this section we analyze a dataset corresponding to electron-probe X-ray microanalysis of archeological glass vessels (Janssens et al., 1998). For each of n = 180 vessels we have a spectrum on 1920 frequencies and the contents of 13 chemical compounds. In order to limit the size of our dataset, we employed seven compounds, MgO, Al₂O₃, SiO₂, P₂O₅, Fe₂O₃, BaO and PbO, and we chose 12

equispaced frequencies between 100 and 400. This interval was chosen because the values of $x_{ij}$ are almost null for frequencies below 100 and above 400. We therefore have p = 12 and q = 7.

We computed three estimates of the canonical coordinates: classic, MM and MCD. The predictive performances of the estimated canonical coordinates of these methods are compared through five-fold cross validation, employing two measures of prediction error, as follows.

1. From the cross-validation process we have five testing samples, associated with five training samples. With each training sample we estimate the canonical coordinates using the classic, MM and MCD methods, with $(\mu_y, \mu_x)$ and $\mathrm{Cov}(z)$, defined in (1.1), replaced respectively by the sample mean and covariance matrix and by the MM- and MCD-estimators of multivariate location and scatter.

2. For each set of estimates obtained from a training sample we compute $\hat{B}$ defined as
$$\hat{B} = \hat{C}\hat{\Delta}\left(\hat{\Delta}'\hat{V}\hat{\Delta}\right)^{-1}\hat{\Delta}',$$
where $\hat{\Delta}$ is the matrix of the estimated canonical coordinates and $\hat{C}$ and $\hat{V}$ are submatrices (see decomposition (1.1)) of the estimate of $\mathrm{Cov}(z)$. For the observations in the corresponding testing sample, we compute the squared norm of the prediction error $e_i = y_i - \hat{\mu}_y - \hat{B}(x_i - \hat{\mu}_x)$.

3. Finally, let MSPE be the mean of the squared Euclidean norms of the prediction errors and TMSPE the mean with the upper 10% of the squared norms trimmed off.

Table 6: Square root of the mean of the squared norms of the prediction errors (RMSPE) and square root of the trimmed mean of the squared norms of the prediction errors (RTMSPE), computed by cross validation, for the classic, MM and MCD methods.

Table 7: Absolute values of the estimated canonical correlations for the MM-S, MM-τ, MM-RV, classic and MCD methods.

Table 6 shows the square roots of the values of MSPE and TMSPE (denoted by RMSPE and RTMSPE) obtained using the classic, MM and MCD methods
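The cross-validation loop of steps 1-3 can be sketched as follows for the classical (sample-covariance) version only; the fold construction, seed and names are ours, and the robust variants would simply swap in robust estimates of location and Cov(z):

```python
import numpy as np

def cv_rmspe(Y, X, K, n_folds=5, trim=0.10, seed=0):
    """Cross-validated prediction error of a rank-K canonical-coordinate predictor.
    Classical (sample covariance) version; returns (RMSPE, RTMSPE)."""
    n, q = Y.shape
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, n_folds)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        Yt, Xt = Y[train], X[train]
        mu_y, mu_x = Yt.mean(axis=0), Xt.mean(axis=0)
        S = np.cov(np.hstack([Yt - mu_y, Xt - mu_x]), rowvar=False)
        W, C, V = S[:q, :q], S[:q, q:], S[q:, q:]
        # canonical coordinates of x: leading eigenvectors of V^{-1} C' W^{-1} C
        M = np.linalg.solve(V, C.T) @ np.linalg.solve(W, C)
        eigvals, eigvecs = np.linalg.eig(M)
        D = eigvecs[:, np.argsort(-eigvals.real)[:K]].real      # Delta-hat
        B = C @ D @ np.linalg.solve(D.T @ V @ D, D.T)           # predictor matrix B-hat
        for i in f:                                             # step 2: test errors
            e = Y[i] - mu_y - B @ (X[i] - mu_x)
            errs.append(e @ e)
    errs = np.sort(np.array(errs))                              # step 3: (T)MSPE
    keep = len(errs) - int(np.floor(trim * len(errs)))
    return np.sqrt(errs.mean()), np.sqrt(errs[:keep].mean())
```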

Figure 5: QQ-plots of the sorted norms of the prediction errors corresponding to the MM method vs. the sorted norms of the prediction errors corresponding to (a) the classic method and (b) the MCD method.

to estimate the canonical coordinates. The MM method is much better than the other methods in terms of both RMSPE and RTMSPE, and the classic method is better than MCD. Figure 5 shows the QQ-plots of the sorted norms of the prediction errors corresponding to the MM method vs. those corresponding to the classic and MCD methods. It is seen that for the majority of the data the prediction errors corresponding to the MM method are smaller than those of the classic and MCD methods.

Table 7 shows the values of the canonical correlations estimated through the classic, MM and MCD methods. It is seen that the correlations estimated by the methods MM-S, MM-τ and MM-RV are smaller than those corresponding to the classic and MCD methods.

6. Conclusions

We may summarize the results of the simulation as follows. For the estimation of the canonical coordinates, the MM approach outperforms its competitors in terms of both efficiency and robustness. MCD, PP-MCD, PP-SPM and RAR have similar behaviors, except that PP-MCD is rather inefficient.

For the estimation of the canonical correlations, MM-τ shows the best global behavior with respect to efficiency and both versions of PP show the worst, while RAR and MCD yield intermediate, acceptable results, followed by the other two

versions of MM. As regards robustness, MM-τ again shows the best global behavior, followed by MM-RV and MM-S; both versions of PP are the worst, while MCD and RAR yield intermediate values. Therefore the simulations suggest MM-τ as the estimator of choice when one is interested in a predictive approach. The example with real data also shows a clear superiority of MM over the classic and MCD approaches.

Acknowledgements: This research was partially supported by grants PIP 216 from CONICET and PICT from ANPCyT, Argentina.

References

[1] Bai, Z., Chen, X.R., Wu, Y., 2008. On constrained M-estimation and its recursive analog in multivariate linear regression models. Statistica Sinica, 18.

[2] Bilodeau, M., Duchesne, P., 2000. Robust estimation of the SUR model. Canad. J. Statist., 28.

[3] Branco, J.A., Croux, C., Filzmoser, P., Oliveira, M.R., 2005. Robust canonical correlations: a comparative study. Computational Statistics, 20(2).

[4] Breiman, L., Friedman, J.H., 1997. Predicting multivariate responses in multiple linear regression. J. R. Statist. Soc. B, 59(1).

[5] Brillinger, D.R., 1975. Time Series: Data Analysis and Theory. Holt, Rinehart and Winston, New York.

[6] Croux, C., Dehon, C., 2002. Analyse canonique basée sur des estimateurs robustes de la matrice de covariance. La Revue de Statistique Appliquée, 2.

[7] Croux, C., Dehon, C., 2003. Estimators of the multiple correlation coefficient: local robustness and confidence intervals. Statistical Papers, 44.

[8] Das, S., Sen, P.K., 1998. Canonical correlations. In: P. Armitage and T. Colton (eds.), Encyclopedia of Biostatistics, Vol. 1, Wiley, New York.

[9] Filzmoser, P., Dehon, C., Croux, C., 2000. Outlier resistant estimators for canonical correlation analysis. In: J. G. Bethlehem and P. G. M. van der Heijden (eds.), COMPSTAT: Proceedings in Computational Statistics, Physica-Verlag, Heidelberg.

[10] García Ben, M., Martínez, E., Yohai, V.J., 2006. Robust estimation for the multivariate linear model based on a τ-scale. Journal of Multivariate Analysis, 97.

[11] Hotelling, H., 1936. Relations between two sets of variates. Biometrika, 28, pp.

[12] Huber, P.J., Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1), pp.

[13] Huber, P.J., Robust Statistics. Wiley, New York.

[14] Izenman, A.J., 1975. Reduced-rank regression for the multivariate linear model. J. Multiv. Anal., 5, pp.

[15] Janssens, K., Deraedt, I., Freddy, A., Veekman, J., Composition of th century archaeological glass vessels excavated in Antwerp, Belgium. Mikrochimica Acta, 15 (Suppl.), pp.

[16] Karnel, G., Robust canonical correlation and correspondence analysis. In: The Frontiers of Statistical Scientific and Industrial Applications (Volume II of the proceedings of ICOSCO-I, The First International Conference on Statistical Computing). American Sciences Press, Strassbourg, pp.

[17] Kent, J.T., Tyler, D.E., Constrained M-estimation for multivariate location and scatter. The Annals of Statistics, 24, pp.

[18] Kudraszow, N., Maronna, R., Estimates of MM type for the multivariate linear model. Journal of Multivariate Analysis, in press. DOI: /j.jmva

[19] Lopuhaä, H.P., Multivariate τ-estimators for location and scatter. Canadian Journal of Statistics, 19, pp.

[20] Lopuhaä, H.P., Highly efficient estimators of multivariate location with high breakdown point. The Annals of Statistics, 20, pp.

[21] Maronna, R.A., Martin, R.D., Yohai, V.J., Robust Statistics: Theory and Methods. John Wiley and Sons, New York.

[22] Renaud, O., Victoria-Feser, M.-P., A robust coefficient of determination for regression. Journal of Statistical Planning and Inference, 140(7), pp.

[23] Romanazzi, M., Influence in canonical correlation analysis. Psychometrika, 57, pp.

[24] Rousseeuw, P., Multivariate estimators with high breakdown point. In: W. Grossman, G. Pflug, I. Vincze, W. Wertz (eds.), Mathematical Statistics and its Applications, Vol. B, Reidel, Dordrecht, The Netherlands, pp.

[25] Salibián-Barrera, M., Van Aelst, S., Willems, G., PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101, pp.

[26] Seber, G.A.F., 1984. Multivariate Observations. John Wiley & Sons, New York.

[27] Taskinen, S., Croux, C., Kankainen, A., Ollila, E., Oja, H., Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and shape matrices. Journal of Multivariate Analysis, 97(2), pp.

[28] Tatsuoka, K.S., Tyler, D.E., The uniqueness of S and M-functionals under non-elliptical distributions. The Annals of Statistics, 28, pp.

[29] Van Aelst, S., Willems, G., Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, pp.

[30] Wold, H., Nonlinear estimation by iterative least squares procedures. In: F.N. David (ed.), A Festschrift for J. Neyman, Wiley, New York, pp.

[31] Yohai, V.J., High breakdown-point and high efficiency robust estimates for regression. The Annals of Statistics, 15(2), pp.

[32] Yohai, V.J., García Ben, M., Canonical variables as optimal predictors. The Annals of Statistics, 8(4), pp.


More information

Outlier Detection via Feature Selection Algorithms in

Outlier Detection via Feature Selection Algorithms in Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS032) p.4638 Outlier Detection via Feature Selection Algorithms in Covariance Estimation Menjoge, Rajiv S. M.I.T.,

More information

A Modified M-estimator for the Detection of Outliers

A Modified M-estimator for the Detection of Outliers A Modified M-estimator for the Detection of Outliers Asad Ali Department of Statistics, University of Peshawar NWFP, Pakistan Email: asad_yousafzay@yahoo.com Muhammad F. Qadir Department of Statistics,

More information

Inference After Variable Selection

Inference After Variable Selection Department of Mathematics, SIU Carbondale Inference After Variable Selection Lasanthi Pelawa Watagoda lasanthi@siu.edu June 12, 2017 Outline 1 Introduction 2 Inference For Ridge and Lasso 3 Variable Selection

More information

Variable selection in multiple regression with random design

Variable selection in multiple regression with random design Variable selection in multiple regression with random design arxiv:1506.08022v1 [math.st] 26 Jun 2015 Alban Mbina Mbina 1, Guy Martial Nkiet 2, Assi Nguessan 3 1 URMI, Université des Sciences et Techniques

More information

MIT Spring 2015

MIT Spring 2015 Regression Analysis MIT 18.472 Dr. Kempthorne Spring 2015 1 Outline Regression Analysis 1 Regression Analysis 2 Multiple Linear Regression: Setup Data Set n cases i = 1, 2,..., n 1 Response (dependent)

More information

Independent component analysis for functional data

Independent component analysis for functional data Independent component analysis for functional data Hannu Oja Department of Mathematics and Statistics University of Turku Version 12.8.216 August 216 Oja (UTU) FICA Date bottom 1 / 38 Outline 1 Probability

More information

1 Cricket chirps: an example

1 Cricket chirps: an example Notes for 2016-09-26 1 Cricket chirps: an example Did you know that you can estimate the temperature by listening to the rate of chirps? The data set in Table 1 1. represents measurements of the number

More information

Minimum Covariance Determinant and Extensions

Minimum Covariance Determinant and Extensions Minimum Covariance Determinant and Extensions Mia Hubert, Michiel Debruyne, Peter J. Rousseeuw September 22, 2017 arxiv:1709.07045v1 [stat.me] 20 Sep 2017 Abstract The Minimum Covariance Determinant (MCD)

More information

Robust variable selection through MAVE

Robust variable selection through MAVE This is the author s final, peer-reviewed manuscript as accepted for publication. The publisher-formatted version may be available through the publisher s web site or your institution s library. Robust

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini April 27, 2018 1 / 1 Table of Contents 2 / 1 Linear Algebra Review Read 3.1 and 3.2 from text. 1. Fundamental subspace (rank-nullity, etc.) Im(X ) = ker(x T ) R

More information