Generalizing Distance Covariance to Measure and Test Multivariate Mutual Dependence


arXiv: v5 [math.ST] 25 Feb 2018

Ze Jin, David S. Matteson

February 27, 2018

Abstract

We propose three new measures of mutual dependence between multiple random vectors. Each measure is zero if and only if the random vectors are mutually independent. The first generalizes distance covariance from pairwise dependence to mutual dependence, while the other two measures are sums of squared distance covariances. The proposed measures share similar properties and asymptotic distributions with distance covariance, and capture non-linear and non-monotone mutual dependence between the random vectors. Inspired by complete and incomplete V-statistics, we define empirical and simplified empirical measures as a trade-off between complexity and statistical power when testing mutual independence. The implementation of the corresponding tests is demonstrated by both simulation results and real data examples.

Key words: characteristic functions; distance covariance; multivariate analysis; mutual independence; V-statistics

Research support from an NSF Award (DMS ), a Xerox PARC Faculty Research Award, and the Cornell University Atkinson Center for a Sustainable Future (AVF-2017).

1 Introduction

Let X = (X_1, ..., X_d) be a set of variables where each component X_j, j = 1, ..., d is a continuous random vector, and let X = {X^k = (X_1^k, ..., X_d^k) : k = 1, ..., n} be an i.i.d. sample from F_X, the joint distribution of X. We are interested in testing the hypothesis

H_0: X_1, ..., X_d are mutually independent, vs. H_A: X_1, ..., X_d are dependent,

which has many applications, including independent component analysis (Matteson and Tsay, 2017), graphical models (Fan et al., 2015), naive Bayes classifiers (Tibshirani et al., 2002), etc. This problem has been studied under different settings and assumptions, including pairwise (d = 2) and mutual (d ≥ 2) independence, univariate (X_1, ..., X_d ∈ R^1) and multivariate (X_1 ∈ R^{p_1}, ..., X_d ∈ R^{p_d}) components, and more. Specifically, we focus on the general case in which X_1, ..., X_d are not assumed jointly normal.

The most extensively studied case is pairwise independence with univariate components (X_1, X_2 ∈ R^1): rank correlation is considered a non-parametric counterpart to Pearson's product-moment correlation (Pearson, 1895), including Kendall's τ (Kendall, 1938), Spearman's ρ (Spearman, 1904), etc. Bergsma and Dassios (2014) proposed a test based on an extension of Kendall's τ, testing a condition equivalent to H_0. Additionally, Hoeffding (1948) proposed a non-parametric test based on marginal and joint distribution functions, testing a necessary condition of H_0.

For pairwise independence with multivariate components (X_1 ∈ R^{p_1}, X_2 ∈ R^{p_2}): Székely et al. (2007) and Székely and Rizzo (2009) proposed a test based on distance covariance with fixed p_1, p_2 and n → ∞, testing a condition equivalent to H_0. Further, Székely and Rizzo (2013a) proposed a t-test based on a modified distance covariance for the setting in which n is finite and p_1, p_2 → ∞, also testing a condition equivalent to H_0.

For mutual independence with univariate components (X_1, ..., X_d ∈ R^1): one natural

way to extend the pairwise rank correlation to multiple components is to collect the rank correlations between all pairs of components, and examine a norm (L_2, L_∞) of this collection. Leung and Drton (2015) proposed a test based on the L_2 norm with n, d → ∞ and d/n → γ ∈ (0, ∞), and Han and Liu (2014) proposed a test based on the L_∞ norm with n, d → ∞ and d/n → γ ∈ [0, ∞]. Each tests a necessary condition of H_0, in general.

For mutual independence with multivariate components (X_1 ∈ R^{p_1}, ..., X_d ∈ R^{p_d}): this challenging scenario has not been well studied. Yao et al. (2016) proposed a test based on the distance covariances between all pairs of components with n, d → ∞, testing a necessary condition of H_0. Inspired by distance covariance in Székely et al. (2007), in this paper we propose a new test based on measures of mutual dependence with fixed d, p_1, ..., p_d and n → ∞, testing a condition equivalent to H_0. All computational complexities in this paper make no reference to the dimensions d, p_1, ..., p_d, as they are treated as constants.

Our measures of mutual dependence involve V-statistics, and are 0 if and only if mutual independence holds. They belong to energy statistics (Székely and Rizzo, 2013b), and share many statistical properties with distance covariance. In addition, Pfister et al. (2016) proposed the d-variable Hilbert-Schmidt independence criterion (dHSIC) under the same setting, which originates from HSIC (Gretton et al., 2005), and is also 0 if and only if mutual independence holds. Although dHSIC involves V-statistics as well, they pursue kernel methods and overcome the computational bottleneck by resampling and a Gamma approximation, while we take advantage of characteristic functions and resort to incomplete V-statistics.
The weakness of testing mutual independence by a necessary condition, namely all pairwise independencies, motivates our work on measures of mutual dependence, as demonstrated by the examples in section 5: if we directly test mutual independence based on the measures of mutual dependence proposed in this paper, we successfully detect mutual dependence. Alternatively, if we check all pairwise independencies based on distance covariance, we fail to detect any pairwise dependence, and mistakenly conclude that mutual independence holds,

probably because the mutual effect averages out when we narrow down to a pair.

The rest of this paper is organized as follows. In section 2, we give a brief overview of distance covariance. In section 3, we generalize distance covariance to the complete measure of mutual dependence, with its properties and asymptotic distributions derived. In section 4, we propose asymmetric and symmetric measures of mutual dependence, defined as sums of squared distance covariances. We present synthetic and real data analysis in section 5, followed by simulation results in section 6.^1 Finally, section 7 is a summary of our work. All proofs have been moved to the appendix.

The following notation will be used throughout this paper. Let (·, ·, ..., ·) denote a concatenation of (vector) components into a vector. Let t = (t_1, ..., t_d), t^0 = (t_1^0, ..., t_d^0), X = (X_1, ..., X_d) ∈ R^p, where t_j, t_j^0, X_j ∈ R^{p_j}, such that p_j is the marginal dimension, j = 1, ..., d, and p = Σ_{j=1}^d p_j is the total dimension. The assumed X under H_0 is denoted by X̃ = (X̃_1, ..., X̃_d), where X̃_j =_D X_j, j = 1, ..., d, X̃_1, ..., X̃_d are mutually independent, and X, X̃ are independent. Let X', X'' be independent copies of X, i.e., X, X', X'' i.i.d. F_X, and let X̃', X̃'' be independent copies of X̃, i.e., X̃, X̃', X̃'' i.i.d. F_X̃.

Let the weighted L_2 norm ||·||_w of a complex-valued function η(t) be defined by ||η(t)||_w^2 = ∫_{R^p} |η(t)|^2 w(t) dt, where |η(t)|^2 = η(t)η̄(t), η̄(t) is the complex conjugate of η(t), and w(t) is any positive weight function for which the integral exists. Given the i.i.d. sample X from F_X, let X_j = {X_j^k : k = 1, ..., n} denote the corresponding i.i.d. sample from F_{X_j}, j = 1, ..., d, such that X = {X_1, ..., X_d}. Denote the joint characteristic functions of X and X̃ as φ_X(t) = E[e^{i⟨t,X⟩}] and φ_X̃(t) = Π_{j=1}^d E[e^{i⟨t_j,X_j⟩}], and denote the empirical versions of φ_X(t) and φ_X̃(t) as φ_X^n(t) = (1/n) Σ_{k=1}^n e^{i⟨t,X^k⟩} and φ_X̃^n(t) = Π_{j=1}^d ((1/n) Σ_{k=1}^n e^{i⟨t_j,X_j^k⟩}).
^1 An accompanying R package EDMeasure (Jin et al., 2018) is available on CRAN.

2 Distance Covariance

Székely et al. (2007) proposed distance covariance to capture non-linear and non-monotone pairwise dependence between two random vectors (X_1 ∈ R^{p_1}, X_2 ∈ R^{p_2}). X_1, X_2 are pairwise independent if and only if φ_X(t) = φ_{X_1}(t_1) φ_{X_2}(t_2) for all t, which is equivalent to ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w(t) dt = 0 for w(t) > 0, provided the integral exists. A class of weight functions w_0(t; m) = (K(p_1; m) K(p_2; m) |t_1|^{p_1+m} |t_2|^{p_2+m})^{-1} makes the integral a finite and meaningful quantity composed of m-th moments according to Lemma 1 in Székely and Rizzo (2005), where K(q; m) = 2π^{q/2} Γ(1 − m/2) / (m 2^m Γ((q+m)/2)), and Γ is the gamma function.

The non-negative distance covariance V(X) is defined by

V^2(X) = ||φ_X(t) − φ_X̃(t)||^2_{w_0} = ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w_0(t) dt, where w_0(t) = (K_{p_1} K_{p_2} |t_1|^{p_1+1} |t_2|^{p_2+1})^{-1},   (1)

with m = 1 and K_q = K(q; 1), while any following result can be generalized to 0 < m < 2. If E|X| < ∞, then V(X) ∈ [0, ∞), and V(X) = 0 if and only if X_1, X_2 are pairwise independent.

The non-negative empirical distance covariance V_n(X) is defined by

V_n^2(X) = ||φ_X^n(t) − φ_X̃^n(t)||^2_{w_0} = ∫_{R^p} |φ_X^n(t) − φ_X̃^n(t)|^2 w_0(t) dt.

Calculating V_n^2(X) via the symmetry of Euclidean distances has time complexity O(n^2). Some asymptotic properties of V_n(X) have been derived. If E|X| < ∞, then

(i) V_n(X) →_{a.s.} V(X) as n → ∞.

(ii) Under H_0, n V_n^2(X) →_D ||ζ(t)||^2_{w_0} as n → ∞, where ζ(t) is a complex-valued Gaussian process with mean zero and covariance function R(t, t^0) = [φ_{X_1}(t_1 − t_1^0) − φ_{X_1}(t_1) φ_{X_1}(t_1^0)] [φ_{X_2}(t_2 − t_2^0) − φ_{X_2}(t_2) φ_{X_2}(t_2^0)].

(iii) Under H_A, n V_n^2(X) →_{a.s.} ∞ as n → ∞.

3 Complete Measure of Mutual Dependence

Generalizing the idea of distance covariance, we propose the complete measure of mutual dependence to capture non-linear and non-monotone mutual dependence between multiple random

vectors (X_1 ∈ R^{p_1}, ..., X_d ∈ R^{p_d}). X_1, ..., X_d are mutually independent if and only if φ_X(t) = φ_{X_1}(t_1) ··· φ_{X_d}(t_d) = φ_X̃(t) for all t, which is equivalent to ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w(t) dt = 0 for w(t) > 0, provided the integral exists. We put all components together instead of separating them, and choose the weight function

w_1(t) = (K_p |t|^{p+1})^{-1}.   (2)

Definition 1. The complete measure of mutual dependence Q(X) is defined by

Q(X) = ||φ_X(t) − φ_X̃(t)||^2_{w_1} = ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w_1(t) dt.

We can show an equivalence to mutual independence based on Q(X) according to Lemma 1 in Székely and Rizzo (2005).

Theorem 1. If E|X| < ∞, then Q(X) ∈ [0, ∞), and Q(X) = 0 if and only if X_1, ..., X_d are mutually independent. In addition, Q(X) has an interpretation in terms of expectations:

Q(X) = 2 E|X − X̃| − E|X − X'| − E|X̃ − X̃'|.

It is straightforward to estimate Q(X) by replacing the characteristic functions with the empirical characteristic functions from the sample.

Definition 2. The empirical complete measure of mutual dependence Q_n(X) is defined by

Q_n(X) = ||φ_X^n(t) − φ_X̃^n(t)||^2_{w_1} = ∫_{R^p} |φ_X^n(t) − φ_X̃^n(t)|^2 w_1(t) dt.
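Theorem 1's expectation identity suggests a direct plug-in estimate of Q(X): mimic the decoupled vector X̃ by permuting each component's sample independently, then compare average pairwise distances. The sketch below (Python with numpy) is only illustrative; the function names and the random decoupling are assumptions of this sketch, not the paper's V-statistic estimator of Lemma 1.

```python
import numpy as np

def mean_pdist(a, b):
    """Average Euclidean distance over all pairs of rows of a and b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).mean()

def energy_Q_hat(samples, rng):
    """Plug-in estimate of Q(X) = 2E|X - X~| - E|X - X'| - E|X~ - X~'|.

    samples: list of (n, p_j) arrays, one per component X_j. The decoupled
    sample (mimicking X~, whose components are mutually independent with the
    same marginals) is formed by permuting each component's rows independently.
    """
    X = np.hstack(samples)
    Xt = np.hstack([s[rng.permutation(len(s))] for s in samples])
    # Energy distance between the joint sample and its decoupled version.
    return 2 * mean_pdist(X, Xt) - mean_pdist(X, X) - mean_pdist(Xt, Xt)
```

Because this is an energy distance between two empirical distributions, the estimate is non-negative, and it is close to zero when the components are mutually independent.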

Lemma 1. Q_n(X) has an interpretation in terms of complete V-statistics:

Q_n(X) = −(1/n^2) Σ_{k,l=1}^n |X^k − X^l| + (2/n^{d+1}) Σ_{k,l_1,...,l_d=1}^n |X^k − (X_1^{l_1}, ..., X_d^{l_d})| − (1/n^{2d}) Σ_{k_1,...,k_d,l_1,...,l_d=1}^n |(X_1^{k_1}, ..., X_d^{k_d}) − (X_1^{l_1}, ..., X_d^{l_d})|,

whose naive implementation has time complexity O(n^{2d}).

In view of the definition of distance covariance, it may seem natural to define the measure using the weight function

w_2(t) = (K_{p_1} ··· K_{p_d} |t_1|^{p_1+1} ··· |t_d|^{p_d+1})^{-1},   (3)

which equals w_0(t) when d = 2. Given the weight function w_2(t), we can define the squared distance covariance of mutual dependence U(X) = ||φ_X(t) − φ_X̃(t)||^2_{w_2} and its empirical counterpart U_n(X) = ||φ_X^n(t) − φ_X̃^n(t)||^2_{w_2}, which equal V^2(X) and V_n^2(X) when d = 2. The naive implementation of U_n(X) has time complexity O(n^{d+1}).

The reason to favor w_1(t) over w_2(t) is a trade-off between the moment condition and time complexity. We often cannot afford the time complexity of Q_n(X) or U_n(X), and have to simplify them through incomplete V-statistics. An incomplete V-statistic is obtained by sampling the terms of a complete V-statistic, where the summation extends over only a subset of the tuples of indices. To simplify by replacing complete V-statistics with incomplete V-statistics, U_n(X) requires the additional d-th moment condition E|X_1| ··· |X_d| < ∞, while Q_n(X) requires no condition beyond the first moment condition E|X| < ∞. Thus, we can reduce the complexity of Q_n(X) to O(n^2) under a weaker condition, which makes Q(X) and Q_n(X) from w_1(t) a more general solution. Moreover, we define the

simplified empirical version of φ_X̃(t) as

φ_X̃^{*n}(t) = (1/n) Σ_{k=1}^n e^{i Σ_{j=1}^d ⟨t_j, X_j^{k+j−1}⟩} = (1/n) Σ_{k=1}^n e^{i⟨t, (X_1^k, ..., X_d^{k+d−1})⟩},

in order to substitute it for φ_X̃^n(t) as a simplification, where X_j^{n+k} is interpreted as X_j^k for k > 0.

Definition 3. The simplified empirical complete measure of mutual dependence Q_n^*(X) is defined by

Q_n^*(X) = ||φ_X^n(t) − φ_X̃^{*n}(t)||^2_{w_1} = ∫_{R^p} |φ_X^n(t) − φ_X̃^{*n}(t)|^2 w_1(t) dt.

Lemma 2. Q_n^*(X) has an interpretation in terms of incomplete V-statistics:

Q_n^*(X) = −(1/n^2) Σ_{k,l=1}^n |X^k − X^l| + (2/n^2) Σ_{k,l=1}^n |X^k − (X_1^l, ..., X_d^{l+d−1})| − (1/n^2) Σ_{k,l=1}^n |(X_1^k, ..., X_d^{k+d−1}) − (X_1^l, ..., X_d^{l+d−1})|,

whose naive implementation has time complexity O(n^2).

Using a similar derivation to Theorems 2 and 5 of Székely et al. (2007), the asymptotic distributions of Q_n(X), Q_n^*(X) are obtained as follows.

Theorem 2. If E|X| < ∞, then Q_n(X) →_{a.s.} Q(X) and Q_n^*(X) →_{a.s.} Q(X) as n → ∞.

Theorem 3. If E|X| < ∞, then under H_0, we have n Q_n(X) →_D ||ζ(t)||^2_{w_1} and n Q_n^*(X) →_D ||ζ^*(t)||^2_{w_1} as n → ∞, where ζ(t), ζ^*(t) are complex-valued Gaussian processes with mean zero and covariance functions

R(t, t^0) = Π_{j=1}^d φ_{X_j}(t_j − t_j^0) + (d − 1) Π_{j=1}^d φ_{X_j}(t_j) φ_{X_j}(t_j^0) − Σ_{j=1}^d [φ_{X_j}(t_j − t_j^0) Π_{l≠j} φ_{X_l}(t_l) φ_{X_l}(t_l^0)],

R^*(t, t^0) = 2 R(t, t^0).

Under H_A, we have n Q_n(X) →_{a.s.} ∞ and n Q_n^*(X) →_{a.s.} ∞ as n → ∞.

Therefore, a mutual independence test can be proposed based on the weak convergence of n Q_n(X), n Q_n^*(X) in Theorem 3. Since the asymptotic distributions of n Q_n(X), n Q_n^*(X) depend on F_X, a permutation procedure is used to approximate them in practice.

4 Asymmetric and Symmetric Measures of Mutual Dependence

As an alternative, we now propose asymmetric and symmetric measures of mutual dependence, which capture mutual dependence by aggregating pairwise dependencies. The subset of components to the right of X_c is denoted by X_{c+} = (X_{c+1}, ..., X_d), with t_{c+} = (t_{c+1}, ..., t_d), c = 0, 1, ..., d − 1. The subset of components excluding X_c is denoted by X_{−c} = (X_1, ..., X_{c−1}, X_{c+}), with t_{−c} = (t_1, ..., t_{c−1}, t_{c+}), c = 1, ..., d. We denote pairwise independence by ⊥.

The collection of pairwise independencies implied by mutual independence includes one versus the others on the right,

{X_1 ⊥ X_{1+}, X_2 ⊥ X_{2+}, ..., X_{d−1} ⊥ X_d},   (4)

one versus all the others,

{X_1 ⊥ X_{−1}, X_2 ⊥ X_{−2}, ..., X_d ⊥ X_{−d}},   (5)

and many others, e.g., (X_1, X_2) ⊥ X_{2+}. In fact, the number of pairwise independencies implied by mutual independence is at least 2^{d−1} − 1, which grows exponentially with the number of components d. Therefore, we cannot test mutual independence simply by checking all pairwise independencies, even for moderate d. Fortunately, we have two options for testing only a small subset of all pairwise independencies to fulfill the task.

The first is that H_0 holds if and only if (4) holds, which can be verified via the sequential decomposition of distribution functions. This option is asymmetric and not unique, having d! feasible subsets with respect to different orderings of X_1, ..., X_d. The second is that H_0 holds if and only if (5) holds, which can be verified via the stepwise decomposition of distribution functions and the fact that X_j ⊥ X_{−j} implies X_j ⊥ X_{j+}. This option is symmetric and unique, having only one feasible subset.

To shed light on why these two options are necessary and sufficient conditions for mutual independence, we present the following inequality, which bounds the mutual dependence by a sum of pairwise dependencies:

|φ_X(t) − Π_{j=1}^d φ_{X_j}(t_j)|^2 ≤ (d − 1) Σ_{c=1}^{d−1} |φ_{(X_c, X_{c+})}((t_c, t_{c+})) − φ_{X_c}(t_c) φ_{X_{c+}}(t_{c+})|^2.

In consideration of these two options, we test a set of pairwise independencies in place of mutual independence, where we use V^2(X) to test pairwise independence.

Definition 4. The asymmetric and symmetric measures of mutual dependence R(X), S(X)

are defined by

R(X) = Σ_{c=1}^{d−1} V^2((X_c, X_{c+})) and S(X) = Σ_{c=1}^d V^2((X_c, X_{−c})).

We can show an equivalence to mutual independence based on R(X), S(X) according to Theorem 3 of Székely et al. (2007).

Theorem 4. If E|X| < ∞, then R(X), S(X) ∈ [0, ∞), and R(X) = S(X) = 0 if and only if X_1, ..., X_d are mutually independent.

It is straightforward to estimate R(X), S(X) by replacing the characteristic functions with the empirical characteristic functions from the sample.

Definition 5. The empirical asymmetric and symmetric measures of mutual dependence R_n(X), S_n(X) are defined by

R_n(X) = Σ_{c=1}^{d−1} V_n^2((X_c, X_{c+})) and S_n(X) = Σ_{c=1}^d V_n^2((X_c, X_{−c})).

The implementations of R_n(X), S_n(X) have time complexity O(n^2). Using a similar derivation to Theorems 2 and 5 of Székely et al. (2007), some asymptotic properties of R_n(X), S_n(X) are obtained as follows.

Theorem 5. If E|X| < ∞, then R_n(X) →_{a.s.} R(X) and S_n(X) →_{a.s.} S(X) as n → ∞.

Theorem 6. If E|X| < ∞, then under H_0, we have

n R_n(X) →_D Σ_{j=1}^{d−1} ||ζ_j^R((t_j, t_{j+}))||^2_{w_0} and n S_n(X) →_D Σ_{j=1}^d ||ζ_j^S((t_j, t_{−j}))||^2_{w_0} as n → ∞,

where ζ_j^R((t_j, t_{j+})), ζ_j^S((t_j, t_{−j})) are complex-valued Gaussian processes corresponding to the limiting distributions of n V_n^2((X_j, X_{j+})), n V_n^2((X_j, X_{−j})). Under H_A, we have n R_n(X) →_{a.s.} ∞ and n S_n(X) →_{a.s.} ∞ as n → ∞.

It is surprising to find that V_n^2((X_c, X_{c+})), c = 1, ..., d − 1 are asymptotically mutually independent, and that V_n^2((X_c, X_{−c})), c = 1, ..., d are asymptotically mutually independent as well, which is a crucial discovery behind Theorem 6.

Alternatively, we can plug in Q(X) instead of V^2(X) in Definition 4 and Q_n(X) instead of V_n^2(X) in Definition 5, and define the asymmetric and symmetric measures J(X), I(X) and their empirical versions J_n(X), I_n(X) accordingly, which equal Q(X), Q_n(X) when d = 2. The naive implementations of J_n(X), I_n(X) have time complexity O(n^4). Similarly, we can replace Q_n(X) with Q_n^*(X) to simplify them, and define the simplified empirical asymmetric and symmetric measures J_n^*(X), I_n^*(X), reducing their complexities to O(n^2) with no condition beyond the first moment condition E|X| < ∞. Through the same derivations, we can show that J_n(X), J_n^*(X), I_n(X), I_n^*(X) have convergences similar to those of R_n(X), S_n(X) in Theorems 5 and 6.

5 Illustrative Examples

We start with two examples comparing different methods to show the value of our mutual independence tests. In practice, people usually check all pairwise dependencies to test mutual independence, due to the lack of reliable and universal mutual independence tests. It is then very likely to miss a complicated mutual dependence structure, and to make unsound decisions in corresponding applications by assuming that mutual independence holds.
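Before turning to the examples, note that the empirical measures R_n, S_n of Definition 5 reduce to sums of ordinary squared distance covariances over concatenated components. A minimal sketch in Python/numpy, using the O(n^2) double-centering formula for V_n^2 from Székely et al. (2007) (the function names are illustrative):

```python
import numpy as np

def dcov_sq(x, y):
    """Squared empirical distance covariance V_n^2 for (n, p1), (n, p2) samples,
    via the O(n^2) double-centering formula of Szekely et al. (2007)."""
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)  # pairwise distances
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()          # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()

def R_n(samples):
    """Asymmetric measure: sum over c of V_n^2((X_c, X_{c+}))."""
    return sum(dcov_sq(samples[c], np.hstack(samples[c + 1:]))
               for c in range(len(samples) - 1))

def S_n(samples):
    """Symmetric measure: sum over c of V_n^2((X_c, X_{-c})).
    samples must be a plain list of (n, p_j) arrays."""
    return sum(dcov_sq(samples[c], np.hstack(samples[:c] + samples[c + 1:]))
               for c in range(len(samples)))
```

Each term is a non-negative V-statistic, so R_n and S_n are non-negative, in line with Theorem 4.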

5.1 Synthetic Data

We define a triplet of random vectors (X, Y, Z) on R^q × R^q × R^q, where X, Y ~ N(0, I_q), W ~ Exp(1/√2), the first element of Z is Z_1 = sign(X_1 Y_1) W, the remaining q − 1 elements are Z_{2:q} ~ N(0, I_{q−1}), and X, Y, W, Z_{2:q} are mutually independent. Clearly, (X, Y, Z) is a pairwise independent but mutually dependent triplet. An i.i.d. sample of (X, Y, Z) is randomly generated with sample size n = 500 and dimension q = 5.

On the one hand, we test the null hypothesis H_0: X, Y, Z are mutually independent using the proposed measures R_n, S_n, Q_n^*, J_n^*, I_n^*. On the other hand, we test the null hypotheses H_0^(1): X ⊥ Y, H_0^(2): Y ⊥ Z, and H_0^(3): X ⊥ Z using distance covariance V_n^2. An adaptive permutation size B = 210 is used for all tests. As expected, mutual dependence is successfully captured, as the p-values of the mutual independence tests are (Q_n^*), (J_n^*), 0 (I_n^*), (R_n) and 0 (S_n). Meanwhile, the p-values of the pairwise independence tests are (X, Y), (Y, Z), and (X, Z). According to the Bonferroni correction for multiple tests among all the pairs, the significance level should be adjusted to α/3 for the pairwise tests. As a result, no signal of pairwise dependence is detected, and we cannot reject mutual independence.

5.2 Financial Data

We collect the annual Fama/French 5 factors^2 for the past 52 years, between 1964 and 2015. In particular, we are interested in whether mutual dependence exists among three factors: X = Mkt-RF (excess return on the market), Y = SMB (small minus big), and Z = RF (risk-free return), where annual returns are considered as nearly independent observations. Both histograms and pair plots of X, Y, Z are depicted in Figure 1.

For one, we apply a single mutual independence test H_0: X, Y, Z are mutually independent. For another, we apply three pairwise independence tests H_0^(1): X ⊥ Y, H_0^(2): Y ⊥ Z, and H_0^(3): X ⊥ Z. An adaptive permutation size B = 296 is used for all tests. The p-values of the mutual independence tests are (Q_n^*), (J_n^*), (I_n^*), (R_n) and (S_n), indicating that mutual dependence is successfully captured. Meanwhile, the p-values of the pairwise independence tests using distance covariance V_n^2 are (X, Y), (Y, Z) and (X, Z). Similarly, the significance level should be adjusted to α/3 according to the Bonferroni correction, and thus we cannot reject mutual independence, since no signal of pairwise dependence is detected.

^2 Data at library.html.
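The permutation calibration used by all of these tests can be sketched generically: permuting each component's rows independently breaks the coupling between components (enforcing H_0) while preserving the marginals. The sketch below (Python/numpy) is illustrative; `perm_pvalue` is an assumed name, the statistic is passed in as a function, and the paper's adaptive choice of B is not reproduced.

```python
import numpy as np

def perm_pvalue(stat, samples, B=199, seed=0):
    """Permutation p-value for a mutual independence test.

    stat: maps a list of per-component (n, p_j) samples to a scalar that is
    large under dependence. Each replicate permutes the rows of every
    component independently, which enforces H_0 but keeps the marginals."""
    rng = np.random.default_rng(seed)
    t_obs = stat(samples)
    n = len(samples[0])
    exceed = sum(
        stat([s[rng.permutation(n)] for s in samples]) >= t_obs
        for _ in range(B)
    )
    # Add-one correction keeps the p-value in (0, 1].
    return (1 + exceed) / (1 + B)
```

For instance, with a toy statistic such as the absolute correlation of the first coordinates, a strongly dependent pair yields a p-value near 1/(B + 1), while independent components yield a larger p-value.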

Example 2 (pairwise multivariate non-normal). X_1, X_2 ∈ R^5, (Y_1, Y_2)^T ~ N_10(0, Σ) where Σ_ii = 1, and X_1 = ln(Y_1^2), X_2 = ln(Y_2^2). Under H_0, Σ_ij = 0, i ≠ j. Under H_A, Σ_ij = 0.4, i ≠ j. See results in Tables 3 and 4.

For both Examples 1 and 2, the empirical size of all measures is close to α = 0.1. The empirical power of Q_n, R_n, S_n, J_n, I_n is almost the same as that of V_n^2, while the empirical power of Q_n^*, J_n^*, I_n^* is lower than that of V_n^2, which makes sense because the simplified measures trade testing power for time complexity.

In the following two examples, we fix d = 3 and vary n from 25 to 500, and compare Q_n, R_n, S_n, J_n, I_n to Q_n^*, J_n^*, I_n^*.

Example 3 (mutual multivariate normal). X_1, X_2, X_3 ∈ R^5, (X_1, X_2, X_3)^T ~ N_15(0, Σ) where Σ_ii = 1. Under H_0, Σ_ij = 0, i ≠ j. Under H_A, Σ_ij = 0.1, i ≠ j. See results in Tables 5 and 6.

Example 4 (mutual multivariate non-normal). X_1, X_2, X_3 ∈ R^5, (Y_1, Y_2, Y_3)^T ~ N_15(0, Σ) where Σ_ii = 1, and X_k = ln(Y_k^2), k = 1, 2, 3. Under H_0, Σ_ij = 0, i ≠ j. Under H_A, Σ_ij = 0.4, i ≠ j. See results in Tables 7 and 8.

For both Examples 3 and 4, the empirical size of all measures is close to α = 0.1. The empirical power of Q_n, R_n, S_n, J_n, I_n is almost the same, the empirical power of Q_n^*, J_n^*, I_n^* is almost the same, and the empirical power of Q_n^*, J_n^*, I_n^* is lower than that of Q_n, R_n, S_n, J_n, I_n, which again reflects the trade-off between testing power and time complexity in the simplified measures.

In the last example, we vary d from 5 to 50 and fix n = 100, and compare R_n, S_n, Q_n^*, J_n^*, I_n^* to HL_τ, HL_ρ, HL_τ^n, HL_ρ^n.

Example 5 (mutual univariate normal, high-dimensional). X_1, ..., X_d ∈ R^1, (X_1, ..., X_d)^T ~ N_d(0, Σ) where Σ_ii = 1. Under H_0, Σ_ij = 0, i ≠ j. Under H_A, Σ_ij = 0.1, i ≠ j. See results in Tables 9 and 10.

The empirical size of HL_τ, HL_ρ is much lower than α = 0.1 and too conservative, while that of the other measures is fairly close to α = 0.1. The reason is probably that convergence to the asymptotic distributions of HL_τ, HL_ρ requires a larger sample size n and number of components d. The measures R_n, S_n have the highest empirical power, and outperform the simplified measures Q_n^*, J_n^*, I_n^*. The empirical power of the simplified measures is similar to or even lower than that of the benchmark measures when d = 5. However, the empirical power of the simplified measures converges much faster than that of the benchmark measures as d grows. Moreover, Q_n^* shows a significant advantage over J_n^*, I_n^*. The reason is probably that Q_n^* is based on truly mutual dependence while J_n^*, I_n^* are based on pairwise dependencies, and large d compared to n introduces much more noise into J_n^*, I_n^* because of their summation structures, making it more difficult for them to detect mutual dependence.

The asymptotic analysis of our measures only allows small d compared to n, yet our measures work well with large d compared to n in Example 5. However, this success relies on the underlying dependence structure, which is dense, since each component is dependent on every other component. In contrast, if the dependence structure is sparse, with each component dependent on only a few other components, then all measures are likely to fail.

7 Conclusion

We propose three measures of mutual dependence for random vectors based on the equivalence to mutual independence through characteristic functions, following the idea of distance covariance in Székely et al. (2007). When we select the weight function for the complete measure, we trade off between the moment condition and time complexity. We then simplify it by replacing complete V-statistics with incomplete V-statistics, as a trade-off between testing power and time complexity. These two trade-offs make the simplified complete measure both effective and efficient.

The asymptotic distributions of our measures depend on the underlying distribution F_X. Thus, the corresponding tests are not distribution-free, and we use a permutation procedure to approximate the asymptotic distributions in practice.

We illustrate the value of our measures through both synthetic and financial data examples, where mutual independence tests based on our measures successfully capture the mutual dependence, while the alternative of checking all pairwise independencies fails and mistakenly leads to the conclusion that mutual independence holds. Our measures achieve competitive or even better results than the benchmark measures in simulations with various examples. Although we do not allow large d compared to n in the asymptotic analysis, our measures work well in a large-d example, since the dependence structure there is dense.

Acknowledgements

We are grateful to Stanislav Volgushev for helpful comments on a preliminary draft of this paper. We also thank an anonymous referee for helpful line-by-line comments.

References

W. Bergsma and A. Dassios. A consistent test of independence based on a sign covariance related to Kendall's tau. Bernoulli, 20(2), 2014.

J. Fan, Y. Feng, and L. Xia. A conditional dependence measure with applications to undirected graphical models. arXiv preprint, 2015.

A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, volume 16. Springer, 2005.

F. Han and H. Liu. Distribution-free tests of independence with applications to testing more structures. arXiv preprint, 2014.

W. Hoeffding. A non-parametric test of independence. The Annals of Mathematical Statistics, 1948.

Z. Jin, S. Yao, D. S. Matteson, and X. Shao. EDMeasure: Energy-Based Dependence Measures, 2018. R package.

M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81-93, 1938.

D. Leung and M. Drton. Testing independence in high dimensions with sums of squares of rank correlations. arXiv preprint, 2015.

D. S. Matteson and R. S. Tsay. Independent component analysis via distance covariance. Journal of the American Statistical Association, 112(518), 2017.

R. v. Mises. On the asymptotic distribution of differentiable statistical functions. The Annals of Mathematical Statistics, 18(3), 1947.

K. Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 1895.

N. Pfister, P. Bühlmann, B. Schölkopf, and J. Peters. Kernel-based tests for joint independence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2016.

C. Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101, 1904.

G. J. Székely and M. L. Rizzo. A new test for multivariate normality. Journal of Multivariate Analysis, 93(1):58-80, 2005.

G. J. Székely and M. L. Rizzo. Brownian distance covariance. The Annals of Applied Statistics, 3(4), 2009.

G. J. Székely and M. L. Rizzo. The distance correlation t-test of independence in high dimension. Journal of Multivariate Analysis, 117, 2013a.

G. J. Székely and M. L. Rizzo. Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8), 2013b.

G. J. Székely, M. L. Rizzo, and N. K. Bakirov. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2007.

R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 99(10), 2002.

S. Yao, X. Zhang, and X. Shao. Testing mutual independence in high dimension via distance covariance. arXiv preprint, 2016.

Table 1: empirical size (α = 0.1) in Example 1 with 1000 repetitions and d = 2. Columns: n; V_n^2; R_n; S_n; Q_n; J_n; I_n; Q_n^*; J_n^*; I_n^*.

Table 2: empirical power (α = 0.1) in Example 1 with 1000 repetitions and d = 2. Columns as in Table 1.

Table 3: empirical size (α = 0.1) in Example 2 with 1000 repetitions and d = 2. Columns as in Table 1.

Table 4: empirical power (α = 0.1) in Example 2 with 1000 repetitions and d = 2. Columns as in Table 1.

Table 5: empirical size (α = 0.1) in Example 3 with 1000 repetitions and d = 3. Columns: n; Q_n; Q_n^*; R_n; S_n; J_n; J_n^*; I_n; I_n^*.

Table 6: empirical power (α = 0.1) in Example 3 with 1000 repetitions and d = 3. Columns as in Table 5.

Table 7: empirical size (α = 0.1) in Example 4 with 1000 repetitions and d = 3. Columns as in Table 5.

Table 8: empirical power (α = 0.1) in Example 4 with 1000 repetitions and d = 3. Columns as in Table 5.

Table 9: empirical size (α = 0.1) in Example 5 with 1000 repetitions and n = 100. Columns: d; HL_τ; HL_ρ; HL_τ^n; HL_ρ^n; Q_n; Q_n^*; R_n; S_n; J_n; J_n^*; I_n; I_n^*.

Table 10: empirical power (α = 0.1) in Example 5 with 1000 repetitions and n = 100. Columns as in Table 9.

Appendix

Proofs of Theorems 1-6 and Lemmas 1 and 2.

Proof of Theorem 1. We show: (i) 0 ≤ Q(X) < ∞; (ii) Q(X) = 0 if and only if X_1, ..., X_d are mutually independent; (iii) Q(X) = 2E|X − X̃| − E|X − X'| − E|X̃ − X̃'|.

Since w_1(t) is a positive weight function, X_1, ..., X_d are mutually independent if and only if Q(X) = ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w_1(t) dt is equal to zero. By the boundedness property of characteristic functions and Fubini's theorem, we have

Figure 1: Three annual Fama/French factors between 1964 and 2015: Mkt-RF (excess return on the market), SMB (small minus big) and RF (risk-free return). Panels show histograms of each factor, with estimated kernel densities as red lines, and the pairwise scatter plots Mkt-RF vs. SMB, Mkt-RF vs. RF, and SMB vs. RF. The correlations are corr(Mkt-RF, SMB) = 0.238, corr(Mkt-RF, RF) = , and corr(SMB, RF) = .

|φ_X(t) − φ_X̃(t)|^2 = φ_X(t)φ̄_X(t) + φ_X̃(t)φ̄_X̃(t) − φ_X(t)φ̄_X̃(t) − φ_X̃(t)φ̄_X(t)
= E[e^{i⟨t,X⟩}]E[e^{−i⟨t,X'⟩}] + E[e^{i⟨t,X̃⟩}]E[e^{−i⟨t,X̃'⟩}] − E[e^{i⟨t,X⟩}]E[e^{−i⟨t,X̃⟩}] − E[e^{i⟨t,X̃⟩}]E[e^{−i⟨t,X⟩}]
= E(cos⟨t, X − X'⟩) + E(cos⟨t, X̃ − X̃'⟩) − E(cos⟨t, X − X̃⟩) − E(cos⟨t, X̃ − X⟩)
= −E(1 − cos⟨t, X − X'⟩) − E(1 − cos⟨t, X̃ − X̃'⟩) + E(1 − cos⟨t, X − X̃⟩) + E(1 − cos⟨t, X̃ − X⟩).

Since E|X| < ∞ implies E|X̃| < ∞, we have E(|X| + |X̃|) < ∞. Then the triangle inequality implies E|X − X'|, E|X̃ − X̃'|, E|X − X̃|, E|X̃ − X| < ∞. Therefore, by Fubini's

23 theorem and Lemma 1, it follows that Q(X) = φ X (t) φ X(t) 2 w 1 (t) dt = E(1 cos t, X X ) w 1 (t) dt + E(1 cos t, X X ) w 1 (t) dt E(1 cos t, X X ) w 1 (t) dt E(1 cos t, X X ) w 1 (t) dt = E X X + E X X E X X E X X <. Finally, Q(X) 0 since the integrand φ X (t) φ X(t) 2 is non-negative. Lemma 1 Proof. After a simple calculation, we have φ n X (t) φñ X (t) 2 = φ n X (t)φn X (t) φn X (t)φñ X (t) φñ X (t)φn X (t) + φñ X (t)φñ X (t) = 1 n 2 n k,l=1 cos t, Xk X l 2 n d+1 n k,l 1,...,l d =1 cos t, Xk (X l 1 1,..., X l d d ) + 1 n 2d n k 1,...,k d,l 1,...,l d =1 cos t, (Xk 1 1,..., X k d d ) (Xl 1 1,..., X l d d ) + V = 1 n 2 n k,l=1 [1 cos t, Xk X l ] + 2 n d+1 n k,l 1,...,l d =1 [1 cos t, Xk (X l 1 1,..., X l d d ) ] 1 n 2d n k 1,...,k d,l 1,...,l d =1 [1 cos t, (Xk 1 1,..., X k d d ) (Xl 1 1,..., X l d d ) ] + V, where V is imaginary and thus 0 as the φ n X (t) φñ X (t) 2 is real. By Lemma 1 in Székely and Rizzo (2005) Q n (X) = φ n X (t) φñ X (t) 2 w 1 = 1 n 2 n k,l=1 Xk X l + 2 n d+1 n k,l 1,...,l d =1 Xk (X l 1 1,..., X l d d ) 1 n 2d n k 1,...,k d,l 1,...,l d =1 (Xk 1 1,..., X k d d ) (Xl 1 1,..., X l d d ). Lemma 2 Proof. After a simple calculation, we have φ n X (t) φn X (t) 2 = φ n X (t)φn X (t) φn X (t)φn (t) φn X (t)φn X (t) + φn X (t)φn X (t) = 1 n 2 n k,l=1 cos t, Xk X l 2 n 2 n k,l=1 cos t, Xk (X l 1,..., X l+d 1 d ) X 23

      + (1/n²) Σ_{k,l=1}^n cos⟨t, (X_1^k, X_2^{k+1}, …, X_d^{k+d−1}) − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})⟩ + V
    = −(1/n²) Σ_{k,l=1}^n [1 − cos⟨t, X^k − X^l⟩]
      + (2/n²) Σ_{k,l=1}^n [1 − cos⟨t, X^k − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})⟩]
      − (1/n²) Σ_{k,l=1}^n [1 − cos⟨t, (X_1^k, X_2^{k+1}, …, X_d^{k+d−1}) − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})⟩] + V,

where V is imaginary and thus 0, as |φ_X^n(t) − φ_X^{*n}(t)|² is real. By Lemma 1 in Székely and Rizzo (2005),

    Q_n*(X) = ∫ |φ_X^n(t) − φ_X^{*n}(t)|² w_1(t) dt
    = −(1/n²) Σ_{k,l=1}^n ‖X^k − X^l‖
      + (2/n²) Σ_{k,l=1}^n ‖X^k − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})‖
      − (1/n²) Σ_{k,l=1}^n ‖(X_1^k, X_2^{k+1}, …, X_d^{k+d−1}) − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})‖. □

Theorem 2
Proof. We write

    Q_n = ‖φ_X^n(t) − φ̃_X^n(t)‖²_{w_1} = ‖ξ_n(t)‖²_{w_1}  and  Q_n* = ‖φ_X^n(t) − φ_X^{*n}(t)‖²_{w_1} = ‖ξ_n*(t)‖²_{w_1},

where ‖f‖²_{w_1} = ∫ |f(t)|² w_1(t) dt. For 0 < δ < 1, define the region

    D(δ) = {t = (t_1, …, t_d) : δ ≤ ‖t‖² = Σ_{j=1}^d ‖t_j‖² ≤ 2/δ},   (6)

and random variables

    Q_{n,δ} = ∫_{D(δ)} |ξ_n(t)|² w_1(t) dt  and  Q_{n,δ}* = ∫_{D(δ)} |ξ_n*(t)|² w_1(t) dt.

For any fixed δ, the weight function w_1(t) is bounded on D(δ). Hence Q_{n,δ} is a combination of V-statistics of bounded random variables. Similar to Theorem 2 of Székely et al. (2007), it follows by the strong law of large numbers (SLLN) for V-statistics (Mises, 1947) that, almost surely,

    lim_{n→∞} Q_{n,δ} = lim_{n→∞} Q_{n,δ}* = Q_{·,δ} = ∫_{D(δ)} |φ_X(t) − φ̃_X(t)|² w_1(t) dt.

Clearly Q_{·,δ} → Q as δ → 0. Hence Q_{n,δ} → Q a.s. and Q_{n,δ}* → Q a.s. as δ → 0, n → ∞. In order to show Q_n → Q a.s. and Q_n* → Q a.s. as n → ∞, it remains to prove that, almost surely,

    lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ} − Q_n| = lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ}* − Q_n*| = 0.

We define a mixture of X and X̃ as Y_c = (X̃_1, …, X̃_{c−1}, X_{c+}), c = 1, …, d−1, where X_{c+} = (X_{c+1}, …, X_d) and t_{−c} collects the arguments other than t_c. Telescoping the difference between the joint empirical characteristic function and the product of the empirical marginals, and applying the Cauchy-Bunyakovsky inequality, we obtain

    |ξ_n(t)|² = |φ_X^n(t) − Π_{j=1}^d φ_{X_j}^n(t_j)|²
    = |Σ_{c=1}^{d−1} {Π_{j=1}^{c−1} φ_{X_j}^n(t_j)} {φ_{(X_c,X_{c+})}^n(t_c, t_{c+}) − φ_{X_c}^n(t_c) φ_{X_{c+}}^n(t_{c+})}|²
    = |Σ_{c=1}^{d−1} {φ_{(X_c,Y_c)}^n(t_c, t_{−c}) − φ_{X_c}^n(t_c) φ_{Y_c}^n(t_{−c})}|²
    ≤ (d−1) Σ_{c=1}^{d−1} |φ_{(X_c,Y_c)}^n(t_c, t_{−c}) − φ_{X_c}^n(t_c) φ_{Y_c}^n(t_{−c})|².

Similarly, writing

    ξ_n*(t) = (1/n) Σ_{k=1}^n e^{i⟨t, X^k⟩} − (1/n) Σ_{k=1}^n e^{i⟨t, (X_1^k, X_2^{k+1}, …, X_d^{k+d−1})⟩},

telescoping over the d−1 coordinate shifts and applying the Cauchy-Bunyakovsky inequality twice yields

    |ξ_n*(t)|² ≤ 4(d−1) Σ_{c=2}^d (1/n) Σ_{k=1}^n |e^{i⟨t_c, X_c^k⟩} − φ_{X_c}(t_c)|².

By the inequality sa + (1−s)b ≥ a^s b^{1−s}, 0 < s < 1, a, b > 0, we have

    ‖t‖^{1+p} = (‖t_c‖² + ‖t_{−c}‖²)^{(1+p)/2} ≥ ‖t_c‖^{(1+p)s} ‖t_{−c}‖^{(1+p)(1−s)} = ‖t_c‖^{m_c+p_c} ‖t_{−c}‖^{m_c′+Σ_{j≠c}p_j},

choosing s = (m_c + p_c)/(1 + p), where 0 < m_c < 1, 0 < m_c′ < 1 and m_c + m_c′ = 1. Consequently,

    w_1(t) = 1/(K(p,1)‖t‖^{1+p})
    ≤ C(p, p_c, Σ_{j≠c}p_j) · [1/(K(p_c, m_c)‖t_c‖^{m_c+p_c})] · [1/(K(Σ_{j≠c}p_j, m_c′)‖t_{−c}‖^{m_c′+Σ_{j≠c}p_j})],

where C(p, p_c, Σ_{j≠c}p_j) is a constant depending only on p, p_c, Σ_{j≠c}p_j. By the fact that

    R^p \ D(δ) ⊂ {‖t_c‖², ‖t_{−c}‖² < δ} ∪ {‖t_c‖² > 1/δ} ∪ {‖t_{−c}‖² > 1/δ}

and similar steps to Theorem 2 of Székely et al. (2007), almost surely,

    lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ} − Q_n|
    ≤ lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} |ξ_n(t)|² w_1(t) dt
    ≤ (d−1) Σ_{c=1}^{d−1} lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} |φ_{(X_c,Y_c)}^n(t_c, t_{−c}) − φ_{X_c}^n(t_c)φ_{Y_c}^n(t_{−c})|² w_1(t) dt
    ≤ C(p, p_c, Σ_{j≠c}p_j)(d−1) Σ_{c=1}^{d−1} lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} |φ_{(X_c,Y_c)}^n(t_c, t_{−c}) − φ_{X_c}^n(t_c)φ_{Y_c}^n(t_{−c})|²
        · [K(p_c, m_c)‖t_c‖^{m_c+p_c}]^{−1} [K(Σ_{j≠c}p_j, m_c′)‖t_{−c}‖^{m_c′+Σ_{j≠c}p_j}]^{−1} dt_c dt_{−c} = 0,

and, by the same argument,

    lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ}* − Q_n*|
    ≤ lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} |ξ_n*(t)|² w_1(t) dt
    ≤ 4(d−1) C(p, p_c, Σ_{j≠c}p_j) Σ_{c=2}^d lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} (1/n) Σ_{k=1}^n |e^{i⟨t_c, X_c^k⟩} − φ_{X_c}(t_c)|²
        · [K(p_c, m_c)‖t_c‖^{m_c+p_c}]^{−1} [K(Σ_{j≠c}p_j, m_c′)‖t_{−c}‖^{m_c′+Σ_{j≠c}p_j}]^{−1} dt_c dt_{−c} = 0.

Therefore, almost surely,

    lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ} − Q_n| = lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ}* − Q_n*| = 0. □

Theorem 3
Proof. (i) Under H_0:

Let ζ(t) denote a complex-valued Gaussian process with mean zero and covariance function

    R(t, t⁰) = Π_{j=1}^d φ_{X_j}(t_j − t⁰_j) + (d−1) Π_{j=1}^d φ_{X_j}(t_j) φ̄_{X_j}(t⁰_j)
               − Σ_{j=1}^d φ_{X_j}(t_j − t⁰_j) Π_{j′≠j} φ_{X_j′}(t_{j′}) φ̄_{X_j′}(t⁰_{j′}).

We define nQ_n = n‖φ_X^n(t) − φ̃_X^n(t)‖²_{w_1} = ‖ζ_n(t)‖²_{w_1}, where ζ_n(t) = √n (φ_X^n(t) − φ̃_X^n(t)). After a simple calculation of the V-statistic moments, we have E[ζ_n(t)] = 0 and E[ζ_n(t) ζ̄_n(t⁰)] → R(t, t⁰) as n → ∞. In particular, E|ζ_n(t)|² → R(t, t) ≤ d as n → ∞, so E|ζ_n(t)|² ≤ d + 1 for large enough n.

For 0 < δ < 1, define the region D(δ) as in (6). Given ε > 0, choose a partition {D_l(δ)}_{l=1}^N of D(δ) into N(ε) measurable sets, each with diameter at most ε, and suppress the notation D(δ), D_l(δ) to D, D_l. Then define, for any fixed t^l ∈ D_l, l = 1, …, N,

    Q_n(δ) = Σ_{l=1}^N ∫_{D_l} |ζ_n(t^l)|² w_1(t) dt.

For any fixed M > 0, let β(ε) = sup E| |ζ_n(t)|² − |ζ_n(t⁰)|² |, where the supremum is taken over all t = (t_1, …, t_d) and t⁰ = (t⁰_1, …, t⁰_d) such that max{‖t‖², ‖t⁰‖²} ≤ M and ‖t − t⁰‖² = Σ_{j=1}^d ‖t_j − t⁰_j‖² ≤ ε². By the continuous mapping theorem and ζ_n(t) − ζ_n(t⁰) → 0 as ε → 0, we have |ζ_n(t)|² − |ζ_n(t⁰)|² → 0 as ε → 0. By the dominated convergence theorem and E|ζ_n(t)|² ≤ d + 1 for large enough n, we have E| |ζ_n(t)|² − |ζ_n(t⁰)|² | → 0 as ε → 0, which leads to β(ε) → 0 as ε → 0. As a result,

    E| ∫_D |ζ_n(t)|² w_1(t) dt − Q_n(δ) | = E| Σ_{l=1}^N ∫_{D_l} (|ζ_n(t)|² − |ζ_n(t^l)|²) w_1(t) dt |
    ≤ Σ_{l=1}^N ∫_{D_l} E| |ζ_n(t)|² − |ζ_n(t^l)|² | w_1(t) dt ≤ β(ε) ∫_D w_1(t) dt → 0 as ε → 0.

By similar steps to Theorem 2, we have E| ∫_D |ζ_n(t)|² w_1(t) dt − ‖ζ_n‖²_{w_1} | → 0 as δ → 0. Therefore E|Q_n(δ) − ‖ζ_n‖²_{w_1}| → 0 as ε, δ → 0. On the other hand, define, for any fixed t^l ∈ D_l, l = 1, …, N,

    Q(δ) = Σ_{l=1}^N ∫_{D_l} |ζ(t^l)|² w_1(t) dt.

Similarly, E|Q(δ) − ‖ζ‖²_{w_1}| → 0 as ε, δ → 0. By the multivariate central limit theorem, the delta method and the continuous mapping theorem, we have

    Q_n(δ) = Σ_{l=1}^N ∫_{D_l} |ζ_n(t^l)|² w_1(t) dt →_D Σ_{l=1}^N ∫_{D_l} |ζ(t^l)|² w_1(t) dt = Q(δ) as n → ∞.

Therefore ‖ζ_n‖²_{w_1} →_D ‖ζ‖²_{w_1} as n → ∞, since {Q_n(δ)} has the following properties:
(a) Q_n(δ) converges in distribution to Q(δ) as n → ∞;
(b) E|Q_n(δ) − ‖ζ_n‖²_{w_1}| → 0 as ε, δ → 0;
(c) E|Q(δ) − ‖ζ‖²_{w_1}| → 0 as ε, δ → 0.

Analogous to ζ(t), ζ_n(t), β(ε), Q(δ), Q_n(δ) for Q_n, we can define ζ*(t), ζ_n*(t), β*(ε), Q*(δ), Q_n*(δ) for Q_n*, and prove that ‖ζ_n*‖²_{w_1} →_D ‖ζ*‖²_{w_1} as n → ∞ through the same derivations. The only differences are that E[ζ_n*(t) ζ̄_n*(t⁰)] → 2R(t, t⁰) and E|ζ_n*(t)|² → 2R(t, t) ≤ 2d, so that E|ζ_n*(t)|² ≤ 2d + 1 for large enough n.

(ii) Under H_A: By Theorems 1 and 2, we have Q_n → Q > 0 a.s. as n → ∞. Therefore nQ_n → ∞ a.s. as n → ∞. Similarly, we can prove that nQ_n* → ∞ a.s. as n → ∞ through the same derivations. □

Theorem 4
Proof. We prove
(i) 0 ≤ R(X) < ∞;
(ii) 0 ≤ S(X) < ∞;
(iii) R(X) = Σ_{c=1}^{d−1} V²(X_c, X_{c+}) = 0 ⇔ X_1, …, X_d are mutually independent;
(iv) S(X) = Σ_{c=1}^d V²(X_c, X_{−c}) = 0 ⇔ X_1, …, X_d are mutually independent,
where X_{c+} = (X_{c+1}, …, X_d) and X_{−c} = (X_1, …, X_{c−1}, X_{c+1}, …, X_d).

Since E‖X‖ < ∞, we have 0 ≤ V²(X_c, X_{c+}) < ∞, c = 1, …, d−1. Thus 0 ≤ R(X) = Σ_{c=1}^{d−1} V²(X_c, X_{c+}) < ∞. Similarly, we have 0 ≤ S(X) = Σ_{c=1}^d V²(X_c, X_{−c}) < ∞.

⟹ If X_1, …, X_d are mutually independent, then X_c and X_{c+} are independent, c = 1, …, d−1. By Theorem 3 of Székely et al. (2007), V²(X_c, X_{c+}) = 0, c = 1, …, d−1. As a result, R(X) = 0. Similarly, we can prove that S(X) = 0, since X_c and X_{−c} are independent, c = 1, …, d.

⟸ If R(X) = 0, then V²(X_c, X_{c+}) = 0, c = 1, …, d−1. By Theorem 3 of Székely et al. (2007), X_c and X_{c+} are independent, c = 1, …, d−1. Thus, for all t ∈ R^p, we have

    φ_{(X_c, X_{c+})}(t_c, t_{c+}) − φ_{X_c}(t_c) φ_{X_{c+}}(t_{c+}) = 0, c = 1, …, d−1,

where φ_{X_c} and φ_{X_{c+}} denote the marginal characteristic functions and φ_{(X_c, X_{c+})} denotes the joint characteristic function of X_c and X_{c+}. For all t ∈ R^p, telescoping as in Theorem 2 and using the triangle inequality together with |φ_{X_j}(t_j)| ≤ 1, we have

    |φ_X(t) − Π_{j=1}^d φ_{X_j}(t_j)|
    = |Σ_{c=1}^{d−1} {Π_{j=1}^{c−1} φ_{X_j}(t_j)} {φ_{(X_c, X_{c+})}(t_c, t_{c+}) − φ_{X_c}(t_c) φ_{X_{c+}}(t_{c+})}|
    ≤ Σ_{c=1}^{d−1} |φ_{(X_c, X_{c+})}(t_c, t_{c+}) − φ_{X_c}(t_c) φ_{X_{c+}}(t_{c+})| = 0.

Therefore, for all t ∈ R^p, we have φ_X(t) − Π_{j=1}^d φ_{X_j}(t_j) = 0, which implies that X_1, …, X_d are mutually independent.

Similarly, we can prove that S(X) = 0 implies that X_1, …, X_d are mutually independent, since the independence of X_c and X_{−c} for every c implies the independence of X_c and X_{c+} for every c. □
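Theorem 4 characterizes R(X) and S(X) as sums of squared pairwise distance covariances, which suggests a direct plug-in estimator. The sketch below is our own illustration of that reading, not the authors' code: it computes R_n and S_n as sums of squared sample distance covariances, using the double-centered distance matrix formula of Székely et al. (2007).

```python
import numpy as np

def dcov2(X, Y):
    """Squared sample distance covariance V_n^2(X, Y) via
    double-centered distance matrices (Székely et al., 2007)."""
    def centered(M):
        D = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
        return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()
    A, B = centered(X), centered(Y)
    return (A * B).mean()

def r_n(X):
    """R_n = sum_{c=1}^{d-1} V_n^2(X_c, X_{c+}): each component
    paired with the block of all later components."""
    d = X.shape[1]
    return sum(dcov2(X[:, [c]], X[:, c + 1:]) for c in range(d - 1))

def s_n(X):
    """S_n = sum_{c=1}^{d} V_n^2(X_c, X_{-c}): each component
    paired with all remaining components."""
    d = X.shape[1]
    return sum(dcov2(X[:, [c]], np.delete(X, c, axis=1)) for c in range(d))

rng = np.random.default_rng(1)
ind = rng.normal(size=(50, 3))                      # mutually independent
z = rng.normal(size=(50, 1))
dep = np.hstack([z, z, rng.normal(size=(50, 1))])   # first two identical
r_ind, s_ind, r_dep, s_dep = r_n(ind), s_n(ind), r_n(dep), s_n(dep)
```

Each summand is a non-negative V-statistic, so R_n and S_n are non-negative by construction; under strong componentwise dependence they are visibly larger than under independence.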

Theorem 5
Proof. By Theorem 2 of Székely et al. (2007),

    lim_{n→∞} V_n²(X_c, X_{c+}) = V²(X_c, X_{c+}) a.s., c = 1, …, d−1,
    lim_{n→∞} V_n²(X_c, X_{−c}) = V²(X_c, X_{−c}) a.s., c = 1, …, d.

Therefore, the limit of the sums equals the sum of the limits, so that R_n(X) → R(X) a.s. and S_n(X) → S(X) a.s. as n → ∞. □

Theorem 6
Proof. (i) Under H_0: We define

    nR_n(X) = n Σ_{c=1}^{d−1} V_n²(X_c, X_{c+}) = Σ_{c=1}^{d−1} ‖ζ_n^c(t_{(c−1)+})‖²_{w_0},

where ζ_n^c(t_{(c−1)+}) = √n [φ_{(X_c,X_{c+})}^n(t_c, t_{c+}) − φ_{X_c}^n(t_c) φ_{X_{c+}}^n(t_{c+})] and t_{(c−1)+} = (t_c, t_{c+}). This is the sum corresponding to the pairs {X_{d−1}, X_d}, {X_{d−2}, (X_{d−1}, X_d)}, {X_{d−3}, (X_{d−2}, X_{d−1}, X_d)}, …, {X_1, (X_2, …, X_d)}. Any two of them can be reorganized as {X_1, X_2} and {X_4, (X_1, X_2, X_3)}, where X_3 could be empty. Without loss of generality, we next show that φ_{(X_1,X_2)}^n(t_1, t_2) − φ_{X_1}^n(t_1)φ_{X_2}^n(t_2) and φ_{(X_1,X_2,X_3,X_4)}^n(s_1, s_2) − φ_{(X_1,X_2,X_3)}^n(s_1)φ_{X_4}^n(s_2) are uncorrelated; it then follows that ζ_n^c(t_{(c−1)+}), c = 1, …, d−1, are uncorrelated. After a simple calculation, we have

    E[φ_{(X_1,X_2)}^n(t_1, t_2) − φ_{X_1}^n(t_1)φ_{X_2}^n(t_2)] = E[φ_{(X_1,X_2,X_3,X_4)}^n(s_1, s_2) − φ_{(X_1,X_2,X_3)}^n(s_1)φ_{X_4}^n(s_2)] = 0,
    E{[φ_{(X_1,X_2)}^n(t_1, t_2) − φ_{X_1}^n(t_1)φ_{X_2}^n(t_2)][φ_{(X_1,X_2,X_3,X_4)}^n(s_1, s_2) − φ_{(X_1,X_2,X_3)}^n(s_1)φ_{X_4}^n(s_2)]} = 0.

As a result,

    Cov(φ_{(X_1,X_2)}^n(t_1, t_2) − φ_{X_1}^n(t_1)φ_{X_2}^n(t_2), φ_{(X_1,X_2,X_3,X_4)}^n(s_1, s_2) − φ_{(X_1,X_2,X_3)}^n(s_1)φ_{X_4}^n(s_2)) = 0.

For δ > 0, define the region

    D_c(δ) = {t_{(c−1)+} = (t_c, t_{c+}) = (t_c, …, t_d) : δ ≤ ‖t_{(c−1)+}‖² = Σ_{j=c}^d ‖t_j‖² ≤ 2/δ}.

Given ε > 0, we choose a partition {D_c^l}_{l=1}^{N_c} of D_c(δ) into N_c(ε) measurable sets with diameter at most ε, and define a sequence of random variables, for any fixed t_{(c−1)+}^l ∈ D_c^l, l = 1, …, N_c,

    Q_n^c(δ) = Σ_{l=1}^{N_c} ∫_{D_c^l} |ζ_n^c(t_{(c−1)+}^l)|² w_0(t) dt.

Let ζ^c(t_{(c−1)+}) = ζ^c(t_c, t_{c+}) denote a complex-valued Gaussian process with mean zero and covariance function

    R_c^ζ(t_{(c−1)+}, t⁰_{(c−1)+}) = [φ_{X_c}(t_c − t⁰_c) − φ_{X_c}(t_c)φ̄_{X_c}(t⁰_c)] [φ_{X_{c+}}(t_{c+} − t⁰_{c+}) − φ_{X_{c+}}(t_{c+})φ̄_{X_{c+}}(t⁰_{c+})].

By the multivariate central limit theorem, the delta method and the continuous mapping theorem, we have jointly

    (Q_n^1(δ), …, Q_n^{d−1}(δ)) →_D (Σ_{l=1}^{N_1} ∫_{D_1^l} |ζ^1(t_{0+}^l)|² w_0(t) dt, …, Σ_{l=1}^{N_{d−1}} ∫_{D_{d−1}^l} |ζ^{d−1}(t_{(d−2)+}^l)|² w_0(t) dt)

as n → ∞, with asymptotically mutually independent coordinates. Thus Q_n^c(δ), c = 1, …, d−1, are asymptotically mutually independent. By similar steps to Theorem 5 of Székely et al. (2007), we have

    E|Q_n^c(δ) − ‖ζ_n^c(t_{(c−1)+})‖²_{w_0}| → 0, c = 1, …, d−1, as ε, δ → 0.

Hence

    (‖ζ_n^1(t)‖²_{w_0} − Q_n^1(δ), …, ‖ζ_n^{d−1}(t_{(d−2)+})‖²_{w_0} − Q_n^{d−1}(δ)) →_P 0 as ε, δ → 0.

By the multivariate Slutsky's theorem, we have

    (‖ζ_n^1(t)‖²_{w_0}, …, ‖ζ_n^{d−1}(t_{(d−2)+})‖²_{w_0}) →_D (‖ζ^1(t)‖²_{w_0}, …, ‖ζ^{d−1}(t_{(d−2)+})‖²_{w_0})

as ε, δ → 0, n → ∞, with asymptotically mutually independent coordinates. Therefore ‖ζ_n^c(t_{(c−1)+})‖²_{w_0}, c = 1, …, d−1, are asymptotically mutually independent, and

    Σ_{c=1}^{d−1} ‖ζ_n^c(t_{(c−1)+})‖²_{w_0} →_D Σ_{c=1}^{d−1} ‖ζ^c(t_{(c−1)+})‖²_{w_0} as n → ∞.

Analogous to ζ_n^c(t_{(c−1)+}), ζ^c(t_{(c−1)+}), R_c^ζ(t_{(c−1)+}, t⁰_{(c−1)+}) for R_n(X), we can define η_n^c(t), η^c(t), R_c^η(t, t⁰) for S_n(X), and prove that ‖η_n^c(t)‖²_{w_0}, c = 1, …, d, are asymptotically mutually independent, and Σ_{c=1}^d ‖η_n^c(t)‖²_{w_0} →_D Σ_{c=1}^d ‖η^c(t)‖²_{w_0} as n → ∞, through the same derivations. The only difference is that we instead show φ_{(X_1,X_2,X_3)}^n(t_1, t_2, t_3) − φ_{X_1}^n(t_1)φ_{(X_2,X_3)}^n(t_2, t_3) and φ_{(X_1,X_2,X_3)}^n(s_1, s_2, s_3) − φ_{X_2}^n(s_2)φ_{(X_1,X_3)}^n(s_1, s_3) are asymptotically uncorrelated.

(ii) Under H_A: By Theorem 4, we have R_n → R > 0 a.s. as n → ∞. Therefore nR_n → ∞ a.s. as n → ∞. Similarly, we can prove that nS_n → ∞ a.s. as n → ∞ through the same derivations. □

Remark. Under H_A, ζ_n^c(t_{(c−1)+}), c = 1, …, d−1, are not asymptotically uncorrelated, and η_n^c(t), c = 1, …, d, are not asymptotically uncorrelated.

Complete Measure of Mutual Dependence Using Weight Function w_2

Except that U_n(X) requires the additional d-th moment condition E(‖X_1‖ ⋯ ‖X_d‖) < ∞ to be simplified, U(X) is in an extremely complicated form. Even when d = 3, U(X) already has

12 different terms, as follows, where X′, X″, X‴ denote iid copies of X:

    U(X) = ∫ |φ_X(t) − φ̃_X(t)|² w_2(t) dt
    = E‖X_1 − X_1′‖‖X_2 − X_2′‖‖X_3 − X_3′‖ − 2E‖X_1 − X_1′‖‖X_2 − X_2″‖‖X_3 − X_3‴‖
      + E‖X_1 − X_1′‖ E‖X_2 − X_2′‖ E‖X_3 − X_3′‖
      + Σ_{1≤i<j≤3} E‖X_i − X_i′‖‖X_j − X_j′‖ − 2 Σ_{1≤i<j≤3} E‖X_i − X_i′‖‖X_j − X_j″‖
      + Σ_{1≤i<j≤3} E‖X_i − X_i′‖ E‖X_j − X_j′‖.

In general, the number of different terms in U(X) grows exponentially as d increases: essentially all combinations of all components appear in all moments as expectations.
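Each of these moments has a natural V-statistic estimate, and although the mixed-copy moments look like they cost O(n^4) to evaluate, they factor over the shared sample index. The sketch below is our own illustration, under our reading of the copy structure in the expansion above: it checks a factorized O(n^2) evaluation of the triple mixed-copy moment against the naive four-index average on a tiny sample.

```python
import numpy as np
from itertools import product

def mixed_moment_naive(x1, x2, x3):
    """V-statistic estimate of E||X1-X1'|| ||X2-X2''|| ||X3-X3'''||:
    average over all index tuples (k, l, m, o), which share only
    the first index k.  O(n^4), for verification only."""
    n = len(x1)
    total = sum(abs(x1[k] - x1[l]) * abs(x2[k] - x2[m]) * abs(x3[k] - x3[o])
                for k, l, m, o in product(range(n), repeat=4))
    return total / n**4

def mixed_moment_fast(x1, x2, x3):
    """Same estimate in O(n^2): the sum factorizes over the shared index k
    into a product of row means of the three distance matrices."""
    a = np.abs(x1[:, None] - x1[None, :]).mean(axis=1)  # (1/n) sum_l |x1_k - x1_l|
    b = np.abs(x2[:, None] - x2[None, :]).mean(axis=1)
    c = np.abs(x3[:, None] - x3[None, :]).mean(axis=1)
    return float((a * b * c).mean())

rng = np.random.default_rng(2)
x1, x2, x3 = rng.normal(size=(3, 8))
naive = mixed_moment_naive(x1, x2, x3)
fast = mixed_moment_fast(x1, x2, x3)
```

The same factorization applies to the pairwise mixed-copy moments, which is why the number of terms, rather than the cost of any single term, is the practical obstacle as d grows.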


More information

Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics. Wen-Xin Zhou

Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics. Wen-Xin Zhou Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics Wen-Xin Zhou Department of Mathematics and Statistics University of Melbourne Joint work with Prof. Qi-Man

More information

Optimal detection of weak positive dependence between two mixture distributions

Optimal detection of weak positive dependence between two mixture distributions Optimal detection of weak positive dependence between two mixture distributions Sihai Dave Zhao Department of Statistics, University of Illinois at Urbana-Champaign and T. Tony Cai Department of Statistics,

More information

Bayesian Econometrics

Bayesian Econometrics Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence

More information

A Note on Hilbertian Elliptically Contoured Distributions

A Note on Hilbertian Elliptically Contoured Distributions A Note on Hilbertian Elliptically Contoured Distributions Yehua Li Department of Statistics, University of Georgia, Athens, GA 30602, USA Abstract. In this paper, we discuss elliptically contoured distribution

More information

Minimum distance tests and estimates based on ranks

Minimum distance tests and estimates based on ranks Minimum distance tests and estimates based on ranks Authors: Radim Navrátil Department of Mathematics and Statistics, Masaryk University Brno, Czech Republic (navratil@math.muni.cz) Abstract: It is well

More information

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common Journal of Statistical Theory and Applications Volume 11, Number 1, 2012, pp. 23-45 ISSN 1538-7887 A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when

More information

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Christopher R. Genovese Department of Statistics Carnegie Mellon University joint work with Larry Wasserman

More information

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses Gavin Lynch Catchpoint Systems, Inc., 228 Park Ave S 28080 New York, NY 10003, U.S.A. Wenge Guo Department of Mathematical

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Copula modeling for discrete data

Copula modeling for discrete data Copula modeling for discrete data Christian Genest & Johanna G. Nešlehová in collaboration with Bruno Rémillard McGill University and HEC Montréal ROBUST, September 11, 2016 Main question Suppose (X 1,

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable

More information

Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications

Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications J.P Bouchaud with: M. Potters, G. Biroli, L. Laloux, M. A. Miceli http://www.cfm.fr Portfolio theory: Basics Portfolio

More information

Multivariate statistical methods and data mining in particle physics

Multivariate statistical methods and data mining in particle physics Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general

More information

Introduction to Bayesian Statistics

Introduction to Bayesian Statistics Bayesian Parameter Estimation Introduction to Bayesian Statistics Harvey Thornburg Center for Computer Research in Music and Acoustics (CCRMA) Department of Music, Stanford University Stanford, California

More information

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES Hisashi Tanizaki Graduate School of Economics, Kobe University, Kobe 657-8501, Japan e-mail: tanizaki@kobe-u.ac.jp Abstract:

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

Discussion of Hypothesis testing by convex optimization

Discussion of Hypothesis testing by convex optimization Electronic Journal of Statistics Vol. 9 (2015) 1 6 ISSN: 1935-7524 DOI: 10.1214/15-EJS990 Discussion of Hypothesis testing by convex optimization Fabienne Comte, Céline Duval and Valentine Genon-Catalot

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

Optimization and Testing in Linear. Non-Gaussian Component Analysis

Optimization and Testing in Linear. Non-Gaussian Component Analysis Optimization and Testing in Linear Non-Gaussian Component Analysis arxiv:1712.08837v2 [stat.me] 29 Dec 2017 Ze Jin, Benjamin B. Risk, David S. Matteson May 13, 2018 Abstract Independent component analysis

More information

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models IEEE Transactions on Information Theory, vol.58, no.2, pp.708 720, 2012. 1 f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models Takafumi Kanamori Nagoya University,

More information

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary ECE 830 Spring 207 Instructor: R. Willett Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we saw that the likelihood

More information

Probability Background

Probability Background Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second

More information

Nonparametric Bayesian Methods

Nonparametric Bayesian Methods Nonparametric Bayesian Methods Debdeep Pati Florida State University October 2, 2014 Large spatial datasets (Problem of big n) Large observational and computer-generated datasets: Often have spatial and

More information

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek J. Korean Math. Soc. 41 (2004), No. 5, pp. 883 894 CONVERGENCE OF WEIGHTED SUMS FOR DEPENDENT RANDOM VARIABLES Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek Abstract. We discuss in this paper the strong

More information

High dimensional Ising model selection

High dimensional Ising model selection High dimensional Ising model selection Pradeep Ravikumar UT Austin (based on work with John Lafferty, Martin Wainwright) Sparse Ising model US Senate 109th Congress Banerjee et al, 2008 Estimate a sparse

More information

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs 2015 IEEE 54th Annual Conference on Decision and Control CDC December 15-18, 2015. Osaka, Japan An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs Abhishek Gupta Rahul Jain Peter

More information

Experience Rating in General Insurance by Credibility Estimation

Experience Rating in General Insurance by Credibility Estimation Experience Rating in General Insurance by Credibility Estimation Xian Zhou Department of Applied Finance and Actuarial Studies Macquarie University, Sydney, Australia Abstract This work presents a new

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Recent Developments in Post-Selection Inference

Recent Developments in Post-Selection Inference Recent Developments in Post-Selection Inference Yotam Hechtlinger Department of Statistics yhechtli@andrew.cmu.edu Shashank Singh Department of Statistics Machine Learning Department sss1@andrew.cmu.edu

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

A measure of radial asymmetry for bivariate copulas based on Sobolev norm

A measure of radial asymmetry for bivariate copulas based on Sobolev norm A measure of radial asymmetry for bivariate copulas based on Sobolev norm Ahmad Alikhani-Vafa Ali Dolati Abstract The modified Sobolev norm is used to construct an index for measuring the degree of radial

More information

Plug-in Approach to Active Learning

Plug-in Approach to Active Learning Plug-in Approach to Active Learning Stanislav Minsker Stanislav Minsker (Georgia Tech) Plug-in approach to active learning 1 / 18 Prediction framework Let (X, Y ) be a random couple in R d { 1, +1}. X

More information

LOCAL TIMES OF RANKED CONTINUOUS SEMIMARTINGALES

LOCAL TIMES OF RANKED CONTINUOUS SEMIMARTINGALES LOCAL TIMES OF RANKED CONTINUOUS SEMIMARTINGALES ADRIAN D. BANNER INTECH One Palmer Square Princeton, NJ 8542, USA adrian@enhanced.com RAOUF GHOMRASNI Fakultät II, Institut für Mathematik Sekr. MA 7-5,

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline. MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y

More information

Discussion of High-dimensional autocovariance matrices and optimal linear prediction,

Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University

More information

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Bootstrapping high dimensional vector: interplay between dependence and dimensionality Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang

More information

Location Multiplicative Error Model. Asymptotic Inference and Empirical Analysis

Location Multiplicative Error Model. Asymptotic Inference and Empirical Analysis : Asymptotic Inference and Empirical Analysis Qian Li Department of Mathematics and Statistics University of Missouri-Kansas City ql35d@mail.umkc.edu October 29, 2015 Outline of Topics Introduction GARCH

More information