Generalizing Distance Covariance to Measure and Test Multivariate Mutual Dependence


arXiv: v5 [math.ST] 25 Feb 2018

Ze Jin, David S. Matteson

February 27, 2018

Abstract

We propose three new measures of mutual dependence between multiple random vectors. Each measure is zero if and only if the random vectors are mutually independent. The first generalizes distance covariance from pairwise dependence to mutual dependence, while the other two measures are sums of squared distance covariances. The proposed measures share similar properties and asymptotic distributions with distance covariance, and capture non-linear and non-monotone mutual dependence between the random vectors. Inspired by complete and incomplete V-statistics, we define empirical and simplified empirical measures as a trade-off between complexity and statistical power when testing mutual independence. The implementation of the corresponding tests is demonstrated by both simulation results and real data examples.

Key words: characteristic functions; distance covariance; multivariate analysis; mutual independence; V-statistics

Research support from an NSF Award (DMS ), a Xerox PARC Faculty Research Award, and the Cornell University Atkinson Center for a Sustainable Future (AVF-2017).

1 Introduction

Let X = (X_1, ..., X_d) be a set of variables where each component X_j, j = 1, ..., d is a continuous random vector, and let X = {X^k = (X_1^k, ..., X_d^k) : k = 1, ..., n} be an i.i.d. sample from F_X, the joint distribution of X. We are interested in testing the hypothesis

H_0: X_1, ..., X_d are mutually independent, vs. H_A: X_1, ..., X_d are dependent,

which has many applications, including independent component analysis (Matteson and Tsay, 2017), graphical models (Fan et al., 2015), naive Bayes classifiers (Tibshirani et al., 2002), etc. This problem has been studied under different settings and assumptions, including pairwise (d = 2) and mutual (d ≥ 2) independence, univariate (X_1, ..., X_d ∈ R^1) and multivariate (X_1 ∈ R^{p_1}, ..., X_d ∈ R^{p_d}) components, and more. Specifically, we focus on the general case in which X_1, ..., X_d are not assumed jointly normal.

The most extensively studied case is pairwise independence with univariate components (X_1, X_2 ∈ R^1): rank correlation is considered a non-parametric counterpart to Pearson's product-moment correlation (Pearson, 1895), including Kendall's τ (Kendall, 1938), Spearman's ρ (Spearman, 1904), etc. Bergsma and Dassios (2014) proposed a test based on an extension of Kendall's τ, testing a condition equivalent to H_0. Additionally, Hoeffding (1948) proposed a non-parametric test based on marginal and joint distribution functions, testing a necessary condition of H_0.

For pairwise independence with multivariate components (X_1 ∈ R^{p_1}, X_2 ∈ R^{p_2}): Székely et al. (2007) and Székely and Rizzo (2009) proposed a test based on distance covariance with fixed p_1, p_2 and n → ∞, testing a condition equivalent to H_0. Further, Székely and Rizzo (2013a) proposed a t-test based on a modified distance covariance for the setting in which n is finite and p_1, p_2 → ∞, also testing a condition equivalent to H_0.

For mutual independence with univariate components (X_1, ..., X_d ∈ R^1): one natural

way to extend the pairwise rank correlation to multiple components is to collect the rank correlations between all pairs of components, and examine a norm (L_2, L_∞) of this collection. Leung and Drton (2015) proposed a test based on the L_2 norm with n, d → ∞ and d/n → γ ∈ (0, ∞), and Han and Liu (2014) proposed a test based on the L_∞ norm with n, d → ∞ and d/n → γ ∈ [0, ∞]. Each tests a necessary condition of H_0, in general.

For mutual independence with multivariate components (X_1 ∈ R^{p_1}, ..., X_d ∈ R^{p_d}): this challenging scenario has not been well studied. Yao et al. (2016) proposed a test based on the distance covariances between all pairs of components with n, d → ∞, testing a necessary condition of H_0. Inspired by distance covariance in Székely et al. (2007), in this paper we propose a new test based on measures of mutual dependence with fixed d, p_1, ..., p_d and n → ∞, testing a condition equivalent to H_0. All computational complexities in this paper make no reference to the dimensions d, p_1, ..., p_d, as they are treated as constants.

Our measures of mutual dependence involve V-statistics, and are 0 if and only if mutual independence holds. They belong to energy statistics (Székely and Rizzo, 2013b), and share many statistical properties with distance covariance. In addition, Pfister et al. (2016) proposed the d-variable Hilbert-Schmidt independence criterion (dHSIC) under the same setting, which originates from HSIC (Gretton et al., 2005), and is also 0 if and only if mutual independence holds. Although dHSIC involves V-statistics as well, they pursue kernel methods and overcome the computational bottleneck by resampling and a Gamma approximation, while we take advantage of characteristic functions and resort to incomplete V-statistics.
The weakness of testing mutual independence by a necessary condition, namely all pairwise independencies, motivates our work on measures of mutual dependence, as demonstrated by the examples in section 5: if we directly test mutual independence based on the measures of mutual dependence proposed in this paper, we successfully detect mutual dependence. Alternatively, if we check all pairwise independencies based on distance covariance, we fail to detect any pairwise dependence, and mistakenly conclude that mutual independence holds,

probably because the mutual effect averages out when we narrow down to a pair.

The rest of this paper is organized as follows. In section 2, we give a brief overview of distance covariance. In section 3, we generalize distance covariance to the complete measure of mutual dependence, with its properties and asymptotic distributions derived. In section 4, we propose asymmetric and symmetric measures of mutual dependence, defined as sums of squared distance covariances. We present synthetic and real data analysis in section 5, followed by simulation results in section 6.^1 Finally, section 7 is a summary of our work. All proofs have been moved to the appendix.

The following notation will be used throughout this paper. Let (·, ·, ..., ·) denote a concatenation of (vector) components into a vector. Let t = (t_1, ..., t_d), t^0 = (t_1^0, ..., t_d^0), X = (X_1, ..., X_d) ∈ R^p, where t_j, t_j^0, X_j ∈ R^{p_j}, such that p_j is the marginal dimension, j = 1, ..., d, and p = Σ_{j=1}^d p_j is the total dimension. The assumed X under H_0 is denoted by X̃ = (X̃_1, ..., X̃_d), where X̃_j =_D X_j, j = 1, ..., d, X̃_1, ..., X̃_d are mutually independent, and X, X̃ are independent. Let X', X'' be independent copies of X, i.e., X, X', X'' i.i.d. F_X, and let X̃', X̃'' be independent copies of X̃, i.e., X̃, X̃', X̃'' i.i.d. F_X̃.

Let the weighted L_2 norm ||·||_w of a complex-valued function η(t) be defined by ||η(t)||_w^2 = ∫_{R^p} |η(t)|^2 w(t) dt, where |η(t)|^2 = η(t)η̄(t), η̄(t) is the complex conjugate of η(t), and w(t) is any positive weight function for which the integral exists. Given the i.i.d. sample X from F_X, let X_j = {X_j^k : k = 1, ..., n} denote the corresponding i.i.d. sample from F_{X_j}, j = 1, ..., d, such that X = {X_1, ..., X_d}. Denote the joint characteristic functions of X and X̃ as φ_X(t) = E[e^{i⟨t,X⟩}] and φ_X̃(t) = Π_{j=1}^d E[e^{i⟨t_j,X_j⟩}], and denote the empirical versions of φ_X(t) and φ_X̃(t) as φ_X^n(t) = (1/n) Σ_{k=1}^n e^{i⟨t,X^k⟩} and φ_X̃^n(t) = Π_{j=1}^d ((1/n) Σ_{k=1}^n e^{i⟨t_j,X_j^k⟩}).
^1 An accompanying R package EDMeasure (Jin et al., 2018) is available on CRAN.

2 Distance Covariance

Székely et al. (2007) proposed distance covariance to capture non-linear and non-monotone pairwise dependence between two random vectors (X_1 ∈ R^{p_1}, X_2 ∈ R^{p_2}). X_1, X_2 are pairwise independent if and only if φ_X(t) = φ_{X_1}(t_1) φ_{X_2}(t_2) for all t, which is equivalent to ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w(t) dt = 0 for w(t) > 0, provided the integral exists. A class of weight functions w_0(t; m) = (K(p_1; m) K(p_2; m) |t_1|^{p_1+m} |t_2|^{p_2+m})^{-1} makes the integral a finite and meaningful quantity composed of m-th moments according to Lemma 1 in Székely and Rizzo (2005), where K(q; m) = 2π^{q/2} Γ(1 − m/2) / (m 2^m Γ((q+m)/2)), and Γ is the gamma function.

The non-negative distance covariance V(X) is defined by

V^2(X) = ||φ_X(t) − φ_X̃(t)||^2_{w_0} = ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w_0(t) dt, where w_0(t) = (K_{p_1} K_{p_2} |t_1|^{p_1+1} |t_2|^{p_2+1})^{-1},   (1)

with m = 1 and K_q = K(q; 1), while any following result can be generalized to 0 < m < 2. If E|X| < ∞, then V(X) ∈ [0, ∞), and V(X) = 0 if and only if X_1, X_2 are pairwise independent.

The non-negative empirical distance covariance V_n(X) is defined by

V_n^2(X) = ||φ_X^n(t) − φ_X̃^n(t)||^2_{w_0} = ∫_{R^p} |φ_X^n(t) − φ_X̃^n(t)|^2 w_0(t) dt.

Calculating V_n^2(X) via the symmetry of Euclidean distances has time complexity O(n^2). Some asymptotic properties of V_n(X) have been derived. If E|X| < ∞, then

(i) V_n(X) →_{a.s.} V(X) as n → ∞.

(ii) Under H_0, n V_n^2(X) →_D ||ζ(t)||^2_{w_0} as n → ∞, where ζ(t) is a complex-valued Gaussian process with mean zero and covariance function R(t, t^0) = [φ_{X_1}(t_1 − t_1^0) − φ_{X_1}(t_1) φ_{X_1}(t_1^0)] [φ_{X_2}(t_2 − t_2^0) − φ_{X_2}(t_2) φ_{X_2}(t_2^0)].

(iii) Under H_A, n V_n^2(X) →_{a.s.} ∞ as n → ∞.

3 Complete Measure of Mutual Dependence

Generalizing the idea of distance covariance, we propose the complete measure of mutual dependence to capture non-linear and non-monotone mutual dependence between multiple random

vectors (X_1 ∈ R^{p_1}, ..., X_d ∈ R^{p_d}). X_1, ..., X_d are mutually independent if and only if φ_X(t) = φ_{X_1}(t_1) ··· φ_{X_d}(t_d) = φ_X̃(t) for all t, which is equivalent to ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w(t) dt = 0 for w(t) > 0, provided the integral exists. We put all components together instead of separating them, and choose the weight function

w_1(t) = (K_p |t|^{p+1})^{-1}.   (2)

Definition 1. The complete measure of mutual dependence Q(X) is defined by

Q(X) = ||φ_X(t) − φ_X̃(t)||^2_{w_1} = ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w_1(t) dt.

We can show an equivalence to mutual independence based on Q(X) according to Lemma 1 in Székely and Rizzo (2005).

Theorem 1. If E|X| < ∞, then Q(X) ∈ [0, ∞), and Q(X) = 0 if and only if X_1, ..., X_d are mutually independent. In addition, Q(X) has an interpretation in terms of expectations:

Q(X) = 2 E|X − X̃| − E|X − X'| − E|X̃ − X̃'|.

It is straightforward to estimate Q(X) by replacing the characteristic functions with the empirical characteristic functions from the sample.

Definition 2. The empirical complete measure of mutual dependence Q_n(X) is defined by

Q_n(X) = ||φ_X^n(t) − φ_X̃^n(t)||^2_{w_1} = ∫_{R^p} |φ_X^n(t) − φ_X̃^n(t)|^2 w_1(t) dt.
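Theorem 1's expectation identity suggests a direct plug-in estimate of Q(X): mimic the decoupled vector X̃ by permuting each component's sample independently, then compare average pairwise distances. The sketch below (Python with numpy) is only illustrative; the function names and the random decoupling are assumptions of this sketch, not the paper's V-statistic estimator of Lemma 1.

```python
import numpy as np

def mean_pdist(a, b):
    """Average Euclidean distance over all pairs of rows of a and b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).mean()

def energy_Q_hat(samples, rng):
    """Plug-in estimate of Q(X) = 2E|X - X~| - E|X - X'| - E|X~ - X~'|.

    samples: list of (n, p_j) arrays, one per component X_j. The decoupled
    sample (mimicking X~, whose components are mutually independent with the
    same marginals) is formed by permuting each component's rows independently.
    """
    X = np.hstack(samples)
    Xt = np.hstack([s[rng.permutation(len(s))] for s in samples])
    # Energy distance between the joint sample and its decoupled version.
    return 2 * mean_pdist(X, Xt) - mean_pdist(X, X) - mean_pdist(Xt, Xt)
```

Because this is an energy distance between two empirical distributions, the estimate is non-negative, and it is close to zero when the components are mutually independent.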

Lemma 1. Q_n(X) has an interpretation in terms of complete V-statistics:

Q_n(X) = −(1/n^2) Σ_{k,l=1}^n |X^k − X^l| + (2/n^{d+1}) Σ_{k,l_1,...,l_d=1}^n |X^k − (X_1^{l_1}, ..., X_d^{l_d})| − (1/n^{2d}) Σ_{k_1,...,k_d,l_1,...,l_d=1}^n |(X_1^{k_1}, ..., X_d^{k_d}) − (X_1^{l_1}, ..., X_d^{l_d})|,

whose naive implementation has time complexity O(n^{2d}).

In view of the definition of distance covariance, it may seem natural to define the measure using the weight function

w_2(t) = (K_{p_1} ··· K_{p_d} |t_1|^{p_1+1} ··· |t_d|^{p_d+1})^{-1},   (3)

which equals w_0(t) when d = 2. Given the weight function w_2(t), we can define the squared distance covariance of mutual dependence U(X) = ||φ_X(t) − φ_X̃(t)||^2_{w_2} and its empirical counterpart U_n(X) = ||φ_X^n(t) − φ_X̃^n(t)||^2_{w_2}, which equal V^2(X) and V_n^2(X) when d = 2. The naive implementation of U_n(X) has time complexity O(n^{d+1}).

The reason to favor w_1(t) over w_2(t) is a trade-off between the moment condition and time complexity. We often cannot afford the time complexity of Q_n(X) or U_n(X), and have to simplify them through incomplete V-statistics. An incomplete V-statistic is obtained by sampling the terms of a complete V-statistic, where the summation extends over only a subset of the tuples of indices. To simplify by replacing complete V-statistics with incomplete V-statistics, U_n(X) requires the additional d-th moment condition E|X_1| ··· |X_d| < ∞, while Q_n(X) requires no condition beyond the first moment condition E|X| < ∞. Thus, we can reduce the complexity of Q_n(X) to O(n^2) under a weaker condition, which makes Q(X) and Q_n(X) from w_1(t) a more general solution. Moreover, we define the

simplified empirical version of φ_X̃(t) as

φ_X̃^{*n}(t) = (1/n) Σ_{k=1}^n e^{i Σ_{j=1}^d ⟨t_j, X_j^{k+j−1}⟩} = (1/n) Σ_{k=1}^n e^{i⟨t, (X_1^k, ..., X_d^{k+d−1})⟩},

in order to substitute it for φ_X̃^n(t) as a simplification, where X_j^{n+k} is interpreted as X_j^k for k > 0.

Definition 3. The simplified empirical complete measure of mutual dependence Q_n^*(X) is defined by

Q_n^*(X) = ||φ_X^n(t) − φ_X̃^{*n}(t)||^2_{w_1} = ∫_{R^p} |φ_X^n(t) − φ_X̃^{*n}(t)|^2 w_1(t) dt.

Lemma 2. Q_n^*(X) has an interpretation in terms of incomplete V-statistics:

Q_n^*(X) = −(1/n^2) Σ_{k,l=1}^n |X^k − X^l| + (2/n^2) Σ_{k,l=1}^n |X^k − (X_1^l, ..., X_d^{l+d−1})| − (1/n^2) Σ_{k,l=1}^n |(X_1^k, ..., X_d^{k+d−1}) − (X_1^l, ..., X_d^{l+d−1})|,

whose naive implementation has time complexity O(n^2).

Using a similar derivation to Theorems 2 and 5 of Székely et al. (2007), the asymptotic distributions of Q_n(X), Q_n^*(X) are obtained as follows.

Theorem 2. If E|X| < ∞, then Q_n(X) →_{a.s.} Q(X) and Q_n^*(X) →_{a.s.} Q(X) as n → ∞.

Theorem 3. If E|X| < ∞, then under H_0, we have n Q_n(X) →_D ||ζ(t)||^2_{w_1} and n Q_n^*(X) →_D ||ζ^*(t)||^2_{w_1} as n → ∞, where ζ(t), ζ^*(t) are complex-valued Gaussian processes with mean zero and covariance functions

R(t, t^0) = Π_{j=1}^d φ_{X_j}(t_j − t_j^0) + (d − 1) Π_{j=1}^d φ_{X_j}(t_j) φ_{X_j}(t_j^0) − Σ_{j=1}^d [φ_{X_j}(t_j − t_j^0) Π_{l≠j} φ_{X_l}(t_l) φ_{X_l}(t_l^0)],

R^*(t, t^0) = 2 R(t, t^0).

Under H_A, we have n Q_n(X) →_{a.s.} ∞ and n Q_n^*(X) →_{a.s.} ∞ as n → ∞.

Therefore, a mutual independence test can be proposed based on the weak convergence of n Q_n(X), n Q_n^*(X) in Theorem 3. Since the asymptotic distributions of n Q_n(X), n Q_n^*(X) depend on F_X, a permutation procedure is used to approximate them in practice.

4 Asymmetric and Symmetric Measures of Mutual Dependence

As an alternative, we now propose asymmetric and symmetric measures of mutual dependence, which capture mutual dependence by aggregating pairwise dependencies. The subset of components to the right of X_c is denoted by X_{c+} = (X_{c+1}, ..., X_d), with t_{c+} = (t_{c+1}, ..., t_d), c = 0, 1, ..., d − 1. The subset of components excluding X_c is denoted by X_{−c} = (X_1, ..., X_{c−1}, X_{c+}), with t_{−c} = (t_1, ..., t_{c−1}, t_{c+}), c = 1, ..., d. We denote pairwise independence by ⊥.

The collection of pairwise independencies implied by mutual independence includes one versus the others on the right,

{X_1 ⊥ X_{1+}, X_2 ⊥ X_{2+}, ..., X_{d−1} ⊥ X_d},   (4)

one versus all the others,

{X_1 ⊥ X_{−1}, X_2 ⊥ X_{−2}, ..., X_d ⊥ X_{−d}},   (5)

and many others, e.g., (X_1, X_2) ⊥ X_{2+}. In fact, the number of pairwise independencies implied by mutual independence is at least 2^{d−1} − 1, which grows exponentially with the number of components d. Therefore, we cannot test mutual independence simply by checking all pairwise independencies, even for moderate d. Fortunately, we have two options for testing only a small subset of all pairwise independencies to fulfill the task.

The first is that H_0 holds if and only if (4) holds, which can be verified via the sequential decomposition of distribution functions. This option is asymmetric and not unique, having d! feasible subsets with respect to different orderings of X_1, ..., X_d. The second is that H_0 holds if and only if (5) holds, which can be verified via the stepwise decomposition of distribution functions and the fact that X_j ⊥ X_{−j} implies X_j ⊥ X_{j+}. This option is symmetric and unique, having only one feasible subset.

To shed light on why these two options are necessary and sufficient conditions for mutual independence, we present the following inequality, which bounds the mutual dependence by a sum of pairwise dependencies:

|φ_X(t) − Π_{j=1}^d φ_{X_j}(t_j)|^2 ≤ (d − 1) Σ_{c=1}^{d−1} |φ_{(X_c, X_{c+})}((t_c, t_{c+})) − φ_{X_c}(t_c) φ_{X_{c+}}(t_{c+})|^2.

In consideration of these two options, we test a set of pairwise independencies in place of mutual independence, where we use V^2(X) to test pairwise independence.

Definition 4. The asymmetric and symmetric measures of mutual dependence R(X), S(X)

are defined by

R(X) = Σ_{c=1}^{d−1} V^2((X_c, X_{c+})) and S(X) = Σ_{c=1}^d V^2((X_c, X_{−c})).

We can show an equivalence to mutual independence based on R(X), S(X) according to Theorem 3 of Székely et al. (2007).

Theorem 4. If E|X| < ∞, then R(X), S(X) ∈ [0, ∞), and R(X) = S(X) = 0 if and only if X_1, ..., X_d are mutually independent.

It is straightforward to estimate R(X), S(X) by replacing the characteristic functions with the empirical characteristic functions from the sample.

Definition 5. The empirical asymmetric and symmetric measures of mutual dependence R_n(X), S_n(X) are defined by

R_n(X) = Σ_{c=1}^{d−1} V_n^2((X_c, X_{c+})) and S_n(X) = Σ_{c=1}^d V_n^2((X_c, X_{−c})).

The implementations of R_n(X), S_n(X) have time complexity O(n^2). Using a similar derivation to Theorems 2 and 5 of Székely et al. (2007), some asymptotic properties of R_n(X), S_n(X) are obtained as follows.

Theorem 5. If E|X| < ∞, then R_n(X) →_{a.s.} R(X) and S_n(X) →_{a.s.} S(X) as n → ∞.

Theorem 6. If E|X| < ∞, then under H_0, we have

n R_n(X) →_D Σ_{j=1}^{d−1} ||ζ_j^R((t_j, t_{j+}))||^2_{w_0} and n S_n(X) →_D Σ_{j=1}^d ||ζ_j^S((t_j, t_{−j}))||^2_{w_0} as n → ∞,

where ζ_j^R((t_j, t_{j+})), ζ_j^S((t_j, t_{−j})) are complex-valued Gaussian processes corresponding to the limiting distributions of n V_n^2((X_j, X_{j+})), n V_n^2((X_j, X_{−j})). Under H_A, we have n R_n(X) →_{a.s.} ∞ and n S_n(X) →_{a.s.} ∞ as n → ∞.

It is surprising to find that V_n^2((X_c, X_{c+})), c = 1, ..., d − 1 are asymptotically mutually independent, and that V_n^2((X_c, X_{−c})), c = 1, ..., d are asymptotically mutually independent as well, which is a crucial discovery behind Theorem 6.

Alternatively, we can plug in Q(X) instead of V^2(X) in Definition 4 and Q_n(X) instead of V_n^2(X) in Definition 5, and define the asymmetric and symmetric measures J(X), I(X) and their empirical versions J_n(X), I_n(X) accordingly, which equal Q(X), Q_n(X) when d = 2. The naive implementations of J_n(X), I_n(X) have time complexity O(n^4). Similarly, we can replace Q_n(X) with Q_n^*(X) to simplify them, and define the simplified empirical asymmetric and symmetric measures J_n^*(X), I_n^*(X), reducing their complexities to O(n^2) with no condition beyond the first moment condition E|X| < ∞. Through the same derivations, we can show that J_n(X), J_n^*(X), I_n(X), I_n^*(X) have convergences similar to those of R_n(X), S_n(X) in Theorems 5 and 6.

5 Illustrative Examples

We start with two examples comparing different methods to show the value of our mutual independence tests. In practice, people usually check all pairwise dependencies to test mutual independence, due to the lack of reliable and universal mutual independence tests. It is then very likely to miss a complicated mutual dependence structure, and to make unsound decisions in corresponding applications by assuming that mutual independence holds.
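Before turning to the examples, note that the empirical measures R_n, S_n of Definition 5 reduce to sums of ordinary squared distance covariances over concatenated components. A minimal sketch in Python/numpy, using the O(n^2) double-centering formula for V_n^2 from Székely et al. (2007) (the function names are illustrative):

```python
import numpy as np

def dcov_sq(x, y):
    """Squared empirical distance covariance V_n^2 for (n, p1), (n, p2) samples,
    via the O(n^2) double-centering formula of Szekely et al. (2007)."""
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)  # pairwise distances
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()          # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()

def R_n(samples):
    """Asymmetric measure: sum over c of V_n^2((X_c, X_{c+}))."""
    return sum(dcov_sq(samples[c], np.hstack(samples[c + 1:]))
               for c in range(len(samples) - 1))

def S_n(samples):
    """Symmetric measure: sum over c of V_n^2((X_c, X_{-c})).
    samples must be a plain list of (n, p_j) arrays."""
    return sum(dcov_sq(samples[c], np.hstack(samples[:c] + samples[c + 1:]))
               for c in range(len(samples)))
```

Each term is a non-negative V-statistic, so R_n and S_n are non-negative, in line with Theorem 4.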

5.1 Synthetic Data

We define a triplet of random vectors (X, Y, Z) on R^q × R^q × R^q, where X, Y ~ N(0, I_q), W ~ Exp(1/√2), the first element of Z is Z_1 = sign(X_1 Y_1) W, the remaining q − 1 elements are Z_{2:q} ~ N(0, I_{q−1}), and X, Y, W, Z_{2:q} are mutually independent. Clearly, (X, Y, Z) is a pairwise independent but mutually dependent triplet. An i.i.d. sample of (X, Y, Z) is randomly generated with sample size n = 500 and dimension q = 5.

On the one hand, we test the null hypothesis H_0: X, Y, Z are mutually independent using the proposed measures R_n, S_n, Q_n^*, J_n^*, I_n^*. On the other hand, we test the null hypotheses H_0^(1): X ⊥ Y, H_0^(2): Y ⊥ Z, and H_0^(3): X ⊥ Z using distance covariance V_n^2. An adaptive permutation size B = 210 is used for all tests. As expected, mutual dependence is successfully captured, as the p-values of the mutual independence tests are (Q_n^*), (J_n^*), 0 (I_n^*), (R_n) and 0 (S_n). Meanwhile, the p-values of the pairwise independence tests are (X, Y), (Y, Z), and (X, Z). According to the Bonferroni correction for multiple tests among all the pairs, the significance level should be adjusted to α/3 for the pairwise tests. As a result, no signal of pairwise dependence is detected, and we cannot reject mutual independence.

5.2 Financial Data

We collect the annual Fama/French 5 factors^2 for the past 52 years, between 1964 and 2015. In particular, we are interested in whether mutual dependence exists among three factors: X = Mkt-RF (excess return on the market), Y = SMB (small minus big), and Z = RF (risk-free return), where annual returns are considered as nearly independent observations. Both histograms and pair plots of X, Y, Z are depicted in Figure 1.

For one, we apply a single mutual independence test H_0: X, Y, Z are mutually independent. For another, we apply three pairwise independence tests H_0^(1): X ⊥ Y, H_0^(2): Y ⊥ Z, and H_0^(3): X ⊥ Z. An adaptive permutation size B = 296 is used for all tests. The p-values of the mutual independence tests are (Q_n^*), (J_n^*), (I_n^*), (R_n) and (S_n), indicating that mutual dependence is successfully captured. Meanwhile, the p-values of the pairwise independence tests using distance covariance V_n^2 are (X, Y), (Y, Z) and (X, Z). Similarly, the significance level should be adjusted to α/3 according to the Bonferroni correction, and thus we cannot reject mutual independence, since no signal of pairwise dependence is detected.

^2 Data at library.html.
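The permutation calibration used by all of these tests can be sketched generically: permuting each component's rows independently breaks the coupling between components (enforcing H_0) while preserving the marginals. The sketch below (Python/numpy) is illustrative; `perm_pvalue` is an assumed name, the statistic is passed in as a function, and the paper's adaptive choice of B is not reproduced.

```python
import numpy as np

def perm_pvalue(stat, samples, B=199, seed=0):
    """Permutation p-value for a mutual independence test.

    stat: maps a list of per-component (n, p_j) samples to a scalar that is
    large under dependence. Each replicate permutes the rows of every
    component independently, which enforces H_0 but keeps the marginals."""
    rng = np.random.default_rng(seed)
    t_obs = stat(samples)
    n = len(samples[0])
    exceed = sum(
        stat([s[rng.permutation(n)] for s in samples]) >= t_obs
        for _ in range(B)
    )
    # Add-one correction keeps the p-value in (0, 1].
    return (1 + exceed) / (1 + B)
```

For instance, with a toy statistic such as the absolute correlation of the first coordinates, a strongly dependent pair yields a p-value near 1/(B + 1), while independent components yield a larger p-value.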

Example 2 (pairwise multivariate non-normal). X_1, X_2 ∈ R^5, (Y_1, Y_2)^T ~ N_10(0, Σ) where Σ_ii = 1, and X_1 = ln(Y_1^2), X_2 = ln(Y_2^2). Under H_0, Σ_ij = 0, i ≠ j. Under H_A, Σ_ij = 0.4, i ≠ j. See results in Tables 3 and 4.

For both Examples 1 and 2, the empirical size of all measures is close to α = 0.1. The empirical power of Q_n, R_n, S_n, J_n, I_n is almost the same as that of V_n^2, while the empirical power of Q_n^*, J_n^*, I_n^* is lower than that of V_n^2, which makes sense because the simplified measures trade testing power for time complexity.

In the following two examples, we fix d = 3 and vary n from 25 to 500, and compare Q_n, R_n, S_n, J_n, I_n to Q_n^*, J_n^*, I_n^*.

Example 3 (mutual multivariate normal). X_1, X_2, X_3 ∈ R^5, (X_1, X_2, X_3)^T ~ N_15(0, Σ) where Σ_ii = 1. Under H_0, Σ_ij = 0, i ≠ j. Under H_A, Σ_ij = 0.1, i ≠ j. See results in Tables 5 and 6.

Example 4 (mutual multivariate non-normal). X_1, X_2, X_3 ∈ R^5, (Y_1, Y_2, Y_3)^T ~ N_15(0, Σ) where Σ_ii = 1, and X_k = ln(Y_k^2), k = 1, 2, 3. Under H_0, Σ_ij = 0, i ≠ j. Under H_A, Σ_ij = 0.4, i ≠ j. See results in Tables 7 and 8.

For both Examples 3 and 4, the empirical size of all measures is close to α = 0.1. The empirical power of Q_n, R_n, S_n, J_n, I_n is almost the same, the empirical power of Q_n^*, J_n^*, I_n^* is almost the same, and the empirical power of Q_n^*, J_n^*, I_n^* is lower than that of Q_n, R_n, S_n, J_n, I_n, which again reflects the trade-off between testing power and time complexity in the simplified measures.

In the last example, we vary d from 5 to 50 and fix n = 100, and compare R_n, S_n, Q_n^*, J_n^*, I_n^* to HL_τ, HL_ρ, HL_τ^n, HL_ρ^n.

Example 5 (mutual univariate normal, high-dimensional). X_1, ..., X_d ∈ R^1, (X_1, ..., X_d)^T ~ N_d(0, Σ) where Σ_ii = 1. Under H_0, Σ_ij = 0, i ≠ j. Under H_A, Σ_ij = 0.1, i ≠ j. See results in Tables 9 and 10.

The empirical size of HL_τ, HL_ρ is much lower than α = 0.1 and too conservative, while that of the other measures is fairly close to α = 0.1. The reason is probably that convergence to the asymptotic distributions of HL_τ, HL_ρ requires a larger sample size n and number of components d. The measures R_n, S_n have the highest empirical power, and outperform the simplified measures Q_n^*, J_n^*, I_n^*. The empirical power of the simplified measures is similar to or even lower than that of the benchmark measures when d = 5. However, the empirical power of the simplified measures converges much faster than that of the benchmark measures as d grows. Moreover, Q_n^* shows a significant advantage over J_n^*, I_n^*. The reason is probably that Q_n^* is based on truly mutual dependence while J_n^*, I_n^* are based on pairwise dependencies, and large d compared to n introduces much more noise into J_n^*, I_n^* because of their summation structures, making it more difficult for them to detect mutual dependence.

The asymptotic analysis of our measures only allows small d compared to n, yet our measures work well with large d compared to n in Example 5. However, this success relies on the underlying dependence structure, which is dense, since each component is dependent on every other component. In contrast, if the dependence structure is sparse, with each component dependent on only a few other components, then all measures are likely to fail.

7 Conclusion

We propose three measures of mutual dependence for random vectors based on the equivalence to mutual independence through characteristic functions, following the idea of distance covariance in Székely et al. (2007). When we select the weight function for the complete measure, we trade off between the moment condition and time complexity. We then simplify it by replacing complete V-statistics with incomplete V-statistics, as a trade-off between testing power and time complexity. These two trade-offs make the simplified complete measure both effective and efficient.

The asymptotic distributions of our measures depend on the underlying distribution F_X. Thus, the corresponding tests are not distribution-free, and we use a permutation procedure to approximate the asymptotic distributions in practice.

We illustrate the value of our measures through both synthetic and financial data examples, where mutual independence tests based on our measures successfully capture the mutual dependence, while the alternative of checking all pairwise independencies fails and mistakenly leads to the conclusion that mutual independence holds. Our measures achieve competitive or even better results than the benchmark measures in simulations with various examples. Although we do not allow large d compared to n in the asymptotic analysis, our measures work well in a large-d example, since the dependence structure there is dense.

Acknowledgements

We are grateful to Stanislav Volgushev for helpful comments on a preliminary draft of this paper. We also thank an anonymous referee for helpful line-by-line comments.

References

W. Bergsma and A. Dassios. A consistent test of independence based on a sign covariance related to Kendall's tau. Bernoulli, 20(2), 2014.

J. Fan, Y. Feng, and L. Xia. A conditional dependence measure with applications to undirected graphical models. arXiv preprint, 2015.

A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, volume 16. Springer, 2005.

F. Han and H. Liu. Distribution-free tests of independence with applications to testing more structures. arXiv preprint, 2014.

W. Hoeffding. A non-parametric test of independence. The Annals of Mathematical Statistics, 1948.

Z. Jin, S. Yao, D. S. Matteson, and X. Shao. EDMeasure: Energy-Based Dependence Measures, 2018. R package.

M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81-93, 1938.

D. Leung and M. Drton. Testing independence in high dimensions with sums of squares of rank correlations. arXiv preprint, 2015.

D. S. Matteson and R. S. Tsay. Independent component analysis via distance covariance. Journal of the American Statistical Association, 112(518), 2017.

R. v. Mises. On the asymptotic distribution of differentiable statistical functions. The Annals of Mathematical Statistics, 18(3), 1947.

K. Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 1895.

N. Pfister, P. Bühlmann, B. Schölkopf, and J. Peters. Kernel-based tests for joint independence. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2016.

C. Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101, 1904.

G. J. Székely and M. L. Rizzo. A new test for multivariate normality. Journal of Multivariate Analysis, 93(1):58-80, 2005.

G. J. Székely and M. L. Rizzo. Brownian distance covariance. The Annals of Applied Statistics, 3(4), 2009.

G. J. Székely and M. L. Rizzo. The distance correlation t-test of independence in high dimension. Journal of Multivariate Analysis, 117, 2013a.

G. J. Székely and M. L. Rizzo. Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8), 2013b.

G. J. Székely, M. L. Rizzo, and N. K. Bakirov. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2007.

R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 99(10), 2002.

S. Yao, X. Zhang, and X. Shao. Testing mutual independence in high dimension via distance covariance. arXiv preprint, 2016.

Table 1: empirical size (α = 0.1) in Example 1 with 1000 repetitions and d = 2. Columns: n; V_n^2; R_n; S_n; Q_n; J_n; I_n; Q_n^*; J_n^*; I_n^*.

Table 2: empirical power (α = 0.1) in Example 1 with 1000 repetitions and d = 2. Columns as in Table 1.

Table 3: empirical size (α = 0.1) in Example 2 with 1000 repetitions and d = 2. Columns as in Table 1.

Table 4: empirical power (α = 0.1) in Example 2 with 1000 repetitions and d = 2. Columns as in Table 1.

Table 5: empirical size (α = 0.1) in Example 3 with 1000 repetitions and d = 3. Columns: n; Q_n; Q_n^*; R_n; S_n; J_n; J_n^*; I_n; I_n^*.

Table 6: empirical power (α = 0.1) in Example 3 with 1000 repetitions and d = 3. Columns as in Table 5.

Table 7: empirical size (α = 0.1) in Example 4 with 1000 repetitions and d = 3. Columns as in Table 5.

Table 8: empirical power (α = 0.1) in Example 4 with 1000 repetitions and d = 3. Columns as in Table 5.

Table 9: empirical size (α = 0.1) in Example 5 with 1000 repetitions and n = 100. Columns: d; HL_τ; HL_ρ; HL_τ^n; HL_ρ^n; Q_n; Q_n^*; R_n; S_n; J_n; J_n^*; I_n; I_n^*.

Table 10: empirical power (α = 0.1) in Example 5 with 1000 repetitions and n = 100. Columns as in Table 9.

Appendix

Proofs of Theorems 1-6 and Lemmas 1 and 2.

Proof of Theorem 1. We show: (i) 0 ≤ Q(X) < ∞; (ii) Q(X) = 0 if and only if X_1, ..., X_d are mutually independent; (iii) Q(X) = 2E|X − X̃| − E|X − X'| − E|X̃ − X̃'|.

Since w_1(t) is a positive weight function, X_1, ..., X_d are mutually independent if and only if Q(X) = ∫_{R^p} |φ_X(t) − φ_X̃(t)|^2 w_1(t) dt is equal to zero. By the boundedness property of characteristic functions and Fubini's theorem, we have

Figure 1: Three annual Fama/French factors between 1964 and 2015: Mkt-RF (excess return on the market), SMB (small minus big) and RF (risk-free return). Panels show histograms of each factor, with estimated kernel densities as red lines, and the pairwise scatter plots Mkt-RF vs. SMB, Mkt-RF vs. RF, and SMB vs. RF. The correlations are corr(Mkt-RF, SMB) = 0.238, corr(Mkt-RF, RF) = , and corr(SMB, RF) = .

|φ_X(t) − φ_X̃(t)|^2 = φ_X(t)φ̄_X(t) + φ_X̃(t)φ̄_X̃(t) − φ_X(t)φ̄_X̃(t) − φ_X̃(t)φ̄_X(t)
= E[e^{i⟨t,X⟩}]E[e^{−i⟨t,X'⟩}] + E[e^{i⟨t,X̃⟩}]E[e^{−i⟨t,X̃'⟩}] − E[e^{i⟨t,X⟩}]E[e^{−i⟨t,X̃⟩}] − E[e^{i⟨t,X̃⟩}]E[e^{−i⟨t,X⟩}]
= E(cos⟨t, X − X'⟩) + E(cos⟨t, X̃ − X̃'⟩) − E(cos⟨t, X − X̃⟩) − E(cos⟨t, X̃ − X⟩)
= −E(1 − cos⟨t, X − X'⟩) − E(1 − cos⟨t, X̃ − X̃'⟩) + E(1 − cos⟨t, X − X̃⟩) + E(1 − cos⟨t, X̃ − X⟩).

Since E|X| < ∞ implies E|X̃| < ∞, we have E(|X| + |X̃|) < ∞. Then the triangle inequality implies E|X − X'|, E|X̃ − X̃'|, E|X − X̃|, E|X̃ − X| < ∞. Therefore, by Fubini's

23 theorem and Lemma 1, it follows that Q(X) = φ X (t) φ X(t) 2 w 1 (t) dt = E(1 cos t, X X ) w 1 (t) dt + E(1 cos t, X X ) w 1 (t) dt E(1 cos t, X X ) w 1 (t) dt E(1 cos t, X X ) w 1 (t) dt = E X X + E X X E X X E X X <. Finally, Q(X) 0 since the integrand φ X (t) φ X(t) 2 is non-negative. Lemma 1 Proof. After a simple calculation, we have φ n X (t) φñ X (t) 2 = φ n X (t)φn X (t) φn X (t)φñ X (t) φñ X (t)φn X (t) + φñ X (t)φñ X (t) = 1 n 2 n k,l=1 cos t, Xk X l 2 n d+1 n k,l 1,...,l d =1 cos t, Xk (X l 1 1,..., X l d d ) + 1 n 2d n k 1,...,k d,l 1,...,l d =1 cos t, (Xk 1 1,..., X k d d ) (Xl 1 1,..., X l d d ) + V = 1 n 2 n k,l=1 [1 cos t, Xk X l ] + 2 n d+1 n k,l 1,...,l d =1 [1 cos t, Xk (X l 1 1,..., X l d d ) ] 1 n 2d n k 1,...,k d,l 1,...,l d =1 [1 cos t, (Xk 1 1,..., X k d d ) (Xl 1 1,..., X l d d ) ] + V, where V is imaginary and thus 0 as the φ n X (t) φñ X (t) 2 is real. By Lemma 1 in Székely and Rizzo (2005) Q n (X) = φ n X (t) φñ X (t) 2 w 1 = 1 n 2 n k,l=1 Xk X l + 2 n d+1 n k,l 1,...,l d =1 Xk (X l 1 1,..., X l d d ) 1 n 2d n k 1,...,k d,l 1,...,l d =1 (Xk 1 1,..., X k d d ) (Xl 1 1,..., X l d d ). Lemma 2 Proof. After a simple calculation, we have φ n X (t) φn X (t) 2 = φ n X (t)φn X (t) φn X (t)φn (t) φn X (t)φn X (t) + φn X (t)φn X (t) = 1 n 2 n k,l=1 cos t, Xk X l 2 n 2 n k,l=1 cos t, Xk (X l 1,..., X l+d 1 d ) X 23

      + (1/n²) Σ_{k,l=1}^n cos⟨t, (X_1^k, X_2^{k+1}, …, X_d^{k+d−1}) − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})⟩ + V
    = −(1/n²) Σ_{k,l=1}^n [1 − cos⟨t, X^k − X^l⟩]
      + (2/n²) Σ_{k,l=1}^n [1 − cos⟨t, X^k − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})⟩]
      − (1/n²) Σ_{k,l=1}^n [1 − cos⟨t, (X_1^k, X_2^{k+1}, …, X_d^{k+d−1}) − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})⟩] + V,

where V is imaginary and thus 0, as |φ_X^n(t) − φ_X^{*n}(t)|² is real. By Lemma 1 in Székely and Rizzo (2005),

    Q_n*(X) = ∫ |φ_X^n(t) − φ_X^{*n}(t)|² w_1(t) dt
    = −(1/n²) Σ_{k,l=1}^n ‖X^k − X^l‖
      + (2/n²) Σ_{k,l=1}^n ‖X^k − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})‖
      − (1/n²) Σ_{k,l=1}^n ‖(X_1^k, X_2^{k+1}, …, X_d^{k+d−1}) − (X_1^l, X_2^{l+1}, …, X_d^{l+d−1})‖. □

Theorem 2
Proof. We write

    Q_n = ‖φ_X^n(t) − φ̃_X^n(t)‖²_{w_1} = ‖ξ_n(t)‖²_{w_1}  and  Q_n* = ‖φ_X^n(t) − φ_X^{*n}(t)‖²_{w_1} = ‖ξ_n*(t)‖²_{w_1},

where ‖f‖²_{w_1} = ∫ |f(t)|² w_1(t) dt. For 0 < δ < 1, define the region

    D(δ) = {t = (t_1, …, t_d) : δ ≤ ‖t‖² = Σ_{j=1}^d ‖t_j‖² ≤ 2/δ},   (6)

and random variables

    Q_{n,δ} = ∫_{D(δ)} |ξ_n(t)|² w_1(t) dt  and  Q_{n,δ}* = ∫_{D(δ)} |ξ_n*(t)|² w_1(t) dt.

For any fixed δ, the weight function w_1(t) is bounded on D(δ). Hence Q_{n,δ} is a combination of V-statistics of bounded random variables. Similar to Theorem 2 of Székely et al. (2007), it follows by the strong law of large numbers (SLLN) for V-statistics (Mises, 1947) that, almost surely,

    lim_{n→∞} Q_{n,δ} = lim_{n→∞} Q_{n,δ}* = Q_{·,δ} = ∫_{D(δ)} |φ_X(t) − φ̃_X(t)|² w_1(t) dt.

Clearly Q_{·,δ} → Q as δ → 0. Hence Q_{n,δ} → Q a.s. and Q_{n,δ}* → Q a.s. as δ → 0, n → ∞. In order to show Q_n → Q a.s. and Q_n* → Q a.s. as n → ∞, it remains to prove that, almost surely,

    lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ} − Q_n| = lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ}* − Q_n*| = 0.

We define a mixture of X and X̃ as Y_c = (X̃_1, …, X̃_{c−1}, X_{c+}), c = 1, …, d−1, where X_{c+} = (X_{c+1}, …, X_d) and t_{−c} collects the arguments other than t_c. Telescoping the difference between the joint empirical characteristic function and the product of the empirical marginals, and applying the Cauchy-Bunyakovsky inequality, we obtain

    |ξ_n(t)|² = |φ_X^n(t) − Π_{j=1}^d φ_{X_j}^n(t_j)|²
    = |Σ_{c=1}^{d−1} {Π_{j=1}^{c−1} φ_{X_j}^n(t_j)} {φ_{(X_c,X_{c+})}^n(t_c, t_{c+}) − φ_{X_c}^n(t_c) φ_{X_{c+}}^n(t_{c+})}|²
    = |Σ_{c=1}^{d−1} {φ_{(X_c,Y_c)}^n(t_c, t_{−c}) − φ_{X_c}^n(t_c) φ_{Y_c}^n(t_{−c})}|²
    ≤ (d−1) Σ_{c=1}^{d−1} |φ_{(X_c,Y_c)}^n(t_c, t_{−c}) − φ_{X_c}^n(t_c) φ_{Y_c}^n(t_{−c})|².

Similarly, writing

    ξ_n*(t) = (1/n) Σ_{k=1}^n e^{i⟨t, X^k⟩} − (1/n) Σ_{k=1}^n e^{i⟨t, (X_1^k, X_2^{k+1}, …, X_d^{k+d−1})⟩},

telescoping over the d−1 coordinate shifts and applying the Cauchy-Bunyakovsky inequality twice yields

    |ξ_n*(t)|² ≤ 4(d−1) Σ_{c=2}^d (1/n) Σ_{k=1}^n |e^{i⟨t_c, X_c^k⟩} − φ_{X_c}(t_c)|².

By the inequality sa + (1−s)b ≥ a^s b^{1−s}, 0 < s < 1, a, b > 0, we have

    ‖t‖^{1+p} = (‖t_c‖² + ‖t_{−c}‖²)^{(1+p)/2} ≥ ‖t_c‖^{(1+p)s} ‖t_{−c}‖^{(1+p)(1−s)} = ‖t_c‖^{m_c+p_c} ‖t_{−c}‖^{m_c′+Σ_{j≠c}p_j},

choosing s = (m_c + p_c)/(1 + p), where 0 < m_c < 1, 0 < m_c′ < 1 and m_c + m_c′ = 1. Consequently,

    w_1(t) = 1/(K(p,1)‖t‖^{1+p})
    ≤ C(p, p_c, Σ_{j≠c}p_j) · [1/(K(p_c, m_c)‖t_c‖^{m_c+p_c})] · [1/(K(Σ_{j≠c}p_j, m_c′)‖t_{−c}‖^{m_c′+Σ_{j≠c}p_j})],

where C(p, p_c, Σ_{j≠c}p_j) is a constant depending only on p, p_c, Σ_{j≠c}p_j. By the fact that

    R^p \ D(δ) ⊂ {‖t_c‖², ‖t_{−c}‖² < δ} ∪ {‖t_c‖² > 1/δ} ∪ {‖t_{−c}‖² > 1/δ}

and similar steps to Theorem 2 of Székely et al. (2007), almost surely,

    lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ} − Q_n|
    ≤ lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} |ξ_n(t)|² w_1(t) dt
    ≤ (d−1) Σ_{c=1}^{d−1} lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} |φ_{(X_c,Y_c)}^n(t_c, t_{−c}) − φ_{X_c}^n(t_c)φ_{Y_c}^n(t_{−c})|² w_1(t) dt
    ≤ C(p, p_c, Σ_{j≠c}p_j)(d−1) Σ_{c=1}^{d−1} lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} |φ_{(X_c,Y_c)}^n(t_c, t_{−c}) − φ_{X_c}^n(t_c)φ_{Y_c}^n(t_{−c})|²
        · [K(p_c, m_c)‖t_c‖^{m_c+p_c}]^{−1} [K(Σ_{j≠c}p_j, m_c′)‖t_{−c}‖^{m_c′+Σ_{j≠c}p_j}]^{−1} dt_c dt_{−c} = 0,

and, by the same argument,

    lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ}* − Q_n*|
    ≤ lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} |ξ_n*(t)|² w_1(t) dt
    ≤ 4(d−1) C(p, p_c, Σ_{j≠c}p_j) Σ_{c=2}^d lim sup_{δ→0} lim sup_{n→∞} ∫_{R^p\D(δ)} (1/n) Σ_{k=1}^n |e^{i⟨t_c, X_c^k⟩} − φ_{X_c}(t_c)|²
        · [K(p_c, m_c)‖t_c‖^{m_c+p_c}]^{−1} [K(Σ_{j≠c}p_j, m_c′)‖t_{−c}‖^{m_c′+Σ_{j≠c}p_j}]^{−1} dt_c dt_{−c} = 0.

Therefore, almost surely,

    lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ} − Q_n| = lim sup_{δ→0} lim sup_{n→∞} |Q_{n,δ}* − Q_n*| = 0. □

Theorem 3
Proof. (i) Under H_0:

Let ζ(t) denote a complex-valued Gaussian process with mean zero and covariance function

    R(t, t⁰) = Π_{j=1}^d φ_{X_j}(t_j − t⁰_j) + (d−1) Π_{j=1}^d φ_{X_j}(t_j) φ̄_{X_j}(t⁰_j)
               − Σ_{j=1}^d φ_{X_j}(t_j − t⁰_j) Π_{j′≠j} φ_{X_j′}(t_{j′}) φ̄_{X_j′}(t⁰_{j′}).

We define nQ_n = n‖φ_X^n(t) − φ̃_X^n(t)‖²_{w_1} = ‖ζ_n(t)‖²_{w_1}, where ζ_n(t) = √n (φ_X^n(t) − φ̃_X^n(t)). After a simple calculation of the V-statistic moments, we have E[ζ_n(t)] = 0 and E[ζ_n(t) ζ̄_n(t⁰)] → R(t, t⁰) as n → ∞. In particular, E|ζ_n(t)|² → R(t, t) ≤ d as n → ∞, so E|ζ_n(t)|² ≤ d + 1 for large enough n.

For 0 < δ < 1, define the region D(δ) as in (6). Given ε > 0, choose a partition {D_l(δ)}_{l=1}^N of D(δ) into N(ε) measurable sets, each with diameter at most ε, and suppress the notation D(δ), D_l(δ) to D, D_l. Then define, for any fixed t^l ∈ D_l, l = 1, …, N,

    Q_n(δ) = Σ_{l=1}^N ∫_{D_l} |ζ_n(t^l)|² w_1(t) dt.

For any fixed M > 0, let β(ε) = sup E| |ζ_n(t)|² − |ζ_n(t⁰)|² |, where the supremum is taken over all t = (t_1, …, t_d) and t⁰ = (t⁰_1, …, t⁰_d) such that max{‖t‖², ‖t⁰‖²} ≤ M and ‖t − t⁰‖² = Σ_{j=1}^d ‖t_j − t⁰_j‖² ≤ ε². By the continuous mapping theorem and ζ_n(t) − ζ_n(t⁰) → 0 as ε → 0, we have |ζ_n(t)|² − |ζ_n(t⁰)|² → 0 as ε → 0. By the dominated convergence theorem and E|ζ_n(t)|² ≤ d + 1 for large enough n, we have E| |ζ_n(t)|² − |ζ_n(t⁰)|² | → 0 as ε → 0, which leads to β(ε) → 0 as ε → 0. As a result,

    E| ∫_D |ζ_n(t)|² w_1(t) dt − Q_n(δ) | = E| Σ_{l=1}^N ∫_{D_l} (|ζ_n(t)|² − |ζ_n(t^l)|²) w_1(t) dt |
    ≤ Σ_{l=1}^N ∫_{D_l} E| |ζ_n(t)|² − |ζ_n(t^l)|² | w_1(t) dt ≤ β(ε) ∫_D w_1(t) dt → 0 as ε → 0.

By similar steps to Theorem 2, we have E| ∫_D |ζ_n(t)|² w_1(t) dt − ‖ζ_n‖²_{w_1} | → 0 as δ → 0. Therefore E|Q_n(δ) − ‖ζ_n‖²_{w_1}| → 0 as ε, δ → 0. On the other hand, define, for any fixed t^l ∈ D_l, l = 1, …, N,

    Q(δ) = Σ_{l=1}^N ∫_{D_l} |ζ(t^l)|² w_1(t) dt.

Similarly, E|Q(δ) − ‖ζ‖²_{w_1}| → 0 as ε, δ → 0. By the multivariate central limit theorem, the delta method and the continuous mapping theorem, we have

    Q_n(δ) = Σ_{l=1}^N ∫_{D_l} |ζ_n(t^l)|² w_1(t) dt →_D Σ_{l=1}^N ∫_{D_l} |ζ(t^l)|² w_1(t) dt = Q(δ) as n → ∞.

Therefore ‖ζ_n‖²_{w_1} →_D ‖ζ‖²_{w_1} as n → ∞, since {Q_n(δ)} has the following properties:
(a) Q_n(δ) converges in distribution to Q(δ) as n → ∞;
(b) E|Q_n(δ) − ‖ζ_n‖²_{w_1}| → 0 as ε, δ → 0;
(c) E|Q(δ) − ‖ζ‖²_{w_1}| → 0 as ε, δ → 0.

Analogous to ζ(t), ζ_n(t), β(ε), Q(δ), Q_n(δ) for Q_n, we can define ζ*(t), ζ_n*(t), β*(ε), Q*(δ), Q_n*(δ) for Q_n*, and prove that ‖ζ_n*‖²_{w_1} →_D ‖ζ*‖²_{w_1} as n → ∞ through the same derivations. The only differences are that E[ζ_n*(t) ζ̄_n*(t⁰)] → 2R(t, t⁰) and E|ζ_n*(t)|² → 2R(t, t) ≤ 2d, so that E|ζ_n*(t)|² ≤ 2d + 1 for large enough n.

(ii) Under H_A: By Theorems 1 and 2, we have Q_n → Q > 0 a.s. as n → ∞. Therefore nQ_n → ∞ a.s. as n → ∞. Similarly, we can prove that nQ_n* → ∞ a.s. as n → ∞ through the same derivations. □

Theorem 4
Proof. We prove
(i) 0 ≤ R(X) < ∞;
(ii) 0 ≤ S(X) < ∞;
(iii) R(X) = Σ_{c=1}^{d−1} V²(X_c, X_{c+}) = 0 ⇔ X_1, …, X_d are mutually independent;
(iv) S(X) = Σ_{c=1}^d V²(X_c, X_{−c}) = 0 ⇔ X_1, …, X_d are mutually independent,
where X_{c+} = (X_{c+1}, …, X_d) and X_{−c} = (X_1, …, X_{c−1}, X_{c+1}, …, X_d).

Since E‖X‖ < ∞, we have 0 ≤ V²(X_c, X_{c+}) < ∞, c = 1, …, d−1. Thus 0 ≤ R(X) = Σ_{c=1}^{d−1} V²(X_c, X_{c+}) < ∞. Similarly, we have 0 ≤ S(X) = Σ_{c=1}^d V²(X_c, X_{−c}) < ∞.

⟹ If X_1, …, X_d are mutually independent, then X_c and X_{c+} are independent, c = 1, …, d−1. By Theorem 3 of Székely et al. (2007), V²(X_c, X_{c+}) = 0, c = 1, …, d−1. As a result, R(X) = 0. Similarly, we can prove that S(X) = 0, since X_c and X_{−c} are independent, c = 1, …, d.

⟸ If R(X) = 0, then V²(X_c, X_{c+}) = 0, c = 1, …, d−1. By Theorem 3 of Székely et al. (2007), X_c and X_{c+} are independent, c = 1, …, d−1. Thus, for all t ∈ R^p, we have

    φ_{(X_c, X_{c+})}(t_c, t_{c+}) − φ_{X_c}(t_c) φ_{X_{c+}}(t_{c+}) = 0, c = 1, …, d−1,

where φ_{X_c} and φ_{X_{c+}} denote the marginal characteristic functions and φ_{(X_c, X_{c+})} denotes the joint characteristic function of X_c and X_{c+}. For all t ∈ R^p, telescoping as in Theorem 2 and using the triangle inequality together with |φ_{X_j}(t_j)| ≤ 1, we have

    |φ_X(t) − Π_{j=1}^d φ_{X_j}(t_j)|
    = |Σ_{c=1}^{d−1} {Π_{j=1}^{c−1} φ_{X_j}(t_j)} {φ_{(X_c, X_{c+})}(t_c, t_{c+}) − φ_{X_c}(t_c) φ_{X_{c+}}(t_{c+})}|
    ≤ Σ_{c=1}^{d−1} |φ_{(X_c, X_{c+})}(t_c, t_{c+}) − φ_{X_c}(t_c) φ_{X_{c+}}(t_{c+})| = 0.

Therefore, for all t ∈ R^p, we have φ_X(t) − Π_{j=1}^d φ_{X_j}(t_j) = 0, which implies that X_1, …, X_d are mutually independent.

Similarly, we can prove that S(X) = 0 implies that X_1, …, X_d are mutually independent, since the independence of X_c and X_{−c} for every c implies the independence of X_c and X_{c+} for every c. □
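Theorem 4 characterizes R(X) and S(X) as sums of squared pairwise distance covariances, which suggests a direct plug-in estimator. The sketch below is our own illustration of that reading, not the authors' code: it computes R_n and S_n as sums of squared sample distance covariances, using the double-centered distance matrix formula of Székely et al. (2007).

```python
import numpy as np

def dcov2(X, Y):
    """Squared sample distance covariance V_n^2(X, Y) via
    double-centered distance matrices (Székely et al., 2007)."""
    def centered(M):
        D = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
        return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()
    A, B = centered(X), centered(Y)
    return (A * B).mean()

def r_n(X):
    """R_n = sum_{c=1}^{d-1} V_n^2(X_c, X_{c+}): each component
    paired with the block of all later components."""
    d = X.shape[1]
    return sum(dcov2(X[:, [c]], X[:, c + 1:]) for c in range(d - 1))

def s_n(X):
    """S_n = sum_{c=1}^{d} V_n^2(X_c, X_{-c}): each component
    paired with all remaining components."""
    d = X.shape[1]
    return sum(dcov2(X[:, [c]], np.delete(X, c, axis=1)) for c in range(d))

rng = np.random.default_rng(1)
ind = rng.normal(size=(50, 3))                      # mutually independent
z = rng.normal(size=(50, 1))
dep = np.hstack([z, z, rng.normal(size=(50, 1))])   # first two identical
r_ind, s_ind, r_dep, s_dep = r_n(ind), s_n(ind), r_n(dep), s_n(dep)
```

Each summand is a non-negative V-statistic, so R_n and S_n are non-negative by construction; under strong componentwise dependence they are visibly larger than under independence.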

Theorem 5
Proof. By Theorem 2 of Székely et al. (2007),

    lim_{n→∞} V_n²(X_c, X_{c+}) = V²(X_c, X_{c+}) a.s., c = 1, …, d−1,
    lim_{n→∞} V_n²(X_c, X_{−c}) = V²(X_c, X_{−c}) a.s., c = 1, …, d.

Therefore, the limit of the sums equals the sum of the limits, so that R_n(X) → R(X) a.s. and S_n(X) → S(X) a.s. as n → ∞. □

Theorem 6
Proof. (i) Under H_0: We define

    nR_n(X) = n Σ_{c=1}^{d−1} V_n²(X_c, X_{c+}) = Σ_{c=1}^{d−1} ‖ζ_n^c(t_{(c−1)+})‖²_{w_0},

where ζ_n^c(t_{(c−1)+}) = √n [φ_{(X_c,X_{c+})}^n(t_c, t_{c+}) − φ_{X_c}^n(t_c) φ_{X_{c+}}^n(t_{c+})] and t_{(c−1)+} = (t_c, t_{c+}). This is the sum corresponding to the pairs {X_{d−1}, X_d}, {X_{d−2}, (X_{d−1}, X_d)}, {X_{d−3}, (X_{d−2}, X_{d−1}, X_d)}, …, {X_1, (X_2, …, X_d)}. Any two of them can be reorganized as {X_1, X_2} and {X_4, (X_1, X_2, X_3)}, where X_3 could be empty. Without loss of generality, we next show that φ_{(X_1,X_2)}^n(t_1, t_2) − φ_{X_1}^n(t_1)φ_{X_2}^n(t_2) and φ_{(X_1,X_2,X_3,X_4)}^n(s_1, s_2) − φ_{(X_1,X_2,X_3)}^n(s_1)φ_{X_4}^n(s_2) are uncorrelated; it then follows that ζ_n^c(t_{(c−1)+}), c = 1, …, d−1, are uncorrelated. After a simple calculation, we have

    E[φ_{(X_1,X_2)}^n(t_1, t_2) − φ_{X_1}^n(t_1)φ_{X_2}^n(t_2)] = E[φ_{(X_1,X_2,X_3,X_4)}^n(s_1, s_2) − φ_{(X_1,X_2,X_3)}^n(s_1)φ_{X_4}^n(s_2)] = 0,
    E{[φ_{(X_1,X_2)}^n(t_1, t_2) − φ_{X_1}^n(t_1)φ_{X_2}^n(t_2)][φ_{(X_1,X_2,X_3,X_4)}^n(s_1, s_2) − φ_{(X_1,X_2,X_3)}^n(s_1)φ_{X_4}^n(s_2)]} = 0.

As a result,

    Cov(φ_{(X_1,X_2)}^n(t_1, t_2) − φ_{X_1}^n(t_1)φ_{X_2}^n(t_2), φ_{(X_1,X_2,X_3,X_4)}^n(s_1, s_2) − φ_{(X_1,X_2,X_3)}^n(s_1)φ_{X_4}^n(s_2)) = 0.

For δ > 0, define the region

    D_c(δ) = {t_{(c−1)+} = (t_c, t_{c+}) = (t_c, …, t_d) : δ ≤ ‖t_{(c−1)+}‖² = Σ_{j=c}^d ‖t_j‖² ≤ 2/δ}.

Given ε > 0, we choose a partition {D_c^l}_{l=1}^{N_c} of D_c(δ) into N_c(ε) measurable sets with diameter at most ε, and define a sequence of random variables, for any fixed t_{(c−1)+}^l ∈ D_c^l, l = 1, …, N_c,

    Q_n^c(δ) = Σ_{l=1}^{N_c} ∫_{D_c^l} |ζ_n^c(t_{(c−1)+}^l)|² w_0(t) dt.

Let ζ^c(t_{(c−1)+}) = ζ^c(t_c, t_{c+}) denote a complex-valued Gaussian process with mean zero and covariance function

    R_c^ζ(t_{(c−1)+}, t⁰_{(c−1)+}) = [φ_{X_c}(t_c − t⁰_c) − φ_{X_c}(t_c)φ̄_{X_c}(t⁰_c)] [φ_{X_{c+}}(t_{c+} − t⁰_{c+}) − φ_{X_{c+}}(t_{c+})φ̄_{X_{c+}}(t⁰_{c+})].

By the multivariate central limit theorem, the delta method and the continuous mapping theorem, we have jointly

    (Q_n^1(δ), …, Q_n^{d−1}(δ)) →_D (Σ_{l=1}^{N_1} ∫_{D_1^l} |ζ^1(t_{0+}^l)|² w_0(t) dt, …, Σ_{l=1}^{N_{d−1}} ∫_{D_{d−1}^l} |ζ^{d−1}(t_{(d−2)+}^l)|² w_0(t) dt)

as n → ∞, with asymptotically mutually independent coordinates. Thus Q_n^c(δ), c = 1, …, d−1, are asymptotically mutually independent. By similar steps to Theorem 5 of Székely et al. (2007), we have

    E|Q_n^c(δ) − ‖ζ_n^c(t_{(c−1)+})‖²_{w_0}| → 0, c = 1, …, d−1, as ε, δ → 0.

Hence

    (‖ζ_n^1(t)‖²_{w_0} − Q_n^1(δ), …, ‖ζ_n^{d−1}(t_{(d−2)+})‖²_{w_0} − Q_n^{d−1}(δ)) →_P 0 as ε, δ → 0.

By the multivariate Slutsky's theorem, we have

    (‖ζ_n^1(t)‖²_{w_0}, …, ‖ζ_n^{d−1}(t_{(d−2)+})‖²_{w_0}) →_D (‖ζ^1(t)‖²_{w_0}, …, ‖ζ^{d−1}(t_{(d−2)+})‖²_{w_0})

as ε, δ → 0, n → ∞, with asymptotically mutually independent coordinates. Therefore ‖ζ_n^c(t_{(c−1)+})‖²_{w_0}, c = 1, …, d−1, are asymptotically mutually independent, and

    Σ_{c=1}^{d−1} ‖ζ_n^c(t_{(c−1)+})‖²_{w_0} →_D Σ_{c=1}^{d−1} ‖ζ^c(t_{(c−1)+})‖²_{w_0} as n → ∞.

Analogous to ζ_n^c(t_{(c−1)+}), ζ^c(t_{(c−1)+}), R_c^ζ(t_{(c−1)+}, t⁰_{(c−1)+}) for R_n(X), we can define η_n^c(t), η^c(t), R_c^η(t, t⁰) for S_n(X), and prove that ‖η_n^c(t)‖²_{w_0}, c = 1, …, d, are asymptotically mutually independent, and Σ_{c=1}^d ‖η_n^c(t)‖²_{w_0} →_D Σ_{c=1}^d ‖η^c(t)‖²_{w_0} as n → ∞, through the same derivations. The only difference is that we instead show φ_{(X_1,X_2,X_3)}^n(t_1, t_2, t_3) − φ_{X_1}^n(t_1)φ_{(X_2,X_3)}^n(t_2, t_3) and φ_{(X_1,X_2,X_3)}^n(s_1, s_2, s_3) − φ_{X_2}^n(s_2)φ_{(X_1,X_3)}^n(s_1, s_3) are asymptotically uncorrelated.

(ii) Under H_A: By Theorem 4, we have R_n → R > 0 a.s. as n → ∞. Therefore nR_n → ∞ a.s. as n → ∞. Similarly, we can prove that nS_n → ∞ a.s. as n → ∞ through the same derivations. □

Remark. Under H_A, ζ_n^c(t_{(c−1)+}), c = 1, …, d−1, are not asymptotically uncorrelated, and η_n^c(t), c = 1, …, d, are not asymptotically uncorrelated.

Complete Measure of Mutual Dependence Using Weight Function w_2

Except that U_n(X) requires the additional d-th moment condition E(‖X_1‖ ⋯ ‖X_d‖) < ∞ to be simplified, U(X) is in an extremely complicated form. Even when d = 3, U(X) already has

12 different terms, as follows, where X′, X″, X‴ denote iid copies of X:

    U(X) = ∫ |φ_X(t) − φ̃_X(t)|² w_2(t) dt
    = E‖X_1 − X_1′‖‖X_2 − X_2′‖‖X_3 − X_3′‖ − 2E‖X_1 − X_1′‖‖X_2 − X_2″‖‖X_3 − X_3‴‖
      + E‖X_1 − X_1′‖ E‖X_2 − X_2′‖ E‖X_3 − X_3′‖
      + Σ_{1≤i<j≤3} E‖X_i − X_i′‖‖X_j − X_j′‖ − 2 Σ_{1≤i<j≤3} E‖X_i − X_i′‖‖X_j − X_j″‖
      + Σ_{1≤i<j≤3} E‖X_i − X_i′‖ E‖X_j − X_j′‖.

In general, the number of different terms in U(X) grows exponentially as d increases: essentially all combinations of all components appear in all moments as expectations.
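Each of these moments has a natural V-statistic estimate, and although the mixed-copy moments look like they cost O(n^4) to evaluate, they factor over the shared sample index. The sketch below is our own illustration, under our reading of the copy structure in the expansion above: it checks a factorized O(n^2) evaluation of the triple mixed-copy moment against the naive four-index average on a tiny sample.

```python
import numpy as np
from itertools import product

def mixed_moment_naive(x1, x2, x3):
    """V-statistic estimate of E||X1-X1'|| ||X2-X2''|| ||X3-X3'''||:
    average over all index tuples (k, l, m, o), which share only
    the first index k.  O(n^4), for verification only."""
    n = len(x1)
    total = sum(abs(x1[k] - x1[l]) * abs(x2[k] - x2[m]) * abs(x3[k] - x3[o])
                for k, l, m, o in product(range(n), repeat=4))
    return total / n**4

def mixed_moment_fast(x1, x2, x3):
    """Same estimate in O(n^2): the sum factorizes over the shared index k
    into a product of row means of the three distance matrices."""
    a = np.abs(x1[:, None] - x1[None, :]).mean(axis=1)  # (1/n) sum_l |x1_k - x1_l|
    b = np.abs(x2[:, None] - x2[None, :]).mean(axis=1)
    c = np.abs(x3[:, None] - x3[None, :]).mean(axis=1)
    return float((a * b * c).mean())

rng = np.random.default_rng(2)
x1, x2, x3 = rng.normal(size=(3, 8))
naive = mixed_moment_naive(x1, x2, x3)
fast = mixed_moment_fast(x1, x2, x3)
```

The same factorization applies to the pairwise mixed-copy moments, which is why the number of terms, rather than the cost of any single term, is the practical obstacle as d grows.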


More information

Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics. Wen-Xin Zhou

Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics. Wen-Xin Zhou Cramér-Type Moderate Deviation Theorems for Two-Sample Studentized (Self-normalized) U-Statistics Wen-Xin Zhou Department of Mathematics and Statistics University of Melbourne Joint work with Prof. Qi-Man

More information

Optimal detection of weak positive dependence between two mixture distributions

Optimal detection of weak positive dependence between two mixture distributions Optimal detection of weak positive dependence between two mixture distributions Sihai Dave Zhao Department of Statistics, University of Illinois at Urbana-Champaign and T. Tony Cai Department of Statistics,

More information

Bayesian Econometrics

Bayesian Econometrics Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence

More information

A Note on Hilbertian Elliptically Contoured Distributions

A Note on Hilbertian Elliptically Contoured Distributions A Note on Hilbertian Elliptically Contoured Distributions Yehua Li Department of Statistics, University of Georgia, Athens, GA 30602, USA Abstract. In this paper, we discuss elliptically contoured distribution

More information

Minimum distance tests and estimates based on ranks

Minimum distance tests and estimates based on ranks Minimum distance tests and estimates based on ranks Authors: Radim Navrátil Department of Mathematics and Statistics, Masaryk University Brno, Czech Republic (navratil@math.muni.cz) Abstract: It is well

More information

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common Journal of Statistical Theory and Applications Volume 11, Number 1, 2012, pp. 23-45 ISSN 1538-7887 A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when

More information

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Christopher R. Genovese Department of Statistics Carnegie Mellon University joint work with Larry Wasserman

More information

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses Gavin Lynch Catchpoint Systems, Inc., 228 Park Ave S 28080 New York, NY 10003, U.S.A. Wenge Guo Department of Mathematical

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Asymptotics for posterior hazards

Asymptotics for posterior hazards Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Copula modeling for discrete data

Copula modeling for discrete data Copula modeling for discrete data Christian Genest & Johanna G. Nešlehová in collaboration with Bruno Rémillard McGill University and HEC Montréal ROBUST, September 11, 2016 Main question Suppose (X 1,

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable

More information

Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications

Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications Random Correlation Matrices, Top Eigenvalue with Heavy Tails and Financial Applications J.P Bouchaud with: M. Potters, G. Biroli, L. Laloux, M. A. Miceli http://www.cfm.fr Portfolio theory: Basics Portfolio

More information

Multivariate statistical methods and data mining in particle physics

Multivariate statistical methods and data mining in particle physics Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general

More information

Introduction to Bayesian Statistics

Introduction to Bayesian Statistics Bayesian Parameter Estimation Introduction to Bayesian Statistics Harvey Thornburg Center for Computer Research in Music and Acoustics (CCRMA) Department of Music, Stanford University Stanford, California

More information

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES Hisashi Tanizaki Graduate School of Economics, Kobe University, Kobe 657-8501, Japan e-mail: tanizaki@kobe-u.ac.jp Abstract:

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

Discussion of Hypothesis testing by convex optimization

Discussion of Hypothesis testing by convex optimization Electronic Journal of Statistics Vol. 9 (2015) 1 6 ISSN: 1935-7524 DOI: 10.1214/15-EJS990 Discussion of Hypothesis testing by convex optimization Fabienne Comte, Céline Duval and Valentine Genon-Catalot

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

Optimization and Testing in Linear. Non-Gaussian Component Analysis

Optimization and Testing in Linear. Non-Gaussian Component Analysis Optimization and Testing in Linear Non-Gaussian Component Analysis arxiv:1712.08837v2 [stat.me] 29 Dec 2017 Ze Jin, Benjamin B. Risk, David S. Matteson May 13, 2018 Abstract Independent component analysis

More information

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models

f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models IEEE Transactions on Information Theory, vol.58, no.2, pp.708 720, 2012. 1 f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models Takafumi Kanamori Nagoya University,

More information

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary

Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics. 1 Executive summary ECE 830 Spring 207 Instructor: R. Willett Lecture 5: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we saw that the likelihood

More information

Probability Background

Probability Background Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second

More information

Nonparametric Bayesian Methods

Nonparametric Bayesian Methods Nonparametric Bayesian Methods Debdeep Pati Florida State University October 2, 2014 Large spatial datasets (Problem of big n) Large observational and computer-generated datasets: Often have spatial and

More information

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek J. Korean Math. Soc. 41 (2004), No. 5, pp. 883 894 CONVERGENCE OF WEIGHTED SUMS FOR DEPENDENT RANDOM VARIABLES Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek Abstract. We discuss in this paper the strong

More information

High dimensional Ising model selection

High dimensional Ising model selection High dimensional Ising model selection Pradeep Ravikumar UT Austin (based on work with John Lafferty, Martin Wainwright) Sparse Ising model US Senate 109th Congress Banerjee et al, 2008 Estimate a sparse

More information

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs 2015 IEEE 54th Annual Conference on Decision and Control CDC December 15-18, 2015. Osaka, Japan An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs Abhishek Gupta Rahul Jain Peter

More information

Experience Rating in General Insurance by Credibility Estimation

Experience Rating in General Insurance by Credibility Estimation Experience Rating in General Insurance by Credibility Estimation Xian Zhou Department of Applied Finance and Actuarial Studies Macquarie University, Sydney, Australia Abstract This work presents a new

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Recent Developments in Post-Selection Inference

Recent Developments in Post-Selection Inference Recent Developments in Post-Selection Inference Yotam Hechtlinger Department of Statistics yhechtli@andrew.cmu.edu Shashank Singh Department of Statistics Machine Learning Department sss1@andrew.cmu.edu

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

A measure of radial asymmetry for bivariate copulas based on Sobolev norm

A measure of radial asymmetry for bivariate copulas based on Sobolev norm A measure of radial asymmetry for bivariate copulas based on Sobolev norm Ahmad Alikhani-Vafa Ali Dolati Abstract The modified Sobolev norm is used to construct an index for measuring the degree of radial

More information

Plug-in Approach to Active Learning

Plug-in Approach to Active Learning Plug-in Approach to Active Learning Stanislav Minsker Stanislav Minsker (Georgia Tech) Plug-in approach to active learning 1 / 18 Prediction framework Let (X, Y ) be a random couple in R d { 1, +1}. X

More information

LOCAL TIMES OF RANKED CONTINUOUS SEMIMARTINGALES

LOCAL TIMES OF RANKED CONTINUOUS SEMIMARTINGALES LOCAL TIMES OF RANKED CONTINUOUS SEMIMARTINGALES ADRIAN D. BANNER INTECH One Palmer Square Princeton, NJ 8542, USA adrian@enhanced.com RAOUF GHOMRASNI Fakultät II, Institut für Mathematik Sekr. MA 7-5,

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline. MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y

More information

Discussion of High-dimensional autocovariance matrices and optimal linear prediction,

Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University

More information

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Bootstrapping high dimensional vector: interplay between dependence and dimensionality Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang

More information

Location Multiplicative Error Model. Asymptotic Inference and Empirical Analysis

Location Multiplicative Error Model. Asymptotic Inference and Empirical Analysis : Asymptotic Inference and Empirical Analysis Qian Li Department of Mathematics and Statistics University of Missouri-Kansas City ql35d@mail.umkc.edu October 29, 2015 Outline of Topics Introduction GARCH

More information