Partitioned Covariance Matrices and Partial Correlations
Proposition 1 Let the $(p+q) \times (p+q)$ covariance matrix $C > 0$ be partitioned as
$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.$$
Then the symmetric matrix $C^{-1} > 0$ has the following partitioned form:
$$C^{-1} = \begin{pmatrix} C_{1|2}^{-1} & -C_{1|2}^{-1} C_{12} C_{22}^{-1} \\ -C_{22}^{-1} C_{21} C_{1|2}^{-1} & C_{22}^{-1} + C_{22}^{-1} C_{21} C_{1|2}^{-1} C_{12} C_{22}^{-1} \end{pmatrix},$$
where
$$C_{1|2} = C_{11} - C_{12} C_{22}^{-1} C_{21}.$$
Further, $C > 0$ is equivalent to the fact that both $C_{22}$ and $C_{1|2}$ are regular (invertible) matrices.

Proof Introduce the auxiliary matrix
$$D = \begin{pmatrix} I_p & -C_{12} C_{22}^{-1} \\ O & I_q \end{pmatrix}.$$
Note that $|D| = 1$, so $D$ is regular. With it,
$$D C D^T = \begin{pmatrix} C_{1|2} & O \\ O & C_{22} \end{pmatrix},$$
from where $|C| = |C_{1|2}| \cdot |C_{22}|$ and
$$C^{-1} = D^T \begin{pmatrix} C_{1|2}^{-1} & O \\ O & C_{22}^{-1} \end{pmatrix} D,$$
which implies the required result.
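The block formula lends itself to a quick numerical sanity check. The following NumPy sketch is not part of the original notes; the block sizes and the randomly generated matrix are arbitrary choices. It assembles $C^{-1}$ from the blocks of Proposition 1 and compares the result with a direct inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 3

# Build a random positive definite covariance matrix C of size (p+q) x (p+q).
A = rng.standard_normal((p + q, p + q))
C = A @ A.T + (p + q) * np.eye(p + q)

C11, C12 = C[:p, :p], C[:p, p:]
C21, C22 = C[p:, :p], C[p:, p:]

# Schur complement C_{1|2} = C11 - C12 C22^{-1} C21.
C1_2 = C11 - C12 @ np.linalg.solve(C22, C21)

# Assemble C^{-1} from the blocks of Proposition 1.
Si = np.linalg.inv(C1_2)      # C_{1|2}^{-1}
C22i = np.linalg.inv(C22)
K = np.block([
    [Si,                -Si @ C12 @ C22i],
    [-C22i @ C21 @ Si,  C22i + C22i @ C21 @ Si @ C12 @ C22i],
])

assert np.allclose(K, np.linalg.inv(C))           # block inverse formula
assert np.isclose(np.linalg.det(C),
                  np.linalg.det(C1_2) * np.linalg.det(C22))  # |C| = |C_{1|2}| |C_{22}|
print("Proposition 1 verified numerically.")
```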
Theorem 1 Let $(X_1^T, X_2^T)^T \sim N_{p+q}(\mu, C)$ be a random vector, where the expectation $\mu$ and the covariance matrix $C$ are partitioned (with block sizes $p$ and $q$) in the following way:
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.$$
Here $C_{11}$, $C_{22}$ are the covariance matrices of $X_1$ and $X_2$, whereas $C_{12} = C_{21}^T$ is the cross-covariance matrix. Then the conditional distribution of the random vector $X_1$ conditioned on $X_2 = x_2$ is the $N_p(C_{12} C_{22}^{-1} (x_2 - \mu_2) + \mu_1,\ C_{1|2})$ distribution.

Proof Observe that it suffices to prove the statement in the $(X_1^T, X_2^T)^T \sim N_{p+q}(0, C)$ case. The conditional distribution of $X_1$ conditioned on $X_2 = x_2$ is determined by the conditional pdf as follows:
$$f(x_1 \mid x_2) = \frac{(2\pi)^{-\frac{p+q}{2}} |C|^{-\frac12} \exp\left[-\frac12 (x_1^T, x_2^T)\, C^{-1} (x_1^T, x_2^T)^T\right]}{(2\pi)^{-\frac{q}{2}} |C_{22}|^{-\frac12} \exp\left[-\frac12 x_2^T C_{22}^{-1} x_2\right]}.$$
By Proposition 1, after writing $C^{-1}$ in block-matrix form and expanding the quadratic form in the numerator as the sum of four terms, this simplifies to
$$f(x_1 \mid x_2) = (2\pi)^{-\frac{p}{2}} |C_{1|2}|^{-\frac12} \exp\left[-\tfrac12 (x_1 - C_{12} C_{22}^{-1} x_2)^T C_{1|2}^{-1} (x_1 - C_{12} C_{22}^{-1} x_2)\right],$$
which is a $p$-dimensional Gaussian density with the stated parameters.

Note that the conditional covariance matrix $C_{1|2}$ does not depend on the value $x_2$ in the condition. Further, for the conditional expectation, which is the expectation of the conditional distribution, we get that
$$E(X_1 \mid X_2 = x_2) = C_{12} C_{22}^{-1} (x_2 - \mu_2) + \mu_1.$$
Therefore,
$$E(X_1 \mid X_2) = C_{12} C_{22}^{-1} (X_2 - \mu_2) + \mu_1,$$
which is a linear function of the coordinates of $X_2$. In the $p = q = 1$ case it is called the regression line, while in the $p = 1$, $q > 1$ case, the regression plane. We will not deal with the $p > 1$, $q > 1$ case, which is the topic of Canonical Correlation Analysis (CCA), Partial Least Squares Regression (PLS), and Structural Equation Modeling (SEM). Summarizing: in the case of the multidimensional Gaussian distribution, the regression functions are linear functions of the variables in the condition, a fact that has important consequences in multivariate statistical analysis.

Remark 1 Assume that $E(X) = 0$. Then with the notation $\varepsilon = X_1 - E(X_1 \mid X_2) = X_1 - C_{12} C_{22}^{-1} X_2$ we can decompose $X_1$ as the sum of $E(X_1 \mid X_2)$ and the error term $\varepsilon$. By construction, $E(\varepsilon) = 0$ and $E(C_{12} C_{22}^{-1} X_2\, \varepsilon^T) = O$. Therefore, the covariance matrix of $X_1$ is decomposed as
$$C_{11} = C_{12} C_{22}^{-1} C_{21} + C_{1|2},$$
in other words, the law of total variance applies:
$$\operatorname{Var}(X_1) = \operatorname{Var}(E(X_1 \mid X_2)) + E(\operatorname{Var}(X_1 \mid X_2)).$$
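A minimal NumPy sketch of Theorem 1 and Remark 1 (illustrative only; the dimensions, the random covariance, the seed, the sample size, and the helper name conditional_params are my choices): it computes the conditional parameters and checks by simulation that the regression residuals have covariance $C_{1|2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 2

# Arbitrary positive definite covariance with zero means, for illustration.
A = rng.standard_normal((p + q, p + q))
C = A @ A.T + (p + q) * np.eye(p + q)
mu1, mu2 = np.zeros(p), np.zeros(q)

C11, C12 = C[:p, :p], C[:p, p:]
C21, C22 = C[p:, :p], C[p:, p:]

def conditional_params(x2):
    """Mean and covariance of X1 | X2 = x2, as in Theorem 1."""
    mean = mu1 + C12 @ np.linalg.solve(C22, x2 - mu2)
    cov = C11 - C12 @ np.linalg.solve(C22, C21)   # C_{1|2}, free of x2
    return mean, cov

# Remark 1: C11 = C12 C22^{-1} C21 + C_{1|2}.
_, C1_2 = conditional_params(np.zeros(q))
assert np.allclose(C11, C12 @ np.linalg.solve(C22, C21) + C1_2)

# Monte Carlo check: the residual X1 - C12 C22^{-1} X2 has covariance C_{1|2}.
X = rng.multivariate_normal(np.zeros(p + q), C, size=200_000)
X1, X2 = X[:, :p], X[:, p:]
eps = X1 - X2 @ np.linalg.solve(C22, C21)   # row-wise form of C12 C22^{-1} x2
print(np.cov(eps.T))   # should be close to C_{1|2}
print(C1_2)
```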
Theorem 2 Let $X = (X_1, \dots, X_m)^T \sim N_m(0, C)$ be a random vector, and let $\Gamma := \{1, \dots, m\}$ denote the index set of the variables, $m \ge 3$. Assume that $K = (k_{ij}) = C^{-1} > 0$. Then
$$r_{X_i X_j \mid X_{\Gamma \setminus \{i,j\}}} = \frac{-k_{ij}}{\sqrt{k_{ii} k_{jj}}}, \qquad i \ne j,$$
where $r_{X_i X_j \mid X_{\Gamma \setminus \{i,j\}}}$ denotes the partial correlation coefficient between $X_i$ and $X_j$ after eliminating the effect of the remaining variables $X_{\Gamma \setminus \{i,j\}}$. Further,
$$k_{ii} = \left(\operatorname{Var}(X_i \mid X_{\Gamma \setminus \{i\}})\right)^{-1}, \qquad i = 1, \dots, m,$$
is the reciprocal of the conditional variance of $X_i$ conditioned on the other variables $X_{\Gamma \setminus \{i\}}$.

Proof Without loss of generality we prove that
$$r_{X_1 X_2 \mid X_{\Gamma \setminus \{1,2\}}} = \frac{-k_{12}}{\sqrt{k_{11} k_{22}}}.$$
We use the result of Proposition 1 with $p = 2$, $q = m - 2$, and partition $C$ and $K$ as in Figure 1(a) and (b). Then regressing $X_1$ on $X_{\Gamma \setminus \{1,2\}}$ we get that
$$E(X_1 \mid X_{\Gamma \setminus \{1,2\}}) = c_1^T C_{22}^{-1} X_{\Gamma \setminus \{1,2\}},$$
and the error term is
$$\varepsilon_1 = X_1 - c_1^T C_{22}^{-1} X_{\Gamma \setminus \{1,2\}}.$$
Likewise, regressing $X_2$ on $X_{\Gamma \setminus \{1,2\}}$ we get that
$$E(X_2 \mid X_{\Gamma \setminus \{1,2\}}) = c_2^T C_{22}^{-1} X_{\Gamma \setminus \{1,2\}},$$
and the error term is
$$\varepsilon_2 = X_2 - c_2^T C_{22}^{-1} X_{\Gamma \setminus \{1,2\}}.$$
By definition,
$$r_{X_1 X_2 \mid X_{\Gamma \setminus \{1,2\}}} = \operatorname{Corr}(\varepsilon_1, \varepsilon_2).$$
To find this correlation, we compute the covariance and the variances of $\varepsilon_1$ and $\varepsilon_2$. As $E(\varepsilon_1) = 0$ and $E(\varepsilon_2) = 0$, with the notation of Figures 1(a) and (b),
$$\operatorname{Cov}(\varepsilon_1, \varepsilon_2) = E(\varepsilon_1 \varepsilon_2^T) = E(X_1 X_2^T) - c_1^T C_{22}^{-1} E(X_{\Gamma \setminus \{1,2\}} X_2^T) - E(X_1 X_{\Gamma \setminus \{1,2\}}^T) C_{22}^{-1} c_2 + c_1^T C_{22}^{-1} E(X_{\Gamma \setminus \{1,2\}} X_{\Gamma \setminus \{1,2\}}^T) C_{22}^{-1} c_2 = c_{12} - c_1^T C_{22}^{-1} c_2,$$
where we used that $E(X_{\Gamma \setminus \{1,2\}} X_{\Gamma \setminus \{1,2\}}^T) = C_{22}$. Likewise,
$$\operatorname{Var}(\varepsilon_1) = E(\varepsilon_1 \varepsilon_1^T) = c_{11} - c_1^T C_{22}^{-1} c_1$$
and
$$\operatorname{Var}(\varepsilon_2) = E(\varepsilon_2 \varepsilon_2^T) = c_{22} - c_2^T C_{22}^{-1} c_2.$$
Putting these together, and using that, by Proposition 1, the upper left $2 \times 2$ block of $K$ is $K_{11} = C_{1|2}^{-1}$, with the notation
$$C_{1|2} = \begin{pmatrix} a & b \\ b & d \end{pmatrix}$$
of Figure 1(c) we get that
$$r_{X_1 X_2 \mid X_{\Gamma \setminus \{1,2\}}} = \frac{b}{\sqrt{ad}} = \frac{\frac{b}{ad - b^2}}{\sqrt{\frac{a}{ad - b^2} \cdot \frac{d}{ad - b^2}}} = \frac{-k_{12}}{\sqrt{k_{11} k_{22}}}.$$
In the last equality we utilized that
$$K_{11} = \begin{pmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{pmatrix} = C_{1|2}^{-1} = \frac{1}{ad - b^2} \begin{pmatrix} d & -b \\ -b & a \end{pmatrix},$$
so $k_{21} = k_{12} = \frac{-b}{ad - b^2}$.

To prove the statement for $k_{ii}$, it suffices to prove it for $k_{11}$. By Theorem 1, the conditional distribution of $X_1$ conditioned on $X_{\Gamma \setminus \{1\}}$ is the univariate $N_1(c_{12}^T C_{22}^{-1} x_{\Gamma \setminus \{1\}},\ C_{1|2})$ distribution, where
$$\operatorname{Var}(X_1 \mid X_{\Gamma \setminus \{1\}}) = C_{1|2} = c_{11} - c_{12}^T C_{22}^{-1} c_{12} = \frac{1}{k_{11}},$$
if we apply the result of Proposition 1 with $p = 1$, $q = m - 1$; see Figure 1(d). From Proposition 1 it is also clear that $C > 0$ implies $k_{11} \ne 0$. This completes the proof.
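Theorem 2 can also be checked numerically. The sketch below is an illustration and not part of the notes; the dimensions, seed, and sample size are arbitrary. It compares $-k_{ij}/\sqrt{k_{ii} k_{jj}}$ with the empirical correlation of regression residuals, and $1/k_{11}$ with the empirical conditional variance (indices are 0-based in the code).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 200_000

# Arbitrary positive definite covariance matrix, chosen for illustration.
A = rng.standard_normal((m, m))
C = A @ A.T + m * np.eye(m)
K = np.linalg.inv(C)

# Theorem 2: partial correlations read off the concentration matrix K.
d = np.sqrt(np.diag(K))
partial_corr = -K / np.outer(d, d)          # entries for i != j

# Check against the definition for the pair (0, 1): correlate the residuals
# after regressing X_0 and X_1 on the remaining variables.
X = rng.multivariate_normal(np.zeros(m), C, size=n)
X12, Xrest = X[:, :2], X[:, 2:]
beta, *_ = np.linalg.lstsq(Xrest, X12, rcond=None)
eps = X12 - Xrest @ beta
print(np.corrcoef(eps[:, 0], eps[:, 1])[0, 1], partial_corr[0, 1])

# k_11 (here K[0, 0]) is the reciprocal of Var(X_1 | the rest).
beta1, *_ = np.linalg.lstsq(X[:, 1:], X[:, 0], rcond=None)
print(1.0 / K[0, 0], np.var(X[:, 0] - X[:, 1:] @ beta1))
```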
Definition 1 Assume $X \sim N_m(0, C)$ with $C > 0$. Consider the regression plane
$$E(X_1 \mid X_{\Gamma \setminus \{1\}} = x_{\Gamma \setminus \{1\}}) = \sum_{j \in \Gamma \setminus \{1\}} \beta_{1j \mid \Gamma \setminus \{1\}}\, x_j,$$
where the $x_j$'s are the coordinates of $x_{\Gamma \setminus \{1\}}$. Then we call the coefficient $\beta_{1j \mid \Gamma \setminus \{1\}}$ the $j$th partial regression coefficient, for $j \in \Gamma \setminus \{1\}$. The definition naturally extends to $\beta_{ij \mid \Gamma \setminus \{i\}}$ for $j \in \Gamma \setminus \{i\}$.

Theorem 3
$$\beta_{1j \mid \Gamma \setminus \{1\}} = \frac{-k_{1j}}{k_{11}} \qquad \text{for } j \in \Gamma \setminus \{1\}.$$

To prove the theorem, first we need a lemma.

Lemma 1 $C_{12} = O$ is equivalent to $K_{12} = O$, and
$$K_{11}^{-1} K_{12} = -C_{12} C_{22}^{-1}.$$

Proof of the Lemma By Proposition 1, an easy calculation shows that
$$K_{11}^{-1} K_{12} = C_{1|2} \left( -C_{1|2}^{-1} C_{12} C_{22}^{-1} \right) = -C_{12} C_{22}^{-1},$$
which finishes the proof.

Proof of Theorem 3 With the help of Figure 1(d), the equation of the regression plane is
$$E(X_1 \mid X_{\Gamma \setminus \{1\}} = x_{\Gamma \setminus \{1\}}) = c_{12}^T C_{22}^{-1} x_{\Gamma \setminus \{1\}} = \sum_{j \in \Gamma \setminus \{1\}} \left( -K_{11}^{-1} K_{12} \right)_j x_j = \sum_{j \in \Gamma \setminus \{1\}} \left( -\frac{k_{1j}}{k_{11}} \right) x_j,$$
where in the last step we used the Lemma with $p = 1$, $q = m - 1$, so that $K_{11} = k_{11}$ is a scalar and $K_{12} = (k_{12}, \dots, k_{1m})$. This finishes the proof.

Corollary 1 An important consequence of Theorem 3 is that
$$\beta_{ij \mid \Gamma \setminus \{i\}} = \frac{-k_{ij}}{k_{ii}} = r_{X_i X_j \mid X_{\Gamma \setminus \{i,j\}}} \sqrt{\frac{k_{jj}}{k_{ii}}}, \qquad j \ne i.$$
So only those variables $X_j$ whose partial correlation with $X_i$ (after eliminating the effect of the remaining variables) is nonzero enter into the regression of $X_i$ on the other variables. We sometimes say that the partial regression coefficient $\beta_{ij \mid \Gamma \setminus \{i\}}$ shows the change of $X_i$ under a unit change of $X_j$, keeping the other variables fixed; in Latin words, ceteris paribus. (A small sketch of reading these quantities off $K$ is given after this paragraph.)

If we form a graph $G$ on the vertex set $\Gamma$ with $i \sim j \iff k_{ij} \ne 0$ ($i \ne j$), then only the variables corresponding to vertices connected to vertex $i$ enter into the regression of $X_i$ on the other variables. This is called a Gaussian graphical model, and it is the basis of Path Analysis. If the Gaussian graphical model is decomposable (see graphical models in the Information Theory and Statistics course), then the maximal cliques, together with their separators (with multiplicities), form a so-called junction tree structure.
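The following NumPy sketch (illustrative only; the toy concentration matrix and the zero threshold are my choices) reads the partial regression coefficients of Theorem 3 and the edges of the graph $G$ off $K$.

```python
import numpy as np

# Concentration matrix of a toy Gaussian graphical model (my choice):
# X_0 - X_1 - X_2 is a path, so k_02 = 0 and the edge {0, 2} is absent.
K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
m = K.shape[0]

# Theorem 3 / Corollary 1: the regression of X_i on the others has
# coefficients beta_{ij} = -k_ij / k_ii for j != i.
beta = -K / np.diag(K)[:, None]
np.fill_diagonal(beta, 0.0)      # zero out the meaningless diagonal
print("partial regression coefficients:\n", beta)

# Edges of G: i ~ j iff k_ij != 0 (up to numerical tolerance).
edges = [(i, j) for i in range(m) for j in range(i + 1, m)
         if abs(K[i, j]) > 1e-12]
print("edges of the Gaussian graphical model:", edges)
```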
Denote by $\mathcal{C}$ the set of the maximal cliques and by $\mathcal{S}$ the set of their separators in $G$. Then the ML estimator of $K$ can be calculated based on the clique and separator matrices defined below (see the ML estimators of the multivariate normal parameters). Let $n$ be the sample size for the underlying $m$-variate normal distribution, $\Gamma = \{1, \dots, m\}$, and assume that $m < n$. For the maximal clique $C \in \mathcal{C}$, let $[S_C]^\Gamma$ denote $n$ times the empirical covariance matrix corresponding to the variables $\{X_i : i \in C\}$, complemented with zero entries to have an $m \times m$ (symmetric, positive semidefinite) matrix. Likewise, for the separator $S \in \mathcal{S}$, let $[S_S]^\Gamma$ denote $n$ times the empirical covariance matrix corresponding to the variables $\{X_i : i \in S\}$, complemented with zero entries to have an $m \times m$ (symmetric, positive semidefinite) matrix. Then the ML estimator of the mean vector is the sample average (as usual), while the ML estimator of the concentration matrix is
$$\hat{K} = n \left\{ \sum_{C \in \mathcal{C}} \left[ S_C^{-1} \right]^\Gamma - \sum_{S \in \mathcal{S}} \left[ S_S^{-1} \right]^\Gamma \right\},$$
where $[S_C^{-1}]^\Gamma$ and $[S_S^{-1}]^\Gamma$ denote the inverses of the corresponding clique and separator blocks, complemented with zero entries to $m \times m$ matrices.

Testing hypotheses about partial correlations. For $i \ne j$ we want to test
$$H_0 : r_{X_i X_j \mid X_{\Gamma \setminus \{i,j\}}} = 0,$$
i.e., that $X_i$ and $X_j$ are conditionally independent conditioned on the remaining variables. Equivalently, $H_0$ means that there is no edge in $G$ between vertices $i$ and $j$, or equivalently, $\beta_{ij \mid \Gamma \setminus \{i\}} = 0$, $\beta_{ji \mid \Gamma \setminus \{j\}} = 0$, or simply, $k_{ij} = k_{ji} = 0$ ($C > 0$ should be assumed). To test $H_0$ in some form, several exact tests are known that are usually based on likelihood ratio tests or on Basu's theorem (see below). The following test uses the empirical partial correlation coefficient, denoted by $\hat{r} = \hat{r}_{X_i X_j \mid X_{\Gamma \setminus \{i,j\}}}$, and the following statistic based on it:
$$B = 1 - \hat{r}^2 = \frac{|S_{\Gamma \setminus \{i,j\}}|\, |S_\Gamma|}{|S_{\Gamma \setminus \{i\}}|\, |S_{\Gamma \setminus \{j\}}|},$$
where $S_H$ denotes the empirical covariance matrix of the variables $\{X_i : i \in H\}$. It can be proven that under $H_0$ the test statistic
$$t = \frac{\hat{r} \sqrt{n - m}}{\sqrt{1 - \hat{r}^2}}, \qquad |t| = \sqrt{n - m}\, \sqrt{\frac{1}{B} - 1},$$
is distributed as Student's $t$ with $n - m$ degrees of freedom. Therefore, we reject $H_0$ for large values of $|t|$ (a simulation sketch is given after Theorem 4).

Theorem 4 (Basu) If $T = T(X)$ is a sufficient and complete statistic with respect to the family $\mathcal{P}$ of distributions, and the distribution of $Y = s(X)$ does not depend on $P \in \mathcal{P}$, then $T$ and $Y$ are independent.
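As an illustration of the $t$-test (not from the notes; the concentration matrix, seed, sample size, and the use of scipy.stats are my choices), the following sketch simulates data under $H_0$ for the pair $(0, 1)$ and computes the statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m, n = 4, 200

# Simulate under H0 for the pair (0, 1): choose K with k_01 = 0, so that the
# partial correlation of X_0 and X_1 given the rest is exactly zero.
K = np.array([[ 2.0,  0.0, -0.8, -0.5],
              [ 0.0,  2.0, -0.7, -0.6],
              [-0.8, -0.7,  2.0,  0.0],
              [-0.5, -0.6,  0.0,  2.0]])
X = rng.multivariate_normal(np.zeros(m), np.linalg.inv(K), size=n)

# Empirical partial correlation via the empirical concentration matrix
# (Theorem 2 applied to the sample covariance).
S = np.cov(X.T)
Khat = np.linalg.inv(S)
r = -Khat[0, 1] / np.sqrt(Khat[0, 0] * Khat[1, 1])

# Test statistic: t = r * sqrt(n - m) / sqrt(1 - r^2) ~ t_{n-m} under H0.
t = r * np.sqrt(n - m) / np.sqrt(1.0 - r**2)
p_value = 2.0 * stats.t.sf(abs(t), df=n - m)
print(f"r_hat = {r:.4f}, t = {t:.3f}, p = {p_value:.3f}")
```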
Figure 1: Convenient block partitions of $C$ and $K$ (panels (a)-(d), referenced in the proofs above; the figure itself is not reproduced here).
More information