A Characterization of the Normal Conditional Distributions (MATSUNO) 79

Therefore, the function g(x) = G(x; a/(1 − b²)) = N(0, a/(1 − b²)) is a solution of the integral equation (10). Any constant multiple of g(x), as well as g(x) ≡ 0, is also a solution, though these are not probability density functions.

4 Consistent Bivariate Distribution

4.1 Derivation of the marginal distribution

We obtain the following results from Lemma 3.

Theorem 4 The marginal distribution g(x) consistent with the pair of conditional distributions (6) and (7), where 0 < 1 − β²δ², is

    g(x) = N(μ₁, (σ₁₁ + β²σ₂₂)/(1 − β²δ²)).

Proof. Assuming μ₁ = 0, μ₂ = 0 without loss of generality, we consider the case of (6) and (7). Replacing the conditional distribution of Lemma 1 by the composition of (6) and (7), we get H(x, w) = N(bw, a), where b = βδ and a = σ₁₁ + β²σ₂₂ > 0. Regarding this function as the kernel H(x, w), we obtain the integral equation (10). Then, under the condition 0 < 1 − b² = 1 − β²δ², we get its solution g(x) = N(0, a/(1 − b²)) from Lemma 3. The variance of this solution distribution is given as

    a/(1 − b²) = (σ₁₁ + β²σ₂₂)/θ,

where θ = 1 − β²δ². If μ₁ ≠ 0, μ₂ ≠ 0, then we can consider the distributions of X − μ₁ and Y − μ₂, which have zero means, and then we go back to the zero-mean case mentioned above.

A similar result follows for the marginal distribution h(y) of Y:

Theorem 5 The marginal distribution h(y) consistent with the pair of conditional distributions (6) and (7), where 0 < 1 − β²δ², is

    h(y) = N(μ₂, (σ₂₂ + δ²σ₁₁)/(1 − β²δ²)).
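As a purely numerical aside (not part of the original argument), the variance in Theorem 4 can be checked as the fixed point of the variance recursion induced by the kernel: iterating v ← a + b²v converges to a/(1 − b²) whenever b² < 1. All parameter values below are illustrative assumptions.

```python
# Illustrative check of Lemma 3 / Theorem 4: the marginal variance of X is the
# fixed point of v = a + b**2 * v, with b = beta*delta and a = s11 + beta**2*s22.
beta, delta = 0.8, 0.5       # regression coefficients of (6) and (7) (assumed values)
s11, s22 = 1.0, 2.0          # conditional variances (assumed values)

b = beta * delta
a = s11 + beta**2 * s22
assert 0 < 1 - b**2          # the condition of Theorem 4

v = 1.0                      # arbitrary starting variance
for _ in range(200):
    v = a + b**2 * v         # variance after one more pass through the kernel

closed_form = a / (1 - b**2)
print(abs(v - closed_form))  # the iteration converges to the closed-form variance
```

The geometric contraction factor is b², so convergence is fast whenever the condition 0 < 1 − β²δ² of Theorem 4 holds.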
4.2 Derivation of the joint distribution

Theorem 6 The joint distribution consistent with the pair of conditional distributions (6) and (7), where 0 < 1 − β²δ², is

    (14) N(μ, Σ),  μ = (μ₁, μ₂)',  Σ = (1/θ) [ σ₁₁ + β²σ₂₂    δ(σ₁₁ + β²σ₂₂) ]
                                              [ δ(σ₁₁ + β²σ₂₂)    δ²σ₁₁ + σ₂₂ ],

where θ = 1 − β²δ².

Proof. We can assume μ₁ = 0, μ₂ = 0 without loss of generality, as in the proof of Theorem 4. If we consider the case of (6) and (7), then Theorem 4 shows that the consistent marginal distribution g(x) is

    (12) g(x) = N(0, (σ₁₁ + β²σ₂₂)/θ).

The joint distribution is given by the product of (12) and the conditional distribution (7). So we have

    f(x, y) = g(x) h(y | x).

After rearranging this, we get

    (15) f(x, y) = (2πσ_Xσ_Y(1 − ρ²)^{1/2})⁻¹ exp{ −[x²/σ_X² − 2ρxy/(σ_Xσ_Y) + y²/σ_Y²] / (2(1 − ρ²)) },

with σ_X² = (σ₁₁ + β²σ₂₂)/θ and σ_Y² = φ/θ, where φ = δ²σ₁₁ + σ₂₂ and ρ = δ(σ_X²/σ_Y²)^{1/2}. Under the assumption 0 < 1 − β²δ², the function (15) is the density function of the joint distribution (14) with μ₁ = 0, μ₂ = 0.

We also get the following result in a similar way.

Theorem 7 The joint distribution consistent with the pair of conditional distributions (6) and (7), where 0 < 1 − β²δ², is

    (16) N(μ, Σ),  μ = (μ₁, μ₂)',  Σ = (1/θ) [ σ₁₁ + β²σ₂₂    βφ ]
                                              [ βφ    δ²σ₁₁ + σ₂₂ ],
where the correlation coefficient is ρ = β(σ_Y²/σ_X²)^{1/2} = β(φ/(σ₁₁ + β²σ₂₂))^{1/2}.

So far, we have obtained two families of joint distributions consistent with (6) and (7), namely (15) and (16). The regression coefficients of (15) are

    (17) δ(σ₁₁ + β²σ₂₂)/φ (the coefficient of X on Y)  and  (18) δ (the coefficient of Y on X),

and those of (16) are

    (19) β (the coefficient of X on Y)  and  (20) βφ/(σ₁₁ + β²σ₂₂) (the coefficient of Y on X).

The coefficient (17) may or may not be equal to β, and the coefficient (20) may or may not be equal to δ. In other words, the four parameters (β, δ, σ₁₁, σ₂₂) of (6) and (7) have too many degrees of freedom to determine the parameters of the joint distribution uniquely; this is why we have obtained two families of joint distributions.

To make the two families of joint distributions of Theorems 6 and 7 coincide, we introduce another restriction in addition to the condition 0 < 1 − β²δ² already imposed.

Theorem 8 The joint distribution consistent with the pair of conditional distributions (6) and (7) under the assumptions

    (21) 0 < 1 − β²δ²  and  (22) δσ₁₁ = βσ₂₂
is N(μ, Σ), where

    (23) μ = (μ₁, μ₂)',  Σ = (1/θ) [ σ₁₁ + β²σ₂₂    δ(σ₁₁ + β²σ₂₂) ]
                                    [ δ(σ₁₁ + β²σ₂₂)    δ²σ₁₁ + σ₂₂ ],  θ = 1 − β²δ².

Proof. Under the assumption (21), Theorems 6 and 7 hold. The further assumption (22) makes the variance matrices of (14) and (16) of the two theorems have the common value (23). This can be rearranged to get

    (24) Σ = (1/(1 − βδ)) [ σ₁₁    δσ₁₁ ]
                           [ δσ₁₁    σ₂₂ ].

It would be possible to show that, provided the joint distribution is N(μ, Σ) with (23) and (24), the conditional distributions are (6) and (7). Theorem 8 thus gives a sufficient condition for the distributions (6) and (7) to be consistent with the bivariate normal distribution N(μ, Σ) defined by (23) and (24).

Now, we summarize the discussion for the bivariate distributions:

(i) If the conditional distributions (6) and (7) are derived from the
joint distribution N(μ, Σ) with (23) and (24), then the conditions 1 > βδ ≥ 0 and δσ₁₁ = βσ₂₂ hold.

(ii) If the conditions 0 < 1 − β²δ² and δσ₁₁ = βσ₂₂ hold, then the conditional distributions (6) and (7) are derived from the joint distribution N(μ, Σ) with (23) and (24).

(iii) The condition 1 > βδ ≥ 0 implies 0 < 1 − β²δ², but not the converse.

5 Multivariate Normal Distribution

We here turn to the general multivariate case and consider an s-dimensional random vector Z whose probability density function is

    f(z) = (2π)^{−s/2} |Σ|^{−1/2} exp{ −(z − μ)'Σ⁻¹(z − μ)/2 }.

That is, Z is assumed to be distributed as N(μ, Σ), with a mean vector μ every element of which lies in the interval (−∞, +∞), and with a positive definite (symmetric) covariance matrix Σ. According to the partition of Z as Z' = [X', Y'] = [(1×p), (1×q)], we partition the parameters as

    μ' = [μ₁', μ₂'],  Σ = [ Σ₁₁    Σ₁₂ ]
                           [ Σ₂₁    Σ₂₂ ],

where Σ₁₂ = Σ₂₁'. Letting f(x | y) be the conditional density function of X given Y = y, and h(y | x) be that of Y given X = x, we have 2)

    f(x | y) = N(μ₁ + Σ₁₂Σ₂₂⁻¹(y − μ₂), Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁),
    h(y | x) = N(μ₂ + Σ₂₁Σ₁₁⁻¹(x − μ₁), Σ₂₂ − Σ₂₁Σ₁₁⁻¹Σ₁₂).

2) See, for instance, T. W. Anderson (1958): An Introduction to Multivariate Statistical Analysis, New York: John Wiley and Sons.
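The partitioned-normal formulas quoted above are easy to sketch numerically. The following Python fragment (the matrix values and dimensions are illustrative assumptions, not taken from the paper) computes B = Σ₁₂Σ₂₂⁻¹, Δ = Σ₂₁Σ₁₁⁻¹ and the conditional covariance matrices from a given positive definite Σ.

```python
import numpy as np

p, q = 1, 2                                    # dimensions of X and Y (illustrative)
S = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.5, 0.4],
              [0.3, 0.4, 1.2]])                # an assumed positive definite Sigma

S11, S12 = S[:p, :p], S[:p, p:]                # partition of Sigma
S21, S22 = S[p:, :p], S[p:, p:]

B = S12 @ np.linalg.inv(S22)                   # regression coefficients of X on Y
D = S21 @ np.linalg.inv(S11)                   # regression coefficients of Y on X (Delta)
cvar_x = S11 - S12 @ np.linalg.inv(S22) @ S21  # conditional covariance of X given Y
cvar_y = S22 - S21 @ np.linalg.inv(S11) @ S12  # conditional covariance of Y given X
print(B.shape, D.shape)                        # (1, 2) and (2, 1)
```

Both conditional covariances are Schur complements of Σ, so they are positive definite whenever Σ is.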
Letting the regression coefficients be B = Σ₁₂Σ₂₂⁻¹ and Δ = Σ₂₁Σ₁₁⁻¹, we have a constraint on the coefficients,

    BΣ₂₂ = Σ₁₂ = Σ₂₁' = Σ₁₁Δ'.

5.1 The problem

Let us now consider random vectors X = (p×1) and Y = (q×1). Without loss of generality, it is assumed that p ≤ q. Suppose in general that the normal conditional distributions of X and Y are, respectively,

    (25) f(x | y) = N(μ₁ + B(y − μ₂), Σ₁),
    (26) h(y | x) = N(μ₂ + Δ(x − μ₁), Σ₂).

The problem now is to find the (p + q)-dimensional joint distribution, or its joint density function f(x, y), and the marginal density functions g(x) and h(y). In particular, we try to find conditions under which the joint distribution is multivariate normal.

In a similar way to that used to obtain (10), we have an integral equation

    (27) g(x) = ∫ H(x, w) g(w) dw,

where

    H(x, w) = ∫ f(x | y) h(y | w) dy,

with dy denoting q-dimensional integration. Now, the problem is first to find a solution g(x) of the integral equation (27), and second to find the joint distribution, which is given as the product of this g(x) and the conditional distribution (26), f(x, y) = g(x) h(y | x). It will be seen that the problem in the general multivariate case requires cumbersome calculations; the next section provides the necessary preparation.
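As in the bivariate case, where the kernel was N(bw, a) with b = βδ and a = σ₁₁ + β²σ₂₂, composing (25) with (26) should yield a Gaussian kernel with mean μ₁ + BΔ(w − μ₁) and covariance Σ₁ + BΣ₂B' (the standard composition rule for linear-Gaussian conditionals; this closed form is an assumption of the sketch, not a formula quoted from the paper). A Monte Carlo check with μ₁ = μ₂ = 0 and illustrative matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
B = np.array([[0.4, 0.1],
              [0.0, 0.3]])                  # p x q regression matrix (illustrative)
D = np.array([[0.2, 0.0],
              [0.1, 0.5]])                  # q x p regression matrix Delta (illustrative)
S1 = 0.5 * np.eye(2)                        # conditional covariance of X given Y
S2 = 0.8 * np.eye(2)                        # conditional covariance of Y given X
w = np.array([1.0, -2.0])                   # fixed conditioning value

n = 200_000
y = D @ w + rng.multivariate_normal(np.zeros(2), S2, size=n)    # draws of Y | W = w
x = y @ B.T + rng.multivariate_normal(np.zeros(2), S1, size=n)  # draws of X | Y

mean_claim = B @ (D @ w)                    # claimed kernel mean  B Delta w
cov_claim = S1 + B @ S2 @ B.T               # claimed kernel covariance
print(np.abs(x.mean(axis=0) - mean_claim).max(),
      np.abs(np.cov(x.T) - cov_claim).max())   # both discrepancies are small
```

The empirical mean and covariance of the simulated x match the claimed kernel moments up to Monte Carlo error, mirroring the scalar identities b = βδ and a = σ₁₁ + β²σ₂₂.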
6 Preparation 2

Some of the propositions given here are fairly well known, so we do not provide proofs for them.

6.1 Matrix algebra

Lemma 9 For two non-singular matrices A = (m×m) and C = (n×n), and two rectangular matrices B = (m×n) and D = (n×m), define

    E = A − BC⁻¹D  and  F = C − DA⁻¹B;

then, provided E and F are non-singular, we have

    E⁻¹ = A⁻¹ + A⁻¹BF⁻¹DA⁻¹  and  F⁻¹ = C⁻¹ + C⁻¹DE⁻¹BC⁻¹.

We will make use of the following propositions concerning the characteristic roots of matrices. The characteristic roots in the propositions may be multiple roots and/or zeros.

Lemma 10 (a) If the characteristic roots of a square matrix A = (n×n) are λ₁, ..., λ_n, then the characteristic roots of A' are λ₁, ..., λ_n.

(b) For two rectangular matrices A = (m×n) and B = (n×m), where m ≤ n, let the characteristic roots of AB = (m×m) be λ₁, ..., λ_m (including multiple roots); then the characteristic roots of BA = (n×n) are λ₁, ..., λ_m and n − m zeros.

Proof. (a) See Dhrymes (1978, p. 49, Proposition 42(a)). (b) See Dhrymes (1978, p. 51, Corollary 5). 3)
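Lemma 10(b) is easy to illustrate numerically; the matrices below are arbitrary (m = 2, n = 4) and are not taken from the paper.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 3., 1., 0.]])                  # m x n with m = 2, n = 4
Bm = np.array([[1., 0.],
               [0., 1.],
               [2., 0.],
               [0., 1.]])                         # n x m

ev_AB = np.sort(np.linalg.eigvals(A @ Bm).real)   # roots of AB (2 x 2)
ev_BA = np.sort(np.linalg.eigvals(Bm @ A).real)   # roots of BA (4 x 4)
print(ev_AB, ev_BA)   # BA has the same nonzero roots plus n - m = 2 zeros
```

Here AB = [[1, 3], [2, 3]], whose characteristic roots are 2 ± √7; the 4 × 4 product BA has these two roots together with two zeros, as Lemma 10(b) asserts.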
Lemma 11 For the regression coefficients B = (p×q) and Δ = (q×p) of the conditional distributions (25) and (26), let the characteristic roots of the product BΔ be λ₁, ..., λ_p (including multiple roots); then we have:

(a) The characteristic roots of Δ'B' = (p×p) are λ₁, ..., λ_p.

(b) The characteristic roots of ΔB = (q×q) are λ₁, ..., λ_p and q − p zeros.

(c) The characteristic roots of B'Δ' = (q×q) are λ₁, ..., λ_p and q − p zeros.

Proof. (a), (b) and (c) are applications of Lemma 10.

6.2 Matrix equation

It will be shown in Section 7 that the problem reduces first to solving an integral equation and then to solving a matrix equation. We here deal with the matrix equation in advance. The matrices appearing in this subsection are assumed to be transformable into diagonal form; Section 8 discusses more general cases where the matrices are transformable into block-diagonal form.

Now, let us consider two square matrices U = (m×m) and V = (n×n), and a rectangular matrix W = (m×n). We assume that U and V can be diagonalized (real and symmetric, for example), and that the characteristic roots of U are λ₁, ..., λ_m (distinct roots) and those of V are v₁, ..., v_n (distinct roots). Then, we get the following proposition concerning the matrix equation (28), in which X = (m×n) is an unknown matrix.

Lemma 12 Concerning the equation (28), we have the propositions (a), (b), (c) and (d) below:

(a) Under the assumption

3) P. J. Dhrymes (1978): Mathematics for Econometrics, New York: Springer-Verlag.