On the Conditional Distribution of a Multivariate Normal given a Transformation - the Linear Case BY RAJESHWARI MAJUMDAR* AND SUMAN MAJUMDAR


We show that the orthogonal projection operator onto the range of the adjoint of a linear operator $T$ can be represented as $UT$, where $U$ is an invertible linear operator. Given a Normal random vector $Y$ and a linear operator $T$, we use this representation to obtain a linear operator $\hat{T}$ such that $\hat{T}Y$ is independent of $TY$ and $Y - \hat{T}Y$ is an affine function of $TY$. We then use this decomposition to prove that the conditional distribution of a Normal random vector $Y$ given $L(Y)$, where $L$ is a linear transformation, is again a multivariate Normal distribution. This result is equivalent to the well-known result that, given a $k$-dimensional component of an $n$-dimensional Normal random vector, where $k < n$, the conditional distribution of the remaining $(n-k)$-dimensional component is an $(n-k)$-dimensional multivariate Normal distribution; it also sets the stage for approximating the conditional distribution of $Y$ given $g(Y)$, where $g$ is a continuously differentiable vector field.

1. Introduction. What can we ascertain about the conditional distribution of a multivariate Normal random vector $Y \in \mathbb{R}^n$ given $g(Y)$, where $g : \mathbb{R}^n \to \mathbb{R}^m$ is a measurable function? Clearly, given a particular functional form of $g$, the problem is a very specific one and, depending on the functional form, may or may not have a closed form solution. Our objective is to derive an approximation to the conditional distribution in question based on some regularity properties of $g$. Specifically, in this paper we find the conditional distribution when $g$ is a linear transformation, and in a companion paper we use that to derive the desired approximation when $g$ is a continuously differentiable ($C^1$) vector field by exploiting the local linearity of $g$.
Before proceeding further, we present a brief review of what is known about the conditional distribution when $g$ is a linear transformation, to be denoted by $L$ in what follows. Casella and Berger (2002) define the bivariate Normal distribution by specifying the joint density in terms of the five parameters of the distribution: the means $\mu_1$ and $\mu_2$, the variances $\sigma_1^2$ and $\sigma_2^2$, and the correlation $\rho$. They calculate the marginal density of $Y_1$ and note that (using the joint and marginal densities) it is easy to verify that the conditional distribution of $Y_2$ given $Y_1 = y$ is Normal with mean $\mu_2 + \rho\,(\sigma_2/\sigma_1)(y - \mu_1)$ and variance $\sigma_2^2(1 - \rho^2)$. Anderson (1984, Section 2.5.1) and Flury (1997, Theorem 3.3.1) generalize this result to

MSC 2010 subject classifications: 15A18, 46B28, 47A05, 60E99, 62B05, 62E15. Keywords and phrases: Positive operator; Orthogonal projection operator; Spectral Theorem; Multivariate Normal distribution; Independence; Sufficiency; Analysis of Variance.
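The bivariate conditional moments quoted above can be checked numerically. The sketch below (a minimal illustration assuming numpy; the parameter values are arbitrary, not from the paper) confirms that the textbook formula agrees with the partitioned-covariance form used later in this section.

```python
import numpy as np

# Conditional distribution of Y2 given Y1 = y for a bivariate Normal:
# mean mu2 + rho*(sigma2/sigma1)*(y - mu1), variance sigma2^2*(1 - rho^2).
mu1, mu2 = 1.0, -2.0
sigma1, sigma2, rho = 2.0, 3.0, 0.5
y = 0.7

cond_mean = mu2 + rho * (sigma2 / sigma1) * (y - mu1)
cond_var = sigma2**2 * (1 - rho**2)

# The same quantities via the partitioned-covariance formulas
# mu2 + S21*S11^{-1}*(y - mu1) and S22 - S21*S11^{-1}*S12.
S11 = sigma1**2
S12 = S21 = rho * sigma1 * sigma2
S22 = sigma2**2
assert np.isclose(cond_mean, mu2 + S21 / S11 * (y - mu1))
assert np.isclose(cond_var, S22 - S21 * S12 / S11)
```

The agreement is exact because the bivariate density parametrization is just the two-dimensional case of the partitioned form.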

the multivariate Normal distribution. While Anderson (1984) deals with the Lebesgue density of the multivariate Normal distribution (which exists only when the covariance matrix is of full rank), Flury (1997) avoids dealing with the density by defining the multivariate Normal distribution in terms of linear functionals, but requires the covariance matrix of the conditioning component to be of full rank. Though their approaches to defining the multivariate Normal distribution differ, both Muirhead (1982) and Eaton (1983, Proposition 3.13) obtain, as described below, the conditional distribution without any restriction on the rank of the covariance matrix. Let $Y$ have the multivariate Normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. Partition $Y$, $\mu$, and $\Sigma$ as
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
where $Y_1$ and $\mu_1$ are $k \times 1$ and $\Sigma_{11}$ is $k \times k$. Then $Y_2 - \Sigma_{21}\Sigma_{11}^{-}Y_1$ and $Y_1$ are jointly Normal and uncorrelated, hence independent, where $\Sigma_{11}^{-}$ is a generalized inverse of $\Sigma_{11}$. Consequently, the conditional distribution of $Y_2 - \Sigma_{21}\Sigma_{11}^{-}Y_1$ given $Y_1$ equals the unconditional distribution of $Y_2 - \Sigma_{21}\Sigma_{11}^{-}Y_1$, which is multivariate Normal in $n-k$ dimensions with mean $\mu_2 - \Sigma_{21}\Sigma_{11}^{-}\mu_1$ and covariance $\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-}\Sigma_{12}$. Thus, the conditional distribution of $Y_2$ given $Y_1$ is multivariate Normal in $n-k$ dimensions with mean $\mu_2 + \Sigma_{21}\Sigma_{11}^{-}(Y_1 - \mu_1)$ and covariance $\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-}\Sigma_{12}$.

Given two topological vector spaces $V$ and $W$, let $\mathcal{L}(V, W)$ denote the linear space of continuous linear transformations from $V$ to $W$, and let $\mathcal{L}(V, V) = \mathcal{L}(V)$. Now note that, if $L \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, the joint distribution of $(L(Y), Y)$ is multivariate Normal with mean $(A\mu, \mu)$ and covariance
$$\begin{pmatrix} A\Sigma A' & A\Sigma \\ \Sigma A' & \Sigma \end{pmatrix}$$
[Muirhead (1982, Theorem 1.2.6)], where the $m \times n$ matrix $A$ represents the transformation $L$ with respect to the standard orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$. By the results of Muirhead (1982) and Eaton (1983) cited in the previous paragraph, we obtain that the conditional distribution of $Y$ given $L(Y)$ is multivariate Normal in $\mathbb{R}^n$ with mean $\mu + \Sigma A'(A\Sigma A')^{-}(L(Y) - A\mu)$
and covariance $\Sigma - \Sigma A'(A\Sigma A')^{-}A\Sigma$. Unfortunately, this derivation of the conditional distribution of $Y$ given $L(Y)$, because of its dependence on manipulative matrix algebra, is not of much help when it comes to approximating the conditional distribution of $Y$ given $g(Y)$ for a $g$ in $C^1$ by exploiting the local linearity of $g$. In what follows, we present an alternative derivation of the conditional distribution of $Y$ given $L(Y)$. Given $T \in \mathcal{L}(\mathbb{R}^n)$, we find in Theorem 3 an operator $\hat{T} \in \mathcal{L}(\mathbb{R}^n)$, depending on $T$ and the covariance of $Y$, such that $\hat{T}Y$ is independent of $TY$ and $Y - \hat{T}Y$ is an affine function of $TY$. In Theorem 4, given $L \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, we find $T \in \mathcal{L}(\mathbb{R}^n)$ such that the conditional distribution of $Y$ given $TY$ equals that given $L(Y)$, and use the decomposition obtained in Theorem 3 to obtain the conditional distribution of $Y$ given $TY$, hence that given $L(Y)$. Our derivation facilitates the

approximation of the conditional distribution when the transformation $g$ is nonlinear but continuously differentiable; see Remark 4 for a brief outline. We define a multivariate Normal distribution in terms of an affine transformation of a random vector with coordinates independent and identically distributed (iid, hereinafter) standard Normals, as in Muirhead (1982), but work with the covariance operator (instead of the matrix) and the characteristic function. This coordinate-free approach allows us to seamlessly subsume the possibility that the multivariate Normal distribution is supported on a proper subspace of $\mathbb{R}^n$ which is not spanned by a subset of the standard orthonormal basis $e_1, \ldots, e_n$. In the spirit of Axler (2015), references to which in the sequel are by omission only, this relegates the manipulative matrix algebra that so dominates the multivariate Normal literature to the back burner.

The paper is organized as follows. In Section 2, we introduce all the notations, definitions, and results from linear algebra used in Section 3. The main result of this section is Theorem 1, where we show, using the Spectral Theorem, that the orthogonal projection operator onto the range of the adjoint of a linear operator $T$ is $UT$, where $U$ is an invertible linear operator. Theorem 1 is used in our proof of Theorem 3. In Section 3, we first summarize the basic facts about the multivariate Normal distribution in our coordinate-free setup. Theorem 2 shows that if $Y$ is an $\mathbb{R}^n$-valued multivariate Normal, then $(SY, TY)$ is an $\mathbb{R}^m \times \mathbb{R}^p$-valued multivariate Normal, where $S \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ and $T \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^p)$. Corollary 1, which is used in our proof of Theorem 3, then formulates a necessary and sufficient condition in terms of the operators $\Sigma$, $S$, and $T$ for the independence of $SY$ and $TY$, where $\Sigma$ is the covariance of $Y$.
Note that, since a component of a vector is a linear transformation of the vector and a linear transformation of a multivariate Normal random variable is another multivariate Normal random variable [Lemma 5], Theorem 4 allows us to deduce the conditional-distribution theorem of Muirhead (1982) (and Proposition 3.13 of Eaton (1983)) as an immediate corollary. We also present alternative derivations of the independence of the sample mean and the sample variance of a random sample from a Normal distribution [Remark 1], the "partial out" formula for population regression in the Normal model [Corollary 2], and the sufficiency of the sample mean in the Normal location model [Remark 3]. We simplify the expressions for the conditional mean and covariance obtained in Theorem 4 in Remark 2, leading to a direct verification of the iterated expectation and the analysis of variance formulae. We outline a direction in which our method can possibly be extended in Remark 5. The Appendix contains two technical lemmas that are used in the proofs of Theorem 4 and Corollary 2.

The following notational conventions and their consequences are used throughout the rest of the paper. The equality of two random variables, unless otherwise mentioned, implies equality almost surely. For any Polish space $\mathcal{X}$, let $\mathcal{B}(\mathcal{X})$ denote the Borel $\sigma$-algebra of $\mathcal{X}$.

Let $h$ be a map from an arbitrary set $\mathcal{U}$ into a measurable space $(\mathcal{V}, \mathcal{M})$. For an arbitrary subset $B$ of $\mathcal{V}$, let $h^{-1}(B)$ denote $\{u \in \mathcal{U} : h(u) \in B\}$, whereas for an arbitrary subset $E$ of $\mathcal{U}$, let $h(E)$ denote $\{h(u) : u \in E\}$. Note that $h^{-1}(h(E)) \supseteq E$ for every subset $E$ of $\mathcal{U}$. Let $\sigma(h)$ denote the smallest $\sigma$-algebra of subsets of $\mathcal{U}$ that makes $h$ measurable. Since $\{h^{-1}(B) : B \in \mathcal{M}\}$ is a $\sigma$-algebra of subsets of $\mathcal{U}$ [Dudley (1989)], we obtain $\sigma(h) = \{h^{-1}(B) : B \in \mathcal{M}\}$.

2. The results from Linear Algebra. Let $V, W$ be finite-dimensional vector spaces. For any $S \in \mathcal{L}(W, V)$, let $\mathcal{R}(S) \subseteq V$ denote the range of $S$ and $\mathcal{N}(S) \subseteq W$ denote the null space of $S$. The result of Lemma 1 is used in the proof of Theorem 1. It is mentioned in Exercise 3.D.3; we present a proof for the sake of completeness.

Lemma 1. Let $V$ be a finite-dimensional vector space, $W$ a subspace of $V$, and $h \in \mathcal{L}(W, V)$. There exists an invertible operator $U \in \mathcal{L}(V)$ such that $Uw = hw$ for every $w \in W$ if and only if $h$ is injective.

Proof of Lemma 1. Let $U \in \mathcal{L}(V)$ be invertible and such that $Uw = hw$ for every $w \in W$. Let $w \in W$ be such that $hw = 0$, implying $Uw = 0$. Since $U$ is invertible, hence injective by Theorem 3.69, we conclude that $w = 0$, showing that $h$ is injective. To show the converse, let $Q$ be a direct sum complement of $W$ and $X$ a direct sum complement of $\mathcal{R}(h)$, where the existence of $Q$ and $X$ is guaranteed by the existence of direct sum complements in finite dimensions. Let $q_1, \ldots, q_k$ be a basis for $Q$ and $x_1, \ldots, x_m$ a basis for $X$. By the fundamental theorem of linear maps [Theorem 3.22], $\dim \mathcal{R}(h) \le \dim W$, implying, by Theorem 2.43, $k \le m$. Recall that every $v \in V$ can be uniquely decomposed as $v = w + q$, where $w \in W$ and $q \in Q$. Define $U \in \mathcal{L}(V)$ by $Uv = hw + x$, where $x = \sum_{j=1}^{k} c_j x_j \in X$ for $q = \sum_{j=1}^{k} c_j q_j$. Since $X$ is a direct sum complement of $\mathcal{R}(h)$, $Uv = 0$ implies $hw = x = 0$. Since $h$ is injective and $x_1, \ldots, x_k$ is linearly independent, $U$ is injective, hence invertible (again by Theorem 3.69).

In the sequel we assume that $W$ and $V$ are real inner product spaces. For $T \in \mathcal{L}(V, W)$, let $T^{*} \in \mathcal{L}(W, V)$ denote the adjoint of $T$ [Definition 7.2].
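Lemma 1 can be illustrated numerically. The sketch below (assuming numpy; the subspace, the injective map, and the bases are arbitrary illustrative choices, not objects from the paper) extends an injective map on a subspace of $\mathbb{R}^n$ to an invertible operator, following the proof's recipe of mapping a complement of the subspace into a complement of the range.

```python
import numpy as np

# Lemma 1, numerically: an injective linear map h on a subspace W of R^n
# extends to an invertible operator U on R^n agreeing with h on W.  W is
# the span of the first k coordinates; h is a random injective map.
rng = np.random.default_rng(6)
n, k = 5, 2
h = rng.standard_normal((n, k))      # columns = images of a basis of W; rank k

# A direct sum complement of range(h): its orthogonal complement.
P_range_h = h @ np.linalg.pinv(h)    # orthogonal projection onto range(h)
X_basis = np.linalg.svd(np.eye(n) - P_range_h)[0][:, :n - k]

# U sends the basis of W to the columns of h, and the complementary
# coordinates into the complement of range(h), as in the proof.
U = np.column_stack([h, X_basis])

assert np.allclose(U[:, :k], h)      # U agrees with h on W
assert np.linalg.matrix_rank(U) == n # U is invertible
```

Injectivity of the extension follows exactly as in the proof: the two column groups land in complementary subspaces.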
For any subspace $Q$ of $V$, let $Q^{\perp}$ denote the orthogonal complement of $Q$ [Definition 6.45] and $\Pi_{Q} \in \mathcal{L}(V)$ denote the orthogonal projection operator onto $Q$ [Definition 6.53].

Theorem 1. Given $T \in \mathcal{L}(V)$, there exists $U \in \mathcal{L}(V)$, invertible and depending on $T$, such that $\Pi_{\mathcal{R}(T^{*})} = UT$.
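Before the proof, a matrix-form numerical sketch of Theorem 1, assuming numpy: for a deliberately rank-deficient $T$, an invertible $U$ with $UT = \Pi_{\mathcal{R}(T^{*})}$ is assembled from the pseudoinverse plus a bijection between the two orthogonal complements, mirroring the construction in the proof. All matrices are illustrative choices.

```python
import numpy as np

# Theorem 1, numerically: build an invertible U with U T equal to the
# orthogonal projection onto R(T*) (the row space of T).  T is an
# arbitrary rank-3 operator on R^5.
rng = np.random.default_rng(1)
n, r = 5, 3
T = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank 3

Tp = np.linalg.pinv(T)
P_rowspace = Tp @ T              # orthogonal projection onto R(T*)
P_range = T @ Tp                 # orthogonal projection onto R(T)
I = np.eye(n)

# Orthonormal bases of R(T)^perp and of N(T) = R(T*)^perp.
comp_basis = np.linalg.svd(I - P_range)[0][:, :n - r]
null_basis = np.linalg.svd(I - P_rowspace)[0][:, :n - r]

# On R(T), U acts as the pseudoinverse; R(T)^perp is mapped bijectively
# onto N(T).  The second term annihilates R(T), so U T = T^+ T.
U = Tp + null_basis @ comp_basis.T

assert np.allclose(U @ T, P_rowspace, atol=1e-10)
assert np.linalg.matrix_rank(U) == n     # U is invertible
```

The pseudoinverse maps $\mathcal{R}(T)$ bijectively onto $\mathcal{R}(T^{*})$ and kills $\mathcal{R}(T)^{\perp}$, so adding a bijection from $\mathcal{R}(T)^{\perp}$ onto $\mathcal{N}(T)$ produces an invertible operator without disturbing $UT$.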

Proof of Theorem 1. We first observe that $T^{*}T$ is a positive operator [Definition 7.31]. By Theorems 7.6(e) and 7.6(c), $T^{*}T$ is self-adjoint; the observation follows, since, by the definition of the adjoint operator, for $v \in V$,
$$\langle T^{*}Tv, v\rangle = \|Tv\|^{2} \ge 0. \qquad (1)$$
By the Real Spectral Theorem [Theorem 7.29(b)], $V$ has an orthonormal basis $f_1, \ldots, f_n$ consisting of eigenvectors of $T^{*}T$ with corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$. By Theorem 7.35(b), $\lambda_j \ge 0$ for all $1 \le j \le n$. Since, by (1), $\lambda_j = \|Tf_j\|^{2}$, we obtain
$$\lambda_j = 0 \iff Tf_j = 0 \qquad\text{and}\qquad \lambda_j > 0 \iff Tf_j \ne 0. \qquad (2)$$
If $\lambda_j = 0$ for every $j = 1, \ldots, n$, then $T^{*}T$ is the zero operator, implying, by (1), that $T$ is the zero operator. By Theorem 7.7, $T^{*}$ is the zero operator as well, and the theorem trivially holds with $U = I$. Hence, $J_{+} = \{j : 1 \le j \le n, \lambda_j > 0\}$ is non-empty without loss of generality; let $J_{0} = \{j : 1 \le j \le n, \lambda_j = 0\}$.
For $j \in J_{+}$, since $f_j = \lambda_j^{-1}T^{*}Tf_j \in \mathcal{R}(T^{*})$, we have
$$\Pi_{\mathcal{R}(T^{*})}f_j = f_j. \qquad (3)$$
For $j \in J_{0}$, $f_j \in \mathcal{N}(T)$ by (2); since $\mathcal{N}(T) = \mathcal{R}(T^{*})^{\perp}$ [Theorem 7.7(c)],
$$\Pi_{\mathcal{R}(T^{*})}f_j = 0. \qquad (4)$$
By (3) and (4), for any $x \in V$,
$$\Pi_{\mathcal{R}(T^{*})}x = \sum_{j \in J_{+}}\langle x, f_j\rangle f_j. \qquad (5)$$
By (2) and the definition of $J_{0}$, for any $x \in V$,
$$Tx = \sum_{j \in J_{+}}\langle x, f_j\rangle Tf_j. \qquad (6)$$
By the definition of $f_j$ and $\lambda_j$, $1 \le j \le n$, the list of vectors $\{\|Tf_j\|^{-1}Tf_j : j \in J_{+}\}$ in $V$ is orthonormal, and consequently, by Theorem 6.26, linearly independent; the same conclusion holds for the list $\{Tf_j : j \in J_{+}\}$. For $W = \mathrm{span}\{Tf_j : j \in J_{+}\}$, the map $h \in \mathcal{L}(W, V)$ defined by $h(Tf_j) = f_j$ is clearly injective. By Lemma 1 there exists an invertible operator $U \in \mathcal{L}(V)$ such that
$$Uw = hw \quad\text{for every } w \in W. \qquad (7)$$
Now note that, for any $x \in V$, by (6), (7), the definition of $h$, and (5), in that order,

$$UTx = \sum_{j \in J_{+}}\langle x, f_j\rangle UTf_j = \sum_{j \in J_{+}}\langle x, f_j\rangle h(Tf_j) = \sum_{j \in J_{+}}\langle x, f_j\rangle f_j = \Pi_{\mathcal{R}(T^{*})}x,$$
completing the proof.

Lemma 2. Given $\Sigma \in \mathcal{L}(V)$ positive, there exists $\Sigma^{\ominus} \in \mathcal{L}(V)$ positive such that
$$\Sigma^{1/2}\Sigma^{\ominus} = \Sigma^{\ominus}\Sigma^{1/2} = \Pi_{\mathcal{R}(\Sigma)} \qquad (8)$$
and
$$\Sigma^{1/2}\Pi_{\mathcal{R}(\Sigma)} = \Pi_{\mathcal{R}(\Sigma)}\Sigma^{1/2} = \Sigma^{1/2}, \qquad (9)$$
where $\Sigma^{1/2}$ denotes the unique positive square root of $\Sigma$.

Proof of Lemma 2. By Theorem 7.36, $\Sigma^{1/2}$ exists and is defined by
$$\Sigma^{1/2}f_j = \lambda_j^{1/2}f_j, \qquad (10)$$
where $f_1, \ldots, f_n$ is an orthonormal basis of $V$ consisting of eigenvectors of $\Sigma$ with corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$, and $J_{+}$ and $J_{0}$ are defined from these eigenvalues as in the proof of Theorem 1. Let $\Sigma^{\ominus} \in \mathcal{L}(V)$ be defined by
$$\Sigma^{\ominus}f_j = \begin{cases} \lambda_j^{-1/2}f_j & \text{if } j \in J_{+} \\ 0 & \text{if } j \in J_{0}. \end{cases} \qquad (11)$$
Clearly, $\langle \Sigma^{\ominus}x, y\rangle = \sum_{j \in J_{+}}\lambda_j^{-1/2}\langle x, f_j\rangle\langle y, f_j\rangle = \langle x, \Sigma^{\ominus}y\rangle$, showing that $\Sigma^{\ominus}$ is self-adjoint and implying $\langle \Sigma^{\ominus}x, x\rangle \ge 0$, that is, $\Sigma^{\ominus} \in \mathcal{L}(V)$ is positive. For any $y \in V$, by (10) and (11),
$$\Sigma^{\ominus}\Sigma^{1/2}y = \sum_{j \in J_{+}}\langle y, f_j\rangle f_j = \Sigma^{1/2}\Sigma^{\ominus}y. \qquad (12)$$
Since $\Sigma$ is self-adjoint, by Theorem 7.7(d),
$$\mathcal{R}(\Sigma) = \mathcal{N}(\Sigma)^{\perp}. \qquad (13)$$
Since $\{f_j : j \in J_{0}\}$ is contained in $\mathcal{N}(\Sigma)$ and $\{f_j : j \in J_{+}\}$ in $\mathcal{R}(\Sigma)$, it follows that $\{f_j : j \in J_{+}\}$ is a basis of $\mathcal{R}(\Sigma)$, and (8) follows from (12). Note that (9) follows from the observation that both $\Sigma^{1/2}\Pi_{\mathcal{R}(\Sigma)}y$ and $\Pi_{\mathcal{R}(\Sigma)}\Sigma^{1/2}y$ equal $\sum_{j \in J_{+}}\langle y, f_j\rangle\lambda_j^{1/2}f_j = \Sigma^{1/2}y$.

3. The multivariate Normal distribution results. Let $Z_1, \ldots, Z_n$ be iid standard Normal random variables. The distribution of

$$Z = \sum_{k=1}^{n} e_k Z_k \qquad (14)$$
is defined to be the standard multivariate Normal distribution $N(0, I)$, where $I \in \mathcal{L}(\mathbb{R}^n)$ is the identity operator.

Lemma 3. The characteristic function $\varphi_{0,I}$ of the $N(0, I)$ distribution is given by $\varphi_{0,I}(t) = E[\exp(i\langle t, Z\rangle)] = \exp(-\|t\|^{2}/2)$.

Proof of Lemma 3. Follows from Proposition 9.4.2(a) of Dudley (1989).

Lemma 4. Given $Z \sim N(0, I)$, $\mu \in \mathbb{R}^n$, and $T \in \mathcal{L}(\mathbb{R}^n)$, let
$$Y = \mu + TZ. \qquad (15)$$
Then, for any $s, t \in \mathbb{R}^n$,
$$E[\exp(i\langle t, Y\rangle)] = \exp(i\langle t, \mu\rangle)\exp\!\left(-\tfrac{1}{2}\langle t, TT^{*}t\rangle\right),$$
$$E[\langle t, Y\rangle] = \langle t, \mu\rangle, \qquad \mathrm{Cov}\left(\langle t, Y\rangle, \langle s, Y\rangle\right) = \langle t, TT^{*}s\rangle.$$

Proof of Lemma 4. Straightforward algebra using the bilinearity of the inner product and the definitions of $T^{*}$, of $Z$ in (14), and of $Y$ in (15), along with the fact that $Z_1, \ldots, Z_n$ are iid standard Normal random variables, proves the lemma.

Definition 1. The distribution of $Y$ in (15) is defined to be the multivariate Normal distribution with mean $\mu$ and covariance $TT^{*}$.

Recall that a distribution on $\mathbb{R}^n$ is uniquely determined by its characteristic function [Dudley (1989, Theorem 9.5.1)]; since the characteristic function of $Y$ is determined by its mean $\mu$ and covariance $TT^{*}$, the multivariate Normal distribution $N(\mu, \Sigma)$ with mean $\mu$ and covariance $\Sigma$ (where $\Sigma$ is any positive operator on $\mathbb{R}^n$) is uniquely defined in terms of the characteristic function
$$\varphi_{\mu,\Sigma}(t) = \exp\!\left(i\langle t, \mu\rangle - \tfrac{1}{2}\langle t, \Sigma t\rangle\right). \qquad (16)$$

Lemma 5. Given $Y \sim N(\mu, \Sigma)$ and $S \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, $SY \sim N(S\mu, S\Sigma S^{*})$.

Proof of Lemma 5. The proof is a straightforward consequence of Definition 1.

Given $(s_1, t_1), (s, t) \in \mathbb{R}^m \times \mathbb{R}^p = \mathbb{R}^{m+p}$, the inner product on $\mathbb{R}^m \times \mathbb{R}^p$ is given by $\langle (s_1, t_1), (s, t)\rangle = \langle s_1, s\rangle + \langle t_1, t\rangle$.

Theorem 2. Given $Y \sim N(\mu, \Sigma)$, $S \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, and $T \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^p)$, the random

vector $(SY, TY)$ is multivariate Normal on $\mathbb{R}^m \times \mathbb{R}^p$ with mean $(S\mu, T\mu) \in \mathbb{R}^m \times \mathbb{R}^p$ and covariance operator $\Lambda \in \mathcal{L}(\mathbb{R}^m \times \mathbb{R}^p)$ given by
$$\Lambda(s, t) = \left(S\Sigma S^{*}s + S\Sigma T^{*}t,\; T\Sigma S^{*}s + T\Sigma T^{*}t\right).$$

Proof of Theorem 2. We first verify that $\Lambda : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}^m \times \mathbb{R}^p$ is a positive operator. The verification that $\Lambda$ is linear and self-adjoint is routine. Since
$$\langle S\Sigma S^{*}s, s\rangle + \langle S\Sigma T^{*}t, s\rangle + \langle T\Sigma S^{*}s, t\rangle + \langle T\Sigma T^{*}t, t\rangle = \left\langle \Sigma^{1/2}(S^{*}s + T^{*}t), \Sigma^{1/2}(S^{*}s + T^{*}t)\right\rangle,$$
the non-negativity of $\langle \Lambda(s, t), (s, t)\rangle$ for every $(s, t) \in \mathbb{R}^m \times \mathbb{R}^p$ follows. By the definition of the inner product in $\mathbb{R}^m \times \mathbb{R}^p$ and (16), the characteristic function $\varphi$ of the random vector $(SY, TY)$ taking values in $\mathbb{R}^m \times \mathbb{R}^p$ is given by
$$\varphi(s, t) = \exp\!\left(i\langle S^{*}s + T^{*}t, \mu\rangle - \tfrac{1}{2}\langle S^{*}s + T^{*}t, \Sigma(S^{*}s + T^{*}t)\rangle\right). \qquad (17)$$
Since $\langle S^{*}s + T^{*}t, \mu\rangle = \langle (s, t), (S\mu, T\mu)\rangle$ and
$$\langle S^{*}s + T^{*}t, \Sigma(S^{*}s + T^{*}t)\rangle = \left\langle (s, t), \left(S\Sigma S^{*}s + S\Sigma T^{*}t,\, T\Sigma S^{*}s + T\Sigma T^{*}t\right)\right\rangle, \qquad (18)$$
the proof follows.

Corollary 1. Given $Y \sim N(\mu, \Sigma)$, $S \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, and $T \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^p)$, $SY$ and $TY$ are independent if and only if $S\Sigma T^{*} \in \mathcal{L}(\mathbb{R}^p, \mathbb{R}^m)$, equivalently $T\Sigma S^{*} \in \mathcal{L}(\mathbb{R}^m, \mathbb{R}^p)$, is the zero operator.

Proof of Corollary 1. From (17),
$$\varphi(s, t) = \exp(i\langle s, S\mu\rangle)\exp(i\langle t, T\mu\rangle)\exp\!\left(-\tfrac{1}{2}\langle S^{*}s + T^{*}t, \Sigma(S^{*}s + T^{*}t)\rangle\right).$$
Now by (18),
$$\langle S^{*}s + T^{*}t, \Sigma(S^{*}s + T^{*}t)\rangle = \langle s, S\Sigma S^{*}s\rangle + 2\langle s, S\Sigma T^{*}t\rangle + \langle t, T\Sigma T^{*}t\rangle.$$
Thus, by (16) and Lemma 5,
$$\varphi(s, t) = E[\exp(i\langle s, SY\rangle)]\,E[\exp(i\langle t, TY\rangle)]\exp(-\langle s, S\Sigma T^{*}t\rangle).$$
That is, the characteristic function of $(SY, TY)$ is the product of two factors; one factor is the characteristic function of the product measure of the distributions induced by $SY$ on $\mathcal{B}(\mathbb{R}^m)$ and by $TY$ on $\mathcal{B}(\mathbb{R}^p)$, whereas the other factor is $\exp(-\langle s, S\Sigma T^{*}t\rangle)$. Therefore, $SY$ and $TY$ are independent if and only if $\exp(-\langle s, S\Sigma T^{*}t\rangle) = 1$ for every

$s \in \mathbb{R}^m$ and $t \in \mathbb{R}^p$, equivalently, $\langle s, S\Sigma T^{*}t\rangle = 0$ for every $s$ and $t$. Since $T\Sigma S^{*} = (S\Sigma T^{*})^{*}$ by Theorems 7.6(e) and 7.6(c), the proof follows.

Remark 1. For $X_1, \ldots, X_n$ iid Normal random variables with mean $\theta$ and variance $\sigma^{2}$,
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i \qquad\text{and}\qquad S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^{2}$$
are independent [Casella and Berger (2002, Theorem 5.3.1(a))]. Most textbooks prove this result by working with the (joint) density of the sample and using the Jacobian formula for finding the density of the transformation that maps the sample to the sample mean and the sample variance. Some textbooks use Basu's (1955) Theorem on an ancillary statistic (the sample variance) being independent of a complete sufficient statistic (the sample mean) to prove this result. We are going to show that this result is a straightforward consequence of Corollary 1. Let $\mathbf{1}$ be the sum of the standard orthonormal basis vectors and $\mathcal{J}$ the span of $\mathbf{1}$. Since $\Pi_{\mathcal{J}}x = n^{-1}\langle x, \mathbf{1}\rangle\mathbf{1}$, we have
$$\bar{X} = n^{-1}\langle \Pi_{\mathcal{J}}X, \mathbf{1}\rangle \qquad\text{and}\qquad S^{2} = (n-1)^{-1}\left\|(I - \Pi_{\mathcal{J}})X\right\|^{2},$$
where $X = \sum_{k=1}^{n}X_k e_k \sim N(\theta\mathbf{1}, \sigma^{2}I)$. By Corollary 1, using the fact that an orthogonal projection operator is self-adjoint, we obtain that $\Pi_{\mathcal{J}}Y$ and $(I - \Pi_{\mathcal{J}})Y$ are independent if and only if $(I - \Pi_{\mathcal{J}})\Sigma\Pi_{\mathcal{J}}$ is the zero operator, where $Y \sim N(\mu, \Sigma)$. Clearly, $(I - \Pi_{\mathcal{J}})\Sigma\Pi_{\mathcal{J}}x = 0$ for all $x \in \mathbb{R}^n$ if and only if $(I - \Pi_{\mathcal{J}})\Sigma\mathbf{1} = 0$, equivalently, $\mathbf{1}$ is an eigenvector of $\Sigma$. Since $\mathbf{1}$ is an eigenvector of $\sigma^{2}I$, the independence of $\bar{X}$ and $S^{2}$ follows from that of $\Pi_{\mathcal{J}}X$ and $(I - \Pi_{\mathcal{J}})X$.

The class of positive operators with $\mathbf{1}$ as an eigenvector does not reduce to the singleton $\{\sigma^{2}I\}$. For $n = 2$, define $\Sigma \in \mathcal{L}(\mathbb{R}^{2})$ by $\Sigma(u_1, u_2) = (u_1 + u_2, u_1 + u_2)$; clearly, $\mathbf{1} = (1, 1)$ is an eigenvector of $\Sigma$. Thus, for the sample mean and the sample variance to be independent, it is not necessary for the sample to be iid. As long as the joint distribution of the sample is multivariate Normal such that $\mathbf{1}$ is an eigenvector of the covariance, the independence of the sample mean and the sample variance holds.
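The eigenvector criterion of Remark 1 can be checked directly. A minimal sketch assuming numpy, with an illustrative equicorrelated covariance (any positive operator with constant row sums has $\mathbf{1}$ as an eigenvector); the matrices are examples, not objects from the paper.

```python
import numpy as np

# Remark 1's criterion: (I - Pi_J) Sigma Pi_J = 0 exactly when the
# all-ones vector is an eigenvector of Sigma.
n = 4
one = np.ones(n)
Pi = np.outer(one, one) / n          # orthogonal projection onto span{1}
I = np.eye(n)

# Equicorrelated (non-iid) covariance: constant row sums, so 1 is an eigenvector.
Sigma = 0.3 * np.ones((n, n)) + 0.7 * I
assert np.allclose(Sigma @ one, (0.3 * n + 0.7) * one)
assert np.allclose((I - Pi) @ Sigma @ Pi, 0)      # independence criterion holds

# A covariance whose row sums vary fails the criterion.
Sigma_bad = np.diag([1.0, 2.0, 3.0, 4.0])
assert not np.allclose((I - Pi) @ Sigma_bad @ Pi, 0)
```

So under the heteroskedastic diagonal covariance the sample mean and sample variance would not be independent, even though the coordinates are uncorrelated.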
However, for Normal random variables that are dependent, whether the joint distribution is multivariate Normal becomes a modeling question, as the joint distribution of even pairwise uncorrelated Normal random variables may not be multivariate Normal. //

Theorem 3. Given $Y \sim N(\mu, \Sigma)$ and $T \in \mathcal{L}(\mathbb{R}^n)$, define
$$S = T\Sigma^{1/2}; \qquad (19)$$

then
$$\Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}Y \ \text{is independent of}\ TY \qquad (20)$$
and
$$Y - \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}Y \ \text{is an affine function of}\ TY. \qquad (21)$$

Proof of Theorem 3. For any $x \in \mathbb{R}^n$, we obtain from (8) and (9),
$$T\Sigma\left(\Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}\right)^{*}x = T\Sigma\Sigma^{\ominus}\Pi_{\mathcal{N}(S)}\Sigma^{1/2}x = T\Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{1/2}x,$$
implying, by (19) and $S\Pi_{\mathcal{N}(S)} = 0$, that $T\Sigma\left(\Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}\right)^{*}$ is the zero operator, whence (20) follows from Corollary 1.

To prove (21) we first observe that, by (13) and (8), in that order,
$$Y = \Pi_{\mathcal{R}(\Sigma)}Y + \Pi_{\mathcal{N}(\Sigma)}Y = \Sigma^{1/2}\Sigma^{\ominus}Y + \Pi_{\mathcal{N}(\Sigma)}Y. \qquad (22)$$
Now we are going to show that
$$\Pi_{\mathcal{N}(\Sigma)}Y = \Pi_{\mathcal{N}(\Sigma)}\mu, \qquad (23)$$
implying, by (22), that
$$Y = \Sigma^{1/2}\Sigma^{\ominus}Y + \Pi_{\mathcal{N}(\Sigma)}\mu. \qquad (24)$$
Let $f_j$, $\lambda_j$, $J_{+}$, and $J_{0}$ be as in the proof of Lemma 2 (with $V = \mathbb{R}^n$). Recall that $\{f_j : j \in J_{0}\}$ is an orthonormal basis for $\mathcal{N}(\Sigma)$ and $\{f_j : j \in J_{+}\}$ is an orthonormal basis for $\mathcal{R}(\Sigma)$. For $j \in J_{0}$ and $u \in \mathbb{R}$, by (16),
$$E\left[\exp\left(iu\langle Y, f_j\rangle\right)\right] = \exp\!\left(iu\langle f_j, \mu\rangle - \tfrac{u^{2}}{2}\langle f_j, \Sigma f_j\rangle\right) = \exp(iu\langle \mu, f_j\rangle),$$
implying $\langle Y, f_j\rangle = \langle \mu, f_j\rangle$ for all $j \in J_{0}$, that is, (23). By Theorems 6.47 and 7.7(c),
$$I = \Pi_{\mathcal{R}(S^{*})} + \Pi_{\mathcal{N}(S)}, \qquad (25)$$
implying, by (24),
$$Y - \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}Y = \Sigma^{1/2}\Pi_{\mathcal{R}(S^{*})}\Sigma^{\ominus}Y + \Pi_{\mathcal{N}(\Sigma)}\mu. \qquad (26)$$
By Theorem 1, there exists $U \in \mathcal{L}(\mathbb{R}^n)$, invertible, such that
$$\Pi_{\mathcal{R}(S^{*})} = US; \qquad (27)$$
applying (27), (19), and (24), in that order, we obtain that the right-hand side of (26) equals

$$\Sigma^{1/2}US\Sigma^{\ominus}Y + \Pi_{\mathcal{N}(\Sigma)}\mu = \Sigma^{1/2}UT\left(Y - \Pi_{\mathcal{N}(\Sigma)}\mu\right) + \Pi_{\mathcal{N}(\Sigma)}\mu = \Sigma^{1/2}UTY + \left(I - \Sigma^{1/2}UT\right)\Pi_{\mathcal{N}(\Sigma)}\mu,$$
thereby establishing (21).

Theorem 4. Given $Y \sim N(\mu, \Sigma)$ and $L \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, the conditional distribution of $Y$ given $L(Y)$ is multivariate Normal on $\mathcal{B}(\mathbb{R}^n)$.

Proof of Theorem 4. The idea is to first construct $T \in \mathcal{L}(\mathbb{R}^n)$ such that $\sigma(L(Y)) = \sigma(TY)$ and then use Theorem 3 to find the conditional distribution of $Y$ given $TY$. Let $\tilde{L} \in \mathcal{L}(\mathbb{R}^n, \mathcal{R}(L))$ be defined by $\tilde{L}x = Lx$ for every $x \in \mathbb{R}^n$. Clearly, $\tilde{L}$ is surjective, hence $\tilde{L}^{*} \in \mathcal{L}(\mathcal{R}(L), \mathbb{R}^n)$ is injective [Theorem 7.7]. Let $T \in \mathcal{L}(\mathbb{R}^n)$ be defined by $Tx = \tilde{L}^{*}\tilde{L}x$.

Let $(\Omega, \mathcal{F}, P)$ denote the probability space underlying $Y$. Since $L$ is continuous, hence measurable, $L(Y) : (\Omega, \mathcal{F}) \to (\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$ is measurable, implying $(L(Y))^{-1}(E) \in \mathcal{F}$ for every $E \in \mathcal{B}(\mathbb{R}^m)$. Note that $(L(Y))^{-1}(E) = (\tilde{L}Y)^{-1}(E \cap \mathcal{R}(L))$. Further, $\mathcal{R}(L)$, by virtue of being closed, is an element of $\mathcal{B}(\mathbb{R}^m)$, and $\mathcal{B}(\mathcal{R}(L))$ is the trace of $\mathcal{B}(\mathbb{R}^m)$ on $\mathcal{R}(L)$, that is, $\mathcal{B}(\mathcal{R}(L)) = \{E \cap \mathcal{R}(L) : E \in \mathcal{B}(\mathbb{R}^m)\}$, implying $\sigma(L(Y)) = \sigma(\tilde{L}Y)$. Since $\tilde{L}^{*}$ is injective (and measurable), by Lemma A.1, $\sigma(\tilde{L}^{*}) = \mathcal{B}(\mathcal{R}(L))$, that is, $\{(\tilde{L}^{*})^{-1}(F) : F \in \mathcal{B}(\mathbb{R}^n)\} = \mathcal{B}(\mathcal{R}(L))$. Since, for any $F \in \mathcal{B}(\mathbb{R}^n)$, $(TY)^{-1}(F) = (\tilde{L}Y)^{-1}\left((\tilde{L}^{*})^{-1}(F)\right)$, we obtain $\sigma(TY) = \sigma(\tilde{L}Y)$. That completes the proof that $\sigma(L(Y)) = \sigma(TY)$.

By the pull-out property of conditional expectation, (20), and (21),
$$E\left[\exp(i\langle t, Y\rangle) \,\middle|\, TY\right] = \exp\!\left(i\langle t, Y - \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}Y\rangle\right)E\left[\exp\!\left(i\langle t, \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}Y\rangle\right)\right],$$
where $S$ is as in (19). By Lemma 5 and (16),
$$E\left[\exp\!\left(i\langle t, \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}Y\rangle\right)\right] = \exp\!\left(i\langle t, \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}\mu\rangle - \tfrac{1}{2}\langle t, Kt\rangle\right),$$
where the operator $K = \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}\,\Sigma\,\Sigma^{\ominus}\Pi_{\mathcal{N}(S)}\Sigma^{1/2}$ equals, by (8) and (9), $\Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Pi_{\mathcal{R}(\Sigma)}\Pi_{\mathcal{N}(S)}\Sigma^{1/2}$. Therefore, the conditional distribution of $Y$ given $TY$, hence that given $L(Y)$, is Normal with mean $Y - \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{\ominus}(Y - \mu)$ and covariance $K$.
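A numerical cross-check of the proof above, assuming numpy: the operator form of the conditional covariance (in its simplified version $\Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{1/2}$ with $S = T\Sigma^{1/2}$) is compared against the classical generalized-inverse matrix expression $\Sigma - \Sigma T^{*}(T\Sigma T^{*})^{-}T\Sigma$ from the introduction, and the law of total variance is verified. $\Sigma$ and $T$ are arbitrary illustrative choices, with $\Sigma$ singular on purpose.

```python
import numpy as np

# Conditional covariance of Y given TY, two ways: the operator form of
# Theorems 3-4 and the classical generalized-inverse matrix form.
rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, 3))
Sigma = A @ A.T                      # rank-3 (singular) covariance
T = rng.standard_normal((2, n))      # conditioning transformation

lam, F = np.linalg.eigh(Sigma)
half = F @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ F.T    # Sigma^{1/2}

S = T @ half
P_row = np.linalg.pinv(S) @ S                # projection onto R(S*)
P_null = np.eye(n) - P_row                   # projection onto N(S)

K_operator = half @ P_null @ half
K_matrix = Sigma - Sigma @ T.T @ np.linalg.pinv(T @ Sigma @ T.T) @ T @ Sigma
assert np.allclose(K_operator, K_matrix, atol=1e-8)

# Law of total variance: E[Cov(Y|TY)] + Cov(E[Y|TY]) = Cov(Y).
cov_cond_mean = half @ P_row @ half
assert np.allclose(K_operator + cov_cond_mean, Sigma, atol=1e-8)
```

The agreement rests on the pseudoinverse identity $S^{+}S = S^{*}(SS^{*})^{+}S$, which turns $\Sigma^{1/2}\Pi_{\mathcal{R}(S^{*})}\Sigma^{1/2}$ into $\Sigma T^{*}(T\Sigma T^{*})^{+}T\Sigma$.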

Remark 2. The expressions for the mean and the covariance $K$ of the conditional distribution of $Y$ given $L(Y)$ can be considerably simplified, rendering the verification of the iterated expectation formula and the analysis of variance formula immediate. Since $Y - \mu = \Pi_{\mathcal{R}(\Sigma)}(Y - \mu)$ by (23), applying (25) and (8), in that order,
$$E[Y \mid TY] = \mu + \Sigma^{1/2}\Pi_{\mathcal{R}(S^{*})}\Sigma^{\ominus}(Y - \mu). \qquad (28)$$
To simplify the expression of the covariance operator $K$ (which does not depend on $Y$), first note that, by (8) and (9),
$$\Sigma^{1/2}\Pi_{\mathcal{R}(\Sigma)} = \Pi_{\mathcal{R}(\Sigma)}\Sigma^{1/2} = \Sigma^{1/2}. \qquad (29)$$
By (27), (19), and (29), in that order,
$$\Pi_{\mathcal{R}(S^{*})}\Pi_{\mathcal{R}(\Sigma)} = US\Pi_{\mathcal{R}(\Sigma)} = UT\Sigma^{1/2}\Pi_{\mathcal{R}(\Sigma)} = UT\Sigma^{1/2} = \Pi_{\mathcal{R}(S^{*})}. \qquad (30)$$
Consequently, since, by (25), (30), and (29), $\Pi_{\mathcal{R}(\Sigma)}\Pi_{\mathcal{N}(S)}\Sigma^{1/2} = \left(\Pi_{\mathcal{R}(\Sigma)} - \Pi_{\mathcal{R}(S^{*})}\right)\Sigma^{1/2} = \Pi_{\mathcal{N}(S)}\Sigma^{1/2}$, $K$ equals
$$\Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Pi_{\mathcal{R}(\Sigma)}\Pi_{\mathcal{N}(S)}\Sigma^{1/2} = \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{1/2}. \qquad (31)$$
Recall that if $Z$ and $W$ are random vectors in $\mathbb{R}^n$ such that $Z$ is an affine function of $W$, i.e., $Z = a + RW$, where $a \in \mathbb{R}^n$ and $R \in \mathcal{L}(\mathbb{R}^n)$, then $E[Z] = a + RE[W]$ and $\mathrm{Cov}(Z) = R\,\mathrm{Cov}(W)R^{*}$; by (28), this implies $E\left[E[Y \mid TY]\right] = \mu$, verifying the iterated expectation formula. Defining the expected value of an operator-valued random element $A$ as the operator $E[A]$ such that $(E[A])s = E[As]$ for every $s \in \mathbb{R}^n$, provided the expected value on the right-hand side is well defined for every $s$, we conclude from (31) that
$$E\left[\mathrm{Cov}(Y \mid TY)\right] = \Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{1/2}.$$
By (28) and Lemma 5,
$$\mathrm{Cov}\left(E[Y \mid TY]\right) = \Sigma^{1/2}\Pi_{\mathcal{R}(S^{*})}\Sigma^{\ominus}\,\Sigma\,\Sigma^{\ominus}\Pi_{\mathcal{R}(S^{*})}\Sigma^{1/2} = \Sigma^{1/2}\Pi_{\mathcal{R}(S^{*})}\Sigma^{1/2},$$
where the second equality follows by (8), (29), and (30). The analysis of variance formula, $E\left[\mathrm{Cov}(Y \mid TY)\right] + \mathrm{Cov}\left(E[Y \mid TY]\right) = \mathrm{Cov}(Y)$, is then verified by (25). //

An immediate corollary of Theorem 4 is the "partial out" formula for population regression in the Normal model.

Corollary 2. If $(X, Y, Z)$ is multivariate Normal, where $Z = (Z_1, \ldots, Z_{n-2}) \in \mathbb{R}^{n-2}$ and $X, Y \in \mathbb{R}$, then
$$E[Y \mid X, Z] - E[Y \mid Z] = \frac{\mathrm{Cov}(X, Y \mid Z)}{\mathrm{Var}(X \mid Z)}\left(X - E[X \mid Z]\right). \qquad (32)$$

Proof of Corollary 2. Write $w = (x, y, z) \in \mathbb{R}^n$ for a generic point, and let $W = (X, Y, Z)$, with mean $\mu \in \mathbb{R}^n$ and covariance $\Sigma \in \mathcal{L}(\mathbb{R}^n)$. Let $T_{13} \in \mathcal{L}(\mathbb{R}^n)$ be defined by $T_{13}w = (x, 0, z)$ and $T_{3} \in \mathcal{L}(\mathbb{R}^n)$ by $T_{3}w = (0, 0, z)$; conditioning on $(X, Z)$ is conditioning on $T_{13}W$, and conditioning on $Z$ is conditioning on $T_{3}W$. By (28),
$$\mathrm{LHS}(32) = \left\langle \Sigma^{1/2}\left(\Pi_{\mathcal{R}(S_{13}^{*})} - \Pi_{\mathcal{R}(S_{3}^{*})}\right)\Sigma^{\ominus}(W - \mu), e_2\right\rangle, \qquad (33)$$
where $S_{13} = T_{13}\Sigma^{1/2}$ and $S_{3} = T_{3}\Sigma^{1/2}$. To show $\mathrm{RHS}(32) = \mathrm{RHS}(33)$, we first observe, using (31) and (28),
$$\mathrm{Cov}(X, Y \mid Z) = \left\langle \Sigma^{1/2}\Pi_{\mathcal{N}(S_{3})}\Sigma^{1/2}e_1, e_2\right\rangle,$$
$$\mathrm{Var}(X \mid Z) = \left\langle \Sigma^{1/2}\Pi_{\mathcal{N}(S_{3})}\Sigma^{1/2}e_1, e_1\right\rangle = \left\|\Pi_{\mathcal{N}(S_{3})}\Sigma^{1/2}e_1\right\|^{2}, \qquad (34)$$
$$X - E[X \mid Z] = \left\langle (W - \mu) - \Sigma^{1/2}\Pi_{\mathcal{R}(S_{3}^{*})}\Sigma^{\ominus}(W - \mu), e_1\right\rangle.$$
Since $T_{13}^{*}e_j = e_j$ if $j \ne 2$, $T_{13}^{*}e_2 = 0$, $T_{3}^{*}e_j = e_j$ if $j \ge 3$, and $T_{3}^{*}e_j = 0$ if $j = 1, 2$,
$$\mathcal{R}(S_{13}^{*}) = \mathrm{span}\{\Sigma^{1/2}e_1, \Sigma^{1/2}e_3, \ldots, \Sigma^{1/2}e_n\} \qquad\text{and}\qquad \mathcal{R}(S_{3}^{*}) = \mathrm{span}\{\Sigma^{1/2}e_3, \ldots, \Sigma^{1/2}e_n\}.$$
If $\Sigma^{1/2}e_1 \in \mathrm{span}\{\Sigma^{1/2}e_3, \ldots, \Sigma^{1/2}e_n\} = \mathcal{R}(S_{3}^{*})$, then $\mathrm{RHS}(33) = 0$, in which case $\mathrm{Var}(X \mid Z)$, a constant function of $Z$, is also equal to $0$, thereby vacuously satisfying (32). If $\Sigma^{1/2}e_1 \notin \mathrm{span}\{\Sigma^{1/2}e_3, \ldots, \Sigma^{1/2}e_n\}$, then, by Lemma A.2 (with $V = \mathcal{R}(S_{3}^{*})$ and $x = \Sigma^{1/2}e_1$, so that $V_x = \mathcal{R}(S_{13}^{*})$ and $\Pi_{V^{\perp}} = \Pi_{\mathcal{N}(S_{3})}$),
$$\left(\Pi_{\mathcal{R}(S_{13}^{*})} - \Pi_{\mathcal{R}(S_{3}^{*})}\right)\Sigma^{\ominus}(W - \mu) = \frac{\left\langle \Sigma^{\ominus}(W - \mu), \Pi_{\mathcal{N}(S_{3})}\Sigma^{1/2}e_1\right\rangle}{\left\|\Pi_{\mathcal{N}(S_{3})}\Sigma^{1/2}e_1\right\|^{2}}\;\Pi_{\mathcal{N}(S_{3})}\Sigma^{1/2}e_1.$$
Therefore, $\mathrm{RHS}(33)$ equals
$$\frac{\left\langle \Sigma^{\ominus}(W - \mu), \Pi_{\mathcal{N}(S_{3})}\Sigma^{1/2}e_1\right\rangle}{\left\|\Pi_{\mathcal{N}(S_{3})}\Sigma^{1/2}e_1\right\|^{2}}\left\langle \Sigma^{1/2}\Pi_{\mathcal{N}(S_{3})}\Sigma^{1/2}e_1, e_2\right\rangle,$$
which, by the first two rows in (34), equals, since $\Pi_{\mathcal{N}(S_{3})}$ is self-adjoint and idempotent,

$$\frac{\mathrm{Cov}(X, Y \mid Z)}{\mathrm{Var}(X \mid Z)}\left\langle \Sigma^{1/2}\Pi_{\mathcal{N}(S_{3})}\Sigma^{\ominus}(W - \mu), e_1\right\rangle.$$
Since $\Sigma^{1/2}\Pi_{\mathcal{N}(S_{3})}\Sigma^{\ominus}(W - \mu) = (W - \mu) - \Sigma^{1/2}\Pi_{\mathcal{R}(S_{3}^{*})}\Sigma^{\ominus}(W - \mu)$ by (25), (8), (13), and (23), the proof follows by the third row in (34). //

Remark 3. Given a random sample $X_1, \ldots, X_n$ from the Normal distribution with mean $\theta$ and (known) variance $\sigma^{2}$, $\bar{X}$ is a sufficient statistic for $\theta$ [Example 6.2.4, Casella and Berger (2002)]. While the typical proof uses the powerful factorization theorem [Theorem 6.2.6, Casella and Berger (2002)], we are going to show that the conditional distribution of the sample $X = \sum_{k=1}^{n}X_k e_k \sim N(\theta\mathbf{1}, \sigma^{2}I)$ given $\bar{X}$, that is, given $\Pi_{\mathcal{J}}X$, does not depend on $\theta$, thereby proving the sufficiency of $\bar{X}$ directly. The said conditional distribution, by Theorem 2, (28), and (31), is multivariate Normal with mean $\theta\mathbf{1} + \Sigma^{1/2}\Pi_{\mathcal{R}(S^{*})}\Sigma^{\ominus}(X - \theta\mathbf{1})$ and covariance $\Sigma^{1/2}\Pi_{\mathcal{N}(S)}\Sigma^{1/2}$, where $\Sigma = \sigma^{2}I$ and $S = \Pi_{\mathcal{J}}\Sigma^{1/2} = \sigma\Pi_{\mathcal{J}}$, implying $\Pi_{\mathcal{R}(S^{*})} = \Pi_{\mathcal{J}}$ and, consequently, that the mean equals $\Pi_{\mathcal{J}}X$ and the covariance equals $\sigma^{2}(I - \Pi_{\mathcal{J}})$, neither of which depends on $\theta$. //

Remark 4. In a companion paper we are currently working on, we use the local linearity of a $C^{1}$ vector field $g$ (on $\mathbb{R}^n$, taking values in $\mathbb{R}^m$) and the decomposition of $Y$ in Theorem 3 to approximate the conditional distribution of $Y$ given $g(Y)$. If $g$ is constant, then the $\sigma$-algebra generated by $g(Y)$ is the trivial $\sigma$-algebra consisting of the empty set and the entire sample space, making $Y$ independent of $g(Y)$, so that the conditional distribution of $Y$ given $g(Y)$ is the unconditional distribution of $Y$. If $g$ is injective, $\sigma(g)$ equals $\mathcal{B}(\mathbb{R}^n)$ by Lemma A.1; consequently, the conditional distribution of $Y$ given $g(Y)$ is the point mass at $Y$. Thus, the interesting problem unfolds when $g$ is a non-constant, non-injective, $C^{1}$ vector field. The conditional distribution of $Y$ given $g(Y)$ will not be multivariate Normal if $g$ is not linear. Let $D : \mathbb{R}^n \to \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ be the continuous function that maps $a \in \mathbb{R}^n$ to the total derivative of $g$ at $a$. Given $\epsilon > 0$, there exists a compact subset $O_{\epsilon}$ of $\mathbb{R}^n$ such that $P(Y \in O_{\epsilon}) > 1 - \epsilon$.
The function $D$, restricted to $O_{\epsilon}$, is uniformly continuous and admits a uniformly continuous modulus of continuity $\omega_{\epsilon}$. We are working to show that, for a suitably chosen metric for weak convergence of probability measures and a $\delta > 0$ that depends on the given $\epsilon$ through the supremum of the modulus of continuity $\omega_{\epsilon}$ on the closed interval $[0, F_{\epsilon}]$, where $F_{\epsilon}$ is the diameter of $O_{\epsilon}$, the conditional distribution of $Y$ given $g(Y)$ is within a $\delta$-neighborhood of the family of multivariate Normal distributions. //

Remark 5. Proposition 3.13 of Eaton (1983) holds at a much greater level of generality; see Bogachev (1998). Extending our approach, in the absence of an inner product, to finding the conditional distribution of a Gaussian random element given a (non-injective, non-trivial) linear transformation appears to be an interesting area of future research. In particular, if $Y$ is the Brownian motion, i.e., the random element in

$C[0, 1]$ distributed according to the Wiener measure, $\nu_1, \ldots, \nu_k$ are measures on $\mathcal{B}([0, 1])$, and $L : C[0, 1] \to \mathbb{R}^{k}$ is given by $L(f) = \left(\int f\,d\nu_1, \ldots, \int f\,d\nu_k\right)$, what is the conditional distribution of $Y$ given $L(Y)$? //

Appendix. We present a lemma on measurability [Lemma A.1] that was used in the proof of Theorem 4 and another lemma on orthogonal projection operators in a Hilbert space [Lemma A.2] that was used in the proof of Corollary 2.

Lemma A.1. Let $\mathcal{U}$ and $\mathcal{V}$ be Polish spaces. If $h : \mathcal{U} \to \mathcal{V}$ is Borel measurable and injective, then $\sigma(h) = \mathcal{B}(\mathcal{U})$.

Proof of Lemma A.1. The inclusion $\sigma(h) \subseteq \mathcal{B}(\mathcal{U})$ follows from the pertinent definitions. For $M \in \mathcal{B}(\mathcal{U})$, define $B = h(M)$. Since $h^{-1}(B) = M$ and $B \in \mathcal{B}(\mathcal{V})$ by Theorem I.3.9 of Parthasarathy (1967), the reverse inclusion follows.

Lemma A.2. For a proper and closed subspace $V$ of a Hilbert space $H$ and $x \in H \setminus V$, let $V_x$ denote the closure of the span of $V \cup \{x\}$. Then, for any $y \in H$,
$$\Pi_{V_x}y - \Pi_{V}y = \frac{\langle y, \Pi_{V^{\perp}}x\rangle}{\|\Pi_{V^{\perp}}x\|^{2}}\,\Pi_{V^{\perp}}x. \qquad (35)$$

Proof of Lemma A.2. By the definition of $V_x$ there exist a sequence $\{v_l : l \ge 1\} \subseteq V$ and a sequence $\{c_l : l \ge 1\} \subseteq \mathbb{R}$ such that
$$\Pi_{V_x}y = \lim_{l \to \infty}\left(v_l + c_l x\right). \qquad (36)$$
Since $V \subseteq V_x$, $\Pi_{V}y = \Pi_{V}\Pi_{V_x}y$; since $\Pi_{V}$ is continuous, by (36),
$$\Pi_{V}y = \lim_{l \to \infty}\left(v_l + c_l \Pi_{V}x\right). \qquad (37)$$
Since $x - \Pi_{V}x = \Pi_{V^{\perp}}x$, subtracting (37) from (36),
$$\Pi_{V_x}y - \Pi_{V}y = \lim_{l \to \infty} c_l\,\Pi_{V^{\perp}}x. \qquad (38)$$
Taking the inner product of both sides of (38) with $\Pi_{V^{\perp}}x$ and using the linearity and continuity of the inner product, we obtain
$$\left\langle \Pi_{V_x}y, \Pi_{V^{\perp}}x\right\rangle - \left\langle \Pi_{V}y, \Pi_{V^{\perp}}x\right\rangle = \left(\lim_{l \to \infty} c_l\right)\left\|\Pi_{V^{\perp}}x\right\|^{2}. \qquad (39)$$
Since $x \in V_x$ and $\Pi_{V}x \in V \subseteq V_x$, we have $\Pi_{V^{\perp}}x \in V_x$, implying $\langle \Pi_{V_x}y, \Pi_{V^{\perp}}x\rangle = \langle y, \Pi_{V^{\perp}}x\rangle$, while $\langle \Pi_{V}y, \Pi_{V^{\perp}}x\rangle = 0$; since $x \notin V$, that is, $\Pi_{V^{\perp}}x \ne 0$, the lemma follows from (39) and (38).
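Lemma A.2 can be verified numerically in $\mathbb{R}^n$. A minimal sketch assuming numpy; the subspace and vectors are arbitrary illustrative choices.

```python
import numpy as np

# Lemma A.2 in R^n: the projection onto span(V and x) minus the projection
# onto V is the rank-one projection built from the component of x
# orthogonal to V.
rng = np.random.default_rng(5)
n = 6
V = rng.standard_normal((n, 3))          # columns spanning the subspace V
x = rng.standard_normal(n)
y = rng.standard_normal(n)

def proj(M):
    # orthogonal projection onto the column space of M
    return M @ np.linalg.pinv(M)

P_V = proj(V)
P_Vx = proj(np.column_stack([V, x]))
x_perp = x - P_V @ x                     # Pi_{V^perp} x

lhs = P_Vx @ y - P_V @ y
rhs = (y @ x_perp) / (x_perp @ x_perp) * x_perp
assert np.allclose(lhs, rhs, atol=1e-8)
```

This is the familiar Gram-Schmidt step: enlarging a subspace by one vector adds exactly the rank-one projection along the new orthogonal direction.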

References

Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd Ed. John Wiley & Sons, New York.

Axler, Sheldon (2015). Linear Algebra Done Right, 3rd Ed. Springer, New York.

Basu, D. (1955). On Statistics Independent of a Complete Sufficient Statistic. Sankhya, Vol. 15, 377-380.

Bogachev, Vladimir I. (1998). Gaussian Measures, 1st Ed. American Mathematical Society, Providence, Rhode Island.

Casella, George and Roger L. Berger (2002). Statistical Inference, 2nd Ed. Brooks/Cole, California.

Dudley, R. M. (1989). Real Analysis and Probability, 1st Ed. Wadsworth and Brooks/Cole, California.

Eaton, Morris L. (1983). Multivariate Statistics: A Vector Space Approach, 1st Ed. John Wiley & Sons, New York.

Flury, Bernard (1997). A First Course in Multivariate Statistics, 1st Ed. Springer, New York.

Muirhead, Robb J. (1982). Aspects of Multivariate Statistical Theory, 1st Ed. John Wiley & Sons, New York.

Parthasarathy, K. R. (1967). Probability Measures on Metric Spaces. Academic Press, New York, NY.

RAJESHWARI MAJUMDAR
rajeshwari.majumdar@uconn.edu
PO Box 47
Coventry, CT 06238

SUMAN MAJUMDAR
suman.majumdar@uconn.edu
1 University Place
Stamford, CT 06901


More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

More information

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013 Multivariate Gaussian Distribution Auxiliary notes for Time Series Analysis SF2943 Spring 203 Timo Koski Department of Mathematics KTH Royal Institute of Technology, Stockholm 2 Chapter Gaussian Vectors.

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Review and problem list for Applied Math I

Review and problem list for Applied Math I Review and problem list for Applied Math I (This is a first version of a serious review sheet; it may contain errors and it certainly omits a number of topic which were covered in the course. Let me know

More information

Normed & Inner Product Vector Spaces

Normed & Inner Product Vector Spaces Normed & Inner Product Vector Spaces ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 27 Normed

More information

Linear Algebra in Computer Vision. Lecture2: Basic Linear Algebra & Probability. Vector. Vector Operations

Linear Algebra in Computer Vision. Lecture2: Basic Linear Algebra & Probability. Vector. Vector Operations Linear Algebra in Computer Vision CSED441:Introduction to Computer Vision (2017F Lecture2: Basic Linear Algebra & Probability Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Mathematics in vector space Linear

More information

Stat 643 Review of Probability Results (Cressie)

Stat 643 Review of Probability Results (Cressie) Stat 643 Review of Probability Results (Cressie) Probability Space: ( HTT,, ) H is the set of outcomes T is a 5-algebra; subsets of H T is a probability measure mapping from T onto [0,] Measurable Space:

More information

VAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where:

VAR Model. (k-variate) VAR(p) model (in the Reduced Form): Y t-2. Y t-1 = A + B 1. Y t + B 2. Y t-p. + ε t. + + B p. where: VAR Model (k-variate VAR(p model (in the Reduced Form: where: Y t = A + B 1 Y t-1 + B 2 Y t-2 + + B p Y t-p + ε t Y t = (y 1t, y 2t,, y kt : a (k x 1 vector of time series variables A: a (k x 1 vector

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

Spectral theory for compact operators on Banach spaces

Spectral theory for compact operators on Banach spaces 68 Chapter 9 Spectral theory for compact operators on Banach spaces Recall that a subset S of a metric space X is precompact if its closure is compact, or equivalently every sequence contains a Cauchy

More information

Math Linear Algebra II. 1. Inner Products and Norms

Math Linear Algebra II. 1. Inner Products and Norms Math 342 - Linear Algebra II Notes 1. Inner Products and Norms One knows from a basic introduction to vectors in R n Math 254 at OSU) that the length of a vector x = x 1 x 2... x n ) T R n, denoted x,

More information

Contents. Preface for the Instructor. Preface for the Student. xvii. Acknowledgments. 1 Vector Spaces 1 1.A R n and C n 2

Contents. Preface for the Instructor. Preface for the Student. xvii. Acknowledgments. 1 Vector Spaces 1 1.A R n and C n 2 Contents Preface for the Instructor xi Preface for the Student xv Acknowledgments xvii 1 Vector Spaces 1 1.A R n and C n 2 Complex Numbers 2 Lists 5 F n 6 Digression on Fields 10 Exercises 1.A 11 1.B Definition

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM

GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM STEVEN P. LALLEY 1. GAUSSIAN PROCESSES: DEFINITIONS AND EXAMPLES Definition 1.1. A standard (one-dimensional) Wiener process (also called Brownian motion)

More information

Boolean Inner-Product Spaces and Boolean Matrices

Boolean Inner-Product Spaces and Boolean Matrices Boolean Inner-Product Spaces and Boolean Matrices Stan Gudder Department of Mathematics, University of Denver, Denver CO 80208 Frédéric Latrémolière Department of Mathematics, University of Denver, Denver

More information

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t )

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t ) LECURE NOES 21 Lecture 4 7. Sufficient statistics Consider the usual statistical setup: the data is X and the paramter is. o gain information about the parameter we study various functions of the data

More information

CAAM 336 DIFFERENTIAL EQUATIONS IN SCI AND ENG Examination 1

CAAM 336 DIFFERENTIAL EQUATIONS IN SCI AND ENG Examination 1 CAAM 6 DIFFERENTIAL EQUATIONS IN SCI AND ENG Examination Instructions: Time limit: uninterrupted hours There are four questions worth a total of 5 points Please do not look at the questions until you begin

More information

Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence)

Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence) Math 413/513 Chapter 6 (from Friedberg, Insel, & Spence) David Glickenstein December 7, 2015 1 Inner product spaces In this chapter, we will only consider the elds R and C. De nition 1 Let V be a vector

More information

ECE 275A Homework #3 Solutions

ECE 275A Homework #3 Solutions ECE 75A Homework #3 Solutions. Proof of (a). Obviously Ax = 0 y, Ax = 0 for all y. To show sufficiency, note that if y, Ax = 0 for all y, then it must certainly be true for the particular value of y =

More information

Gaussian random variables inr n

Gaussian random variables inr n Gaussian vectors Lecture 5 Gaussian random variables inr n One-dimensional case One-dimensional Gaussian density with mean and standard deviation (called N, ): fx x exp. Proposition If X N,, then ax b

More information

Recall the convention that, for us, all vectors are column vectors.

Recall the convention that, for us, all vectors are column vectors. Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists

More information

Karhunen-Loève decomposition of Gaussian measures on Banach spaces

Karhunen-Loève decomposition of Gaussian measures on Banach spaces Karhunen-Loève decomposition of Gaussian measures on Banach spaces Jean-Charles Croix jean-charles.croix@emse.fr Génie Mathématique et Industriel (GMI) First workshop on Gaussian processes at Saint-Etienne

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA

ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA Kent State University Department of Mathematical Sciences Compiled and Maintained by Donald L. White Version: August 29, 2017 CONTENTS LINEAR ALGEBRA AND

More information

The Dirichlet-to-Neumann operator

The Dirichlet-to-Neumann operator Lecture 8 The Dirichlet-to-Neumann operator The Dirichlet-to-Neumann operator plays an important role in the theory of inverse problems. In fact, from measurements of electrical currents at the surface

More information

DS-GA 1002 Lecture notes 10 November 23, Linear models

DS-GA 1002 Lecture notes 10 November 23, Linear models DS-GA 2 Lecture notes November 23, 2 Linear functions Linear models A linear model encodes the assumption that two quantities are linearly related. Mathematically, this is characterized using linear functions.

More information

COMMON COMPLEMENTS OF TWO SUBSPACES OF A HILBERT SPACE

COMMON COMPLEMENTS OF TWO SUBSPACES OF A HILBERT SPACE COMMON COMPLEMENTS OF TWO SUBSPACES OF A HILBERT SPACE MICHAEL LAUZON AND SERGEI TREIL Abstract. In this paper we find a necessary and sufficient condition for two closed subspaces, X and Y, of a Hilbert

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

REAL ANALYSIS II HOMEWORK 3. Conway, Page 49

REAL ANALYSIS II HOMEWORK 3. Conway, Page 49 REAL ANALYSIS II HOMEWORK 3 CİHAN BAHRAN Conway, Page 49 3. Let K and k be as in Proposition 4.7 and suppose that k(x, y) k(y, x). Show that K is self-adjoint and if {µ n } are the eigenvalues of K, each

More information

Review of some mathematical tools

Review of some mathematical tools MATHEMATICAL FOUNDATIONS OF SIGNAL PROCESSING Fall 2016 Benjamín Béjar Haro, Mihailo Kolundžija, Reza Parhizkar, Adam Scholefield Teaching assistants: Golnoosh Elhami, Hanjie Pan Review of some mathematical

More information

1 Invariant subspaces

1 Invariant subspaces MATH 2040 Linear Algebra II Lecture Notes by Martin Li Lecture 8 Eigenvalues, eigenvectors and invariant subspaces 1 In previous lectures we have studied linear maps T : V W from a vector space V to another

More information

1 Last time: least-squares problems

1 Last time: least-squares problems MATH Linear algebra (Fall 07) Lecture Last time: least-squares problems Definition. If A is an m n matrix and b R m, then a least-squares solution to the linear system Ax = b is a vector x R n such that

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Maximum Likelihood Drift Estimation for Gaussian Process with Stationary Increments

Maximum Likelihood Drift Estimation for Gaussian Process with Stationary Increments Austrian Journal of Statistics April 27, Volume 46, 67 78. AJS http://www.ajs.or.at/ doi:.773/ajs.v46i3-4.672 Maximum Likelihood Drift Estimation for Gaussian Process with Stationary Increments Yuliya

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Five Mini-Courses on Analysis

Five Mini-Courses on Analysis Christopher Heil Five Mini-Courses on Analysis Metrics, Norms, Inner Products, and Topology Lebesgue Measure and Integral Operator Theory and Functional Analysis Borel and Radon Measures Topological Vector

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2 1 A Good Spectral Theorem c1996, Paul Garrett, garrett@math.umn.edu version February 12, 1996 1 Measurable Hilbert bundles Measurable Banach bundles Direct integrals of Hilbert spaces Trivializing Hilbert

More information

Linear Algebra 2 Spectral Notes

Linear Algebra 2 Spectral Notes Linear Algebra 2 Spectral Notes In what follows, V is an inner product vector space over F, where F = R or C. We will use results seen so far; in particular that every linear operator T L(V ) has a complex

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

More information

Math Solutions to homework 5

Math Solutions to homework 5 Math 75 - Solutions to homework 5 Cédric De Groote November 9, 207 Problem (7. in the book): Let {e n } be a complete orthonormal sequence in a Hilbert space H and let λ n C for n N. Show that there is

More information

VARIETIES OF ABELIAN TOPOLOGICAL GROUPS AND SCATTERED SPACES

VARIETIES OF ABELIAN TOPOLOGICAL GROUPS AND SCATTERED SPACES Bull. Austral. Math. Soc. 78 (2008), 487 495 doi:10.1017/s0004972708000877 VARIETIES OF ABELIAN TOPOLOGICAL GROUPS AND SCATTERED SPACES CAROLYN E. MCPHAIL and SIDNEY A. MORRIS (Received 3 March 2008) Abstract

More information

Linear algebra 2. Yoav Zemel. March 1, 2012

Linear algebra 2. Yoav Zemel. March 1, 2012 Linear algebra 2 Yoav Zemel March 1, 2012 These notes were written by Yoav Zemel. The lecturer, Shmuel Berger, should not be held responsible for any mistake. Any comments are welcome at zamsh7@gmail.com.

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 26. Estimation: Regression and Least Squares

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 26. Estimation: Regression and Least Squares CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 26 Estimation: Regression and Least Squares This note explains how to use observations to estimate unobserved random variables.

More information

Measurable functions are approximately nice, even if look terrible.

Measurable functions are approximately nice, even if look terrible. Tel Aviv University, 2015 Functions of real variables 74 7 Approximation 7a A terrible integrable function........... 74 7b Approximation of sets................ 76 7c Approximation of functions............

More information

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October

Finding normalized and modularity cuts by spectral clustering. Ljubjana 2010, October Finding normalized and modularity cuts by spectral clustering Marianna Bolla Institute of Mathematics Budapest University of Technology and Economics marib@math.bme.hu Ljubjana 2010, October Outline Find

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

SPECTRAL THEOREM FOR SYMMETRIC OPERATORS WITH COMPACT RESOLVENT

SPECTRAL THEOREM FOR SYMMETRIC OPERATORS WITH COMPACT RESOLVENT SPECTRAL THEOREM FOR SYMMETRIC OPERATORS WITH COMPACT RESOLVENT Abstract. These are the letcure notes prepared for the workshop on Functional Analysis and Operator Algebras to be held at NIT-Karnataka,

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations Math 108A: August 21, 2008 John Douglas Moore Our goal in these notes is to explain a few facts regarding linear systems of equations not included in the first few chapters

More information

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality. 88 Chapter 5 Distribution Theory In this chapter, we summarize the distributions related to the normal distribution that occur in linear models. Before turning to this general problem that assumes normal

More information

Assignment 1 Math 5341 Linear Algebra Review. Give complete answers to each of the following questions. Show all of your work.

Assignment 1 Math 5341 Linear Algebra Review. Give complete answers to each of the following questions. Show all of your work. Assignment 1 Math 5341 Linear Algebra Review Give complete answers to each of the following questions Show all of your work Note: You might struggle with some of these questions, either because it has

More information

Two remarks on normality preserving Borel automorphisms of R n

Two remarks on normality preserving Borel automorphisms of R n Proc. Indian Acad. Sci. (Math. Sci.) Vol. 3, No., February 3, pp. 75 84. c Indian Academy of Sciences Two remarks on normality preserving Borel automorphisms of R n K R PARTHASARATHY Theoretical Statistics

More information

Spectral Theory, with an Introduction to Operator Means. William L. Green

Spectral Theory, with an Introduction to Operator Means. William L. Green Spectral Theory, with an Introduction to Operator Means William L. Green January 30, 2008 Contents Introduction............................... 1 Hilbert Space.............................. 4 Linear Maps

More information

The Essential Equivalence of Pairwise and Mutual Conditional Independence

The Essential Equivalence of Pairwise and Mutual Conditional Independence The Essential Equivalence of Pairwise and Mutual Conditional Independence Peter J. Hammond and Yeneng Sun Probability Theory and Related Fields, forthcoming Abstract For a large collection of random variables,

More information

LINEAR ALGEBRA SUMMARY SHEET.

LINEAR ALGEBRA SUMMARY SHEET. LINEAR ALGEBRA SUMMARY SHEET RADON ROSBOROUGH https://intuitiveexplanationscom/linear-algebra-summary-sheet/ This document is a concise collection of many of the important theorems of linear algebra, organized

More information

CLOSURE OF INVERTIBLE OPERATORS ON A HILBERT SPACE

CLOSURE OF INVERTIBLE OPERATORS ON A HILBERT SPACE proceedings of the american mathematical society Volume 108, Number 3, March 1990 CLOSURE OF INVERTIBLE OPERATORS ON A HILBERT SPACE RICHARD BOULDIN (Communicated by Palle E. T. Jorgensen) Abstract. Although

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

MET Workshop: Exercises

MET Workshop: Exercises MET Workshop: Exercises Alex Blumenthal and Anthony Quas May 7, 206 Notation. R d is endowed with the standard inner product (, ) and Euclidean norm. M d d (R) denotes the space of n n real matrices. When

More information

Throughout these notes we assume V, W are finite dimensional inner product spaces over C.

Throughout these notes we assume V, W are finite dimensional inner product spaces over C. Math 342 - Linear Algebra II Notes Throughout these notes we assume V, W are finite dimensional inner product spaces over C 1 Upper Triangular Representation Proposition: Let T L(V ) There exists an orthonormal

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Linear Algebra Review

Linear Algebra Review Linear Algebra Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Linear Algebra Review 1 / 45 Definition of Matrix Rectangular array of elements arranged in rows and

More information

Karhunen-Loève decomposition of Gaussian measures on Banach spaces

Karhunen-Loève decomposition of Gaussian measures on Banach spaces Karhunen-Loève decomposition of Gaussian measures on Banach spaces Jean-Charles Croix GT APSSE - April 2017, the 13th joint work with Xavier Bay. 1 / 29 Sommaire 1 Preliminaries on Gaussian processes 2

More information

CHAPTER II HILBERT SPACES

CHAPTER II HILBERT SPACES CHAPTER II HILBERT SPACES 2.1 Geometry of Hilbert Spaces Definition 2.1.1. Let X be a complex linear space. An inner product on X is a function, : X X C which satisfies the following axioms : 1. y, x =

More information

First we introduce the sets that are going to serve as the generalizations of the scalars.

First we introduce the sets that are going to serve as the generalizations of the scalars. Contents 1 Fields...................................... 2 2 Vector spaces.................................. 4 3 Matrices..................................... 7 4 Linear systems and matrices..........................

More information

REPRESENTATION THEORY OF S n

REPRESENTATION THEORY OF S n REPRESENTATION THEORY OF S n EVAN JENKINS Abstract. These are notes from three lectures given in MATH 26700, Introduction to Representation Theory of Finite Groups, at the University of Chicago in November

More information

Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra

Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra D. R. Wilkins Contents 3 Topics in Commutative Algebra 2 3.1 Rings and Fields......................... 2 3.2 Ideals...............................

More information

µ X (A) = P ( X 1 (A) )

µ X (A) = P ( X 1 (A) ) 1 STOCHASTIC PROCESSES This appendix provides a very basic introduction to the language of probability theory and stochastic processes. We assume the reader is familiar with the general measure and integration

More information

MATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP)

MATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP) MATH 20F: LINEAR ALGEBRA LECTURE B00 (T KEMP) Definition 01 If T (x) = Ax is a linear transformation from R n to R m then Nul (T ) = {x R n : T (x) = 0} = Nul (A) Ran (T ) = {Ax R m : x R n } = {b R m

More information

Ð"Ñ + Ð"Ñ, Ð"Ñ +, +, + +, +,,

ÐÑ + ÐÑ, ÐÑ +, +, + +, +,, Handout #11 Confounding: Complete factorial experiments in incomplete blocks Blocking is one of the important principles in experimental design. In this handout we address the issue of designing complete

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Gaussian Hilbert spaces

Gaussian Hilbert spaces Gaussian Hilbert spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto July 11, 015 1 Gaussian measures Let γ be a Borel probability measure on. For a, if γ = δ a then

More information

Brownian Motion and Conditional Probability

Brownian Motion and Conditional Probability Math 561: Theory of Probability (Spring 2018) Week 10 Brownian Motion and Conditional Probability 10.1 Standard Brownian Motion (SBM) Brownian motion is a stochastic process with both practical and theoretical

More information

NONCOMMUTATIVE POLYNOMIAL EQUATIONS. Edward S. Letzter. Introduction

NONCOMMUTATIVE POLYNOMIAL EQUATIONS. Edward S. Letzter. Introduction NONCOMMUTATIVE POLYNOMIAL EQUATIONS Edward S Letzter Introduction My aim in these notes is twofold: First, to briefly review some linear algebra Second, to provide you with some new tools and techniques

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

MATH 1120 (LINEAR ALGEBRA 1), FINAL EXAM FALL 2011 SOLUTIONS TO PRACTICE VERSION

MATH 1120 (LINEAR ALGEBRA 1), FINAL EXAM FALL 2011 SOLUTIONS TO PRACTICE VERSION MATH (LINEAR ALGEBRA ) FINAL EXAM FALL SOLUTIONS TO PRACTICE VERSION Problem (a) For each matrix below (i) find a basis for its column space (ii) find a basis for its row space (iii) determine whether

More information

Chapter 2 Linear Transformations

Chapter 2 Linear Transformations Chapter 2 Linear Transformations Linear Transformations Loosely speaking, a linear transformation is a function from one vector space to another that preserves the vector space operations. Let us be more

More information

CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION

CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION 59 CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION 4. INTRODUCTION Weighted average-based fusion algorithms are one of the widely used fusion methods for multi-sensor data integration. These methods

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Analysis Preliminary Exam Workshop: Hilbert Spaces

Analysis Preliminary Exam Workshop: Hilbert Spaces Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H

More information

von Neumann algebras, II 1 factors, and their subfactors V.S. Sunder (IMSc, Chennai)

von Neumann algebras, II 1 factors, and their subfactors V.S. Sunder (IMSc, Chennai) von Neumann algebras, II 1 factors, and their subfactors V.S. Sunder (IMSc, Chennai) Lecture 3 at IIT Mumbai, April 24th, 2007 Finite-dimensional C -algebras: Recall: Definition: A linear functional tr

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

LECTURE 25-26: CARTAN S THEOREM OF MAXIMAL TORI. 1. Maximal Tori

LECTURE 25-26: CARTAN S THEOREM OF MAXIMAL TORI. 1. Maximal Tori LECTURE 25-26: CARTAN S THEOREM OF MAXIMAL TORI 1. Maximal Tori By a torus we mean a compact connected abelian Lie group, so a torus is a Lie group that is isomorphic to T n = R n /Z n. Definition 1.1.

More information

On the simplest expression of the perturbed Moore Penrose metric generalized inverse

On the simplest expression of the perturbed Moore Penrose metric generalized inverse Annals of the University of Bucharest (mathematical series) 4 (LXII) (2013), 433 446 On the simplest expression of the perturbed Moore Penrose metric generalized inverse Jianbing Cao and Yifeng Xue Communicated

More information