A Note on Hilbertian Elliptically Contoured Distributions

Yehua Li
Department of Statistics, University of Georgia, Athens, GA 30602, USA
Email address: yehuali@uga.edu

Abstract. In this paper, we discuss elliptically contoured distributions for random variables defined on a separable Hilbert space. They generalize the multivariate elliptically contoured distributions to infinite-dimensional spaces. We discuss some theoretical properties of the Hilbertian elliptically contoured distribution and investigate examples on functional data to illustrate the applications of such distributions.

Keywords: Elliptically contoured distribution, functional data, Hilbertian random variable.

1 Introduction

The elliptically contoured distributions form an important class of distributions in multivariate analysis, with some very nice symmetry properties. They are widely used in statistical practice, for example in dimension reduction (Li, 1991; Cook and Weisberg, 1991; Schott, 1994) and regression graphics (Cook, 1998). The most important member of this class is of course the multivariate Gaussian distribution. Properties of multivariate elliptically contoured distributions are well studied; see for example Cambanis, Huang, and Simons (1981) and Eaton (1986).

Recent developments in statistics have led us to look beyond random vectors in Euclidean space: statistical models for random variables defined on infinite-dimensional Hilbert spaces are in demand. One important example is functional data analysis (Ramsay and Silverman, 2005), where the data are vectors in a function space, e.g. the $L^2$ space. Among Hilbertian distributions, the Gaussian distribution is still the best understood. For example, in functional data analysis, the random functions are
usually modeled as Gaussian processes. The class of elliptically contoured distributions is an important generalization of the Gaussian. It has important applications in dimension reduction (see the recent literature on functional sliced inverse regression: Ferré and Yao, 2003; Li and Hsing, 2007), yet its theoretical properties are not well studied. The goal of this paper is to fill this gap.

The rest of the paper is organized as follows. We introduce some background and definitions regarding linear operators and Hilbertian random variables in Section 2. A random representation for Hilbertian elliptically contoured random variables is introduced in Section 3. Some theoretical properties of the Hilbertian elliptically contoured distribution are discussed in Section 4, including marginal and conditional distributions of a random variable $X$ when it is mapped into different Hilbert spaces. Finally, we give some examples in Section 5 to illustrate the applications of the theory derived in the previous sections, especially its application in functional data analysis.

2 Definitions and Background

2.1 Linear operators

We first introduce some notation and background for linear operators on Hilbert spaces; more theory on linear operators can be found in Dunford and Schwartz (1988). We restrict our discussion to separable Hilbert spaces. A separable Hilbert space $H$ is a Hilbert space with a countable basis $\{e_1, e_2, \dots\}$.

For two Hilbert spaces $H$ and $H'$, a linear operator $T: H \to H'$ is a linear map from $H$ to $H'$, i.e.
$$T(ax) = a(Tx), \quad T(x + y) = Tx + Ty,$$
for any $x, y \in H$ and any scalar value $a$. $T$ is bounded if
$$\|Tx\|_{H'} \le M \|x\|_H, \quad \forall x \in H,$$
for some non-negative real number $M$. Denote the class of bounded linear operators from $H$ to $H'$ by $B(H, H')$; when $H' = H$, this is simplified to $B(H)$.
The adjoint of an operator $T \in B(H, H')$, denoted $T^*$, is an operator mapping $H'$ to $H$, with
$$\langle y, Tx \rangle_{H'} = \langle T^* y, x \rangle_H, \quad \forall x \in H, \; y \in H'.$$
When $H' = H$, $T$ is called self-adjoint if $T^* = T$.

2.2 Hilbertian random variables

Let $H$ be a separable Hilbert space with inner product $\langle \cdot, \cdot \rangle_H$, and let $(\Omega, \mathcal{F}, P)$ be a probability space; then a Hilbertian random variable is a map $X: (\Omega, \mathcal{F}, P) \to H$. Since finite-dimensional Hilbert spaces are isomorphic to Euclidean spaces, where the theory of multivariate analysis applies, we are generically interested in random variables on an infinite-dimensional Hilbert space. In functional data analysis, $H$ could be the $L^2$ function space, a Sobolev space, etc.

The mean of $X$, if it exists, is defined as
$$\mu_X = EX = \int X(\omega) \, dP(\omega),$$
which is the element of $H$ satisfying $\langle b, EX \rangle = E\langle b, X \rangle$ for all $b \in H$. The variance of $X$ is an operator on $H$, defined as
$$V_X(g) = E\{(X - EX) \otimes (X - EX)\}(g) = E\{\langle X - EX, g \rangle (X - EX)\}, \quad g \in H.$$
It is easy to show that $V_X$ is a self-adjoint, non-negative definite operator. The characteristic function of a Hilbertian random variable is
$$\phi_X(f) = E\{\exp(i \langle f, X \rangle)\}, \tag{1}$$
for all $f \in H$.

For a separable Hilbert space, there is a countable basis $\{e_1, e_2, \dots\}$. Define $x_j = \langle e_j, X \rangle$; these are univariate random variables. Then $X$ has the coordinates $(x_1, x_2, \dots)$, which form an $\ell^2$ random variable. Denote the space of $H$-valued random variables by $\mathcal{X}$; then $\mathcal{X}$ is isomorphic to the space of $\ell^2$ random variables.

An operator $T$ is nuclear if its trace is finite and independent of the choice of basis. The trace of an operator is defined as
$$\mathrm{tr}(T) = \sum_{i=1}^\infty \langle e_i, T e_i \rangle,$$
for any complete orthonormal basis of $H$.
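The characteristic function (1) is easy to check by Monte Carlo in a finite truncation. The sketch below is an illustration under assumed numerical choices (a diagonal covariance with arbitrary eigenvalues and an arbitrary direction $f$): for a Gaussian element, whose characteristic function is $\exp(-\langle f, \Gamma f \rangle / 2)$, the empirical value of $E\{\exp(i\langle f, X \rangle)\}$ should match that closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative finite truncation: work in the eigenbasis of Gamma, so that
# Gamma is diagonal with eigenvalues lam and <f, Gamma f> = sum_j lam_j f_j^2.
lam = np.array([1.0, 0.5, 0.25])
n = 200_000

# Coordinates of a Gaussian(0, Gamma) element: independent N(0, lam_j) scores.
X = rng.standard_normal((n, 3)) * np.sqrt(lam)

f = np.array([0.3, -0.7, 1.1])
emp = np.mean(np.exp(1j * X @ f))                 # empirical E exp(i<f, X>)
theory = np.exp(-0.5 * np.sum(lam * f ** 2))      # exp(-<f, Gamma f>/2)
print(round(abs(emp - theory), 3))                 # near 0
```

The same Monte Carlo comparison works for any candidate $\phi_0$, which makes it a convenient sanity check when simulating non-Gaussian elliptically contoured elements later.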
4 for any complete orthonormal basis of H. The covariance operator of X considered in this paper is a self-adjoint, nonnegative definite, nuclear operator, denoted as Γ. Γ has a spectrum decomposition Γ = j λ j ψ j ψ j, (2) where λ 1 λ 2,, 0 are eigenvalues of Γ, ψ j are corresponding eigenvectors. If {ψ j } is incomplete, we can always make them into a complete basis for H by including basis of the null space of Γ. ψ j s are the principal components of the random vector X. Definition 1 A Hilbertian random variable X has an elliptically contoured distribution if the characteristic function of X µ has the form φ X µ (f) = φ 0 ( f, Γf ) for a univariate function φ 0, where Γ is a self-adjoint, non-negative definite, nuclear operator on H. Denote the distribution of X as HEC H (µ, Γ, φ). One important example of elliptically contoured distribution is the Gaussian distribution, whose characteristic function has the form φ X µ (f) = exp( f, Γf /2). 3 Random representation for Hilbertian valued elliptically contoured random variables For a fixed self-adjoint, non-negative definite, nuclear operator Γ, we can define a metric in H, d Γ (x, y) = x y Γ = x y, Γ(x y) 1/2 H. Lemma 2 Suppose φ X µ (f) = φ 0 ( f, Γf ) is the characteristic function of an elliptically contoured distribution, then φ 0 ( ) is a non-negative definite function on H with respect to the metric d Γ (, ), i.e. n a i a j φ 0 {d 2 Γ(f i, f j )} 0, j=1 for any finite collections of {f i ; i = 1,, n} H and for any real values {a i ; i = 1,, n}. 4
Lemma 2 is a straightforward application of the Sazonov theorem (Vakhania, Tarieladze and Chobanyan, 1987), which generalizes Bochner's theorem to infinite-dimensional Hilbert spaces. By the definition of the characteristic function, one can easily see that $\phi_{X-\mu}(-f) = \overline{\phi_{X-\mu}(f)}$, which leads to $\phi_0(\|f\|_\Gamma^2) = \overline{\phi_0(\|f\|_\Gamma^2)}$ since $\|-f\|_\Gamma = \|f\|_\Gamma$. This means $\phi_0(\cdot)$ must be real valued.

Theorem 3. $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$ if and only if
$$X \stackrel{d}{=} \mu + RU, \tag{3}$$
where $U \sim \mathrm{Gaussian}(0, \Gamma)$ and $R$ is a non-negative univariate random variable independent of $U$.

Proof: We first prove the "if" part. Suppose (3) is true, and let $F$ be the distribution function of $R$. Then
$$\phi_{X-\mu}(f) = E\exp(iR\langle f, U \rangle) = \int_{[0,\infty)} \exp(-r^2 \langle f, \Gamma f \rangle / 2) \, dF(r). \tag{4}$$
By Definition 1, $X$ is a Hilbertian elliptically contoured random variable.

Conversely, suppose $X$ is an elliptically contoured Hilbertian random variable with characteristic function $\phi_{X-\mu}(f) = \phi_0(\|f\|_\Gamma^2)$. By Lemma 2, $g(t) = \phi_0(t^2)$ is a positive definite function. By Theorem 2 in Schoenberg (1938),
$$g(t) = \int_0^\infty \exp(-t^2 u^2) \, d\alpha(u),$$
for some bounded non-decreasing function $\alpha(u)$. By the definition of the characteristic function, we have $1 = \phi_0(0) = \int_0^\infty d\alpha(u)$. Therefore, $\alpha(\cdot)$ is the cumulative distribution function of a non-negative random variable. We now change variables: let $t = \|f\|_\Gamma$ and define a random variable $R$ such that $2^{-1/2} R$ has distribution function $\alpha(\cdot)$. Let $F$ be the distribution function of $R$; then $F(r) = \alpha(2^{-1/2} r)$. We have
$$\phi_{X-\mu}(f) = \phi_0(\langle f, \Gamma f \rangle) = \int_0^\infty \exp(-r^2 \langle f, \Gamma f \rangle / 2) \, dF(r).$$
Therefore, $X$ has the stochastic representation (3).
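Representation (3) gives a direct recipe for simulating an elliptically contoured element in a finite truncation: draw a Gaussian element $U$ through its eigen-coordinates, draw an independent radius $R$, and form $X = \mu + RU$. The sketch below is a minimal illustration under assumed choices (a diagonal $\Gamma$ with arbitrary eigenvalues, and $R^2 \sim \chi^2_3/3$ so that $E(R^2) = 1$); the sample covariance of $X$ then recovers $E(R^2)\Gamma = \Gamma$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical finite-dimensional truncation: Gamma is diagonal in the chosen
# basis, with eigenvalues lam.
lam = np.array([1.0, 0.5, 0.25, 0.125])
d, n = lam.size, 200_000

# U ~ Gaussian(0, Gamma): independent N(0, lam_j) coordinate scores.
U = rng.standard_normal((n, d)) * np.sqrt(lam)

# R >= 0 independent of U; here R^2 ~ chi^2_3 / 3, so E(R^2) = 1 and
# X = mu + R U is a (truncated) elliptically contoured variable.
R = np.sqrt(rng.chisquare(3, size=n) / 3.0)
mu = np.zeros(d)
X = mu + R[:, None] * U

# Representation (3) implies Cov(X) = E(R^2) * Gamma, which equals Gamma here.
emp_cov = np.cov(X, rowvar=False)
print(np.round(np.diag(emp_cov), 2))   # close to lam
```

Replacing the law of $R$ changes $\phi_0$ but not the Gaussian factor $U$, which is exactly the scale-mixture structure that the proof of Theorem 3 establishes.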
4 Properties of elliptically contoured distributions

We first discuss moment properties of elliptically contoured distributions. Suppose the first two moments of $X$ exist. By (3), $EX = \mu$ and $V(X) = E(R^2)\Gamma$. On the other hand, if we start from the characteristic function, assuming that $\phi_0$ is twice differentiable,
$$V(X) = -\phi^{(2)}_{X-\mu}(0) = -\left[ 2\phi_0'(\langle f, \Gamma f \rangle)\Gamma + 4\phi_0^{(2)}(\langle f, \Gamma f \rangle)(\Gamma f) \otimes (\Gamma f) \right]_{f=0} = -2\phi_0'(0)\Gamma.$$
To make $\Gamma$ identifiable, we can let $E(R^2) = -2\phi_0'(0) = 1$; then $\Gamma = V(X)$ is the covariance operator of $X$.

Theorem 4. Let $H$, $H_1$ and $H_2$ be separable Hilbert spaces, suppose $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$, and let $P_1 \in B(H, H_1)$, $P_2 \in B(H, H_2)$ be two bounded operators. Define $X_i = P_i X$, $\mu_i = P_i \mu$, $\Gamma_{ij} = P_i \Gamma P_j^*$, for $i, j = 1, 2$. Suppose $\Gamma_{12} = 0$. Then $X_1 \sim \mathrm{HEC}_{H_1}(\mu_1, \Gamma_{11}, \phi_0)$. If $\Gamma_{22}$ is a finite-dimensional operator, then
$$X_1 \mid X_2 \sim \mathrm{HEC}_{H_1}(\mu_1, \Gamma_{11}, \phi_{T(X_2)}), \tag{5}$$
where $\phi_{T(X_2)}(t^2)$ is a non-negative definite function depending on $T(X_2) = \langle X_2 - \mu_2, \Gamma_{22}^-(X_2 - \mu_2) \rangle_{H_2}^{1/2}$, and $\Gamma_{22}^-$ is a generalized inverse of $\Gamma_{22}$. If $\Gamma_{22}$ is an infinite-dimensional operator, then
$$X_1 \mid X_2 \sim \mathrm{Gaussian}_{H_1}(\mu_1, r^2(X_2)\Gamma_{11}), \tag{6}$$
where $r(\cdot)$ is a deterministic function given in (7).

Proof: By (3), $P_i X = P_i \mu + R U_i$, where $U_i = P_i U \sim \mathrm{Gaussian}(0, \Gamma_{ii})$, for $i = 1, 2$. Therefore, $X_1 \sim \mathrm{HEC}_{H_1}(\mu_1, \Gamma_{11}, \phi_0)$ by Theorem 3. Since $\mathrm{Cov}(U_1, U_2) = P_1 \Gamma P_2^* = 0$, by the properties of Gaussian variables, $U_1$ is independent of $U_2$. Thus $X_1$ depends on $X_2$ only through the information on $R$ provided by $X_2$. Suppose $\Gamma_{22}$ is finite dimensional; then $X_2 \mid R \sim \mathrm{Gaussian}_{H_2}(\mu_2, R^2 \Gamma_{22})$.
Notice that $\Gamma_{22}$ is a finite-dimensional operator (a matrix) with a generalized inverse $\Gamma_{22}^-$. From the theory of the finite-dimensional Gaussian, $T(X_2)$ is a sufficient statistic for $R$, i.e. $X_2 \mid T(X_2)$ is independent of $R$. Therefore, (5) is obtained by Theorem 3:
$$X_1 \mid X_2 \stackrel{d}{=} P_1\mu + U_1 \{R \mid T(X_2)\}.$$

On the other hand, if $\Gamma_{22}$ is infinite dimensional, we claim that $X_2$ provides all the information about $R$. It is easy to see that $\Gamma_{22}$ is self-adjoint, non-negative definite and nuclear; therefore it has a spectral decomposition
$$\Gamma_{22} = \sum_{j=1}^\infty \lambda_j \psi_j \otimes \psi_j,$$
where the $\lambda_j$'s are the positive eigenvalues of $\Gamma_{22}$. Define
$$r_n(X_2) = \Big\{ \frac{1}{n} \sum_{j=1}^n \lambda_j^{-1} \langle X_2 - \mu_2, \psi_j \rangle_{H_2}^2 \Big\}^{1/2}.$$
Notice that $X_2 - \mu_2 = RU_2$, and therefore $\lambda_j^{-1/2} \langle X_2 - \mu_2, \psi_j \rangle_{H_2} \stackrel{d}{=} R U_{2j}$, where the $U_{2j}$ are i.i.d. $\mathrm{Normal}(0, 1)$ independent of $R$. By the Law of Large Numbers,
$$r(X_2) = \lim_{n \to \infty} r_n(X_2) = R \tag{7}$$
with probability 1. Therefore,
$$X_1 \mid X_2 \stackrel{d}{=} \mu_1 + r(X_2) U_1,$$
which is the Gaussian distribution in (6).

Theorem 4 gives the conditional distribution of $X_1$ given $X_2$ when they are uncorrelated, i.e. $\Gamma_{12} = \mathrm{Cov}(P_1 X, P_2 X) = 0$. The following corollary gives the conditional distribution in the more general case.

Corollary 5. Let $H$, $H_1$ and $H_2$ be separable Hilbert spaces, suppose $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$, and let $P_1 \in B(H, H_1)$, $P_2 \in B(H, H_2)$; define $X_i = P_i X$, $\mu_i = P_i \mu$, $\Gamma_{ij} = P_i \Gamma P_j^*$, for $i, j = 1, 2$. Define $\mu_1^* = \mu_1 + \Gamma_{12} \Gamma_{22}^- (X_2 - \mu_2)$ and $\Gamma_{11}^* = \Gamma_{11} - \Gamma_{12} \Gamma_{22}^- \Gamma_{21}$. If $\Gamma_{22}$ is a finite-dimensional operator,
$$X_1 \mid X_2 \sim \mathrm{HEC}_{H_1}(\mu_1^*, \Gamma_{11}^*, \phi_{T(X_2)}), \tag{8}$$
where $\phi_{T(X_2)}(t^2)$ is a non-negative definite function depending on $T(X_2) = \langle X_2 - \mu_2, \Gamma_{22}^- (X_2 - \mu_2) \rangle_{H_2}^{1/2}$.
If $\Gamma_{22}$ is an infinite-dimensional operator, then
$$X_1 \mid X_2 \sim \mathrm{Gaussian}_{H_1}(\mu_1^*, r^2(X_2)\Gamma_{11}^*), \tag{9}$$
where $r(\cdot)$ is a deterministic function given in (7).

Proof: First of all, the $\Gamma_{ij} = \mathrm{Cov}(X_i, X_j)$ are bounded operators; as in multivariate analysis, by the Cauchy inequality, $\Gamma_{11}^*$ is bounded and positive semi-definite. Also, $\mu_1^*$ is well defined, since $\mathrm{Cov}(\mu_1^*) = \Gamma_{12}\Gamma_{22}^-\Gamma_{21}$ is bounded by $\Gamma_{11}$ and therefore $P(\mu_1^* \in H_1) = 1$.

Let $U_i = P_i U$, $i = 1, 2$. Since the $U_i$'s are Gaussian, it is easy to check through moment calculations that $(U_1, U_2) \stackrel{d}{=} (Z_1 + \Gamma_{12}\Gamma_{22}^- Z_2, Z_2)$, where $Z_1 \sim \mathrm{Gaussian}_{H_1}(0, \Gamma_{11}^*)$, $Z_2 \sim \mathrm{Gaussian}_{H_2}(0, \Gamma_{22})$, and they are independent. Therefore
$$X_1 \mid X_2 \stackrel{d}{=} \{\mu_1 + RZ_1 + \Gamma_{12}\Gamma_{22}^-(RZ_2)\} \mid X_2 \stackrel{d}{=} \mu_1 + \Gamma_{12}\Gamma_{22}^-(X_2 - \mu_2) + RZ_1 \mid X_2 \stackrel{d}{=} \mu_1^* + Z_1 (R \mid X_2). \tag{10}$$
Here we used the fact that $Z_1$ is independent of $X_2$.

When the range of the operator $\Gamma_{22}$ is finite dimensional, $X_2$ is also finite dimensional. Using arguments as in the proof of Theorem 4, we can show that $(R \mid X_2)$ is a non-negative random variable which depends on the value of $X_2$ only through the statistic $T(X_2)$. Then (8) follows from a direct application of Theorem 3. When $\Gamma_{22}$ is of infinite dimension, as in the proof of Theorem 4, $(R \mid X_2)$ is the deterministic function $r(X_2)$ given by (7), and (9) is proved.

Remark: Although, in this paper, we are only interested in the case that $X$ is defined on an infinite-dimensional Hilbert space $H$, we do allow the Hilbert spaces $H_1$ and $H_2$ that $P_1$ and $P_2$ map into to be finite dimensional. For example, we allow $H_1$ and $H_2$ to be the Euclidean space $\mathbb{R}^m$. See our examples in Section 5.
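The averaging construction (7) used in the proofs above can be checked numerically. In the sketch below, the eigenvalue sequence, the realized value of $R$, and the truncation level $n$ are all illustrative assumptions; conditional on $R$, the law of large numbers drives $r_n(X_2)$ to $R$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: X_2 - mu_2 = R * U_2, observed through its scores
# <X_2 - mu_2, psi_j> = R * sqrt(lam_j) * Z_j with Z_j i.i.d. N(0, 1).
n = 50_000
lam = 1.0 / np.arange(1, n + 1) ** 2      # summable, so Gamma_22 is nuclear
R = 1.7                                    # the realized (unknown) radius
Z = rng.standard_normal(n)
scores = R * np.sqrt(lam) * Z              # <X_2 - mu_2, psi_j>

# r_n from (7): standardize each score by lam_j^{-1/2}, average the squares;
# conditional on R, (1/n) sum_j R^2 Z_j^2 -> R^2 almost surely.
r_n = np.sqrt(np.mean((scores / np.sqrt(lam)) ** 2))
print(round(r_n, 2))                       # close to R
```

This is the sense in which an infinite-dimensional $X_2$ "provides all the information about $R$": infinitely many standardized scores pin $R$ down exactly, which is why the conditional law in (9) is Gaussian with a deterministic scale.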
5 Applications

To show the usefulness of our theory in statistical practice, we provide a few examples which are direct results of the theorems in Section 4.

Example 1 (Principal component analysis): Suppose $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$, and the covariance operator $\Gamma$ has the spectral decomposition (2). The $\psi_j$'s are the principal components of $X$. Define the principal component scores $\xi_j = \langle \psi_j, X \rangle_H$ for $j = 1, 2, \dots$; then $X$ has the decomposition
$$X = \mu + \sum_{j=1}^\infty \xi_j \psi_j. \tag{11}$$
Such a decomposition is also called the Karhunen-Loève decomposition (Ash and Gardner, 1975). For any finite collection $\{\psi_{j_1}, \dots, \psi_{j_m}\}$, define an operator $P: H \to \mathbb{R}^m$ by $P = \psi_{j_1} \otimes e_1 + \dots + \psi_{j_m} \otimes e_m$, where $e_k$ is the $k$th column vector of the identity matrix. Then by Theorem 4,
$$(\xi_{j_1}, \dots, \xi_{j_m})^T = P(X - \mu) \sim \mathrm{HEC}_{\mathbb{R}^m}\{0, \mathrm{diag}(\lambda_{j_1}, \dots, \lambda_{j_m}), \phi_0\}.$$
In other words, any finite collection of principal component scores of $X$ follows a multivariate elliptically contoured distribution.

This example also suggests a way to simulate an $\mathrm{HEC}_H(\mu, \Gamma, \phi_0)$ random variable. Since $\lambda_j \to 0$ as $j \to \infty$, we can truncate the series in (11) at a large number $m$ and simulate the first $m$ principal component scores from a multivariate elliptically contoured distribution. This is very useful in simulating functional data.

Example 2 (Conditional moments): Let $X \sim \mathrm{HEC}_H(\mu, \Gamma, \phi_0)$, $P_1 \in B(H, H_1)$, $P_2 \in B(H, H_2)$. By Corollary 5,
$$E(P_1 X \mid P_2 X) = \mu_1 + \Gamma_{12}\Gamma_{22}^-(X_2 - \mu_2).$$
Suppose $\mu = 0$ and $P_1 \Gamma P_2^* = 0$; then $\Gamma_{12} = 0$ and we have $E(P_1 X \mid P_2 X) = 0$.
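A small simulation illustrates the point behind these conditional-moment results: when $\Gamma_{12} = 0$, the components are uncorrelated and the conditional mean vanishes, yet for a non-Gaussian elliptical law they remain dependent through the common radius $R$, which is visible in the correlation of their squares. All numerical choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative elliptical variable in a two-coordinate truncation with
# diagonal Gamma, so Gamma_12 = 0: X1 and X2 share only the random radius R.
n = 200_000
R = np.sqrt(rng.chisquare(3, size=n) / 3.0)      # E(R^2) = 1
Z = rng.standard_normal((n, 2))
X1, X2 = R * Z[:, 0], R * Z[:, 1]

corr_lin = np.corrcoef(X1, X2)[0, 1]             # near 0: uncorrelated
corr_sq = np.corrcoef(X1 ** 2, X2 ** 2)[0, 1]    # positive: dependent via R
print(round(corr_lin, 3), round(corr_sq, 3))
```

For a Gaussian $X$ (constant $R$), both correlations would be near zero; the positive squared-score correlation is exactly the residual dependence through $R$ that drives the conditional variance results below.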
On the other hand, by the random representation (10),
$$\mathrm{Var}(P_1 X \mid P_2 X) = \Gamma_{11}^* E(R^2 \mid X_2).$$
When $\Gamma_{22}$ is a finite-dimensional operator, $E(R^2 \mid X_2) = g\{T(X_2)\}$ for some univariate function $g$ depending on the elliptically contoured distribution, where $T(X_2)$ is defined in Theorem 4. Therefore
$$\mathrm{Var}(P_1 X \mid P_2 X) = g\{T(X_2)\}\Gamma_{11}^*.$$
When $\Gamma_{22}$ is an infinite-dimensional operator, by (9), $\mathrm{Var}(P_1 X \mid P_2 X) = r^2(X_2)\Gamma_{11}^*$. If $P_1 \Gamma P_2^* = 0$, then $\Gamma_{11}^* = \Gamma_{11}$.

Example 3 (Functional sliced inverse regression): Suppose the Hilbert space $H$ is the $L^2[0, 1]$ function space, so that $X \in H$ is a random function defined on the interval $[0, 1]$. A general functional regression model is given by
$$Y = f(\langle \beta_1, X \rangle, \langle \beta_2, X \rangle, \dots, \langle \beta_K, X \rangle, \epsilon), \tag{12}$$
where $Y$ is a scalar response variable, $\epsilon$ is an error term independent of $X$, $\beta_1, \dots, \beta_K$ are linearly independent coefficient functions, and $f$ is a nonparametric link function. Model (12) is very general and can be very useful in many applications; see the discussion in Ferré and Yao (2003) and Li and Hsing (2007).

Since we do not impose any structure on the link function $f$, the coefficient functions $\beta_k$ are usually unidentifiable, but the subspace spanned by these functions is. This subspace is called the effective dimension reduction space, or the EDR space. We can choose any $K$ orthonormal basis functions in the EDR space as the $\beta_k$'s; these functions are also called the EDR directions. The functional sliced inverse regression (FSIR) approach can be used to estimate the EDR directions and to decide the dimension of the EDR space. We will show that the class of processes $X$ with a Hilbertian elliptically contoured distribution satisfies a key assumption for FSIR, and we will discuss an important result for elliptically contoured functional predictors which is useful for FSIR. For a more comprehensive
account of the method and theory of FSIR, we refer to Ferré and Yao (2003) and Li and Hsing (2007).

One key assumption for FSIR is that, for any $\beta_0 \in H$,
$$E(\langle \beta_0, X \rangle \mid \langle \beta_1, X \rangle, \langle \beta_2, X \rangle, \dots, \langle \beta_K, X \rangle) = c_0 + \sum_{k=1}^K c_k \langle \beta_k, X \rangle \tag{13}$$
for some constants $c_0, \dots, c_K$; see Ferré and Yao (2003). We will show this assumption is satisfied if $X$ is elliptically contoured. Define the operators $P_1 x = \langle \beta_0, x \rangle$ and $P_2 x = (\langle \beta_1, x \rangle, \langle \beta_2, x \rangle, \dots, \langle \beta_K, x \rangle)^T$ for $x \in H$. Notice that $H_2 = \mathbb{R}^K$, and $P_2$ is clearly a finite-dimensional operator. For any vector $v = (v_1, \dots, v_K)^T$,
$$\langle P_2^* v, x \rangle_H = \langle v, P_2 x \rangle_{H_2} = \sum_{k=1}^K v_k \langle \beta_k, x \rangle, \quad \forall x \in H.$$
Therefore, $P_2^* v = (\beta_1, \dots, \beta_K) v$. One can also show that $\Gamma_{22}$ is a $K \times K$ matrix with $(j, k)$th entry equal to $\langle \beta_j, \Gamma \beta_k \rangle$. Similarly, $\Gamma_{12} = (\langle \beta_0, \Gamma \beta_1 \rangle, \dots, \langle \beta_0, \Gamma \beta_K \rangle)$ is a $1 \times K$ matrix. By Corollary 5,
$$E(P_1 X \mid P_2 X) = P_1 \mu + \Gamma_{12} \Gamma_{22}^- (P_2 X - P_2 \mu).$$
Therefore, assumption (13) is satisfied.

Example 4 (Functional sliced inverse regression, continued): Suppose $X$ is elliptically contoured with mean $\mu = 0$. Let $P_2$ be the operator defined in the previous example, and define the operator $P_1 x = (\langle \gamma_1, x \rangle, \dots, \langle \gamma_m, x \rangle)^T$ for a set of orthonormal vectors $\{\gamma_1, \dots, \gamma_m\}$ in $H$. Suppose $P_1 \Gamma P_2^* = 0$, i.e. $\langle \gamma_j, \Gamma \beta_k \rangle = 0$ for $j = 1, \dots, m$ and $k = 1, \dots, K$. By model (12), all the information in $Y$ about $X$ is contained in $P_2 X$, so we have
$$E(P_1 X \mid Y) = E\{E(P_1 X \mid P_2 X) \mid Y\}.$$
By Example 2, $E(P_1 X \mid P_2 X) = 0$, and therefore
$$P_1 E(X \mid Y) = 0. \tag{14}$$
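Assumption (13) can be verified numerically in the Gaussian case, the simplest elliptically contoured law: the population coefficients $\Gamma_{12}\Gamma_{22}^{-}$ computed from the operators should match a least-squares regression of $\langle \beta_0, X \rangle$ on the $\langle \beta_k, X \rangle$. The directions $\beta_0, \beta_1, \beta_2$ and all other numerical choices below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Discretize L^2[0, 1] on a grid, with <f, g> approximated by sum f g dt.
t = np.linspace(0, 1, 101)
dt = t[1] - t[0]
lam = np.array([1.0, 0.5, 0.25, 0.125])
psi = np.array([np.sqrt(2) * np.sin((j + 1) * np.pi * t) for j in range(4)])
Gamma = (psi.T * lam) @ psi                       # covariance kernel on grid

beta0 = t ** 2                                    # illustrative beta_0
B = np.array([np.ones_like(t), t])                # illustrative beta_1, beta_2

# Gamma_12 and Gamma_22 as in Example 3 (entries <beta_j, Gamma beta_k>).
G12 = beta0 @ Gamma @ B.T * dt ** 2               # length-K vector
G22 = B @ Gamma @ B.T * dt ** 2                   # K x K matrix
c = np.linalg.solve(G22, G12)                     # Gamma_12 Gamma_22^{-1}

# Compare with ordinary least squares from a large Gaussian sample.
n = 100_000
xi = rng.standard_normal((n, 4)) * np.sqrt(lam)   # Gaussian PC scores
X = xi @ psi                                       # mean-zero Gaussian curves
y = X @ beta0 * dt                                 # <beta_0, X>
P2X = X @ B.T * dt                                 # (<beta_1, X>, <beta_2, X>)
c_ols, *_ = np.linalg.lstsq(P2X, y, rcond=None)
print(np.round(c, 3), np.round(c_ols, 3))          # nearly equal
```

Since $X$ is mean zero here, $c_0 = 0$ and no intercept is needed; for a non-Gaussian elliptical $X$ the same linear conditional mean holds, only the conditional variance picks up the factor $E(R^2 \mid X_2)$ from Example 2.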
Equation (14) provides information about the shape of the inverse regression curve $E(X \mid Y)$. On the other hand,
$$\mathrm{Var}(P_1 X \mid Y) = E\{\mathrm{Var}(P_1 X \mid P_2 X) \mid Y\} + \mathrm{Var}\{E(P_1 X \mid P_2 X) \mid Y\} = E\{\mathrm{Var}(P_1 X \mid P_2 X) \mid Y\},$$
where the second equality uses $E(P_1 X \mid P_2 X) = 0$. Again, by Example 2, $\mathrm{Var}(P_1 X \mid P_2 X) = g\{T(P_2 X)\}\Gamma_{11}$, and therefore
$$\mathrm{Var}(P_1 X \mid Y) = E[g\{T(P_2 X)\} \mid Y]\Gamma_{11}.$$
Since $\Gamma_{11}$ is the marginal covariance of $P_1 X$, this result shows that the conditional covariance of $P_1 X$ given $Y$ is proportional to the marginal covariance. This result is important for constructing tests for FSIR.

References

[1] Ash, R. B. and Gardner, M. F. (1975). Topics in Stochastic Processes. Academic Press.

[2] Cambanis, S., Huang, S. and Simons, G. (1981). On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11.

[3] Cook, R. D. and Weisberg, S. (1991). Comment on "Sliced Inverse Regression for Dimension Reduction" by K. C. Li. Journal of the American Statistical Association, 86.

[4] Eaton, M. L. (1986). A characterization of spherical distributions. Journal of Multivariate Analysis, 20.

[5] Ferré, L. and Yao, A. (2003). Functional sliced inverse regression analysis. Statistics, 37.

[6] Li, K. C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414).

[7] Li, Y. and Hsing, T. (2007). Determination of the dimensionality in functional sliced inverse regression. Manuscript.
[8] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd Edition. Springer-Verlag, New York.

[9] Schoenberg, I. J. (1938). Metric spaces and completely monotone functions. Annals of Mathematics, 39(4).

[10] Vakhania, N. N., Tarieladze, V. I. and Chobanyan, S. A. (1987). Probability Distributions on Banach Spaces. D. Reidel, Dordrecht.
More informationIntroduction to Infinite Dimensional Stochastic Analysis
Introduction to Infinite Dimensional Stochastic Analysis By Zhi yuan Huang Department of Mathematics, Huazhong University of Science and Technology, Wuhan P. R. China and Jia an Yan Institute of Applied
More informationA note on the σ-algebra of cylinder sets and all that
A note on the σ-algebra of cylinder sets and all that José Luis Silva CCM, Univ. da Madeira, P-9000 Funchal Madeira BiBoS, Univ. of Bielefeld, Germany (luis@dragoeiro.uma.pt) September 1999 Abstract In
More informationElementary linear algebra
Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The
More informationFUNCTIONAL DATA ANALYSIS. Contribution to the. International Handbook (Encyclopedia) of Statistical Sciences. July 28, Hans-Georg Müller 1
FUNCTIONAL DATA ANALYSIS Contribution to the International Handbook (Encyclopedia) of Statistical Sciences July 28, 2009 Hans-Georg Müller 1 Department of Statistics University of California, Davis One
More informationReal Analysis Notes. Thomas Goller
Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationGaussian Hilbert spaces
Gaussian Hilbert spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto July 11, 015 1 Gaussian measures Let γ be a Borel probability measure on. For a, if γ = δ a then
More informationThe Multivariate Gaussian Distribution
The Multivariate Gaussian Distribution Chuong B. Do October, 8 A vector-valued random variable X = T X X n is said to have a multivariate normal or Gaussian) distribution with mean µ R n and covariance
More informationRecitation 1 (Sep. 15, 2017)
Lecture 1 8.321 Quantum Theory I, Fall 2017 1 Recitation 1 (Sep. 15, 2017) 1.1 Simultaneous Diagonalization In the last lecture, we discussed the situations in which two operators can be simultaneously
More informationConcentration Ellipsoids
Concentration Ellipsoids ECE275A Lecture Supplement Fall 2008 Kenneth Kreutz Delgado Electrical and Computer Engineering Jacobs School of Engineering University of California, San Diego VERSION LSECE275CE
More informationCourse Description - Master in of Mathematics Comprehensive exam& Thesis Tracks
Course Description - Master in of Mathematics Comprehensive exam& Thesis Tracks 1309701 Theory of ordinary differential equations Review of ODEs, existence and uniqueness of solutions for ODEs, existence
More informationCollocation based high dimensional model representation for stochastic partial differential equations
Collocation based high dimensional model representation for stochastic partial differential equations S Adhikari 1 1 Swansea University, UK ECCM 2010: IV European Conference on Computational Mechanics,
More informationFunctional Analysis Review
Outline 9.520: Statistical Learning Theory and Applications February 8, 2010 Outline 1 2 3 4 Vector Space Outline A vector space is a set V with binary operations +: V V V and : R V V such that for all
More informationHierarchical Modeling for Univariate Spatial Data
Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This
More informationMultivariate Distributions
Copyright Cosma Rohilla Shalizi; do not distribute without permission updates at http://www.stat.cmu.edu/~cshalizi/adafaepov/ Appendix E Multivariate Distributions E.1 Review of Definitions Let s review
More informationYour first day at work MATH 806 (Fall 2015)
Your first day at work MATH 806 (Fall 2015) 1. Let X be a set (with no particular algebraic structure). A function d : X X R is called a metric on X (and then X is called a metric space) when d satisfies
More informationLinear Algebra Review
January 29, 2013 Table of contents Metrics Metric Given a space X, then d : X X R + 0 and z in X if: d(x, y) = 0 is equivalent to x = y d(x, y) = d(y, x) d(x, y) d(x, z) + d(z, y) is a metric is for all
More informationMath 307 Learning Goals
Math 307 Learning Goals May 14, 2018 Chapter 1 Linear Equations 1.1 Solving Linear Equations Write a system of linear equations using matrix notation. Use Gaussian elimination to bring a system of linear
More informationUNIQUENESS OF POSITIVE SOLUTION TO SOME COUPLED COOPERATIVE VARIATIONAL ELLIPTIC SYSTEMS
TRANSACTIONS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Number 0, Pages 000 000 S 0002-9947(XX)0000-0 UNIQUENESS OF POSITIVE SOLUTION TO SOME COUPLED COOPERATIVE VARIATIONAL ELLIPTIC SYSTEMS YULIAN
More informationSum-of-Squares Method, Tensor Decomposition, Dictionary Learning
Sum-of-Squares Method, Tensor Decomposition, Dictionary Learning David Steurer Cornell Approximation Algorithms and Hardness, Banff, August 2014 for many problems (e.g., all UG-hard ones): better guarantees
More information(Multivariate) Gaussian (Normal) Probability Densities
(Multivariate) Gaussian (Normal) Probability Densities Carl Edward Rasmussen, José Miguel Hernández-Lobato & Richard Turner April 20th, 2018 Rasmussen, Hernàndez-Lobato & Turner Gaussian Densities April
More information(v, w) = arccos( < v, w >
MA322 Sathaye Notes on Inner Products Notes on Chapter 6 Inner product. Given a real vector space V, an inner product is defined to be a bilinear map F : V V R such that the following holds: For all v
More informationAn Introduction to Multivariate Statistical Analysis
An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents
More informationMultivariate Time Series: VAR(p) Processes and Models
Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with
More informationGAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM
GAUSSIAN PROCESSES; KOLMOGOROV-CHENTSOV THEOREM STEVEN P. LALLEY 1. GAUSSIAN PROCESSES: DEFINITIONS AND EXAMPLES Definition 1.1. A standard (one-dimensional) Wiener process (also called Brownian motion)
More informationOperators with numerical range in a closed halfplane
Operators with numerical range in a closed halfplane Wai-Shun Cheung 1 Department of Mathematics, University of Hong Kong, Hong Kong, P. R. China. wshun@graduate.hku.hk Chi-Kwong Li 2 Department of Mathematics,
More informationI teach myself... Hilbert spaces
I teach myself... Hilbert spaces by F.J.Sayas, for MATH 806 November 4, 2015 This document will be growing with the semester. Every in red is for you to justify. Even if we start with the basic definition
More informationMath 307 Learning Goals. March 23, 2010
Math 307 Learning Goals March 23, 2010 Course Description The course presents core concepts of linear algebra by focusing on applications in Science and Engineering. Examples of applications from recent
More information18.S096 Problem Set 7 Fall 2013 Factor Models Due Date: 11/14/2013. [ ] variance: E[X] =, and Cov[X] = Σ = =
18.S096 Problem Set 7 Fall 2013 Factor Models Due Date: 11/14/2013 1. Consider a bivariate random variable: [ ] X X = 1 X 2 with mean and co [ ] variance: [ ] [ α1 Σ 1,1 Σ 1,2 σ 2 ρσ 1 σ E[X] =, and Cov[X]
More informationBivariate Splines for Spatial Functional Regression Models
Bivariate Splines for Spatial Functional Regression Models Serge Guillas Department of Statistical Science, University College London, London, WC1E 6BTS, UK. serge@stats.ucl.ac.uk Ming-Jun Lai Department
More informationWhitening and Coloring Transformations for Multivariate Gaussian Data. A Slecture for ECE 662 by Maliha Hossain
Whitening and Coloring Transformations for Multivariate Gaussian Data A Slecture for ECE 662 by Maliha Hossain Introduction This slecture discusses how to whiten data that is normally distributed. Data
More informationA Limit Theorem for the Squared Norm of Empirical Distribution Functions
University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 2014 A Limit Theorem for the Squared Norm of Empirical Distribution Functions Alexander Nerlich University of Wisconsin-Milwaukee
More informationFive Mini-Courses on Analysis
Christopher Heil Five Mini-Courses on Analysis Metrics, Norms, Inner Products, and Topology Lebesgue Measure and Integral Operator Theory and Functional Analysis Borel and Radon Measures Topological Vector
More informationVector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.
Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar
More informationFunctional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...
Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................
More informationLecture notes: Applied linear algebra Part 1. Version 2
Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and
More informationA characterization of elliptical distributions and some optimality properties of principal components for functional data
A characterization of elliptical distributions and some optimality properties of principal components for functional data Graciela Boente,a, Matías Salibián Barrera b, David E. Tyler c a Facultad de Ciencias
More informationStochastic Design Criteria in Linear Models
AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework
More informationSpectral Continuity Properties of Graph Laplacians
Spectral Continuity Properties of Graph Laplacians David Jekel May 24, 2017 Overview Spectral invariants of the graph Laplacian depend continuously on the graph. We consider triples (G, x, T ), where G
More informationCHAPTER VIII HILBERT SPACES
CHAPTER VIII HILBERT SPACES DEFINITION Let X and Y be two complex vector spaces. A map T : X Y is called a conjugate-linear transformation if it is a reallinear transformation from X into Y, and if T (λx)
More information. Find E(V ) and var(v ).
Math 6382/6383: Probability Models and Mathematical Statistics Sample Preliminary Exam Questions 1. A person tosses a fair coin until she obtains 2 heads in a row. She then tosses a fair die the same number
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2
MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is
More informationTheorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1
Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be
More informationFall 2016 MATH*1160 Final Exam
Fall 2016 MATH*1160 Final Exam Last name: (PRINT) First name: Student #: Instructor: M. R. Garvie Dec 16, 2016 INSTRUCTIONS: 1. The exam is 2 hours long. Do NOT start until instructed. You may use blank
More informationVectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =
Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.
More informationMath 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination
Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column
More informationPrincipal Component Analysis -- PCA (also called Karhunen-Loeve transformation)
Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) PCA transforms the original input space into a lower dimensional space, by constructing dimensions that are linear combinations
More informationSemidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 2
Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 2 Instructor: Farid Alizadeh Scribe: Xuan Li 9/17/2001 1 Overview We survey the basic notions of cones and cone-lp and give several
More information