Journal of Statistical Studies ISSN 10-4734

A CLASS OF ORTHOGONALLY INVARIANT MINIMAX ESTIMATORS FOR NORMAL COVARIANCE MATRICES PARAMETRIZED BY SIMPLE JORDAN ALGEBRAS OF DEGREE 2

Yoshihiko Konno
Faculty of Science, Japan Women's University, Tokyo, Japan

Dedicated to Professor A.K. Md. E. Saleh on the occasion of his 75th birthday

Normal covariance models parametrized by simple Jordan algebras of degree 2, equivalently by the Lorentz cone, were first discussed by Tolver Jensen (1988), who described, for the family of normal distributions, the structure of those statistical models which are linear in both the covariance and the inverse covariance. Recently Konno (2007) developed minimax estimation for normal covariance matrices parametrized by the irreducible symmetric cones, which include the Lorentz cone. In this paper a new class of minimax estimators is proposed for the Lorentz Wishart models under Stein's loss function. This class includes analogues of the estimators of Dey and Srinivasan (1985) and Perron (1992) for the real Wishart models.

Keywords and phrases: Wishart distribution, Lorentz cone, Stein estimators, Jordan algebras, symmetric cones.

1. Introduction

James and Stein (1961) first employed the Stein loss function and considered the problem of minimax estimation of the mean matrix of the real Wishart distribution. They used a result of Kiefer (1957) to construct minimax estimators having a constant risk. Later Stein (1977) pointed out that the eigenvalues of the Wishart matrix spread out more than the eigenvalues of the expected value of the Wishart matrix. This phenomenon suggests that the eigenvalues of the Wishart matrix should be shrunk toward a middle value of the eigenvalues. Furthermore, he gave an unbiased risk estimate for a class of orthogonally invariant estimators, from which he obtained minimax estimators that are uniformly better than the James-Stein estimator.
Recently Konno (2007) extended these results to the estimation problem for general Wishart distributions on the symmetric cones. These general models include the Wishart models with real, complex, and quaternion entries, as well as the Lorentz Wishart models. The Lorentz Wishart models were originally discussed by Tolver Jensen (1988), who described, for the family of normal distributions, the structure of those statistical models which are linear in both the covariance and the inverse covariance. In this paper we focus on the problem of estimating the mean of the Lorentz Wishart distributions, which are a special case of the Wishart models on the symmetric cones. In Section 2, a brief introduction to simple Jordan algebras of degree 2 is given. In Section 3, the estimation problem is introduced, while in Section 4 minimax estimators are given by following the results in Konno (2007). In Section 5, a class of orthogonally invariant minimax estimators is given by using an unbiased risk estimate for the orthogonally invariant estimators given in Konno (2007). In the Appendix, technical proofs are given.

2 Simple Jordan algebras of degree 2

In this section we review some notions of simple Jordan algebras of degree 2, which are a special case of finite-dimensional Euclidean simple Jordan algebras. See Faraut and Korányi (1994) for recent results on the symmetric cones and Jordan algebras. Let $W$ be a real vector space of dimension $v-1$ $(v \geq 3)$ with a symmetric bilinear form $B$. We denote by $\|\cdot\|_B$ the norm induced by this symmetric bilinear form $B$. We assume that $V = \mathbb{R} \oplus W$ has a Jordan multiplication defined for $(\alpha, \mathbf{a})$, $(\beta, \mathbf{b})$ in $\mathbb{R} \oplus W$ by
$$(\alpha, \mathbf{a}) \circ (\beta, \mathbf{b}) = (\alpha\beta + B(\mathbf{a}, \mathbf{b}),\ \alpha\mathbf{b} + \beta\mathbf{a}). \quad (2.1)$$
Throughout the paper, lower-case letters are used for elements of $V$, lower-case Greek letters are used for elements of $\mathbb{R}$, and boldface lower-case letters are used for elements of the vector space $W$. For example, we write $a = (\alpha, \mathbf{a}) \in \mathbb{R} \oplus W$ for $a \in V$. Note that the identity element in $V$ is given by $(1, \mathbf{0})$, where $\mathbf{0}$ is the zero vector in $W$, and that the multiplication on $V$ satisfies, for all $a, b \in V$,
$$a \circ b = b \circ a, \qquad a^2 \circ (a \circ b) = a \circ (a^2 \circ b),$$
where $a^2 = a \circ a$. However, the multiplication does not satisfy the associative law. The associated symmetric cone $\Omega$ is the well-known Lorentz cone in the enveloping space $V$, defined by
$$\Omega = \{a = (\alpha, \mathbf{a}) \in \mathbb{R} \oplus W : \alpha > 0,\ \alpha^2 - B(\mathbf{a}, \mathbf{a}) > 0\}.$$
We define the trace and determinant of $a = (\alpha, \mathbf{a}) \in V = \mathbb{R} \oplus W$ by
$$\mathrm{tr}\,a = 2\alpha, \qquad \det a = \alpha^2 - B(\mathbf{a}, \mathbf{a}). \quad (2.2)$$
If $\det a \neq 0$, then we define the inverse of $a$ by
$$a^{-1} = \frac{1}{\det a}(\alpha, -\mathbf{a}).$$
Note that, by (2.2),
$$a \circ a^{-1} = a^{-1} \circ a = \frac{1}{\det a}(\alpha^2 - B(\mathbf{a}, \mathbf{a}), \mathbf{0}) = (1, \mathbf{0}),$$
and that, by (2.1) and (2.2), $V$ is equipped with the inner product
$$\langle a, b \rangle = \mathrm{tr}(a \circ b) = 2\{\alpha\beta + B(\mathbf{a}, \mathbf{b})\},$$
for $(\alpha, \mathbf{a})$, $(\beta, \mathbf{b})$ in $\mathbb{R} \oplus W$. We denote by $\|\cdot\|$ the norm on $V$ induced by the above inner product. Note that, from the definition of the norms on $V$ and $W$ and the multiplication rule on $V$, we have
$$\|(0, \mathbf{a})\|^2 = \mathrm{tr}\{(0, \mathbf{a}) \circ (0, \mathbf{a})\} = \mathrm{tr}(B(\mathbf{a}, \mathbf{a}), \mathbf{0}) = 2B(\mathbf{a}, \mathbf{a}) = 2\|\mathbf{a}\|_B^2, \quad (2.3)$$
for $\mathbf{a} \in W$. We write a Jordan frame, i.e., a complete system of orthogonal primitive idempotents, as
$$c_1 = \tfrac{1}{2}(1, \mathbf{h}), \qquad c_2 = \tfrac{1}{2}(1, -\mathbf{h}),$$
where $\mathbf{h} \in W$ is fixed such that $B(\mathbf{h}, \mathbf{h}) = 1$. Let $V_1$, $V_2$, and $V_{1/2}$ be the subspaces of $V$ defined by
$$V_1 = \mathbb{R}c_1, \qquad V_2 = \mathbb{R}c_2, \qquad V_{1/2} = \{(0, \mathbf{u}) : \mathbf{u} \in W,\ B(\mathbf{u}, \mathbf{h}) = 0\}.$$
Then note that $c_1^2 = c_1$, $c_2^2 = c_2$, $c_1 \circ c_2 = 0$, and $c_1 + c_2 = (1, \mathbf{0})$. Then we see that, for all $u \in V_{1/2}$, $c_1 \circ u = (1/2)u$ and $c_2 \circ u = (1/2)u$. It is known that $V$ is decomposed as $V = V_1 \oplus V_{1/2} \oplus V_2$. In fact, we have, for $a = (\alpha, \mathbf{a}) \in V$,
$$a = (\alpha + B(\mathbf{a}, \mathbf{h}))c_1 + (0, \mathbf{a} - B(\mathbf{a}, \mathbf{h})\mathbf{h}) + (\alpha - B(\mathbf{a}, \mathbf{h}))c_2.$$
This decomposition is a special case of the Peirce decomposition for finite-dimensional Euclidean Jordan algebras.
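As a concrete sanity check of the algebra reviewed above, the following Python sketch implements the Jordan product (2.1), the trace and determinant (2.2), and the inverse numerically. It is illustrative only: $B$ is taken to be the standard dot product on $W = \mathbb{R}^{v-1}$, and all function names are mine, not the paper's.

```python
import numpy as np

# Elements of V = R (+) W are pairs (alpha, a) with a a NumPy vector;
# B(a, b) is taken to be the standard dot product (an assumption made
# for illustration only).

def jmul(x, y):
    """Jordan product (2.1): (α,a)∘(β,b) = (αβ + B(a,b), αb + βa)."""
    (al, a), (be, b) = x, y
    return (al * be + a @ b, al * b + be * a)

def jtr(x):
    """Trace (2.2): tr(α, a) = 2α."""
    return 2.0 * x[0]

def jdet(x):
    """Determinant (2.2): det(α, a) = α² − B(a, a)."""
    al, a = x
    return al ** 2 - a @ a

def jinv(x):
    """Inverse: (α, −a)/det(x), defined when det(x) ≠ 0."""
    al, a = x
    d = jdet(x)
    return (al / d, -a / d)
```

For any $a$ with $\det a \neq 0$ one can check numerically that $a \circ a^{-1} = (1, \mathbf{0})$, that the product is commutative, and that for generic elements it fails the associative law, as the text notes.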
Covariance Estimation

For $a = (\alpha, \mathbf{a})$ and $b = (\beta, \mathbf{b})$ in $V$, the quadratic representation is defined by
$$P(a)b = 2a \circ (a \circ b) - a^2 \circ b = 2(\alpha\beta + B(\mathbf{a}, \mathbf{b}))(\alpha, \mathbf{a}) - (\det a)(\beta, -\mathbf{b}). \quad (2.4)$$
For $a, b \in V$, we write $a = \alpha_1 c_1 + \alpha_2 c_2 + a_{1/2}$ and $b = \beta_1 c_1 + \beta_2 c_2 + b_{1/2}$, where $\alpha_1, \alpha_2, \beta_1, \beta_2 \in \mathbb{R}$ and $a_{1/2} = (0, \mathbf{a})$, $b_{1/2} = (0, \mathbf{b}) \in V_{1/2}$. From the definition of $c_1, c_2$, note that $2\alpha = \alpha_1 + \alpha_2$ and that $2\beta = \beta_1 + \beta_2$. Then a triangular subgroup transformation is defined by
$$T(a)b = \alpha_1^2\beta_1 c_1 + (\alpha_1\beta_1 a_{1/2} + \alpha_1\alpha_2 b_{1/2}) + (\beta_1\|\mathbf{a}\|_B^2 + 2\alpha_2 B(\mathbf{a}, \mathbf{b}) + \alpha_2^2\beta_2)c_2. \quad (2.5)$$
We denote by $\mathcal{T}$ the set of all triangular transformations given by (2.5). From the fact that $(1, \mathbf{0}) = c_1 + c_2$ and (2.5), we have
$$T(b)(1, \mathbf{0}) = \beta_1^2 c_1 + \beta_1(0, \mathbf{b}) + (\|\mathbf{b}\|_B^2 + \beta_2^2)c_2,$$
where $b = \beta_1 c_1 + (0, \mathbf{b}) + \beta_2 c_2$. Then we have $T(b)(1, \mathbf{0}) = a$ if we set
$$\beta_1 = \sqrt{\alpha + B(\mathbf{a}, \mathbf{h})}, \qquad \beta_2 = \sqrt{\frac{\alpha^2 - \|\mathbf{a}\|_B^2}{\alpha + B(\mathbf{a}, \mathbf{h})}}, \qquad \mathbf{b} = \frac{\mathbf{a} - B(\mathbf{a}, \mathbf{h})\mathbf{h}}{\sqrt{\alpha + B(\mathbf{a}, \mathbf{h})}}. \quad (2.6)$$

3 Estimation problem

Consider a one-to-one Jordan algebra homomorphism $\Lambda : \mathbb{R} \oplus W \to \mathbb{R}^{p \times p}_S$, i.e., $\Lambda$ is a one-to-one linear mapping such that $\Lambda(x \circ y) = (1/2)\{\Lambda(x)\Lambda(y) + \Lambda(y)\Lambda(x)\}$ for $x, y \in \mathbb{R} \oplus W$. Here $\mathbb{R}^{p \times p}_S$ is the space of $p \times p$ symmetric matrices and $I_p$ denotes the $p \times p$ identity matrix. We denote by $\mathrm{Det}$ and $\mathrm{Tr}$ the determinant and the trace of linear transformations. Let $X_1, X_2, \ldots, X_n$ be a random sample from a $p$-dimensional multivariate normal distribution
$$f_Z(z) = (2\pi)^{-p/2}\{\mathrm{Det}\,\Lambda(\sigma)\}^{-1/2}\exp\Big\{-\frac{1}{2}z^{\top}\Lambda(\sigma)^{-1}z\Big\}, \quad (3.1)$$
with $\sigma \in \Omega$, and consider the problem of estimating $\sigma$ based on $X = (X_1, X_2, \ldots, X_n)$. We define a Wishart random variable $w = (\omega, \mathbf{w})$ in the closure of $\Omega$ by requiring that $\mathrm{Tr}(XX^{\top}\Lambda(a)) = \langle a, w \rangle$ for any $a = (\alpha, \mathbf{a})$ in $\mathbb{R} \oplus W$. From this and the fact that $\Lambda(1, \mathbf{0}) = I_p$, we have
$$\alpha\,\mathrm{Tr}(XX^{\top}) + \mathrm{Tr}(XX^{\top}\Lambda(0, \mathbf{a})) = 2\{\alpha\omega + B(\mathbf{a}, \mathbf{w})\} \quad (3.2)$$
for any $a = (\alpha, \mathbf{a})$ in $\mathbb{R} \oplus W$. From Proposition 4 in Konno (2007), we can see that $w$ has a density with respect to the Lebesgue measure as follows:
$$f_{\Omega}(w \mid n, v, p, \sigma) = \frac{(\det\sigma)^{-np/4}(\det w)^{np/4 - v/2}}{2^{np/2}\,\Gamma_{\Omega}(np/4)}\exp\Big\{-\frac{1}{2}\langle\sigma^{-1}, w\rangle\Big\}, \quad (3.3)$$
where $\Gamma_{\Omega}(s) = (2\pi)^{(v-2)/2}\prod_{j=1}^{2}\Gamma(s - (v-2)(j-1)/2)$. Furthermore, it can be seen from Konno (2007) that the maximum likelihood estimator for $\sigma$ is given by $\hat{\sigma}_{mle} = (\hat{\sigma}_0, \hat{\boldsymbol{\sigma}})$, where
$$\hat{\sigma}_0 = \frac{1}{np}\mathrm{Tr}(XX^{\top}) \quad \text{and} \quad B(\mathbf{a}, \hat{\boldsymbol{\sigma}}) = \frac{1}{np}\mathrm{Tr}(XX^{\top}\Lambda(0, \mathbf{a}))$$
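Since the closed forms (2.4)-(2.6) are central to what follows, here is a small numerical cross-check, again with $B$ the standard dot product and $\mathbf{h}$ the first coordinate vector; the helper names are mine, not the paper's. It verifies that the closed form in (2.4) agrees with the definition $P(a)b = 2a \circ (a \circ b) - a^2 \circ b$, and that the choice (2.6) reproduces $a$ through $T(b)(1, \mathbf{0}) = \beta_1^2 c_1 + \beta_1(0, \mathbf{b}) + (\|\mathbf{b}\|_B^2 + \beta_2^2)c_2$.

```python
import numpy as np

def jmul(x, y):
    """Jordan product (2.1), with B the dot product."""
    (al, a), (be, b) = x, y
    return (al * be + a @ b, al * b + be * a)

def jdet(x):
    """Determinant (2.2)."""
    al, a = x
    return al ** 2 - a @ a

def quad_rep(x, y):
    """P(x)y = 2 x∘(x∘y) − x²∘y (definition of the quadratic representation)."""
    t = jmul(x, jmul(x, y))
    s = jmul(jmul(x, x), y)
    return (2.0 * t[0] - s[0], 2.0 * t[1] - s[1])

def quad_rep_closed(x, y):
    """Closed form (2.4): P(x)y = 2(αβ + B(a,b))(α,a) − det(x)(β,−b)."""
    (al, a), (be, b) = x, y
    c, d = 2.0 * (al * be + a @ b), jdet(x)
    return (c * al - d * be, c * a + d * b)

def triangular_image_of_identity(x, h):
    """T(b)(1,0) for b built from x = (α, a) via (2.6); should return x."""
    al, a = x
    b1 = np.sqrt(al + a @ h)                      # β₁ in (2.6)
    b2 = np.sqrt(jdet(x) / (al + a @ h))          # β₂ in (2.6)
    bv = (a - (a @ h) * h) / np.sqrt(al + a @ h)  # b in (2.6)
    # T(b)(1,0) = β₁²c₁ + β₁(0,b) + (‖b‖²_B + β₂²)c₂, with c₁, c₂ = ½(1, ±h)
    s1, s2 = b1 ** 2, bv @ bv + b2 ** 2
    return (0.5 * (s1 + s2), 0.5 * (s1 - s2) * h + b1 * bv)
```

Both checks pass for generic elements of the cone, which is a useful guard when re-deriving (2.5)-(2.6) by hand.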
for any $\mathbf{a} \in W$. We employ the loss function
$$L(\hat{\sigma}, \sigma) = \langle\sigma^{-1}, \hat{\sigma}\rangle - \log\det\hat{\sigma} + \log\det\sigma - 2, \quad (3.4)$$
where $\hat{\sigma}$ is an estimator for $\sigma$. Note that $L(\cdot, \cdot)$ is strictly convex in its first argument and that it is nonnegative and minimized at $\hat{\sigma} = \sigma$, as usual. The loss function is a counterpart of the usual Stein loss function for the problem of estimating a normal covariance matrix. The risk function is defined as
$$R(\hat{\sigma}, \sigma) = E[\langle\sigma^{-1}, \hat{\sigma}\rangle - \log\det\hat{\sigma} + \log\det\sigma - 2],$$
where the expectation above is taken with respect to (3.3).

4 Minimax estimation

In this section we give a brief summary of the minimax estimation theory for the mean matrix of the Wishart models on simple Jordan algebras of degree 2. For a more detailed exposition of this theory, see Konno (2007).

4.1 Minimax estimator with constant risk

To describe the minimax risk, we first consider a class of estimators having the form
$$\hat{\sigma}(Tw) = T\hat{\sigma}(w) \quad (4.1)$$
for any element $T$ in the triangular transformation group $\mathcal{T}$. Using (2.6), we decompose $w = T(b)(1, \mathbf{0})$, where $T(b) \in \mathcal{T}$ and $b = \beta_1 c_1 + \beta_2 c_2 + (0, \mathbf{b})$ with
$$\beta_1 = \sqrt{\omega + B(\mathbf{w}, \mathbf{h})}, \qquad \beta_2 = \sqrt{\frac{\omega^2 - \|\mathbf{w}\|_B^2}{\omega + B(\mathbf{w}, \mathbf{h})}}, \qquad \mathbf{b} = \frac{\mathbf{w} - B(\mathbf{w}, \mathbf{h})\mathbf{h}}{\sqrt{\omega + B(\mathbf{w}, \mathbf{h})}}. \quad (4.2)$$
Then a standard argument such as those in Muirhead (1982) and Eaton (1989) shows that (4.1) holds if and only if, for some $\delta_1 > 0$, $\delta_2 > 0$ and $d_{1/2} \in V_{1/2}$,
$$\hat{\sigma}(w) = T(b)(\delta_1 c_1 + \delta_2 c_2 + d_{1/2}). \quad (4.3)$$
Furthermore, we can see, from Proposition 11 in Konno (2007), that the estimator $T(b)(\delta_1 c_1 + \delta_2 c_2)$ with
$$\delta_1^{-1} = np/2 + v - 2, \qquad \delta_2^{-1} = np/2 - v + 2, \quad (4.4)$$
is minimax. Since the maximum likelihood estimator $\hat{\sigma}_{mle}$ belongs to the class of estimators of the form (4.1), it is improved by the estimator $T(b)(\delta_1 c_1 + \delta_2 c_2)$ with (4.4). Furthermore, its minimax risk is given by $-\sum_{j=1}^{2}\{\log\delta_j + E[\log u_j]\}$, where the $u_j$'s follow chi-squared distributions with degrees of freedom $np/2 - (v-2)(j-1)$ $(j = 1, 2)$. To express the minimax estimator in terms of $w = (\omega, \mathbf{w})$, set $T(b)(\delta_1 c_1 + \delta_2 c_2) = (\chi, \mathbf{x})$.
Using.5,wehave given by j=1 {log δ j + E[log u j ]}, where u j T bδ 1 c 1 + δ c = δ 1β1 1, h δ 1β 1 0, b 1 δ1 β1 = + δ 1 0, b + δ β 4, δ 1 β1 δ β Furthermore note that, from 4., δ1 0, b + δ β 1, h δ 1 0, b h + δ1 β 1 b 4 0, b = tr 0, b 0, b = tr Bb b, 0 = b B = w B B w h. ω + Bw h 4. 4.5
Putting this equation and (4.2) into (4.5), we have $T(b)(\delta_1 c_1 + \delta_2 c_2) = (\chi, \mathbf{x})$, where
$$\chi = \frac{1}{2}\{\delta_1\beta_1^2 + \delta_1\|\mathbf{b}\|_B^2 + \delta_2\beta_2^2\} = \frac{1}{2}\Big\{\delta_1\Big(\omega + B(\mathbf{w}, \mathbf{h}) + \frac{\|\mathbf{w}\|_B^2 - B(\mathbf{w}, \mathbf{h})^2}{\omega + B(\mathbf{w}, \mathbf{h})}\Big) + \delta_2\frac{\omega^2 - \|\mathbf{w}\|_B^2}{\omega + B(\mathbf{w}, \mathbf{h})}\Big\} = \frac{1}{2}\Big\{\delta_1\frac{\omega^2 + 2\omega B(\mathbf{w}, \mathbf{h}) + \|\mathbf{w}\|_B^2}{\omega + B(\mathbf{w}, \mathbf{h})} + \delta_2\frac{\omega^2 - \|\mathbf{w}\|_B^2}{\omega + B(\mathbf{w}, \mathbf{h})}\Big\}$$
and
$$\mathbf{x} = \delta_1\beta_1\mathbf{b} + \frac{1}{2}\{\delta_1\beta_1^2 - \delta_1\|\mathbf{b}\|_B^2 - \delta_2\beta_2^2\}\mathbf{h} = \delta_1\mathbf{w} + \frac{\delta_1}{2}\frac{\omega^2 - \|\mathbf{w}\|_B^2}{\omega + B(\mathbf{w}, \mathbf{h})}\mathbf{h} - \frac{\delta_2}{2}\frac{\omega^2 - \|\mathbf{w}\|_B^2}{\omega + B(\mathbf{w}, \mathbf{h})}\mathbf{h} = \delta_1\mathbf{w} + \frac{\delta_1 - \delta_2}{2}\cdot\frac{\omega^2 - \|\mathbf{w}\|_B^2}{\omega + B(\mathbf{w}, \mathbf{h})}\mathbf{h}.$$

4.2 Orthogonally invariant estimators and their unbiased risk estimates

To describe the orthogonally invariant estimators, we need the following lemma, which states the singular value decomposition of an element in $\Omega$.

Lemma 4.1 For any $w = (\omega, \mathbf{w}) \in \Omega$, set
$$z = \frac{1}{\sqrt{4\omega^2 + 2\|\mathbf{w}\|_B B(\mathbf{w}, \mathbf{h}) - 2\|\mathbf{w}\|_B^2}}\,(2\omega,\ \mathbf{w} - \|\mathbf{w}\|_B\mathbf{h}). \quad (4.6)$$
Then we have $w = P(z)(\lambda_1 c_1 + \lambda_2 c_2)$, where
$$\lambda_1 = \omega + \|\mathbf{w}\|_B \quad \text{and} \quad \lambda_2 = \omega - \|\mathbf{w}\|_B. \quad (4.7)$$
Furthermore, we have $\mathrm{Det}\,P(z) = 1$.

Proof. The first assertion can be obtained from a straightforward application of Corollary 1 in Faybusovich and Tsuchiya (2003). Then, using Lemma 1(ii) in Konno (2007) and noting that $\det z = 1$, we can complete the proof.

We use Lemma 4.1 to decompose the element $w$ as
$$w = P(z)\Big\{\frac{\omega + \|\mathbf{w}\|_B}{2}(1, \mathbf{h}) + \frac{\omega - \|\mathbf{w}\|_B}{2}(1, -\mathbf{h})\Big\} = P(z)(\omega, \|\mathbf{w}\|_B\mathbf{h}),$$
where $z$ is given by (4.6). For the decomposition of $w$ stated in Lemma 4.1, we consider orthogonally invariant estimators of the form
$$\hat{\sigma}_{\phi} = P(z)\Big\{\frac{\phi_1(\lambda_1, \lambda_2)}{2}(1, \mathbf{h}) + \frac{\phi_2(\lambda_1, \lambda_2)}{2}(1, -\mathbf{h})\Big\}, \quad (4.8)$$
where $\phi_1$ and $\phi_2$ are differentiable functions from $\mathbb{R}^2$ to $\mathbb{R}$. Let
$$R^{\#}(\hat{\sigma}_{\phi}, \sigma) = E[\langle\sigma^{-1}, \hat{\sigma}_{\phi}\rangle - \log\det\hat{\sigma}_{\phi}].$$
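The decomposition in Lemma 4.1 can be checked numerically. The sketch below (helper names mine; $B$ the dot product, $\mathbf{h} = e_1$) builds $z$ from (4.6) and verifies that $\det z = 1$ and that $P(z)(\lambda_1 c_1 + \lambda_2 c_2) = P(z)(\omega, \|\mathbf{w}\|_B\mathbf{h})$ recovers $w$.

```python
import numpy as np

def jmul(x, y):
    """Jordan product (2.1), with B the dot product."""
    (al, a), (be, b) = x, y
    return (al * be + a @ b, al * b + be * a)

def jdet(x):
    al, a = x
    return al ** 2 - a @ a

def quad_rep(x, y):
    """P(x)y = 2 x∘(x∘y) − x²∘y."""
    t = jmul(x, jmul(x, y))
    s = jmul(jmul(x, x), y)
    return (2.0 * t[0] - s[0], 2.0 * t[1] - s[1])

def lorentz_spectral(w, h):
    """Return z, λ₁, λ₂ as in Lemma 4.1 (equations (4.6)-(4.7))."""
    om, wv = w
    nw = np.sqrt(wv @ wv)  # ‖w‖_B
    denom = np.sqrt(4 * om ** 2 + 2 * nw * (wv @ h) - 2 * nw ** 2)
    z = (2 * om / denom, (wv - nw * h) / denom)
    return z, om + nw, om - nw
```

Since $\lambda_1 c_1 + \lambda_2 c_2 = (\omega, \|\mathbf{w}\|_B\mathbf{h})$, applying `quad_rep(z, ...)` to that element should return `w` for any `w` in the cone.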
It is easily seen that comparison between two estimators of the form (4.8) in terms of the risk $R$ is equivalent to comparison in terms of $R^{\#}$. The next lemma is a generalization of Lemma 2.1 in Dey and Srinivasan (1985) to the setting of the Lorentz Wishart distributions, and it is derived from a general result in Konno (2007).

Lemma 4.2 Consider the estimators given by (4.8). Then an unbiased risk estimate for $R^{\#}(\hat{\sigma}_{\phi}, \sigma)$ is given by
$$\hat{R}^{\#}(\hat{\sigma}_{\phi}) = \sum_{j=1}^{2}\Big\{2\frac{\partial\phi_j}{\partial\lambda_j} + \Big(\frac{np}{2} - v\Big)\frac{\phi_j}{\lambda_j} - \log\phi_j\Big\} + 2(v-2)\frac{\phi_1 - \phi_2}{\lambda_1 - \lambda_2}, \quad (4.9)$$
i.e., we have $R^{\#}(\hat{\sigma}_{\phi}, \sigma) = E[\hat{R}^{\#}(\hat{\sigma}_{\phi})]$.

5 A new class of estimators

Using Lemma 4.2, Konno (2007) showed that the estimator
$$\hat{\sigma}_m = P(z)(\delta_1\lambda_1 c_1 + \delta_2\lambda_2 c_2) \quad (5.1)$$
is minimax, where the $\delta_j$'s and $\lambda_j$'s are given by (4.4) and (4.7). However, this estimator does not satisfy a natural restriction on the estimated eigenvalues, i.e., $\phi_1 \geq \phi_2$ in (4.8). We can construct explicit forms of orthogonally invariant estimators which include estimators corresponding to those in Dey and Srinivasan (1985), Perron (1992), and Takemura (1984). We again decompose $w$ into $P(z)(\lambda_1 c_1 + \lambda_2 c_2)$, where $P(z)$, $\lambda_1$, and $\lambda_2$ are defined as in Lemma 4.1, and we consider a class of estimators for $\sigma$ of the form
$$\hat{\sigma}_{\gamma} = P(z)\{\phi_1^{\gamma}(\lambda)c_1 + \phi_2^{\gamma}(\lambda)c_2\}, \quad (5.2)$$
where $\gamma$ is a positive constant, and
$$\phi_1^{\gamma}(\lambda) = \Big(\frac{\lambda_1^{\gamma}}{\lambda_1^{\gamma} + \lambda_2^{\gamma}}\delta_1 + \frac{\lambda_2^{\gamma}}{\lambda_1^{\gamma} + \lambda_2^{\gamma}}\delta_2\Big)\lambda_1 \quad \text{and} \quad \phi_2^{\gamma}(\lambda) = \Big(\frac{\lambda_2^{\gamma}}{\lambda_1^{\gamma} + \lambda_2^{\gamma}}\delta_1 + \frac{\lambda_1^{\gamma}}{\lambda_1^{\gamma} + \lambda_2^{\gamma}}\delta_2\Big)\lambda_2.$$
Note that, if $\gamma = 1$, the eigenvalues of the estimator (5.2) are order-preserving. If $\gamma = 1$ then the estimator (5.2) corresponds to an analogue of the estimator for the normal covariance matrix given by Perron (1992), while it corresponds to an analogue of the estimator for the normal covariance matrix given by Takemura (1984) if $\gamma = 1/2$.

Theorem 5.1 Let $X = (X_1, X_2, \ldots, X_n)$, where $X_1, X_2, \ldots, X_n$ are independently and identically distributed as (3.1) for some $\sigma$ in $\Omega$, and assume that $w$ is an element in the closure of $\Omega$ such that $\mathrm{Tr}(XX^{\top}\Lambda(a)) = \langle a, w \rangle$ for any element $a$ in $\mathbb{R} \oplus W$. Then the estimators given by (5.2) are minimax if $\gamma \geq 1$.
Furthermore, the eigenvalues of the estimators satisfy the natural order $\phi_1^{1}(\lambda) \geq \phi_2^{1}(\lambda)$ when $\gamma = 1$.
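To make the eigenvalue maps $\phi_1^{\gamma}, \phi_2^{\gamma}$ and the two bounds used in the proof below concrete, here is a small numerical sketch. The notation is mine, the $\delta$ values are illustrative (only $0 < \delta_1 < \delta_2$ matters), and the checks are: (i) at $\gamma = 1$ the difference collapses to $\phi_1^1 - \phi_2^1 = \delta_1(\lambda_1 - \lambda_2) \geq 0$, so the order is preserved; (ii) as $\gamma \to \infty$ the maps approach the values $\delta_1\lambda_1$, $\delta_2\lambda_2$ of (5.1); (iii) the closed forms appearing in (5.3) and (5.4) match a finite-difference derivative and the direct ratio, and obey the stated bounds for $\gamma \geq 1$.

```python
def phi_pair(lam1, lam2, d1, d2, g):
    """Eigenvalue maps φ₁^γ, φ₂^γ of (5.2): mixtures of δ₁ and δ₂ with
    weights λ₁^γ/(λ₁^γ+λ₂^γ) and λ₂^γ/(λ₁^γ+λ₂^γ), swapped between
    the two coordinates."""
    s = lam1 ** g + lam2 ** g
    w1 = lam1 ** g / s
    phi1 = (w1 * d1 + (1.0 - w1) * d2) * lam1
    phi2 = ((1.0 - w1) * d1 + w1 * d2) * lam2
    return phi1, phi2

def deriv_sum_closed(lam1, lam2, d1, d2, g):
    """Closed form in (5.3): δ₁ + δ₂ + 2γλ₁^γλ₂^γ(δ₁−δ₂)/(λ₁^γ+λ₂^γ)²."""
    s = lam1 ** g + lam2 ** g
    return d1 + d2 + 2 * g * lam1 ** g * lam2 ** g * (d1 - d2) / s ** 2

def ratio_closed(lam1, lam2, d1, d2, g):
    """Closed form in (5.4):
    δ₁ + λ₁λ₂(λ₁^{γ−1}−λ₂^{γ−1})(δ₁−δ₂)/((λ₁−λ₂)(λ₁^γ+λ₂^γ))."""
    s = lam1 ** g + lam2 ** g
    num = lam1 * lam2 * (lam1 ** (g - 1) - lam2 ** (g - 1)) * (d1 - d2)
    return d1 + num / ((lam1 - lam2) * s)
```

Since $\delta_1 < \delta_2$, the correction terms in both closed forms are nonpositive once $\gamma \geq 1$, which is exactly how the proof obtains the bounds $\delta_1 + \delta_2$ and $\delta_1$.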
Proof. We apply Lemma 4.2 with $\phi_j = \phi_j^{\gamma}(\lambda)$ $(j = 1, 2)$. From a straightforward calculation we have
$$\sum_{j=1}^{2}\frac{\partial\phi_j}{\partial\lambda_j} = \delta_1 + \delta_2 + \frac{2\gamma\lambda_1^{\gamma}\lambda_2^{\gamma}}{(\lambda_1^{\gamma} + \lambda_2^{\gamma})^2}(\delta_1 - \delta_2) \leq \delta_1 + \delta_2, \quad (5.3)$$
$$\frac{\phi_1 - \phi_2}{\lambda_1 - \lambda_2} = \delta_1 + \frac{\lambda_1\lambda_2(\lambda_1^{\gamma-1} - \lambda_2^{\gamma-1})}{(\lambda_1 - \lambda_2)(\lambda_1^{\gamma} + \lambda_2^{\gamma})}(\delta_1 - \delta_2) \leq \delta_1, \quad (5.4)$$
provided $\gamma \geq 1$. Furthermore, from the concavity of the logarithmic function and Jensen's inequality, we have
$$\log\det\hat{\sigma}_{\gamma} = \log(\phi_1^{\gamma}\phi_2^{\gamma}) \geq \log(\lambda_1\lambda_2) + \sum_{j=1}^{2}\log\delta_j. \quad (5.5)$$
Putting (5.3)-(5.5) into (4.9) and using Corollary 9 in Konno (2007), we have
$$R(\hat{\sigma}_{\gamma}, \sigma) \leq \Big(\frac{np}{2} - v + 2\Big)(\delta_1 + \delta_2) + 2(v - 2)\delta_1 - 2 - \log(\delta_1\delta_2) - E[\log(\lambda_1\lambda_2)] + \log\det\sigma = -\sum_{j=1}^{2}\{E[\log u_j] + \log\delta_j\},$$
where $\mathcal{L}(u_j) = \chi^2_{np/2 - (v-2)(j-1)}$ $(j = 1, 2)$. This completes the proof.

Remark 5.1 Letting $\gamma$ in (5.2) go to infinity, we can see that the estimator $\hat{\sigma}_{\gamma}$ tends to (5.1), an analogue of Dey and Srinivasan's estimator for the real normal covariance matrix, i.e., $\hat{\sigma}_{\infty} = \hat{\sigma}_m$. On the other hand, if $\gamma = 1/2$, then the estimator $\hat{\sigma}_{1/2}$ becomes an analogue of an estimator of Takemura (1984) for the real normal covariance matrix. We conjecture that $\hat{\sigma}_{1/2}$ is minimax. However, we cannot show that it is minimax because of the complex nature of the Jordan algebras of degree 2.

Acknowledgements The author would like to thank a referee for her/his careful reading of the manuscript and helpful comments. This work was supported in part by the Japan Society for the Promotion of Science through a Grant-in-Aid for Scientific Research (C), No. 17500185.

References:
Dey, D.K. and Srinivasan, C. (1985): Estimation of a covariance matrix under Stein's loss, Ann. Statist. 13, 1581-1591.
Eaton, M.L. (1989): Group Invariance Applications in Statistics, Regional Conference Series in Probability and Statistics Vol. 1, Institute of Mathematical Statistics.
Faraut, J. and Korányi, A. (1994): Analysis on Symmetric Cones, Oxford Science Publications.
Faybusovich, L. and Tsuchiya, T. (2003): Primal-dual algorithms and infinite-dimensional Jordan algebras of finite rank, Math. Program. Ser. B 97, 471-493.
Haff, L.R. (1991): The variational form of certain Bayes estimators, Ann. Statist. 19, 1163-1190.
James, W. and Stein, C. (1961): Estimation with quadratic loss, in Proc. Fourth Berkeley Symp. Math. Statist. Prob. 1, 361-380, Univ. California Press.
Kiefer, J. (1957): Invariance, minimax sequential estimation, and continuous time processes, Ann. Math. Statist. 28, 573-601.
Konno, Y. (2007): Estimation of normal covariance matrices parametrized by irreducible symmetric cones under Stein's loss, J. Multivariate Anal. 98, 295-316.
Muirhead, R.J. (1982): Aspects of Multivariate Statistical Analysis, John Wiley & Sons, Inc.
Perron, F. (1992): Minimax estimators of a covariance matrix, J. Multivariate Anal. 43, 16-28.
Sheena, Y. and Takemura, A. (1992): Inadmissibility of non-order-preserving orthogonally invariant estimators of the covariance matrix in the case of Stein's loss, J. Multivariate Anal. 41, 117-131.
Stein, C. (1977): Lectures on the theory of estimation of many parameters, in Studies in the Statistical Theory of Estimation I (I.A. Ibragimov and M.S. Nikulin, eds.).
Takemura, A. (1984): An orthogonally invariant minimax estimator of the covariance matrix of a multivariate normal population, Tsukuba J. Math. 8, 365-376.
Tolver Jensen, S. (1988): Covariance hypotheses which are linear in both the covariance and the inverse covariance, Ann. Statist. 16, 302-322.

6 Appendix

6.1 Proof of (2.4)

Since $a \circ b = (\alpha\beta + B(\mathbf{a}, \mathbf{b}), \alpha\mathbf{b} + \beta\mathbf{a})$ and $a^2 = (\alpha^2 + \|\mathbf{a}\|_B^2, 2\alpha\mathbf{a})$, we have
$$a \circ (a \circ b) = (\alpha, \mathbf{a}) \circ (\alpha\beta + B(\mathbf{a}, \mathbf{b}), \alpha\mathbf{b} + \beta\mathbf{a}) = (\alpha^2\beta + 2\alpha B(\mathbf{a}, \mathbf{b}) + \beta\|\mathbf{a}\|_B^2,\ (\alpha\beta + B(\mathbf{a}, \mathbf{b}))\mathbf{a} + \alpha^2\mathbf{b} + \alpha\beta\mathbf{a}),$$
$$a^2 \circ b = (\alpha^2 + \|\mathbf{a}\|_B^2, 2\alpha\mathbf{a}) \circ (\beta, \mathbf{b}) = (\alpha^2\beta + \beta\|\mathbf{a}\|_B^2 + 2\alpha B(\mathbf{a}, \mathbf{b}),\ (\alpha^2 + \|\mathbf{a}\|_B^2)\mathbf{b} + 2\alpha\beta\mathbf{a}).$$
From these equations, we have
$$P(a)b = 2a \circ (a \circ b) - a^2 \circ b = (\alpha^2\beta + 2\alpha B(\mathbf{a}, \mathbf{b}) + \beta\|\mathbf{a}\|_B^2,\ 2(\alpha\beta + B(\mathbf{a}, \mathbf{b}))\mathbf{a} + (\alpha^2 - \|\mathbf{a}\|_B^2)\mathbf{b}) = 2(\alpha\beta + B(\mathbf{a}, \mathbf{b}))(\alpha, \mathbf{a}) - (\alpha^2 - \|\mathbf{a}\|_B^2)(\beta, -\mathbf{b}),$$
which completes the proof of (2.4).

6.2 Proof of (2.5)

For elements $x, y$ in $V$, $L(x)$ is the map from $V$ to $V$ defined by $L(x)y = x \circ y$.
For $z \in V_{1/2}$, let $\tau_{c_1}(z)$ be the map from $V$ to $V$ defined by
$$\tau_{c_1}(z)x = x_1 + (2L(z)x_1 + x_{1/2}) + (2L(c_2)L(z)^2 x_1 + 2L(c_2)L(z)x_{1/2} + x_2), \quad (6.1)$$
where $x = x_1 + x_{1/2} + x_2$ is the Peirce decomposition with respect to the idempotent $c_1$, so that $x_1 \in \{y \in V : c_1 \circ y = y\}$, $x_2 \in \{y \in V : c_2 \circ y = y\}$, and $x_{1/2} \in V_{1/2} = \{y \in V : c_1 \circ y = c_2 \circ y = (1/2)y\}$. From Faraut and Korányi (1994), it is seen that the map from $\Omega$ to the triangular subgroup of the general linear group over $V$ is given, for $a = \alpha_1 c_1 + \alpha_2 c_2 + (0, \mathbf{a}) \in \Omega$, by
$$T(a) = P(\tilde{a}_1)\tau_{c_1}(a_{1/2})P(\tilde{a}_2),$$
where $\tilde{a}_1 = \alpha_1 c_1 + c_2$, $\tilde{a}_2 = c_1 + \alpha_2 c_2$, and $a_{1/2} = (0, \mathbf{a})$. For $b = \beta_1 c_1 + \beta_2 c_2 + b_{1/2}$ with $b_{1/2} = (0, \mathbf{b})$, we first compute $P(\tilde{a}_2)b$. Since $P(\tilde{a}_2) = 2L(\tilde{a}_2)^2 - L(\tilde{a}_2^2)$ acts as multiplication by $1$ on $V_1$, by $\alpha_2$ on $V_{1/2}$, and by $\alpha_2^2$ on $V_2$, we have
$$P(\tilde{a}_2)b = \beta_1 c_1 + \alpha_2^2\beta_2 c_2 + \alpha_2 b_{1/2}.$$
Next we use (6.1) and Lemma 1(vii) in Konno (2007) in order to compute
$$\tau_{c_1}(a_{1/2})(\beta_1 c_1 + \alpha_2^2\beta_2 c_2 + \alpha_2 b_{1/2}) = \beta_1 c_1 + \{2L(a_{1/2})(\beta_1 c_1) + \alpha_2 b_{1/2}\} + \{2L(c_2)L(a_{1/2})^2(\beta_1 c_1) + 2L(c_2)L(a_{1/2})(\alpha_2 b_{1/2}) + \alpha_2^2\beta_2 c_2\}.$$
Furthermore, we have
$$2L(c_2)L(a_{1/2})^2(\beta_1 c_1) = \beta_1 L(c_2)(a_{1/2} \circ a_{1/2}) = \beta_1\|\mathbf{a}\|_B^2 L(c_2)(c_1 + c_2) = \beta_1\|\mathbf{a}\|_B^2 c_2$$
and
$$2L(c_2)L(a_{1/2})(\alpha_2 b_{1/2}) = 2\alpha_2 L(c_2)(B(\mathbf{a}, \mathbf{b}), \mathbf{0}) = 2\alpha_2 B(\mathbf{a}, \mathbf{b})c_2,$$
from which it follows that
$$\tau_{c_1}(a_{1/2})(\beta_1 c_1 + \alpha_2^2\beta_2 c_2 + \alpha_2 b_{1/2}) = \beta_1 c_1 + \{\beta_1 a_{1/2} + \alpha_2 b_{1/2}\} + \{\beta_1\|\mathbf{a}\|_B^2 + 2\alpha_2 B(\mathbf{a}, \mathbf{b}) + \alpha_2^2\beta_2\}c_2 =: \tau.$$
Finally, since $P(\tilde{a}_1)$ acts as multiplication by $\alpha_1^2$ on $V_1$, by $\alpha_1$ on $V_{1/2}$, and by $1$ on $V_2$, we obtain
$$T(a)b = P(\tilde{a}_1)\tau = \alpha_1^2\beta_1 c_1 + \{\alpha_1\beta_1 a_{1/2} + \alpha_1\alpha_2 b_{1/2}\} + \{\beta_1\|\mathbf{a}\|_B^2 + 2\alpha_2 B(\mathbf{a}, \mathbf{b}) + \alpha_2^2\beta_2\}c_2,$$
which completes the verification of (2.5).
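The factorization used in this proof can also be confirmed numerically: the sketch below (my notation; $B$ the dot product, $\mathbf{h} = e_1$) composes $P(\tilde{a}_1)$, $\tau_{c_1}(a_{1/2})$ as in (6.1), and $P(\tilde{a}_2)$ purely from the Jordan product, and compares the result with the closed form (2.5).

```python
import numpy as np

def jmul(x, y):
    """Jordan product (2.1), with B the dot product."""
    (al, a), (be, b) = x, y
    return (al * be + a @ b, al * b + be * a)

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def scale(c, x):
    return (c * x[0], c * x[1])

def quad_rep(x, y):
    """P(x)y = 2 x∘(x∘y) − x²∘y."""
    t = jmul(x, jmul(x, y))
    s = jmul(jmul(x, x), y)
    return add(scale(2.0, t), scale(-1.0, s))

def tau_c1(z, x, h):
    """The map (6.1) with respect to c₁ = ½(1, h), computed from the
    Jordan product alone."""
    c1, c2 = (0.5, 0.5 * h), (0.5, -0.5 * h)
    xi, xv = x
    x1 = scale(xi + xv @ h, c1)        # Peirce V₁ part
    x2 = scale(xi - xv @ h, c2)        # Peirce V₂ part
    x12 = (0.0, xv - (xv @ h) * h)     # Peirce V₁/₂ part
    mid = add(scale(2.0, jmul(z, x1)), x12)
    top = add(add(scale(2.0, jmul(c2, jmul(z, jmul(z, x1)))),
                  scale(2.0, jmul(c2, jmul(z, x12)))), x2)
    return add(add(x1, mid), top)
```

With $\tilde{a}_1 = \alpha_1 c_1 + c_2$ and $\tilde{a}_2 = c_1 + \alpha_2 c_2$ built explicitly, `quad_rep(a1t, tau_c1(a12, quad_rep(a2t, b), h))` agrees with the right-hand side of (2.5) for generic inputs.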