Journal of Statistical Research 2007, Vol. 41, No. 1, Bangladesh. ISSN 0256-422X

ESTIMATION OF AUTOREGRESSIVE COEFFICIENT IN AN ARMA(1,1) MODEL WITH VAGUE INFORMATION ON THE MA COMPONENT

M. Ould Haye
School of Mathematics and Statistics, Carleton University
1125 Colonel By Drive, Ottawa, Canada K1S 5B6
Email: ouldhaye@math.carleton.ca

Summary

In this paper we investigate the asymptotic properties of various estimators of the autoregressive parameter of an ARMA(1,1) model when uncertain non-sample prior information on the moving-average component is available. In particular, we study the preliminary test and shrinkage estimators of the autoregressive parameter and compare their efficiency with that of the maximum likelihood estimator, designated the unrestricted estimator. It is shown that near the prior value of the MA parameter both the preliminary test and shrinkage estimators are superior to the MLE, while they lose this superiority as the MA parameter moves away from the prior value; the preliminary test estimator eventually regains its efficiency to some extent, whereas the shrinkage estimator approaches the lower bound of its efficiency.

Keywords and phrases: Time series, ARMA processes, sub-hypotheses, uncertain non-sample prior information, restricted, preliminary test and shrinkage estimators, asymptotic relative efficiency.

1 Introduction

Classical estimators of unknown parameters are based exclusively on the sample data. The notion of non-sample information has been introduced to improve the quality of estimators: we expect the inclusion of additional information to lead to a better estimator. Since the seminal work of Bancroft (1944) on preliminary test estimators, many papers in the area of so-called improved estimation have been published. Stein (1956, 1981) developed the shrinkage estimator for a multivariate normal population and proved that it performs better than the usual maximum likelihood estimator under squared error loss.
Saleh (2006) explored these two classes of improved estimators in a variety of contexts in his recent book. Many other researchers have been working in this area,

© Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka 1000, Bangladesh. This research was supported by NSERC grant no. 8888.
notably Sclove et al. (1972), Judge and Bock (1978), and Efron and Morris (1972, 1973). This type of analysis is also useful to control over-modelling of the specific models in question. More specific reasons for, and benefits of, such analyses may be found in the recent book of Saleh (2006). To the best of our knowledge, these alternative estimators have not yet been applied to ARMA model parameter estimation.

We consider the ARMA(1,1) model given by
$$X_t - \rho X_{t-1} = \varepsilon_t - \alpha\varepsilon_{t-1}, \qquad (1.1)$$
where the $\varepsilon_t$ are i.i.d. $N(0,\sigma^2)$. We are interested in the estimation of the autoregressive parameter $\rho$ when it is suspected, but not certain, that $\alpha = \alpha_0$. Such a situation may arise when there is prior information that $\alpha$ equals, or is very close to, $\alpha_0$, and we want to reduce the number of parameters to be estimated in model (1.1) while still estimating $\rho$ with better efficiency. More specifically, we consider four estimators, namely: 1) the maximum likelihood estimator (MLE) $\tilde\rho_n$, the unrestricted estimator; 2) the restricted estimator $\hat\rho_n$, which generally performs better than the MLE when $\alpha$ is equal to $\alpha_0$ (or very close to it), but may be considerably poorer than the MLE when $\alpha$ is away from $\alpha_0$ (see Saleh (1992) in the over-modelling context); 3) the preliminary test estimator (PTE) $\hat\rho_n^{PT}$, which may be useful in case of uncertainty about the prior information $\alpha = \alpha_0$; it strikes a compromise between $\tilde\rho_n$ and $\hat\rho_n$ via an indicator function depending on the size $\gamma$ of a preliminary test on $\alpha$; and 4) the shrinkage estimator $\hat\rho_n^S$ (see Saleh (2006)), which is basically a smoothed version of the PTE. We obtain explicit forms of the asymptotic distributional bias (ADB) and the asymptotic distributional MSE (ADMSE) of each of these estimators, and we give a detailed comparison of their relative efficiencies.

Put $h(z) = 1 - \rho z$ and $g(z) = 1 - \alpha z$, and assume that $|\rho|$ and $|\alpha|$ are less than $1$.
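As a quick numerical sketch (not part of the original paper; the parameter values $\rho = 0.7$, $\alpha = 0.3$ and the burn-in length are arbitrary choices), the model can be simulated and $\rho$ recovered from the standard ARMA(1,1) autocovariance relation $r(k) = \rho\, r(k-1)$ for $k \ge 2$:

```python
import numpy as np
from scipy.signal import lfilter

def simulate_arma11(n, rho, alpha, sigma=1.0, seed=0):
    """Simulate X_t - rho*X_{t-1} = eps_t - alpha*eps_{t-1} with Gaussian noise."""
    rng = np.random.default_rng(seed)
    eps = sigma * rng.standard_normal(n + 500)
    # lfilter([1, -alpha], [1, -rho], eps) implements exactly this recursion
    x = lfilter([1.0, -alpha], [1.0, -rho], eps)
    return x[500:]                      # drop burn-in so the series is stationary

def autocov(x, k):
    x = x - x.mean()
    return np.dot(x[:len(x) - k], x[k:]) / len(x)

x = simulate_arma11(200_000, rho=0.7, alpha=0.3)
# The recurrence r(k) = rho * r(k-1), k >= 2, gives a simple moment
# estimate of rho as the ratio of successive sample autocovariances.
rho_hat = autocov(x, 2) / autocov(x, 1)
print(rho_hat)   # close to 0.7
```

With this sample size the ratio estimate is typically within a percent or two of the true $\rho$; it is only a moment estimator, not the MLE studied in the sequel.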
We know (see Dzhaparidze (1986)) that the spectral density of the process $X_t$ is given by
$$f(\lambda) = \frac{\sigma^2}{2\pi}\,\frac{|g(z)|^2}{|h(z)|^2}, \qquad z = e^{i\lambda}.$$
Its covariance function $r(k)$ satisfies the recurrence equation
$$r(k) - \rho\, r(k-1) = 0, \qquad k = 2, 3, \ldots$$
Let $\theta = (\rho, \alpha, \sigma^2)$ be the unknown parameter of the ARMA(1,1) model in (1.1). The
maximum likelihood estimator (MLE) $(\tilde\rho_n, \tilde\alpha_n, \tilde\sigma_n^2)$ of $\theta$ satisfies
$$\int_{-\pi}^{\pi} I_n(\lambda)\,\frac{|h(z)|^2}{|g(z)|^4}\,\big[2\cos\lambda - 2\tilde\alpha_n\big]\,d\lambda = 0 \qquad (1.2)$$
and
$$\int_{-\pi}^{\pi} I_n(\lambda)\,\frac{1}{|g(z)|^2}\,\big[2\cos\lambda - 2\tilde\rho_n\big]\,d\lambda = 0, \qquad (1.3)$$
and
$$\tilde\sigma_n^2 = \int_{-\pi}^{\pi} I_n(\lambda)\,\frac{|1-\tilde\rho_n z|^2}{|1-\tilde\alpha_n z|^2}\,d\lambda, \qquad (1.4)$$
where
$$I_n(\lambda) = \frac{1}{2\pi n}\Big|\sum_{j=1}^{n} X_j e^{ij\lambda}\Big|^2$$
is the periodogram of $X_1, X_2, \ldots, X_n$. The limit of the Fisher information matrix is given by
$$\Gamma_\theta = \begin{pmatrix} \dfrac{1}{1-\rho^2} & -\dfrac{1}{1-\rho\alpha} & 0\\ -\dfrac{1}{1-\rho\alpha} & \dfrac{1}{1-\alpha^2} & 0\\ 0 & 0 & \dfrac{1}{2\sigma^4} \end{pmatrix}.$$
If we write $\theta = (\theta_1, \theta_2, \theta_3)$, the entry $(j,k)$ of $\Gamma_\theta$ corresponds to
$$\frac{1}{4\pi}\int_{-\pi}^{\pi} \frac{\partial \ln f(\lambda)}{\partial\theta_j}\,\frac{\partial \ln f(\lambda)}{\partial\theta_k}\,d\lambda.$$
For example, the value $-1/(1-\rho\alpha)$ in $\Gamma_\theta$ is obtained as
$$\frac{1}{4\pi}\int_{-\pi}^{\pi} \frac{\partial \ln f(\lambda)}{\partial\alpha}\,\frac{\partial \ln f(\lambda)}{\partial\rho}\,d\lambda,$$
using the change of variable $z = e^{i\lambda}$, taking $\gamma_1 = \{z : |z| = 1\}$, and applying the residue theorem. The covariance matrix of $(\tilde\rho_n, \tilde\alpha_n, \tilde\sigma_n^2)$ is given by
$$\Sigma_n = \frac{1}{n(\rho-\alpha)^2}\begin{pmatrix} (1-\rho^2)(1-\rho\alpha)^2 & (1-\rho^2)(1-\alpha^2)(1-\rho\alpha) & 0\\ (1-\rho^2)(1-\alpha^2)(1-\rho\alpha) & (1-\alpha^2)(1-\rho\alpha)^2 & 0\\ 0 & 0 & 2\sigma^4(\rho-\alpha)^2 \end{pmatrix} + o(1/n).$$
Also we know (see Dzhaparidze (1986)) that, as $n \to \infty$,
$$\sqrt n\begin{pmatrix}\tilde\rho_n - \rho\\ \tilde\alpha_n - \alpha\end{pmatrix} \Rightarrow N(0, \Sigma), \qquad (1.5)$$
where
$$\Sigma = \frac{1}{(\rho-\alpha)^2}\begin{pmatrix} (1-\rho^2)(1-\rho\alpha)^2 & (1-\rho^2)(1-\alpha^2)(1-\rho\alpha)\\ (1-\rho^2)(1-\alpha^2)(1-\rho\alpha) & (1-\alpha^2)(1-\rho\alpha)^2 \end{pmatrix}. \qquad (1.6)$$
If we approximate the vector $\sqrt n(\tilde\rho_n-\rho,\ \tilde\alpha_n-\alpha)$ by its asymptotic normal distribution with the above covariance matrix, we get after some calculations
$$E(\tilde\rho_n-\rho \mid \tilde\alpha_n-\alpha) = \frac{1-\rho^2}{1-\rho\alpha}\,(\tilde\alpha_n-\alpha) \qquad (1.7)$$
and
$$E\big[\mathrm{Var}\big(\sqrt n\,\tilde\rho_n \mid \tilde\alpha_n-\alpha\big)\big] = 1-\rho^2. \qquad (1.8)$$
The equality (1.7) suggests considering, when $\alpha = \alpha_0$, the restricted estimator of $\rho$
$$\hat\rho_n = \tilde\rho_n - \frac{1-\tilde\rho_n^2}{1-\tilde\rho_n\alpha_0}\,(\tilde\alpha_n-\alpha_0). \qquad (1.9)$$
Now, due to the uncertainty that $\alpha = \alpha_0$, we test $H_0: \alpha = \alpha_0$ versus $H_a: \alpha \neq \alpha_0$ based on the test statistic
$$L_n = \frac{n(\tilde\alpha_n-\alpha_0)^2\,(\tilde\rho_n-\alpha_0)^2}{(1-\alpha_0^2)(1-\tilde\rho_n\alpha_0)^2} = \frac{n(\tilde\alpha_n-\alpha_0)^2\,(\rho-\alpha_0)^2}{(1-\alpha_0^2)(1-\rho\alpha_0)^2} + o_P(1), \qquad (1.10)$$
which, under $H_0$ and given (1.5) and (1.6), has asymptotically a $\chi_1^2$ distribution. Next we define the preliminary test estimator (PTE) as
$$\hat\rho_n^{PT} = \hat\rho_n\, I_{\{L_n<\chi_1^2(\gamma)\}} + \tilde\rho_n\, I_{\{L_n\ge\chi_1^2(\gamma)\}} \qquad (1.11)$$
$$= \tilde\rho_n - (\tilde\rho_n-\hat\rho_n)\, I_{\{L_n<\chi_1^2(\gamma)\}} = \tilde\rho_n - \frac{1-\tilde\rho_n^2}{1-\tilde\rho_n\alpha_0}\,(\tilde\alpha_n-\alpha_0)\, I_{\{L_n<\chi_1^2(\gamma)\}} = \tilde\rho_n - \frac{1-\rho^2}{1-\rho\alpha_0}\,(\tilde\alpha_n-\alpha_0)\, I_{\{L_n<\chi_1^2(\gamma)\}} + o_P(1/\sqrt n).$$
Instead of making the extreme choice between $\hat\rho_n$ and $\tilde\rho_n$, as the PTE does, a nicer compromise is a smooth choice that depends on the value of $L_n$. This can be reached by considering
$$\hat\rho_n^S = \tilde\rho_n - (\tilde\rho_n-\hat\rho_n)\,\frac{c}{\sqrt{L_n}}, \quad\text{where } c \text{ is some constant,}$$
$$= \tilde\rho_n - \frac{c\,(1-\alpha_0^2)^{1/2}(1-\tilde\rho_n^2)}{\sqrt n\,|\tilde\rho_n-\alpha_0|}\cdot\frac{\tilde\alpha_n-\alpha_0}{|\tilde\alpha_n-\alpha_0|} \qquad (1.12)$$
$$= \tilde\rho_n - \frac{c\,(1-\alpha_0^2)^{1/2}(1-\rho^2)}{\sqrt n\,|\rho-\alpha_0|}\cdot\frac{\tilde\alpha_n-\alpha_0}{|\tilde\alpha_n-\alpha_0|} + o_P(1/\sqrt n).$$
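To make the definitions concrete, here is a minimal sketch (not from the paper) that computes the restricted, preliminary test and shrinkage estimates, following (1.9)-(1.12), from given unrestricted estimates; the numeric inputs below are hypothetical values:

```python
import math
from scipy.stats import chi2

def estimators(rho_t, alpha_t, n, alpha0, gamma=0.05, c=math.sqrt(2 / math.pi)):
    """Restricted, PT and shrinkage estimators built from the unrestricted
    estimates (rho_t, alpha_t) of an ARMA(1,1) fit on n observations."""
    # restricted estimator (1.9)
    rho_r = rho_t - (1 - rho_t**2) / (1 - rho_t * alpha0) * (alpha_t - alpha0)
    # test statistic (1.10)
    L = (n * (alpha_t - alpha0)**2 * (rho_t - alpha0)**2
         / ((1 - alpha0**2) * (1 - rho_t * alpha0)**2))
    crit = chi2.ppf(1 - gamma, df=1)
    # preliminary test estimator (1.11): all-or-nothing choice
    rho_pt = rho_r if L < crit else rho_t
    # shrinkage estimator (1.12): smooth compromise weighted by c / sqrt(L)
    rho_s = rho_t - (rho_t - rho_r) * c / math.sqrt(L) if L > 0 else rho_r
    return rho_r, rho_pt, rho_s, L

r_r, r_pt, r_s, L = estimators(0.70, 0.32, n=400, alpha0=0.30)
```

When $\tilde\alpha_n$ equals $\alpha_0$ all three coincide with $\tilde\rho_n$; when $L_n$ is large the PTE falls back to the unrestricted estimate, as (1.11) prescribes.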
2 Bias and MSE under Contiguous Alternatives

We consider the sequence of alternatives $H_n: \alpha_n = \alpha_0 + \delta/\sqrt n$, where $\delta \neq 0$. We give two theorems: one for the bias and the other for the MSE.

Theorem 2.1. We have:

i) $$\sqrt n\,(\tilde\alpha_n-\alpha_0) \Rightarrow N\!\left(\delta,\ \frac{(1-\alpha_0^2)(1-\rho\alpha_0)^2}{(\rho-\alpha_0)^2}\right), \qquad (2.1)$$
and $\lim E[\sqrt n(\tilde\rho_n-\rho)] = 0$.

ii) $$\lim E[\sqrt n(\hat\rho_n-\rho)] = \lim E[\sqrt n(\tilde\rho_n-\rho)] - \frac{1-\rho^2}{1-\rho\alpha_0}\,\lim E[\sqrt n(\tilde\alpha_n-\alpha_0)] = 0 - \frac{1-\rho^2}{1-\rho\alpha_0}\,\delta = -C(\rho,\alpha_0)\,\Delta,$$
where we put
$$C(\rho,\alpha_0) = \frac{(1-\rho^2)(1-\alpha_0^2)^{1/2}}{\rho-\alpha_0} \qquad\text{and}\qquad \Delta = \frac{\delta\,(\rho-\alpha_0)}{(1-\alpha_0^2)^{1/2}(1-\rho\alpha_0)}.$$

iii) $\lim E[\sqrt n(\hat\rho_n^{PT}-\rho)] = -C(\rho,\alpha_0)\,\Delta\,H_3(\chi_1^2(\gamma),\Delta^2)$, where $H_m(x,\Delta^2)$ denotes the cdf at $x$ of a noncentral $\chi^2$ distribution with $m$ degrees of freedom and noncentrality parameter $\Delta^2$, and $\chi_m^2(\gamma)$ is the level-$\gamma$ critical value under $H_m(x,0)$.

iv) $\lim E[\sqrt n(\hat\rho_n^S-\rho)] = -\dfrac{c\,(1-\alpha_0^2)^{1/2}(1-\rho^2)}{\rho-\alpha_0}\,[2\Phi(\Delta)-1] = -c\,C(\rho,\alpha_0)\,[2\Phi(\Delta)-1]$, where $\Phi$ is the cdf of the standard normal distribution.

Theorem 2.2. We have:

i) $\lim E[n(\tilde\rho_n-\rho)^2] = \dfrac{(1-\rho^2)(1-\rho\alpha_0)^2}{(\rho-\alpha_0)^2}$, and hence
$$\mathrm{AVar}(\sqrt n\,\tilde\alpha_n) = \frac{(1-\alpha_0^2)(1-\rho\alpha_0)^2}{(\rho-\alpha_0)^2} = \frac{1-\alpha_0^2}{1-\rho^2}\,\mathrm{AVar}(\sqrt n\,\tilde\rho_n), \qquad (2.2)$$
where, for a random variable $Y_n$, $\mathrm{AVar}(Y_n)$ represents the asymptotic variance.
ii) $\lim E[n(\hat\rho_n-\rho)^2] = (1-\rho^2) + C^2(\rho,\alpha_0)\,\Delta^2$.

iii) $$\lim E[n(\hat\rho_n^{PT}-\rho)^2] = \mathrm{AVar}(\sqrt n\,\tilde\rho_n) - K(\rho,\alpha_0)\Big[H_3(\chi_1^2(\gamma),\Delta^2) - \Delta^2\big\{2H_3(\chi_1^2(\gamma),\Delta^2) - H_5(\chi_1^2(\gamma),\Delta^2)\big\}\Big],$$
where
$$K(\rho,\alpha_0) = \frac{(1-\rho^2)^2(1-\alpha_0^2)}{(\rho-\alpha_0)^2}.$$

iv) $\lim E[n(\hat\rho_n^S-\rho)^2] = \mathrm{AVar}(\sqrt n\,\tilde\rho_n) + \dfrac{2}{\pi}\,K(\rho,\alpha_0)\Big[1 - 2e^{-\Delta^2/2}\Big]$.

Proof (Theorem 2.1). Of course (2.1) is straightforward from (1.5), and the rest of i) follows. ii) follows from (1.7) and (2.1). iii) We use the following result (see Judge and Bock (1978) and Saleh (2006)): if $Z \sim N(\Delta,1)$, then
$$E[Z\,f(Z^2)] = \Delta\,E\big[f(\chi_3^2(\Delta^2))\big] \qquad (2.3)$$
and
$$E[Z^2 f(Z^2)] = E\big[f(\chi_3^2(\Delta^2))\big] + \Delta^2\,E\big[f(\chi_5^2(\Delta^2))\big]. \qquad (2.4)$$
We will use these equalities with the function $f(z) = I_{\{z<\chi_1^2(\gamma)\}}$ and, referring to (2.1),
$$Z = \frac{(\rho-\alpha_0)\,\sqrt n\,(\tilde\alpha_n-\alpha_0)}{(1-\alpha_0^2)^{1/2}(1-\rho\alpha_0)} = \big[\mathrm{AVar}(\sqrt n\,\tilde\alpha_n)\big]^{-1/2}\,\sqrt n\,(\tilde\alpha_n-\alpha_0). \qquad (2.5)$$
That is, asymptotically $Z \sim N(\Delta,1)$. We obtain
$$\lim E[\sqrt n(\hat\rho_n^{PT}-\rho)] = -\frac{1-\rho^2}{1-\rho\alpha_0}\,\lim E\big[\sqrt n(\tilde\alpha_n-\alpha_0)\,I_{\{L_n<\chi_1^2(\gamma)\}}\big] = -\frac{1-\rho^2}{1-\rho\alpha_0}\cdot\frac{(1-\alpha_0^2)^{1/2}(1-\rho\alpha_0)}{\rho-\alpha_0}\,\Delta\,H_3(\chi_1^2(\gamma),\Delta^2) = -C(\rho,\alpha_0)\,\Delta\,H_3(\chi_1^2(\gamma),\Delta^2).$$
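The Judge-Bock moment identities used above are easy to confirm by simulation; this sketch (with $\Delta = 1$ and $\gamma = 0.05$ as arbitrary choices) checks both against SciPy's noncentral $\chi^2$ cdf:

```python
import numpy as np
from scipy.stats import chi2, ncx2

rng = np.random.default_rng(1)
delta, crit = 1.0, chi2.ppf(0.95, df=1)     # Delta and chi^2_1(0.05)
Z = rng.standard_normal(2_000_000) + delta  # Z ~ N(Delta, 1)
I = (Z**2 < crit).astype(float)

# First identity: E[Z 1{Z^2 < c}] = Delta * H_3(c, Delta^2)
lhs1, rhs1 = (Z * I).mean(), delta * ncx2.cdf(crit, 3, delta**2)
# Second identity: E[Z^2 1{Z^2 < c}] = H_3(c, Delta^2) + Delta^2 * H_5(c, Delta^2)
lhs2 = (Z**2 * I).mean()
rhs2 = ncx2.cdf(crit, 3, delta**2) + delta**2 * ncx2.cdf(crit, 5, delta**2)
print(lhs1, rhs1)
print(lhs2, rhs2)
```

With two million draws the Monte Carlo error is of order $10^{-3}$, so agreement to two or three decimals is expected.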
iv) Let $U$ be a standard normal random variable. Then iv) follows from
$$E\left[\frac{\tilde\alpha_n-\alpha_0}{|\tilde\alpha_n-\alpha_0|}\right] = P\big[\sqrt n(\tilde\alpha_n-\alpha_0) > 0\big] - P\big[\sqrt n(\tilde\alpha_n-\alpha_0) < 0\big] = P(U > -\Delta) - P(U < -\Delta) = 2\Phi(\Delta)-1.$$

Proof (Theorem 2.2). i) can be shown from (1.6), since $\alpha_n \to \alpha_0$. ii) We have $\lim E[n(\hat\rho_n-\rho)^2] = \lim \mathrm{Var}[\sqrt n(\hat\rho_n-\rho)] + \big(\lim E[\sqrt n(\hat\rho_n-\rho)]\big)^2$. Since $\lim E[\sqrt n(\hat\rho_n-\rho)]$ is given by ii) of Theorem 2.1, we only need to show that $\lim \mathrm{Var}[\sqrt n(\hat\rho_n-\rho)] = 1-\rho^2$:
$$\lim \mathrm{Var}[\sqrt n(\hat\rho_n-\rho)] = \lim \mathrm{Var}\left[\sqrt n(\tilde\rho_n-\rho) - \frac{1-\tilde\rho_n^2}{1-\tilde\rho_n\alpha_0}\,\sqrt n(\tilde\alpha_n-\alpha_0)\right]$$
$$= \frac{(1-\rho^2)(1-\rho\alpha_0)^2}{(\rho-\alpha_0)^2} - 2\,\frac{1-\rho^2}{1-\rho\alpha_0}\cdot\frac{(1-\rho^2)(1-\alpha_0^2)(1-\rho\alpha_0)}{(\rho-\alpha_0)^2} + \left(\frac{1-\rho^2}{1-\rho\alpha_0}\right)^2\frac{(1-\alpha_0^2)(1-\rho\alpha_0)^2}{(\rho-\alpha_0)^2} = 1-\rho^2,$$
where the last equality uses the identity $(1-\rho\alpha_0)^2 - (1-\rho^2)(1-\alpha_0^2) = (\rho-\alpha_0)^2$.

iii) Using (2.4), (2.3) and (1.7), together with some conditional expectation arguments, we get
$$\lim E[n(\hat\rho_n^{PT}-\rho)^2] = \lim E\left[\left(\sqrt n(\tilde\rho_n-\rho) - \frac{1-\rho^2}{1-\rho\alpha_0}\,\sqrt n(\tilde\alpha_n-\alpha_0)\,I_{\{L_n<\chi_1^2(\gamma)\}}\right)^2\right]$$
$$= \mathrm{AVar}(\sqrt n\,\tilde\rho_n) - \left(\frac{1-\rho^2}{1-\rho\alpha_0}\right)^2 \mathrm{AVar}(\sqrt n\,\tilde\alpha_n)\,\Big[H_3(\chi_1^2(\gamma),\Delta^2) - \Delta^2\big\{2H_3(\chi_1^2(\gamma),\Delta^2) - H_5(\chi_1^2(\gamma),\Delta^2)\big\}\Big].$$
Replacing $\mathrm{AVar}(\sqrt n\,\tilde\alpha_n)$ by its value in (2.2), we get the right-hand side of iii).

iv) We will use the fact that if $Z \sim N(\Delta,1)$, then
$$E|Z| = \sqrt{2/\pi}\,e^{-\Delta^2/2} + \Delta\,[2\Phi(\Delta)-1]. \qquad (2.6)$$
From (1.12), we can write (in a similar way to the calculations in iii), applying (2.6) with $Z$ as in (2.5); the terms involving $\Phi(\Delta)$ cancel because $[\mathrm{AVar}(\sqrt n\,\tilde\alpha_n)]^{1/2}\,\Delta = \delta$):
$$\lim E[n(\hat\rho_n^S-\rho)^2] = \lim E\left[\left(\sqrt n(\tilde\rho_n-\rho) - \frac{c\,(1-\alpha_0^2)^{1/2}(1-\tilde\rho_n^2)}{|\tilde\rho_n-\alpha_0|}\cdot\frac{\tilde\alpha_n-\alpha_0}{|\tilde\alpha_n-\alpha_0|}\right)^2\right] = \mathrm{AVar}(\sqrt n\,\tilde\rho_n) + K(\rho,\alpha_0)\Big[c^2 - 2c\,\sqrt{2/\pi}\,e^{-\Delta^2/2}\Big].$$
The value of $c$ that minimizes this quantity is $c = \sqrt{2/\pi}\,e^{-\Delta^2/2}$. Making $c$ independent of $\Delta$, we choose $c = \sqrt{2/\pi}$. Plugging this optimal value into the previous equality, we get the minimal value of the MSE,
$$\lim E[n(\hat\rho_n^S-\rho)^2] = \mathrm{AVar}(\sqrt n\,\tilde\rho_n) + \frac{2}{\pi}\,K(\rho,\alpha_0)\Big[1 - 2e^{-\Delta^2/2}\Big],$$
which is the right-hand side of iv).

3 MSE under a Fixed Alternative Hypothesis

In this section we show that under the fixed alternative
$$\alpha = \alpha_0 + \delta \qquad (3.1)$$
there is no gain in terms of MSE in considering the PTE.

Theorem 3.1. Under (3.1) we have $\lim E\big[n(\hat\rho_n^{PT}-\tilde\rho_n)^2\big] = 0$.

Proof. We have
$$E\big[n(\hat\rho_n^{PT}-\tilde\rho_n)^2\big] = E\left[\frac{(1-\tilde\rho_n^2)^2}{(1-\tilde\rho_n\alpha_0)^2}\,n(\tilde\alpha_n-\alpha_0)^2\,I_{\{L_n<\chi_1^2(\gamma)\}}\right] = K(\rho,\alpha_0)\,E\big[L_n\,I_{\{L_n<\chi_1^2(\gamma)\}}\big] + o(1),$$
using (1.10) and the identity $\dfrac{(1-\rho^2)^2}{(1-\rho\alpha_0)^2}\cdot\dfrac{(1-\alpha_0^2)(1-\rho\alpha_0)^2}{(\rho-\alpha_0)^2} = K(\rho,\alpha_0)$.
Thus we need to show that
$$\lim E\big[L_n\,I_{\{L_n<\chi_1^2(\gamma)\}}\big] = 0. \qquad (3.2)$$
Under the fixed alternative hypothesis (3.1), $L_n = Z^2$ where, asymptotically, $Z \sim N(\Delta,1)$ with
$$\Delta^2 = \frac{n\,\delta^2\,(\rho-\alpha_0)^2}{(1-\alpha_0^2)(1-\rho\alpha_0)^2}.$$
Now, using (2.4) and writing $\Delta^2 = na$ and $\chi_1^2(\gamma) = b$, the expectation in (3.2) can be written as
$$P\big(\chi_3^2(na) \le b\big) + na\,P\big(\chi_5^2(na) \le b\big). \qquad (3.3)$$
With $U \sim N(0,1)$, each probability in (3.3) is clearly bounded by
$$P\big[(U+\sqrt{na})^2 \le b\big] = P\big[-\sqrt b \le U+\sqrt{na} \le \sqrt b\big] \le P\big[U \ge \sqrt{na}-\sqrt b\big] \sim \frac{1}{\sqrt{2\pi}\,(\sqrt{na}-\sqrt b)}\,e^{-(\sqrt{na}-\sqrt b)^2/2}$$
as $n \to \infty$. The last equivalence is well known and may be shown via integration by parts. Since this bound decays exponentially in $n$, both terms of (3.3) tend to $0$ (the factor $na$ in the second term cannot compensate the exponential decay). Therefore (3.2) is established.

We mention that this is not the case for the smoothed version $\hat\rho_n^S$ or for the restricted estimator $\hat\rho_n$. Actually, we can see that
$$\lim E\big[n(\hat\rho_n^S-\tilde\rho_n)^2\big] = c^2\,\frac{(1-\alpha_0^2)(1-\rho^2)^2}{(\rho-\alpha_0)^2}.$$

4 Comparative Study

4.1 Comparing Quadratic Bias Functions

We note from Theorem 2.1 that the usual MLE $\tilde\rho_n$ is asymptotically unbiased, while the alternative estimators $\hat\rho_n$, $\hat\rho_n^{PT}$ and $\hat\rho_n^S$ have the following quadratic bias functions:

i) $B(\hat\rho_n) = C^2(\rho,\alpha_0)\,\Delta^2$;

ii) $B(\hat\rho_n^{PT}) = C^2(\rho,\alpha_0)\,\Delta^2\,H_3^2(\chi_1^2(\gamma),\Delta^2)$;

iii) $B(\hat\rho_n^S) = c^2\,C^2(\rho,\alpha_0)\,[2\Phi(\Delta)-1]^2$.

We note that for $\Delta = 0$,
$$B(\tilde\rho_n) = B(\hat\rho_n) = B(\hat\rho_n^{PT}) = B(\hat\rho_n^S) = 0,$$
i.e. all the estimators are asymptotically unbiased at $\Delta = 0$. Also, as $\Delta \to \infty$, we get $B(\hat\rho_n^{PT}) \to 0$ while $B(\hat\rho_n^S) \to \frac{2}{\pi}K(\rho,\alpha_0)$, if we keep the value $\sqrt{2/\pi}$ for $c$. Hence, as $\Delta \to \infty$,
$$0 = \lim B(\hat\rho_n^{PT}) < \lim B(\hat\rho_n^S) < \lim B(\hat\rho_n) = \infty,$$
and for any given $\Delta$ we always have $B(\hat\rho_n^{PT}) < B(\hat\rho_n)$, but the other comparisons depend on both $\Delta$ and the level $\gamma$. Clearly we observe that, in terms of smallest bias, the PTE is better than the restricted estimator, and the shrinkage estimator can give good results if we decide to choose the constant $c$ very small. However, the usual MLE remains the best one, since it is asymptotically unbiased. The graphical presentation in Figure 1 shows these properties.

Figure 1: Graph of the asymptotic quadratic bias of the three estimators.

4.2 Asymptotic Relative Efficiency (ARE)

In this section we compare the proposed estimators to the MLE, which we call here the unrestricted estimator (UE). This comparison will be based on the ADMSE and the so-called asymptotic relative efficiency (ARE).
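Before turning to the MSE-based comparisons, the quadratic bias ordering above can be checked numerically; in this sketch (not from the paper) the common factor $C^2(\rho,\alpha_0)$ is divided out, and $\gamma = 0.05$ and $c = \sqrt{2/\pi}$ are assumed:

```python
import math
from scipy.stats import chi2, ncx2

crit = chi2.ppf(0.95, df=1)    # chi^2_1(gamma) with gamma = 0.05
c = math.sqrt(2 / math.pi)     # the optimal shrinkage constant

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Quadratic biases with the common factor C^2(rho, alpha_0) divided out.
def B_RE(D):                   # restricted estimator
    return D**2

def B_PT(D):                   # preliminary test estimator
    return 0.0 if D == 0 else D**2 * ncx2.cdf(crit, 3, D**2)**2

def B_S(D):                    # shrinkage estimator
    return c**2 * (2 * Phi(D) - 1)**2

for D in (0.0, 1.0, 3.0):
    print(D, B_RE(D), B_PT(D), B_S(D))
```

At $\Delta = 0$ all three vanish; for large $\Delta$ the PTE bias decays to zero, the shrinkage bias levels off at $c^2 = 2/\pi$, and the restricted bias grows without bound, matching the displayed ordering.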
Comparing the PTE Against the UE: Denote by $\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n)$ the ARE of $\hat\rho_n^{PT}$ with respect to $\tilde\rho_n$, that is, the quotient of their reciprocal MSE's. So we have
$$\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n) = \frac{\mathrm{MSE}(\sqrt n\,\tilde\rho_n)}{\mathrm{MSE}(\sqrt n\,\hat\rho_n^{PT})} = \Big[1 - K_1(\rho,\alpha_0)\,\big\{H_3 - \Delta^2(2H_3-H_5)\big\}\Big]^{-1},$$
where
$$K_1(\rho,\alpha_0) = \frac{K(\rho,\alpha_0)}{\mathrm{AVar}(\sqrt n\,\tilde\rho_n)} = \frac{(1-\rho^2)(1-\alpha_0^2)}{(1-\rho\alpha_0)^2},$$
and, in order to make notation easier, we omit the arguments $(\chi_1^2(\gamma),\Delta^2)$ of the functions $H_3$ and $H_5$. From this equation we conclude the following. The graph of $\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n)$, as a function of $\Delta^2$ for fixed $\gamma$, is decreasing, crossing the line $\mathrm{ARE} = 1$ down to a minimum at $\Delta^2 = \Delta^2_{\min}(\gamma)$; it then increases towards the line $\mathrm{ARE} = 1$ as $\Delta^2 \to \infty$. Thus, if $\Delta^2 \le H_3/(2H_3-H_5)$ then $\hat\rho_n^{PT}$ is better than $\tilde\rho_n$, while if $\Delta^2 \ge H_3/(2H_3-H_5)$ then $\tilde\rho_n$ is better than $\hat\rho_n^{PT}$. The maximum of $\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n)$, attained at $\Delta^2 = 0$, is
$$\max_{\Delta^2}\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n) = \big[1 - K_1(\rho,\alpha_0)\,H_3^0\big]^{-1},$$
where $H_3^0 = H_3(\chi_1^2(\gamma),0)$, for all $\gamma \in A$, the set of all possible values of $\gamma$. The value of $\max\mathrm{ARE}$ decreases as $\gamma$ increases. On the other hand, for $\gamma = 0$ the PTE reduces to the restricted estimator, and its ARE curve crosses the line $\mathrm{ARE} = 1$ at $\Delta^2 = 1$. In general, the curves of $\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n)$ for two levels $\gamma_1$ and $\gamma_2$ intersect; the value of $\Delta^2$ at the intersection increases as $\gamma$ increases, and at such an intersection the ARE's lie below the line $\mathrm{ARE} = 1$.

In order to obtain an optimum level of significance $\gamma$ for the application of the PTE, we prefix a minimum guaranteed efficiency, say $E_0$, and follow the following procedure:

i) If $\Delta^2 \le H_3/(2H_3-H_5)$, one would use $\hat\rho_n^{PT}$, because $\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n) \ge 1$ in this region of $\Delta^2$. However, $\Delta^2$ is generally unknown, so there is no way to choose a uniformly best estimator; instead, we look for the values of $\gamma$ in the set $A_\gamma = \{\gamma : \min_{\Delta^2}\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n) \ge E_0\}$.

ii) The PT estimator chosen maximizes $\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n)$ over all $\gamma \in A_\gamma$ and $\Delta^2$. Thus we solve the equation
$$\min_{\Delta^2}\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n) = \mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n)\big(\gamma^*, \Delta^2_{\min}\big) = E_0.$$
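As a numerical sketch of this efficiency formula (SciPy's central and noncentral $\chi^2$ cdfs supply $H_3$ and $H_5$; the values $K_1 = 1$ and $\gamma = 0.05$ below are illustrative choices), the tabulated maximum of 3.58 is reproduced:

```python
from scipy.stats import chi2, ncx2

def are_pte(D2, gamma, K1):
    """ARE(PTE : UE) = [1 - K1*(H3 - D2*(2*H3 - H5))]^{-1},
    with H_m = H_m(chi^2_1(gamma), D2)."""
    crit = chi2.ppf(1 - gamma, df=1)
    H3 = chi2.cdf(crit, 3) if D2 == 0 else ncx2.cdf(crit, 3, D2)
    H5 = chi2.cdf(crit, 5) if D2 == 0 else ncx2.cdf(crit, 5, D2)
    return 1.0 / (1.0 - K1 * (H3 - D2 * (2 * H3 - H5)))

print(round(are_pte(0.0, 0.05, 1.0), 2))   # 3.58: the maximum for K1 = 1
print(are_pte(10.0, 0.05, 1.0))            # below 1: far from the null, the UE wins
```

The second value is guaranteed to lie below 1 because $H_3 \ge H_5$ pointwise, so $H_3 - \Delta^2(2H_3 - H_5)$ is negative for large $\Delta^2$.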
The solution $\gamma^*$ obtained this way gives a PTE with minimum guaranteed efficiency $E_0$, which may increase to $\max\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n)$ at $\Delta^2 = 0$. The sample table in Table 1 gives the minimum and maximum $\mathrm{ARE}(\hat\rho_n^{PT} : \tilde\rho_n)$ for chosen values of $K_1$ over $\gamma = 0.05(0.05)0.50$. To illustrate the use of the table, let us set $E_0 = 0.8$ as the minimum guaranteed ARE for $K_1 = 0.5$. Then we look for 0.8, or a value near it, in the column $K_1 = 0.5$ and find $\gamma = 0.20$. Hence, using a PTE with $\gamma = 0.20$, we obtain a minimum guaranteed ARE of 0.80 with a possible maximum ARE of 1.21 if $\Delta^2$ is close to 0.
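The worked example can be checked by direct numerical minimization over $\Delta^2$ (a sketch; the ARE function is restated so that the snippet is self-contained, and the search interval is an assumption):

```python
from scipy.optimize import minimize_scalar
from scipy.stats import chi2, ncx2

def are_pte(D2, gamma, K1):
    """ARE(PTE : UE) as a function of the noncentrality D2 = Delta^2."""
    crit = chi2.ppf(1 - gamma, df=1)
    H3, H5 = ncx2.cdf(crit, 3, D2), ncx2.cdf(crit, 5, D2)
    return 1.0 / (1.0 - K1 * (H3 - D2 * (2 * H3 - H5)))

# Worked example from the text: gamma = 0.20 and K1 = 0.5.
res = minimize_scalar(lambda d2: are_pte(d2, 0.20, 0.5),
                      bounds=(1e-9, 30.0), method="bounded")
print(round(res.fun, 2), round(res.x, 2))  # minimum guaranteed ARE (about 0.80) and Delta^2_min
```

The minimizer `res.x` is the quantity $\Delta^2_{\min}(\gamma)$ tabulated in the last line of each block of Table 1; it does not depend on $K_1$, since $K_1$ only scales the bracketed term.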
Table 1: Maximum and minimum ARE of the PTE for selected $K_1$, $\gamma$ values. The last line of each block is $\Delta^2_{\min}(\gamma)$, the value of $\Delta^2$ at which the minimum is attained.

K_1:       0.10  0.20  0.30  0.40  0.50  0.60  0.70  0.80  0.90  1.00

gamma = 0.05
ARE max:   1.08  1.17  1.28  1.41  1.56  1.76  2.02  2.36  2.85  3.58
ARE min:   0.87  0.77  0.69  0.63  0.58  0.53  0.49  0.46  0.43  0.4
Delta^2_min: 4.84

gamma = 0.10
ARE max:   1.06  1.13  1.20  1.29  1.39  1.51  1.65  1.81  2.02  2.28
ARE min:   0.9   0.84  0.78  0.7   0.68  0.64  0.60  0.57  0.54  0.5
Delta^2_min: 4.00

gamma = 0.15
ARE max:   1.05  1.10  1.15  1.22  1.28  1.36  1.45  1.55  1.66  1.79
ARE min:   0.94  0.88  0.83  0.79  0.75  0.7   0.68  0.65  0.6   0.59
Delta^2_min: 3.6

gamma = 0.20
ARE max:   1.04  1.08  1.12  1.16  1.21  1.27  1.32  1.39  1.46  1.54
ARE min:   0.95  0.9   0.87  0.83  0.80  0.77  0.74  0.7   0.69  0.66
Delta^2_min: 3.4

gamma = 0.25
ARE max:   1.03  1.06  1.09  1.12  1.16  1.20  1.24  1.28  1.33  1.38
ARE min:   0.96  0.93  0.90  0.87  0.84  0.8   0.79  0.77  0.75  0.7
Delta^2_min: 3.4

gamma = 0.30
ARE max:   1.02  1.05  1.07  1.09  1.12  1.15  1.18  1.21  1.24  1.28
ARE min:   0.97  0.95  0.9   0.90  0.87  0.85  0.83  0.8   0.79  0.78
Delta^2_min: 2.89

gamma = 0.35
ARE max:   1.02  1.03  1.05  1.07  1.09  1.11  1.13  1.16  1.18  1.20
ARE min:   0.98  0.96  0.94  0.9   0.90  0.89  0.87  0.85  0.84  0.8
Delta^2_min: 2.89

gamma = 0.40
ARE max:   1.01  1.03  1.04  1.05  1.07  1.08  1.10  1.11  1.13  1.15
ARE min:   0.98  0.97  0.95  0.94  0.93  0.9   0.90  0.89  0.87  0.86
Delta^2_min: 2.89

gamma = 0.45
ARE max:   1.01  1.02  1.03  1.04  1.05  1.06  1.07  1.08  1.10  1.11
ARE min:   0.99  0.98  0.97  0.95  0.94  0.93  0.92  0.91  0.90  0.89
Delta^2_min: 2.89

gamma = 0.50
ARE max:   1.01  1.01  1.02  1.03  1.04  1.04  1.05  1.06  1.07  1.08
ARE min:   0.99  0.98  0.97  0.97  0.96  0.95  0.94  0.94  0.93  0.92
Delta^2_min: 2.56
Comparing the Shrinkage Estimator Against the UE: We have
$$\mathrm{ARE}(\hat\rho_n^S : \tilde\rho_n) = \left[1 + \frac{2}{\pi}\,\frac{(1-\rho^2)(1-\alpha_0^2)}{(1-\rho\alpha_0)^2}\,\Big(1 - 2e^{-\Delta^2/2}\Big)\right]^{-1} = \left[1 + \frac{2}{\pi}\,K_1(\rho,\alpha_0)\,\Big(1 - 2e^{-\Delta^2/2}\Big)\right]^{-1}.$$
Following a discussion similar to that of the relative efficiency of the PTE, we can see from this equation that $\hat\rho_n^S$ is better than $\tilde\rho_n$ for $\Delta^2 \le \ln 4$, and $\tilde\rho_n$ is better than $\hat\rho_n^S$ for $\Delta^2 \ge \ln 4$. Also, as $\Delta^2 \to \infty$,
$$\mathrm{ARE}(\hat\rho_n^S : \tilde\rho_n) \to \left[1 + \frac{2}{\pi}\,K_1(\rho,\alpha_0)\right]^{-1},$$
which is the lower bound of $\mathrm{ARE}(\hat\rho_n^S : \tilde\rho_n)$. The maximum of $\mathrm{ARE}(\hat\rho_n^S : \tilde\rho_n)$, attained at $\Delta^2 = 0$, is $\big[1 - \frac{2}{\pi}K_1(\rho,\alpha_0)\big]^{-1}$, which stays below the maximum $\big[1 - K_1(\rho,\alpha_0)\,H_3(\chi_1^2(\gamma),0)\big]^{-1}$ of the PTE whenever $H_3(\chi_1^2(\gamma),0) \ge 2/\pi$.

Comparing the Restricted Estimator (RE) Against the UE: From ii) of Theorem 2.2, we can rewrite the asymptotic MSE of $\sqrt n\,\hat\rho_n$ as
$$(1-\rho^2)\left[1 + \frac{(1-\rho^2)(1-\alpha_0^2)}{(\rho-\alpha_0)^2}\,\Delta^2\right],$$
and we can write the MSE of $\sqrt n\,\tilde\rho_n$ as
$$\frac{(1-\rho^2)(1-\rho\alpha_0)^2}{(\rho-\alpha_0)^2} = (1-\rho^2)\left[1 + \frac{(1-\rho^2)(1-\alpha_0^2)}{(\rho-\alpha_0)^2}\right],$$
and therefore we can conclude that if $\Delta^2 \le 1$ then the RE is better than the UE, while if $\Delta^2 \ge 1$ then the UE is better than the RE. The graphs of the ARE's in Figure 2 depict the ARE properties of the estimators.
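Since the shrinkage ARE involves only elementary functions, its claimed properties are easy to verify directly (the value $K_1 = 0.5$ is an arbitrary illustrative choice):

```python
import math

def are_shrink(D2, K1):
    """ARE(S : UE) = [1 + (2/pi)*K1*(1 - 2*exp(-D2/2))]^{-1}."""
    return 1.0 / (1.0 + (2.0 / math.pi) * K1 * (1.0 - 2.0 * math.exp(-D2 / 2.0)))

K1 = 0.5
print(are_shrink(0.0, K1))           # maximum, [1 - (2/pi)K1]^{-1} > 1
print(are_shrink(math.log(4), K1))   # crossover: exactly 1
print(are_shrink(50.0, K1))          # approaches the lower bound [1 + (2/pi)K1]^{-1}
```

The crossover at $\Delta^2 = \ln 4$ is exact, since $e^{-\ln 4/2} = 1/2$ makes the bracketed term vanish.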
Figure 2: Graph of the ARE of the shrinkage estimator for selected values of $\gamma$, $\alpha$ and $K_1$.

Acknowledgments

The author is grateful to Professor A. K. Md. E. Saleh for his invaluable help and his insightful suggestions about this paper.

References

[1] Bancroft, T.A. (1944). On biases in estimation due to the use of preliminary tests of significance. Ann. Math. Statist. 15, 190-204.

[2] Dzhaparidze, K. (1986). Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series. Springer-Verlag.

[3] Efron, B. and Morris, C. (1972). Limiting the risk of Bayes and empirical Bayes estimators. Part II: The empirical Bayes case. J. Amer. Statist. Assoc. 67, 130-139.
[4] Efron, B. and Morris, C. (1973). Stein's estimation rule and its competitors. An empirical Bayes approach. J. Amer. Statist. Assoc. 68, 117-130.

[5] Judge, G.G. and Bock, M.E. (1978). The Statistical Implications of Pre-test and Stein-rule Estimation in Econometrics. North-Holland, New York.

[6] Saleh, A.K.Md.E. (1992). On shrinkage estimation of the parameters of an autoregressive Gaussian process. Theory of Probab. and its Appl., 50-60.

[7] Saleh, A.K.Md.E. (2006). Theory of Preliminary Test and Stein-Type Estimation with Applications. Wiley-Interscience, John Wiley & Sons, New York.

[8] Sclove, S.L., Morris, C. and Radhakrishnan, R. (1972). Non-optimality of preliminary-test estimators for the mean of a multivariate normal distribution. Ann. Math. Statist. 43, 1481-1490.

[9] Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symposium on Math. Statist. and Probab., Univ. of California Press, Berkeley, 197-206.

[10] Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9, 1135-1151.