Merging Gini's Indices under Quadratic Loss


S. E. Ahmed, A. Hussein
Department of Mathematics and Statistics, University of Windsor
Windsor, Ontario, Canada N9B 3P4
ahmed@uwindsor.ca

M. N. Goria
Università di Trento, Trento, Italy

February 11

Abstract

The Gini index is perhaps one of the most widely used indicators of economic and social condition. This article develops simultaneous estimation strategies for Gini indices when samples are taken from several sources, and considers a basis for optimally combining the various data sets. In a multi-sample scenario, we demonstrate that a shrinkage-type estimator has, under quadratic loss, a superior risk performance relative to the conventional estimators. Asymptotic distributional quadratic biases and risks of the proposed estimators are derived and compared with those of benchmark estimators.

Key Words and Phrases: Gini index, Gini mean difference, restricted estimation, shrinkage-type estimation, local alternatives, quadratic loss function, asymptotic distributional quadratic risk, retrospective sampling.

1 INTRODUCTION

The Gini index of income or resource inequality, proposed by Gini in 1912, is a measure of the degree to which a population shares that resource unequally. It is based on the statistical notion known in the literature as the mean difference of a population. Research on the Gini index has developed remarkably and been extended in various directions, as evidenced by the bibliographies of Xu (2004) and Yitzhaki (1998). Further, a substantial literature has been devoted to the construction of indices of economic inequality that are consistent with axiomatic systems of fairness. We refer to Anand (1983), Chakravarty (1990), Sen (1997) and Atkinson and Bourguignon (2000) for comprehensive surveys on measures of inequality, including the Gini index.

In order to define Gini's mean difference, suppose that $X_1, \ldots, X_n$ are independent and identically distributed (iid) with cumulative distribution function (cdf) $F$. We shall assume that $F$, instead of being restricted to a

parametric family, is completely unknown, subject only to some very general conditions such as continuity or the existence of moments. The parameter $\theta = \theta(F)$ to be estimated is a real-valued function defined over this nonparametric class $\mathcal{F}$. The population Gini index is defined as
$$\gamma = \frac{\Delta}{2\mu}, \qquad (1)$$
where $\mu = E(X_1)$ and $\Delta$ is the population Gini mean difference, defined as
$$\Delta = E|X_2 - X_1|. \qquad (2)$$
The estimated Gini index is defined as
$$\hat{\gamma} = \frac{\hat{\Delta}}{2\hat{\mu}}, \qquad (3)$$
where $\hat{\mu}$ and $\hat{\Delta}$ are consistent estimators of $\mu$ and $\Delta$, respectively. Further, an unbiased estimator of $\Delta$ is given by
$$\hat{\Delta} = \binom{n}{2}^{-1} \sum_{i<j} |X_j - X_i| = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} |X_j - X_i|. \qquad (4)$$
For large $n$, $\sqrt{n}(\hat{\Delta} - \Delta)$ follows a normal distribution with finite variance (see Lehmann (1999) and Lee (1990)). The Gini index estimator (see David, 1968, 1970 and references therein) can be written as
$$\hat{\gamma} = \frac{1}{2n^2 \bar{x}} \sum_{i=1}^{n} \sum_{j=1}^{n} |X_j - X_i|. \qquad (5)$$
In the present investigation, the simultaneous estimation of Gini indices is considered in a multi-sample situation. Suppose that $s$ independent random samples are obtained either from populations having similar characteristics or at different time points. Thus, we are interested in the analysis of sequences of data sets on the same phenomenon collected in separate studies over time or space. Analysis of data sets obtained in this fashion is at times referred to as meta-analysis.
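As an aside for readers who wish to experiment with these quantities, the following is a minimal numerical sketch of the estimators in (3)-(5), written in Python with NumPy; the language, the library and the function names are our own choices and are not part of the paper.

import numpy as np

def gini_mean_difference(x):
    """Unbiased estimator of the Gini mean difference, eq. (4):
    the average of |X_j - X_i| over all (n choose 2) pairs."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Sum of |X_j - X_i| over all ordered pairs, divided by n*(n-1).
    pairwise = np.abs(x[:, None] - x[None, :]).sum()
    return pairwise / (n * (n - 1))

def gini_index(x):
    """Plug-in Gini index estimator, eq. (3): Delta-hat / (2 * mu-hat)."""
    x = np.asarray(x, dtype=float)
    return gini_mean_difference(x) / (2.0 * x.mean())

# Example: an exponential population has Gini index 1/2.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=2000)
print(gini_index(sample))  # close to 0.5

The $O(n^2)$ pairwise form is used here only because it mirrors (4) directly; a sorted-data formula would be preferable for large samples.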

Let $\gamma_l = \Delta_l/(2\mu_l)$ for $l = 1, 2, \ldots, s$, and suppose there are $s$ independent retrospective samples of sizes $n_1, n_2, \ldots, n_s$ acquired from these $s$ populations. Denote the observed data by $X_{li}$, $i = 1, 2, \ldots, n_l$, with distribution functions $F_l(x)$. The population parameters may differ from sample to sample for a variety of reasons. We define the Gini mean difference parameter vector as $\Delta = (\Delta_1, \Delta_2, \ldots, \Delta_s)'$ and its covariance structure as $\Sigma = \sigma_l^2 I_{(s \times s)}$, where $I$ is an identity matrix. The population Gini index of each component is
$$\gamma_l = \frac{\Delta_l}{2\mu_l}, \qquad (6)$$
where
$$\Delta_l = E|X_{lj} - X_{li}|. \qquad (7)$$
If $\Delta_l$ and $\mu_l$ are unknown, then the conventional estimator of $(\Delta_l, \mu_l)$ is $(\hat{\Delta}_l, \hat{\mu}_l)$, where
$$\hat{\Delta}_l = \binom{n_l}{2}^{-1} \sum_{i<j} |X_{lj} - X_{li}|, \qquad \hat{\mu}_l = \frac{1}{n_l} \sum_{i=1}^{n_l} X_{li}. \qquad (8)$$
Thus, the estimated componentwise Gini index is
$$\hat{\gamma}_l = \frac{\hat{\Delta}_l}{2\hat{\mu}_l}. \qquad (9)$$
Now, we consider the estimation problem when the homogeneity of the indices is suspected, in other words, when the hypothesis
$$H_0: \gamma_1 = \gamma_2 = \cdots = \gamma_s = \gamma_0 \ (\text{unknown}) \qquad (10)$$
is thought to be true. For the sake of brevity, assume that under such equality of the Gini indices the distributions of the $X_{lj}$ may be identical, i.e., $F_l \equiv F$ for all $l$. Our objective in this article is to consider the problem of simultaneous estimation of the vector of Gini indices under the homogeneity of the population indices. However, the

information in relation (10) regarding the homogeneity is rather uncertain, and hence must be treated as uncertain prior information (UPI). The question we would like to address in this investigation is: how should the given UPI be incorporated in the estimation process? In other words, how should the UPI regarding the parameters be combined with the available sample data? Our goal is to develop natural adaptive estimation methods that are free of subjective choices and tuning parameters and have superior risk performance under quadratic loss. In the context of multi-parameter statistical models, we demonstrate a well-defined, data-based shrinkage-type Gini index (SGI) estimator that combines estimation problems by shrinking a base (conventional) estimator towards a plausible alternative estimator (the estimator under the restriction). Asymptotic results are demonstrated, and the relationship of the SGI estimator to the family of Stein-rule (SR) estimators is discussed.

2 ESTIMATION STRATEGIES

Our main interest is estimating the $s$-dimensional vector of Gini indices $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_s)'$. Throughout this paper, boldface symbols represent vectors and matrices.

2.1 Estimation Under Full Model

For the full model, the conventional or unrestricted estimator (UE) of $\gamma = (\gamma_1, \ldots, \gamma_s)'$ is defined as $\hat{\gamma} = (\hat{\gamma}_1, \ldots, \hat{\gamma}_s)'$, where $\hat{\gamma}_l = \hat{\Delta}_l/(2\hat{\mu}_l)$, $l = 1, \ldots, s$. The following lemma provides the asymptotic distribution of the unrestricted estimator.

Lemma 1 If $E X^2 < \infty$, then
$$\sqrt{n_l}\,(\hat{\gamma}_l - \gamma_l) \xrightarrow{d} N(0, \tau_l^2), \qquad (11)$$

where $\xrightarrow{d}$ denotes convergence in distribution,
$$\tau_l^2 = \frac{1}{\mu_l^2}\left( \sigma_{1l}\,\gamma_l^2 - 2\sigma_{2l}\,\gamma_l + \sigma_{3l} \right), \qquad (12)$$
$$\sigma_{1l} = \iint_{x<y} 2\, F_l(x)\,[1 - F_l(y)]\,dx\,dy,$$
$$\sigma_{2l} = \iint_{x<y} \left[(2F_l(x) - 1) + (2F_l(y) - 1)\right] F_l(x)\,[1 - F_l(y)]\,dx\,dy,$$
$$\sigma_{3l} = \iint_{x<y} 2\,(2F_l(x) - 1)(2F_l(y) - 1)\, F_l(x)\,[1 - F_l(y)]\,dx\,dy, \qquad (13)$$
and $F_l(\cdot)$ is the distribution function of the $l$th population.

The lemma can be proved using results in Stigler (1974), as in Ahmed et al. (2005). Also, it can be shown that $\hat{\gamma}$ is, under quadratic loss, asymptotically minimax with constant quadratic risk $\mathrm{trace}[\mathrm{diag}(\tau_l^2)]$.

2.2 Estimation Under Reduced Model

Under the UPI that $\gamma_1 = \gamma_2 = \cdots = \gamma_s$, we propose a restricted estimator (RE) of $\gamma$ as follows:
$$\hat{\gamma}^R = (\hat{\gamma}^R, \ldots, \hat{\gamma}^R)' = \hat{\gamma}^R\, 1_s, \qquad \hat{\gamma}^R = \sum_{l=1}^{s} n_l\,\hat{\gamma}_l / n, \qquad n = n_1 + \cdots + n_s, \qquad (14)$$
where $1_s$ is an $s$-dimensional vector of ones. We will show that $\hat{\gamma}^R$ has smaller asymptotic quadratic risk than $\hat{\gamma}$ in an interval near the UPI, at the expense of poorer performance in the rest of the parameter space induced by the UPI; indeed, the risk function of $\hat{\gamma}^R$ becomes unbounded as the UPI error grows. If the prior information regarding homogeneity of the parameters is bad, in the sense that the UPI error is large, the restricted estimator is inferior to $\hat{\gamma}$. Alternatively, if the information is good, i.e., the UPI error is small, $\hat{\gamma}^R$ offers a substantial gain over $\hat{\gamma}$. This insight leads to shrinkage-type estimation when the information is rather suspect, and it is useful to construct a compromise estimator to deal with model-estimator uncertainty.
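As a point of reference for what follows, the restricted estimator in (14) can be sketched as below (Python/NumPy, reusing the gini_index function from the earlier sketch; the function name is ours).

import numpy as np

def restricted_gini(samples):
    """Restricted estimator (14): the sample-size-weighted average of the
    componentwise Gini indices, replicated in every coordinate."""
    gammas = np.array([gini_index(x) for x in samples])       # unrestricted gamma-hat_l
    sizes = np.array([len(x) for x in samples], dtype=float)  # n_1, ..., n_s
    gamma_r = np.sum(sizes * gammas) / sizes.sum()            # pooled scalar estimate
    return np.full(len(samples), gamma_r), gammas             # (gamma-hat^R, gamma-hat)

The function returns both the restricted and the unrestricted estimators, since the shrinkage estimator discussed next combines the two.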

2.3 The Shrinkage-type Base

As one basis for identifying model-estimator uncertainty, Stein (1956) demonstrated the inadmissibility of the traditional maximum likelihood estimator (MLE) when estimating the $s$-variate normal mean vector $\theta$ under quadratic loss. Following this result, James and Stein (1961), Stein (1962), and Baranchik (1964) combined the $s$-variate MLE $\hat{\theta}$ with the $s$-dimensional fixed null vector, under the normality assumption, as
$$\hat{\theta}^S = \left(1 - c/\|\hat{\theta} - \mathbf{0}\|^2\right)(\hat{\theta} - \mathbf{0}), \qquad 0 < c < 2(s-2),$$
and demonstrated that for $s > 2$ this estimator dominates the MLE. In an orthonormal $s$-mean context, Lindley (1962) suggested shrinking $\hat{\theta}$ towards the grand-mean estimator and demonstrated the risk dominance of the resulting estimator over the Stein estimator. Ahmed and Saleh (1989), Green and Strawderman (1991) and Kim and White (2001) investigated the properties of Stein-type estimators under various statistical model settings. Ahmed (1999) investigated the estimation problem of survivor functions in an $s$ independent sample setting based on reduced and full models, and demonstrated that, under quadratic loss, the asymptotic distributional quadratic risk-dominating estimator is
$$\hat{\theta}^A = \left[1 - (s-3)/T\right](\hat{\theta} - \hat{\theta}^R) + \hat{\theta}^R, \qquad s > 3,$$
where $\hat{\theta}^R$ is the restricted estimator, and
$$T = n\,(\hat{\sigma}_n^2)^{-1}\,(\hat{\theta} - \hat{\theta}^R)'\, D\, (\hat{\theta} - \hat{\theta}^R),$$
where $\hat{\sigma}_n^2$ is a consistent estimator of the nuisance parameter $\sigma^2$ and $D = \mathrm{Diag}(n_1/n, \ldots, n_s/n)$, with $n = n_1 + \cdots + n_s$. Clearly, this estimator resembles the Stein-rule estimator. Given this base, Ahmed et al. (2001) considered the simultaneous estimation of several intraclass correlation coefficients when independent samples are drawn from $s$ multivariate normal populations and

provided expressions for the asymptotic risk and bias of the Stein-type estimator $\hat{\theta}^A$.

Now, using the Stein-like base provided by Ahmed (1999), we propose the following shrinkage-type estimator of the parameter vector $\gamma$:
$$\hat{\gamma}^S = \left[1 - (s-3)/T_n\right](\hat{\gamma} - \hat{\gamma}^R) + \hat{\gamma}^R, \qquad s > 3,$$
where
$$T_n = n\,(\hat{\tau}_n^2)^{-1}\,(\hat{\gamma} - \hat{\gamma}^R)'\, D_n\, (\hat{\gamma} - \hat{\gamma}^R),$$
$\hat{\tau}_n^2$ is a consistent estimator of the common value of $\tau_l^2$, and $D_n = \mathrm{Diag}(n_l/n)$, $l = 1, \ldots, s$. The estimator $\hat{\gamma}^S$ is of the general form of the Stein-rule family of estimators, in which the base estimator $\hat{\gamma}$ is shrunk towards the alternative estimator $\hat{\gamma}^R$. It is interesting to note that the proposed strategy is similar in spirit to Bayesian model-averaging procedures; the main difference is that Bayesian model-averaging procedures are not optimized with respect to any particular loss function.

The present investigation is stimulated by a prediction offered by Professor Efron in the January issue of RSS News:

The empirical Bayes/James-Stein category was the entry in my list least affected by computer developments. It is ripe for a computer-intensive treatment that brings the substantial benefits of James-Stein estimation to bear on complicated, realistic problems. A side benefit may be at least a partial reconciliation between frequentist and Bayesian perspectives as they apply to statistical practice.

It may be worth mentioning that this is one of the two areas Professor Efron predicted for continuing research in the early 21st century. Shrinkage and likelihood-based methods continue to play vital roles in statistical inference. These methods provide extremely useful techniques for combining data from various sources.
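A minimal computational sketch of the proposed shrinkage-type estimator follows. The paper only requires some consistent estimator $\hat{\tau}_n^2$ of the common asymptotic variance; the nonparametric bootstrap used below is our own choice for illustration, as are the function names (gini_index and restricted_gini are the sketches given earlier).

import numpy as np

def bootstrap_tau2(x, n_boot=500, rng=None):
    """Bootstrap estimate of tau_l^2, the asymptotic variance of
    sqrt(n_l) * (gamma-hat_l - gamma_l).  Any consistent estimator would
    do; the bootstrap is simply one convenient option."""
    rng = np.random.default_rng(rng)
    n = len(x)
    reps = np.array([gini_index(rng.choice(x, size=n, replace=True))
                     for _ in range(n_boot)])
    return n * reps.var(ddof=1)

def shrinkage_gini(samples, n_boot=500, rng=None):
    """Shrinkage-type Gini estimator gamma-hat^S (requires s > 3)."""
    s = len(samples)
    if s <= 3:
        raise ValueError("the shrinkage estimator requires s > 3")
    sizes = np.array([len(x) for x in samples], dtype=float)
    n = sizes.sum()
    gamma_r, gammas = restricted_gini(samples)
    # Pooled estimate of the common asymptotic variance tau^2.
    tau2 = np.average([bootstrap_tau2(x, n_boot, rng) for x in samples],
                      weights=sizes)
    d_n = sizes / n                              # diagonal of D_n
    diff = gammas - gamma_r
    t_n = n * np.sum(d_n * diff ** 2) / tau2     # T_n statistic (assumed > 0)
    return gammas - ((s - 3) / t_n) * diff       # shrink gamma-hat toward gamma-hat^R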

In passing, we would like to remark that preliminary test estimation can also be used to tackle the uncertainty. However, using a Stein-type estimator, Sclove et al. (1972) demonstrated the non-optimality of preliminary test estimation as a basis for dealing with model uncertainty. Hence, we confine ourselves here to Stein-type estimation; however, for $s \leq 3$ preliminary test estimation may be a useful choice for tackling the estimation uncertainty.

The plan for the rest of the paper is as follows. We present some useful asymptotic results in Section 3, which form the basis for our study; further, expressions for the bias and risk of the proposed estimators are given there. The discussion of the risk behavior of the proposed estimators is contained in Section 4, where some computed risk analyses are also presented.

3 ASYMPTOTIC RESULTS

We shall examine the properties of the proposed estimators in an asymptotic setup, in the light of the following weighted quadratic loss function:
$$L(\gamma^o, \gamma) = n\,(\gamma^o - \gamma)'\, G\, (\gamma^o - \gamma), \qquad (15)$$
where $\gamma^o$ is any estimator of $\gamma$ and $G$ is a given positive semidefinite matrix. Assume that $G(y) = \lim_{n \to \infty} P\{\sqrt{n}(\gamma^o - \gamma) \leq y\}$ exists. Then we define the asymptotic distributional quadratic risk (ADQR) by
$$R(\gamma^o, \gamma) = \int y'\, G\, y \, dG(y) = \mathrm{trace}(G\,G^o), \qquad (16)$$
where $G^o = \int y\, y' \, dG(y)$.

Further, we consider the following contiguous sequence of alternatives in order to establish the needed asymptotic results:
$$K_{(n)}: \gamma = \gamma_{(n)}, \qquad \gamma_{(n)} = \gamma_o + \frac{\lambda}{\sqrt{n}}, \quad \lambda \text{ a fixed real vector.} \qquad (17)$$
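Before turning to the asymptotic analysis, the loss (15) under the local alternatives (17) can also be examined in a small Monte Carlo experiment. The sketch below is purely illustrative: it takes $G = I$, generates lognormal samples whose population Gini index is set via the identity $\gamma = 2\Phi(\sigma/\sqrt{2}) - 1$, and reuses the estimator sketches given earlier; none of these choices (nor the function names) come from the paper, and at least $s > 3$ samples are required.

import numpy as np
from scipy.stats import norm

def lognormal_with_gini(g, size, rng):
    """Lognormal sample whose population Gini index equals g,
    using Gini = 2*Phi(sigma/sqrt(2)) - 1 for LN(0, sigma^2)."""
    sigma = np.sqrt(2.0) * norm.ppf((g + 1.0) / 2.0)
    return rng.lognormal(mean=0.0, sigma=sigma, size=size)

def empirical_risk(gamma0, lam, sizes, n_rep=200, seed=1):
    """Monte Carlo approximation of the weighted quadratic loss (15) with
    G = I for the unrestricted, restricted and shrinkage estimators under
    the local alternative gamma_l = gamma0 + lam_l / sqrt(n)."""
    rng = np.random.default_rng(seed)
    sizes = np.asarray(sizes, dtype=float)
    n = sizes.sum()
    true_gamma = gamma0 + np.asarray(lam, dtype=float) / np.sqrt(n)
    losses = np.zeros(3)
    for _ in range(n_rep):
        samples = [lognormal_with_gini(g, int(m), rng)
                   for g, m in zip(true_gamma, sizes)]
        gamma_r, gammas = restricted_gini(samples)
        gamma_s = shrinkage_gini(samples, n_boot=100, rng=rng)
        for k, est in enumerate((gammas, gamma_r, gamma_s)):
            losses[k] += n * np.sum((est - true_gamma) ** 2)
    return losses / n_rep   # approximate risks of (UE, RE, shrinkage)

# e.g. empirical_risk(0.4, lam=[0.0, 0.5, 1.0, 1.5, 2.0], sizes=[100] * 5)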

Note that $\lambda = 0$ implies $\gamma = \gamma_o 1_s$, so (10) is a particular case of $\{K_{(n)}\}$. Under regularity conditions no more stringent than those typically assumed for establishing asymptotic properties of Gini indices, the proposed estimators achieve the properties given below. Note that
$$n\,(\hat{\gamma}^S - \hat{\gamma})'\, G\, (\hat{\gamma}^S - \hat{\gamma}) = \frac{(s-3)^2}{T_n^2}\left\{ n\,(\hat{\gamma} - \hat{\gamma}^R)'\, G\, (\hat{\gamma} - \hat{\gamma}^R)\right\} \leq (s-3)^2 \left\{ n\,(\hat{\gamma} - \hat{\gamma}^R)'\, G\, (\hat{\gamma} - \hat{\gamma}^R)\right\}^{-1} \left\{\hat{\tau}_n^2\, \mathrm{ch}_{\max}(G D_n^{-1})\right\}^2,$$
where $\mathrm{ch}_{\max}(A)$ is the largest eigenvalue of a matrix $A$. On the other hand, in the case of $\hat{\gamma}^R$,
$$n\,(\hat{\gamma}^R - \gamma)'\, G\, (\hat{\gamma}^R - \gamma) \xrightarrow{p} +\infty \quad \text{as } n \to \infty, \qquad (18)$$
where $\xrightarrow{p}$ denotes convergence in probability. By virtue of the above result, the ADQR of $\hat{\gamma}^R$ approaches $+\infty$ for large $n$ for any fixed $\gamma$ not satisfying $H_0$. To compare the respective risk performance of the proposed estimators, we establish the following lemmas under the local alternatives; they facilitate the derivation of the ADQR of the proposed estimators.

Lemma 2 Let $U_n = \sqrt{n}\,(\hat{\gamma} - \gamma_o)$ and $V_n = \sqrt{n}\,(\hat{\gamma} - \hat{\gamma}^R)$; then, under the local alternatives,
$$\begin{pmatrix} U_n \\ V_n \end{pmatrix} \xrightarrow{d} N_{2s}\left\{ \begin{pmatrix} \lambda \\ \tilde{\lambda} \end{pmatrix}, \begin{pmatrix} \tau^2 D^{-1} & B \\ B & B \end{pmatrix} \right\} \quad \text{as } n \to \infty, \qquad (19)$$
where $\tilde{\lambda} = H\lambda$, $H = I_s - JD$, $J = 1_s 1_s'$, $D = \lim(D_n)$, $\tau^2 = \lim(\hat{\tau}_n^2)$, and $B = \tau^2 D^{-1} H'$.

Lemma 3 Let $Z_n = \sqrt{n}\,(\hat{\gamma}^R - \gamma_o)$; then, under the local alternatives,
$$\begin{pmatrix} Z_n \\ V_n \end{pmatrix} \xrightarrow{d} N_{2s}\left\{ \begin{pmatrix} \lambda - \tilde{\lambda} \\ \tilde{\lambda} \end{pmatrix}, \begin{pmatrix} \tau^2 J & 0 \\ 0 & B \end{pmatrix} \right\} \quad \text{as } n \to \infty. \qquad (20)$$

By virtue of the above lemmas, we now present expressions for the asymptotic distributional bias (ADB) and the ADQR of the estimators. Let $\Psi_s(x;\Lambda)$ denote the noncentral chi-square distribution function with noncentrality parameter $\Lambda$ and $s$ degrees of freedom, and let
$$E\!\left(\chi_s^{-2m}(\Lambda)\right) = \int_0^{\infty} x^{-m}\, d\Psi_s(x;\Lambda).$$
We define the asymptotic distributional bias (ADB) of an estimator $\gamma^o$ as
$$B(\gamma^o) = \lim_{n \to \infty} E\{\sqrt{n}\,(\gamma^o - \gamma)\}.$$
Using Lemmas 2 and 3, the bias of $\hat{\gamma}^S$ is obtained by the same argument as in Ahmed (2001) together with direct computations, and we have the following lemma.

Lemma 4
$$B(\hat{\gamma}^R) = -\tilde{\lambda}, \qquad B(\hat{\gamma}^S) = -(s-3)\,\tilde{\lambda}\, E\!\left(\chi_{s+1}^{-2}(\Lambda)\right),$$
where $\Lambda = (\tau^2)^{-1}\,\tilde{\lambda}'\, D\, \tilde{\lambda}$.

In an effort to present a clear analysis of the various bias functions, we first transform them into scalar (quadratic) form by defining
$$B^*(\gamma^o) = (\tau^2)^{-1}\,[B(\gamma^o)]'\, D\, [B(\gamma^o)]$$
as the quadratic bias of an estimator $\gamma^o$ of the parameter vector $\gamma$. Thus,
$$B^*(\hat{\gamma}^R) = \Lambda, \qquad B^*(\hat{\gamma}^S) = (s-3)^2\,\Lambda\,\left[E\!\left(\chi_{s+1}^{-2}(\Lambda)\right)\right]^2.$$
The asymptotic bias functions of both estimators depend upon the parameters only through $\Lambda$; hence, we investigate the behavior of the quadratic bias of the proposed estimators in terms of $\Lambda$. It is easy to see that the quadratic bias of $\hat{\gamma}^R$ increases without bound as $\Lambda \to \infty$. The quadratic bias of $\hat{\gamma}^S$ starts from 0 at $\Lambda = 0$, increases to a maximum, and then decreases

towards 0, since $E\!\left(\chi_{\nu}^{-2}(\Lambda)\right)$ is a decreasing log-convex function of $\Lambda$. Since bias is a component of the ADQR, we discuss the ADQR of the estimators from here onwards. The expressions for the ADQR are given in the following theorem.

Theorem 1 For large $n$, and under the local alternatives, the ADQRs of the estimators are given by
$$R(\hat{\gamma}, \gamma) = \tau^2\,\mathrm{trace}(GD^{-1}), \qquad (21)$$
$$R(\hat{\gamma}^R, \gamma) = \tau^2\,\mathrm{trace}(GD^{-1}) - \mathrm{trace}(GC) + \Lambda_G, \qquad (22)$$
where $\Lambda_G = \tilde{\lambda}'\, G\, \tilde{\lambda}$ and $C = \tau^2 (D^{-1} - J)$, and
$$R(\hat{\gamma}^S, \gamma) = \tau^2\,\mathrm{trace}(GD^{-1}) + \Lambda_G\,(s-3)(s+1)\,E\!\left(\chi_{s+3}^{-4}(\Lambda)\right) - (s-3)\,\mathrm{trace}(GC)\left\{2\,E\!\left(\chi_{s+1}^{-2}(\Lambda)\right) - (s-3)\,E\!\left(\chi_{s+1}^{-4}(\Lambda)\right)\right\}. \qquad (23)$$

The proofs of (21) and (22) are fairly straightforward; (23) is obtained by using the same argument as in Ahmed (2001) and direct computations, and therefore we omit the details of the derivation.
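The expectations $E(\chi_\nu^{-2m}(\Lambda))$ appearing in Lemma 4 and Theorem 1 are straightforward to evaluate numerically via the standard Poisson-mixture representation of the noncentral chi-square distribution. The sketch below (Python/SciPy, our own implementation and function names) does this for $m = 1, 2$ and evaluates the quadratic bias $B^*(\hat{\gamma}^S)$ given above.

import numpy as np
from scipy.stats import poisson

def neg_chi2_moment(m, df, nc, j_max=500):
    """E[(chi^2_df(nc))^(-m)] for m = 1 or 2, using the fact that,
    conditionally on J ~ Poisson(nc/2), chi^2_df(nc) ~ chi^2_{df+2J}."""
    j = np.arange(j_max)
    w = poisson.pmf(j, nc / 2.0)          # Poisson mixing weights (tail truncated)
    k = df + 2 * j                        # conditional central degrees of freedom
    if m == 1:
        vals = 1.0 / (k - 2)              # E[1/chi^2_k] = 1/(k-2), k > 2
    elif m == 2:
        vals = 1.0 / ((k - 2) * (k - 4))  # E[1/(chi^2_k)^2], k > 4
    else:
        raise ValueError("m must be 1 or 2")
    return float(np.sum(w * vals))

def quadratic_bias_shrinkage(s, lam_nc):
    """B*(gamma-hat^S) = (s-3)^2 * Lambda * [E(chi^{-2}_{s+1}(Lambda))]^2."""
    return (s - 3) ** 2 * lam_nc * neg_chi2_moment(1, s + 1, lam_nc) ** 2

# The quadratic bias rises from 0 and then decays back towards 0 as Lambda grows:
for lam_nc in (0.0, 1.0, 5.0, 20.0, 100.0):
    print(lam_nc, quadratic_bias_shrinkage(6, lam_nc))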

4 RISK ANALYSIS FOR VARIOUS ESTIMATORS

In this section the large-sample properties of the proposed estimators are discussed in the light of the quadratic loss function. The ADQR of $\hat{\gamma}$ is constant (independent of $\Lambda$) with the value $\tau^2\,\mathrm{trace}(GD^{-1})$, while the risk of $\hat{\gamma}^R$ becomes unbounded as the hypothesis error grows, crossing the risk of $\hat{\gamma}$. Furthermore, we note that $R(\hat{\gamma}^R; \gamma) \leq R(\hat{\gamma}; \gamma)$ if $\Lambda_G \leq \mathrm{trace}(GC)$. Thus, $\hat{\gamma}^R$ dominates $\hat{\gamma}$ for $\Lambda_G \in [0, \mathrm{trace}(GC))$. Clearly, when $\Lambda_G$ moves away from the origin beyond the value $\mathrm{trace}(GC)$, the ADQR of $\hat{\gamma}^R$ increases without bound.

We now turn to the comparative statistical properties of the shrinkage-type estimator. First, we compare it with $\hat{\gamma}$ when $\Lambda = 0$:
$$R(\hat{\gamma}; \gamma) - R(\hat{\gamma}^S; \gamma) = \mathrm{trace}(GC)\,(s-3)\,E\!\left\{2\chi_{s+1}^{-2}(0) - (s-3)\chi_{s+1}^{-4}(0)\right\}$$
is a positive quantity. Hence, the Stein-type estimator dominates $\hat{\gamma}$ at this parametric value; moreover, the maximum risk gain of $\hat{\gamma}^S$ over $\hat{\gamma}$ is achieved at this same point. To examine the risk behavior of $\hat{\gamma}^S$ when $\Lambda > 0$, we characterize a class of positive semi-definite matrices,
$$\mathcal{G}_D = \left\{ G : \frac{\mathrm{trace}(GD^{-1})}{\mathrm{ch}_{\max}(GD^{-1})} \geq \frac{s+1}{2} \right\}, \qquad (24)$$
where $\mathrm{ch}_{\max}(\cdot)$ denotes the largest eigenvalue of its argument. In order to provide a meaningful comparison of the various estimators, we state the following theorem from linear algebra.

Theorem 2 (Courant Theorem) If $A$ and $B$ are two positive semi-definite matrices of order $(s \times s)$, with $B$ nonsingular, then
$$\mathrm{ch}_{\min}(AB^{-1}) \leq \frac{x'Ax}{x'Bx} \leq \mathrm{ch}_{\max}(AB^{-1}),$$
where $\mathrm{ch}_{\min}(\cdot)$ and $\mathrm{ch}_{\max}(\cdot)$ denote the smallest and largest eigenvalues of the argument, respectively, and $x$ is a column vector of order $(s \times 1)$.

We note that the above lower and upper bounds are equal to the infimum and supremum, respectively, of the ratio $x'Ax / x'Bx$ over $x \neq 0$. Also, for $B = I$ the ratio is known as the Rayleigh quotient of the matrix $A$.
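The Courant bounds are easy to verify numerically for any particular pair of matrices; the short sketch below (our own illustration, using SciPy's generalized symmetric eigensolver) checks them for a randomly generated pair.

import numpy as np
from scipy.linalg import eigvalsh

rng = np.random.default_rng(2)
s = 5
M1 = rng.standard_normal((s, s)); A = M1 @ M1.T                 # positive semi-definite
M2 = rng.standard_normal((s, s)); B = M2 @ M2.T + s * np.eye(s)  # positive definite

eigs = eigvalsh(A, B)                     # eigenvalues of A B^{-1} (generalized problem)
x = rng.standard_normal(s)
ratio = (x @ A @ x) / (x @ B @ x)         # generalized Rayleigh quotient
assert eigs.min() - 1e-10 <= ratio <= eigs.max() + 1e-10
print(eigs.min(), ratio, eigs.max())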

As a consequence of the Courant Theorem, we have
$$\mathrm{ch}_{\min}(GD^{-1}) \leq \frac{\lambda' G \lambda}{\lambda' D \lambda} \leq \mathrm{ch}_{\max}(GD^{-1}), \qquad \lambda \neq 0, \ G \in \mathcal{G}_D.$$
Thus, under the class of matrices defined in relation (24), we conclude that for all $\lambda$,
$$R(\hat{\gamma}^S; \gamma) \leq R(\hat{\gamma}; \gamma),$$
with strict inequality for some $\lambda$. This clearly indicates the asymptotic inadmissibility of $\hat{\gamma}$ relative to $\hat{\gamma}^S$ under the local alternatives. The risk of $\hat{\gamma}^S$ begins with its smallest value at $\Lambda = 0$ (equal to 3 in the special case $G = (\tau^2)^{-1}D$ considered below) and increases monotonically towards the risk of $\hat{\gamma}$ as $\Lambda$ moves away from 0; the risk of $\hat{\gamma}^S$ is uniformly smaller than that of $\hat{\gamma}$, and the upper limit is attained as $\Lambda \to \infty$. The result is valid as long as the expectations in the risk expression exist, which is the case whenever $s \geq 4$. Next, at $\Lambda = 0$,
$$R(\hat{\gamma}^S, \gamma) - R(\hat{\gamma}^R, \gamma) = \mathrm{trace}(GC) - \frac{s-3}{s-1}\,\mathrm{trace}(GC) > 0. \qquad (25)$$
Therefore, the ADQR of $\hat{\gamma}^R$ is smaller than the ADQR of $\hat{\gamma}^S$ when $\Lambda = 0$. Alternatively, as $\Lambda$ departs from the initial value 0, the value of $E(\chi_{s+1}^{-4}(\Lambda))$ decreases and $\hat{\gamma}^S$ attains smaller ADQR than $\hat{\gamma}^R$; that is, $\hat{\gamma}^S$ dominates $\hat{\gamma}^R$ in the rest of the parameter space. Hence, under the local alternatives neither $\hat{\gamma}^S$ nor $\hat{\gamma}^R$ is asymptotically superior to the other. To our knowledge, no estimator in this class can outperform the estimator based on the reduced model (when that model is true) over the entire parameter space; the present investigation reaffirms this unique characteristic of $\hat{\gamma}^R$.

It is noted that the risks of all the estimators depend on the matrices $G$ and $D$. In order to facilitate numerical computation of the ADQR functions, we consider the particular case $G = (\tau^2)^{-1} D$ and evaluate the risk expressions on a digital computer. With this substitution, we get
$$R(\hat{\gamma}, \gamma) = s, \qquad R(\hat{\gamma}^R, \gamma) = 1 + \Lambda,$$
and
$$R(\hat{\gamma}^S, \gamma) = s + \Lambda\,(s-3)(s+1)\,E\!\left(\chi_{s+3}^{-4}(\Lambda)\right) - (s-1)(s-3)\left\{2\,E\!\left(\chi_{s+1}^{-2}(\Lambda)\right) - (s-3)\,E\!\left(\chi_{s+1}^{-4}(\Lambda)\right)\right\}. \qquad (26)$$
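The special-case risk expressions in (26) can be evaluated directly with the neg_chi2_moment helper sketched earlier; the following reproduces, up to our own implementation choices, the quantities plotted in Figure 1.

def special_case_risks(s, lam_nc):
    """ADQRs under G = D / tau^2, eq. (26): returns (UE, RE, shrinkage)."""
    r_ue = float(s)
    r_re = 1.0 + lam_nc
    r_s = (s
           + lam_nc * (s - 3) * (s + 1) * neg_chi2_moment(2, s + 3, lam_nc)
           - (s - 1) * (s - 3) * (2 * neg_chi2_moment(1, s + 1, lam_nc)
                                  - (s - 3) * neg_chi2_moment(2, s + 1, lam_nc)))
    return r_ue, r_re, r_s

# The risk of the shrinkage estimator starts below s at Lambda = 0
# and approaches s from below as Lambda grows.
for lam_nc in (0.0, 2.0, 5.0, 10.0, 50.0):
    print(lam_nc, special_case_risks(8, lam_nc))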

We have plotted the risk functions versus $\Lambda$ at selected values of $s$. The graphical results in Figure 1 reinforce our theoretical findings that $\hat{\gamma}^S$ dominates $\hat{\gamma}$.

5 CONCLUDING NOTE

In this paper, we discussed estimation strategies for multi-sample Gini indices based on full and reduced models in the presence of uncertain prior information. The performance of the restricted estimator (i.e., the estimator under the UPI) depends heavily on the quality of the non-sample information, and one is seldom sure of the reliability of this information. We have therefore presented a shrinkage-type estimator for the multi-sample Gini indices. We find that $\hat{\gamma}^S$ is relatively more efficient than $\hat{\gamma}$ in the entire parameter space. It was also noted that $\hat{\gamma}^S$ can only be used for $s > 3$. It is worth mentioning that the assumption of homogeneity of the population distributions under the hypothesis of equal Gini indices can be relaxed, in which case the restricted estimator will be weighted by the asymptotic variances of the vector of Gini indices.

References

Ahmed, S. E. (1999). Simultaneous estimation of survivor functions in exponential lifetime models. Journal of Statistical Computation and Simulation 63.

Ahmed, S. E., A. K. Gupta, S. M. Khan and C. J. Nicol (2001). Simultaneous estimation of several intraclass correlation coefficients. Annals of the Institute of Statistical Mathematics 53(2).

Ahmed, S. E. and A. K. Md. E. Saleh (1989). Pooling multivariate data. Journal of Statistical Computation and Simulation 31.

[Figure 1: Quadratic risk plotted as a function of the noncentrality parameter $\Lambda$ for the three estimators described above (UE, RE, and the shrinkage estimator), for various numbers of populations $s$.]

Ahmed, S. E. (2001). Shrinkage estimation of regression coefficients from censored data with multiple observations. In Empirical Bayes and Likelihood Inference (S. E. Ahmed and N. Reid, eds.), Lecture Notes in Statistics. Springer-Verlag.

Ahmed, S. E., A. A. Hussein and R. Ghori (2005). Gini mean difference and its applications in robust estimation. Technical Report, University of Windsor, Canada.

Anand, S. (1983). Inequality and Poverty in Malaysia: Measurement and Decomposition. Oxford University Press.

Atkinson, A. B. and F. Bourguignon (2000). Introduction: Income distribution and economics. In Handbook of Income Distribution. Elsevier.

Baranchik, A. M. (1964). Multiple regression and estimation of the mean of a multivariate normal distribution. Technical Report 51, Department of Statistics, Stanford University.

Chakravarty, S. R. (1990). Ethical Social Index Numbers. Springer.

David, H. A. (1968). Gini's mean difference rediscovered. Biometrika 55.

David, H. A. (1970). Order Statistics. Wiley.

Green, E. J. and W. E. Strawderman (1991). A James-Stein type estimator for combining unbiased and possibly biased estimators. Journal of the American Statistical Association 86.

James, W. and C. Stein (1961). Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, CA.

Kim, T.-H. and H. White (2001). James-Stein-type estimators in large samples with application to the least absolute deviations estimator. Journal of the American Statistical Association 96(454).

Lee, A. J. (1990). U-Statistics. Marcel Dekker, New York.

Lehmann, E. L. (1999). Elements of Large-Sample Theory. Springer-Verlag Inc.

Lindley, D. V. (1962). Discussion of Professor Stein's paper. Journal of the Royal Statistical Society, Series B 24.

Sclove, S. L., C. Morris and R. Radhakrishnan (1972). Non-optimality of preliminary test estimation for the multinormal mean. The Annals of Mathematical Statistics 43.

Sen, A. K. (1997). On Economic Inequality. Oxford: Clarendon Press.

Stein, C. (1956). Inadmissibility of the usual estimator of the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, CA.

Stein, C. (1962). Confidence sets for the mean of a multivariate normal distribution. Journal of the Royal Statistical Society, Series B 24.

Stigler, S. M. (1974). Linear functions of order statistics with smooth weight functions. The Annals of Statistics 2.

Xu, K. (2004). How has the literature on Gini's index evolved in the past 80 years? Technical Report, Dalhousie University.

Yitzhaki, S. (1998). More than a dozen alternative ways of spelling Gini. Research on Economic Inequality 8.
