An (almost) unbiased estimator for the S-Gini index

Thomas Demuynck

February 25, 2009

I am pleased to acknowledge the insightful comments of Dirk Van de gaer. University of Ghent, Sherppa, Tweekerkenstraat 2, B-9000 Gent, Belgium. E-mail: thomas.demuynck@ugent.be

Abstract

This note provides an unbiased estimator for the absolute S-Gini and an almost unbiased estimator for the relative S-Gini for integer parameter values. Simulations indicate that these estimators perform considerably better than the usual estimators, especially for small sample sizes.

1 The absolute and relative S-Gini indices

Assume that income is distributed according to a continuous and differentiable cumulative distribution function (cdf) $F : [0, \infty) \to [0, 1]$ with finite mean, $\mu$, and continuous population density function (pdf) $f$. The absolute single-series Gini (absolute S-Gini), $A^\rho$, and the relative single-series Gini (relative S-Gini), $R^\rho$, with parameter $\rho \in \mathbb{R}_{++}$ are given by:

$$A^\rho = \mu - H^\rho \quad\text{and}\quad R^\rho = 1 - \frac{H^\rho}{\mu}, \quad\text{with}\quad H^\rho = \rho\int_0^\infty x\,(1 - F(x))^{\rho-1}\,dF(x).$$

These indices exist for all values of $\rho \geq 1$, but for values of $\rho < 1$ it is possible that $H^\rho$ reaches infinity. From now on, we assume that $H^\rho$ is well defined for all values of $\rho$ under consideration. The parameter $\rho$ determines the weight attached to the income of individuals at different points in the income distribution. As $\rho$ increases, more weight is given to the bottom of the income distribution. For $\rho$ equal to one, $H^1$ is equal to the mean $\mu$ and $R^1$ and $A^1$ are both equal to zero. For $\rho$ equal to 2, the indices $A^2$ and $R^2$ reduce to the well-known absolute and relative Ginis. We refer to Donaldson and Weymark (1980), Yitzhaki (1983) and Bossert (1990) for an in-depth discussion of the properties related to the S-Gini index.
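As a worked illustration of these definitions (an example added here, not taken from the note), assume an exponential income distribution with mean $\mu$, so that $1 - F(x) = e^{-x/\mu}$ and $dF(x) = \mu^{-1}e^{-x/\mu}\,dx$. Under that assumption the index has a closed form:

$$H^\rho = \frac{\rho}{\mu}\int_0^\infty x\,e^{-\rho x/\mu}\,dx = \frac{\rho}{\mu}\cdot\frac{\mu^2}{\rho^2} = \frac{\mu}{\rho}, \qquad A^\rho = \mu\left(1 - \frac{1}{\rho}\right), \qquad R^\rho = 1 - \frac{1}{\rho}.$$

Both indices vanish at $\rho = 1$. At $\rho = 2$ the relative index equals $1/2$, the Gini coefficient of the exponential distribution, and $R^\rho$ approaches one as $\rho$ grows, in line with the increasing weight placed on the bottom of the distribution.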

The most common finite sample estimators for the S-Ginis are given by:

$$A_n^\rho = \mu_n - H_n^\rho \quad\text{and}\quad R_n^\rho = 1 - \frac{H_n^\rho}{\mu_n}, \quad\text{with}\quad H_n^\rho = \sum_{i=1}^n \frac{(n-i+1)^\rho - (n-i)^\rho}{n^\rho}\,x_i.$$

Here $x_i$ represents the $i$th smallest value in the sample (the $i$th order statistic) and $\mu_n$ is the sample mean, $\sum_{i=1}^n x_i/n$. The estimators $A_n^\rho$ and $R_n^\rho$ are strongly consistent estimators for $A^\rho$ and $R^\rho$ and they are asymptotically normally distributed (Barrett and Pendakur, 1995; Zitikis and Gastwirth, 2002). Unfortunately, they are not unbiased and their bias depends on the sample size, $n$, the value of the parameter, $\rho$, and the distribution, $F$.

The sample mean $\mu_n$ is an unbiased estimator for the population mean $\mu$; hence, for the absolute S-Gini, $A^\rho$, we only need to construct an unbiased estimator for the term $H^\rho$. Such an estimator would also provide us with an almost unbiased estimator for $R^\rho$. This last estimator is not unbiased because it is divided by the sample mean, which is itself an estimator of the population mean. The next section provides an unbiased estimator of $H^\rho$ and the last section provides simulation results to compare these estimators with the estimators $A_n^\rho$ and $R_n^\rho$.

2 An unbiased estimator for $H^\rho$

We denote by ${n \brace k}$ the Stirling number of the second kind with upper index $n$ and lower index $k$. The number ${n \brace k}$ represents the number of ways that a set of size $n$ can be partitioned into $k$ nonempty subsets. We denote by $\binom{n}{k}$ the binomial coefficient with upper index $n$ and lower index $k$, i.e. the number of $k$-element subsets of an $n$-element set. Finally, we denote by $n^{\underline{k}}$ the falling factorial $n(n-1)\cdots(n-k+1)$. The following identities will be used in this section (see Graham et al. (1989) for a proof of these identities):

$$\binom{n}{k} = \binom{n}{n-k}, \tag{R-1}$$

$${n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1}, \tag{R-2}$$

$$x^r = \sum_{k} {r \brace k}\,x^{\underline{k}}, \tag{R-3}$$

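$$\binom{n}{k}\,k^{\underline{m}} = n^{\underline{m}}\binom{n-m}{k-m}, \tag{R-4}$$

$$(x + y)^n = \sum_{k}\binom{n}{k}\,x^k y^{n-k}. \tag{R-5}$$

We focus on the case where the parameter $\rho$ takes only integer values. Assume that we have a set of observations $\{x_1, \ldots, x_n\}$ that is drawn i.i.d. from the cdf $F$. The $i$th order statistic $x_i$ will have pdf $f_{(i)}$ equal to:

$$f_{(i)}(x) = i\binom{n}{i}F(x)^{i-1}(1 - F(x))^{n-i}f(x).$$

The expected value of $H_n^\rho$ equals:

$$E(H_n^\rho) = \frac{1}{n^\rho}\sum_{i=1}^n i\binom{n}{i}\left((n-i+1)^\rho - (n-i)^\rho\right)\int_0^\infty x\,F(x)^{i-1}(1 - F(x))^{n-i}\,dF(x).$$

In order to simplify this expression we split it up into several parts:

$$E(H_n^\rho) = \frac{1}{n^\rho}\int_0^\infty x\Bigg[\underbrace{\sum_{i=1}^n \underbrace{i\binom{n}{i}(n-i+1)^\rho}_{A_1}F(x)^{i-1}(1 - F(x))^{n-i}}_{A} - \underbrace{\sum_{i=1}^n \underbrace{i\binom{n}{i}(n-i)^\rho}_{B_1}F(x)^{i-1}(1 - F(x))^{n-i}}_{B}\Bigg]\,dF(x). \tag{1}$$

We have that:

$$\begin{aligned}
A_1 &= i\binom{n}{i}(n-i+1)^\rho \\
&= n\binom{n-1}{i-1}(n-i+1)^\rho && \text{(R-4)} \\
&= \binom{n}{n-i+1}(n-i+1)^{\rho+1} && \text{(R-1, R-4)} \\
&= \sum_{k=1}^{\rho+1}{\rho+1 \brace k}\,n^{\underline{k}}\binom{n-k}{n-i+1-k} && \text{(R-3, R-4)} \\
&= \sum_{k=1}^{\rho+1}{\rho+1 \brace k}\,n^{\underline{k}}\binom{n-k}{i-1}, && \text{(R-1)}
\end{aligned}$$

$$\begin{aligned}
B_1 &= i\binom{n}{i}(n-i)^\rho \\
&= n\binom{n-1}{i-1}(n-i)^\rho && \text{(R-4)} \\
&= n\binom{n-1}{n-i}(n-i)^\rho && \text{(R-1)} \\
&= n\sum_{k=1}^{\rho}{\rho \brace k}\,(n-1)^{\underline{k}}\binom{n-1-k}{n-i-k} && \text{(R-3, R-4)} \\
&= \sum_{k=1}^{\rho+1}{\rho \brace k-1}\,n^{\underline{k}}\binom{n-k}{i-1}. && \text{(R-1)}
\end{aligned}$$

The last line for $B_1$ shifts the summation index by one and uses ${\rho \brace 0} = 0$. These results enable us to simplify $A$ and $B$:

$$\begin{aligned}
A &= \sum_{i=1}^n\sum_{k=1}^{\rho+1}{\rho+1 \brace k}\,n^{\underline{k}}\binom{n-k}{i-1}F(x)^{i-1}(1 - F(x))^{n-i} \\
&= \sum_{k=1}^{\rho+1}{\rho+1 \brace k}\,n^{\underline{k}}\,(1 - F(x))^{k-1}\sum_{i=1}^{n-k+1}\binom{n-k}{i-1}F(x)^{i-1}(1 - F(x))^{n-k-(i-1)} \\
&= \sum_{k=1}^{\rho+1}{\rho+1 \brace k}\,n^{\underline{k}}\,(1 - F(x))^{k-1}, && \text{(R-5)}
\end{aligned}$$

$$\begin{aligned}
B &= \sum_{i=1}^n\sum_{k=1}^{\rho+1}{\rho \brace k-1}\,n^{\underline{k}}\binom{n-k}{i-1}F(x)^{i-1}(1 - F(x))^{n-i} \\
&= \sum_{k=1}^{\rho+1}{\rho \brace k-1}\,n^{\underline{k}}\,(1 - F(x))^{k-1}. && \text{(R-5)}
\end{aligned}$$

Substituting $A$ and $B$ into equation (1) gives:

$$\begin{aligned}
E(H_n^\rho) &= \frac{1}{n^\rho}\sum_{k=1}^{\rho+1}\left({\rho+1 \brace k} - {\rho \brace k-1}\right)n^{\underline{k}}\int_0^\infty x\,(1 - F(x))^{k-1}\,dF(x) \\
&= \frac{1}{n^\rho}\sum_{k=1}^{\rho}{\rho \brace k}\,n^{\underline{k}}\;k\int_0^\infty x\,(1 - F(x))^{k-1}\,dF(x) && \text{(R-2)} \\
&= \frac{1}{n^\rho}\sum_{k=1}^{\rho}{\rho \brace k}\,n^{\underline{k}}\,H^k. && (2)
\end{aligned}$$

Equation (2) shows that the expected value of $H_n^\rho$ can be expressed as a weighted average of all indices $H^m$ with $m \leq \rho$. As such, the estimator $H_n^\rho$ will not be unbiased unless $H^m$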
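is zero for all $m < \rho$. Equation (2) allows us to construct an unbiased estimator of $H^\rho$ in a recursive way. For $\rho = 1$, we have that $E(H_n^1) = H^1 = \mu$. Hence, $H_n^1$ is an unbiased estimator of $H^1$. Now, assume that we have an unbiased estimator $h_n^m$ of $H^m$ for all $m \in \{1, 2, \ldots, \rho - 1\}$. Then we can construct the following estimator $h_n^\rho$ of $H^\rho$:

$$h_n^\rho = \frac{1}{n^{\underline{\rho}}}\left(n^\rho H_n^\rho - \sum_{m=1}^{\rho-1}{\rho \brace m}\,n^{\underline{m}}\,h_n^m\right). \tag{3}$$

This estimator is unbiased:

$$\begin{aligned}
E(h_n^\rho) &= \frac{1}{n^{\underline{\rho}}}\left(n^\rho E(H_n^\rho) - \sum_{m=1}^{\rho-1}{\rho \brace m}\,n^{\underline{m}}\,E(h_n^m)\right) \\
&= \frac{1}{n^{\underline{\rho}}}\left(\sum_{m=1}^{\rho}{\rho \brace m}\,n^{\underline{m}}\,H^m - \sum_{m=1}^{\rho-1}{\rho \brace m}\,n^{\underline{m}}\,H^m\right) \\
&= H^\rho.
\end{aligned}$$

The unbiased estimator for $A^\rho$ is then given by $a_n^\rho = \mu_n - h_n^\rho$ and the almost unbiased estimator for $R^\rho$ is given by $r_n^\rho = 1 - h_n^\rho/\mu_n$. For the Gini index, i.e. $\rho = 2$, it can be shown that $r_n^2 = nR_n^2/(n-1)$. This is in agreement with the first-order correction for the Gini index found in the literature (see Deaton, 1997; Deltas, 2003; Davidson, 2009).

It can be shown (see Appendix A) that $h_n^\rho$ is equal to the following expression:

$$h_n^\rho = \sum_{i=1}^n \frac{\binom{n-i}{\rho-1}}{\binom{n}{\rho}}\,x_i. \tag{4}$$

The multiplicators $\binom{n-i}{\rho-1}/\binom{n}{\rho}$ sum to one (see Appendix B), which implies that, analogous to the estimators $H_n^\rho$, the estimators $h_n^\rho$ are a weighted average of the order statistics $x_i$. Also, note that the weights attached to the $\rho - 1$ highest incomes are equal to zero. This implies that the estimator $h_n^\rho$ does not use all available information. For example, the value of $h_n^{10}$ on a sample of size 10 coincides with the smallest value in the sample. Simple manipulation of equation (4) shows that we can write $h_n^\rho$ as $\sum_{i=1}^n a_i^\rho x_i$, with

$$a_i^\rho = \begin{cases} 1/n & \text{if } \rho = 1, \\[4pt] a_i^{\rho-1}\,\dfrac{\rho}{\rho-1}\left(1 - \dfrac{i-1}{n-(\rho-1)}\right) & \text{for } \rho > 1. \end{cases} \tag{5}$$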
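The weights in equation (4) and recursion (5) can be sketched in a few lines of Python. This is a minimal sketch for integer $\rho \leq n$ only. The function names (`usual_H`, `weights_closed_form`, `weights_recursive`, `unbiased_h`) and the sample data are illustrative assumptions, not part of the note. Exact rational arithmetic is used so that the two weight formulas can be compared without rounding error.

```python
from fractions import Fraction
from math import comb


def usual_H(xs, rho):
    """Usual estimator H_n^rho: weight ((n-i+1)^rho - (n-i)^rho) / n^rho on x_(i)."""
    x = sorted(xs)
    n = len(x)
    return sum(Fraction((n - i + 1) ** rho - (n - i) ** rho, n ** rho) * xi
               for i, xi in enumerate(x, start=1))


def weights_closed_form(n, rho):
    """Weights of equation (4): C(n-i, rho-1) / C(n, rho) for i = 1..n."""
    return [Fraction(comb(n - i, rho - 1), comb(n, rho)) for i in range(1, n + 1)]


def weights_recursive(n, rho):
    """Weights built with recursion (5), starting from a_i^1 = 1/n."""
    a = [Fraction(1, n)] * n
    for r in range(2, rho + 1):
        a = [a_i * Fraction(r, r - 1) * (1 - Fraction(i - 1, n - (r - 1)))
             for i, a_i in enumerate(a, start=1)]
    return a


def unbiased_h(xs, rho):
    """Estimator h_n^rho as a weighted sum of the order statistics (equation (4))."""
    x = sorted(xs)
    return sum(w * xi for w, xi in zip(weights_closed_form(len(x), rho), x))


# Illustrative data (hypothetical incomes, chosen arbitrarily).
sample = [3, 1, 4, 1, 5, 9, 2, 6]
mu_n = Fraction(sum(sample), len(sample))
print("usual R_n^2    :", float(1 - usual_H(sample, 2) / mu_n))
print("corrected r_n^2:", float(1 - unbiased_h(sample, 2) / mu_n))
```

With integer data both estimators are returned as exact fractions, so the relation $r_n^2 = nR_n^2/(n-1)$ can be checked by direct comparison rather than up to a tolerance.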

For $\rho > 1$, as $i$ increases, the weights attached to $x_i$ decrease at an increasing rate until they reach zero for $x_{n-\rho+2}$. The recursion (5) shows that the estimator $h_n^\rho$ is very easy to calculate. It also makes it possible to define $h_n^\rho$ for non-integer values of $\rho$. Unfortunately, this extension has the unwanted side-effects that the weights $a_i^\rho$ no longer sum to unity, although the sum will approximate unity if $n$ is not too small, and that the estimator is no longer unbiased.

3 Simulation

For our empirical illustration we used a lognormal distribution with parameters 9.85 and 0.6. Our population statistics $A^\rho$ and $R^\rho$ were calculated on the basis of a random sample of 50 million observations. We drew 200,000 independent samples of size $n$ ($n$ = 10, 30, 50). For each of these samples, we calculated the estimators $A_n^\rho$, $a_n^\rho$, $R_n^\rho$ and $r_n^\rho$. Table 1 presents the averages over these 200,000 samples (standard errors are between brackets) for the values $\rho$ = 1.5, 2, 5, 7.5 and 10. Simulation results for other parameter values and other distributions give similar results.

Table 1: Simulation results

| $\rho$ | sample size | $A_n^\rho$ | $a_n^\rho$ | $A^\rho$ | $R_n^\rho$ | $r_n^\rho$ | $R^\rho$ |
|---|---|---|---|---|---|---|---|
| 1.5 | 10 | 4296 (1764) | 4378 (1852) | 4941 | 0.1853 (0.0495) | 0.1886 (0.0530) | 0.2177 |
| | 30 | 4708 (1115) | 4802 (1146) | | 0.2059 (0.033) | 0.2100 (0.034) | |
| | 50 | 4799 (882) | 4870 (899) | | 0.2105 (0.0262) | 0.2136 (0.0268) | |
| 2 | 10 | 6733 (2600) | 7481 (2890) | 7458 | 0.2908 (0.0696) | 0.3231 (0.0774) | 0.3286 |
| | 30 | 7217 (1574) | 7466 (1628) | | 0.3158 (0.0431) | 0.3267 (0.0446) | |
| | 50 | 7307 (1227) | 7455 (1252) | | 0.3207 (0.0344) | 0.3273 (0.0347) | |
| 5 | 10 | 11545 (3837) | 12515 (4055) | 12505 | 0.5011 (0.0938) | 0.5438 (0.0982) | 0.5508 |
| | 30 | 12193 (2250) | 12509 (2289) | | 0.5345 (0.0535) | 0.5484 (0.0541) | |
| | 50 | 12319 (1751) | 12508 (1768) | | 0.5411 (0.0415) | 0.5494 (0.0417) | |
| 7.5 | 10 | 12729 (4082) | 13853 (4346) | 13894 | 0.5545 (0.0993) | 0.6043 (0.1065) | 0.6123 |
| | 30 | 13526 (2395) | 13900 (2438) | | 0.5936 (0.0556) | 0.6101 (0.0563) | |
| | 50 | 13671 (1863) | 13895 (1882) | | 0.60113 (0.0428) | 0.61098 (0.0430) | |
| 10 | 10 | 13398 (4246) | 14722 (4598) | 14706 | 0.5828 (0.1026) | 0.6414 (0.1151) | 0.6480 |
| | 30 | 14287 (2488) | 14715 (2539) | | 0.6268 (0.0569) | 0.6457 (0.0580) | |
| | 50 | 14443 (1942) | 14698 (1947) | | 0.6353 (0.0437) | 0.6466 (0.0441) | |

NOTE: These simulations were based on the lognormal distribution: $\ln X \sim N(9.85, 0.6)$. The statistics $R^\rho$ and $A^\rho$ were based on a random sample of 10 million observations. Each average was computed over a set of 200,000 samples. Standard errors are between brackets.

We observe the following regularities:

- For integer parameter values, the estimators $r_n^\rho$ and $a_n^\rho$ perform considerably better than the estimators $R_n^\rho$ and $A_n^\rho$.
- For non-integer parameter values one can clearly see that the estimator $a_n^\rho$ is no longer unbiased, although the bias decreases for larger sample sizes and larger parameter values. Furthermore, the estimators $r_n^\rho$ and $a_n^\rho$ still seem to perform considerably better than the estimators $A_n^\rho$ and $R_n^\rho$.
- The standard errors for the estimators $r_n^\rho$ and $a_n^\rho$ are slightly larger than the standard errors for the estimators $R_n^\rho$ and $A_n^\rho$.

References

Barrett, G. F., Pendakur, K., 1995. The asymptotic distribution of the generalized Gini indices of inequality. Canadian Journal of Economics 28, 1042–1055.

Bossert, W., 1990. An axiomatization of the single-series Ginis. Journal of Economic Theory 50, 82–92.

Davidson, R., 2009. Reliable inference for the Gini index. GREQAM Document de Travail nr 2007-23.

Deaton, A. S., 1997. The analysis of household surveys: a microeconometric approach to development policy. Johns Hopkins University Press for the World Bank, Baltimore.

Deltas, G., 2003. The small-sample bias of the Gini coefficient: results and implications for empirical research. The Review of Economics and Statistics 85, 226–234.

Donaldson, D., Weymark, J. A., 1980. A single-parameter generalization of the Gini indices of inequality. Journal of Economic Theory 22, 67–86.

Graham, R. L., Knuth, D. E., Patashnik, O., 1989. Concrete Mathematics. Addison-Wesley.

Yitzhaki, S., 1983. Relative deprivation and the Gini coefficient. International Economic Review 93, 617–628.

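Zitikis, R., Gastwirth, J., 2002. The asymptotic distribution of the S-Gini index. Australian and New Zealand Journal of Statistics 44, 439–446.

A Equivalence of equations 3 and 4

The proof is by induction on $\rho$. For $\rho = 1$ we easily establish that both equations 3 and 4 reduce to $\mu_n$. Assume that the assertion holds for all $m < \rho$. The proof follows if we can show that:

$$n^\rho H_n^\rho = \sum_{m=1}^{\rho}{\rho \brace m}\,n^{\underline{m}}\,h_n^m,$$

where $h_n^m$ is given by equation 4. Indeed,

$$\begin{aligned}
n^\rho H_n^\rho &= \sum_{i=1}^n\left((n-i+1)^\rho - (n-i)^\rho\right)x_i \\
&= \sum_{i=1}^n\sum_{m=1}^{\rho}{\rho \brace m}\left((n-i+1)^{\underline{m}} - (n-i)^{\underline{m}}\right)x_i && \text{(R-3)} \\
&= \sum_{i=1}^n\sum_{m=1}^{\rho}{\rho \brace m}\,m\,(n-i)^{\underline{m-1}}\,x_i \\
&= \sum_{m=1}^{\rho}{\rho \brace m}\,m!\sum_{i=1}^n\binom{n-i}{m-1}x_i \\
&= \sum_{m=1}^{\rho}{\rho \brace m}\,n^{\underline{m}}\sum_{i=1}^n\frac{\binom{n-i}{m-1}}{\binom{n}{m}}\,x_i \\
&= \sum_{m=1}^{\rho}{\rho \brace m}\,n^{\underline{m}}\,h_n^m.
\end{aligned}$$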
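The identity at the heart of this appendix can also be checked numerically. The following is a minimal sketch in exact rational arithmetic, assuming integer $\rho \leq n$. The helper names (`stirling2`, `falling`, `lhs`, `rhs`, `h`) and the test sample are illustrative assumptions. A numerical check of this kind supports the algebra but does not replace the proof.

```python
from fractions import Fraction
from math import comb


def stirling2(r, k):
    """Stirling number of the second kind, computed with recurrence (R-2)."""
    if r == k:
        return 1
    if k == 0 or k > r:
        return 0
    return k * stirling2(r - 1, k) + stirling2(r - 1, k - 1)


def falling(n, k):
    """Falling factorial n(n-1)...(n-k+1); equals 1 for k = 0."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


def h(x, m):
    """Equation (4) evaluated on an already sorted sample x."""
    n = len(x)
    return sum(Fraction(comb(n - i, m - 1), comb(n, m)) * xi
               for i, xi in enumerate(x, start=1))


def lhs(x, rho):
    """n^rho * H_n^rho for a sorted sample x."""
    n = len(x)
    return sum(((n - i + 1) ** rho - (n - i) ** rho) * xi
               for i, xi in enumerate(x, start=1))


def rhs(x, rho):
    """Sum over m of S(rho, m) * n^(falling m) * h_n^m."""
    n = len(x)
    return sum(stirling2(rho, m) * falling(n, m) * h(x, m)
               for m in range(1, rho + 1))


x = sorted([7, 2, 9, 4, 4, 11])  # arbitrary illustrative sample
for rho in range(1, len(x) + 1):
    print(rho, lhs(x, rho) == rhs(x, rho))
```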

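B $h_n^\rho$ is a weighted sum

We show that the weights $\binom{n-i}{\rho-1}/\binom{n}{\rho}$ sum to one.

$$\begin{aligned}
\sum_{i=1}^n\frac{\binom{n-i}{\rho-1}}{\binom{n}{\rho}} &= \sum_{i=1}^{n-\rho+1}\frac{(n-i)!}{(\rho-1)!\,(n-i-\rho+1)!}\cdot\frac{\rho!\,(n-\rho)!}{n!} \\
&= \frac{\rho}{n}\sum_{i=1}^{n-\rho+1}\binom{n-\rho}{i-1}\bigg/\binom{n-1}{i-1} \\
&= \frac{\rho}{n}\sum_{k=0}^{n-\rho}\binom{n-\rho}{k}\bigg/\binom{n-1}{k} \\
&= 1.
\end{aligned}$$

The last step uses the identity

$$\sum_{k=0}^{m}\binom{m}{k}\bigg/\binom{N}{k} = \frac{N+1}{N+1-m},$$

with $m = n - \rho$ and $N = n - 1$ (see Graham et al., 1989, problem 1, p. 173).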
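Both statements in this appendix are easy to confirm for small cases. The sketch below assumes integer $\rho \leq n$ and uses exact fractions. The function names `weight_sum` and `sum_of_ratios` are illustrative and do not come from the note.

```python
from fractions import Fraction
from math import comb


def weight_sum(n, rho):
    """Sum over i = 1..n of C(n-i, rho-1) / C(n, rho)."""
    return sum(Fraction(comb(n - i, rho - 1), comb(n, rho))
               for i in range(1, n + 1))


def sum_of_ratios(m, N):
    """Sum over k = 0..m of C(m, k) / C(N, k), defined for 0 <= m <= N."""
    return sum(Fraction(comb(m, k), comb(N, k)) for k in range(m + 1))


# The weights sum to one, and the ratio identity gives (N+1)/(N+1-m).
print(weight_sum(10, 3))
print(sum_of_ratios(7, 9), Fraction(9 + 1, 9 + 1 - 7))
```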