An (almost) unbiased estimator for the S-Gini index

Thomas Demuynck

University of Ghent, Sherppa, Tweekerkenstraat 2, B-9000 Gent, Belgium. E-mail: thomas.demuynck@ugent.be

February 25, 2009

I am pleased to acknowledge the insightful comments of Dirk Van de gaer.

Abstract

This note provides an unbiased estimator for the absolute S-Gini and an almost unbiased estimator for the relative S-Gini for integer parameter values. Simulations indicate that these estimators perform considerably better than the usual estimators, especially for small sample sizes.

1 The absolute and relative S-Gini indices

Assume that income is distributed according to a continuous and differentiable cumulative distribution function (cdf) $F : [0, \infty[ \to [0, 1]$ with finite mean $\mu$ and continuous population density function (pdf) $f$. The absolute single-series Gini (absolute S-Gini), $A^\delta$, and the relative single-series Gini (relative S-Gini), $R^\delta$, with parameter $\delta \in \mathbb{R}_{++}$ are given by:

$$A^\delta = \mu - H^\delta \quad\text{and}\quad R^\delta = 1 - \frac{H^\delta}{\mu}, \quad\text{with}\quad H^\delta = \delta \int_0^\infty x\,(1 - F(x))^{\delta - 1}\, dF(x).$$

These indices exist for all values of $\delta \geq 1$, but for values of $\delta < 1$ it is possible that $H^\delta$ reaches infinity. From now on, we assume that $H^\delta$ is well defined for all values of $\delta$ under consideration. The parameter $\delta$ determines the weight attached to the income of individuals at different points in the income distribution. As $\delta$ increases, more weight is given to the bottom of the income distribution. For $\delta$ equal to one, $H^1$ is equal to the mean $\mu$, and $R^1$ and $A^1$ are both equal to zero. For $\delta$ equal to 2, the indices $A^2$ and $R^2$ reduce to the well-known absolute and relative Ginis. We refer to Donaldson and Weymark (1980), Yitzhaki (1983) and Bossert (1990) for an in-depth discussion of the properties related to the S-Gini index.
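The definition of $H^\delta$ can be checked numerically on a distribution with a known answer. The sketch below is an illustration only: it assumes a unit exponential income distribution (not used in the note), for which $H^\delta = 1/\delta$ in closed form, and approximates the integral with a plain midpoint rule. The function names `h_index`, `exp_cdf` and `exp_pdf` are hypothetical helpers.

```python
import math


def h_index(delta, cdf, pdf, upper=50.0, steps=100_000):
    """Midpoint-rule approximation of
    H^delta = delta * integral_0^upper x * (1 - F(x))^(delta - 1) * f(x) dx.
    `upper` truncates the infinite range (an assumption of this sketch)."""
    width = upper / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * width
        total += x * (1.0 - cdf(x)) ** (delta - 1) * pdf(x)
    return delta * total * width


def exp_cdf(x):
    """cdf of the unit exponential distribution (illustrative choice)."""
    return 1.0 - math.exp(-x)


def exp_pdf(x):
    """pdf of the unit exponential distribution (illustrative choice)."""
    return math.exp(-x)


mu = h_index(1, exp_cdf, exp_pdf)            # H^1 is the mean
relative_gini = 1.0 - h_index(2, exp_cdf, exp_pdf) / mu
absolute_gini = mu - h_index(2, exp_cdf, exp_pdf)
```

Under these assumptions `mu` is close to 1 and both Gini values are close to 0.5, the known Gini coefficient of the exponential distribution. This matches the statement that $\delta = 1$ recovers the mean and $\delta = 2$ recovers the usual Gini indices.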

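The most common finite sample estimators for the S-Ginis are given by:

$$A_n^\delta = \mu_n - H_n^\delta \quad\text{and}\quad R_n^\delta = 1 - \frac{H_n^\delta}{\mu_n}, \quad\text{with}\quad H_n^\delta = \sum_{i=1}^{n} \frac{(n-i+1)^\delta - (n-i)^\delta}{n^\delta}\, x_{(i)}.$$

Here $x_{(i)}$ represents the $i$th smallest value in the sample (the $i$th order statistic) and $\mu_n$ is the sample mean, $\sum_i x_i / n$. The estimators $A_n^\delta$ and $R_n^\delta$ are strongly consistent estimators for $A^\delta$ and $R^\delta$ and they are asymptotically normally distributed (Barrett and Pendakur, 1995; Zitikis and Gastwirth, 2002). Unfortunately, they are not unbiased, and their bias depends on the sample size $n$, the value of the parameter $\delta$, and the distribution $F$.

The sample mean $\mu_n$ is an unbiased estimator for the population mean $\mu$. Hence, for the absolute S-Gini $A^\delta$, we only need to construct an unbiased estimator for the term $H^\delta$. Such an estimator also provides an almost unbiased estimator for $R^\delta$. This last estimator is not unbiased because it is divided by the sample mean, which is itself an estimator of the population mean. The next section provides an unbiased estimator of $H^\delta$ and the last section provides simulation results that compare these estimators with the estimators $A_n^\delta$ and $R_n^\delta$.

2 An unbiased estimator for $H^\delta$

We denote by $\left\{ {n \atop k} \right\}$ the Stirling number of the second kind with upper index $n$ and lower index $k$. The number $\left\{ {n \atop k} \right\}$ represents the number of ways that a set of size $n$ can be partitioned into $k$ subsets. We denote by $\binom{n}{k}$ the binomial coefficient with upper index $n$ and lower index $k$, i.e. the number of $k$-element subsets of an $n$-element set. Finally, we denote by $n^{\underline{k}}$ the falling factorial $n(n-1)\cdots(n-k+1)$. The following identities will be used in this section (see Graham et al. (1989) for a proof of these identities):

$$\binom{n}{k} = \binom{n}{n-k}, \tag{R-1}$$

$$\left\{ {n \atop k} \right\} = k \left\{ {n-1 \atop k} \right\} + \left\{ {n-1 \atop k-1} \right\}, \tag{R-2}$$

$$x^r = \sum_{k=0}^{r} \left\{ {r \atop k} \right\} x^{\underline{k}}, \tag{R-3}$$

$$\binom{n}{k}\, k^{\underline{i}} = n^{\underline{i}} \binom{n-i}{k-i}, \tag{R-4}$$

$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}. \tag{R-5}$$

We focus on the case where the parameter $\delta$ takes only integer values, with $\delta \leq n$. Assume that we have a set of observations $\{x_1, \ldots, x_n\}$ that is drawn i.i.d. from the cdf $F$. The $i$th order statistic $x_{(i)}$ has pdf $f_{(i)}$ equal to:

$$f_{(i)}(x) = i \binom{n}{i} F(x)^{i-1} (1 - F(x))^{n-i} f(x).$$

The expected value of $H_n^\delta$ equals:

$$E(H_n^\delta) = \frac{1}{n^\delta} \sum_{i=1}^{n} i \binom{n}{i} \left( (n-i+1)^\delta - (n-i)^\delta \right) \int_0^\infty x\, F(x)^{i-1} (1 - F(x))^{n-i}\, dF(x).$$

In order to simplify this expression we split it up into several parts:

$$E(H_n^\delta) = \frac{1}{n^\delta} \int_0^\infty x \Bigg[ \underbrace{\sum_{i=1}^{n} \underbrace{i \binom{n}{i} (n-i+1)^\delta}_{A_1} F(x)^{i-1} (1 - F(x))^{n-i}}_{A} - \underbrace{\sum_{i=1}^{n} \underbrace{i \binom{n}{i} (n-i)^\delta}_{B_1} F(x)^{i-1} (1 - F(x))^{n-i}}_{B} \Bigg] dF(x). \tag{1}$$

We have that:

$$\begin{aligned}
A_1 &= i \binom{n}{i} (n-i+1)^\delta \\
&= n \binom{n-1}{i-1} (n-i+1)^\delta && \text{(R-4)} \\
&= n \binom{n-1}{n-i} (n-i+1)^\delta && \text{(R-1)} \\
&= \binom{n}{n-i+1} (n-i+1)^{\delta+1} && \text{(R-4)} \\
&= \sum_{k=1}^{\delta+1} \left\{ {\delta+1 \atop k} \right\} n^{\underline{k}} \binom{n-k}{n-i+1-k} && \text{(R-3, R-4)} \\
&= \sum_{k=1}^{\delta+1} \left\{ {\delta+1 \atop k} \right\} n^{\underline{k}} \binom{n-k}{i-1}, && \text{(R-1)}
\end{aligned}$$

$$\begin{aligned}
B_1 &= i \binom{n}{i} (n-i)^\delta \\
&= n \binom{n-1}{i-1} (n-i)^\delta && \text{(R-4)} \\
&= n \binom{n-1}{n-i} (n-i)^\delta && \text{(R-1)} \\
&= n \sum_{k=1}^{\delta} \left\{ {\delta \atop k} \right\} (n-1)^{\underline{k}} \binom{n-1-k}{n-i-k} && \text{(R-3, R-4)} \\
&= \sum_{k=2}^{\delta+1} \left\{ {\delta \atop k-1} \right\} n^{\underline{k}} \binom{n-k}{i-1}. && \text{(R-1)}
\end{aligned}$$

Because $\left\{ {\delta \atop 0} \right\} = 0$ for $\delta \geq 1$, the last sum can equally be started at $k = 1$. These results enable us to simplify $A$ and $B$:

$$\begin{aligned}
A &= \sum_{k=1}^{\delta+1} \left\{ {\delta+1 \atop k} \right\} n^{\underline{k}} \sum_{i=1}^{n} \binom{n-k}{i-1} F(x)^{i-1} (1 - F(x))^{n-i} \\
&= \sum_{k=1}^{\delta+1} \left\{ {\delta+1 \atop k} \right\} n^{\underline{k}}\, (1 - F(x))^{k-1}, && \text{(R-5)}
\end{aligned}$$

$$\begin{aligned}
B &= \sum_{k=1}^{\delta+1} \left\{ {\delta \atop k-1} \right\} n^{\underline{k}} \sum_{i=1}^{n} \binom{n-k}{i-1} F(x)^{i-1} (1 - F(x))^{n-i} \\
&= \sum_{k=1}^{\delta+1} \left\{ {\delta \atop k-1} \right\} n^{\underline{k}}\, (1 - F(x))^{k-1}. && \text{(R-5)}
\end{aligned}$$

In both cases R-5 is applied to $\sum_{i} \binom{n-k}{i-1} F(x)^{i-1} (1 - F(x))^{n-k-(i-1)} = 1$, which leaves the factor $(1 - F(x))^{k-1}$. Substituting $A$ and $B$ into equation (1) gives:

$$\begin{aligned}
E(H_n^\delta) &= \frac{1}{n^\delta} \sum_{k=1}^{\delta+1} \left( \left\{ {\delta+1 \atop k} \right\} - \left\{ {\delta \atop k-1} \right\} \right) n^{\underline{k}} \int_0^\infty x\, (1 - F(x))^{k-1}\, dF(x) \\
&= \frac{1}{n^\delta} \sum_{k=1}^{\delta} \left\{ {\delta \atop k} \right\} n^{\underline{k}} \int_0^\infty x\, k\, (1 - F(x))^{k-1}\, dF(x) && \text{(R-2)} \\
&= \frac{1}{n^\delta} \sum_{k=1}^{\delta} \left\{ {\delta \atop k} \right\} n^{\underline{k}}\, H^k.
\end{aligned} \tag{2}$$

Equation (2) shows that the expected value of $H_n^\delta$ can be expressed as a weighted average of all indices $H^m$ with $m \leq \delta$ (the weights $\left\{ {\delta \atop k} \right\} n^{\underline{k}} / n^\delta$ sum to one by R-3 with $x = n$). As such, the estimator $H_n^\delta$ is in general biased for $\delta > 1$, because the lower-order indices $H^m$ with $m < \delta$ enter its expectation with positive weight.

Equation (2) allows us to construct an unbiased estimator of $H^\delta$ in a recursive way. For $\delta = 1$, we have that $E(H_n^1) = H^1 = \mu$. Hence, $H_n^1$ is an unbiased estimator of $H^1$. Now, assume that we have an unbiased estimator $h_n^m$ of $H^m$ for all $m$ in $\{1, 2, \ldots, \delta - 1\}$. Then we can construct the following estimator $h_n^\delta$ of $H^\delta$:

$$h_n^\delta = \frac{1}{n^{\underline{\delta}}} \left( n^\delta H_n^\delta - \sum_{k=1}^{\delta-1} \left\{ {\delta \atop k} \right\} n^{\underline{k}}\, h_n^k \right). \tag{3}$$

This estimator is unbiased:

$$\begin{aligned}
E(h_n^\delta) &= \frac{1}{n^{\underline{\delta}}} \left( n^\delta E(H_n^\delta) - \sum_{k=1}^{\delta-1} \left\{ {\delta \atop k} \right\} n^{\underline{k}}\, E(h_n^k) \right) \\
&= \frac{1}{n^{\underline{\delta}}} \left( \sum_{k=1}^{\delta} \left\{ {\delta \atop k} \right\} n^{\underline{k}}\, H^k - \sum_{k=1}^{\delta-1} \left\{ {\delta \atop k} \right\} n^{\underline{k}}\, H^k \right) = H^\delta.
\end{aligned}$$

The unbiased estimator for $A^\delta$ is then given by $a_n^\delta = \mu_n - h_n^\delta$ and the almost unbiased estimator for $R^\delta$ is given by $r_n^\delta = 1 - h_n^\delta / \mu_n$. For the Gini index, i.e. $\delta = 2$, it can be shown that $r_n^2 = n R_n^2 / (n - 1)$. This is in agreement with the first-order correction for the Gini index found in the literature (see Deaton, 1997; Deltas, 2003; Davidson, 2009).

It can be shown (see appendix A) that $h_n^\delta$ is equal to the following expression:

$$h_n^\delta = \sum_{i=1}^{n} \frac{\delta\, (n-i)^{\underline{\delta-1}}}{n^{\underline{\delta}}}\, x_{(i)}. \tag{4}$$

The multipliers $\delta (n-i)^{\underline{\delta-1}} / n^{\underline{\delta}}$ sum to one (see appendix B), which implies that, analogous to the estimators $H_n^\delta$, the estimators $h_n^\delta$ are a weighted average of the order statistics $x_{(i)}$. Also, note that the weights attached to the $\delta - 1$ highest incomes are equal to zero. This implies that the estimator $h_n^\delta$ does not use all available information. For example, the value of $h_n^{10}$ on a sample of size 10 coincides with the smallest value in the sample. Simple manipulation of equation (4) shows that we can write $h_n^\delta$ as $\sum_i a_i x_{(i)}$, with

$$a_i = \begin{cases} \delta / n & \text{if } i = 1, \\[4pt] a_{i-1} \left( 1 - \dfrac{\delta - 1}{n - (i - 1)} \right) & \text{for } i > 1. \end{cases} \tag{5}$$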
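The weights in recursion (5) translate almost directly into code. The following is a minimal sketch, not the author's implementation: the function names are hypothetical, and the Monte Carlo check assumes a unit exponential income distribution (chosen only because $H^\delta = 1/\delta$ is then known exactly).

```python
import random


def standard_h(sample, delta):
    """Usual estimator H_n^delta with weights
    ((n-i+1)^delta - (n-i)^delta) / n^delta on the i-th order statistic."""
    xs = sorted(sample)
    n = len(xs)
    return sum(((n - i + 1) ** delta - (n - i) ** delta) * x
               for i, x in enumerate(xs, start=1)) / n ** delta


def recursion_weights(n, delta):
    """Weights a_1..a_n from recursion (5):
    a_1 = delta/n, a_i = a_{i-1} * (1 - (delta-1)/(n-i+1))."""
    weights = [delta / n]
    for i in range(2, n + 1):
        weights.append(weights[-1] * (1 - (delta - 1) / (n - i + 1)))
    return weights


def unbiased_h(sample, delta):
    """Estimator h_n^delta as a weighted sum of the order statistics."""
    xs = sorted(sample)
    return sum(a * x for a, x in zip(recursion_weights(len(xs), delta), xs))


def monte_carlo_means(n, delta, reps=20000, seed=0):
    """Average both estimators over `reps` unit-exponential samples
    (the exponential distribution is an assumption of this sketch)."""
    rng = random.Random(seed)
    total_std = total_unb = 0.0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        total_std += standard_h(sample, delta)
        total_unb += unbiased_h(sample, delta)
    return total_std / reps, total_unb / reps


std_mean, unb_mean = monte_carlo_means(n=5, delta=3)
```

For $n = 5$ and $\delta = 3$, equation (2) with $H^k = 1/k$ predicts $E(H_n^3) = (5 + 30 + 20)/125 = 0.44$, while $H^3 = 1/3$. In this setup `std_mean` should therefore land near 0.44 and `unb_mean` near 0.333, up to simulation noise.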

For $\delta > 1$, as $i$ increases, the weights attached to $x_{(i)}$ decrease at an increasing rate until they reach zero for $x_{(n-\delta+2)}$. The recursion (5) shows that the estimator $h_n^\delta$ is very easy to calculate. It also makes it possible to define $h_n^\delta$ for non-integer values of $\delta$. Unfortunately, this extension has two unwanted side effects: the weights $a_i$ no longer sum to unity (although their sum approximates unity if $n$ is not too small), and the estimator is no longer unbiased.

3 Simulation

For our empirical illustration we used a lognormal distribution with parameters 9.85 and 0.6. Our population statistics $A^\delta$ and $R^\delta$ were calculated on the basis of a random sample of 50 million observations. We drew 200,000 independent samples of size $m$ ($m = 10, 30, 50$). For each of these samples, we calculated the estimators $A_m^\delta$, $a_m^\delta$, $R_m^\delta$ and $r_m^\delta$. Table 1 presents the averages over these 200,000 samples (standard errors are between brackets) for the values $\delta = 1.5$, 2, 5, 7.5 and 10. Simulation results for other parameter values and other distributions give similar results.

Table 1: simulation results

| $\delta$ | sample size | $A_n^\delta$ | $a_n^\delta$ | $A^\delta$ | $R_n^\delta$ | $r_n^\delta$ | $R^\delta$ |
|---|---|---|---|---|---|---|---|
| 1.5 | 10 | 4296 (1764) | 4378 (1852) | 4941 | 0.1853 (0.0495) | 0.1886 (0.0530) | 0.2177 |
| 1.5 | 30 | 4708 (1115) | 4802 (1146) | | 0.2059 (0.033) | 0.2100 (0.034) | |
| 1.5 | 50 | 4799 (882) | 4870 (899) | | 0.2105 (0.0262) | 0.2136 (0.0268) | |
| 2 | 10 | 6733 (2600) | 7481 (2890) | 7458 | 0.2908 (0.0696) | 0.3231 (0.0774) | 0.3286 |
| 2 | 30 | 7217 (1574) | 7466 (1628) | | 0.3158 (0.0431) | 0.3267 (0.0446) | |
| 2 | 50 | 7307 (1227) | 7455 (1252) | | 0.3207 (0.0344) | 0.3273 (0.0347) | |
| 5 | 10 | 11545 (3837) | 12515 (4055) | 12505 | 0.5011 (0.0938) | 0.5438 (0.0982) | 0.5508 |
| 5 | 30 | 12193 (2250) | 12509 (2289) | | 0.5345 (0.0535) | 0.5484 (0.0541) | |
| 5 | 50 | 12319 (1751) | 12508 (1768) | | 0.5411 (0.0415) | 0.5494 (0.0417) | |
| 7.5 | 10 | 12729 (4082) | 13853 (4346) | 13894 | 0.5545 (0.0993) | 0.6043 (0.1065) | 0.6123 |
| 7.5 | 30 | 13526 (2395) | 13900 (2438) | | 0.5936 (0.0556) | 0.6101 (0.0563) | |
| 7.5 | 50 | 13671 (1863) | 13895 (1882) | | 0.60113 (0.0428) | 0.61098 (0.0430) | |
| 10 | 10 | 13398 (4246) | 14722 (4598) | 14706 | 0.5828 (0.1026) | 0.6414 (0.1151) | 0.6480 |
| 10 | 30 | 14287 (2488) | 14715 (2539) | | 0.6268 (0.0569) | 0.6457 (0.0580) | |
| 10 | 50 | 14443 (1942) | 14698 (1947) | | 0.6353 (0.0437) | 0.6466 (0.0441) | |

NOTE: These simulations were based on the lognormal distribution: $\ln X \sim N(9.85, 0.6)$. The statistics $R^\delta$ and $A^\delta$ were based on a random sample of 10 million observations. Each average was computed over a set of 200,000 samples. Standard errors are between brackets. The population values $A^\delta$ and $R^\delta$ are listed once per value of $\delta$.

We observe the following regularities:

- For integer parameter values, the estimators $r_n^\delta$ and $a_n^\delta$ perform considerably better than the estimators $R_n^\delta$ and $A_n^\delta$.
- For non-integer parameter values one can clearly see that the estimator $a_n^\delta$ is no longer unbiased, although the bias decreases for larger sample sizes and larger parameter values. Furthermore, the estimators $r_n^\delta$ and $a_n^\delta$ seem to perform considerably better in comparison to the estimators $A_n^\delta$ and $R_n^\delta$.
- The standard errors for the estimators $r_n^\delta$ and $a_n^\delta$ are slightly larger compared to the standard errors for the estimators $R_n^\delta$ and $A_n^\delta$.

References

Barrett, G. F., Pendakur, K., 1995. The asymptotic distribution of the generalized Gini indices of inequality. Canadian Journal of Economics 28, 1042-1055.

Bossert, W., 1990. An axiomatization of the single-series Ginis. Journal of Economic Theory 50, 82-92.

Davidson, R., 2009. Reliable inference for the Gini index. GREQAM Document de Travail nr 2007-23.

Deaton, A. S., 1997. The analysis of household surveys: a microeconometric approach to development policy. Johns Hopkins University Press for the World Bank, Baltimore.

Deltas, G., 2003. The small-sample bias of the Gini coefficient: results and implications for empirical research. The Review of Economics and Statistics 85, 226-234.

Donaldson, D., Weymark, J. A., 1980. A single-parameter generalization of the Gini indices of inequality. Journal of Economic Theory 22, 67-86.

Graham, R. L., Knuth, D. E., Patashnik, O., 1989. Concrete Mathematics. Addison-Wesley.

Yitzhaki, S., 1983. Relative deprivation and the Gini coefficient. International Economic Review 93, 617-628.

Zitikis, R., Gastwirth, J., 2002. The asymptotic distribution of the S-Gini index. Australian and New Zealand Journal of Statistics 44, 439-446.

A Equivalence of equations 3 and 4

The proof is by induction on $\delta$. For $\delta = 1$ we easily establish that both equations 3 and 4 reduce to $\mu_n$. Assume that the assertion holds for all $m < \delta$. The proof follows if we can show that:

$$n^\delta H_n^\delta = \sum_{k=1}^{\delta} \left\{ {\delta \atop k} \right\} n^{\underline{k}}\, h_n^k,$$

where $h_n^k$ is given by equation 4. Using $(n-i+1)^{\underline{k}} = (n-i+1)\,(n-i)^{\underline{k-1}}$, we have:

$$\begin{aligned}
n^\delta H_n^\delta &= \sum_{i=1}^{n} \left( (n-i+1)^\delta - (n-i)^\delta \right) x_{(i)} \\
&= \sum_{i=1}^{n} x_{(i)} \left( \sum_{k=1}^{\delta+1} \left\{ {\delta+1 \atop k} \right\} (n-i)^{\underline{k-1}} - \sum_{k=1}^{\delta+1} \left\{ {\delta \atop k-1} \right\} (n-i)^{\underline{k-1}} \right) && \text{(R-3)} \\
&= \sum_{i=1}^{n} x_{(i)} \sum_{k=1}^{\delta} k \left\{ {\delta \atop k} \right\} (n-i)^{\underline{k-1}} && \text{(R-2)} \\
&= \sum_{k=1}^{\delta} \left\{ {\delta \atop k} \right\} n^{\underline{k}} \sum_{i=1}^{n} \frac{k\, (n-i)^{\underline{k-1}}}{n^{\underline{k}}}\, x_{(i)} \\
&= \sum_{k=1}^{\delta} \left\{ {\delta \atop k} \right\} n^{\underline{k}}\, h_n^k.
\end{aligned}$$
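The equivalence of equations 3 and 4 can also be checked numerically with exact rational arithmetic. The sketch below is only a sanity check, not part of the proof: it builds the Stirling numbers from recurrence R-2, evaluates the recursive definition (3) and the closed form (4) on a small integer sample, and compares them. All function names are hypothetical, and the sample is assumed to be sorted with $\delta \leq n$.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind via recurrence R-2."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def falling(x, k):
    """Falling factorial x(x-1)...(x-k+1); equals 1 for k = 0."""
    result = 1
    for j in range(k):
        result *= x - j
    return result


def standard_h(xs, delta):
    """H_n^delta on a sorted sample xs, as an exact fraction."""
    n = len(xs)
    return sum(Fraction((n - i + 1) ** delta - (n - i) ** delta, n ** delta) * x
               for i, x in enumerate(xs, start=1))


def h_recursive(xs, delta):
    """h_n^delta from the recursive definition, equation (3)."""
    n = len(xs)
    correction = sum(stirling2(delta, k) * falling(n, k) * h_recursive(xs, k)
                     for k in range(1, delta))
    return (n ** delta * standard_h(xs, delta) - correction) / falling(n, delta)


def h_closed(xs, delta):
    """h_n^delta from the closed form, equation (4)."""
    n = len(xs)
    return sum(Fraction(delta * falling(n - i, delta - 1), falling(n, delta)) * x
               for i, x in enumerate(xs, start=1))


sorted_sample = [1, 2, 4, 7, 11, 16]   # arbitrary illustrative data
agree = all(h_recursive(sorted_sample, d) == h_closed(sorted_sample, d)
            for d in range(1, 6))
```

Because everything is computed with `Fraction`, agreement here is exact rather than up to rounding. The recursive version recomputes lower-order estimators many times, which is acceptable for small $\delta$ but is another reason to prefer the closed form in practice.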

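B $h_n^\delta$ is a weighted sum

We show that the weights $\delta (n-i)^{\underline{\delta-1}} / n^{\underline{\delta}}$ sum to one. The weights vanish for $i > n - \delta + 1$, so:

$$\begin{aligned}
\sum_{i=1}^{n} \frac{\delta\, (n-i)^{\underline{\delta-1}}}{n^{\underline{\delta}}} &= \frac{\delta}{n} \sum_{i=1}^{n-\delta+1} \frac{(n-i)!\,(n-\delta)!}{(n-i-\delta+1)!\,(n-1)!} \\
&= \frac{\delta}{n} \sum_{i=1}^{n-\delta+1} \binom{n-\delta}{i-1} \bigg/ \binom{n-1}{i-1} \\
&= \frac{\delta}{n} \sum_{k=0}^{n-\delta} \binom{n-\delta}{k} \bigg/ \binom{n-1}{k} \\
&= \frac{\delta}{n} \cdot \frac{n}{\delta} = 1.
\end{aligned}$$

The last step uses the identity

$$\sum_{k=0}^{m} \binom{m}{k} \bigg/ \binom{n}{k} = \frac{n+1}{n+1-m}$$

(see Graham et al., 1989, problem 1, p. 173), applied with $m = n - \delta$ and upper index $n - 1$.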
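Both the ratio-of-binomials identity and the resulting claim about the weights can be spot-checked exactly for small values. This is a sketch under the assumption of integer $\delta$ with $1 \leq \delta \leq n$; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def ratio_sum(m, n):
    """sum_{k=0}^{m} C(m, k) / C(n, k), exact, for n >= m >= 0."""
    return sum(Fraction(comb(m, k), comb(n, k)) for k in range(m + 1))


def falling(x, k):
    """Falling factorial x(x-1)...(x-k+1); equals 1 for k = 0."""
    result = 1
    for j in range(k):
        result *= x - j
    return result


def weight_sum(n, delta):
    """Direct sum of the equation (4) weights delta*(n-i)_(delta-1) / n_(delta)."""
    return sum(Fraction(delta * falling(n - i, delta - 1), falling(n, delta))
               for i in range(1, n + 1))


identity_holds = all(ratio_sum(m, n) == Fraction(n + 1, n + 1 - m)
                     for n in range(0, 12) for m in range(0, n + 1))
weights_sum_to_one = all(weight_sum(n, d) == 1
                         for n in range(1, 12) for d in range(1, n + 1))
```

A finite check of this kind does not replace the proof, but it is a cheap guard against indexing mistakes when implementing the weights.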