Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

(For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

Abstract

Using the uniform statistical distribution and ordered uniform spacings, this paper provides a starting point for improving our understanding of the Lorenz curve and the Gini coefficient of inequality, $G$, under random allocation. Starting with ordered uniform spacings, it establishes, under random allocation, the joint moment generating function of these observations and the exact distribution of the individual ordered observations. These provide a basis for understanding the Gini coefficient of inequality and the associated Lorenz curve. For example, the expected value of $G$ under random allocation is, not unexpectedly, $(p-1)/(2p)$, where $p$ is the number of spacings, and this approaches 0.5 as $p$ increases. Following Durbin (1965), the paper uses a simple test to establish whether the coefficient is consistent with random allocation. Additionally, the paper develops the associated Lorenz curve, which takes the form $L(z) = z + (1-z)\log(1-z)$, $0 \le z \le 1$, and hence leads to the negative exponential distribution underlying the results. It suggests three extensions covering particular values of $G$.

1. Introduction

This paper looks at ordered uniform spacings. It follows a considerable number of authors who have studied uniform spacings (for example Barton & David (1956), Durbin (1961), Lewis (1965), Pyke (1965) and Stephens (1986)). It starts from the basics and applies the results to the Lorenz curve and the Gini coefficient, with a view to a better understanding of both.

2. Foundations

Let $v_i$ ($i = 1, \dots, n$; $v_j > v_i$ for $j > i$) be $n$ observations uniformly distributed on the interval $[0, a]$. Define $w_i = v_i - v_{i-1}$ ($i = 1, \dots, n+1$; $v_0 = 0$, $v_{n+1} = a$).
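The construction above can be illustrated with a short Python sketch. This is our own illustration, not part of the paper. The function name `uniform_spacings` is hypothetical. The sketch draws $n$ uniform points on $[0, a]$ with Python's standard `random` module and returns the $n+1$ spacings $w_i$, which always sum to $a$.

```python
import random

def uniform_spacings(n, a=1.0, rng=None):
    """Spacings w_i = v_i - v_{i-1} (i = 1..n+1) of n uniform points on [0, a],
    with the conventions v_0 = 0 and v_{n+1} = a."""
    if rng is None:
        rng = random.Random()
    v = sorted(rng.uniform(0.0, a) for _ in range(n))  # v_1 <= ... <= v_n
    points = [0.0] + v + [a]
    return [points[i] - points[i - 1] for i in range(1, n + 2)]

w = uniform_spacings(4, a=10.0, rng=random.Random(2005))
print(len(w), round(sum(w), 10))  # 5 10.0
```

Sorting `w` gives the ordered spacings $x_1 \le \dots \le x_p$ used in the rest of the paper.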

Let $x_i = w_{(i)}$, where $w_{(1)}, \dots, w_{(p)}$ is the ordered set of the $w$'s (i.e. $w_{(i)} \ge w_{(i-1)}$), so that $0 \le x_i \le x_{i+1} \le a$ and $\sum x_i = a$. Here the summation is over all values $i = 1, \dots, p$, with $p = n+1$. These $x_i$ are the ordered uniform spacings. We also define $x_0 = 0$. Some authors have used the notation $D_{(i)}$ or $c_{(i)}$ for these spacings.

3. Gini coefficient

Using the $x$'s above, the Gini coefficient of inequality is given by

$$G = \frac{2\sum_{i=1}^{p} i\,x_i - (p+1)a}{pa}.$$

The minimum of $G$ is 0, reached when all the $x$'s are equal, that is $x_i = a/p$ for all $i$. The maximum is $(p-1)/p$, reached when $x_i = 0$ ($i = 1, \dots, p-1$) and $x_p = a$. This maximum approaches 1 as $p$ increases.

4. Distribution function of the $x_i$'s

The first part of this section outlines the calculations used to produce the joint moment generating function of the $x$'s. The next part gives the individual moment generating functions, and the final part gives the individual probability density functions. The expected value of the Gini coefficient $G$ is also derived in this section.

4.1 Joint moment generating function of the $x_i$'s

The aim here is to find the joint moment generating function of the $x_i$'s, that is

$$E\left(e^{\sum s_i x_i}\right) = K \int_S e^{\sum s_i x_i} \prod_j dx_j,$$

where $S$ is the space such that $0 \le x_i \le x_{i+1} \le a$ and $\sum x_i = a$. $K$ is a standardising function of $p$ and $a$, chosen so that the integral is unity when all the $s_i$ are 0. (The summations and product are over $i, j = 1, \dots, p$.)

4.11 Using the transformation

$$u_i = a - i\,x_{p-i+1} - \sum_{j=1}^{p-i} x_j, \qquad i = 1, \dots, p+1,$$

we have $u_1 = 0$, $u_{p+1} = a$, $u_{i+1} - u_i = i\,(x_{p-i+1} - x_{p-i}) \ge 0$ and $du_i = -i\,dx_{p-i+1}$. With these, it is easy to show that

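$$E\left(e^{s(u_i - u_{i-1})}\right) = \frac{K a^{p-1}}{p!} \sum_{j=0}^{\infty} \frac{(as)^j}{(p+j-1)!},$$

which is independent of $i$. Since this must equal 1 when $s = 0$, it follows that $K = p!\,(p-1)!/a^{p-1}$.

4.12 It follows that $E(u_{i+1} - u_i) = c$, a constant independent of $i$, i.e. $E\big(i\,(x_{p-i+1} - x_{p-i})\big) = c$. Hence

$$E(x_i) = c \sum_{k=p-i+1}^{p} \frac{1}{k} \quad \text{for all } i.$$

But $\sum_{i=1}^{p} x_i = a$. In the double sum $\sum_{i=1}^{p}\sum_{k=p-i+1}^{p} 1/k$ each term $1/k$ appears $k$ times, so the double sum equals $p$ and $c = a/p$. Hence

$$E(x_i) = \frac{a}{p} \sum_{k=p-i+1}^{p} \frac{1}{k} = \frac{a}{p} \sum_{k=1}^{i} \frac{1}{p-k+1} \quad \text{for all } i.$$

For example, $E(x_1) = a/p^2$, $E(x_2) = (a/p)\big(1/p + 1/(p-1)\big)$ and $E(x_p) = (a/p)\sum_{k=1}^{p} 1/k$. If $p = 2$, $E(x_1) = a/4$ and $E(x_2) = 3a/4$. If $p = 3$, $E(x_1) = a/9$, $E(x_2) = 5a/18$ and $E(x_3) = 11a/18$. Not surprisingly, for any $p$ they add to $a$.

4.13 Using the above, it follows that

$$E\left(\sum_{i=1}^{p} i\,x_i\right) = \sum_{i=1}^{p} \left\{ i\,\frac{a}{p} \sum_{k=p-i+1}^{p} \frac{1}{k} \right\} = \frac{(3p+1)a}{4}.$$

Hence

$$E(G) = \frac{2E\left(\sum i\,x_i\right) - (p+1)a}{pa} = \frac{(3p+1)/2 - (p+1)}{p} = \frac{p-1}{2p}.$$

This is exactly halfway between the minimum (0) and the maximum ($(p-1)/p$).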
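As a check on sections 3 and 4.12-4.13, the following Python sketch computes the Gini coefficient of section 3 and the exact expected ordered spacings using rational arithmetic. The helper names `gini` and `expected_spacings` are hypothetical. Because $G$ is linear in the $x_i$ once $a$ is fixed, applying the formula to the expected spacings gives $E(G)$ directly, which should equal $(p-1)/(2p)$.

```python
from fractions import Fraction

def gini(x):
    """G = (2 * sum_i i*x_i - (p+1)*a) / (p*a) for ascending x_1..x_p with sum a (section 3)."""
    p, a = len(x), sum(x)
    return (2 * sum(i * xi for i, xi in enumerate(x, start=1)) - (p + 1) * a) / (p * a)

def expected_spacings(p):
    """E(x_i) = (1/p) * sum_{k=p-i+1}^{p} 1/k with a = 1 (section 4.12), as exact fractions."""
    return [Fraction(1, p) * sum(Fraction(1, k) for k in range(p - i + 1, p + 1))
            for i in range(1, p + 1)]

ex = expected_spacings(3)
print(ex)                   # [Fraction(1, 9), Fraction(5, 18), Fraction(11, 18)]
print(gini(ex))             # 1/3, i.e. (p - 1)/(2p) for p = 3
print(gini([0, 0, 0, 8]))   # 0.75, the maximum (p - 1)/p for p = 4
```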

4.14 It follows from 4.11 that

$$E\left(e^{\sum_{i=1}^{p} s_i (u_{i+1} - u_i)}\right) = (p-1)! \sum_{i=1}^{p} \frac{e^{a s_i}}{a^{p-1} \prod_{j \ne i} (s_i - s_j)}.$$

Here the summations are over $i = 1, \dots, p$, the product is over $j = 1, \dots, p$ with $j \ne i$, and the $s_i$ are distinct.

4.15 Now $\sum_k s_k x_k = a^{-1} \sum_{i=1}^{p} t_i (u_{i+1} - u_i)$, where

$$t_i = \frac{a}{i} \sum_{k=p-i+1}^{p} s_k,$$

that is, the average of the last $i$ of the $s_k$'s, multiplied by $a$. Note the results of Stephens (1991): for example, for integer $m$ and any distinct $s_i$,

$$\sum_{i=0}^{p-1} \frac{s_i^m}{\prod_{j \ne i} (s_i - s_j)} = 0 \quad (0 \le m \le p-2) \qquad \text{and} \qquad \sum_{i=0}^{p-1} \frac{s_i^{p-1}}{\prod_{j \ne i} (s_i - s_j)} = 1,$$

where the products are over all values of $j$ with $j \ne i$. These lead to

$$E\left(e^{\sum s_i x_i}\right) = (p-1)! \sum_{i=1}^{p} \frac{e^{t_i}}{\prod_{j \ne i} (t_i - t_j)},$$

where the summation is over $i = 1, \dots, p$ and the product over $j = 1, \dots, p$, again with $j \ne i$.

4.16 Examples. Let $p = 2$, so that $t_1 = a s_2$ and $t_2 = a(s_1 + s_2)/2$. Then

$$E\left(e^{s_1 x_1 + s_2 x_2}\right) = \frac{e^{t_1}}{t_1 - t_2} + \frac{e^{t_2}}{t_2 - t_1} = \frac{e^{a s_2}}{a\left(s_2 - \frac{s_1+s_2}{2}\right)} + \frac{e^{a(s_1+s_2)/2}}{a\left(\frac{s_1+s_2}{2} - s_2\right)} = \frac{2e^{a s_2/2}}{a(s_2 - s_1)}\left(e^{a s_2/2} - e^{a s_1/2}\right).$$

Putting $s_1 = 0$ gives the moment generating function of $x_2$, which is uniform on the interval $[a/2, a]$. Alternatively, putting $s_2 = 0$ gives the moment generating function of $x_1$, which is also uniform, but on the interval $[0, a/2]$.
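The formula of 4.15 is easy to evaluate numerically. The sketch below uses hypothetical function names and assumes the $t_i$ are distinct, as the formula requires. It compares the formula with the closed form of 4.16 for $p = 2$. For $p = 3$, it compares the formula with a plain Monte Carlo average of $e^{\sum s_i x_i}$ over simulated ordered spacings.

```python
import math
import random

def joint_mgf(s, a=1.0):
    """Section 4.15: (p-1)! * sum_i exp(t_i) / prod_{j != i} (t_i - t_j),
    with t_i = (a/i) * (s_{p-i+1} + ... + s_p). Assumes the t_i are distinct."""
    p = len(s)
    t = [a * sum(s[p - i:]) / i for i in range(1, p + 1)]  # s[p-i:] = last i of the s_k
    total = 0.0
    for i, ti in enumerate(t):
        total += math.exp(ti) / math.prod(ti - tj for j, tj in enumerate(t) if j != i)
    return math.factorial(p - 1) * total

def closed_form_p2(s1, s2, a=1.0):
    """Section 4.16 closed form for p = 2."""
    return (2 * math.exp(a * s2 / 2) * (math.exp(a * s2 / 2) - math.exp(a * s1 / 2))
            / (a * (s2 - s1)))

def monte_carlo_mgf(s, a=1.0, draws=100_000, seed=1):
    """Average of exp(sum_i s_i x_i) over simulated ordered spacings."""
    rng = random.Random(seed)
    p, acc = len(s), 0.0
    for _ in range(draws):
        cuts = sorted(rng.uniform(0.0, a) for _ in range(p - 1))
        pts = [0.0] + cuts + [a]
        x = sorted(pts[k + 1] - pts[k] for k in range(p))  # x_1 <= ... <= x_p
        acc += math.exp(sum(si * xi for si, xi in zip(s, x)))
    return acc / draws

print(joint_mgf([0.3, 1.1], a=2.0), closed_form_p2(0.3, 1.1, a=2.0))
print(joint_mgf([0.2, -0.4, 0.5]), monte_carlo_mgf([0.2, -0.4, 0.5]))
```

The Monte Carlo figure agrees only up to sampling error, of the order of a few times $10^{-4}$ here.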

In the limit as $s_2 \to s_1 = s$ we have, not unexpectedly, $E\left(e^{s_1 x_1 + s_2 x_2}\right) \to e^{as}$. That is, $x_1 + x_2$ is the point $a$ with probability 1. In general $\mathrm{var}(x_1) = \mathrm{var}(x_2) = a^2/48$ and $\mathrm{cov}(x_1, x_2) = -a^2/48$. Hence $\mathrm{corr}(x_1, x_2) = -1$. This makes intuitive sense: if two people have a fixed amount between them, then what one person has the other does not have.

4.17 Let $p = 3$. Then, with $t_1 = a s_3$, $t_2 = a(s_2+s_3)/2$ and $t_3 = a(s_1+s_2+s_3)/3$,

$$E\left(e^{s_1x_1+s_2x_2+s_3x_3}\right) = \frac{2e^{t_1}}{(t_1-t_2)(t_1-t_3)} + \frac{2e^{t_2}}{(t_2-t_1)(t_2-t_3)} + \frac{2e^{t_3}}{(t_3-t_1)(t_3-t_2)}$$

$$= \frac{2e^{a s_3}}{a^2\left(s_3 - \frac{s_2+s_3}{2}\right)\left(s_3 - \frac{s_1+s_2+s_3}{3}\right)} + \frac{2e^{a(s_2+s_3)/2}}{a^2\left(\frac{s_2+s_3}{2} - s_3\right)\left(\frac{s_2+s_3}{2} - \frac{s_1+s_2+s_3}{3}\right)} + \frac{2e^{a(s_1+s_2+s_3)/3}}{a^2\left(\frac{s_1+s_2+s_3}{3} - s_3\right)\left(\frac{s_1+s_2+s_3}{3} - \frac{s_2+s_3}{2}\right)}.$$

From this we have, for example, $\mathrm{corr}(x_1, x_2) = 1/\sqrt{28}$, $\mathrm{corr}(x_1, x_3) = -5/\sqrt{52}$ and $\mathrm{corr}(x_2, x_3) = -8/\sqrt{91}$. It is easy to show that the correlation matrix is singular. This indicates that the three variables $x_1$, $x_2$ and $x_3$ are not independent (indeed $x_1 + x_2 + x_3 = a$), in an analogous way to the example with $p = 2$.

4.2 Individual moment generating functions

Put $s_k = s$ and $s_j = 0$ for $j \ne k$. For ease of presentation, here and in the rest of section 4, also put $a = 1$. The individual moment generating functions are then

$$E\left(e^{s x_k}\right) = \frac{p!\,(p-1)!}{s^{p-1}(p-k)!} \sum_{i=p-k+1}^{p} \frac{(-1)^{i+k+p+1}\, i^{p-2}\, e^{s/i}}{(p-i)!\,(i-p+k-1)!},$$

together with other terms, not involving the exponential, such that any negative powers of $s$ disappear.
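The correlations quoted in 4.17 can be checked exactly. This sketch does not use the moment generating function. Instead it uses the representation implied by 4.11-4.12, $x_k = \sum_{i=p-k+1}^{p} D_i/i$ with $D_i = u_{i+1} - u_i$. It combines this with the standard variance $(p-1)/(p^2(p+1))$ and covariance $-1/(p^2(p+1))$ of uniform spacings, with $a = 1$. These two moments are taken here as known facts. The function names are hypothetical.

```python
import math
from fractions import Fraction

def spacing_covariance(p):
    """Exact covariance matrix of the ordered spacings x_1..x_p, with a = 1.

    By 4.11-4.12, D_i = u_{i+1} - u_i = i (x_{p-i+1} - x_{p-i}) are the p spacings of
    p - 1 uniform points, so x_k = sum_{i=p-k+1}^{p} D_i / i. We take as known that
    var(D_i) = (p-1)/(p^2 (p+1)) and cov(D_i, D_j) = -1/(p^2 (p+1)) for i != j."""
    var_d = Fraction(p - 1, p * p * (p + 1))
    cov_d = Fraction(-1, p * p * (p + 1))
    # coef[k][i]: weight of D_{i+1} in x_{k+1}
    coef = [[Fraction(1, i) if i >= p - k + 1 else Fraction(0) for i in range(1, p + 1)]
            for k in range(1, p + 1)]

    def cov(k, l):
        return sum(coef[k][i] * coef[l][j] * (var_d if i == j else cov_d)
                   for i in range(p) for j in range(p))

    return [[cov(k, l) for l in range(p)] for k in range(p)]

def correlation(c, k, l):
    return float(c[k][l]) / math.sqrt(float(c[k][k] * c[l][l]))

c = spacing_covariance(3)
print(correlation(c, 0, 1), 1 / math.sqrt(28))
print(correlation(c, 0, 2), -5 / math.sqrt(52))
print(correlation(c, 1, 2), -8 / math.sqrt(91))
print([sum(row) for row in c])  # each row sums to 0 (x_1 + x_2 + x_3 = 1), so the matrix is singular
```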

4.21 Examples

For example, if $p = 2$ we have, as mentioned in section 4.16,

$$E\left(e^{s x_1}\right) = \frac{2}{s}\left(e^{s/2} - 1\right), \qquad E\left(e^{s x_2}\right) = \frac{2}{s}\left(e^{s} - e^{s/2}\right);$$

that is, the uniform distributions on $[0, \tfrac12]$ and $[\tfrac12, 1]$ respectively.

4.22 As a further example, if $p = 3$,

$$E\left(e^{s x_1}\right) = \frac{6}{s^2}\left(3e^{s/3} - 3 - s\right), \qquad E\left(e^{s x_2}\right) = \frac{12}{s^2}\left(2e^{s/2} - 3e^{s/3} + 1\right), \qquad E\left(e^{s x_3}\right) = \frac{6}{s^2}\left(e^{s} - 4e^{s/2} + 3e^{s/3}\right).$$

Note that in both cases the smallest exponential term in the moment generating function of the largest spacing is $e^{s/p}$. This reflects the fact that, with $a = 1$, the minimum value of the largest spacing is $1/p$.

4.23 An unexpected pair of results

It is easy to show that

$$\sum_{k=1}^{p} E\left(e^{s x_k}\right) = p! \sum_{j=p-1}^{\infty} \frac{s^{j-p+1}}{j!}$$

and that

$$\sum_{k=1}^{p} k\,E\left(e^{s x_k}\right) = p\cdot p! \sum_{j=p-1}^{\infty} \frac{s^{j-p+1}}{j!} - \frac{(p-1)\,p!}{2} \sum_{j=p-1}^{\infty} \frac{(s/2)^{j-p+1}}{j!}.$$

These results, which are not intuitively obvious, are useful in determining $G_r$ when we transform $x_i \to x_{ir} = x_i^r$ for varying $r > 0$.

4.3 Individual probability density functions

For the sake of completeness we include the individual density functions. We use the fact that

$$\int_{0}^{1/c} e^{sx}(1-cx)^r\,dx = \frac{r!\,c^r e^{s/c}}{s^{r+1}} + \text{(other terms, which do not involve the exponential)}.$$

This leads to the following:

$$f_k(x) = \begin{cases} f_{k1}(x), & 0 \le x \le 1/p,\\ f_{km}(x), & 1/(p-m+2) \le x \le 1/(p-m+1),\quad 2 \le m \le k,\\ 0, & \text{elsewhere},\end{cases}$$

where

$$f_{km}(x) = \frac{(p-1)\,p!}{(k-1)!\,(p-k)!} \sum_{j=m}^{k} (-1)^{k-j} \frac{(k-1)!}{(j-1)!\,(k-j)!} \big(1 - (p-j+1)x\big)^{p-2}, \qquad 1 \le k \le p.$$

These are the individual probability density functions. For any value of $p$, $f_k$ gives the probability density function of the $k$th ordered spacing. For example, $k = 1$ gives the smallest value and $k = p$ the largest. The index $m$, increasing one at a time, specifies each particular segment of the individual functions. We use the term segment to indicate the range over which a particular function is valid. Where a function is identically 0 (e.g. $f_{1m}(x) \equiv 0$ for $m \ge 2$, and $f_{p1}(x) \equiv 0$), we do not treat it as a segment. For $p \ge 3$ there is no discontinuity in the value of the function at the end of each segment. For most values of $k$, however, there is a discontinuity in the $(p-2)$th derivative.

4.31 Examples

The density function of the smallest spacing consists of a single segment,

$$f_1(x) = f_{11}(x) = p(p-1)(1-px)^{p-2}, \qquad 0 \le x \le 1/p,\ p \ge 2.$$

Hence $E(x_1) = 1/p^2$ and $\mathrm{var}(x_1) = (p-1)/\big(p^4(p+1)\big)$.

The density function of the second smallest spacing consists of two segments:

$$f_2(x) = \begin{cases} f_{21}(x) = p(p-1)^2\left[\big(1-(p-1)x\big)^{p-2} - (1-px)^{p-2}\right], & 0 \le x \le 1/p,\\ f_{22}(x) = p(p-1)^2\big(1-(p-1)x\big)^{p-2}, & 1/p \le x \le 1/(p-1),\end{cases} \qquad p \ge 2.$$

The density function of the largest spacing is given, for $p \ge 2$, by

$$f_p(x) = f_{pm}(x) = p(p-1) \sum_{j=m}^{p} (-1)^{p-j} \frac{(p-1)!}{(j-1)!\,(p-j)!} \big(1-(p-j+1)x\big)^{p-2}, \qquad \frac{1}{p-m+2} \le x \le \frac{1}{p-m+1},\quad 2 \le m \le p.$$

This has $p-1$ segments.

4.32 Limiting probability density functions

It is reasonably easy to compute, using the mean and standard deviation, the limiting probability density functions (i.e. as $p \to \infty$) of the smaller spacings. There would appear to be no limiting probability density function for the larger values. This appears to be primarily because $\sum_{j=1}^{p} 1/j$ has no finite limit as $p \to \infty$, whereas $(1/p)\sum_{j=1}^{p} 1/j \to 0$.
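The density formula of 4.3 can be checked numerically. The following sketch uses hypothetical names, with a simple midpoint rule standing in for exact integration. It evaluates $f_k(x)$ segment by segment and confirms that each density integrates to 1 for $p = 3$. It then compares $\int e^{sx} f_k(x)\,dx$ with the moment generating functions of 4.22 and with the first identity of 4.23.

```python
import math

def spacing_density(k, p, x):
    """f_k(x) of section 4.3: density of the k-th smallest of p uniform spacings (a = 1)."""
    if not 0.0 <= x <= 1.0 / (p - k + 1):
        return 0.0
    m = 1  # segment index: m = 1 on [0, 1/p]; m >= 2 on [1/(p-m+2), 1/(p-m+1)]
    while m < k and x > 1.0 / (p - m + 1):
        m += 1
    const = (p - 1) * math.factorial(p) / (math.factorial(k - 1) * math.factorial(p - k))
    return const * sum((-1) ** (k - j) * math.comb(k - 1, j - 1) * (1.0 - (p - j + 1) * x) ** (p - 2)
                       for j in range(m, k + 1))

def integrate(f, lo, hi, n=20_000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

p, s = 3, 0.7
closed = [6 * (3 * math.exp(s / 3) - 3 - s) / s**2,                            # E(e^{s x_1}), 4.22
          12 * (2 * math.exp(s / 2) - 3 * math.exp(s / 3) + 1) / s**2,          # E(e^{s x_2})
          6 * (math.exp(s) - 4 * math.exp(s / 2) + 3 * math.exp(s / 3)) / s**2]  # E(e^{s x_3})
for k in range(1, p + 1):
    top = 1.0 / (p - k + 1)  # upper end of the support of x_k
    mass = integrate(lambda x: spacing_density(k, p, x), 0.0, top)
    mgf = integrate(lambda x: math.exp(s * x) * spacing_density(k, p, x), 0.0, top)
    print(k, mass, mgf, closed[k - 1])
# First identity of 4.23 for p = 3: the three MGFs sum to 3! * sum_{j>=2} s^(j-2)/j!
print(sum(closed), 6 * (math.exp(s) - 1 - s) / s**2)
```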

5. Distribution of the Gini coefficient

We now apply all this to the Gini coefficient of inequality. An equivalent result was presented by Durbin (1965).

5.1 Use the result of section 4.15, putting $s_k = ks$ and, again, $a = 1$. Then $t_k = s(2p+1-k)/2$, so that $t_i - t_j = (j-i)s/2$, and

$$E\left(e^{\sum s_i x_i}\right) = E\left(e^{s\sum i\,x_i}\right) = (p-1)! \sum_{i=1}^{p} \frac{e^{t_i}}{\prod_{j\ne i}(t_i - t_j)} = (p-1)! \sum_{i=1}^{p} \frac{e^{s(2p+1-i)/2}}{\prod_{j \ne i} \big[(j-i)\,s/2\big]}$$

$$= \frac{e^{ps}\left(1 - e^{-s/2}\right)^{p-1}}{(s/2)^{p-1}} = e^{s(p+1)/2}\,\frac{\left(e^{s/2} - 1\right)^{p-1}}{(s/2)^{p-1}}.$$

This is the moment generating function of the sum of $p-1$ independent uniform distributions on $[0, \tfrac12]$, with a positive displacement of $(p+1)/2$. It therefore has mean $(3p+1)/4$ and variance $(p-1)/48$. This was put forward by Durbin (1965).

5.2 Hence, using the equation for the Gini coefficient, $G = \big(2\sum i\,x_i - (p+1)\big)/p$, the moment generating function of the Gini coefficient is

$$M_G(s) = \frac{\left(e^{s/p} - 1\right)^{p-1}}{(s/p)^{p-1}}.$$

That is, under the uniform distribution the Gini coefficient of inequality is distributed as the sum of $p-1$ independent uniform distributions on the interval $[0, 1/p]$. Its expectation is then $(p-1)/(2p)$, with variance $(p-1)/(12p^2)$. Hence, for moderately large $p$, an approximate 95% interval for $G$ under random allocation is

$$\left(0.5 - \frac{1.96}{2\sqrt{3p}},\ 0.5 + \frac{1.96}{2\sqrt{3p}}\right).$$

6. The shape of the Lorenz curve

In this section we find the general shape of the Lorenz curve using the expected values derived in section 4.12. We have

$$E(x_k) = \frac{a}{p} \sum_{j=p-k+1}^{p} \frac{1}{j}.$$

Therefore, the value of the Lorenz curve at the value $i$ on the x-axis is given by

$$\sum_{k=1}^{i} E(x_k) = \sum_{k=1}^{i} \frac{a}{p} \sum_{j=p-k+1}^{p} \frac{1}{j} = \frac{a}{p} \sum_{j=p-i+1}^{p} \frac{j-p+i}{j}.$$

We again put $a = 1$, but also let $i = pz$, where $0 \le z \le 1$. Then the point on the curve where the x-axis has the value $z$ is given by

$$\frac{1}{p} \sum_{j=p(1-z)+1}^{p} \frac{j-p+pz}{j} = \frac{1}{p} \sum_{j=p(1-z)+1}^{p} \left(1 - \frac{p(1-z)}{j}\right) = \frac{1}{p}\sum_{j=p(1-z)+1}^{p} 1 \;-\; (1-z) \sum_{j=p(1-z)+1}^{p} \frac{1}{j}.$$

For moderately large $p$ this is

$$z - (1-z)\log_e\frac{p}{p(1-z)} = z + (1-z)\log(1-z).$$

[Figure: Lorenz curve under (a) perfect equality and (b) the Uniform distribution.]

Hence, measuring the horizontal axis as the proportion $z$ of the population, we obtain the Lorenz curve

$$L(z) = z + (1-z)\log(1-z), \qquad 0 \le z \le 1,$$

with $L(1) = 1$ by continuity.
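Sections 5 and 6 can both be checked by simulation. The sketch below is a hypothetical illustration using only Python's standard library. It simulates random allocations with $p = 20$ and compares the sample mean and variance of $G$ with $(p-1)/(2p)$ and $(p-1)/(12p^2)$. It also compares the expected Lorenz ordinate at $z = 0.5$, for increasing finite $p$, with the limiting curve $L(z)$.

```python
import math
import random
import statistics

def lorenz_limit(z):
    """L(z) = z + (1 - z) log(1 - z), with L(1) = 1 by continuity."""
    return 1.0 if z >= 1.0 else z + (1.0 - z) * math.log(1.0 - z)

def lorenz_expected(p, i):
    """Expected Lorenz ordinate at i of p (a = 1): (1/p) * sum_{j=p-i+1}^{p} (j - p + i)/j."""
    return sum((j - p + i) / j for j in range(p - i + 1, p + 1)) / p

def simulated_gini(p, rng):
    """Gini coefficient (section 3, a = 1) of the p spacings of p - 1 uniform points on [0, 1]."""
    cuts = sorted(rng.random() for _ in range(p - 1))
    pts = [0.0] + cuts + [1.0]
    x = sorted(pts[k + 1] - pts[k] for k in range(p))
    return (2 * sum(i * xi for i, xi in enumerate(x, start=1)) - (p + 1)) / p

rng = random.Random(2005)
p = 20
draws = [simulated_gini(p, rng) for _ in range(20_000)]
print(statistics.mean(draws), (p - 1) / (2 * p))           # section 5.2 mean
print(statistics.variance(draws), (p - 1) / (12 * p * p))  # section 5.2 variance
for p_big in (10, 100, 10_000):
    print(p_big, lorenz_expected(p_big, p_big // 2), lorenz_limit(0.5))
```

The finite-$p$ ordinate approaches the limit at rate roughly $1/p$, in line with the harmonic-sum approximation used above.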

This is the equation of the Lorenz curve under the assumption that the original data come from the Uniform random distribution. The graph shows its shape and compares it with the straight-line Lorenz curve of equality.

6.1 Table of values and associated information

The table below shows values on the Lorenz curve. For each proportion $z$ it shows the cumulative amount that this group has (i.e. the point on the Lorenz curve). The third column gives the average amount that this group has, using 1 unit as the overall average. The fourth column gives the proportion not counted in the first column, and the fifth column the amount these individuals have. The final column gives the average amount these individuals have.

Table: values and associated information on the Lorenz curve produced using the expected values of the ordered uniform spacings.

| Proportion z | Cumulative amount L(z) | Average amount | Proportion above point | Amount above point | Average above point |
|---|---|---|---|---|---|
| 0.0000 | 0.0000 | – | 1.0000 | 1.0000 | 1.0000 |
| 0.1000 | 0.0052 | 0.0518 | 0.9000 | 0.9948 | 1.1054 |
| 0.2000 | 0.0215 | 0.1074 | 0.8000 | 0.9785 | 1.2231 |
| 0.3000 | 0.0503 | 0.1678 | 0.7000 | 0.9497 | 1.3567 |
| 0.4000 | 0.0935 | 0.2338 | 0.6000 | 0.9065 | 1.5108 |
| 0.5000 | 0.1534 | 0.3069 | 0.5000 | 0.8466 | 1.6931 |
| 0.6000 | 0.2335 | 0.3891 | 0.4000 | 0.7665 | 1.9163 |
| 0.6321 | 0.2642 | 0.4180 | 0.3679 | 0.7358 | 2.0000 |
| 0.7000 | 0.3388 | 0.4840 | 0.3000 | 0.6612 | 2.2040 |
| 0.8000 | 0.4781 | 0.5976 | 0.2000 | 0.5219 | 2.6094 |
| 0.9000 | 0.6697 | 0.7442 | 0.1000 | 0.3303 | 3.3026 |
| 1.0000 | 1.0000 | 1.0000 | 0.0000 | 0.0000 | – |

6.2 Examples

Under the uniform distribution, for example, the lowest 10% of individuals will have between them 0.52% of resources, an average of 0.0518 units. The remaining 90% will have 99.48% of resources, an average of 1.1054 units. At the point where 90% are accounted for, these will have 66.97% of resources, an average of 0.7442 units. The remaining 10% will have 33.03% of resources, an average of 3.3026 units.
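The table can be regenerated from $L(z)$ alone. This short sketch, with hypothetical function names, prints its rows, including the row at $z = 0.6321$. Undefined averages are marked by a dash.

```python
import math

def lorenz(z):
    """L(z) = z + (1 - z) log(1 - z), with L(1) = 1 by continuity."""
    return 1.0 if z >= 1.0 else z + (1.0 - z) * math.log(1.0 - z)

def table_row(z):
    """(z, L(z), average below z, proportion above, amount above, average above)."""
    L = lorenz(z)
    avg_below = L / z if z > 0 else math.nan
    avg_above = (1.0 - L) / (1.0 - z) if z < 1 else math.nan
    return z, L, avg_below, 1.0 - z, 1.0 - L, avg_above

for z in sorted([k / 10 for k in range(11)] + [1 - 1 / math.e]):
    print("  ".join("-" if math.isnan(v) else f"{v:.4f}" for v in table_row(z)))
```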

6.3 Characteristics of this Lorenz curve

This section briefly looks at the characteristics of the curve. As noted above, and as expected, $L(0) = 0$ and $L(1) = 1$.

6.31 Derivatives and area under the curve

$L'(z) = -\log(1-z)$. This is greater than 0 for $z > 0$ and tends to infinity as $z \to 1$.

$L''(z) = 1/(1-z)$. This is 1 at $z = 0$ and again tends to infinity as $z \to 1$.

The area under the curve, up to the point $z$, is given by

$$\int_0^z \big[y + (1-y)\log(1-y)\big]\,dy = \frac{z(3z-2)}{4} - \frac{(1-z)^2}{2}\log(1-z).$$

Hence, not unexpectedly, the total area under the curve, on putting $z = 1$, is 0.25. The Gini coefficient of inequality is therefore $1 - 2 \times 0.25 = 0.5$, as expected.

6.32 The overall average

When the gradient of the curve is 1, the horizontal distance between the line of equality and this curve is at its maximum. At this point the individual has exactly the average amount, i.e. 1 unit. Here $L'(z_1) = -\log(1-z_1) = 1$, i.e. $z_1 = (e-1)/e = 63.21\%$. Up to this point the cumulative amount is 26.42% of resources, giving an average of 0.4180 units. The remainder (i.e. 36.79%) have 73.58% of the resources, an average of 2.00 units. These figures are also given in the table.

6.33 Relationship of a particular value to the average of the higher values

Expanding on the previous section, for a particular $z$ the cumulative amount is $L(z) = z + (1-z)\log(1-z)$. The actual value held at $z$ is $L'(z) = -\log(1-z)$, and the average amount up to this point is $1 + (1-z)\log(1-z)/z$. Further, for the remaining $1-z$, the total amount that is left is

$$1 - L(z) = 1 - \big[z + (1-z)\log(1-z)\big] = (1-z)\big(1 - \log(1-z)\big).$$

Hence the average for these higher values is $1 - \log(1-z)$. So, for a particular $z$, let $v = -\log(1-z)$ ($0 \le z < 1$) be a particular value. Then the average of the values above it is $1 + v$.
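The properties in 6.31-6.33 can be confirmed numerically. The following sketch uses hypothetical names and approximates the integrals with a midpoint rule. It compares the closed-form area with numerical integration and recovers $G = 0.5$. It also locates the point where $L'(z) = 1$ and checks that the average above the value $v = -\log(1-z)$ is $1 + v$.

```python
import math

def lorenz(z):
    return z + (1.0 - z) * math.log(1.0 - z)  # valid for 0 <= z < 1

def area_to(z):
    """Closed-form area under L on [0, z] (section 6.31), 0 <= z < 1."""
    return z * (3 * z - 2) / 4 - (1 - z) ** 2 * math.log(1 - z) / 2

def integrate(f, lo, hi, n=100_000):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

print(area_to(0.7), integrate(lorenz, 0.0, 0.7))
print(1 - 2 * integrate(lorenz, 0.0, 1.0))   # Gini coefficient, approximately 0.5
z1 = 1 - 1 / math.e                           # point where L'(z) = -log(1 - z) = 1
print(-math.log(1 - z1), lorenz(z1))          # about 1 and 1 - 2/e
for z in (0.1, 0.5, 0.9):
    v = -math.log(1 - z)                      # the value held at z
    print(v, (1 - lorenz(z)) / (1 - z), 1 + v)  # average above v equals 1 + v
```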

This is trivially true for $v = 0$. In the previous section we noted it for $v = 1$, and we now conclude that it is true in general.

6.34 To round off our excursion, we note that for any Lorenz curve

$$L_F(z) = \frac{1}{\mu} \int_0^z F^{-1}(x)\,dx, \qquad 0 \le z \le 1,$$

where $F^{-1}(x) = \sup\{y : F(y) \le x\}$ and $F(x)$ is the cumulative distribution function (see Sarabia et al. (1999)). In our case $L(z) = z + (1-z)\log(1-z)$, $0 \le z \le 1$, so $L'(z) = -\log(1-z)$ and $\mu = 1$. Hence $F^{-1}(x) = -\log(1-x)$, $0 \le x < 1$, and $x = F\big(-\log(1-x)\big)$. Putting $y = -\log(1-x)$ leads to $F(y) = 1 - e^{-y}$, with $F(0) = 0$ and $F(y) \to 1$ as $y \to \infty$. Hence, not unexpectedly given our starting point, $f(y) = e^{-y}$.

7. Extensions

Here we consider three extensions of this work. Previous work on creating families of Lorenz curves includes Sarabia et al. (1999). We define $L_0(z) = z$ and $L_1(z) = z + (1-z)\log(1-z)$.

7.1 We can produce a Lorenz curve of the form

$$L_\alpha(z) = z + \alpha(1-z)\log(1-z), \qquad 0 \le \alpha \le 1,$$

where $\alpha = 0$ gives the line of equality and $\alpha = 1$ gives the curve determined above. It is easy to show that, correspondingly, $G_\alpha = \alpha/2$. The problem with this family of curves is that, without further adjustment, it cannot produce a Lorenz curve with $G_\alpha > 1/2$. The reason is that for $\alpha > 1$, $L_\alpha'(0) = 1 - \alpha < 0$, so $L_\alpha$ is no longer a Lorenz curve. This curve appears to arise when there is a basic unit allocation together with a uniformly randomly distributed allocation.

7.2 Extension using the transformation $x_{ir} = x_i^r$

We next consider the transformation $x_i \to x_{ir} = x_i^r$ for varying $r \ge 0$. This would have the following effect. If $r = 0$, we obtain perfect equality, since $x_i^0 = 1$ for all values of $x_i$. If $r = 1$, we have the current example, i.e. the uniform random distribution.

As $r \to \infty$, the curve tends towards perfect inequality, since the largest $x_i$ (i.e. $x_p$) takes precedence and in effect all the other terms tend to 0. This would create a family of Lorenz curves covering all values of the Gini coefficient $G$: the greater $r$, the greater the inequality. Using the results of section 4.23, we can show that

$$G_r = \frac{2^r - 1}{2^r} \cdot \frac{p-1}{p},$$

which tends to $(2^r - 1)/2^r$ as $p \to \infty$. Hence, by varying $r$, we may be able to provide a benchmark distribution with which to compare various Lorenz curves with the same Gini coefficient of inequality, covering all values of $G$. For a specified $G$ ($0 \le G < 1$) and large $p$, the required $r$ is given by

$$r_G = -\frac{\log_e(1-G)}{\log_e 2}.$$

This suggests that we should be able to find a particular Lorenz curve, for a particular $r$ determined by a pre-specified $G$. This curve could then be used as a benchmark for comparing other curves with the same $G$.

7.21 Shape of this family of curves

Further work is required to establish whether it is possible to give the general shape of these curves. Denote this family of curves by $L_r(z)$ ($0 \le r < \infty$). For $0 \le z \le 1$ we know the following: $L_0(z) = z$; $L_1(z) = z + (1-z)\log(1-z)$; $G_r = (2^r - 1)/2^r$ (for large $p$); and, as $r \to \infty$, $L_r(z) \to 0$ for $0 \le z < 1$, $L_r(1) = 1$ and $G_r \to 1$.

7.3 Third extension

This section looks at a third extension. Since $L_0(z) = z$ and $L_1(z) = z + (1-z)\log(1-z)$, we can express a generalised form of these as

$$L_s(z) = \frac{z^{s+1}}{s+1} + \sum_{j=1}^{\infty} \frac{s\,z^{s+j+1}}{(s+j)(s+j+1)}, \qquad 0 \le s < \infty.$$

For $s = 1$ this is the power series $\sum_{n \ge 2} z^n/\big(n(n-1)\big)$ of $L_1$. It is easy to show that, for all $s \ge 0$, $L_s(0) = 0$ and $L_s(1) = 1$. Since all the coefficients are positive, $L_s$ is increasing and convex on $[0, 1]$, so it is a Lorenz curve, and $G_s = s/(s+1)$.
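The benchmark families of section 7 are simple to compute. The function names in this sketch are hypothetical. `gini_power` and `r_for_gini` implement $G_r$ and $r_G$. `lorenz_s` evaluates a truncated version of the series $L_s(z)$, and `gini_s_series` integrates that series term by term. Solving $G_s = s/(s+1)$ for $s$ gives $s = G/(1-G)$. The 7.1 family is checked by integrating $L_\alpha$ numerically with a midpoint rule.

```python
import math

def gini_power(r, p=None):
    """Section 7.2: G_r = ((2^r - 1)/2^r) * ((p - 1)/p); the large-p limit when p is None."""
    g = (2.0 ** r - 1.0) / 2.0 ** r
    return g if p is None else g * (p - 1) / p

def r_for_gini(G):
    """Section 7.2: r_G = -log(1 - G) / log(2), for 0 <= G < 1 (large p)."""
    return -math.log(1.0 - G) / math.log(2.0)

def lorenz_s(z, s, terms=5_000):
    """Section 7.3: z^(s+1)/(s+1) + sum_{j=1}^{terms} s z^(s+j+1) / ((s+j)(s+j+1))."""
    head = z ** (s + 1) / (s + 1)
    return head + sum(s * z ** (s + j + 1) / ((s + j) * (s + j + 1)) for j in range(1, terms + 1))

def gini_s_series(s, terms=20_000):
    """1 - 2 * (area under L_s), integrating the truncated series term by term."""
    area = 1 / ((s + 1) * (s + 2)) + sum(s / ((s + j) * (s + j + 1) * (s + j + 2))
                                         for j in range(1, terms + 1))
    return 1 - 2 * area

def gini_from_curve(L, n=100_000):
    """1 - 2 * integral of L over [0, 1] by the midpoint rule (never evaluates z = 1)."""
    h = 1.0 / n
    return 1 - 2 * h * sum(L((i + 0.5) * h) for i in range(n))

for alpha in (0.25, 0.5, 1.0):  # section 7.1: expect alpha / 2
    print(alpha, gini_from_curve(lambda z: z + alpha * (1 - z) * math.log(1 - z)))
for G in (0.3, 0.5, 0.8):
    r, s = r_for_gini(G), G / (1 - G)  # s solves G = s / (s + 1)
    print(G, r, gini_power(r), s, gini_s_series(s))
print(lorenz_s(0.5, 1.0), 0.5 + 0.5 * math.log(0.5))  # L_1 agrees with z + (1-z) log(1-z)
```

Truncating the series leaves a tail of order $s/(s+J)$ in $L_s(1)$ after $J$ terms, which is why the Gini check integrates the series term by term rather than pointwise.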

Hence, for any given $G$, we can put $s_G = G/(1-G)$ and so find a Lorenz curve of the above form, again covering all values of $G$.

8. Summary and conclusion

In summary, the paper started from the Uniform distribution and went via some reasonably complicated functions to the Gini coefficient and the Lorenz curve. It ended at the negative exponential distribution. The paper went on to propose extensions providing families of Lorenz curves. These curves could be used as benchmarks for comparing actual curves with the same Gini coefficient. I hope this has been a useful trip into the properties of ordered uniform spacings, moment generating functions, density functions, the Gini coefficient and the Lorenz curve. We have related this work to the Uniform distribution; further work is required to relate it to other distributions.

CJS 25/04/05

References

Barton, D.E. and David, F.N. (1956), Tests for randomness of points on a line, Biometrika, 43, 104-112.
Durbin, J. (1961), Some methods of constructing exact tests, Biometrika, 48, 41-55.
Durbin, J. (1965), in the Discussion on Professor Pyke's paper "Spacings", J. R. Statist. Soc. B, 27, 437-438.
Lewis, P.A.W. (1965), Some results on tests for Poisson processes, Biometrika, 52, 67-78.
Pyke, R. (1965), Spacings, J. R. Statist. Soc. B, 27, 395-436, with discussion.
Sarabia, J.-M., Castillo, E. and Slottje, D.J. (1999), An ordered family of Lorenz curves, Journal of Econometrics, 91, 43-60.
Stephens, C.J. (1991), Symmetry in disguise, I.M.A. Bulletin, 27, 187-188.
Stephens, M.A. (1986), Tests for the Uniform distribution. In Goodness-of-fit Techniques (eds. R. D'Agostino and M.A. Stephens). New York: Marcel Dekker.