APPENDIX 1: CONSTRUCTING GINI COEFFICIENTS


A. Non-parametric estimation of bounds

We follow the technique described in Murray (1978) for finding the upper and lower bounds of the Gini from grouped data on income distributions. [1] Calculating these bounds involves two optimization problems, which can be solved using linear and quadratic programming methods. We briefly outline the two optimization problems in this section.

The structure of the American Indian reservation income data provides the following information: \(N\) is the total number of families, \(\mu\) is the population mean income, \(n_k\) is the number of families in the \(k\)th interval, where \(k = 1, \ldots, K\), and \([y_k^L, y_k^U]\) are the lower and upper income limits of this interval. For the terminal interval \(K\) there is no upper limit, which we denote as \(y_K^U = \infty\). Two important statistics are unknown to us, given data limitations: (1) the mean income within each interval, which we define as \(y_k\), and (2) the distribution of income within each interval. Solving the optimization problems requires choosing these unknowns to obtain the upper and lower bounds of the Gini coefficient.

Intuitively, finding the lower-bound Gini for a given grouped income distribution requires concentrating the within-interval mean incomes as much as possible towards the mean of the entire distribution, \(\mu\). Minimizing the Gini also requires concentrating the distribution of within-interval incomes on the (unknown) within-interval mean income \(y_k\). Table A1 provides a simplified numerical example with \(K = 3\) bins and \(N = 100\) families.

Table A1: Numerical Example to Illustrate Optimization Procedure

Total number of families: \(N = 100\)
Number of families within income range:
  $0 to $33 = \([y_1^L, y_1^U)\): \(n_1 = 40\)
  $33 to $66 = \([y_2^L, y_2^U)\): \(n_2 = 40\)
  $66 or greater = \([y_3^L, \infty)\): \(n_3 = 20\)
Population mean income: \(\mu\) = $50
Within-group mean incomes: \(y_1, y_2, y_3\) = unknowns to be chosen, constrained by \(y_k^L \le y_k \le y_k^U\)

[1] For general non-parametric estimation techniques for inequality measures, see Cowell and Mehta (1982), Gastwirth and Glauberman (1976), and Cowell (2000). Depending on what information is available, different methods are available for calculating the bounds of inequality measures. Gastwirth (1972), Cowell (1991), McDonald and Ransom (1981), and Murray (1978) are all papers with these types of objectives in mind.
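The income constraint in this example can be checked with a few lines of Python. This is only an illustrative sketch: the variable and function names are hypothetical, and the table values are taken as read from the text above (three bins with limits $0, $33 and $66, 100 families, mean income $50).

```python
# Table A1 as read from the text: three bins, 100 families, mean income $50.
n = [40, 40, 20]                      # families per interval
lower = [0.0, 33.0, 66.0]             # lower income limits
upper = [33.0, 66.0, float("inf")]    # upper limits (terminal bin is open)
mu = 50.0                             # population mean income
N = sum(n)


def implied_terminal_mean(y1, y2):
    """Mean income the terminal bin must have so total income equals N * mu."""
    return (N * mu - n[0] * y1 - n[1] * y2) / n[2]


def is_feasible(y1, y2):
    """True if all three within-interval means lie inside their intervals."""
    y3 = implied_terminal_mean(y1, y2)
    return (lower[0] <= y1 <= upper[0]
            and lower[1] <= y2 <= upper[1]
            and y3 >= lower[2])


# Pushing both closed-interval means to their upper limits is not feasible,
# because the terminal bin would then need a mean below its lower limit.
print(is_feasible(33.0, 66.0), implied_terminal_mean(33.0, 66.0))
```

The check shows why the choice of one within-interval mean restricts the others: total income is fixed at \(N\mu\), so the means cannot all be moved independently.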

Figure A1 illustrates how the upper and lower bounds of the Gini are found based on the numerical example in Table A1. With respect to the lower bound, note how the within-interval mean incomes of the outer income ranges converge as much as possible towards the population mean income \(\mu\). Intuitively, one might expect \(y_1\) to be chosen so that it is exactly equal to \(y_1^U\), and \(y_3\) to be chosen so that it is exactly equal to \(y_3^L\), but there is the constraint that total income summed across all groups must equal the total income of the reservation, i.e., \(n_1 y_1 + n_2 y_2 + n_3 y_3 = N\mu\). Shown surrounding each \(y_k\) is the (unobservable) income distribution within interval \(k\). To find the lower bound on the Gini, the optimization procedure forces the within-interval income distributions to collapse until every family within each interval has the same income (equal to \(y_k\)) and there is no within-interval inequality.

Figure A1: Finding Upper and Lower Bounds of Gini
Notes: \(y_k^{LB}\) corresponds to finding the lower bound of the Gini and \(y_k^{UB}\) corresponds to finding the upper bound of the Gini.

Now consider finding the upper bound on the Gini. Here the optimization procedure allows the within-interval mean incomes to diverge as much as possible from the population mean income \(\mu\). Note in Figure A1 how the within-interval mean incomes for the outer bins, \(y_1\) and \(y_3\), diverge from the population mean \(\mu\). As in the lower bound case, the constraint \(n_1 y_1 + n_2 y_2 + n_3 y_3 = N\mu\) limits how far the \(y_k\)s can move. If this constraint did not bind, the within-interval mean for the terminal bin, \(y_3\), would approach infinity and there would be no solution. Within-interval inequality is also maximized, such that a certain percentage of families earn the upper limit \(y_k^U\) and the rest earn the lower limit \(y_k^L\), shown as the dividers between each income range.
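The effect of within-interval inequality can be illustrated with a small Python sketch. It computes the Gini directly from a list of family incomes using the standard pairwise-difference form, and compares an interval whose families all sit at the interval mean with the same interval split between its two limits. The income lists are hypothetical and chosen only for illustration; they are not the paper's data.

```python
from itertools import combinations


def gini(incomes):
    """Gini coefficient: sum over pairs i<j of |y_i - y_j|, divided by N^2 * mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    pair_sum = sum(abs(a - b) for a, b in combinations(incomes, 2))
    return pair_sum / (n * n * mean)


# Hypothetical interval [$0, $33] holding 40 families with mean income $16.50.
collapsed = [16.5] * 40                 # every family at the interval mean
two_point = [0.0] * 20 + [33.0] * 20    # half at each limit, same mean

# Hypothetical remaining 60 families, held fixed in both scenarios.
rest = [50.0] * 60

g_collapsed = gini(collapsed + rest)
g_two_point = gini(two_point + rest)
print(g_collapsed, g_two_point)
```

Both scenarios have the same interval counts and the same within-interval means, so grouped data cannot tell them apart; only the assumed within-interval distribution differs, and the two-point split gives the larger Gini.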
To formally solve the optimization problems, we define \(\lambda_k\) for all closed income intervals as the proportion of families in interval \(k\) having income \(y_k^U\), and \(1 - \lambda_k\) as the proportion having income \(y_k^L\). The mean income within interval \(k\) can now be written as \(y_k = \lambda_k y_k^U + (1 - \lambda_k) y_k^L\), where \(0 \le \lambda_k \le 1\) for \(k = 1, \ldots, K - 1\). For the unbounded terminal interval, we define \(\lambda_K\) as the ratio of mean income \(y_K\) to the lower bound \(y_K^L\). The mean income for the terminal interval can be written as \(y_K = \lambda_K y_K^L\), where \(\lambda_K \ge 1\).

Given this specification, we can vary the two unknowns in the optimization problem, within-interval mean income \(y_k\) and within-interval inequality, by choosing the \(\lambda\)s defined above. Moving \(\lambda_k\) from 0 to 1 moves \(y_k\) from \(y_k^L\) to \(y_k^U\). In the upper bound Gini case, each family in interval \(k\) can have income equal to either \(y_k^L\) or \(y_k^U\). Therefore, if \(\lambda_k = 0\) then \(y_k = y_k^L\), and if \(\lambda_k = 1\) then \(y_k = y_k^U\). In either case, incomes are equal for all families within this group. As \(\lambda_k\) moves away from 0 or 1, within-interval inequality increases, as a certain proportion of families earn income at the opposite ends of the income range.

When finding the lower bound of the Gini coefficient, the Gini can be expressed as a linear function of \(\lambda_k\). Minimizing this function yields the lower bound Gini:

\[
\mathrm{Gini}^{LB} = \min_{\lambda_k} G(\lambda_k) = \frac{1}{N^2 \mu} \Bigg[ n_K P_K y_K^L \lambda_K + \sum_{k=2}^{K-1} n_k P_k \left(y_k^U - y_k^L\right) \lambda_k - \lambda_1 \left(y_1^U - y_1^L\right) \sum_{k=2}^{K} n_k P_k + \sum_{k=2}^{K-1} n_k P_k \left(y_k^L - y_1^L\right) - n_K P_K y_1^L \Bigg]
\]

where \(P_k = N - n_k - 2\sum_{j=k+1}^{K} n_j\), i.e., the number of families in intervals below \(k\) minus the number in intervals above \(k\). This function takes into account the fact that within-interval incomes are equal.

When finding the upper bound of the Gini, there is an additional degree of freedom allowing within-interval inequality to vary. The Gini is then a quadratic function of \(\lambda_k\). Finding the upper bound Gini requires maximizing this function:

\[
\mathrm{Gini}^{UB} = \max_{\lambda_k} G(\lambda_k) = \frac{1}{N^2 \mu} \Bigg[ n_K P_K y_K^L \lambda_K + \sum_{k=2}^{K-1} n_k P_k \left(y_k^U - y_k^L\right) \lambda_k - \lambda_1 \left(y_1^U - y_1^L\right) \sum_{k=2}^{K} n_k P_k + \sum_{k=2}^{K-1} n_k P_k \left(y_k^L - y_1^L\right) - n_K P_K y_1^L + \sum_{k=1}^{K-1} n_k^2 \left(y_k^U - y_k^L\right) \left(\lambda_k - \lambda_k^2\right) + n_K (n_K - 1) y_K^L (\lambda_K - 1) \Bigg]
\]

where \(P_k = N - n_k - 2\sum_{j=k+1}^{K} n_j\). Whether minimizing or maximizing the objective function to find the bounds on the Gini, the optimization is constrained by the limits on \(\lambda_k\) and by total reservation income. We define these constraints as:

(C1) \(0 \le \lambda_k \le 1\) for \(k = 1, \ldots, K - 1\),
(C2) \(1 \le \lambda_K\),
(C3) \(\sum_{k=1}^{K-1} n_k \left[\lambda_k y_k^U + (1 - \lambda_k) y_k^L\right] + n_K \lambda_K y_K^L = N\mu\).

Because both the objective function and the constraints of the minimization problem are linear in the decision variables \(\lambda_k\), we use linear programming methods to calculate the lower bound Gini. In the maximization problem the objective function is quadratic in \(\lambda_k\), so we employ quadratic programming methods to calculate the upper bound Gini.

For the analysis that follows, we simply use the midpoint between the upper and lower bounds as the point estimate for the Gini coefficients. We justify this choice of a 0.50 weight between the lower and upper bound in section B below. For the simple numerical example in Table A1, where K = 3, the lower bound Gini is [value missing], the upper bound Gini is [value missing], and the midpoint is [value missing]. It is worth noting that the range between the lower and upper bounds decreases with the number of intervals K, holding constant the distribution of income. For this reason, the difference between the upper and lower bounds in our actual Gini computations is significantly narrower than the [value missing] range computed from the simple example. [2]

B. GMM estimation with Maximum Entropy distribution

In order to produce a point estimate of the Gini coefficient, one must select a compromise value between the upper and lower bound Gini. For cases in which the grouped data contain information on group frequencies and group means, Cowell (2000) suggests a weighted average of the bounds, giving a weight of 2/3 to the upper bound and 1/3 to the lower bound.
Due to data limitations in our setting, we observe only group frequencies and population means, and therefore have little guidance from Cowell (2000), or the rest of the literature, in selecting a compromise value between the lower and upper bound Ginis.

[2] We calculated the Gini bounds with K = 5 and K = 10, assuming that family incomes were distributed uniformly from $0 to $100, and kept the lower limit of the terminal interval at $100 - $100/K. For example, for K = 5 there are approximately 10 families to a bin, with a terminal interval of $80 or greater, and for K = 10 there are roughly five families to an interval, with a terminal interval of $90 or greater. The difference between the upper and lower Gini bounds diminishes rapidly as K increases. When K = 5 this difference is [value missing] and when K = 10 this difference is [value missing].
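The two bound problems of section A can be sketched numerically for the three-bin example of Table A1. The sketch below is an assumption-laden illustration, not the procedure used in the paper: it replaces linear and quadratic programming with a plain grid search over \(\lambda_1\) and \(\lambda_2\), recovers \(\lambda_3\) from the income constraint, and evaluates the Gini in its pairwise-difference form (between-interval part, plus the two-point and terminal within-interval parts for the upper bound). Function names and the grid resolution are hypothetical.

```python
def gini_from_lambda(lam, n, lower, upper, mu, within):
    """Gini implied by the interval parameters lam.

    lam[k] for k < K-1: share of interval k at its upper limit (rest at the lower limit).
    lam[K-1]: ratio of the terminal interval's mean income to its lower limit.
    within=False treats every family as sitting at its interval mean.
    """
    K, N = len(n), sum(n)
    means = [lower[k] + lam[k] * (upper[k] - lower[k]) for k in range(K - 1)]
    means.append(lam[K - 1] * lower[K - 1])
    total = 0.0
    for k in range(K):
        below, above = sum(n[:k]), sum(n[k + 1:])
        total += n[k] * (below - above) * means[k]       # between-interval part
    if within:
        for k in range(K - 1):                            # two-point split, closed intervals
            total += n[k] ** 2 * (upper[k] - lower[k]) * (lam[k] - lam[k] ** 2)
        # terminal interval: one family holds all income above the lower limit
        total += n[K - 1] * (n[K - 1] - 1) * lower[K - 1] * (lam[K - 1] - 1)
    return total / (N * N * mu)


def bounds_by_grid(n, lower, upper, mu, steps=200):
    """Grid search over (lambda_1, lambda_2) for a three-interval table."""
    N = sum(n)
    lo, hi = float("inf"), float("-inf")
    for i in range(steps + 1):
        for j in range(steps + 1):
            l1, l2 = i / steps, j / steps
            income_12 = (n[0] * (lower[0] + l1 * (upper[0] - lower[0]))
                         + n[1] * (lower[1] + l2 * (upper[1] - lower[1])))
            l3 = (N * mu - income_12) / (n[2] * lower[2])  # total income constraint
            if l3 < 1.0:
                continue                                    # terminal mean below its limit
            lam = (l1, l2, l3)
            lo = min(lo, gini_from_lambda(lam, n, lower, upper, mu, within=False))
            hi = max(hi, gini_from_lambda(lam, n, lower, upper, mu, within=True))
    return lo, hi


# Table A1 as read from the text; the terminal interval has no upper limit.
lower_bound, upper_bound = bounds_by_grid(
    [40, 40, 20], [0.0, 33.0, 66.0], [33.0, 66.0, None], 50.0)
midpoint = 0.5 * (lower_bound + upper_bound)
print(lower_bound, upper_bound, midpoint)
```

A grid search only approximates the optimum and scales poorly with K, which is one reason an exact linear or quadratic programming solver is preferable in practice. The values it prints depend on this reading of Table A1 and need not match the figures reported in the original.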

To assess whether the midpoint of the lower and upper bound Ginis is a reasonable compromise value, we estimate the Gini coefficient after first estimating the continuous income distribution using techniques in Wu and Perloff (2007). Their procedure involves assuming that a variable (income, in our case) is distributed according to a flexible maximum entropy density function [3], and then estimating the parameters of the so-called maxent density by Generalized Method of Moments (GMM). This procedure gives us a blueprint for calculating a Gini coefficient directly from an estimated density function. Because of computational constraints resulting from our data structure, however, we are unable to use the Wu and Perloff method to estimate the Gini coefficient for all reservations in our sample. [4] Therefore, we use the Wu and Perloff method to calculate Gini coefficients for a subset of our sample, and then compare the estimates with the midpoints of the lower and upper bound calculations described above.

The following is a brief sketch of the estimation procedure. For theoretical motivation and a more in-depth explanation of the procedure, see Wu and Perloff (2007).

The maxent density function we use to approximate the distribution is defined as

\[
f(x \mid \theta) = \exp\left(-\theta_0 - \theta_1 x - \theta_2 x^2 - \theta_3 \arctan(x) - \theta_4 \log(1 + x^2)\right),
\]

where \(\theta_0\) is a normalization term defined such that \(\int_{-\infty}^{\infty} f(x \mid \theta)\,dx = 1\). A special case of this function is the normal distribution, which occurs when \(\theta_3\) and \(\theta_4\) are both equal to zero. The \(\arctan(x)\) component allows for deviations from a symmetric distribution, e.g., skewness or multimodality. The \(\log(1 + x^2)\) component allows for fat tails. We employ GMM to estimate the parameters of the density function that best fits the grouped income data. This method involves minimizing a weighted quadratic function of the moment conditions of the density function, which takes the form of (1).
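The role of the normalization term in the maxent density can be made concrete with a short numerical sketch. It assumes the density has the form exp(-theta_0 - theta_1 x - theta_2 x^2 - theta_3 arctan(x) - theta_4 log(1 + x^2)), computes theta_0 by Simpson's rule on a truncated range, and checks the normal special case. The helper names, the truncation range and the parameter values are assumptions made for illustration only.

```python
import math


def log_kernel(x, theta):
    """Exponent of the maxent density without the normalizing term theta_0."""
    t1, t2, t3, t4 = theta
    return -(t1 * x + t2 * x * x + t3 * math.atan(x) + t4 * math.log(1.0 + x * x))


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def theta0(theta, a=-12.0, b=12.0):
    """Normalizing term: log of the integral of exp(kernel) over a truncated range."""
    return math.log(simpson(lambda x: math.exp(log_kernel(x, theta)), a, b))


def make_density(theta, a=-12.0, b=12.0):
    """Return f(x | theta) with theta_0 computed once."""
    t0 = theta0(theta, a, b)
    return lambda x: math.exp(log_kernel(x, theta) - t0)


# Normal special case: theta_3 = theta_4 = 0, here with theta_1 = 0 and theta_2 = 1/2.
normal_theta0 = theta0((0.0, 0.5, 0.0, 0.0))
print(normal_theta0, 0.5 * math.log(2.0 * math.pi))

# A skewed, fatter-tailed member of the family (hypothetical parameter values).
f = make_density((0.0, 0.5, 0.3, 0.2))
print(simpson(f, -12.0, 12.0))
```

With theta_3 and theta_4 set to zero the computed theta_0 agrees with the standard normal constant, log of the square root of 2 pi, which is the special case noted in the text.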
\[
\hat{\theta} = \arg\min_{\theta} \; m(\theta)' \, W \, m(\theta) \qquad (1)
\]

Here the moment conditions are defined as

\[
m(\theta) = \begin{bmatrix} \int_{y_k^L}^{y_k^U} f(x \mid \theta)\,dx - s_k \\ \int x f(x \mid \theta)\,dx - \mu \end{bmatrix},
\]

where \([y_k^L, y_k^U]\) are the lower and upper bounds of interval \(k\), \(s_k = n_k / N\) is the share of the population in interval \(k\), and \(\mu\) is the population mean. The objective is to make these moment conditions as close to zero as possible subject to the weights in the weight matrix \(W\). \(W\) is estimated using the following simulation procedure.

1. Draw an i.i.d. random sample \(x\) of size \(N\) from \(f(x \mid \hat{\theta}_0)\), where \(\hat{\theta}_0\) is a consistent preliminary estimate calculated by setting \(W\) to the identity matrix.
2. Group \(x\) in the same way as the original interval data and calculate the simulated shares \(\tilde{s}_k\) and the simulated population mean \(\tilde{\mu}\).
3. Calculate the simulated weight matrix \(\tilde{W}\) from \(\tilde{s}_k\) and \(\tilde{\mu}\) [matrix definition not recoverable].

Repeat this procedure \(T\) times; the weighting matrix is the average of these simulated weight matrices:

\[
W = \frac{1}{T} \sum_{t=1}^{T} \tilde{W}^{(t)}.
\]

Once the entire distribution of income is estimated, the Gini coefficient can be calculated as

\[
\mathrm{Gini} = \mu^{-1} \int_0^{\infty} F(x)\left(1 - F(x)\right) dx,
\]

where \(\mu = \int_0^{\infty} x f(x)\,dx\) and \(F(x) = \int_0^{x} f(t \mid \hat{\theta})\,dt\).

In order to get an actual estimate of \(\hat{\theta}\), we minimize function (1) using the BFGS algorithm with numerical gradients. Because the maxent density is a rather complex function, a significant amount of numerical integration is required to evaluate this function, its gradients, and the moment conditions. This makes optimization slow. Moreover, it is often difficult to get the optimization to converge when estimating the income distribution from grouped data in which some bins contain zero families. For this reason, we are unable to estimate the distribution of income and the Gini coefficient for many of the reservations whose grouped data have gaps (generally sparsely populated reservations). Nonetheless, this method allows us to estimate the Gini coefficients for a significant number of reservations. We compare these estimates with the estimates from the non-parametric procedure described above, using different weights on the upper and lower bound Gini coefficients, in order to find a reasonable compromise point estimate.

[3] The principle of maximum entropy is to choose the probability distribution consistent with known information while being as noncommittal as possible with regard to missing information.

[4] The algorithm we use to minimize the objective function has difficulty converging for reservations that have an interval with no families, i.e., an empty bin in the frequency table. Generally, these are the smaller reservations.
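The Gini formula based on the distribution function can be checked numerically. The sketch below assumes the form Gini = (1/mu) times the integral of F(x)(1 - F(x)) over nonnegative incomes and applies the trapezoid rule on a truncated range. Instead of an estimated maxent distribution it uses two textbook distributions with known Gini coefficients (exponential: 1/2, uniform on [0, 1]: 1/3); the function name and truncation points are hypothetical.

```python
import math


def gini_from_cdf(cdf, mean, upper, n=20000):
    """Approximate (1/mean) * integral_0^upper F(x) * (1 - F(x)) dx by the trapezoid rule."""
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        f = cdf(i * h)
        weight = 0.5 if i in (0, n) else 1.0   # half weight at the two end points
        total += weight * f * (1.0 - f)
    return total * h / mean


def exponential_cdf(x):
    """CDF of the exponential distribution with rate 1 (mean 1)."""
    return 1.0 - math.exp(-x)


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (mean 1/2)."""
    return min(max(x, 0.0), 1.0)


g_exponential = gini_from_cdf(exponential_cdf, 1.0, 40.0)
g_uniform = gini_from_cdf(uniform_cdf, 0.5, 1.0)
print(g_exponential, g_uniform)
```

In the setting of the text, the distribution function and the mean would instead come from numerically integrating the fitted density, which adds a second layer of integration and is part of why the procedure is slow.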
In order to get a reasonable number of estimates to compare against the non-parametric estimates, we estimated the income distribution and Gini coefficients for all reservations in 1990 and 2000 for which it was feasible. We then took the GMM Gini estimates \(G_i^{GMM}\) and compared them with the Gini estimates from the non-parametric procedure using different weights on the upper and lower bound of the Gini: \(G_i^{NP}(w) = w G_i^U + (1 - w) G_i^L\). In order to find a value of \(w\) which makes \(G_i^{GMM}\) and \(G_i^{NP}(w)\) close, we minimize the sum of the squared differences:

\[
\hat{w} = \arg\min_{w} \sum_i \left(G_i^{GMM} - G_i^{NP}(w)\right)^2.
\]

This yields a value of \(\hat{w}\) = [value missing]. This finding indicates that roughly equal weights on the upper and lower bounds of the Gini generate the Gini point estimate closest to the GMM procedure. For this reason we use the midpoint between the upper and lower bounds as the point estimate of the Gini coefficient for all of the analysis.
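Because the weighted bound is linear in the weight, the least-squares problem above has a closed-form solution: setting the derivative of the sum of squares to zero gives the weight as the sum of (G_GMM - G_L)(G_U - G_L) divided by the sum of (G_U - G_L)^2. The sketch below implements that formula. The function name is hypothetical and the numbers are made-up illustrative values, not the paper's estimates.

```python
def best_weight(g_gmm, g_lower, g_upper):
    """Least-squares weight on the upper bound.

    Minimizes sum_i (g_gmm_i - (w * g_upper_i + (1 - w) * g_lower_i))^2 over w.
    """
    num = sum((g - lo) * (up - lo) for g, lo, up in zip(g_gmm, g_lower, g_upper))
    den = sum((up - lo) ** 2 for lo, up in zip(g_lower, g_upper))
    return num / den


# Hypothetical bounds for three reservations (illustrative values only).
g_lower = [0.30, 0.35, 0.40]
g_upper = [0.50, 0.45, 0.60]

# If the GMM estimates sat exactly at the midpoints, the fitted weight would be 1/2.
g_mid = [0.5 * (lo + up) for lo, up in zip(g_lower, g_upper)]
print(best_weight(g_mid, g_lower, g_upper))
```

The fitted weight is not restricted to lie between 0 and 1; a value outside that range would indicate that the GMM estimates tend to fall outside the non-parametric bounds.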

APPENDIX 2: GROWTH-INEQUALITY RELATIONSHIPS FOR AMERICAN INDIAN RESERVATIONS BY GEOGRAPHIC REGION

Notes: The geographical regions are depicted in figures 3 and 4.

APPENDIX 3: TRITILE GRAPHS OF GROWTH-INEQUALITY RELATIONSHIP

Figure A.3.1: Tritile Graph of Growth-Inequality, Acoma to Cocopah
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.2: Tritile Graph of Growth-Inequality, Coeur d'Alene to Fond du Lac
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.3: Tritile Graph of Growth-Inequality, Forest County Potawatomi to Fort Yuma
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.4: Tritile Graph of Growth-Inequality, Gila River to Lac Courte Oreilles
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.5: Tritile Graph of Growth-Inequality, Lac du Flambeau to Mescalero
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.6: Tritile Graph of Growth-Inequality, Mille Lacs to Osage
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.7: Tritile Graph of Growth-Inequality, Pala to Reno-Sparks
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.8: Tritile Graph of Growth-Inequality, Rincon to San Pascual
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.9: Tritile Graph of Growth-Inequality, Santa Clara to Standing Rock
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.10: Tritile Graph of Growth-Inequality, Stockbridge-Munsee to Umatilla
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.11: Tritile Graph of Growth-Inequality, Ute Mountain to Yavapai-Apache Nation
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

Figure A.3.12: Tritile Graph of Growth-Inequality, Zuni
Note: The vertical axis shows the Gini Coefficient and the horizontal axis shows the income tritile.

APPENDIX 4: ADDITIONAL TABLES

A. Variable Definitions and Sources

Reservation Income
o American Indian Per Capita Income: Per capita income information from 1943 to 1945 is available from BIA reports housed in the National Archives in Washington D.C. Income data from 1970 to 2010 are available from the Census Bureau.
o Slot Machines Per Capita: Number of slot machines in tribal casinos on the reservation divided by the American Indian population. For more information see Anderson and Parker (2008) and Cookson (2010).

Ethnicity
o Blood Quantum: Blood quantum information is available from 1938 in a BIA report housed in the National Archives in Washington D.C. The data include the number of individuals in four blood quantum bins: 100%, 50-99%, 25-49%, and less than 25%.

Reservation Demographics
o American Indian Population: American Indian population data are from the same set of reports as the per capita income data.
o Pct. Completed High School: This variable reports the share of the American Indian population with at least a high school degree. This measure is available from the Census Bureau for [years missing].
o Migration: Migration measures were calculated from Census Bureau data available through NHGIS. The migration share used in Appendix Table A.4.3 is calculated as the share of the American Indian population that moved to the reservation from either out of the state or out of the country in the last 5 years in 2000 and in the prior year for [year missing].

Reservation Characteristics
o Land Tenure: Land tenure shares are measured as the fraction of the reservation in different tenure types as of [year missing]. The measures include the share of land in tribal trust, individual trust, and fee-simple. These measures are from the Bureau of Indian Affairs.
o Indian Reorganization Act: IRA adoption is from Frye and Parker (2016) and is a binary measure of whether or not the reservation voted to adopt the Indian Reorganization Act between 1934 and [year missing].
o Public Law 280: Public Law 280 is from Anderson and Parker (2008) and is a binary measure of whether or not Public Law 280 was applied to the reservation.

Regional Economic Controls
o State Per Capita Income: State per capita income is from the Bureau of Economic Analysis.
o Adjacent County Per Capita Income: Adjacent county per capita income is from the Bureau of Economic Analysis. This measure is the average income of those counties that border the reservation and do not overlap with the reservation.
o Distance to Nearest MSA: Distance is calculated from the centroid of each reservation to the closest MSA in [year missing].

B. Summary Statistics

Table A.4.1: Summary Statistics by Period

Gini Coefficient                        (9.109)   (7.431)   (8.134)   (4.288)   (4.484)   (5.559)
American Indian Per Capita Income       (5007.5)  (2142.8)  (3105.2)  (2107.1)  (3076.1)  (4272.9)
Slots per capita                        (0)       (0)       (0)       (0.0523)  (0.589)   (1.165)
Less Ethnically Assimilated             (0.217)   (0.217)   (0.217)   (0.217)   (0.217)   (0.217)
Ethnic Fragmentation                    (0.159)   (0.159)   (0.159)   (0.159)   (0.159)   (0.159)
American Indian Population              (5324.9)  (6501.7)  ( )       ( )       ( )       ( )
Pct. Completed High School              (.)       (0.103)   (0.0729)  (0.127)   (0.0913)  (0.0879)
State Per Capita Income                 (2329.4)  (2782.7)  (2877.5)  (3417.2)  (4447.2)  (4203.2)
Adjacent County Per Capita Income       (.)       (2546.9)  (3112.5)  (4122.0)  (4264.4)  (4015.5)
Distance to Nearest MSA (in mi)         (83.95)   (83.68)   (81.86)   (81.86)   (81.86)   (81.86)
Share of Acreage in Tribal Trust        (0.383)   (0.383)   (0.383)   (0.383)   (0.383)   (0.383)
Share of Acreage in Individual Trust    (0.176)   (0.176)   (0.176)   (0.176)   (0.176)   (0.176)
Share of Acreage in Fee-Simple          (0.314)   (0.314)   (0.314)   (0.314)   (0.314)   (0.314)
Indian Reorganization Act (0/1)         (0.402)   (0.402)   (0.402)   (0.402)   (0.402)   (0.402)
Public Law 280 Reservation (0/1)        (0.480)   (0.480)   (0.480)   (0.480)   (0.480)   (0.480)

Notes: Table presents means and standard deviations (in parentheses) for the outcomes and covariates used throughout the paper. For sources and definitions see Appendix A.4.1. Incomes are adjusted to 2010 dollars.

C. Inequality and Migration Tables

Table A.4.2: Panel Model Estimates of the Relationship between Income and Inequality, Including Endogenous Controls

Columns: (1) (2) (3) (4) (5)

Ln(Income Per Capita):            *** **
                                  (0.070) (0.159) (0.103) (0.171) (0.213)
Ln(Income Per Capita) LBQA:       **
                                  (0.200) (0.152) (0.264)
Ln(Income Per Capita) BQP:        0.873*** 0.789*** 2.826***
                                  (0.262) (0.295) (0.864)
Ln(Income Per Capita) LBQA BQP:   **
                                  (1.323)

Reservation Fixed-Effects:        x x x x x
Year Fixed-Effects:               x x x x x
Time-Varying Controls:            x x x x x
Historic Time-Trend Controls:     x x x x x
Endogenous Population Controls:   x x x x x
Endogenous Education Controls:    x x x x x
Number of Reservations:
Number of Observations:
R-Squared:

Notes: * p < 0.10, ** p < 0.05, *** p < [value missing]. Standard errors, reported in parentheses, are clustered at the reservation level. Time-Varying controls include state per capita income and adjacent county per capita income. Historic Time-Trend controls include a dummy variable for whether the reservation adopted the IRA, a dummy variable for whether Public Law 280 applied to the reservation, log distance from the closest MSA, and controls for the share of reservation land held in tribal trust and individual trust. All of these variables are interacted with time period. The null hypothesis is that all the coefficients in the model are equal to zero.

Table A.4.3: Panel Model Estimates of the Relationship between Income and Inequality, Considering Migration

Columns: (1) Ln(Gini); (2) Ln(Gini); (3) Share Migrate

Ln(Income Per Capita):            *** ***
                                  (0.538) (0.538) (8.509)
Ln(Income Per Capita) LBQA:       1.323** 1.323**
                                  (0.588) (0.587) (10.632)
Ln(Income Per Capita) BQP:        5.372*** 5.401***
                                  (1.942) (1.927) (31.934)
Ln(Income Per Capita) LBQA BQP:   * *
                                  (2.544) (2.524) (45.198)

Reservation Fixed-Effects:        x x x
Year Fixed-Effects:               x x x
Time-Varying Controls:            x x x
Historic Time-Trend Controls:     x x x
Migration Controls:               x
Number of Reservations:
Number of Observations:
R-Squared:

Notes: * p < 0.10, ** p < 0.05, *** p < [value missing]. Standard errors, reported in parentheses, are clustered at the reservation level. Data are from 2000 and [year missing]. Time-Varying controls include state per capita income and adjacent county per capita income. Historic Time-Trend controls include a dummy variable for whether the reservation adopted the IRA, a dummy variable for whether Public Law 280 applied to the reservation, log distance from the closest MSA, and controls for the share of reservation land held in tribal trust and individual trust. All of these variables are interacted with time period. The null hypothesis is that all the coefficients in the model are equal to zero.
