Poverty Estimation Methods: a Comparison under Box-Cox Type Transformations with Application to Mexican Data

Thesis submitted in fulfillment of the requirements for the degree of Master of Science in Statistics
on the topic
Poverty Estimation Methods: a Comparison under Box-Cox Type Transformations with Application to Mexican Data
at the Chair of Statistics and Econometrics, School of Business and Economics, Freie Universität Berlin
Submitted by Natalia Rojas-Perilla
Supervisors: Prof. Dr. Timo Schmid, Prof. Dr. Ulrich Rendtel
12 May 2015

Acknowledgements

First and foremost, I would like to express my deepest gratitude to my advisor, Prof. Dr. Timo Schmid, who has supported me at all times with his encouraging guidance, constant motivation, exemplary enthusiasm, profound understanding and patience. He also provided me with an excellent personal and work environment for doing research. My sincere thanks also go to Dr. Nikos Tzavidis for his useful and insightful suggestions, which greatly improved this thesis. I am also very thankful to Prof. Dr. Ulrich Rendtel and Prof. Dr. B. Piedad Urdinola for their unconditional academic support and personal motivation during this exciting journey in Germany. My thanks also go to the Colombian foundation COLFUTURO and the German organization DAAD for their financial assistance during my Master's studies. I wish to express my sincere thanks to the whole fu:stat team for providing me with all the necessary facilities for this thesis. I am deeply thankful to my partner for his endless love, support and encouragement throughout this entire fascinating journey. With him I found the inspiration and motivation needed to complete my Master's studies. Last but not least, I thank my families in Colombia and Germany for their hearty, permanent and unconditional support throughout this venture.

Contents

Acknowledgements
List of Figures
List of Tables
Abbreviations

1 Introduction
2 Poverty Mapping Approaches
  2.1 Poverty Indicators
  2.2 Poverty Mapping Framework
3 Poverty Estimation Methods
  3.1 Small Area Estimation
  3.2 Small Area Estimation of Poverty Indicators Using Unit-Level Models
    3.2.1 Empirical Best Predictor (EBP) Approach
    3.2.2 The World Bank Method (ELL)
    3.2.3 Parametric Bootstrap for MSE Estimation
  3.3 Model-Based Simulation Study: ELL vs. EBP
    3.3.1 Poverty Measures Estimation for the First Scenario
    3.3.2 MSE Estimation for the First Scenario
    3.3.3 Poverty Measures Estimation for the Second Scenario
    3.3.4 MSE Estimation for the Second Scenario
  3.4 Concluding Remarks
4 The EBP Approach under Power Transformations
  4.1 Power Transformation Families
  4.2 Model-Based Simulation Scenarios
    4.2.1 Scenario 1: Normal Error Terms
    4.2.2 Scenario 2: Contaminated Error Term
    4.2.3 Scenario 3: Chi-Square Error Term
    4.2.4 Scenario 4: log Scale Outcomes
    4.2.5 Scenario 5: Gamma Error Term
    4.2.6 Scenario 6: Heteroscedastic Error Term
  4.3 Concluding Remarks and Future Research Directions
5 Poverty Mapping Approach Applied in State of Mexico at a Municipality Level
  5.1 Data Description
  5.2 Poverty Estimation under the EBP Procedure
    5.2.1 The Working Model Selection
    5.2.2 Municipality-Level Poverty Estimations
  5.3 Concluding Remarks and Future Research Directions
A State of Mexico
Bibliography
Declaration of Authorship

List of Figures

3.1 Box-plots of the RB/Bias and RMSE over the domains for the direct, EBP and ELL estimators, with ICC=0.05, of the mean (on the left) and PG (on the right)
3.2 RB/Bias and RMSE over the domains for the direct, EBP and ELL estimators, with ICC=0.05, of the mean (on the top) and PG (on the bottom)
3.3 Box-plots of the RB/Bias and RMSE over the domains for the direct, EBP and ELL estimators, with ICC=0.05, of HCR (on the left) and Gini (on the right)
3.4 RB/Bias and RMSE over the domains for the direct, EBP and ELL estimators, with ICC=0.05, of HCR (on the top) and Gini (on the bottom)
3.5 Box-plots of the RB/Bias and RMSE over the domains for the MSE estimator of EBP and ELL approaches, with ICC=0.05, of the mean (on the left) and PG (on the right)
3.6 RMSE over the domains for the MSE estimator of EBP and ELL approaches, with ICC=0.05, of the mean (on the top) and PG (on the bottom)
3.7 Box-plots of the RB/Bias and RMSE over the domains for the MSE estimator of EBP and ELL approaches, with ICC=0.05, of HCR (on the left) and Gini (on the right)
3.8 RMSE over the domains for the MSE estimator of EBP and ELL approaches, with ICC=0.05, of HCR (on the top) and Gini (on the bottom)
3.9 Box-plots of the RB/Bias and RMSE over the domains for the direct, EBP and ELL estimators, with ICC=0.5, of the mean (on the left) and PG (on the right)
3.10 RB/Bias and RMSE over the domains for the direct, EBP and ELL estimators, with ICC=0.5, of the mean (on the top) and PG (on the bottom)
3.11 Box-plots of the RB/Bias and RMSE over the domains for the direct, EBP and ELL estimators, with ICC=0.5, of HCR (on the left) and Gini (on the right)
3.12 RB/Bias and RMSE over the domains for the direct, EBP and ELL estimators, with ICC=0.5, of HCR (on the top) and Gini (on the bottom)
3.13 Box-plots of the RB/Bias and RMSE over the domains for the MSE estimator of EBP and ELL approaches, with ICC=0.5, of the mean (on the left) and PG (on the right)
3.14 RMSE over the domains for the MSE estimator of EBP and ELL approaches, with ICC=0.5, of HCR (on the top) and PG (on the bottom)
3.15 Box-plots of the RB/Bias and RMSE over the domains for the MSE estimator of EBP and ELL approaches, with ICC=0.5, of HCR (on the left) and Gini (on the right)
3.16 RMSE over the domains for the MSE estimator of EBP and ELL approaches, with ICC=0.5, of HCR (on the top) and Gini (on the bottom)
4.1 On the left, the Box-Cox transformation and on the right the alternative convex-to-concave transformation. Both under different levels of λ = 0.0, 0.5, 1.0, 1.5, 2.0
4.2 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.3 RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the top) and PG (on the bottom)
4.4 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of HCR (on the left) and Gini (on the right)
4.5 RB/Bias and RMSE over the domains for EBP, under the different transformations, of HCR (on the top) and Gini (on the bottom)
4.6 RB/Bias and RMSE over the domains for the EBP estimator under the Box-Cox transformation of average income, PG, HCR and the Gini coefficient, by using the different re-transformation procedures
4.7 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.8 RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the top) and PG (on the bottom)
4.9 RB/Bias and RMSE over the domains for EBP, under the different transformations, of HCR (on the top) and Gini (on the bottom)
4.10 RB/Bias and RMSE over the domains for EBP, under the different transformations, of HCR (on the top) and Gini (on the bottom)
4.11 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.12 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.13 RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the top) and PG (on the bottom)
4.14 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of HCR (on the left) and Gini (on the right)
4.15 RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the top) and PG (on the bottom)
4.16 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.17 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.18 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.19 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.20 RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the top) and PG (on the bottom)
4.21 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of HCR (on the left) and Gini (on the right)
4.22 RMSE over the domains for EBP, under the different transformations, of HCR (on the top) and Gini (on the bottom)
4.23 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.24 RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the top) and PG (on the bottom)
4.25 Box-plots of the RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the left) and PG (on the right)
4.26 RB/Bias and RMSE over the domains for EBP, under the different transformations, of average income (on the top) and PG (on the bottom)
4.27 Optimal transformation parameter λ for EBP under the different scenarios across all simulated populations
5.1 Sample sizes of the survey- and census-data of the 125 municipalities divided by urban and rural areas
5.2 An illustration of poverty definitions by CONEVAL in Mexico
5.3 Optimal λ for the REML approach applied in the Box-Cox transformation
5.4 Q-Q-plots for the Pearson residuals (upper panels) and random effects (lower panels) of the working model for EBP under the different transformations
5.5 Density plots of the Pearson residuals (on the left) and random effects (on the right) of the working model for EBP under the different transformations
5.6 Box-plots of point estimations of average income, HCR1, HCR2, PG1, PG2 and Gini for the EBP method under the different transformations, with 1 and 2 indicating moderate and extreme poverty, respectively
5.7 Poverty map of the mean income in State of Mexico for the EBP method under the log and Box-Cox transformations at a municipality level
5.8 Poverty map of the Gini in State of Mexico for the EBP method under the log and Box-Cox transformations at a municipality level
5.9 Poverty map of the HCR1 in State of Mexico for the EBP method under the log and Box-Cox transformations at a municipality level
5.10 Poverty map of the HCR2 in State of Mexico for the EBP method under the log and Box-Cox transformations at a municipality level
5.11 Poverty map of the PG1 in State of Mexico for the EBP method under the log and Box-Cox transformations at a municipality level
5.12 Poverty map of the PG2 in State of Mexico for the EBP method under the log and Box-Cox transformations at a municipality level
5.13 Poverty map of the RMSE in State of Mexico for the EBP method under the log and Box-Cox transformations at a municipality level
5.14 Box-plots of the Pearson residuals (on the left) and standardized random effects (on the right) of the working model for EBP under the different transformations
A.1 Municipalities and districts in State of Mexico

List of Tables

3.1 Quantiles and mean of RB of the mean for the direct, EBP and ELL estimators with ICC=0.05
3.2 Quantiles and mean of RMSE of the mean for the direct, EBP and ELL estimators with ICC=0.05
3.3 Quantiles and mean of bias of PG for the direct, EBP and ELL estimators with ICC=0.05
3.4 Quantiles and mean of RMSE of PG for the direct, EBP and ELL estimators with ICC=0.05
3.5 Quantiles and mean of bias of HCR for the direct, EBP and ELL estimators with ICC=0.05
3.6 Quantiles and mean of RMSE of HCR for the direct, EBP and ELL estimators with ICC=0.05
3.7 Quantiles and mean of bias of Gini for the direct, EBP and ELL estimators with ICC=0.05
3.8 Quantiles and mean of RMSE of Gini for the direct, EBP and ELL estimators with ICC=0.05
3.9 Quantiles and mean of RB of MSE of the mean for the EBP and ELL estimators with ICC=0.05
3.10 Quantiles and mean of RMSE of MSE of the mean for the EBP and ELL estimators with ICC=0.05
3.11 Quantiles and mean of RB of MSE of PG for the EBP and ELL estimators with ICC=0.05
3.12 Quantiles and mean of RMSE of MSE of PG for the EBP and ELL estimators with ICC=0.05
3.13 Quantiles and mean of RB of MSE of HCR for the direct, EBP and ELL estimators with ICC=0.05
3.14 Quantiles and mean of RMSE of MSE of HCR for the direct, EBP and ELL estimators with ICC=0.05
3.15 Quantiles and mean of RB of MSE of Gini for the direct, EBP and ELL estimators with ICC=0.05
3.16 Quantiles and mean of RMSE of MSE of Gini for the direct, EBP and ELL estimators with ICC=0.05
3.17 Quantiles and mean of RB of the mean for the direct, EBP and ELL estimators with ICC=0.5
3.18 Quantiles and mean of RMSE of the mean for the direct, EBP and ELL estimators with ICC=0.5
3.19 Quantiles and mean of bias of PG for the direct, EBP and ELL estimators with ICC=0.5
3.20 Quantiles and mean of RMSE of PG for the direct, EBP and ELL estimators with ICC=0.5
3.21 Quantiles and mean of bias of HCR for the direct, EBP and ELL estimators with ICC=0.5
3.22 Quantiles and mean of RMSE of HCR for the direct, EBP and ELL estimators with ICC=0.5
3.23 Quantiles and mean of bias of Gini for the direct, EBP and ELL estimators with ICC=0.5
3.24 Quantiles and mean of RMSE of Gini for the direct, EBP and ELL estimators with ICC=0.5
3.25 Quantiles and mean of RB of MSE of the mean for the EBP and ELL estimators with ICC=0.5
3.26 Quantiles and mean of RMSE of MSE of the mean for the EBP and ELL estimators with ICC=0.5
3.27 Quantiles and mean of RB of MSE of PG for the EBP and ELL estimators with ICC=0.5
3.28 Quantiles and mean of RMSE of MSE of PG for the EBP and ELL estimators with ICC=0.5
3.29 Quantiles and mean of RB of MSE of HCR for the EBP and ELL estimators with ICC=0.5
3.30 Quantiles and mean of RMSE of MSE of HCR for the EBP and ELL estimators with ICC=0.5
3.31 Quantiles and mean of RB of MSE of Gini for the EBP and ELL estimators with ICC=0.5
3.32 Quantiles and mean of RMSE of MSE of Gini for the EBP and ELL estimators with ICC=0.5
4.1 Quantiles and mean of RB of average income for EBP under the different transformations
4.2 Quantiles and mean of RMSE of average income for EBP under the different transformations
4.3 Quantiles and mean of bias of PG for EBP under the different transformations
4.4 Quantiles and mean of RMSE of PG for EBP under the different transformations
4.5 Quantiles and mean of bias of HCR for EBP under the different transformations
4.6 Quantiles and mean of RMSE of HCR for EBP under the different transformations
4.7 Quantiles and mean of bias of Gini for EBP under the different transformations
4.8 Quantiles and mean of RMSE of Gini for EBP under the different transformations
4.9 Quantiles and mean of RB of average income for EBP under the different transformations
4.10 Quantiles and mean of RMSE of average income for EBP under the different transformations
4.11 Quantiles and mean of RMSE of PG for EBP under the different transformations
4.12 Quantiles and mean of RMSE of PG for EBP under the different transformations
4.13 Quantiles and mean of bias of HCR for EBP under the different transformations
4.14 Quantiles and mean of RMSE of HCR for EBP under the different transformations
4.15 Quantiles and mean of bias of Gini for EBP under the different transformations
4.16 Quantiles and mean of RMSE of Gini for EBP under the different transformations
4.17 Quantiles and mean of RB of average income for EBP under the different transformations
4.18 Quantiles and mean of RMSE of average income for EBP under the different transformations
4.19 Quantiles and mean of bias of PG for EBP under the different transformations
4.20 Quantiles and mean of RMSE of PG for EBP under the different transformations
4.21 Quantiles and mean of bias of HCR for EBP under the different transformations
4.22 Quantiles and mean of RMSE of HCR for EBP under the different transformations
4.23 Quantiles and mean of bias of Gini for EBP under the different transformations
4.24 Quantiles and mean of RMSE of Gini for EBP under the different transformations
4.25 Quantiles and mean of RB of average income for EBP under the different transformations
4.26 Quantiles and mean of RMSE of average income for EBP under the different transformations
4.27 Quantiles and mean of bias of PG for EBP under the different transformations
4.28 Quantiles and mean of RMSE of PG for EBP under the different transformations
4.29 Quantiles and mean of bias of HCR for EBP under the different transformations
4.30 Quantiles and mean of RMSE of HCR for EBP under the different transformations
4.31 Quantiles and mean of bias of Gini for EBP under the different transformations
4.32 Quantiles and mean of RMSE of Gini for EBP under the different transformations
4.33 Quantiles and mean of RB of average income for EBP under the different transformations
4.34 Quantiles and mean of RMSE of average income for EBP under the different transformations
4.35 Quantiles and mean of bias of PG for EBP under the different transformations
4.36 Quantiles and mean of RMSE of PG for EBP under the different transformations
4.37 Quantiles and mean of bias of HCR for EBP under the different transformations
4.38 Quantiles and mean of RMSE of HCR for EBP under the different transformations
4.39 Quantiles and mean of bias of Gini for EBP under the different transformations
4.40 Quantiles and mean of RMSE of Gini for EBP under the different transformations
4.41 Quantiles and mean of RB of average income for EBP under the different transformations
4.42 Quantiles and mean of RMSE of average income for EBP under the different transformations
4.43 Quantiles and mean of bias of PG for EBP under the different transformations
4.44 Quantiles and mean of RMSE of PG for EBP under the different transformations
4.45 Quantiles and mean of bias of HCR for EBP under the different transformations
4.46 Quantiles and mean of RMSE of HCR for EBP under the different transformations
4.47 Quantiles and mean of RB of Gini for EBP under the different transformations
4.48 Quantiles and mean of RMSE of Gini for EBP under the different transformations
4.49 Quantiles and mean of the optimal transformation parameter λ for EBP under the different scenarios across all simulated populations
5.1 Sample sizes of the municipalities available in survey- and census-data
5.2 ICTPC in the urban and rural areas, measured in Mexican pesos
5.3 Poverty lines defined by CONEVAL broken down by household and poverty types
5.4 Description of the explanatory variables used in the working model
5.5 R², ICC and λs for the different working models
5.6 Skewness, kurtosis and values of the Shapiro-Wilk (S-W) normality test for the random effects and error terms of the working models for EBP under the different transformations
5.7 Quantiles and mean of point estimations of mean, HCR1, HCR2, PG1, PG2 and Gini for the EBP method under the different transformations, with 2 and 1 indicating moderate and extreme poverty, respectively
5.8 Box-plots of RMSE for the point estimations of average income, HCR1, HCR2, PG1, PG2 and Gini for the EBP method under the different transformations, with 1 and 2 indicating moderate and extreme poverty, respectively
A.1 List of the municipalities and respective districts in State of Mexico

Abbreviations

CEPAL    Comisión Económica Para América Latina y el Caribe
CONEVAL  Consejo Nacional de Evaluación de la Política de Desarrollo Social
CTMP     Comité Técnico para la Medición de la Pobreza
EB       Empirical Bayes
EBP      Empirical Best Predictor
ELL      Elbers, Lanjouw and Lanjouw (World Bank method)
ENIGH    Encuesta Nacional de Ingreso y Gasto de los Hogares
FGLS     Feasible Generalized Least Squares
FGT      Foster Greer Thorbecke
GREG     General REgression Estimator
HCR      Head Count Ratio
HT       Horvitz Thompson
ICC      Intraclass Correlation Coefficient
ICTPC    Ingreso Corriente Total Per Cápita
IGLS     Iterative Generalized Least Squares
INEGI    Instituto Nacional de Estadística y Geografía
MCMC     Markov Chain Monte Carlo
MDG      Millennium Development Goal
MQ       M-Quantile
MSE      Mean Squared Error
ONU      Organización de las Naciones Unidas
PG       Poverty Gap
Q-Q      Quantile-Quantile plot
QSR      Quintile Share Ratio
RB       Relative Bias
REML     REsidual Maximum Likelihood
RMSE     Root Mean Squared Error
SAE      Small Area Estimation
UNDP     United Nations Development Programme
WEF      World Economic Forum

Chapter 1

Introduction

Latin America stands out, together with Sub-Saharan Africa, as one of the most unequal regions in the world. This problem was acknowledged by the United Nations, which argued that fighting poverty should be the first and most important of the Millennium Development Goals to be addressed in this century. To achieve this, however, it is crucial to obtain a detailed description of the spatial distribution of poverty in order to understand the geographic conditions in which the poor live within a country. Poverty mapping procedures are commonly applied for this purpose, and they require an adequate estimation method. Unfortunately, national surveys are often not suitable for providing reliable statistical information at local levels for all regions within a country.

Small area procedures are estimation procedures for parameters under very small sample sizes. For sufficiently large sample sizes, traditional estimators, such as the mean estimators, produce reliable results, but when applied to small samples such estimators often have only very limited reliability. This is often the case if the subject requires the data to be split into many small categories, e.g. municipalities. Even large surveys in a country will contain administrative units from which only very few or even no households have been sampled. For such domains, in practice, the sampling error is often huge.

Fortunately, there is an alternative to the classical estimators: model-based methods, which have been developed further in recent years. These methods use model assumptions to reduce the sampling error. Small area estimation methods, in particular those based on generalized linear mixed models, belong to this class. Their basic principle is to improve the estimation by extending the original sample, which on its own is too small. A range of estimation approaches for small areas, in particular approaches based on unit-level mixed models with application to poverty indicators, has been proposed. The most commonly used in the literature are the Empirical Best Predictor estimation method, suggested by Molina and Rao (2010), the World Bank method, developed by Elbers et al. (2003), and the M-Quantile approach introduced by Chambers and Tzavidis (2006). Some of these methods rely on Gaussian assumptions, which are seldom satisfied by the original data. For this reason, a logarithmic transformation is very often used in practice to bring the data closer to normality. However, in order to find an optimal transformation, other existing methods, such as power transformations (the Box-Cox transformation in particular), should be taken into account. Therefore, it is important to analyze how the performance of small area estimation methods is affected by departures from normality and how such Box-Cox type transformations can help to improve the validity of the model assumptions and the precision of small area prediction.

The present work is structured as follows. Chapter 2 presents the theoretical background of some selected small area estimation methods for poverty mapping. Chapter 3 provides an overview of the main approaches of poverty estimation methods based on unit-level linear mixed models. In Chapter 4 some power transformation families are introduced and analyzed. Finally, in Chapter 5 the poverty mapping procedure is applied to data from the State of Mexico.

Chapter 2

Poverty Mapping Approaches

In this chapter, the theoretical background of some selected small area estimation methods for poverty mapping is presented. In order to understand poverty measurement, one-dimensional poverty indicators are introduced in Section 2.1, in particular the Gini Coefficient (Gini, 1912), the Income Quintile Share Ratio (Atkinson, 1987) and the Foster Greer Thorbecke (Foster et al., 1984) indicator family. Finally, a general framework of a poverty mapping process is described in Section 2.2.

2.1 Poverty Indicators

Poverty generally refers to a sociodemographic condition in which the basic human needs required to live comfortably are lacking (Betti and Lemmi, 2013). It is widely known to be a multidimensional concept (Bourguignon and Chakravarty, 2003), but until now the literature has focused mainly on one dimension. Conventional measures of poverty use household per capita income, expenditure or consumption level (Greeley, 1994). For this reason, this work is based on the analysis of a one-dimensional poverty concept, income-poverty, which can be measured in absolute and in relative form. Absolute income-poverty is held constant over time and across countries, whereas relative income-poverty depends on the location or society in which people live and is generally measured in relation to a poverty line (Betti and Lemmi, 2013; Litchfield, 1999; Atkinson, 1987). In this context, a poverty line, also known as a threshold, is derived from an estimate of an adequate minimum income in a given country and is commonly set by national governments.

Under this framework, poverty and income inequality measurements are based on identifying an index or aggregate indicator, which may or may not be built with respect to a poverty line, taking into account the households' economic conditions. In the literature, there are different poverty indicators intended to summarize poverty and income inequality in a single measure (Molina et al., 2014). Among them, the Sen Index (Sen, 1976), the Monetary and Supplementary Fuzzy measures (Betti et al., 2012) and the Human Poverty Index are widely used. However, the best known indicator family is the Foster Greer Thorbecke (FGT) family, which provides information on three important factors: incidence, intensity and severity (Boltvinik, 1998). This family is composed of additive poverty indexes and was developed by Foster et al. (1984). Another family is the Laeken indicators, endorsed by the European Council in the Brussels suburb of Laeken, Belgium (Eurostat, 2004). They measure inequality; the Gini Coefficient (Gini, 1912) and the Income Quintile Share Ratio (QSR) (Eurostat, 2004) in particular are widely cited because of their simplicity and straightforward interpretation.

Notation

Before defining the above mentioned poverty indicators and discussing their estimation under small area estimation methods, a general notation is defined for the present work. Let $U$ be a finite population of size $N$, partitioned into $D$ regions or domains $U_1, U_2, \ldots, U_D$ of sizes $N_1, \ldots, N_D$, where $i = 1, \ldots, D$ indexes the regions and $j = 1, \ldots, N_i$ indexes the individuals within region $i$. Let $y$ be the target measure, in this case the household income, where $y_{ij}$ is the value of $y$ for unit $j$ from domain $i$.

Definition 2.1.
(FGT measures (Foster et al., 1984)) The FGT index of type $\alpha$ for a region $i = 1, \ldots, D$ and a fixed threshold $t$ is given by
\[ F_i(\alpha, t) = \frac{1}{N_i} \sum_{j=1}^{N_i} F_{ij}(\alpha, t), \qquad \alpha = 0, 1, 2, \tag{2.1} \]
where
\[ F_{ij}(\alpha, t) = \left( \frac{t - y_{ij}}{t} \right)^{\alpha} I(y_{ij} \leq t), \]
with $I(A)$ an indicator function which returns 1 if $A$ is a true expression and 0 otherwise. From this definition the following poverty measures are derived:

1. Setting $\alpha = 0$ yields the Head Count Ratio (HCR), or at-risk-of-poverty rate. It represents the incidence of poverty and is the proportion of the population whose consumption or income is below the given poverty line.
2. Setting $\alpha = 1$ yields the Poverty Gap (PG) index, a measure of poverty intensity or depth that quantifies the degree to which the mean income of people living under the poverty line differs from the poverty line.
3. Setting $\alpha = 2$ yields the poverty severity, or squared poverty gap, index.

Definition 2.2. (Income Quintile Share Ratio (QSR) (Eurostat, 2004)) The QSR for a region $i = 1, \ldots, D$ is defined as
\[ QSR_i = \frac{\sum_{j=1}^{N_i} I(y_{ij} \geq y_{0.8})\, y_{ij}}{\sum_{j=1}^{N_i} I(y_{ij} \leq y_{0.2})\, y_{ij}}, \]
where $y_{0.8}$ and $y_{0.2}$ denote the 80% and 20% quantiles of the target variable $y$, respectively. In other words, the QSR is the ratio of the total income earned by the richest 20% relative to that earned by the poorest 20% in the same region.

Definition 2.3. (Gini coefficient (Gini, 1912)) The Gini coefficient for a region $i = 1, \ldots, D$ is given by
\[ G_i = 1 - \sum_{j=1}^{N_i - 1} \left[ p_{i(j+1)} - p_{ij} \right] \left[ y_{i(j+1)} + y_{ij} \right], \]
with $p$ the empirical probability distribution of the population. The Gini coefficient is one of the most commonly used measures of inequality and describes the relationship between the cumulative portion of the population, arranged according to the level of income, and the cumulative portion of income earned. It ranges from 0 or 0% (perfect equality) to 1 or 100% (complete inequality) and equals twice the area between the Lorenz curve and the line at 45 degrees. The Lorenz curve is a graphical representation of the cumulative distribution function of the empirical probability distribution of income (Lorenz, 1905).

For practical reasons, the theoretical part of the present work concentrates mainly on the estimation of the FGT indicators given in Definition 2.1.
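As an illustration, the three indicators defined above can be computed for a single region with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `fgt`, `qsr` and `gini` are hypothetical, the quantiles use NumPy's default linear interpolation, and the Gini coefficient is obtained from the Lorenz-curve description in the text (one minus the summed trapezoid terms), which treats $p$ and the income term as cumulative population and income shares.

```python
import numpy as np

def fgt(y, t, alpha):
    """FGT index of type alpha for incomes y and poverty line t."""
    y = np.asarray(y, dtype=float)
    poor = y <= t                                # indicator I(y_ij <= t)
    gap = np.where(poor, (t - y) / t, 0.0)       # relative shortfall, 0 for the non-poor
    return float(np.mean(poor * gap ** alpha))

def qsr(y):
    """Income of the richest 20% divided by income of the poorest 20%."""
    y = np.asarray(y, dtype=float)
    q20, q80 = np.quantile(y, [0.2, 0.8])        # assumption: linear interpolation
    return float(y[y >= q80].sum() / y[y <= q20].sum())

def gini(y):
    """Gini coefficient as twice the area between Lorenz curve and diagonal."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    p = np.arange(n + 1) / n                                    # cumulative population share
    lorenz = np.concatenate(([0.0], np.cumsum(y))) / y.sum()    # cumulative income share
    return float(1.0 - np.sum((p[1:] - p[:-1]) * (lorenz[1:] + lorenz[:-1])))

# Toy incomes (illustrative values only) and a poverty line of 100.
incomes = [50.0, 80.0, 120.0, 200.0]
hcr, pg, severity = (fgt(incomes, 100.0, a) for a in (0, 1, 2))
```

Keeping the indicator and the relative shortfall separate makes the case $\alpha = 0$ behave as a pure head count, since a unit exactly on the poverty line has shortfall zero but is still counted as poor.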

2.2 Poverty Mapping Framework

The poverty mapping procedure aims to obtain a detailed description of the spatial distribution of poverty in order to enhance current understanding of the geographic conditions where the poor live within a country. For this purpose, data from a representative sample survey and from a census are combined to construct so-called high-resolution maps, in which estimated poverty indicators are graphically represented in detail (Henninger and Snel, 2002). Such a poverty mapping follows the steps below:

1. Choose the poverty measure(s): As a starting point, select the poverty indicators and, if required, a poverty line $t$. Moreover, choose the target variable $y$ and the design matrix $X = (x_1, \ldots, x_p)^T$, containing $p$ appropriate explanatory variables, where $y_{ij}$ is a value of $y$ and $x_{ij}^T = (x_{1ij}, \ldots, x_{pij})$ are the values of the variables related to $y_{ij}$.
2. Select input data:
   - Survey data: available for $y$ and $X$.
   - Census data: available for $X$.
3. Choose a poverty estimation method.
4. Estimation process:
   - Estimate a statistical model on the survey data that links $y$ to $X$ and obtain the corresponding parameters.
   - Obtain a synthetic population of $y$ in the census data, using the previously obtained parameters and the covariates available in the census.
   - Estimate the target poverty indicators and their associated precision estimates.
5. Produce maps: Obtain a high-resolution map, here called poverty map, by plotting the resulting estimates on the geographical coordinates.
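The estimation process in step 4 can be sketched in Python as follows. This is a deliberately simplified stand-in and not one of the methods compared in this work: it assumes an ordinary least squares fit on log income without area random effects, and the function name `synthetic_hcr` as well as all argument names are hypothetical. It only illustrates the flow from survey model to synthetic census incomes to a domain-level Head Count Ratio.

```python
import numpy as np

def synthetic_hcr(y_survey, X_survey, X_census, domain_census, t, L=200, seed=0):
    """HCR per domain from L synthetic census populations (simplified sketch)."""
    rng = np.random.default_rng(seed)

    # Fit log(y) = intercept + X beta + e on the survey data (assumption: OLS).
    A = np.column_stack([np.ones(len(X_survey)), X_survey])
    z = np.log(y_survey)
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ beta
    sigma = np.sqrt(resid @ resid / (len(z) - A.shape[1]))

    # Generate synthetic census incomes and average the indicator over L draws.
    C = np.column_stack([np.ones(len(X_census)), X_census])
    domains = np.unique(domain_census)
    hcr = np.zeros(len(domains))
    for _ in range(L):
        y_syn = np.exp(C @ beta + rng.normal(0.0, sigma, size=len(C)))
        hcr += np.array([np.mean(y_syn[domain_census == d] <= t) for d in domains])
    return dict(zip(domains.tolist(), (hcr / L).tolist()))
```

The returned dictionary maps each domain label to its estimated HCR, which is the quantity that would be plotted in step 5. Precision estimates are not covered by this sketch.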

Chapter 3

Poverty Estimation Methods

As seen in Section 2.2, an appropriate estimation method has to be chosen within the general framework of poverty mapping. Some of the most widely used approaches are the Small Area Estimation (SAE) methods, which are introduced in Section 3.1. This is followed by an overview of the main poverty estimation methods based on unit-level linear mixed models. The first approach, the Empirical Best Predictor (EBP) estimation method suggested by Molina and Rao (2010), is presented in Section 3.2.1. The second approach, used by the World Bank and developed by Elbers et al. (2003), is introduced in Section 3.2.2. In order to evaluate the performance of both estimators, a model-based simulation study is performed in Section 3.3. To conclude this chapter, some remarks and possible areas of future research are provided in Section 3.4.

3.1 Small Area Estimation

As noted above, an adequate estimation method is necessary for performing a complete poverty mapping and obtaining a local picture of poverty. Unfortunately, national surveys are often not suitable for providing reliable statistical information at local levels: because of the high costs, the sample sizes are, for instance, not large enough to cover all regions within a country. One possibility would be to use direct estimators of means, totals, quantiles and proportions for the complete population, either by using a design-unbiased procedure such as the Horvitz Thompson (HT) estimator (Horvitz and Thompson, 1952) or by taking some covariates related to the target variable $y$ into account, such as the

General Regression Estimator (GREG) (Särndal and Wretman, 1992). In practice, when the purpose is to obtain detailed information on poverty at local levels, direct estimators often lead to unreliable results, as they are based only on the available region-specific sample. Therefore, appropriate estimators for small areas are required (Jiang and Lahiri, 2006). In this context and according to Schaible (1996), a small area is defined as:

Definition 3.1. (Small area (Rao, 2003)) A domain or area is regarded as small if the domain-specific sample is not large enough to support direct estimates of adequate precision.

Typical examples of small domains or areas include geographical regions (states, municipalities, etc.) or socio-demographic groups (specific ages, ethnic groups, etc.) (Jiang and Lahiri, 2006). Therefore, in order to find more efficient estimators and to produce estimates for small areas with an adequate level of precision, indirect estimators should be used. These methods are suitable for the estimation of complex measures such as the poverty indicators in Chapter 2. Furthermore, they produce estimates by borrowing strength, that is, by using values of the target variable $y$ from related areas over space (Chambers et al., 2009) and/or time periods (Rao and Yu, 1994). In particular, this research is based on indirect domain estimators, which use values of $y$ from another domain but not from another time period (Schaible, 1996; Rao, 2003). Furthermore, these approaches use related information sources (Betti and Lemmi, 2013), which increases the precision level by reducing the sampling error derived from survey data (Jiang and Lahiri, 2006). The most popular indirect domain estimation approaches make use of explicit random effects models for estimating domain-specific parameters (Rao, 2003), in particular the nested error regression models (Battese et al., 1988).
These procedures provide a link between the geographical areas by assuming a constant relationship between the target variable $y$ and the explanatory variables $X = (x_1, \ldots, x_p)^T$, and they additionally account for between-area variation. These approaches are also called model-based estimators and can be classified into two types: area-level models and unit-level models. The first approach can be used if only aggregated data are available for the areas, which usually leads to some loss of information. The most widely used area-level estimator is the Fay-Herriot estimator (Fay III and Herriot, 1979). Unit-level

models relate unit values of the target variable to unit-specific explanatory variables, thus using more detailed information and leading to an improvement in precision. The most well-known is the Battese-Harter-Fuller model (Battese et al., 1988). For further insights regarding small area estimation see Pfeffermann et al. (2013); Pfeffermann (2002) and also Jiang and Lahiri (2006); Ghosh and Rao (1994) for area- and unit-level models.

3.2 Small Area Estimation of Poverty Indicators Using Unit-Level Models

As noted above, a range of estimation approaches for small areas based on unit-level mixed models and with application to poverty indicators have been proposed. However, the most commonly used in the literature are:

- The Empirical Best Predictor (EBP) approach (Molina and Rao, 2010)
- The M-Quantile approach (MQ) (Chambers and Tzavidis, 2006)
- The World Bank method (ELL) (Elbers et al., 2003)

These methods can be applied to general area parameters that are nonlinear functions of the target variable in the area units (Molina and Rao, 2013b). However, one of their major disadvantages is the high computational cost for the user in the case of large populations. The present work concentrates mainly on the analysis of the EBP and ELL methods; they are therefore discussed in detail in Sections 3.2.1 and 3.2.2, respectively. For detailed information about alternative methodologies see Chambers et al. (2014); Molina et al. (2014); Elbers and van der Weide (2014); Betti et al. (2012); Jiang et al. (2011); Ferretti and Molina (2011); Tzavidis et al. (2010); Chambers and Tzavidis (2006); Ghosh and Rao (1994).

3.2.1 Empirical Best Predictor (EBP) Approach

The empirical best predictor (EBP) method, suggested by Molina and Rao (2010), assumes a unit-level mixed model, also called nested error regression model, on the transformed target variable $y$.
This model includes additional random effects for the sampling areas and generally uses two sources of information, survey and census data, that share the same covariates or auxiliary information. The EBP method provides a

Monte Carlo approximation of the estimator and adopts a parametric bootstrap approach for the Mean Squared Error (MSE) estimation. Before the estimation process of the EBP approach is explained in detail, a general notation is defined.

Notation

Let $U$ be a finite population of size $N$, partitioned into $D$ regions or domains $U_1, U_2, \ldots, U_D$ of sizes $N_1, \ldots, N_D$, where $i = 1, \ldots, D$ refers to the $i$th region and $j = 1, \ldots, N_i$ to the $j$th individual. Let $y$ be the target variable and let $X = (x_1, \ldots, x_p)^T$ be the design matrix, containing $p$ appropriate explanatory variables. Let $s$ denote the set of sample units, with $s_i$ the in-sample units in region $i$, and let $r$ denote the set of non-sampled units, with $r_i$ the out-of-sample units in region $i$. Let $n_i$ be the sample size in region $i$, with $n = \sum_{i=1}^{D} n_i$. Consider then the vector $y_i$ of population elements for domain $i$, partitioned as
\[ y_i^T = (y_{i1}, \ldots, y_{iN_i}) = (y_{is}^T, y_{ir}^T), \]
where $y_{is}$ and $y_{ir}$ denote the sample elements $s$ and the out-of-sample elements $r$ in area $i$, respectively. This chapter is based on the general small area estimation theory described in Rao (2003) and applied by Molina and Rao (2010) to poverty measures.

Empirical Bayes (EB) Approach

Consider a generic index or target parameter $\delta_i = h(y_i)$, a function of the population variable $y_i$, for $i = 1, \ldots, D$. Let $\hat{\delta}_i$ be an estimator of $\delta_i$ depending only on $y_{is}$, the in-sample units for region $i$. The mean squared error of $\hat{\delta}_i$ is defined as
\[ \mathrm{MSE}(\hat{\delta}_i) = E_{y_i}\left\{ (\hat{\delta}_i - \delta_i)^2 \right\}, \tag{3.1} \]
where $E_{y_i}$ indicates the expectation with respect to the joint distribution of $y_i$. The best predictor (BP) of $\delta_i$, which minimizes (3.1), is $\delta_i^B = E_{y_{ir}}(\delta_i \mid y_{is})$, a function of $y_{is}$ given by the expectation with respect to the conditional distribution of $y_{ir}$ given $y_{is}$.
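Subtracting and adding $\delta_i^B$, the following expression for the mean squared error is obtained:
\[ \mathrm{MSE}(\hat{\delta}_i) = E_{y_i}\left\{ (\hat{\delta}_i - \delta_i^B)^2 \right\} + 2 E_{y_i}\left\{ (\hat{\delta}_i - \delta_i^B)(\delta_i^B - \delta_i) \right\} + E_{y_i}\left\{ (\delta_i^B - \delta_i)^2 \right\}. \]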

In this equation the third term does not depend on $\hat{\delta}_i$ and the second one is equal to zero:
\[
\begin{aligned}
E_{y_i}\left\{ (\hat{\delta}_i - \delta_i^B)(\delta_i^B - \delta_i) \right\}
&= E_{y_{is}}\left[ E_{y_{ir}}\left\{ (\hat{\delta}_i - \delta_i^B)(\delta_i^B - \delta_i) \mid y_{is} \right\} \right] \\
&= E_{y_{is}}\left[ (\hat{\delta}_i - \delta_i^B)\left\{ \delta_i^B - E_{y_{ir}}(\delta_i \mid y_{is}) \right\} \right] = 0,
\end{aligned}
\]
since $\delta_i^B = E_{y_{ir}}(\delta_i \mid y_{is})$. The first term is non-negative with minimum value equal to zero, which is attained at $\hat{\delta}_i = \delta_i^B$, so the BP is
\[ \hat{\delta}_i^B = \delta_i^B = E_{y_{ir}}(\delta_i \mid y_{is}), \tag{3.2} \]
which is also an unbiased estimator:
\[ E_{y_{is}}(\hat{\delta}_i^B) = E_{y_{is}}\left\{ E_{y_{ir}}(\delta_i \mid y_{is}) \right\} = E_{y_i}(\delta_i). \]
Usually the joint distribution of $y$ depends on $\theta$, a vector of unknown model parameters. In this context, the empirical best predictor (EBP) of $\delta$ can be obtained by evaluating the expectation in (3.2) at $\theta = \hat{\theta}$, where $\hat{\theta}$ is a suitable estimator of $\theta$.

Nested Error Linear Regression Model

To obtain best predictors for FGT poverty measures in a small area context, a nested error linear regression model is used to assess the relationship between the target variable and the auxiliary information from the survey data. This relationship is described by the unit-level nested error linear regression model defined by Battese et al. (1988), which includes random area-specific effects together with unit-level error terms. The EBP approach relies on Gaussian assumptions for the model error terms; therefore, a one-to-one transformation $T(y_{ij}) = y_{ij}^*$ of the target variable $y$ is assumed in order to fulfil this assumption.

Definition 3.2. (Nested error linear regression model (Battese et al., 1988)) The nested error linear regression model for the transformed variable $y_{ij}^* = T(y_{ij})$ is defined by
\[
\begin{aligned}
y_{ij}^* &= x_{ij}^T \beta + u_i + e_{ij}, \qquad j = 1, \ldots, n_i, \quad i = 1, \ldots, D, \\
u_i &\overset{iid}{\sim} N(0, \sigma_u^2), \\
e_{ij} &\overset{iid}{\sim} N(0, \sigma_e^2),
\end{aligned}
\tag{3.3}
\]
where $u_i$ and $e_{ij}$ are assumed to be independent.

In the above definition, $x_{ij}$ is the vector of the $p$ unit-level auxiliary variables (preceded by a 1 for the intercept), $\beta$ is the $(p + 1) \times 1$ vector of regression coefficients defined as $\beta^T = (\beta_0, \ldots, \beta_p)$, and the random area-specific effects and unit-level error terms are denoted by $u_i$ and $e_{ij}$, respectively. Furthermore, the vector of unknown model parameters is defined as
\[ \theta^T = (\beta_0, \ldots, \beta_p, \sigma_u^2, \sigma_e^2) = (\beta^T, \sigma_u^2, \sigma_e^2). \]
Rewriting the nested model described in (3.3) in vector form by stacking the elements for each area $i = 1, \ldots, D$ leads to
\[ y_i^* = \underset{1 \leq j \leq n_i}{\mathrm{col}}(y_{ij}^*), \qquad e_i = \underset{1 \leq j \leq n_i}{\mathrm{col}}(e_{ij}), \qquad x_i = \underset{1 \leq j \leq n_i}{\mathrm{col}}(x_{ij}^T). \]
Under the model defined by (3.3), the vectors $y_i^*$ are independent and normally distributed, $y_i^* \sim N(\mu_i, V_i)$ for $i = 1, \ldots, D$, where $\mu_i = x_i \beta$ and $V_i = \sigma_u^2 1_{n_i} 1_{n_i}^T + \sigma_e^2 I_{n_i}$, with $1_{n_i}$ a column vector of ones of size $n_i$ and $I_{n_i}$ the $n_i \times n_i$ identity matrix. The corresponding decomposition of $y_i^*$ into in-sample and out-of-sample elements is $y_i^{*T} = (y_{is}^{*T}, y_{ir}^{*T})$, and the density of $y_{ir}^*$ given $y_{is}^*$ can be determined from
\[ y_i^* = \begin{pmatrix} y_{is}^* \\ y_{ir}^* \end{pmatrix} \sim N\left( \mu_i = \begin{pmatrix} \mu_{is} \\ \mu_{ir} \end{pmatrix},\; V_i = \begin{pmatrix} V_{is} & V_{isr} \\ V_{irs} & V_{ir} \end{pmatrix} \right). \]
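To make the structure of the nested error model concrete, the following Python sketch simulates transformed outcomes area by area and builds the covariance matrix $V_i$ of one area. It is only an illustration under stated assumptions: the function names `simulate_nested_error` and `area_covariance` are hypothetical, each design matrix is assumed to already contain a column of ones for the intercept, and the standard deviations are passed as known values rather than estimated.

```python
import numpy as np

def simulate_nested_error(X_by_area, beta, sigma_u, sigma_e, seed=0):
    """Draw transformed outcomes y*_ij = x_ij' beta + u_i + e_ij for each area."""
    rng = np.random.default_rng(seed)
    y_by_area = []
    for X in X_by_area:                              # X has shape (n_i, p + 1)
        u = rng.normal(0.0, sigma_u)                 # one random effect per area
        e = rng.normal(0.0, sigma_e, size=len(X))    # one error per unit
        y_by_area.append(X @ beta + u + e)
    return y_by_area

def area_covariance(n_i, sigma_u, sigma_e):
    """V_i = sigma_u^2 * 1 1' + sigma_e^2 * I for an area with n_i units."""
    ones = np.ones((n_i, 1))
    return sigma_u ** 2 * (ones @ ones.T) + sigma_e ** 2 * np.eye(n_i)

# Three units in one area: variance sigma_u^2 + sigma_e^2 on the diagonal,
# covariance sigma_u^2 between any two units of the same area.
V = area_covariance(3, sigma_u=2.0, sigma_e=1.0)
```

The shared draw of `u` within an area is what induces the constant within-area covariance $\sigma_u^2$, while units from different areas remain independent.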

In (3.2), $\hat{\delta}_i^B$ depends on $\theta$, a vector of unknown model parameters. As previously pointed out, the empirical best predictor is obtained by replacing $\theta$ with a suitable estimator $\hat{\theta}$.

EBP Approach for FGT Poverty Indicators

The method starts by estimating $\theta$ from the sample data; the estimate is denoted by $\hat{\theta} = (\hat{\beta}, \hat{\sigma}_u^2, \hat{\sigma}_e^2)$. The most frequently used estimation methods for the parameter $\theta$ are Restricted Maximum Likelihood (REML), Iterative Generalized Least Squares (IGLS), Markov Chain Monte Carlo (MCMC) methods and Feasible Generalized Least Squares (FGLS). The FGT indicators given by (2.1) can be expressed in terms of $y_{ij}^*$ as follows:
\[ F_i(\alpha, t) = \frac{1}{N_i} \sum_{j=1}^{N_i} \left( \frac{t - T^{-1}(y_{ij}^*)}{t} \right)^{\alpha} I\left[ T^{-1}(y_{ij}^*) \leq t \right] =: h_\alpha(y_i^*), \qquad \alpha = 0, 1, 2, \]
with
\[ h_\alpha(y_{ij}^*) := \left( \frac{t - T^{-1}(y_{ij}^*)}{t} \right)^{\alpha} I\left[ T^{-1}(y_{ij}^*) \leq t \right], \]
and therefore
\[ h_\alpha(y_i^*) := \frac{1}{N_i} \sum_{j=1}^{N_i} h_\alpha(y_{ij}^*). \]
That means the FGT indicators are nonlinear functions of the transformed target variable. From here on, the superscript $*$ refers to the transformed data $y_{ij}^*$. Taking $\delta_i = F_i(\alpha, t)$ and following (3.2), the BP of $\delta_i = F_i(\alpha, t)$ is given by
\[ \hat{F}_i^B(\alpha, t) = E_{y_{ir}^*}\left[ F_i(\alpha, t) \mid y_{is}^* \right]. \tag{3.4} \]

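By using the decomposition of $F_i(\alpha, t)$ in terms of in-sample and out-of-sample elements, the following is obtained:
\[
\begin{aligned}
\hat{F}_i^B(\alpha, t)
&= E_{y_{ir}^*}\left[ \frac{1}{N_i} \left\{ \sum_{j \in s_i} F_{ij}(\alpha, t) + \sum_{j \in r_i} F_{ij}(\alpha, t) \right\} \,\middle|\, y_{is}^* \right] \\
&= \frac{1}{N_i} \left\{ \sum_{j \in s_i} F_{ij}(\alpha, t) + \sum_{j \in r_i} E_{y_{ir}^*}\left[ F_{ij}(\alpha, t) \mid y_{is}^* \right] \right\} \\
&= \frac{1}{N_i} \left\{ \sum_{j \in s_i} F_{ij}(\alpha, t) + \sum_{j \in r_i} \hat{F}_{ij}^B(\alpha, t) \right\},
\end{aligned}
\]
where $\hat{F}_{ij}^B(\alpha, t)$ is the BP of $F_{ij}(\alpha, t) = h_\alpha(y_{ij}^*)$, defined as
\[ \hat{F}_{ij}^B(\alpha, t) = E_{y_{ir}^*}\left[ h_\alpha(y_{ij}^*) \mid y_{is}^* \right] = \int_{\mathbb{R}} h_\alpha(y_{ij}^*) \, \mathrm{d}f_{y_{ij}^* \mid y_{is}^*} \qquad \text{for } j \in r_i. \tag{3.5} \]
Due to the complexity of $h_\alpha(y_{ij}^*)$, the integral in (3.5) cannot be calculated explicitly (West et al., 1985). However, once the normality assumption on the nested error regression model in (3.3) is fulfilled, the conditional distribution of $y_{ir}^*$ given $y_{is}^*$ is
\[ y_{ir}^* \mid y_{is}^* \sim N(\mu_{ir|is}, V_{ir|is}), \tag{3.6} \]
which is equivalent to
\[ y_{ir}^* \mid y_{is}^* = \mu_{ir|is} + u_i 1_{N_i - n_i} + e_{ir}, \]
where the random area effect $u_i$ and the error term $e_{ir}$ are independent and satisfy $u_i \sim N\left(0, \hat{\sigma}_u^2 (1 - \gamma_i)\right)$ and $e_{ir} \sim N\left(0_{N_i - n_i}, \hat{\sigma}_e^2 I_{N_i - n_i}\right)$, with conditional mean and covariance matrix defined by
\[
\begin{aligned}
\mu_{ir|is} &= x_{ir} \hat{\beta} + \hat{\sigma}_u^2 1_{N_i - n_i} 1_{n_i}^T V_{is}^{-1} (y_{is}^* - x_{is} \hat{\beta}), \\
V_{ir|is} &= \hat{\sigma}_u^2 (1 - \gamma_i) 1_{N_i - n_i} 1_{N_i - n_i}^T + \hat{\sigma}_e^2 I_{N_i - n_i},
\end{aligned}
\tag{3.7}
\]
for $\gamma_i = \hat{\sigma}_u^2 (\hat{\sigma}_u^2 + \hat{\sigma}_e^2 / n_i)^{-1}$ and $V_{is} = \hat{\sigma}_u^2 1_{n_i} 1_{n_i}^T + \hat{\sigma}_e^2 I_{n_i}$, where $x_{ir}$ and $x_{is}$ collect the covariates of the out-of-sample and in-sample units of area $i$. Based on this distribution, $\hat{F}_{ij}^B(\alpha, t)$ can be obtained by an empirical approximation via Monte Carlo simulation, described as follows: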

1. Generate $l = 1, \ldots, L$ non-sample vectors $y_{ir}^{*(l)}$ from the conditional distribution defined in (3.6) and (3.7) for each area $i = 1, \ldots, D$.
2. Form a population vector $y_i^{*(l)T} = \left( y_{is}^{*T}, y_{ir}^{*(l)T} \right)$ for each area $i = 1, \ldots, D$ and $l = 1, \ldots, L$ by attaching the corresponding sample elements $y_{is}^*$.
3. Using the re-transformed population vector $y_i^{(l)}$, calculate the poverty measure $F_i^{(l)}(\alpha, t) = h_\alpha(y_i^{*(l)})$ for $l = 1, \ldots, L$ in each area.
4. Finally, take the mean over the $L$ Monte Carlo generations in each area to obtain an approximation of $\hat{F}_i^B(\alpha, t)$ as follows:
\[ \hat{F}_i^B(\alpha, t) = E_{y_{ir}^*}\left[ F_i(\alpha, t) \mid y_{is}^* \right] \approx \frac{1}{L} \sum_{l=1}^{L} F_i^{(l)}(\alpha, t). \]

In the case of non-sampled domains, generate $y_{ij}^{*(l)}$, with $j = 1, \ldots, N_i$ and $l = 1, \ldots, L$, by bootstrapping from
\[ y_{ij}^{*(l)} = x_{ij}^T \hat{\beta} + u_i^{(l)} + e_{ij}^{(l)}, \]
with $u_i^{(l)} \overset{iid}{\sim} N(0, \hat{\sigma}_u^2)$ and $e_{ij}^{(l)} \overset{iid}{\sim} N(0, \hat{\sigma}_e^2)$. Calculate $\hat{F}_i^{(l)}(\alpha, t)$ from the re-transformed $y_{ij}^{(l)}$ for $l = 1, \ldots, L$ and obtain a synthetic estimator $\hat{F}_i^B(\alpha, t)$ by
\[ \hat{F}_i^B(\alpha, t) \approx \frac{1}{L} \sum_{l=1}^{L} \hat{F}_i^{(l)}(\alpha, t). \]
This procedure is also appropriate for predicting another poverty measure $\delta_i = h(y_i)$, $i = 1, \ldots, D$, as long as the distribution of the transformed target variable $T(y_{ij})$ is known and the conditional distribution $y_r \mid y_s$ can be obtained.

3.2.2 The World Bank Method (ELL)

The World Bank uses an alternative estimation method, officially published in 2003 by Elbers, Lanjouw and Lanjouw, which is well known as the World Bank method, or ELL for brevity. The present work is based on the calculations from Elbers et al. (2002, 2003); Demombynes et al. (2007); Silva Filho et al. (2008). The ELL method assumes a unit-level mixed model, also called nested error regression model, on the transformed target variable $y$. This model generally uses two sources of information, survey and census data, that share the same covariates or auxiliary information.
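For comparison with what follows, the Monte Carlo approximation of the EBP from Section 3.2.1 (steps 1 to 4, for one sampled area) can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in this work: the transformation is assumed to be $T = \log$, the parameter estimates are assumed to be already available (for example from REML), the design matrices are assumed to contain an intercept column, and the function name `ebp_fgt_area` and its arguments are hypothetical. The conditional mean uses the fact that, for this covariance structure, the correction term reduces to $\gamma_i$ times the mean in-sample residual.

```python
import numpy as np

def ebp_fgt_area(y_s, X_s, X_r, beta, sigma_u2, sigma_e2, t,
                 alpha=0, L=500, seed=0):
    """Monte Carlo EBP of the FGT index for one sampled area (sketch)."""
    rng = np.random.default_rng(seed)
    ystar_s = np.log(y_s)                        # assumption: T = log
    n_i, n_out = len(y_s), len(X_r)

    # Shrinkage factor gamma_i and conditional mean of the out-of-sample units.
    gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n_i)
    mu_r = X_r @ beta + gamma * np.mean(ystar_s - X_s @ beta)

    draws = np.empty(L)
    for l in range(L):
        # Step 1: draw out-of-sample values from the conditional distribution.
        u = rng.normal(0.0, np.sqrt(sigma_u2 * (1.0 - gamma)))
        e = rng.normal(0.0, np.sqrt(sigma_e2), size=n_out)
        y_r = np.exp(mu_r + u + e)               # back-transform with exp
        # Step 2: attach the observed sample to form the area population.
        y_pop = np.concatenate([y_s, y_r])
        # Step 3: FGT index on the original scale.
        poor = y_pop <= t
        shortfall = np.where(poor, (t - y_pop) / t, 0.0)
        draws[l] = np.mean(poor * shortfall ** alpha)
    # Step 4: average over the L Monte Carlo populations.
    return float(draws.mean())
```

For a non-sampled area the same loop applies with `gamma` set to zero and no observed units attached, which corresponds to the synthetic estimator described above.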
Furthermore, one of the most notable aspects of this procedure is the data-based clustering structure, which is a design property of both the sample and the population data. For the ELL approach, units with similar characteristics in the data are grouped naturally, usually into geographical clusters. Therefore, the nested regression model assumes for this method