Determining Changes in Welfare Distributions at the Micro-level: Updating Poverty Maps
By Chris Elbers, Jean O. Lanjouw, and Peter Lanjouw [1]

Income and wealth distributions have a prominent position in growth and development theories, and as determinants of specific socio-economic outcomes. However, empirical investigation of these relationships has been constrained by the lack of detailed, high-quality information on those distributions at the micro-level. We have developed a unit record-level statistical approach to the estimation of welfare measures that takes advantage of both the detail in household sample surveys and the comprehensive coverage of a large survey or census ("poverty mapping"; see Elbers, Lanjouw and Lanjouw 2003 for details). We discuss here extensions to the approach that allow construction of welfare estimates for a given year ($t_1$) when either the household survey or the census is not available in that year but is available at some time removed ($t_0$). Often the interest in doing this will be to update an existing map constructed using both survey and census information in $t_0$. Updated maps could be used by policy makers needing distributional information for monitoring programs and the incidence of development policy, and by researchers trying to understand how changes over time in policy or other factors affect distributional outcomes.

1. Outline of Poverty Mapping

Consider first a single period. Let $W$ be a welfare indicator based on the distribution of a household-level variable of interest, $y_h$. Using the smaller and richer data sample, we estimate the joint distribution of $y_h$ and observable covariates $x_h$. By restricting the explanatory variables to those that can be linked to households in the census, this model can be used to generate the distribution of $y_h$ for any target population in the census conditional on its observed characteristics and, in turn, the conditional distribution of $W$.
Although disaggregation may be along any dimension, not necessarily geographic, we will call these target populations "villages." Briefly, this is done as follows. We estimate a model of $y_{ch}$, the per capita expenditure, say, of household $h$ in sample cluster $c$, typically in the form of a linear approximation,

(1)  $\ln y_{ch} = E[\ln y_{ch} \mid x_{ch}^T] + u_{ch} = x_{ch}^T \beta + u_{ch}.$

We allow for a within-cluster correlation in the disturbances: $u_{ch} = \eta_c + \varepsilon_{ch}$, where $\eta_c$, $\varepsilon_{ch}$ and $x_{ch}$ are uncorrelated. An initial estimate of $\beta$ is obtained from weighted least squares estimation. A flexible form for the idiosyncratic part of the disturbance variance, $\sigma^2_{\varepsilon,ch}$, can be estimated using the residuals $e_{ch}$ from the decomposition

$\hat{u}_{ch} = \hat{u}_{c.} + (\hat{u}_{ch} - \hat{u}_{c.}) = \hat{\eta}_c + e_{ch}$

(where a subscript "." indicates an average over that index). In current practice we estimate a logistic form for the variance with a limited number of explanatory variables. The estimated covariance matrix, weighted by the household expansion factors, is used to obtain feasible GLS estimates of the parameters and their variance.

We are interested in welfare measures based on individuals and thus write $W(m_v, X_v, \beta, u_v)$, where $m_v$ is a vector of household sizes, $X_v$ a matrix of observable characteristics and $u_v$ a vector of disturbances.

[1] Department of Economics, Vrije Universiteit, celbers@feweb.vu.nl; Agricultural and Resource Economics Department, University of California at Berkeley, jlanjouw@brook.edu; The World Bank, planjouw@worldbank.org. This paper was prepared for the International Statistical Institute 2005. Many of the ideas in this paper were developed in conversations with Hans Hoogeveen, Menno Pradhan, Remco Oostendorp and Roy van der Weide and we are grateful for their input. Financial support is gratefully acknowledged from the Government of Japan's PHRDTF at the World Bank. The views presented here should not be taken to reflect those of the World Bank or any of its affiliates. All errors are our own.
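As an illustration of the residual decomposition used in the first stage, the following is a minimal sketch, not the authors' implementation. It assumes an unweighted OLS fit in place of the weighted least squares step, and the function name `decompose_residuals` and its arguments are hypothetical.

```python
import numpy as np

def decompose_residuals(X, log_y, cluster_ids):
    """Fit ln y on X by OLS, then split residuals into cluster means and deviations.

    Assumption: unweighted OLS stands in for the weighted first-stage fit.
    """
    X = np.asarray(X, dtype=float)
    log_y = np.asarray(log_y, dtype=float)
    cluster_ids = np.asarray(cluster_ids).ravel()

    # Initial estimate of beta and the total residual u_hat.
    beta_hat, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    u_hat = log_y - X @ beta_hat

    # Cluster average of the residuals: the estimated location effect eta_hat_c.
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    counts = np.bincount(inverse)
    eta_hat = np.bincount(inverse, weights=u_hat) / counts

    # Household deviation from the cluster average: e_ch.
    e = u_hat - eta_hat[inverse]
    return beta_hat, eta_hat, e
```

The deviations `e` are what a heteroscedasticity model for $\sigma^2_{\varepsilon,ch}$ would then be fitted to; that step is omitted here.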

Because $u_v$ is unknown, we estimate the expected value of the indicator given the village households' observable characteristics and the model of expenditure. This is denoted $\mu_v = E[W \mid m_v, X_v, \zeta_v]$, where $\zeta_v$ is the vector of model parameters, including those which describe the distribution of the disturbances. Monte Carlo simulation is used to calculate the expected welfare measures as follows:

1. Draw a vector of parameters $\zeta^r$ from the estimated sampling distribution of $\hat{\zeta}$: $N(\hat{\zeta}_v, \hat{V}(\hat{\zeta}_v))$.
2. Draw a disturbance vector directly from the standardized residuals (semi-parametric), or from an appropriate empirical standardized distribution determined on the basis of those residuals (parametric), and transform using the heteroscedasticity model with parameters $\zeta^r$ to obtain $u^r$.
3. With each vector of simulated values construct the indicator, $W^r = W(m, X, \beta^r, u^r)$.

With $R$ draws, the simulated expected value for the welfare indicator, and its variance, are then

(2)  $\tilde{\mu} = \frac{1}{R} \sum_{r=1}^{R} W^r$  and  $\tilde{V} = \frac{1}{R} \sum_{r=1}^{R} (W^r - \tilde{\mu})^2.$

The prediction error can be decomposed as

(3)  $W - \tilde{\mu} = (W - \mu) + (\mu - \hat{\mu}) + (\hat{\mu} - \tilde{\mu}).$

Idiosyncratic error, $(W - \mu)$: the result of the realizations of the unobserved component of expenditure. This component of the variance in our estimator falls approximately proportionately as the number of households in the target population, $M$, grows.

Model error, $(\mu - \hat{\mu})$: determined by the properties of the first stage estimators, and thus does not change systematically with $M$.

Computation error, $(\hat{\mu} - \tilde{\mu})$: can be made negligible by making $R$ sufficiently large.
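The three simulation steps and equation (2) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the welfare indicator is taken to be the headcount poverty rate below a hypothetical poverty line `z`, only $\beta$ is redrawn (the variance parameters are held fixed), disturbances are homoscedastic with given scales `sigma_eta` and `sigma_eps`, and the village is treated as a single cluster. All names are hypothetical.

```python
import numpy as np

def simulate_headcount(X, m, beta_hat, V_beta, eta_std, eps_std,
                       sigma_eta, sigma_eps, z, R=500, seed=0):
    """Simulated expected headcount rate for one village, and its variance."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = np.empty(R)
    for r in range(R):
        # Step 1: draw parameters from their estimated sampling distribution.
        beta_r = rng.multivariate_normal(beta_hat, V_beta)
        # Step 2: resample standardized residuals (semi-parametric) and rescale.
        eta_r = sigma_eta * rng.choice(eta_std)          # one shared location effect
        eps_r = sigma_eps * rng.choice(eps_std, size=n)  # one draw per household
        # Step 3: simulated log expenditure and the individual-weighted indicator.
        log_y = X @ beta_r + eta_r + eps_r
        poor = log_y < np.log(z)
        W[r] = np.sum(m * poor) / np.sum(m)
    # Equation (2): mean and variance over the R simulated indicators.
    mu_tilde = W.mean()
    V_tilde = np.mean((W - mu_tilde) ** 2)
    return mu_tilde, V_tilde
```

Weighting by household size `m` reflects that the welfare measure is defined over individuals. A fuller version would replace the constant `sigma_eps` with household-specific values from the heteroscedasticity model.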
2. Updating

A change in consumption over time can be explained by a change in covariates, a change in their relationship to consumption, and a change in the disturbances:

(4)  $\ln y_1 - \ln y_0 = (X_1 - X_0)\beta_0 + X_1(\beta_1 - \beta_0) + \Delta\eta + \Delta\varepsilon.$

If what is missing is a census in $t_1$, and one is willing to assume that covariates have not changed, then a map for $t_1$ can be constructed using the $t_0$ census and a consumption model estimated using the $t_1$ household survey. The required assumption can be checked by comparing the distribution of covariates in the $t_1$ survey to their distribution in the census. The resulting imputed estimates can also be compared at an aggregated level to $t_1$ sample survey estimates.

If instead there is a census for $t_1$ but a household survey only for $t_0$ or $t_2$, and one is willing to assume that a consumption model estimated for an adjacent period is valid for $t_1$, then a welfare map for period $t_1$ can be constructed using that model together with $t_1$ census covariates. This may be a reasonable assumption if the two periods are not too far apart, or if an explanatory variable can pick up time trends in consumption. For instance, a noisy measure of consumption or income from the census data could be included as an explanatory variable in a model of a higher quality indicator, as in the third example below.

These assumptions are often untenable in practical settings. Fortunately, a variety of data configurations offer the means to update poverty maps without having to resort to such extreme assumptions. We now outline several approaches to obtaining welfare estimates with partial information. These methods may or may not be effective in any given setting. In particular, the fact that information is more limited may force more aggregated modelling (e.g. one model for all rural areas rather than one per stratum) which, in turn, may undermine the basic assumption that the models are appropriate for out-of-sample imputation. This should be considered carefully.
Similarly, because they use only partial information, updated maps are likely to be useful only at higher levels of aggregation than a standard map constructed with full information. Again, in some cases this higher level may well be a substantial improvement in terms of detail over what can be achieved using sample survey data only.
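The decomposition in equation (4) is an accounting identity in its systematic part, which a small numerical sketch can make concrete. The version below is an assumption-laden illustration: it applies the identity to sample means of the covariates and leaves out the disturbance terms, and the function name `decompose_change` is hypothetical.

```python
import numpy as np

def decompose_change(X0, X1, beta0, beta1):
    """Split the change in mean predicted log consumption as in equation (4).

    Assumption: applied at covariate means, with disturbance terms omitted.
    """
    x0_bar = np.asarray(X0, dtype=float).mean(axis=0)
    x1_bar = np.asarray(X1, dtype=float).mean(axis=0)
    # Part due to changing covariates, valued at base-period coefficients.
    covariate_part = (x1_bar - x0_bar) @ beta0
    # Part due to changing coefficients, valued at new-period covariates.
    coefficient_part = x1_bar @ (beta1 - beta0)
    # Total change in the systematic component.
    total = x1_bar @ beta1 - x0_bar @ beta0
    return covariate_part, coefficient_part, total
```

If the covariates are unchanged between periods, the first part is zero and the whole change is attributed to the coefficients, which is the case where a $t_0$ census can stand in for a missing $t_1$ census.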

Using Household Panel Data

In some cases panel survey data are available. With a measure of consumption $y_{ch}$ for the same households in each period, but census information only for $t_0$, we can proceed as follows:

1. For $t_0$ estimate the consumption model

(5)  $\ln y_{0ch} = x_{0ch}^T \beta_0 + \eta_{0c} + \varepsilon_{0ch}.$

The resulting model can be used to simulate welfare estimates $\tilde{\mu}_0$.

2. For $t_1$ use the fact that consumption in $t_1$ and covariates in $t_0$ are available for the same households to estimate the model

(6)  $\ln y_{1ch} = x_{0ch}^T \beta_1 + \eta_{1c} + \varepsilon_{1ch}.$

3. Impute $t_1$ consumption for each census household based on $t_0$ characteristics.

When using panel estimates that have been constructed in this way to analyze changes in welfare over time, one must take account of correlated model error and also correlated cluster effects if some part of the effect of location is persistent. It would be efficient to estimate the $t_0$ and $t_1$ consumption equations jointly, in which case estimated correlation in the parameters and location effects could be applied in the subsequent simulations. Alternatively, one could estimate an equation for $\ln y_{0ch}$ and separately a second equation modelling the change in log consumption between periods. The errors in these two equations are less likely to be correlated.

Using Village Variables

Even when the $t_0$ household characteristics of households in the $t_1$ sample are unknown (i.e., one does not have household panel data), it is quite possible that their $t_0$ village characteristics are known. If so, one can proceed as above and estimate a version of equation (6) with explanatory variables limited to those at the village level:

(7)  $\ln y_{1ch} = x_{0c}^T \beta_1 + \eta_{1c} + \varepsilon_{1ch}.$

Estimates for $t_1$ can be simulated for the population on the basis of village-level characteristics in $t_0$. As only village-level variables are used, all households in the same village will be allocated the same value for the systematic part of their simulated consumption.
Given this, it is advisable to include in the model village-level variables that can capture differences across localities in the within-village variation in consumption, e.g. not only means of the $x$ variables, but also squares and higher powers. Similarly, it is important to consider carefully the heteroscedasticity model.

The advantage of the household panel survey approach is that it allows one to use household-level variables and thereby develop a better model of household consumption. However, panel data sets tend to be very small, which increases model error in the resulting welfare estimates, and they also often suffer acutely from attrition bias. The village variable approach will typically allow use of a larger number of sample households and on that account may have lower model error than the panel case. However, the limitation to village-level variables increases the idiosyncratic error in the resulting welfare estimates.

If household panel data are available as a subset of households within a larger survey sample, one could take advantage of the strengths of each approach. A first map could be constructed using the panel data. A second map could be constructed using the non-panel households and a model with only village-level variables. A final estimate of the welfare measure could be constructed as the variance-minimizing weighted average of the two.
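One way to form such a variance-minimizing weighted average is sketched below. It assumes that both maps give unbiased estimates of the same welfare measure and that their variances, and optionally their covariance, are known; the function `combine_estimates` and its arguments are hypothetical names, and the text does not specify how the covariance would be obtained.

```python
def combine_estimates(mu_a, var_a, mu_b, var_b, cov_ab=0.0):
    """Combine two estimates of the same quantity with minimum-variance weights.

    Assumption: both inputs are unbiased for the same welfare measure.
    """
    denom = var_a + var_b - 2.0 * cov_ab
    if denom <= 0.0:
        raise ValueError("variance of the difference must be positive")
    # Weight on estimate A that minimizes the variance of the combination.
    w = (var_b - cov_ab) / denom
    mu = w * mu_a + (1.0 - w) * mu_b
    var = (w ** 2) * var_a + ((1.0 - w) ** 2) * var_b + 2.0 * w * (1.0 - w) * cov_ab
    return mu, var, w
```

With zero covariance this reduces to inverse-variance weighting, so the more precise of the two maps receives the larger weight. Because the panel and non-panel households may share cluster effects, a non-zero `cov_ab` may be the more realistic case.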

Short and Long Form Survey Data

This approach allows estimation when there is a census in both periods but only partial information on consumption. Suppose that, in the base period, households in a fraction of sampled clusters are given a long form survey yielding a measure of per capita consumption $y_{tch}$, and the rest fill in a short form survey yielding a less exact measure $s_{tch}$. In $t_1$, all households are given the short form. To obtain estimates of welfare defined in terms of the more accurate indicator, $y_{ch}$, in both periods, we assume that the joint distribution of log short consumption $s_{tch}$ and covariates $x_{tch}$ can be explained by

(8)  $\ln s_{tch} = x_{tch}^T \beta_t + \eta^s_{tc} + \varepsilon^s_{tch}$

and that of per capita long consumption $y_{tch}$ is described by the model

(9)  $\ln y_{tch} = z_{tch}^T \alpha_t + \gamma_t \ln s_{tch} + \eta_{tc} + \varepsilon_{tch}.$

The set of covariates $z_{ch}$ differs from those in the regression for $s_{ch}$.

1. For each period separately, estimate equation (8) to obtain $\widehat{\ln s}_{tch} = x_{tch}^T \hat{\beta}_t$.

2. Using data from $t_0$, estimate the model

(10)  $\ln y_{0ch} = z_{0ch}^T \alpha_0 + \gamma_0 \widehat{\ln s}_{0ch} + [x_{0ch}^T(\beta_0 - \hat{\beta}_0)]\gamma_0 + (\eta_{0c} + \gamma_0 \eta^s_{0c}) + (\varepsilon_{0ch} + \gamma_0 \varepsilon^s_{0ch}).$

3. With estimates $\hat{\alpha}_0$ and $\hat{\gamma}_0$, and $\widehat{\ln s}_{0ch}$, compute predicted log per capita long form consumption $\widehat{\ln y}_{0ch}$ using equation (10) for all households in the base period, and use the estimated disturbance distribution and model variance-covariance matrix to simulate base period welfare measures $\tilde{\mu}_{0v}$. True $\ln y_{0ch}$ should be used for households where this information is available.

4. Using estimates $\hat{\alpha}_0$, $\hat{\gamma}_0$, and $\widehat{\ln s}_{1ch}$, compute $\widehat{\ln y}_{1ch}$ for households in period $t_1$. Again use the estimated disturbance distribution and model variance-covariance matrix from $t_0$ to simulate welfare measures $\tilde{\mu}_{1v}$. Estimated short form consumption is used instead of the true values, $\ln s_{1ch}$, despite the latter being available for all households, because estimated values enter the model of long form consumption in the base period.
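The form of equation (10) follows from substituting the short form model (8) into the long form model (9) for the base period. The worked step below is a sketch of that substitution, writing $\widehat{\ln s}_{0ch} = x_{0ch}^T \hat{\beta}_0$ for the fitted value from step 1; it adds nothing beyond equations (8) to (10).

```latex
\begin{align*}
\ln s_{0ch} &= x_{0ch}^T \beta_0 + \eta^s_{0c} + \varepsilon^s_{0ch} \\
            &= \widehat{\ln s}_{0ch} + x_{0ch}^T(\beta_0 - \hat{\beta}_0)
               + \eta^s_{0c} + \varepsilon^s_{0ch}, \\
\ln y_{0ch} &= z_{0ch}^T \alpha_0 + \gamma_0 \ln s_{0ch} + \eta_{0c} + \varepsilon_{0ch} \\
            &= z_{0ch}^T \alpha_0 + \gamma_0 \widehat{\ln s}_{0ch}
               + \gamma_0\, x_{0ch}^T(\beta_0 - \hat{\beta}_0)
               + (\eta_{0c} + \gamma_0 \eta^s_{0c})
               + (\varepsilon_{0ch} + \gamma_0 \varepsilon^s_{0ch}).
\end{align*}
```

The third term carries the first-stage estimation error in $\hat{\beta}_0$ into the long form equation, and the last two terms show that the composite cluster and household disturbances each mix the long form and short form components.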
The advantage of this approach, as opposed to estimating $y$ as a function of both $z$ and $x$, is that it requires the relatively weak assumption that $\alpha_t = \alpha$ and that $\gamma_t = \gamma$. Importantly, it can accommodate changes in the intercept and in the returns to $x$ variables over time, changes that would imply evolution in $\beta_t$. [2]

[2] The disturbance distribution is assumed to be constant over time, which means that the model error in the estimation of $\beta_t$ remains similar even though levels may be changing.

Higher-level Estimation

This approach may be used when there is no census in $t_1$ but there is panel data on village-level variables, $x_0$ and $x_1$. We estimate a full-scale map in $t_0$ to obtain the expected welfare estimates $\tilde{\mu}_{0v}$. We then posit a relationship between the welfare indicator and the village-level variables:

(11)  $W_{0v} = x_{0v}^T \theta_0 + \tau_{0v}.$

Recall that

(12)  $W_{0v} = \tilde{\mu}_{0v} + \xi_{0v}.$

Substituting,

(13)  $\tilde{\mu}_{0v} = x_{0v}^T \theta_0 + [\tau_{0v} - (W_{0v} - \mu_{0v}) - (\mu_{0v} - \tilde{\mu}_{0v})].$

After estimating this equation, village welfare measures can then be imputed for $t_1$ using equation (11) as

$\hat{W}_{1v} = \hat{E}[W \mid x_{1v}] = x_{1v}^T \hat{\theta}_0.$

The variance of the prediction is

$E[W_{1v} - \hat{W}_{1v}]^2 = x_{1v}^T V(\hat{\theta}_0) x_{1v} + \sigma^2_\tau.$

The first part of this variance is straightforward to obtain from equation (13). We obtain a noisy estimate of $\sigma^2_\tau$ by estimating equation (11) using sample estimates of welfare, $W^s_{0v}$, calculated from the household survey in $t_0$. The disturbance in this equation is $[\tau_{0v} + s_{0v}]$, where $s_{0v}$ is sampling error in $W^s_{0v}$. The variance due to sampling error can be determined analytically or through simulation and then subtracted from the residual variance to arrive at an estimate of $\sigma^2_\tau$. This approach requires the assumption that the model estimated in $t_0$ applies in $t_1$ which, as noted above, will be more tenable across shorter time periods or if variables that capture time trends can be included.

REFERENCES

Alderman, H., M. Babita, G. Demombynes, N. Makhatha, and B. Özler (2002): "How Low Can You Go?: Combining Census and Survey Data for Mapping Poverty in South Africa," Journal of African Economies, forthcoming.

Chesher, A., and C. Schluter (2002): "Welfare Measurement and Measurement Error," Review of Economic Studies, forthcoming.

Elbers, Chris, J.O. Lanjouw and Peter Lanjouw (2003): "Micro-Level Estimation of Poverty and Inequality," Econometrica, 71, 355-64.

Elbers, Chris, J.O. Lanjouw and Peter Lanjouw (2002): "Micro-Level Estimation of Welfare," Policy Research Department Working Paper, The World Bank, forthcoming.