Small Area Estimates of Poverty Incidence in the State of Uttar Pradesh in India


Hukum Chandra
Indian Agricultural Statistics Research Institute, New Delhi
Email: hchandra@iasri.res.in

Acknowledgments: Ray Chambers, University of Wollongong, Australia; Nicola Salvati, University of Pisa, Italy

Background

- India benefits from a regular flow of data through the National Sample Survey Office (NSSO).
- NSSO surveys are the main source of official statistics in India.
- These surveys are designed to generate statistics at the state and national level.
- Estimates are provided, separately for rural and urban sectors, at the State/UT and all-India level.
- There is no regular flow of estimates below that level: district (small area) level estimates are currently not available.
- Demand for such micro level statistics is growing rapidly as the country moves from a centralized to a more decentralized system.

Examples

- GOI-UNDP project on Capacity Development for District Planning: it expects decentralised planning to improve the effectiveness of development programmes.
- UN-MDG-1: to eradicate extreme poverty and hunger. Monitoring progress against the targets requires both state and district level estimates of various parameters, but reliable estimates at the latter level are not available in India.
- NSSO surveys provide reliable state and national level estimates, but they cannot be used to derive reliable direct estimates at the district level: the district sample sizes are small, which leads to high sampling variability.
- The district is a very important domain in the planning process, but no surveys are designed to produce estimates at this level.

- There is great emphasis on district level planning in India.
- At the same time, conducting district specific surveys would be tedious, costly and time consuming.
- Using state level survey data to derive district level estimates leaves very small sample sizes, which can produce very unstable estimates.
- Special techniques are therefore needed to produce estimates for such small domains or small areas. These are referred to as small area estimation (SAE).
- SAE addresses the problem of small sample sizes.

[Figure: Map of India]

Uttar Pradesh

- State located in northern India.
- Covers 93,933 square miles (243,290 km²), equal to 6.88% of the total area of India.
- The fourth largest Indian state by area.
- Population over 200 million (in 2011).
- The most populous state in the country and the most populous country subdivision in the world.

Application to NSSO Data

- We use SAE techniques to derive model-based estimates of the proportion of poor households at small area level in the State of Uttar Pradesh by linking data from the Household Consumer Expenditure (HCE) Survey 2006-07 (NSSO 63rd round) and the Population Census 2001.
- The small areas are the districts of Uttar Pradesh.
- We illustrate how HCE survey and Census data can be combined to derive reliable district level estimates for various policy relevant parameters.
- The covariates are available at the district level, so we adopt an area level model to derive the small area estimates.

Two types of variables are required for this analysis.

(i) The variable of interest, for which small area estimates are required, is drawn from the Household Consumer Expenditure Survey 2006-07 (NSSO 63rd round) data for rural areas of Uttar Pradesh.

- The target variable is whether a household is poor.
- A poverty line is used to identify whether a given household is poor.
- The poverty line is that of 2004-05, as given by the Planning Commission, Govt. of India.
- A household with monthly per capita consumer expenditure (MPCE) below the state's poverty line (Rs. 365.84) is categorised as poor.
- The parameter of interest is the proportion of poor households, the head count ratio (HCR), at the district level.
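
The poverty classification above can be sketched in a few lines. This is a minimal illustration only: the MPCE values are made up, the function name `head_count_ratio` is hypothetical, and the calculation is unweighted, whereas a real direct estimate from NSSO data would use the survey weights.

```python
# Rural Uttar Pradesh poverty line (Rs. per person per month) quoted in the slide.
POVERTY_LINE = 365.84


def head_count_ratio(mpce_values, poverty_line=POVERTY_LINE):
    """Unweighted share of sampled households with MPCE below the poverty line."""
    if not mpce_values:
        raise ValueError("need at least one household")
    n_poor = sum(1 for value in mpce_values if value < poverty_line)
    return n_poor / len(mpce_values)


# Hypothetical MPCE values for one district's sample (not NSSO data).
sample_mpce = [310.0, 402.5, 365.84, 298.2, 510.0, 450.75, 349.9, 620.0]
hcr = head_count_ratio(sample_mpce)
print(f"HCR = {hcr:.3f}")  # 3 of 8 households fall below the line: 0.375
```

A household exactly on the line is treated as non-poor here, following the "below the poverty line" wording.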

(ii) Covariates known for the population are drawn from the Population Census 2001.

- Using covariates from the 2001 Census to model poor households in the 2006-07 HCE Survey may raise issues of comparability. However, the relationship between the variable of interest and the covariates used in this study is assumed not to change significantly over the period.
- More than 100 covariates were available from the Population Census for modelling.
- Selection of covariates: we first examined the correlation of all available covariates with the target variable and selected those with reasonably good correlation.
- This was followed by step-wise regression analysis.

Finally, six variables that contributed significantly to the model were identified for further analysis:

(i) sex ratio of SC population
(ii) sex ratio of ST population
(iii) percentage of Other worker Population
(iv) percentage of Literate Male
(v) main Other workers female
(vi) marginal Other population

The R² of the chosen model was 48 per cent.
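
The first screening step (keep covariates with reasonably good correlation with the target) might look like the sketch below. Everything here is an assumption for illustration: the data are simulated, the threshold of 0.3 is arbitrary, and `screen_covariates` is a hypothetical helper. The slides do not state the cut-off used, and the subsequent step-wise regression is not shown.

```python
import numpy as np


def screen_covariates(X, y, names, min_abs_corr=0.3):
    """Return (name, correlation) pairs whose |Pearson r| with y meets the threshold."""
    kept = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= min_abs_corr:
            kept.append((name, float(r)))
    # Strongest correlations first.
    return sorted(kept, key=lambda item: -abs(item[1]))


# Simulated district-level data: 70 districts, 5 candidate covariates.
rng = np.random.default_rng(0)
n_districts = 70
X = rng.normal(size=(n_districts, 5))
# Only cov_0 and cov_3 truly drive the simulated target.
y = 1.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=n_districts)
names = [f"cov_{j}" for j in range(5)]

selected = screen_covariates(X, y, names)
for name, r in selected:
    print(f"{name}: r = {r:+.2f}")
```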

Distribution of district-wise sample sizes

S.No.  Sample size    S.No.  Sample size    S.No.  Sample size    S.No.  Sample size
    1           48       19           48       37           24       55           24
    2           48       20           48       38           24       56           24
    3           48       21           24       39           24       57           24
    4           48       22           24       40           24       58           48
    5           24       23           48       41           24       59           48
    6           24       24           48       42           24       60           48
    7           24       25           48       43           48       61           48
    8           24       26           48       44           24       62           24
    9           24       27           24       45           48       63           48
   10           24       28           48       46           48       64           48
   11           48       29           24       47           24       65           48
   12           48       30           24       48           24       66           19
   13           24       31           24       49           48       67           24
   14           24       32           24       50           48       68           24
   15           24       33           24       51           23       69           24
   16           24       34           24       52           24       70           24
   17           48       35           24       53           48    Total         2322
   18           24       36           24       54           24

- The sampling design used for the NSSO data is stratified multi-stage random sampling, with districts as strata, villages as first stage units and households as second stage units.
- A total of 2322 households were surveyed from the 70 districts of Uttar Pradesh.
- The district-wise sample size varies from 19 to 48, with an average of 33.
- District level sample sizes are very small, with an average sampling fraction of 0.0001.
- It is therefore difficult to derive reliable district level estimates and their standard errors (SE).
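
The summary figures quoted above can be checked directly from the district-wise sample sizes in the table. The short sketch below only re-enters those 70 values (in S.No. order) and recomputes the total, range and mean. The variable names are illustrative.

```python
# District-wise sample sizes, S.No. 1 to 70, copied from the table above.
sample_sizes = [
    48, 48, 48, 48, 24, 24, 24, 24, 24, 24, 48, 48, 24, 24, 24, 24, 48, 24,  # 1-18
    48, 48, 24, 24, 48, 48, 48, 48, 24, 48, 24, 24, 24, 24, 24, 24, 24, 24,  # 19-36
    24, 24, 24, 24, 24, 24, 48, 24, 48, 48, 24, 24, 48, 48, 23, 24, 48, 24,  # 37-54
    24, 24, 24, 48, 48, 48, 48, 24, 48, 48, 48, 19, 24, 24, 24, 24,          # 55-70
]

n_districts = len(sample_sizes)
total = sum(sample_sizes)
mean_size = total / n_districts

print(f"districts: {n_districts}")                           # 70
print(f"total households: {total}")                          # 2322
print(f"range: {min(sample_sizes)} to {max(sample_sizes)}")  # 19 to 48
print(f"mean sample size: {mean_size:.1f}")                  # 33.2
```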

Diagnostics for Small Area Estimation (Ambler et al., 2001)

Two types of diagnostic procedures are generally used in small area estimation:
- model diagnostics, and
- small area estimates validation/diagnostics.

Suggested criteria for successful SAE:
- The SAE model should fit the data and should be able to explain between-area variation.
- Model-based estimates should be
  - consistent with unbiased direct estimates
  - more precise than direct estimates
  - more stable over time than direct estimates
  - acceptable to informed users

- Model diagnostics are used to verify the assumptions of the underlying model.
- Small area estimate diagnostics are applied to validate the reliability of the model-based small area estimates.

Bias Diagnostic

- If direct estimates are unbiased, their regression on the true values should be linear and correspond to the identity line.
- If model-based estimates are close to the true values, the regression of the direct estimates on the model-based estimates should be similar.
- Plot direct estimates on the Y-axis and model-based estimates on the X-axis:
  - look for divergence of the regression line from Y = X
  - test for intercept = 0 and slope = 1
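
A minimal sketch of this regression check follows, using ordinary least squares with NumPy. The estimates are simulated rather than taken from the study, `bias_diagnostic` is a hypothetical helper, and the t statistics ignore the fact that direct estimates have unequal sampling variances, so this is a rough screening device rather than a formal test.

```python
import numpy as np


def bias_diagnostic(direct, model_based):
    """OLS of direct estimates (y) on model-based estimates (x).

    Returns the fitted intercept and slope together with t statistics for
    H0: intercept = 0 and H0: slope = 1.
    """
    x = np.asarray(model_based, dtype=float)
    y = np.asarray(direct, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (n - 2)      # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the coefficients
    se = np.sqrt(np.diag(cov_beta))
    return {
        "intercept": float(beta[0]),
        "slope": float(beta[1]),
        "t_intercept": float(beta[0] / se[0]),
        "t_slope": float((beta[1] - 1.0) / se[1]),
    }


# Simulated example: direct estimates are noisy but unbiased around the model-based ones.
rng = np.random.default_rng(1)
model_based = rng.uniform(0.05, 0.60, size=70)
direct = model_based + rng.normal(scale=0.08, size=70)

result = bias_diagnostic(direct, model_based)
print(result)
```

Absolute t statistics well above roughly 2 would suggest that the fitted line departs from Y = X.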

[Figure: Distribution of the district level residuals (left hand side) and normal q-q plot of the district level residuals (right hand side)]

If the model assumptions are satisfied, the district level residuals are expected to be randomly distributed and not significantly different from the regression line y = 0. Under the GLMM, the area level residuals are defined as $r_d = \hat{\eta}_d - x_d^T \hat{\beta}$, where $\hat{\eta}_d$ is the estimated linear predictor for district $d$, $x_d$ its vector of covariates and $\hat{\beta}$ the vector of estimated regression coefficients.

- The figure shows that the district level residuals are randomly distributed and that the line of fit does not differ significantly from the line y = 0, as expected.
- The q-q plot also supports the normality assumption.
- The model diagnostics are therefore satisfied for these data.

Diagnostic procedures

- To validate the reliability of the model-based small area estimates we used the bias diagnostic and the coefficient of variation (CV), and computed 95 percent confidence intervals.
- The bias diagnostic is used to investigate whether the model-based estimates are less extreme than the direct survey estimates.
- The bias scatter plot of the model-based estimates against the direct estimates shows that the model-based estimates are less extreme than the direct estimates, demonstrating the typical SAE outcome of shrinking more extreme values towards the average.
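
The shrinkage effect can be illustrated with the composite form of a linear area level (Fay-Herriot type) estimator. This is an assumption made for illustration: the study fits a GLMM, not this linear model, and the variance values and the function name `composite_estimate` are hypothetical. The sketch shows only how a noisy direct estimate is pulled towards a regression-synthetic value.

```python
def composite_estimate(direct, synthetic, sampling_var, model_var):
    """Weighted average of direct and synthetic estimates.

    gamma = model_var / (model_var + sampling_var) is the weight on the
    direct estimate: the noisier the direct estimate, the smaller gamma.
    """
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic, gamma


# Hypothetical district: extreme direct estimate, moderate synthetic prediction.
estimate, gamma = composite_estimate(
    direct=0.60, synthetic=0.30, sampling_var=0.010, model_var=0.005
)
print(f"gamma = {gamma:.3f}, composite estimate = {estimate:.3f}")
```

With these numbers the weight on the direct estimate is one third, so the composite estimate (0.40) lies closer to the synthetic value than to the extreme direct value.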

[Figure: Bias diagnostics plot. Axes: direct estimates and model-based estimates, each running from 0.0 to 0.6.]

- The coefficient of variation (CV) is used to assess the improved precision of the model-based estimates compared with the direct survey estimates.
- CVs show the sampling variability as a percentage of the estimate.
- Estimates with large CVs are considered unreliable (smaller is better).
- Plot of CV for direct vs model-based estimates.
- There are no internationally accepted tables that allow us to judge how large is too large.
- The estimated CVs show that the model-based estimates have a higher degree of reliability than the direct estimates.
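
As a small worked example of the CV comparison, the sketch below computes CV = 100 × SE / estimate for a hypothetical district. The estimates and standard errors are invented for illustration and `cv_percent` is a hypothetical helper name.

```python
import math


def cv_percent(estimate, std_error):
    """Coefficient of variation as a percentage of the estimate.

    Returns NaN when the estimate is zero, since the CV is undefined there.
    """
    if estimate == 0:
        return math.nan
    return 100.0 * std_error / estimate


# Hypothetical district: direct estimate vs model-based estimate.
cv_direct = cv_percent(estimate=0.25, std_error=0.09)
cv_model = cv_percent(estimate=0.22, std_error=0.04)

print(f"direct CV      = {cv_direct:.1f}%")
print(f"model-based CV = {cv_model:.1f}%")
```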

[Figure: District-wise CV for direct estimates (solid line) and model-based estimates (dashed line)]

[Figure: 95% confidence intervals of direct estimates and model-based estimates]

- The standard errors of the direct estimates are too large, so the direct estimates are unreliable.
- For many districts the confidence intervals cannot even be produced because standard errors are unavailable.
- These results show the degree of inequality in the distribution of poor households across districts.
- In many districts the lower bound of the 95% confidence interval (CI) for the direct estimate is negative, an inadmissible value for a proportion.
- In contrast, the model-based estimates, with precise CIs and reasonable CVs, are reliable.

- The problem of unavailable standard errors was mostly observed when there was no variability in a district's sample data (e.g. all y values in the sample were 0).
- The results clearly show the advantage of using SAE techniques to cope with the small sample size problem when producing estimates and reliable confidence intervals.
- These estimates can be useful for resource allocation and policy decisions relating to the living conditions of people in rural areas.
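
Both failure modes of the direct estimates (negative lower bounds and degenerate standard errors) are easy to reproduce. The sketch below assumes a simple binomial variance under simple random sampling, which ignores the stratified multi-stage design actually used by the NSSO. The counts are hypothetical and `direct_ci` is an illustrative helper.

```python
import math


def direct_ci(n_poor, n, z=1.96):
    """Direct proportion, its SRS standard error and a normal-theory 95% CI."""
    p = n_poor / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, se, p - z * se, p + z * se


# Case 1: one poor household among 24 sampled gives a negative lower bound.
p1, se1, lower1, upper1 = direct_ci(n_poor=1, n=24)
print(f"p = {p1:.3f}, SE = {se1:.3f}, CI = [{lower1:.3f}, {upper1:.3f}]")

# Case 2: no poor households in the sample gives SE = 0, so no usable CI or CV.
p0, se0, lower0, upper0 = direct_ci(n_poor=0, n=24)
print(f"p = {p0:.3f}, SE = {se0:.3f}, CI = [{lower0:.3f}, {upper0:.3f}]")
```

In the first case the interval extends below zero, and in the second it collapses to a single point, which matches the difficulties described for the direct estimates.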

[Map: Head Count Ratio, Uttar Pradesh. Legend values: 0.01, 0.03, 0.06, 0.07, 0.09, 0.10, 0.13, 0.15, 0.26]

Poverty Distribution

[Maps: Head Count Ratio and Q10, Q25, Q50, Q75, Q90, Uttar Pradesh. Legend values by panel:]

Panel             Legend values
Head Count Ratio  0.01  0.03  0.06  0.07  0.09  0.10  0.13  0.15  0.26
Q10               0.01  0.03  0.06  0.08  0.09  0.10  0.13  0.15  0.27
Q25               0.08  0.13  0.17  0.21  0.23  0.26  0.33  0.35  0.52
Q50               0.24  0.37  0.44  0.47  0.49  0.53  0.56  0.62  0.70
Q75               0.52  0.65  0.71  0.74  0.75  0.78  0.80  0.83  0.89
Q90               0.75  0.84  0.88  0.90  0.91  0.92  0.93  0.93  0.95

[Maps: Q10 (legend values 0.01 to 0.27) and Q25 (legend values 0.08 to 0.52)]

[Maps: Q50 (legend values 0.24 to 0.70) and Q75 (legend values 0.52 to 0.89)]

[Maps: Q90 (legend values 0.75 to 0.95) and Head Count Ratio, Uttar Pradesh (legend values 0.01 to 0.26)]

Conclusions

- We demonstrate the application of SAE techniques to estimate district level statistics on poor households using survey and Census data.
- The diagnostic procedures confirm that the model-based district level estimates have reasonably good precision.
- Because conducting a Census involves a considerable amount of work, Censuses are generally carried out at fixed intervals, so Census data become available only after a certain period.
- This study produces reliable micro level statistics from existing surveys and other already available secondary data, and can serve as an example for further applications.
- Such micro level statistics can be generated without conducting surveys specific to the micro level.

- This has the merit that, unlike the Census, estimates can be produced on a regular basis from existing surveys.

Related References

[1] Ambler, R., Caplan, D., Chambers, R., Kovacevic, M. and Wang, S. (2001). Combining unemployment benefits data and LFS data to estimate ILO unemployment for small areas: an application of a modified Fay-Herriot method. Proceedings of the Int. Assoc. of Survey Stat., Meeting of the ISI, Seoul, August 2001.
[2] Battese, G. E., Harter, R. M. and Fuller, W. A. (1988). An error component model for prediction of county crop areas using survey and satellite data. J. of the Amer. Stat. Assoc., 83, pp. 28-36.
[3] Chambers, R. and Tzavidis, N. (2006). M-quantile models for small area estimation. Biometrika, 93, 255-68.
[4] Chandra, H., Salvati, N. and Sud, U. C. (2011a). Disaggregate-level estimates of indebtedness in the state of Uttar Pradesh in India: an application of small area estimation technique. Journal of Applied Statistics, 38(11), pp. 2413-2432.
[5] Chandra, H., Sud, U. C. and Gupta, V. K. (2013). Small Area Estimation under Area Level Model Using R Software. http://sample.iasri.res.in/ssrs/sae_using_r.pdf
[6] Chandra, H., Sud, U. C. and Salvati, N. (2011b). Estimation of District Level Poor Households in the State of Uttar Pradesh in India by Combining NSSO Survey and Census Data. Journal of the Indian Society of Agricultural Statistics, 65(1), 1-8.
[7] Elbers, C., Lanjouw, J. and Lanjouw, P. (2003). Micro-level estimation of poverty and inequality. Econometrica, 71, 355-64.

[8] Fay, R. E. and Herriot, R. A. (1979). Estimation of income from small places: an application of James-Stein procedures to census data. J. of the Amer. Stat. Assoc., 74, pp. 269-277.
[9] Johnson, F. A., Chandra, H., Brown, J. J. and Padmadas, S. (2009). District-level estimates of institutional births in Ghana: application of small area estimation technique using census and DHS data. J. of Off. Stat.
[10] González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D. and Santamaría, L. (2007). Estimation of the mean squared error of predictors of small area linear parameters under a logistic mixed model. Comput. Stat. & Data Anal., 51, pp. 2720-2733.
[11] Molina, I. and Rao, J. N. K. (2009). Small area estimation of poverty indicators. The Canadian Journal of Statistics.
[12] Mukhopadhyay, P. K. and McDowell, A. (2011). Small Area Estimation for Survey Data Analysis Using SAS. Working paper 336-2011. SAS Institute Inc., Cary, NC. http://support.sas.com/resources/papers/proceedings11/336-2011.pdf
[13] Prasad, N. G. and Rao, J. N. K. (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85, 163-71.
[14] Rao, J. N. K. (2003). Small Area Estimation. Wiley Series in Survey Methodology, John Wiley and Sons Inc.
[15] Saei, A. and Chambers, R. (2003). Small area estimation under linear and generalized linear mixed models with time and area effects. W.P. No. M03/15 (2003), Southampton Statistical Sciences Research Institute, University of Southampton, UK.
[16] Tzavidis, N., Salvati, N., Pratesi, M. and Chambers, R. (2008). M-quantile models for poverty mapping. Stat. Meth. and Applications, 17, 393-411.
