Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators


Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators

Stefano Marchetti (1), Nikos Tzavidis (2), Monica Pratesi (3)

(1, 3) Department of Statistics and Mathematics Applied to Economics, University of Pisa
(2) Social Science Statistical Research Institute, University of Southampton

NTTS conference, Bruxelles, 23rd - 25th February 2011

Outline
1. Motivation
2. Review of models for small area estimation
3. Model-Based Estimators of Means, Quantiles and Poverty Indicators
4. A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
5. Simulation Results
6. Concluding Remarks

Part I: Motivation

Motivation
- Goal: a picture of poverty and social exclusion at the small area level (e.g. LAU 1-2)
- How: estimate small area means, totals, quantiles, the head count ratio and the poverty gap
- Focus: mean squared error (MSE) estimation for small area estimators
- A unified framework to estimate the MSE of estimators of small area means, quantiles and poverty indicators under the M-quantile regression model

Why do we use small area methods?
- We want to measure key statistics (e.g. poverty indicators)
- The data come from surveys (e.g. EU-SILC)
- Surveys are designed to give accurate estimates at a given domain level (e.g. NUTS 2)
- Accurate estimates are demanded at a more disaggregated domain level (e.g. LAU 1-2)
- Small area methods meet this demand without the need for oversampling

Part II: Review of models for small area estimation

Methods for small area estimation
- Modern small area estimation is based on model-based methods
- Statistical models link the variable of interest with covariate information that is also known for units not in the sample
- A class of models suitable for small area estimation is multilevel models
- A novel approach to small area estimation is based on quantile/M-quantile models

Small area estimation: mixed effects models
- Concept: include random area-specific effects to account for the between-area variation beyond that explained by the variation in model covariates
- Notation ($j$ = area, $i$ = individual):
  - Variable of interest: $y_{ij}$
  - Focus on unit level covariate information: $x_{ij}$
  - Area level random effect: $\gamma_j$ (assumption: normal distribution)
  - Random error: $\varepsilon_{ij}$ (assumption: normal distribution)

$$y_{ij} = x_{ij}^T \beta + \gamma_j + \varepsilon_{ij}, \quad i = 1, \dots, n_j, \quad j = 1, \dots, d$$
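To make the notation concrete, the model above can be simulated in a few lines. This is only a sketch: the function name `simulate_nested_error`, the single standard normal covariate and the default standard deviations are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def simulate_nested_error(n_per_area, beta, sigma_gamma=1.0, sigma_eps=1.0, seed=0):
    """Draw y_ij = x_ij^T beta + gamma_j + eps_ij with normal gamma_j and eps_ij.

    n_per_area lists the sample size n_j of each of the d areas.
    The design (intercept plus one N(0, 1) covariate) is an assumption
    made for illustration only.
    """
    rng = np.random.default_rng(seed)
    d = len(n_per_area)
    area = np.repeat(np.arange(d), n_per_area)      # area label j of every unit
    n = area.size
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    gamma = rng.normal(0.0, sigma_gamma, size=d)    # one random effect per area
    eps = rng.normal(0.0, sigma_eps, size=n)        # unit level errors
    y = X @ np.asarray(beta, dtype=float) + gamma[area] + eps
    return y, X, area, gamma

y, X, area, gamma = simulate_nested_error([5, 8, 6], beta=[1.0, 2.0])
```

Units in the same area share one draw of `gamma`, which is what induces the within-area correlation the model is meant to capture.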

M-quantile models
- With regression models we model the mean of the variable of interest ($y$) given the covariates ($x$)
- A more complete picture is offered, however, by modeling not only the mean of $y$ given $x$ but also other quantiles. Examples include the median and the 25th and 75th percentiles. This is known as quantile regression
- An M-quantile regression model for quantile $q$:

$$Q_q = x_{ij}^T \beta_\psi(q)$$

Main features of these models:
- No assumption of a normal distribution
- Robust methods (through the influence function $\psi$ of the M-quantile regression)
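The slides do not say how $\beta_\psi(q)$ is computed. A common choice in the M-quantile literature is iteratively reweighted least squares (IRLS) with a Huber influence function and a MAD scale estimate; the sketch below assumes that choice. The function name `mquantile_fit` and the tuning constant `c = 1.345` are assumptions for illustration, not the authors' R code.

```python
import numpy as np

def mquantile_fit(X, y, q=0.5, c=1.345, max_iter=200, tol=1e-8):
    """Fit a linear M-quantile regression of order q by IRLS (a sketch).

    Uses psi_q(u) = 2 * psi(u) * (q if u > 0 else 1 - q), with psi the
    Huber function with cutoff c and u the residual scaled by the MAD.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]         # least squares start
    for _ in range(max_iter):
        r = y - X @ beta
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = r / s
        psi = np.clip(u, -c, c)                         # Huber influence function
        asym = np.where(r > 0, q, 1.0 - q)              # asymmetric tilting by q
        safe_u = np.where(u == 0, 1.0, u)
        w = 2.0 * asym * np.where(u == 0, 1.0, psi / safe_u)   # weights psi_q(u)/u
        new_beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

With `q = 0.5` this reduces to ordinary Huber robust regression; other values of `q` tilt the weights towards positive or negative residuals.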

Using M-quantile models to measure area effects
Central idea: area effects can be described by estimating an area-specific $q$ value ($\hat\theta_j$) for each area (group) of a hierarchical dataset (Chambers and Tzavidis, 2006)
- Unit-level M-quantile coefficient $q_{ij}$, defined by $Q_{q_{ij}} = y_{ij}$, that is, $y_{ij} = x_{ij}^T \beta_\psi(q_{ij})$
- Area-level M-quantile coefficient: $\hat\theta_j = n_j^{-1} \sum_{i \in s_j} q_{ij}$
- Fitted values: $\hat y_{ij} = x_{ij}^T \beta_\psi(\hat\theta_j)$

REMARK: A mixed effects model uses random effects $\gamma_j$ to capture the dissimilarity between groups. M-quantile models attempt to capture this dissimilarity via the group-specific M-quantile coefficients $\hat\theta_j$
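In practice $q_{ij}$ is usually found by fitting the model on a fine grid of $q$ values and interpolating. The sketch below assumes such a grid of fitted values is already available as a matrix `fitted[g, i]` that increases in `g`; the function names and the use of linear interpolation are illustrative assumptions.

```python
import numpy as np

def unit_mq_coefficients(y, fitted, q_grid):
    """Solve Q_q = y_ij for q, unit by unit, by linear interpolation.

    fitted[g, i] is the fitted M-quantile of order q_grid[g] for unit i and
    is assumed to increase with g. Values of y outside the fitted range are
    clamped to the ends of q_grid (behaviour of np.interp).
    """
    y = np.asarray(y, dtype=float)
    q_unit = np.empty(y.size)
    for i in range(y.size):
        q_unit[i] = np.interp(y[i], fitted[:, i], q_grid)
    return q_unit

def area_mq_coefficients(q_unit, area):
    """theta_hat_j: average of the unit-level coefficients within each area."""
    area = np.asarray(area)
    return {int(a): float(q_unit[area == a].mean()) for a in np.unique(area)}
```

An area whose units sit mostly above the median fit gets $\hat\theta_j > 0.5$, and one whose units sit mostly below it gets $\hat\theta_j < 0.5$, which is how $\hat\theta_j$ plays the role of an area effect.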

Part III: Model-Based Estimators of Means, Quantiles and Poverty Indicators

Model-Based Estimators of Small Area Quantiles
An estimator of the Chambers-Dunstan (CD) small area distribution function can be defined under the M-quantile model

$$\hat F_{j,CD}(t) = N_j^{-1} \left[ \sum_{i \in s_j} I(y_{ij} \le t) + n_j^{-1} \sum_{k \in r_j} \sum_{i \in s_j} I\big(\hat y_{kj} + (y_{ij} - \hat y_{ij}) \le t\big) \right]$$

where $\hat y_{kj} = x_{kj}^T \hat\beta_\psi(\hat\theta_j)$ and $\hat y_{ij} = x_{ij}^T \hat\beta_\psi(\hat\theta_j)$; $s_j$ denotes the sampled units and $r_j$ the non-sampled units of area $j$.

The estimate of the $q$th quantile for small area $j$, $\hat Q_{q,j}$, can be obtained by numerically solving the integral equation

$$\hat Q_{q,j} : \int_{-\infty}^{\hat Q_{q,j}} d\hat F_{j,CD}(t) = q$$
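A minimal sketch of the CD estimator and of a numerical quantile solver for one area follows. It assumes the fitted values are already available; `cd_cdf` and `cd_quantile` are hypothetical names, and the quantile is taken as the smallest candidate value $t$ with $\hat F_{j,CD}(t) \ge q$, which is one of several reasonable ways to invert a step function.

```python
import numpy as np

def cd_cdf(t, y_s, yhat_s, yhat_r):
    """Chambers-Dunstan estimate of F_j(t) for one area.

    y_s, yhat_s: observed and fitted values for sampled units (length n_j).
    yhat_r: fitted values for non-sampled units (length N_j - n_j).
    """
    res = y_s - yhat_s                              # sample residuals
    N = y_s.size + yhat_r.size
    pred = yhat_r[:, None] + res[None, :]           # every prediction plus every residual
    out_of_sample = np.mean(pred <= t, axis=1).sum()
    return (np.sum(y_s <= t) + out_of_sample) / N

def cd_quantile(q, y_s, yhat_s, yhat_r):
    """Smallest jump point t of the CD estimate with F_hat(t) >= q."""
    res = y_s - yhat_s
    pred = (yhat_r[:, None] + res[None, :]).ravel()
    cand = np.sort(np.concatenate([y_s, pred]))     # all jump points of F_hat
    F = np.array([cd_cdf(t, y_s, yhat_s, yhat_r) for t in cand])
    return cand[np.searchsorted(F, q, side="left")]
```

The loop over candidate points is quadratic in the number of jump points, which is acceptable for a small area but would need vectorising for large populations.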

Model-Based Estimators of Small Area Means
Following Tzavidis, Marchetti & Chambers (2010), the bias-adjusted estimator of the mean is defined as

$$\hat m_j^{MQ/CD} = \int_{-\infty}^{\infty} t \, d\hat F_{j,CD}(t) = N_j^{-1} \left\{ \sum_{i \in s_j} y_{ij} + \sum_{i \in r_j} x_{ij}^T \hat\beta_\psi(\hat\theta_j) + \frac{N_j - n_j}{n_j} \sum_{i \in s_j} \big[ y_{ij} - x_{ij}^T \hat\beta_\psi(\hat\theta_j) \big] \right\}$$

- An alternative to the CD estimator of the distribution function is the Rao-Kovar-Mantel (RKM) estimator
- It can be shown that, under simple random sampling, integrating the RKM or the CD estimator results in the same estimator of the small area mean
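The closed form above translates directly into code. The sketch below assumes the fitted values for sampled and non-sampled units of one area are given; the name `mq_cd_mean` is illustrative.

```python
import numpy as np

def mq_cd_mean(y_s, yhat_s, yhat_r):
    """Bias-adjusted M-quantile/CD estimate of one small area mean.

    Sum of observed sample values, plus predictions for non-sampled units,
    plus the mean sample residual scaled up to the N_j - n_j non-sampled units.
    """
    n = y_s.size
    N = n + yhat_r.size
    bias_adjustment = (N - n) / n * np.sum(y_s - yhat_s)
    return (np.sum(y_s) + np.sum(yhat_r) + bias_adjustment) / N
```

For example, with `y_s = [1, 2, 3]`, `yhat_s = [0.5, 2, 2.5]` and `yhat_r = [4, 5]`, the three terms are 6, 9 and 2/3, so the estimate is (6 + 9 + 2/3) / 5.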

Model-Based Estimators of Poverty Indicators
Denoting by $t$ the poverty line and following the EB approach (Molina and Rao, 2010), different poverty measures are defined by using

$$F_{\alpha,ij} = \left( \frac{t - y_{ij}}{t} \right)^{\alpha} I(y_{ij} \le t), \quad i = 1, \dots, N$$

The population poverty measure in small area $j$ can be decomposed into sampled and non-sampled parts as follows

$$F_{\alpha,j} = N_j^{-1} \left[ \sum_{i \in s_j} F_{\alpha,ij} + \sum_{i \in r_j} F_{\alpha,ij} \right]$$

Setting $\alpha = 0$ defines the Head Count Ratio (HCR), whereas setting $\alpha = 1$ defines the Poverty Gap (PG). HCR and PG can be estimated as follows

$$\hat F_{0,j} = N_j^{-1} \left\{ \sum_{i \in s_j} I(y_{ij} \le t) + n_j^{-1} \sum_{k \in r_j} \sum_{i \in s_j} I\big(\hat y_{kj} + (y_{ij} - \hat y_{ij}) \le t\big) \right\}$$

$$\hat F_{1,j} = N_j^{-1} \left\{ \sum_{i \in s_j} \frac{t - y_{ij}}{t} I(y_{ij} \le t) + n_j^{-1} \sum_{k \in r_j} \sum_{i \in s_j} \frac{t - \hat y_{kj} - (y_{ij} - \hat y_{ij})}{t} I\big(\hat y_{kj} + (y_{ij} - \hat y_{ij}) \le t\big) \right\}$$

where $\hat y_{kj} = x_{kj}^T \hat\beta_\psi(\hat\theta_j)$ and $\hat y_{ij} = x_{ij}^T \hat\beta_\psi(\hat\theta_j)$
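Both estimators share one structure, so a single function parameterised by $\alpha$ can cover them. This is a sketch under the assumption that fitted values are given; `fgt_cd` is a hypothetical name, and clipping the gap at zero is an implementation detail added here so that non-poor units never contribute.

```python
import numpy as np

def fgt_cd(alpha, t, y_s, yhat_s, yhat_r):
    """CD-type estimate of the poverty measure F_alpha for one area.

    alpha = 0 gives the head count ratio, alpha = 1 the poverty gap.
    t is the poverty line.
    """
    def fgt(y):
        gap = np.clip((t - y) / t, 0.0, None)       # relative shortfall, 0 if not poor
        return gap ** alpha * (y <= t)

    res = y_s - yhat_s
    N = y_s.size + yhat_r.size
    pred = yhat_r[:, None] + res[None, :]           # predictions plus sample residuals
    return (fgt(y_s).sum() + fgt(pred).mean(axis=1).sum()) / N
```

With `y_s = yhat_s = [1, 2, 3]`, `yhat_r = [4, 5]` and `t = 2.5`, two of the five units are poor, so the HCR is 0.4 and the PG is (0.6 + 0.2) / 5 = 0.16.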

Model-Based Estimators of Poverty Indicators (MC approach)
1. Fit the M-quantile small area model using the raw sample values $y_s$ and obtain estimates of $\beta$ and $\theta_j$;
2. draw an out-of-sample vector using $y^*_{ij,r} = x_{ij,r} \hat\beta(\hat\theta_j) + e^*_{ij,r}$, where $e^*_{ij,r}$ is a vector of size $N_j - n_j$ drawn from the Empirical Distribution Function (EDF) of the estimated M-quantile regression residuals, or from a smooth version of this distribution, and $\hat\beta$, $\hat\theta_j$ are obtained from the previous step;
3. repeat the process $H$ times. Each time, combine the sample data and the out-of-sample data to estimate the target using

$$\hat F_{\alpha,j} = N_j^{-1} \left[ \sum_{i \in s_j} I(y_{ij} \le t) + \sum_{i \in r_j} I(y^*_{ij} \le t) \right];$$

4. average the results over the $H$ simulations.
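Steps 2 to 4 can be sketched as below for one area, taking the fit from step 1 as given. The function name `mc_poverty`, the default `H` and the generalisation from the indicator in step 3 to an arbitrary $\alpha$ are assumptions for illustration; residuals are drawn from the EDF only, not from a smoothed version.

```python
import numpy as np

def mc_poverty(alpha, t, y_s, yhat_s, yhat_r, H=200, seed=0):
    """Monte Carlo estimate of the poverty measure F_alpha for one area."""
    rng = np.random.default_rng(seed)
    res = y_s - yhat_s                               # estimated residuals (step 1 output)
    estimates = np.empty(H)
    for h in range(H):
        # Step 2: out-of-sample values = prediction + residual drawn from the EDF
        e_star = rng.choice(res, size=yhat_r.size, replace=True)
        y_all = np.concatenate([y_s, yhat_r + e_star])
        # Step 3: evaluate the poverty measure on sample plus synthetic values
        gap = np.clip((t - y_all) / t, 0.0, None)
        estimates[h] = np.mean(gap ** alpha * (y_all <= t))
    return estimates.mean()                          # Step 4: average over H replicates
```

Unlike the CD-type formula, which averages over all sample residuals for every non-sampled unit, this version attaches one random residual per unit per replicate and relies on `H` to average out the simulation noise.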

Part IV: A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators

A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
- Our MSE estimator for the small area means, quantiles and poverty indicators is based on the bootstrap method proposed by Lombardia et al. (2003)
- In this work we adapted and extended the Lombardia et al. (2003) bootstrap method to the small area estimation problem under the M-quantile approach

A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
- Let $b = 1, \dots, B$, where $B$ is the number of bootstrap populations
- Let $r = 1, \dots, R$, where $R$ is the number of bootstrap samples
- Let $\Omega = \{(y_k, x_k),\ k = 1, \dots, N\}$ be the target population
- By $*$ we denote bootstrap quantities
- $\hat\tau_j$ denotes the estimator of the small area $j$ mean, quantile or poverty indicator
- Let $y$ be the study variable, which is known only for sampled units, and let $x$ be the vector of auxiliary variables, which is known for all the population units
- Let $s = (1, \dots, n)$ be a within-area simple random sample from the finite population $\Omega = \{1, \dots, N\}$

A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
- Fit the M-quantile regression model on sample $s$: $\hat y_{ij} = x_{ij}^T \hat\beta_\psi(\hat\theta_j)$
- Compute the residuals: $e_{ij} = y_{ij} - \hat y_{ij}$
- Generate $B$ bootstrap populations of dimension $N$, $\Omega^{*b}$:
  1. $y^*_{kj} = x_{kj}^T \hat\beta_\psi(\hat\theta_j) + e^*_{kj}$, $k = 1, \dots, N$
  2. the $e^*_{kj}$ are obtained by sampling the residuals $e_{ij}$ with replacement
  3. residuals can be sampled from the empirical distribution function or from a smoothed distribution function
  4. we can consider all the residuals ($e_i$, $i = 1, \dots, n$), which is the unconditional approach, or only the residuals of the area ($e_{ij}$, $i = 1, \dots, n_j$), which is the conditional approach
- From every bootstrap population draw $R$ samples of size $n$ without replacement
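The population-generation and resampling steps above can be sketched as follows, taking the fitted values and residuals as given. The function names, the `conditional` flag and the Gaussian-kernel `bandwidth` used for the smoothed option are assumptions for illustration; the slides do not specify the smoothing kernel.

```python
import numpy as np

def bootstrap_population(yhat_pop, area_pop, res, area_res,
                         conditional=False, bandwidth=0.0, rng=None):
    """Generate one bootstrap population y*_kj = yhat_kj + e*_kj.

    conditional=False draws e* from all sample residuals (unconditional
    approach); conditional=True draws only from residuals of the same area.
    bandwidth > 0 adds Gaussian noise as a simple smoothed-EDF draw.
    """
    rng = np.random.default_rng() if rng is None else rng
    yhat_pop = np.asarray(yhat_pop, dtype=float)
    res = np.asarray(res, dtype=float)
    if conditional:
        e_star = np.empty_like(yhat_pop)
        for a in np.unique(area_pop):
            idx = area_pop == a
            e_star[idx] = rng.choice(res[area_res == a], size=idx.sum(), replace=True)
    else:
        e_star = rng.choice(res, size=yhat_pop.size, replace=True)
    if bandwidth > 0:
        e_star = e_star + bandwidth * rng.normal(size=e_star.size)
    return yhat_pop + e_star

def draw_sample(area_pop, n_per_area, rng):
    """Within-area simple random sample without replacement; returns unit indices."""
    idx = [rng.choice(np.flatnonzero(area_pop == a), size=n_a, replace=False)
           for a, n_a in n_per_area.items()]
    return np.sort(np.concatenate(idx))
```

Calling `bootstrap_population` $B$ times and `draw_sample` $R$ times per population gives the $B \times R$ replicates on which the estimator $\hat\tau_j$ is recomputed.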

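A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
From the $B$ bootstrap populations and the $R$ samples drawn from each of them, estimate the mean squared error of $\hat\tau_j$, which is based on the Chambers-Dunstan estimator of the distribution function, through its bias and variance.

Bias

$$\widehat{BIAS}(\hat\tau_j) = B^{-1} \sum_{b=1}^{B} R^{-1} \sum_{r=1}^{R} \left( \hat\tau_j^{*br} - \tau_j^{*b} \right)$$

Variance

$$\widehat{VAR}(\hat\tau_j) = B^{-1} \sum_{b=1}^{B} R^{-1} \sum_{r=1}^{R} \left( \hat\tau_j^{*br} - \bar{\hat\tau}_j^{*br} \right)^2$$

where
- $\tau_j^{*b}$ is the true parameter of area $j$ in the $b$th bootstrap population
- $\hat\tau_j^{*br}$ is the estimate of $\tau_j^{*b}$ obtained from the $r$th sample drawn from the $b$th bootstrap population
- $\bar{\hat\tau}_j^{*br} = R^{-1} \sum_{r=1}^{R} \hat\tau_j^{*br}$

The MSE estimate then follows from the usual decomposition, $\widehat{MSE}(\hat\tau_j) = \widehat{VAR}(\hat\tau_j) + \widehat{BIAS}(\hat\tau_j)^2$.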
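Given the bootstrap replicates for one area, the bias and variance formulas reduce to a few array operations. In this sketch `tau_hat` is assumed to be a $B \times R$ array of estimates and `tau_true` a length-$B$ array of bootstrap population values; the names are illustrative, and the last line uses the standard decomposition of MSE into variance plus squared bias.

```python
import numpy as np

def bootstrap_mse(tau_hat, tau_true):
    """Bootstrap bias, variance and MSE of a small area estimator.

    tau_hat[b, r]: estimate from sample r of bootstrap population b.
    tau_true[b]:   value of the target parameter in bootstrap population b.
    """
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)
    bias = np.mean(tau_hat - tau_true[:, None])
    centred = tau_hat - tau_hat.mean(axis=1, keepdims=True)   # deviation from mean over r
    var = np.mean(centred ** 2)
    return bias, var, var + bias ** 2

bias, var, mse = bootstrap_mse([[1.0, 3.0], [2.0, 4.0]], [1.0, 2.0])
```

In the toy call the estimates exceed the bootstrap truth by 0 and 2, so the bias is 1; each estimate deviates from its population mean by 1, so the variance is 1 and the MSE is 2.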

Part V: Simulation Results

Simulation Design
Data generating process:

$$y_{ij} = 11 x_{ij} + \gamma_j + \varepsilon_{ij}, \quad j = 1, \dots, d = 30, \quad i = 1, \dots, N_j$$

- $50 \le N_j \le 150$ and $N = 2820$; $5 \le n_j \le 15$ and $n = 282$
- $x_{ij} \sim N(\mu_j, \sigma_x = 1)$, $\mu_j \sim U[8, 11]$
- $\gamma_j \sim \chi^2(1)$, $\varepsilon_{ij} \sim \chi^2(6)$
- The target parameters are the mean, the median, the head count ratio and the poverty gap

Model Based Simulation

Averages

| | Min | 1st Q | Median | Mean | 3rd Q | Max |
|---|---|---|---|---|---|---|
| True | 0.75 | 0.94 | 1.06 | 1.10 | 1.27 | 1.49 |
| Estimated (Analytic) | 0.79 | 0.89 | 1.01 | 1.02 | 1.12 | 1.27 |
| Estimated (Bootstrap) | 0.81 | 0.91 | 1.02 | 1.07 | 1.19 | 1.41 |
| Rel. Bias (%) (Analytic) | 15.04 | 9.20 | 7.17 | 6.53 | 3.99 | 7.42 |
| Rel. Bias (%) (Bootstrap) | 7.24 | 4.93 | 1.84 | 2.48 | 0.77 | 7.95 |
| RMSE (Analytic) | 0.22 | 0.26 | 0.33 | 0.37 | 0.45 | 0.67 |
| RMSE (Bootstrap) | 0.08 | 0.11 | 0.13 | 0.15 | 0.18 | 0.27 |

HCR

| | Min | 1st Q | Median | Mean | 3rd Q | Max |
|---|---|---|---|---|---|---|
| True | 0.08 | 0.08 | 0.09 | 0.10 | 0.11 | 0.13 |
| Estimated | 0.08 | 0.08 | 0.09 | 0.10 | 0.11 | 0.13 |
| Rel. Bias (%) | 5.04 | 1.14 | 0.12 | 0.19 | 2.19 | 5.81 |
| RMSE | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.03 |

PG

| | Min | 1st Q | Median | Mean | 3rd Q | Max |
|---|---|---|---|---|---|---|
| True | 0.07 | 0.08 | 0.09 | 0.09 | 0.10 | 0.12 |
| Estimated | 0.07 | 0.08 | 0.09 | 0.10 | 0.11 | 0.12 |
| Rel. Bias (%) | 5.59 | 1.02 | 0.07 | 0.26 | 2.48 | 5.05 |
| RMSE | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.03 |

Median

| | Min | 1st Q | Median | Mean | 3rd Q | Max |
|---|---|---|---|---|---|---|
| True | 0.93 | 1.07 | 1.23 | 1.26 | 1.40 | 1.68 |
| Estimated | 0.88 | 0.97 | 1.11 | 1.16 | 1.29 | 1.54 |
| Rel. Bias (%) | 13.71 | 9.97 | 8.14 | 8.16 | 6.57 | 2.79 |
| RMSE | 0.12 | 0.16 | 0.19 | 0.20 | 0.23 | 0.29 |

Table: True and estimated root mean squared error (RMSE), with the relative bias and RMSE of the RMSE estimator, summarized over areas and simulations. Smooth unconditional approach.

Part VI: Concluding Remarks

Concluding Remarks
- A unified framework for bootstrap MSE estimation for small area means, quantiles and poverty indicators
- Easy to implement, yet it tackles a very difficult problem, i.e. estimating the MSE of estimated small area quantiles
- The asymptotic assumptions and results of Lombardia et al. (2003) remain reasonable under the M-quantile regression model
- R functions are available to compute point estimates and the relative mean squared error of small area means, quantiles and poverty indicators
- Time consuming
- Small underestimation
- Auxiliary variables must be known for all the population units

Essential Bibliography
- Breckling, J., Chambers, R. (1988). M-quantiles. Biometrika 75, 761-771.
- Chambers, R., Dorfman, A., Hall, P. (1992). Properties of estimators of the finite population distribution function. Biometrika 79 (3), 577-582.
- Chambers, R., Dunstan, M. (1986). Estimating distribution functions from survey data. Biometrika 73, 597-604.
- Chambers, R., Tzavidis, N. (2006). M-quantile models for small area estimation. Biometrika 93 (2), 255-268.
- Foster, J., Greer, J., Thorbecke, E. (1984). A class of decomposable poverty measures. Econometrica 52, 761-766.
- Lombardia, M., Gonzalez-Manteiga, W., Prada-Sanchez, J. (2003). Bootstrapping the Chambers-Dunstan estimate of finite population distribution function. Journal of Statistical Planning and Inference 116, 367-388.
- Molina, I., Rao, J. (2010). Small area estimation of poverty indicators. The Canadian Journal of Statistics.
- Newey, W., Powell, J. (1987). Asymmetric least squares estimation and testing. Econometrica 55 (4), 819-847.
- Tzavidis, N., Marchetti, S., Chambers, R. (2010). Robust estimation of small area means and quantiles. Australian and New Zealand Journal of Statistics 52 (2), 167-186.