Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators

Stefano Marchetti (1), Nikos Tzavidis (2), Monica Pratesi (3)
(1, 3) Department of Statistics and Mathematics Applied to Economics, University of Pisa
(2) Social Science Statistical Research Institute, University of Southampton
NTTS conference, Bruxelles, 23rd-25th February 2011

Outline
1 Motivation
2 Review of models for small area estimation
3 Model-Based Estimators of Means, Quantiles and Poverty Indicators
4 A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
5 Simulation Results
6 Concluding Remarks

Part I: Motivation

Motivation
- Goal: a picture of poverty and social exclusion at the small area level (e.g. LAU 1-2)
- How: estimate small area means, totals, quantiles, the head count ratio and the poverty gap
- Focus: Mean Squared Error (MSE) estimation for small area estimators
- A unified framework for estimating the MSE of small area estimators of means, quantiles and poverty indicators under the M-quantile regression model

Why do we use small area methods?
- We want to measure key statistics (e.g. poverty indicators)
- The source is survey data (e.g. EU-SILC)
- Surveys are designed to give accurate estimates at a given domain level (e.g. NUTS 2)
- There is demand for accurate estimates at a more disaggregated domain level (e.g. LAU 1-2)
- Small area methods address this demand without the need for oversampling

Part II: Review of models for small area estimation

Methods for small area estimation
- Modern small area estimation is based on model-based methods
- Statistical models link the variable of interest with covariate information that is also known for units not in the sample
- Multilevel (mixed effects) models are one class of models suitable for small area estimation
- A more recent approach to small area estimation is based on quantile and M-quantile models

Small area estimation: mixed effects models
- Concept: include random area-specific effects to account for the between-area variation beyond that explained by the model covariates
- Notation (j = area, i = individual):
  - Variable of interest: $y_{ij}$
  - Unit level covariate information: $x_{ij}$
  - Area level random effect: $\gamma_j$ (assumed normally distributed)
  - Random error: $\varepsilon_{ij}$ (assumed normally distributed)

$$y_{ij} = x_{ij}^T \beta + \gamma_j + \varepsilon_{ij}, \quad i = 1, \dots, n_j, \quad j = 1, \dots, d$$

M-quantile models
- With regression models we model the mean of the variable of interest (y) given the covariates (x)
- A more complete picture is offered by modelling not only the mean of y given x but also other quantiles, for example the median and the 25th and 75th percentiles. This is known as quantile regression
- An M-quantile regression model for quantile q:

$$Q_q = x_{ij}^T \beta_\psi(q)$$

Main features of these models:
- No assumption of a normal distribution
- Robust estimation (through the influence function $\psi$ of the M-quantile regression)
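To make the fitting step concrete, here is a minimal sketch of an M-quantile fit by iteratively reweighted least squares for one covariate plus an intercept. It is an illustration under stated assumptions, not the authors' R code: the Huber influence function with tuning constant c = 1.345, the scale estimate median(|residual|)/0.6745, and the function names `mq_weight`, `weighted_line` and `fit_mquantile` are all choices made for this example.

```python
import statistics


def mq_weight(resid, scale, q, c=1.345):
    """IRLS weight psi_q(r) / r for a Huber-type M-quantile influence function."""
    u = abs(resid) / scale
    huber = 1.0 if u <= c else c / u        # Huber weight psi(u) / u
    side = q if resid > 0 else 1.0 - q      # asymmetric tilt by residual sign
    return 2.0 * side * huber


def weighted_line(x, y, w):
    """Closed-form weighted least squares for y = b0 + b1 * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def fit_mquantile(x, y, q, c=1.345, max_iter=200, tol=1e-10):
    """Return (intercept, slope) of the M-quantile line of order q."""
    b0, b1 = weighted_line(x, y, [1.0] * len(x))   # start from ordinary least squares
    for _ in range(max_iter):
        resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # Robust scale (assumed form); fall back to 1.0 if all residuals are zero.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745 or 1.0
        w = [mq_weight(r, scale, q, c) for r in resid]
        new_b0, new_b1 = weighted_line(x, y, w)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1
```

With q = 0.5 and symmetric residuals the fit coincides with a robust regression line; moving q towards 0 or 1 shifts the line towards the lower or upper part of the conditional distribution.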

Using M-quantile models to measure area effects
- Central idea: area effects can be described by estimating an area-specific q value ($\hat\theta_j$) for each area (group) of a hierarchical dataset (Chambers and Tzavidis, 2006)
- Unit-level M-quantile coefficient: $q_{ij}$ such that $Q_{q_{ij}} = y_{ij}$, that is, $y_{ij} = x_{ij}^T \beta_\psi(q_{ij})$
- Area-level M-quantile coefficient: $\hat\theta_j = n_j^{-1} \sum_{i \in s_j} q_{ij}$
- Prediction: $\hat y_{ij} = x_{ij}^T \beta_\psi(\hat\theta_j)$

REMARK: A mixed effects model uses random effects $\gamma_j$ to capture the dissimilarity between groups. M-quantile models attempt to capture this dissimilarity via the group-specific M-quantile coefficients $\hat\theta_j$.
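The two-step computation of the unit-level and area-level coefficients can be sketched as follows. This is a hedged illustration: it assumes M-quantile fitted values for each unit are already available on a grid of q values and are non-decreasing in q, and it uses linear interpolation between grid points. The names `unit_q` and `area_theta` are hypothetical.

```python
from collections import defaultdict


def unit_q(y, fitted, grid):
    """Interpolate the q at which the fitted M-quantile value equals y.

    fitted[k] is the unit's fitted value at q = grid[k]; values are
    assumed to be non-decreasing in k.
    """
    if y <= fitted[0]:
        return grid[0]
    if y >= fitted[-1]:
        return grid[-1]
    for k in range(len(grid) - 1):
        lo, hi = fitted[k], fitted[k + 1]
        if lo <= y <= hi:
            if hi == lo:
                return grid[k]
            return grid[k] + (grid[k + 1] - grid[k]) * (y - lo) / (hi - lo)
    return grid[-1]  # only reached if the fitted values are not monotone


def area_theta(areas, q_units):
    """Average the unit-level q values within each area."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for j, q in zip(areas, q_units):
        sums[j] += q
        counts[j] += 1
    return {j: sums[j] / counts[j] for j in sums}
```

A unit whose observed value sits above the median line gets q above 0.5, so an area with mostly high units ends up with a large area coefficient, which plays the role of the random effect in a mixed model.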

Part III: Model-Based Estimators of Means, Quantiles and Poverty Indicators

Model-Based Estimators of Small Area Quantiles
An estimator of the Chambers-Dunstan (CD) small area distribution function can be defined under the M-quantile model:

$$\hat F_{CD,j}(t) = N_j^{-1} \Big\{ \sum_{i \in s_j} I(y_{ij} \le t) + n_j^{-1} \sum_{i \in s_j} \sum_{k \in r_j} I\big(\hat y_{kj} + (y_{ij} - \hat y_{ij}) \le t\big) \Big\}$$

where $\hat y_{kj} = x_{kj}^T \hat\beta_\psi(\hat\theta_j)$ and $\hat y_{ij} = x_{ij}^T \hat\beta_\psi(\hat\theta_j)$.
The estimate of the qth quantile for small area j, $\hat Q_{q,j}$, can be obtained by numerically solving the integral equation

$$\int_{-\infty}^{\hat Q_{q,j}} d\hat F_{CD,j}(t) = q$$
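A minimal sketch of the Chambers-Dunstan estimator and the quantile obtained from it, for a single area. The inputs are assumed to be the sample values, the fitted values for sampled units and the predictions for non-sampled units; the quantile is taken as the smallest jump point of the step function at which the estimated distribution reaches q, which is one simple way to solve the integral equation numerically. The function names are hypothetical.

```python
def cd_cdf(t, y_s, yhat_s, yhat_r):
    """Chambers-Dunstan estimate of the area distribution function at t."""
    n = len(y_s)
    N = n + len(yhat_r)
    resid = [y - f for y, f in zip(y_s, yhat_s)]
    in_sample = sum(1 for y in y_s if y <= t)
    # Each non-sampled prediction is shifted by every sample residual.
    out_sample = sum(1 for f in yhat_r for e in resid if f + e <= t) / n
    return (in_sample + out_sample) / N


def cd_quantile(q, y_s, yhat_s, yhat_r):
    """Smallest jump point t with cd_cdf(t) >= q."""
    resid = [y - f for y, f in zip(y_s, yhat_s)]
    candidates = sorted(set(y_s) | {f + e for f in yhat_r for e in resid})
    for t in candidates:
        if cd_cdf(t, y_s, yhat_s, yhat_r) >= q:
            return t
    return candidates[-1]
```

The double loop costs on the order of n_j (N_j - n_j) operations per evaluation, which is part of why repeating this inside a bootstrap becomes time consuming.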

Model-Based Estimators of Small Area Means
Following Tzavidis, Marchetti & Chambers (2010), the bias-adjusted estimator of the mean is defined as

$$\hat m_j^{MQ/CD} = \int t \, d\hat F_{CD,j}(t) = N_j^{-1} \Big\{ \sum_{i \in s_j} y_{ij} + \sum_{i \in r_j} x_{ij}^T \hat\beta_\psi(\hat\theta_j) + \frac{N_j - n_j}{n_j} \sum_{i \in s_j} \big[ y_{ij} - x_{ij}^T \hat\beta_\psi(\hat\theta_j) \big] \Big\}$$

- An alternative to the CD estimator of the distribution function is the Rao-Kovar-Mantel (RKM) estimator
- It can be shown that under simple random sampling, integrating the RKM or the CD estimator yields the same estimator of the small area mean
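The bias-adjusted form follows from integrating the Chambers-Dunstan estimator term by term. The short derivation below is a sketch; the shorthand $e_{ij} = y_{ij} - \hat y_{ij}$ for the sample residuals is introduced here for brevity.

```latex
\begin{align*}
\int t \, d\hat F_{CD,j}(t)
  &= N_j^{-1} \Big\{ \sum_{i \in s_j} y_{ij}
     + n_j^{-1} \sum_{i \in s_j} \sum_{k \in r_j} \big( \hat y_{kj} + e_{ij} \big) \Big\} \\
  &= N_j^{-1} \Big\{ \sum_{i \in s_j} y_{ij}
     + \sum_{k \in r_j} \hat y_{kj}
     + \frac{N_j - n_j}{n_j} \sum_{i \in s_j} e_{ij} \Big\}
\end{align*}
```

The second line uses the fact that each of the $N_j - n_j$ predictions appears once for each of the $n_j$ sample residuals, and each residual appears once for each non-sampled unit.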

Model-Based Estimators of Poverty Indicators
Denoting by t the poverty line and following the EB approach (Molina and Rao, 2010), different poverty measures are defined by using

$$F_{\alpha,ij} = \Big( \frac{t - y_{ij}}{t} \Big)^{\alpha} I(y_{ij} \le t), \quad i = 1, \dots, N$$

The population quantity in small area j can be decomposed as follows:

$$F_{\alpha,j} = N_j^{-1} \Big[ \sum_{i \in s_j} F_{\alpha,ij} + \sum_{i \in r_j} F_{\alpha,ij} \Big]$$

Setting $\alpha = 0$ defines the Head Count Ratio (HCR), whereas setting $\alpha = 1$ defines the Poverty Gap (PG). HCR and PG can be estimated as follows:

$$\hat F_{0,j} = N_j^{-1} \Big\{ \sum_{i \in s_j} I(y_{ij} \le t) + n_j^{-1} \sum_{k \in r_j} \sum_{i \in s_j} I\big(\hat y_{kj} + (y_{ij} - \hat y_{ij}) \le t\big) \Big\}$$

$$\hat F_{1,j} = N_j^{-1} \Big\{ \sum_{i \in s_j} \frac{t - y_{ij}}{t} I(y_{ij} \le t) + n_j^{-1} \sum_{k \in r_j} \sum_{i \in s_j} \frac{t - \hat y_{kj} - (y_{ij} - \hat y_{ij})}{t} I\big(\hat y_{kj} + (y_{ij} - \hat y_{ij}) \le t\big) \Big\}$$

where $\hat y_{kj} = x_{kj}^T \hat\beta_\psi(\hat\theta_j)$ and $\hat y_{ij} = x_{ij}^T \hat\beta_\psi(\hat\theta_j)$.
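The HCR and PG estimators share one structure, so a single function parameterised by alpha covers both. This is a small sketch for one area, assuming the same inputs as the distribution function estimator (sample values, fitted values for sampled units, predictions for non-sampled units); `fgt_unit` and `fgt_area` are hypothetical names.

```python
def fgt_unit(y, t, alpha):
    """Unit-level poverty measure ((t - y) / t) ** alpha * I(y <= t)."""
    return ((t - y) / t) ** alpha if y <= t else 0.0


def fgt_area(t, alpha, y_s, yhat_s, yhat_r):
    """Estimate F_alpha for one area: alpha=0 gives HCR, alpha=1 gives PG."""
    n = len(y_s)
    N = n + len(yhat_r)
    resid = [y - f for y, f in zip(y_s, yhat_s)]
    in_sample = sum(fgt_unit(y, t, alpha) for y in y_s)
    # Pseudo-values for non-sampled units: prediction plus each sample residual.
    out_sample = sum(fgt_unit(f + e, t, alpha) for f in yhat_r for e in resid) / n
    return (in_sample + out_sample) / N
```

For example, `fgt_area(2.0, 0, [1.0, 3.0], [1.5, 2.5], [2.0, 4.0])` evaluates the head count ratio at a poverty line of 2.0 for a toy area with two sampled and two non-sampled units.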

Model-Based Estimators of Poverty Indicators (MC approach)
1 Fit the M-quantile small area model using the raw sample values $y_s$ and obtain estimates of $\beta$ and $\theta_j$;
2 draw an out-of-sample vector using $y^*_{ij,r} = x_{ij,r}^T \hat\beta_\psi(\hat\theta_j) + e^*_{ij,r}$, where $e^*_{ij,r}$ is a vector of size $N_j - n_j$ drawn from the Empirical Distribution Function (EDF) of the estimated M-quantile regression residuals, or from a smoothed version of this distribution, and $\hat\beta$, $\hat\theta_j$ are obtained from the previous step;
3 repeat the process H times, each time combining the sample data and the out-of-sample data to estimate the target using

$$\hat F_{\alpha,j} = N_j^{-1} \Big[ \sum_{i \in s_j} I(y_{ij} \le t) + \sum_{i \in r_j} I(y^*_{ij,r} \le t) \Big];$$

4 average the results over the H simulations.
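The four steps above can be sketched for one area as follows. The sketch assumes the model has already been fitted (step 1), so it takes fitted and predicted values as inputs, and it draws residuals from the EDF only (no smoothing). It also generalises the indicator in step 3 to the alpha-indexed poverty measure, which is an assumption of this example. The name `mc_poverty` and its parameters are hypothetical.

```python
import random


def mc_poverty(t, alpha, y_s, yhat_s, yhat_r, H=50, seed=0):
    """Monte Carlo estimate of F_alpha for one area (steps 2 to 4)."""
    rng = random.Random(seed)
    resid = [y - f for y, f in zip(y_s, yhat_s)]
    N = len(y_s) + len(yhat_r)

    def fgt(y):
        return ((t - y) / t) ** alpha if y <= t else 0.0

    in_sample = sum(fgt(y) for y in y_s)
    draws = []
    for _ in range(H):
        # Step 2: out-of-sample values = prediction + residual drawn from the EDF.
        y_r = [f + rng.choice(resid) for f in yhat_r]
        # Step 3: combine sample and out-of-sample values.
        draws.append((in_sample + sum(fgt(y) for y in y_r)) / N)
    # Step 4: average over the H replicates.
    return sum(draws) / H
```

Compared with the double-sum estimator, this version trades exact enumeration of all residual-prediction pairs for H random draws, so its cost grows with H rather than with the sample size of the area.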

Part IV: A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators

A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
- Our MSE estimator for small area means, quantiles and poverty indicators is based on the bootstrap method proposed by Lombardia et al. (2003)
- In this work we adapt and extend the Lombardia et al. (2003) bootstrap method to the small area estimation problem under the M-quantile approach

A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
- Let $b = 1, \dots, B$, where B is the number of bootstrap populations
- Let $r = 1, \dots, R$, where R is the number of bootstrap samples
- Let $\Omega = \{(y_k, x_k), k = 1, \dots, N\}$ be the target population
- A superscript $*$ denotes bootstrap quantities
- $\hat\tau_j$ denotes the estimator of the mean, quantile or poverty indicator for small area j
- Let y be the study variable, known only for sampled units, and let x be the vector of auxiliary variables, known for all population units
- Let $s = (1, \dots, n)$ be a within-area simple random sample from the finite population $\Omega = \{1, \dots, N\}$

A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
- Fit the M-quantile regression model on sample s: $\hat y_{ij} = x_{ij}^T \hat\beta_\psi(\hat\theta_j)$
- Compute the residuals: $e_{ij} = y_{ij} - \hat y_{ij}$
- Generate B bootstrap populations $\Omega^{*b}$ of dimension N:
  1 $y^*_{kj} = x_{kj}^T \hat\beta_\psi(\hat\theta_j) + e^*_{kj}$, $k = 1, \dots, N$
  2 the $e^*_{kj}$ are obtained by sampling the residuals $e_{ij}$ with replacement
  3 residuals can be sampled from the empirical distribution function or from a smoothed distribution function
  4 we can sample from all the residuals ($e_i$, $i = 1, \dots, n$), which is the unconditional approach, or only from the residuals of the area ($e_{ij}$, $i = 1, \dots, n_j$), which is the conditional approach
- From every bootstrap population draw R samples of size n without replacement
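The population-generation and resampling steps can be sketched as below. This is an illustration under assumptions: predictions for every population unit are taken as given, residuals are drawn from the empirical distribution only (the smoothed variant is not shown), and samples are drawn by simple random sampling without replacement within each area. The function names and argument layout are hypothetical.

```python
import random


def bootstrap_population(yhat_pop, area_pop, resid, area_s, conditional, rng):
    """One bootstrap population: prediction plus a resampled residual per unit.

    conditional=True draws each unit's residual from its own area's residuals;
    conditional=False draws from the pooled residuals of all areas.
    """
    by_area = {}
    for e, j in zip(resid, area_s):
        by_area.setdefault(j, []).append(e)
    pooled = list(resid)
    return [
        f + rng.choice(by_area[j] if conditional else pooled)
        for f, j in zip(yhat_pop, area_pop)
    ]


def draw_sample(area_pop, n_by_area, rng):
    """Indices of a within-area simple random sample without replacement."""
    idx_by_area = {}
    for k, j in enumerate(area_pop):
        idx_by_area.setdefault(j, []).append(k)
    sample = []
    for j, idx in idx_by_area.items():
        sample.extend(rng.sample(idx, n_by_area[j]))
    return sorted(sample)
```

In a full run, `bootstrap_population` would be called B times and `draw_sample` R times per population, with the small area estimator refitted on every drawn sample.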

A Mean Squared Error Estimator of Small Area Means, Quantiles and Poverty Indicators
From the B bootstrap populations and the R samples drawn from every bootstrap population, estimate the bias and variance components of the mean squared error of $\hat\tau_j$.

Bias:

$$\widehat{BIAS}(\hat\tau_j) = B^{-1} \sum_{b=1}^{B} R^{-1} \sum_{r=1}^{R} \big( \hat\tau_j^{*br} - \tau_j^{*b} \big)$$

Variance:

$$\widehat{VAR}(\hat\tau_j) = B^{-1} \sum_{b=1}^{B} R^{-1} \sum_{r=1}^{R} \big( \hat\tau_j^{*br} - \bar{\hat\tau}_j^{*b} \big)^2$$

where
- $\tau_j^{*b}$ is the true parameter of area j in the bth bootstrap population
- $\hat\tau_j^{*br}$ is the estimate of $\tau_j^{*b}$ obtained from the rth sample drawn from the bth bootstrap population
- $\bar{\hat\tau}_j^{*b} = R^{-1} \sum_{r=1}^{R} \hat\tau_j^{*br}$
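Given the bootstrap replicates, the bias and variance are plain nested averages. The sketch below assumes the estimates for one area are stored as a list of B lists of R values, with the B bootstrap-population true values alongside. Combining the two components as variance plus squared bias is the standard MSE decomposition and is used here as an assumption about the final step; `bootstrap_mse` is a hypothetical name.

```python
def bootstrap_mse(tau_true, tau_hat):
    """Bootstrap bias, variance and MSE for one area.

    tau_true[b]   : true parameter in bootstrap population b
    tau_hat[b][r] : estimate from sample r drawn from bootstrap population b
    """
    B = len(tau_true)
    bias = 0.0
    var = 0.0
    for b in range(B):
        R = len(tau_hat[b])
        mean_b = sum(tau_hat[b]) / R
        bias += sum(est - tau_true[b] for est in tau_hat[b]) / R
        var += sum((est - mean_b) ** 2 for est in tau_hat[b]) / R
    bias /= B
    var /= B
    return bias, var, var + bias ** 2  # MSE = variance + squared bias
```

Note that the variance is centred on the within-population average of the estimates, not on the true value, so systematic error is counted only once, through the bias term.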

Part V: Simulation Results

Simulation Design
Data generating process:

$$y_{ij} = 11 x_{ij} + \gamma_j + \varepsilon_{ij}, \quad j = 1, \dots, d = 30, \quad i = 1, \dots, N_j$$

- $50 \le N_j \le 150$ and $N = 2820$; $5 \le n_j \le 15$ and $n = 282$
- $x_{ij} \sim N(\mu_j, \sigma_x = 1)$, $\mu_j \sim U[8, 11]$
- $\gamma_j \sim \chi^2(1)$, $\varepsilon_{ij} \sim \chi^2(6)$

The target parameters are the mean, the median, the head count ratio and the poverty gap.
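A data generator in the spirit of this design can be sketched as follows. Several details are assumptions made for the example and are not stated on the slide: area sizes are drawn uniformly between 50 and 150 (so the total will generally differ from 2820), each area sample takes one tenth of the area population, and the linear predictor is read as 11 times x as printed. Chi-square variates are generated as gamma variates with shape df/2 and scale 2.

```python
import random


def generate_population(d=30, seed=0):
    """Return a list of areas, each a list of (x, y) population units."""
    rng = random.Random(seed)
    population = []
    for _ in range(d):
        N_j = rng.randint(50, 150)              # assumed: uniform area sizes
        mu_j = rng.uniform(8.0, 11.0)
        gamma_j = rng.gammavariate(0.5, 2.0)    # chi-square with 1 df
        units = []
        for _ in range(N_j):
            x = rng.gauss(mu_j, 1.0)
            eps = rng.gammavariate(3.0, 2.0)    # chi-square with 6 df
            units.append((x, 11.0 * x + gamma_j + eps))  # coefficient as printed
        population.append(units)
    return population


def draw_area_samples(population, seed=0):
    """Within-area simple random samples of size N_j // 10 (assumed rate)."""
    rng = random.Random(seed)
    return [rng.sample(units, len(units) // 10) for units in population]
```

Both the area effects and the unit errors are skewed here, which is the setting where an approach without normality assumptions is expected to matter.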

Model Based Simulation

Averages                    Min    1st Q  Median  Mean   3rd Q  Max
True                        0.75   0.94   1.06    1.10   1.27   1.49
Estimated (Analytic)        0.79   0.89   1.01    1.02   1.12   1.27
Estimated (Bootstrap)       0.81   0.91   1.02    1.07   1.19   1.41
Rel. Bias (%) (Analytic)    15.04  9.20   7.17    6.53   3.99   7.42
Rel. Bias (%) (Bootstrap)   7.24   4.93   1.84    2.48   0.77   7.95
RMSE (Analytic)             0.22   0.26   0.33    0.37   0.45   0.67
RMSE (Bootstrap)            0.08   0.11   0.13    0.15   0.18   0.27

HCR                         Min    1st Q  Median  Mean   3rd Q  Max
True                        0.08   0.08   0.09    0.10   0.11   0.13
Estimated                   0.08   0.08   0.09    0.10   0.11   0.13
Rel. Bias (%)               5.04   1.14   0.12    0.19   2.19   5.81
RMSE                        0.01   0.02   0.02    0.02   0.02   0.03

PG                          Min    1st Q  Median  Mean   3rd Q  Max
True                        0.07   0.08   0.09    0.09   0.10   0.12
Estimated                   0.07   0.08   0.09    0.10   0.11   0.12
Rel. Bias (%)               5.59   1.02   0.07    0.26   2.48   5.05
RMSE                        0.01   0.02   0.02    0.02   0.02   0.03

Median                      Min    1st Q  Median  Mean   3rd Q  Max
True                        0.93   1.07   1.23    1.26   1.40   1.68
Estimated                   0.88   0.97   1.11    1.16   1.29   1.54
Rel. Bias (%)               13.71  9.97   8.14    8.16   6.57   2.79
RMSE                        0.12   0.16   0.19    0.20   0.23   0.29

Table: True and estimated Root Mean Squared Error (RMSE), and relative bias and RMSE of the RMSE estimator, summarized over areas and simulations. Smooth unconditional approach.

Part VI: Concluding Remarks

Concluding Remarks
- A unified framework for bootstrap MSE estimation of small area means, quantiles and poverty indicators
- Easy to implement, while tackling a very difficult problem, i.e. estimating the MSE of estimated small area quantiles
- The asymptotic assumptions and results of Lombardia et al. (2003) remain reasonable under the M-quantile regression model
- R functions are available to compute point estimates and the corresponding mean squared errors of small area means, quantiles and poverty indicators

Limitations:
- Time consuming
- Slight underestimation
- Auxiliary variables must be known for all population units

Essential Bibliography
Breckling, J., Chambers, R. (1988). M-quantiles. Biometrika 75, 761-771.
Chambers, R., Dorfman, A., Hall, P. (1992). Properties of estimators of the finite population distribution function. Biometrika 79 (3), 577-582.
Chambers, R., Dunstan, M. (1986). Estimating distribution functions from survey data. Biometrika 73, 597-604.
Chambers, R., Tzavidis, N. (2006). M-quantile models for small area estimation. Biometrika 93 (2), 255-268.
Foster, J., Greer, J., Thorbecke, E. (1984). A class of decomposable poverty measures. Econometrica 52, 761-766.
Lombardia, M., Gonzalez-Manteiga, W., Prada-Sanchez, J. (2003). Bootstrapping the Chambers-Dunstan estimate of finite population distribution function. Journal of Statistical Planning and Inference 116, 367-388.
Molina, I., Rao, J. (2010). Small area estimation of poverty indicators. The Canadian Journal of Statistics.
Newey, W., Powell, J. (1987). Asymmetric least squares estimation and testing. Econometrica 55 (4), 819-847.
Tzavidis, N., Marchetti, S., Chambers, R. (2010). Robust estimation of small area means and quantiles. Australian and New Zealand Journal of Statistics 52 (2), 167-186.