Outlier Robust Small Area Estimation

University of Wollongong Research Online
Centre for Statistical & Survey Methodology Working Paper Series, Faculty of Engineering and Information Sciences, 2009

Outlier Robust Small Area Estimation
R. Chambers, University of Wollongong, ray@uow.edu.au
Hukum Chandra, University of Wollongong, hchandra@uow.edu.au
N. Salvati, University of Pisa
N. Tzavidis, University of Manchester

Recommended Citation: Chambers, R.; Chandra, Hukum; Salvati, N.; and Tzavidis, N., Outlier Robust Small Area Estimation, Centre for Statistical and Survey Methodology, University of Wollongong, Working Paper 16-09, 2009, 40p. http://ro.uow.edu.au/cssmwp/36

Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: research-pubs@uow.edu.au

Centre for Statistical and Survey Methodology, The University of Wollongong. Working Paper 16-09. Outlier Robust Small Area Estimation. R Chambers, H Chandra, N Salvati and N Tzavidis.

Copyright 2008 by the Centre for Statistical & Survey Methodology, UOW. Work in progress, no part of this paper may be reproduced without permission from the Centre.

Centre for Statistical & Survey Methodology, University of Wollongong, Wollongong NSW 2522. Phone +61 2 4221 5435, Fax +61 2 4221 4845. Email: anica@uow.edu.au

Outlier Robust Small Area Estimation

R. Chambers¹, H. Chandra², N. Salvati³ and N. Tzavidis⁴

Abstract: Outliers are a well-known problem in survey estimation, and a variety of approaches have been suggested for dealing with them in this context. However, when the focus is on small area estimation using the survey data, much less is known even though outliers within a small area sample are clearly much more influential than they are in the larger overall sample. To the best of our knowledge, Chambers and Tzavidis (2006) was the first published paper in small area estimation that explicitly addressed the issue of outlier robustness, using an approach based on fitting outlier robust M-quantile models to the survey data. Recently, Sinha and Rao (2009) have also addressed this issue from the perspective of linear mixed models. Both these approaches, however, use plug-in robust prediction. That is, they replace parameter estimates in optimal, but outlier sensitive, predictors by outlier robust versions. Unfortunately, this approach may involve an unacceptable prediction bias (but a low prediction variance) in situations where the outliers are drawn from a distribution that has a different mean to the rest of the survey data (Chambers, 1986), which then leads to the suggestion that outlier robust prediction should include an additional term that makes a correction for this bias. In this paper, we explore the extension of this idea to the small area estimation situation and we propose two different analytical mean squared error (MSE) estimators for outlier robust predictors of small area means. We use simulation based on realistic outlier contaminated data to evaluate how the extended small area estimation approach compares with the plug-in robust methods described earlier. The empirical results show that the bias-corrected predictive estimators are less biased than the projective estimators, especially when there are outliers in the area effects. Moreover, in the simulation experiments we contrast the proposed MSE estimators with those generally utilized for the plug-in robust predictors. The proposed bias-robust and linearization-based MSE estimators appear to perform well when used with the robust predictors of small area means that are considered in this paper.

Key words and phrases: Linear mixed model; M-quantile model; M-estimation; Robust prediction; Bias-variance trade-off; EBLUP; Robust bias correction.

¹ University of Wollongong, Wollongong 2522, New South Wales, Australia. E-mail: ray@uow.edu.au
² Indian Agricultural Statistics Research Institute. E-mail: hchandra@iasri.res.in
³ University of Pisa. E-mail: salvati@ec.unipi.it
⁴ University of Manchester. E-mail: nikos.tzavidis@manchester.ac.uk

1. Introduction

Outliers are a fact of life for any survey, and especially so for business surveys. As a consequence, a variety of methods have been devised to mitigate the effects of outlier values on survey estimates. Some of these, like identification and removal of these data values by experienced data experts during survey processing, can be effective in ensuring that the resulting survey estimates are unaffected by them. However, being somewhat subjective, such methods are not amenable to scientific evaluation. As a consequence there are a number of objective methods for survey estimation that use statistical rules to decide whether an observation is a potential outlier and to down-weight its contribution to the survey estimates if this is the case. Generally, an outlier robust estimator of this type is based on the assumption that the non-outlier sample values all follow a well-behaved working model and so it generally involves prediction of the sum (or mean) of these values under this working model. In practice, this often involves replacement of an outlying sample value by an estimate of what it should have been if in fact it had been generated under the working model. We refer to such methods as Robust Projective in what follows since they project sample non-outlier behaviour on to the non-sampled part of the survey population.

Robust Projective methods essentially emulate the subjective approach described earlier, and typically lead to biased estimators with lower variances than would otherwise be the case. The reason for the bias is not difficult to find: it is extremely unlikely that the non-sampled values in the target population are drawn from a distribution with the same mean as the sample non-outliers, and yet these methods are built on precisely this assumption. Chambers (1986) recognised this dilemma and coined the concept of a 'representative outlier', i.e. a sample outlier that is potentially drawn from a group of population outliers and hence cannot be unit-weighted in estimation. He noted that representative outliers cannot be treated on the same basis in estimation as other sample data more consistent with the working model for the population, since such values can seriously destabilise the survey estimates, and suggested addition of an outlier robust bias correction term to a Robust Projective survey estimator, e.g. one based on outlier-robust estimates of the model parameters. Welsh and Ronchetti (1998) expand on this idea, applying it more generally to estimation of the finite population distribution of a survey variable in the presence of representative outliers. A similar idea is implicit in the approach described in Chambers et al. (1993), where a nonparametric bias correction is suggested. In what follows, we refer to methods that allow for contributions from representative sample outliers as Robust Predictive since they attempt to predict the contribution of the population outliers to the population quantity of interest.

If outliers are a concern for estimation of population quantities, it is safe to say that they are even more of a concern in small area estimation, where sample sizes are considerably smaller and model-dependent estimation is the norm. It is easy to see that an outlier that destabilises a population estimate based on a large survey sample will almost certainly destroy the validity of the corresponding direct estimate for the small area from which the outlier is sourced, since this estimate will be based on a much smaller sample. This problem does not disappear when the small area estimator is an indirect one, e.g. an Empirical Best Linear Unbiased Predictor (EBLUP), since the weights underpinning this estimator will still put most emphasis on data from the small area of interest, and the estimates of the model parameters underpinning the estimator will themselves be destabilised by the sample outliers. Consequently, it is of some interest to see how outlier robust survey estimation can be adapted to this situation.

Chambers and Tzavidis (2006) explicitly address this issue of outlier robustness, using an approach based on fitting outlier robust M-quantile models to the survey data. Recently, Sinha and Rao (2009) have also addressed this issue from the perspective of linear mixed models. Both these approaches, however, use plug-in robust prediction. That is, they replace parameter estimates in optimal, but outlier sensitive, predictors by outlier robust versions (a Robust Projective approach). Unfortunately, this approach may involve an unacceptable prediction bias (but a low prediction variance) in situations where the outliers are drawn from a distribution that has a different mean to the rest of the survey data.

After discussing Robust Projective estimators for small areas in Section 2, we explore the extension of the Chambers (1986) Robust Predictive approach to the small area estimation situation in Section 3. In Section 4 we propose two different analytical mean squared error (MSE) estimators for outlier robust predictors of small area means. In particular, the first proposal is based on bias-robust mean squared error estimation discussed by Chambers et al. (2007) and represents an extension of the ideas in Royall and Cumberland (1978). We show how this approach can be useful for estimating the MSE of small area predictors based on the Sinha and Rao (2009) approach. The second MSE estimator is developed under the conditional version of the linear mixed

model and it uses first order approximations to the variances of solutions of estimating equations. This last approach can be used for estimating the MSE of a wide variety of small area 'pseudo-linear' predictors, i.e. predictors that can be written as weighted sums, where the weights are sample data dependent. Examples of such predictors are mixed model and M-quantile model-based predictors under both the Robust Projective and the Robust Predictive approaches. In Sections 5 and 6 we use model-based simulations based on realistic outlier contaminated data scenarios as well as design-based simulations to evaluate how these two different approaches compare, both in terms of estimation performance as well as in terms of MSE estimation performance. Section 7 concludes the paper with some final remarks, and a discussion of future research aimed at outlier robust small area inference.

2. Robust Projective Estimation for Small Areas

In what follows we assume that unit record data are available at small area level. For the sampled units in the population this consists of indicators of small area affiliation, values $y_j$ of the variable of interest, values $x_j$ of a $p \times 1$ vector of individual level covariates, and values $z_j$ of a vector of area level covariates. For the non-sampled population units we do not know the values of $y_j$. However it is assumed that all areas are sampled and that we know the numbers of such units in each small area and the respective small area averages of $x_j$ and $z_j$. We also assume that there is a linear relationship between $y_j$ and $x_j$ and that sampling is noninformative for the small area distribution of $y_j$ given $x_j$, allowing us to use population level models with the sample data.

A popular way of using the above data in small area estimation is to assume a linear mixed model, with random effects for the small areas of interest (see Rao, 2003). Let $y$, $X$ and $Z$ denote the population level vector and matrices defined by $y_j$, $x_j$ and $z_j$ respectively. Then

$$y = X\beta + Zu + e \qquad (1)$$

where $u \sim N(0, \Sigma_u)$ is a vector of $mq$ area-specific random effects and $e \sim N(0, \Sigma_e)$ is a vector of $N$

individual specific random effects. Here $m$ is the number of small areas that make up the population and $q$ is the dimension of $z_j$. It is assumed that the covariance matrices $\Sigma_u$ and $\Sigma_e$ are defined in terms of a lower dimensional set of parameters $\theta = (\theta_1, \ldots, \theta_K)$, which are typically referred to as the variance components of (1), while $\beta$ is usually referred to as its fixed effect. Let $\hat\beta$ and $\hat u$ denote estimates of the fixed and random effects in (1). The EBLUP of the area $i$ mean of the $y_j$ under (1) is then

$$\hat{\bar y}_i^{\text{EBLUP}} = N_i^{-1}\left\{ n_i \bar y_{si} + (N_i - n_i)\left(\bar x_{ri}^T \hat\beta + \bar z_{ri}^T \hat u_i\right) \right\} \qquad (2)$$

where $\hat u_i$ denotes the vector of the estimated area $i$ specific random effects and we use indices of $s$ and $r$ to denote sample and non-sample quantities respectively. Thus $\bar y_{si}$ is the average of the $n_i$ sample values of $y_j$ from area $i$, with $\bar x_{ri}$ and $\bar z_{ri}$ denoting the vectors of average values of $x_j$ and $z_j$ respectively for the $N_i - n_i$ non-sampled units in the same area.

From a Robust Projective viewpoint, (2) can be made insensitive to sample outliers by replacing $\hat\beta$ and $\hat u_i$ by outlier robust alternatives. To motivate this approach, we initially assume the variance components $\theta$ are known, so the covariance matrices $\Sigma_u$ and $\Sigma_e$ in (1) are known. Put $V_s = \Sigma_{es} + Z_s \Sigma_u Z_s^T$ where $\Sigma_{es}$ denotes the sample component of $\Sigma_e$. Then the BLUE of the fixed effect vector $\beta$ is

$$\tilde\beta = \left\{ X_s^T V_s^{-1} X_s \right\}^{-1} X_s^T V_s^{-1} y_s, \qquad (3)$$

while the BLUP of the random effects vector $u$ is

$$\tilde u = \Sigma_u Z_s^T V_s^{-1} \left( y_s - X_s \tilde\beta \right). \qquad (4)$$

It is easy to see that (3) and (4) are solutions to

$$X_s^T V_s^{-1} ( y_s - X_s \beta ) = 0 \qquad (5)$$

and

$$\Sigma_u Z_s^T V_s^{-1} ( y_s - X_s \beta ) - u = 0. \qquad (6)$$

A straightforward way to make the solutions to (5) and (6) robust to sample outliers is therefore to replace

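them by

$$X_s^T V_s^{-1/2} \psi\left( V_s^{-1/2} \{ y_s - X_s \beta \} \right) = 0 \qquad (7)$$

and

$$\Sigma_u Z_s^T V_s^{-1/2} \psi\left( V_s^{-1/2} \{ y_s - X_s \beta \} \right) - \Sigma_u^{1/2} \psi\left( \Sigma_u^{-1/2} u \right) = 0. \qquad (8)$$

Here $\psi$ is a bounded influence function and $\psi(a)$ denotes the vector defined by applying $\psi$ to every component of $a$. Unfortunately, since $V_s$ is not a diagonal matrix, the solution to (8) can be numerically unstable. An alternative approach was therefore suggested by Fellner (1986), who noted that any solution to (5) and (6) was also a solution to

$$X_s^T \Sigma_{es}^{-1} ( y_s - X_s \beta - Z_s u ) = 0$$

and

$$Z_s^T \Sigma_{es}^{-1} ( y_s - X_s \beta - Z_s u ) - \Sigma_u^{-1} u = 0.$$

He suggested that these alternative estimating equations (and hence their solutions) be made outlier robust by replacing them by

$$X_s^T \Sigma_{es}^{-1/2} \psi\left( \Sigma_{es}^{-1/2} \{ y_s - X_s \beta - Z_s u \} \right) = 0 \qquad (9)$$

and

$$Z_s^T \Sigma_{es}^{-1/2} \psi\left( \Sigma_{es}^{-1/2} \{ y_s - X_s \beta - Z_s u \} \right) - \Sigma_u^{-1/2} \psi\left( \Sigma_u^{-1/2} u \right) = 0. \qquad (10)$$

Since (9) and (10) assume the variance components $\theta$ are known, their usefulness is somewhat limited unless outlier robust estimators of these parameters can also be defined. This is an issue investigated by Richardson and Welsh (1995). These authors propose two outlier robust variations to the maximum likelihood estimating equations for $\theta$. One of these (ML Proposal II) leads to an estimating equation for the variance component $\theta_k$ of $\theta$ of the form

$$\psi\left\{ ( y_s - X_s \beta )^T V_s^{-1/2} \right\} V_s^{-1/2} \left( \partial V_s / \partial\theta_k \right) V_s^{-1/2} \psi\left\{ V_s^{-1/2} ( y_s - X_s \beta ) \right\} = \mathrm{tr}\left\{ D_n \left( \partial V_s / \partial\theta_k \right) \right\} \qquad (11)$$

where $\partial V_s / \partial\theta_k$ denotes the first order partial derivative of $V_s$ with respect to the variance component $\theta_k$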

and, for $Z \sim N(0,1)$,

$$D_n = E\{ \psi^2(Z) \} V_s^{-1}. \qquad (12)$$

Sinha and Rao (2009) describe an approach to outlier robust estimation of $\beta$ and $u$ in (1) that builds on these results, substituting approximate solutions to both (7) and (11) into the Fellner estimating equation (10) to obtain an outlier robust estimate of the area effect $u$. In particular, their approach replaces (7) by

$$X_s^T V_s^{-1} U_s^{1/2} \psi\left( U_s^{-1/2} \{ y_s - X_s \beta \} \right) = 0 \qquad (13)$$

where $U_s = \mathrm{diag}( V_s )$, and replaces (11) by

$$\psi\left\{ ( y_s - X_s \beta )^T U_s^{-1/2} \right\} U_s^{1/2} V_s^{-1} \left( \partial V_s / \partial\theta_k \right) V_s^{-1} U_s^{1/2} \psi\left\{ U_s^{-1/2} ( y_s - X_s \beta ) \right\} = \mathrm{tr}\left\{ D_n \left( \partial V_s / \partial\theta_k \right) \right\}. \qquad (14)$$

Since the solutions to (13) and (14) depend on the influence function $\psi$, we denote them by a superscript of $\psi$ below. The Sinha and Rao (2009) Robust Projective alternative to (2) is then

$$\hat{\bar y}_i^{\text{SR}} = \bar x_i^T \hat\beta^\psi + \bar z_i^T \hat u_i^\psi. \qquad (15)$$

Note that (15) estimates the area $i$ mean under (1). A minor modification restricts this to the mean of the non-sampled units in area $i$, in which case (15) becomes

$$\hat{\bar y}_i^{\text{REBLUP}} = N_i^{-1}\left\{ n_i \bar y_{si} + (N_i - n_i)\left( \bar x_{ri}^T \hat\beta^\psi + \bar z_{ri}^T \hat u_i^\psi \right) \right\}. \qquad (16)$$

Hereafter we call this estimator Robust EBLUP (REBLUP).

An alternative methodology for outlier robust small area estimation is the M-quantile regression-based method described by Chambers and Tzavidis (2006). This is based on a linear model for the M-quantile regression of $y$ on $X$, i.e.

$$m_q(X) = X\beta_q \qquad (17)$$

where $m_q(X)$ denotes the M-quantile of order $q$ of the conditional distribution of $y$ given $X$. An estimate $\hat\beta_q$ of $\beta_q$ can be calculated for any value of $q$ in the interval (0,1), and for each unit in sample we define its unique M-quantile coefficient under this fitted model as the value $q_j$ such that $y_j = x_j^T \hat\beta_{q_j}$, with the sample average of these coefficients in area $i$ denoted by $\bar q_i$. The M-quantile estimate of the mean of $y_j$ in

area $i$ is then

$$\hat{\bar y}_i^{\text{MQ}} = N_i^{-1}\left\{ n_i \bar y_{si} + (N_i - n_i)\, \bar x_{ri}^T \hat\beta_{\bar q_i} \right\}. \qquad (18)$$

Note that the regression M-quantile model (17) depends on the influence function $\psi$ underpinning the M-quantile. When this function is bounded, sample outliers have limited impact on $\hat\beta_q$. That is, (18) corresponds to assuming that all non-sample units in area $i$ follow the working model (17) with $q = \bar q_i$, in the sense that one can write $y_j = x_j^T \beta_{\bar q_i} + \text{noise}$ for all such units.

3. Robust Predictive Estimation for Small Areas

A problem with the Robust Projective approach is that it assumes all non-sampled units follow the working model, or, in what essentially amounts to the same thing, that any deviations from this model are noise and so cancel out on average. Thus, under the linear mixed model (1) one can see that provided the individual errors of the non-sampled units are symmetrically distributed about zero, the REBLUP (16) of Sinha and Rao (2009) will perform well since it is based on the implicit assumption that the average of these errors over the non-sampled units in area $i$ converges to zero. The M-quantile estimator MQ (18) is no different since it assumes that the errors $y_j - x_j^T \beta_{\bar q_i}$ from the area $i$-specific M-quantile regression model are noise and hence also cancel out on average. Note that this does not mean that these non-sample units are not outliers. It is just that their behaviour is such that our best prediction of their corresponding average value is zero.

Welsh and Ronchetti (1998) consider the issue of outlier robust prediction within the context of population level survey estimation. Starting with a working linear model linking the population values of $y$ and $x$, and sample data containing representative outliers with respect to this model, they extend the approach of Chambers (1986) to robust prediction of the empirical distribution function of the population values of $y$. Their argument immediately applies to robust prediction of the empirical distribution function of the area $i$ values of $y$, and leads to a predictor of the form

$$\hat F_i(t) = N_i^{-1}\left[ \sum_{j \in s_i} I( y_j \le t ) + n_i^{-1} \sum_{j \in s_i} \sum_{k \in r_i} I\left( x_k^T \hat\beta^\psi + \omega_{ij}^\psi\, \phi\left\{ ( y_j - x_j^T \hat\beta^\psi ) / \omega_{ij}^\psi \right\} \le t \right) \right]. \qquad (19)$$

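Here $\hat\beta^\psi$ denotes an M-estimator of the regression parameter in the linear working model based on a bounded influence function $\psi$, $\omega_{ij}^\psi$ is a robust estimator of the scale of the residual $y_j - x_j^T \hat\beta^\psi$ in area $i$ and $\phi$ denotes a bounded influence function that satisfies $|\phi| \ge |\psi|$. Tzavidis et al. (2009) note that the robust estimator of the area $i$ mean of the $y_j$ defined by (19) is just the expected value functional defined by it, which is

$$\hat{\bar y}_i = \int t \, d\hat F_i(t) = N_i^{-1}\left[ n_i \bar y_{si} + (N_i - n_i)\left\{ \bar x_{ri}^T \hat\beta^\psi + n_i^{-1} \sum_{j \in s_i} \omega_{ij}^\psi\, \phi\left\{ ( y_j - x_j^T \hat\beta^\psi ) / \omega_{ij}^\psi \right\} \right\} \right]. \qquad (20)$$

These authors therefore suggest an extension to the M-quantile estimator (18) by replacing $\hat\beta^\psi$ in (20) by $\hat\beta_{\bar q_i}$, which leads to a bias-corrected version of (18), hereafter MQ-BC, given by

$$\hat{\bar y}_i^{\text{MQ-BC}} = N_i^{-1}\left[ n_i \bar y_{si} + (N_i - n_i)\left\{ \bar x_{ri}^T \hat\beta_{\bar q_i} + n_i^{-1} \sum_{j \in s_i} \omega_{ij}^{\text{MQ}}\, \phi\left\{ ( y_j - x_j^T \hat\beta_{\bar q_i} ) / \omega_{ij}^{\text{MQ}} \right\} \right\} \right] \qquad (21)$$

where $\omega_{ij}^{\text{MQ}}$ is a robust estimator of the scale of the residual $y_j - x_j^T \hat\beta_{\bar q_i}$ in area $i$. The use of the two influence functions $\psi$ and $\phi$ in (21) is worthy of comment. The first, $\psi$, underpins $\hat\beta_q$, and hence $\hat\beta_{\bar q_i}$. Its purpose is to ensure that sample outliers have little or no influence on the fit of the working M-quantile model. As a consequence it is bounded and down-weights these outliers. The second, $\phi$, is still bounded but less restrictive than $\psi$ (since $|\phi| \ge |\psi|$) and its purpose is to define an adjustment for the bias caused by the fact that the first two terms on the right hand side of (21) treat sample outliers as self-representing.

A similar argument can be used to modify the REBLUP (16). In particular, a Robust Predictive version of this estimator, hereafter REBLUP-BC, mimics the bias correction idea used in (21) and leads to

$$\hat{\bar y}_i^{\text{REBLUP-BC}} = \hat{\bar y}_i^{\text{REBLUP}} + ( 1 - n_i N_i^{-1} )\, n_i^{-1} \sum_{j \in s_i} \omega_{ij}^\psi\, \phi\left\{ ( y_j - x_j^T \hat\beta^\psi - z_j^T \hat u_i^\psi ) / \omega_{ij}^\psi \right\}, \qquad (22)$$

where the $\omega_{ij}^\psi$ are now robust estimates of the scale of the area $i$ residuals $y_j - x_j^T \hat\beta^\psi - z_j^T \hat u_i^\psi$.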
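The effect of the bias correction can be seen in a small numerical sketch. The Python code below is an illustration only, not the authors' implementation: it assumes a single scalar covariate, omits the random effect term, treats the robust regression coefficient and the residual scale as already estimated, and uses a Huber-type function for the correction. The function names, the data and the tuning constant 3.0 are all hypothetical.

```python
from statistics import mean


def huber(t, c):
    """Huber-type influence function: identity on [-c, c], clipped outside."""
    return max(-c, min(c, t))


def projective_mean(y_s, xbar_r, beta, N):
    """Plug-in predictor: observed sample total plus working-model
    fitted values for the N - n non-sampled units."""
    n = len(y_s)
    return (sum(y_s) + (N - n) * xbar_r * beta) / N


def predictive_mean(y_s, x_s, xbar_r, beta, N, scale, c_phi):
    """Bias-corrected predictor: adds the average of the bounded,
    rescaled sample residuals to each non-sampled prediction."""
    n = len(y_s)
    correction = mean(
        scale * huber((y - x * beta) / scale, c_phi) for y, x in zip(y_s, x_s)
    )
    return (sum(y_s) + (N - n) * (xbar_r * beta + correction)) / N


# Hypothetical area: N = 50 units, n = 5 sampled, intercept-only covariate.
y_s = [9.0, 10.0, 11.0, 10.0, 40.0]  # last value plays the representative outlier
x_s = [1.0] * 5
beta_robust = 10.0                   # assumed outlier-robust fit
proj = projective_mean(y_s, xbar_r=1.0, beta=beta_robust, N=50)
pred = predictive_mean(y_s, x_s, xbar_r=1.0, beta=beta_robust, N=50,
                       scale=1.0, c_phi=3.0)
print(round(proj, 2), round(pred, 2))
```

With these made-up numbers the plug-in value is 10.6 and the bias-corrected value is 11.14. The outlying residual of 30 is clipped to 3 rather than discarded, so it still shifts the prediction for the non-sampled units, but by a bounded amount.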

4. MSE Estimation for Robust Predictors

In this Section we propose two different MSE estimators for robust predictors of small area means under the Robust Projective and Robust Predictive approaches. In Section 4.1 we apply the ideas set out by Chambers et al. (2007) to develop a pseudo-linearization estimator of the MSE of REBLUP and REBLUP-BC. In Section 4.2 we use first order approximations to the variances of solutions of estimating equations to develop MSE estimators, under the conditional version of the linear mixed model, for the REBLUP, EBLUP and MQ predictors for small area means.

4.1 Bias-robust MSE estimation for REBLUP and REBLUP-BC

Sinha and Rao (2009) proposed a computationally intensive parametric bootstrap-based estimator for the MSE of REBLUP. An alternative MSE is the one that conditions on the realised values of the area effects (see Longford, 2007). In what follows we propose an estimator of the conditional MSE of the REBLUP and REBLUP-BC that is much less computationally demanding than the unconditional MSE estimators suggested by Sinha and Rao (2009). The proposed estimator is based on the pseudo-linearization approach to MSE estimation described by Chambers et al. (2007). See also Chandra and Chambers (2005, 2009) and Chandra et al. (2007). The MSE estimator can be used for predictors that can be expressed as weighted sums of the sample values. For this reason we re-express REBLUP (16) and REBLUP-BC (22) in a pseudo-linear form, and then apply heteroskedasticity-robust prediction variance estimation methods that treat these weights (which typically depend on estimated variance components) as fixed. More precisely, under model (1) the Robust BLUP of $\bar y_i$ can be expressed as

$$\hat{\bar y}_i^{\text{RBLUP}} = \sum_{j \in s} w_{ij}^{\text{RBLUP}} y_j = \left( w_{is}^{\text{RBLUP}} \right)^T y_s, \qquad (23)$$

where

$$\left( w_{is}^{\text{RBLUP}} \right)^T = N_i^{-1}\left[ 1_{s_i}^T + (N_i - n_i)\left\{ \bar x_{ri}^T A_s + \bar z_{ri}^T B_s ( I_s - X_s A_s ) \right\} \right].$$

Here $A_s = \left( X_s^T V_s^{-1} U_s^{1/2} W_{1s} U_s^{-1/2} X_s \right)^{-1} X_s^T V_s^{-1} U_s^{1/2} W_{1s} U_s^{-1/2}$, where $W_{1s}$ is a $n \times n$ diagonal matrix of weights

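with $j$-th component $w_{1j} = \psi\left\{ U_j^{-1/2} ( y_j - x_j^T \hat\beta^\psi ) \right\} / \left\{ U_j^{-1/2} ( y_j - x_j^T \hat\beta^\psi ) \right\}$; $B_s = \left( Z_s^T \Sigma_{es}^{-1/2} W_{2s} \Sigma_{es}^{-1/2} Z_s + \Sigma_u^{-1/2} W_{3s} \Sigma_u^{-1/2} \right)^{-1} Z_s^T \Sigma_{es}^{-1/2} W_{2s} \Sigma_{es}^{-1/2}$, where $W_{2s}$ is a $n \times n$ diagonal matrix of weights with $j$-th component $w_{2j} = \psi\left\{ \sigma_e^{-1} ( y_j - x_j^T \hat\beta^\psi - z_j^T \hat u^\psi ) \right\} / \left\{ \sigma_e^{-1} ( y_j - x_j^T \hat\beta^\psi - z_j^T \hat u^\psi ) \right\}$; and $W_{3s}$ is a $m \times m$ diagonal matrix of weights with $i$-th component $w_{3i} = \psi\left\{ \sigma_u^{-1} \hat u_i^\psi \right\} / \left\{ \sigma_u^{-1} \hat u_i^\psi \right\}$. The Appendix provides details on the computation of such weights. Note that the REBLUP (16) can be expressed in exactly the same way, except that all quantities in the vector $w_{is}^{\text{RBLUP}}$ that depend on (unknown) variance components now need a 'hat'.

Given this pseudo-linear representation for the REBLUP, we develop a simple first order approximation to its MSE assuming the conditional version of the model (1), i.e. the random effects are considered as fixed. In this case we can apply the approach described by Royall and Cumberland (1978) to estimate the prediction variance of the RBLUP for $\bar y_i$. Let $I(j \in i)$ denote the indicator for whether unit $j$ is in area $i$. Then

$$\mathrm{Var}\left( \hat{\bar y}_i^{\text{RBLUP}} - \bar y_i \mid X, u \right) = N_i^{-2}\left[ \sum_{j \in s} \left\{ N_i w_{ij}^{\text{RBLUP}} - I(j \in i) \right\}^2 \mathrm{Var}( y_j \mid x_j, u ) + \sum_{j \in r_i} \mathrm{Var}( y_j \mid x_j, u ) \right], \qquad (24)$$

where the first term on the right hand side above is estimated by replacing $\mathrm{Var}( y_j \mid x_j, u )$ by $\hat\lambda_j^{-1} ( y_j - \hat\mu_j )^2$, where $\hat\mu_j = \sum_{k \in s} \phi_{kj} y_k$ is an unbiased linear estimator of the conditional expected value $\mu_j = E( y_j \mid x_j, u )$ and $\hat\lambda_j = ( 1 - \phi_{jj} )^2 + \sum_{k \in s(-j)} \phi_{kj}^2$ is a scaling constant. Further details can be found in Chambers et al. (2007) and Salvati et al. (2009). The estimator of the conditional prediction variance of the RBLUP is

$$\hat V\left( \hat{\bar y}_i^{\text{RBLUP}} \right) = N_i^{-2} \sum_{j \in s} \left\{ a_{ij}^2 + (N_i - n_i)\, n^{-1} \right\} \hat\lambda_j^{-1} ( y_j - \hat\mu_j )^2, \qquad (25)$$

where $a_{ij} = N_i w_{ij}^{\text{RBLUP}} - I(j \in i)$. Due to the well-known shrinkage effect associated with BLUPs, replacing $\hat\mu_j$ by the BLUP of $\mu_j$ under (1) in expression (25) can lead to biased estimation of the prediction variance under the conditional model. For this reason, Chambers et al. (2007) recommend that $\hat\mu_j$ be computed as the 'unshrunken' version of the BLUP for $\mu_j$: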

$$\hat\mu_j = x_j^T \hat\beta^\psi + z_j^T B_s u. \qquad (26)$$

The conditional bias of the RBLUP under (1) is given by

$$E\left( \hat{\bar y}_i^{\text{RBLUP}} - \bar y_i \mid X, u \right) = \sum_{j \in s} w_{ij}^{\text{RBLUP}} \mu_j - N_i^{-1} \sum_{j \in ( r_i \cup s_i )} \mu_j,$$

which has the simple plug-in estimator

$$\hat B\left( \hat{\bar y}_i^{\text{RBLUP}} \right) = \sum_{j \in s} w_{ij}^{\text{RBLUP}} \hat\mu_j - N_i^{-1} \sum_{j \in ( r_i \cup s_i )} \hat\mu_j, \qquad (27)$$

with $\hat\mu_j$ defined by expression (26). The estimator of the conditional MSE of the RBLUP can finally be written as

$$\widehat{\mathrm{MSE}}\left( \hat{\bar y}_i^{\text{RBLUP}} \right) = \hat V\left( \hat{\bar y}_i^{\text{RBLUP}} \right) + \left\{ \hat B\left( \hat{\bar y}_i^{\text{RBLUP}} \right) \right\}^2. \qquad (28)$$

The conditional MSE of the REBLUP (16) is then estimated by replacing all unknown variance components in (28) by their estimated values. Note that:

(a) $\hat\lambda_j = 1 + O(n^{-1})$ in this case, so that $\hat\lambda_j$ will be very close to one in most practical applications. This suggests that there is little to be gained by not setting $\hat\lambda_j \equiv 1$ when calculating the conditional prediction variance (25);

(b) the square of the bias estimator (27) can be biased for the squared bias term in the MSE estimator. This bias can be corrected (see Chambers et al., 2007), but a small sample size could lead to this correction becoming unstable, so we prefer to use (28) since this is then a conservative estimator of the MSE of the predictor of the small area mean under model (1);

(c) the heteroskedasticity-robust MSE estimator (28) ignores the extra variability associated with estimation of the variance components, and is therefore a first order approximation to the actual conditional MSE of the REBLUP. Since use of the REBLUP will typically require a large overall sample size, we expect any consequent underestimation of the conditional MSE of the REBLUP to be small.

The conditional MSE estimator for the REBLUP-BC (22) is obtained using the same heteroskedasticity-robust pseudo-linearization approach as outlined above for the MSE estimator for the REBLUP. The only difference from that development is that the weights $w_{ij}^{\text{RBLUP}}$ used in (23) are now replaced by corresponding REBLUP-BC weights

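$$\left( w_{is}^{\text{RBLUP-BC}} \right)^T = N_i^{-1}\left[ 1_{s_i}^T + \frac{N_i - n_i}{n_i} w_{s_i}^T + (N_i - n_i)\left\{ \bar x_{ri} - n_i^{-1} \sum_{j \in s_i} w_{ij} x_j \right\}^T A_s + (N_i - n_i)\left\{ \bar z_{ri} - n_i^{-1} \sum_{j \in s_i} w_{ij} z_j \right\}^T B_s \{ I_s - X_s A_s \} \right], \qquad (29)$$

where $w_{s_i}$ is the $n$-vector with $j$-th component $w_{ij}$ for $j \in s_i$ and zero otherwise, and

$$w_{ij} = \frac{\omega_{ij}^\psi\, \phi\left\{ ( y_j - x_j^T \hat\beta^\psi - z_j^T \hat u^\psi ) / \omega_{ij}^\psi \right\}}{y_j - x_j^T \hat\beta^\psi - z_j^T \hat u^\psi}.$$

Since the REBLUP-BC is an approximately unbiased estimator of the small area mean, the squared bias term does not impact significantly on the mean squared error estimator, and so is typically omitted.

4.2 Linearization-Based MSE estimation for small area predictors

In this Section we propose a new MSE estimator, extending the linearization approach of Street et al. (1988) to estimation of prediction variance for estimators based on robust estimating equations. The MSE estimator is developed on the assumption that the working model for inference is an area-specific linear model, and so the approach conditions on area effects when applied in the context of such a model. In what follows we show how this approach can be used for estimating the MSE of the REBLUP (16), the EBLUP (2) and the MQ estimator (18). The MSE estimators of REBLUP-BC and MQ-BC are reported in the Appendix. Note that when used with an estimator based on a mixed model, the proposed MSE estimator provides a second order approximation to the true MSE since it includes a term for the contribution to variability from estimation of variance components.

MSE estimation for REBLUP

Under model (1) the prediction variance of the Robust BLUP of $\bar y_i$ can be expressed as

$$\mathrm{Var}\left( \hat{\bar y}_i^{\text{RBLUP}} - \bar y_i \right) = \mathrm{Var}\left\{ \left( 1 - \frac{n_i}{N_i} \right)\left( \bar x_{ri}^T \hat\beta^\psi + \bar z_{ri}^T \hat u_i^\psi - \bar y_{ri} \right) \right\} = \left( 1 - \frac{n_i}{N_i} \right)^2 \bar x_{ri}^T \mathrm{Var}( \hat\beta^\psi ) \bar x_{ri} + \left( 1 - \frac{n_i}{N_i} \right)^2 \bar z_{ri}^T \mathrm{Var}( \hat u_i^\psi ) \bar z_{ri} + \left( 1 - \frac{n_i}{N_i} \right)^2 \mathrm{Var}( \bar e_{ri} ), \qquad (30)$$

assuming independence between $\hat\beta^\psi$ and $\hat u_i^\psi$. It follows that we need to estimate $\mathrm{Var}( \hat\beta^\psi )$ and $\mathrm{Var}( \hat u_i^\psi )$ in order to be able to calculate an estimate of the prediction variance of the RBLUP. In order to do this, put $\delta = ( \beta^T, u^T )^T$, so $\hat\delta^\psi = ( \hat\beta^{\psi T}, \hat u^{\psi T} )^T$. Then, from equations (10) and (13), $H( \hat\delta^\psi ) = 0$ where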

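$$H(\theta) = \begin{pmatrix} H_\beta(\theta) \\ H_u(\theta) \end{pmatrix} = \begin{pmatrix} X_s^T V_s^{-1} U_s^{1/2} \psi\{U_s^{-1/2}(y_s - X_s\beta)\} \\ Z_s^T \Sigma_{es}^{-1/2} \psi\{\Sigma_{es}^{-1/2}(y_s - X_s\beta - Z_s u)\} - \Sigma_u^{-1/2} \psi\{\Sigma_u^{-1/2} u\} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

Since the solutions of the equations depend on the influence function $\psi$, we denote them by a superscript of $\psi$. We can use previous results on the asymptotic variance of solutions to an estimating equation (Welsh and Richardson, 1997; Sinha and Rao, 2009) to obtain a first order approximation to $\mathrm{Var}(\hat\theta^{\psi})$ and by extension the prediction variance of the RBLUP. To do this, we note that

$$\mathrm{Var}_0(\hat\theta^{\psi}) \approx \{E_0(\partial_\theta H_0)\}^{-1}\, \mathrm{Var}_0\{H(\theta_0)\}\, [\{E_0(\partial_\theta H_0)\}^{-1}]^T$$

where

$$\mathrm{Var}_0(H_\beta(\theta_0)) = \mathrm{Var}_0\big[\psi\{U_j^{-1/2}(y_j - x_j^T\beta_0)\}\big]\, X_s^T V_s^{-1} U_s V_s^{-1} X_s,$$
$$\mathrm{Var}_0(H_u(\theta_0)) = E_0\big[\psi^2\{\sigma_e^{-1}(y_j - x_j^T\beta_0 - z_j^T u_0)\}\big]\, Z_s^T \Sigma_{es}^{-1} Z_s, \tag{31}$$

and

$$E_0\{\partial_\beta H_\beta(\theta_0)\} = -X_s^T V_s^{-1} U_s^{1/2}\, E_0\big[\psi'\{U_s^{-1/2}(y_s - X_s\beta_0)\}\big]\, U_s^{-1/2} X_s,$$
$$E_0\{\partial_u H_u(\theta_0)\} = -Z_s^T \Sigma_{es}^{-1/2}\, E_0\big[\psi'\{\Sigma_{es}^{-1/2}(y_s - X_s\beta_0 - Z_s u_0)\}\big]\, \Sigma_{es}^{-1/2} Z_s - \Sigma_u^{-1/2}\, E_0\big[\psi'\{\Sigma_u^{-1/2} u_0\}\big]\, \Sigma_u^{-1/2}. \tag{32}$$

The previous expressions lead to the estimators:

$$\widehat{\mathrm{Var}}(\hat\beta^{\psi}) = [\hat E\{\partial_\beta H_\beta(\theta_0)\}]^{-1}\, \widehat{\mathrm{Var}}\{H_\beta(\theta_0)\}\, [\hat E\{\partial_\beta H_\beta(\theta_0)\}]^{-1},$$
$$\widehat{\mathrm{Var}}(\hat u^{\psi}) = [\hat E\{\partial_u H_u(\theta_0)\}]^{-1}\, \widehat{\mathrm{Var}}\{H_u(\theta_0)\}\, [\hat E\{\partial_u H_u(\theta_0)\}]^{-1}, \tag{33}$$

where

$$\hat E\{\partial_\beta H_\beta(\theta_0)\} = X_s^T V_s^{-1} U_s^{1/2} R\, U_s^{-1/2} X_s, \qquad \hat E\{\partial_u H_u(\theta_0)\} = Z_s^T \Sigma_{es}^{-1/2}\, T\, \Sigma_{es}^{-1/2} Z_s + \Sigma_u^{-1/2} Q\, \Sigma_u^{-1/2},$$
$$\widehat{\mathrm{Var}}\{H_\beta(\theta_0)\} = (n - p)^{-1} \kappa_1^2 \sum_{j=1}^{n} \psi^2(r_j)\, X_s^T V_s^{-1} U_s V_s^{-1} X_s, \quad\text{and}\quad \widehat{\mathrm{Var}}\{H_u(\theta_0)\} = (n - p)^{-1} \kappa_2^2 \sum_{j=1}^{n} \psi^2(t_j)\, Z_s^T \Sigma_{es}^{-1} Z_s.$$

Here, assuming use of a Huber Proposal 2 influence function, $R$ is an $n \times n$ diagonal matrix whose $j$-th diagonal element is 1 if $-c < r_j < c$, 0 otherwise, with $r_j = U_j^{-1/2}(y_j - x_j^T\hat\beta^{\psi})$; the constant $c$ represents the cutoff of the bounded influence function; $T$ is a diagonal matrix of dimension $n \times n$ with $j$-th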

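diagonal element equal to 1 if $-c < t_j < c$, 0 otherwise, with $t_j = \hat\sigma_e^{-1}(y_j - x_j^T\hat\beta^{\psi} - z_j^T\hat u^{\psi})$; $Q$ is an $m \times m$ diagonal matrix with $i$-th diagonal element equal to 1 if $-c < q_i < c$, 0 otherwise, with $q_i = (\hat\sigma_u^2)^{-1/2}\hat u_i^{\psi}$. The values

$$\kappa_1 = 1 + \frac{p}{n}\, \frac{\mathrm{Var}(\psi'(r_j))}{\{E(\psi'(r_j))\}^2} \quad\text{and}\quad \kappa_2 = 1 + \frac{p}{n}\, \frac{\mathrm{Var}(\psi'(t_j))}{\{E(\psi'(t_j))\}^2}$$

are bias corrector terms (Huber, 1981).

An estimator of the prediction variance of RBLUP can be written as:

$$\hat V(\hat{\bar y}_i^{RBLUP} - \bar y_i) = h_1(\delta) + h_2(\delta) + h_3(\delta) \tag{34}$$

where $h_1(\delta) = (1 - n_i/N_i)^2\, \bar z_{ri}^T \hat V(\hat u^{\psi}) \bar z_{ri}$ is due to the estimation of random effects, while the second term $h_2(\delta) = (1 - n_i/N_i)^2\, \bar x_{ri}^T \hat V(\hat\beta^{\psi}) \bar x_{ri}$ is due to the estimator $\hat\beta^{\psi}$. The term $h_3(\delta) = (1 - n_i/N_i)^2\, \hat V(\bar e_{ri})$ can be estimated from the area data:

$$\hat V(\bar e_{ri}) = \frac{1}{(N_i - n_i)(n_i - 1)} \sum_{j \in s_i} (y_j - x_j^T\hat\beta^{\psi} - z_j^T\hat u^{\psi})^2,$$

or from the entire data set:

$$\hat V(\bar e_{ri}) = \frac{1}{(N_i - n_i)(n - 1)} \sum_{h} \sum_{j \in s_h} (y_j - x_j^T\hat\beta^{\psi} - z_j^T\hat u^{\psi})^2.$$

Moreover, since we are working under the conditional approach, we have to add to the variance estimator (34) an estimator of the squared bias term. The result is that the estimator of the conditional MSE of the RBLUP can be written as:

$$\widehat{MSE}(\hat{\bar y}_i^{RBLUP}) = h_1(\delta) + h_2(\delta) + h_3(\delta) + \{\hat B(\hat{\bar y}_i^{RBLUP})\}^2, \tag{35}$$

where $\hat B(\hat{\bar y}_i^{RBLUP})$ is the expression (27) developed in the previous Section. The corresponding estimator of the conditional MSE of the REBLUP (16) is obtained by adding an extra component to expression (35) due to the variability of the estimated variance components:

$$MSE(\hat{\bar y}_i^{REBLUP}) = MSE(\hat{\bar y}_i^{RBLUP}) + E(\hat{\bar y}_i^{REBLUP} - \hat{\bar y}_i^{RBLUP})^2. \tag{36}$$

The last term is intractable and it is therefore necessary to approximate it. An approximation of this term is obtained by Taylor approximation following the results of Prasad and Rao (1990). Under the conditional approach

$$\hat{\bar y}_i^{REBLUP} - \hat{\bar y}_i^{RBLUP} \approx N_i^{-1} \sum_{j \in r_i} z_j^T \sum_{k=1}^{2} (\partial_{\delta_k} B_s)(y_s - X_s\hat\beta^{\psi})\{\hat\delta_k - \delta_k\},$$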

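where $B_s$ is defined as in the previous Section, and $\delta = (\sigma_u^2, \sigma_e^2)$ is the vector of the variance components. Assuming that the derivative of $(y_s - X_s\hat\beta^{\psi})$ with respect to $\delta$ is of lower order, the term $E(\hat{\bar y}_i^{REBLUP} - \hat{\bar y}_i^{RBLUP})^2$ in (36) is then estimated by

h 4 1 ( )= N z r Var( ˆ 1 u, ˆ e ) N z r + om ( ) 1 (37) where ( ) z u {( ) z l u } ( ) = u, e B s ( )+ e I( = l) u l, e B s. k=1

Note that $\mathrm{Var}(\hat\sigma_u, \hat\sigma_e)$ in (37) is obtained using the results of the asymptotic distribution of $(\hat\sigma_u, \hat\sigma_e)$ given in Sinha and Rao (2009). The MSE estimator of the REBLUP (16) then becomes:

$$MSE(\hat{\bar y}_i^{REBLUP}) = h_1(\delta) + h_2(\delta) + h_3(\delta) + \{\hat B(\hat{\bar y}_i^{RBLUP})\}^2 + h_4(\delta). \tag{38}$$

An estimator of $MSE(\hat{\bar y}_i^{REBLUP})$ can be obtained by replacing all unknown variance components in (38) by their estimated values $\hat\delta$. This corresponds to substituting $\theta^{\psi} = (\beta^{\psi}, u^{\psi})$ by $\hat\theta^{\psi} = (\hat\beta^{\psi}, \hat u^{\psi})$ in the MSE approximation (38) and leads to:

$$mse(\hat{\bar y}_i^{REBLUP}) = h_1(\hat\delta) + h_2(\hat\delta) + h_3(\hat\delta) + \{\hat B(\hat{\bar y}_i^{REBLUP})\}^2 + h_4(\hat\delta). \tag{39}$$

We have $E\,h_2(\hat\delta) = h_2(\delta) + o(m^{-1})$, $E\,h_3(\hat\delta) = h_3(\delta) + o(m^{-1})$ and $E\,h_4(\hat\delta) = h_4(\delta) + o(m^{-1})$ to the desired order of approximation. However, $h_1(\hat\delta)$ is not the correct estimator of $h_1(\delta)$ because its bias is generally of the same order as $h_2(\hat\delta)$, $h_3(\hat\delta)$, $h_4(\hat\delta)$. To evaluate the bias of $h_1(\hat\delta)$, we use a Taylor series expansion of $h_1(\hat\delta)$ around $\delta = (\sigma_u^2, \sigma_e^2)$:

$$h_1(\hat\delta) = h_1(\delta) + (\hat\delta - \delta)^T \nabla h_1(\delta) + \tfrac{1}{2}(\hat\delta - \delta)^T \nabla^2 h_1(\delta)(\hat\delta - \delta) = h_1(\delta) + \Delta_1 + \Delta_2.$$

If $\hat\delta$ is unbiased for $\delta$ then $E\Delta_1 = 0$. In general, if $\hat\delta$ is biased, $E\Delta_1$ is of lower order than $E\Delta_2$,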

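so

$$E\,h_1(\hat\delta) \approx h_1(\delta) + \tfrac{1}{2}\,\mathrm{tr}\big\{\nabla^2 h_1(\delta)\, E(\hat\delta - \delta)(\hat\delta - \delta)^T\big\}$$

= h 1 ( )+ 1 1 n N ( ) ( ˆ ) tr { z r h 1 ( )z Var( ˆ u, ˆ e )}+ om ( 1 ).

We denote the second term on the right hand side above by $h_5(\hat\delta)$. The estimator of the MSE of $\hat{\bar y}_i^{REBLUP}$ is then:

$$mse(\hat{\bar y}_i^{REBLUP}) = h_1(\hat\delta) + h_2(\hat\delta) + h_3(\hat\delta) + \{\hat B(\hat{\bar y}_i^{REBLUP})\}^2 + h_4(\hat\delta) + h_5(\hat\delta) \tag{40}$$

and $E\{mse(\hat{\bar y}_i^{REBLUP})\} = MSE(\hat{\bar y}_i^{REBLUP}) + o(m^{-1})$.

MSE estimation for EBLUP

The second predictor of $\bar y_i$ that we consider is the well-known EBLUP based on (1). Note that EBLUP is a particular case of REBLUP when the bounded influence function is replaced by the (unbounded) identity function. Under (1) the prediction variance of the BLUP of $\bar y_i$ is

$$\mathrm{Var}(\hat{\bar y}_i^{BLUP} - \bar y_i) = \Big(1 - \frac{n_i}{N_i}\Big)^2 \bar x_{ri}^T \mathrm{Var}(\hat\beta) \bar x_{ri} + \Big(1 - \frac{n_i}{N_i}\Big)^2 \bar z_{ri}^T \mathrm{Var}(\hat u) \bar z_{ri} + \Big(1 - \frac{n_i}{N_i}\Big)^2 \mathrm{Var}(\bar e_{ri}) \tag{41}$$

assuming independence between $\hat\beta$ and $\hat u$. Putting $\theta = (\beta^T, u^T)^T$, so $\hat\theta = (\hat\beta^T, \hat u^T)^T$, and using results on the asymptotic variance of solutions to estimating equations (Richardson and Welsh, 1997),

$$H(\theta) = \begin{pmatrix} H_\beta(\theta) \\ H_u(\theta) \end{pmatrix} = \begin{pmatrix} X_s^T \Sigma_{es}^{-1}(y_s - X_s\beta - Z_s u) \\ Z_s^T \Sigma_{es}^{-1}(y_s - X_s\beta - Z_s u) - \Sigma_u^{-1} u \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

we obtain a first order approximation to $\mathrm{Var}(\hat\theta)$ and by extension the prediction variance of the BLUP. The starting point is

$$\mathrm{Var}_0(\hat\theta) \approx \{E_0(\partial_\theta H_0)\}^{-1}\, \mathrm{Var}_0\{H(\theta_0)\}\, [\{E_0(\partial_\theta H_0)\}^{-1}]^T,$$

which leads to the estimators:

$$\widehat{\mathrm{Var}}(\hat\beta) = [\hat E\{\partial_\beta H_\beta(\theta_0)\}]^{-1}\, \widehat{\mathrm{Var}}\{H_\beta(\theta_0)\}\, [\hat E\{\partial_\beta H_\beta(\theta_0)\}]^{-1},$$
$$\widehat{\mathrm{Var}}(\hat u) = [\hat E\{\partial_u H_u(\theta_0)\}]^{-1}\, \widehat{\mathrm{Var}}\{H_u(\theta_0)\}\, [\hat E\{\partial_u H_u(\theta_0)\}]^{-1}, \tag{42}$$

where $\hat E\{\partial_\beta H_\beta(\theta_0)\} = X_s^T V_s^{-1} X_s$,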

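$\hat E\{\partial_u H_u(\theta_0)\} = Z_s^T \hat\Sigma_{es}^{-1} Z_s + \hat\Sigma_u^{-1}$,

$$\widehat{\mathrm{Var}}\{H_\beta(\theta_0)\} = (n - p)^{-1} \sum_{j=1}^{n} (y_j - x_j^T\hat\beta)^2\, X_s^T V_s^{-1} V_s^{-1} X_s,$$

and

$$\widehat{\mathrm{Var}}\{H_u(\theta_0)\} = (n - p)^{-1} \sum_{j=1}^{n} (y_j - x_j^T\hat\beta - z_j^T\hat u)^2\, Z_s^T \hat\Sigma_{es}^{-1} \hat\Sigma_{es}^{-1} Z_s.$$

An estimator of the MSE for the BLUP can therefore be written as:

$$\widehat{MSE}(\hat{\bar y}_i^{BLUP}) = h_1(\delta) + h_2(\delta) + h_3(\delta) \tag{43}$$

where $h_1(\delta) = (1 - n_i/N_i)^2\, \bar z_{ri}^T \hat V(\hat u) \bar z_{ri}$ is due to the estimation of random effects, while the second term $h_2(\delta) = (1 - n_i/N_i)^2\, \bar x_{ri}^T \hat V(\hat\beta) \bar x_{ri}$ is due to $\hat\beta$. The term $h_3(\delta) = (1 - n_i/N_i)^2\, \hat V(\bar e_{ri})$ can be estimated just using area data,

$$\hat V(\bar e_{ri}) = \frac{1}{(N_i - n_i)(n_i - 1)} \sum_{j \in s_i} (y_j - x_j^T\hat\beta - z_j^T\hat u)^2,$$

or by using all the sample data,

$$\hat V(\bar e_{ri}) = \frac{1}{(N_i - n_i)(n - 1)} \sum_{h} \sum_{j \in s_h} (y_j - x_j^T\hat\beta - z_j^T\hat u)^2.$$

Note that we have not added the squared bias estimator to (43) as we did in the REBLUP case because this bias is zero (see Chambers et al., 2007). In order to define the conditional MSE of the EBLUP, we add the term $h_4(\delta)$, see equation (37), to (43). In the case of the EBLUP predictor for the small area mean, $h_4(\delta)$ contains two differences with respect to the same expression developed for REBLUP: i) the matrix $B_s = \hat\Sigma_u Z_s^T V_s^{-1}$; ii) $\mathrm{Var}(\hat\sigma_u, \hat\sigma_e)$ is obtained using the results of the asymptotic distribution of $(\hat\sigma_u, \hat\sigma_e)$ given by Rao (2003). The MSE of the EBLUP (2) is therefore

$$MSE(\hat{\bar y}_i^{EBLUP}) = h_1(\delta) + h_2(\delta) + h_3(\delta) + h_4(\delta) \tag{44}$$

and its estimator can be written as:

$$mse(\hat{\bar y}_i^{EBLUP}) = h_1(\hat\delta) + h_2(\hat\delta) + h_3(\hat\delta) + 2h_4(\hat\delta) \tag{45}$$

since $E\,h_1(\hat\delta) \approx h_1(\delta) - h_4(\delta) + o(m^{-1})$.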
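The three terms of the prediction variance in (43) are simple to compute once estimates of $\mathrm{Var}(\hat\beta)$ and $\mathrm{Var}(\hat u)$ are available. The following is a minimal sketch, not the authors' implementation: the function names `quad_form` and `prediction_variance_terms` and all argument names are hypothetical, matrices are passed as nested lists, and $\hat V(\bar e_{ri})$ uses the residual-based formula as read here, with divisor $(N_i - n_i)(n - 1)$ where $n$ is the number of residuals supplied.

```python
def quad_form(v, m):
    """Return v' M v for a vector v and a square matrix M (nested lists)."""
    k = len(v)
    return sum(v[a] * m[a][b] * v[b] for a in range(k) for b in range(k))


def prediction_variance_terms(x_bar_r, z_bar_r, var_beta, var_u,
                              residuals, n_i, N_i):
    """Return (h1, h2, h3) for one area (hypothetical helper).

    x_bar_r, z_bar_r : means of x and z over the non-sampled units of the area
    var_beta, var_u  : estimated variance matrices of beta-hat and u-hat
    residuals        : y - x'beta-hat - z'u-hat, from the area or the full sample
    """
    f2 = (1.0 - n_i / N_i) ** 2              # squared non-sampled fraction
    h1 = f2 * quad_form(z_bar_r, var_u)      # estimation of random effects
    h2 = f2 * quad_form(x_bar_r, var_beta)   # estimation of fixed effects
    n = len(residuals)
    var_e_bar = sum(r * r for r in residuals) / ((N_i - n_i) * (n - 1))
    h3 = f2 * var_e_bar                      # non-sampled mean of unit errors
    return h1, h2, h3


# Illustrative numbers only (not taken from the paper).
h1, h2, h3 = prediction_variance_terms(
    x_bar_r=[1.0, 2.0], z_bar_r=[1.0],
    var_beta=[[0.04, 0.0], [0.0, 0.01]], var_u=[[0.5]],
    residuals=[1.0, -1.0, 2.0, -2.0, 0.0], n_i=5, N_i=100)
mse_blup = h1 + h2 + h3
```

Passing only the residuals of area $i$ gives the area-level version of $h_3$; passing all sample residuals gives the pooled version.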

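MSE estimation for MQ

The third predictor that we consider is the MQ predictor (18) based on the M-quantile approach (Chambers and Tzavidis, 2006). For fixed $q$, the prediction variance of the MQ predictor is

$$\mathrm{Var}(\hat{\bar y}_i^{MQ} - \bar y_i) = \Big(1 - \frac{n_i}{N_i}\Big)^2 \{\bar x_{ri}^T \mathrm{Var}(\hat\beta_q) \bar x_{ri}\} + \Big(1 - \frac{n_i}{N_i}\Big)^2 \mathrm{Var}(\bar e_{ri}). \tag{46}$$

It follows that we need to estimate $\mathrm{Var}(\hat\beta_q)$ in order to be able to calculate an estimate of the prediction variance of this predictor. The starting point, as usual, is the first order approximation based on the estimating equations for $\hat\beta_q$. Putting $q = q_i$,

$$\mathrm{Var}_0(\hat\beta_q) \approx \{E_0(\partial_{\beta_q} H_0)\}^{-1}\, \mathrm{Var}_0\{H(\beta_{0q})\}\, \{E_0(\partial_{\beta_q} H_0)\}^{-1} \tag{47}$$

with

$$H(\beta_{0q}) = \sum_{j=1}^{n} x_j \psi_q(r_{j0q}) = X_s^T \psi_q(r_{0q})$$

where $\psi_q$ is a bounded influence function depending on $q$, $\psi_q(r_{0q})$ is the $n$-vector with elements $\psi_q(r_{j0q}) = \psi_q\{\sigma_{0q}^{-1}(y_j - x_j^T\beta_{0q})\}$ and $\sigma_{0q}$ is a robust estimator of the scale of the residual $y_j - x_j^T\beta_{0q}$. The $\mathrm{Var}_0\{H(\beta_{0q})\}$ component of expression (47) can then be written as

$$\mathrm{Var}_0\{H(\beta_{0q})\} = X_s^T \big\{E_0\{\psi_q(r_{0q})\psi_q^T(r_{0q})\}\big\} X_s,$$

because the $y_j$ values are conditionally uncorrelated and $E_0\{\psi_q(r_{j0q})\} = 0$ for each $q$. Assuming a Huber-type influence function, we obtain

$$E_0(\partial_{\beta_q} H_{0q}) = X_s^T E_0\Big[\frac{d}{d\beta_q}\psi_q(r_{0q})\Big]_{\beta_q = \beta_{0q}} = X_s^T C X_s,$$

where $C$ is an $n \times n$ diagonal matrix with $j$-th diagonal component $2\sigma_{0q}^{-1} E_{0q}\{q I(0 < r_{j0q} \le c) + (1 - q) I(-c < r_{j0q} \le 0)\}$. These expressions then lead to two types of estimators:

1.
$$\widehat{\mathrm{Var}}(\hat\beta_q) = n(n - p)^{-1}\, \{\hat E(\partial_{\beta_q} H_{0q})\}^{-1}\, \widehat{\mathrm{Var}}\{H(\beta_{0q})\}\, \{\hat E(\partial_{\beta_q} H_{0q})\}^{-1} \tag{48}$$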

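where $\widehat{\mathrm{Var}}\{H(\beta_{0q})\} = X_s^T \hat F X_s$, $\hat F$ is a diagonal matrix of dimension $n \times n$ with $j$-th element equal to n ˆf = ŵ q ˆr q ŵ q ˆr q =1 ; $\hat E(\partial_{\beta_q} H_{0q}) = X_s^T \hat C X_s$ where $\hat C$ is an $n \times n$ diagonal matrix with $j$-th element $\hat c_j = 2\hat\sigma_q^{-1}\{q I(0 < \hat r_{jq} \le c) + (1 - q) I(-c < \hat r_{jq} \le 0)\}$. Here $\hat w_{jq}$ is the final weight in the iteratively re-weighted least squares (IRLS) process, and $\hat r_{jq} = \hat\sigma_q^{-1}(y_j - x_j^T\hat\beta_q)$. Note the factor $n(n - p)^{-1}$, which ensures agreement with Street et al. (1988) when $X_s = 1$ and $q = 0.5$.

2.
$$\widehat{\mathrm{Var}}(\hat\beta_q) = \kappa^2 \hat\sigma_q^2\, \frac{(n - p)^{-1} \sum_{j=1}^{n} \psi_q^2(\hat r_{jq})}{\big\{n^{-1} \sum_{j=1}^{n} \psi_q'(\hat r_{jq})\big\}^2}\, (X_s^T X_s)^{-1}. \tag{49}$$

That is, the Street et al. (1988) estimator when $q = 0.5$. The value

$$\kappa = 1 + \frac{p}{n}\, \frac{\mathrm{Var}(\psi_q'(\hat r_{jq}))}{\{E(\psi_q'(\hat r_{jq}))\}^2}$$

is the bias corrector term (Huber, 1981).

Depending on which of (48) or (49) is used, the estimator of the prediction variance of the MQ predictor when $q = q_i$ can be written as:

$$\hat V(\hat{\bar y}_i^{MQ} - \bar y_i) = \Big(1 - \frac{n_i}{N_i}\Big)^2 \{\bar x_{ri}^T \widehat{\mathrm{Var}}(\hat\beta_{q_i}) \bar x_{ri}\} + \Big(1 - \frac{n_i}{N_i}\Big)^2 \hat V(\bar e_{ri}) \tag{50}$$

with

$$\hat V(\bar e_{ri}) = \frac{1}{(N_i - n_i)(n - 1)} \sum_{h} \sum_{j \in s_h} (y_j - x_j^T\hat\beta_{q_h})^2.$$

Moreover, since we are taking a conditional approach, we have to add an estimator of the squared bias based on:

1 ( )= N w x ˆqk x ˆq k s k ˆB ŷmq where and b = x r N n n x s N + b w = n if b otherwise (X s W(q )X s ) 1 X s W(q ) is a $1 \times n$ vector.

The final expression for the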

MSE estimator of the MQ predictor is therefore:

$$\widehat{MSE}(\hat{\bar y}_i^{MQ}) = \Big(1 - \frac{n_i}{N_i}\Big)^2 \{\bar x_{ri}^T \widehat{\mathrm{Var}}(\hat\beta_{q_i}) \bar x_{ri}\} + \Big(1 - \frac{n_i}{N_i}\Big)^2 \hat V(\bar e_{ri}) + \{\hat B(\hat{\bar y}_i^{MQ})\}^2. \tag{51}$$

Note that (50) is a first order approximation to the asymptotic prediction variance of the MQ predictor, and so (51) could underestimate its MSE.

5. Results from Model-Based Simulation Studies

We provide model-based simulation results illustrating the comparative performances of the different outlier robust small area predictors described above. Population data are generated for $m = 40$ small areas, with samples selected by simple random sampling without replacement within each area. Population and sample sizes are the same for all areas, and are fixed at either $N_i = 100$, $n_i = 5$ or $N_i = 300$, $n_i = 15$. Values for $X$ are generated as independently and identically distributed from a lognormal distribution with a mean of 1.004077 and a standard deviation of 0.5 on the log scale. Values for $Y$ are generated as $y_{ij} = 100 + 5x_{ij} + u_i + e_{ij}$, where the random area and individual effects are independently generated according to four scenarios:

[0,0] No outliers: $u \sim N(0,3)$ and $e \sim N(0,6)$.

[e,0] Individual outliers only: $u \sim N(0,3)$ and $e \sim \delta N(0,6) + (1 - \delta) N(0,150)$, where $\delta$ is an independently generated Bernoulli random variable with $\Pr(\delta = 1) = 0.97$, i.e. the individual effects are independent draws from a mixture of two normal distributions, with 97% on average drawn from a well-behaved $N(0,6)$ distribution and 3% on average drawn from an outlier $N(0,150)$ distribution.

[0,u] Area outliers only: $u \sim N(0,3)$ for areas 1-36, $u \sim N(9,20)$ for areas 37-40 and $e \sim N(0,6)$, i.e. random effects for areas 1-36 are drawn from a well-behaved $N(0,3)$ distribution, with those for areas 37-40 drawn from an outlier $N(9,20)$ distribution. Individual effects are not outlier-contaminated.

[e,u] Outliers in both area and individual effects: $u \sim N(0,3)$ for areas 1-36, $u \sim N(9,20)$ for areas 37-40 and $e \sim \delta N(0,6) + (1 - \delta) N(0,150)$.

Each scenario is independently simulated 500 times. For each simulation the population values are generated according to the underlying scenario model, a sample is selected in each area and the sample

data are then used to compute estimates of each of the actual area means for $Y$. Five different estimators are used for this purpose - the standard EBLUP, see (2), which serves as a reference; the projective M-quantile estimator MQ, see (18); the robust bias-corrected predictive MQ estimator MQ-BC, see (21); the robust projective REBLUP estimator of Sinha and Rao (2009), see (16); and its robust bias-corrected version REBLUP-BC, see (22). In all cases the projective influence function is a Huber Proposal 2 type with tuning constant $c = 1.345$. In contrast, the predictive, less restrictive, influence function used in MQ-BC and REBLUP-BC is also a Huber Proposal 2 type, but with a larger tuning constant, $c = 3$.

The performance of these estimators across the different areas and simulations is assessed by computing the median values of their area-specific relative bias and relative root mean squared error, where the relative bias of an estimator $\hat{\bar y}_i$ for the actual mean $\bar y_i$ of area $i$ is the average across simulations of the errors $\hat{\bar y}_i - \bar y_i$ divided by the corresponding average value of $\bar y_i$, and its relative root mean squared error is the square root of the average across simulations of the squares of these errors, again divided by the average value of $\bar y_i$. Table 1 sets out these median values for the different simulation scenarios and different estimators.

The relative bias results set out in Table 1 confirm our expectations regarding the behaviour of projective estimators (EBLUP, REBLUP and MQ) versus bias-corrected predictive estimators (REBLUP-BC and MQ-BC). The former are more biased than the latter as a consequence of their implicit assumption that although outlier variances may be inflated relative to non-outliers, outlier effects still have zero expectation. This increase in bias is most pronounced when there are outliers in the area effects, which is not unexpected since that is when area means are most affected by the presence of outliers in the population data.

Turning to the median RRMSE results, we see that claims in the literature (e.g.
Chambers and Tzavidis, 2006) about the superior outlier robustness of MQ compared with the EBLUP certainly hold true provided the outliers are in individual effects. If there are outliers in area effects, then MQ appears to offer no extra protection compared to the EBLUP, and in fact performs worse, mainly due to its sharply increasing bias in this situation. Similarly, when we compare the EBLUP and the REBLUP we see that if outliers are associated with individual effects, then REBLUP offers better RRMSE performance than EBLUP. However, the gap between these two

estimators narrows considerably when outliers are associated with area effects. In contrast, the two bias-corrected predictive estimators seem relatively robust in terms of RRMSE performance. Due to increased variability as a consequence of their bias corrections, both BC estimators are not as efficient as the projective estimators when outliers are associated with individual effects, but both also do not fail when there are outliers in the area effects.

We now turn to an examination of the performance of different methods of MSE estimation investigated in the simulations. MSE estimation for the REBLUP and REBLUP-BC is implemented via the robust MSE estimators (28) and (29) (hereafter CCT) and via the linearization-based MSE estimators (40) and (A6) (hereafter CCST), while for the MQ and the MQ-BC both (51) and (A9) (CCST) and the robust MSE estimator described in Chambers et al. (2007, Section 2.3 - CCT) are calculated. The bootstrap procedure proposed by Sinha and Rao (2009) for REBLUP is also investigated by using bootstrap samples of size 100. The MSE of the EBLUP estimator is estimated by the Prasad-Rao (PR), CCT (Chambers et al., 2007, Section 2.3) and CCST (45) estimators.

The behaviour of the MSE estimators for each scenario and for each approach is shown in Table 2, where we report the median values of their area-specific relative bias, relative root mean squared error and coverage rate for a nominal 95 per cent confidence interval. These intervals are based on normal theory and are defined by the small area mean estimate plus or minus twice the corresponding estimated root mean squared error. These results show that both CCT and CCST tend to be biased low, but CCST is better in terms of coverage rate. It shows a small amount of under-coverage for all predictors. The CCST estimator is preferable to CCT for REBLUP and REBLUP-BC. It shows smaller bias and more stability. Moreover, it seems that CCST is better able to handle the scenarios where outliers are present. The CCT and CCST estimators perform similarly for MQ-BC, even if CCST seems more stable.
The PR estimator of MSE does well: it is very stable and shows good bias properties except in the presence of area level outliers, when it is biased downwards significantly. The bias properties of the bootstrap MSE estimator for REBLUP and REBLUP-BC are comparable with CCST, but it is much more stable.
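The data-generating design used in this Section can be sketched as follows. This is an illustrative sketch, not the authors' simulation code: the function names `generate_population` and `srswor_sample` are hypothetical, the second argument of each $N(\cdot,\cdot)$ is assumed to be a variance, and the lognormal parameters are taken to be the mean and standard deviation on the log scale.

```python
import math
import random

# Scenario label -> (individual outliers?, area outliers?)
SCENARIOS = {"[0,0]": (False, False), "[e,0]": (True, False),
             "[0,u]": (False, True), "[e,u]": (True, True)}


def generate_population(scenario, m=40, N=100, seed=None):
    """Generate one population as a list of m areas of N (x, y) pairs."""
    indiv_outliers, area_outliers = SCENARIOS[scenario]
    rng = random.Random(seed)
    population = []
    for area in range(1, m + 1):
        # The last four areas (37-40 when m = 40) carry the outlying effects.
        if area_outliers and area > m - 4:
            u = rng.gauss(9.0, math.sqrt(20.0))
        else:
            u = rng.gauss(0.0, math.sqrt(3.0))
        units = []
        for _ in range(N):
            x = rng.lognormvariate(1.004077, 0.5)
            # With probability 0.03 the individual effect is an outlier.
            if indiv_outliers and rng.random() >= 0.97:
                e = rng.gauss(0.0, math.sqrt(150.0))
            else:
                e = rng.gauss(0.0, math.sqrt(6.0))
            units.append((x, 100.0 + 5.0 * x + u + e))
        population.append(units)
    return population


def srswor_sample(population, n=5, seed=None):
    """Simple random sample without replacement of n units in each area."""
    rng = random.Random(seed)
    return [rng.sample(units, n) for units in population]


population = generate_population("[e,u]", seed=1)
sample = srswor_sample(population, n=5, seed=2)
```

Repeating the two calls 500 times with different seeds would mimic the replication scheme described above. Fitting the predictors themselves is outside the scope of this sketch.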

6. Design-Based Simulation Study

Design-based simulations complement model-based simulations for small area estimation since they allow us to evaluate the performance of small area estimation methods in the context of a real population and realistic sampling methods where we do not know the precise source of the contamination. From a practical perspective we believe that this type of simulation, by effectively fixing the differences between the small areas, constitutes a more practical and appropriate representation of the small area estimation problem from a finite population perspective. Further, it provides a good illustration of why a focus on conditional MSE is likely to be closer to the MSE of interest for people using small area methods.

The population underpinning the design-based simulation is based on a data set obtained under the Environmental Monitoring and Assessment Program (EMAP) of the U.S. Environmental Protection Agency. The background to this data set is that between 1991 and 1995, EMAP conducted a survey of lakes in the North-Eastern states of the U.S. The data collected in this survey consist of 551 measurements from a sample of 334 of the 21,026 lakes located in this area. The lakes making up this population are grouped into 113 8-digit Hydrologic Unit Codes (HUCs), of which 64 contained fewer than 5 observations and 27 did not have any. In our simulation, we defined HUCs as the small areas of interest, with lakes grouped within HUCs. The variable of interest is Acid Neutralizing Capacity (ANC), an indicator of the acidification risk of water bodies. A total of 1000 independent random samples of lake locations are then taken from the population of 21,026 lake locations by randomly selecting locations in the 86 HUCs that contain EMAP sampled lakes, with sample sizes in these HUCs set to the greater of five and the original EMAP sample size. Details on the data generation are in Salvati et al. (2008). Table 3 shows the median relative bias and the median relative root MSE of the different predictors (EBLUP, REBLUP, MQ, REBLUP-BC, MQ-BC).
Similarly, Table 4 reports the median relative bias, the median relative root MSE and the median coverage rate of the corresponding estimators of the MSEs of these predictors calculated from the same sample. MQ-BC and REBLUP-BC predictors work well in terms of both bias and MSE, while the EBLUP is the worst in terms of relative root MSE. The REBLUP shows a good performance in
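The performance measures reported in the tables follow the definitions given in Section 5 and can be sketched as below. The function names are hypothetical, and the inputs are assumed to be, for each area, a list of estimates and a list of true area means across simulations.

```python
import math
from statistics import mean, median


def area_relative_bias(estimates, truths):
    """Average error across simulations divided by the average true mean."""
    errors = [est - true for est, true in zip(estimates, truths)]
    return mean(errors) / mean(truths)


def area_rrmse(estimates, truths):
    """Root of the average squared error divided by the average true mean."""
    sq_errors = [(est - true) ** 2 for est, true in zip(estimates, truths)]
    return math.sqrt(mean(sq_errors)) / mean(truths)


def median_performance(estimates_by_area, truths_by_area):
    """Median over areas of the area-specific relative bias and RRMSE."""
    pairs = list(zip(estimates_by_area, truths_by_area))
    rb = [area_relative_bias(e, t) for e, t in pairs]
    rrmse = [area_rrmse(e, t) for e, t in pairs]
    return median(rb), median(rrmse)


# Two areas, two simulations each (illustrative numbers only).
med_rb, med_rrmse = median_performance(
    [[11.0, 9.0], [12.0, 12.0]],
    [[10.0, 10.0], [10.0, 10.0]])
```

The same functions apply to MSE estimators by treating the estimated MSE as the estimate and the empirical MSE as the truth.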

terms of RRMSE but records a big negative bias. The MQ predictor shows the worst behaviour in terms of bias and MSE.

We now turn to an examination of the performance of different methods of MSE estimation investigated in the design-based simulation. The Prasad-Rao (PR) estimator of the MSE of the EBLUP has an upward bias and larger instability than the CCST estimator for the EBLUP. This could be due to the unconditional basis of the PR estimator. The CCST estimator seems to offer the best overall results with REBLUP and REBLUP-BC, while CCT and CCST show similar performance in terms of bias and RRMSE for MQ-BC. In this simulation experiment the MSE estimation of the MQ predictor is problematic for both CCT and CCST. The bootstrap MSE estimator does not work for the REBLUP, showing big bias and instability, whereas it is a good competitor for CCT and CCST as far as REBLUP-BC is concerned.

The coverage rates (for nominal 95 percent intervals) are presented in Table 4. The CCST estimation method produces intervals with median coverage close to 95 percent for EBLUP, REBLUP and REBLUP-BC. It records substantial under-coverage for MQ and MQ-BC, even if, for these estimators, it performs better than CCT. The bootstrap MSE estimator shows a degree of over-coverage for REBLUP. This occurs because the bootstrap method assumes that the linear mixed model (1) holds for the small areas, whereas this assumption is difficult to meet in many practical applications. A final comment is appropriate considering the results on the coverage rate. Chatterjee et al. (2008), discussing the use of bootstrap methods for constructing confidence intervals for small area parameters, argue that there is no guarantee that the asymptotic behaviour underpinning normal theory confidence intervals applies in the context of the small samples that characterize small area estimation. For this reason the authors do not recommend the use of normal theory to construct the prediction intervals (as we have done here).
The behaviour of the empirical true root MSE and its estimators for each area and for each approach is shown in Figures 1, 2 and 3. Examination of these results can be useful for understanding the reasons for different performances of the MSE estimators. Figure 1 shows the results for the EBLUP predictor and we can note that the PR estimator does not seem to be able to capture between-area differences in the design-based RMSE of the EBLUP, while the CCT MSE
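The coverage rates discussed here are based on the normal theory intervals defined in Section 5, that is, the estimate plus or minus twice its estimated root MSE. A minimal sketch of the calculation for one area follows. The function name `coverage_rate` and its argument names are hypothetical.

```python
import math


def coverage_rate(estimates, mse_estimates, truths, multiplier=2.0):
    """Share of simulations in which the interval
    estimate +/- multiplier * sqrt(estimated MSE) contains the true mean."""
    hits = 0
    for est, mse, true in zip(estimates, mse_estimates, truths):
        half_width = multiplier * math.sqrt(mse)
        if abs(est - true) <= half_width:
            hits += 1
    return hits / len(estimates)


# Three simulations for one area (illustrative numbers only):
# the errors are 1.0, 2.5 and 2.0 against a half-width of 2.0.
rate = coverage_rate([10.0, 10.0, 10.0], [1.0, 1.0, 1.0], [11.0, 12.5, 8.0])
```

The median of these area-specific rates over areas would give the summary reported in the tables.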

estimator for the EBLUP tracks the irregular profile of the area-specific empirical MSE very well. CCST also works quite well but produces somewhat over-smoothed estimates of area-specific empirical MSE. These results confirm the poor design-based properties of the PR estimator (Longford, 2007).

Figure 2 reports the results for the REBLUP and REBLUP-BC predictors. For the REBLUP (top) it is evident that CCT tends to underestimate the true area-specific MSE, mainly because its squared bias component underestimates the actual squared bias of this predictor. The bootstrap MSE estimator produces over-smoothed estimates of area-specific empirical MSE, because in this simulation the assumption that the linear mixed model (1) holds is violated. The CCST estimator tracks area-specific empirical MSE but it shows underestimation in a few areas. It can be seen that the CCST MSE estimator for the REBLUP-BC (bottom) has the best performance and tracks the irregular profile of the area-specific empirical MSE very well, while the bootstrap MSE estimator for the REBLUP-BC generates over-smoothed estimates of area-specific empirical MSE.

Figure 3 illustrates the results for the MQ (top) and MQ-BC (bottom) predictors. The MSE estimators have a similar behaviour. They track the irregular profile of the area-specific empirical MSE very well for MQ-BC, while, for MQ, CCT and CCST underestimate the true area-specific MSE.

7. Final Remarks

In this paper we explore the extension of the Robust Predictive approach to small area estimation and we propose two different analytical mean squared error (MSE) estimators for outlier robust predictors of small area means. The first proposal is a bias-robust MSE estimator based on the 'pseudo-linearization' approach discussed by Chambers et al. (2007). The second method is a linearization-based MSE estimator based on first order approximations to the variances of solutions of estimating equations.
The empirical results in Sections 5 and 6 show that the bias-corrected predictive estimators (REBLUP-BC and MQ-BC) are less biased than the projective estimators (EBLUP, REBLUP and MQ), especially when there are outliers in the area effects. From the results of the simulation experiments there is evidence that the BC estimators are not as efficient as the projective estimators when outliers are associated with individual

effects. This is due to increased variability as a consequence of their bias corrections. We can note also that REBLUP-BC and MQ-BC do not fail when there are outliers in the area effects. A method to compute the optimal cut-off value $c$ for the influence function and improve the efficiency of the BC estimators remains to be developed. A cross-validation approach could be a possible method.

The pseudo-linearization and linearization-based MSE estimators described in Section 4 and in the Appendix can be an alternative to bootstrap MSE estimation for REBLUP and REBLUP-BC. Moreover, the CCST estimator also shows a good performance for MQ-type estimators. Overall, the CCST method performs reasonably well for the different small area predictors that we have compared in both model-based and design-based simulation experiments. We also note that the Prasad-Rao estimator of the MSE of the EBLUP and the bootstrap MSE estimator of the REBLUP proposed by Sinha and Rao (2009), which work well when their model assumptions are valid, have problems, especially in terms of bias, in the presence of outliers. In the model-based simulations the CCST estimator performs quite well in all scenarios and it works better than PR and bootstrap-type MSE estimators in terms of bias, stability and coverage rate when there are outliers in the area and individual effects.

Recently, the CCT estimator has been extended to estimating the MSE of M-quantile Geographically Weighted Regression small area estimators (Salvati et al., 2008) and to predictors based on nonparametric small area models (Salvati et al., 2009). It could be interesting to explore whether the CCST estimator can also be used in these cases, or with nonparametric M-quantile small area estimators (Pratesi et al., 2008).

Finally, the CCST MSE estimator presented in this paper is developed under the conditional version of the linear mixed model, i.e. it is conditioned on area effects when applied in the context of a mixed model.
However, it is possible to develop an unconditional version of the CCST MSE estimator that averages over the distribution of the random area effects under a linear mixed model, and so reduces to the Prasad-Rao MSE estimator in the case of the EBLUP. This is an avenue for further research.

Appendix