The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas

Similar documents
A COMPARISON OF SMALL AREA AND CALIBRATION ESTIMATORS VIA SIMULATION

Deliverable 2.2. Small Area Estimation of Indicators on Poverty and Social Exclusion

Survey-weighted Unit-Level Small Area Estimation

Survey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics

Estimating Unemployment for Small Areas in Navarra, Spain

Estimation of District Level Poor Households in the State of. Uttar Pradesh in India by Combining NSSO Survey and

Estimating International Migration on the Base of Small Area Techniques

Combining Time Series and Cross-sectional Data for Current Employment Statistics Estimates 1

Small Area Estimation: A Review of Methods Based on the Application of Mixed Models. Ayoub Saei, Ray Chambers. Abstract

Least-Squares Regression on Sparse Spaces

A comparison of small area estimators of counts aligned with direct higher level estimates

β ˆ j, and the SD path uses the local gradient

A Fay Herriot Model for Estimating the Proportion of Households in Poverty in Brazilian Municipalities

Contributors: France, Germany, Italy, Netherlands, Norway, Poland, Spain, Switzerland

arxiv: v1 [hep-lat] 19 Nov 2013

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

Copyright 2015 Quintiles

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE

A Modification of the Jarque-Bera Test. for Normality

3.2 Shot peening - modeling 3 PROCEEDINGS

Topic Modeling: Beyond Bag-of-Words

Designing of Acceptance Double Sampling Plan for Life Test Based on Percentiles of Exponentiated Rayleigh Distribution

Logarithmic spurious regressions

SYNCHRONOUS SEQUENTIAL CIRCUITS

arxiv: v2 [math.st] 20 Jun 2014

05 The Continuum Limit and the Wave Equation

Chapter 6: Energy-Momentum Tensors

M-Quantile Regression for Binary Data with Application to Small Area Estimation

Improving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Modeling the effects of polydispersity on the viscosity of noncolloidal hard sphere suspensions. Paul M. Mwasame, Norman J. Wagner, Antony N.

IPA Derivatives for Make-to-Stock Production-Inventory Systems With Backorders Under the (R,r) Policy

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions

Quantile function expansion using regularly varying functions

Influence of weight initialization on multilayer perceptron performance

The new concepts of measurement error s regularities and effect characteristics

MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS

A Review of Multiple Try MCMC algorithms for Signal Processing

Lower bounds on Locality Sensitive Hashing

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation

ESTP course on Small Area Estimation

Research Article When Inflation Causes No Increase in Claim Amounts

Predictive Control of a Laboratory Time Delay Process Experiment

Resilient Modulus Prediction Model for Fine-Grained Soils in Ohio: Preliminary Study

Image Based Monitoring for Combustion Systems

Sparse Reconstruction of Systems of Ordinary Differential Equations

The effect of nonvertical shear on turbulence in a stably stratified medium

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012

Problems Governed by PDE. Shlomo Ta'asan. Carnegie Mellon University. and. Abstract

Modeling of Dependence Structures in Risk Management and Solvency

Robust Low Rank Kernel Embeddings of Multivariate Distributions

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

RETROGRADE WAVES IN THE COCHLEA

Simple Tests for Exogeneity of a Binary Explanatory Variable in Count Data Regression Models

Lie symmetry and Mei conservation law of continuum system

Constraint Reformulation and a Lagrangian Relaxation based Solution Algorithm for a Least Expected Time Path Problem Abstract 1.

Simple Electromagnetic Motor Model for Torsional Analysis of Variable Speed Drives with an Induction Motor

Closed and Open Loop Optimal Control of Buffer and Energy of a Wireless Device

University of California, Berkeley

Conservation laws a simple application to the telegraph equation

R package sae: Methodology

The Subtree Size Profile of Plane-oriented Recursive Trees

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

A SIMPLE ENGINEERING MODEL FOR SPRINKLER SPRAY INTERACTION WITH FIRE PRODUCTS

Developing a Method for Increasing Accuracy and Precision in Measurement System Analysis: A Fuzzy Approach

Lower Bounds for the Smoothed Number of Pareto optimal Solutions

The Second Order Contribution to Wave Crest Amplitude Random Simulations and NewWave

Test of Hypotheses in a Time Trend Panel Data Model with Serially Correlated Error Component Disturbances

WEIGHTING A RESAMPLED PARTICLE IN SEQUENTIAL MONTE CARLO. L. Martino, V. Elvira, F. Louzada

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations

inflow outflow Part I. Regular tasks for MAE598/494 Task 1

SEDIMENT SCOUR AT PIERS WITH COMPLEX GEOMETRIES

Lecture 6: Generalized multivariate analysis of variance

Semiclassical analysis of long-wavelength multiphoton processes: The Rydberg atom

A Novel Unknown-Input Estimator for Disturbance Estimation and Compensation

THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS

Spurious Significance of Treatment Effects in Overfitted Fixed Effect Models Albrecht Ritschl 1 LSE and CEPR. March 2009

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Robust Bounds for Classification via Selective Sampling

Using Quasi-Newton Methods to Find Optimal Solutions to Problematic Kriging Systems

CONFIRMATORY FACTOR ANALYSIS

Code_Aster. Detection of the singularities and calculation of a map of size of elements

arxiv: v3 [hep-ex] 20 Aug 2013

Estimation of the Maximum Domination Value in Multi-Dimensional Data Sets

Linear Regression with Limited Observation

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs

Some New Thoughts on the Multipoint Method for Reactor Physics Applications. Sandra Dulla, Piero Ravetto, Paolo Saracco,

ELECTRON DIFFRACTION

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1

Sources and Sinks of Available Potential Energy in a Moist Atmosphere. Olivier Pauluis 1. Courant Institute of Mathematical Sciences

Multi-View Clustering via Canonical Correlation Analysis

Optimal Signal Detection for False Track Discrimination

Application of the homotopy perturbation method to a magneto-elastico-viscous fluid along a semi-infinite plate

Essential Considerations for Buckling Analysis

Transcription:

The Role of Moels in Moel-Assiste an Moel- Depenent Estimation for Domains an Small Areas Risto Lehtonen University of Helsini Mio Myrsylä University of Pennsylvania Carl-Eri Särnal University of Montreal Ari Veijanen Statistics Finlan Abstract This paper investigates omain total estimation for moel-assiste generalize regression (GREG) an moel-epenent EBLUP estimators uner probability proportional to size (PPS) sampling. Two particular issues are aresse: (i) how to account for the omain ifferences in the moel formulation, an (ii) how to account for the PPS esign. Results are base on Monte Carlo experiments. In the experiments, the bias of GREG estimator remaine negligible for all moels, an accuracy improve when the PPS size variable was inclue in the moel. For EBLUP, bias was large for wea moels, but the bias ecrease substantially when the PPS size variable was inclue in the moel. Thus ouble-use of the auxiliary information seeme profitable. As an alternative to the classic unweighte EBLUP, we propose a new weighte EBLUP estimator for PPS esigns. In the experiments the weighte EBLUP behave much better than the unweighte EBLUP. email: myrsylm@sas.upenn.eu

Introuction Estimation of reliable statistics for population sub-groups or omains constitutes an area of increasing importance in emography. Examples where omain level estimation is neee inclue prouction of official population statistics by emographic an geographic status, estimation of isease prevalence rates in sub-groups of the population, an poverty mapping. Typically, a survey is planne to prouce reliable statistics for the entire population an large or major areas. Stanar esign-base irect estimators, such as the Horvitz-Thompson estimator, are often use for such cases. The tas can become challenging when the number of sample elements in a number of omains remains small or minor. In this case, more avance methos that effectively use the auxiliary information are neee. The auxiliary information may be obtaine, for instance, from previous surveys, from aministrative registries, or from census ata sources. Methos available for the estimation of totals for omains an small areas inclue moel-assiste esignbase estimators, referring to the family of generalize regression (GREG) estimators (Särnal, Swensson an Wretman 99, Estevao an Särnal 999, 004), an moel-epenent techniques, such as the EBLUP estimator (Empirical Best Linear Unbiase Preictor) an synthetic estimators (Ghosh 00, Rao 003). Properties of these estimator types are iscusse for example in Lehtonen an Veijanen (998, 999) an Lehtonen, Veijanen an Särnal (003, 005). The ocumentation of the EURAREA project inclues useful comparative materials on properties of moel-epenent estimators (EURAREA Consortium 004, Heay an Ralphs 005). Known esign-base properties relate to bias, precision an accuracy of moel-assiste estimators an moel-epenent estimators are summarize in Table. Moel-assiste estimators are approximately esign-unbiase by efinition, but their variance can become large in omains where the sample size is small. Moel-epenent estimators are esign-biase: the bias can be large for omains where the moel oes not fit well. The variance of a moel-epenent estimator can be small even for small omains, but the accuracy tens to be poor because the square bias often ominates the mean square error (MSE), as shown for example by Lehtonen, Veijanen an Särnal (003 an 005). The ominance of the bias component together with a small variance can cause poor coverage rates an invali confience intervals for a moelepenent estimator. For moel-assiste esign-base estimators, on the other han, vali confience intervals can be constructe. Typically, moel-assiste estimators are use for major or not-so-small omains an moel-epenent estimators are use for small omains where moel-assiste estimators can fail.

3 Table. Design-base properties of moel-assiste an moel-epenent estimators for omains an small areas. Design bias Precision (Variance) Accuracy (Mean Square Error, MSE) Confience Intervals Design-base moel-assiste methos - GREG family Design unbiase (approximately) by the construction principle Variance may be large for small omains Variance tens to ecrease with increasing omain sample size MSE = Variance (or nearly so) Vali intervals can be constructe Moel-epenent methos SYN an EBLUP Design biase Bias oes not necessarily approach zero with increasing omain sample size Variance can be small even for small omains Variance tens to ecrease with increasing omain sample size MSE = Variance + square Bias Accuracy can be poor if the bias is substantial Vali intervals not necessarily obtaine Researcher often faces challenging methoological choices when aiming at reliable estimation of population totals for omains an small areas. These choices inclue, for example, the inferential framewor, moel type (mathematical form, specification, parametrization, estimation of moel parameters), an estimator type (point estimator, estimator of variance or MSE) for the unnown omain totals. Relate to the problem of moel choice, or the role of the moel in moel-assiste estimators an in moel-epenent estimators, the two questions of special interest in this stuy are: (i) (ii) How to account for the omain ifferences in the moel formulation (relevant for moelassiste estimators in particular)? How to account for the unerlying unequal probability sampling esign (relevant for moelepenent estimators in particular)? We iscuss points (i) an (ii) to some extent from a esign-base perspective, uner the fixe finite population approach. More specifically, we compare the relative performance (bias an accuracy) of the two estimator types of omain totals, GREG, an EBLUP, uner ifferent moel choices. A continuous response variable is assume. In the construction of moels we use both linear fixe-effects moels an linear mixe moels, where ranom effects are inclue in aition to the fixe effects. We fit the linear moels with ifferent parametrizations. In the estimation of the moel parameters, we use both weighte an unweighte estimation proceures. An unerlying unequal probability sampling esign is assume. The case of unequal probability sampling is of importance for practical purposes in official statistics an many fiels of empirical research. Withoutreplacement type fixe-size Probability Proportional to Size sampling (systematic PPS) was selecte to represent an example of an unequal probability sampling esign. This stuy extens the case of equal probability sampling investigate in Lehtonen, Särnal an Veijanen (003, 005) to unequal probability sampling esigns. The woring paper is organize as follows. Chapter introuces our notation an moels an estimators use. Results for GREG an EBLUP estimators are given in Chapter 3. Conclusions are in Chapter 4.

4 Methos. Moels an estimators of omain totals We are intereste in the estimation of totals of a continuous response variable y for the omains of interest. Availability of powerful auxiliary information is essential for the estimators of omain totals consiere. We assume that we have access to unit-level ata, which inclue omain membership inicators an vectors of auxiliary x-variables, for all units in the population. The auxiliary ata vector also contains the size variable use in the PPS sampling proceure. The auxiliary ata are incorporate in the estimation proceure by an appropriate moel. Thus, the choice of the moel that unerlies the GREG, SYN an EBLUP estimators of omain totals is consiere important. Our question (i) was How to account for the omain ifferences in the moel formulation?. The omain ifferences can be accounte for by a proper moel formulation. Basically, there are two options to facilitate the omain ifferences: () introuction of omain-specific fixe effects in the moel, an () accounting for the omain ifferences by omain-specific ranom effects, such as ranom intercepts. It is obvious that these options are relevant for moel-assiste estimators in particular. The reason is that in a stanar GREG setting, a fixe-effects linear moel is routinely use as the assisting moel (Estevao an Särnal 999, 004), an a GREG estimator that uses a mixe moel, the MGREG estimator, has been introuce only recently (Lehtonen an Veijanen 999, Lehtonen et al. 003, see also Golstein 003, p. 65). On the other han, a mixe moel formulation has a long traition in the context of EBLUP estimation of small area totals (Fay an Herriot 979, Rao 003). The problem of moel choice is iscusse in a more general spirit in Firth an Bennett (998). To throw some light on question (ii) How to account for the unerlying unequal probability sampling esign?, we stuy the ifferent options to incorporate the information of the sampling esign into the estimation proceure. In the moelling phase, there are two main options to account for the sampling esign: (a) the incorporation of sampling weights in the estimation of moel parameters, an (b) the inclusion of sampling esign variables as aitional covariates in the moel. By efault, sampling weights are incorporate in the estimation proceures for all assisting moels of GREG estimators. As a rule, sampling weights are ignore in the estimation proceures for SYN estimators. Typically, the unerlying mixe moel of a stanar EBLUP estimator is fitte in an unweighte manner. Rao (003) introuce a pseuo EBUP estimator, where sampling weights are inclue in the construction of the EBLUP estimator, but the parameters of the mixe moel are estimate by unweighte techniques. As an alternative to the unweighte EBLUP an pseuo EBLUP, we will introuce a new EBLUP estimator, where sampling weights are incorporate in the estimation of parameters of the unerlying mixe moel. We will also compare options (a) an (b) in their successfulness in accounting for the sampling esign. It is obvious that these options are relevant for EBLUP estimators in particular. We stuy the bias an accuracy properties of the estimators of omain totals by empirical methos. Our Monte Carlo simulation experiments consiste of repeate raws of systematic PPS samples from an artificially constructe fixe finite population. Table shows the moel-epenent an moel-assiste estimators to be iscusse, in a two-way arrangement by estimator type an by moel choice. Each of the rows correspons to a ifferent moel choice. CC moel (common intercepts, common slopes) is one whose only parameters are fixe effects

5 efine at the population level; it contains no omain specific parameters. We obtain SYN-CC an GREG- CC estimators. SC moel (separate intercepts, common slopes) is one having at least some of its parameters or effects efine at the omain level. These are fixe effects for SYN-SC an GREG-SC an ranom effects for EBLUP-SC, EBLUPW-SC an MGREG-SC. Table also shows the estimation methos that are use in the estimation of moel parameters. To aress points (i) an (ii) of Chapter, we iscuss in more etail GREG-SC an MGREG-SC for GREG family estimators an EBLUP-SC an EBLUPW-SC for EBLUP family estimators. Table. Schematic presentation of the moel-epenent an moel-assiste estimators of omain totals for a continuous response variable by moel choice an estimator type, uner unequal probability sampling. Moel choice Estimator type Moel abbreviatio n Moel specification Effect type Estimation of moel parameters Moelepenent estimators Moel-assiste Estimators CC SC Common intercepts Common slopes Separate intercepts Common slopes Fixe effects Fixe effects Fixe an ranom OLS SYN-CC Not applicable(**) WLS Not applicable(*) GREG-CC OLS SYN-SC Not applicable (**) WLS Not GREG-SC applicable(*) REML EBLUP-SC Not GLS applicable (**) Weighte REML EBLUPW-SC MGREG-SC GWLS OLS Orinary least squares WLS Weighte least squares (sampling weights) GLS Generalize least squares GWLS Generalize weighte least squares (sampling weights) REML Restricte (resiual) maximum lielihoo Weighte REML Restricte pseuo maximum lielihoo (sampling weights) (*) In SYN, weights are ignore in the estimation proceure by efault. (**) In GREG, weights are incorporate in the estimation proceure by efault. We next introuce the notation use in this stuy. Population an sampling esign { } U =,,...,,..., N Population (fixe, finite) U,..., U,..., U Domains of interest (non-overlapping) Y = y, =,..., D Target parameters (omain totals) x U D = ( x,..., x ) Auxiliary variable vector p I = if U Domain membership inicators, I = 0 otherwise =,..., D Note that we assume the vector value x an omain membership to be nown for every U.

6 Sampling esign: Systematic PPS with sample size n s Sample from U s = s U Ranom part of s falling in omain a π n Inclusion probability for U = / π x = x U Sampling weight for s We observe y for s. Note that for estimation purposes, sample ata an auxiliary ata are merge at the micro level by using unique ID eys that are available in both ata sources. Moels for continuous response y Linear fixe-effects moels CC moels y = β + β x +... + β x + ε 0 p p SC moels y = β + β x +... + β x + ε, =,..., D 0 p p Fitte values uner fixe-effects moels yˆ Linear mixe moels = x βˆ SC moels y = β + u + β x +... + β x + ε, =,..., D where u 0 p p are omain-specific ranom intercepts Fitte values uner mixe moels yˆ = x βˆ + uˆ, =,..., D Note that fitte values y ˆ are calculate for every U. Estimators of omain totals The preictions { yˆ ; U } specification, the estimator of the omain total Y types (SYN, GREG, EBLUP): Moel-assiste GREG estimators Yˆ = yˆ + a ( y yˆ ) GREG U s Moel-epenent SYN estimators Yˆ = yˆ SYN Moel-epenent EBLUP estimators Yˆ = y + yˆ where =,..., D. U EBLUP s U s iffer from one moel specification to another. For a given moel = y has the following structure for the three estimator U Note that Y ˆSYN an Y ˆEBLUP rely heavily on the truth of the moel, an can be biase if the moel is misspecifie. On the other han, Y ˆGREG has a secon term that protects against moel misspecification. We aopt the following conventions (Table ). In SYN-CC, SYN-SC, GREG-CC an GREG-SC, a fixeeffects moel formulation is assume. A mixe moel is assigne for EBLUP-SC, EBLUPW-SC an MGREG-SC estimators.

7 Measures use in Monte Carlo simulations In Monte Carlo simulation experiments, by using estimates Y ˆ ( s ) from repeate samples s ; v =,,..., K, we compute for each omain =,..., D the following Monte Carlo summary measures of bias an accuracy. (i) Absolute relative bias (ARB), efine as the ratio of the absolute value of bias to the true value: v v K Yˆ ( s ) Y / Y v K v = (ii) Relative root mean square error (RRMSE), efine as the ratio of the root MSE to the true value: ˆ K ( Y ( sv ) Y ) / Y K v = Details of the simulations There were 00 omains in the population. The size of omain was proportional to exp( q ), where q was simulate from U(0,.9). Each observation was allocate to a omain by geometric probability: intervals of length exp( q ) were concatenate an a ranom point was chosen in (0, exp( q ) ). The interval containing the point etermine the omain of the observation. There were 47 minor omains, 9 meium-size omains an 34 major omains in the population. These three classes were efine on the basis of expecte sample size n( N / N ) : less than 70, 70-9 an 0 or more units, respectively. The smallest omain of the generate population ha,7 units an the largest ha 8,96. The variable x is the size variable use in PPS sampling. The variable was simulate from uniform istribution U(,). Another auxiliary variable xwas simulate from N(0,9). The ranom effects u were simulate inepenently from N(0,0.5). The error term ε followe N(0,). Responses were simulate as y = + x +.5 x + u + ε ( ) Correlations of the variables in the population were: corr( y, x ) = 0.779, corr( y, x ) = 0.607 an corr( x, x ) = 0.00. Domain means of the response variable were approximately equal, but the totals iffere consierably: The means of omain totals were 45,64 for minor omains, 7,308 for meium omains an 4,57 for major omains. Our population size is N =,000,000 an sample size n = 0,000. In Monte Carlo experiments, K = 000 inepenent systematic PPS samples were generate. The inclusion probabilities are π = nx / x. The weights a = / π varie between 54.6 an 596.5.

8 3 Results 3. GREG estimators We first iscuss results for GREG estimators. Our point (i) evote to GREG was How to account for the omain ifferences in the moel formulation. This is emonstrate by the eight ifferent moel formulations in Table 3. In moels A, B, C an D, the omain ifferences are accounte for by omainspecific fixe effects β 0. In moels A, B, C an D, we use ranom intercepts 0 u the fixe intercept common for all omains, an the ranom term β +, where β 0 is u is omain-specific. In aition, we x, an have two explanatory variables at our isposal: the variable x, the size variable in PPS esign, an auxiliary variable uncorrelate to x. Note that both variables correlate quite strongly with the response variable y. For x an x, slope parameters β an β are common fixe effects for all omains. For GREG, we incorporate the sampling weights in the estimation proceure of moel parameters, incluing the mixe moel unerlying the MGREG-SC estimator. This facilitates the conition of internal bias calibration (a proper combination of moel formulation an estimation proceure uner a given sampling esign) propose by Firth an Bennett (998). Table 3 also shows our moel builing strategy. We start with simple moels A an A an procee towars the population generating moel D. In all moels, GREG family estimators are essentially unbiase, an a fixe-effects an a mixe moel formulations yiel similar accuracy. An explanation for this observation is that in the simulation setting, the average levels of the response i not vary much over the omains. Best accuracy (excluing the true moel) is for moels where the PPS size variable x is inclue. This emonstrates the accuracy gains attaine from the ouble-use of x both in the sampling esign an in the estimation esign; see also Särnal (996). We also note that accuracy ifferences between the ifferent GREG estimators are substantial, an accuracy improves with increasing the omain sample size. Table 3. Average absolute relative bias ARB (%) an average relative root mean square error RRMSE (%) of GREG estimators for minor, meium-size an major omains of the generate population. Moel an estimator Minor (0-69) y = β + ε Average ARB (%) Average RRMSE (%) Domain size class Domain size class Meium Major Minor Meium (70-9) (0+) (0-69) (70-9) Major (0+) Moel A 0 GREG-SC.4 0.5 0.3 3.7 8. 5.7 Moel A y = β0 + u + ε MGREG-SC 0. 0. 0. 3.7 8. 5.6 Moel B y = β0 + βx + ε GREG-SC 0. 0. 0.0 7.8 4.6 3. Moel B y = β0 + u + βx + ε MGREG-SC 0. 0. 0.0 7.8 4.6 3.3 Moel C y = β0 + βx + ε GREG-SC.4 0.5 0.3.6 6.8 4.8 Moel C y = β0 + u + βx + ε MGREG-SC 0. 0. 0..6 6.8 4.7 Moel D y = β0 + βx + βx + ε GREG-SC 0.0 0.0 0.0.7.0 0.7 Moel D y = β0 + u + βx + βx + ε (Population generating moel) MGREG-SC 0.0 0.0 0.0.7.0 0.7 Variables x Size variable in PPS sampling, x Auxiliary variable

9 3. EBLUP estimators For estimators of the EBLUP family, we ase How to account for the unerlying unequal probability sampling esign?. We propose two options for this purpose: (a) the incorporation of sampling weights in the estimation of moel parameters, an (b) the inclusion of sampling esign variables as aitional covariates in the moel. We compare unweighte an weighte EBLUP estimators constructe with four mixe moel formulations. Moel A inclues a ranom intercept, variable x is inclue in Moel B, variable x is inclue in Moel C an both variables appear in the population generating moel D. Similarly as for GREG, omain ifferences are accounte for by ranom intercept terms, an slope parameters are common for all omains. For all moels (except D), EBLUP estimators are calculate with unweighte an weighte estimation of moel parameters. For Moels A an C, unweighte estimators EBLUP-SC are seriously biase. For these moels, the PPS sampling esign is not accounte for. The bias eclines consierably when the sampling weights are incorporate in the estimation of the mixe moel, as shown by the new EBLUPW-SC estimators for Moels A an C. The unweighte estimator EBLUP-SC uner Moel B shows best bias behaviour, inicating that the inclusion of the PPS size variable in the moel can offer a powerful tool for bias reuction for EBLUP family estimators. Use of both weighting an the inclusion of x in the moel appears to be less powerful. Accuracy behaviour of all EBLUP estimators is infecte by the ominance of the square bias component in the MSE, as inicate by the RRMSE figures. This hols for all three omain size classes. Because of large bias an small variance, invali confience intervals can be obtaine. This means that point estimates can be systematically far away from the true value, inepenently of the omain sample size. In aition, accuracy oes not improve much with increasing the omain sample size. Table 4. Average absolute relative bias ARB (%) an average relative root mean square error RRMSE (%) of EBLUP estimators for minor, meium-size an major omains of the generate population. Moel an estimator Minor (0-69) = β + u + ε Average ARB (%) Average RRMSE (%) Domain size class Domain size class Meium Major Minor Meium (70-9) (0+) (0-69) (70-9) Major (0+) Moel A y 0 EBLUP-SC.9 3..7.9 3.3.8 EBLUPW-SC 3.7 3.5 3.3 3.9 3.6 3.5 y = + u + x + Moel B β0 β ε EBLUP-SC.8.4 0.7.8.5. EBLUPW-SC 3.5 3.5 3.3 3.5 3.6 3.3 y = + u + x + Moel C β0 β ε EBLUP-SC.3 3..8.4 3..9 EBLUPW-SC 3.7 3.6 3. 3.9 3.7 3.3 y = + u + x + x + (Population generating moel) Moel D β0 β β ε EBLUP-SC 0.3 0. 0.0.3 0.8 0.6 Variables x Size variable in PPS sampling, x Auxiliary variable

0 4 Conclusions Results inicate that uner unequal probability sampling, moel-assiste GREG family estimators are quite insensitive to the moel choice, a property also shown in our previous research to hol uner SRSWOR. Moel formulation an the estimation strategy of the moel are critical for moel-epenent EBLUP family estimators. This is especially true when using EBLUP for unequal sampling esigns. Bias of GREG estimators remaine negligible for all moel choices. Double-use of the same auxiliary information, that is, the use of the size variable in the PPS sampling esign an in the assisting moel, appeare to be beneficial with respect to accuracy. The accuracy improve with increasing the omain sample size. In this case, the mixe moel formulation i not outperform the fixe-effects moel formulation. For moel-epenent EBLUP family estimators, the bias can be large for a misspecifie moel. The PPS sampling esign coul be accounte for with two options, by the inclusion of the PPS size variable in the mixe moel, or by the use of the weighte version of the EBLUP estimator, where the sampling weights are incorporate in the estimation proceure of moel parameters. Of these two options, the first one appeare to be more effective, proucing an EBLUP estimator with small bias an goo accuracy. However, for both options, the square bias component can still ominate the MSE, even in minor omains, tening to invaliate the construction of proper confience intervals. Dominance of the bias component also can cause that the accuracy oes not show improvement, when increasing the omain sample size.

References Estevao, V.M. an Särnal, C.-E. (999) The use of auxiliary information in esign-base estimation for omains. Survey Methoology, 5, 3. Estevao, V.M. an Särnal, C.-E. (004) Borrowing strength is not the best technique within a wie class of esign-consistent omain estimators. Journal of Official Statistics, 0, 645 669. EURAREA Consortium (004) Project Reference Volume, Parts, an 3 (PDF). Website: www.statistics.gov.u/eurarea/ Fay, R.E. an Herriot, R.A. (979) Estimation of income for small places: an application of James-Stein proceures to census ata. Journal of the American Statistical Association, 74, 69 77. Firth, D. an Bennett, K.E. (998) Robust moels in probability sampling. (With iscussion) Journal of the Royal Statistical Society, B, 60, 3 56. Ghosh, M. (00) Moel-epenent small area estimation: theory an practice. In: Lehtonen an Djerf K. (es) Lecture Notes on Estimation for Population Domains an Small Areas. Helsini: Statistics Finlan, Reviews 00/5, 5 08. Golstein, H. (003) Multilevel Statistical Moels. Thir Eition. Lonon: Arnol. Heay, P. an Ralphs, M. (005) EURAREA: an overview of the project an its finings. Statistics in Transition, 7, 557 570. Lehtonen, R., Särnal, C.-E. an Veijanen, A. (003) The effect of moel choice in estimation for omains, incluing small omains. Survey Methoology, 9, 33 44. Lehtonen, R., Särnal, C.-E. an Veijanen, A. (005) Does the moel matter? Comparing moel-assiste an moel-epenent estimators of class frequencies for omains. Statistics in Transition, 7, 649 673. Lehtonen, R. an Veijanen, A. (998). Logistic generalize regression estimators. Survey Methoology, 4, 5 55. Lehtonen, R. an Veijanen, A. (999) Domain estimation with logistic generalize regression an relate estimators. Proceeings, IASS Satellite Conference on Small Area Estimation, Riga, August 999. Riga: Latvian Council of Science, 8. Rao, J.N.K. (003) Small Area Estimation. Hoboen: Wiley. Särnal, C.-E. (996) Efficient estimators with simple variance in unequal probability sampling. Journal of the American Statistical Association, 9, 86 300. Särnal, C.-E., Swensson, B. an Wretman, J.H. (99) Moel Assiste Survey Sampling. New Yor: Springer-Verlag.